<a href="https://www.kaggle.com/code/akarshu121/document-image-classification-with-docformer?scriptVersionId=97796815" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<img src="https://images.unsplash.com/photo-1532153975070-2e9ab71f1b14?ixlib=rb-1.2.1&dl=annie-spratt-5cFwQ-WMcJU-unsplash.jpg&w=1920&q=80&fm=jpg&crop=entropy&cs=tinysrgb">


## 1. Introduction:
* This notebook is a tutorial on DocFormer, a multi-modal architecture designed mainly for document understanding.
* We take the test images of the RVL-CDIP dataset and train the model on a subset of them.
* We also log the metrics with Weights & Biases.

## A Short Introduction to the Model:

<img src = "https://github.com/uakarsh/docformer/raw/master/images/docformer-architecture.png">

DocFormer is a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU). In addition, DocFormer is pre-trained in an unsupervised fashion using carefully designed tasks which encourage multi-modal interaction. DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer. DocFormer also shares learned spatial embeddings across modalities which makes it easy for the model to correlate text to visual tokens and vice versa. DocFormer is evaluated on 4 different datasets each with strong baselines. DocFormer achieves state-of-the-art results on all of them, sometimes beating models 4x its size (in no. of parameters).

For a deeper look at the model and its code implementation, see the [DocFormer repository](https://github.com/uakarsh/docformer). Let us now see what this model has to offer.

The report for this entire run is attached [here](https://wandb.ai/iakarshu/RVL%20CDIP%20with%20DocFormer%20New%20Version/reports/Performance-of-DocFormer-with-RVL-CDIP-Test-Dataset--VmlldzoyMTI3NTM4)

<img src = "https://drive.google.com/u/1/uc?id=1IOyYXbU8bi5FDq59Z4RI1Qkoc54CzZto&export=download" >




### An interactive demo is available on 🤗 Spaces [here](https://huggingface.co/spaces/iakarshu/docformer_for_document_classification)

### Installing the Libraries ⚙️:

In [60]:
## Installing the dependencies (might take some time)

!pip3 install -q pytesseract
!sudo apt install -y -q tesseract-ocr
!pip install  -q transformers
!pip install  -q pytorch-lightning
!pip install  -q einops
!pip install  -q tqdm
!pip install  -q 'Pillow==7.1.2'
!pip install  -q datasets
!pip install wandb
!pip install torchmetrics



In [5]:
## Cloning the repository
!git clone https://github.com/uakarsh/docformer.git

Cloning into 'docformer'...
remote: Enumerating objects: 1278, done.
remote: Counting objects: 100% (230/230), done.
remote: Compressing objects: 100% (179/179), done.
remote: Total 1278 (delta 173), reused 47 (delta 45), pack-reused 1048
Receiving objects: 100% (1278/1278), 4.36 MiB | 7.82 MiB/s, done.
Resolving deltas: 100% (686/686), done.


In [1]:
## Logging into wandb

import wandb
# from kaggle_secrets import UserSecretsClient
# user_secrets = UserSecretsClient()
# secret_value_0 = user_secrets.get_secret("wandb_api")
# wandb.login(key=secret_value_0)

## 2. Libraries 📘:

In [2]:
## Importing the libraries

import warnings
warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", RuntimeWarning)

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
from torch.utils.data import Dataset,DataLoader

import torch.nn.functional as F
import torchvision.models as models

## Adding the path of docformer to system path
import sys
sys.path.append('/home/ec2-user/docformer/src/docformer/')

## Importing the functions from the DocFormer Repo
from dataset import create_features
from modeling import DocFormerEncoder,ResNetFeatureExtractor,DocFormerEmbeddings,LanguageFeatureExtractor
from transformers import BertTokenizerFast

In [3]:
## Hyperparameters

seed = 42
target_size = (500, 384)

## Selecting the device (GPU if available, else CPU)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

## One can change this configuration and try out new combinations
config = {
  "coordinate_size": 96,              ## (768/8), 8 for each of the 8 coordinates of x, y
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "image_feature_pool_shape": [7, 7, 256],
  "intermediate_ff_size_factor": 4,
  "max_2d_position_embeddings": 1024,
  "max_position_embeddings": 128,
  "max_relative_positions": 8,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "shape_size": 96,
  "vocab_size": 30522,
  "layer_norm_eps": 1e-12,
}
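
The sizes in `config` are tied together by simple arithmetic. The following is a minimal sketch of those relationships, assuming the usual multi-head attention convention that the hidden size is split evenly across the heads, and assuming that `intermediate_ff_size_factor` multiplies `hidden_size` to give the feed-forward width. The `derived_sizes` helper is hypothetical and is not part of the DocFormer repository.

```python
def derived_sizes(cfg):
    """Check the arithmetic between config fields and return the derived sizes."""
    hidden = cfg["hidden_size"]
    heads = cfg["num_attention_heads"]

    # Multi-head attention splits the hidden size evenly across the heads.
    if hidden % heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")

    # Per the comment in the config: 8 coordinate embeddings of
    # `coordinate_size` each add up to the hidden size (96 * 8 = 768).
    if cfg["coordinate_size"] * 8 != hidden:
        raise ValueError("coordinate_size * 8 must equal hidden_size")

    return {
        "head_dim": hidden // heads,
        # Assumption: the factor scales hidden_size to the feed-forward width.
        "ff_size": hidden * cfg["intermediate_ff_size_factor"],
    }


example_cfg = {
    "hidden_size": 768,
    "num_attention_heads": 12,
    "coordinate_size": 96,
    "intermediate_ff_size_factor": 4,
}
sizes = derived_sizes(example_cfg)
print(sizes)  # {'head_dim': 64, 'ff_size': 3072}
```

Running a check like this before training makes it easier to try other combinations without hitting a shape mismatch deep inside the model.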

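## A small note 🗒️: 
For the purpose of the demo, the plan is to use only 250 images per class and train the model on that subset. Such a small dataset is definitely not enough for a data-hungry model like a transformer, but let us see what results it gives. Note that the cell below, as written, defines `max_sample_per_class = 10000` and does not enforce the cap, so it keeps every labeled image.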
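
The per-class cap described above can be sketched as follows. This is a minimal illustration on toy data, not the code used for the run; `cap_per_class`, `toy_files`, and `toy_labels` are hypothetical names introduced only for this example.

```python
from collections import defaultdict


def cap_per_class(file_names, label_of, max_per_class):
    """Keep at most `max_per_class` files for each label, in input order."""
    kept = []
    seen = defaultdict(int)  # label -> number of files kept so far
    for name in file_names:
        label = label_of.get(name)
        if label is None:
            continue  # skip files that have no label
        if seen[label] >= max_per_class:
            continue  # this class already has enough samples
        seen[label] += 1
        kept.append((name, label))
    return kept


toy_labels = {"a.jpg": "bill", "b.jpg": "bill", "c.jpg": "bill", "d.jpg": "letter"}
toy_files = ["a.jpg", "b.jpg", "x.deb", "c.jpg", "d.jpg"]
subset = cap_per_class(toy_files, toy_labels, max_per_class=2)
print(subset)  # [('a.jpg', 'bill'), ('b.jpg', 'bill'), ('d.jpg', 'letter')]
```

Keeping one counter per label is what makes the cap apply to each class separately rather than to the dataset as a whole.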

In [4]:
from tqdm.auto import tqdm
import json

## Opening the JSON file that maps each image file name to its label
with open('/home/ec2-user/data_page.json') as f:
    data_map = json.load(f)

## For the purpose of prediction
id2label = []
label2id = {}

curr_class = 0
## Preparing the Dataset
img_path = '/home/ec2-user/original_images/'
dict_of_img_labels = {'img':[], 'label':[]}

## Defined for a per-class cap, but the cap is not enforced in this run
max_sample_per_class = 10000

for img in tqdm(os.listdir(img_path)):
    ## Skip files that have no entry in the label map (e.g. non-image files)
    if img not in data_map:
        continue
    label = data_map[img]

    ## Assign the next free integer id to a label seen for the first time
    if label not in label2id:
        label2id[label] = curr_class
        curr_class+=1
        id2label.append(label)

    curr_img_path = os.path.join(img_path, img)
    dict_of_img_labels['img'].append(curr_img_path)
    dict_of_img_labels['label'].append(label2id[label])

  0%|          | 0/789582 [00:00<?, ?it/s]

In [5]:
len(dict_of_img_labels['label'])
# curr_class

505962

In [6]:
dict_of_img_labels['label']

[0,
 1,
 2,
 1,
 0,
 3,
 4,
 5,
 1,
 6,
 7,
 1,
 1,
 8,
 1,
 0,
 9,
 1,
 1,
 10,
 1,
 11,
 1,
 12,
 9,
 2,
 1,
 6,
 1,
 1,
 13,
 14,
 1,
 1,
 1,
 1,
 3,
 15,
 1,
 3,
 6,
 1,
 16,
 17,
 17,
 1,
 18,
 2,
 2,
 0,
 12,
 1,
 13,
 1,
 1,
 1,
 9,
 12,
 8,
 1,
 19,
 12,
 1,
 12,
 11,
 12,
 4,
 1,
 6,
 1,
 11,
 20,
 1,
 1,
 1,
 11,
 12,
 18,
 6,
 15,
 8,
 1,
 21,
 2,
 1,
 22,
 4,
 15,
 23,
 8,
 1,
 1,
 22,
 15,
 24,
 7,
 1,
 1,
 9,
 1,
 6,
 1,
 1,
 16,
 6,
 4,
 13,
 1,
 12,
 8,
 1,
 1,
 4,
 3,
 11,
 1,
 25,
 1,
 1,
 3,
 21,
 1,
 8,
 10,
 2,
 1,
 12,
 13,
 3,
 12,
 6,
 1,
 7,
 12,
 1,
 1,
 17,
 26,
 1,
 1,
 4,
 13,
 12,
 1,
 12,
 16,
 1,
 12,
 4,
 9,
 15,
 1,
 1,
 17,
 12,
 1,
 8,
 2,
 27,
 1,
 1,
 12,
 28,
 1,
 12,
 1,
 19,
 29,
 1,
 30,
 13,
 1,
 1,
 1,
 28,
 9,
 17,
 1,
 12,
 1,
 1,
 8,
 1,
 31,
 20,
 4,
 17,
 1,
 29,
 1,
 0,
 1,
 1,
 20,
 30,
 1,
 7,
 20,
 16,
 13,
 11,
 8,
 1,
 6,
 11,
 1,
 3,
 13,
 1,
 1,
 1,
 1,
 1,
 1,
 24,
 2,
 1,
 15,
 6,
 16,
 6,
 18,
 1,
 28,
 17,
 2,
 1,
 28,
 3,
 3

In [7]:
len(data_map)

516495

In [8]:
os.listdir('/home/ec2-user/original_images/')

['562c1754-3229-49a3-ba16-6e843b3ad55b_PastBills_893508468.jpg',
 'apt.deb',
 '079c9fad-c93e-4605-b30f-2c3a24b0f867_14971611189861768560719_565621397.jpg',
 '56098b48-05fa-40c7-94c2-9fa9bb810c66_XfinityBill_551848210.jpg',
 '1cb47b04-ea6e-4ac9-b6a1-9d2b7e72b567_11548854450._CB508893070_1_879901058.jpg',
 '344158b4-6be4-49eb-adef-66477bb5ed8c_20170706_084226.jpg',
 'e88c2cee-94ae-4cd8-bb80-7125bf9c61ed_billingStatement-current_83968190.jpg',
 '2378e864-2493-40ac-b865-abf5006c38b4_JPEG_20170819_17451139019528.jpg',
 '9c29b2f9-6824-4e57-b118-44785c2a53be_PastBills-4_814092232.jpg',
 '4f2fa1f6-2780-41d8-b7ee-06b38ab053e3_2017-08-18-18-14-18-549.jpg',
 '5380830a-eaa8-4e56-9644-e6c568270d1d_1497265918578563915785_566519453.jpg',
 '3e9479aa-d24a-4b7d-ae36-81dd45a6deb9_20170806_041025_Film1-1.jpg',
 '2a3aa45f-a682-4054-9f49-c82cb884d9dd_PastBills_675802323.jpg',
 '32d8447a-9daf-42b7-a56f-a7d1e802c363_ECUUtilityBill_763320697.jpg',
 '56ca3269-4899-41c5-b5ab-4c2fc4eefcea_20170610_124841.jpg',
 '

In [9]:
import pandas as pd
df = pd.DataFrame(dict_of_img_labels)

In [10]:
from sklearn.model_selection import train_test_split as tts
train_df, valid_df = tts(df, random_state = seed, stratify = df['label'], shuffle = True)

In [11]:
train_df = train_df.reset_index(drop = True)
valid_df = valid_df.reset_index(drop = True)
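
`train_test_split` is called here without `test_size`, so scikit-learn falls back to its default of 0.25 for the held-out share, and `stratify` keeps the class proportions roughly equal in both parts. The sketch below is a simplified pure-Python illustration of that idea, not scikit-learn's implementation; `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(labels, valid_fraction=0.25, seed=42):
    """Split sample indices so each class keeps about the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for validation.
        n_valid = round(len(indices) * valid_fraction)
        valid_idx.extend(indices[:n_valid])
        train_idx.extend(indices[n_valid:])
    return train_idx, valid_idx


toy = [0] * 8 + [1] * 4
train_idx, valid_idx = stratified_split(toy)
print(len(train_idx), len(valid_idx))  # 9 3
```

Without stratification, a rare class could end up entirely in one of the two parts, which would make the validation metrics for that class meaningless.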

## 3. Making the dataset 💽:

The main idea behind the dataset class is to pre-process each input into the format the model expects. Simply pass the image path and the other settings, and boom 💥, you get the pre-processed input back.

In [12]:
## Creating the dataset

class RVLCDIPData(Dataset):
    
    def __init__(self, image_list, label_list, target_size, tokenizer, max_len = 512, transform = None):
        
        self.image_list = image_list
        self.label_list = label_list
        self.target_size = target_size
        self.tokenizer = tokenizer
        self.max_len = max_len
        self.transform = transform
        
    def __len__(self):
        return len(self.image_list)
    
    def __getitem__(self, idx):
        img_path = self.image_list[idx]
        label = self.label_list[idx]
        
        ## More on this, in the repo mentioned previously
        final_encoding = create_features(
            img_path,
            self.tokenizer,
            add_batch_dim=False,
            target_size=self.target_size,
            max_seq_length=self.max_len,
            path_to_save=None,
            save_to_disk=False,
            apply_mask_for_mlm=False,
            extras_for_debugging=False,
            use_ocr = True
        )
        if self.transform is not None:
            ## Note that, ToTensor is already applied on the image
            final_encoding['resized_scaled_img'] = self.transform(final_encoding['resized_scaled_img'])
        
        
        keys_to_reshape = ['x_features', 'y_features', 'resized_and_aligned_bounding_boxes']
        for key in keys_to_reshape:
            final_encoding[key] = final_encoding[key][:self.max_len]
            
        final_encoding['label'] = torch.as_tensor(label).long()
        return final_encoding

In [13]:
## Defining the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

In [14]:
from torchvision import transforms

## Normalize each channel with mean 0.5 and std 0.5 (a common choice in tutorials and in image reconstruction work)
transform = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
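
`transforms.Normalize` computes `(x - mean) / std` for each channel. Assuming the pixel values already lie in [0, 1], a mean and standard deviation of 0.5 map them to [-1, 1]. The arithmetic can be checked with a few values; `normalize` here is a hypothetical stand-in written in plain Python, not the torchvision function.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization: (value - mean) / std."""
    return (value - mean) / std


for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```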
                              

In [15]:
train_ds = RVLCDIPData(train_df['img'].tolist(), train_df['label'].tolist(),
                      target_size, tokenizer, config['max_position_embeddings'], transform)
val_ds = RVLCDIPData(valid_df['img'].tolist(), valid_df['label'].tolist(),
                      target_size, tokenizer,config['max_position_embeddings'],  transform)

### Collate Function:

A collate function lets us control how the dataloader assembles individual samples into a batch. More on collate functions can be found [here](https://stackoverflow.com/questions/65279115/how-to-use-collate-fn-with-dataloaders).

In [16]:
def collate_fn(data_bunch):

  '''
  A function for the dataloader to return a batch dict of given keys

  data_bunch: List of dictionaries, one per sample
  '''

  dict_data_bunch = {}

  ## Group the values of every sample under their key
  for i in data_bunch:
    for (key, value) in i.items():
      if key not in dict_data_bunch:
        dict_data_bunch[key] = []
      dict_data_bunch[key].append(value)

  ## Stack each list of tensors along a new batch dimension
  for key in list(dict_data_bunch.keys()):
    dict_data_bunch[key] = torch.stack(dict_data_bunch[key], dim = 0)

  return dict_data_bunch
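
The grouping step of the collate function can be tried without any tensors. In this minimal sketch plain Python lists stand in for tensors, and `group_by_key` is a hypothetical helper; the real function then calls `torch.stack` on each list, which requires every tensor under a key to have the same shape.

```python
def group_by_key(samples):
    """Turn a list of per-sample dicts into one dict of per-key lists."""
    batch = {}
    for sample in samples:
        for key, value in sample.items():
            batch.setdefault(key, []).append(value)
    return batch


samples = [
    {"input_ids": [101, 7592, 102], "label": 3},
    {"input_ids": [101, 2088, 102], "label": 7},
]
batch = group_by_key(samples)
print(batch["label"])      # [3, 7]
print(batch["input_ids"])  # [[101, 7592, 102], [101, 2088, 102]]
```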

## 4. Defining the DataModule 📖

* A DataModule is a shareable, reusable class that encapsulates all the steps needed to process data.

* It is simply a collection of `train_dataloader`(s), `val_dataloader`(s), `test_dataloader`(s) and `predict_dataloader`(s), along with the matching transforms and the data processing/download steps required.




In [17]:
import pytorch_lightning as pl

class DataModule(pl.LightningDataModule):

  def __init__(self, train_dataset, val_dataset,  batch_size = 4):

    super(DataModule, self).__init__()
    self.train_dataset = train_dataset
    self.val_dataset = val_dataset
    self.batch_size = batch_size

  def train_dataloader(self):
    return DataLoader(self.train_dataset, batch_size = self.batch_size, 
                      collate_fn = collate_fn, shuffle = True)
  
  def val_dataloader(self):
    return DataLoader(self.val_dataset, batch_size = self.batch_size,
                                  collate_fn = collate_fn, shuffle = False)

In [18]:
datamodule = DataModule(train_ds, val_ds)

## 5. Modeling Part 🏎️

1. First, we define the PyTorch model with our configuration. RVL-CDIP has 16 classes (labels 0 to 15); in this notebook the size of the classification head is taken from `len(id2label)`.
2. Second, we wrap it in a PyTorch Lightning module, and boom 💥, the model definition is done.

In [19]:
class DocFormerForClassification(nn.Module):
  
    def __init__(self, config):
      super(DocFormerForClassification, self).__init__()

      self.resnet = ResNetFeatureExtractor(hidden_dim = config['max_position_embeddings'])
      self.embeddings = DocFormerEmbeddings(config)
      self.lang_emb = LanguageFeatureExtractor()
      self.config = config
      self.dropout = nn.Dropout(config['hidden_dropout_prob'])
      self.linear_layer = nn.Linear(in_features = config['hidden_size'], out_features = len(id2label))  ## Number of Classes
      self.encoder = DocFormerEncoder(config)

    def forward(self, batch_dict):

      x_feat = batch_dict['x_features']
      y_feat = batch_dict['y_features']

      token = batch_dict['input_ids']
      img = batch_dict['resized_scaled_img']

      v_bar_s, t_bar_s = self.embeddings(x_feat,y_feat)
      v_bar = self.resnet(img)
      t_bar = self.lang_emb(token)
      out = self.encoder(t_bar,v_bar,t_bar_s,v_bar_s)
      ## Classify from the first token only, so the linear layer is not applied to every position
      out = self.linear_layer(out[:, 0, :])
      return out

In [20]:
## Defining pytorch lightning model
from sklearn.metrics import accuracy_score, confusion_matrix
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import torchmetrics

class DocFormer(pl.LightningModule):

  def __init__(self, config , lr = 5e-5):
    super(DocFormer, self).__init__()
    
    self.save_hyperparameters()
    self.config = config
    self.docformer = DocFormerForClassification(config)
    
    self.num_classes = len(id2label)
    self.train_accuracy_metric = torchmetrics.Accuracy()
    self.val_accuracy_metric = torchmetrics.Accuracy()
    self.f1_metric = torchmetrics.F1Score(num_classes=self.num_classes)
    self.precision_macro_metric = torchmetrics.Precision(
            average="macro", num_classes=self.num_classes
        )
    self.recall_macro_metric = torchmetrics.Recall(
            average="macro", num_classes=self.num_classes
        )
    self.precision_micro_metric = torchmetrics.Precision(average="micro")
    self.recall_micro_metric = torchmetrics.Recall(average="micro")

  def forward(self, batch_dict):
    logits = self.docformer(batch_dict)
    return logits

  def training_step(self, batch, batch_idx):
    logits = self.forward(batch)

    loss = nn.CrossEntropyLoss()(logits, batch['label'])
    preds = torch.argmax(logits, 1)

    ## Calculating the accuracy score
    train_acc = self.train_accuracy_metric(preds, batch["label"])

    ## Logging
    self.log('train/loss', loss,prog_bar = True, on_epoch=True, logger=True, on_step=True)
    self.log('train/acc', train_acc, prog_bar = True, on_epoch=True, logger=True, on_step=True)

    return loss
  
  def validation_step(self, batch, batch_idx):
    logits = self.forward(batch)
    loss = nn.CrossEntropyLoss()(logits, batch['label'])
    preds = torch.argmax(logits, 1)
    
    labels = batch['label']
    # Metrics
    valid_acc = self.val_accuracy_metric(preds, labels)
    precision_macro = self.precision_macro_metric(preds, labels)
    recall_macro = self.recall_macro_metric(preds, labels)
    precision_micro = self.precision_micro_metric(preds, labels)
    recall_micro = self.recall_micro_metric(preds, labels)
    f1 = self.f1_metric(preds, labels)

    # Logging metrics
    self.log("valid/loss", loss, prog_bar=True, on_step=True, logger=True)
    self.log("valid/acc", valid_acc, prog_bar=True, on_epoch=True, logger=True, on_step=True)
    self.log("valid/precision_macro", precision_macro, prog_bar=True, on_epoch=True, logger=True, on_step=True)
    self.log("valid/recall_macro", recall_macro, prog_bar=True, on_epoch=True, logger=True, on_step=True)
    self.log("valid/precision_micro", precision_micro, prog_bar=True, on_epoch=True, logger=True, on_step=True)
    self.log("valid/recall_micro", recall_micro, prog_bar=True, on_epoch=True, logger=True, on_step=True)
    self.log("valid/f1", f1, prog_bar=True, on_epoch=True)
    
    return {"label": batch['label'], "logits": logits}

  def validation_epoch_end(self, outputs):
    labels = torch.cat([x["label"] for x in outputs])
    logits = torch.cat([x["logits"] for x in outputs])
    preds = torch.argmax(logits, 1)

    wandb.log({"cm": wandb.sklearn.plot_confusion_matrix(labels.cpu().numpy(), preds.cpu().numpy())})
    self.logger.experiment.log(
      {"roc": wandb.plot.roc_curve(labels.cpu().numpy(), logits.cpu().numpy())}
    )
        
  def configure_optimizers(self):
    return torch.optim.AdamW(self.parameters(), lr = self.hparams['lr'])
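
The validation step logs both macro- and micro-averaged precision and recall. The difference between the two averages is easiest to see on a small example. The following is a simplified pure-Python sketch, not the torchmetrics implementation; `precision_scores` is a hypothetical helper, and scoring a class that is never predicted as 0.0 is an assumption made here.

```python
from collections import Counter


def precision_scores(preds, labels):
    """Return (macro, micro) precision for single-label multi-class predictions."""
    true_pos = Counter()
    predicted = Counter()
    for p, y in zip(preds, labels):
        predicted[p] += 1
        if p == y:
            true_pos[p] += 1

    classes = sorted(set(labels) | set(preds))
    # Macro: average the per-class precision, every class weighted equally.
    # Assumption: a class that is never predicted contributes 0.0.
    per_class = [true_pos[c] / predicted[c] if predicted[c] else 0.0 for c in classes]
    macro = sum(per_class) / len(classes)
    # Micro: pool all predictions before dividing.
    micro = sum(true_pos.values()) / len(preds)
    return macro, micro


preds = [0, 0, 0, 0, 1, 2]
labels = [0, 0, 0, 1, 1, 2]
macro, micro = precision_scores(preds, labels)
print(round(macro, 3), round(micro, 3))  # 0.917 0.833
```

In the single-label multi-class setting, micro precision equals plain accuracy, so the macro variants are the ones that reveal weak performance on rare classes.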

## 6. Summing it up and running the entire procedure 🏃

In [21]:
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.callbacks.early_stopping import EarlyStopping
from pytorch_lightning.loggers import WandbLogger

def main():
    datamodule = DataModule(train_ds, val_ds)
    docformer = DocFormer(config)

    checkpoint_callback = ModelCheckpoint(
        dirpath="./models", monitor="valid/loss", mode="min"
    )
    early_stopping_callback = EarlyStopping(
        monitor="valid/loss", patience=3, verbose=True, mode="min"
    )
    
    wandb.init(config=config, project="RVL CDIP with DocFormer New Version")
    wandb_logger = WandbLogger(project="RVL CDIP with DocFormer New Version", entity="iakarshu")
    ## https://www.tutorialexample.com/implement-reproducibility-in-pytorch-lightning-pytorch-lightning-tutorial/
    pl.seed_everything(seed, workers=True)
    trainer = pl.Trainer(
        default_root_dir="logs",
        gpus=(1 if torch.cuda.is_available() else 0),
        max_epochs=1,
        fast_dev_run=False,
        logger=wandb_logger,
        callbacks=[checkpoint_callback, early_stopping_callback],
        deterministic=True
    )
    trainer.fit(docformer, datamodule)
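
The `EarlyStopping` callback above watches `valid/loss` with `patience=3` and `mode="min"`. The patience logic can be sketched as follows. This is a simplified illustration, not Lightning's implementation (which also supports options such as `min_delta`), and `stopping_epoch` is a hypothetical helper.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop, or None."""
    best = float("inf")
    bad_epochs = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


losses = [0.90, 0.70, 0.72, 0.71, 0.75, 0.60]
print(stopping_epoch(losses))  # 4
```

Assuming one validation pass per epoch, a run with `max_epochs=1` ends before a patience of 3 can ever be reached, so the callback only matters once `max_epochs` is raised.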

In [22]:
if __name__ == "__main__":
    main()

Some weights of the model checkpoint at microsoft/layoutlm-base-uncased were not used when initializing LayoutLMForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing LayoutLMForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LayoutLMForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LayoutLMForTokenClassification were not initialized from the model checkpoint at microsoft

Global seed set to 42
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                   | Type                       | Params
----------------------------------------------------------------------
0 | docformer              | DocFormerForClassification | 174 M 
1 | train_accuracy_metric  | Accuracy                   | 0     
2 | val_accuracy_metric    | Accuracy                   | 0     
3 | f1_metric              | F1Score                    | 0     
4 | precision_macro_metric | Precision                  | 0     
5 | recall_macro_metric    | Recall                     | 0     
6 | precision_micro_metric | Precision                  | 0     
7 | recall_micro_metric    | Recall                     | 0     
----------------------------------------------------------------------
150 M     Trainable params
23.4

Sanity Checking: 0it [00:00, ?it/s]

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_u2vz50g0.tsv'
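
The `FileNotFoundError` above refers to a temporary `tess_*.tsv` file, which suggests the failure happens in the OCR step (`use_ocr = True` in the dataset) rather than in the model. One possible cause, and this is an assumption rather than something the log confirms, is that the `tesseract` binary is missing or did not produce TSV output. A minimal check using only the standard library could look like this; `tesseract_version` is a hypothetical helper.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if the binary is missing."""
    binary = shutil.which("tesseract")
    if binary is None:
        return None
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=False
    )
    # Some tesseract builds print the version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


version = tesseract_version()
print(version if version is not None else "tesseract binary not found on PATH")
```

If the binary is not found, re-running the installation cell and checking its output for errors is a reasonable first step.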

## References:

1. [MLOps Repo](https://github.com/graviraja/MLOps-Basics) (for the integration of model and data with PyTorch Lightning)
2. [PyTorch Lightning Docs](https://pytorch-lightning.readthedocs.io/en/stable/index.html) (for all the doubts and bugs)
3. [My Repo](https://github.com/uakarsh/docformer) (for downloading the model and the pre-processing steps)
4. Unsplash for images
5. Google for everything else

In [26]:
import pytesseract
from PIL import Image

In [None]:
ocr_df = pytesseract.image_to_data(Image.open("/home/ec2-user/original_images/73309166-67db-486e-923d-9c9f6d4bea4f.png"), output_type="data.frame")