<a href="https://colab.research.google.com/github/lisstasy/Receipt_Scanner/blob/main/experiments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**LLM to convert a grocery receipt image to JSON**

# Project Outline: Receipt Data Extraction and Parsing

## 1. Problem Statement

Develop a system to extract data from receipt images and parse the extracted data into a structured JSON format using Pydantic objects.

## 2. Project Components

**a. Data Extraction:**

- Utilize OCR to extract relevant text information from receipt images.
- Implement image preprocessing techniques to enhance OCR accuracy (e.g., resizing, noise reduction, contrast adjustment).

**b. Data Parsing:**

- Define Pydantic models to represent the structured data schema for the receipt.
- Write parsers to interpret the extracted text data and map it to the defined Pydantic models.
- Implement logic to handle variations in receipt formats (e.g., different layouts, fonts, languages).

**c. Integration:**

- Create a Python script or application to orchestrate the data extraction and parsing process.
- Integrate the OCR engine and Pydantic models within the script/application.
- Ensure scalability and modularity for potential future enhancements.

## 3. Technology Stack

- Python: Utilize Python as the primary programming language for development.
- Pydantic: Define and validate data schemas using Pydantic models.
- OpenCV or PIL: Employ image processing libraries for image manipulation and preprocessing.
- Tesseract OCR: Use Tesseract or other OCR engines for text extraction from images.
- JSON: Output the parsed data into a structured JSON format.

## 4. Development Workflow

**a. Data Collection:**

- Gather a diverse dataset of receipt images to train and validate the OCR model.
- Annotate the dataset with ground truth labels for supervised learning (optional).

**b. Model Training (if applicable):**

- Train or fine-tune the OCR model using the annotated dataset (if applicable).

**c. Implementation:**

- Develop Python functions/classes to handle image preprocessing, OCR, and data parsing.
- Define Pydantic models to represent the structured data schema for receipts.
- Write unit tests to ensure the correctness of individual components.

**d. Deployment:**

- Deploy the system as a standalone application or integrate it into an existing workflow.
- Provide documentation and instructions for users to interact with the system.

## 5. Future Enhancements

- Explore deep learning-based approaches for more accurate text extraction from receipts.
- Implement natural language processing (NLP) techniques to extract additional semantic information from receipt text.
- Develop a user-friendly interface for interacting with the system.

## 6. Conclusion

The project aims to automate the process of extracting and parsing data from receipt images, facilitating tasks such as expense tracking, inventory management, and financial analysis.

# Set up

## Libraries

In [None]:
import warnings, logging
warnings.simplefilter('ignore')
logging.disable(logging.WARNING)

In [None]:
# enhance text display
!pip install rich
from rich import (print, inspect, pretty)
pretty.install()



In [None]:
# Import necessary libraries
from PIL import Image
import numpy as np
import cv2

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install -q torch transformers accelerate bitsandbytes

## Images

In [None]:
image_1_path = '/content/drive/MyDrive/receipts/IMG_7386.jpg'
image_2_path = '/content/drive/MyDrive/receipts/IMG_7318.jpg'
image_3_path = '/content/drive/MyDrive/receipts/IMG_7319.jpg'
image_4_path = '/content/drive/MyDrive/receipts/IMG_7320.jpg'
image_5_path = '/content/drive/MyDrive/receipts/IMG_7321.jpg'
image_6_path = '/content/drive/MyDrive/receipts/IMG_7323.jpg'

In [None]:
image_1 = Image.open(image_1_path).convert("RGB")
image_2 = Image.open(image_2_path).convert("RGB")
image_3 = Image.open(image_3_path).convert("RGB")

image_cv1 = cv2.imread(image_1_path)
image_cv2 = cv2.imread(image_2_path)
image_cv3 = cv2.imread(image_3_path)
image_cv4 = cv2.imread(image_4_path)
image_cv5 = cv2.imread(image_5_path)
image_cv6 = cv2.imread(image_6_path)

#image_array = np.array(image.convert('RGB'))
#display(image.resize((300,400)))
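
`cv2.imread` returns `None` instead of raising when a file cannot be read, so a wrong Drive path only surfaces later, inside the OCR call. A minimal sketch of a pre-flight check using only the standard library; the helper name `find_missing` is hypothetical and not part of the notebook:

```python
from pathlib import Path

def find_missing(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]

# Two of the paths defined above; adjust to your own Drive layout.
receipt_paths = [
    '/content/drive/MyDrive/receipts/IMG_7386.jpg',
    '/content/drive/MyDrive/receipts/IMG_7318.jpg',
]

missing = find_missing(receipt_paths)
if missing:
    print("Missing receipt images:", missing)
```

Running this before the loading cell makes it clear whether a failure comes from the file system or from the OCR step.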


# Schemas


In [None]:
from pydantic import BaseModel, Field
from typing import List, Union
from datetime import date, time
from enum import Enum

In [None]:
# Schema without descriptions

class ProductCategory(str, Enum):
    fruits = 'fruits'
    vegetables = 'vegetables'
    protein_foods = 'protein_foods'
    dairy = 'dairy'
    grains = 'grains'
    nuts_and_seeds = 'nuts_and_seeds'
    beverages = 'beverages'
    snacks = 'snacks'
    condiments = 'condiments'
    frozen_foods = 'frozen_foods'
    bakery = 'bakery'
    canned_goods = 'canned_goods'
    household = 'household'
    personal_care = 'personal_care'
    pet_supplies = 'pet_supplies'
    other = 'other'

class Item(BaseModel):
    name: str
    unit: float
    price: float
    amount: float
    category: ProductCategory

class PaymentMethodEnum(str, Enum):
    tarjeta = 'tarjeta'
    efectivo = 'efectivo'

class Receipt(BaseModel):
    store: str
    address: str
    city: str
    phone: str
    receipt_no: str
    date: str
    time: str
    items: List[Item]
    total: float
    number_items: int
    payment_method: PaymentMethodEnum
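
Because `category` is typed as `ProductCategory`, any label outside the enum's values is rejected at validation time. If the labels come from an LLM, it may help to normalize them first. The following is only a sketch under that assumption: `coerce_category` is a hypothetical helper, and the enum here is a trimmed copy so the snippet runs on its own.

```python
from enum import Enum

class ProductCategory(str, Enum):
    # Trimmed copy of the notebook's enum, kept short for the example
    fruits = 'fruits'
    dairy = 'dairy'
    protein_foods = 'protein_foods'
    other = 'other'

def coerce_category(label: str) -> ProductCategory:
    """Map a free-text label to ProductCategory, falling back to `other`."""
    normalized = label.strip().lower().replace(' ', '_')
    try:
        return ProductCategory(normalized)  # lookup by enum value
    except ValueError:
        return ProductCategory.other

print(coerce_category(' Dairy ').value)        # dairy
print(coerce_category('Protein Foods').value)  # protein_foods
print(coerce_category('electronics').value)    # other
```

Falling back to `other` keeps the receipt parseable; raising instead would be the stricter choice if silent misclassification is a concern.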


In [None]:
# Schema with descriptions

class ItemInfo(BaseModel):
    name: str = Field(..., description="Name of the item")
    unit: float = Field(..., description="Quantity of the item")
    price: float = Field(..., description="Price per unit of the item")
    amount: float = Field(..., description="Total amount for the item")

class PaymentMethodEnum(str, Enum):
    tarjeta = 'tarjeta'
    efectivo = 'efectivo'

class ReceiptInfo(BaseModel):
    store: str = Field(..., description="Store name")
    address: str = Field(..., description="Address of the store")
    city: str = Field(..., description="City where the store is located")
    phone: str = Field(..., description="Phone number of the store")
    receipt_no: str = Field(..., description="Receipt number")
    date: str = Field(..., description="Date of the receipt in DD/MM/YYYY format")
    time: str = Field(..., description="Time of the transaction")
    items: List[ItemInfo] = Field(..., description="List of items purchased")
    total: float = Field(..., description="Total amount of the receipt")
    number_items: int = Field(..., description="Number of items in the receipt")
    payment_method: PaymentMethodEnum = Field(..., description="Payment method used")
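
The schema keeps `date` and `time` as plain strings, so the DD/MM/YYYY format named in the description is not enforced by the type itself. A minimal sketch of checking and combining the two values with the standard library; `parse_receipt_datetime` is a hypothetical helper, and the HH:MM time format is an assumption based on the example receipts:

```python
from datetime import datetime

def parse_receipt_datetime(date_str: str, time_str: str) -> datetime:
    """Combine a DD/MM/YYYY date and an HH:MM time into one datetime.

    Raises ValueError if either part does not match the expected format.
    """
    return datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H:%M")

stamp = parse_receipt_datetime("15/04/2024", "16:01")
print(stamp.isoformat())  # 2024-04-15T16:01:00
```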


# Examples

In [None]:
example_cat_1 = {
    "store": "HiperDino",
    "address": "9238-SD Bernardo de la torre",
    "city": "Tafira Baja",
    "phone": "928493638",
    "receipt_no": "2024/923813-00060866",
    "date": "15/04/2024",
    "time": "16:01",
    "items": [
        {"name": "FRESA TARRINA 500 GR", "unit": 1, "price": 1.59, "amount": 1.59, "category": "fruits"},
        {"name": "HIPERDINO ACEITUNA R/ANCHOA LATA 350", "unit": 1, "price": 0.95, "amount": 0.95, "category": "canned_goods"},
        {"name": "DESPERADOS CERVEZA TOQUE TEQUILA BOT", "unit": 1, "price": 1.05, "amount": 1.05, "category": "beverages"},
        {"name": "HIPERDINO CENTRO JAMON SERRANO BODEG", "unit": 0.310, "price": 13.62, "amount": 4.22, "category": "protein_foods"},
        {"name": "MONTESANO JAMON COCIDO SELECCION KG", "unit": 0.308, "price": 8.74, "amount": 2.15, "category": "protein_foods"}
    ],
    "total": 9.96,
    "number_items": 5,
    "payment_method": "tarjeta"
}


example_cat_2 = {
    "store": "SPAR TAFIRA",
    "address": "C/. Bruno Naranjo DIAZ 9A-9B",
    "city": "Tafira Baja",
    "phone": "928 351 616",
    "receipt_no": "014\\002-18965",
    "date": "06/04/2024",
    "time": "15:23",
    "items": [
        {"name": "CLIPPER MANZ.1.5L.", "unit": 1, "price": 1.49, "amount": 1.49, "category": "beverages"},
        {"name": "PLATANO PRIMERA GR", "unit": 1.40, "price": 1.99, "amount": 2.79, "category": "fruits"},
        {"name": "MANZANA PINK LADY GR", "unit": 1, "price": 2.99, "amount": 2.99, "category": "fruits"},
        {"name": "SALSA.BARI.PES.GEN.1", "unit": 1, "price": 3.10, "amount": 3.10, "category": "canned_goods"},
        {"name": "GOFIO B.LUGAR MIL.FU", "unit": 1, "price": 1.85, "amount": 1.85, "category": "grains"},
        {"name": "ZUM.DISF.D.SIMON PIN", "unit": 1, "price": 1.75, "amount": 1.75, "category": "beverages"},
        {"name": "LECHE.GRNJ.FLR.UHT.", "unit": 1, "price": 1.15, "amount": 1.15, "category": "dairy"}
    ],
    "total": 15.12,
    "number_items": 7,
    "payment_method": "tarjeta"
}

example_cat_3 = {
  "store": "SPAR TAFIRA",
  "address": "C/. BRUNO NARANJO DIAZ 9A-9B",
  "city": "TAFIRA BAJA",
  "phone": "928 351 616",
  "receipt_no": "014\\001-42453",
  "date": "08/04/2024",
  "time": "10:47",
  "items": [
    {"name": "LECHE.GRNJ.FLR.UHT", "unit": 1, "price": 1.15, "amount": 1.15, "category": "dairy"},
    {"name": "PUERROS GRANEL", "unit": 0.425, "price": 2.99, "amount": 1.27, "category": "vegetables"},
    {"name": "HUEVOS FRESCOS 12U", "unit": 1, "price": 2.99, "amount": 2.99, "category": "protein_foods"},
    {"name": "ESPINACAS SPAR", "unit": 1, "price": 1.15, "amount": 1.15, "category": "vegetables"},
    {"name": "AGUA YUGUINAT NAT.8L", "unit": 1, "price": 1.49, "amount": 1.49, "category": "beverages"}
  ],
  "total": 8.05,
  "number_items": 5,
  "payment_method": "tarjeta"
}

example_cat_4 = {
    "store": "MERCADONA",
    "address": "AVDA. PINTOR FELO MONZON (C.C. 7 PALMAS) S/N",
    "city": "35019 LAS PALMAS DE GRAN CANARIA",
    "phone": "928411755",
    "receipt_no": "2185-013-6970Z2",
    "date": "03/04/2024",
    "time": "21:22",
    "items": [
        { "name": "DETERG HIPO COLONIA", "unit": 1, "price": 3.30, "amount": 3.30, "category": "household"},
        { "name": "SOLOMILLO POLLO CONG", "unit": 3, "price": 4.50, "amount": 13.50, "category": "protein_foods"},
        { "name": "JAMONCITO BARBACOA", "unit": 1, "price": 2.32, "amount": 2.32, "category": "protein_foods"},
        { "name": "JAMONCITO BARBACOA", "unit": 1, "price": 2.76, "amount": 2.76, "category": "protein_foods"},
        { "name": "NUEZ NATURAL", "unit": 1, "price": 2.00, "amount": 2.00, "category": "nuts_and_seeds"},
        { "name": "QUESU COTIAGE", "unit": 2, "price": 1.25, "amount": 2.50, "category": "dairy"},
        { "name": "POLLO ENTERO LIMPIO", "unit": 1, "price": 6.52, "amount": 6.52, "category": "protein_foods"},
        { "name": "PAPEL VEGETAL 30H", "unit": 1, "price": 1.70, "amount": 1.70, "category": "household"},
        { "name": "BEBIDA AVELLANAS", "unit": 1, "price": 1.30, "amount": 1.30, "category": "beverages"},
        { "name": "INFUSION DORMIR", "unit": 1, "price": 1.05, "amount": 1.05, "category": "beverages"},
        { "name": "LECHE DE COCO", "unit": 1, "price": 1.40, "amount": 1.40, "category": "beverages"},
        { "name": "QUESO UNTAR LIGHT", "unit": 1, "price": 1.35, "amount": 1.35, "category": "dairy"},
        { "name": "RULITO CABRA", "unit": 1, "price": 2.45, "amount": 2.45, "category": "dairy"},
        { "name": "GRIEGO LIGERO", "unit": 1, "price": 1.65, "amount": 1.65, "category": "dairy"},
        { "name": "BOLSA PLASTICO", "unit": 1, "price": 0.15, "amount": 0.15, "category": "household"}
        ],
    "total": 43.95,
    "number_items": 15,
    "payment_method": "tarjeta"
    }
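
A quick sanity check on hand-written examples like these is that the item amounts add up to the receipt total. The sketch below assumes the dictionary layout used above; `total_matches_items` is a hypothetical helper. Note that `unit * price` is not a reliable check on its own: the MONTESANO line in `example_cat_1` carries a 0,54 € discount, so 0.308 × 8.74 ≈ 2.69 while the charged amount is 2.15.

```python
import math

def total_matches_items(receipt: dict, tol: float = 0.005) -> bool:
    """True if the item amounts add up to the receipt total (within tol)."""
    items_sum = sum(item["amount"] for item in receipt["items"])
    return math.isclose(items_sum, receipt["total"], abs_tol=tol)

# Amounts and total copied from example_cat_1
sample = {
    "items": [
        {"amount": 1.59}, {"amount": 0.95}, {"amount": 1.05},
        {"amount": 4.22}, {"amount": 2.15},
    ],
    "total": 9.96,
}
print(total_matches_items(sample))  # True
```

The tolerance of half a cent absorbs floating-point rounding without hiding a real one-cent mismatch.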


In [None]:
example_cat_5 = {
    'store': 'MERCADONA',
    'address': 'C/ Republica Dominicana S/N, 35010 Las Palmas de Gran Canaria',
    'city': 'Las Palmas de Gran Canaria',
    'phone': '928226288',
    'receipt_no': '2109-017-467040',
    'date': '06/04/2024',
    'time': '19:56',
    'items': [
        {
            'name': 'CERVEZA NEGRA P-6',
            'unit': 1,
            'price': 5.4,
            'amount': 5.4,
            'category': 'beverages'
        },
        {'name': 'BOLSA PAPEL', 'unit': 1, 'price': 0.1, 'amount': 0.1, 'category': 'household'},
        {
            'name': 'CHICLES MENTA FUERTE',
            'unit': 1,
            'price': 0.95,
            'amount': 0.95,
            'category': 'snacks'
        },
        {'name': 'HUMMUS CLASICO', 'unit': 1, 'price': 1.05, 'amount': 1.05, 'category': 'snacks'},
        {'name': 'KEFIRFRESA-PLATANO', 'unit': 1, 'price': 0.9, 'amount': 0.9, 'category': 'dairy'},
        {
            'name': 'ROLLON INVIS.HOMBRE',
            'unit': 1,
            'price': 0.85,
            'amount': 0.85,
            'category': 'personal_care'
        },
        {'name': 'CHIA', 'unit': 1, 'price': 1.5, 'amount': 1.5, 'category': 'nuts_and_seeds'},
        {'name': 'NACHOS TEX MEX', 'unit': 1, 'price': 0.95, 'amount': 0.95, 'category': 'snacks'}
    ],
    'total': 11.7,
    'number_items': 8,
    'payment_method': 'tarjeta'
}

example_cat_6 = {
    'store': 'HiperDino',
    'address': '9033-SD Mesa y Lopez',
    'city': 'Las Palmas de Gran Canaria',
    'phone': '928222758',
    'receipt_no': '2024/903314-00027051',
    'date': '30/03/2024',
    'time': '19:34',
    'items': [
        {
            'name': 'COCA-COLA REFRESCO COLA PET 2 L',
            'unit': 1,
            'price': 1.99,
            'amount': 1.99,
            'category': 'beverages'
        },
        {
            'name': 'BOLSA REUTILIZABLE 85% RECICLADA 48',
            'unit': 1,
            'price': 0.15,
            'amount': 0.15,
            'category': 'household'
        },
        {
            'name': 'PRESIDENT NATA FRESCA CREMOSA 20CL',
            'unit': 1,
            'price': 2.45,
            'amount': 2.45,
            'category': 'dairy'
        },
        {
            'name': 'TROPICAL CERVEZA PILSEN LATA 33 CL',
            'unit': 6,
            'price': 0.69,
            'amount': 4.14,
            'category': 'beverages'
        },
        {
            'name': 'ANOJO NACIO.TAPA/ESPAL/BABILLA FILET',
            'unit': 0.586,
            'price': 14.75,
            'amount': 7.0,
            'category': 'protein_foods'
        },
        {'name': 'LIMAS, EL KILO', 'unit': 0.07, 'price': 5.95, 'amount': 0.42, 'category': 'fruits'}
    ],
    'total': 16.15,
    'number_items': 11,
    'payment_method': 'efectivo'
}

In [None]:
example_1 = {
    "store": "HiperDino",
    "address": "9238-SD Bernardo de la torre",
    "city": "Tafira Baja",
    "phone": "928493638",
    "receipt_no": "2024/923813-00060866",
    "date": "15/04/2024",
    "time": "16:01",
    "items": [
        {"name": "fresa tarrina 500 gr", "unit": 1, "price": 1.59, "amount": 1.59},
        {"name": "hiperdino aceituna r/anchoa lata 350", "unit": 1, "price": 0.95, "amount": 0.95},
        {"name": "desperados cerveza toque tequila bot", "unit": 1, "price": 1.05, "amount": 1.05},
        {"name": "hiperdino centro jamon serrano bodeg", "unit": 0.310, "price": 13.62, "amount": 4.22},
        {"name": "montesano jamon cocido seleccion kg", "unit": 0.308, "price": 8.74, "amount": 2.15}
    ],
    "total": 9.96,
    "number_items": 5,
    "payment_method": "tarjeta"
}


example_2 = {
    "store": "SPAR TAFIRA",
    "address": "C/. Bruno Naranjo DIAZ 9A-9B",
    "city": "Tafira Baja",
    "phone": "928 351 616",
    "receipt_no": "014\\002-18965",
    "date": "06/04/2024",
    "time": "15:23",
    "items": [
        {"name": "CLIPPER MANZ.1.5L.", "unit": 1, "price": 1.49, "amount": 1.49},
        {"name": "PLATANO PRIMERA GR", "unit": 1.40, "price": 1.99, "amount": 2.79},
        {"name": "MANZANA PINK LADY GR", "unit": 1, "price": 2.99, "amount": 2.99},
        {"name": "SALSA.BARI.PES.GEN.1", "unit": 1, "price": 3.10, "amount": 3.10},
        {"name": "GOFIO B.LUGAR MIL.FU", "unit": 1, "price": 1.85, "amount": 1.85},
        {"name": "ZUM.DISF.D.SIMON PIN", "unit": 1, "price": 1.75, "amount": 1.75},
        {"name": "LECHE.GRNJ.FLR.UHT.", "unit": 1, "price": 1.15, "amount": 1.15}
    ],
    "total": 15.12,
    "number_items": 7,
    "payment_method": "tarjeta"
}

example_3 = {
  "store": "SPAR TAFIRA",
  "address": "C/. BRUNO NARANJO DIAZ 9A-9B",
  "city": "TAFIRA BAJA",
  "phone": "928 351 616",
  "receipt_no": "014\\001-42453",
  "date": "08/04/2024",
  "time": "10:47",
  "items": [
    {"name": "LECHE.GRNJ.FLR.UHT", "unit": 1, "price": 1.15, "amount": 1.15},
    {"name": "PUERROS GRANEL", "unit": 0.425, "price": 2.99, "amount": 1.27},
    {"name": "HUEVOS FRESCOS 12U", "unit": 1, "price": 2.99, "amount": 2.99},
    {"name": "ESPINACAS SPAR", "unit": 1, "price": 1.15, "amount": 1.15},
    {"name": "AGUA YUGUINAT NAT.8L", "unit": 1, "price": 1.49, "amount": 1.49}
  ],
  "total": 8.05,
  "number_items": 5,
  "payment_method": "tarjeta"
}

example_4 = {
    "store": "MERCADONA",
    "address": "AVDA. PINTOR FELO MONZON (C.C. 7 PALMAS) S/N",
    "city": "35019 LAS PALMAS DE GRAN CANARIA",
    "phone": "928411755",
    "receipt_no": "2185-013-6970Z2",
    "date": "03/04/2024",
    "time": "21:22",
    "items": [
        { "name": "DETERG HIPO COLONIA", "unit": 1, "price": 3.30, "amount": 3.30 },
        { "name": "SOLOMILLO POLLO CONG", "unit": 3, "price": 4.50, "amount": 13.50 },
        { "name": "JAMONCITO BARBACOA", "unit": 1, "price": 2.32, "amount": 2.32 },
        { "name": "JAMONCITO BARBACOA", "unit": 1, "price": 2.76, "amount": 2.76 },
        { "name": "NUEZ NATURAL", "unit": 1, "price": 2.00, "amount": 2.00 },
        { "name": "QUESU COTIAGE", "unit": 2, "price": 1.25, "amount": 2.50 },
        { "name": "POLLO ENTERO LIMPIO", "unit": 1, "price": 6.52, "amount": 6.52 },
        { "name": "PAPEL VEGETAL 30H", "unit": 1, "price": 1.70, "amount": 1.70 },
        { "name": "BEBIDA AVELLANAS", "unit": 1, "price": 1.30, "amount": 1.30 },
        { "name": "INFUSION DORMIR", "unit": 1, "price": 1.05, "amount": 1.05 },
        { "name": "LECHE DE COCO", "unit": 1, "price": 1.40, "amount": 1.40 },
        { "name": "QUESO UNTAR LIGHT", "unit": 1, "price": 1.35, "amount": 1.35 },
        { "name": "RULITO CABRA", "unit": 1, "price": 2.45, "amount": 2.45 },
        { "name": "GRIEGO LIGERO", "unit": 1, "price": 1.65, "amount": 1.65 },
        { "name": "BOLSA PLASTICO", "unit": 1, "price": 0.15, "amount": 0.15 }
        ],
    "total": 43.95,
    "number_items": 15,
    "payment_method": "tarjeta"
    }
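
The SPAR receipt numbers contain a backslash, which Python treats as the start of an escape sequence inside a normal string literal. In particular, `\001` is an octal escape for a single control character, not a backslash followed by three digits. A short sketch of the difference, and of the two safe spellings:

```python
escaped = "014\\001-42453"     # backslash written as \\
raw = r"014\001-42453"         # raw string literal, same value
unescaped = "014\001-42453"    # \001 collapses into chr(1)

print(escaped == raw)             # True
print(len(raw), len(unescaped))   # 13 10
print("\\" in unescaped)          # False
```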



# Convert Receipt with OCR and LLM


## OCR

### Unstructured Partition

So far, the best extraction has come from the `ocr_only` strategy.

In [None]:
!pip install unstructured


In [None]:
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError

from unstructured.staging.base import dict_to_elements

In [None]:
from Utils import Utils
utils = Utils()

DLAI_API_KEY = utils.get_dlai_api_key()
DLAI_API_URL = utils.get_dlai_url()

s = UnstructuredClient(
    api_key_auth=DLAI_API_KEY,
    server_url=DLAI_API_URL,
)

In [None]:
filename = "example_files/Scan 14 Apr 2024.pdf"

with open(filename, "rb") as f:
    files=shared.Files(
        content=f.read(),
        file_name=filename,
    )

req = shared.PartitionParameters(
    files=files,
    languages=["es"],
    strategy="ocr_only",
)

try:
    resp = s.general.partition(req)
    elements = dict_to_elements(resp.elements)
except SDKError as e:
    print(e)

In [None]:
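# The API expects raw file bytes wrapped in shared.Files, not a PIL image.
# Requires the client `s` created in the cell above.
with open(image_1_path, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=image_1_path,
    )

req = shared.PartitionParameters(
    files=files,
    languages=["es"],
    strategy="ocr_only",
)

try:
    resp = s.general.partition(req)
    elements = dict_to_elements(resp.elements)
except SDKError as e:
    print(e)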

In [None]:
for element in elements[:100]:
    print(f"{element.category.upper()}: {element.text}")

### PaddleOCR

In [None]:
!pip install paddleocr paddlepaddle


In [None]:
from paddleocr import PaddleOCR, draw_ocr
#from ast import literal_eval

In [None]:
paddleocr = PaddleOCR(lang="es", ocr_version="PP-OCRv4", show_log=False, use_gpu=True)

def paddle_scan(paddleocr, img_path_or_nparray):
    """Run OCR and return the recognized texts plus the raw per-line result."""
    result = paddleocr.ocr(img_path_or_nparray, cls=True)
    result = result[0]                          # results for the single input image
    boxes = [line[0] for line in result]        # bounding boxes
    txts = [line[1][0] for line in result]      # raw text
    scores = [line[1][1] for line in result]    # confidence scores
    return txts, result

download https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar to /root/.paddleocr/whl/det/en/en_PP-OCRv3_det_infer/en_PP-OCRv3_det_infer.tar
download https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar to /root/.paddleocr/whl/rec/latin/latin_PP-OCRv3_rec_infer/latin_PP-OCRv3_rec_infer.tar
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /root/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar


In [None]:
# perform ocr scan
receipt_texts, receipt_boxes = paddle_scan(paddleocr, image_cv5)
print(50*"--","\ntext only:\n",receipt_texts)
print(50*"--","\nocr boxes:\n",receipt_boxes)
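
The scan returns one entry per detected text fragment, in the shape `[box, (text, score)]`, so an item name and its price arrive as separate entries. One way to recover receipt lines is to group fragments whose boxes sit at roughly the same height. This is only a sketch: `group_into_rows` is a hypothetical helper, and the 30-pixel tolerance is an assumption tuned to line heights of about 60 pixels; a strongly tilted photo would need deskewing first.

```python
def group_into_rows(ocr_lines, y_tol=30.0):
    """Group [box, (text, score)] entries into left-to-right text rows.

    An entry joins the current row if its vertical center lies within
    y_tol pixels of the first entry in that row.
    """
    def y_center(line):
        box = line[0]
        return sum(point[1] for point in box) / len(box)

    def x_left(line):
        return min(point[0] for point in line[0])

    rows = []
    anchor = None
    for line in sorted(ocr_lines, key=y_center):
        center = y_center(line)
        if anchor is None or center - anchor > y_tol:
            rows.append([])      # start a new row
            anchor = center
        rows[-1].append(line)
    return [
        " ".join(item[1][0] for item in sorted(row, key=x_left))
        for row in rows
    ]

# Four fragments from the HiperDino scan, deliberately out of order
sample = [
    [[[1420.0, 1196.0], [1567.0, 1196.0], [1567.0, 1256.0], [1420.0, 1256.0]], ('1,59', 0.9963756799697876)],
    [[[44.0, 1179.0], [735.0, 1186.0], [735.0, 1243.0], [43.0, 1236.0]], ('FRESA TARRINA 500 GR', 0.9315673112869263)],
    [[[1417.0, 1263.0], [1567.0, 1263.0], [1567.0, 1323.0], [1417.0, 1323.0]], ('0,95', 0.971403181552887)],
    [[[47.0, 1242.0], [1284.0, 1260.0], [1283.0, 1320.0], [47.0, 1303.0]], ('HIPERDINO ACEITUNA R/ANCHOA LATA 350', 0.9492071866989136)],
]
for row in group_into_rows(sample):
    print(row)
# FRESA TARRINA 500 GR 1,59
# HIPERDINO ACEITUNA R/ANCHOA LATA 350 0,95
```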

In [None]:
receipt_texts_1 = [
    'HiperDino',
    'Las mcjores precios de Canarias',
    'DINOSOL SUPERMERCADOS. S.L',
    'C.I.F.B61742565',
    '9238-SD BERNARD0 DE LA T0RRE',
    'Te1éfono:928493638',
    'Centro Vend. Documento',
    'Fecha',
    'Hora',
    '9238 7868352024/923813-0006086615/04/2024 16:01',
    'ARTICULO',
    'IMPORTE',
    'FRESA TARRINA 500 GR',
    '1,59',
    'HIPERDINO ACEITUNA R/ANCHOA LATA 350',
    '0,95',
    'DESPERADOS CERVEZA TOQUE TEQUILA BOT',
    '1,05',
    'HIPERDINO CENTRO JAMON SERRANO BODEG',
    '0.310x13,62€/kg',
    '4,22',
    'MONTESANO JAMON COCIDO SELECCION KG',
    '0,308 x 8,74 €/kg',
    'Dto.0,54€',
    '2,15',
    'Total Articulos: 5',
    'TOTAL COMPRA:',
    '9,96',
    'Detalle de pagos',
    'EFECTIVO',
    '0,00',
    'TARJETA CREDITO',
    '9,96',
    'EMPLEAD0:12789.TICKET_P.E.203659',
    'HORA:160142',
    'FECHA-15/04/2024',
    'IMP0RTE9,96',
    'TARJETAxxxxxxxx*xxx5597',
    '087663',
    'CAPTURA CHIP / AUTORIZACION:',
    'LABEL: Mastercard',
    'ARC: 00',
    'ATC:004F',
    'AID:A0000000041010',
    'AUTENTICACION: Contact1ess EMV',
    'DCC INTERNACIONAL/REDSYS PCI',
    'COM. PE: 154197156',
    'TER. PE: 00000001',
    'SES. PE:15042024001'
]

receipt_boxes_1 = [
    [[[553.0, 283.0], [1521.0, 298.0], [1519.0, 411.0], [551.0, 397.0]], ('HiperDino', 0.9557597637176514)],
    [
        [[809.0, 414.0], [1517.0, 421.0], [1517.0, 488.0], [808.0, 481.0]],
        ('Las mcjores precios de Canarias', 0.8508884906768799)
    ],
    [
        [[305.0, 541.0], [1243.0, 555.0], [1243.0, 612.0], [304.0, 598.0]],
        ('DINOSOL SUPERMERCADOS. S.L', 0.9383440017700195)
    ],
    [[[478.0, 614.0], [1086.0, 622.0], [1086.0, 678.0], [478.0, 671.0]], ('C.I.F.B61742565', 0.9678167700767517)],
    [
        [[305.0, 681.0], [1290.0, 692.0], [1289.0, 752.0], [304.0, 741.0]],
        ('9238-SD BERNARD0 DE LA T0RRE', 0.9484425187110901)
    ],
    [
        [[445.0, 755.0], [1116.0, 762.0], [1116.0, 822.0], [444.0, 815.0]],
        ('Te1éfono:928493638', 0.9503382444381714)
    ],
    [
        [[27.0, 885.0], [682.0, 896.0], [681.0, 962.0], [26.0, 952.0]],
        ('Centro Vend. Documento', 0.9584930539131165)
    ],
    [[[996.0, 902.0], [1143.0, 902.0], [1143.0, 965.0], [996.0, 965.0]], ('Fecha', 0.9965404272079468)],
    [[[1290.0, 897.0], [1415.0, 907.0], [1410.0, 970.0], [1285.0, 961.0]], ('Hora', 0.9691758751869202)],
    [
        [[27.0, 962.0], [1430.0, 976.0], [1430.0, 1043.0], [27.0, 1029.0]],
        ('9238 7868352024/923813-0006086615/04/2024 16:01', 0.9697328805923462)
    ],
    [[[47.0, 1112.0], [324.0, 1112.0], [324.0, 1169.0], [47.0, 1169.0]], ('ARTICULO', 0.9956818222999573)],
    [[[1320.0, 1126.0], [1567.0, 1126.0], [1567.0, 1186.0], [1320.0, 1186.0]], ('IMPORTE', 0.9965335130691528)],
    [
        [[44.0, 1179.0], [735.0, 1186.0], [735.0, 1243.0], [43.0, 1236.0]],
        ('FRESA TARRINA 500 GR', 0.9315673112869263)
    ],
    [[[1420.0, 1196.0], [1567.0, 1196.0], [1567.0, 1256.0], [1420.0, 1256.0]], ('1,59', 0.9963756799697876)],
    [
        [[47.0, 1242.0], [1284.0, 1260.0], [1283.0, 1320.0], [47.0, 1303.0]],
        ('HIPERDINO ACEITUNA R/ANCHOA LATA 350', 0.9492071866989136)
    ],
    [[[1417.0, 1263.0], [1567.0, 1263.0], [1567.0, 1323.0], [1417.0, 1323.0]], ('0,95', 0.971403181552887)],
    [
        [[51.0, 1302.0], [1280.0, 1323.0], [1279.0, 1390.0], [50.0, 1369.0]],
        ('DESPERADOS CERVEZA TOQUE TEQUILA BOT', 0.9558606743812561)
    ],
    [[[1417.0, 1330.0], [1564.0, 1330.0], [1564.0, 1393.0], [1417.0, 1393.0]], ('1,05', 0.9815376996994019)],
    [
        [[54.0, 1369.0], [1284.0, 1390.0], [1282.0, 1457.0], [53.0, 1436.0]],
        ('HIPERDINO CENTRO JAMON SERRANO BODEG', 0.9541582465171814)
    ],
    [[[91.0, 1436.0], [715.0, 1447.0], [714.0, 1507.0], [90.0, 1496.0]], ('0.310x13,62€/kg', 0.8647465109825134)],
    [[[1410.0, 1457.0], [1564.0, 1457.0], [1564.0, 1530.0], [1410.0, 1530.0]], ('4,22', 0.9961657524108887)],
    [
        [[61.0, 1496.0], [1247.0, 1517.0], [1246.0, 1584.0], [60.0, 1563.0]],
        ('MONTESANO JAMON COCIDO SELECCION KG', 0.9631489515304565)
    ],
    [
        [[97.0, 1567.0], [692.0, 1570.0], [691.0, 1637.0], [97.0, 1633.0]],
        ('0,308 x 8,74 €/kg', 0.8895600438117981)
    ],
    [[[766.0, 1576.0], [1203.0, 1587.0], [1202.0, 1647.0], [765.0, 1636.0]], ('Dto.0,54€', 0.914892315864563)],
    [[[1396.0, 1588.0], [1554.0, 1579.0], [1558.0, 1652.0], [1400.0, 1661.0]], ('2,15', 0.9918473362922668)],
    [
        [[63.0, 1698.0], [684.0, 1683.0], [686.0, 1750.0], [65.0, 1764.0]],
        ('Total Articulos: 5', 0.9652906060218811)
    ],
    [
        [[712.0, 1816.0], [1138.0, 1835.0], [1133.0, 1942.0], [707.0, 1923.0]],
        ('TOTAL COMPRA:', 0.9575488567352295)
    ],
    [[[1386.0, 1828.0], [1550.0, 1843.0], [1538.0, 1974.0], [1374.0, 1958.0]], ('9,96', 0.993824303150177)],
    [
        [[69.0, 2025.0], [616.0, 2000.0], [619.0, 2067.0], [72.0, 2092.0]],
        ('Detalle de pagos', 0.9854116439819336)
    ],
    [[[870.0, 2057.0], [1137.0, 2069.0], [1135.0, 2125.0], [868.0, 2114.0]], ('EFECTIVO', 0.9955223798751831)],
    [[[1382.0, 2071.0], [1536.0, 2086.0], [1529.0, 2160.0], [1374.0, 2145.0]], ('0,00', 0.9603860974311829)],
    [
        [[650.0, 2114.0], [1130.0, 2128.0], [1128.0, 2189.0], [648.0, 2174.0]],
        ('TARJETA CREDITO', 0.9745450615882874)
    ],
    [[[1382.0, 2136.0], [1537.0, 2154.0], [1529.0, 2224.0], [1374.0, 2206.0]], ('9,96', 0.9873324632644653)],
    [
        [[74.0, 2242.0], [1263.0, 2242.0], [1263.0, 2318.0], [74.0, 2318.0]],
        ('EMPLEAD0:12789.TICKET_P.E.203659', 0.9293738007545471)
    ],
    [[[689.0, 2301.0], [1257.0, 2315.0], [1256.0, 2372.0], [688.0, 2358.0]], ('HORA:160142', 0.9271934628486633)],
    [
        [[80.0, 2312.0], [688.0, 2308.0], [688.0, 2365.0], [80.0, 2369.0]],
        ('FECHA-15/04/2024', 0.9704382419586182)
    ],
    [[[80.0, 2372.0], [554.0, 2365.0], [555.0, 2422.0], [81.0, 2429.0]], ('IMP0RTE9,96', 0.9311807155609131)],
    [
        [[83.0, 2432.0], [932.0, 2425.0], [933.0, 2482.0], [84.0, 2489.0]],
        ('TARJETAxxxxxxxx*xxx5597', 0.7477181553840637)
    ],
    [[[1048.0, 2488.0], [1254.0, 2500.0], [1251.0, 2560.0], [1045.0, 2548.0]], ('087663', 0.9982572197914124)],
    [
        [[90.0, 2499.0], [996.0, 2499.0], [996.0, 2546.0], [90.0, 2546.0]],
        ('CAPTURA CHIP / AUTORIZACION:', 0.9617992639541626)
    ],
    [
        [[84.0, 2559.0], [645.0, 2559.0], [645.0, 2616.0], [84.0, 2616.0]],
        ('LABEL: Mastercard', 0.9425782561302185)
    ],
    [[[73.0, 2687.0], [316.0, 2678.0], [318.0, 2749.0], [75.0, 2757.0]], ('ARC: 00', 0.9868200421333313)],
    [[[721.0, 2687.0], [1024.0, 2675.0], [1027.0, 2745.0], [724.0, 2757.0]], ('ATC:004F', 0.9622146487236023)],
    [
        [[73.0, 2759.0], [705.0, 2752.0], [705.0, 2813.0], [74.0, 2820.0]],
        ('AID:A0000000041010', 0.9751736521720886)
    ],
    [
        [[63.0, 2827.0], [1058.0, 2799.0], [1060.0, 2876.0], [65.0, 2904.0]],
        ('AUTENTICACION: Contact1ess EMV', 0.9853928685188293)
    ],
    [
        [[60.0, 2897.0], [991.0, 2872.0], [993.0, 2939.0], [61.0, 2964.0]],
        ('DCC INTERNACIONAL/REDSYS PCI', 0.9965552091598511)
    ],
    [
        [[56.0, 2964.0], [701.0, 2946.0], [702.0, 3013.0], [58.0, 3031.0]],
        ('COM. PE: 154197156', 0.9532525539398193)
    ],
    [
        [[53.0, 3034.0], [663.0, 3012.0], [666.0, 3083.0], [55.0, 3104.0]],
        ('TER. PE: 00000001', 0.9727330803871155)
    ],
    [
        [[49.0, 3104.0], [757.0, 3076.0], [759.0, 3143.0], [52.0, 3171.0]],
        ('SES. PE:15042024001', 0.9732759594917297)
    ]
]
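
The recognized amounts use a decimal comma (`'1,59'`), and weighted items appear as lines such as `'0,308 x 8,74 €/kg'`, so they need converting before they can fill the `float` fields of the schema. A hedged sketch with regular expressions; the helper names and patterns are assumptions fitted to the strings above, not a general receipt grammar:

```python
import re

AMOUNT_RE = re.compile(r"^\d+[.,]\d{2}$")
WEIGHT_RE = re.compile(r"^(\d+[.,]\d+)\s*x\s*(\d+[.,]\d+)\s*€/kg$")

def to_float(text: str) -> float:
    """Convert '1,59' or '1.59' to 1.59."""
    return float(text.replace(",", "."))

def parse_amount(text: str):
    """Return the amount if the whole string is a price, else None."""
    text = text.strip()
    return to_float(text) if AMOUNT_RE.match(text) else None

def parse_weight_line(text: str):
    """Return (quantity_kg, price_per_kg) for a weighted-item line, else None."""
    match = WEIGHT_RE.match(text.strip())
    if match is None:
        return None
    return to_float(match.group(1)), to_float(match.group(2))

# Strings taken from receipt_texts_1
print(parse_amount("1,59"))                    # 1.59
print(parse_amount("HiperDino"))               # None
print(parse_weight_line("0.310x13,62€/kg"))    # (0.31, 13.62)
print(parse_weight_line("0,308 x 8,74 €/kg"))  # (0.308, 8.74)
```

Strings that mix text and numbers, such as `'Dto.0,54€'`, deliberately return `None` from `parse_amount` and would need their own pattern.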

In [None]:
receipt_texts_2 = [
    'SPAR TAFIRA',
    'C/.BRUNO NARANJO DIAZ9A-B',
    'TLF.:928351616-FAX:928351004',
    'NIFB02868248',
    'SUPERMERCAD0S DABEL2021,S.L',
    'TAFIRA BAJA',
    'FACTURA SIMPLIFICADA',
    'Nro.014002-18965',
    'Fecha:06-04-202415:23',
    'Cajerc:10074',
    'CANT.',
    'PVP IMPORTE',
    'DESCRIPCION',
    '1,49',
    '1,49',
    'CLIPPER MANZ.1.5L.',
    '1',
    '1,40',
    '1,99',
    'PLATANO PRIMERA GRAN',
    '2,79',
    '2,99',
    '2.99',
    'MANZANA PINK LADY GR',
    '3,10',
    '3,10',
    'SALSA.BARI.PES.GEN.1',
    '1,85',
    '1,85',
    'GOFIO B.LUGAR MIL.FU',
    '1',
    '1,75',
    '1,75',
    'ZUM.DISF.D.SIMON PIN',
    '1',
    '1,15',
    '1,15',
    'LECHE.GRNJ.FLR.UHT.',
    '1',
    'Lineas : 7',
    'Total F',
    '15,12',
    '"TARJETA',
    '15.12',
    'Entregado',
    'Cambio',
    '0,00',
    'Operacion',
    ': VENTA',
    '06/04/202415:24',
    'Fecha',
    'Comercio',
    '249060518',
    'ARC',
    '00',
    'A0000000031010',
    'AID',
    'Visa DEBIT',
    'App Labe1',
    '************761',
    'Tarjeta',
    '15,12EUR',
    'Importe',
    '-Copia para al'
]

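# Each entry is [quad, (text, confidence)], where quad holds four [x, y] corner points
# in the order top-left, top-right, bottom-right, bottom-left.
receipt_boxes_2 = [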
    [[[578.0, 158.0], [931.0, 172.0], [928.0, 240.0], [575.0, 226.0]], ('SPAR TAFIRA', 0.9683101177215576)],
    [
        [[324.0, 223.0], [1201.0, 256.0], [1198.0, 324.0], [321.0, 290.0]],
        ('C/.BRUNO NARANJO DIAZ9A-B', 0.9126052856445312)
    ],
    [
        [[212.0, 301.0], [1334.0, 318.0], [1332.0, 395.0], [211.0, 378.0]],
        ('TLF.:928351616-FAX:928351004', 0.9645776748657227)
    ],
    [[[545.0, 384.0], [993.0, 399.0], [990.0, 467.0], [542.0, 452.0]], ('NIFB02868248', 0.9573373794555664)],
    [
        [[281.0, 453.0], [1213.0, 473.0], [1212.0, 548.0], [279.0, 527.0]],
        ('SUPERMERCAD0S DABEL2021,S.L', 0.9348011612892151)
    ],
    [[[575.0, 536.0], [931.0, 554.0], [928.0, 622.0], [571.0, 604.0]], ('TAFIRA BAJA', 0.970614492893219)],
    [
        [[443.0, 695.0], [1083.0, 706.0], [1082.0, 771.0], [442.0, 760.0]],
        ('FACTURA SIMPLIFICADA', 0.9670684933662415)
    ],
    [[[120.0, 854.0], [806.0, 854.0], [806.0, 919.0], [120.0, 919.0]], ('Nro.014002-18965', 0.9575783014297485)],
    [
        [[114.0, 932.0], [904.0, 935.0], [904.0, 1000.0], [114.0, 997.0]],
        ('Fecha:06-04-202415:23', 0.9403641819953918)
    ],
    [[[988.0, 939.0], [1410.0, 932.0], [1412.0, 1003.0], [989.0, 1010.0]], ('Cajerc:10074', 0.9790463447570801)],
    [[[797.0, 1084.0], [956.0, 1084.0], [956.0, 1159.0], [797.0, 1159.0]], ('CANT.', 0.9110283851623535)],
    [
        [[1048.0, 1080.0], [1415.0, 1088.0], [1413.0, 1166.0], [1047.0, 1158.0]],
        ('PVP IMPORTE', 0.9454063177108765)
    ],
    [[[104.0, 1091.0], [475.0, 1091.0], [475.0, 1155.0], [104.0, 1155.0]], ('DESCRIPCION', 0.9966978430747986)],
    [[[1041.0, 1240.0], [1200.0, 1240.0], [1200.0, 1324.0], [1041.0, 1324.0]], ('1,49', 0.9852297306060791)],
    [[[1279.0, 1235.0], [1419.0, 1244.0], [1414.0, 1328.0], [1273.0, 1319.0]], ('1,49', 0.9922319054603577)],
    [
        [[94.0, 1250.0], [695.0, 1243.0], [696.0, 1307.0], [95.0, 1314.0]],
        ('CLIPPER MANZ.1.5L.', 0.9311488270759583)
    ],
    [[[927.0, 1246.0], [979.0, 1246.0], [979.0, 1311.0], [927.0, 1311.0]], ('1', 0.9985104203224182)],
    [[[823.0, 1327.0], [972.0, 1327.0], [972.0, 1401.0], [823.0, 1401.0]], ('1,40', 0.9051637053489685)],
    [[[1057.0, 1324.0], [1197.0, 1324.0], [1197.0, 1401.0], [1057.0, 1401.0]], ('1,99', 0.9836426973342896)],
    [
        [[91.0, 1334.0], [777.0, 1330.0], [777.0, 1395.0], [91.0, 1398.0]],
        ('PLATANO PRIMERA GRAN', 0.9556494951248169)
    ],
    [[[1278.0, 1327.0], [1421.0, 1327.0], [1421.0, 1401.0], [1278.0, 1401.0]], ('2,79', 0.9701625108718872)],
    [[[1057.0, 1408.0], [1197.0, 1408.0], [1197.0, 1486.0], [1057.0, 1486.0]], ('2,99', 0.9760459661483765)],
    [[[1278.0, 1408.0], [1428.0, 1408.0], [1428.0, 1492.0], [1278.0, 1492.0]], ('2.99', 0.8942591547966003)],
    [
        [[84.0, 1421.0], [767.0, 1414.0], [768.0, 1479.0], [85.0, 1486.0]],
        ('MANZANA PINK LADY GR', 0.9861181974411011)
    ],
    [[[1052.0, 1500.0], [1197.0, 1491.0], [1201.0, 1565.0], [1057.0, 1574.0]], ('3,10', 0.9743518829345703)],
    [[[1284.0, 1495.0], [1428.0, 1495.0], [1428.0, 1570.0], [1284.0, 1570.0]], ('3,10', 0.9775949120521545)],
    [
        [[87.0, 1508.0], [764.0, 1498.0], [765.0, 1563.0], [88.0, 1573.0]],
        ('SALSA.BARI.PES.GEN.1', 0.9844762086868286)
    ],
    [[[1057.0, 1583.0], [1200.0, 1583.0], [1200.0, 1654.0], [1057.0, 1654.0]], ('1,85', 0.9779074192047119)],
    [[[1291.0, 1579.0], [1428.0, 1579.0], [1428.0, 1654.0], [1291.0, 1654.0]], ('1,85', 0.9706394672393799)],
    [
        [[91.0, 1596.0], [770.0, 1586.0], [771.0, 1650.0], [92.0, 1661.0]],
        ('GOFIO B.LUGAR MIL.FU', 0.9443401098251343)
    ],
    [[[930.0, 1589.0], [969.0, 1589.0], [969.0, 1647.0], [930.0, 1647.0]], ('1', 0.9942540526390076)],
    [[[1060.0, 1670.0], [1203.0, 1670.0], [1203.0, 1741.0], [1060.0, 1741.0]], ('1,75', 0.9847022294998169)],
    [[[1290.0, 1665.0], [1427.0, 1656.0], [1432.0, 1733.0], [1295.0, 1742.0]], ('1,75', 0.9037696123123169)],
    [
        [[88.0, 1680.0], [764.0, 1673.0], [764.0, 1738.0], [88.0, 1745.0]],
        ('ZUM.DISF.D.SIMON PIN', 0.9542239904403687)
    ],
    [[[927.0, 1676.0], [969.0, 1676.0], [969.0, 1735.0], [927.0, 1735.0]], ('1', 0.998862624168396)],
    [[[1060.0, 1748.0], [1210.0, 1748.0], [1210.0, 1832.0], [1060.0, 1832.0]], ('1,15', 0.9914188981056213)],
    [[[1291.0, 1751.0], [1431.0, 1751.0], [1431.0, 1825.0], [1291.0, 1825.0]], ('1,15', 0.9535517692565918)],
    [
        [[84.0, 1767.0], [721.0, 1760.0], [722.0, 1825.0], [85.0, 1832.0]],
        ('LECHE.GRNJ.FLR.UHT.', 0.924342691898346)
    ],
    [[[930.0, 1761.0], [976.0, 1761.0], [976.0, 1829.0], [930.0, 1829.0]], ('1', 0.9988415837287903)],
    [[[91.0, 1851.0], [429.0, 1851.0], [429.0, 1919.0], [91.0, 1919.0]], ('Lineas : 7', 0.9390141367912292)],
    [[[686.0, 1921.0], [961.0, 1930.0], [956.0, 2076.0], [682.0, 2067.0]], ('Total F', 0.9057044982910156)],
    [[[1246.0, 1915.0], [1447.0, 1900.0], [1460.0, 2065.0], [1258.0, 2080.0]], ('15,12', 0.9402429461479187)],
    [[[195.0, 2065.0], [504.0, 2065.0], [504.0, 2143.0], [195.0, 2143.0]], ('"TARJETA', 0.8845714330673218)],
    [[[1264.0, 2056.0], [1438.0, 2048.0], [1442.0, 2132.0], [1268.0, 2140.0]], ('15.12', 0.939841628074646)],
    [[[693.0, 2071.0], [1002.0, 2071.0], [1002.0, 2139.0], [693.0, 2139.0]], ('Entregado', 0.9893643260002136)],
    [[[689.0, 2152.0], [901.0, 2152.0], [901.0, 2223.0], [689.0, 2223.0]], ('Cambio', 0.9739894270896912)],
    [[[1291.0, 2146.0], [1444.0, 2146.0], [1444.0, 2220.0], [1291.0, 2220.0]], ('0,00', 0.9574496150016785)],
    [[[94.0, 2311.0], [399.0, 2304.0], [401.0, 2372.0], [95.0, 2379.0]], ('Operacion', 0.987079381942749)],
    [[[490.0, 2309.0], [733.0, 2297.0], [736.0, 2365.0], [493.0, 2377.0]], (': VENTA', 0.9161697030067444)],
    [
        [[495.0, 2372.0], [1106.0, 2379.0], [1105.0, 2457.0], [494.0, 2450.0]],
        ('06/04/202415:24', 0.9829444885253906)
    ],
    [[[90.0, 2389.0], [271.0, 2381.0], [274.0, 2452.0], [93.0, 2461.0]], ('Fecha', 0.9836984872817993)],
    [[[93.0, 2464.0], [371.0, 2449.0], [375.0, 2517.0], [97.0, 2532.0]], ('Comercio', 0.9831650257110596)],
    [[[501.0, 2456.0], [878.0, 2456.0], [878.0, 2521.0], [501.0, 2521.0]], ('249060518', 0.9958345890045166)],
    [[[101.0, 2528.0], [221.0, 2528.0], [221.0, 2602.0], [101.0, 2602.0]], ('ARC', 0.9975335597991943)],
    [[[507.0, 2531.0], [644.0, 2531.0], [644.0, 2592.0], [507.0, 2592.0]], ('00', 0.9864296913146973)],
    [
        [[515.0, 2598.0], [1038.0, 2609.0], [1036.0, 2674.0], [513.0, 2663.0]],
        ('A0000000031010', 0.9740484952926636)
    ],
    [[[97.0, 2608.0], [219.0, 2593.0], [228.0, 2668.0], [106.0, 2682.0]], ('AID', 0.9973187446594238)],
    [[[509.0, 2666.0], [902.0, 2681.0], [899.0, 2749.0], [506.0, 2734.0]], ('Visa DEBIT', 0.9253913760185242)],
    [[[115.0, 2688.0], [409.0, 2665.0], [415.0, 2733.0], [121.0, 2756.0]], ('App Labe1', 0.9369841814041138)],
    [
        [[522.0, 2808.0], [1097.0, 2836.0], [1094.0, 2901.0], [519.0, 2873.0]],
        ('************761', 0.8199026584625244)
    ],
    [[[109.0, 2839.0], [358.0, 2803.0], [369.0, 2880.0], [120.0, 2916.0]], ('Tarjeta', 0.9906787276268005)],
    [[[531.0, 2872.0], [877.0, 2901.0], [871.0, 2970.0], [525.0, 2940.0]], ('15,12EUR', 0.9739892482757568)],
    [[[120.0, 2923.0], [360.0, 2873.0], [375.0, 2944.0], [135.0, 2994.0]], ('Importe', 0.968542754650116)],
    [
        [[401.0, 3016.0], [836.0, 3023.0], [835.0, 3101.0], [400.0, 3094.0]],
        ('-Copia para al', 0.9756090044975281)
    ]
]
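
The text lists above keep the order in which the boxes were returned, so within a line item the prices can come before the description (for example `'1,49'`, `'1,49'`, `'CLIPPER MANZ.1.5L.'`, `'1'`). One possible way to restore reading order is to group boxes whose vertical centers are close and then sort each group from left to right. The sketch below is only an illustration, not the notebook's own method: `group_into_rows` and `row_tolerance` are hypothetical names, and the 40-pixel tolerance is an assumption that fits these samples and would need tuning for other image sizes.

```python
from statistics import mean

# A few entries copied from receipt_boxes_2 above; the full list has the same shape.
sample_boxes = [
    [[[1041.0, 1240.0], [1200.0, 1240.0], [1200.0, 1324.0], [1041.0, 1324.0]], ('1,49', 0.9852297306060791)],
    [[[1279.0, 1235.0], [1419.0, 1244.0], [1414.0, 1328.0], [1273.0, 1319.0]], ('1,49', 0.9922319054603577)],
    [[[94.0, 1250.0], [695.0, 1243.0], [696.0, 1307.0], [95.0, 1314.0]], ('CLIPPER MANZ.1.5L.', 0.9311488270759583)],
    [[[927.0, 1246.0], [979.0, 1246.0], [979.0, 1311.0], [927.0, 1311.0]], ('1', 0.9985104203224182)],
    [[[1060.0, 1670.0], [1203.0, 1670.0], [1203.0, 1741.0], [1060.0, 1741.0]], ('1,75', 0.9847022294998169)],
    [[[1290.0, 1665.0], [1427.0, 1656.0], [1432.0, 1733.0], [1295.0, 1742.0]], ('1,75', 0.9037696123123169)],
    [[[88.0, 1680.0], [764.0, 1673.0], [764.0, 1738.0], [88.0, 1745.0]], ('ZUM.DISF.D.SIMON PIN', 0.9542239904403687)],
    [[[927.0, 1676.0], [969.0, 1676.0], [969.0, 1735.0], [927.0, 1735.0]], ('1', 0.998862624168396)],
]


def group_into_rows(boxes, row_tolerance=40.0):
    """Group OCR boxes into visual rows and return the texts of each row, left to right.

    row_tolerance (hypothetical parameter) is the maximum vertical distance, in pixels,
    between a box center and the running mean center of the current row.
    """
    items = []
    for quad, (text, _confidence) in boxes:
        y_center = mean(point[1] for point in quad)  # vertical center of the box
        x_left = min(point[0] for point in quad)     # leftmost corner of the box
        items.append((y_center, x_left, text))

    items.sort()  # top to bottom

    rows = []  # each row is a list of (y_center, x_left, text) tuples
    for item in items:
        if rows and abs(item[0] - mean(i[0] for i in rows[-1])) <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])

    # Within each row, order the boxes from left to right and keep only the text.
    return [[text for _, _, text in sorted(row, key=lambda i: i[1])] for row in rows]


rows = group_into_rows(sample_boxes)
for row in rows:
    print(" | ".join(row))
```

With the sample entries, each printed row starts with the description followed by the quantity and the two price columns, which is a more convenient shape to map onto a line-item model.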

In [None]:
receipt_texts_3 = [
    'SPAR TAFIRA',
    'C/. BRUNO NARANJO DIAZ 9A-9B',
    'TLF.: 928 351 616 - FAX: 928 351 004',
    'NIF:BO2868248',
    'SUPERMERCAD0S DABEL 2021, S.L.',
    'TAFIRA BAJA',
    'FACTURA SIMPLIFICADA',
    'Nro.:014001-42453',
    'Cajero13807',
    'Fecha:08-04-202410:47',
    'CANT.',
    'DESCRIPCION',
    'PVP IMPORTE',
    '1,15',
    '1',
    '1,15',
    'LECHE.GRNJ.FLR.UHT',
    '0,425',
    'PUERROS GRANEL',
    '2,99',
    '1,27',
    '1',
    '2,99',
    'HUEVOS FRESCOS 12U',
    '2,99',
    '1,15',
    '1,15',
    'ESPINACAS SPAR',
    '1',
    '1,49',
    '1,49',
    'AGUA YUGUINAT NAT.8L',
    '1',
    'Lineas : 5',
    '8,05',
    'Total€',
    '-TARJETA',
    '8,05',
    'Entregado',
    'Cambio',
    '0,00',
    'Operacion',
    ': YENTA',
    'Fecha',
    '08/04/202410:48',
    'Comercio',
    '249060518',
    'ARC',
    ': 00',
    'AID',
    'A0000000041010',
    'App Label',
    ':DEBIT MASTERCARD',
    'Autorizacicn:PE30RO',
    'Tarjeta',
    '************408',
    'Importe',
    '8,05EUR'
]

receipt_boxes_3 = [
    [[[700.0, 245.0], [1129.0, 213.0], [1135.0, 294.0], [706.0, 325.0]], ('SPAR TAFIRA', 0.9983623623847961)],
    [
        [[401.0, 365.0], [1511.0, 289.0], [1516.0, 366.0], [406.0, 441.0]],
        ('C/. BRUNO NARANJO DIAZ 9A-9B', 0.944857656955719)
    ],
    [
        [[265.0, 463.0], [1691.0, 384.0], [1696.0, 468.0], [269.0, 547.0]],
        ('TLF.: 928 351 616 - FAX: 928 351 004', 0.9607812166213989)
    ],
    [[[668.0, 539.0], [1221.0, 515.0], [1224.0, 592.0], [671.0, 616.0]], ('NIF:BO2868248', 0.9480723142623901)],
    [
        [[380.0, 641.0], [1545.0, 603.0], [1548.0, 686.0], [383.0, 725.0]],
        ('SUPERMERCAD0S DABEL 2021, S.L.', 0.911349892616272)
    ],
    [[[719.0, 732.0], [1146.0, 719.0], [1148.0, 796.0], [721.0, 808.0]], ('TAFIRA BAJA', 0.9757936596870422)],
    [
        [[575.0, 924.0], [1341.0, 909.0], [1342.0, 985.0], [577.0, 1001.0]],
        ('FACTURA SIMPLIFICADA', 0.9818708300590515)
    ],
    [
        [[219.0, 1109.0], [1000.0, 1102.0], [1000.0, 1178.0], [220.0, 1186.0]],
        ('Nro.:014001-42453', 0.9138452410697937)
    ],
    [
        [[1224.0, 1189.0], [1756.0, 1197.0], [1755.0, 1273.0], [1223.0, 1265.0]],
        ('Cajero13807', 0.9820167422294617)
    ],
    [
        [[212.0, 1204.0], [1115.0, 1192.0], [1116.0, 1265.0], [213.0, 1277.0]],
        ('Fecha:08-04-202410:47', 0.9625601768493652)
    ],
    [[[993.0, 1367.0], [1198.0, 1367.0], [1198.0, 1458.0], [993.0, 1458.0]], ('CANT.', 0.9783948659896851)],
    [[[219.0, 1382.0], [630.0, 1382.0], [630.0, 1458.0], [219.0, 1458.0]], ('DESCRIPCION', 0.9935283660888672)],
    [
        [[1302.0, 1375.0], [1752.0, 1375.0], [1752.0, 1451.0], [1302.0, 1451.0]],
        ('PVP IMPORTE', 0.9275673031806946)
    ],
    [[[1302.0, 1553.0], [1475.0, 1553.0], [1475.0, 1647.0], [1302.0, 1647.0]], ('1,15', 0.9868576526641846)],
    [[[1158.0, 1564.0], [1205.0, 1564.0], [1205.0, 1633.0], [1158.0, 1633.0]], ('1', 0.9951280355453491)],
    [[[1579.0, 1560.0], [1756.0, 1560.0], [1756.0, 1644.0], [1579.0, 1644.0]], ('1,15', 0.990277886390686)],
    [
        [[219.0, 1571.0], [920.0, 1560.0], [921.0, 1632.0], [220.0, 1644.0]],
        ('LECHE.GRNJ.FLR.UHT', 0.9467834830284119)
    ],
    [[[1000.0, 1651.0], [1216.0, 1651.0], [1216.0, 1742.0], [1000.0, 1742.0]], ('0,425', 0.9831347465515137)],
    [
        [[219.0, 1662.0], [745.0, 1662.0], [745.0, 1735.0], [219.0, 1735.0]],
        ('PUERROS GRANEL', 0.9591296315193176)
    ],
    [[[1313.0, 1655.0], [1471.0, 1655.0], [1471.0, 1738.0], [1313.0, 1738.0]], ('2,99', 0.9422935843467712)],
    [[[1579.0, 1658.0], [1752.0, 1658.0], [1752.0, 1742.0], [1579.0, 1742.0]], ('1,27', 0.9853314757347107)],
    [[[1158.0, 1753.0], [1220.0, 1753.0], [1220.0, 1836.0], [1158.0, 1836.0]], ('1', 0.9992625117301941)],
    [[[1309.0, 1746.0], [1479.0, 1746.0], [1479.0, 1840.0], [1309.0, 1840.0]], ('2,99', 0.9851952791213989)],
    [
        [[212.0, 1756.0], [899.0, 1756.0], [899.0, 1833.0], [212.0, 1833.0]],
        ('HUEVOS FRESCOS 12U', 0.9335262775421143)
    ],
    [[[1572.0, 1753.0], [1752.0, 1753.0], [1752.0, 1836.0], [1572.0, 1836.0]], ('2,99', 0.9410834312438965)],
    [[[1317.0, 1844.0], [1482.0, 1844.0], [1482.0, 1938.0], [1317.0, 1938.0]], ('1,15', 0.9835388660430908)],
    [[[1576.0, 1839.0], [1757.0, 1849.0], [1752.0, 1943.0], [1571.0, 1933.0]], ('1,15', 0.9866871237754822)],
    [
        [[201.0, 1859.0], [744.0, 1851.0], [745.0, 1923.0], [202.0, 1931.0]],
        ('ESPINACAS SPAR', 0.9640835523605347)
    ],
    [[[1155.0, 1855.0], [1220.0, 1855.0], [1220.0, 1927.0], [1155.0, 1927.0]], ('1', 0.9950574636459351)],
    [[[1324.0, 1942.0], [1489.0, 1942.0], [1489.0, 2036.0], [1324.0, 2036.0]], ('1,49', 0.9849990010261536)],
    [[[1583.0, 1942.0], [1752.0, 1942.0], [1752.0, 2036.0], [1583.0, 2036.0]], ('1,49', 0.9936288595199585)],
    [
        [[198.0, 1953.0], [967.0, 1945.0], [968.0, 2022.0], [198.0, 2029.0]],
        ('AGUA YUGUINAT NAT.8L', 0.9467948079109192)
    ],
    [[[1169.0, 1953.0], [1216.0, 1953.0], [1216.0, 2022.0], [1169.0, 2022.0]], ('1', 0.9951707720756531)],
    [[[194.0, 2051.0], [586.0, 2051.0], [586.0, 2127.0], [194.0, 2127.0]], ('Lineas : 5', 0.9632811546325684)],
    [[[1584.0, 2134.0], [1768.0, 2121.0], [1780.0, 2299.0], [1596.0, 2312.0]], ('8,05', 0.9636508226394653)],
    [[[903.0, 2146.0], [1202.0, 2146.0], [1202.0, 2298.0], [903.0, 2298.0]], ('Total€', 0.883273184299469)],
    [[[316.0, 2286.0], [678.0, 2303.0], [674.0, 2390.0], [312.0, 2373.0]], ('-TARJETA', 0.9355770945549011)],
    [[[1604.0, 2295.0], [1774.0, 2295.0], [1774.0, 2389.0], [1604.0, 2389.0]], ('8,05', 0.9634414911270142)],
    [[[902.0, 2306.0], [1279.0, 2298.0], [1281.0, 2385.0], [904.0, 2393.0]], ('Entregado', 0.9867023825645447)],
    [[[899.0, 2400.0], [1148.0, 2400.0], [1148.0, 2491.0], [899.0, 2491.0]], ('Cambio', 0.9771128296852112)],
    [[[1608.0, 2393.0], [1781.0, 2393.0], [1781.0, 2491.0], [1608.0, 2491.0]], ('0,00', 0.9853395223617554)],
    [[[203.0, 2574.0], [565.0, 2582.0], [563.0, 2670.0], [201.0, 2661.0]], ('Operacion', 0.978510320186615)],
    [[[668.0, 2592.0], [955.0, 2605.0], [950.0, 2696.0], [664.0, 2682.0]], (': YENTA', 0.8834457397460938)],
    [[[205.0, 2676.0], [410.0, 2676.0], [410.0, 2756.0], [205.0, 2756.0]], ('Fecha', 0.989087700843811)],
    [
        [[673.0, 2698.0], [1393.0, 2706.0], [1392.0, 2793.0], [672.0, 2785.0]],
        ('08/04/202410:48', 0.9735637307167053)
    ],
    [[[203.0, 2763.0], [526.0, 2772.0], [524.0, 2859.0], [201.0, 2850.0]], ('Comercio', 0.9832934737205505)],
    [[[674.0, 2792.0], [1116.0, 2804.0], [1113.0, 2895.0], [672.0, 2883.0]], ('249060518', 0.9831523895263672)],
    [[[209.0, 2858.0], [345.0, 2858.0], [345.0, 2942.0], [209.0, 2942.0]], ('ARC', 0.995657742023468)],
    [[[669.0, 2898.0], [838.0, 2898.0], [838.0, 2982.0], [669.0, 2982.0]], (': 00', 0.9109969139099121)],
    [[[207.0, 2948.0], [336.0, 2937.0], [343.0, 3024.0], [214.0, 3034.0]], ('AID', 0.9893055558204651)],
    [
        [[671.0, 2988.0], [1317.0, 3004.0], [1315.0, 3092.0], [668.0, 3076.0]],
        ('A0000000041010', 0.9860761761665344)
    ],
    [[[219.0, 3033.0], [557.0, 3072.0], [548.0, 3159.0], [209.0, 3121.0]], ('App Label', 0.9252751469612122)],
    [
        [[664.0, 3076.0], [1397.0, 3095.0], [1394.0, 3194.0], [661.0, 3174.0]],
        (':DEBIT MASTERCARD', 0.9694098234176636)
    ],
    [
        [[210.0, 3121.0], [1006.0, 3177.0], [998.0, 3290.0], [203.0, 3234.0]],
        ('Autorizacicn:PE30RO', 0.8969648480415344)
    ],
    [[[220.0, 3222.0], [489.0, 3262.0], [476.0, 3353.0], [206.0, 3313.0]], ('Tarjeta', 0.9932218790054321)],
    [
        [[676.0, 3273.0], [1384.0, 3261.0], [1385.0, 3349.0], [677.0, 3361.0]],
        ('************408', 0.8897014260292053)
    ],
    [[[219.0, 3320.0], [493.0, 3352.0], [483.0, 3443.0], [209.0, 3411.0]], ('Importe', 0.9463316202163696)],
    [[[678.0, 3352.0], [1077.0, 3365.0], [1074.0, 3455.0], [675.0, 3443.0]], ('8,05EUR', 0.9592725038528442)]
]

In [None]:
receipt_texts_4 = [
    'S.A.',
    'MERCADONA.',
    'A-46103834',
    'AVDA. PINTOR FELO MONZON (C.C. 7 PALMAS)',
    'S/N',
    '35019 LAS PALMAS DE GRAN CANARIA',
    '928411755',
    'TELEFONO:',
    '03/04/202421:220P:144041',
    'FACTURA SIMPLIFICADA:2185-013-6970Z2',
    'Imp.)',
    'P.Unit',
    'Descripción',
    '3,30',
    '1 DETERG HIPO COLONIA',
    '13,50',
    '4,50',
    '3 SOLOMILLO POLLO CONG',
    '2,32',
    '1 JAMONCITO BARBACOA',
    '2,76',
    '1 JAMONCITO BARBACOA',
    '2,00',
    '1 NUEZ NATURAL',
    '1,25',
    '2,50',
    '2 QUESU COTIAGE',
    '6,52',
    '1 POLLO ENTERO LIMPIO',
    '1,70',
    '1 PAPEL VEGETAL 30H',
    '1.30',
    '1 BEBIDA AVELLANAS',
    '1,05',
    '1 INFUSION DORMIR',
    '1,40',
    '1 LECHE DE COCO',
    '1,35',
    '1 QUESO UNTAR LIGHT',
    '1 RULITO CABRA',
    '2,45',
    '1 GRIEGO LIGERO',
    '1,65',
    '1 BOLSA PLASTICO',
    '0,15',
    'TOTAL @)',
    '43,95',
    'TARJETA BANCARIA',
    '43,95',
    'COMERCIANTE MINORISTA',
    'TARJBANCARIA',
    '******915',
    'N.C072850332',
    'AUT:1LPOXG',
    'AIDA0000000041010',
    'ARC:3030',
    ')',
    'Importe43,95',
    'DEBIT MASTERCARD'
]

receipt_boxes_4 = [
    [[[967.0, 114.0], [1240.0, 106.0], [1242.0, 175.0], [969.0, 183.0]], ('S.A.', 0.9014008045196533)],
    [[[234.0, 161.0], [984.0, 113.0], [989.0, 182.0], [238.0, 229.0]], ('MERCADONA.', 0.9483870267868042)],
    [[[531.0, 225.0], [884.0, 204.0], [888.0, 273.0], [535.0, 294.0]], ('A-46103834', 0.991621196269989)],
    [
        [[80.0, 411.0], [1428.0, 341.0], [1432.0, 410.0], [83.0, 480.0]],
        ('AVDA. PINTOR FELO MONZON (C.C. 7 PALMAS)', 0.9473438262939453)
    ],
    [[[671.0, 456.0], [798.0, 456.0], [798.0, 532.0], [671.0, 532.0]], ('S/N', 0.9671873450279236)],
    [
        [[211.0, 560.0], [1297.0, 516.0], [1300.0, 585.0], [214.0, 628.0]],
        ('35019 LAS PALMAS DE GRAN CANARIA', 0.9507570266723633)
    ],
    [[[836.0, 616.0], [1167.0, 608.0], [1169.0, 677.0], [838.0, 685.0]], ('928411755', 0.9976734519004822)],
    [[[346.0, 640.0], [862.0, 615.0], [865.0, 680.0], [349.0, 705.0]], ('TELEFONO:', 0.9368199110031128)],
    [
        [[277.0, 723.0], [1201.0, 688.0], [1204.0, 756.0], [279.0, 792.0]],
        ('03/04/202421:220P:144041', 0.9747995734214783)
    ],
    [
        [[119.0, 814.0], [1379.0, 764.0], [1381.0, 833.0], [121.0, 883.0]],
        ('FACTURA SIMPLIFICADA:2185-013-6970Z2', 0.9677932262420654)
    ],
    [[[1241.0, 1298.0], [1502.0, 1288.0], [1505.0, 1368.0], [1244.0, 1378.0]], ('Imp.)', 0.9242268800735474)],
    [[[928.0, 1310.0], [1192.0, 1296.0], [1197.0, 1375.0], [932.0, 1389.0]], ('P.Unit', 0.8853976726531982)],
    [[[227.0, 1335.0], [608.0, 1323.0], [610.0, 1392.0], [229.0, 1404.0]], ('Descripción', 0.9011156558990479)],
    [[[1346.0, 1373.0], [1515.0, 1373.0], [1515.0, 1457.0], [1346.0, 1457.0]], ('3,30', 0.977954089641571)],
    [
        [[165.0, 1415.0], [878.0, 1399.0], [880.0, 1468.0], [167.0, 1484.0]],
        ('1 DETERG HIPO COLONIA', 0.9477400183677673)
    ],
    [[[1329.0, 1462.0], [1508.0, 1452.0], [1513.0, 1535.0], [1334.0, 1545.0]], ('13,50', 0.9961566925048828)],
    [[[1035.0, 1474.0], [1193.0, 1458.0], [1202.0, 1542.0], [1043.0, 1558.0]], ('4,50', 0.9753559231758118)],
    [
        [[165.0, 1499.0], [917.0, 1483.0], [918.0, 1551.0], [167.0, 1567.0]],
        ('3 SOLOMILLO POLLO CONG', 0.9427966475486755)
    ],
    [[[1353.0, 1540.0], [1515.0, 1540.0], [1515.0, 1624.0], [1353.0, 1624.0]], ('2,32', 0.9112967252731323)],
    [
        [[173.0, 1586.0], [847.0, 1566.0], [849.0, 1635.0], [175.0, 1655.0]],
        ('1 JAMONCITO BARBACOA', 0.9652339816093445)
    ],
    [[[1361.0, 1632.0], [1515.0, 1632.0], [1515.0, 1704.0], [1361.0, 1704.0]], ('2,76', 0.9717540740966797)],
    [
        [[181.0, 1666.0], [847.0, 1650.0], [849.0, 1719.0], [182.0, 1735.0]],
        ('1 JAMONCITO BARBACOA', 0.9593173861503601)
    ],
    [[[1352.0, 1709.0], [1515.0, 1698.0], [1521.0, 1782.0], [1357.0, 1793.0]], ('2,00', 0.9682219624519348)],
    [[[173.0, 1742.0], [643.0, 1734.0], [644.0, 1810.0], [174.0, 1818.0]], ('1 NUEZ NATURAL', 0.976260781288147)],
    [[[1056.0, 1799.0], [1211.0, 1799.0], [1211.0, 1886.0], [1056.0, 1886.0]], ('1,25', 0.9884630441665649)],
    [[[1355.0, 1794.0], [1517.0, 1779.0], [1525.0, 1866.0], [1363.0, 1881.0]], ('2,50', 0.9717379808425903)],
    [
        [[173.0, 1830.0], [685.0, 1817.0], [687.0, 1886.0], [175.0, 1898.0]],
        ('2 QUESU COTIAGE', 0.9140510559082031)
    ],
    [[[1365.0, 1875.0], [1527.0, 1875.0], [1527.0, 1959.0], [1365.0, 1959.0]], ('6,52', 0.891019344329834)],
    [
        [[185.0, 1909.0], [894.0, 1897.0], [895.0, 1966.0], [186.0, 1978.0]],
        ('1 POLLO ENTERO LIMPIO', 0.9314266443252563)
    ],
    [[[1373.0, 1959.0], [1535.0, 1959.0], [1535.0, 2042.0], [1373.0, 2042.0]], ('1,70', 0.987417995929718)],
    [
        [[181.0, 1989.0], [832.0, 1977.0], [833.0, 2053.0], [182.0, 2066.0]],
        ('1 PAPEL VEGETAL 30H', 0.9128144979476929)
    ],
    [[[1377.0, 2046.0], [1535.0, 2046.0], [1535.0, 2130.0], [1377.0, 2130.0]], ('1.30', 0.9163311123847961)],
    [
        [[189.0, 2065.0], [798.0, 2061.0], [798.0, 2137.0], [189.0, 2141.0]],
        ('1 BEBIDA AVELLANAS', 0.9666225910186768)
    ],
    [[[1380.0, 2130.0], [1538.0, 2130.0], [1538.0, 2213.0], [1380.0, 2213.0]], ('1,05', 0.9747074842453003)],
    [
        [[189.0, 2153.0], [763.0, 2149.0], [764.0, 2217.0], [189.0, 2221.0]],
        ('1 INFUSION DORMIR', 0.9575845003128052)
    ],
    [[[1384.0, 2221.0], [1538.0, 2221.0], [1538.0, 2293.0], [1384.0, 2293.0]], ('1,40', 0.9857410192489624)],
    [
        [[188.0, 2233.0], [697.0, 2224.0], [698.0, 2301.0], [190.0, 2309.0]],
        ('1 LECHE DE COCO', 0.936829149723053)
    ],
    [[[1384.0, 2305.0], [1538.0, 2305.0], [1538.0, 2377.0], [1384.0, 2377.0]], ('1,35', 0.9695622324943542)],
    [
        [[192.0, 2320.0], [832.0, 2308.0], [833.0, 2373.0], [194.0, 2385.0]],
        ('1 QUESO UNTAR LIGHT', 0.9217395782470703)
    ],
    [[[196.0, 2396.0], [670.0, 2388.0], [671.0, 2457.0], [197.0, 2465.0]], ('1 RULITO CABRA', 0.924544632434845)],
    [[[1384.0, 2392.0], [1542.0, 2392.0], [1542.0, 2464.0], [1384.0, 2464.0]], ('2,45', 0.9538676142692566)],
    [
        [[192.0, 2472.0], [709.0, 2464.0], [710.0, 2544.0], [194.0, 2552.0]],
        ('1 GRIEGO LIGERO', 0.9372926354408264)
    ],
    [[[1388.0, 2472.0], [1546.0, 2472.0], [1546.0, 2556.0], [1388.0, 2556.0]], ('1,65', 0.9863594770431519)],
    [
        [[208.0, 2556.0], [740.0, 2556.0], [740.0, 2620.0], [208.0, 2620.0]],
        ('1 BOLSA PLASTICO', 0.9462106227874756)
    ],
    [[[1384.0, 2556.0], [1554.0, 2556.0], [1554.0, 2639.0], [1384.0, 2639.0]], ('0,15', 0.9750890135765076)],
    [[[806.0, 2712.0], [1122.0, 2712.0], [1122.0, 2849.0], [806.0, 2849.0]], ('TOTAL @)', 0.8402095437049866)],
    [[[1357.0, 2723.0], [1558.0, 2723.0], [1558.0, 2868.0], [1357.0, 2868.0]], ('43,95', 0.9814945459365845)],
    [
        [[588.0, 2832.0], [1131.0, 2846.0], [1127.0, 2975.0], [585.0, 2962.0]],
        ('TARJETA BANCARIA', 0.9548004865646362)
    ],
    [[[1361.0, 2852.0], [1562.0, 2852.0], [1562.0, 2993.0], [1361.0, 2993.0]], ('43,95', 0.97918301820755)],
    [
        [[464.0, 3030.0], [1161.0, 3055.0], [1159.0, 3119.0], [462.0, 3095.0]],
        ('COMERCIANTE MINORISTA', 0.9861618876457214)
    ],
    [[[148.0, 3163.0], [618.0, 3180.0], [615.0, 3256.0], [146.0, 3240.0]], ('TARJBANCARIA', 0.9844738841056824)],
    [[[645.0, 3190.0], [1296.0, 3207.0], [1295.0, 3271.0], [643.0, 3255.0]], ('******915', 0.6731351613998413)],
    [[[144.0, 3240.0], [606.0, 3252.0], [604.0, 3332.0], [142.0, 3319.0]], ('N.C072850332', 0.9555943012237549)],
    [
        [[1160.0, 3262.0], [1551.0, 3280.0], [1547.0, 3371.0], [1155.0, 3353.0]],
        ('AUT:1LPOXG', 0.8813847303390503)
    ],
    [
        [[144.0, 3308.0], [776.0, 3325.0], [773.0, 3416.0], [142.0, 3399.0]],
        ('AIDA0000000041010', 0.9720386266708374)
    ],
    [[[1233.0, 3345.0], [1548.0, 3363.0], [1543.0, 3447.0], [1229.0, 3429.0]], ('ARC:3030', 0.9701257944107056)],
    [[[794.0, 3427.0], [906.0, 3427.0], [906.0, 3533.0], [794.0, 3533.0]], (')', 0.6959380507469177)],
    [[[161.0, 3499.0], [681.0, 3487.0], [683.0, 3567.0], [163.0, 3579.0]], ('Importe43,95', 0.9383442401885986)],
    [
        [[994.0, 3517.0], [1555.0, 3542.0], [1551.0, 3633.0], [990.0, 3608.0]],
        ('DEBIT MASTERCARD', 0.9626994132995605)
    ]
]
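
The amounts in these lists use a decimal comma, but the OCR output occasionally contains a period instead (`'1.30'`), and some amounts are fused with a label (`'Importe43,95'`). A minimal sketch of one way to normalize such strings before validation is shown below. `parse_price` and `PRICE_PATTERN` are hypothetical names, and the pattern assumes amounts always carry exactly two decimal digits, which holds for the prices shown here but not for weights such as `'0,425'`.

```python
import re
from decimal import Decimal

# Digits, a comma or period, then exactly two digits not followed by another digit.
PRICE_PATTERN = re.compile(r"\d+[.,]\d{2}(?!\d)")


def parse_price(text):
    """Return the first two-decimal amount found in text as a Decimal, or None."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    # Normalize the decimal comma to a period so Decimal can parse it.
    return Decimal(match.group().replace(",", "."))


# Strings copied from receipt_texts_4 above.
samples = ['13,50', '1.30', 'Importe43,95', '1 NUEZ NATURAL', 'TOTAL @)']
prices = [parse_price(s) for s in samples]
print(prices)
```

Using `Decimal` rather than `float` keeps the amounts exact, which matters when line items are later summed and compared against the receipt total.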

In [None]:
receipt_texts_5 = [
    'MERCADONA,S.A.',
    'A-46103834',
    'C/ REPUBLICA DOMINICANA S/N',
    '35010 LAS PALMAS DE GRAN CANARIA',
    '928226288',
    'TELEFONO:',
    '06/04/202419:560P:255158',
    'FACTURA SIMPLIFICADA:2109-017-467040',
    'P.Unit',
    'Imp.€',
    'Descripción',
    'A',
    '5,40',
    '1 CERVEZA NEGRA P-6',
    '0,10',
    '1BOLSA PAPEL',
    '0.95',
    '1 CHICLES MENTA FUERTE',
    '1,05',
    '1HUMMUS CLASICO',
    '1 KEFIRFRESA-PLATANO',
    '0,90',
    '1 ROLLON INVIS.HOMBRE',
    '0,85',
    '1 CHIA',
    '1,50',
    '1 NACHOS TEX MEX',
    '0,95',
    'TOTAL(@',
    '11,70',
    'TARJETA BANCARIA',
    '11,70',
    'COMERCIANTE MINORISTA',
    '************4008',
    'TARJ,BANCARIA:',
    'AUT:F6RYLE',
    'N.C072885346',
    'AIDA0000000041010',
    'ARC:3030',
    '1)',
    'MASTERCARD',
    'DEBIT MASTERCARD',
    'Importe: 11,70'
]

receipt_boxes_5 = [
    [[[234.0, 271.0], [1404.0, 221.0], [1409.0, 320.0], [239.0, 370.0]], ('MERCADONA,S.A.', 0.9581691026687622)],
    [[[583.0, 358.0], [1001.0, 342.0], [1004.0, 423.0], [586.0, 439.0]], ('A-46103834', 0.947623074054718)],
    [
        [[276.0, 559.0], [1338.0, 528.0], [1341.0, 612.0], [279.0, 643.0]],
        ('C/ REPUBLICA DOMINICANA S/N', 0.9636136293411255)
    ],
    [
        [[205.0, 657.0], [1451.0, 623.0], [1453.0, 707.0], [207.0, 741.0]],
        ('35010 LAS PALMAS DE GRAN CANARIA', 0.9480319619178772)
    ],
    [[[939.0, 737.0], [1294.0, 729.0], [1296.0, 806.0], [941.0, 814.0]], ('928226288', 0.9969536662101746)],
    [[[362.0, 756.0], [705.0, 743.0], [708.0, 820.0], [365.0, 833.0]], ('TELEFONO:', 0.9830234050750732)],
    [
        [[280.0, 854.0], [1335.0, 827.0], [1337.0, 900.0], [282.0, 927.0]],
        ('06/04/202419:560P:255158', 0.9553447365760803)
    ],
    [
        [[101.0, 953.0], [1522.0, 922.0], [1524.0, 999.0], [102.0, 1029.0]],
        ('FACTURA SIMPLIFICADA:2109-017-467040', 0.9649688005447388)
    ],
    [[[1011.0, 1525.0], [1303.0, 1525.0], [1303.0, 1601.0], [1011.0, 1601.0]], ('P.Unit', 0.8359570503234863)],
    [[[1344.0, 1521.0], [1613.0, 1521.0], [1613.0, 1598.0], [1344.0, 1598.0]], ('Imp.€', 0.9332208633422852)],
    [[[231.0, 1533.0], [646.0, 1524.0], [648.0, 1612.0], [233.0, 1620.0]], ('Descripción', 0.9538726210594177)],
    [[[1276.0, 1539.0], [1340.0, 1539.0], [1340.0, 1580.0], [1276.0, 1580.0]], ('A', 0.7331032156944275)],
    [[[1456.0, 1605.0], [1625.0, 1605.0], [1625.0, 1689.0], [1456.0, 1689.0]], ('5,40', 0.9552683234214783)],
    [
        [[164.0, 1624.0], [868.0, 1612.0], [869.0, 1689.0], [166.0, 1700.0]],
        ('1 CERVEZA NEGRA P-6', 0.9312987327575684)
    ],
    [[[1456.0, 1696.0], [1625.0, 1696.0], [1625.0, 1780.0], [1456.0, 1780.0]], ('0,10', 0.9803319573402405)],
    [[[168.0, 1719.0], [649.0, 1703.0], [652.0, 1779.0], [170.0, 1796.0]], ('1BOLSA PAPEL', 0.949714720249176)],
    [[[1456.0, 1787.0], [1625.0, 1787.0], [1625.0, 1871.0], [1456.0, 1871.0]], ('0.95', 0.910907506942749)],
    [
        [[172.0, 1799.0], [977.0, 1791.0], [977.0, 1868.0], [173.0, 1875.0]],
        ('1 CHICLES MENTA FUERTE', 0.9475932717323303)
    ],
    [[[1460.0, 1882.0], [1625.0, 1882.0], [1625.0, 1963.0], [1460.0, 1963.0]], ('1,05', 0.9701275825500488)],
    [
        [[176.0, 1893.0], [759.0, 1886.0], [760.0, 1948.0], [176.0, 1956.0]],
        ('1HUMMUS CLASICO', 0.9690414071083069)
    ],
    [
        [[180.0, 1974.0], [943.0, 1974.0], [943.0, 2046.0], [180.0, 2046.0]],
        ('1 KEFIRFRESA-PLATANO', 0.9499281048774719)
    ],
    [[[1456.0, 1974.0], [1625.0, 1974.0], [1625.0, 2054.0], [1456.0, 2054.0]], ('0,90', 0.9549869298934937)],
    [
        [[180.0, 2061.0], [940.0, 2061.0], [940.0, 2134.0], [180.0, 2134.0]],
        ('1 ROLLON INVIS.HOMBRE', 0.9483566284179688)
    ],
    [[[1456.0, 2065.0], [1625.0, 2065.0], [1625.0, 2145.0], [1456.0, 2145.0]], ('0,85', 0.8961768746376038)],
    [[[180.0, 2141.0], [408.0, 2141.0], [408.0, 2218.0], [180.0, 2218.0]], ('1 CHIA', 0.997718870639801)],
    [[[1464.0, 2152.0], [1628.0, 2152.0], [1628.0, 2236.0], [1464.0, 2236.0]], ('1,50', 0.9812042713165283)],
    [
        [[183.0, 2233.0], [760.0, 2233.0], [760.0, 2305.0], [183.0, 2305.0]],
        ('1 NACHOS TEX MEX', 0.9800795316696167)
    ],
    [[[1460.0, 2247.0], [1625.0, 2247.0], [1625.0, 2327.0], [1460.0, 2327.0]], ('0,95', 0.9298561811447144)],
    [[[830.0, 2399.0], [1150.0, 2409.0], [1146.0, 2544.0], [826.0, 2534.0]], ('TOTAL(@', 0.8294388651847839)],
    [[[1417.0, 2413.0], [1634.0, 2424.0], [1626.0, 2570.0], [1409.0, 2559.0]], ('11,70', 0.9901212453842163)],
    [
        [[585.0, 2538.0], [1161.0, 2547.0], [1159.0, 2675.0], [583.0, 2666.0]],
        ('TARJETA BANCARIA', 0.9704402685165405)
    ],
    [[[1417.0, 2550.0], [1631.0, 2567.0], [1619.0, 2714.0], [1404.0, 2697.0]], ('11,70', 0.9377082586288452)],
    [
        [[446.0, 2746.0], [1195.0, 2758.0], [1193.0, 2831.0], [445.0, 2819.0]],
        ('COMERCIANTE MINORISTA', 0.9812877178192139)
    ],
    [
        [[634.0, 2907.0], [1375.0, 2930.0], [1372.0, 3003.0], [632.0, 2980.0]],
        ('************4008', 0.8021823167800903)
    ],
    [
        [[100.0, 2926.0], [639.0, 2910.0], [641.0, 2983.0], [103.0, 2999.0]],
        ('TARJ,BANCARIA:', 0.9236226677894592)
    ],
    [
        [[1187.0, 2997.0], [1600.0, 3030.0], [1594.0, 3106.0], [1181.0, 3073.0]],
        ('AUT:F6RYLE', 0.9622871279716492)
    ],
    [[[96.0, 3017.0], [593.0, 2998.0], [596.0, 3064.0], [99.0, 3083.0]], ('N.C072885346', 0.9876346588134766)],
    [
        [[97.0, 3090.0], [773.0, 3071.0], [776.0, 3147.0], [99.0, 3167.0]],
        ('AIDA0000000041010', 0.9767710566520691)
    ],
    [[[1247.0, 3085.0], [1596.0, 3110.0], [1591.0, 3186.0], [1241.0, 3161.0]], ('ARC:3030', 0.9223328828811646)],
    [[[790.0, 3163.0], [913.0, 3163.0], [913.0, 3276.0], [790.0, 3276.0]], ('1)', 0.5860849618911743)],
    [[[104.0, 3277.0], [459.0, 3264.0], [461.0, 3337.0], [107.0, 3350.0]], ('MASTERCARD', 0.9970512390136719)],
    [
        [[1004.0, 3332.0], [1586.0, 3376.0], [1579.0, 3464.0], [997.0, 3420.0]],
        ('DEBIT MASTERCARD', 0.9483160972595215)
    ],
    [
        [[112.0, 3360.0], [669.0, 3348.0], [671.0, 3421.0], [113.0, 3433.0]],
        ('Importe: 11,70', 0.9313837885856628)
    ]
]

In [None]:
receipt_texts_6 = [
    'HiperDino',
    'Los melorns precios de Canarlas',
    'DINOSOLSUPERMERCADOS,S.L',
    'C.I.F.B61742565',
    '9033-SD MESA Y L0PEZ',
    'Te1éfono:928222758',
    'Docunento',
    'Fecha',
    'Hora',
    'Centro Yend...',
    '90337286502024/903314-0002705130/03/202419:34',
    '-IMPORTE',
    'ARTICULO',
    '1,99',
    'COCA-COLA REFRESCO COLA PET 2 L',
    '48',
    '0,15',
    'BOLSA REUTILIZABLE 85% RECICLADA',
    '2,45',
    'PRESIDENT NATA FRESCA CREMOSA 2OCL',
    'TROPICAL CERVEZA PILSEN LATA 33 CL',
    '4,14',
    '60.69€',
    'ANOJO NACIO.TAPA/ESPAL/BABILLA FILET',
    '7,00',
    '0,586x14,75€/kgDto.1,64€',
    'LIMAS, EL KILO',
    '0,42',
    '0,070x5,95@/kg',
    'Total Articulos: 11',
    '16,15',
    'TOTALCOMPRA:',
    'Detalle de pagos',
    '0,00',
    'EFECTIVO',
    '16,15',
    'DATAFONO',
    'i GRACIAS POR SU VISITA !',
    'Comerciante minoristaFACTURA SIMPLIFICADA',
    'PROMOCIONES.APLICADAS',
    'Promociones de_precio',
    '1,64',
    'TOTAL PROMOCIONES',
    '1,64'
]

receipt_boxes_6 = [
    [[[639.0, 394.0], [1655.0, 427.0], [1650.0, 565.0], [635.0, 532.0]], ('HiperDino', 0.9948645830154419)],
    [
        [[917.0, 549.0], [1647.0, 574.0], [1645.0, 632.0], [915.0, 607.0]],
        ('Los melorns precios de Canarlas', 0.8174239993095398)
    ],
    [
        [[367.0, 692.0], [1374.0, 688.0], [1375.0, 756.0], [368.0, 759.0]],
        ('DINOSOLSUPERMERCADOS,S.L', 0.9675049185752869)
    ],
    [[[558.0, 770.0], [1215.0, 770.0], [1215.0, 827.0], [558.0, 827.0]], ('C.I.F.B61742565', 0.9311434030532837)],
    [
        [[517.0, 844.0], [1289.0, 840.0], [1290.0, 908.0], [517.0, 911.0]],
        ('9033-SD MESA Y L0PEZ', 0.9236869812011719)
    ],
    [
        [[524.0, 921.0], [1255.0, 921.0], [1255.0, 989.0], [524.0, 989.0]],
        ('Te1éfono:928222758', 0.9477601647377014)
    ],
    [[[500.0, 1077.0], [772.0, 1077.0], [772.0, 1134.0], [500.0, 1134.0]], ('Docunento', 0.9594316482543945)],
    [[[1123.0, 1070.0], [1279.0, 1070.0], [1279.0, 1134.0], [1123.0, 1134.0]], ('Fecha', 0.9928757548332214)],
    [[[1432.0, 1070.0], [1565.0, 1070.0], [1565.0, 1134.0], [1432.0, 1134.0]], ('Hora', 0.9933515191078186)],
    [[[85.0, 1087.0], [503.0, 1076.0], [504.0, 1134.0], [86.0, 1145.0]], ('Centro Yend...', 0.9016883969306946)],
    [
        [[85.0, 1161.0], [1588.0, 1144.0], [1589.0, 1215.0], [86.0, 1232.0]],
        ('90337286502024/903314-0002705130/03/202419:34', 0.9911031723022461)
    ],
    [[[1447.0, 1292.0], [1726.0, 1300.0], [1724.0, 1371.0], [1445.0, 1363.0]], ('-IMPORTE', 0.9229931235313416)],
    [[[98.0, 1320.0], [397.0, 1312.0], [399.0, 1380.0], [100.0, 1388.0]], ('ARTICULO', 0.9966269135475159)],
    [[[1568.0, 1370.0], [1725.0, 1370.0], [1725.0, 1444.0], [1568.0, 1444.0]], ('1,99', 0.9750573635101318)],
    [
        [[102.0, 1394.0], [1238.0, 1380.0], [1239.0, 1441.0], [103.0, 1455.0]],
        ('COCA-COLA REFRESCO COLA PET 2 L', 0.9323121905326843)
    ],
    [[[1351.0, 1455.0], [1432.0, 1455.0], [1432.0, 1509.0], [1351.0, 1509.0]], ('48', 0.9972485899925232)],
    [[[1568.0, 1448.0], [1722.0, 1448.0], [1722.0, 1512.0], [1568.0, 1512.0]], ('0,15', 0.9393821954727173)],
    [
        [[98.0, 1468.0], [1289.0, 1454.0], [1290.0, 1512.0], [99.0, 1526.0]],
        ('BOLSA REUTILIZABLE 85% RECICLADA', 0.9519724249839783)
    ],
    [[[1560.0, 1517.0], [1722.0, 1507.0], [1726.0, 1585.0], [1565.0, 1594.0]], ('2,45', 0.9626799821853638)],
    [
        [[98.0, 1536.0], [1340.0, 1522.0], [1341.0, 1589.0], [99.0, 1603.0]],
        ('PRESIDENT NATA FRESCA CREMOSA 2OCL', 0.944674015045166)
    ],
    [
        [[98.0, 1610.0], [1350.0, 1596.0], [1351.0, 1660.0], [99.0, 1674.0]],
        ('TROPICAL CERVEZA PILSEN LATA 33 CL', 0.951836884021759)
    ],
    [[[1565.0, 1657.0], [1725.0, 1657.0], [1725.0, 1731.0], [1565.0, 1731.0]], ('4,14', 0.9805392026901245)],
    [[[279.0, 1684.0], [657.0, 1684.0], [657.0, 1742.0], [279.0, 1742.0]], ('60.69€', 0.8960645794868469)],
    [
        [[102.0, 1752.0], [1422.0, 1734.0], [1422.0, 1802.0], [103.0, 1819.0]],
        ('ANOJO NACIO.TAPA/ESPAL/BABILLA FILET', 0.9532400965690613)
    ],
    [[[1565.0, 1799.0], [1722.0, 1799.0], [1722.0, 1876.0], [1565.0, 1876.0]], ('7,00', 0.9879884719848633)],
    [
        [[129.0, 1816.0], [1354.0, 1805.0], [1354.0, 1886.0], [130.0, 1897.0]],
        ('0,586x14,75€/kgDto.1,64€', 0.9004581570625305)
    ],
    [
        [[102.0, 1893.0], [633.0, 1893.0], [633.0, 1961.0], [102.0, 1961.0]],
        ('LIMAS, EL KILO', 0.9331079125404358)
    ],
    [[[1565.0, 1941.0], [1722.0, 1941.0], [1722.0, 2015.0], [1565.0, 2015.0]], ('0,42', 0.937898576259613)],
    [
        [[139.0, 1964.0], [772.0, 1964.0], [772.0, 2032.0], [139.0, 2032.0]],
        ('0,070x5,95@/kg', 0.9399409294128418)
    ],
    [
        [[109.0, 2103.0], [806.0, 2099.0], [806.0, 2167.0], [109.0, 2170.0]],
        ('Total Articulos: 11', 0.9266980886459351)
    ],
    [[[1512.0, 2215.0], [1720.0, 2205.0], [1727.0, 2344.0], [1519.0, 2354.0]], ('16,15', 0.9660555720329285)],
    [
        [[806.0, 2245.0], [1285.0, 2237.0], [1287.0, 2355.0], [808.0, 2363.0]],
        ('TOTALCOMPRA:', 0.9809215068817139)
    ],
    [
        [[109.0, 2444.0], [704.0, 2440.0], [704.0, 2511.0], [109.0, 2515.0]],
        ('Detalle de pagos', 0.9589704275131226)
    ],
    [[[1546.0, 2482.0], [1707.0, 2469.0], [1713.0, 2543.0], [1553.0, 2557.0]], ('0,00', 0.9358032941818237)],
    [[[990.0, 2504.0], [1283.0, 2504.0], [1283.0, 2565.0], [990.0, 2565.0]], ('EFECTIVO', 0.9952208399772644)],
    [[[1516.0, 2553.0], [1704.0, 2540.0], [1710.0, 2614.0], [1521.0, 2627.0]], ('16,15', 0.9707363247871399)],
    [[[979.0, 2573.0], [1281.0, 2561.0], [1284.0, 2632.0], [982.0, 2643.0]], ('DATAFONO', 0.9956679344177246)],
    [
        [[452.0, 2693.0], [1347.0, 2683.0], [1348.0, 2740.0], [453.0, 2751.0]],
        ('i GRACIAS POR SU VISITA !', 0.9499263763427734)
    ],
    [
        [[289.0, 2747.0], [1500.0, 2733.0], [1501.0, 2801.0], [290.0, 2815.0]],
        ('Comerciante minoristaFACTURA SIMPLIFICADA', 0.9507773518562317)
    ],
    [
        [[149.0, 2859.0], [887.0, 2848.0], [888.0, 2909.0], [150.0, 2920.0]],
        ('PROMOCIONES.APLICADAS', 0.9757267236709595)
    ],
    [
        [[150.0, 2909.0], [891.0, 2906.0], [892.0, 2963.0], [150.0, 2967.0]],
        ('Promociones de_precio', 0.9617913365364075)
    ],
    [[[1523.0, 2904.0], [1667.0, 2891.0], [1672.0, 2955.0], [1528.0, 2968.0]], ('1,64', 0.9159317016601562)],
    [
        [[646.0, 2945.0], [1243.0, 2975.0], [1239.0, 3069.0], [642.0, 3039.0]],
        ('TOTAL PROMOCIONES', 0.9657850861549377)
    ],
    [[[1507.0, 2956.0], [1674.0, 2956.0], [1674.0, 3081.0], [1507.0, 3081.0]], ('1,64', 0.9385150671005249)]
]
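
Each entry in the `receipt_boxes_*` lists pairs a four-point quadrilateral with a `(text, confidence)` tuple, and the detections are not guaranteed to arrive in reading order. The sketch below shows one possible way to regroup them into visual rows before prompting the LLM. The function name `group_into_rows` and the 40-pixel `y_tolerance` are assumptions chosen for this example, and the tolerance would likely need tuning per receipt.

```python
from statistics import mean


def group_into_rows(boxes, y_tolerance=40.0, min_confidence=0.0):
    """Group OCR detections into rows; each row is sorted left to right."""
    entries = []
    for quad, (text, conf) in boxes:
        if conf < min_confidence:
            continue
        y_center = mean(point[1] for point in quad)  # vertical centre of the box
        x_left = min(point[0] for point in quad)     # leftmost corner
        entries.append((y_center, x_left, text))
    entries.sort()  # top to bottom

    rows = []
    for y_center, x_left, text in entries:
        # Join the current row if the centre is close to the row's first box.
        if rows and abs(y_center - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x_left, text))
        else:
            rows.append({"y": y_center, "cells": [(x_left, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


# Five detections copied from receipt_boxes_6.
sample = [
    [[[1568.0, 1370.0], [1725.0, 1370.0], [1725.0, 1444.0], [1568.0, 1444.0]], ('1,99', 0.975)],
    [[[102.0, 1394.0], [1238.0, 1380.0], [1239.0, 1441.0], [103.0, 1455.0]],
     ('COCA-COLA REFRESCO COLA PET 2 L', 0.932)],
    [[[1351.0, 1455.0], [1432.0, 1455.0], [1432.0, 1509.0], [1351.0, 1509.0]], ('48', 0.997)],
    [[[1568.0, 1448.0], [1722.0, 1448.0], [1722.0, 1512.0], [1568.0, 1512.0]], ('0,15', 0.939)],
    [[[98.0, 1468.0], [1289.0, 1454.0], [1290.0, 1512.0], [99.0, 1526.0]],
     ('BOLSA REUTILIZABLE 85% RECICLADA', 0.952)],
]
rows = group_into_rows(sample)
```

A fixed tolerance is a simple heuristic: on skewed or crumpled receipts an item name and its price can drift further apart vertically, so some rows may still be split.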

### DocOCR

In [None]:
docOCR = {
    'pages': [{'page_idx': 0, 'dimensions': (3107, 1769), 'orientation': {'value': None, 'confidence': None}, 'language': {'value': None, 'confidence': None}, 'blocks': [{'geometry': ((0.3250501254239684, 0.052734375), (0.527443117580554, 0.0791015625)), 'lines': [{'geometry': ((0.3250501254239684, 0.052734375), (0.527443117580554, 0.0791015625)), 'words': [{'value': 'SPAR', 'confidence': 0.995099663734436, 'geometry': ((0.3250501254239684, 0.052734375), (0.40394908846806105, 0.076171875))}, {'value': 'TAFIRA', 'confidence': 0.5254078507423401, 'geometry': ((0.41595545240955345, 0.0537109375), (0.527443117580554, 0.0791015625))}]}], 'artefacts': []}, {'geometry': ((0.17582817357970604, 0.0712890625), (0.6852410436687394, 0.109375)), 'lines': [{'geometry': ((0.17582817357970604, 0.0712890625), (0.6852410436687394, 0.109375)), 'words': [{'value': 'C/.', 'confidence': 0.9864653944969177, 'geometry': ((0.17582817357970604, 0.0712890625), (0.23757518813595252, 0.1005859375))}, {'value': 'BRUNO', 'confidence': 0.9941843152046204, 'geometry': ((0.24958155207744487, 0.07421875), (0.3490628533069531, 0.099609375))}, {'value': 'NARANJO', 'confidence': 0.9896984100341797, 'geometry': ((0.3644996069460147, 0.078125), (0.4897088309072923, 0.1005859375))}, {'value': 'DIAZ9A-9B', 'confidence': 0.5217687487602234, 'geometry': ((0.5, 0.0771484375), (0.6852410436687394, 0.109375))}]}], 'artefacts': []}, {'geometry': ((0.11579635387224418, 0.0986328125), (0.19812570661390616, 0.126953125)), 'lines': [{'geometry': ((0.11579635387224418, 0.0986328125), (0.19812570661390616, 0.126953125)), 'words': [{'value': 'TLF.:', 'confidence': 0.9675514698028564, 'geometry': ((0.11579635387224418, 0.0986328125), (0.19812570661390616, 0.126953125))}]}], 'artefacts': []}, {'geometry': ((0.31990454087761444, 0.1083984375), (0.3301957099703222, 0.1142578125)), 'lines': [{'geometry': ((0.31990454087761444, 0.1083984375), (0.3301957099703222, 0.1142578125)), 'words': [{'value': '1', 'confidence': 
0.5195997357368469, 'geometry': ((0.31990454087761444, 0.1083984375), (0.3301957099703222, 0.1142578125))}]}], 'artefacts': []}, {'geometry': ((0.43996818029253815, 0.1015625), (0.7435576685274166, 0.1259765625)), 'lines': [{'geometry': ((0.43996818029253815, 0.1015625), (0.7435576685274166, 0.1259765625)), 'words': [{'value': '-', 'confidence': 0.9134853482246399, 'geometry': ((0.43996818029253815, 0.11328125), (0.4519745442340305, 0.119140625))}, {'value': 'FAX:', 'confidence': 0.9982134699821472, 'geometry': ((0.4691264927218768, 0.1015625), (0.541164676370831, 0.1259765625))}, {'value': '328', 'confidence': 0.5403745174407959, 'geometry': ((0.5651774042538157, 0.1064453125), (0.6114876651710006, 0.123046875))}, {'value': '5', 'confidence': 0.5067386031150818, 'geometry': ((0.6423611724491238, 0.1123046875), (0.6715194848784625, 0.125))}, {'value': 'JU4', 'confidence': 0.44694796204566956, 'geometry': ((0.7041081870053703, 0.1103515625), (0.7435576685274166, 0.1220703125))}]}], 'artefacts': []}, {'geometry': ((0.3061829820873375, 0.1240234375), (0.5617470145562464, 0.2001953125)), 'lines': [{'geometry': ((0.3061829820873375, 0.1240234375), (0.5617470145562464, 0.150390625)), 'words': [{'value': 'NIF:', 'confidence': 0.9995003938674927, 'geometry': ((0.3061829820873375, 0.1240234375), (0.3799363605850763, 0.1484375))}, {'value': 'B02868248', 'confidence': 0.830794095993042, 'geometry': ((0.3970883090729226, 0.1259765625), (0.5617470145562464, 0.150390625))}]}, {'geometry': ((0.3250501254239684, 0.173828125), (0.5257279227317694, 0.2001953125)), 'words': [{'value': 'TAFIRA', 'confidence': 0.9992335438728333, 'geometry': ((0.3250501254239684, 0.173828125), (0.4382529854437535, 0.19921875))}, {'value': 'BAJA', 'confidence': 0.9987276792526245, 'geometry': ((0.4502593493852459, 0.177734375), (0.5257279227317694, 0.2001953125))}]}], 'artefacts': []}, {'geometry': ((0.1586762250918598, 0.146484375), (0.700677797307801, 0.177734375)), 'lines': [{'geometry': 
((0.1586762250918598, 0.146484375), (0.700677797307801, 0.177734375)), 'words': [{'value': 'SUPERMERCADOS', 'confidence': 0.9978907704353333, 'geometry': ((0.1586762250918598, 0.146484375), (0.40394908846806105, 0.173828125))}, {'value': 'DABEL', 'confidence': 0.6027066111564636, 'geometry': ((0.41595545240955345, 0.15234375), (0.5068607793951385, 0.1748046875))}, {'value': '2021,', 'confidence': 0.9469994902610779, 'geometry': ((0.5222975330342001, 0.15234375), (0.6114876651710006, 0.17578125))}, {'value': 'S.L.', 'confidence': 0.7101585268974304, 'geometry': ((0.623494029112493, 0.150390625), (0.700677797307801, 0.177734375))}]}], 'artefacts': []}, {'geometry': ((0.2530119417750141, 0.224609375), (0.6132028600197852, 0.248046875)), 'lines': [{'geometry': ((0.2530119417750141, 0.224609375), (0.6132028600197852, 0.248046875)), 'words': [{'value': 'FACTURA', 'confidence': 0.9984753131866455, 'geometry': ((0.2530119417750141, 0.224609375), (0.38336675028264555, 0.24609375))}, {'value': 'SIMPLIFICADA', 'confidence': 0.9566319584846497, 'geometry': ((0.4005186987704918, 0.2265625), (0.6132028600197852, 0.248046875))}]}], 'artefacts': []}, {'geometry': ((0.0660557032574901, 0.2744140625), (0.1380938869064443, 0.2978515625)), 'lines': [{'geometry': ((0.0660557032574901, 0.2744140625), (0.1380938869064443, 0.2978515625)), 'words': [{'value': 'Nro.', 'confidence': 0.9982191920280457, 'geometry': ((0.0660557032574901, 0.2744140625), (0.1380938869064443, 0.2978515625))}]}], 'artefacts': []}, {'geometry': ((0.17582817357970604, 0.2744140625), (0.45540493393159975, 0.298828125)), 'lines': [{'geometry': ((0.17582817357970604, 0.2744140625), (0.45540493393159975, 0.298828125)), 'words': [{'value': ':', 'confidence': 0.9996176362037659, 'geometry': ((0.17582817357970604, 0.2783203125), (0.19641051176512153, 0.2978515625))}, {'value': '0141002-18965', 'confidence': 0.8270542025566101, 'geometry': ((0.21184726540418314, 0.2744140625), (0.45540493393159975, 0.298828125))}]}], 
'artefacts': []}, {'geometry': ((0.06262531355992085, 0.2998046875), (0.5102911690927078, 0.3232421875)), 'lines': [{'geometry': ((0.06262531355992085, 0.2998046875), (0.5102911690927078, 0.3232421875)), 'words': [{'value': 'Fecha', 'confidence': 0.9991544485092163, 'geometry': ((0.06262531355992085, 0.2998046875), (0.15696103024307517, 0.322265625))}, {'value': ':', 'confidence': 0.9997920989990234, 'geometry': ((0.17754336842849067, 0.3046875), (0.19298012206755227, 0.322265625))}, {'value': '06-04-2024', 'confidence': 0.9974750280380249, 'geometry': ((0.21356246025296777, 0.30078125), (0.4005186987704918, 0.3212890625))}, {'value': '15:23', 'confidence': 0.5147274136543274, 'geometry': ((0.41938584210712265, 0.30078125), (0.5102911690927078, 0.3232421875))}]}], 'artefacts': []}, {'geometry': ((0.5600318197074619, 0.302734375), (0.7984439036885246, 0.3740234375)), 'lines': [{'geometry': ((0.5600318197074619, 0.302734375), (0.7950135139909553, 0.3251953125)), 'words': [{'value': 'Cajero:', 'confidence': 0.9981601238250732, 'geometry': ((0.5600318197074619, 0.302734375), (0.6835258488199548, 0.3251953125))}, {'value': '10074', 'confidence': 0.9981125593185425, 'geometry': ((0.7092537715517242, 0.302734375), (0.7950135139909553, 0.322265625))}]}, {'geometry': ((0.596050911531939, 0.3525390625), (0.7984439036885246, 0.3740234375)), 'words': [{'value': 'PVP', 'confidence': 0.9959189891815186, 'geometry': ((0.596050911531939, 0.3525390625), (0.6526523415418315, 0.373046875))}, {'value': 'IMPORTE', 'confidence': 0.9976862072944641, 'geometry': ((0.6698042900296778, 0.3525390625), (0.7984439036885246, 0.3740234375))}]}], 'artefacts': []}, {'geometry': ((0.06091011871113622, 0.353515625), (0.2667335005652911, 0.37109375)), 'lines': [{'geometry': ((0.06091011871113622, 0.353515625), (0.2667335005652911, 0.37109375)), 'words': [{'value': 'DESCRIPCION', 'confidence': 0.6904217004776001, 'geometry': ((0.06091011871113622, 0.353515625), (0.2667335005652911, 0.37109375))}]}], 
'artefacts': []}, {'geometry': ((0.4519745442340305, 0.3515625), (0.5445950660684002, 0.373046875)), 'lines': [{'geometry': ((0.4519745442340305, 0.3515625), (0.5445950660684002, 0.373046875)), 'words': [{'value': 'CANT.', 'confidence': 0.9987971782684326, 'geometry': ((0.4519745442340305, 0.3515625), (0.5445950660684002, 0.373046875))}]}], 'artefacts': []}, {'geometry': ((0.04890375476964387, 0.40234375), (0.4365377905949689, 0.58984375)), 'lines': [{'geometry': ((0.05576453416478239, 0.40234375), (0.3919427245265687, 0.4228515625)), 'words': [{'value': 'CLIPPER', 'confidence': 0.9989248514175415, 'geometry': ((0.05576453416478239, 0.404296875), (0.18783453752119844, 0.4228515625))}, {'value': 'MANZ.1.5L.', 'confidence': 0.9872568845748901, 'geometry': ((0.2067016808578293, 0.40234375), (0.3919427245265687, 0.4228515625))}]}, {'geometry': ((0.05404933931599776, 0.427734375), (0.4365377905949689, 0.4501953125)), 'words': [{'value': 'PLATANO', 'confidence': 0.9989739060401917, 'geometry': ((0.05404933931599776, 0.431640625), (0.1861193426724138, 0.4501953125))}, {'value': 'PRIMERA', 'confidence': 0.9871561527252197, 'geometry': ((0.2067016808578293, 0.4306640625), (0.3387716842142453, 0.44921875))}, {'value': 'GRAN', 'confidence': 0.998450517654419, 'geometry': ((0.3559236327020916, 0.427734375), (0.4365377905949689, 0.4501953125))}]}, {'geometry': ((0.0506189496184285, 0.455078125), (0.4365377905949689, 0.4794921875)), 'words': [{'value': 'MANZANA', 'confidence': 0.9354597330093384, 'geometry': ((0.0506189496184285, 0.4580078125), (0.1861193426724138, 0.4794921875))}, {'value': 'PINK', 'confidence': 0.9995766878128052, 'geometry': ((0.20155609631147542, 0.4560546875), (0.2821702542043527, 0.478515625))}, {'value': 'LADY', 'confidence': 0.999377429485321, 'geometry': ((0.3010373975409836, 0.4560546875), (0.3799363605850763, 0.478515625))}, {'value': 'GR', 'confidence': 0.9998779296875, 'geometry': ((0.3936579193753533, 0.455078125), (0.4365377905949689, 
0.4775390625))}]}, {'geometry': ((0.05233414446721313, 0.484375), (0.4279618163510458, 0.505859375)), 'words': [{'value': 'SALSA.BARI.PES.GEN.I', 'confidence': 0.559032142162323, 'geometry': ((0.05233414446721313, 0.484375), (0.4279618163510458, 0.505859375))}]}, {'geometry': ((0.04890375476964387, 0.51171875), (0.43310740089739963, 0.53515625)), 'words': [{'value': 'GOFIO', 'confidence': 0.7128666639328003, 'geometry': ((0.04890375476964387, 0.513671875), (0.14666986115036745, 0.53515625))}, {'value': 'B.LUGAR', 'confidence': 0.9992106556892395, 'geometry': ((0.1638218096382137, 0.5126953125), (0.299322202692199, 0.5341796875))}, {'value': 'MIL.FU', 'confidence': 0.996026337146759, 'geometry': ((0.31818934602882987, 0.51171875), (0.43310740089739963, 0.533203125))}]}, {'geometry': ((0.05233414446721313, 0.5380859375), (0.4348225957461843, 0.5615234375)), 'words': [{'value': 'ZUM.DISF.D.SIMON', 'confidence': 0.8773426413536072, 'geometry': ((0.05233414446721313, 0.541015625), (0.35763882755087617, 0.5615234375))}, {'value': 'PIN', 'confidence': 0.9868991374969482, 'geometry': ((0.3730755811899378, 0.5380859375), (0.4348225957461843, 0.5615234375))}]}, {'geometry': ((0.05233414446721313, 0.568359375), (0.4056642833168457, 0.58984375)), 'words': [{'value': 'LECHE.GRNJ.FLR.UHT.', 'confidence': 0.24732787907123566, 'geometry': ((0.05233414446721313, 0.568359375), (0.4056642833168457, 0.58984375))}]}], 'artefacts': []}, {'geometry': ((0.472556882419446, 0.40234375), (0.5480254557659695, 0.5859375)), 'lines': [{'geometry': ((0.5257279227317694, 0.40234375), (0.5480254557659695, 0.4228515625)), 'words': [{'value': '1', 'confidence': 0.9997158646583557, 'geometry': ((0.5257279227317694, 0.40234375), (0.5480254557659695, 0.4228515625))}]}, {'geometry': ((0.472556882419446, 0.4287109375), (0.5480254557659695, 0.4521484375)), 'words': [{'value': '1,40', 'confidence': 0.9187146425247192, 'geometry': ((0.472556882419446, 0.4287109375), (0.5480254557659695, 0.4521484375))}]}, 
{'geometry': ((0.5291583124293386, 0.458984375), (0.5445950660684002, 0.4765625)), 'words': [{'value': '1', 'confidence': 0.8148260712623596, 'geometry': ((0.5291583124293386, 0.458984375), (0.5445950660684002, 0.4765625))}]}, {'geometry': ((0.5291583124293386, 0.4853515625), (0.5445950660684002, 0.5029296875)), 'words': [{'value': '1', 'confidence': 0.9833357334136963, 'geometry': ((0.5291583124293386, 0.4853515625), (0.5445950660684002, 0.5029296875))}]}, {'geometry': ((0.5291583124293386, 0.51171875), (0.5445950660684002, 0.53125)), 'words': [{'value': '1', 'confidence': 0.9980706572532654, 'geometry': ((0.5291583124293386, 0.51171875), (0.5445950660684002, 0.53125))}]}, {'geometry': ((0.5291583124293386, 0.5673828125), (0.5445950660684002, 0.5859375)), 'words': [{'value': '1', 'confidence': 0.9998283386230469, 'geometry': ((0.5291583124293386, 0.5673828125), (0.5445950660684002, 0.5859375))}]}], 'artefacts': []}, {'geometry': ((0.5874749372880158, 0.3994140625), (0.6800954591223856, 0.591796875)), 'lines': [{'geometry': ((0.5874749372880158, 0.3994140625), (0.6783802642736009, 0.4296875)), 'words': [{'value': '1,49', 'confidence': 0.9256067872047424, 'geometry': ((0.5874749372880158, 0.3994140625), (0.6783802642736009, 0.4296875))}]}, {'geometry': ((0.5994813012295082, 0.4287109375), (0.6749498745760316, 0.453125)), 'words': [{'value': '1,99', 'confidence': 0.9997525811195374, 'geometry': ((0.5994813012295082, 0.4287109375), (0.6749498745760316, 0.453125))}]}, {'geometry': ((0.5994813012295082, 0.455078125), (0.6749498745760316, 0.4794921875)), 'words': [{'value': '2,99', 'confidence': 0.9923906326293945, 'geometry': ((0.5994813012295082, 0.455078125), (0.6749498745760316, 0.4794921875))}]}, {'geometry': ((0.5977661063807236, 0.4814453125), (0.6783802642736009, 0.5087890625)), 'words': [{'value': '3,10', 'confidence': 0.9706373810768127, 'geometry': ((0.5977661063807236, 0.4814453125), (0.6783802642736009, 0.5087890625))}]}, {'geometry': ((0.5994813012295082, 
0.5087890625), (0.6783802642736009, 0.5361328125)), 'words': [{'value': '1,85', 'confidence': 0.998205840587616, 'geometry': ((0.5994813012295082, 0.5087890625), (0.6783802642736009, 0.5361328125))}]}, {'geometry': ((0.5994813012295082, 0.537109375), (0.6800954591223856, 0.5634765625)), 'words': [{'value': '1,75', 'confidence': 0.9998788833618164, 'geometry': ((0.5994813012295082, 0.537109375), (0.6800954591223856, 0.5634765625))}]}, {'geometry': ((0.6011964960782928, 0.564453125), (0.6800954591223856, 0.591796875)), 'words': [{'value': '1,15', 'confidence': 0.9816519618034363, 'geometry': ((0.6011964960782928, 0.564453125), (0.6800954591223856, 0.591796875))}]}], 'artefacts': []}, {'geometry': ((0.7092537715517242, 0.40234375), (0.8207414367227247, 0.716796875)), 'lines': [{'geometry': ((0.7264057200395704, 0.40234375), (0.8001590985373093, 0.4267578125)), 'words': [{'value': '1,49', 'confidence': 0.9992368817329407, 'geometry': ((0.7264057200395704, 0.40234375), (0.8001590985373093, 0.4267578125))}]}, {'geometry': ((0.7229753303420011, 0.427734375), (0.8035894882348784, 0.4541015625)), 'words': [{'value': '2,79', 'confidence': 0.9989591240882874, 'geometry': ((0.7229753303420011, 0.427734375), (0.8035894882348784, 0.4541015625))}]}, {'geometry': ((0.7246905251907858, 0.4541015625), (0.8035894882348784, 0.4814453125)), 'words': [{'value': '2,99', 'confidence': 0.9962424635887146, 'geometry': ((0.7246905251907858, 0.4541015625), (0.8035894882348784, 0.4814453125))}]}, {'geometry': ((0.7246905251907858, 0.4814453125), (0.8053046830836631, 0.5078125)), 'words': [{'value': '3,10', 'confidence': 0.9997454285621643, 'geometry': ((0.7246905251907858, 0.4814453125), (0.8053046830836631, 0.5078125))}]}, {'geometry': ((0.7298361097371396, 0.5087890625), (0.8087350727812324, 0.5361328125)), 'words': [{'value': '1,85', 'confidence': 0.9839196801185608, 'geometry': ((0.7298361097371396, 0.5087890625), (0.8087350727812324, 0.5361328125))}]}, {'geometry': ((0.7298361097371396, 
0.5361328125), (0.8087350727812324, 0.5634765625)), 'words': [{'value': '1,75', 'confidence': 0.9650628566741943, 'geometry': ((0.7298361097371396, 0.5361328125), (0.8087350727812324, 0.5634765625))}]}, {'geometry': ((0.7332664994347089, 0.5634765625), (0.8087350727812324, 0.5908203125)), 'words': [{'value': '1,15', 'confidence': 0.9982710480690002, 'geometry': ((0.7332664994347089, 0.5634765625), (0.8087350727812324, 0.5908203125))}]}, {'geometry': ((0.7092537715517242, 0.615234375), (0.8207414367227247, 0.6728515625)), 'words': [{'value': '15,12', 'confidence': 0.9976614713668823, 'geometry': ((0.7092537715517242, 0.615234375), (0.8207414367227247, 0.6728515625))}]}, {'geometry': ((0.7161145509468626, 0.658203125), (0.8155958521763709, 0.689453125)), 'words': [{'value': '15,12', 'confidence': 0.9242404103279114, 'geometry': ((0.7161145509468626, 0.658203125), (0.8155958521763709, 0.689453125))}]}, {'geometry': ((0.7332664994347089, 0.6904296875), (0.8155958521763709, 0.716796875)), 'words': [{'value': '0,00', 'confidence': 0.9986023902893066, 'geometry': ((0.7332664994347089, 0.6904296875), (0.8155958521763709, 0.716796875))}]}], 'artefacts': []}, {'geometry': ((0.05233414446721313, 0.595703125), (0.24100557783352178, 0.619140625)), 'lines': [{'geometry': ((0.05233414446721313, 0.595703125), (0.24100557783352178, 0.619140625)), 'words': [{'value': 'Lineas', 'confidence': 0.9689716696739197, 'geometry': ((0.05233414446721313, 0.595703125), (0.1638218096382137, 0.6181640625))}, {'value': ':', 'confidence': 0.9999046325683594, 'geometry': ((0.18440414782362918, 0.599609375), (0.1998409014626908, 0.619140625))}, {'value': '7', 'confidence': 0.9998273849487305, 'geometry': ((0.21870804479932165, 0.595703125), (0.24100557783352178, 0.6181640625))}]}], 'artefacts': []}, {'geometry': ((0.3867971399802148, 0.6201171875), (0.5668925991026004, 0.7158203125)), 'lines': [{'geometry': ((0.3867971399802148, 0.6201171875), (0.5394494815220463, 0.669921875)), 'words': [{'value': 
'Total', 'confidence': 0.8477303981781006, 'geometry': ((0.3867971399802148, 0.6201171875), (0.4931392206048615, 0.6669921875))}, {'value': 'F', 'confidence': 0.9914229512214661, 'geometry': ((0.5051455845463538, 0.634765625), (0.5394494815220463, 0.669921875))}]}, {'geometry': ((0.39022752967778407, 0.6640625), (0.5668925991026004, 0.6923828125)), 'words': [{'value': 'Entregado', 'confidence': 0.9985143542289734, 'geometry': ((0.39022752967778407, 0.6640625), (0.5668925991026004, 0.6923828125))}]}, {'geometry': ((0.39022752967778407, 0.693359375), (0.5085759742439231, 0.7158203125)), 'words': [{'value': 'Cambio', 'confidence': 0.9998855590820312, 'geometry': ((0.39022752967778407, 0.693359375), (0.5085759742439231, 0.7158203125))}]}], 'artefacts': []}, {'geometry': ((0.14495466630158282, 0.6669921875), (0.3147589563312606, 0.689453125)), 'lines': [{'geometry': ((0.14495466630158282, 0.6669921875), (0.3147589563312606, 0.689453125)), 'words': [{'value': 'TARJETA', 'confidence': 0.576540470123291, 'geometry': ((0.14495466630158282, 0.6669921875), (0.2770246696579989, 0.689453125))}, {'value': 'Me', 'confidence': 0.5712556838989258, 'geometry': ((0.29589181299462974, 0.6748046875), (0.3147589563312606, 0.68359375))}]}], 'artefacts': []}, {'geometry': ((0.05404933931599776, 0.7421875), (0.23242960358959863, 0.9658203125)), 'lines': [{'geometry': ((0.05404933931599776, 0.7421875), (0.22556882419446017, 0.767578125)), 'words': [{'value': 'Operacion', 'confidence': 0.9885244965553284, 'geometry': ((0.05404933931599776, 0.7421875), (0.22556882419446017, 0.767578125))}]}, {'geometry': ((0.05576453416478239, 0.7685546875), (0.15353064054550591, 0.791015625)), 'words': [{'value': 'Fecha', 'confidence': 0.9997254014015198, 'geometry': ((0.05576453416478239, 0.7685546875), (0.15353064054550591, 0.791015625))}]}, {'geometry': ((0.05576453416478239, 0.791015625), (0.21184726540418314, 0.81640625)), 'words': [{'value': 'Comercio', 'confidence': 0.9996271729469299, 'geometry': 
((0.05576453416478239, 0.791015625), (0.21184726540418314, 0.81640625))}]}, {'geometry': ((0.05747972901356696, 0.8134765625), (0.1226571332673827, 0.837890625)), 'words': [{'value': 'ARC', 'confidence': 0.9992039799690247, 'geometry': ((0.05747972901356696, 0.8134765625), (0.1226571332673827, 0.837890625))}]}, {'geometry': ((0.06091011871113622, 0.8369140625), (0.12437232811616733, 0.8623046875)), 'words': [{'value': 'AID', 'confidence': 0.9997521042823792, 'geometry': ((0.06091011871113622, 0.8369140625), (0.12437232811616733, 0.8623046875))}]}, {'geometry': ((0.06091011871113622, 0.859375), (0.23242960358959863, 0.8916015625)), 'words': [{'value': 'App', 'confidence': 0.9994775056838989, 'geometry': ((0.06091011871113622, 0.8623046875), (0.1278027178137366, 0.8916015625))}, {'value': 'Label', 'confidence': 0.999256432056427, 'geometry': ((0.1380938869064443, 0.859375), (0.23242960358959863, 0.8857421875))}]}, {'geometry': ((0.06434050840870548, 0.90234375), (0.2101320705553985, 0.94140625)), 'words': [{'value': 'Tarjeta', 'confidence': 0.994183361530304, 'geometry': ((0.06434050840870548, 0.90234375), (0.2101320705553985, 0.94140625))}]}, {'geometry': ((0.06434050840870548, 0.92578125), (0.21356246025296777, 0.9658203125)), 'words': [{'value': 'Importe', 'confidence': 0.9969205856323242, 'geometry': ((0.06434050840870548, 0.92578125), (0.21356246025296777, 0.9658203125))}]}], 'artefacts': []}, {'geometry': ((0.2821702542043527, 0.7421875), (0.41595545240955345, 0.765625)), 'lines': [{'geometry': ((0.2821702542043527, 0.7421875), (0.41595545240955345, 0.765625)), 'words': [{'value': ':', 'confidence': 0.999728262424469, 'geometry': ((0.2821702542043527, 0.7490234375), (0.2976070078434143, 0.765625))}, {'value': 'VENTA', 'confidence': 0.9879698753356934, 'geometry': ((0.3147589563312606, 0.7421875), (0.41595545240955345, 0.7646484375))}]}], 'artefacts': []}, {'geometry': ((0.31818934602882987, 0.765625), (0.6269244188100622, 0.8359375)), 'lines': [{'geometry': 
((0.31818934602882987, 0.765625), (0.6269244188100622, 0.7919921875)), 'words': [{'value': '06/04/2024', 'confidence': 0.6072558760643005, 'geometry': ((0.31818934602882987, 0.765625), (0.5120063639414923, 0.7900390625))}, {'value': '15:24', 'confidence': 0.9053549766540527, 'geometry': ((0.527443117580554, 0.7666015625), (0.6269244188100622, 0.7919921875))}]}, {'geometry': ((0.3233349305751837, 0.7919921875), (0.4897088309072923, 0.810546875)), 'words': [{'value': '249060518', 'confidence': 0.9984224438667297, 'geometry': ((0.3233349305751837, 0.7919921875), (0.4897088309072923, 0.810546875))}]}, {'geometry': ((0.31990454087761444, 0.814453125), (0.3644996069460147, 0.8359375)), 'words': [{'value': '00', 'confidence': 0.9989715218544006, 'geometry': ((0.31990454087761444, 0.814453125), (0.3644996069460147, 0.8359375))}]}], 'artefacts': []}, {'geometry': ((0.3233349305751837, 0.837890625), (0.5857597424392312, 0.88671875)), 'lines': [{'geometry': ((0.3233349305751837, 0.837890625), (0.5857597424392312, 0.8623046875)), 'words': [{'value': 'A0000000031010', 'confidence': 0.862206220626831, 'geometry': ((0.3233349305751837, 0.837890625), (0.5857597424392312, 0.8623046875))}]}, {'geometry': ((0.3233349305751837, 0.861328125), (0.5120063639414923, 0.88671875)), 'words': [{'value': 'Visa', 'confidence': 0.644270122051239, 'geometry': ((0.3233349305751837, 0.861328125), (0.4022338936192764, 0.8837890625))}, {'value': 'DEBIT', 'confidence': 0.9952477216720581, 'geometry': ((0.41424025756076877, 0.861328125), (0.5120063639414923, 0.88671875))}]}], 'artefacts': []}, {'geometry': ((0.06434050840870548, 0.8818359375), (0.4365377905949689, 0.9140625)), 'lines': [{'geometry': ((0.06434050840870548, 0.8818359375), (0.4365377905949689, 0.9140625)), 'words': [{'value': 'Autorizacion:', 'confidence': 0.9953259825706482, 'geometry': ((0.06434050840870548, 0.8818359375), (0.3096133717849067, 0.9140625))}, {'value': '846121', 'confidence': 0.982590913772583, 'geometry': 
((0.3250501254239684, 0.884765625), (0.4365377905949689, 0.90625))}]}], 'artefacts': []}, {'geometry': ((0.4639809081755229, 0.9189453125), (0.47598727211701525, 0.9248046875)), 'lines': [{'geometry': ((0.4639809081755229, 0.9189453125), (0.47598727211701525, 0.9248046875)), 'words': [{'value': '-', 'confidence': 0.7176811695098877, 'geometry': ((0.4639809081755229, 0.9189453125), (0.47598727211701525, 0.9248046875))}]}], 'artefacts': []}, {'geometry': ((0.554886235161108, 0.9169921875), (0.6132028600197852, 0.931640625)), 'lines': [{'geometry': ((0.554886235161108, 0.9169921875), (0.6132028600197852, 0.931640625)), 'words': [{'value': '3/61', 'confidence': 0.3308126628398895, 'geometry': ((0.554886235161108, 0.9169921875), (0.6132028600197852, 0.931640625))}]}], 'artefacts': []}, {'geometry': ((0.3284805151215376, 0.9287109375), (0.4948544154536461, 0.95703125)), 'lines': [{'geometry': ((0.3284805151215376, 0.9287109375), (0.4948544154536461, 0.95703125)), 'words': [{'value': '15,12', 'confidence': 0.78933185338974, 'geometry': ((0.3284805151215376, 0.9287109375), (0.42453142665347654, 0.9541015625))}, {'value': 'EUR', 'confidence': 0.9998893737792969, 'geometry': ((0.4365377905949689, 0.931640625), (0.4948544154536461, 0.95703125))}]}], 'artefacts': []}, {'geometry': ((0.22556882419446017, 0.97265625), (0.41595545240955345, 0.9990234375)), 'lines': [{'geometry': ((0.22556882419446017, 0.97265625), (0.41595545240955345, 0.9990234375)), 'words': [{'value': '-Copia', 'confidence': 0.6154620051383972, 'geometry': ((0.22556882419446017, 0.97265625), (0.33534129451667605, 0.9990234375))}, {'value': 'para', 'confidence': 0.9859090447425842, 'geometry': ((0.35249324300452234, 0.98046875), (0.41595545240955345, 0.99609375))}]}], 'artefacts': []}]}]
    }
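
The `docOCR` dictionary nests its results as pages, blocks, lines and words, with coordinates normalised to the 0-1 range. As a rough sketch of how that structure could be flattened into plain text lines, the helper below walks the hierarchy, drops low-confidence words and sorts lines from top to bottom. The name `doc_lines` and the `min_confidence` threshold are assumptions for illustration, and the sample is a trimmed copy of two blocks from the export above.

```python
def doc_lines(export, min_confidence=0.0):
    """Flatten a pages/blocks/lines/words export into text lines, top to bottom."""
    collected = []
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                words = [
                    word["value"]
                    for word in line["words"]
                    if word["confidence"] >= min_confidence
                ]
                if words:
                    (x0, y0), _ = line["geometry"]  # top-left corner of the line
                    collected.append((y0, x0, " ".join(words)))
    return [text for _, _, text in sorted(collected)]


# Trimmed sample with the same layout as docOCR (values rounded).
sample_export = {
    "pages": [{
        "page_idx": 0,
        "blocks": [
            {"lines": [{
                "geometry": ((0.0626, 0.2998), (0.5103, 0.3232)),
                "words": [
                    {"value": "Fecha", "confidence": 0.9992},
                    {"value": ":", "confidence": 0.9998},
                    {"value": "06-04-2024", "confidence": 0.9975},
                    {"value": "15:23", "confidence": 0.5147},
                ],
            }]},
            {"lines": [{
                "geometry": ((0.3251, 0.0527), (0.5274, 0.0791)),
                "words": [
                    {"value": "SPAR", "confidence": 0.9951},
                    {"value": "TAFIRA", "confidence": 0.5254},
                ],
            }]},
        ],
    }]
}
lines = doc_lines(sample_export, min_confidence=0.6)
```

Filtering by confidence trades recall for precision: here it removes the uncertain `'TAFIRA'` and `'15:23'` readings, both of which happen to be correct, so a lower threshold may be preferable when the LLM is expected to resolve noisy tokens itself.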

## LLM

In [None]:
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, BitsAndBytesConfig
from transformers import pipeline

### Zephyr

In [None]:
# Load the model in 4-bit (NF4) to reduce GPU memory use, and control which device
# each module is placed on via device_map (0 = first GPU, "cpu" = host memory).

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

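# Keys are module-name prefixes; every module is pinned to GPU 0.
# The "transformer.*" names belong to other architectures (e.g. BLOOM);
# Zephyr (Mistral-based) uses the "model.*" names and "lm_head".
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": 0,
    "transformer.h": 0,
    "transformer.ln_f": 0,
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": 0
}
# Device string for later use; the load call below relies on device_map instead.
device = "cuda" if torch.cuda.is_available() else "cpu"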

model_name = 'HuggingFaceH4/zephyr-7b-beta'

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map=device_map)
tokenizer = AutoTokenizer.from_pretrained(model_name)

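As a rough guide to why 4-bit loading helps on a small GPU, weight memory scales with the number of bits stored per parameter. The sketch below is back-of-the-envelope arithmetic only: `approx_weight_gib` is a hypothetical helper, the 7e9 parameter count is an assumed round figure for a "7B" model, and the estimate ignores quantization constants, activations, and the KV cache.

```python
def approx_weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_param / 8
    return total_bytes / 2**30


n_params = 7e9  # assumed round figure for a "7B" model, not an exact count
for label, bits in [("bf16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label}: ~{approx_weight_gib(n_params, bits):.1f} GiB")
```

Actual usage on the GPU will be higher than this estimate, so it is best read as a lower bound when deciding whether a model can fit at all.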

In [None]:
prompt=f"""You are a POS receipt data expert. Parse, detect, recognize, and convert the following receipt OCR result into a structured receipt data object corresponding to the Pydantic schema provided.
Don't make up values that are not in the Input. Output must be a well-formed JSON object.```json

### Pydantic schema:
class Item(BaseModel):
    name: str
    unit: float
    price: float
    amount: float

class PaymentMethodEnum(str, Enum):
    tarjeta = 'tarjeta'
    efectivo = 'efectivo'

class Receipt(BaseModel):
    store: str
    address: str
    city: str
    phone: str
    receipt_no: str
    date: str
    time: str
    items: List[Item]
    total: float
    number_items: int
    payment_method: PaymentMethodEnum

### Input:
{receipt_texts}
"""

In [None]:
with torch.inference_mode():
    inputs = tokenizer(prompt,return_tensors="pt",truncation=True).to(device)
    outputs = model.generate(**inputs, max_new_tokens=1000) ##use_cache=True, do_sample=True,temperature=0.1, top_p=0.95
    result_text = tokenizer.batch_decode(outputs)[0]
    print(result_text)

# Release cached GPU memory
torch.cuda.empty_cache()
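For a causal language model, the decoded `result_text` contains the prompt followed by the completion, so the JSON object still has to be isolated before it can be validated. A minimal sketch using only the standard library, assuming the completion contains at most some surrounding chatter or a code-fence marker around the object; `extract_first_json` is a hypothetical helper, not part of `transformers`.

```python
import json


def extract_first_json(text):
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


sample = 'Sure, here it is: {"store": "SPAR TAFIRA", "total": 8.05} Anything else?'
print(extract_first_json(sample))
```

It is safer to pass only the completion (the text after the prompt) to the helper, because the OCR input embedded in the prompt may itself contain braces.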

#### with examples

In [None]:
messages = [
    {
        "role": "system",
        "content": """Instruction:
You are a POS receipt data expert. Parse, detect, recognize, and convert the following receipt OCR result into a structured receipt data object.
Don't make up values that are not in the Input. Output must be a well-formed JSON object.```json""",
    },
    {
        "role": "user",
        "content": """Input: {input_example} """,
    },
    {   "role":  "assistant",
        "content": "Output: {output_example}"
    },
    {   "role": "user",
        "content": "Input: {input}"}
]

parser = PydanticOutputParser(pydantic_object= Receipt)

pipe = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.1,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=1000,
)


prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

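# The placeholder in the assistant message is {output_example}, so the keyword must match it
final_prompt = prompt.format(input_example=receipt_texts_3, output_example=example_3, input=receipt_texts)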


outputs = pipe(final_prompt)
print(outputs[0]["generated_text"])
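Calling `str.format` on the rendered chat template only works while the template text contains no other literal braces, and a mismatched placeholder name raises `KeyError`. One alternative is to fill the placeholders in each message before the chat template is applied. The sketch below assumes messages shaped like the `messages` list in this cell; `fill_messages` is a hypothetical helper.

```python
import re


def fill_messages(messages, **values):
    """Return copies of chat messages with {name} placeholders substituted.

    Only the given names are replaced, in a single pass, so braces inside
    the substituted values (for example JSON) are left untouched.
    """
    if not values:
        return [dict(m) for m in messages]
    pattern = re.compile(r"\{(" + "|".join(re.escape(k) for k in values) + r")\}")

    def render(content):
        return pattern.sub(lambda match: str(values[match.group(1)]), content)

    return [{**m, "content": render(m["content"])} for m in messages]


demo = [
    {"role": "user", "content": "Input: {input_example}"},
    {"role": "assistant", "content": "Output: {output_example}"},
    {"role": "user", "content": "Input: {input}"},
]
filled = fill_messages(
    demo,
    input_example="SPAR TAFIRA ... TOTAL 8,05",
    output_example='{"store": "SPAR TAFIRA", "total": 8.05}',
    input="HiperDino ... TOTAL 9,96",
)
for message in filled:
    print(message["role"], "->", message["content"])
```

The filled list can then be passed to `apply_chat_template` directly, which avoids a separate formatting step on the rendered prompt.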


### OpenAI

In [None]:
!pip install langchain langchain_openai langchain_core


In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

In [None]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()



In [None]:
model = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [None]:
OPENAI_API_KEY="###"

In [None]:
structured_llm = model.with_structured_output(ReceiptInfo, method="json_mode")


The function `with_structured_output` is in beta. It is actively being worked on, so the API may change.



#### few-shot inference

In [None]:
examples = [
    {"input": f"{receipt_texts_1}", "output": f"{example_1}"},
    {"input": f"{receipt_texts_2}", "output": f"{example_2}"},
    {"input": f"{receipt_texts_4}", "output": f"{example_4}"}
]

In [None]:
# This is a prompt template used to format each individual example.
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

print(few_shot_prompt.format())

In [None]:
system_message = """You are a POS receipt data expert. Parse, detect, recognize, and convert the receipt OCR result into a structured receipt data object corresponding to the Pydantic schema provided.
Don't make up values that are not in the Input. Output must be a well-formed JSON object.```json

### Pydantic schema:
class Item(BaseModel):
    name: str
    unit: float
    price: float
    amount: float

class PaymentMethodEnum(str, Enum):
    tarjeta = 'tarjeta'
    efectivo = 'efectivo'

class Receipt(BaseModel):
    store: str
    address: str
    city: str
    phone: str
    receipt_no: str
    date: str
    time: str
    items: List[Item]
    total: float
    number_items: int
    payment_method: PaymentMethodEnum
"""

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_message}"),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)


In [None]:
chain = final_prompt | model

output = chain.invoke({"system_message":system_message,"input": receipt_texts_3})

In [None]:
output.content

"{'store': 'SPAR TAFIRA', 'address': 'C/. Bruno Naranjo Diaz 9A-9B', 'city': 'Tafira Baja', 'phone': '928351616', 'receipt_no': '014001-42453', 'date': '08/04/2024', 'time': '10:47', 'items': [{'name': 'LECHE.GRNJ.FLR.UHT', 'unit': 1, 'price': 1.15, 'amount': 1.15}, {'name': 'PUERROS GRANEL', 'unit': 1.27, 'price': 2.99, 'amount': 3.81}, {'name': 'HUEVOS FRESCOS 12U', 'unit': 1.15, 'price': 2.99, 'amount': 3.44}, {'name': 'ESPINACAS SPAR', 'unit': 1, 'price': 1.15, 'amount': 1.15}, {'name': 'AGUA YUGUINAT NAT.8L', 'unit': 1.49, 'price': 1.49, 'amount': 1.49}], 'total': 8.05, 'number_items': 5, 'payment_method': 'tarjeta'}"
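The content returned here is a Python-style dict literal with single quotes, which `json.loads` rejects. A minimal sketch of a tolerant parser, assuming the model returns either strict JSON or a Python literal; `parse_model_output` is a hypothetical helper, and `ast.literal_eval` is used because it evaluates literals only and never executes code.

```python
import ast
import json


def parse_model_output(text):
    """Parse a model reply into a dict, accepting JSON or a Python dict literal."""
    text = text.strip()
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        try:
            # Fall back to Python literal syntax (single quotes, True/False/None).
            value = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return None
    return value if isinstance(value, dict) else None


reply = "{'store': 'SPAR TAFIRA', 'total': 8.05, 'payment_method': 'tarjeta'}"
print(parse_model_output(reply))
```

Returning `None` instead of raising keeps the calling code simple when a reply cannot be parsed and has to be retried or logged.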

#### few-shot with structured output

In [None]:
system_message = """You are a POS receipt data expert. Parse, detect, recognize, and convert the receipt OCR result into a structured receipt data object.
Don't make up values that are not in the Input. Output must be a well-formed JSON object.```json
"""

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_message}"),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)

chain = final_prompt | structured_llm

output = chain.invoke({"system_message":system_message,"input": receipt_texts_3})

output



{
    'store': 'SPAR TAFIRA',
    'address': 'C/. Bruno Naranjo Diaz 9A-9B',
    'city': 'Tafira Baja',
    'phone': '928 351 616',
    'receipt_no': '014001-42453',
    'date': '08/04/2024',
    'time': '10:47',
    'items': [
        {'name': 'LECHE.GRNJ.FLR.UHT', 'unit': 1, 'price': 1.15, 'amount': 1.15},
        {'name': 'PUERROS GRANEL', 'unit': 0.425, 'price': 2.99, 'amount': 1.27},
        {'name': 'HUEVOS FRESCOS 12U', 'unit': 1, 'price': 2.99, 'amount': 1.15},
        {'name': 'ESPINACAS SPAR', 'unit': 1, 

#### include categories

In [None]:
examples_cat = [
    {"input": f"{receipt_texts_1}", "output": f"{example_cat_1}"},
    {"input": f"{receipt_texts_2}", "output": f"{example_cat_2}"},
    {"input": f"{receipt_texts_4}", "output": f"{example_cat_4}"}
]

example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt_cat = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples_cat,
)

print(few_shot_prompt_cat.format())

In [None]:
system_message_cat = """You are a POS receipt data expert. Parse, detect, recognize, and convert the receipt OCR result into a structured receipt data object.
Next, assign a category to each item. Don't make up values that are not in the Input. Output must be a well-formed JSON object.```json
"""

final_prompt_cat = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_message}"),
        few_shot_prompt_cat,
        ("human", "{input}"),
    ]
)

chain = final_prompt_cat | structured_llm

output_cat = chain.invoke({"system_message":system_message_cat,"input": receipt_texts_5})

output_cat



{
    'store': 'MERCADONA',
    'address': 'C/ Republica Dominicana S/N, 35010 Las Palmas de Gran Canaria',
    'phone': '928226288',
    'receipt_no': '2109-017-467040',
    'date': '06/04/2024',
    'time': '19:56',
    'items': [
        {
            'name': 'CERVEZA NEGRA P-6',
            'unit': 1,
            'price': 5.4,
            'amount': 5.4,
            'category': 'beverages'
        },
        {'name': 'BOLSA PAPEL', 'unit': 1, 'price': 0.1, 'amount': 0.1, 'category': 'household'},
        {
            'name': 'CHICLES MENTA FUERTE',
            'unit': 1,

# Donut document parsing

In [None]:
import re
import torch

from transformers import DonutProcessor, VisionEncoderDecoderModel


## Donut base CORD

In [None]:
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

In [None]:
# prepare decoder inputs
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids

pixel_values = processor(image_1, return_tensors="pt").pixel_values

outputs = model.generate(
    pixel_values.to(device),
    decoder_input_ids=decoder_input_ids.to(device),
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)

sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # remove first task start token
print(processor.token2json(sequence))
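Donut emits its prediction as a flat string of paired `<s_field>…</s_field>` tags, which `processor.token2json` turns into a nested dictionary. To make that idea concrete, here is a deliberately simplified, standalone illustration. It is not the library's implementation: `tags_to_dict` is a hypothetical function, it ignores list separators and other special tokens, and it does not handle a tag nested inside a tag of the same name.

```python
import re

# Matches <s_key> ... </s_key>, using a backreference for the closing tag.
TAG = re.compile(r"<s_([^>]+)>(.*?)</s_\1>", re.DOTALL)


def tags_to_dict(sequence):
    """Convert paired <s_key>value</s_key> tags into a nested dict (simplified)."""
    result = {}
    for match in TAG.finditer(sequence):
        key, body = match.group(1), match.group(2)
        # Recurse when the body itself contains tagged fields.
        value = tags_to_dict(body) if TAG.search(body) else body.strip()
        if key in result:
            # A repeated field is collected into a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


demo = "<s_menu><s_nm>LECHE</s_nm><s_price>1,15</s_price></s_menu><s_total><s_total_price>15,12</s_total_price></s_total>"
print(tags_to_dict(demo))
```

For real model output, `processor.token2json` remains the right tool; the sketch only shows why the generated sequence maps naturally onto JSON.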

## Fine-tuned Donut on invoice and receipts

In [None]:
processor = DonutProcessor.from_pretrained("mychen76/invoice-and-receipts_donut_v1")
model = VisionEncoderDecoderModel.from_pretrained("mychen76/invoice-and-receipts_donut_v1")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

In [None]:
def generateTextInImage(processor,model,input_image,task_prompt="<s_receipt>"):
    pixel_values = processor(input_image, return_tensors="pt").pixel_values
    print ("input pixel_values: ",pixel_values.shape)
    # Use the task_prompt argument as passed (default "<s_receipt>") instead of overriding it here.
    decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt")["input_ids"]
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    outputs = model.generate(pixel_values.to(device),
                               decoder_input_ids=decoder_input_ids.to(device),
                               max_length=model.decoder.config.max_position_embeddings,
                               early_stopping=True,
                               pad_token_id=processor.tokenizer.pad_token_id,
                               eos_token_id=processor.tokenizer.eos_token_id,
                               use_cache=True,
                               num_beams=1,
                               bad_words_ids=[[processor.tokenizer.unk_token_id]],
                               return_dict_in_generate=True,
                               output_scores=True,)
    return outputs

In [None]:
def generateOutputXML(processor,model, input_image, task_start="<s_receipt>",task_end="</s_receipt>"):
    import re
    outputs=generateTextInImage(processor,model,input_image,task_prompt=task_start)
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # remove first task start token
    return sequence

def generateOutputJson(processor,model, input_image, task_start="<s_receipt>",task_end="</s_receipt>"):
    xml = generateOutputXML(processor,model, input_image,task_start=task_start,task_end=task_end)
    result=processor.token2json(xml)
    print("Results: ",result)
    return result

In [None]:
## generate json
invoice1_json=generateOutputJson(processor,model,image_1)
print(invoice1_json)

# Multimodal - image-text-to-text

In [None]:
import requests
import torch
import transformers
from PIL import Image
from io import BytesIO

from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers import AwqConfig


DEVICE = "cuda:0"

In [None]:
!pip install --upgrade transformers


Collecting transformers
  Using cached transformers-4.40.0-py3-none-any.whl (9.0 MB)
Collecting tokenizers<0.20,>=0.19 (from transformers)
  Using cached tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.15.2
    Uninstalling tokenizers-0.15.2:
      Successfully uninstalled tokenizers-0.15.2
  Attempting uninstall: transformers
    Found existing installation: transformers 4.38.2
    Uninstalling transformers-4.38.2:
      Successfully uninstalled transformers-4.38.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autoawq 0.2.4 requires transformers<=4.38.2,>=4.35.0, but you have transformers 4.40.0 which is incompatible.
Successfully installed tokenizers-0.19.1 transforme

In [None]:
!pip install --upgrade autoawq

In [None]:
quantization_config = AwqConfig(
     bits=4,
     fuse_max_seq_len=4096,
     modules_to_fuse={
         "attention": ["q_proj", "k_proj", "v_proj", "o_proj"],
         "mlp": ["gate_proj", "up_proj", "down_proj"],
         "layernorm": ["input_layernorm", "post_attention_layernorm", "norm"],
         "use_alibi": False,
         "num_attention_heads": 32,
         "num_key_value_heads": 8,
         "hidden_size": 4096,
     }
 )

model = AutoModelForVision2Seq.from_pretrained("HuggingFaceM4/idefics2-8b-AWQ", quantization_config=quantization_config,).to(DEVICE)

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-AWQ",do_image_splitting=False, size= {"longest_edge": 448, "shortest_edge": 378})

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
!pip install -q langchain

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autoawq 0.2.4 requires transformers<=4.38.2,>=4.35.0, but you have transformers 4.40.0 w

In [None]:
from langchain.output_parsers import PydanticOutputParser

In [None]:
parser = PydanticOutputParser(pydantic_object=Receipt)
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"$defs": {"Item": {"properties": {"name": {"title": "Name", "type": "string"}, "unit": {"title": "Unit"

In [None]:
parser = PydanticOutputParser(pydantic_object=ReceiptInfo)
output_format = parser.get_format_instructions()

messages_schema = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": f"""Extract data from the image and respond in the JSON format corresponding to the following schema: {output_format}."""},
        ]
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": f"""{example_1}"""},
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract data from the image and respond in the JSON format."},
        ]
    },
]


prompt = processor.apply_chat_template(messages_schema, add_generation_prompt=True)
print(prompt)
inputs = processor(text=prompt, images=[image_1, image_2], return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

print("Shape of inputs:", {k: v.shape for k, v in inputs.items()})

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=1000)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(generated_texts)
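The decoded text from `model.generate` includes the whole conversation, including the few-shot example turn, so only the final assistant reply is of interest. A minimal sketch, assuming the decoded text marks turns with `Assistant:` as the Idefics2 chat format appears to; `last_assistant_turn` is a hypothetical helper and the marker is an assumption to verify against the printed output.

```python
def last_assistant_turn(decoded: str, marker: str = "Assistant:") -> str:
    """Return the text after the last occurrence of marker, or the whole text."""
    _head, found, tail = decoded.rpartition(marker)
    return tail.strip() if found else decoded.strip()


decoded = (
    "User: Extract data from the image. \n"
    "Assistant: {'store': 'HiperDino'} \n"
    "User: Extract data from the image and respond in the JSON format. \n"
    "Assistant: {'store': 'SPAR TAFIRA'}"
)
print(last_assistant_turn(decoded))
```

If the marker is absent the helper returns the full text unchanged, so it is safe to apply to every element of `generated_texts`.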



# Deployment plan

1. Data Storage:
   - Choose a cloud-based storage service such as Amazon S3.
   - Set up a bucket to store receipt images and extracted data.
2. Data Extraction and Parsing:
   - Use the Python code with an LLM to extract data from receipt images.
   - Parse the extracted data into Pydantic objects.
3. Data Analysis:
   - Develop Python scripts to analyze the extracted data.
   - Aggregate expenses and calculate metrics such as total expenses per week and per month.
4. Visualization:
   - Choose a simple visualization tool such as Matplotlib or Plotly.
   - Create basic weekly and monthly expense visualizations.
5. Deployment:
   - Package the code into a Python application.
   - Deploy the application to a cloud platform such as AWS Lambda or Google Cloud Functions.
6. Security:
   - Implement basic security measures such as encryption for data in transit.
   - Configure access controls for the storage bucket.
7. Documentation:
   - Write short documentation on how to use and deploy the application.
   - Include instructions for configuring data storage and accessing analysis results.
8. Testing:
   - Run basic tests to ensure the application functions as expected.
   - Validate that data extraction, analysis, and visualization work correctly.
9. Feedback and Iteration:
   - Gather feedback from users to identify areas for improvement.
   - Iterate on the application based on feedback and changing requirements.
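The aggregation in step 3 can be sketched with the standard library alone. The example assumes each parsed receipt is a dict with a `date` in `dd/mm/YYYY` format and a numeric `total`, as in the schema used in this notebook; `totals_by_period` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime


def totals_by_period(receipts):
    """Sum receipt totals per ISO week and per calendar month."""
    weekly = defaultdict(float)
    monthly = defaultdict(float)
    for receipt in receipts:
        day = datetime.strptime(receipt["date"], "%d/%m/%Y").date()
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        weekly[(iso[0], iso[1])] += receipt["total"]
        monthly[(day.year, day.month)] += receipt["total"]
    return dict(weekly), dict(monthly)


receipts = [
    {"date": "15/04/2024", "total": 9.96},
    {"date": "06/04/2024", "total": 15.12},
]
weekly, monthly = totals_by_period(receipts)
print(weekly)
print(monthly)
```

Keying weeks by ISO year and week number avoids mixing the same week number from different years once the data spans more than twelve months.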



In [None]:
import pandas as pd
import json
import datetime

In [None]:
def convert_dict_to_dataframe(data, existing_df=None):
    """
    Convert a dictionary to a DataFrame and concatenate it with an existing DataFrame if provided.

    Args:
    - data (dict): Dictionary containing the data.
    - existing_df (DataFrame, optional): Existing DataFrame to concatenate new data. Defaults to None.

    Returns:
    - DataFrame: Concatenated DataFrame.
    """
    # Helper function to handle missing values
    def get_value(key):
        return data.get(key, None)

    # Convert dictionary to DataFrame
    store_df = pd.DataFrame({
        'store': [get_value('store')],
        'address': [get_value('address')],
        'city': [get_value('city')],
        'phone': [get_value('phone')],
        'receipt_no': [get_value('receipt_no')],
        'date': [get_value('date')],
        'time': [get_value('time')],
        'total': [get_value('total')],
        'number_items': [get_value('number_items')],
        'payment_method': [get_value('payment_method')]
    })

    # Convert date to datetime format
    if 'date' in data:
        store_df['date'] = pd.to_datetime(store_df['date'], format='%d/%m/%Y')

    # Add week and month columns
    if 'date' in data:
        store_df['week'] = store_df['date'].dt.isocalendar().week
        store_df['month'] = store_df['date'].dt.strftime('%b')

    # Create DataFrame for items using list comprehension
    items_df = pd.DataFrame([{
        'name': item.get('name', None),
        'unit': item.get('unit', None),
        'price': item.get('price', None),
        'amount': item.get('amount', None),
        'category': item.get('category', None)
    } for item in data.get('items', [])])

    # Concatenate store and items DataFrames
    df = pd.concat([store_df] * len(data.get('items', [])), ignore_index=True)
    df = pd.concat([df, items_df], axis=1)

    # If there's an existing DataFrame, concatenate new data
    if existing_df is not None:
        combined_df = pd.concat([existing_df, df], ignore_index=True)
    else:
        combined_df = df

    return combined_df

combined_df = convert_dict_to_dataframe(example_cat_1)


for e in [example_cat_2,example_cat_3,example_cat_4,example_cat_5,example_cat_6]:
    combined_df = convert_dict_to_dataframe(e,combined_df)

display(combined_df)

Unnamed: 0,store,address,city,phone,receipt_no,date,time,total,number_items,payment_method,week,month,name,unit,price,amount,category
0,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,FRESA TARINA 500 GR,1.0,1.59,1.59,fruits
1,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,HIPERDINO ACEITUNA R/ANCHOA LATA 350,1.0,0.95,0.95,canned_goods
2,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,DESPERADOS CERVEZA TOQUE TEQUILA BOT,1.0,1.05,1.05,beverages
3,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,HIPERDINO CENTRO JAMON SERRANO BODEG,0.31,13.62,4.22,protein_foods
4,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,MONTESANO JAMON COCIDO SELECCION KG,0.308,8.74,2.15,protein_foods
5,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,CLIPPER MANZ.1.5L.,1.0,1.49,1.49,beverages
6,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,PLATANO PRIMERA GR,1.4,1.99,2.79,fruits
7,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,MANZANA PINK LADY GR,1.0,2.99,2.99,fruits
8,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,SALSA.BARI.PES.GEN.1,1.0,3.1,3.1,canned_goods
9,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,GOFIO B.LUGAR MIL.FU,1.0,1.85,1.85,grains
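The same one-row-per-item layout can also be produced as a plain list of dictionaries, which is easy to test without a DataFrame and can be handed to `pd.DataFrame(rows)` afterwards. A minimal sketch, assuming the field names shown in the table above; `flatten_receipt` and the two field tuples are hypothetical names.

```python
HEADER_FIELDS = (
    "store", "address", "city", "phone", "receipt_no",
    "date", "time", "total", "number_items", "payment_method",
)
ITEM_FIELDS = ("name", "unit", "price", "amount", "category")


def flatten_receipt(data):
    """Return one flat row per item, repeating the receipt-level fields."""
    header = {key: data.get(key) for key in HEADER_FIELDS}
    return [
        {**header, **{key: item.get(key) for key in ITEM_FIELDS}}
        for item in data.get("items", [])
    ]


receipt = {
    "store": "HiperDino",
    "date": "15/04/2024",
    "total": 9.96,
    "items": [
        {"name": "FRESA TARINA 500 GR", "unit": 1.0, "price": 1.59, "amount": 1.59, "category": "fruits"},
    ],
}
for row in flatten_receipt(receipt):
    print(row)
```

Missing keys become `None`, and a receipt without items yields an empty list rather than an error.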


In [None]:
def convert_dict_to_dataframe(data, existing_df=None):
    """
    Convert a dictionary to a DataFrame and concatenate it with an existing DataFrame if provided.

    Args:
    - data (dict): Dictionary containing the data.
    - existing_df (DataFrame, optional): Existing DataFrame to concatenate new data. Defaults to None.

    Returns:
    - DataFrame: Concatenated DataFrame.
    """
    # Convert dictionary to DataFrame
    store_df = pd.DataFrame({
        'store': [data['store']],
        'address': [data['address']],
        'city': [data['city']],
        'phone': [data['phone']],
        'receipt_no': [data['receipt_no']],
        'date': [data['date']],
        'time': [data['time']],
        'total': [data['total']],
        'number_items': [data['number_items']],
        'payment_method': [data['payment_method']]
    })
    store_df['date'] = pd.to_datetime(store_df['date'], format='%d/%m/%Y')
    store_df['week'] = store_df['date'].dt.isocalendar().week
    store_df['month'] = store_df['date'].dt.strftime('%b')

    # Create DataFrame for items using list comprehension
    items_df = pd.DataFrame([{
        'name': item['name'],
        'unit': item['unit'],
        'price': item['price'],
        'amount': item['amount'],
        'category': item['category']
    } for item in data['items']])

    # Concatenate store and items DataFrames
    df = pd.concat([store_df] * len(data['items']), ignore_index=True)
    df = pd.concat([df, items_df], axis=1)

    # If there's an existing DataFrame, concatenate new data
    if existing_df is not None:
        combined_df = pd.concat([existing_df, df], ignore_index=True)
    else:
        combined_df = df

    return combined_df

combined_df = convert_dict_to_dataframe(example_cat_1)


for e in [example_cat_2,example_cat_3,example_cat_4,example_cat_5,example_cat_6]:
    combined_df = convert_dict_to_dataframe(e,combined_df)

display(combined_df)

Unnamed: 0,store,address,city,phone,receipt_no,date,time,total,number_items,payment_method,week,month,name,unit,price,amount,category
0,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,FRESA TARINA 500 GR,1.0,1.59,1.59,fruits
1,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,HIPERDINO ACEITUNA R/ANCHOA LATA 350,1.0,0.95,0.95,canned_goods
2,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,DESPERADOS CERVEZA TOQUE TEQUILA BOT,1.0,1.05,1.05,beverages
3,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,HIPERDINO CENTRO JAMON SERRANO BODEG,0.31,13.62,4.22,protein_foods
4,HiperDino,9238-SD Bernardo de la torre,Tafira Baja,928493638,2024/923813-00060866,2024-04-15,16:01,9.96,5,tarjeta,16,Apr,MONTESANO JAMON COCIDO SELECCION KG,0.308,8.74,2.15,protein_foods
5,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,CLIPPER MANZ.1.5L.,1.0,1.49,1.49,beverages
6,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,PLATANO PRIMERA GR,1.4,1.99,2.79,fruits
7,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,MANZANA PINK LADY GR,1.0,2.99,2.99,fruits
8,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,SALSA.BARI.PES.GEN.1,1.0,3.1,3.1,canned_goods
9,SPAR TAFIRA,C/. Bruno Naranjo DIAZ 9A-9B,Tafira Baja,928 351 616,014\002-18965,2024-04-06,15:23,15.12,7,tarjeta,14,Apr,GOFIO B.LUGAR MIL.FU,1.0,1.85,1.85,grains


## Visualization

In [None]:
!pip install dash



In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

import dash
from dash import Dash, dcc, html
from dash.dependencies import Input, Output

from datetime import datetime

In [None]:
expenses_per_month = combined_df.groupby('month')['amount'].sum().reset_index()

expenses_per_month_category = combined_df.groupby(['month', 'category'])['amount'].sum().reset_index()

categories = [
    'protein_foods', 'dairy', 'fruits', 'vegetables', 'grains', 'nuts_and_seeds',
    'beverages', 'snacks', 'condiments', 'frozen_foods', 'bakery', 'canned_goods',
    'household', 'personal_care', 'pet_supplies', 'other'
]
colors = ['#87c293','#6074ab','#6b9acf','#8bbde6','#aae0f3','#c8eded',
'#d18b79','#dbac8c','#d18b79','#dbac8c','#e6cfa1','#e7ebbc',
'#b2dba0','#70a18f','#637c8f', '#949da8','#b56e75','#c98f8f']


  #['#79A5A9', '#6A8FB4', '#F3EAE5', '#EDC7CB', '#756967', '#AD7BA7', '#878787', '#B6BE97', '#A08079','#6A809A',
         # '#518A7D','#9F6772','#A3B0A3','#C9A0A4','#98D8B8','#99C0CD']


# Alternative palette; every entry needs a leading '#' to be a valid hex color
play_colors  =['#d5f4e6', '#80ced6', '#fefbd8','#618685',
          '#734b5e','#bcbdc0','#565857','#f5d3c8',
          '#ffef96', '#50394c', '#b2b2b2', '#f4e1d2',
           '#f9d5e5','#eeac99', '#e06377', '#c83349', '#5b9aa0', '#d6d4e0', '#b8a9c9', '#622569']
#categories_to_plot = [category for category in categories if category in combined_df['category'].unique()]

# Generate a color map for each category   ##px.colors.qualitative.Pastel
color_map = {category: colors[i % len(colors)] for i, category in enumerate(categories)}
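Hand-typed palettes are easy to get subtly wrong, for example a missing `#` or a stray space inside the string. A small check like the following can be run on any palette before building `color_map`; it assumes six-digit hex colors only, and `invalid_hex_colors` is a hypothetical helper.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def invalid_hex_colors(palette):
    """Return the entries that are not exactly '#' followed by six hex digits."""
    return [color for color in palette if not HEX_COLOR.fullmatch(color)]


palette = ["#87c293", "734b5e", "#70a18f ", "#ABCDEF"]
print(invalid_hex_colors(palette))
```

An empty result means every entry has the expected form; named CSS colors or `rgba(...)` strings would be reported as invalid by this narrow check.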

In [None]:
def visualize_expenses_vs_budget(selected_month,  budget):
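    filtered_data = expenses_per_month[expenses_per_month['month'] == selected_month]
    # Sum the matching rows: calling float() on a Series is deprecated and fails when no row matches
    total = float(filtered_data['amount'].sum())
    # Use the budget argument rather than a hard-coded 300
    percent_budget_left = 100 - (total / budget) * 100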

    labels = ['Expenses', 'Remaining Budget']
    values = [total, budget - total]

    fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.5, hoverinfo='label+value', textinfo='none')])

    fig.update_traces(marker=dict(colors=['rgba(0, 0, 0, 0)', '#80ced6']),sort=False)

    fig.add_annotation(
        text=f'<b>{percent_budget_left:.0f}%</b>',
        x=0.5,
        y=0.53,
        showarrow=False,
        font=dict(size=35),
        align='center'
    )

    fig.add_annotation(
        text='of Budget left',
        x=0.5,
        y=0.4,
        showarrow=False,
        font=dict(size=14),
        align='center'
    )

    fig.update_layout(
        title=dict(text=f'<b>Expenses vs. Budget in {selected_month}</b>',
                   x=0.5,
                   font=dict(size=16, color='Grey', family='Arial, sans-serif')),

        showlegend=False,
        width=500,
        height=500
    )

    return fig

# Visualize expenses vs. budget
fig1 = visualize_expenses_vs_budget("Mar", budget=300)
fig1.show()
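
The donut chart above plots `budget - total` as the remaining slice, which becomes negative once a month is overspent, and a cleared number input in Dash typically delivers `None` instead of a number. A minimal sketch of a guard for both cases follows; the helper name `budget_summary` and its clamping policy are assumptions for illustration, not part of the notebook:

```python
def budget_summary(total_spent, budget):
    """Return (remaining, percent_left) for one month.

    Assumed policy: a missing or non-positive budget, or an overspent month,
    is reported as nothing remaining rather than as a negative value.
    """
    if budget is None or budget <= 0:
        return 0.0, 0.0
    remaining = max(budget - total_spent, 0.0)
    percent_left = remaining / budget * 100
    return remaining, percent_left

print(budget_summary(75, 300))    # within budget
print(budget_summary(350, 300))   # overspent month
print(budget_summary(10, None))   # budget field left empty
```

The two returned values could feed the pie slice and the centre annotation respectively, so both always describe the same budget.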




In [None]:
def visualize_budget_tracking(expenses_per_month, budget):

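    # Work on a copy so the module-level DataFrame is not mutated on every call
    expenses_per_month = expenses_per_month.copy()
    expenses_per_month['percentage_expenses'] = (expenses_per_month['amount'] / budget) * 100
    expenses_per_month['percentage_budget'] = 100 - expenses_per_month['percentage_expenses']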

    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=expenses_per_month['month'],
        y=expenses_per_month['percentage_expenses'],
        name='Actual Expenses',
        marker_color='#eeac99',
        text=round(expenses_per_month['amount'],0),
        hovertemplate='<b>%{y:.0f}%<br>Expenses: %{text} EUR</b>',
        textposition='inside',
        insidetextanchor='middle',
        textfont_size=14
    ))
    fig.add_trace(go.Bar(
        x=expenses_per_month['month'],
        y=expenses_per_month['percentage_budget'],
        name='Remaining Budget',
        marker_color='#80ced6',
        text=round(budget - round(expenses_per_month['amount']),0),
        hovertemplate='<b>%{y:.0f}%<br>Remaining Budget: %{text} EUR</b>',
        textposition='inside',
        insidetextanchor='middle',
        textfont_size=14
    ))

    fig.update_layout(
        title=dict(text='<b>Budget Tracking: Expenses vs. Budget per Month</b>',
                   x=0.5,
                   font=dict(size=16, color='Grey', family='Arial, sans-serif')),
        xaxis_title='',
        yaxis=dict(title='Percentage', zeroline=False, showgrid=False),
        plot_bgcolor='rgb(242,242,242)',
        showlegend=True,
        legend=dict(
                orientation="h",
                yanchor="bottom",
                y=-0.2),
        barmode='relative',
        height=500,
        width=700
        #width=len(expenses_per_month['month'])*300
              )

    return fig

fig2 = visualize_budget_tracking(expenses_per_month, budget=300)
fig2.show()

In [None]:
def visualize_pie_chart(selected_month):
    filtered_data = expenses_per_month_category[expenses_per_month_category['month'] == selected_month]
    fig = px.pie(filtered_data, values='amount', names='category',
                 hole=0.4,
                 color='category',
                 #color_discrete_sequence=px.colors.qualitative.Light24,
                 color_discrete_map=color_map,
                 labels={'amount':'Expenses','category':'Category'},
                 width=500,
                 height=500,
                 )

    fig.update_traces(textinfo='percent',
                      insidetextorientation='radial',
                      textposition='inside',
                      # %{label} is the slice's own category, so no customdata from the unfiltered frame is needed
                      hovertemplate="<b>Category: %{label}<br>Expenses: %{value} EUR</b>")

    fig.update_layout(
        showlegend=False,
        plot_bgcolor='rgb(242,242,242)',
        title=dict(text=f'<b>Category Expenses in {selected_month}</b>',
                   x=0.5,
                   font=dict(size=16, color='Grey', family='Arial, sans-serif'))
)

    return fig

fig3 = visualize_pie_chart("Apr")
fig3.show()

In [None]:
def visualize_category_expenses(expenses_per_month_category):
    fig = px.bar(expenses_per_month_category, x='month', y='amount', color='category',
              barmode='stack',
              #color_discrete_sequence=px.colors.qualitative.Light24,
              color_discrete_map=color_map,
              labels={'amount':'Expenses','category':'Category'},
              height=500,
              width=700
              #width=expenses_per_month_category['month'].nunique()*300
                 )

    fig.update_traces(hovertemplate="<b>Expenses: %{y} EUR</b>")

    fig.update_layout(
    xaxis=dict(showticklabels=True, title='', showgrid=False),
    yaxis=dict(zeroline=False,showgrid=False),
    plot_bgcolor='rgb(242,242,242)',
    title=dict(text='<b>Category expenses per month</b>', x=0.5, font=dict(size=16, color='Grey', family='Arial, sans-serif'))
)
    return fig

fig4 = visualize_category_expenses(expenses_per_month_category)
fig4.show()


In [None]:
def visualize_price_distribution(combined_df):
    fig = px.box(combined_df, x='category', y='price', color='category',
                title='Price Distribution by Category',
                #color_discrete_sequence=px.colors.qualitative.Light24,
                color_discrete_map=color_map,
                height=500,
                width=500
                 )


    fig.update_layout(
                xaxis=dict(showticklabels=False, title=''),
                yaxis=dict(zeroline=False),
                #paper_bgcolor='rgb(233,233,233)',
                plot_bgcolor='rgb(242,242,242)',
                showlegend=False,
                title=dict(text='<b>Price Distribution by Category</b>',
                           x=0.5, font=dict(size=16, color='Grey', family='Arial, sans-serif'))
                      )

    return fig

fig5 = visualize_price_distribution(combined_df)
fig5.show()

In [None]:
def visualize_trend_expenses(combined_df):
    expenses_over_time_category = combined_df.groupby(['date', 'category'])['amount'].sum().reset_index()

    fig = px.line(expenses_over_time_category, x='date', y='amount', color='category',
                  labels={'amount':'Expenses','category':'Category','date':'Date'},
                  #color_discrete_sequence=px.colors.qualitative.Light24,
                  color_discrete_map=color_map,
                  text = expenses_over_time_category['category'],
                  width=700,
                  height=500
                  )
    fig.update_traces(mode="markers+lines",
                      hovertemplate="<b>%{text}</b><br>Expenses: %{y} EUR<br>%{x}")

    fig.update_layout(
                  xaxis=dict(showticklabels=True, title='', showgrid=False),
                  yaxis=dict(zeroline=False,showgrid=False),
                  plot_bgcolor='#f0efef',
                  title=dict(text='<b>Trends in expenses over time</b>',
                             x=0.5,
                             font=dict(size=16, color='Grey', family='Arial, sans-serif')),
                      )
    return fig

fig6 = visualize_trend_expenses(combined_df)
fig6.show()

## Deployment

In [None]:
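# Default to the current month when it has data; otherwise fall back to the first month in the data
available_months = list(expenses_per_month_category['month'].unique())
current_month = datetime.now().strftime('%b')
if current_month not in available_months:
    current_month = available_months[0]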


# Initialize the Dash app
app = Dash()

# Define the layout of the app
app.layout = html.Div([
    # Title for the filters
    #html.Div([
    #    html.H3('Filters', style={'color': 'grey', 'font-weight': 'bold','font-family':'Arial, sans-serif'})
    #], style={'background-color': 'lightgrey', 'width': '1200px','margin-bottom':'0px'}),

    # Filters
    html.Div([
        # Month filter
        html.Div([
            html.H4('Month', style={'color': 'grey', 'font-weight': 'bold','font-family':'Arial, sans-serif'}),
            dcc.Dropdown(
                id='month-dropdown',
                options=[
                   {'label': month, 'value': month} for month in expenses_per_month_category['month'].unique()
                ],
                value=current_month,  # Default value
                clearable=False,
                searchable=False
            )
        ], style={'width': '3cm', 'margin-right': '3mm'}),

        # Budget input filter
        html.Div([
            html.H4('Enter monthly budget', style={'color': 'grey', 'font-weight': 'bold','font-family':'Arial, sans-serif'}),
            dcc.Input(
                id='budget-input',
                type='number',
                placeholder='Enter budget...',
                value=300,  # Default value
                min=0,
                max=10000,
                step=10
            )
        ], style={'width': '400px'})
    ], style={'display': 'flex','background-color': 'lightgrey', 'padding': '20px', 'width': '1160px'}),

    # Row 1
    html.Div([
        dcc.Graph(id='fig1'),  # Placeholder for Figure 1
        dcc.Graph(id='fig2')   # Placeholder for Figure 2
    ], style={'display': 'flex'}),  # Arrange Figures 1 and 2 side by side

    # Row 2
    html.Div([
        dcc.Graph(id='fig3'),  # Placeholder for Figure 3
        dcc.Graph(id='fig4')   # Placeholder for Figure 4
    ], style={'display': 'flex'}),  # Arrange Figures 3 and 4 side by side

    # Row 3
    html.Div([
        dcc.Graph(id='fig5'),  # Placeholder for Figure 5
        dcc.Graph(id='fig6')   # Placeholder for Figure 6
    ], style={'display': 'flex'})  # Arrange Figures 5 and 6 side by side
])


# Define callback to update figures based on filters
@app.callback(
    [Output('fig1', 'figure'),
     Output('fig2', 'figure'),
     Output('fig3', 'figure'),
     Output('fig4', 'figure'),
     Output('fig5', 'figure'),
     Output('fig6', 'figure')],
    [Input('month-dropdown', 'value'),
     Input('budget-input', 'value')]
)
def update_figures(selected_month, budget):
    fig1 = visualize_expenses_vs_budget(selected_month, budget)
    fig2 = visualize_budget_tracking(expenses_per_month, budget)
    fig3 = visualize_pie_chart(selected_month)
    fig4 = visualize_category_expenses(expenses_per_month_category)
    fig5 = visualize_price_distribution(combined_df)
    fig6 = visualize_trend_expenses(combined_df)

    return fig1, fig2, fig3, fig4, fig5, fig6

# Run the Dash app
if __name__ == '__main__':
    # app.run replaces app.run_server, which is deprecated in recent Dash releases
    app.run(debug=True, use_reloader=True)



In [None]:
from dash import Dash, dcc, html, Input, Output, State
import base64
from PIL import Image
import io
import paddleocr
import llm  # NOTE: assumed to be a project-local module exposing generate_code()

# Initialize the Dash app
app = Dash(__name__)

# Define the layout of the app
app.layout = html.Div([
    html.H1("Image Text Extraction and Visualization"),

    # Upload image component
    dcc.Upload(
        id='upload-image',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select an Image')
        ]),
        style={
            'width': '50%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        multiple=False
    ),

    # Display extracted text
    html.Div(id='extracted-text'),

    # Button to trigger processing
    html.Button('Process Image', id='process-image', n_clicks=0),

    # Display structured code
    html.Div(id='structured-code'),

    # Display combined data table
    dcc.Graph(id='data-table'),

    # Display visualization
    dcc.Graph(id='visualization')
])

# Define callback to handle image upload
@app.callback(
    Output('extracted-text', 'children'),
    [Input('upload-image', 'contents')],
    [State('upload-image', 'filename')]
)
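def extract_text(contents, filename):
    if contents is None:
        return None

    # `contents` is a data URL: "data:<mime>;base64,<payload>"
    content_type, content_string = contents.split(',', 1)
    decoded_image = base64.b64decode(content_string)
    image = Image.open(io.BytesIO(decoded_image)).convert('RGB')

    # PaddleOCR takes a file path or a NumPy array rather than a PIL image
    import numpy as np  # local import so this function does not depend on another cell
    ocr = paddleocr.PaddleOCR()
    result = ocr.ocr(np.array(image))
    # Assumes the PaddleOCR 2.x result layout: one entry per image, each a list of
    # [box, (text, confidence)] items, or None when nothing is detected
    lines = result[0] if result and result[0] else []
    extracted_text = ' '.join(line[1][0] for line in lines)

    return html.Div([
        html.H3('Extracted Text:'),
        html.Pre(extracted_text)
    ])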

# Define callback to handle image processing
@app.callback(
    Output('structured-code', 'children'),
    [Input('process-image', 'n_clicks')],
    [State('extracted-text', 'children')]
)
def generate_structured_code(n_clicks, extracted_text):
    if n_clicks > 0 and extracted_text:
        # Generate structured code using the LLM
        structured_code = llm.generate_code(extracted_text)

        return html.Div([
            html.H3('Structured Code:'),
            html.Pre(structured_code)
        ])

# Define callback to handle data combination and visualization
@app.callback(
    Output('data-table', 'figure'),
    Output('visualization', 'figure'),
    [Input('process-image', 'n_clicks')],
    [State('extracted-text', 'children')]
)
def process_and_visualize_data(n_clicks, extracted_text):
    # Process and combine data
    # (Replace this with your own data processing logic)
    combined_data = process_data(extracted_text)

    # Visualize data
    # (Replace this with your own data visualization logic)
    fig = visualize_data(combined_data)

    return fig, fig

# Function to process data (replace with your own data processing logic)
def process_data(extracted_text):
    # Placeholder function
    return extracted_text

# Function to visualize data (replace with your own data visualization logic)
def visualize_data(data):
    # Placeholder function
    return {'data': [], 'layout': {}}

# Run the Dash app
if __name__ == '__main__':
    # app.run replaces app.run_server, which is deprecated in recent Dash releases
    app.run(debug=True)
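
The upload callback above relies on `dcc.Upload` delivering the file as a base64 data URL of the form `data:<mime>;base64,<payload>`. The decoding step can be tried on its own, without Dash or an OCR engine. The following is a minimal sketch; the helper name `decode_upload` and the sample payload are hypothetical:

```python
import base64

def decode_upload(contents):
    """Split a data URL into (mime_type, raw_bytes).

    Raises ValueError when the string does not look like
    'data:<mime>;base64,<payload>'.
    """
    header, _, payload = contents.partition(',')
    if not header.startswith('data:') or not payload:
        raise ValueError('Expected a data URL of the form data:<mime>;base64,<payload>')
    mime_type = header[len('data:'):].split(';')[0]
    return mime_type, base64.b64decode(payload)

# Build a fake upload string the same way a browser would encode a file
sample = 'data:image/png;base64,' + base64.b64encode(b'fake-bytes').decode('ascii')
mime_type, raw = decode_upload(sample)
print(mime_type, raw)
```

Checking the MIME type before handing the bytes to PIL would let the callback reject non-image uploads with a clear message instead of failing inside the OCR step.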
