# Running Llava: a large multi-modal model on Google Colab

Run the Llava model on Google Colab!

Llava is a multi-modal image-text-to-text model that can be seen as an "open-source version of GPT-4". It yields very good results, as we will see in this Google Colab demo.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/FPshq08TKYD0e-qwPLDVO.png)

The architecture is a pure decoder-based text model that takes vision hidden states concatenated with text hidden states as input.

We will leverage the QLoRA quantization method (4-bit weights via `bitsandbytes`) and use the `pipeline` API to run our model.
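As a rough sketch of what loading the model this way could look like: the checkpoint name `llava-hf/llava-1.5-7b-hf`, the helper names `build_prompt` and `load_llava_pipeline`, and the `transformers` 4.37.x API are assumptions, since the notebook does not show this step. The `USER: <image> ... ASSISTANT:` layout is the prompt format used by Llava 1.5 checkpoints.

```python
def build_prompt(question):
    """Format a question in the USER/ASSISTANT layout used by Llava 1.5."""
    return f"USER: <image>\n{question}\nASSISTANT:"


def load_llava_pipeline(model_id="llava-hf/llava-1.5-7b-hf"):
    """Create an image-to-text pipeline with 4-bit weights (needs a GPU).

    The model id is an assumption; swap in the checkpoint you actually use.
    """
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import BitsAndBytesConfig, pipeline

    # 4-bit loading through bitsandbytes, computing in float16.
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    return pipeline(
        "image-to-text",
        model=model_id,
        model_kwargs={"quantization_config": quantization_config},
    )
```

On a GPU runtime, usage would be along the lines of `pipe = load_llava_pipeline()` followed by `pipe(image, prompt=build_prompt("What is shown in this image?"), generate_kwargs={"max_new_tokens": 200})`, where `image` is a `PIL.Image`.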

In [8]:
# !pip install -q -U transformers==4.37.2
!pip install -q bitsandbytes==0.41.3 accelerate==0.25.0
# PyMuPDF provides the `fitz` module imported in the cells below
!pip install -q PyMuPDF


In [9]:
!pip install pytesseract

Collecting pytesseract
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Downloading pytesseract-0.3.13-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.13
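Note that `pytesseract` is only a Python wrapper: the OCR itself is done by the Tesseract executable, which must be installed separately. A small check, offered as a sketch (the helper name `tesseract_available` is illustrative), can confirm the binary is on `PATH` before running the OCR cells:

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return True if the named executable can be found on PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    # On Debian/Ubuntu-based systems such as Colab the package is tesseract-ocr.
    print("Tesseract not found; try: apt-get install -y tesseract-ocr")
```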


In [10]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [12]:
import fitz  # PyMuPDF
import pytesseract
from PIL import Image
import re
import json

# Path to your PDF
pdf_path = "/test.pdf"

def ocr_pdf_to_text(pdf_path):
    """Convert PDF pages to OCR text using Tesseract"""
    doc = fitz.open(pdf_path)
    results = []
    for i, page in enumerate(doc):
        pix = page.get_pixmap(dpi=300)  # render at 300 dpi for accuracy
        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
        text = pytesseract.image_to_string(img, lang="eng")
        results.append(text)
    return results

def parse_export_license(text_pages):
    """Extract structured fields into JSON"""
    data = {}

    # Combine all pages into one text block
    full_text = "\n".join(text_pages)

    def find(pattern):
        """Return the stripped first capture group of the first match, or None."""
        match = re.search(pattern, full_text)
        return match.group(1).strip() if match else None

    # ---------------- Contact Information ----------------
    data["ContactInformation"] = {
        "ReferenceNumber": find(r"Reference Number\s+(\S+)"),
        "ContactPerson": find(r"1\. Contact Person.*\n(.*)"),
        "Telephone": find(r"Telephone Number.*\n(\d+)"),
        "Email": find(r"Email\s*\n([^\s]+@[^\s]+)"),
        "CreationDate": find(r"Creation Date\s*\n(\d{2}/\d{2}/\d{4})"),
        "ApplicationType": find(r"Type of Application\s*\n(.+)"),
    }

    # ---------------- Applicant Information ----------------
    applicant_match = re.search(r"CIN \(Applicant ID\)\s*([\w\d]+)\s*(.*?)\nAddress", full_text, re.DOTALL)
    if applicant_match:
        data["ApplicantInformation"] = {
            "CIN": applicant_match.group(1),
            "Name": applicant_match.group(2).strip()
        }

    # ---------------- Purchaser Information ----------------
    purchaser_match = re.search(r"Purchaser\s*\n(.*?)\n\nAddress 1\s*(.*?)\n", full_text, re.DOTALL)
    if purchaser_match:
        data["PurchaserInformation"] = {
            "Name": purchaser_match.group(1).strip(),
            "Address": purchaser_match.group(2).strip()
        }

    # ---------------- Intermediate Consignee ----------------
    consignee_match = re.search(r"Intermediate Consignee\s*\n(.*?)\n", full_text, re.DOTALL)
    if consignee_match:
        data["IntermediateConsignee"] = consignee_match.group(1).strip()

    # ---------------- Document Checklist ----------------
    checklist_items = []
    checklist_section = re.search(r"Document Checklist(.*?)(Applicant Information|License Information)", full_text, re.DOTALL)
    if checklist_section:
        lines = checklist_section.group(1).splitlines()
        for line in lines:
            line = line.strip()
            if not line:
                continue
            # detect checkbox markers (OCR may output _, CJ, ✔, etc.)
            checked = bool(re.match(r"^[_CJ\[\(✔]", line))
            # clean up item text
            item = re.sub(r"^[_CJ\[\(✔\)]+", "", line).strip(" -")
            checklist_items.append({"item": item, "selected": checked})
    if checklist_items:
        data["DocumentChecklist"] = checklist_items

    return data

if __name__ == "__main__":
    # Step 1: OCR all pages
    pages_text = ocr_pdf_to_text(pdf_path)

    # Step 2: Parse into JSON
    extracted_data = parse_export_license(pages_text)

    # Step 3: Print JSON
    print(json.dumps(extracted_data, indent=4))


{
    "ContactInformation": {
        "ReferenceNumber": "SLV0530",
        "ContactPerson": "Shelley Vybiral",
        "Telephone": "6302003543",
        "Email": "shelley.vybiral@cmcelectronics.us",
        "CreationDate": "05/30/2025",
        "ApplicationType": "Export License Application"
    },
    "ApplicantInformation": {
        "CIN": "C702375",
        "Name": "Address 1\n84 N. Dugan Road\n\nCity\nSugar Grove\n\nState/Province\nIllinois\n\nEIN\n363503592\n\nOther Party Information\n15. Other Party ID"
    },
    "PurchaserInformation": {
        "Name": "PILATUS AIRCRAFT LIMITED",
        "Address": "Address 2"
    },
    "IntermediateConsignee": "Hellmann Worldwide Logistics AG",
    "DocumentChecklist": [
        {
            "item": "6. Documents submitted with application 7. Documents on file with applicant",
            "selected": false
        },
        {
            "item": "Export Items (BIS-748P-A) (_) Bis-711",
            "selected": false
        },
        {
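In the output above, `ApplicantInformation.Name` captured the address block rather than a name, and `PurchaserInformation.Address` captured the next label. Multi-line `re.DOTALL` patterns are easy to over-match on OCR text. Since this form tends to put each value on the line after its label, a label-based lookup is one possible alternative. The following is a minimal sketch, not the notebook's method; `value_after_label` and `sample` are illustrative names, and the sample reuses labels and values that appear in this notebook.

```python
def value_after_label(text, label):
    """Return the first non-empty line after the first line containing label.

    Returns None when the label is missing or nothing follows it.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if label in line:
            for candidate in lines[index + 1:]:
                if candidate.strip():
                    return candidate.strip()
            return None
    return None


sample = """Reference Number
SLV0530

4. Creation Date
05/30/2025

5. Type of Application
Export License Application
"""

print(value_after_label(sample, "Reference Number"))
print(value_after_label(sample, "Creation Date"))
```

The lookup returns whole lines, so a field such as the telephone number would still need a small clean-up step afterwards.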


In [14]:
import fitz  # PyMuPDF
import pytesseract
from PIL import Image

# Path to your PDF
pdf_path = "/test.pdf"

def ocr_pdf_to_text(pdf_path, output_txt="output.txt"):
    """Extract text from all pages of PDF using OCR and save to a file"""
    doc = fitz.open(pdf_path)
    all_text = []

    for i, page in enumerate(doc):
        # Convert each page to image
        pix = page.get_pixmap(dpi=300)
        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

        # OCR using Tesseract
        text = pytesseract.image_to_string(img, lang="eng")

        # Save per-page text
        page_header = f"\n\n===== PAGE {i+1} =====\n\n"
        all_text.append(page_header + text.strip())

    # Combine all pages into one text string
    full_text = "\n".join(all_text)

    # Save to file
    with open(output_txt, "w", encoding="utf-8") as f:
        f.write(full_text)

    return full_text

if __name__ == "__main__":
    extracted_text = ocr_pdf_to_text(pdf_path)
    print(extracted_text[:2000])  # print first 2000 characters as a preview




===== PAGE 1 =====

= An official website of the United States government Here's how you know v

Bureau of Industry and Security

U.S. Department of Commerce

 

Export License Application _ status (¢ompzetes=xpPROVEDW/CONDITIONS)

Contact Information

Reference Number
SLV0530

1. Contact Person (First Name, Last Name)
Shelley Vybiral

2. Telephone Number 3. Fax Number
6302003543 -

Email
shelley.vybiral@cmcelectronics.us

4. Creation Date
05/30/2025

5. Type of Application
Export License Application

Document Checklist

6. Documents submitted with application 7. Documents on file with applicant
Export Items (BIS-748P-A) (_) Bis-711

CJ End Users (BIS-748P-B) CJ Letter of Assurance

CJ BIS-711 CJ Import/End-User Certificate
Import/End-User Certificate CJ Nuclear Certification

Technical Specification
C) P Other
CJ Letter of Explanation -

(_) Foreign Availability

Other


===== PAGE 2 =====

purchase order

License Information

9. Special Purpose

10. Resubmission ACN

13. Import Cer
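The checklist on page 1 is laid out in two columns, so each OCR line can carry two items, each optionally preceded by a checkbox marker that Tesseract renders as `CJ` or `(_)`. One way to separate them is sketched below. The marker set is taken from the OCR output above; which marker means ticked and which means empty cannot be told from the text alone, so the sketch only returns the raw marker and leaves that decision to a check against the original PDF. The name `split_checklist_line` is illustrative.

```python
import re

# Marker tokens observed in the OCR text; they must stand alone as a word.
MARKER = re.compile(r"(?<!\S)(CJ|\(_\))(?=\s)")


def split_checklist_line(line):
    """Split one OCR line into (marker, label) pairs.

    marker is None for a label that has no recognised marker in front of it.
    """
    # re.split with one capture group yields [text, marker, text, marker, ...].
    parts = MARKER.split(line.strip())
    items = []
    leading = parts[0].strip()
    if leading:
        items.append((None, leading))
    for marker, label in zip(parts[1::2], parts[2::2]):
        items.append((marker, label.strip()))
    return items


for ocr_line in [
    "CJ End Users (BIS-748P-B) CJ Letter of Assurance",
    "Export Items (BIS-748P-A) (_) Bis-711",
]:
    print(split_checklist_line(ocr_line))
```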

In [15]:
print(extracted_text)



===== PAGE 1 =====

= An official website of the United States government Here's how you know v

Bureau of Industry and Security

U.S. Department of Commerce

 

Export License Application _ status (¢ompzetes=xpPROVEDW/CONDITIONS)

Contact Information

Reference Number
SLV0530

1. Contact Person (First Name, Last Name)
Shelley Vybiral

2. Telephone Number 3. Fax Number
6302003543 -

Email
shelley.vybiral@cmcelectronics.us

4. Creation Date
05/30/2025

5. Type of Application
Export License Application

Document Checklist

6. Documents submitted with application 7. Documents on file with applicant
Export Items (BIS-748P-A) (_) Bis-711

CJ End Users (BIS-748P-B) CJ Letter of Assurance

CJ BIS-711 CJ Import/End-User Certificate
Import/End-User Certificate CJ Nuclear Certification

Technical Specification
C) P Other
CJ Letter of Explanation -

(_) Foreign Availability

Other


===== PAGE 2 =====

purchase order

License Information

9. Special Purpose

10. Resubmission ACN

13. Import Cer