Detects layout regions (titles, paragraphs, lists) in handwritten note images using DocLayout-YOLO, then runs a custom character-level OCR model to convert them into structured Markdown text.
- Python 3.11 — TensorFlow does not support 3.12+ on Windows yet
- Git
git clone https://github.com/Gilliooo/Layout-Parser-ComputerVision.git
cd Layout-Parser-ComputerVision

Windows:
winget install Python.Python.3.11

Mac / Linux: Download from python.org
Windows (PowerShell):
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
py -3.11 -m venv .venv --without-pip
.venv\Scripts\Activate.ps1
python -m ensurepip --upgrade

Mac / Linux:
python3.11 -m venv .venv
source .venv/bin/activate

python -m pip install -r requirements.txt

python -m streamlit run app.py

On first run with DocLayout-YOLO mode selected, the model weights (~100 MB) are automatically downloaded from Hugging Face and cached locally.
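To check whether the weights are already cached before a first run (for example on a machine with limited connectivity), something like the following may help. It is a minimal sketch that assumes the app downloads through `huggingface_hub` with its default cache settings, which this README does not confirm. The helper names (`hf_hub_cache_dir`, `repo_cache_folder`, `weights_cached`) are hypothetical and not part of this repo, and the `.pt` extension for the weights file is also an assumption.

```python
import os
from pathlib import Path

# Repo id taken from the notes in this README.
REPO_ID = "juliozhao/DocLayout-YOLO-DocStructBench"


def hf_hub_cache_dir(env=None) -> Path:
    """Resolve the default huggingface_hub cache directory from the environment."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg).expanduser() if xdg else Path.home() / ".cache"
    return base / "huggingface" / "hub"


def repo_cache_folder(repo_id: str, env=None) -> Path:
    """The hub cache stores each model repo under 'models--<owner>--<name>'."""
    return hf_hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


def weights_cached(repo_id: str = REPO_ID, env=None) -> bool:
    """Assumption: any .pt file under the repo's snapshots counts as the weights."""
    folder = repo_cache_folder(repo_id, env)
    return folder.is_dir() and any(folder.glob("snapshots/*/*.pt"))


if __name__ == "__main__":
    status = "cached" if weights_cached() else "not cached yet"
    print(f"{repo_cache_folder(REPO_ID)}: {status}")
```

If the app stores the weights somewhere else, only `repo_cache_folder` would need to change.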
- `handwriting_recognition_model.keras` and `classes.json` are included in the repo; no manual download needed.
- The DocLayout-YOLO weights are fetched from `juliozhao/DocLayout-YOLO-DocStructBench` on first use.
- Use `python -m streamlit` instead of just `streamlit` to ensure the venv's Python is used.
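The setup assumptions above (Python 3.11, an active virtual environment, bundled model files) can be verified with a short preflight script. This is a sketch, not part of the repo: the function names are hypothetical, and it assumes the two bundled files sit in the repository root, which this README does not state explicitly.

```python
import sys
from pathlib import Path

# File names taken from the notes above; their location is an assumption.
BUNDLED_FILES = ("handwriting_recognition_model.keras", "classes.json")


def is_supported_python(version_info=None) -> bool:
    """True when the interpreter is Python 3.11, the version this README asks for."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == (3, 11)


def in_virtualenv(prefix=None, base_prefix=None) -> bool:
    """Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def missing_files(root=".") -> list:
    """Return the bundled files that are not present under root."""
    return [name for name in BUNDLED_FILES if not (Path(root) / name).is_file()]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("Python 3.11:", is_supported_python())
    print("virtual environment active:", in_virtualenv())
    print("missing bundled files:", missing_files() or "none")
```

Run it with the same `python` used to launch Streamlit, from the repository root, so the reported interpreter matches the one the app will use.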