Skip to content

Latest commit

 

History

History
42 lines (38 loc) · 1005 Bytes

README.md

File metadata and controls

42 lines (38 loc) · 1005 Bytes

PDF 2 NER

Web application to convert scanned PDF files to text-based data and apply Named Entity Recognition (NER) to extract entities in Spanish

Created by: Fer Aguirre

Directory Structure

├── app.py
├── assets
│   └── pdfs
├── config.ini
├── config.ini.secret
├── data
│   ├── processed
│   └── raw
├── docs
│   ├── data-dictionary.md
│   ├── explore-data.md
│   ├── references
│   └── reports
├── LICENSE
├── notebooks
│   ├── 0.0-testing-nlp-models.ipynb
│   ├── 1.0-scraping-data.ipynb
│   └── 2.0-analyzing-data.ipynb
├── outputs
│   ├── figures
│   └── tables
├── pdf_2_ner
│   ├── data
│   ├── __init__.py
│   └── utils
├── Pipfile
├── Pipfile.lock
├── README.md
└── setup.py

License

This project is released under MIT License.