
ML API that performs OCR text extraction on receipt images, classifies the extracted items, and pushes the classified data to Firebase.


ssandra102/Machine-Learning-API


Machine-Learning-API

A Flask API, written in Python, that extracts text from Indian itemized receipts. Each extracted item is classified into one of 21 categories (see the 'data' dictionary in main.py).
The cumulative sum of the categorised items, along with their respective categories, is pushed to Firebase Realtime Database. The receipt image itself is fetched from Firebase Storage.
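The per-category cumulative sum described above can be sketched as follows. This is a minimal illustration, not the code in main.py: the function name `sum_by_category`, the category names, and the prices are all hypothetical, and the real 21 categories live in the 'data' dictionary.

```python
from collections import defaultdict


def sum_by_category(items):
    """Add up item prices per category.

    `items` is an iterable of (category, price) pairs, i.e. the output
    one would expect after each receipt line has been classified.
    """
    totals = defaultdict(float)
    for category, price in items:
        totals[category] += price
    return dict(totals)


# Hypothetical classified receipt lines: (category, price).
classified = [
    ("Dairy", 54.0),
    ("Snacks", 20.0),
    ("Dairy", 30.5),
]

totals = sum_by_category(classified)
print(totals)
```

A plain dictionary of this shape, keyed by category, is the kind of payload that could then be written to a Realtime Database node.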

Files

1. main.py - contains the API. Run the file using the command:
python main.py
## or
python3 main.py


The frontend is a webpage with the text "Hello World".

2. Fetch_images.py - contains the Firebase configuration details, which you get by creating a new project in Firebase.

3. serviceAccount.json - contains the configuration details for Firebase Storage.
Note: replace the configuration details in Fetch_images.py and serviceAccount.json with your own.

4. requirements.txt - contains the libraries used for the project.

5. SVC_model.pkl - a pickle file used for categorising receipt items. It is an SVC model for multi-class text classification, with three pre-processing steps applied to the text. The steps are coded as a pipeline: stopword removal, Porter stemming, and a TF-IDF vectoriser.

6. Categorization.ipynb - notebook with all the steps used to develop the SVM model, i.e. SVC_model.pkl.
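To make the three pre-processing steps behind SVC_model.pkl more concrete, here is a rough, standard-library-only sketch. It is not the notebook's pipeline: the stopword list is a tiny made-up sample, `crude_stem` is a simple suffix stripper standing in for the Porter stemmer, and the TF-IDF weighting is the plain `tf * log(N / df)` form, so the numbers will differ from a library vectoriser that applies smoothing or normalisation.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (assumption); a real one is much longer.
STOPWORDS = {"a", "an", "the", "of", "and", "with", "for"}


def crude_stem(token):
    """Strip a few common suffixes. NOT the Porter algorithm, only a stand-in."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical receipt item descriptions.
docs = ["Packet of salted chips", "Toned milk", "Masala chips"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```

Terms shared by several items (here the stem "chip") receive a lower weight than terms unique to one item, which is what lets a classifier such as an SVC focus on the distinguishing words.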

Dataset

DATA1.csv - 11179 rows with 3 columns: Indian product description, sub-category, and category.
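As a quick way to inspect a three-column dataset like this one, the sketch below counts rows per category with the standard `csv` module. The header names (`description`, `sub_category`, `category`) and the sample rows are assumptions made for illustration; the actual header in DATA1.csv may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for DATA1.csv; real header names and rows may differ.
SAMPLE = """description,sub_category,category
Toned milk 500 ml,Milk,Dairy
Salted chips 52 g,Chips,Snacks
Curd 400 g,Curd,Dairy
"""


def category_counts(csv_file):
    """Count how many rows fall under each value of the 'category' column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["category"] for row in reader)


counts = category_counts(io.StringIO(SAMPLE))
print(counts.most_common())
```

With the real file, the same function could be called on `open("DATA1.csv", newline="")`, adjusting the column name if the header is different.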
