MediScanFlow is an application that integrates Optical Character Recognition (OCR) to automate the classification, extraction, transformation, and loading of data from Department of Health (DOH) medical forms directly into a spreadsheet.
- Integration of Optical Character Recognition (OCR): The application uses Azure AI Document Intelligence OCR technology to recognize and interpret text in images and scanned documents.
- Automated Classification, Extraction, Transformation, and Loading (ETL): The application automatically classifies each medical form, extracts the relevant data, transforms it into a structured format, and loads it into a spreadsheet.
- DOH Medical Forms: The application focuses on data from Department of Health (DOH) medical forms. It can classify the following forms: HIV Certificate Forms, Medical Certificate for Land-based Overseas Workers, Medical Certificate for Service at Sea, Medical Examination Report for Land-based Overseas Workers, Medical Examination Report for Seafarers, and Tabulated Psychological Evaluation Form.
- Export to a Spreadsheet File: The end result of processing is a spreadsheet file in which the extracted and transformed data is organized.
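The classify, extract, transform, and load flow described above can be sketched with the standard library. This is a minimal illustration and not MediScanFlow's actual code. It assumes extraction has already happened, so the OCR output arrives as a plain dict of field names to values. In the real app that output comes from Azure AI Document Intelligence. The keyword rule in `classify`, the document-type labels, and the helper names are all hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical stand-in for extracted OCR fields: field name -> raw text.
OcrResult = dict[str, str]

def classify(ocr_text: str) -> str:
    """Hypothetical keyword rule standing in for the app's trained document classifier."""
    if "SEAFARER" in ocr_text.upper():
        return "seafarer_exam"
    return "landbase_cert"

def transform(fields: OcrResult) -> dict[str, str]:
    """Normalize field names to snake_case and strip stray whitespace from values."""
    return {key.strip().lower().replace(" ", "_"): value.strip()
            for key, value in fields.items()}

def load(row: dict[str, str], csv_path: Path) -> None:
    """Append one row to a CSV file, writing the header only when the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)

def run_pipeline(ocr_text: str, fields: OcrResult, out_dir: Path) -> Path:
    """Classify one document, transform its fields, and load them into a per-type CSV."""
    doc_type = classify(ocr_text)
    csv_path = out_dir / f"{doc_type}.csv"
    load(transform(fields), csv_path)
    return csv_path
```

Writing one CSV per document type follows the README's description of a dedicated CSV file for each form.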
Our project was created to compete in an "AI and Machine Learning Challenge," with the aim of building a smart program that can quickly gather information from medical forms.
Time-Efficiency: Automation minimizes manual data entry efforts, swiftly processing large volumes of medical forms for heightened operational efficiency.
Enhanced Accessibility: Digitally organizing and storing extracted data facilitates easy analysis, reporting, and integration with other systems, improving overall healthcare information accessibility and usability.
Enhanced Focus: Automating routine tasks allows healthcare professionals to redirect efforts towards value-added activities like patient care, research, and decision-making.
To run this project locally, you'll need to set up a virtual environment and install the required dependencies. Follow the steps below to get started.
- Python (3.10.11 or higher) installed on your system.
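To confirm that your interpreter meets this requirement before you create the virtual environment, you can run a short check like the one below. The 3.10.11 minimum comes from the prerequisite above. The helper name `meets_minimum` is only illustrative.

```python
import sys

MIN_VERSION = (3, 10, 11)  # minimum version listed in the prerequisites

def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the given (major, minor, micro) version is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum

if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old, install 3.10.11 or newer"
    print(f"Python {current}: {status}")
```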
A virtual environment is a way to isolate your project's dependencies. It's a good practice to use one to avoid conflicts with other projects. To set up a virtual environment, follow these steps:
- Open a terminal in the root directory of your project.
- Run the following command to create a virtual environment:
python -m venv venv
- Activate the virtual environment. On Windows, run:
venv\Scripts\activate
On macOS or Linux, run:
source venv/bin/activate
You'll now be working within the virtual environment, and you can deactivate it by running
deactivate
in the terminal.
This project uses a requirements.txt file to specify its dependencies. To install these dependencies, follow these steps:
- Make sure your virtual environment is activated (as explained in the previous section).
- Run the following command to install the dependencies:
pip install -r requirements.txt
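After installing, you can check that every package named in requirements.txt is present in the active environment. This illustrative sketch uses only the standard library's `importlib.metadata`. It handles simple lines such as `name==1.2.3` or `name>=1.0` and skips comments, blank lines, and pip options. It is not a full requirements-file parser: URL or VCS lines are not handled, and versions are not compared. The helper name `missing_requirements` is hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Leading distribution name of a requirement line, e.g. "pandas" in "pandas==2.1.0".
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")

def missing_requirements(req_file: Path) -> list[str]:
    """Return requirement names listed in `req_file` that are not installed."""
    missing = []
    for raw in req_file.read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r or -e
            continue
        match = NAME_RE.match(line)
        if match is None:
            continue
        name = match.group(1)
        try:
            version(name)  # raises if no installed distribution has this name
        except PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        gone = missing_requirements(req)
        print("All requirements installed." if not gone else f"Missing: {', '.join(gone)}")
```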
Now that you have set up the virtual environment and installed the dependencies, you can run the project.
Simply run this in the root directory of your project.
python main.py
In this section, you will see a demonstration of how to use the application we created for Dashlabs. You can watch the Promotional Video or read through the walkthrough below.
This is our home screen. It has four buttons: one for uploading a single file, one for uploading a folder to process multiple files, one for inserting the extracted JSON data into the main CSV, and "View CSV".
To process a single file, click the "upload a file" button.
After you click the button, a file dialog will pop up. Choose the form you want to process. For this example, I will process landbase_cert_3.jpg, a Medical Certificate for Land-based Overseas Workers.
A new window will pop up with an empty canvas and a "process document" button. Click this button to start processing the document.
While the document is being processed, diagnostic messages are printed:
File Information
- File being processed: landbase_cert_3.jpg - the file currently being processed
- MIME Type: image/jpeg - the content type of the file, here a JPEG image
Document Classifier Status - the status of the document classifier
- Document Classifier Status (1): running
- Document Classifier Status (2): succeeded
Document Information
- Document Type: Landbase Certificate - the document type assigned by the classifier; here the Landbase Certificate model is used to process the form
- Accuracy: 23.7% - the classifier's confidence in the assigned document type. A value as low as 23.7% means the document type was identified with low confidence, so the result may need further review and validation, depending on your use case.
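A low confidence score like the 23.7% above is worth flagging automatically. Azure AI Document Intelligence reports a confidence with each classified document. The sketch below assumes that result has been reduced to a document type plus a confidence between 0 and 1. The `ClassifiedDocument` type, the helper names, and the 0.5 threshold are hypothetical choices, not MediScanFlow's settings.

```python
from dataclasses import dataclass

# Hypothetical review threshold; tune it against your own labeled forms.
REVIEW_THRESHOLD = 0.5

@dataclass
class ClassifiedDocument:
    file_name: str
    doc_type: str
    confidence: float  # 0.0 to 1.0, shown as a percentage in the diagnostics

def needs_review(doc: ClassifiedDocument, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag documents whose classification confidence falls below the threshold."""
    return doc.confidence < threshold

def describe(doc: ClassifiedDocument) -> str:
    """Format a one-line diagnostic, marking low-confidence results for review."""
    line = f"{doc.file_name}: {doc.doc_type} ({doc.confidence:.1%})"
    return line + " - REVIEW" if needs_review(doc) else line
```

For the example above, `describe(ClassifiedDocument("landbase_cert_3.jpg", "Landbase Certificate", 0.237))` returns `"landbase_cert_3.jpg: Landbase Certificate (23.7%) - REVIEW"`.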
Processing Steps
- Processing Document... - The system started processing the document.
- Request Successful - The initial request to process the document succeeded.
- Creating JSON file - The system created a JSON file to hold the extracted information.
- Text extraction successful to landbase_cert_3.jpg - Text was successfully extracted from landbase_cert_3.jpg.
Processing Result
- Processing Successful! - Document processing completed successfully, and the extracted data in the JSON file is ready to be inserted.
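The "Creating JSON file" step can be pictured as writing the extracted fields to a JSON file named after the source image. This is a sketch under assumptions, not the app's actual format. The output folder, the `<stem>.json` naming, the record layout (`source_file`, `document_type`, `fields`), and the helper name `save_extraction` are all illustrative.

```python
import json
from pathlib import Path

def save_extraction(source_file: str, doc_type: str, fields: dict[str, str],
                    out_dir: Path) -> Path:
    """Write extracted fields to <out_dir>/<source stem>.json and return that path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "source_file": source_file,  # e.g. landbase_cert_3.jpg
        "document_type": doc_type,   # label assigned by the classifier
        "fields": fields,            # field name -> extracted text
    }
    json_path = out_dir / f"{Path(source_file).stem}.json"
    # ensure_ascii=False keeps non-ASCII names (e.g. "Peña") readable in the file
    json_path.write_text(json.dumps(record, indent=2, ensure_ascii=False),
                         encoding="utf-8")
    return json_path
```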
To process multiple files, use your file explorer to put the files you want to process into one folder. For this example, I will upload a folder named "for upload forms" that contains the forms to process.
Click the "Upload a Folder" button.
Choose the "for upload forms" folder, then process the documents as in the single-file walkthrough above and watch the printed status. Note that the window will freeze once processing starts. To keep seeing the live status, avoid interacting with the window.
Processed forms:
- HIV Certificate
- Medical Certificate for Land-based Overseas Workers
- Medical Certificate for Service at Sea
- Psychological Evaluation Form
- Medical Examination Report for Land-based Overseas Workers
- Medical Examination Report for Seafarers
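When a whole folder is uploaded, each file has to be screened before it is sent for OCR. This is where the MIME type shown in the diagnostics comes in. The sketch below uses the standard `mimetypes` module. The set of accepted types is an assumption: JPEG, PNG, TIFF, and PDF are input formats that Azure AI Document Intelligence accepts, but the app's own filter is not documented here. The helper name `find_forms` is hypothetical.

```python
import mimetypes
from pathlib import Path

# Assumed set of accepted types; adjust to whatever the app actually supports.
SUPPORTED_TYPES = {"image/jpeg", "image/png", "image/tiff", "application/pdf"}

def find_forms(folder: Path) -> list[tuple[Path, str]]:
    """Return (path, MIME type) pairs for supported files in `folder`, sorted by name."""
    forms = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # ignore subfolders
        mime_type, _encoding = mimetypes.guess_type(path.name)  # guessed from the extension
        if mime_type in SUPPORTED_TYPES:
            forms.append((path, mime_type))
    return forms
```

The type is guessed from the file extension only, so a mislabeled file would still reach the OCR step and fail there.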
The home screen also has an "insert data" button. This button takes the JSON data extracted while processing the documents, converts it to a dataframe, and inserts it into a dedicated CSV file for storage.
After you click the "insert data" button, a window pops up showing the result of the insertion.
If you click the "insert data" button again, a message appears saying "No dataframes to concatenate. Upload a form." This is expected: the data has already been inserted into the CSV file, and the JSON files have been moved to a new folder.
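The insert step can be sketched as follows. Pending JSON files are appended to a CSV and then moved aside so they are not inserted twice, and the same message is returned when nothing is pending. The app converts the JSON data to a dataframe. To stay dependency-free, this sketch uses the `csv` module instead. The folder names, the assumption that each JSON file holds one flat object, and the helper name `insert_pending` are hypothetical.

```python
import csv
import json
import shutil
from pathlib import Path

def insert_pending(pending_dir: Path, done_dir: Path, csv_path: Path) -> str:
    """Append pending JSON records to csv_path, then move the JSON files to done_dir."""
    json_files = sorted(pending_dir.glob("*.json"))
    if not json_files:
        # Mirrors the message the app shows after a second click on "insert data".
        return "No dataframes to concatenate. Upload a form."

    # Assumption: each file holds one flat JSON object (field name -> value).
    rows = [json.loads(p.read_text(encoding="utf-8")) for p in json_files]

    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    if is_new:
        # Header = union of keys across records, in first-seen order.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    else:
        with csv_path.open(newline="", encoding="utf-8") as f:
            fieldnames = next(csv.reader(f))  # reuse the existing header

    with csv_path.open("a", newline="", encoding="utf-8") as f:
        # Missing fields become empty cells; fields not in the header are dropped.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="", extrasaction="ignore")
        if is_new:
            writer.writeheader()
        writer.writerows(rows)

    done_dir.mkdir(parents=True, exist_ok=True)
    for p in json_files:
        shutil.move(str(p), str(done_dir / p.name))  # prevents inserting the same record twice
    return f"Inserted {len(rows)} record(s) into {csv_path.name}."
```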
To view the stored data, click the "View CSV" button.
After you click the "View CSV" button, a new window pops up with a button for each form.
As an example, let us view the Medical Examination Report for Overseas Workers.
This automatically opens the CSV file holding the extracted data, where you can edit and correct any wrongly extracted values.
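Opening the CSV "automatically" most likely means handing it to the operating system's default program for .csv files. The sketch below shows one common cross-platform way to do this. It is an assumption about how the app behaves, not its confirmed implementation. The helper names `open_command` and `open_with_default_app` are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

def open_command(path: Path, platform: str = sys.platform) -> Optional[List[str]]:
    """Return the command that opens `path` with the default app, or None on Windows."""
    if platform.startswith("win"):
        return None  # Windows uses os.startfile instead of an external command
    if platform == "darwin":
        return ["open", str(path)]
    return ["xdg-open", str(path)]  # most Linux desktop environments

def open_with_default_app(path: Path) -> None:
    """Open `path` (for example a form's CSV file) in the program registered for its type."""
    command = open_command(path)
    if command is None:
        os.startfile(str(path))  # only exists on Windows
    else:
        subprocess.run(command, check=False)
```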
Click the "ZIP and Encrypt All CSV Files" button.
After you click "ZIP and Encrypt All CSV Files", a pop-up asks you to enter a name for your ZIP file.
After you enter the name, another pop-up asks you to set a password for your ZIP file.
After you set the password, a file dialog opens so you can choose where to save your ZIP file.
You can now access the saved ZIP file. Note: use WinRAR to open it and extract the files.
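The zipping half of this step can be sketched with the standard library. Python's built-in `zipfile` module can read password-protected archives but cannot create encrypted ones, so the password step needs a third-party library. The README does not say which library MediScanFlow uses. The helper name `zip_csv_files` is hypothetical.

```python
import zipfile
from pathlib import Path

def zip_csv_files(csv_dir: Path, zip_path: Path) -> list[str]:
    """Compress every .csv file in csv_dir into zip_path and return the archived names.

    Encryption is left out on purpose: the standard zipfile module cannot
    write password-protected archives.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for csv_file in sorted(csv_dir.glob("*.csv")):
            zf.write(csv_file, arcname=csv_file.name)  # store the file without its folder path
            names.append(csv_file.name)
    return names
```

One option for adding the password is pyzipper, whose documented usage is `pyzipper.AESZipFile(zip_path, "w", compression=pyzipper.ZIP_DEFLATED, encryption=pyzipper.WZ_AES)` followed by `zf.setpassword(password_bytes)`. AES-encrypted ZIP files are not supported by every built-in extractor, which may be why a tool such as WinRAR is recommended.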