A program that automatically scans and extracts information from the forms and records submitted by users

jlozion026/dashlab_challenge

MediScanFlow

MediScanFlow is an application that integrates Optical Character Recognition (OCR) to automate the classification, extraction, transformation, and loading of Department of Health (DOH) medical form data directly into a spreadsheet.

What does the project do?

  1. Integration of Optical Character Recognition (OCR): The application incorporates Azure AI Document Intelligence OCR technology, enabling the recognition and interpretation of text from images or scanned documents.

  2. Automated Classification, Extraction, Transformation, and Loading (ETL): The application performs a series of automated tasks, including the classification of medical forms, extraction of relevant data, transformation of the data into a structured format, and loading it into a spreadsheet.

  3. DOH Medical Forms: The application specifically focuses on handling data from Department of Health (DOH) medical forms. It can classify various forms, including HIV Certificate Forms, Medical Certificate for Land-based Overseas Workers, Medical Certificate for Service at Sea, Medical Examination Report for Land-based Overseas Workers, Medical Examination Report for Seafarers, and Tabulated Psychological Evaluation Form.

  4. Export to a Spreadsheet File: Processing ends by writing the extracted and transformed data to a spreadsheet file (CSV), where it can be reviewed and edited.
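The four steps above can be sketched as a small pipeline. This is only an illustration of the classify, extract, transform, and load flow, not the project's actual code: the `classify` and `extract` callables stand in for the calls to the OCR service, and `FORM_TYPES`, `flatten`, and `run_pipeline` are hypothetical names chosen for this example.

```python
from typing import Any, Callable

# The six form types listed above. The exact label strings are assumptions.
FORM_TYPES = [
    "HIV Certificate Form",
    "Medical Certificate for Land-based Overseas Workers",
    "Medical Certificate for Service at Sea",
    "Medical Examination Report for Land-based Overseas Workers",
    "Medical Examination Report for Seafarers",
    "Tabulated Psychological Evaluation Form",
]


def flatten(data: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Transform step: turn nested fields into flat, dot-separated column names."""
    row: dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def run_pipeline(
    path: str,
    classify: Callable[[str], str],
    extract: Callable[[str, str], dict[str, Any]],
) -> dict[str, Any]:
    """Classify a form, extract its fields, and return one spreadsheet row."""
    form_type = classify(path)
    if form_type not in FORM_TYPES:
        raise ValueError(f"Unsupported form type: {form_type!r}")
    row = flatten(extract(path, form_type))
    row["form_type"] = form_type
    return row


if __name__ == "__main__":
    # Stub callables used only to show the data flow; no OCR is performed here.
    row = run_pipeline(
        "landbase_cert_3.jpg",
        classify=lambda p: "Medical Certificate for Land-based Overseas Workers",
        extract=lambda p, t: {"patient": {"name": "Jane Doe"}, "result": "fit"},
    )
    print(row)
```

Passing the classifier and extractor in as arguments keeps the sketch independent of any particular OCR client, so the transform and load logic can be tried out without network access.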

Promotional Video

Motivation

Our project was created to compete in an "AI and Machine Learning Challenge," with the goal of building a smart program that can quickly gather information from medical forms.

Time-Efficiency: Automation minimizes manual data entry efforts, swiftly processing large volumes of medical forms for heightened operational efficiency.

Enhanced Accessibility: Digitally organizing and storing extracted data facilitates easy analysis, reporting, and integration with other systems, improving overall healthcare information accessibility and usability.

Enhanced Focus: Automating routine tasks allows healthcare professionals to redirect efforts towards value-added activities like patient care, research, and decision-making.


Getting Started

To run this project locally, you'll need to set up a virtual environment and install the required dependencies. Follow the steps below to get started.

Prerequisites

  • Python (3.10.11 or higher) installed on your system.

Setting up a Virtual Environment

A virtual environment is a way to isolate your project's dependencies. It's a good practice to use one to avoid conflicts with other projects. To set up a virtual environment, follow these steps:

  1. Open a terminal in the root directory of your project.

  2. Run the following command to create a virtual environment:

    python -m venv venv

  3. Activate the virtual environment. On Windows:

    venv\Scripts\activate

    On macOS or Linux, run source venv/bin/activate instead. You'll now be working within the virtual environment. To leave it, run deactivate in the terminal.
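If you want to confirm the setup before installing anything, a quick check like the one below can help. It is a minimal sketch and not part of the project: `MIN_PYTHON`, `python_is_supported`, and `in_virtual_environment` are names made up for this example.

```python
import sys

# Minimum version listed under Prerequisites.
MIN_PYTHON = (3, 10, 11)


def python_is_supported(version=None) -> bool:
    """Return True if the given (or running) interpreter meets the minimum version."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= MIN_PYTHON


def in_virtual_environment() -> bool:
    """Inside a venv, sys.prefix points at the environment and differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Virtual environment active:", in_virtual_environment())
```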

Installing Dependencies

This project uses a requirements.txt file to specify its dependencies. To install these dependencies, follow these steps:

  1. Make sure your virtual environment is activated (as explained in the previous section).

  2. Run the following command to install the dependencies:

    pip install -r requirements.txt

Running the Project

Now that you have set up the virtual environment and installed the dependencies, you can run the project.

Run the following command in the root directory of your project:

    python main.py

How to use

This section demonstrates how to use the application created for Dashlabs. You can watch the Promotional Video or read through the steps below.

Home Screen

The home screen has four buttons: one for uploading a single file, one for uploading a folder to process multiple files, one for inserting the extracted JSON data into the main CSV, and View CSV.

image

Upload a Single File

  1. To process a single file, click the Upload a File button.

    image

  2. After you click the button, a file dialog will pop up. Choose the form you want to process. This example uses landbase_cert_3.jpg, a Medical Certificate for Land-based Overseas Workers.

    image

    image

Processing Document

  1. A new window will pop up with an empty canvas and a Process Document button.

    image

    image

    Click this button to start processing the document.

  2. While the document is being processed, diagnostic output is printed.

    image

    File Information

    • File being processed: landbase_cert_3.jpg - The file currently being processed.
    • MIME Type: image/jpeg - The content type of the file, in this case a JPEG image.

    Document Classifier Status - The status of the document classifier

    • Document Classifier Status (1): running
    • Document Classifier Status (2): succeeded

    Document Information

    • Document Type: Landbase Certificate - The form type identified by the classifier, which determines the model used to process the document.
    • Accuracy: 23.7% - The classifier's confidence in the document type. A value of 23.7% is relatively low, so further review and validation may be required, depending on the specific use case.

    Processing Steps

    1. Processing Document... - The system initiated the processing of the document.

    2. Request Successful - The initial request to process the document was successful.

    3. Creating JSON file - The system generated a JSON file, possibly containing the extracted information.

    4. Text extraction successful to landbase_cert_3.jpg - The text extraction process from the document (landbase_cert_3.jpg) was successful.

    Processing Result

    • Processing Successful! - Document processing completed successfully, and the extracted text in the JSON file is ready to be inserted.
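Two of the diagnostics above, the MIME type and the low classifier confidence, can be reproduced with a few lines of Python. This is a sketch under assumptions rather than the application's code: `describe_file` and `needs_review` are hypothetical helpers, and the 0.50 review threshold is an arbitrary value picked for illustration.

```python
import mimetypes

# Hypothetical cut-off for flagging a classification for manual review.
# The application does not document a threshold; 0.50 is only an example.
REVIEW_THRESHOLD = 0.50


def describe_file(path: str) -> str:
    """Guess the MIME type from the file name, with a generic fallback."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or "application/octet-stream"


def needs_review(confidence: float, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when the classifier confidence falls below the threshold."""
    return confidence < threshold


if __name__ == "__main__":
    print("MIME Type:", describe_file("landbase_cert_3.jpg"))
    print("Needs review:", needs_review(0.237))
```

With a threshold like this, the 23.7% result from the example would be flagged for a manual check before the data is inserted.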

Upload Multiple Files by Uploading a Folder

  1. In your file explorer, put the files you want to process into one folder. This example uses a folder named for upload forms with the following content:

    image image

  2. Click the Upload a Folder button

    image

  3. Choose the for upload forms folder, then follow the steps under Processing Document and watch the printed status. Note that the window will freeze once it starts processing the forms. To keep seeing the live status, avoid interacting with the window until processing finishes.

    image

    HIV Certificate Processed

    image

    Medical Certificate for Land-based Overseas Workers

    image

    Medical Certificate for Service at Sea

    image

    Psychological Evaluation Form

    image

    Medical Examination Report for Land-based Overseas Workers

    image

    Medical Examination Report for Seafarers

    image
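Processing a folder amounts to looping over its files and running the single-file flow on each one. The sketch below shows that idea only. `collect_forms` and `process_folder` are hypothetical names, the list of accepted extensions is an assumption, and `process_one` stands in for whatever the application does per document.

```python
from pathlib import Path

# Assumed set of accepted extensions; the application may accept others.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def collect_forms(folder):
    """Return the supported files directly inside the folder, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def process_folder(folder, process_one):
    """Run process_one on every form and record a status per file name."""
    results = {}
    for path in collect_forms(folder):
        try:
            process_one(path)
            results[path.name] = "succeeded"
        except Exception as exc:
            # One bad form should not stop the rest of the batch.
            results[path.name] = f"failed: {exc}"
    return results
```

Catching errors per file means a single unreadable scan is reported in the results instead of aborting the whole folder.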

Insert Data to CSV

  1. On the home screen there is an Insert Data button. It takes the JSON data extracted while processing documents, converts it to a dataframe, and inserts it into a dedicated CSV file for storage.

    image

  2. After you click the Insert Data button, a window will pop up showing the result of the insertion.

    image

  3. If you click the Insert Data button again, a new message will appear: No dataframes to concatenate. Upload a form. This is expected, because the data has already been inserted into the CSV and the JSON files have been moved to a new folder.

    image
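The insert step described above can be approximated as follows. This is a hedged sketch, not the project's implementation: the application converts the JSON to a dataframe, while this example uses the standard library's `csv` module so it runs without extra dependencies. The function name, the folder arguments, and the assumption that each JSON file holds one flat object are all hypothetical.

```python
import csv
import json
import shutil
from pathlib import Path

NO_DATA_MESSAGE = "No dataframes to concatenate. Upload a form."


def insert_json_into_csv(json_dir, csv_path, processed_dir) -> str:
    """Append every JSON record in json_dir to csv_path, then move the JSON files away."""
    json_dir, csv_path, processed_dir = Path(json_dir), Path(csv_path), Path(processed_dir)
    json_files = sorted(json_dir.glob("*.json"))
    if not json_files:
        return NO_DATA_MESSAGE

    # Assumption: each file contains a single flat JSON object.
    rows = [json.loads(path.read_text(encoding="utf-8")) for path in json_files]

    if csv_path.exists() and csv_path.stat().st_size > 0:
        # Reuse the existing header so new rows line up with old ones.
        with csv_path.open(newline="", encoding="utf-8") as handle:
            fieldnames = next(csv.reader(handle))
        write_header = False
    else:
        # First insert: build the header from the keys, keeping first-seen order.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        write_header = True

    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells

    # Moving the files is what makes a second click report that nothing is left.
    processed_dir.mkdir(parents=True, exist_ok=True)
    for path in json_files:
        shutil.move(str(path), str(processed_dir / path.name))
    return f"Inserted {len(rows)} record(s) into {csv_path.name}"
```

Calling the function a second time with the same folders returns the "No dataframes to concatenate" message, which mirrors the behavior described in step 3.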

Viewing of CSV

  1. To view the stored data, simply click the View CSV button.

    image

  2. After clicking the View CSV button, a new window will pop up displaying different buttons dedicated for each form.

    image

  3. As an example, let us view the Medical Examination Report for Land-based Overseas Workers.

    image

  4. The CSV file holding the extracted data opens automatically. You can edit it to fix any incorrectly extracted data.

    image

ZIP and Encrypt All CSV Files

  1. Click the button ZIP and Encrypt All CSV Files.

    image

  2. After you click ZIP and Encrypt All CSV Files, a pop-up will ask you to enter a name for your zip file.

    image

  3. After you enter the name, a pop-up will ask you to set a password for the zip file.

    image

  4. After you set the password, a file dialog will pop up so you can choose where to save the zip file.

    image

  5. You can now access the zip file. Note: use WinRAR to open and extract the files.

    image

    image
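The archiving half of this feature can be sketched with Python's standard `zipfile` module. Note that the standard library can read password-protected archives but cannot create them, so the example below produces an unencrypted zip only. The password step in the application would need a third-party library, and which one it uses is not stated here. `zip_csv_files` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def zip_csv_files(csv_dir, zip_path) -> list[str]:
    """Write every CSV in csv_dir into one compressed archive and return the names added."""
    csv_files = sorted(Path(csv_dir).glob("*.csv"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in csv_files:
            # arcname keeps only the file name, not the full directory path.
            archive.write(path, arcname=path.name)
    return [path.name for path in csv_files]
```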
