Home
AI4WRD was first developed as a non-invasive data extraction method for window-based applications, mainly for software that does not support data output, either through APIs or to a generally usable format such as JSON or CSV. AI4WRD extracts data using optical character recognition on video streams, working with video capture cards, virtual cameras, and video cameras. Because such software may have multiple screens, each with multiple sections of text to be cropped, AI4WRD uses the SIFT and FLANN algorithms to match crops to specific frames from a video stream. More details below.
This guide will first detail a standard workflow, then dive into a few of the additional features of the software.
Use the navigation drop-down menu to move between the sections of the app. The app currently consists of three sections: Load Frame, Crop, and Optical Character Recognition Livestream.
The main workflow consists of the following:
Choosing video stream -> capturing screenshots of specific screens -> cropping specific sections of each of the screens -> starting optical character recognition
The rest of the section will guide you through this workflow in detail:
Load Frame Screen
Use the drop-down menu to select additional languages to detect. English is the default language, and either Traditional or Simplified Chinese can be detected alongside it.
Use the drop down menu to select the video stream that you want to perform Optical Character Recognition on, then click the run widget to preview the video stream.
Capture screenshots of each screen that you would like to perform optical character recognition on. Later, you will define crops for each screenshot, and the software will automatically detect which screen the video stream is showing and crop the relevant regions for optical character recognition.
Screenshots are captured using the Capture screenshot button. Captured screenshots are displayed below the button.
Crop Screen
Use the drop-down menu to select the screenshot to crop. Crops are saved automatically and associated internally with the selected screenshot.
Drag the box to specify sections of the screen to crop. Later the application will perform Optical Character Recognition on the specified crops. A screen below will preview the selected crop.
Once you are satisfied with the crop, click the crop button; the crop will be saved and listed below. You can create as many crops as you like.
You can also choose whether to see the crop preview in real time, the color of the cropping box, and the zoom level of the crops.
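One way to picture how crops are tied to screenshots is a mapping from a screenshot name to a list of crop boxes. The sketch below is illustrative only: `CropBox`, `add_crop`, and the `(x, y, width, height)` layout are hypothetical names and assumptions, not AI4WRD's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropBox:
    """Hypothetical crop rectangle: top-left corner plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int

    def apply(self, frame):
        """Cut this box out of a frame given as a list of pixel rows.

        With a NumPy image the equivalent would be
        frame[y:y + height, x:x + width].
        """
        rows = frame[self.y:self.y + self.height]
        return [row[self.x:self.x + self.width] for row in rows]


def add_crop(registry, screenshot_name, box):
    """Associate a crop box with a screenshot in a plain dict registry."""
    registry.setdefault(screenshot_name, []).append(box)
```

A registry built this way keeps every crop next to the screenshot it was drawn on, so the same boxes can later be applied to any live frame that shows that screen.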
Optical Character Recognition Livestream Screen
Click the Done Crop check box to initialize the models. Once the models are initialized, optical character recognition will begin. The application extracts features from the video stream using the Scale-Invariant Feature Transform (SIFT) algorithm and matches the video stream with the associated crops using the FLANN algorithm.
Note: If this is your first time running the program, it might take some time, as the software needs to download the required models. Please ensure that your internet connection is stable. If the program does not respond after a significant amount of time, check the terminal output; if it still does not respond, you might need to restart the program.
Use the drop-down menu to select the optical character recognition library. Two libraries are currently available: Tesseract and Easy-Ocr. We recommend Tesseract for printed characters on screens and Easy-Ocr for streams from video cameras.
Specify the minimum confidence level for optical character recognition. If the confidence of a detection drops below the specified level, the text will not be displayed.
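The confidence threshold amounts to a simple filter over the recognized text. The sketch below assumes each detection is a `(text, confidence)` pair with confidence on a 0 to 1 scale; the function name and data layout are hypothetical. Note that OCR libraries report confidence on different scales (pytesseract's `image_to_data` uses 0 to 100, EasyOCR uses 0 to 1), so values may need normalizing first.

```python
def filter_by_confidence(results, min_confidence):
    """Keep only (text, confidence) pairs at or above the threshold."""
    return [(text, conf) for text, conf in results if conf >= min_confidence]


detections = [("42.5", 0.93), ("4Z,5", 0.31)]
visible = filter_by_confidence(detections, 0.5)  # the low-confidence read is dropped
```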
Preview of the current video stream
Note that the video stream is the same stream selected in the load frame page. If you would like to select a different video stream, please return to the load frame page using the navigation drop down menu and select a different stream.
Optical Character Recognition output for each crop will be displayed at the bottom of the screen.
Configurations of screenshots and associated crops may be saved and later reused using the save and reload functionality. You can access the functionality on the Crop page. Simply specify the output directory and filename and click "save screen and crop configuration". To reload the screenshot and crops, simply input the path to a previous save-file and click "load screen and crop configuration".
Configurations do not contain video stream information, only the screenshots and associated crops. The stream must be specified on the load frame page, accessible from the navigation drop-down menu at the top of the page.
Warning: the save and load functionality uses the Python pickle module, which can execute arbitrary code when a file is loaded. Please only load configuration files that you trust. More information about the security of the pickle module is available here.
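To illustrate why untrusted pickle files are risky, the sketch below follows the "restricting globals" pattern from the Python pickle documentation: an unpickler that refuses to import anything outside a small allow-list. It is not part of AI4WRD, and a real configuration file would likely need more types allowed than the ones listed here; the names `RestrictedUnpickler`, `restricted_loads`, and `SAFE_BUILTINS` are illustrative.

```python
import builtins
import io
import pickle

# Assumption: only these harmless builtins may be referenced by a pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks to import a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but rejects pickles that reference disallowed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts, lists, and tuples of numbers and strings load normally, while a pickle that tries to reference something like `os.system` raises `pickle.UnpicklingError` instead of running it.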
The software provides functionality to output detected characters to CSV and MQTT. You can access the functionality on the OCR-Livestream page.
There are two modes for CSV output. "Save previous to csv" saves all text detected since the OCR Livestream started to a CSV file. "Save continuous to csv" first creates a CSV file and then continuously appends detected characters to it. The CSV file contains the timestamps as well as the detected characters.
To use either mode, enter the path and filename where you would like to save the CSV file and click the corresponding button.
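The difference between the two CSV modes can be sketched with Python's `csv` module. The column layout (`timestamp`, `text`) and the function names are assumptions made for illustration; the guide only states that the file contains timestamps and detected characters.

```python
import csv

HEADER = ["timestamp", "text"]  # assumed column layout


def save_previous_to_csv(path, history):
    """Write everything detected so far in one go (overwrites the file)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(history)


def start_continuous_csv(path):
    """Create the file with a header row, ready for appending."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(HEADER)


def append_to_csv(path, row):
    """Append one (timestamp, text) row as new text is detected."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)
```

Opening the file in append mode for each row keeps the file valid on disk even if the program stops unexpectedly, at the cost of reopening it on every write.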
The software can also output detected characters through the MQTT protocol. Specify the broker, port, and topic, then click "publish to MQTT server"; the software will start publishing the output of the optical character recognition.
Note, if there are multiple crops, each crop will be output to a different topic as follows:
<user-specific-topic><crop-number>
For instance, if there are 2 crops and the topic specified is ai4wrdOutput, the text from crop 1 will be published to ai4wrdOutput1 and the text from crop 2 will be published to ai4wrdOutput2.
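The topic naming rule can be expressed in a few lines. This is a sketch with hypothetical helper names; `publish` stands for any callable that takes a topic and a payload, such as the `publish` method of a paho-mqtt client, which is an assumption about the client and not something this guide specifies.

```python
def crop_topics(base_topic, crop_count):
    """Build one topic per crop by appending the 1-based crop number."""
    return [f"{base_topic}{number}" for number in range(1, crop_count + 1)]


def publish_crops(publish, base_topic, texts):
    """Send each crop's text to its own topic via the given publish callable."""
    for topic, text in zip(crop_topics(base_topic, len(texts)), texts):
        publish(topic, text)
```

A subscriber that wants the output of every crop would subscribe to each numbered topic individually, since the crop number is appended directly to the topic name with no `/` separator.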