Authors: Daekyoung Jung, Wonjae Kim, Hyunjoo Song, Jeong-in Hwang, Bongshin Lee, Bohyoung Kim, Jinwook Seo
- The paper proposes ChartSense, a system that classifies chart images and extracts the underlying data from them.
- Challenges:
  - The large variety of chart styles makes it difficult to apply a single, simple data extraction approach.
  - Data items may be hidden or occluded, and two different trends represented in the same chart must be told apart.
  - Image-to-text conversion, i.e., extracting values and labels from the image as text. Optical character recognition (OCR) is a viable option, but existing methods have low accuracy.
- Related work:
  - ReVision: an automatic chart data extraction tool that uses a two-stage pipeline, i.e., chart classification followed by data extraction. (Drawback: its extraction accuracy is too low for practical use of the data.)
  - Mixed-initiative interfaces: enable effective collaboration between users and intelligent agents, using a feedback-based design to handle uncertainty in user input.
  - Chart data extraction: prior work combines machine learning and image processing, such as edge detection and vectorization, to extract data from chart images. ChartSense partly builds on this work.
  - iVoLVER: another web-based tool for extracting data from chart images, which lets users specify the type of data to be extracted. (Drawback: it requires a large number of user inputs to achieve good extraction accuracy.)
  - Other tools, such as DataThief, extract data from line charts and depend on user input to identify the region from which data should be extracted.
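Edge detection, mentioned above as a building block of earlier chart data extraction work, can be illustrated with the standard Sobel operator. The sketch below is a generic, stdlib-only example and is not the operator or pipeline used by ReVision, ChartSense, or any other tool cited here. The function names, the grayscale list-of-rows image format, and the threshold value are assumptions for illustration.

```python
# Generic Sobel edge detection on a grayscale image given as a list of rows
# (0 = black, 255 = white). Illustrative only; names and threshold are hypothetical.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to horizontal intensity change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to vertical intensity change


def sobel_magnitude(img):
    """Gradient magnitude per pixel; border pixels are left at 0.0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def edge_mask(img, threshold):
    """True where the gradient magnitude exceeds `threshold`."""
    return [[m > threshold for m in row] for row in sobel_magnitude(img)]


if __name__ == "__main__":
    # Toy image: a dark 4-pixel-wide bar (columns 3-6) on a white background.
    img = [[255, 255, 255, 0, 0, 0, 0, 255, 255, 255] for _ in range(5)]
    for row in edge_mask(img, threshold=100):
        print("".join("#" if e else "." for e in row))
    # Prints a row of dots, three rows of "..##..##..", then a row of dots:
    # only the bar's left and right boundaries are marked as edges.
```

Edges land on both sides of each intensity step because the 3x3 kernel straddles the boundary; a production tool would typically thin these (e.g., with non-maximum suppression) before vectorizing.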
- Proposed approach:
  - ChartSense uses a deep-learning-based chart classifier and extracts data through mixed-initiative interaction.
  - Combining user feedback (mixed-initiative interaction) with image processing methods improves both the accuracy and the efficiency of data extraction.
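The two-stage structure shared by ReVision and ChartSense (classify the chart, then run a type-specific extractor), together with a point where the user can step in, might be wired together roughly as follows. This is a hypothetical sketch: the classifier and extractors are stubs returning fixed values, the chart-type labels are made up, and the rule of asking the user only when classifier confidence falls below a threshold is our assumption, not necessarily how ChartSense decides when to involve the user.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical output format: (category or x label, value) pairs.
ChartData = List[Tuple[str, float]]


@dataclass
class Prediction:
    label: str          # predicted chart type, e.g. "bar" (hypothetical label set)
    confidence: float   # classifier confidence in [0, 1]


def classify(image) -> Prediction:
    """Stage 1 placeholder for the CNN classifier; returns a fixed guess here."""
    return Prediction(label="bar", confidence=0.62)


def extract_bar(image) -> ChartData:
    return [("A", 3.0), ("B", 5.0)]  # stand-in for a bar-chart extractor


def extract_line(image) -> ChartData:
    return [("t0", 1.0), ("t1", 2.0)]  # stand-in for a line-chart extractor


# Stage 2: one extractor per chart type.
EXTRACTORS: Dict[str, Callable[[object], ChartData]] = {
    "bar": extract_bar,
    "line": extract_line,
}


def run_pipeline(image, ask_user: Callable[[Prediction], Optional[str]],
                 min_confidence: float = 0.8) -> Tuple[str, ChartData]:
    """Classify, ask the user when the classifier is unsure, then extract."""
    pred = classify(image)
    label = pred.label
    if pred.confidence < min_confidence:
        override = ask_user(pred)  # None = user confirms; a string = corrected type
        if override is not None:
            label = override
    return label, EXTRACTORS[label](image)


if __name__ == "__main__":
    # Simulated user who corrects the low-confidence "bar" guess to "line".
    label, data = run_pipeline(image=None, ask_user=lambda p: "line")
    print(label, data)  # line [('t0', 1.0), ('t1', 2.0)]
```

Keeping the extractors in a table keyed by chart type means a user's correction only selects a different extractor; the classifier does not need to be rerun.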
- Network Architecture:
  - Classification task: uses GoogLeNet, a deep convolutional neural network (CNN) architecture, for chart image classification.
  - Data extraction task: uses a user-interaction-based model (a "mixed-initiative interface") to assist data extraction, e.g., asking users to identify the colors that represent each plotted series to improve accuracy.
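The color-based interaction described above can be sketched as follows. The user picks the color of one series, the system keeps pixels within a color tolerance of it, and pixel positions are mapped to data values. The linear two-point axis calibration (`x_cal`/`y_cal`), the RGB-distance tolerance, and all names are assumptions for illustration; ChartSense's actual extraction procedure may differ.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
# Calibration tuple (hypothetical format): (pixel_a, value_a, pixel_b, value_b).
Cal = Tuple[float, float, float, float]


def color_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance between two colors in RGB space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def pixel_to_value(p: float, p0: float, v0: float, p1: float, v1: float) -> float:
    """Linearly map pixel coordinate p using two reference pixels with known values."""
    return v0 + (p - p0) * (v1 - v0) / (p1 - p0)


def extract_series(image: List[List[RGB]], picked: RGB, tol: float,
                   x_cal: Cal, y_cal: Cal) -> List[Tuple[float, float]]:
    """For each pixel column, average the rows whose color is within `tol`
    of the user-picked color, then convert (column, row) to data coordinates."""
    points = []
    for col in range(len(image[0])):
        rows = [r for r in range(len(image))
                if color_distance(image[r][col], picked) <= tol]
        if rows:
            row = sum(rows) / len(rows)
            points.append((pixel_to_value(col, *x_cal), pixel_to_value(row, *y_cal)))
    return points


if __name__ == "__main__":
    W, R, B = (255, 255, 255), (220, 30, 30), (30, 30, 220)
    img = [[W] * 4 for _ in range(5)]            # 5 rows x 4 columns, white
    for col, row in enumerate([4, 3, 2, 1]):     # a rising red series
        img[row][col] = R
    img[0] = [B] * 4                             # a blue series the user did not pick
    # Image rows grow downward, so row 4 maps to y = 0 and row 0 to y = 4.
    pts = extract_series(img, picked=(255, 0, 0), tol=60,
                         x_cal=(0, 0.0, 3, 3.0), y_cal=(4, 0.0, 0, 4.0))
    print(pts)  # [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
```

Averaging the matching rows per column yields one value per x pixel. A real tool would also have to handle gaps, overlapping series, and anti-aliased strokes, which this sketch ignores.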
- Dataset and evaluation:
  - Classification: input images are 256×256×3 (three color channels), and the dataset is split 80%/20% into training and validation sets. On a larger dataset, ChartSense outperforms existing chart data extraction tools.
  - Data extraction: ChartSense achieves a 3% error rate and outperforms WebPlotDigitizer, suggesting it is useful in practice.
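As a hedged illustration of the evaluation setup, the sketch below performs a seeded 80%/20% train/validation split and computes one possible error measure for extracted values. Mean relative error is an assumed definition chosen for illustration. The notes do not specify how ChartSense's 3% error rate is computed, and the sample values are made up.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(items: Sequence[T], train_frac: float = 0.8,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` reproducibly and split it into train/validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_relative_error(true: Sequence[float], pred: Sequence[float]) -> float:
    """Average of |pred - true| / |true| (assumes no true value is zero)."""
    if len(true) != len(pred):
        raise ValueError("true and pred must have the same length")
    return sum(abs(p - t) / abs(t) for t, p in zip(true, pred)) / len(true)


if __name__ == "__main__":
    train, val = train_val_split(range(100))
    print(len(train), len(val))  # 80 20
    err = mean_relative_error([10.0, 20.0, 50.0], [10.2, 19.0, 50.0])
    print(f"{err:.1%}")  # 2.3%
```

A plain random split is used here for brevity; stratifying by chart type would keep rare chart types represented in both sets.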