Small utility for retrieving data from figures. Inspired by the Java package of the same name.
The usual: pip install datathief
.
Unlike the Java DataThief package and similar online tools, here the user manually annotates the figure with the data points of their choosing. This makes it more transparent how the data are being read and makes the results more reproducible. However, it might be annoying for a large amount of data.
If you want to extract a lot of data, or extract data from a continuous line, you are better off using the original Java DataThief package, or one of the many online tools that do exactly this.
To use this tool, first annotate the plot by adding a single pixel at the start and end of the x-axis in a specified color that does not exist anywhere else in the image (default color: pure blue). Do the same for the y-axis (default color: pure red). Then one pixel for each data point you wish to extract (default color: pure green). This function will then return the x and y coordinates of each data point. It will warn you if too many or too few pixels are detected.
For example, running this code:
import datathief as dt
filename = 'du_fig1a_annotated.png'
xlim = [-10, 20]
ylim = [0, 15]
data = dt.datathief(filename, xlim=xlim, ylim=ylim)
On this input (NB, you might need to zoom in to see the individual pixels):
Extracts the data for this plot:
See the examples folder for more information. (Figure courtesy Du et al., https://www.medrxiv.org/content/10.1101/2020.02.19.20025452v4)
More questions? Email info@sciris.org.