Anubhab Chakraborty edited this page Feb 21, 2022 · 4 revisions

pyamiimage

pyamiimage is a tool to extract semantic information from diagrams. The diagrams can be pathway diagrams, plots, charts or more.

Story

A few months back we had an idea: build an automated tool that could extract semantic data from an image. An example is given below:

Image: Screenshot from 2022-02-20 23-47-54

.reaction
{
Reactants: {Diene, Dienophile};
Products: {endo product};
Temperature: {423};
Time: {15hrs};
Pressure: {?};
Catalysts: {o-xylene};
Reversible: {False};
Exothermic: {?};
}
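
A record like the one above could be held in Python as a small dataclass. This is only a sketch: the `Reaction` class, its field names and the `unknown_fields` helper are hypothetical and are not part of pyamiimage. Values shown as `?` in the example are represented as `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reaction:
    """Hypothetical container for one extracted reaction (not a pyamiimage class)."""
    reactants: List[str] = field(default_factory=list)
    products: List[str] = field(default_factory=list)
    temperature: Optional[float] = None  # units are not stated in the example above
    time: Optional[str] = None
    pressure: Optional[float] = None     # None stands for the "?" placeholder
    catalysts: List[str] = field(default_factory=list)
    reversible: Optional[bool] = None
    exothermic: Optional[bool] = None

    def unknown_fields(self) -> List[str]:
        """Return the names of fields whose value is still unknown (None)."""
        return [name for name, value in vars(self).items() if value is None]


# The example record from above, with the two "?" entries left unset.
diels_alder = Reaction(
    reactants=["Diene", "Dienophile"],
    products=["endo product"],
    temperature=423,
    time="15hrs",
    catalysts=["o-xylene"],
    reversible=False,
)
```

Calling `diels_alder.unknown_fields()` lists the fields that a later extraction step would still need to fill in.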

We started out with a specific use case: extracting biosynthetic pathways, which are series of reactions that form different products. We are looking at terpene synthase pathways, the biosynthetic pathways in plants that synthesize terpenes. Terpenes are the fragrant compounds found in aromatic plants such as tea, citrus and grapes. They give such plants their specific aromas.

Terpenes are synthesized in the plant's leaves, flowers and fruits through terpene synthase pathways. Two pathways are known, the MVA pathway and the MEP pathway, and both lead to the same pair of isomeric products: IPP and DMAPP. These compounds are the precursors of all terpenes. So, we'll start with a diagram such as this:

image

If we extract all the relevant pathway information from this image, we can effectively create a smart image in SVG with links to Wikidata, KEGG or other databases. We can essentially annotate the diagram, AUTOMATICALLY.
This can be used to mine new pathway information from the scientific literature and quickly store it in a public database such as WikiPathways.

Installation

pyamiimage can be installed with pip:

pip install pyamiimage

pyamiimage requires Tesseract to run. Make sure Tesseract is installed and runs from the terminal with:

tesseract -h
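
The same check can be done from Python before importing pyamiimage. The sketch below uses only the standard library. The helper names `find_tesseract` and `tesseract_version` are made up for this example, and it assumes the executable is named `tesseract` and is on your `PATH`.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the path of the tesseract executable on PATH, or None if absent."""
    return shutil.which("tesseract")


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if not installed."""
    exe = find_tesseract()
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some Tesseract builds print the version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "Tesseract not found on PATH")
```

If this reports that Tesseract was not found, install it first and make sure its directory is on your `PATH`.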

Usage

Python

pyamiimage can be used as a Python library.

from pyamiimage.ami_image import AmiImage
from pyamiimage.ami_graph import AmiGraph
from pyamiimage.ami_ocr import AmiOCR

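# Placeholder: replace with the path to your own image file
IMAGE_PATH = "/path/to/image/file"

# Load the image as grayscale, then convert it to a binary image
image = AmiImage.create_gray_from_path(IMAGE_PATH)
bin_img = AmiImage.create_white_binary_from_image(image)
# Run OCR on the binary image and collect the recognized words
ocr = AmiOCR(bin_img)
words = ocr.get_words()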

pyamiimage is still under development. The command-line interface currently supports only text extraction. You can run it from the terminal with:

pyamiimage --text /path/to/image/file /path/to/output/file
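
To process several images, the command above can be called in a loop from Python. This is a minimal sketch, not part of pyamiimage: the helper names are hypothetical, and it assumes the images are PNG files and that writing each result to a `.txt` file is acceptable.

```python
import subprocess
from pathlib import Path
from typing import List


def build_text_command(image: Path, output: Path) -> List[str]:
    """Build the argument list for the `pyamiimage --text` call shown above."""
    return ["pyamiimage", "--text", str(image), str(output)]


def extract_text_from_folder(image_dir: Path, output_dir: Path) -> List[Path]:
    """Run text extraction on every PNG in image_dir (hypothetical helper)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.glob("*.png")):
        # Assumption: one .txt output file per image, named after the image.
        output = output_dir / (image.stem + ".txt")
        # check=True raises CalledProcessError if pyamiimage exits with an error.
        subprocess.run(build_text_command(image, output), check=True)
        written.append(output)
    return written
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.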

Contribution

Please open an issue on GitHub whenever you run into a problem with pyamiimage. It greatly helps us make the software better.

If you would like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first.