ChemDataExtractor

ChemDataExtractor is a toolkit for extracting chemical information from the scientific literature.

Features

  • HTML, XML and PDF document readers
  • Chemistry-aware natural language processing pipeline
  • Chemical named entity recognition
  • Rule-based parsing grammars for property and spectra extraction
  • Table parser for extracting tabulated data
  • Document processing to resolve data interdependencies
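
As a rough illustration of the named entity recognition feature listed above, the following sketch wraps a piece of text in a `Document` and collects the chemical mentions found in it. The `Document` class and its `cems` attribute follow the usage shown in the project's documentation; the helper name `extract_chemical_mentions` and the sample sentence are only illustrative, and the sketch assumes the package and any data files it requires are already installed. If the package cannot be imported, the helper returns `None` instead of failing.

```python
def extract_chemical_mentions(text):
    """Return a sorted list of chemical mention strings found in `text`.

    Returns None when chemdataextractor is not importable, so the sketch
    can be run safely before the package is installed.
    """
    try:
        # Imported lazily so a missing package is reported, not raised.
        from chemdataextractor import Document
    except ImportError:
        return None

    doc = Document(text)
    # `doc.cems` holds the chemical entity mentions recognised in the text;
    # each mention exposes its surface string as `.text`.
    return sorted({mention.text for mention in doc.cems})


if __name__ == "__main__":
    # Hypothetical sample sentence, used only for illustration.
    sentence = (
        "UV-vis spectrum of 5,10,15,20-Tetra(4-carboxyphenyl)porphyrin "
        "in Tetrahydrofuran (THF)."
    )
    print(extract_chemical_mentions(sentence))
```

The exact mentions returned depend on the installed version and its models, so no expected output is shown here.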

Installation

To install ChemDataExtractor, simply run:

pip install chemdataextractor

Or if you are an Anaconda user, run:

conda install -c chemdataextractor chemdataextractor

Alternatively, try one of the other installation options described in the documentation.
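
After running one of the commands above, a quick way to confirm that the package is visible to the current Python environment is to query its installed version. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is an assumption made for illustration and is not part of ChemDataExtractor.

```python
from importlib import metadata


def installed_version(package="chemdataextractor"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("chemdataextractor is not installed in this environment")
    else:
        print("chemdataextractor version:", version)
```

If the helper reports that the package is missing after a conda install, the most likely cause is that a different environment is active than the one the package was installed into.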

Documentation

Full documentation is available at http://chemdataextractor.org/docs

License

ChemDataExtractor is licensed under the MIT license, a permissive, business-friendly license for open source software.
