Functional and structural analysis of tables in research papers (Table disentangling)
Java Web Ontology Language Python Shell
IEResources Pattern based analysis Oct 27, 2014
InformationClasses Reading of more complex Information class files Feb 12, 2015
Models Model for DDI header detection and classifier in code Apr 3, 2017
analysis-UPCI add folder for analysis Jun 21, 2016
annotation Some changes Nov 8, 2016
lib Changes for making reader for HTML output from easyPDFConverter Jun 27, 2017
src Fix of breaking documents because of special characters Aug 7, 2017
DataBaseFileDrugs.sql Fix of breaking documents because of special characters Aug 7, 2017
DatabaseModel.png Added database model image Jan 18, 2016
HighLevelPatterns New way of writing conceptualization of values in tree structure. Nov 13, 2014
LICENSE.txt Merge branch 'master' of Apr 5, 2015
Level2Patterns New way of writing conceptualization of values in tree structure. Nov 13, 2014
ProcessDailyMed.sh Bash scripts for running Jul 19, 2016
ProcessPMC.sh Bash scripts for running Jul 19, 2016
README.md Update of readme file Jul 6, 2017
SemanticTypes Output with semantic types from metamap. Jun 10, 2014
SemanticTypes_2013AA.txt Project cleanup and reference fixing Jan 18, 2016
TableMiningOntology.owl Settings and update of TableMiningOntology Jun 20, 2017
en-pos-maxent.bin Taggers, lexicons and improvements Jun 29, 2015
en-sent.bin Extracting sentences that refer to a table Jan 26, 2017
en-token.bin Taggers, lexicons and improvements Jun 29, 2015
file_properties.xml Extracting sentences that refer to a table Jan 26, 2017
my-token.bin Tokenization model Feb 13, 2015
patterns Conceptualization changes Dec 8, 2014
patterns.txt New pattern analysis Nov 10, 2014
settings.cfg Changes for making reader for HTML output from easyPDFConverter Jun 27, 2017
sprag_AdverseEvent.arff Pragmatic analysis dataset and the script to generate dataset Feb 2, 2017
sprag_AllData.arff Pragmatic analysis dataset and the script to generate dataset Feb 2, 2017
sprag_BaselineCharacteristic.arff Pragmatic analysis dataset and the script to generate dataset Feb 2, 2017
sprag_InclusionExclusion.arff Pragmatic analysis dataset and the script to generate dataset Feb 2, 2017
sprag_Other.arff Pragmatic analysis dataset and the script to generate dataset Feb 2, 2017


TableDisentangler - A tool for automatic disentangling of functional areas in tables and their annotation

TableDisentangler is a Java tool for annotating tables. It uses an annotation schema we proposed that captures the function of each cell and the relationships between cells. The tool extracts annotations from tables in PMC clinical documents in XML format (XML can be generated from PDF).

The tool does this in several steps. First, tables are decomposed into a matrix of cell objects, each containing the cell's data and information about its navigational path (headers, stubs, subheaders).
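The decomposition step can be illustrated with a minimal Java sketch. It is not taken from the TableDisentangler source: the class `TableMatrixSketch`, the nested `Cell` type, the `decompose` method and the sample table values are all hypothetical. It also assumes a simple rectangular table in which the first `headerRows` rows are headers and the first `stubCols` columns are stubs, whereas the real tool has to detect these areas itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical, not from the project source. */
public class TableMatrixSketch {

    /** One table cell: its position, its content, and its navigational path. */
    static final class Cell {
        final int row;
        final int col;
        final String content;
        // Header and subheader texts found above the cell, top to bottom.
        final List<String> headers = new ArrayList<>();
        // Stub (row label) texts found to the left of the cell.
        final List<String> stubs = new ArrayList<>();

        Cell(int row, int col, String content) {
            this.row = row;
            this.col = col;
            this.content = content;
        }
    }

    /**
     * Turns a grid of cell texts into a matrix of Cell objects.
     * Assumption: the first headerRows rows are headers and the first
     * stubCols columns are stubs; only data cells get a navigational path.
     */
    static Cell[][] decompose(String[][] grid, int headerRows, int stubCols) {
        Cell[][] cells = new Cell[grid.length][];
        for (int r = 0; r < grid.length; r++) {
            cells[r] = new Cell[grid[r].length];
            for (int c = 0; c < grid[r].length; c++) {
                Cell cell = new Cell(r, c, grid[r][c]);
                boolean isDataCell = r >= headerRows && c >= stubCols;
                if (isDataCell) {
                    for (int h = 0; h < headerRows; h++) {
                        if (c < grid[h].length) {
                            cell.headers.add(grid[h][c]);
                        }
                    }
                    for (int s = 0; s < stubCols; s++) {
                        cell.stubs.add(grid[r][s]);
                    }
                }
                cells[r][c] = cell;
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up example values, for illustration only.
        String[][] grid = {
            {"Characteristic", "Drug", "Placebo"},
            {"Age (years)", "54.2", "53.8"},
            {"Female (%)", "48", "51"}
        };
        Cell[][] cells = decompose(grid, 1, 1);
        Cell cell = cells[1][2];
        System.out.println(cell.stubs + " / " + cell.headers + " -> " + cell.content);
    }
}
```

In this sketch each data cell carries its own copy of the header and stub texts, so a value can be interpreted without looking at the rest of the table.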

This project is developed at the University of Manchester as part of my PhD.

Requirements

The tool requires Java, OpenNLP, the Weka toolkit, a MySQL database, and installed copies of MetaMap and WordNet.

Other project dependencies

Some dataset manipulations (splitting data into training, testing and cross-validation sets, downloading data, extracting tables, etc.) are done by Python scripts in the TableMiningHelpers git project.

The database output of this system may be used as the input database for the MedCurator project.

You also need to check out the Marvin project and include a reference to it in this project.

License

The tool is released under the GNU GPL v3 license. The license agreement may be read here: http://www.gnu.org/copyleft/gpl.html

Referencing