NLM .nxml to text format conversion
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
src
test
LICENSE
README.md
nxml2txt
nxml2txt.sh

README.md

nxml2txt

NLM .nxml to text format conversion

Usage:

./nxml2txt NXMLFILE [TEXTFILE] [SOFILE]

For example (using test document):

./nxml2txt test/PMC3357053.nxml test/PMC3357053.txt test/PMC3357053.so

This creates the files test/PMC3357053.txt, containing the text content of the input document, and test/PMC3357053.so, containing the annotations (XML elements and their attributes) in a simple standoff format.

nxml2txt assumes a unix-like environment. If the input .nxml file contains embedded TeX-math, nxml2txt requires LaTeX and catdvi.

This tool was originally introduced as part of the BioNLP Shared Task 2011 supporting resources (https://github.com/ninjin/bionlp_st_2011_supporting).