reference-parser

Parses references in academic papers and extracts their metadata such as authors, title, date and so on.

Usage

To use it you must first train on some hand-labeled data. Example of such data is given in the file training-data.txt. To do training, execute this command in a terminal:

$ php run-train.php > trained-data.php

This creates file trained-data.php with trained model parameters. Also, if you want to see the actual transition and emission probability matrices, you can print the HTML file with tables:

$ php run-train.php html > trained-data.html

To perform actual extraction process, execute:

$ php run-extract.php

This should print out labeled fields of a reference.

Sample training data

training-data.txt contains some hand-labeled data with following fields:

<T> TITLE
<A> AUTHOR
<D> DATE
<P> PAGES
<V> VOLUME
<J> JOURNAL
<N> NUMBER
<U> URL
<B> PUBLISHER
<L> LOCATION

Of course you can create your own training data and capture the fields you want. The quality of extraction greatly depends on volume of this data.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
LICENSE		LICENSE
MetadataExtractor.php		MetadataExtractor.php
README.md		README.md
run-extract.php		run-extract.php
run-train.php		run-train.php
training-data.txt		training-data.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

reference-parser

Usage

Sample training data

About

Releases

Packages

Languages

License

redcapital/reference-parser

Folders and files

Latest commit

History

Repository files navigation

reference-parser

Usage

Sample training data

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages