Automatic-Metadata-Extraction-from-Scientific-Documents

This project performs automatic metadata extraction from scientific documents. The metadata of interest includes the title, authors, abstract, keywords, journal name, volume, and similar fields.

Metadata is extracted by analyzing the document text with information extraction and natural language processing techniques. The PDF file is first uncompressed and converted to XML and plain text. The converted output is then processed using font analysis together with information extraction techniques such as regular expressions and tokenizing.
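The font analysis and regular-expression steps described above can be illustrated with a minimal sketch. It is not the project's actual code: the XML layout (a `<text>` element with a `size` attribute), the sample content, and the function names `extract_title`, `extract_keywords`, and `xml_to_text` are all assumptions made for illustration. The sketch takes the text with the largest font size as the title and uses a regular expression plus simple splitting to tokenize a keywords line.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML produced by a PDF-to-XML converter. The real converter
# output used by the project may look different; this layout is an assumption.
SAMPLE_XML = """<page>
  <text size="18">Automatic Metadata Extraction</text>
  <text size="11">A. Author, B. Writer</text>
  <text size="10">Abstract: We describe a method.</text>
  <text size="10">Keywords: metadata; PDF; extraction</text>
</page>"""


def extract_title(xml_string):
    """Font analysis heuristic: return the text set in the largest font."""
    root = ET.fromstring(xml_string)
    texts = root.findall(".//text")
    if not texts:
        return ""
    largest = max(texts, key=lambda el: float(el.get("size", "0")))
    return (largest.text or "").strip()


def xml_to_text(xml_string):
    """Flatten the XML into plain text, one line per <text> element."""
    root = ET.fromstring(xml_string)
    return "\n".join((el.text or "") for el in root.findall(".//text"))


def extract_keywords(plain_text):
    """Locate a 'Keywords:' line with a regex and split it into tokens."""
    match = re.search(
        r"^\s*Key\s?words\s*[:\-]\s*(.+)$",
        plain_text,
        re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        return []
    # Keywords are assumed to be separated by semicolons or commas.
    return [tok.strip() for tok in re.split(r"[;,]", match.group(1)) if tok.strip()]


if __name__ == "__main__":
    print(extract_title(SAMPLE_XML))
    print(extract_keywords(xml_to_text(SAMPLE_XML)))
```

A largest-font heuristic is only a starting point. Real documents often set journal banners or section headings in larger type than the title, so position on the page and other cues would normally be combined with font size.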

This project was carried out at the Indira Gandhi Centre for Atomic Research (IGCAR) to perform automatic metadata extraction on scientific documents submitted to the knowledge repository at IGCAR. Further details are included in the final report.
