mon95/Automatic-Metadata-Extraction-from-Scientific-Documents


Automatic-Metadata-Extraction-from-Scientific-Documents

This project performs automatic metadata extraction from scientific documents. The extracted metadata includes the title, authors, abstract, keywords, journal name, and volume, among other fields.

Metadata is extracted by analyzing the relevant text with information extraction and natural language processing techniques. The methods include font analysis and processing of the uncompressed PDF file and its converted forms (XML and plain text), using information extraction techniques such as regular expressions and tokenization.
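
As a rough illustration of the two ideas above (font analysis on the XML conversion, regular expressions on the text conversion), here is a minimal Python sketch. It is not the code from this repository: the sample inputs, the XML layout (`fontspec` and `text` elements with `id`, `size`, and `font` attributes), the regex patterns, and the function names `guess_title` and `extract_fields` are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML conversion of a first page; the element and attribute
# names are assumed for illustration and may differ from the real converter.
SAMPLE_XML = """<page>
  <fontspec id="0" size="18"/>
  <fontspec id="1" size="10"/>
  <text font="1">Journal of Nuclear Materials, Vol. 12 (2015)</text>
  <text font="0">Corrosion Behaviour of Steel</text>
  <text font="0">in Liquid Sodium</text>
  <text font="1">A. Kumar, B. Rao</text>
</page>"""

# Hypothetical plain-text conversion of the same page.
SAMPLE_TEXT = """Journal of Nuclear Materials, Vol. 12 (2015)

Abstract: We study the corrosion of steel exposed to liquid sodium.
Results are reported for three temperatures.

Keywords: corrosion; steel; liquid sodium
"""

# Abstract: everything after the label up to a blank line, a Keywords label,
# or the end of the text.
ABSTRACT_RE = re.compile(
    r"Abstract\s*[:.\-]?\s*(.+?)(?=\n\s*\n|\n\s*Keywords\b|\Z)",
    re.IGNORECASE | re.DOTALL,
)
# Keywords: the rest of the line after the label.
KEYWORDS_RE = re.compile(r"Key\s*words\s*[:.\-]?\s*(.+)", re.IGNORECASE)
# Volume: "Vol. 12", "Vol 12", or "Volume 12".
VOLUME_RE = re.compile(r"\bVol(?:ume|\.)?\s*(\d+)", re.IGNORECASE)


def guess_title(xml_string):
    """Font analysis heuristic: treat the largest-font lines as the title."""
    root = ET.fromstring(xml_string)
    sizes = {f.get("id"): float(f.get("size")) for f in root.iter("fontspec")}
    lines = [
        (sizes.get(t.get("font"), 0.0), "".join(t.itertext()).strip())
        for t in root.iter("text")
    ]
    if not lines:
        return None
    largest = max(size for size, _ in lines)
    return " ".join(text for size, text in lines if size == largest and text)


def extract_fields(text):
    """Regex pass over the text conversion; missing fields are left out."""
    meta = {}
    match = ABSTRACT_RE.search(text)
    if match:
        # Collapse line breaks and repeated spaces inside the abstract.
        meta["abstract"] = " ".join(match.group(1).split())
    match = KEYWORDS_RE.search(text)
    if match:
        parts = re.split(r"[;,]", match.group(1))
        meta["keywords"] = [p.strip() for p in parts if p.strip()]
    match = VOLUME_RE.search(text)
    if match:
        meta["volume"] = int(match.group(1))
    return meta


if __name__ == "__main__":
    print(guess_title(SAMPLE_XML))
    print(extract_fields(SAMPLE_TEXT))
```

Heuristics like these are brittle on real documents (multi-column layouts, titles that are not the largest text, unlabeled abstracts), so a practical pipeline would likely combine several such rules and fall back gracefully when one fails.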

This project was carried out at the Indira Gandhi Centre for Atomic Research (IGCAR) to perform automatic metadata extraction on scientific documents submitted to the knowledge repository at IGCAR. Further details are included in the final report.
