Automatic-Metadata-Extraction-from-Scientific-Documents

This project performs automatic metadata extraction from scientific documents. The extracted metadata includes the title, authors, abstract, keywords, journal name, and volume, among other fields.

Metadata is extracted by analyzing the document text with information extraction and natural language processing techniques. The methods include font analysis and processing of the PDF file after it has been uncompressed and converted to XML and plain text, using techniques such as regular expressions and tokenization.
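The two ideas above (font analysis on the XML conversion, regular expressions on the text conversion) can be illustrated with a minimal sketch. This is not the project's code. It assumes an XML layout in the style produced by `pdftohtml -xml` (`fontspec` and `text` elements), and the function names `guess_title`, `extract_keywords`, and `extract_abstract`, the sample data, and the patterns are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of `pdftohtml -xml` output (an assumption;
# the XML actually used by the project may have a different schema).
SAMPLE_XML = """<pdf2xml>
  <page number="1">
    <fontspec id="0" size="20" family="Times"/>
    <fontspec id="1" size="10" family="Times"/>
    <text font="0">A Study of Metadata</text>
    <text font="0">Extraction Methods</text>
    <text font="1">A. Author, B. Writer</text>
    <text font="1">Abstract: We describe a toy example.</text>
  </page>
</pdf2xml>"""

# Hypothetical plain-text conversion of the same first page.
SAMPLE_TEXT = """A Study of Metadata Extraction Methods
A. Author, B. Writer
Abstract: We describe a toy example.
Keywords: metadata; information extraction, PDF
1. Introduction
"""


def guess_title(xml_string):
    """Font analysis heuristic: join the first-page lines set in the largest font."""
    root = ET.fromstring(xml_string)
    page = root.find("page")
    if page is None:
        return ""
    # Map each font id to its point size.
    sizes = {f.get("id"): float(f.get("size", "0")) for f in page.iter("fontspec")}
    lines = []
    for node in page.iter("text"):
        content = "".join(node.itertext()).strip()
        if content:
            lines.append((sizes.get(node.get("font"), 0.0), content))
    if not lines:
        return ""
    largest = max(size for size, _ in lines)
    return " ".join(content for size, content in lines if size == largest)


def extract_keywords(text):
    """Regex heuristic: find a 'Keywords:' line and split it on ';' or ','."""
    match = re.search(r"^\s*key\s?words\s*[:\-]\s*(.+)$", text,
                      re.IGNORECASE | re.MULTILINE)
    if not match:
        return []
    return [k.strip() for k in re.split(r"[;,]", match.group(1)) if k.strip()]


def extract_abstract(text):
    """Regex heuristic: capture text after 'Abstract' up to the next known heading."""
    pattern = (r"^\s*abstract\s*[:\-]?\s*(.*?)"
               r"(?=^\s*key\s?words\b|^\s*1\.?\s+introduction\b|\Z)")
    match = re.search(pattern, text, re.IGNORECASE | re.MULTILINE | re.DOTALL)
    if not match:
        return ""
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(match.group(1).split())


if __name__ == "__main__":
    print(guess_title(SAMPLE_XML))
    print(extract_abstract(SAMPLE_TEXT))
    print(extract_keywords(SAMPLE_TEXT))
```

Heuristics like these are brittle on real papers: a journal banner may use a larger font than the title, and heading labels vary between publishers, so a practical pipeline would combine several cues rather than rely on any single one.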

This project was carried out at the Indira Gandhi Centre for Atomic Research (IGCAR) to automatically extract metadata from scientific documents submitted to the IGCAR knowledge repository. Further details are included in the final report (FinalReport.pdf).

mon95/Automatic-Metadata-Extraction-from-Scientific-Documents
