Text mining on electronic theses and dissertations.

lamps-lab/ETDMiner

ETDMiner

ETDMiner consists of multiple AI-based applications and datasets that help parse, extract, classify, and mine Electronic Theses and Dissertations (ETDs).

Table of Contents

AutoMeta

This application is version 1.1 of etd_crf. It automatically extracts metadata from scanned ETDs.

data

Contains the dataset used to extract metadata from scanned ETDs.

etd_crf

Version 1.0 of the AutoMeta tool.

etd_segmentation

An ETD segmentation tool that classifies ETD pages.

metadata_correction

An application that fills in missing metadata in the database (i.e., pates_etds).

Go to the sub-folder

samples

Contains the sample dataset that was tested with the process above.

src

Contains a handful of source files for preprocessing the dataset.

html_parser

Contains the code and instructions for obtaining ETDs as HTML files.

webcrawler

Contains crawlers and parsers, developed for different universities, that collect ETDs and extract metadata from their web pages.
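To give a rough idea of what extracting metadata from a web page can look like, here is a minimal sketch that uses only the Python standard library. It is not the code in this folder. The `MetaTagParser` class, the `extract_metadata` function, and the sample page with its `citation_*` tag names are all hypothetical, and each university's landing pages may expose metadata differently.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page.

    Hypothetical helper for illustration; not part of this repository.
    """

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content:
            # A field such as an author can repeat, so keep a list per name.
            self.metadata.setdefault(name, []).append(content)


def extract_metadata(html_text):
    """Return a dict mapping each meta tag name to its list of values."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Made-up sample page; real ETD landing pages will differ.
sample_page = """
<html><head>
<meta name="citation_title" content="An Example Dissertation">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2001">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_metadata(sample_page))
```

A real crawler would also need to fetch pages and handle site-specific layouts, which this sketch leaves out.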
