Final project for LIS590 Text Mining.
documentation
processed_data
project_activities
raw_data
README.md

Text Mining, Spring 2015 – Predictive OSTI Subject Classification of Technical Report Abstracts from SciTech Connect

I explored which classification models and feature selection methods work best for automating the subject classification of technical reports in the SciTech Connect database. I exported metadata for technical report records that had abstracts and had been assigned one of three OSTI subjects. I processed these records with Java programs and XSL stylesheets I wrote, then loaded them into Oracle. Features were selected from the Oracle tables using two feature selection algorithms: Information Gain and TFxIDF. Oracle Data Miner was used to build two types of classification models: Decision Tree and SVM. Finally, the models were tested against new records pulled from SciTech Connect. The final report also includes a literature review on the value of grey literature.
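As a rough illustration of the two feature selection measures named above, here is a minimal Python sketch. It is not the Oracle/Java pipeline used in this project: the function names, the toy token lists, and the two labels are all hypothetical and are not real OSTI subject categories. It uses one common TFxIDF variant (relative term frequency times the natural log of inverse document frequency) and computes Information Gain for the presence or absence of a single term.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (count / doc length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from splitting on whether term occurs."""
    present = [y for doc, y in zip(docs, labels) if term in doc]
    absent = [y for doc, y in zip(docs, labels) if term not in doc]
    total = len(labels)
    conditional = sum(len(part) / total * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


# Hypothetical toy data: four tokenized "abstracts" with made-up labels.
docs = [
    ["neutron", "reactor", "fuel"],
    ["reactor", "core", "fuel"],
    ["protein", "gene", "cell"],
    ["gene", "sequence", "cell"],
]
labels = ["nuclear", "nuclear", "biology", "biology"]

weights = tf_idf(docs)
ig_reactor = information_gain(docs, labels, "reactor")
ig_neutron = information_gain(docs, labels, "neutron")
```

In this toy set, "reactor" separates the two labels perfectly, so its Information Gain is higher than that of "neutron", which appears in only one document. TFxIDF ranks them the other way within the first document, because the rarer term gets the larger weight. The two measures can therefore select different features from the same data.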

Technologies: Oracle, SQL, Java, XML/XSL

Final Report

Initial Presentation

Final Presentation