psanghal/text-difficulty-prediction

text-difficulty-prediction

University of Michigan: Milestone Project 2

Project Description: Applied supervised and unsupervised learning techniques to Wikipedia text to predict which sentences need to be simplified to make them easier for readers to understand. Readers may include students, children, adults with learning or reading disabilities, and non-native English speakers.

Project Workflow: This project contains 5 Jupyter notebooks. It begins by extracting features from the original text, then implements supervised and unsupervised learning models using the extracted features and text tokenizers such as TF-IDF, SentencePiece, and the Keras Tokenizer. The goal was to assess how effective each feature representation is for classifying text difficulty, and to understand which manual feature-extraction steps worked well and which could be improved in the future.
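As a rough illustration of the TF-IDF route described above, the following is a minimal sketch using scikit-learn. It is not taken from the notebooks: the toy sentences, the labels (1 = needs simplification), and the choice of logistic regression are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration; label 1 = sentence needs simplification.
sentences = [
    "The cat sat on the mat.",
    "She likes to read books.",
    "The dog ran to the park.",
    "We eat lunch at noon.",
    "The epistemological ramifications of the hypothesis remain contentious.",
    "Photosynthesis converts electromagnetic radiation into chemical energy.",
    "The jurisprudential doctrine necessitates rigorous statutory interpretation.",
    "Mitochondrial dysfunction precipitates neurodegenerative pathologies.",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# TF-IDF turns each sentence into a sparse weighted term vector;
# the classifier (an assumed choice) is then fit on those vectors.
model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000),
)
model.fit(sentences, labels)

# Predict on the training sentences only to show the call shape;
# a real evaluation would use a held-out split.
predictions = model.predict(sentences)
probabilities = model.predict_proba(["The bird sings in the tree."])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on training data only, which matters once a train/test split is introduced.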

Please refer to the following Jupyter notebooks for the code implementation.

  1. Text Difficulty-Feature Extraction-Final
  2. Text Difficulty-Supervised Models-Final
  3. Text Difficulty- Deep Learning-Final
  4. Text Difficulty-Unsupervised Models-Final
  5. Text Difficulty-Topic Modelling-Final

Features extracted in the first notebook, “Text Difficulty-Feature Extraction-Final”, have been used extensively in all the other notebooks to save computational time.
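The idea of extracting features once and reusing them across notebooks could look roughly like the sketch below. The three surface features, the function names, and the CSV cache file are hypothetical and chosen only for illustration; the features actually used are defined in the feature extraction notebook.

```python
import csv
import os
import tempfile

FIELDS = ["n_words", "avg_word_len", "n_long_words"]  # hypothetical feature names
PUNCT = ".,;:!?"


def extract_features(sentence):
    """Return a few illustrative surface features for one sentence."""
    words = [w.strip(PUNCT) for w in sentence.split()]
    n_words = len(words)
    avg_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    n_long = sum(1 for w in words if len(w) >= 7)  # 7+ letters counts as "long" here
    return {"n_words": n_words, "avg_word_len": round(avg_len, 3), "n_long_words": n_long}


def save_features(sentences, path):
    """Compute features once and write them to a CSV cache."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(extract_features(s) for s in sentences)


def load_features(path):
    """Reload cached features instead of recomputing them."""
    with open(path, newline="") as f:
        return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]


# Usage: one step writes the cache, later steps only read it.
cache_path = os.path.join(tempfile.gettempdir(), "features_demo.csv")
save_features(["The cat sat.", "Photosynthesis converts radiation."], cache_path)
features = load_features(cache_path)
```

Reading the cached file is much cheaper than re-running extraction over the full corpus in every notebook, which is the saving the note above refers to.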

Please click on the dataset to view the file.

About

UMSI-Wikipedia Text Difficulty Prediction (NLP Project)
