eeshagoyal/ScoreEasy

ScoreEasy: Automated Essay Scoring

Brief Overview

The project aims to build a machine learning system that automatically scores essays written by students. The motivation for the project comes from Kaggle, which is also the source of the dataset. The system takes an essay as input and outputs a score based on six rating traits: Ideas and Content, Organization, Voice, Word Choice, Sentence Fluency, and Conventions. Each trait is rated with respect to the essay set the essay belongs to.

The project was part of a semester-long course in Business Intelligence (MCA-210), done under the guidance of Mr. Rishabh Kaushal and Swati Sinha. It involved building a problem statement, collecting data, generating analysis, and producing the final result, and it culminated in a Poster Presentation competition for the entire batch.

Files Description

The work is distributed across a set of Jupyter notebooks.

  • kaggle-dataset/ : directory containing the dataset, taken from Kaggle.
    • Training dataset - training_set_rel3.tsv
    • Validation dataset - valid_set.tsv
    • Testing dataset - test_set.tsv
  • 1.1-EDA.ipynb and 1.2-Basic-Metric-Analysis.ipynb: Initial analysis of the entire dataset, with key findings that informed the choice of parameters when developing the model.
  • 2-LDA.ipynb: A brief experiment with an unsupervised approach that assigns scores based on word probabilities.
  • 3-Tokenisation-and-Regression.ipynb: Tokenisation, feature extraction, application of regression techniques (Linear, Lasso, Ridge, Gradient Boosting), and a comparison of their accuracy.
  • 4(a)-LSTM.ipynb, 4(b)-LSTM.ipynb and 4(c)-LSTM.ipynb: Notebooks implementing an LSTM (Long Short-Term Memory) model to score essays.
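As a rough illustration of the tokenisation and feature extraction step described for 3-Tokenisation-and-Regression.ipynb, the sketch below turns an essay into a small set of numeric features that a regression model could consume. It is not taken from the notebooks: the function names (tokenize, extract_features), the regular expressions, and the particular features chosen are all assumptions made for illustration, and the notebooks may use different tokenisers and features.

```python
import re
from statistics import mean
from typing import Dict, List

# Hypothetical patterns for this sketch; the notebooks may tokenise differently.
WORD_RE = re.compile(r"[A-Za-z']+")
SENTENCE_END_RE = re.compile(r"[.!?]+")


def tokenize(essay: str) -> List[str]:
    """Lowercase the essay and return its word tokens."""
    return WORD_RE.findall(essay.lower())


def extract_features(essay: str) -> Dict[str, float]:
    """Compute a few simple surface features of an essay (illustrative choice)."""
    tokens = tokenize(essay)
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in SENTENCE_END_RE.split(essay) if s.strip()]
    n_words = len(tokens)
    return {
        "word_count": float(n_words),
        "sentence_count": float(len(sentences)),
        "avg_word_length": mean(len(t) for t in tokens) if tokens else 0.0,
        "unique_word_ratio": len(set(tokens)) / n_words if n_words else 0.0,
    }


sample = "Computers help people learn. They also connect people!"
features = extract_features(sample)
print(features)
```

Feature dictionaries like this one could be stacked into a matrix, one row per essay, and passed to any of the regressors listed above, with the human-assigned scores as the target.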

Links
