
NLP_MathProblem

Since this was an Industry-University Cooperation Project, the dataset is no longer accessible now that the project has ended :)

Note that the project was done as a team;
I was mainly in charge of text preprocessing / machine-learning methods (XGBoost) / ensembling / result analysis.

Implementations:

  1. Cleaning dirty text
  2. Machine-learning methods to classify math word problems by equation type (Korean + English + math symbols + numbers)
  3. Classification by fine-tuning the pretrained KoBERT and KoELECTRA models
  4. Ensembling
  5. Analysis of results: which predictions were right and which were wrong
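A minimal sketch of the bag-of-words route (item 2): CountVectorizer features fed to an XGBoost classifier. The toy problems, labels, and vectorizer settings here are invented for illustration; if xgboost is not installed, a sklearn gradient-boosting model stands in.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

try:
    from xgboost import XGBClassifier as Booster  # classifier used in the project
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier as Booster  # stand-in

# Toy math word problems (invented for illustration) with equation-type labels.
problems = [
    "If x + 3 = 7, what is x?",
    "What is y when y - 2 = 5?",
    "A rectangle is 3 by 4; what is its area?",
    "Find the area of a 5 by 6 rectangle.",
]
labels = [0, 0, 1, 1]  # 0 = linear equation, 1 = area

pipe = Pipeline([
    # Split on whitespace so math symbols and numbers survive as tokens.
    ("vec", CountVectorizer(token_pattern=r"[^\s]+")),
    ("clf", Booster()),
])
pipe.fit(problems, labels)
preds = pipe.predict(["Solve x + 1 = 3.", "Area of a 2 by 9 rectangle?"])
print(preds)
```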

Results:

  1. Comparison of different text preprocessing pipelines applied before CountVectorizer and XGBoost
  • From ex1 to ex4, the amount of text removal and substitution increases

(Figure: NLProject_graphTextPreprocessingCompare)
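The increasing-aggressiveness idea behind ex1 to ex4 can be sketched with plain regex passes; the specific rules below are illustrative guesses, not the project's actual pipeline.

```python
import re

def clean(text: str, level: int = 1) -> str:
    """Clean a math word problem; a higher level removes/substitutes more.

    The rules per level are illustrative, not the project's actual code.
    """
    out = text.strip()
    if level >= 1:  # ex1: collapse runs of whitespace
        out = re.sub(r"\s+", " ", out)
    if level >= 2:  # ex2: drop punctuation that is not a math symbol
        out = re.sub(r"[^\w\s+\-*/=<>()]", "", out)
    if level >= 3:  # ex3: replace every number with a placeholder token
        out = re.sub(r"\d+(\.\d+)?", "<num>", out)
    if level >= 4:  # ex4: lowercase Latin letters
        out = out.lower()
    return out

s = "If  x + 3 = 7, what is x?"
print(clean(s, 1))  # light cleaning
print(clean(s, 4))  # heaviest cleaning
```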

  2. Comparison of different models with text preprocessing held fixed
  • The ensemble method aggregates the outputs of selected models with weights
  • Two combination strategies were compared:
    1. Using all model outputs and training a small neural network to learn the best way to combine them
    2. Using only the best-performing model of each type, with weights found by trial and error

(Figure: NLProject_graphModelCompare)
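Both combination strategies reduce to weighting per-model class probabilities; the second (fixed weights found by trial and error) can be sketched in numpy. All probabilities and weights below are invented for illustration.

```python
import numpy as np

# Per-model predicted class probabilities for 3 problems over 4 equation types.
# All numbers are invented for illustration.
p_xgb = np.array([[0.6, 0.2, 0.1, 0.1],
                  [0.1, 0.7, 0.1, 0.1],
                  [0.3, 0.3, 0.2, 0.2]])
p_kobert = np.array([[0.5, 0.3, 0.1, 0.1],
                     [0.2, 0.5, 0.2, 0.1],
                     [0.1, 0.2, 0.6, 0.1]])
p_koelectra = np.array([[0.7, 0.1, 0.1, 0.1],
                        [0.1, 0.6, 0.2, 0.1],
                        [0.2, 0.1, 0.5, 0.2]])

# Fixed per-model weights (assumed values); they sum to 1 so the
# ensemble output is still a probability distribution per problem.
weights = np.array([0.2, 0.4, 0.4])
stacked = np.stack([p_xgb, p_kobert, p_koelectra])  # shape (models, problems, classes)
ensemble = np.tensordot(weights, stacked, axes=1)   # weighted sum over models
pred = ensemble.argmax(axis=1)
print(pred)
```

The first strategy replaces the fixed `weights` with a small neural network (or any learned combiner) trained on the stacked model outputs.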
Additionally, another method was tested that is not discussed here: further pretraining a BERT model on the data at hand. However, the dataset was too small, so this did not outperform any of the results above. With enough data, this approach would be expected to produce the best results.
