Classifier for the question classification dataset - [ http://cogcomp.org/Data/QA/QC/ ]
Results of the empirical tests are in {project_directory}/documentation/Results.md
- Go to the project directory.
- First run `./bin/qc.sh nlp`.
- Once the Natural Language Processing (NLP) step has computed the annotated natural language properties, you can train one of the models.
- To train a model, run `./bin/qc.sh train {ml_algo_model}`, e.g. `./bin/qc.sh train svm`.
- To test a model, run `./bin/qc.sh test {ml_algo_model}`.
Values for {ml_algo_model}:
- `svm` = Support Vector Machine
- `lr` = Logistic Regression
- `linear_svm` = Linear Support Vector Classifier (Machine)
- The method that converts text data to ML features can be modified in the function `qc.dataprep.text_features.get_vect`.
- The feature stack (which data is fed to the ML algorithm) can be modified, transformed, or generated in `qc.dataprep.feature_stack`.

Changes to these two items take effect the next time you run the training process. There is no need to run the `nlp` step again.
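As a rough illustration of these two extension points, the sketch below builds a vectorizer and stacks its output with one extra numeric column. It is not the project's actual code: `get_vect_sketch`, `stack_features`, and the question-length feature are hypothetical names and choices, and the real `get_vect` and `feature_stack` may work differently. It assumes only scikit-learn and scipy from the dependency list.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer


def get_vect_sketch():
    """Hypothetical stand-in for qc.dataprep.text_features.get_vect."""
    # Unigram and bigram counts; change the settings or the vectorizer here.
    return CountVectorizer(ngram_range=(1, 2), lowercase=True)


def stack_features(questions, extra_columns):
    """Place text features and extra numeric columns side by side."""
    vect = get_vect_sketch()
    text_matrix = vect.fit_transform(questions)  # sparse, one row per question
    extra_matrix = csr_matrix(extra_columns)     # e.g. token count per question
    stacked = hstack([text_matrix, extra_matrix]).tocsr()
    return stacked, vect


# Toy data, made up for this example.
questions = ["What is the capital of France ?", "Who wrote Hamlet ?"]
lengths = [[len(q.split())] for q in questions]

features, vect = stack_features(questions, lengths)
print(features.shape)
```

In this sketch, changing `ngram_range` or swapping in another vectorizer plays the role of editing the text-feature method, while adding or removing matrices in the `hstack` call plays the role of editing the feature stack.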
- Machine learning algorithms can be added in the function `qc.ml.train.train_one_node` (parameter tuning can be done there too). For example, in the experimental part of the code, add an extra `elif` statement: `elif == {your_model_name}: machine = {Initialize the algorithm you want to use}`. Then run `./bin/qc.sh train {your_model_name}` from the shell, and the command will use the model you defined.
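The `elif` pattern described above might look like the following sketch. The function name `make_machine`, the variable `ml_algo`, and the `random_forest` branch are assumptions for illustration; the actual body of `train_one_node` is not shown in this document. The first three branches use the scikit-learn estimators that correspond to the model names `svm`, `lr`, and `linear_svm`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC


def make_machine(ml_algo):
    """Return an untrained estimator for a model name (hypothetical helper)."""
    if ml_algo == "svm":
        machine = SVC()
    elif ml_algo == "lr":
        machine = LogisticRegression()
    elif ml_algo == "linear_svm":
        machine = LinearSVC()
    elif ml_algo == "random_forest":
        # The extra branch: this is where {your_model_name} would go,
        # along with any parameter tuning for the new algorithm.
        machine = RandomForestClassifier(n_estimators=100)
    else:
        raise ValueError("Unknown model name: {!r}".format(ml_algo))
    return machine


machine = make_machine("random_forest")
print(type(machine).__name__)
```

With a branch like this in place, `./bin/qc.sh train random_forest` would select the new model, assuming the script passes the model name through unchanged.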
- python - v3.6.3
- configobj - v5.0.6
- spaCy - v2.0.9 (with "en_core_web_lg" English model)
- sner - v0.2.3
- scipy - v1.0.0
- scikit-learn - v0.19.1
This project was inspired by one of the problems we tried to solve: understanding the question for our QA bot.
On that project I worked with Akash Pateria [https://github.com/Akash-Pateria]; we worked together on our final-year graduate project, named Invoker.
This project aims to explore more options for processing natural language (English) and to improve accuracy.