Navigate to for the challenge page

LSTM (QUEST labeling model)

  1. Run cd lstm to navigate to the LSTM folder

  2. Run pip3 install -r requirements.txt

  3. Run python3 to preprocess the QUEST dataset

  4. Run python3

  5. Run python3 to generate a test submission file for Kaggle

  6. The final, best model will be saved in the models/ folder

Logistic Regression (QUEST labeling model)

Steps to obtain the training accuracy, measured by Spearman score (test accuracy is displayed on Kaggle):

  1. cd into 'log_reg' directory - cd log_reg

  2. Run pip3 install -r requirements.txt

  3. Run logistic regression to generate the features, then train and save the model - python (this takes around 5 minutes)

  4. The above command displays the Spearman score for each label and the overall average Spearman score, and generates a test submission file, submission.csv, for Kaggle
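The per-label and average Spearman scores mentioned in the last step can be illustrated with a minimal, standard-library-only sketch. This is not the repository's code: the function names and the label names in the usage note are hypothetical, and the actual script may rely on a library routine instead.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one column is constant
    return cov / (var_x * var_y) ** 0.5


def spearman_report(true_by_label, pred_by_label):
    """Return ({label: score}, average score) over all labels."""
    scores = {
        label: spearman(true_by_label[label], pred_by_label[label])
        for label in true_by_label
    }
    return scores, mean(list(scores.values()))
```

Calling spearman_report with two dictionaries that map each label name (for example a hypothetical "label_a") to a list of true and predicted values returns the per-label scores and their mean.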

Naive Bayes (QUEST labeling model)

The naive Bayes model requires only the numpy, pandas, re, and collections libraries to run.

  1. Run cd naive-bayes to navigate to the naive Bayes folder.

  2. Run python3 to execute the naive Bayes model. This will train the model on the train.csv dataset and save a file submission.csv containing the predicted labels for the test.csv dataset.
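As a rough illustration of what a naive Bayes text model does, here is a minimal multinomial sketch with add-one smoothing that uses only re and collections. It assumes discrete class labels and a simple word tokenizer; the class name, the tokenizer, and the toy data in the usage note are hypothetical and are not taken from the repository's implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts (illustrative sketch)."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing: every vocabulary word gets one extra count.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after NaiveBayes().fit(texts, labels) on a few labeled sentences, predict(new_text) returns the class with the highest log-probability.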

Bi-directional Attention Flow (BiDAF) (QA baseline model)

The initial code repository is located at .

  1. Run cd qa to navigate to the qa folder

  2. Run pip3 install -r requirements.txt to install all Python3 libraries

  3. Run python3 -m spacy download en to download the necessary spaCy models. These should be located in your Python/Python36/Lib/site-packages/en_core_web_sm/ folder.

  4. Modify data_dir, spicy_en, and glove in to be, respectively, the location where you want to download SQuAD, the location of your spaCy models from step 3, and the location of your pre-trained GloVe embeddings downloaded from and extracted to that folder.

  5. Run python3 to preprocess the SQuAD dataset and create the vocabulary.

  6. Run python3 to train and save the BiDAF model.

  7. Run python3 to test the BiDAF model.

Steps to show dataset analysis

Before running the programs below, download the SQuAD1.1 dataset from, create a data directory, and place the file in that directory. Ensure the file is named SQuAD-v1.1.csv before continuing.

  1. Run to generate all plots of the QUEST datasets' features, which are placed in the plots/ directory

  2. Run to generate QUEST dataset statistics.

  3. Run to generate SQuAD dataset statistics.

Steps to generate error analysis plots (the plots are already present in the plots_error_analysis folder; run these steps only if they are missing)

  1. cd into 'log_reg' directory - cd log_reg

  2. Run logistic regression if not done above - python3

  3. Now, run the command - python3 to generate the error analysis plots in the plots_error_analysis folder.

Steps to generate transformed SQuAD using saved logistic regression models

  1. First, parse the SQuAD JSON into CSV for a more readable format. Run the command from the parent directory - python3 This will generate train-v1.1.csv and dev-v1.1.csv in the squad_dataset folder.

  2. cd into 'log_reg' directory - cd log_reg

  3. Run the command - python3 This will use the saved logistic regression models to label SQuAD. This will take ~5 minutes.

  4. The labeled CSV files, dev-v1.1_labeled.csv and train-v1.1_labeled.csv, will be generated in the squad_labelled folder.
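The JSON-to-CSV parsing in the first step can be sketched as follows. The nesting (data, paragraphs, qas, answers) is the published SQuAD v1.1 layout; the function names, the CSV column names, and the choice to keep only the first answer are assumptions made for illustration and may differ from what the repository's script writes.

```python
import csv
import io

# Hypothetical column layout for the flattened CSV.
FIELDS = ["id", "title", "context", "question", "answer_text", "answer_start"]


def squad_rows(squad):
    """Yield one flat row per question from a SQuAD v1.1-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                # Assumption: keep only the first listed answer.
                first = answers[0] if answers else {"text": "", "answer_start": -1}
                yield {
                    "id": qa["id"],
                    "title": article.get("title", ""),
                    "context": paragraph["context"],
                    "question": qa["question"],
                    "answer_text": first["text"],
                    "answer_start": first["answer_start"],
                }


def squad_to_csv(squad, out_file):
    """Write the flattened rows, with a header, to an open text file."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    for row in squad_rows(squad):
        writer.writerow(row)


# Tiny in-memory example in the SQuAD v1.1 shape.
sample = {
    "data": [{
        "title": "France",
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [{
                "id": "q1",
                "question": "What is the capital of France?",
                "answers": [{"text": "Paris", "answer_start": 0}],
            }],
        }],
    }]
}
buffer = io.StringIO()
squad_to_csv(sample, buffer)
```

For a file on disk, the same function can be given a dictionary loaded with json.load and an output file opened with newline="".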

Steps to generate plots that analyze the transformed SQuAD data

  1. cd into 'log_reg' directory - cd log_reg

  2. Run the command - python3

  3. The plots will be generated in the plots_labeled folder.


