- Navigate to https://www.kaggle.com/c/google-quest-challenge/data for the challenge page.
- Run `cd lstm` to navigate to the LSTM folder.
- Run `pip3 install -r requirements.txt`.
- Run `python3 preprocess_data.py` to preprocess the QUEST dataset.
- Run `python3 train.py` to train the model.
- Run `python3 test.py` to generate a test submission file for Kaggle.
- The final, best model will be saved in the `models/` folder.
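The exact steps inside `preprocess_data.py` are not shown here, but a typical first pass over the QUEST text fields (which contain HTML from StackExchange) looks like this hypothetical sketch; the function name and regexes are illustrative, not taken from the repo:

```python
import re

def preprocess(text):
    """Illustrative cleanup: strip HTML tags, lowercase, and tokenize."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags such as <p>...</p>
    text = text.lower()
    return re.findall(r"[a-z0-9']+", text)  # keep simple word tokens

tokens = preprocess("<p>How do I sort a List in Python?</p>")
```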
Steps to obtain the training accuracy (test accuracy is displayed on Kaggle) using the Spearman score:

- Run `cd log_reg` to navigate to the log_reg directory.
- Run `pip3 install -r requirements.txt`.
- Run `python3 logistic_regression.py` to generate the features, then train and save the model (this will take around 5 minutes).
- The above command will display the Spearman score for each label and the overall average Spearman score, and will also generate a test submission file `submission.csv` for Kaggle.
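The per-label scoring reported above is the mean column-wise Spearman correlation between predictions and labels. As a pure-Python illustration (not the repo's implementation; in practice a library routine such as `scipy.stats.spearmanr` is the usual choice):

```python
def ranks(xs):
    """1-based ranks with ties assigned their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Mean column-wise Spearman over labels (label names here are made up):
preds = {"label_a": [0.1, 0.4, 0.8], "label_b": [0.9, 0.2, 0.5]}
truth = {"label_a": [0.0, 0.5, 1.0], "label_b": [1.0, 0.0, 0.5]}
score = sum(spearman(preds[k], truth[k]) for k in preds) / len(preds)
```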
- The naive Bayes model requires only the `numpy`, `pandas`, `re`, and `collections` libraries to run.
- Run `cd naive-bayes` to navigate to the naive Bayes folder.
- Run `python3 naive-bayes.py` to execute the naive Bayes model. This will train the model on the `train.csv` dataset and save a file `submission.csv` containing the predicted labels for the `test.csv` dataset.
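For reference, the core of a multinomial naive Bayes text classifier can be written with only the standard libraries listed above. This is a generic sketch with made-up example data, not the contents of `naive-bayes.py`:

```python
import re
from collections import Counter
from math import log

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train_nb(docs, labels):
    """Multinomial naive Bayes over a binary label, with Laplace smoothing."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab

def predict_nb(model, doc):
    counts, priors, vocab = model
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for y in priors:
        lp = log(priors[y] / total)                      # log prior
        denom = sum(counts[y].values()) + len(vocab)     # smoothed denominator
        for w in tokenize(doc):
            lp += log((counts[y][w] + 1) / denom)        # smoothed log likelihood
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```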
The initial code repository is located at https://github.com/ElizaLo/Question-Answering-based-on-SQuAD.

- Run `cd qa` to navigate to the qa folder.
- Run `pip3 install -r requirements.txt` to install all Python 3 libraries.
- Run `python3 -m spacy download en` to download the necessary spaCy models. These should be located in your `Python/Python36/Lib/site-packages/en_core_web_sm/` folder.
- Modify `data_dir`, `spicy_en`, and `glove` in `config.py` to be, respectively, the location where you want to download SQuAD, the location of your spaCy models from the previous step, and the location of your pre-trained GloVe embeddings downloaded from https://nlp.stanford.edu/projects/glove/ and extracted to that folder.
- Run `python3 make_dataset.py` to preprocess the SQuAD dataset and create the vocabulary.
- Run `python3 train.py` to train and save the BiDAF model.
- Run `python3 test.py` to test the BiDAF model.
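The vocabulary step of `make_dataset.py` can be illustrated with a generic sketch; the function names and special tokens below are illustrative, and the actual script's details may differ:

```python
from collections import Counter

def build_vocab(tokenized_docs, min_freq=1, specials=("<pad>", "<unk>")):
    """Map tokens to integer ids, reserving ids for padding and unknowns."""
    freq = Counter(tok for doc in tokenized_docs for tok in doc)
    itos = list(specials) + [t for t, c in freq.most_common() if c >= min_freq]
    stoi = {t: i for i, t in enumerate(itos)}
    return stoi, itos

def encode(doc, stoi):
    """Replace out-of-vocabulary tokens with the <unk> id."""
    unk = stoi["<unk>"]
    return [stoi.get(t, unk) for t in doc]
```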
Before running the programs below, please download the SQuAD 1.1 dataset from https://www.wolframcloud.com/objects/d91733a5-57f5-40fe-8e09-2f5285d21fe6, create a `data` directory, and place the file into that directory. Ensure the file is titled `SQuAD-v1.1.csv` before continuing.

- Run `python3 analysis.py` to generate all plots of the QUEST dataset's features, which are placed in the `plots/` directory.
- Run `python3 quest_analysis.py` to generate QUEST dataset statistics.
- Run `python3 squad_analysis.py` to generate SQuAD dataset statistics.
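The kind of statistics these scripts report can be sketched as follows (illustrative only; the actual scripts' output and metrics may differ):

```python
def length_stats(texts):
    """Token-length summary of a list of text fields."""
    lengths = sorted(len(t.split()) for t in texts)
    n = len(lengths)
    return {
        "count": n,
        "mean": sum(lengths) / n,
        "median": lengths[n // 2],  # upper median for even counts
        "max": lengths[-1],
    }
```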
Steps to generate error analysis plots (plots are already present in the `plots_error_analysis` folder; run these steps if they are not):

- Run `cd log_reg` to navigate to the log_reg directory.
- Run `python3 logistic_regression.py` if you have not already done so above.
- Run `python3 error_analysis.py` to generate the error analysis plots in the `plots_error_analysis` folder.
- First, we parse the SQuAD JSON into CSV for a more readable format. From the parent directory, run `python3 parse_squad_to_csv.py`. This will generate `train-v1.1.csv` and `dev-v1.1.csv` in the `squad_dataset` folder.
- Run `cd log_reg` to navigate to the log_reg directory.
- Run `python3 read_and_label_squad.py`. This will use the saved logistic regression models to label SQuAD, and will take around 5 minutes.
- The labeled CSVs, `dev-v1.1_labeled.csv` and `train-v1.1_labeled.csv`, will be generated in the `squad_labelled` folder.
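Assuming the standard SQuAD v1.1 JSON schema (`data` → `paragraphs` → `qas`), the flattening performed by `parse_squad_to_csv.py` can be sketched as below; the column names are illustrative, not necessarily those the script emits:

```python
import csv
import json

def squad_json_to_csv(json_path, csv_path):
    """Flatten SQuAD v1.1 JSON into one (question, context, answer) row per QA pair."""
    with open(json_path) as f:
        squad = json.load(f)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "context", "question", "answer"])
        for article in squad["data"]:
            for para in article["paragraphs"]:
                for qa in para["qas"]:
                    # Keep the first gold answer; dev questions may list several.
                    answer = qa["answers"][0]["text"] if qa["answers"] else ""
                    writer.writerow([qa["id"], article["title"], para["context"],
                                     qa["question"], answer])
```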
- Run `cd log_reg` to navigate to the log_reg directory.
- Run `python3 analysis_labeled.py`.
- The plots will be generated in the `plots_labeled` folder.