This repository requires the following libraries to run:
We divide the solution into 3 steps: preprocessing, training, and submissions (predictions).
- training_mytk_09.ipynb
First we have to do the preprocessing in preprocessing.ipynb. In this notebook we download test.csv, create some directories, and generate some vocabularies, subsets, and dictionaries.
We make the following structure (see the sketch after this list):

- We make 20 pandas dataframes and store them in
- We make 4 vocabulary dictionaries and store them in
- We make two pickle files containing some statistics about the classes (one of them is class_weights.pkl) and store them in
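For reference, here is a minimal sketch of how these preprocessing outputs could be written. The ./data/... directory layout and every file name except class_weights.pkl are illustrative assumptions, not necessarily the names used in preprocessing.ipynb.

```python
import os
import pickle

import pandas as pd

# Toy stand-ins for the real preprocessing outputs.
subsets = [pd.DataFrame({"text": ["a", "b"], "label": [0, 1]}) for _ in range(20)]
vocabularies = {f"vocab_{k}": {"<pad>": 0, "<unk>": 1} for k in range(4)}
class_weights = {0: 1.0, 1: 2.5}

# Assumed directory layout (the actual paths may differ).
os.makedirs("./data/subsets", exist_ok=True)
os.makedirs("./data/vocabularies", exist_ok=True)
os.makedirs("./data/statistics", exist_ok=True)

# 20 pandas dataframes, one pickle file each.
for i, df in enumerate(subsets):
    df.to_pickle(f"./data/subsets/subset_{i:02d}.pkl")

# 4 vocabulary dictionaries (token -> integer id).
for name, vocab in vocabularies.items():
    with open(f"./data/vocabularies/{name}.pkl", "wb") as f:
        pickle.dump(vocab, f)

# Class statistics, including class_weights.pkl.
with open("./data/statistics/class_weights.pkl", "wb") as f:
    pickle.dump(class_weights, f)
```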
Then we train 7 similar models, which differ only in the vocabulary, the dataset, or the minibatch sampling strategy. In the ./training/ folder we have 7 notebooks, one per model. Each notebook generates one or more copies of its model, which differ in the learning rate or the number of epochs. We store the models in ./data/models/, as sketched below. We also store some statistics in
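As an illustration, here is a minimal sketch of how the copies of one model could be produced and saved under ./data/models/. The train_model stub, the hyper-parameter grid, and the file-naming scheme are assumptions made for the example; the real training loop lives in the notebooks.

```python
import os
import pickle

os.makedirs("./data/models", exist_ok=True)

def train_model(lr, epochs):
    """Stand-in for the real training loop defined in the training notebook."""
    return {"lr": lr, "epochs": epochs, "weights": None}

# Each (learning rate, epochs) configuration yields one copy of the model.
configs = [
    {"lr": 1e-3, "epochs": 10},
    {"lr": 1e-3, "epochs": 20},
    {"lr": 5e-4, "epochs": 20},
]

for cfg in configs:
    model = train_model(**cfg)
    name = f"model_mytk_01_lr{cfg['lr']}_ep{cfg['epochs']}.pkl"  # hypothetical naming
    with open(os.path.join("./data/models", name), "wb") as f:
        pickle.dump(model, f)
```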
After that, we compute the predictions and make the submission files. In the ./submissions/ folder we have 7 files, one per model. We make a prediction with each copy of each model. Then we combine all the predictions of a given model into one prediction per model using a voting system, as sketched below.
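A minimal sketch of this per-model voting step, assuming the copies output hard class labels and the vote is a simple per-sample majority. The arrays below are toy placeholders for the real predictions.

```python
import numpy as np

# One array of predicted class ids per copy of the same model,
# all aligned on the same test rows (toy values).
predictions_per_copy = [
    np.array([0, 2, 1, 1]),
    np.array([0, 2, 2, 1]),
    np.array([1, 2, 1, 1]),
]

stacked = np.stack(predictions_per_copy)  # shape: (n_copies, n_samples)

def majority_vote(votes):
    """Most frequent class among the copies, independently for each sample."""
    return np.array([np.bincount(col).argmax() for col in votes.T])

model_prediction = majority_vote(stacked)  # -> array([0, 2, 1, 1])
```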
Finally, we use the 7 predictions (one per model) to make the final submission file. This file is generated from the
We use a voting system in which our best model, model_mytk_07, has a weight of 5 and all the others have a weight of 2. We make the submission.csv file and store it in
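A minimal sketch of this final weighted vote, with weight 5 for model_mytk_07 and weight 2 for the other six models. The toy prediction arrays, the model names other than model_mytk_07, the number of classes, and the submission column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# One prediction vector per model (toy values), aligned on the same test rows.
model_predictions = {
    "model_mytk_01": np.array([0, 1, 2]),
    "model_mytk_02": np.array([0, 2, 2]),
    "model_mytk_03": np.array([1, 1, 2]),
    "model_mytk_04": np.array([0, 1, 1]),
    "model_mytk_05": np.array([0, 1, 2]),
    "model_mytk_06": np.array([2, 1, 2]),
    "model_mytk_07": np.array([0, 1, 2]),
}
weights = {name: (5 if name == "model_mytk_07" else 2) for name in model_predictions}

n_samples = len(next(iter(model_predictions.values())))
n_classes = 3  # assumed number of classes for this toy example

# Accumulate weighted votes per class, then keep the class with the most votes.
votes = np.zeros((n_samples, n_classes))
for name, preds in model_predictions.items():
    votes[np.arange(n_samples), preds] += weights[name]
final = votes.argmax(axis=1)

submission = pd.DataFrame({"id": np.arange(n_samples), "label": final})
submission.to_csv("submission.csv", index=False)  # column names are placeholders
```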