In this workshop you will work directly on a real-world problem by exploring different approaches to a Kaggle competition: the Jigsaw Toxic Comment Classification Challenge. The objective of the competition was to develop a model that predicts the probability of each of six types of toxicity for Wikipedia comments. The types of toxicity are:
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
A comment can have more than one type of toxicity (or none at all), so we treat this as a multi-label classification problem. Before digging deeper, make sure you understand the difference between multi-class and multi-label classification.
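To make the distinction concrete, here is a minimal sketch in plain Python. The label names are assumed to match the column names in the competition's training file, and the helper functions `multi_hot` and `one_hot` are hypothetical names used only for illustration.

```python
# Label names assumed from the competition's training data columns.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def multi_hot(active_labels):
    """Multi-label target: a 0/1 flag per label, any number of them set."""
    return [1 if name in active_labels else 0 for name in LABELS]


def one_hot(label):
    """Multi-class target, for contrast: exactly one label is set."""
    return [1 if name == label else 0 for name in LABELS]


# A clean comment has no labels; a rude one may have several at once.
clean_comment = multi_hot(set())
rude_comment = multi_hot({"toxic", "insult"})

print(clean_comment)       # [0, 0, 0, 0, 0, 0]
print(rude_comment)        # [1, 0, 0, 0, 1, 0]
print(one_hot("obscene"))  # [0, 0, 1, 0, 0, 0]
```

In the multi-label setting a model typically outputs six independent probabilities, one per label, rather than a single distribution that sums to one.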
1 - Install Python dependencies
To complete the proposed exercises, you'll need to clone this repository and install the required dependencies. If you're new to package managers, we recommend Miniconda, which lets you create a separate development environment, isolated for your own convenience and safety :) Feel free to use any other tool, such as pipenv or plain virtualenv.
Just make sure you are using Python 3.6 or newer.
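If you are unsure which interpreter is active, a quick check like the following can help. This is only a sketch: the function name `python_is_supported` is made up for illustration, and the minimum version is the one stated above.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version required by this workshop


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    raise SystemExit(
        "Python 3.6 or newer is required, found %d.%d" % tuple(sys.version_info[:2])
    )
print("Python version OK:", sys.version.split()[0])
```

Run it with the same interpreter you plan to use for the notebooks, since a system Python and a conda environment can differ.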
- Download and install Miniconda
- Create a virtual environment (we will name it test_env, but you can call it whatever you want)
conda create --name test_env python=3.6
- Activate the environment (with conda 4.4 or newer, conda activate test_env is the recommended equivalent)
source activate test_env
- Go to your cloned project folder and install the requirements in your newly created environment
conda install --file requirements.txt
- Run your Python scripts or Jupyter notebooks as usual
- Deactivate the environment, once you're done for the day
source deactivate test_env
- You can remove the environment once and for all
conda remove -n test_env --all
2 - Download Kaggle dataset
Go to https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge, sign in (or create an account first), and download the data for the Kaggle challenge. If you cannot find it, try downloading it directly using this link.
Extract the csv files inside the
You should also download the fastText embeddings that will be used later in the notebooks.
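As a rough idea of what using the embeddings involves, the sketch below parses fastText's text (.vec) format, where the first line holds the word count and vector dimension and each following line holds a word and its values. It assumes you downloaded a text-format file; the function name `load_vec` and the sample data are illustrative only, and the notebooks may load the embeddings differently.

```python
import io


def load_vec(lines, limit=None):
    """Parse fastText text-format vectors into ({word: [float, ...]}, dim).

    `lines` is any iterable of strings, such as an open file handle.
    `limit` optionally caps the number of words kept, to save memory.
    """
    lines = iter(lines)
    # Header line: "<word count> <dimension>"
    _count, dim = (int(x) for x in next(lines).split())
    vectors = {}
    for line in lines:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[word] = [float(v) for v in values]
    return vectors, dim


# Tiny in-memory stand-in for a real .vec file (the values are made up).
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld -0.4 0.5 0.6\n")
vectors, dim = load_vec(sample)
print(dim, sorted(vectors))
```

With a real file you would pass an open handle instead, for example `with open(path, encoding="utf-8") as f: load_vec(f, limit=100000)`, where `path` is wherever you saved the embeddings.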
3 - Run Jupyter notebooks
Follow the notebooks' order and try to complete the exercises (we will provide these later).