# EXAMPLE - 8

**Tasks :- Sentiment analysis**

**Tasks Description**

``sentiment`` :- This is modeled as a single-sentence classification task to determine whether a piece of text conveys a positive or negative sentiment.

**Conversational Utility** :- To determine whether a review is positive or negative.

**Data** :- In this example, we use the <a href="https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/data">IMDB</a> data, which can be downloaded after accepting the terms and saved under the `imdb_data` directory. The data has 50k samples in total, each labeled as positive or negative.


In [None]:
!unzip imdb_data/134715_320111_bundle_archive.zip -d imdb_data

In [None]:
!mv imdb_data/IMDB\ Dataset.csv imdb_data/imdb_sentiment_data.csv

# Step - 1: Transforming data
The data file `imdb_sentiment_data.csv` has 50k samples with two columns: review and sentiment. Sentiment is the label, which can be positive or negative.
We already provide a sample transformation function ``imdb_sentiment_data_to_tsv`` to convert this data to the required tsv format.
Running the data transformation saves the required train and test tsv data files under the ``data`` directory in the root of the library. For more details on the data transformation process, refer to <a href="https://multi-task-nlp.readthedocs.io/en/latest/data_transformations.html">data transformations</a> in the documentation.
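
To give a rough idea of what such a transformation involves, here is a minimal sketch that splits review/sentiment CSV text into train and test tsv strings. It is not the library's ``imdb_sentiment_data_to_tsv``: the function name `reviews_to_tsv`, the 80/20 split, the seed and the `id, label, review` column order are all assumptions made for illustration.

```python
import csv
import io
import random


def reviews_to_tsv(csv_text, test_fraction=0.2, seed=42):
    """Split review/sentiment CSV text into (train_tsv, test_tsv) strings.

    Hypothetical helper: the split ratio and the id/label/review column
    order are assumptions, not the library's actual output format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["review"], row["sentiment"]) for row in reader]

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test_rows, train_rows = rows[:n_test], rows[n_test:]

    def to_tsv(pairs):
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for idx, (review, label) in enumerate(pairs):
            # Collapse tabs and newlines so each sample stays on one row.
            writer.writerow([idx, label, " ".join(review.split())])
        return out.getvalue()

    return to_tsv(train_rows), to_tsv(test_rows)


sample_csv = (
    "review,sentiment\n"
    "A moving and well acted film.,positive\n"
    "Dull plot and wooden dialogue.,negative\n"
    "Great soundtrack and pacing.,positive\n"
    "I walked out halfway through.,negative\n"
    "One of the best of the year.,positive\n"
)
train_tsv, test_tsv = reviews_to_tsv(sample_csv)
print(train_tsv)
```
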

The transformation file should contain the following details. It is already created as ``transform_file_imdb.yml``.

```
transform1:
  transform_func: imdb_sentiment_data_to_tsv
  read_file_names:
  - imdb_sentiment_data.csv
  read_dir: imdb_data
  save_dir: ../../data
```

In [None]:
!python ../../data_transformations.py \
    --transform_file 'transform_file_imdb.yml'

# Step - 2: Data preparation

For more details on the data preparation process, refer to <a href="https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-data-preparation">data preparation</a> in the documentation.

The tasks file defines a single model to be trained for the sentiment task. The file is already created as ``tasks_file_imdb.yml``.

```
sentiment:
    model_type: BERT
    config_name: bert-base-uncased
    dropout_prob: 0.2
    label_map_or_file:
    - negative
    - positive
    class_num: 2
    metrics:
    - classification_accuracy
    loss_type: CrossEntropyLoss
    task_type: SingleSenClassification
    file_names:
    - imdb_sentiment_train.tsv
    - imdb_sentiment_test.tsv
```
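
As a small illustration of how the label entries in the tasks file relate to each other, the sketch below builds a label-to-index map and checks it against ``class_num``. The helper `build_label_map` is hypothetical, and assigning indices by list order is an assumption about how the labels are encoded, not something confirmed by the library.

```python
# Plain-dict mirror of the relevant keys from tasks_file_imdb.yml.
task_config = {
    "label_map_or_file": ["negative", "positive"],
    "class_num": 2,
}


def build_label_map(config):
    """Map each label to an integer index (assumed: list order gives the index)."""
    labels = config["label_map_or_file"]
    if len(labels) != config["class_num"]:
        raise ValueError(
            f"class_num is {config['class_num']} but {len(labels)} labels were given"
        )
    return {label: idx for idx, label in enumerate(labels)}


label_map = build_label_map(task_config)
print(label_map)  # {'negative': 0, 'positive': 1}
```
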

In [None]:
!python ../../data_preparation.py \
    --task_file 'tasks_file_imdb.yml' \
    --data_dir '../../data' \
    --max_seq_len 200
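
The ``--max_seq_len`` argument fixes the length of every prepared sample. The following toy sketch shows the general idea of truncating or padding a list of token ids to a fixed length. It is only an illustration: the helper `pad_or_truncate` is hypothetical, and a real BERT tokenizer also handles special tokens such as `[CLS]` and `[SEP]` when truncating, which is skipped here.

```python
def pad_or_truncate(token_ids, max_seq_len, pad_id=0):
    """Return (ids, attention_mask), both exactly max_seq_len long.

    pad_id=0 is assumed as the padding token id for this illustration.
    """
    clipped = token_ids[:max_seq_len]
    n_pad = max_seq_len - len(clipped)
    ids = clipped + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding.
    attention_mask = [1] * len(clipped) + [0] * n_pad
    return ids, attention_mask


short_ids, short_mask = pad_or_truncate([101, 2023, 102], max_seq_len=5)
long_ids, long_mask = pad_or_truncate(list(range(1, 301)), max_seq_len=200)
print(short_ids, short_mask)
print(len(long_ids), sum(long_mask))
```
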

# Step - 3: Running training

The following command starts training for the task. The log file reporting the loss and metrics, along with the tensorboard logs, will be written to a time-stamped directory.

For more details on the training process, refer to <a href="https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-train">running training</a> in the documentation.

In [None]:
!python ../../train.py \
    --data_dir '../../data/bert-base-uncased_prepared_data' \
    --task_file 'tasks_file_imdb.yml' \
    --out_dir 'imdb_sentiment_bert_base' \
    --epochs 8 \
    --train_batch_size 32 \
    --eval_batch_size 32 \
    --max_seq_len 200 \
    --grad_accumulation_steps 1 \
    --log_per_updates 50 \
    --eval_while_train  \
    --silent
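
To get a feel for how the batch size, gradient accumulation and epoch settings above interact, the sketch below estimates the number of optimizer updates. The training split size is not stated here, so `num_train_samples = 40_000` is a hypothetical value, and counting a final partial accumulation group as one update is also an assumption.

```python
import math


def total_optimizer_updates(num_samples, batch_size, grad_accumulation_steps, epochs):
    """Estimate optimizer updates over a full training run."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    # Gradients from several batches are accumulated before each update.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accumulation_steps)
    return updates_per_epoch * epochs


num_train_samples = 40_000  # hypothetical split size, not taken from the data
effective_batch_size = 32 * 1  # train_batch_size * grad_accumulation_steps
updates = total_optimizer_updates(num_train_samples, 32, 1, 8)
print(effective_batch_size, updates)  # 32 10000
```
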

# Step - 4: Inferring

You can import and use the ``inferPipeline`` to get predictions for the required tasks.
The trained model and the maximum sequence length to be used need to be specified.

For more details on inference, refer to <a href="https://multi-task-nlp.readthedocs.io/en/latest/infering.html">infer pipeline</a> in the documentation.

In [None]:
import sys
sys.path.insert(1, '../../')
from infer_pipeline import inferPipeline
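
A possible way to wrap the pipeline for this task is sketched below. It assumes, based on the library documentation, that the pipeline object exposes an `infer(samples, task_names)` method where each sample is a list holding one sentence, and that the constructor takes `modelPath` and `maxSeqLen`. The helper `predict_sentiment` and the checkpoint path placeholder are hypothetical, so check the linked documentation before relying on them.

```python
def to_samples(reviews):
    """Wrap each review in its own list (assumed input shape for single-sentence tasks)."""
    return [[review] for review in reviews]


def predict_sentiment(pipe, reviews, task_name="sentiment"):
    """Run the 'sentiment' task on raw review strings.

    `pipe` is expected to be an inferPipeline-like object with an
    infer(samples, task_names) method (assumption from the docs).
    """
    return pipe.infer(to_samples(reviews), [task_name])


# Assumed usage, with a placeholder path for the trained checkpoint:
# pipe = inferPipeline(modelPath='<path to trained model>.pt', maxSeqLen=200)
# predict_sentiment(pipe, ["A moving and well acted film."])
```
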