# EXAMPLE - 1

**Tasks :- Intent Detection, NER, Fragment Detection**

**Tasks Description**

``Intent Detection`` :- This is a single sentence classification task where an `intent` specifies which class the data sample belongs to. 

``NER`` :- This is a Named Entity Recognition / Sequence Labelling / Slot Filling task where each word of the sentence is tagged with the entity label it belongs to. Words that don't belong to any entity are simply labeled as "O". 

``Fragment Detection`` :- This is modeled as a single sentence classification task which detects whether a sentence is incomplete (fragment) or not (non-fragment).
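
To make the two classes concrete, the sketch below builds fragment / non-fragment pairs by truncating complete sentences. Truncation is only an assumption about how such data could be produced; the helper names `make_fragment` and `build_fragment_rows` and the label strings are hypothetical, and the library's own transform function may work differently.

```python
import random


def make_fragment(sentence, rng):
    """Return a copy of `sentence` with at least one trailing word removed."""
    words = sentence.split()
    if len(words) < 2:
        return None  # too short to truncate meaningfully
    cut = rng.randint(1, len(words) - 1)  # keep between 1 and len(words) - 1 words
    return " ".join(words[:cut])


def build_fragment_rows(sentences, seed=0):
    """Pair every complete sentence with one truncated (fragment) version."""
    rng = random.Random(seed)
    rows = []
    for sentence in sentences:
        rows.append((sentence, "non-fragment"))
        fragment = make_fragment(sentence, rng)
        if fragment is not None:
            rows.append((fragment, "fragment"))
    return rows


for text, label in build_fragment_rows(["play some jazz music", "hello"]):
    print(label, "|", text)
```

Note that a truncated sentence can still happen to be complete (for example "play some jazz"), so data built this way is noisy by construction.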

**Conversational Utility** :-  Intent detection is one of the fundamental components of a conversational system, as it gives a broad understanding of the category/domain the sentence/query belongs to.

NER helps in extracting values for the required entities (e.g. location, date-time) from the query.
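
As a minimal sketch of that extraction step, the function below groups BIO-tagged words into entity values. The function name `extract_entities`, the example sentence, and the labels `city` and `timeRange` are illustrative assumptions, not output from the trained model.

```python
def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_label, text) pairs."""
    entities = []
    current_label, current_words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_label is not None:
                entities.append((current_label, " ".join(current_words)))
            current_label, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the entity that is currently open.
            current_words.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes the open entity.
            if current_label is not None:
                entities.append((current_label, " ".join(current_words)))
            current_label, current_words = None, []
    if current_label is not None:
        entities.append((current_label, " ".join(current_words)))
    return entities


tokens = ["book", "a", "table", "in", "new", "york", "for", "tomorrow"]
tags = ["O", "O", "O", "O", "B-city", "I-city", "O", "B-timeRange"]
print(extract_entities(tokens, tags))
# [('city', 'new york'), ('timeRange', 'tomorrow')]
```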

Fragment detection is a very useful piece of a conversational system, as knowing whether a query/sentence is incomplete helps discard bad queries beforehand.


**Data** :- In this example, we use the <a href="https://snips-nlu.readthedocs.io/en/latest/dataset.html">SNIPS</a> data for intent and entity detection. For the sake of simplicity, we provide 
the data in a simpler form under the ``snips_data`` directory, taken from <a href="https://github.com/LeePleased/StackPropagation-SLU/tree/master/data/snips">here</a>.


# Step - 1: Transforming data

The data is present in *BIO* format, where each word in a sentence is tagged with its corresponding entity. 
Sentences are separated by a blank line, and the intent class to which a sentence belongs is given at the end of that sentence. We already provide a sample transformation function ``snips_intent_ner_to_tsv`` to convert this data to the required tsv data files.
Fragment detection data is then generated from the resulting intent detection data using the transform function
``create_fragment_detection_tsv``. 
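
To illustrate the input layout, here is a minimal parsing sketch. It assumes one `word tag` pair per line, the intent alone on the last line of each sentence, and a blank line between sentences. The function name `parse_snips_bio` and the sample text are hypothetical; this is not the library's own transform code.

```python
def parse_snips_bio(text):
    """Parse BIO-formatted text into a list of {tokens, tags, intent} samples."""
    samples = []
    for block in text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        # The last line of a block holds the intent; the rest are "word tag" pairs.
        *word_lines, intent = lines
        tokens, tags = [], []
        for line in word_lines:
            word, tag = line.split()
            tokens.append(word)
            tags.append(tag)
        samples.append({"tokens": tokens, "tags": tags, "intent": intent})
    return samples


sample_text = """play O
jazz B-genre
music O
PlayMusic

book O
a O
table O
BookRestaurant
"""

for sample in parse_snips_bio(sample_text):
    print(sample["intent"], list(zip(sample["tokens"], sample["tags"])))
```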

Running the data transformations saves the required train, dev and test tsv data files under the ``data`` directory in the root of the library. For more details on the data transformation process, refer to <a href="https://multi-task-nlp.readthedocs.io/en/latest/data_transformations.html">data transformations</a> in the documentation.

The transformation file should have the following details. It is already created as ``transform_file_snips.yml``.

```
transform1:
  transform_func: snips_intent_ner_to_tsv
  read_file_names:
    - snips_train.txt
    - snips_dev.txt
    - snips_test.txt
  read_dir: snips_data
  save_dir: ../../data
  
transform2:
  transform_func: create_fragment_detection_tsv
  read_file_names:
    - intent_snips_train.tsv
    - intent_snips_dev.tsv
    - intent_snips_test.tsv
  read_dir: ../../data
  save_dir: ../../data
  
```

The following command runs the data transformations for the tasks.

```
!python ../../data_transformations.py \
    --transform_file 'transform_file_snips.yml'
```

# Step - 2: Data Preparation

Here we train the three tasks together for demonstration. This gives a single
multi-task model capable of performing all three tasks. You can also train the tasks separately 
by listing a single task in the tasks file.

For more details on the data preparation process, refer to <a href="https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-data-preparation">data preparation</a> in documentation.

The tasks file below defines a single model trained on multiple tasks: intent detection, NER and fragment detection. The file is already created at ``tasks_file_snips.yml``.

```
ner:
  model_type: BERT
  config_name: bert-base-uncased
  dropout_prob: 0.3
  label_map_or_file: ../../data/ner_snips_train_label_map.joblib
  metrics:
  - snips_f1_score
  - snips_precision
  - snips_recall
  loss_type: NERLoss
  task_type: NER
  file_names:
  - ner_snips_train.tsv
  - ner_snips_dev.tsv
  - ner_snips_test.tsv

intent:
    model_type: BERT
    config_name: bert-base-uncased
    dropout_prob: 0.3
    label_map_or_file: ../../data/int_snips_train_label_map.joblib
    metrics:
    - classification_accuracy
    loss_type: CrossEntropyLoss
    task_type: SingleSenClassification
    file_names:
    - intent_snips_train.tsv
    - intent_snips_dev.tsv
    - intent_snips_test.tsv

    
fragment:
    model_type: BERT
    config_name: bert-base-uncased
    dropout_prob: 0.2
    class_num: 2
    metrics:
    - classification_accuracy
    loss_type: CrossEntropyLoss
    task_type: SingleSenClassification
    file_names:
    - fragment_snips_train.tsv
    - fragment_snips_dev.tsv
    - fragment_snips_test.tsv
```
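
Before running data preparation, it can help to sanity-check a task entry. The sketch below checks one entry held as a plain Python dict, standing in for the parsed YAML. The set of expected keys is inferred only from the example file above, not from the library's schema, and `check_task` is a hypothetical helper.

```python
# Keys that every task in the example tasks file defines (an inference
# from that example, not the library's documented schema).
COMMON_KEYS = {
    "model_type", "config_name", "dropout_prob",
    "metrics", "loss_type", "task_type", "file_names",
}


def check_task(name, cfg):
    """Return a list of human-readable problems found in one task entry."""
    problems = []
    missing = COMMON_KEYS - set(cfg)
    if missing:
        problems.append(f"{name}: missing keys {sorted(missing)}")
    # In the example, each task gives either a label map file or a class count.
    if "label_map_or_file" not in cfg and "class_num" not in cfg:
        problems.append(f"{name}: needs label_map_or_file or class_num")
    return problems


fragment_cfg = {
    "model_type": "BERT",
    "config_name": "bert-base-uncased",
    "dropout_prob": 0.2,
    "class_num": 2,
    "metrics": ["classification_accuracy"],
    "loss_type": "CrossEntropyLoss",
    "task_type": "SingleSenClassification",
    "file_names": [
        "fragment_snips_train.tsv",
        "fragment_snips_dev.tsv",
        "fragment_snips_test.tsv",
    ],
}

print(check_task("fragment", fragment_cfg))  # an empty list means no problems found
```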

The following command runs the data preparation for the tasks.

```
!python ../../data_preparation.py \
    --task_file 'tasks_file_snips.yml' \
    --data_dir '../../data' \
    --max_seq_len 50
```
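
As a rough illustration of what a fixed maximum sequence length implies, the sketch below pads or truncates a list of token ids to a target length and builds the matching attention mask. It assumes the usual BERT convention of padding with id 0; `pad_or_truncate` is a hypothetical helper and the token ids are illustrative, so this is not the library's preparation code.

```python
def pad_or_truncate(token_ids, max_seq_len, pad_id=0):
    """Return (ids, attention_mask), each exactly max_seq_len long."""
    # Naive truncation: a real tokenizer would also keep the closing
    # special token when it cuts a long sequence.
    ids = list(token_ids[:max_seq_len])
    mask = [1] * len(ids)  # 1 marks real tokens
    padding = max_seq_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


ids, mask = pad_or_truncate([101, 2377, 2189, 102], max_seq_len=6)
print(ids)   # [101, 2377, 2189, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```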

# Step - 3: Running train

The following command starts the training for the tasks. The log file reporting the loss and metrics, along with the TensorBoard logs, is written to a time-stamped directory. For demonstration, we've put sample logs under the ``train_logs`` directory.

For more details about the training process, refer to <a href= "https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-train">running training</a> in the documentation.

```
!python ../../train.py \
    --data_dir '../../data/bert-base-uncased_prepared_data' \
    --task_file 'tasks_file_snips.yml' \
    --out_dir 'snips_intent_ner_fragment_bert_base' \
    --epochs 3 \
    --train_batch_size 16 \
    --eval_batch_size 32 \
    --grad_accumulation_steps 2 \
    --log_per_updates 50 \
    --max_seq_len 50 \
    --eval_while_train \
    --test_while_train \
    --silent
```
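
The batch size and gradient accumulation settings interact: with a train batch size of 16 and 2 accumulation steps, each optimizer update sees 32 samples. The sketch below works through that arithmetic. The sample count of 10,000 is a made-up figure, `training_schedule` is a hypothetical helper, and the rounding assumes a trailing partial accumulation still triggers an update, which may not match the library's training loop.

```python
import math


def training_schedule(num_samples, train_batch_size, grad_accumulation_steps, epochs):
    """Estimate the effective batch size and the number of optimizer updates."""
    effective_batch_size = train_batch_size * grad_accumulation_steps
    batches_per_epoch = math.ceil(num_samples / train_batch_size)
    # Assumption: a leftover partial accumulation still counts as one update.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accumulation_steps)
    return {
        "effective_batch_size": effective_batch_size,
        "batches_per_epoch": batches_per_epoch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": updates_per_epoch * epochs,
    }


schedule = training_schedule(
    num_samples=10_000,  # hypothetical dataset size
    train_batch_size=16,
    grad_accumulation_steps=2,
    epochs=3,
)
print(schedule)
# {'effective_batch_size': 32, 'batches_per_epoch': 625,
#  'updates_per_epoch': 313, 'total_updates': 939}
```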

# Step - 4: Inferring

You can import and use the ``inferPipeline`` to get predictions for the required tasks.
The trained model and the maximum sequence length to be used need to be specified.

For more details about inferring, refer to <a href="https://multi-task-nlp.readthedocs.io/en/latest/infering.html">infer pipeline</a> in the documentation.

```
import sys
sys.path.insert(1, '../../')
from infer_pipeline import inferPipeline
```