# Slicing Functions with MLflow
----------------------------------------------------------------

In this notebook, we explore Snorkel's slicing functions and how to use them in T2R2 together with MLflow.

### MLflow
Start by running the tracking server, locally or elsewhere:

```
  mlflow server
```

Then set the tracking URI and the experiment name in the config. The defaults are:

```
  experiment_name: 'my_experiment_slicing'
  tags:
      version: 'v1'
  tracking_uri: "http://localhost:5000"
```
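
For orientation, the config fields above correspond roughly to the following calls in MLflow's Python API. This is a minimal sketch, not T2R2's actual implementation: the `MlflowSettings` dataclass and both helper functions are hypothetical names, and `log_example_run` assumes the `mlflow` package is installed and a tracking server is reachable.

```python
from dataclasses import dataclass, field


@dataclass
class MlflowSettings:
    """Hypothetical container mirroring the config fields shown above."""
    experiment_name: str = "my_experiment_slicing"
    tracking_uri: str = "http://localhost:5000"
    tags: dict = field(default_factory=lambda: {"version": "v1"})


def settings_from_config(config: dict) -> MlflowSettings:
    """Build settings from a parsed config dict, falling back to the defaults."""
    defaults = MlflowSettings()
    return MlflowSettings(
        experiment_name=config.get("experiment_name", defaults.experiment_name),
        tracking_uri=config.get("tracking_uri", defaults.tracking_uri),
        tags=dict(config.get("tags", defaults.tags)),
    )


def log_example_run(settings: MlflowSettings, metrics: dict) -> None:
    """Log one run to the tracking server (needs mlflow and a running server)."""
    import mlflow  # imported lazily so the helpers above work without mlflow

    mlflow.set_tracking_uri(settings.tracking_uri)
    mlflow.set_experiment(settings.experiment_name)
    with mlflow.start_run():
        mlflow.set_tags(settings.tags)
        mlflow.log_metrics(metrics)


settings = settings_from_config({"tags": {"version": "v2"}})
print(settings)
# log_example_run(settings, {"accuracy_overall": 0.28})  # requires a running server
```

Keeping the MLflow calls in one function makes it easy to point the same code at a remote server by changing only `tracking_uri`.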

After running, you can view your experiment in the browser at the tracking URI.

### Slicing functions
Slicing functions let you examine the model's performance on subsets of the data, each subset being defined by one slicing function.
Specify the slicing functions in the config, like this:
```
  selectors: 
    - name: slicing
      args: 
        result_file: 'slicing/test_slicing.pickle'
        list_of_slicing_functions: [short, textblob_polarity, long]
```
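
To make the idea concrete, here is a small sketch of what a per-slice score means: each slicing function acts as a predicate over examples, and the metric is computed only on the examples for which it returns true. This is plain Python for illustration, not the Snorkel or T2R2 implementation; the `Example` class, the `slice_accuracy` helper, the toy thresholds, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Example:
    text: str
    label: int
    prediction: int


def slice_accuracy(
    examples: List[Example], predicate: Callable[[Example], bool]
) -> Optional[float]:
    """Accuracy over the examples selected by `predicate` (None if the slice is empty)."""
    selected = [ex for ex in examples if predicate(ex)]
    if not selected:
        return None
    correct = sum(ex.label == ex.prediction for ex in selected)
    return correct / len(selected)


# Toy predicates; any function from an example to a bool can define a slice.
slices: Dict[str, Callable[[Example], bool]] = {
    "overall": lambda ex: True,
    "short": lambda ex: len(ex.text.split()) < 3,
    "long": lambda ex: len(ex.text.split()) >= 3,
}

data = [
    Example("great", label=1, prediction=1),
    Example("not good", label=0, prediction=1),
    Example("a rather long review text", label=1, prediction=1),
    Example("another fairly long one", label=0, prediction=0),
]

for name, predicate in slices.items():
    print(name, slice_accuracy(data, predicate))
```

Slices may overlap, and an example may belong to no slice other than the overall one, so per-slice scores do not have to average out to the overall score.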

You can pick them from the available list in [default_slicing_functions.py](src/t2r2/selector/slicing/default_slicing_functions.py), shown below:

```
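from snorkel.slicing import slicing_function

@slicing_function()
def short(x):
    '''Short texts, fewer than 60 words'''
    return len(x.text.split()) < 60

@slicing_function()
def long(x):
    '''Long texts, more than 100 words'''
    return len(x.text.split()) > 100

# textblob_sentiment is a preprocessor defined elsewhere (not shown here);
# it is expected to set x.polarity before this function runs.
@slicing_function(pre=[textblob_sentiment])
def textblob_polarity(x):
    '''Positive sentiment: polarity above 0.1 (-1 is negative, 1 is positive)'''
    return x.polarity > 0.1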
```

Feel free to add your own and contribute to T2R2!
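
As a starting point for writing your own, here is a hedged sketch of two extra slicing functions in the same style as the defaults. Both `contains_question` and `medium` are hypothetical examples that are not part of T2R2. The `try`/`except` fallback is there only so the snippet runs without Snorkel installed; it does not apply preprocessors.

```python
from types import SimpleNamespace

try:
    from snorkel.slicing import slicing_function
except ImportError:
    # Fallback so the sketch runs without Snorkel: a decorator that returns
    # the function unchanged (it does not apply preprocessors).
    def slicing_function(name=None, resources=None, pre=None):
        def decorator(f):
            return f
        return decorator


@slicing_function()
def contains_question(x):
    '''Texts that contain a question mark (hypothetical example slice)'''
    return "?" in x.text


@slicing_function()
def medium(x):
    '''Texts of 60 to 100 words, the gap between `short` and `long`'''
    return 60 <= len(x.text.split()) <= 100


# A data point only needs a `.text` attribute for these two functions.
sample = SimpleNamespace(text="Is this a question?")
print(bool(contains_question(sample)), bool(medium(sample)))
```

To use such a function, its name would also need to appear in `list_of_slicing_functions`; how T2R2 discovers functions defined outside the default file is not covered in this notebook.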


In [2]:
import os
import t2r2



In [3]:
os.environ['WANDB_DISABLED'] = 'true'

In [4]:
t2r2.loop()

100%|██████████| 12109/12109 [00:10<00:00, 1144.44it/s]
100%|██████████| 2584/2584 [00:02<00:00, 1131.07it/s]
***** Running training *****
  Num examples = 1000
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 500
The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: token_type_ids. If token_type_ids are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 2
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: 

{'loss': 2.2379, 'learning_rate': 0.0, 'epoch': 1.0}


Saving model checkpoint to ClassificationBERT\checkpoint-500
Configuration saved in ClassificationBERT\checkpoint-500\config.json


{'eval_loss': 2.1426098346710205, 'eval_f1_score': 0.03269024651661307, 'eval_slicing_scores_accuracy_overall': 0.244, 'eval_slicing_scores_accuracy_short': 0.256120527306968, 'eval_slicing_scores_accuracy_textblob_polarity': 0.2747603833865815, 'eval_slicing_scores_accuracy_long': 0.22448979591836735, 'eval_runtime': 153.419, 'eval_samples_per_second': 6.518, 'eval_steps_per_second': 3.259, 'epoch': 1.0}


Model weights saved in ClassificationBERT\checkpoint-500\pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ClassificationBERT\checkpoint-500 (score: 0.03269024651661307).


{'train_runtime': 939.6518, 'train_samples_per_second': 1.064, 'train_steps_per_second': 0.532, 'train_loss': 2.237875, 'epoch': 1.0}


***** Running Prediction *****
  Num examples = 100
  Batch size = 2
The following columns in the test set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: token_type_ids. If token_type_ids are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 9
  Batch size = 2
The following columns in the test set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: token_type_ids. If token_type_ids are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.


{'train_results': {'train_runtime': 939.6518,
  'train_samples_per_second': 1.064,
  'train_steps_per_second': 0.532,
  'train_loss': 2.237875,
  'epoch': 1.0},
 'test_results': {'test_loss': 2.0251612663269043,
  'test_f1_score': 0.03977272727272728,
  'test_slicing_scores_accuracy_overall': 0.28,
  'test_slicing_scores_accuracy_short': 0.2807017543859649,
  'test_slicing_scores_accuracy_textblob_polarity': 0.4,
  'test_slicing_scores_accuracy_long': 0.23076923076923078,
  'test_runtime': 15.1103,
  'test_samples_per_second': 6.618,
  'test_steps_per_second': 3.309},
 'control_results': {'test_loss': 2.2905657291412354,
  'test_f1_score': 0.0,
  'test_slicing_scores_accuracy_overall': 0.0,
  'test_slicing_scores_accuracy_short': 0.0,
  'test_slicing_scores_accuracy_textblob_polarity': 0.0,
  'test_slicing_scores_accuracy_long': 0.0,
  'test_runtime': 1.4715,
  'test_samples_per_second': 6.116,
  'test_steps_per_second': 3.398}}
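
The slice scores are easier to compare once they are pulled out of the result dictionary. The sketch below does that for the `test_results` values printed above; `slice_accuracies` is a hypothetical helper, and the key prefix is inferred from this output rather than from a documented T2R2 API.

```python
PREFIX = "test_slicing_scores_accuracy_"

# Values copied from the `test_results` output above.
test_results = {
    "test_loss": 2.0251612663269043,
    "test_f1_score": 0.03977272727272728,
    "test_slicing_scores_accuracy_overall": 0.28,
    "test_slicing_scores_accuracy_short": 0.2807017543859649,
    "test_slicing_scores_accuracy_textblob_polarity": 0.4,
    "test_slicing_scores_accuracy_long": 0.23076923076923078,
}


def slice_accuracies(results: dict, prefix: str = PREFIX) -> dict:
    """Map slice name to accuracy for every key that starts with `prefix`."""
    return {
        key[len(prefix):]: value
        for key, value in results.items()
        if key.startswith(prefix)
    }


scores = slice_accuracies(test_results)
for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    # The difference from the overall accuracy shows which slices lag behind.
    print(f"{name:20s} {acc:.3f} ({acc - scores['overall']:+.3f})")
```

In this run the `textblob_polarity` slice scores highest and the `long` slice lowest. The test set has only 100 examples and the control set only 9, so these gaps are rough estimates.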