# Using the automation script to test IMLY's performance

This notebook shows how to use the automation_script to test the performance of IMLY on a given dataset. The results are recorded in this [sheet](https://docs.google.com/spreadsheets/d/1E5jcq2w42gN8bMIaeaRJpAdhgSVN-2XDJ_YTHe4qfwY/edit?usp=sharing).

## Step 1: Add dataset details to the sheet

Add the following details to the ["Dataset details"](https://docs.google.com/spreadsheets/d/1E5jcq2w42gN8bMIaeaRJpAdhgSVN-2XDJ_YTHe4qfwY/edit?usp=sharing) sheet:

1. Name - Name of your dataset
2. Link - A publicly accessible download link to your dataset. The file must be a CSV.
3. Algorithm - Name of the algorithm you expect IMLY to use on your dataset, e.g. "logistic_regression"
    
## Step 2: Add client secret

To let the automation_script edit the sheet, add the `client_secret.json` file shared with you to `cook-imly/data` in your local repo.
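Before running the script, it can help to confirm that the secret sits where this step says it should. The following is a minimal sketch and not part of automation_script: the helper `expected_secret_path` is hypothetical, and it assumes your local clone is a folder named `cook-imly` and that the notebook runs from somewhere inside it.

```python
from pathlib import Path


def expected_secret_path(start: Path, repo_name: str = "cook-imly") -> Path:
    """Return <repo>/data/client_secret.json for the repo that contains `start`.

    Hypothetical helper: walks upward from `start` until it reaches a folder
    named `repo_name` (assumed to be the root of your local clone).
    """
    for folder in (start, *start.parents):
        if folder.name == repo_name:
            return folder / "data" / "client_secret.json"
    raise FileNotFoundError(f"no '{repo_name}' folder found at or above {start}")


try:
    secret = expected_secret_path(Path.cwd())
    print(secret, "found" if secret.is_file() else "is missing")
except FileNotFoundError as err:
    # Raised when the notebook is not running inside the repo.
    print(err)
```

If the file is reported as missing, copy it into place before moving on to the next step.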

## Step 3: Data preparation

The automation_script accepts the dataset as X (features) and Y (target), so split your dataset into X and Y before running the script. The following sample demonstrates this for a simple dataset.

**Note: X and Y are expected to be passed as dataframes.**


In [1]:
import automation_script
import pandas as pd
import numpy as np
from os import path

dataset_name = "uci_iris" # Name of your dataset as mentioned in the 
dataset_info = automation_script.get_dataset_info(dataset_name)

# Gathering data and converting it into a dataframe
url = dataset_info['url']
data = pd.read_csv(url , delimiter=",", header=None, index_col=False)

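# This part of the preparation is specific to the dataset.
# Encode the class labels in the last column as integers (0, 1, 2).
class_name, index = np.unique(data.iloc[:, -1], return_inverse=True)
data.iloc[:, -1] = index
# Drop the rows of class 2 so that only two classes remain.
data = data.loc[data[4] != 2]
# Features are every column except the last; the target is the last column.
X = data.iloc[:, :-1]
Y = data.iloc[:, -1]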

Using TensorFlow backend.
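The same preparation can be tried offline on a few hand-written rows. This is a sketch under assumptions, not the notebook's own cell: the rows are made up in the shape of the Iris data, and the column name `target` is hypothetical. Since the note above asks for dataframes, while `data.iloc[:, -1]` returns a Series, the sketch keeps Y as a one-column DataFrame. Whether `run_imly()` also accepts a Series is not stated here.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a headerless CSV: four features and a string label.
data = pd.DataFrame([
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
    [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
    [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
])

# np.unique sorts the class names and returns an integer code for each row.
class_names, codes = np.unique(data.iloc[:, -1].to_numpy(), return_inverse=True)

# Keep only classes 0 and 1, mirroring the filter in the cell above.
keep = codes != 2

X = data.iloc[:, :-1].loc[keep]                                   # DataFrame of features
Y = pd.DataFrame({"target": codes[keep]}, index=X.index)          # one-column DataFrame

print(X.shape, Y.shape)
```

Building Y from the codes directly avoids writing integers into a string column, so the features keep their numeric dtypes and the target stays an integer column.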


## Step 4: Run the script

Calling the `run_imly()` function from automation_script processes your dataset with IMLY and records the performance in the sheet mentioned above.

The following arguments are mandatory for the `run_imly()` function:

1. dataset_info - The dataset info gathered previously using `get_dataset_info()`
2. model_name - Name of the algorithm you're planning to use
3. X, Y - The features and target prepared in Step 3
4. test_size - The test_size for train_test_split
    
`params` is an optional argument. You can use it to pass any valid parameters accepted by Keras (sample shown below).

```
params = {
        "units": 1,
        "batch_size": 10,
        "epochs": 100,
        "optimizer": "adam",
        "losses": "binary_crossentropy",
        "activation": "sigmoid"
      }
```
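To change only a few of these settings, one option is to start from the sample values and override selected keys. The sketch below is illustrative: the name `quick_run` and the override values are made up, and passing the dict through a `params=` keyword is an assumption based on the argument name given above.

```python
# Values copied from the sample above.
base_params = {
    "units": 1,
    "batch_size": 10,
    "epochs": 100,
    "optimizer": "adam",
    "losses": "binary_crossentropy",
    "activation": "sigmoid",
}

# Override only what you want to change; every other key keeps its sample value.
quick_run = {**base_params, "epochs": 20, "batch_size": 32}

print(quick_run)

# Assumption: the dict is handed over through the optional `params` keyword, e.g.
# automation_script.run_imly(dataset_info=dataset_info,
#                            model_name='logistic_regression',
#                            X=X, Y=Y, test_size=0.60, params=quick_run)
```

The dict unpacking creates a new dictionary, so `base_params` stays unchanged and can be reused for further runs.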

In [None]:
automation_script.run_imly(dataset_info=dataset_info,
                           model_name='logistic_regression',
                           X=X, Y=Y,
                           test_size=0.60)
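Note that `test_size=0.60` holds out 60% of the rows for testing and trains on the remaining 40%. The sketch below shows the arithmetic, assuming `run_imly()` forwards `test_size` unchanged to scikit-learn's `train_test_split`. The helper `split_sizes` is hypothetical, and the figure of 100 rows assumes the standard 150-row UCI Iris data with one of its three classes dropped in Step 3.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Return (n_train, n_test) for a fractional test_size.

    Follows how scikit-learn sizes the split when only a float test_size
    is given: the test count is rounded up, the remainder goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# Assumed row count: 150 Iris rows minus the 50 rows of the dropped class.
n_train, n_test = split_sizes(100, 0.60)
print(n_train, n_test)  # 40 60
```

A smaller value such as 0.2 leaves more rows for training, which is the more common choice when the dataset is this small.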