# Using Meeshkan for developing Kaggle kernels
For a complete example, including how to submit kernels from the command line, see the example page on [GitHub](https://github.com/Meeshkan/meeshkan-client/tree/kaggle-kernel-example/examples/kaggle).

### Install dependencies

In [None]:
!pip install keras tensorflow meeshkan pandas scikit-learn

In [None]:
import os
import meeshkan
import keras
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, precision_score, make_scorer

Check that the data sources are correctly linked:

In [None]:
print(os.listdir("../input"))
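
If the listing above shows only directory names, a recursive walk can confirm that the data files themselves are present. The following is a minimal sketch; `list_input_files` is a hypothetical helper (not part of Kaggle or Meeshkan), and it assumes the same `../input` path used above.

```python
import os

def list_input_files(root="../input"):
    """Return the sorted relative paths of every file below `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)

# Prints one path per line; prints nothing if the directory is missing or empty.
for path in list_input_files():
    print(path)
```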

### Setting up Meeshkan

To use Meeshkan, you need to:
1. register at [meeshkan.com](https://meeshkan.com) and get your API key
1. set up the Slack integration as instructed in the [docs](https://www.meeshkan.com/docs)
1. initialize the credentials and start the agent.

The third step can be achieved with the `meeshkan.init(token=YOUR_API_KEY)` command. **Make sure to replace the example API key below with your own key!**

In [None]:
meeshkan.init(token="meVLHpGVcpG1QI2jTo1VLJpv96ZRTijCROaj4CQSU1IMQ")  # REPLACE THIS WITH YOUR API_KEY
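
To avoid committing a real key into a shared notebook, the token can be read from the environment instead of being typed into the cell. This is only a sketch: the helper `load_api_key` and the variable name `MEESHKAN_API_KEY` are assumptions made for illustration, and how you set the variable depends on your environment.

```python
import os

def load_api_key(env_var="MEESHKAN_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Meeshkan API key"
        )
    return key

# Usage in the notebook, once the variable is set:
# meeshkan.init(token=load_api_key())
```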

The Meeshkan agent can schedule multiple machine learning jobs, but since this notebook runs everything sequentially, we can simply use the [as_blocking_job](https://meeshkan-client.readthedocs.io/en/latest/#meeshkan.as_blocking_job) decorator to group runs as jobs. Execute the cell below to check that you get Slack notifications.

In [None]:
import time
@meeshkan.as_blocking_job(job_name="test-job", report_interval_secs=10)
def train():
    for i in range(10):
        meeshkan.report_scalar("counter", i)
        time.sleep(2)
        
train()

If everything works as expected, we can get rocking with machine learning!

### Run your job

To keep this demo as simple as possible, we will use only the `Age` column to predict `AdoptionSpeed`. First, load the data and look at the first rows:

In [None]:
train_df = pd.read_csv("../input/train/train.csv")
train_df.head()

Check what the columns look like:

In [None]:
train_df.info()

Let us check how `Age` compares to `AdoptionSpeed`:

In [None]:
train_df.plot(x="Age", y="AdoptionSpeed", kind="scatter")

The correlation is clearly weak, and rows with an age of zero should be taken into account.
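
Both remarks can be put into numbers rather than read off the scatter plot. The sketch below assumes a DataFrame with `Age` and `AdoptionSpeed` columns, as loaded above; `age_speed_summary` is a hypothetical helper, and the toy rows are made up for illustration and are not taken from the competition data.

```python
import pandas as pd

def age_speed_summary(df):
    """Return the Pearson correlation of Age vs AdoptionSpeed and the number of zero ages."""
    return {
        "pearson_r": df["Age"].corr(df["AdoptionSpeed"]),
        "n_zero_age": int((df["Age"] == 0).sum()),
    }

# Toy rows for illustration only; in the notebook, pass train_df instead.
toy_df = pd.DataFrame({"Age": [0, 1, 2, 3, 12], "AdoptionSpeed": [2, 1, 4, 3, 4]})
print(age_speed_summary(toy_df))
```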

In [None]:
TRAIN_COLUMNS = ['Age']

X_train = train_df[TRAIN_COLUMNS].values
y_train = train_df['AdoptionSpeed'].values

In [None]:
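def build_grid_search(pipeline, param_grid):
    # AdoptionSpeed is multiclass, so precision needs an explicit averaging strategy;
    # macro averaging is used here as one reasonable choice.
    return GridSearchCV(pipeline, param_grid, cv=5, return_train_score=True, refit='accuracy',
                        scoring={ 'accuracy': make_scorer(accuracy_score),
                                  'precision': make_scorer(precision_score, average='macro')
                                },
                        verbose=1)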

@meeshkan.as_blocking_job(job_name="grid-search", report_interval_secs=60)
def run_grid_search(grid_search):
    grid_search.fit(X_train, y_train)
    # print('Best test score accuracy is:', grid_search.best_score_)
    # Return the cross-validation results as a DataFrame for inspection
    return pd.DataFrame(grid_search.cv_results_)

In [None]:
from sklearn.pipeline import Pipeline

# Only the classifier step is searched; the pipeline has no preprocessing step
param_grid = [
    { 
        'classifier': [ SVC(random_state=42, probability=True) ], # Probability to use in voting later
        'classifier__C': np.logspace(-1, 1, 3),
        'classifier__kernel': ['linear', 'poly', 'rbf'],
        'classifier__gamma': ['auto', 'scale']
    }
]

pipeline = Pipeline([
        ('classifier', None) # Expected to be filled by grid search
    ])

svm_grid_search = build_grid_search(pipeline=pipeline, param_grid=param_grid)
svm_cv_results = run_grid_search(grid_search=svm_grid_search)
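
The raw `cv_results_` dictionary of a grid search is wide and hard to read. One way to condense it is sketched below; `summarize_cv_results` is a hypothetical helper, and it assumes multi-metric scoring with a metric named `accuracy`, as configured in `build_grid_search`. The toy dictionary is made up to show the shape of the input and does not come from a real run.

```python
import pandas as pd

def summarize_cv_results(cv_results, metric="accuracy"):
    """Tabulate a multi-metric cv_results_ dict, best-ranked candidates first."""
    df = pd.DataFrame(cv_results)
    columns = [
        "params",
        f"mean_test_{metric}",
        f"std_test_{metric}",
        f"rank_test_{metric}",
    ]
    return df[columns].sort_values(f"rank_test_{metric}").reset_index(drop=True)

# Made-up values for illustration; in the notebook, pass grid_search.cv_results_.
toy_results = {
    "params": [{"C": 0.1}, {"C": 1.0}],
    "mean_test_accuracy": [0.30, 0.35],
    "std_test_accuracy": [0.01, 0.02],
    "rank_test_accuracy": [2, 1],
}
print(summarize_cv_results(toy_results))
```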

### Teardown

Finally, stop the Meeshkan agent cleanly:

In [None]:
meeshkan.stop()