# Notebook to train a model

We train a model on a Kaggle dataset to detect credit card fraud. The model is then uploaded as a versioned artifact so that it can be used in production.

### Adding dependencies to the environment

This notebook only needs the PEX created previously, so we load it with the [punch_dependencies](https://punch-1.gitbook.io/punch-doc/v/welcome-to-the-punch/applications/jupyter/magic-commands#punchdependencies) magic cell.

In [None]:
%%punch_dependencies
additional-pex:demo:dependencies:1.0.0

++ java -Xmx1g -Xms256m -Dlog4j.configurationFile=/punch/conf/log4j2/log4j2-stdout.xml -cp /punch/resourcectl.jar com.github.punchplatform.resourcectl.ResourceCtl -u http://artifacts-server.punch-artifacts:4245 download -r additional-pex:demo:dependencies:1.0.0 -o /usr/share/punch/extlib/python


Resource additional-pex:demo:dependencies:1.0.0 downloaded to /usr/share/punch/extlib/python/dependencies-1.0.0.pex



### Importing modules

In [1]:
from sklearn import tree
from sklearn.metrics import accuracy_score
import mlflow

### Reading data from S3
Punch provides magic cells to read data from different sources. If your Jupypunch was deployed with preconfigured databases, you do not need to enter your login credentials again.

Here, the training data set is downloaded from a MinIO bucket named "demo" and stored in a variable called "train". The testing data set is loaded into the "test" variable in the same way (see [punch_source](https://punch-1.gitbook.io/punch-doc/v/welcome-to-the-punch/applications/jupyter/magic-commands#punchsource-and-punchsink)).

In [2]:
%%punch_source --type s3 --name train -o 
bucket: demo
prefix: train/train.csv

Data is available in train variable.
Execution time: 0:00:00.418407


In [3]:
%%punch_source --type s3 --name test -o 
bucket: demo
prefix: test/test.csv

Data is available in test variable.
Execution time: 0:00:00.166920


### Removing unused columns

The Punch source node adds some columns, such as *_ppf_path* and *_ppf_last_modified*, which are useful in some contexts but unnecessary for our example. We keep only the feature columns and the *fraud* label.

In [4]:
train = train[['distance_from_home', 'distance_from_last_transaction',
       'ratio_to_median_purchase_price', 'repeat_retailer', 'used_chip',
       'used_pin_number', 'online_order', 'fraud']]
train.head(2)

Unnamed: 0,distance_from_home,distance_from_last_transaction,ratio_to_median_purchase_price,repeat_retailer,used_chip,used_pin_number,online_order,fraud
0,4.805367,1.379477,1.23696,1.0,0.0,0.0,0.0,0.0
1,27.052054,1.76607,0.415689,1.0,0.0,0.0,0.0,0.0


In [5]:
test = test[['distance_from_home', 'distance_from_last_transaction',
       'ratio_to_median_purchase_price', 'repeat_retailer', 'used_chip',
       'used_pin_number', 'online_order', 'fraud']]
test.head(2)

Unnamed: 0,distance_from_home,distance_from_last_transaction,ratio_to_median_purchase_price,repeat_retailer,used_chip,used_pin_number,online_order,fraud
0,11.188842,0.067784,1.659848,1.0,0.0,0.0,1.0,0.0
1,8.359728,0.186258,0.495259,1.0,1.0,0.0,0.0,0.0
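
The same column list is written out for both data sets above. As an alternative, here is a minimal sketch that drops the metadata columns by name prefix instead. It assumes that every column added by the source node starts with `_ppf_` (this notebook only names *_ppf_path* and *_ppf_last_modified*). The helper `drop_punch_metadata` and the sample values are hypothetical and not part of Punch.

```python
import pandas as pd


def drop_punch_metadata(df: pd.DataFrame, prefix: str = "_ppf_") -> pd.DataFrame:
    """Return a copy of df without the columns whose name starts with prefix."""
    keep = [col for col in df.columns if not str(col).startswith(prefix)]
    return df[keep].copy()


# Tiny illustrative frame; the values are made up for the example.
sample = pd.DataFrame({
    "distance_from_home": [4.8, 27.1],
    "fraud": [0.0, 0.0],
    "_ppf_path": ["train/train.csv", "train/train.csv"],
    "_ppf_last_modified": ["2023-01-01", "2023-01-01"],
})

cleaned = drop_punch_metadata(sample)
print(list(cleaned.columns))
```

Applying one helper to both `train` and `test` keeps the two frames aligned on the same columns, which the model expects at prediction time.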


### Training the model

We train a decision tree classifier on the training data, using every column except *fraud* as features and *fraud* as the label.

In [6]:
model = tree.DecisionTreeClassifier()
model = model.fit(train.drop("fraud", axis=1).values, train["fraud"].values)

### Testing the model

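We evaluate the model on the test data by comparing its predictions with the *fraud* labels. The value below is the resulting accuracy.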

In [7]:
prediction = model.predict(test.drop("fraud", axis=1).values)
accuracy_score(test["fraud"], prediction)



0.9999866666666667
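
Accuracy alone can be hard to interpret when one class is rare, which is often the case for fraud labels (the class balance of this data set is not shown in the notebook). The sketch below uses made-up labels, not the notebook's data, to show how a classifier that never predicts fraud can still score a high accuracy, and how recall and the confusion matrix expose the problem.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Synthetic labels for illustration only: 95 legitimate, 5 fraudulent.
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts "not fraud".
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
# zero_division=0 avoids a warning when no positive is predicted.
prec = precision_score(y_true, y_pred, zero_division=0)
rec = recall_score(y_true, y_pred, zero_division=0)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
print(cm)
```

The same calls could be applied to `test["fraud"]` and `prediction` from the cell above to check how many fraudulent transactions the decision tree actually catches.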

### Saving the model and uploading it as an artifact

Once satisfied with the model's results, we can upload it in the desired packaging format (here MLflow). The lambda function receives a path and saves the model there.

In [8]:
%%punch_upload_model -g demo -n credit_card -v 1.0.0 -o
lambda path: mlflow.sklearn.save_model(model, path)

++ java -Xmx1g -Xms256m -Dlog4j.configurationFile=/punch/conf/log4j2/log4j2-stdout.xml -cp /punch/resourcectl.jar com.github.punchplatform.resourcectl.ResourceCtl -u http://artifacts-server.punch-artifacts:4245 upload -f /tmp/punch_upload_model/demo/credit_card/1.0.0/artifact_credit_card_1.0.0.zip -o


Resource uploaded : model:demo:credit_card:1.0.0
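
To make the role of the lambda clearer, here is a hypothetical sketch of the callback pattern: a function hands a fresh path to a user-supplied save function, then zips whatever was written there. This is not Punch's implementation. The helper `package_model`, the placeholder `save_dummy`, and the directory layout are assumptions; only the archive name pattern is taken from the log above.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def package_model(save_fn: Callable[[str], None], name: str, version: str) -> Path:
    """Call save_fn with a not-yet-existing path, then zip what it wrote."""
    workdir = Path(tempfile.mkdtemp())
    model_dir = workdir / "model"
    # The callback decides the on-disk format (MLflow, pickle, ...).
    save_fn(str(model_dir))
    archive = shutil.make_archive(
        str(workdir / f"artifact_{name}_{version}"), "zip", root_dir=model_dir
    )
    return Path(archive)


def save_dummy(path: str) -> None:
    """Placeholder for something like mlflow.sklearn.save_model(model, path)."""
    Path(path).mkdir(parents=True)
    (Path(path) / "model.txt").write_text("placeholder")


archive_path = package_model(save_dummy, "credit_card", "1.0.0")
with zipfile.ZipFile(archive_path) as zf:
    names = zf.namelist()
print(archive_path.name, names)
```

Passing a function rather than a saved file lets the caller choose the packaging format while the uploader keeps control of where the files are written.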
