# Objective
Create a machine learning model with JAI to solve a classification problem with unbalanced classes. In this first attempt we will use the [Kaggle Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud) dataset. Its 30 columns hold mostly anonymized information about credit card transactions, together with the `Class` label that marks each transaction as fraudulent or not. Have fun! And if you have any questions, check our documentation or ask us on our Slack =].

# Imports 

In [None]:
import pandas as pd
from jai import Jai
from sklearn import metrics
from tabulate import tabulate
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Creating your JAI account (if you don't have one already)

In [None]:
# ATTENTION: If you haven't generated your key yet, just run the command below
#Jai.get_auth_key(email='email@mail.com', firstName='Jai', lastName='Z')

# Instantiating JAI

In [None]:
# Insert the authentication key you received by email
AUTH_KEY = "insert_your_auth_key_here"
j = Jai(AUTH_KEY)

# Loading the dataset and checking basic information

In [None]:
df = pd.read_csv('https://myceliademo.blob.core.windows.net/example-classification-cc-default/creditcard.csv', index_col=0)


In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df["Class"].value_counts()

**Observations**: As we can see, this dataset contains only numerical data. Another very important characteristic is how skewed the classes are: less than 1% of the rows belong to the positive class (`Class` = 1). This will affect how we train our model.
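
To make the skew concrete, here is a minimal sketch using only the standard library. The labels are made up for illustration (they are not the real dataset), and `class_shares` is a hypothetical helper, not part of JAI. It shows why plain accuracy is a poor yardstick here: always predicting the majority class already scores very high.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of examples that belongs to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Made-up labels for illustration only: 995 negatives (0) and 5 positives (1)
toy_labels = [0] * 995 + [1] * 5

shares = class_shares(toy_labels)

# Accuracy of a "model" that always predicts the majority class
majority_baseline_accuracy = max(shares.values())

print(shares)
print(majority_baseline_accuracy)
```

On the real dataframe, `df["Class"].value_counts(normalize=True)` gives the same kind of shares.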

# Classification model 

In [None]:
# Splitting the dataset here only to show how to use the j.predict method further on
# When using j.fit in your real application, this is not necessary
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(["Class"], axis=1), df["Class"], test_size=0.3, random_state=42)

In [None]:
# For classification models, we need to pass JAI a dataframe containing the label
train = pd.concat([X_train, y_train], axis=1)

In [None]:
# Checking the distribution of the class after the split
train["Class"].value_counts()

Now we are going to train, validate and test our model with **j.fit**. This will create a **collection** inside JAI containing one **vector for each row** of our train dataset. Each vector is a numerical representation of a row that compresses its information and extracts its most important characteristics, so that the vectors of similar examples (rows) end up close to each other in the vector space =].
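
As a rough intuition for "close to each other in the vector space", the sketch below compares distances between hand-written toy vectors. The three-dimensional vectors and their values are invented for illustration; they are not vectors produced by JAI, and Euclidean distance is only one possible way to measure closeness.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical vectors, made up for illustration
negative_a = [0.9, 0.1, 0.0]
negative_b = [0.8, 0.2, 0.1]
positive_a = [-0.7, 0.9, 0.5]

same_class = euclidean_distance(negative_a, negative_b)
different_class = euclidean_distance(negative_a, positive_a)

# In a well-trained space, rows of the same class should sit closer together
print(same_class < different_class)
```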

In [None]:
j.fit(
    # Name of your collection inside JAI
    name="cc_fraud_supervised",

    # data should always receive a dataframe, even if it has only one column
    data=train,

    # Type of model you want; see the JAI documentation for the other options
    db_type='Supervised',

    # Set this parameter to True if you want to overwrite an already created collection
    overwrite=False,

    # verbose=2 shows the loss graph as well as the metrics results
    verbose=2,

    # A stratified split guarantees that the same proportion of both classes is kept in train, validation and test
    split={'type': 'stratified', "split_column": "Class"},

    # With task set to *metric_classification* we use Supervised Contrastive Loss, which tries to pull examples
    # of the same class closer together and push examples of different classes apart
    label={"task": "metric_classification", "label_name": "Class"}
)
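
To illustrate what a stratified split does, here is a minimal standard-library sketch. It is not JAI's implementation: `stratified_split` is a hypothetical helper, the labels are made up, and it only splits into two parts rather than train, validation and test. The idea is the same, though: split each class separately so the rare class keeps its share everywhere.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=42):
    """Split row positions so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)

    # Group row positions by their class label
    by_class = defaultdict(list)
    for position, label in enumerate(labels):
        by_class[label].append(position)

    train_idx, test_idx = [], []
    for positions in by_class.values():
        # Shuffle within the class, then cut the same fraction from every class
        rng.shuffle(positions)
        n_test = round(len(positions) * test_fraction)
        test_idx.extend(positions[:n_test])
        train_idx.extend(positions[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up labels for illustration only: 1% positives
toy_labels = [0] * 990 + [1] * 10

train_idx, test_idx = stratified_split(toy_labels, test_fraction=0.2)
train_counts = Counter(toy_labels[i] for i in train_idx)
test_counts = Counter(toy_labels[i] for i in test_idx)

print(train_counts, test_counts)
```

A purely random split of such a small positive class could easily leave one part with no positives at all, which is what stratification avoids.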

# Checking your collection information

In [None]:
# List all collections in your subscription and some info about them
j.info

In [None]:
# Download the generated vectors. If you have too many vectors, this can take a while
vectors = j.download_vectors('cc_fraud_supervised')

In [None]:
len(vectors)

In [None]:
vectors[0]

In [None]:
# The default size of each vector for the Supervised is 64
len(vectors[0])

**Hurray \0/!!!** Your model is now deployed and ready to be consumed by your applications. Below we show two ways to apply your model to new data =].

# Making predictions and analysing the results

## Predictions without predict_proba

In [None]:
# Now we will make the predictions
# In this case, it will use 0.5 as the threshold to return the predicted class
ans = j.predict(

    # Collection to be queried
    name='cc_fraud_supervised',

    # This makes the answer come back as a dataframe
    as_frame=True,

    # Pass a dataframe with the examples to classify
    data=X_test
)


In [None]:
# ATTENTION: JAI ALWAYS RETURNS THE ANSWERS ORDERED BY ID! Assigning y_test like this aligns it on the index and avoids mismatches.
ans["y_true"] = y_test

In [None]:
print(tabulate(ans.head(), headers='keys', tablefmt='rst'))

In [None]:
print(metrics.classification_report(ans["y_true"], ans["predict"], target_names=['0', '1']))
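
The classification report lists precision, recall and F1 per class; with classes this unbalanced, the row for class 1 is the one to watch. As a sketch of what those numbers mean, the block below computes them by hand on made-up predictions (not results from this model). `precision_recall_f1` is a hypothetical helper written for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels and predictions for illustration only
toy_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(toy_true, toy_pred)
print(precision, recall, f1)
```

Here two of the three predicted positives are correct (precision 2/3) and two of the four real positives are found (recall 1/2).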

## Predictions using predict_proba

In [None]:
ans = j.predict(
    
    # Collection to be queried
    name='cc_fraud_supervised',
    
    # Return the predicted probabilities
    predict_proba=True,

    # This makes the answer come back as a dataframe
    as_frame=True,

    # Pass a dataframe with the examples to classify
    data=X_test
)



In [None]:
# ATTENTION: JAI ALWAYS RETURNS THE ANSWERS ORDERED BY ID! Assigning y_test like this aligns it on the index and avoids mismatches.
ans["y_true"] = y_test

In [None]:
print(tabulate(ans.head(), headers='keys', tablefmt='rst'))
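
Having probabilities instead of a fixed 0.5 cut lets you choose your own threshold, which often matters when the positive class is rare. The sketch below uses made-up probabilities; `apply_threshold` is a hypothetical helper, and treating a probability equal to the threshold as positive is an assumption of this example, not documented JAI behaviour.

```python
def apply_threshold(positive_probabilities, threshold=0.5):
    """Turn probabilities of the positive class into 0/1 predictions."""
    # Assumption: a probability exactly at the threshold counts as positive
    return [1 if p >= threshold else 0 for p in positive_probabilities]


# Made-up probabilities of class 1, for illustration only
toy_proba = [0.02, 0.10, 0.35, 0.48, 0.51, 0.93]

default_preds = apply_threshold(toy_proba)
lenient_preds = apply_threshold(toy_proba, threshold=0.3)

print(default_preds)
print(lenient_preds)
```

Lowering the threshold flags more rows as positive, which usually trades precision for recall. With the dataframe returned above, the equivalent would be something like `(ans["1"] >= 0.3).astype(int)`, assuming the class-1 probabilities are in the column named `"1"`.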

In [None]:
# Calculating AUC Score using the predictions of examples being 1
roc_auc_score(ans["y_true"], ans["1"])
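
One way to read the AUC score: it is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes exactly that on made-up scores. `pairwise_auc` is a hypothetical helper for illustration; it compares every positive with every negative, so `roc_auc_score` remains the practical choice on real data.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores for illustration only
toy_true = [0, 0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.8, 0.35, 0.9]

toy_auc = pairwise_auc(toy_true, toy_scores)
print(toy_auc)
```

Because it only depends on ranking, the AUC does not change with the threshold you pick, which makes it a convenient summary for unbalanced problems.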

# Making your predictions using the REST API

In [None]:
# Import the requests library
import requests

# Set Authentication header
header={'Auth': AUTH_KEY}

# Set collection name
db_name = 'cc_fraud_supervised'

# Model inference endpoint
url_predict = f"https://mycelia.azure-api.net/predict/{db_name}"

# json body
# Note that we need to provide a column named 'id'
# X_test already excludes the 'Class' label, so only feature columns are sent
body = X_test.rename_axis('id').reset_index().head().to_dict(orient='records')

# Make the request
ans = requests.put(url_predict, json=body, headers=header)
ans.json()
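
For reference, the request body is a list of records, one dictionary per row, each carrying an `id` key next to the feature values. The sketch below builds such a body by hand with the standard library; the rows, ids and values are made up, and `build_records` is a hypothetical helper, not part of JAI or requests.

```python
import json


def build_records(rows, ids):
    """Attach an 'id' to each row dictionary, mimicking a list-of-records body."""
    if len(rows) != len(ids):
        raise ValueError("rows and ids must have the same length")
    return [{"id": row_id, **row} for row_id, row in zip(ids, rows)]


# Made-up feature rows for illustration only (real rows have many more columns)
toy_rows = [
    {"V1": -1.36, "Amount": 149.62},
    {"V1": 1.19, "Amount": 2.69},
]
toy_ids = [10, 11]

toy_body = build_records(toy_rows, toy_ids)

# This is the JSON text that would travel in the request
toy_payload = json.dumps(toy_body)
print(toy_payload)
```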