# Welcome to Jovyan AI

_Jovyan AI is an AI copilot for data science and machine learning right inside JupyterLab._

This notebook is a step-by-step guide to using Jovyan.

## Data

For this tutorial, we will use a Kaggle Playground competition dataset.  
The goal is to __Predict Calorie Expenditure__ (Season 5, Episode 5) based on individuals' characteristics and activities.  
__Please execute the cells below to load the data.__

In [None]:
import pandas as pd

In [None]:
data = pd.read_csv('sample_data/train.csv')

In [None]:
data.columns

# Cell Assistant

## Generate code

Jovyan Cell Assistant is integrated inside each cell of the notebook.  
It is ideal for generating or modifying code quickly without leaving your flow.  
Let's start with an example.

__Select the cell below__. A blue __"Generate"__ button appears; click it to open the prompt area.

Then ask Jovyan to generate a two-column grid of distribution charts for all the columns.  
You can copy this example prompt:
```text
Generate a grid (2 columns) of distribution charts for all the columns
```
Click __"Submit"__ to send the request. Once the code is generated, choose __"Accept and Run"__ to execute it.
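The code Jovyan returns will vary from run to run. As a rough illustration only, the sketch below shows the kind of code such a prompt might produce. The helper name `plot_distribution_grid` and the small `demo` frame are hypothetical stand-ins, not actual Jovyan output; in this notebook you would pass `data` instead.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_distribution_grid(df, n_cols=2):
    """Draw one distribution chart per column on a grid with n_cols columns."""
    n_rows = math.ceil(len(df.columns) / n_cols)
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(6 * n_cols, 3 * n_rows), squeeze=False
    )
    flat_axes = axes.ravel()

    for ax, column in zip(flat_axes, df.columns):
        if pd.api.types.is_numeric_dtype(df[column]):
            # Numerical column: histogram of the values
            ax.hist(df[column].dropna(), bins=30)
        else:
            # Categorical column: bar chart of the category counts
            counts = df[column].value_counts()
            ax.bar(counts.index.astype(str), counts.values)
        ax.set_title(column)

    # Hide the leftover axes when the column count is not a multiple of n_cols
    for ax in flat_axes[len(df.columns):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig


# Hypothetical stand-in data; replace `demo` with `data` in this notebook
demo = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [25, 40, 31, 58],
    "Duration": [10.0, 22.5, 5.0, 17.0],
})
fig = plot_distribution_grid(demo, n_cols=2)
```

Numerical columns get a histogram and non-numerical ones a count bar chart, which is one reasonable reading of "distribution chart"; the assistant may well choose a different plotting library or chart type.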


## Modify code

You have generated your first cell with Jovyan! 
Now let's try the __Modify Code__ feature.

Select the cell with all the charts above. In the input area, the button now reads __Modify__. Click it to open the cell assistant again.

__Challenge__: Ask the assistant to "make the charts more colorful", for instance. New code is generated, and you can review the diff and decide whether to keep or reject the change.

## Shortcuts

All the features above have shortcuts so you can stay in the flow:
* Activate cell assistant - __Cmd+K__ (Mac) or __Ctrl+K__ (Linux/Windows)
* Send request - __Enter__, or cancel - __Escape__
* Review code with 3 options:
  - Accept and Run: __Shift+Enter__
  - Reject: __Escape__
  - Accept only: __Enter__

__Challenge__: Select the cell below and ask the assistant to split the data into train/test sets __using only shortcuts__.
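For reference, a train/test split of this kind is commonly written with scikit-learn's `train_test_split`. The sketch below is an assumption about what the assistant might generate, not its actual output: the helper `split_features_target`, the target column name `Calories`, and the 80/20 split are all illustrative choices, and the `demo` frame is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target, test_size=0.2, random_state=42):
    """Separate the target column from the features, then split into train/test."""
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Synthetic stand-in for the real dataset (values are random, not real data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": rng.integers(20, 80, size=100),
    "Duration": rng.uniform(1, 30, size=100),
    "Calories": rng.uniform(1, 300, size=100),  # assumed target column name
})

demo_X_train, demo_X_test, demo_y_train, demo_y_test = split_features_target(
    demo, target="Calories"
)
print(demo_X_train.shape, demo_X_test.shape)  # (80, 2) (20, 2)
```

In this notebook, the equivalent call would be along the lines of `X_train, X_test, y_train, y_test = split_features_target(data, "Calories")`, assuming the target column is indeed named `Calories`.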

In [None]:
### Generate code here with only shortcuts

# Chat Assistant

__Jovyan Chat Assistant__ is integrated in the sidebar.

You can activate it with the shortcut __Option+B__ (Mac) or __Alt+B__ (Linux/Windows).  
Another option is to click the Jovyan AI icon in the right sidebar.

The Chat Assistant can read and reason over all the cells in your notebook.  
It is an excellent tool for reviewing the notebook and brainstorming.

## Brainstorm with Chat Assistant
__Challenge__: Activate the Chat Assistant and ask it:
```
Analyse the notebook and propose 3 machine learning models that would perform best on this dataset
```

## Fix Error with Chat Assistant

Sometimes, bugs in data science code can "fail silently": the code runs without errors, but the model does not perform well.

The Chat Assistant is particularly useful in such a case because it can spot issues our eyes may not catch.

__First, please execute the cell below.__

In [10]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
import numpy as np

# Identify categorical and numerical features
categorical_features = ['Sex']
numerical_features = ['Age', 'Height', 'Weight', 'Duration', 'Heart_Rate',
       'Body_Temp']

# Create preprocessing steps for numerical and categorical features
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(handle_unknown='ignore', drop='first'), categorical_features) # drop='first' avoids multicollinearity
    ],
)

# Create the pipeline with preprocessing and the model
lr_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    # Proposed change for the 'regressor' step in the Pipeline
    ('regressor', LinearRegression(fit_intercept=False))
])

# Fit the pipeline to the training data
lr_pipeline.fit(X_train, y_train)

# Make predictions on the test data
y_pred = lr_pipeline.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("Linear Regression Model Performance (using Pipeline):")
print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R-squared (R2): {r2}")


Linear Regression Model Performance (using Pipeline):
Mean Squared Error (MSE): 2033.083496567852
Root Mean Squared Error (RMSE): 45.08972717335793
R-squared (R2): 0.4754166342334265


__Ask Jovyan for review__

This model achieves an R2 of about 0.45-0.5, which is not terrible, but compared to other solutions on the leaderboard it is surprisingly low.
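To put that figure in context, R2 measures how much better the predictions are than always predicting the mean of the target: 1 is a perfect fit and 0 is no better than the mean. The sketch below recomputes R2 by hand and checks it against `r2_score`. The helper `r2_by_hand` and the four toy values are made up for illustration and are not taken from this dataset.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_by_hand(y_true, y_pred):
    """R2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot


# Toy values, chosen only to make the arithmetic easy to follow
y_true_demo = np.array([10.0, 20.0, 30.0, 40.0])
y_pred_demo = np.array([12.0, 18.0, 33.0, 37.0])

manual_r2 = r2_by_hand(y_true_demo, y_pred_demo)      # 1 - 26 / 500 = 0.948
sklearn_r2 = r2_score(y_true_demo, y_pred_demo)

# A model that always predicts the mean scores exactly 0
mean_only_r2 = r2_by_hand(y_true_demo, np.full(4, y_true_demo.mean()))
```

Read this way, an R2 near 0.47 means the pipeline explains only about half of the variance in the target.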

__Challenge__: Ask Jovyan to investigate this cell and check if it can detect the issue.  
Select the cell and use __Option+L__ (Mac) or __Alt+L__ (Linux/Windows) to make the Chat Assistant focus on it.
Prompt example:
```
Review this cell, I suspect some issues because the performance is worse than expected
```

__Hint__: After fixing the issue, the R2 score should rise above 0.95!

# Next steps

__Congratulations!__ You are now familiar with Jovyan AI features.  
Feel free to play around in this demo, but please note that the compute resources here are limited and this environment is not secure enough for your own data.

To start leveraging the full power of Jovyan AI, check out the steps to:
- Install on your local JupyterLab [HERE](https://doc.jovyan-ai.com/installation/local.html)
- Or use it directly on Google Colab [HERE](https://doc.jovyan-ai.com/installation/colab.html)