In [1]:
# for loading the dataset
import seaborn as sns

In [2]:
# load the magic extension
%load_ext oci_genai_magics

OCIGenaiMagics extension loaded...
List of magic commands available:
ask
ask_code
ask_data
clear_history
show_variables
show_model_config


In [3]:
%show_model_config

Model configuration defined in config.py:
* Model:  meta.llama-3.1-70b-instruct
* Endpoint:  https://inference.generativeai.eu-frankfurt-1.oci.oraclecloud.com
* Temperature:  0.1
* Max_tokens:  1024


In [None]:
%ask Create a report on Larry Ellison. Limit to 5 points.

In [None]:
%ask "is he a sailor?"

In [None]:
%clear_history

In [None]:
%ask "is he a sailor?"

In [None]:
cities = ["Naples", "Rome", "Treviso", "London", "New York"]
counts = [1, 2, 3]

In [None]:
%ask_data "Analyze the cities list and print only names of cities in UK. \
Print the names in alphabetical order. \
Count the number of Italian cities in the list"

In [None]:
%ask_data "Sum all the values in counts and print the result."
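
The answers the model returns for the two prompts above can be checked against a few lines of plain Python. This is only a sketch: the `COUNTRY_OF` lookup table is an assumption added here for illustration (the notebook leaves the geography to the model), and every name other than `cities` and `counts` is hypothetical.

```python
# Lists copied from the notebook cell above
cities = ["Naples", "Rome", "Treviso", "London", "New York"]
counts = [1, 2, 3]

# Assumed lookup table, hand-written for this check only
COUNTRY_OF = {
    "Naples": "Italy",
    "Rome": "Italy",
    "Treviso": "Italy",
    "London": "UK",
    "New York": "USA",
}

# UK cities in alphabetical order
uk_cities = sorted(city for city in cities if COUNTRY_OF[city] == "UK")

# Number of Italian cities in the list
italian_count = sum(1 for city in cities if COUNTRY_OF[city] == "Italy")

# Sum of all values in counts
total = sum(counts)

print(uk_cities)
print(italian_count)
print(total)
```

Comparing these values with the `%ask_data` output is a quick way to spot a hallucinated answer.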

In [4]:
# load a dataset
titanic = sns.load_dataset("titanic")

In [5]:
%show_variables

User-defined variables in the current session:
* titanic:      survived  pclass     sex   age  sibsp  parch     fare embarked   class  \
0           0       3    male  22.0      1      0   7.2500        S   Third   
1           1       1  female  38.0      1      0  71.2833        C   First   
2           1       3  female  26.0      0      0   7.9250        S   Third   
3           1       1  female  35.0      1      0  53.1000        S   First   
4           0       3    male  35.0      0      0   8.0500        S   Third   
..        ...     ...     ...   ...    ...    ...      ...      ...     ...   
886         0       2    male  27.0      0      0  13.0000        S  Second   
887         1       1  female  19.0      0      0  30.0000        S   First   
888         0       3  female   NaN      1      2  23.4500        S   Third   
889         1       1    male  26.0      0      0  30.0000        C   First   
890         0       3    male  32.0      0      0   7.7500        Q   Thi

In [None]:
%ask_data "Analyze the titanic dataset and provide a detailed report."

In [None]:
%ask_data "How many records and columns are in the titanic dataframe?"
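
The model's answer to this question can be cross-checked with `DataFrame.shape`. A minimal sketch, assuming pandas is installed: `toy` is a hypothetical stand-in built from the first four rows and a few columns of the `%show_variables` output above, so the block runs without downloading the dataset, and `summarize_shape` is a helper name invented here. In the notebook, pass `titanic` instead of `toy`.

```python
import pandas as pd


def summarize_shape(df: pd.DataFrame) -> dict:
    """Return the number of records and columns, plus the column names."""
    n_records, n_columns = df.shape
    return {
        "records": n_records,
        "columns": n_columns,
        "column_names": list(df.columns),
    }


# Stand-in frame: first four rows of the output shown earlier, three columns only
toy = pd.DataFrame(
    {
        "survived": [0, 1, 1, 1],
        "pclass": [3, 1, 3, 1],
        "sex": ["male", "female", "female", "female"],
    }
)

summary = summarize_shape(toy)
print(summary)
```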

In [None]:
%ask_data "Analyze the titanic dataset and show the correlation matrix."
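
For comparison with whatever the model produces, a correlation matrix can also be computed directly in pandas. This is a sketch under assumptions: `numeric_correlation` is a hypothetical helper, and `toy` reuses the first four rows of the `%show_variables` output rather than the full dataset. Non-numeric columns such as `sex` are dropped before calling `corr()`.

```python
import pandas as pd


def numeric_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix over the numeric columns only."""
    # Text columns cannot be correlated directly, so keep numeric dtypes only
    return df.select_dtypes(include="number").corr()


# Stand-in frame: first four rows of the output shown earlier
toy = pd.DataFrame(
    {
        "survived": [0, 1, 1, 1],
        "pclass": [3, 1, 3, 1],
        "sex": ["male", "female", "female", "female"],
        "fare": [7.2500, 71.2833, 7.9250, 53.1000],
    }
)

corr = numeric_correlation(toy)
print(corr.round(2))
```

With only four rows the coefficients mean little; the point is the shape of the call, which works unchanged on the full `titanic` dataframe.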

In [None]:
# another famous dataset
iris = sns.load_dataset("iris")

In [None]:
%ask_data "Analyze the iris dataset and provide a detailed report."

In [None]:
%ask_data "Analyze the iris dataset. Do you think it is possible to identify the species from the features?"

In [None]:
%ask_data "Analyze the iris dataset. Can you suggest any redundant features to remove?"
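
One common way to back up a claim of redundancy is to look for pairs of features with a very high absolute correlation. The sketch below assumes pandas is installed; `highly_correlated_pairs` and its `threshold` parameter are hypothetical names, and the values in `toy` are made up purely to exercise the function (they are not iris measurements). In the notebook, pass `iris` instead.

```python
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """List numeric column pairs whose absolute Pearson correlation >= threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    columns = list(corr.columns)
    pairs = []
    for i, left in enumerate(columns):
        # Only look at columns to the right to avoid duplicates and self-pairs
        for right in columns[i + 1:]:
            value = float(corr.loc[left, right])
            if value >= threshold:
                pairs.append((left, right, round(value, 3)))
    return pairs


# Made-up toy values: x2 is almost a multiple of x1, x3 is unrelated
toy = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "x2": [2.0, 4.0, 6.0, 8.0, 10.5],
        "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
)

pairs = highly_correlated_pairs(toy, threshold=0.9)
print(pairs)
```

A high correlation is only a hint that one feature of the pair might be dropped; whether doing so hurts a classifier still has to be tested.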

In [6]:
%ask_code "Give me the code to train a classification model on the titanic dataset, using catboost, to predict survival. Add comments to the code."

```python
# Import necessary libraries
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Define features (X) and target (y)
X = titanic.drop(['survived', 'alive'], axis=1)  # Drop target variables
y = titanic['survived']

# Convert categorical variables to categorical type
categorical_cols = X.select_dtypes(include=['object', 'category']).columns
X[categorical_cols] = X[categorical_cols].apply(lambda x: x.astype('category'))

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=100, random_state=42)

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy
```

In [7]:
%ask_code "modify the code to treat categorical features"

```python
# Import necessary libraries
import pandas as pd
from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Define features (X) and target (y)
X = titanic.drop(['survived', 'alive'], axis=1)  # Drop target variables
y = titanic['survived']

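# Identify categorical columns
categorical_cols = X.select_dtypes(include=['object', 'category']).columns.tolist()

# CatBoost rejects NaN in categorical features, so cast them to strings
X[categorical_cols] = X[categorical_cols].astype(str)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a CatBoost Pool with categorical features
train_pool = Pool(X_train, label=y_train, cat_features=categorical_cols)
test_pool = Pool(X_test, label=y_test, cat_features=categorical_cols)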

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=100, random_state=42)

# Train the model
model.fit(train_pool)

# Make predictions on the test set
y_pred = model.predict(test_pool)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Re
```

In [8]:
%ask What is catboost? Give a detailed description.

**What is CatBoost?**

CatBoost is an open-source library for gradient boosting on decision trees, developed by Yandex, a Russian technology company. It is designed to handle categorical features and is particularly well-suited for datasets with a mix of numerical and categorical variables. CatBoost is widely used in machine learning competitions and has been employed in various applications, including natural language processing, computer vision, and recommender systems.

**Key Features of CatBoost**

1. **Handling Categorical Features**: CatBoost is specifically designed to handle categorical features, which are common in many real-world datasets. It uses a technique called "target encoding" to convert categorical variables into numerical representations that can be used by the model.
2. **Gradient Boosting**: CatBoost uses gradient boosting, a popular ensemble learning technique that combines multiple weak models to create a strong predictive model. Gradient boosting is particularly eff
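
The "target encoding" idea mentioned in point 1 can be illustrated with a few lines of pandas. This is a sketch of plain smoothed mean target encoding, not CatBoost's own implementation, which computes its statistics in an ordered fashion to limit target leakage. The function name `smoothed_target_encode` and the `prior_weight` parameter are assumptions made for this example; the toy values are the first four rows of the `%show_variables` output.

```python
import pandas as pd


def smoothed_target_encode(
    categories: pd.Series, target: pd.Series, prior_weight: float = 1.0
) -> pd.Series:
    """Replace each category with a smoothed mean of the target for that category."""
    prior = target.mean()  # global mean, used to pull small groups toward it
    stats = target.groupby(categories).agg(["sum", "count"])
    encoding = (stats["sum"] + prior_weight * prior) / (stats["count"] + prior_weight)
    return categories.map(encoding)


# First four rows of the titanic output shown earlier
sex = pd.Series(["male", "female", "female", "female"])
survived = pd.Series([0, 1, 1, 1])

encoded = smoothed_target_encode(sex, survived, prior_weight=1.0)
print(encoded.tolist())
```

Because this version uses every row's own label when encoding it, it leaks the target on training data; that weakness is what ordered schemes are meant to reduce.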

In [None]:
%ask "can you suggest other gradient boosting models?"