In [1]:
# for loading the dataset
import seaborn as sns

In [2]:
# load the magic extension
%load_ext oci_genai_magics

OCIGenaiMagics extension loaded...
List of magic commands available:
* ask
* ask_code
* ask_data
* clear_history
* show_variables
* show_model_config
* genai_stats
* clear_stats


In [3]:
%show_model_config

Model configuration defined in config.py:
* Model:  meta.llama-3.1-405b-instruct
* Endpoint:  https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
* Temperature:  0.1
* Max_tokens:  1024


In [None]:
%ask Create a report on Larry Ellison. Limit to 5 points.

In [4]:
%genai_stats

Performance metrics:
* Total requests:  0
* Total input tokens:  0
* Total output tokens:  0


In [None]:
%ask summarize in maximum two sentences

In [None]:
%genai_stats

In [None]:
%ask What is the 3D Laplace equation and when is it used ?

In [5]:
%ask What is a survival model ?

INFO:oci.circuit_breaker:Default Auth client Circuit breaker strategy enabled


A survival model, also known as a survival analysis or time-to-event model, is a type of statistical model used to analyze the time it takes for a specific event to occur. The event of interest can be anything from a patient's death, a machine's failure, or a customer's churn.

**Key characteristics of survival models:**

1. **Time-to-event data**: The outcome variable is the time it takes for the event to occur.
2. **Censoring**: Not all observations experience the event during the study period, so their event times are unknown (censored).
3. **Non-negative time**: Time cannot be negative, so the model must account for this constraint.

**Common applications of survival models:**

1. **Medical research**: Analyzing patient survival times to evaluate treatment effectiveness.
2. **Reliability engineering**: Predicting the time to failure of machines or components.
3. **Customer churn analysis**: Modeling the time until a customer stops using a service.
4. **Insurance risk assessment**: Estimating the likelihood of an event (e.g., accident, death) occurring within a certain time frame.

**Common types of survival models:**

1. **Kaplan-Meier estimator**: A non-parametric model that estimates the survival function.
2. **Cox proportional hazards model**: A semi-parametric model that estimates the hazard ratio.
3. **Parametric models**: Such as the Weibull, exponential, and log-normal distributions, which assume a specific distribution for the survival times.
4. **Machine learning models**: Such as random forests and neural networks, which can be used for survival analysis.

**Goals of survival modeling:**

1. **Estimate the survival function**: Describe the probability of surviving beyond a certain time point.
2. **Compare survival curves**: Evaluate the difference in survival times between groups (e.g., treatment vs. control).
3. **Identify risk factors**: Determine the factors associated with the event of interest.
4. **Predict event times**: Estimate the time until the event occurs for new, unseen data.

I hope this helps! Do you have any specific questions or applications in mind?
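
The Kaplan-Meier estimator listed in the answer above can be illustrated in a few lines. This is a minimal sketch in plain Python, not the implementation of any survival library: the function name `kaplan_meier` and the toy `durations`/`events` values are assumptions made for illustration. At each distinct event time the running survival probability is multiplied by (1 - events / subjects still at risk), and censored subjects leave the risk set without causing a drop.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed time for each subject.
    events: 1 if the event occurred at that time, 0 if the subject was censored.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # Subjects censored at time t still count as at risk at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: five subjects, two of them censored (event flag 0)
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

The curve only steps down at times 2, 3 and 5; the subject censored at time 8 contributes to the risk set but never lowers the estimate.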

In [None]:
%clear_history

In [None]:
%ask is he a sailor ?

In [6]:
cities = ["Naples", "Rome", "Treviso", "London", "New York"]
counts = [1, 2, 3]

In [7]:
%%ask_data 
Analyze the cities list and print only names of cities in UK.
Print the names in alphabetical order.
Count the number of Italian cities in the list.

**UK Cities:**

1. London

**Italian Cities Count:** 3

Note: The list only contains one UK city, which is London. The Italian cities in the list are Naples, Rome, and Treviso.

In [8]:
%%ask_data 
Sum all the values in counts and print the result

**Sum of Counts:** 6

Explanation: The sum of the values in the counts list is calculated by adding 1 + 2 + 3, which equals 6.
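
The two `%%ask_data` answers above can be cross-checked deterministically in plain Python. The sketch below assumes a hand-written `country_of` lookup table, which is hypothetical: the model answered from its own knowledge, and no such table exists in the notebook.

```python
cities = ["Naples", "Rome", "Treviso", "London", "New York"]
counts = [1, 2, 3]

# Hypothetical lookup table, written by hand for this check only
country_of = {
    "Naples": "Italy",
    "Rome": "Italy",
    "Treviso": "Italy",
    "London": "UK",
    "New York": "USA",
}

# UK cities in alphabetical order
uk_cities = sorted(c for c in cities if country_of[c] == "UK")

# Number of Italian cities
italian_count = sum(1 for c in cities if country_of[c] == "Italy")

# Sum of the values in counts
total = sum(counts)

print(uk_cities, italian_count, total)  # ['London'] 3 6
```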

In [9]:
# load a dataset
titanic = sns.load_dataset("titanic")

In [10]:
%show_variables

User-defined variables in the current session:
* cities (type: list): ['Naples', 'Rome', 'Treviso', 'London', 'New York']
* counts (type: list): [1, 2, 3]
* titanic (type: DataFrame):      survived  pclass     sex   age  sibsp  parch     fare embarked   class  \
0           0       3    male  22.0      1      0   7.2500        S   Third   
1           1       1  female  38.0      1      0  71.2833        C   First   
2           1       3  female  26.0      0      0   7.9250        S   Third   
3           1       1  female  35.0      1      0  53.1000        S   First   
4           0       3    male  35.0      0      0   8.0500        S   Third   
..        ...     ...     ...   ...    ...    ...      ...      ...     ...   
886         0       2    male  27.0      0      0  13.0000        S  Second   
887         1       1  female  19.0      0      0  30.0000        S   First   
888         0       3  female   NaN      1      2  23.4500        S   Third   
889         1       1    m

In [11]:
%%ask_data 
Is the titanic DataFrame in your context ?

Yes, the titanic DataFrame is now in my context.

You can ask me to analyze the data, and I'll be happy to help. What would you like to know about the Titanic dataset?

In [12]:
%%ask_code 
"Give me the code to train a classification model on titanic dataset, using catboost, to predict survival. Add comments to the code"

```python
# Import necessary libraries
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Define features (X) and target (y)
X = titanic.drop('survived', axis=1)
y = titanic['survived']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define categorical features
categorical_features = ['sex', 'embarked', 'class', 'who', 'deck', 'embark_town', 'alive']

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=100, random_state=42)

# Train the model
model.fit(X_train, y_train, cat_features=categorical_features, verbose=False)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
```

In [14]:
%%ask_code 
"modify the code to compute precision and recall"

```python
# Import necessary libraries
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, precision_score, recall_score

# Define features (X) and target (y)
X = titanic.drop('survived', axis=1)
y = titanic['survived']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define categorical features
categorical_features = ['sex', 'embarked', 'class', 'who', 'deck', 'embark_town', 'alive']

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=100, random_state=42)

# Train the model
model.fit(X_train, y_train, cat_features=categorical_features, verbose=False)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
```
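
For reference, the precision and recall requested above reduce to simple counts of true positives, false positives and false negatives. The sketch below computes them by hand in plain Python. The helper name `precision_recall` and the toy labels are assumptions for illustration, and returning 0.0 when a denominator is zero is a choice made here, not necessarily what a given library does.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = survived, 0 = did not survive
y_true_toy = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_toy = [1, 0, 0, 1, 1, 0, 1, 0]

precision_toy, recall_toy = precision_recall(y_true_toy, y_pred_toy)
print(precision_toy, recall_toy)  # 0.75 0.75
```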

In [None]:
%ask_code "provide the code to compute the correlation matrix for numeric features in dataset titanic"

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Assuming the titanic dataset is already loaded into a DataFrame called 'titanic'
titanic_numeric = titanic.select_dtypes(include=[np.number])

# Compute the correlation matrix
corr_matrix = titanic_numeric.corr()

# Display the correlation matrix
print(corr_matrix)

# Visualize the correlation matrix using a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", square=True)
plt.title("Correlation Matrix for Numeric Features in Titanic Dataset")
plt.show()

In [None]:
%ask_code "modify the code to handle this error: Invalid type for cat_feature[object_idx=1,feature_idx=10]=NaN : cat_features must be integer or string, real number values and NaN values should be converted to string."
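
The error quoted in this prompt says that categorical features must be integers or strings and that NaN values should be converted to strings. One possible way to satisfy that requirement before training is sketched below with pandas only. The helper `prepare_cat_features`, the `"missing"` placeholder label and the toy frame are assumptions for illustration, not the answer the model returned.

```python
import numpy as np
import pandas as pd


def prepare_cat_features(df, cat_cols, missing_label="missing"):
    """Return a copy of df whose categorical columns are strings with no NaN."""
    out = df.copy()
    for col in cat_cols:
        as_object = out[col].astype(object)  # also unwraps pandas 'category' columns
        filled = as_object.where(out[col].notna(), missing_label)
        out[col] = filled.astype(str)
    return out


# Hypothetical toy frame that mimics a few titanic columns
toy = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "deck": pd.Categorical(["C", None, "E"]),
    "age": [22.0, np.nan, 26.0],
})

clean = prepare_cat_features(toy, ["sex", "deck"])
print(clean["deck"].tolist())  # ['C', 'missing', 'E']
```

Numeric columns such as `age` are left untouched; only the columns passed in `cat_cols` are converted.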

In [None]:
%ask_data "Analyze the titanic dataset I have loaded and provide a detailed report."

In [None]:
%genai_stats

In [None]:
%ask_data How many records and columns are in the titanic DataFrame ?

In [None]:
%ask How do you know the dataset contains 891 records if the variable I have loaded has fewer records ?

In [None]:
%ask_data Can you look at the titanic DataFrame you have in your context ?

In [None]:
%ask_data "List all columns with null values in the loaded titanic dataset"

In [None]:
# another famous dataset
iris = sns.load_dataset("iris")

In [None]:
%ask_data "Analyze the iris dataset and provide a detailed report."

In [None]:
%ask_data Analyze the iris dataset. Do you think it is possible to identify the species from the features ?

In [None]:
%ask_data Analyze the iris dataset. Can you suggest which features could be removed as redundant ?
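
A common first check for redundant features, independent of what the model answers, is to look for pairs of numeric columns with a very high absolute correlation. The sketch below is one possible approach: the helper `highly_correlated_pairs`, the 0.9 threshold and the toy columns `a`, `b`, `c` are assumptions for illustration. On the real data the same function could be called on `iris`.

```python
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, |corr|) for numeric column pairs at or above threshold."""
    corr = df.corr(numeric_only=True).abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = float(corr.iloc[i, j])
            if value >= threshold:
                pairs.append((cols[i], cols[j], value))
    return pairs


# Hypothetical toy data: b is roughly 2 * a, c is only weakly related to both
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})

pairs = highly_correlated_pairs(toy)
print([(x, y) for x, y, _ in pairs])  # [('a', 'b')]
```

A high correlation only suggests redundancy; whether a feature can be dropped still depends on the model and the task.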

In [None]:
%ask_code In CatBoost, do I need to convert categorical features to numbers ?

In [None]:
%ask_code "modify the code to use xgboost"

In [None]:
%ask What is catboost ? Give a detailed description.

In [None]:
%ask "can you suggest other gradient boosting models ?"

In [None]:
%ask_code "how to find rows with NaN values in titanic dataframe ?"
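
The questions about null values can also be answered directly with pandas. The sketch below uses a small hypothetical frame so that it runs on its own; the same expressions could be applied to `titanic`. It relies only on `DataFrame.isna`, `any` and `sum`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the titanic DataFrame
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "deck": ["C", None, None, "E"],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

# Rows that contain at least one NaN value
rows_with_nan = toy[toy.isna().any(axis=1)]

# Number of missing values per column, and the columns that have any
null_counts = toy.isna().sum()
cols_with_nan = null_counts[null_counts > 0].index.tolist()

print(rows_with_nan.index.tolist())  # [1, 2]
print(cols_with_nan)                 # ['age', 'deck']
```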