# Classification

In this example, we show how to use a trained model to make a prediction from your input.
We use the Penguins dataset: you enter a few details, such as the island, bill length, and flipper length, and the model predicts the species of that penguin.


In [1]:
# %pip install seaborn scikit-learn pandas sklearn-evaluation --quiet

In [2]:
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn_evaluation import plot, table
import ipywidgets as widgets
from IPython.display import display, HTML

# Center notebook
display(HTML("""
<style>
.output {
    align-items: center;
}
</style>
"""))

# Based on
# https://github.com/Adeyinka-hub/Machine-Learning-2/blob/master/Penguin%20Dataset.ipynb

## Sample our dataset
Take a look at the actual data.

In [3]:
df = sns.load_dataset("penguins")

# Review a sample of the data
df.head(5)

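|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---------|--------|----------------|---------------|-------------------|-------------|-----|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |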


In [4]:
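# Data cleaning
df.isnull().sum()  # per-column count of missing values (not displayed mid-cell)
df.dropna(inplace=True)  # drop rows with missing values

# Target: encode the species as integers
Y = df.species
Y = Y.map({"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})
df.drop("species", inplace=True, axis=1)

# One-hot encode sex (drop_first=True leaves a single "Male" column)
se = pd.get_dummies(df["sex"], drop_first=True)
df = pd.concat([df, se], axis=1)
df.drop("sex", axis=1, inplace=True)

# Label-encode island (alphabetical: Biscoe=0, Dream=1, Torgersen=2)
le = LabelEncoder()
df["island"] = le.fit_transform(df["island"])

# Keep only island, bill_length_mm and flipper_length_mm as features
df.drop(["bill_depth_mm", "body_mass_g", "Male"], inplace=True, axis=1)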
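The integer codes that `LabelEncoder` assigns to the islands matter later, because the prediction form has to translate island names back into the same codes. The sketch below is a minimal illustration with a hand-written list of island names (an assumption made so the example runs without downloading the dataset); it shows that the encoder sorts the class names, so the codes follow alphabetical order.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written sample of island names (illustrative, not loaded from the dataset)
islands = ["Torgersen", "Biscoe", "Dream", "Biscoe", "Torgersen"]

le = LabelEncoder()
codes = le.fit_transform(islands)

# classes_ is sorted, so each name's position in it is its integer code
island_to_code = {name: code for code, name in enumerate(le.classes_)}
print(island_to_code)

# inverse_transform maps codes back to the original names
print(list(le.inverse_transform([0, 1, 2])))
```

Reading the mapping from `le.classes_` rather than typing it by hand keeps the form and the model in sync if the set of islands ever changes.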

In [5]:
# Split the data into training (70%) and test (30%) sets
X = df
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=40
)
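To make the effect of `test_size` and `random_state` concrete, here is a small sketch on made-up arrays (the data is an assumption for illustration only). It shows how many rows land in each set and that repeating the call with the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each, and made-up labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=40
)
print(len(X_train), len(X_test))  # 7 training rows, 3 test rows

# Same random_state, same split: useful for reproducible notebooks
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=40)
print(np.array_equal(X_test, X_test_again))
```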

In [11]:
# Create a button widget
train_button = widgets.Button(description='Train Your Model')

# Create an output area for displaying results
train_output = widgets.Output()

# Module-level flag shared with the cells below
run_eval = False  # becomes True once the model has been trained


# Define a function to train the model and compute its accuracy
def train_and_toggle_cells(train_button):
    # These names are read by the evaluation and prediction cells
    global dt_model, y_pred_dt, dtc, run_eval, accuracy

    with train_output:
        train_output.clear_output()  # Clear previous output
        # Train a decision tree and score it on the held-out test set
        dtc = DecisionTreeClassifier()
        dt_model = dtc.fit(X_train, y_train)
        y_pred_dt = dt_model.predict(X_test)
        accuracy = dtc.score(X_test, y_test)
        run_eval = True
        print("Model is trained")

# Associate the function with the button's click event
train_button.on_click(train_and_toggle_cells)

# Display the button and output area
display(train_button, train_output)

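The training cell shares its results with later cells through module-level globals. One alternative is to keep that state in a single object. The sketch below is only an illustration: `TrainingState`, `on_train_clicked`, and `on_metrics_clicked` are hypothetical names, and the tiny dataset is made up so the example runs without widgets or the Penguins data. The callbacks take one argument because that is how `Button.on_click` invokes them.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
from sklearn.tree import DecisionTreeClassifier


@dataclass
class TrainingState:
    """Hypothetical container for what the notebook keeps in globals."""
    model: Optional[DecisionTreeClassifier] = None
    accuracy: Optional[float] = None

    @property
    def trained(self) -> bool:
        return self.model is not None


state = TrainingState()

# Tiny made-up dataset so the sketch runs on its own
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5], [2.5]])
y_test = np.array([0, 1])


def on_train_clicked(_button=None):
    """Train the model and store the results on the shared state object."""
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    state.model = model
    state.accuracy = model.score(X_test, y_test)


def on_metrics_clicked(_button=None):
    """Report accuracy, or ask the user to train first."""
    if not state.trained:
        return "Train the model first"
    return f"Accuracy on test data: {state.accuracy:.3f}"


print(on_metrics_clicked())  # nothing trained yet
on_train_clicked()
print(on_metrics_clicked())
```

With this layout, a callback that runs before training can detect the missing model instead of failing with a `NameError`.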

---

## Model evaluation metrics

In this section, we evaluate the model on the test data: its accuracy, optionally a confusion matrix, and a ranking of the features by how much each one contributes to the model's decisions.

In [8]:
metrics_output = widgets.Output()

def show_model_metrics(mm):
    with metrics_output:
        metrics_output.clear_output()  # Clear previous output
        if not run_eval:
            # accuracy and dtc only exist after the training button was clicked
            print("Train the model first")
            return
        print("Accuracy on test data: {:.3f}".format(accuracy))
        # plot.confusion_matrix(y_test, y_pred_dt)
        # print(plot.feature_importances(dtc, top_n=5, feature_names=list(dtc.feature_names_in_)))
        print(table.feature_importances(dtc, feature_names=list(dtc.feature_names_in_)))

button = widgets.Button(description="Show model metrics")
button.on_click(show_model_metrics)
display(button, metrics_output)

Button(description='Show model metrics', style=ButtonStyle())

Output()

In addition to the accuracy, we can also represent the feature importance as a table, which we can query with SQL. For more information, check our [tracking guide](https://sklearn-evaluation.ploomber.io/en/latest/api/SQLiteTracker.html).

---

### Use sample data or use your own
The form defaults come from the first row of the test set. You can enter the details of your own penguin, or keep the defaults to check that the model is working correctly.

In [9]:
# Readable Mappings
island_map = {
    0: 'Biscoe',
    1: 'Dream',
    2: 'Torgersen',
}

reverse_island_map = {
    'Biscoe': 0,
    'Dream': 1,
    'Torgersen': 2
}

species_map = {
    0: 'Adelie',
    1: 'Chinstrap',
    2: 'Gentoo',
}

# Create input widgets for each column, pre-filled from the first test row
sample_data = X_test.iloc[0]
common_layout = widgets.Layout(width='200px')  # Adjust the width as needed

island_input = widgets.Dropdown(
    options=['Biscoe', 'Dream', 'Torgersen'],
    description='Island:',
    # The row is read back as floats, so cast the island code to int for the lookup
    value=island_map[int(sample_data["island"])]
)

bill_length_input = widgets.FloatText(description='Bill Length (mm):', value=sample_data["bill_length_mm"])
flipper_length_input = widgets.FloatText(description='Flipper Length (mm):', value=sample_data["flipper_length_mm"])

# Styling the form labels width
display(HTML('''<style>
    .widget-label { min-width: 20ex !important; }
    
    /* Center-align widgets within a container */
    .widget-container { display: flex; justify-content: center; align-items: center; }
        .widget-floattext { background-color: #f2f2f2; }


</style>'''))

# Create a button for prediction
predict_button = widgets.Button(description='Predict')

# Create an output area for displaying predictions
output = widgets.Output()
        
# Define a function to make predictions
def predict_penguin(button):
    # Gather user inputs, using the same column names and order as the training data
    input_data = {
        'island': reverse_island_map[island_input.value],
        'bill_length_mm': bill_length_input.value,
        'flipper_length_mm': flipper_length_input.value
    }

    # Build a single-row DataFrame and predict with the trained decision tree
    input_array = pd.DataFrame([input_data])
    res = dt_model.predict(input_array)[0]

    with output:
        output.clear_output()
        print(f'Predicted Penguin Species: {species_map[res]}')

# Connect the button to the prediction function
predict_button.on_click(predict_penguin)

# Display the input form and output
display(island_input, bill_length_input, flipper_length_input,
        predict_button, output)

Dropdown(description='Island:', index=1, options=('Biscoe', 'Dream', 'Torgersen'), value='Dream')

FloatText(value=46.7, description='Bill Length (mm):')

FloatText(value=195.0, description='Flipper Length (mm):')

Button(description='Predict', style=ButtonStyle())

Output()

#### Expected species with sample data:

In [10]:
print(f'Expected Penguin Species: {species_map[y_test.iloc[0]]}')

Expected Penguin Species: Chinstrap
