### Creating a Random Forest Classifier in Python
#### What We’ll Do:

- Load the penguin dataset.
- Prepare the data (clean it and split it into training/testing sets).
- Create a Random Forest model using scikit-learn.
- Train it and test its accuracy.

#### Requirements:

- You need Python with pandas, seaborn, and scikit-learn installed. Install them with:


In [1]:
%pip install pandas seaborn scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [7]:
# Step 1: Import libraries
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Step 2: Load the dataset
data = sns.load_dataset('penguins')

# Step 3: Clean the data by dropping every row that has a missing value
data = data.dropna()

# Step 4: Define features (X) and target (y)
X = data[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']]
y = data['species']

# Step 5: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 6: Create and train the Random Forest model
# (10 trees keeps the example small; scikit-learn's default is n_estimators=100)
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
rf_model.fit(X_train, y_train)

# Step 7: Make predictions
y_pred = rf_model.predict(X_test)

# Step 8: Check performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Step 9: Predict the species of a new penguin from its four measurements
new_penguin = pd.DataFrame(
    [[40.0, 18.0, 190, 4000]], 
    columns=['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
)
prediction = rf_model.predict(new_penguin)
print(f"Predicted species for the new penguin: {prediction[0]}")

Accuracy: 100.00%

Confusion Matrix:
[[31  0  0]
 [ 0 13  0]
 [ 0  0 23]]

Classification Report:
              precision    recall  f1-score   support

      Adelie       1.00      1.00      1.00        31
   Chinstrap       1.00      1.00      1.00        13
      Gentoo       1.00      1.00      1.00        23

    accuracy                           1.00        67
   macro avg       1.00      1.00      1.00        67
weighted avg       1.00      1.00      1.00        67

Predicted species for the new penguin: Adelie
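
To make the output above easier to read, here is a minimal sketch of how the accuracy figure follows from the confusion matrix. It assumes scikit-learn's convention that rows are true species and columns are predicted species, with labels in sorted order. The `summarize` helper is a hypothetical name used only for illustration and is not part of scikit-learn. The numbers are copied from the output above.

```python
# Confusion matrix copied from the output above.
# Assumed convention: rows = true species, columns = predicted species.
labels = ["Adelie", "Chinstrap", "Gentoo"]
matrix = [
    [31, 0, 0],
    [0, 13, 0],
    [0, 0, 23],
]


def summarize(matrix, labels):
    """Return (overall accuracy, per-class recall) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    # Correct predictions sit on the diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    # Recall for a class = correct predictions / all true members of that class.
    recall = {
        label: matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }
    return correct / total, recall


accuracy, recall = summarize(matrix, labels)
print(f"Accuracy: {accuracy * 100:.2f}%")
for label, row in zip(labels, matrix):
    print(f"{label}: recall {recall[label]:.2f} on {sum(row)} test penguins")
```

Every off-diagonal entry is zero, so all 67 test penguins were classified correctly. With a test set this small, a single split can look better than the model would on other data, so the 100% figure is best read as a result for this particular split rather than a general guarantee.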
