In [25]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Sample dataset
data = {'Person': [1, 2, 3],
        'Favorite Car Color': ['Red', 'Blue', 'Green']}
df = pd.DataFrame(data)

# Perform one-hot encoding
df_encoded = pd.get_dummies(df, columns=['Favorite Car Color'], prefix=['Color'])

# Display the encoded dataset
print(df_encoded)

   Person  Color_Blue  Color_Green  Color_Red
0       1       False        False       True
1       2        True        False      False
2       3       False         True      False


In [26]:
import pandas as pd

data = {'Weight (grams)': [150, 130, 160, 140, 155, 145],
        'Color (1=Red, 0=Orange)': [1, 1, 0, 0, 1, 0],
        'Fruit Type': ['Apple', 'Apple', 'Orange', 'Orange', 'Apple', 'Orange']}
df = pd.DataFrame(data)

In [27]:
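# Features (inputs) and target (label to predict)
X = df[['Weight (grams)', 'Color (1=Red, 0=Orange)']]
y = df['Fruit Type']

# With only 6 rows, test_size=0.2 leaves 4 training rows and 2 test rows,
# so any accuracy reported below can only be 0.0, 0.5, or 1.0.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)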

### Gradient Boosting

Gradient Boosting builds a model step by step: it starts with a simple guess, measures where that guess went wrong, and then adds a small correction aimed at those mistakes. Repeating this many times produces a combined model that is far more accurate than the first guess.

When to Use:

- Gradient Boosting is suitable for a wide range of classification tasks.
- Particularly effective when dealing with structured (tabular) data.
- It builds decision trees sequentially, each one improving on the mistakes made by the previous trees.
- Mainly reduces bias; a small learning rate and shallow trees help keep variance under control.
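The "guess, check the mistakes, correct" loop described above can be sketched by hand. This is a simplified illustration, not how `GradientBoostingClassifier` is implemented internally: it uses a regression target with squared error (where the "mistakes" are simply the residuals), and the toy data, `learning_rate`, tree depth, and number of rounds are all assumptions chosen for the demo.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (an assumption for illustration, not from the notebook)
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 6, 80).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=80)

learning_rate = 0.3  # how much of each correction to apply (assumed value)

# Step 1: the "simple guess" is the mean of the target
prediction = np.full_like(y_demo, y_demo.mean())
errors = [np.mean((y_demo - prediction) ** 2)]

for _ in range(20):
    # Step 2: find out where the current guess went wrong
    residuals = y_demo - prediction
    # Step 3: fit a small tree to those mistakes and add a scaled correction
    tree = DecisionTreeRegressor(max_depth=2).fit(X_demo, residuals)
    prediction += learning_rate * tree.predict(X_demo)
    errors.append(np.mean((y_demo - prediction) ** 2))

print(f'Training MSE after the initial guess: {errors[0]:.4f}')
print(f'Training MSE after 20 rounds:         {errors[-1]:.4f}')
```

Each round fits a tree to what the current model still gets wrong, so the training error shrinks as trees are added. For classification, the same idea applies, but the residuals are replaced by gradients of a classification loss.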

In [28]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Create and train the Gradient Boosting classifier
gb_model = GradientBoostingClassifier()
gb_model.fit(X_train, y_train)

# Make predictions on the test set
gb_y_pred = gb_model.predict(X_test)

# Calculate accuracy
gb_accuracy = accuracy_score(y_test, gb_y_pred)
print(f'Accuracy (Gradient Boosting): {gb_accuracy}')

Accuracy (Gradient Boosting): 1.0


### XGBoost (Extreme Gradient Boosting)

XGBoost is a popular choice in machine learning competitions and real-world applications because of its speed and accuracy. It is an optimized implementation of Gradient Boosting that adds regularization, parallelized tree construction, and built-in handling of missing values.

When to Use:

- XGBoost is a powerful and versatile model suitable for a wide range of classification tasks.
- It works well with structured (tabular) data and numeric features.
- Can cope well with imbalanced datasets when class weighting (for example, the `scale_pos_weight` parameter) is set.
- Popular in Kaggle competitions due to its high performance.


### Random Forest

Random Forest is known for its simplicity, speed, and ability to handle complex data. It’s like having a group of diverse advisors (trees) to help you make better decisions (predictions) in machine learning tasks.

When to Use:

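- Random Forest is robust and versatile, suitable for various classification tasks.
- Most effective on structured (tabular) data; unstructured data such as text or images must first be converted into numeric features.
- Handles high-dimensional data well.
- More resistant to overfitting than a single decision tree, because it averages many trees trained on different random samples of the data.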
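The "group of advisors" idea can be made concrete by asking every tree in a trained forest for its opinion and averaging the answers. The sketch below uses a synthetic dataset and a forest of 25 trees, both assumptions for illustration. In scikit-learn, the forest's predicted probabilities are the average of the individual trees' probabilities, which the final check confirms.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration, not the notebook's fruit data)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_demo, y_demo)

sample = X_demo[:3]

# Ask each tree (advisor) for its class probabilities:
# resulting shape is (n_trees, n_samples, n_classes)
tree_probas = np.stack([tree.predict_proba(sample) for tree in forest.estimators_])

# Average the advisors' opinions and compare with the forest's own answer
mean_proba = tree_probas.mean(axis=0)
forest_proba = forest.predict_proba(sample)
matches = np.allclose(mean_proba, forest_proba)

print(f'Number of trees: {len(forest.estimators_)}')
print(f'Averaged tree probabilities match the forest: {matches}')
```

Because each tree sees a different bootstrap sample and a random subset of features at each split, individual trees disagree; averaging them smooths out those individual errors.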

In [30]:
from sklearn.ensemble import RandomForestClassifier

# Create and train the Random Forest classifier
rf_model = RandomForestClassifier()
rf_model.fit(X_train, y_train)

# Make predictions on the test set
rf_y_pred = rf_model.predict(X_test)

# Calculate accuracy
rf_accuracy = accuracy_score(y_test, rf_y_pred)
print(f'Accuracy (Random Forest): {rf_accuracy}')

Accuracy (Random Forest): 0.0


### Support Vector Classifier (SVC)

SVC is known for its ability to handle complex datasets by finding the boundary that separates the classes with the widest possible margin. It’s like placing a fence between two groups so that it stays as far as possible from the nearest members of each group.

When to Use:

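- SVC is a natural fit for binary classification, and scikit-learn's `SVC` also supports multi-class problems.
- Works well when there’s a clear margin of separation between classes.
- Effective for high-dimensional data.
- Sensitive to feature scaling, so features should usually be standardized before training.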
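Since SVC is sensitive to feature scaling, one common approach is to standardize the features inside a pipeline so the scaler is fitted on the training data only. The sketch below rebuilds the notebook's fruit dataset and split so that it runs on its own; the use of `StandardScaler` and `make_pipeline` is a suggestion, not part of the original notebook.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same fruit dataset and split as earlier in the notebook
data = {'Weight (grams)': [150, 130, 160, 140, 155, 145],
        'Color (1=Red, 0=Orange)': [1, 1, 0, 0, 1, 0],
        'Fruit Type': ['Apple', 'Apple', 'Orange', 'Orange', 'Apple', 'Orange']}
df = pd.DataFrame(data)
X = df[['Weight (grams)', 'Color (1=Red, 0=Orange)']]
y = df['Fruit Type']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize each feature to zero mean and unit variance, then fit the SVC.
# Without scaling, weight (values around 150) dominates the 0/1 color feature.
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(X_train, y_train)

# Make predictions on the test set
scaled_pred = scaled_svc.predict(X_test)

# Calculate accuracy
scaled_accuracy = accuracy_score(y_test, scaled_pred)
print(f'Accuracy (scaled SVC): {scaled_accuracy}')
```

With only two test rows, this score is a rough signal at best; the point of the sketch is the pipeline pattern, which carries over unchanged to larger datasets.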


In [31]:
from sklearn.svm import SVC

# Create and train the Support Vector Classifier
svc_model = SVC()
svc_model.fit(X_train, y_train)

# Make predictions on the test set
svc_y_pred = svc_model.predict(X_test)

# Calculate accuracy
svc_accuracy = accuracy_score(y_test, svc_y_pred)
print(f'Accuracy (SVC): {svc_accuracy}')

Accuracy (SVC): 0.0
