# Case Study: Fruit Classification using Decision Trees

In this notebook, we will use the `DecisionTreeClassifier` to classify fruits based on their weight (Feature1) and sweetness level (Feature2). We will compare the performance of decision trees using two different criteria: **Entropy (Information Gain)** and **Gini Index**. We will also calculate the accuracy and classification error for both classifiers.
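
For reference, these are the standard definitions of the two split criteria. The notation ($S$, $p_k$, $S_L$, $S_R$) is introduced here and does not appear elsewhere in this notebook. For a node $S$ whose samples fall into $K$ classes with proportions $p_1, \dots, p_K$, the entropy $H$ and the Gini impurity $G$ are:

$$
H(S) = -\sum_{k=1}^{K} p_k \log_2 p_k,
\qquad
G(S) = 1 - \sum_{k=1}^{K} p_k^2
$$

A split of $S$ into children $S_L$ and $S_R$ is scored by how much it lowers impurity. With entropy, this decrease is the information gain:

$$
\mathrm{IG}(S) = H(S) - \frac{|S_L|}{|S|} H(S_L) - \frac{|S_R|}{|S|} H(S_R)
$$

For example, a node holding 3 Apples and 7 Oranges has $H = -(0.3\log_2 0.3 + 0.7\log_2 0.7) \approx 0.881$ bits and $G = 1 - (0.3^2 + 0.7^2) = 0.42$. The base of the logarithm only rescales entropy, so it does not change which split is chosen.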

### Steps:
1. Generate random data for features and labels.
2. Split the data into training and testing sets.
3. Train two decision tree classifiers using different criteria.
4. Evaluate the performance using accuracy and classification error.
5. Visualize the decision trees.

```python
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import tree
```

### Generating Data

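We generate 100 synthetic fruits, each with a random weight and sweetness level. Each fruit is then labeled by a fixed rule: a fruit heavier than 200 g with a sweetness level above 5 is an Apple, and every other fruit is an Orange.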

```python
# Generate random data for Feature1, Feature2, and Label (fruit type)
np.random.seed(42)
feature1 = np.random.randint(100, 300, size=100)  # Weight in grams: integers 100 to 299 (upper bound is exclusive)
feature2 = np.random.randint(1, 10, size=100)     # Sweetness level: integers 1 to 9 (upper bound is exclusive)

# Label each fruit with a fixed rule: weight > 200 g AND sweetness > 5 gives 'Apple', otherwise 'Orange'
labels = np.where((feature1 > 200) & (feature2 > 5), 'Apple', 'Orange')

# Create DataFrame
df = pd.DataFrame({
    'Feature1': feature1,  # Weight (grams)
    'Feature2': feature2,  # Sweetness level
    'Label': labels
})
```
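
The following sanity check on the generated data is a minimal sketch that regenerates the arrays with the same seed and the same calls as the cell above. `np.random.randint` never draws its upper bound, so weights should fall between 100 and 299 and sweetness between 1 and 9. Because a fruit must pass both conditions to be an Apple, Apples are expected to be the minority class (about 0.495 × 4/9 ≈ 22% in expectation). The printed counts depend on the seed, so they are not quoted here.

```python
import numpy as np

# Regenerate the same data as the cell above (same seed, same calls, same order)
np.random.seed(42)
feature1 = np.random.randint(100, 300, size=100)  # weights: integers 100..299
feature2 = np.random.randint(1, 10, size=100)     # sweetness: integers 1..9
labels = np.where((feature1 > 200) & (feature2 > 5), 'Apple', 'Orange')

# Observed ranges: the upper bound passed to randint is never drawn
print("Weight range:", feature1.min(), "to", feature1.max())
print("Sweetness range:", feature2.min(), "to", feature2.max())

# Class balance: the AND rule makes 'Apple' the minority class in expectation
classes, counts = np.unique(labels, return_counts=True)
for cls, cnt in zip(classes, counts):
    print(f"{cls}: {cnt} of {len(labels)} ({cnt / len(labels):.0%})")
```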

### Splitting the Data into Training and Test Sets

We split the data into training (70%) and testing (30%) sets.

```python
# Features and labels
X = df[['Feature1', 'Feature2']].values
y = df['Label'].values

# Split data into training and test sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
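
Apples are the minority class here, so a purely random 30-sample test set can end up with a noticeably different Apple share than the full dataset. A common alternative, which the split above does not use, is to pass `stratify=y` to `train_test_split`. This keeps the class proportions nearly equal across the two sets. The sketch below regenerates the data with the same seed and builds `X` with `np.column_stack`, which gives the same array as `df[['Feature1', 'Feature2']].values`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same synthetic data as above (seed 42)
np.random.seed(42)
feature1 = np.random.randint(100, 300, size=100)
feature2 = np.random.randint(1, 10, size=100)
X = np.column_stack([feature1, feature2])
y = np.where((feature1 > 200) & (feature2 > 5), 'Apple', 'Orange')

# stratify=y keeps the Apple/Orange proportions (nearly) equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Compare the Apple share in the full data and in each split
for name, part in [("full", y), ("train", y_train), ("test", y_test)]:
    share = np.mean(part == 'Apple')
    print(f"{name:>5}: {len(part)} samples, {share:.1%} Apple")
```

Note that stratifying changes which samples land in the test set, so the accuracies reported later in this notebook would not carry over to this split.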

### Training Decision Trees

We train two decision tree classifiers, one using the **Entropy** criterion and the other using the **Gini Index**.

```python
# Decision Tree Classifier with Entropy (Information Gain)
clf_entropy = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf_entropy.fit(X_train, y_train)

# Decision Tree Classifier with Gini Index
clf_gini = DecisionTreeClassifier(criterion='gini', random_state=42)
clf_gini.fit(X_train, y_train)
```
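
To make the role of the criterion concrete, here is a simplified, pure-Python sketch of how a single split could be chosen on one feature. Candidate thresholds are midpoints between consecutive distinct values. The threshold with the lowest weighted child impurity wins, which is the same as the largest impurity decrease. This illustrates the idea only and is not scikit-learn's splitter, which is an optimized implementation with more options. The helper names (`entropy`, `gini`, `best_threshold`) and the toy weights are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: -sum p*log2(p), written as sum p*log2(1/p)."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum p^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels, impurity):
    """Try each midpoint between consecutive distinct values of one feature.

    Samples with value <= threshold go left, the rest go right.
    Returns (threshold, weighted child impurity) for the lowest-impurity split.
    """
    n = len(values)
    distinct = sorted(set(values))
    best_t, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy weights (grams) and labels, for illustration only
weights = [120, 150, 180, 210, 240, 270]
fruit = ['Orange', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple']

print(best_threshold(weights, fruit, entropy))  # (195.0, 0.0): a perfect split
print(best_threshold(weights, fruit, gini))     # (195.0, 0.0)
```

On this toy data, both criteria pick the same threshold. On real data they usually agree on most splits but can differ, which is one reason the two trees trained above may not be identical.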

### Prediction and Evaluation

We predict the test-set labels with both models and compute the **accuracy** and the **classification error**. The classification error is the fraction of misclassified samples, which equals 1 - accuracy.

```python
# Prediction on the held-out test set
y_pred_entropy = clf_entropy.predict(X_test)
y_pred_gini = clf_gini.predict(X_test)

# Accuracy: fraction of test samples classified correctly
accuracy_entropy = accuracy_score(y_test, y_pred_entropy)
accuracy_gini = accuracy_score(y_test, y_pred_gini)

# Classification error: fraction misclassified (1 - accuracy)
classification_error_entropy = 1 - accuracy_entropy
classification_error_gini = 1 - accuracy_gini

# Print results as percentages
print(f"Accuracy (Entropy): {accuracy_entropy:.2%}")
print(f"Classification Error (Entropy): {classification_error_entropy:.2%}")
print(f"Accuracy (Gini): {accuracy_gini:.2%}")
print(f"Classification Error (Gini): {classification_error_gini:.2%}")
```
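
Classification error can also be computed directly, which makes its meaning explicit. The sketch below uses five hypothetical labels, not the notebook's test set. It shows that the mean of the mismatches, scikit-learn's `zero_one_loss`, and `1 - accuracy_score` give the same value. It also shows that the off-diagonal entries of a confusion matrix count exactly those mistakes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, zero_one_loss

# Hypothetical true and predicted labels for five fruits (illustration only)
y_true = np.array(['Apple', 'Orange', 'Orange', 'Apple', 'Orange'])
y_pred = np.array(['Apple', 'Orange', 'Apple', 'Apple', 'Orange'])

# Classification error = fraction of predictions that differ from the truth
error_manual = np.mean(y_true != y_pred)
error_sklearn = zero_one_loss(y_true, y_pred)  # same quantity, computed by scikit-learn
error_from_acc = 1 - accuracy_score(y_true, y_pred)
print(error_manual, error_sklearn, error_from_acc)

# Confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=['Apple', 'Orange'])
print(cm)
# Off-diagonal entries count the mistakes (here one Orange predicted as Apple)
```

The confusion matrix adds information that a single error rate hides: it shows *which* class is being confused. This matters when, as in this dataset, one class is much rarer than the other.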

### Visualizing the Decision Trees

We visualize the decision trees for both classifiers.

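```python
# Visualize the trees side by side, labeling nodes with feature and class names
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
feature_names = ['Feature1 (weight, g)', 'Feature2 (sweetness)']

tree.plot_tree(clf_entropy, feature_names=feature_names,
               class_names=list(clf_entropy.classes_), filled=True, ax=ax[0])
ax[0].set_title("Decision Tree (Entropy)")

tree.plot_tree(clf_gini, feature_names=feature_names,
               class_names=list(clf_gini.classes_), filled=True, ax=ax[1])
ax[1].set_title("Decision Tree (Gini)")

plt.tight_layout()
plt.show()
```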
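
A plotted tree can be hard to read once it has many nodes. As an optional alternative, scikit-learn's `export_text` prints the same rules as indented text. The sketch below rebuilds the entropy tree from the same seed and split, so it should reproduce the tree trained above. The feature labels `weight_g` and `sweetness` are display names chosen here, not names used elsewhere in the notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split

# Rebuild the same training data and entropy tree as above (seed 42)
np.random.seed(42)
feature1 = np.random.randint(100, 300, size=100)
feature2 = np.random.randint(1, 10, size=100)
X = np.column_stack([feature1, feature2])
y = np.where((feature1 > 200) & (feature2 > 5), 'Apple', 'Orange')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf_entropy = DecisionTreeClassifier(criterion='entropy', random_state=42).fit(X_train, y_train)

# Print the learned rules as indented if/else text instead of a plot
rules = export_text(clf_entropy, feature_names=['weight_g', 'sweetness'])
print(rules)
```

Because the labels were generated by a known rule (weight > 200 and sweetness > 5), the printed thresholds can be compared directly against that rule.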

### Conclusion

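- **Accuracy**: Both the Entropy and Gini models achieved an accuracy of 93.33% on the 30-sample test set (28 of 30 fruits classified correctly).
- **Classification Error**: Both models have a classification error of 6.67% (2 of 30 fruits misclassified).

The decision trees visualized above show the weight and sweetness thresholds each model uses to classify a fruit as "Apple" or "Orange". Each test fruit is worth about 3.3 percentage points of accuracy, so identical scores on this single split are weak evidence that the two criteria perform identically in general.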
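
As a sketch of a more robust comparison, which is not part of the analysis above, 5-fold cross-validation scores each criterion on five different held-out folds covering all 100 samples. The data are regenerated with the same seed. The fold scores it prints are not reported here, and the `results` dictionary is just a local name for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Same synthetic data as above (seed 42)
np.random.seed(42)
feature1 = np.random.randint(100, 300, size=100)
feature2 = np.random.randint(1, 10, size=100)
X = np.column_stack([feature1, feature2])
y = np.where((feature1 > 200) & (feature2 > 5), 'Apple', 'Orange')

results = {}
for criterion in ('entropy', 'gini'):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=42)
    # For a classifier, an integer cv uses stratified 5-fold splits of all 100 samples
    scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
    results[criterion] = scores
    print(f"{criterion:>7}: mean accuracy {scores.mean():.2%} (std {scores.std():.2%})")
```

If the mean accuracies differ by less than their spread across folds, the data give no clear reason to prefer one criterion over the other.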