# K-Nearest Neighbors (KNN) and Decision Trees (DT)
**Aim:**

To compare the performance of K-Nearest Neighbors (KNN) and Decision Trees (DT) algorithms on a given dataset for classification tasks.

**Dataset Source:**
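This notebook uses the breast cancer dataset bundled with scikit-learn (`load_breast_cancer`). Any other classification dataset from the UCI Machine Learning Repository, Kaggle, or scikit-learn can be substituted.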

**Theory:**
**K-Nearest Neighbors (KNN):**

Algorithm Explanation:
KNN is a simple yet effective non-parametric classification algorithm: it stores the training data and builds no explicit model.
For a new data point, it calculates the distance to every point in the training set.
It identifies the k nearest neighbors based on a chosen distance metric (usually Euclidean distance).
The new point is assigned the class that wins a majority vote among those k neighbors.
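
The steps above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements `KNeighborsClassifier`; the function name `knn_predict` and the toy points are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (2.0, 2.0), k=3))
print(knn_predict(train_points, train_labels, (6.0, 7.0), k=3))
```

Every prediction scans the whole training set, so prediction cost grows with the number of training points.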


**Decision Trees (DT):**

Algorithm Explanation:
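Decision Trees build a tree-like structure by recursively partitioning the feature space.
At each node, the algorithm selects the feature and threshold that best split the data according to a metric such as Gini impurity or information gain.
This process continues until a stopping criterion is met (e.g., maximum depth, minimum samples per leaf, or a node that contains a single class).
A new sample is classified by following the splits from the root to a leaf and taking the majority class of that leaf.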
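
To make the split criteria concrete, here is a small sketch of Gini impurity, entropy, and information gain, plus a brute-force search for the best threshold on a single feature. The helper names and the toy data are assumptions for illustration; a real tree repeats this search over all features at every node.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, gain)."""
    best = (None, 0.0)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy feature (hypothetical): small values are class 0, large values are class 1
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]

print(gini(labels), entropy(labels))
print(best_threshold(values, labels))
```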


**Conclusion:**

Compare the performance metrics obtained from both algorithms.
Discuss the suitability of KNN and DT based on the dataset characteristics, computational resources, interpretability, and overall predictive performance.
Provide insights into when to prefer one algorithm over the other based on the findings.

In [5]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

In [17]:
# Load the breast_cancer dataset
d = load_breast_cancer()

In [8]:
X, y = d.data, d.target
y = y.astype(int)

In [9]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [10]:
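# Standardize the features (recommended for KNN, which is distance-based; decision trees are unaffected by scaling)
# The scaler is fitted on the training set only, then applied to the test set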
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
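
As a rough sketch of what standardization does to a single feature column: each value has the column mean subtracted and is divided by the column's population standard deviation, which is the convention `StandardScaler` uses. The `standardize` helper and the sample values below are hypothetical.

```python
import statistics

def standardize(column):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

# Hypothetical raw feature values on a large scale
raw = [200.0, 400.0, 600.0]
scaled = standardize(raw)
print(scaled)
```

After scaling, every feature has mean 0 and unit variance, so no single large-valued feature dominates the Euclidean distance used by KNN.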

In [11]:
# Fit the KNN classifier on the training set
# metric='minkowski' with p=2 is Euclidean distance; n_neighbors=1 uses only the single nearest neighbor
knnClassifier = KNeighborsClassifier(n_neighbors=1, metric='minkowski', p=2)
knnClassifier.fit(X_train, y_train)

In [12]:
# Predict on the test set with KNN and compute accuracy
y_predKnn = knnClassifier.predict(X_test)
knn_accuracy = accuracy_score(y_test, y_predKnn)
print("KNeighborsClassifier Accuracy:", knn_accuracy)

KNeighborsClassifier Accuracy: 0.9385964912280702


In [13]:
# Fit the Decision Tree classifier on the training set (splits chosen by information gain)
from sklearn.tree import DecisionTreeClassifier
DecClassifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
DecClassifier.fit(X_train, y_train)

In [14]:
# Predict on the test set with the Decision Tree and compute accuracy
y_predDec = DecClassifier.predict(X_test)
dec_accuracy = accuracy_score(y_test, y_predDec)
print("DecisionTreeClassifier Accuracy:", dec_accuracy)

DecisionTreeClassifier Accuracy: 0.956140350877193


In [15]:
# Confusion matrix for KNN (rows: true labels, columns: predicted labels)
from sklearn.metrics import confusion_matrix
cmk = confusion_matrix(y_test, y_predKnn)
print(cmk)

[[39  4]
 [ 3 68]]


In [16]:
# Confusion matrix for the Decision Tree (rows: true labels, columns: predicted labels)
cmd = confusion_matrix(y_test, y_predDec)
print(cmd)

[[39  4]
 [ 1 70]]
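
The two confusion matrices printed above are enough to derive further metrics by hand. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels) and treats class 1 as the positive class; in `load_breast_cancer`, class 0 is malignant and class 1 is benign. The `summarize` helper is hypothetical.

```python
def summarize(cm):
    """Derive accuracy, precision, and recall from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Values copied from the printed outputs above
knn_cm = [[39, 4], [3, 68]]
dt_cm = [[39, 4], [1, 70]]

for name, cm in [("KNN", knn_cm), ("Decision Tree", dt_cm)]:
    metrics = summarize(cm)
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

On this split both models misclassify the same number of class-0 (malignant) samples, 4 each; the Decision Tree's higher accuracy comes from fewer errors on class 1 (1 versus 3).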
