**TUTORIAL**
1. Fetching the Dry Beans Dataset
This step fetches the Dry Bean dataset from the UCI Machine Learning Repository using the ucimlrepo library. The dataset describes 13,611 beans with 16 features each, such as area, perimeter, and shape, and the goal is to classify each sample into one of 7 types of dry bean. The argument id=602 is the dataset's unique identifier in the UCI repository.

2. Separating Features and Target Variables
Here, we separate the data into:

X: Features (independent variables), which include various properties of the beans like size and shape.
y: The target (dependent variable), which is the label representing the type of bean.


3. Splitting the Dataset for Training and Testing
This step splits the dataset into training and testing sets. We're using an 80/20 split, where:

80% of the data will be used to train the model.
20% will be used to test its accuracy.
The parameter stratify=y ensures that each class appears in the training and testing sets in roughly the same proportion as in the full dataset. Setting random_state=42 makes the split reproducible (you'll get the same split every time you run the code).
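To make the idea of stratification concrete, here is a minimal pure-Python sketch of a stratified split. It is not scikit-learn's implementation; the helper name stratified_split and the toy labels are made up for illustration. It only shows the principle: shuffle within each class, then take the same share of every class for the test set.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps its share in both sets."""
    rng = random.Random(seed)  # fixed seed, so the split is reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                       # shuffle within the class
        n_test = round(len(idxs) * test_size)   # same share taken from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical, imbalanced toy labels (not the real dataset).
labels = ["SIRA"] * 50 + ["BOMBAY"] * 10 + ["HOROZ"] * 40
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))
print(test_counts)
```

With these toy labels, every class contributes 20% of its samples to the test set, so the rare class is not lost by chance.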

4. Initializing the K-Nearest Neighbors (KNN) Classifier
We create a K-Nearest Neighbors (KNN) classifier. The KNN algorithm works by comparing new data points with the closest known data points (neighbors) and assigning the most common class label among them.

Here, we set n_neighbors=5, which means the algorithm will consider the 5 nearest neighbors to make predictions.
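The voting rule described above can be sketched in a few lines of plain Python. This is an illustrative from-scratch version, not what KNeighborsClassifier does internally; the function name knn_predict and the two-feature toy points are assumptions made for the example. It uses Euclidean distance and a simple majority vote.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters with two features each.
train_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["SEKER", "SEKER", "SEKER", "BOMBAY", "BOMBAY", "BOMBAY"]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))
print(knn_predict(train_points, train_labels, (5.1, 5.0), k=3))
```

Because the prediction depends directly on distances, features measured on very different scales can dominate the vote.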

5. Training the KNN Classifier
In this step, we train the KNN model on the training data. The .fit() method takes the training features (X_train) and the corresponding labels (y_train). For KNN, fitting mostly amounts to storing the training samples so that neighbors can be looked up at prediction time.

6. Making Predictions
Once the model is trained, we use it to make predictions on the test set. The knn.predict(X_test) line generates predictions for the test data and stores them in y_pred.

7. Evaluating the Model
Now, we evaluate the model using two key metrics:

Accuracy: The fraction of predictions that were correct, as a value between 0 and 1 (for example, 0.72 means 72% correct). It's calculated using accuracy_score().
Classification Report: This gives a detailed breakdown of the model’s performance, including precision, recall, F1-score, and support for each class (type of dry bean). The classification_report() function handles this.
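To show what these metrics measure, the sketch below computes them by hand on a tiny made-up example. The helper names accuracy and per_class_metrics and the toy labels are assumptions for illustration; in the tutorial itself, accuracy_score() and classification_report() do this work.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support

# Hypothetical labels, not taken from the real test set.
y_true = ["SIRA", "SIRA", "SIRA", "HOROZ", "HOROZ", "BOMBAY"]
y_pred = ["SIRA", "SIRA", "HOROZ", "HOROZ", "SIRA", "BOMBAY"]

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
for label in sorted(set(y_true)):
    p, r, f1, n = per_class_metrics(y_true, y_pred, label)
    print(f"{label:>8}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}  support={n}")
```

Precision answers "of the beans predicted as this class, how many really were?", while recall answers "of the beans that really are this class, how many did the model find?".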

8. Displaying Results
Finally, we print out:

The shapes of the training and testing sets, so we know how many samples are used for each.
The accuracy of the model on the test data, formatted to two decimal places.
The full classification report, which provides insights into the model's performance for each class in the dataset.
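As a quick sanity check on the printed shapes, the split sizes can be worked out by hand. The sketch below assumes the test count is rounded up and the training set gets the remainder, which is consistent with the shapes reported in the output for this dataset.

```python
import math

n_samples = 13611   # num_instances from the dataset metadata
test_size = 0.2

# Assumption: the test count is rounded up; the training set gets the rest.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

print(n_train, n_test)
```

Since 20% of 13,611 is 2,722.2, the split cannot be exactly 80/20, which is why the test set ends up with 2,723 rows and the training set with 10,888.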


**Conclusion**
This tutorial walks you through building a K-Nearest Neighbors classifier using the Dry Beans dataset. The dataset is split into training and testing sets, and the model is evaluated based on accuracy and classification performance. By adjusting the number of neighbors (with n_neighbors), you can experiment and see how it affects the accuracy of the classifier.

In [None]:
from ucimlrepo import fetch_ucirepo

# fetch dataset
dry_bean = fetch_ucirepo(id=602)

# data (as pandas dataframes)
X = dry_bean.data.features
y = dry_bean.data.targets

# metadata
print(dry_bean.metadata)

# variable information
print(dry_bean.variables)


{'uci_id': 602, 'name': 'Dry Bean', 'repository_url': 'https://archive.ics.uci.edu/dataset/602/dry+bean+dataset', 'data_url': 'https://archive.ics.uci.edu/static/public/602/data.csv', 'abstract': 'Images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. A total of 16 features; 12 dimensions and 4 shape forms, were obtained from the grains.', 'area': 'Biology', 'tasks': ['Classification'], 'characteristics': ['Multivariate'], 'num_instances': 13611, 'num_features': 16, 'feature_types': ['Integer', 'Real'], 'demographics': [], 'target_col': ['Class'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 2020, 'last_updated': 'Thu Mar 28 2024', 'dataset_doi': '10.24432/C50S4B', 'creators': [], 'intro_paper': {'ID': 244, 'type': 'NATIVE', 'title': 'Multiclass classification of dry beans using computer vision and machine learning techniques', 'authors': 'M. Koklu, Ilker Ali Özkan', 'venue': 'Co

In [None]:
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

# Fetch the Dry Beans dataset
dry_bean = fetch_ucirepo(id=602)

# Data separation
X = dry_bean.data.features  # Features
y = dry_bean.data.targets    # Target variable

# Applying train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Initialize the K-Nearest Neighbors Classifier
knn = KNeighborsClassifier(n_neighbors=5)  # You can adjust the number of neighbors

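# Fit the model on the training data.
# y_train is a one-column DataFrame; flatten it to 1-D so scikit-learn
# does not emit a DataConversionWarning about a column-vector y.
knn.fit(X_train, y_train.values.ravel())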

# Make predictions on the testing data
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
classification_report_output = classification_report(y_test, y_pred)

# Output the results
print(f"Training feature set shape: {X_train.shape}")
print(f"Testing feature set shape: {X_test.shape}")
print(f"Training target set shape: {y_train.shape}")
print(f"Testing target set shape: {y_test.shape}")
print(f"Accuracy of the K-Nearest Neighbors Classifier: {accuracy:.2f}")
print("Classification Report:")
print(classification_report_output)




Training feature set shape: (10888, 16)
Testing feature set shape: (2723, 16)
Training target set shape: (10888, 1)
Testing target set shape: (2723, 1)
Accuracy of the K-Nearest Neighbors Classifier: 0.72
Classification Report:
              precision    recall  f1-score   support

    BARBUNYA       0.52      0.51      0.51       265
      BOMBAY       1.00      1.00      1.00       104
        CALI       0.66      0.66      0.66       326
    DERMASON       0.81      0.89      0.85       709
       HOROZ       0.75      0.68      0.72       386
       SEKER       0.75      0.60      0.67       406
        SIRA       0.66      0.72      0.69       527

    accuracy                           0.72      2723
   macro avg       0.74      0.72      0.73      2723
weighted avg       0.72      0.72      0.72      2723

