# Introduction

Scikit-learn is a Python library for machine learning that exposes a consistent `fit`/`predict` interface across its models. This tutorial walks through a basic classification example to get you started.

### Installation

Install Scikit-learn with pip. In a notebook, the `%pip` magic installs into the environment of the running kernel:

In [2]:
%pip install scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.4.2-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting scipy>=1.6.0 (from scikit-learn)
  Downloading scipy-1.13.0-cp312-cp312-win_amd64.whl.metadata (60 kB)
Collecting joblib>=1.2.0 (from scikit-learn)
  Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=2.0.0 (from scikit-learn)
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.4.2-cp312-cp312-win_amd64.whl (10.6 MB)
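
Once the install finishes, a quick way to confirm that the package is importable is to print its version. This is a minimal check, not part of the original notebook; the version printed on your machine may differ from the 1.4.2 shown in the log above.

```python
import sklearn

# Print the installed Scikit-learn version string
installed_version = sklearn.__version__
print(installed_version)
```

If the import fails, the package was most likely installed into a different Python environment than the one running the notebook.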

### Basic Workflow

The typical workflow in Scikit-learn involves the following steps:

1. Importing necessary libraries.
2. Loading the dataset.
3. Splitting the dataset into training and testing sets.
4. Choosing a model.
5. Training the model.
6. Making predictions.
7. Evaluating the model.

Let's go through each step with a simple example.

### Example: Iris Classification

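We'll use the well-known Iris dataset, which ships with Scikit-learn. It contains 150 flower samples, each described by four measurements and labelled with one of three species.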

1. **Importing necessary libraries**

In [3]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

2. **Loading the dataset**

In [4]:
# Load the Iris dataset
# More info: https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
iris = load_iris()

# Create a DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Display the first few rows of the dataset
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  
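Before modelling, it can help to check the size of the data and how the integer targets map to species. The sketch below reloads the dataset so that it runs on its own; it only uses attributes of the object returned by `load_iris`.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix: one row per flower, one column per measurement
print(iris.data.shape)                     # (150, 4)

# Integer targets 0, 1, 2 index into this list of species names
species_names = iris.target_names.tolist()
print(species_names)                       # ['setosa', 'versicolor', 'virginica']

# Number of samples per class
class_counts = np.bincount(iris.target).tolist()
print(class_counts)                        # [50, 50, 50]
```

The classes are perfectly balanced, which is one reason plain accuracy is a reasonable metric for this dataset.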


3. **Splitting the dataset into training and testing sets**

In [5]:
# Define features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
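
A plain random split does not guarantee that each class is equally represented in the test set. As an optional variation (not what the notebook above does), `train_test_split` accepts a `stratify` argument that preserves class proportions. The sketch below works on the raw arrays rather than the DataFrame so that it is self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the 1:1:1 class ratio in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)       # (120, 4) (30, 4)

# Test samples per class
test_counts = np.bincount(y_test).tolist()
print(test_counts)                       # [10, 10, 10]
```

Note that stratifying changes which rows end up in the test set, so evaluation numbers obtained this way would not necessarily match those from an unstratified split.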

4. **Choosing a model**

    For this example, we'll use the K-Nearest Neighbors (KNN) classifier, which labels a sample by majority vote among its k closest training points (here, k = 3).

In [6]:
# Initialize the model
knn = KNeighborsClassifier(n_neighbors=3)
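
To make the idea behind KNN concrete, here is a hand-written sketch of the prediction rule using NumPy. The function `knn_predict` and the toy arrays are hypothetical names introduced for illustration; this is not how Scikit-learn implements the classifier internally, and it assumes Euclidean distance with simple majority voting.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of a single point x by majority vote of its k nearest neighbors."""
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Count the labels among those neighbors and return the most common one
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Toy data: two well-separated clusters labelled 0 and 1
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.3, 0.3])))  # 0
print(knn_predict(X_toy, y_toy, np.array([4.8, 5.0])))  # 1
```

Because the prediction depends on distances, features measured on very different scales can dominate the vote, which is why scaling is often applied before KNN.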

5. **Training the model**

In [7]:
# Train the model
knn.fit(X_train, y_train)

6. **Making predictions**

In [8]:
# Make predictions
y_pred = knn.predict(X_test)
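
The predictions are integer class labels. The sketch below shows one way to turn them back into species names and to look at the per-class vote fractions via `predict_proba`. It rebuilds the split and the model from the raw arrays so that it can run on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Map integer predictions back to species names
y_pred = knn.predict(X_test)
species = iris.target_names[y_pred]
print(species[:5])

# Each row holds the fraction of the 3 neighbors voting for each class
proba = knn.predict_proba(X_test)
print(proba[:5])
```

With the default uniform weighting and `n_neighbors=3`, each probability is a multiple of 1/3, so these values are coarse vote shares rather than calibrated probabilities.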

7. **Evaluating the model**

In [9]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Generate a classification report
print('Classification Report:')
print(classification_report(y_test, y_pred))

# Generate a confusion matrix
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
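
A perfect score on a 30-sample test set is encouraging but says little about how stable the result is. One common way to get a more robust estimate is k-fold cross-validation. The sketch below also puts the `StandardScaler` imported earlier in front of the classifier using a pipeline; both the pipeline and the choice of 5 folds are assumptions made for illustration, not part of the workflow above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale features, then classify; the scaler is refit on each training fold
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# Accuracy on each of 5 stratified folds
scores = cross_val_score(model, X, y, cv=5)
print(scores)
print(f'Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})')
```

Wrapping the scaler in the pipeline keeps the test folds out of the scaling statistics, which avoids leaking information from the evaluation data into training.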


### Conclusion

This tutorial walked through a basic classification task with Scikit-learn. The library offers many more models and utilities for other machine learning tasks, including regression, clustering, and dimensionality reduction. The official Scikit-learn documentation covers these in detail.
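
As a small illustration of how the same estimator interface carries over to another task, the sketch below applies PCA, a dimensionality reduction method, to the Iris features. The choice of two components is arbitrary and only meant to show the pattern.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the four measurements onto two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                      # (150, 2)

# Share of the total variance captured by each component
print(pca.explained_variance_ratio_)
```

Transformers such as PCA use `fit` and `transform` in place of `fit` and `predict`, but construction and fitting otherwise follow the same steps as the classifier above.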