### **Project Overview & Requirements**

* **Goal:** Build a simple, robust image classifier using Python to categorize images into "empty" or "not empty".
* **Process:** The workflow consists of four steps: Data Preparation, Data Splitting, Training, and Testing.
* **Libraries Used:**
    * `scikit-learn`: For machine learning models and utilities.
    * `scikit-image`: For image processing (loading and resizing).
    * `numpy`: For array manipulation.


* **Data Source:** Images come from a parking slot detector project. The "not empty" category contains cars, while the "empty" category contains empty parking spots.
* **Applicability:** This classifier works best for simple problems where categories are visually distinct, rather than complex state-of-the-art challenges.

### **Step 1: Data Preparation**

* **Setup:** Define input directories and categories (`empty`, `not_empty`).
* **Data Structures:** Create lists for `data` (image features) and `labels` (category indices).
* **Image Processing Loop:**
    * Iterate through all files in the directories using `os.listdir`.
    * Read images using `imread` from `skimage.io`.
    * **Resizing:** Resize all images to a uniform 15x15 resolution using `skimage.transform.resize`.
    * **Flattening:** Convert the RGB image array into a flat, 1D array using `.flatten()` before appending it to the `data` list. scikit-learn classifiers expect one 1D feature vector per sample, so each 15x15 RGB image becomes 675 values.
    * **Labeling:** Append the corresponding category index to the `labels` list.


* **Conversion:** Convert the `data` and `labels` lists into Numpy arrays (`np.asarray`).
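
The preparation steps above can be sketched roughly as follows. This is a minimal sketch, not the video's exact code: the function name `load_dataset` and its `input_dir` argument are hypothetical, and it assumes every file in a category folder is a readable RGB image.

```python
import os

import numpy as np
from skimage.io import imread
from skimage.transform import resize


def load_dataset(input_dir, categories=("empty", "not_empty"), size=(15, 15)):
    """Hypothetical helper: load input_dir/<category>/* as flat feature vectors.

    Assumes each category folder contains only readable image files.
    """
    data = []
    labels = []
    for category_idx, category in enumerate(categories):
        category_dir = os.path.join(input_dir, category)
        for file_name in sorted(os.listdir(category_dir)):
            img = imread(os.path.join(category_dir, file_name))
            img = resize(img, size)       # rows and columns become 15 x 15, channels are kept
            data.append(img.flatten())    # 15 * 15 * 3 = 675 values for an RGB image
            labels.append(category_idx)   # 0 = "empty", 1 = "not_empty"
    return np.asarray(data), np.asarray(labels)
```

Calling `data, labels = load_dataset("path/to/images")` (a placeholder path) would return a 2D array with one row per image and a 1D array of category indices.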

### **Step 2: Data Splitting**

* **Function:** Use `train_test_split` from `sklearn.model_selection` to divide data into training and testing sets.
* **Parameters:**
    * `test_size=0.2`: Allocates 20% of the data for testing and 80% for training.
    * `shuffle=True`: Randomizes the data order before splitting, so the split does not depend on the order in which the files were read.
    * `stratify=labels`: Ensures the proportion of categories in the splits matches the original dataset.
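
A small sketch of this split, using synthetic arrays in place of the real image features (the random `data` and the 60/40 class balance are assumptions made only for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features: 100 samples of 675 values each
# (15 * 15 * 3), with 60 "empty" (0) and 40 "not_empty" (1) labels.
rng = np.random.default_rng(0)
data = rng.random((100, 675))
labels = np.array([0] * 60 + [1] * 40)

x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, shuffle=True, stratify=labels
)

print(x_train.shape, x_test.shape)  # 80 training rows, 20 test rows
# Stratification keeps the 60/40 class ratio in both splits.
print(np.bincount(y_train).tolist(), np.bincount(y_test).tolist())
```

Because of `stratify=labels`, the training set here holds 48 and 32 samples of each class and the test set 12 and 8, whichever rows the shuffle happens to pick.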



### **Step 3: Training the Classifier**

* **Model:** Use a Support Vector Classifier (`SVC`) from `sklearn.svm`.
* **Grid Search:** Implement `GridSearchCV` to test multiple hyperparameter combinations automatically.
* **Hyperparameters:**
    * `gamma`: Tested values `[0.01, 0.001, 0.0001]`.
    * `C`: Tested values `[1, 10, 100, 1000]`.


* **Execution:** This setup evaluates 12 candidate classifiers (3 values of `gamma` × 4 values of `C`) to find the best combination.
* **Training:** Call `.fit(x_train, y_train)` on the grid search object to execute the training.
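
The grid search described above might look like the sketch below. The training data is a synthetic, easily separable stand-in, and the variable names `classifier` and `parameters` are illustrative choices rather than names confirmed by the source.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic, clearly separable stand-in for x_train / y_train.
rng = np.random.default_rng(0)
x_train = np.vstack([
    rng.normal(0.2, 0.05, (40, 675)),
    rng.normal(0.8, 0.05, (40, 675)),
])
y_train = np.array([0] * 40 + [1] * 40)

classifier = SVC()
parameters = [{"gamma": [0.01, 0.001, 0.0001], "C": [1, 10, 100, 1000]}]

grid_search = GridSearchCV(classifier, parameters)
grid_search.fit(x_train, y_train)

print(len(grid_search.cv_results_["params"]))  # number of combinations tried
print(grid_search.best_params_)
```

With scikit-learn's default settings, each of the 12 combinations is scored by 5-fold cross-validation, and the best one is then refit on the full training set.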

### **Step 4: Testing and Saving**

* **Selection:** Retrieve the best-performing model using `grid_search.best_estimator_`.
* **Prediction:** Use `.predict(x_test)` on the test set to generate classification predictions.
* **Evaluation:** Calculate the accuracy using `accuracy_score` from `sklearn.metrics`.
    * *Result:* The model achieved approximately 99.9% accuracy.


* **Saving:** Use the `pickle` library to save the trained model.
    * Use `pickle.dump` to write the model to a file named `model.p` for future use in other projects.
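
The testing and saving steps can be sketched as follows. The data is synthetic, a directly fitted `SVC` stands in for `grid_search.best_estimator_`, and the file is written to a temporary directory; all of these are assumptions made to keep the example self-contained.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Synthetic stand-in data; in the real project these come from Steps 1-3.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.2, 0.05, (50, 20)), rng.normal(0.8, 0.05, (50, 20))])
y = np.array([0] * 50 + [1] * 50)
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

# Stand-in for grid_search.best_estimator_.
best_estimator = SVC(C=10, gamma=0.01).fit(x_train, y_train)

y_prediction = best_estimator.predict(x_test)
score = accuracy_score(y_test, y_prediction)  # argument order: (y_true, y_pred)
print(f"{score * 100:.1f}% of samples were correctly classified")

# Save the model as model.p (inside a temporary directory for this sketch).
model_path = os.path.join(tempfile.mkdtemp(), "model.p")
with open(model_path, "wb") as f:
    pickle.dump(best_estimator, f)

# Load it back later, for example from another project.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.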

---

# **Support Vector Classifier**

**SVC (Support Vector Classifier) Crash Course**

### 1. The Core Concept: "The Widest Street"

Imagine you have red balls (cars) and blue balls (empty spots) thrown on the floor. Your job is to place a stick on the floor that perfectly separates them.

* **The Problem:** You could place the stick at many different angles and it would still separate them. Which one is best?
* **The SVC Solution:** SVC tries to find the position for the stick that leaves the **widest possible gap** (or street) between the red balls and the blue balls.
* This "stick" is called the **Hyperplane**.
* The "gap" is called the **Margin**.
* The specific balls that touch the edge of the street are called the **Support Vectors** (hence the name). They are the only data points the model actually cares about; the rest are safe behind the line.
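
To see support vectors concretely, here is a toy sketch with eight hand-picked 2D points (the points, the linear kernel, and `C=1000` are assumptions chosen for illustration, not settings from the video):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated groups of 2D points ("balls on the floor").
x = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],   # class 0
    [4.0, 4.0], [4.0, 5.0], [5.0, 4.0], [5.0, 5.0],   # class 1
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A linear kernel keeps the "stick" straight, as in the analogy.
model = SVC(kernel="linear", C=1000).fit(x, y)

print(model.support_vectors_)  # the points that sit on the edge of the street
print(model.n_support_)        # number of support vectors per class
```

After fitting, `support_vectors_` holds only the points nearest the other class, such as `(1, 1)` and `(4, 4)` here; the points further back do not appear.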



### 2. The "Knobs" You Tuned (C and Gamma)

In the video, the code didn't just pick one setting; it used **Grid Search** to test different combinations of `C` and `gamma`. Think of these as two knobs that control the "shape" of your stick.

#### **Knob 1: `C` (The Strictness)**

This controls how much you punish the model for making a mistake on the training data.

* **High C:** "I want 0 mistakes." The model will create a crazy, jagged boundary just to make sure every single training point is on the correct side. (Risk: Overfitting).
* **Low C:** "Chill out." The model accepts a few mistakes if it means keeping the boundary straight and simple. (Better for generalization).
* *In the video:* You tested `[1, 10, 100, 1000]` to see how strict the model should be.
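
One way to get a feel for `C` is to fit the same data with a low and a high value and compare the results. The sketch below uses synthetic 2D clusters with a single deliberately misplaced point; the data and `gamma=10` are illustrative assumptions, and the exact numbers printed will depend on the data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
x = np.vstack([
    rng.normal(0.0, 0.3, (20, 2)),   # class 0 cluster
    rng.normal(3.0, 0.3, (20, 2)),   # class 1 cluster
    [[0.0, 0.0]],                    # one class 1 outlier inside the class 0 cluster
])
y = np.array([0] * 20 + [1] * 21)

train_accuracy = {}
support_vector_count = {}
for C in [1, 1000]:
    model = SVC(C=C, gamma=10).fit(x, y)
    train_accuracy[C] = model.score(x, y)
    support_vector_count[C] = int(model.n_support_.sum())
    print(f"C={C}: training accuracy {train_accuracy[C]:.3f}, "
          f"{support_vector_count[C]} support vectors")
```

A higher `C` puts more weight on fitting every training point, including the outlier, which is the overfitting risk described above.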

#### **Knob 2: `gamma` (The Curvature)**

This controls how far the influence of a single training example reaches.

* **High Gamma:** Each training point only influences its immediate neighborhood. This creates "islands" around specific points (a very curvy, complex boundary).
* **Low Gamma:** Each training point's influence reaches far away. This creates a smoother, gentler boundary.
* *In the video:* You tested `[0.01, 0.001, 0.0001]` to see how "curvy" the boundary needed to be.
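
The "reach" of a training point can be made concrete. Assuming the classifier uses `SVC`'s default RBF kernel (the notes do not state the kernel), the similarity between two points is `exp(-gamma * squared_distance)`. The sketch below computes it by hand and with scikit-learn's `rbf_kernel` for two arbitrary example points:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[3.0, 4.0]])   # Euclidean distance 5 from a, squared distance 25

similarity = {}
for gamma in [0.01, 0.001, 0.0001]:
    manual = np.exp(-gamma * np.sum((a - b) ** 2))
    library = rbf_kernel(a, b, gamma=gamma)[0, 0]
    similarity[gamma] = (manual, library)
    print(f"gamma={gamma}: manual={manual:.4f}, rbf_kernel={library:.4f}")
```

The smaller `gamma` is, the closer the similarity stays to 1 even for distant points, which is why low values give a smoother boundary.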

### 3. Why did it work so well (99.9%)?

The video mentioned this classifier is robust for "visually super distinct" categories.

* **Empty Spots** are gray/uniform.
* **Cars** are colorful/complex.

In the mathematical "space" SVC looks at, these two groups are likely very far apart. The model didn't need a complex jagged line; a simple "wide street" was enough to separate them perfectly.

### Summary

* **SVC** draws a line between your classes.
* It tries to make the **margin** (gap) as wide as possible.
* **C** and **Gamma** control how straight or wiggly that line is allowed to be.
* **Grid Search** simply automated the process of twisting these knobs until it found the perfect setting (the "Best Estimator").

---

## **SVM VS. SVC**

They refer to **the same underlying method**, with one small technical distinction.

Think of it like this:

* **SVM (Support Vector Machine):** This is the name of the **theory** or the entire mathematical concept. It covers everything: classification (grouping things), regression (predicting numbers), and outlier detection.
* **SVC (Support Vector Classification):** This is the name of the **specific tool** inside the library (`scikit-learn`) that implements the SVM theory specifically for **Classification** tasks (like your "Empty vs. Not Empty" problem).

**In short:**
You are using an **SVM** (the concept) to solve your problem, and the tool you use to do it is called **SVC** (the code).

**Bonus Note:**
If you were trying to predict a continuous number (like the *price* of the car instead of whether it is there or not), you would use **SVR** (Support Vector Regression). Both SVC and SVR are types of SVMs.
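
A minimal side-by-side sketch of the two tools in scikit-learn, using made-up one-feature data (the numbers, including the "prices", are hypothetical):

```python
import numpy as np
from sklearn.svm import SVC, SVR

x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# Classification: the targets are category indices.
y_class = np.array([0, 0, 0, 1, 1, 1])
classifier = SVC().fit(x, y_class)
class_prediction = classifier.predict([[4.5]])
print(class_prediction)   # a class label

# Regression: the targets are continuous numbers (hypothetical prices).
y_price = np.array([10.0, 12.5, 15.0, 17.5, 20.0, 22.5])
regressor = SVR().fit(x, y_price)
price_prediction = regressor.predict([[4.5]])
print(price_prediction)   # a continuous value
```

Both classes share the same `fit` and `predict` interface; only the kind of target they learn differs.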