# Homework 7 - Classification
In this assignment, we will apply some basic classification methods to the soccer database (found on Canvas). We first need to import all the libraries required for this guide.

## Instructions
In this assignment, you will be performing the specified classification methods in Python.

---

### Step 1: Load Data

- Load the following attributes from `Player_Attributes`:
  - `gk_reflexes`
  - `gk_kicking`
  - `gk_handling`

These values will be used for classification.

---

### Step 2: Classification (Part 1)

- Use `gk_reflexes` and `gk_kicking`.
- Choose one of the attributes as the **target attribute**.
- Generate **five classes** in the target attribute by binning its range of values into five intervals.
- Split the data into **training** and **testing** sets.
- Apply the following methods and print the resulting `accuracy_score` from `sklearn.metrics`:
  - Logistic Regression
  - Support Vector Machine (SVM)
  - Decision Tree
  - K-Nearest Neighbors (KNN)

---

### Step 3: Classification (Part 2)

- Repeat **Step 2**, this time using:
  - `gk_kicking` and `gk_handling`

- Again, print the corresponding `accuracy_score` for each classification method.

---

### Step 4: Analysis (Comment in Python file)

Answer the following question as a **comment** in your Python file:

> Since this assignment (Classification) and the previous assignment (Regression) use the same data, can you compare them and conclude which technique yields the best results?

---

### Dataset Overview
The dataset covers information about soccer players in SQLite format. The file, `fifa_soccer_dataset.sqlite.gz`, is located in the `Datasets` directory of this repository. **This is the same file from the previous homework (assignment 4).**

If you have not decompressed the file yet, follow the instructions below.

**IMPORTANT:** The database is compressed and must be decompressed before use. On Linux or macOS, run the following command in your terminal:

```bash
gunzip Datasets/fifa_soccer_dataset.sqlite.gz
```

If you are using Windows, you can run the following commands in PowerShell:

```powershell
$sourceFile = "$PWD\Datasets\fifa_soccer_dataset.sqlite.gz"
$destinationFile = "$PWD\Datasets\fifa_soccer_dataset.sqlite"

$inputStream = [System.IO.File]::OpenRead($sourceFile)
$outputStream = [System.IO.File]::Create($destinationFile)
$gzipStream = New-Object System.IO.Compression.GzipStream($inputStream, [System.IO.Compression.CompressionMode]::Decompress)
$gzipStream.CopyTo($outputStream)

$gzipStream.Close()
$outputStream.Close()
$inputStream.Close()
```

Alternatively, you can extract the file using the GUI of your operating system.
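
If you would rather stay in Python, the same decompression can be sketched with the standard-library `gzip` and `shutil` modules. This is a minimal sketch, not part of the original instructions: the helper name `decompress_gzip` is hypothetical, and the path simply mirrors the one used in the commands above.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(source: Path, destination: Path) -> Path:
    """Stream-decompress a .gz file to `destination` without loading it all into memory."""
    with gzip.open(source, "rb") as compressed, open(destination, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return destination


# Path taken from the instructions above; adjust it if your checkout differs.
src = Path("Datasets/fifa_soccer_dataset.sqlite.gz")
if src.exists():
    # with_suffix("") drops only the trailing ".gz", leaving "...sqlite".
    print(decompress_gzip(src, src.with_suffix("")))
```

The compressed file is left in place, so the step can be repeated safely.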


### Submission Guidelines

- Submit your completed notebook as an HTML export or a PDF file.

To export to HTML from Jupyter, select `File` > `Export Notebook As` > `HTML`.

If you are on VS Code, you can use the `Jupyter: Export to HTML` command:
- Open the command palette (Ctrl+Shift+P, or Cmd+Shift+P on Mac).
- Search for `Jupyter: Export to HTML`.
- Save the HTML file to your computer and submit it via Canvas.

---


In [1]:
import pandas as pd
import sqlite3
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier  # needed for the Decision Tree listed in Step 2

import os

# Local directory
print(os.getcwd())


c:\Ricardo\2025-02 SP25 USABLE ARTIFICIAL INTELLIGENCE\GitHub\usable_ai\Homework


To start this assignment, we first need to connect to the SQLite database. Do so below.

In [3]:
# Input Code Here
dataset_path = "../Datasets/fifa_soccer_dataset.sqlite" # Adjust this path to match your setup

# Your Code Here
conn = sqlite3.connect(dataset_path)
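
One caveat worth checking: `sqlite3.connect()` silently creates a new, empty database when the path does not point to an existing file, so a wrong path only shows up later as a "no such table" error. A quick way to confirm the connection is to list the tables. The sketch below assumes a hypothetical helper `list_tables` and runs against an in-memory database so it works without the homework file; with the real data you would pass `conn` instead.

```python
import sqlite3

import pandas as pd


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the connected SQLite database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    return pd.read_sql(query, conn)["name"].tolist()


# Demo on an in-memory database (a stand-in, not the homework file).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Player_Attributes (gk_reflexes REAL, gk_kicking REAL)")
demo_tables = list_tables(demo_conn)
print(demo_tables)
demo_conn.close()
```

If the returned list is empty for your real connection, the path most likely points to the wrong location.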

Now that we are connected, let's grab the required attributes (`gk_reflexes`, `gk_kicking`, and `gk_handling`) from the `Player_Attributes` table.

In [6]:
# Your Code Here

player_attr_df = pd.read_sql("SELECT gk_reflexes, gk_kicking, gk_handling FROM Player_Attributes;", conn)

player_attr_df.head()

|   | gk_reflexes | gk_kicking | gk_handling |
|---|-------------|------------|-------------|
| 0 | 8.0 | 10.0 | 11.0 |
| 1 | 8.0 | 10.0 | 11.0 |
| 2 | 8.0 | 10.0 | 11.0 |
| 3 | 7.0 | 9.0 | 10.0 |
| 4 | 7.0 | 9.0 | 10.0 |


Drop the rows that have missing values.

In [18]:
 # Your Code Here

print('Before Missing values:\n', player_attr_df.isna().sum())
print('\nBefore shape:', player_attr_df.shape)

player_attr_df = player_attr_df.dropna(subset=['gk_handling', 'gk_reflexes', 'gk_kicking'])

print('\nAfter Missing values:\n', player_attr_df.isna().sum())
print('\nAfter shape:', player_attr_df.shape)

Before Missing values:
 gk_reflexes    836
gk_kicking     836
gk_handling    836
dtype: int64

Before shape: (183978, 3)

After Missing values:
 gk_reflexes    0
gk_kicking     0
gk_handling    0
dtype: int64

After shape: (183142, 3)


For this classification, we'll use `gk_reflexes` and `gk_kicking`. Pull these values into `x` and `y`.

In [19]:
x = player_attr_df[['gk_reflexes']].values # Your Code Here
y = player_attr_df[['gk_kicking']].values # Your Code Here

The target variable should be reduced to just five classes.

In [None]:
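# Bin the target (y, gk_kicking) into five equal-width classes so the classifiers below train on class labels
y = pd.cut(player_attr_df['gk_kicking'], bins=5, labels=['Class 1', 'Class 2', 'Class 3', 'Class 4', 'Class 5']).to_numpy()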
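
To see what the binning step does, here is a small sketch on a made-up series (the values are illustrative, not taken from the database). It contrasts `pd.cut`, which makes equal-width bins, with `pd.qcut`, which makes bins holding roughly equal numbers of rows. Which one suits the assignment is a judgment call; equal-width bins can leave some classes nearly empty when the ratings are skewed.

```python
import pandas as pd

# Made-up ratings, not taken from the homework database.
ratings = pd.Series([1.0, 12.0, 25.0, 40.0, 55.0, 63.0, 78.0, 90.0], name="rating")
labels = ["Class 1", "Class 2", "Class 3", "Class 4", "Class 5"]

# pd.cut: five bins of equal width across the observed range.
equal_width = pd.cut(ratings, bins=5, labels=labels)

# pd.qcut: five bins holding roughly the same number of rows each.
equal_count = pd.qcut(ratings, q=5, labels=labels)

# Inspect how many rows landed in each class under both schemes.
print(equal_width.value_counts().sort_index())
print(equal_count.value_counts().sort_index())
```

Checking `value_counts()` on the binned target is a quick way to spot a class imbalance before training.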

Let's split the data set into training and test sets using the `train_test_split()` function. We'll also want to standardize our `x` variable, which can be done by fitting a `StandardScaler` on the training data and then calling its `transform()` method.

In [None]:
X_train, X_test, y_train, y_test=  # Your Code Here
sc= StandardScaler()
sc.fit(X_train)
X_train_std= sc._(X_train) # Your Code Here
X_test_std= sc._(X_test) # Your Code Here
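
The split-then-scale pattern can be sketched on synthetic data as follows. Everything here is an assumption made for illustration: the random data, the `_demo` variable names, and the `test_size=0.3` and `random_state=0` values are not requirements of the assignment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, one feature, five class labels.
rng = np.random.default_rng(0)
X_demo = rng.uniform(1, 99, size=(200, 1))
y_demo = rng.integers(0, 5, size=200)

# test_size and random_state are illustrative choices only.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

scaler = StandardScaler()
scaler.fit(X_tr)                    # learn mean and std from the training rows only
X_tr_std = scaler.transform(X_tr)
X_te_std = scaler.transform(X_te)   # reuse the training statistics on the test rows

print(X_tr_std.mean(axis=0), X_tr_std.std(axis=0))
```

Fitting the scaler on the training rows only keeps information about the test rows out of the preprocessing step.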

To perform a logistic regression, we'll use the `LogisticRegression()` function. This may take a few moments to run.

In [None]:
lr= _(C=1000.0, random_state=0,max_iter=1000) #Your Code Here
lr.fit(X_train_std, y_train.ravel())
y_pred= lr.predict(X_test_std)

print(f'Accuracy: {accuracy_score(y_test, y_pred)}')

Great! Let's try applying SVM instead. Try using `SVC()` below, then use the same prediction and output methods as the above cell.

In [None]:
svm= _(kernel='linear', C=1.0, random_state=0, cache_size=7000) # Your Code Here
svm.fit(X_train_std, y_train.ravel())
y_pred = # Your Prediction Code Here

print(f'Accuracy: {accuracy_score(y_test, y_pred)}')

Let's try using a K-Nearest Neighbors classifier. We can call the `KNeighborsClassifier()` function and supply two parameters: `n_neighbors=5` and `metric='euclidean'`. Once you run this method, display the accuracy of your model as you did in the above cells.

In [None]:
knn= # Your Code here

#Your Code to fit the model here

y_pred= # Your Prediction Code Here

# Your Accuracy Output Code Here

Let's repeat the above steps again with `gk_kicking` and `gk_handling`.

In [None]:
 # Your Code Here
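
Because the same four models are evaluated for each attribute pair, one option is to wrap the steps in a helper and call it once per pair. The sketch below is a hedged illustration on synthetic data: `compare_classifiers` is a hypothetical name, the hyperparameters are scikit-learn defaults or arbitrary picks rather than the values used in the cells above, and it includes the Decision Tree that Step 2 lists.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, random_state=0):
    """Fit the four Step 2 classifiers and return {name: test accuracy}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "SVM": SVC(kernel="linear"),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    return {
        name: accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
        for name, model in models.items()
    }


# Synthetic stand-in: one noisy feature binned into five classes (0 to 4).
rng = np.random.default_rng(0)
X_demo = rng.uniform(1, 99, size=(300, 1))
y_demo = np.digitize(X_demo[:, 0] + rng.normal(0, 5, size=300), bins=[20, 40, 60, 80])

demo_scores = compare_classifiers(X_demo, y_demo)
for name, score in demo_scores.items():
    print(f"{name}: {score:.3f}")
```

With the real data, `X` would be the feature column as a 2-D array and `y` the binned target labels for the chosen attribute pair.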

Lastly, in the cell below, answer the question:
Since this assignment (Classification) and the previous assignment (Regression) use the same data, can you compare them and conclude which technique yields the best results?

In [None]:
#Your Answer Here