# Encoding Points dataset

This notebook covers the encoding steps for **Points_cleaned.csv**.  
- **Data Source:** Cleaned CSV file (`Points_cleaned.csv`)  
- **Goal:** Encode the categorical feature (`team`) using **One-Hot Encoding**.  

In [1]:
import pandas as pd

## Loading the cleaned dataset

In [2]:
df = pd.read_csv('../Points_cleaned.csv')

In [3]:
df.head()

,team,position,gf,ga,gd,points
0,Manchester Utd,1,67,31,36,84
1,Aston Villa,2,57,40,17,74
2,Norwich City,3,61,65,-4,72
3,Blackburn,4,68,46,22,71
4,QPR,5,63,55,8,63


## One Hot Encoding

We’ll apply One-Hot Encoding to **team names** (`team`) because a team is a nominal category with no inherent order.  
This creates one new binary column per distinct team and drops the original `team` column.  
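
As a minimal sketch of what this transformation does, the toy frame below (illustrative values, not rows taken from the dataset) is encoded the same way. The names `toy` and `toy_encoded` are hypothetical and used only for this example.

```python
import pandas as pd

# Toy frame with a nominal `team` column; values are illustrative only
toy = pd.DataFrame({
    "team": ["Arsenal", "Blackburn", "Arsenal"],
    "points": [56, 71, 71],
})

# Each distinct team becomes its own binary column; `team` itself is dropped
toy_encoded = pd.get_dummies(toy, columns=["team"])
print(toy_encoded.columns.tolist())
```

Columns that are not encoded keep their position, and the new `team_*` columns are appended after them.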

In [5]:
# Show shape before encoding
print("Before one-hot encoding:", df.shape)

# Apply one-hot encoding
df_encoded = pd.get_dummies(df, columns=['team'])

# Show shape after encoding
print("After one-hot encoding:", df_encoded.shape)

Before one-hot encoding: (646, 6)
After one-hot encoding: (646, 56)
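
The jump from 6 to 56 columns is what one would expect if `team` has 51 distinct values: the original column is removed and one column per team is added (6 - 1 + 51 = 56). A small check along these lines can confirm the arithmetic; `expected_width` is a hypothetical helper, shown here on toy data rather than the real file.

```python
import pandas as pd

def expected_width(frame: pd.DataFrame, column: str) -> int:
    """Column count expected after one-hot encoding `column` with get_dummies."""
    # Drop the encoded column, then add one column per distinct (non-missing) value
    return frame.shape[1] - 1 + frame[column].nunique()

# Toy data standing in for the real dataset (illustrative values)
toy = pd.DataFrame({
    "team": ["Arsenal", "Blackburn", "Arsenal", "QPR"],
    "points": [56, 71, 71, 63],
})
toy_encoded = pd.get_dummies(toy, columns=["team"])

print(expected_width(toy, "team"), toy_encoded.shape[1])
```

In the notebook, the same comparison could be made with `expected_width(df, "team") == df_encoded.shape[1]`.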


## Preview encoded team columns

In [6]:
# Preview the first 5 rows of the one-hot team columns
df_encoded.filter(like="team").head()

,team_Arsenal,team_Aston Villa,team_Barnsley,team_Birmingham City,team_Blackburn,team_Blackpool,team_Bolton,team_Bournemouth,team_Bradford City,team_Brentford,...,team_Sunderland,team_Swansea City,team_Swindon Town,team_Tottenham,team_Watford,team_West Brom,team_West Ham,team_Wigan Athletic,team_Wimbledon,team_Wolves
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,True,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,True,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
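
Because the preview truncates most columns, a row can look as if it has no team at all. One way to verify the encoding, sketched here on toy data with hypothetical names, is to confirm that every row has exactly one `True` among the team columns. The sketch selects columns with `filter(regex=r"^team_")`, which matches only names that start with the prefix, whereas `like="team"` matches the substring anywhere in the name.

```python
import pandas as pd

# Toy data standing in for the real dataset (illustrative values)
toy = pd.DataFrame({
    "team": ["Arsenal", "Blackburn", "Arsenal"],
    "points": [56, 71, 71],
})
toy_encoded = pd.get_dummies(toy, columns=["team"])

# Select only the columns whose names start with the one-hot prefix
team_cols = toy_encoded.filter(regex=r"^team_")

# A valid one-hot encoding has exactly one active column per row
rows_ok = team_cols.sum(axis=1).eq(1).all()
print(rows_ok)
```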


## Saving the encoded dataset

In [7]:
df_encoded.to_csv('../Points_encoded.csv', index=False)
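
If downstream tools expect numeric 0/1 indicators rather than `True`/`False`, `pd.get_dummies` accepts a `dtype` argument. The sketch below is an optional variation, not the notebook's method: it encodes toy data as integers and round-trips it through an in-memory buffer instead of a file on disk.

```python
import io
import pandas as pd

# Toy data standing in for the real dataset (illustrative values)
toy = pd.DataFrame({
    "team": ["Arsenal", "Blackburn", "Arsenal"],
    "points": [56, 71, 71],
})

# dtype=int yields 0/1 indicator columns instead of booleans
toy_encoded = pd.get_dummies(toy, columns=["team"], dtype=int)

# Write to an in-memory buffer and read it back to check the round trip
buffer = io.StringIO()
toy_encoded.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.head())
```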