<a href="https://colab.research.google.com/github/fjadidi2001/Machine_Learning_Journey/blob/main/Predict_ATP_tennis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> ATP stands for the Association of Tennis Professionals, the governing body of the men's professional tennis tour. Founded in 1972, the ATP organizes the ATP Tour, a series of tournaments that includes the ATP Masters 1000, ATP 500, and ATP 250 events and culminates in the season-ending ATP Finals. The Davis Cup, by contrast, is run by the International Tennis Federation (ITF), not the ATP.

> The ATP also ranks male players based on their performance in tournaments, creating the ATP Rankings, which reflect players' results over the previous 52 weeks. The ATP plays a significant role in promoting the sport, securing player rights, and managing tournament standards.




# Problem Statement:
> “Analyzing and Predicting Player Performance in Professional Tennis: An Exploration of Court-Type Specialization and Peak Performance Age”





## Project Objectives
1. **Exploratory Analysis of Player Characteristics**  
   - Analyze the distribution of player ages, rankings, and Elo ratings across different surfaces (hard court, clay court, grass court).
   - Identify trends in age and performance metrics to determine how they vary across court types.

2. **Court-Type Specialization Prediction**  
   - Build a classification model to predict a player's specialization (e.g., "hard-court specialist," "clay-court specialist," or "all-rounder") based on historical performance data.

3. **Peak Performance Prediction**  
   - Investigate the relationship between age and peak performance, specifically focusing on the "Peak Age" and "Peak Elo" features.
   - Use regression techniques to predict a player’s peak Elo rating and the age at which they will reach peak performance.

4. **Gender-Based Performance Analysis**  
   - Conduct a comparative analysis of performance metrics between male and female players, examining if and how gender influences court-type specialization and peak performance.

5. **Evaluation and Interpretability**  
   - Evaluate model performance using metrics such as accuracy, F1-score, mean squared error (MSE), etc., depending on the task (classification or regression).
   - Incorporate interpretability techniques (e.g., SHAP values or feature importance) to provide insights into the factors contributing to a player’s peak performance and specialization.
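Objective 2 needs a target label, but the dataset has no specialization column. The sketch below shows one possible way to derive labels. It assumes a player counts as a specialist on a surface when that surface's raw rating (**HardRaw**, **ClayRaw**, **GrassRaw**) sits at least `margin` Elo points above the player's own average across the three surfaces. The helper `label_specialization` and the default `margin=100` are hypothetical choices, not part of the original project. The demo values are copied from the sample rows printed in Step 1.

```python
import pandas as pd

# Hypothetical rule: a player is a specialist on a surface when that surface's
# raw Elo sits at least `margin` points above the player's mean raw surface Elo.
SURFACE_LABELS = {
    "HardRaw": "hard-court specialist",
    "ClayRaw": "clay-court specialist",
    "GrassRaw": "grass-court specialist",
}


def label_specialization(df: pd.DataFrame, margin: float = 100.0) -> pd.Series:
    """Return one heuristic specialization label per player (hypothetical helper)."""
    raw = df[list(SURFACE_LABELS)].apply(pd.to_numeric, errors="coerce")
    raw = raw.dropna(how="all")  # players with no surface ratings cannot be labeled
    gap = raw.sub(raw.mean(axis=1), axis=0)  # distance from the player's own average
    best_surface = gap.idxmax(axis=1)        # surface with the largest gap
    labels = best_surface.map(SURFACE_LABELS)
    # Keep the surface label only when the largest gap clears the margin.
    return labels.where(gap.max(axis=1) >= margin, "all-rounder")


# Raw surface ratings copied from the data.head() output in Step 1.
sample = pd.DataFrame(
    {
        "HardRaw": [2068, 2095.3, 2056.7, 1936.7, 1906.3],
        "ClayRaw": [2016.3, 1714.1, 1999.1, 1749.6, 1834.4],
        "GrassRaw": [1942.4, 1723.4, 1671.4, 1813.4, 1441.4],
    },
    index=["Novak Djokovic", "Daniil Medvedev", "Alexander Zverev",
           "Roger Federer", "Carlos Alcaraz"],
)
print(label_specialization(sample))
```

With these rows, the rule labels Djokovic an all-rounder and the other four players hard-court specialists. Lowering `margin` to 50 makes all five hard-court specialists, so the threshold should be tuned or replaced before the labels are used as a classification target.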




### Contribution and Impact
This project could provide valuable insights into tennis player development and training, helping coaches and analysts better understand the factors that contribute to a player’s success on different court types. It could also provide insights into the ideal age for peak performance, which is valuable for athlete management and career planning.

# Step 1: Load the dataset

In [2]:
from google.colab import files
uploaded = files.upload()

Saving archive.zip to archive.zip


In [5]:
import zipfile

# Extract the uploaded archive into ./dataset
zip_file_name = 'archive.zip'
with zipfile.ZipFile(zip_file_name, 'r') as zip_ref:
    zip_ref.extractall("./dataset")

In [6]:
import pandas as pd

csv_file_path = "./dataset/ATP.csv"
data = pd.read_csv(csv_file_path)

In [7]:
print(data.head())

             Player   Age     Elo HardRaw ClayRaw GrassRaw  \
0    Novak Djokovic  34.5  2185.2    2068  2016.3   1942.4   
1   Daniil Medvedev  25.9  2166.2  2095.3  1714.1   1723.4   
2  Alexander Zverev  24.7  2141.3  2056.7  1999.1   1671.4   
3     Roger Federer  39.9  2043.0  1936.7  1749.6   1813.4   
4    Carlos Alcaraz  18.5  2029.5  1906.3  1834.4   1441.4   

   hard court elo rating  clay-court elo rating  grass-court elo rating  \
0                 2126.6                 2100.8                  2063.8   
1                 2130.7                 1940.1                  1944.8   
2                 2099.0                 2070.2                  1906.4   
3                 1989.8                 1896.3                  1928.2   
4                 1967.9                 1932.0                  1735.5   

              Peak Match  Peak Age  Peak Elo Gender  Rank  
0           2016 Miami F      28.8    2470.0   Male     1  
1        2022 Atp Cup RR      25.9    2175.1   Male     
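In the printout, **HardRaw** shows `2068` without a decimal while the surrounding values carry one. pandas prints a float column with uniform precision, so a float column would show `2068.0`. At least **HardRaw** may therefore have been read as text, for example because of non-numeric placeholders. The sketch below converts such columns. It assumes that `pd.to_numeric(..., errors="coerce")`, which turns unparseable entries into `NaN`, is acceptable here. `coerce_numeric` is a hypothetical helper name.

```python
import pandas as pd

RAW_COLUMNS = ["HardRaw", "ClayRaw", "GrassRaw"]


def coerce_numeric(df: pd.DataFrame, columns=RAW_COLUMNS) -> pd.DataFrame:
    """Return a copy of df with `columns` parsed as numbers.

    Anything that cannot be parsed (e.g. a placeholder string) becomes NaN.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Small text-typed frame: the first two rows mirror the head() output above;
# the third row ("-") is a hypothetical placeholder, not a value from ATP.csv.
demo = pd.DataFrame({
    "HardRaw": ["2068", "2095.3", "-"],
    "ClayRaw": ["2016.3", "1714.1", "-"],
    "GrassRaw": ["1942.4", "1723.4", "-"],
})
clean = coerce_numeric(demo)
print(clean.dtypes)
print(clean.isna().sum())
```

On the real data, `data[RAW_COLUMNS].dtypes` shows whether the conversion is needed, and `data = coerce_numeric(data)` applies it.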

Detailed explanation of each column in the dataset:

1. **Player**:
   - **Description**: The player's full name (first and last name in the sample rows above).
   - **Use Case**: Useful for identifying individual players within the dataset and for any player-specific analyses.

2. **Age**:
   - **Description**: The age of the player, likely calculated at the time of data collection.
   - **Use Case**: Important for analyzing the impact of age on player performance, rankings, and potential career longevity.

3. **Elo**:
   - **Description**: The player's overall Elo rating, which is a metric used to estimate a player’s skill level based on match results. **Higher Elo ratings indicate better performance.**
   - **Use Case**: Often used as a measure of player strength in predictive modeling or comparative analysis against other players.

4. **HardRaw**:
   - **Description**: The player's hard-court Elo rating, presumably computed from hard-court matches only (the unblended "raw" surface rating).
   - **Use Case**: Useful for analyzing player performance specifically on hard courts, helping to understand strengths and weaknesses on different surfaces.

5. **ClayRaw**:
   - **Description**: The player's clay-court Elo rating, presumably computed from clay-court matches only.
   - **Use Case**: Similar to **HardRaw**, it provides insights into player performance on clay courts, which can differ significantly from performance on other surfaces.

6. **GrassRaw**:
   - **Description**: The player's grass-court Elo rating, presumably computed from grass-court matches only.
   - **Use Case**: Highlights player strengths and weaknesses on grass courts, important for tournaments played on this surface, like Wimbledon.

7. **hard court elo rating**:
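   - **Description**: A blended hard-court rating. In all five sample rows, it equals the mean of **Elo** and **HardRaw**, rounded to one decimal (e.g., Djokovic: (2185.2 + 2068) / 2 = 2126.6). It therefore appears to weight overall and hard-court-only Elo equally.
   - **Use Case**: Gives a hard-court estimate that still reflects a player's overall strength.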

8. **clay-court elo rating**:
   - **Description**: The clay-court counterpart of the previous column. In the sample rows, it equals the mean of **Elo** and **ClayRaw** (e.g., Zverev: (2141.3 + 1999.1) / 2 = 2070.2).
   - **Use Case**: Indicates how the player rates on clay once surface-specific results are balanced against overall strength.

9. **grass-court elo rating**:
   - **Description**: The grass-court counterpart. In the sample rows, it equals the mean of **Elo** and **GrassRaw** (e.g., Federer: (2043.0 + 1813.4) / 2 = 1928.2).
   - **Use Case**: Helps in understanding how well the player performs on grass relative to their overall level.

10. **Peak Match**:
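    - **Description**: The match in which the player reached their peak Elo rating, encoded as year, tournament, and round. For example, "2016 Miami F" is the 2016 Miami final, and "2022 Atp Cup RR" is a round-robin match at the 2022 ATP Cup.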
    - **Use Case**: This column can be used to identify key matches in a player's career that contributed to their highest performance metrics.

11. **Peak Age**:
    - **Description**: The player's age when they reached their peak Elo rating, i.e., at the **Peak Match**.
    - **Use Case**: Useful for analyzing career trajectories and understanding the age at which players typically perform their best.

12. **Peak Elo**:
    - **Description**: The highest Elo rating achieved by the player.
    - **Use Case**: This column is crucial for assessing historical performance and potential peaks in skill level over the player’s career.

13. **Gender**:
    - **Description**: Indicates the player's gender (e.g., Male, Female).
    - **Use Case**: Important for gender-based analysis in tennis, helping to identify trends and differences in performance or rankings between genders.

14. **Rank**:
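    - **Description**: A player ranking, described here as the current ATP/WTA ranking. The sample rows, however, appear in descending order of **Elo** (Djokovic, Medvedev, Zverev, Federer, Alcaraz), so this column may instead be the Elo-based rank. This is worth confirming against the data source.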
    - **Use Case**: This is key for understanding a player’s standing in professional tennis and is often the target variable in ranking-related analyses.
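In the five rows printed by `data.head()`, each of the three "... elo rating" columns equals the mean of **Elo** and the matching raw column, to one decimal. The sketch below checks whether this holds across the whole table. It assumes the column names printed above and a tolerance of 0.06, which is half of the 0.1 display precision plus slack for floating-point error. `blend_matches` is a hypothetical helper that returns, for each surface, the fraction of rows consistent with `(Elo + raw) / 2`.

```python
import numpy as np
import pandas as pd

# Blended surface column -> raw surface column, as named in ATP.csv.
PAIRS = {
    "hard court elo rating": "HardRaw",
    "clay-court elo rating": "ClayRaw",
    "grass-court elo rating": "GrassRaw",
}


def blend_matches(df: pd.DataFrame, atol: float = 0.06) -> dict:
    """Share of rows where blended == (Elo + raw) / 2 within `atol`.

    Rows with missing or non-numeric values in the three columns are skipped.
    """
    result = {}
    for blended, raw in PAIRS.items():
        cols = df[["Elo", raw, blended]].apply(pd.to_numeric, errors="coerce").dropna()
        expected = (cols["Elo"] + cols[raw]) / 2
        matches = np.isclose(cols[blended], expected, rtol=0, atol=atol)
        result[blended] = float(matches.mean())
    return result


# The five rows printed by data.head() in Step 1.
rows = pd.DataFrame({
    "Elo": [2185.2, 2166.2, 2141.3, 2043.0, 2029.5],
    "HardRaw": [2068, 2095.3, 2056.7, 1936.7, 1906.3],
    "ClayRaw": [2016.3, 1714.1, 1999.1, 1749.6, 1834.4],
    "GrassRaw": [1942.4, 1723.4, 1671.4, 1813.4, 1441.4],
    "hard court elo rating": [2126.6, 2130.7, 2099.0, 1989.8, 1967.9],
    "clay-court elo rating": [2100.8, 1940.1, 2070.2, 1896.3, 1932.0],
    "grass-court elo rating": [2063.8, 1944.8, 1906.4, 1928.2, 1735.5],
})
print(blend_matches(rows))
```

On the five sample rows, every surface scores 1.0. On the full dataset, `blend_matches(data)` would show whether the pattern holds everywhere or only in these rows.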

### Summary

These columns cover overall, surface-specific, and peak-performance metrics. Together, they support analyses of player performance across career stages and surfaces, including predictive modeling and player-to-player comparisons.



# Step 2: EDA (Exploratory Data Analysis)

# Step 3: Preprocessing

# Step 4: Split dataset
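The sketch below shows this step for the Peak Elo regression in objective 3. It assumes scikit-learn's `train_test_split`, a hypothetical feature set (`Age`, `Elo`, and the three raw surface ratings), and an 80/20 split. Rows with missing or non-numeric values are dropped. **Peak Match** is left out because it describes the peak itself rather than predicting it. The demo frame is random synthetic data, used only to show the output shapes.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed features and target for the Peak Elo regression (hypothetical choice).
FEATURES = ["Age", "Elo", "HardRaw", "ClayRaw", "GrassRaw"]
TARGET = "Peak Elo"


def split_peak_elo(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Return X_train, X_test, y_train, y_test after dropping unusable rows."""
    cols = FEATURES + [TARGET]
    clean = df[cols].apply(pd.to_numeric, errors="coerce").dropna()
    X, y = clean[FEATURES], clean[TARGET]
    # A fixed random_state keeps the split reproducible across runs.
    return train_test_split(X, y, test_size=test_size, random_state=seed)


# Synthetic stand-in frame (random numbers, NOT ATP data) to show the shapes.
rng = np.random.default_rng(0)
fake = pd.DataFrame(rng.normal(1800, 150, size=(50, len(FEATURES) + 1)),
                    columns=FEATURES + [TARGET])
fake["Age"] = rng.uniform(18, 40, size=50)

X_train, X_test, y_train, y_test = split_peak_elo(fake)
print(X_train.shape, X_test.shape)  # (40, 5) (10, 5)
```

On the real data, the same call would be `split_peak_elo(data)`. Any scaling or imputation should then be fitted on `X_train` only.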

# Step 5: Train Model

# Step 6: Evaluate the model using multiple metrics
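Objective 5 names accuracy, F1-score, and MSE. The sketch below computes them with `sklearn.metrics`. It assumes macro-averaged F1 for the multi-class specialization task and reports RMSE alongside MSE, since RMSE is in Elo points. The helper names are hypothetical. The label lists and predictions are hand-written toy values, not model output; the two true regression values are the Peak Elo figures from the sample rows.

```python
import math

from sklearn.metrics import accuracy_score, f1_score, mean_squared_error


def classification_scores(y_true, y_pred) -> dict:
    """Accuracy and macro-averaged F1 for the specialization labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 scores a class with no predictions as 0 without warning.
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }


def regression_scores(y_true, y_pred) -> dict:
    """MSE plus RMSE, which is in the same units (Elo points) as the target."""
    mse = mean_squared_error(y_true, y_pred)
    return {"mse": mse, "rmse": math.sqrt(mse)}


# Toy classification labels (hypothetical).
y_true_cls = ["clay-court specialist", "hard-court specialist",
              "all-rounder", "hard-court specialist"]
y_pred_cls = ["clay-court specialist", "hard-court specialist",
              "hard-court specialist", "hard-court specialist"]
print(classification_scores(y_true_cls, y_pred_cls))

# True Peak Elo values from the sample rows; predictions are hypothetical.
print(regression_scores([2470.0, 2175.1], [2400.0, 2200.0]))
```

For these toy inputs, accuracy is 0.75 and macro F1 is 0.6, because the missed all-rounder class scores 0 and drags the average down. The MSE is about 2760.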