# Step 1: Business Understanding (Not Graded)

---

## Business Objective 
The client, CAICLE, is an investment firm whose strategic goal is to establish a new, world-class professional cycling "super team." To gain an edge in the highly competitive rider recruitment market, CAICLE wants to move beyond traditional scouting methods.

The primary business objective is to implement a data-driven rider selection strategy. This project lays the foundation for that strategy by developing a machine learning model that identifies riders with high potential for future success. The model's insights will directly inform multi-million-dollar contract decisions, with the aim of maximizing the team's future performance and return on investment.

---
## The Machine Learning Problem 
The client's request is to predict "future race event performance." This is a broad objective that must be translated into a specific, measurable machine learning problem. While predicting a rider's exact final rank in a race is a possibility, it presents significant challenges due to high variance and randomness (e.g., crashes, mechanical failures, team tactics).
Therefore, we define the problem as a binary classification task. The model will not predict the exact placement, but rather the probability of a rider finishing in the top 10 of a race.
#### **Justification**:
* Robustness: This approach is less sensitive to the noise of exact rankings and focuses on a more stable signal of high performance.
* Business Alignment: A "super team" is built on riders who are consistently competitive, not just occasional winners. Identifying frequent top-10 finishers is directly aligned with the goal of building a strong, reliable team for season-long competitions.
* Technical Feasibility: A binary classification model is more tractable than exact-rank prediction and often yields more reliable and interpretable results for this type of problem.
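
The binary target described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name `top10_label`, the `cutoff` parameter, and the choice to treat a missing rank (for example, a rider who did not finish) as a negative are all assumptions.

```python
def top10_label(final_rank, cutoff=10):
    """Return 1 if the rider finished inside the cutoff, else 0.

    Assumption: a missing rank (None), e.g. a DNF, counts as a negative.
    """
    if final_rank is None:
        return 0
    return 1 if final_rank <= cutoff else 0


# Hypothetical finishing positions for five rider performances.
ranks = [1, 10, 11, 57, None]
labels = [top10_label(r) for r in ranks]
print(labels)  # [1, 1, 0, 0, 0]
```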
---

## Success Criteria
To ensure the project delivers tangible value, we define success across two dimensions: technical performance and business impact.
### Technical Success
The model must outperform simple baseline methods (e.g., predicting based on past UCI points alone) by a statistically significant margin. Our primary evaluation metric will be Precision.
* Target: Achieve a Precision score of over 75% for the "Top 10 Finisher" class.
* Rationale: Precision is critical because it measures the reliability of our positive predictions. A high precision ensures that when the model identifies a rider as a high-potential candidate, we can be confident in that assessment, minimizing the risk of investing in underperforming talent.
* Example: If the model identifies 100 rider performances as potential top-10 finishes, more than 75 of them must be actual top-10 finishes to meet the target.
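
A check against the Precision target might look like the sketch below. The helper `precision` and the sample labels are hypothetical and exist only to show the arithmetic; they are not results from the project.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        return 0.0  # no positive predictions were made
    return tp / (tp + fp)


# Hypothetical data: 100 positive predictions, 80 of which were real top-10 finishes.
y_pred = [1] * 100
y_true = [1] * 80 + [0] * 20

score = precision(y_true, y_pred)
meets_target = score > 0.75  # target stated above: Precision over 75%
print(score, meets_target)  # 0.8 True
```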
---

## Research

Our methodology will be grounded in established research within sports analytics and performance modeling to ensure a robust and effective approach. Our investigation will focus on several key areas to inform feature engineering and model selection.

### Key Research Areas
* **Dynamic Rating Systems:** We will explore the adaptation of dynamic rating systems, such as the **Elo rating system**, which is traditionally used in chess and other head-to-head competitions. Applying a similar concept could allow us to create a feature that quantifies a rider's current form and competitive standing relative to their peers.
* **Performance Profiling:** Our feature engineering will be guided by academic work on cyclist power profiling and performance modeling. We will consult research on how to use historical race data (e.g., results from mountain stages, time trials, and flat stages) to create features that act as proxies for a rider's physiological specialties (e.g., Climber, Sprinter, Time Trialist).
* **Predictive Modeling in Endurance Sports:** We will conduct a literature review of existing machine learning applications for predicting outcomes in endurance sports. This will help us select the most appropriate models (e.g., Gradient Boosting, Random Forest) and validation strategies for the specific characteristics of cycling data.
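
As a rough illustration of the Elo idea, the sketch below shows one possible adaptation to multi-rider races; it is not a method specified in this document. It treats a race as a set of pairwise contests in which the better-placed rider "wins" and applies the standard Elo expected-score formula to each pair. The K-factor of 16, the starting rating of 1500, the averaging over opponents, and the rider names are all illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings_for_race(ratings, finish_order, k=16.0, default=1500.0):
    """Return new ratings after one race.

    finish_order lists rider names from best to worst placing.
    Each pair of riders is scored as a head-to-head contest, and the
    total change is averaged over the number of opponents (an assumption).
    """
    current = {r: ratings.get(r, default) for r in finish_order}
    deltas = {r: 0.0 for r in finish_order}
    for i, winner in enumerate(finish_order):
        for loser in finish_order[i + 1:]:
            gain = k * (1.0 - expected_score(current[winner], current[loser]))
            deltas[winner] += gain
            deltas[loser] -= gain  # pairwise updates are zero-sum
    opponents = max(len(finish_order) - 1, 1)
    updated = dict(ratings)
    for r in finish_order:
        updated[r] = current[r] + deltas[r] / opponents
    return updated


# Three hypothetical, previously unrated riders finishing in this order.
updated = update_ratings_for_race({}, ["rider_a", "rider_b", "rider_c"])
print(updated)
```

With equal starting ratings, the winner gains exactly what the last-placed rider loses and the middle rider is unchanged. The resulting rating could then serve as one "current form" feature alongside others.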

### Example Metric Application (Recall)
While **Precision** is our primary metric to avoid bad investments, we must also track **Recall** to ensure the model doesn't miss out on high-potential talent (i.e., avoid False Negatives).

* **Scenario:** Imagine in a future Grand Tour, 20 riders who were not considered "superstars" finish in the top 10 of various stages.
* **Calculation:** If our model's shortlist had predicted that 15 of these 20 riders would perform well, our Recall for identifying surprise talent would be:
    $$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} = \frac{15}{20} = 75\%$$
* **Business Impact:** A low Recall would mean our model is too conservative and is failing to identify potential breakthrough stars, representing a significant missed opportunity for CAICLE's recruitment strategy.
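
The same calculation, written as a small sketch using the counts from the scenario above. The function name `recall` is a hypothetical helper, not part of any project code.

```python
def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        return 0.0  # no actual positives to find
    return true_positives / total


surprise_finishers = 20  # riders who actually reached a top 10
flagged_by_model = 15    # of those, riders the shortlist had identified

score = recall(flagged_by_model, surprise_finishers - flagged_by_model)
print(score)  # 0.75
```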

---

### Professional Analytics & Media
* **ProCyclingStats.com:** ([https://www.procyclingstats.com/](https://www.procyclingstats.com/))  
  * **Relevance:** The source of much of our data. Their "PCS Ranking" is a points-based system that ranks riders on race results. Understanding how the point system works can provide a baseline and inspire features for our own model.

### Fantasy Sports Platforms
* **Velogames:** ([https://www.velogames.com/](https://www.velogames.com/))  
  * **Relevance:** A popular fantasy cycling platform. Its rider pricing acts as an implicit estimate of expected performance, so analyzing rider costs can highlight which attributes the "market" values.

### Academic & Research Projects
* **A Machine Learning Approach for Road Cycling Race Performance Prediction (University of Antwerp):** ([Repository Link](https://repository.uantwerpen.be/link/irua/174561))  
  * **Relevance:** Framework to predict one-day race results using historical data. Strong example of problem framing, feature selection, and handling race variability.
* **Predicting the Next Pogačar (PubMed):** ([https://pubmed.ncbi.nlm.nih.gov/35068645/](https://pubmed.ncbi.nlm.nih.gov/35068645/))  
  * **Relevance:** Focused on talent identification, using junior race data to predict future stars. Helpful for CAICLE’s objective of spotting high-potential riders before they peak.
* **Data-driven Support of Coaches in Professional Cycling (TU Eindhoven):** ([Research Link](https://research.tue.nl/en/publications/data-driven-support-of-coaches-in-professional-cycling-using-race))  
  * **Relevance:** Predicts riders’ performance metrics from training and past results, providing decision support to coaches. Similar evaluation methods can be adapted for CAICLE’s scouting strategy.

---

## Responsibilities
- Translate the business problem into an ML target
- Write up the business problem and success criteria
- Research similar problems in sports analytics and summarize findings

| Task | Description |
|------|------------|
| Define ML target | Decide what “future race performance” means quantitatively |
| Success criteria | Specify the target evaluation metrics (Precision, Recall) and the expected business impact |
| Research | Investigate similar sports analytics projects |

---