# Project Proposal: Predicting NCAA Basketball Championship Winner

## Team Members
- Brandon Poblette
- Geof Spilker

## Project Name
**Predicting NCAA Basketball Championship Winner**

## Dataset Description

**Source:** The dataset for this project will be sourced from publicly available NCAA basketball statistics, for example a Kaggle dataset (e.g., "NCAA Basketball Dataset" or "March Madness Dataset").

**Format:** CSV or JSON format, depending on the source.

**Contents:**
The dataset will include historical data from NCAA basketball tournaments spanning multiple seasons. It will contain information about the teams and their performance, with attributes such as:

- **Team Name**: The name of the college basketball team.
- **Win-Loss Record**: The win-loss record of each team during the regular season.
- **Seed**: The seed of the team in the tournament (1 through 16 for each region).
- **Points Scored**: Total points scored by the team.
- **Points Allowed**: Total points allowed by the team.
- **Field Goal Percentage**: Shooting efficiency.
- **Free Throw Percentage**: Performance from the free-throw line.
- **Rebounds**: Average total rebounds (offensive plus defensive) per game.
- **Assists**: Average assists per game.
- **Turnovers**: Total turnovers committed.
- **Tournament Result**: A binary outcome (win/loss) indicating whether the team won the NCAA championship in that year.

## Class Information
- **Target Variable (Class to Predict):** Whether a team wins the NCAA championship in a given year. This will be modeled as a binary classification problem:
  - **1**: The team won the NCAA championship.
  - **0**: The team did not win.
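
As a minimal sketch of how this label could be derived, the snippet below parses a small CSV and attaches a binary `champion` column. The column names (`season`, `team`, `seed`, `tournament_result`), the result strings, and the sample rows are all hypothetical placeholders, since the final dataset has not been chosen yet.

```python
import csv
import io

# Hypothetical sample in an assumed CSV layout; real column names may differ.
SAMPLE_CSV = """season,team,seed,tournament_result
2018,Team A,1,champion
2018,Team B,2,runner-up
2018,Team C,5,round of 64
2019,Team D,1,semifinal
2019,Team E,3,champion
2019,Team F,11,round of 32
"""


def load_labeled_rows(csv_text):
    """Parse CSV text and attach a binary `champion` label to each row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["seed"] = int(row["seed"])
        # 1 if the team won the championship that season, otherwise 0.
        result = row["tournament_result"].strip().lower()
        row["champion"] = 1 if result == "champion" else 0
        rows.append(row)
    return rows


rows = load_labeled_rows(SAMPLE_CSV)
positives = sum(r["champion"] for r in rows)
print(f"{positives} champions out of {len(rows)} team-seasons")
```

Counting the positives right after labeling gives an early view of how rare the positive class is in the chosen dataset.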

## Implementation/Technical Merit

The project will involve building a classification model that can predict the likelihood of a college basketball team winning the NCAA championship. This will involve:
1. **Data Preprocessing**: Handling missing values, normalizing numeric attributes, and encoding categorical features (e.g., team name).
2. **Feature Engineering**: Creating new features that may improve the model's performance (e.g., team performance metrics like average points per game).
3. **Model Selection**: Exploring classification models such as Logistic Regression, k-Nearest Neighbors (kNN), and Decision Trees to predict the outcome.
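4. **Evaluation**: Evaluating models using accuracy, precision, recall, and F1-score. Because champions are rare, precision, recall, and F1 on the positive class will be more informative than accuracy alone.
5. **Hyperparameter Tuning**: Optimizing model hyperparameters (e.g., the number of neighbors in kNN or the maximum depth of a decision tree) to improve predictive performance.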
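
Some of the steps above can be sketched with the standard library alone. The example below is illustrative only: the field names (`wins`, `losses`, `points_scored`, `points_allowed`) and the numbers are assumptions, and the final project may well use a library such as scikit-learn instead. It shows per-game feature engineering, z-score normalization, and the four evaluation metrics.

```python
from statistics import mean, pstdev


def per_game_features(team):
    """Turn season totals into per-game rates (hypothetical field names)."""
    games = team["wins"] + team["losses"]
    margin = team["points_scored"] - team["points_allowed"]
    return {
        "win_pct": team["wins"] / games,
        "points_per_game": team["points_scored"] / games,
        "point_margin_per_game": margin / games,
    }


def standardize(values):
    """Z-score normalization: subtract the mean, divide by the std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mu) / sigma for v in values]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up season totals for one team.
features = per_game_features(
    {"wins": 30, "losses": 5, "points_scored": 2800, "points_allowed": 2450}
)

# A model that never predicts a champion still looks accurate.
y_true = [1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(features)
print(metrics)
```

In the toy evaluation, the always-negative predictor reaches 0.875 accuracy while its recall and F1 are both 0, which is why accuracy should not be read on its own here.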

## Anticipated Challenges

- **Data Imbalance**: Only one team wins the championship each year, so non-winners will vastly outnumber winners and the dataset will be heavily skewed. Techniques like oversampling, undersampling, or adjusting class weights will be considered.
- **Feature Selection**: With many attributes potentially influencing the outcome, deciding which features are the most impactful will be important.
- **Data Preprocessing**: The dataset might have missing or inconsistent data, requiring thorough cleaning and transformation.
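
To make the imbalance handling concrete, here is a rough sketch of two of the options mentioned above: class weights and random oversampling. The 64-team field and the function names are assumptions for illustration. The weighting formula, `n_samples / (n_classes * class_count)`, is the same heuristic scikit-learn documents for `class_weight="balanced"`.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# One champion in a hypothetical 64-team field.
labels = [1] + [0] * 63
rows = list(range(64))  # stand-ins for feature rows

weights = balanced_class_weights(labels)
resampled_rows, resampled_labels = random_oversample(rows, labels)
print(weights)
print(Counter(resampled_labels))
```

If oversampling is used, it should be applied to the training split only, so that duplicated champions do not leak into the evaluation data.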
  
## Feature Selection Techniques

If the dataset has a large number of features, we will explore various feature selection techniques, such as:
- **Correlation Matrix**: To identify highly correlated features that might be redundant.
- **Recursive Feature Elimination (RFE)**: A method that repeatedly fits a model and removes the least important features until the desired number of features remains.
- **Principal Component Analysis (PCA)**: Strictly a feature extraction method rather than feature selection; it reduces dimensionality by transforming the features into a smaller set of uncorrelated components.
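
The correlation-based check is simple enough to sketch directly. The example below computes Pearson correlations between feature columns and flags pairs above a threshold as candidates for removal. The feature names, values, and the 0.9 threshold are hypothetical choices for illustration, not results from the actual dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(vx * vy)


def redundant_pairs(columns, threshold=0.9):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Made-up values for four teams.
columns = {
    "points_per_game": [70.0, 75.0, 80.0, 85.0],
    "field_goal_pct": [0.42, 0.44, 0.46, 0.48],
    "turnovers_per_game": [12.0, 15.0, 11.0, 14.0],
}

pairs = redundant_pairs(columns)
for a, b, r in pairs:
    print(f"{a} and {b} are highly correlated (r = {r:.2f})")
```

In this toy data only `field_goal_pct` and `points_per_game` are flagged, so one of the two could be dropped before fitting a model.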

## Potential Impact of the Results

- **Predictive Insights**: This project will provide insights into the factors that influence a college basketball team's chances of winning the NCAA championship.
- **Sports Analytics**: The results could be used by sports analysts, coaches, and enthusiasts to predict outcomes of future tournaments.
- **Fan Engagement**: Sports fans may use this tool to evaluate their teams' chances and engage more deeply with the tournament.

## Why Are These Results Useful?

This model will help stakeholders in college basketball, such as:
- **Coaches**: To understand the strengths and weaknesses of their team relative to other teams in the tournament.
- **Sports Analysts**: To provide data-driven predictions and insights about the tournament's likely outcomes.
- **Fans**: To get a better understanding of their favorite team's chances of winning and to make more informed predictions or bets.

## Stakeholders

- **College Basketball Teams**: Coaches and management could use the predictions to assess their team's chances in the tournament.
- **Sports Analysts and Journalists**: To create statistical models that forecast game outcomes and report on trends in the sport.
- **Fans and Betting Companies**: To predict team performance and improve engagement with the tournament.