<a href="https://colab.research.google.com/github/mrowsey16/IS4487Final/blob/main/NCAA_NIL_ANALYSIS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Title**: Final Project NCAA Case Study Notebook

**Group Members**: Madalyn Rowsey, Anthony Long, Ashley Fannon, Ziji Rong

**Short Summary of Business Problem and Goals of Analysis**: The National Collegiate Athletic Association (NCAA) oversees competition, eligibility, and compliance for over 1,000 institutions, and the recent expansion of Name, Image, and Likeness (NIL) rights has transformed the college athletics landscape. As student-athletes gain the ability to earn compensation through sponsorships and endorsements, universities face increasing uncertainty around how NIL activity influences program competitiveness, recruiting dynamics, and financial performance.
Our project aims to evaluate this emerging landscape by analyzing data on NIL deals, team performance, recruiting rankings, and program revenue. The goal is to identify measurable relationships between NIL activity and athletic or financial outcomes, enabling universities and stakeholders to make informed decisions about resource allocation, recruitment strategies, and long-term planning.


**NCAA Description:** The National Collegiate Athletic Association (NCAA) is the governing organization that oversees college sports in the United States. It is responsible for setting competition rules, managing eligibility standards for student-athletes, and organizing major national championships including March Madness, one of the most watched sporting events in the country. The NCAA includes over 1,100 colleges and universities across three divisions, ranging from large powerhouse athletic programs to smaller schools with limited resources.

Historically, the NCAA controlled athlete compensation, but recent policy changes such as NIL (Name, Image, and Likeness) rules have reshaped the college sports landscape by allowing student-athletes to earn money through endorsements, sponsorships, and personal branding.

**Business Challenge:** The NCAA’s Name, Image, and Likeness (NIL) rule change created a rapidly expanding marketplace where student-athletes can now earn compensation through sponsorships, endorsements, and personal branding. However, the distribution of NIL earnings is uneven, influenced by variables such as sport type, school exposure, conference affiliation, social media presence, and on-field performance.

Because of this imbalance, stakeholders including universities, brands, athletes, and agents lack a clear understanding of what factors drive NIL valuation, which athletes hold the highest market potential, and where opportunities exist to improve equity and revenue growth.

**Business Impact:** Understanding NIL value drivers is critical for improving competitive balance, optimizing recruiting strategy, and increasing revenue opportunities for athletes and universities. Programs that successfully identify high-potential athletes can invest more effectively in branding support, secure additional sponsorship funding, and improve athlete retention in an increasingly competitive recruiting environment.

On the commercial side, brands benefit from knowing which athletes generate the highest ROI, while athletes gain clarity on how to increase their personal market value through performance, visibility, and content strategy. A data-driven approach to NIL valuation therefore improves decision-making, increases sponsorship efficiency, and ultimately elevates the financial sustainability of college athletics.

**Relevant Industry/Market Factors:**

*   Sponsorship demand is rising rapidly, with brands seeking authentic influencer-style partnerships with athletes.

*   NIL became legal only recently (2021), so rules are evolving fast, leaving uncertainty in regulation and long-term structure.

*   Social media presence strongly influences NIL value; follower count often outweighs on-field performance.

*   Power Five schools dominate exposure, creating revenue gaps between conferences, sports, and institutional resources.

*   Gender and sport disparities remain significant, with football and men’s basketball leading NIL earnings.

*   Collectives and third-party agencies are emerging, shifting negotiation power away from schools and toward athletes.

*   Market competition is increasing as more athletes enter the NIL space and brands diversify endorsement spending.

**Data Source 1:** NCAA Finances

**Explanation of Dataset:** This dataset provides financial information for NCAA athletic programs, showing how much revenue each school generates and how much it spends on athletics. Each row represents a university. In the file loaded below, the columns are Rank, School, Conference, Total Revenue, Total Expenses, Total Allocated, and Percent Allocated. The dataset helps illustrate the financial landscape of collegiate athletics, highlighting the gap between high-earning Power Five schools and smaller programs.

This dataset can be used to analyze spending efficiency, revenue generation by institution size or conference, and the sustainability of athletic departments, which directly connects to NIL opportunities, visibility, and resource access for student-athletes.

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print(f'User uploaded file "{fn}" with length {len(uploaded[fn])} bytes')


Saving NCAA Finances Revenue and Expenses by School.xlsx to NCAA Finances Revenue and Expenses by School.xlsx
User uploaded file "NCAA Finances Revenue and Expenses by School.xlsx" with length 38936 bytes


In [None]:
import pandas as pd

# 'fn' is the loop variable left over from the upload cell above,
# so it holds the name of the last uploaded file
df = pd.read_excel(fn)

print(df.head())

   Rank      School Conference  Total Revenue  Total Expenses Total Allocated  \
0     1  Ohio State     Big 10      251615345       225733418             $0*   
1     2       Texas     Big 12      239290648       225153011             $0*   
2     3     Alabama        SEC      214365357       195881911    $11,378,871*   
3     4    Michigan     Big 10      210652287       193559375       $153,059*   
4     5     Georgia        SEC      203048566       169026503     $3,530,802*   

   Percent Allocated  
0             0.0000  
1             0.0000  
2             0.0531  
3             0.0007  
4             0.0174  
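The `Total Allocated` column prints as text (for example `$11,378,871*`), so it cannot be used in arithmetic until it is converted. The sketch below shows one way that conversion might look, using the five rows printed above, copied in by hand. `parse_currency` is a hypothetical helper that is not part of the original notebook, and it assumes the values are non-negative dollar amounts decorated only with `$`, `,`, and `*`. For these five rows, the recomputed share (allocated divided by total revenue) matches the `Percent Allocated` column, which suggests that is how the column is defined.

```python
import re


def parse_currency(text):
    """Convert a string such as '$11,378,871*' to a float number of dollars.

    Assumption: values are non-negative and use only '$', ',' and '*'
    as decoration. This helper is illustrative, not from the notebook.
    """
    cleaned = re.sub(r"[^\d.]", "", str(text))
    if not cleaned:
        raise ValueError(f"no digits found in {text!r}")
    return float(cleaned)


# The five rows shown by df.head() above, copied by hand.
rows = [
    {"School": "Ohio State", "Total Revenue": 251615345,
     "Total Expenses": 225733418, "Total Allocated": "$0*"},
    {"School": "Texas", "Total Revenue": 239290648,
     "Total Expenses": 225153011, "Total Allocated": "$0*"},
    {"School": "Alabama", "Total Revenue": 214365357,
     "Total Expenses": 195881911, "Total Allocated": "$11,378,871*"},
    {"School": "Michigan", "Total Revenue": 210652287,
     "Total Expenses": 193559375, "Total Allocated": "$153,059*"},
    {"School": "Georgia", "Total Revenue": 203048566,
     "Total Expenses": 169026503, "Total Allocated": "$3,530,802*"},
]

for row in rows:
    allocated = parse_currency(row["Total Allocated"])
    row["Allocated USD"] = allocated
    # Net revenue: what is left after expenses.
    row["Net Revenue"] = row["Total Revenue"] - row["Total Expenses"]
    # Share of revenue that was allocated, rounded to four decimals
    # so it can be compared with the 'Percent Allocated' column.
    row["Allocated Share"] = round(allocated / row["Total Revenue"], 4)
    print(row["School"], row["Net Revenue"], row["Allocated Share"])
```

On the full DataFrame, the same helper could be applied with `df["Total Allocated"].map(parse_currency)`.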


**Data Source 2:** College Basketball Dataset

**Explanation of Dataset:** This Kaggle dataset (`andrewsundberg/college-basketball-dataset`) provides team-level season statistics for Division I college basketball. Each row represents one team in one season. The download contains per-season files (for example `cbb13.csv` and `cbb23.csv`) alongside a `cbb.csv` file, and the columns cover results and performance measures such as conference, games played, wins, offensive and defensive efficiency, and postseason outcome.

This dataset can be used to measure on-court performance by school and conference, which can then be compared with program finances and NIL valuations to test whether NIL activity is associated with competitive success.

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("andrewsundberg/college-basketball-dataset")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/andrewsundberg/college-basketball-dataset?dataset_version_number=10...


100%|██████████| 361k/361k [00:00<00:00, 95.7MB/s]

Extracting files...
Path to dataset files: /root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10





In [None]:
import os
import pandas as pd

# List files in the downloaded Kaggle dataset directory
print(f"Files in {path}:")
for dirname, _, filenames in os.walk(path):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Collect the CSV/XLSX files at the top level of the dataset directory.
# Sorting makes the choice deterministic, because os.listdir() returns
# names in arbitrary order.
data_files = sorted(
    os.path.join(path, f) for f in os.listdir(path) if f.endswith(('.csv', '.xlsx'))
)

df_basketball = None
if data_files:
    # Prefer cbb.csv (assumed to be the combined multi-season file);
    # otherwise fall back to the first file in sorted order.
    combined = os.path.join(path, 'cbb.csv')
    file_to_load = combined if combined in data_files else data_files[0]
    print(f"\nLoading file: {file_to_load}")

    if file_to_load.endswith('.csv'):
        df_basketball = pd.read_csv(file_to_load)
    else:
        df_basketball = pd.read_excel(file_to_load)

    print("\nFirst 5 rows of the basketball dataset:")
    print(df_basketball.head())
else:
    print("No CSV or XLSX files found in the dataset directory.")

Files in /root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10:
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb18.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb13.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb16.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb21.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb17.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb23.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb15.csv
/root/.cache/kagglehub/datasets/andrewsundberg/college-basketball-dataset/versions/10/cbb22.csv
/root/.cache/kagglehub/datasets/andrewsund
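Because the download contains one file per season, any analysis across seasons needs to know which season each file covers. The sketch below is one possible way to recover that from the file names in the listing above. It assumes the two-digit suffix is the season year (so `cbb13.csv` would be 2013) and that `cbb.csv`, which has no suffix, is a combined file; neither assumption is confirmed by the notebook. `season_from_filename` is a hypothetical helper.

```python
import os
import re


def season_from_filename(path):
    """Return the season year implied by names like 'cbb13.csv', else None.

    Assumption: the two-digit suffix is the season year in the 2000s.
    """
    name = os.path.basename(path)
    match = re.fullmatch(r"cbb(\d{2})\.csv", name)
    return 2000 + int(match.group(1)) if match else None


# File names that are fully visible in the listing above.
listed = [
    "cbb18.csv", "cbb13.csv", "cbb.csv", "cbb16.csv", "cbb21.csv",
    "cbb17.csv", "cbb23.csv", "cbb15.csv", "cbb22.csv",
]

# Map each file to its season, then keep only the per-season files,
# ordered chronologically.
seasons = {name: season_from_filename(name) for name in listed}
per_season = sorted(
    (year, name) for name, year in seasons.items() if year is not None
)

for year, name in per_season:
    print(year, name)
```

With pandas, each per-season file could then be read and tagged with its year before being concatenated, for example `pd.read_csv(os.path.join(path, name)).assign(SEASON=year)`, where `SEASON` is an illustrative column name.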

**Data Source 3:** NIL Valuations

**Explanation of Dataset:** The NIL Valuations dataset contains estimated Name, Image, and Likeness (NIL) values for individual NCAA athletes. Each row represents one athlete. In the file loaded below, the columns are Name, Year (class standing, such as Freshman or RS-Senior), School, and NIL Valuation, which is stored as text such as `4.4M` and must be converted to a number before analysis. These values help measure an athlete’s market influence and earning potential within the college sports landscape.

This dataset is useful for identifying which schools and class years are associated with the highest NIL valuations and, once joined to the finance and basketball data by school, for examining which program-level factors correlate with higher valuations. It provides a quantitative foundation for comparing athlete markets and understanding financial distribution within the NCAA system.

In [None]:
from google.colab import files

uploaded_new = files.upload()

for fn_new in uploaded_new.keys():
    print(f'User uploaded file "{fn_new}" with length {len(uploaded_new[fn_new])} bytes')

In [None]:
import pandas as pd

# 'fn_new' is the loop variable left over from the upload cell above,
# so it holds the name of the last uploaded file
df_nil = pd.read_excel(fn_new)

print(df_nil.head())

             Name       Year      School NIL Valuation
0     AJ Dybantsa   Freshman         BYU          4.4M
1  Jeremiah Smith  Sophomore  Ohio State          4.2M
2    Arch Manning     Junior       Texas            4M
3       JT Toppin     Junior  Texas Tech          3.3M
4     Carson Beck  RS-Senior       Miami          3.2M
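The `NIL Valuation` column prints as abbreviated text (`4.4M`, `4M`), so it has to be converted to a number before it can be summed, ranked, or joined against revenue figures. The sketch below shows one way this might be done, using the five rows printed above. `parse_nil_valuation` is a hypothetical helper that is not part of the original notebook. Only the `M` suffix appears in the output above; support for `K`, `$`, and thousands separators is an assumption about what the rest of the file might contain.

```python
from decimal import Decimal

# Suffix multipliers. Only 'M' is seen in the printed rows;
# 'K' is included as an assumption.
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_nil_valuation(text):
    """Convert strings such as '4.4M' or '4M' to a whole number of dollars."""
    cleaned = str(text).strip().upper().replace("$", "").replace(",", "")
    multiplier = 1
    if cleaned and cleaned[-1] in _MULTIPLIERS:
        multiplier = _MULTIPLIERS[cleaned[-1]]
        cleaned = cleaned[:-1]
    # Decimal avoids binary floating-point rounding on values like 4.4.
    return int(Decimal(cleaned) * multiplier)


# The five rows shown by df_nil.head() above, copied by hand.
athletes = [
    ("AJ Dybantsa", "BYU", "4.4M"),
    ("Jeremiah Smith", "Ohio State", "4.2M"),
    ("Arch Manning", "Texas", "4M"),
    ("JT Toppin", "Texas Tech", "3.3M"),
    ("Carson Beck", "Miami", "3.2M"),
]

values = [parse_nil_valuation(valuation) for _, _, valuation in athletes]

for (name, school, _), dollars in zip(athletes, values):
    print(f"{name} ({school}): ${dollars:,}")
```

On the full DataFrame, the helper could be applied with `df_nil["NIL Valuation"].map(parse_nil_valuation)`.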


**Data Acquisition and Loading:**

**Data Exploration:**



**Data Cleaning and Preprocessing:**

**Modeling Approach:**

**Model Implementation:**

**Model Evaluation:**

**Conclusions:**

**Recommendations:**

**Dashboard:**