# **Milestone** | College Football Data Analysis

<div style="text-align: center;">
<img src="https://upload.wikimedia.org/wikipedia/en/thumb/c/cf/NCAA_football_icon_logo.svg/2560px-NCAA_football_icon_logo.svg.png" alt="NCAA Football logo" width="200"/>
</div>


## Introduction
Welcome to the world of college football. For this Milestone, you're part of a data-driven sports analytics team, working with a major sports organization or university to help improve the performance of football teams. Your task is to dive into the data of Division I universities with football teams, extracting
insights that just might influence decisions on recruitment, conference performance, and enrollment strategies.

With the college football season in full swing, your work will help the organization answer key questions.

Let's begin by exploring the data at hand.

<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      To import this data, remember to run the following cell.
  </span>
</div>



In [3]:
# load the data
import sys
sys.path.append('../files')

from ncaa_fbs import nickname, enrollment

## Data Exploration: Nicknames and Enrollments

We have two primary datasets: `nickname` and `enrollment`. The `nickname` dictionary provides the university name as keys and their respective team nicknames as values. The `enrollment` dictionary contains the university names as keys and their student enrollments as values.

Here are code snippets of each dictionary to give you an understanding of their structure.

```python
nickname = {'Air Force': 'Falcons',
            'Akron': 'Zips',
            'Alabama': 'Crimson Tide',
            'Appalachian State': 'Mountaineers',
            'Arizona': 'Wildcats',
            'Arizona State': 'Sun Devils',
            'Arkansas': 'Razorbacks',
            'Arkansas State': 'Red Wolves',
            'Army': 'Black Knights',
            'Auburn': 'Tigers',
            'Ball State': 'Cardinals',
              ...
            'Wisconsin': 'Badgers',
            'Wyoming': 'Cowboys'}
```

The team from `Arizona State`, Dr. Crider's and Dr. Alvarez's alma mater, has the nickname `Sun Devils`.

```python
enrollment  = {'Air Force': 4181,
               'Akron': 14516,
               'Alabama': 38316,
               'Appalachian State': 20641,
               'Arizona': 49471,
               'Arizona State': 77881,
               'Arkansas': 29068,
               'Arkansas State': 12863,
               'Army': 4594,
               'Auburn': 31526,
               'Ball State': 19337,
                 ...
               'Wisconsin': 47932,
               'Wyoming': 11479}
```

The student population at `Arizona State` is `77,881` students.
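
Because both dictionaries use the university name as the key, the same key retrieves a value from each of them. The sketch below illustrates this with a few entries copied by hand from the snippets above; the names `nickname_sample` and `enrollment_sample` are made up for this illustration and are not part of `ncaa_fbs`.

```python
# hand-copied subset of the two dictionaries (illustrative names, not from ncaa_fbs)
nickname_sample = {'Arizona State': 'Sun Devils',
                   'Auburn': 'Tigers',
                   'Wyoming': 'Cowboys'}
enrollment_sample = {'Arizona State': 77881,
                     'Auburn': 31526,
                     'Wyoming': 11479}

# the same key works in both dictionaries
school = 'Arizona State'
summary = f"{school}: {nickname_sample[school]}, {enrollment_sample[school]:,} students"
print(summary)  # Arizona State: Sun Devils, 77,881 students
```

The `:,` inside the f-string adds a thousands separator to the number when it is printed.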

To get an even better sense of this data, let's examine a few entries. Preview the data for `Purdue`, by passing this *key* to each one of the dictionaries and printing the associated *value*.

In [4]:
# Preview the data for Purdue
print(nickname['Purdue'])       
print(enrollment['Purdue'])

Boilermakers
49639


Purdue's nickname is "Boilermakers", and its student population is 49,639. Let's move on to
analyzing this information.

### Task 1: Team Count

As a first step, determine the total number of Division I Football Bowl
Subdivision (FBS) teams in our dataset. This will give us a sense of the
scope of our analysis. Store the result in a variable named `n_teams`.

Use an f-string to print out a complete sentence with the number of teams in the Division I FBS.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>Remember that the number of keys in any of the dictionaries will give you the total number of teams.
</span>
</div>





In [5]:
# How many FBS teams are there?
n_teams = len(nickname)
# print the required output
print(f"There are {n_teams} teams in the Division I FBS.")

There are 131 teams in the Division I FBS.


### Task 2: Identifying Team Nicknames

Your team is interested in identifying all universities with the nickname "Tigers". Use a loop to iterate through each team in the `nickname` dictionary. For each team, use a conditional to check `if` their nickname is "Tigers". If so, append the university name to a Python list named `matches`.

Print `matches` to the screen.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>In order to access the school's nickname, you have to pass the name of the school as the key.
</span>
</div>



In [6]:
# set up an empty list for matching universities
matches = []

# loop over universities and append to list of matches when a match is found
for school in nickname:
    if nickname[school] == "Tigers":
        matches.append(school)
        
# print the required output
print(matches)

['Auburn', 'Clemson', 'LSU', 'Memphis', 'Missouri']


<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      If done correctly, your output should be a list containing <strong>5 universities</strong>.
  </span>
</div>
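
The loop above looks up each nickname with `nickname[school]`. An alternative, shown here only as an optional sketch, is to loop over `.items()`, which gives the key and the value together. The `nickname_sample` and `matches_sample` names are hypothetical and hold a small hand-copied subset of the data.

```python
# hand-copied subset of the nickname dictionary (illustrative name, not from ncaa_fbs)
nickname_sample = {'Arizona': 'Wildcats',
                   'Auburn': 'Tigers',
                   'Clemson': 'Tigers'}

matches_sample = []

# .items() yields (key, value) pairs, so no second lookup is needed
for school, team_name in nickname_sample.items():
    if team_name == "Tigers":
        matches_sample.append(school)

print(matches_sample)  # ['Auburn', 'Clemson']
```

Both versions produce the same list; `.items()` simply avoids passing the key back into the dictionary.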



### Task 3: University Enrollment

Next, you'll analyze the enrollment data to identify universities with enrollments smaller than 10,000 and those larger than 50,000 in order to determine if there are more FBS teams with a small student body or more FBS teams with a large student body.

Use a loop to iterate through each team in the `enrollment` dictionary.
*   If a university's enrollment is smaller than 10,000, increase the counting variable named `small_unis` by `1`.
*   If a university's enrollment is larger than 50,000, increase the counting variable named `large_unis` by `1`.


In [9]:
# set up counter variables
small_unis = 0
large_unis = 0

# loop over all universities
for school in enrollment:
    if enrollment[school] < 10000:
        small_unis += 1
    elif enrollment[school] > 50000:
        large_unis += 1

# print the required outputs
print(f"Number of universities with <10,000 students: {small_unis}")
print(f"Number of universities with >50,000 students: {large_unis}")


Number of universities with <10,000 students: 7
Number of universities with >50,000 students: 14


Are there more FBS universities with enrollments smaller than 10,000, or with enrollments larger than 50,000?

*There are more large universities: 14 have enrollments over 50,000, twice as many as the 7 with fewer than 10,000 students.*

<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> How might university enrollment be relevant to a college football conference when making strategic decisions?
  </span>
</div>

In [10]:
# run this cell to import more data!
from ncaa_fbs import fbs_teams

### Task 4: Conference Enrollment Breakdown

The dictionary variable used in this task is called `fbs_teams`. In it, each university is associated with a single list of values, instead of having a separate dictionary for each kind of value.

For each university, the associated list contains items in this order:
* Index 0: Nickname
* Index 1: City
* Index 2: State
* Index 3: Enrollment
* Index 4: Conference

Here is a snippet of `fbs_teams` for a few entries in the dictionary:

```python
{ ...
 'Arizona State': ['Sun Devils', 'Tempe', 'Arizona', 77881, 'Pac-12'],
 'Arkansas': ['Razorbacks', 'Fayetteville', 'Arkansas', 29068, 'SEC'],
 'Arkansas State': ['Red Wolves', 'Jonesboro', 'Arkansas', 12863, 'Sun Belt'],
  ...
}
```

Just like before, preview the data for one university, `Arizona State`, by passing this *key* to the `fbs_teams` dictionary, and printing the associated *value*.

In [13]:
# preview the data for Arizona State
# print the whole list of values
print(fbs_teams['Arizona State'])
# print just the nickname
print(fbs_teams['Arizona State'][0])
# print just the city
print(fbs_teams['Arizona State'][1])
# print just the state
print(fbs_teams['Arizona State'][2])
# print just the enrollment
print(fbs_teams['Arizona State'][3])
# print just the conference
print(fbs_teams['Arizona State'][4])

['Sun Devils', 'Tempe', 'Arizona', 77881, 'Pac-12']
Sun Devils
Tempe
Arizona
77881
Pac-12


In the output above we can see that Arizona State's team is called the **Sun Devils** and that the university is located in the city of **Tempe** in the state of **Arizona**. It has a total student population of **77,881** and plays in the **Pac-12** conference!
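
As an optional alternative to indexing one position at a time, a list can be unpacked into named variables in a single assignment. This is a minimal sketch using the Arizona State entry copied by hand from the snippet above; the variable names are illustrative and not part of `ncaa_fbs`.

```python
# hand-copied entry for Arizona State (same order as the index list above)
asu = ['Sun Devils', 'Tempe', 'Arizona', 77881, 'Pac-12']

# unpack the five items into five names, in index order 0 through 4
team_name, city, state, n_enrolled, conference = asu

description = f"{team_name}: {city}, {state}, {n_enrolled:,} students, {conference}"
print(description)  # Sun Devils: Tempe, Arizona, 77,881 students, Pac-12
```

Unpacking only works when the number of names on the left matches the number of items in the list.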

We'll use this dictionary to answer the following questions:
1. **Does the Big 12 conference actually have twelve schools?**  
2. **What is the total enrollment across these Big 12 universities?**
3. **What is the average enrollment across these Big 12 universities?**


One approach to answering these questions is to store the enrollment for each school in a `list`, called `students`, defined below.

Then the provided print statements will use different functions to calculate the values needed. Remember, each entry in `students` will correspond to a single university. Its `len`gth will be the number of universities in the Big 12. The `sum` will be the total enrollment in the Big 12 conference. And the average enrollment can be calculated by dividing the `sum` by the `len`.

Use the code template below to complete Task 4.

In [15]:
# approach 1: using a list followed by summary functions
students = []

# loop over universities, and append enrollments to the list
for school in fbs_teams:
    if fbs_teams[school][4] == 'Big 12':
        students.append(fbs_teams[school][3])

# print the statistics
print(f"Number of Big 12 universities: {len(students)}")
print(f"Total enrollment in Big 12: {sum(students)}")
print(f"Average enrollment in Big 12: {sum(students) / len(students)}")

Number of Big 12 universities: 10
Total enrollment in Big 12: 280990
Average enrollment in Big 12: 28099.0


<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      If done correctly, you will have <strong>10</strong> schools in the Big 12 conference. Weird, right?!?!
  </span>
</div>



A different approach is to use **accumulator** variables. An accumulator variable is a variable that is modified in each iteration of the loop. For example, if you are storing the number of students for each university, outside of the loop you will define `n_students = 0` but within the loop you will increment the value of `n_students` for every university that matches your criteria. Incrementing and assigning back to the same variable can be done with the `+=` operator.

In the code below, you have two accumulator variables, `n_students` and `n_universities`, both set to zero initially. For each item in your loop that matches the criteria, you will update the `n_students` variable based on the enrollment at that university and increment `n_universities` by `1`.

The print statement provided at the end will show the results in the cell output.

In [16]:
# approach 2: using two numeric variables to accumulate the results
n_students = 0
n_universities = 0

# loop over universities, and accumulate values
for school in fbs_teams:
    if fbs_teams[school][4] == 'Big 12':
        n_students += fbs_teams[school][3]
        n_universities += 1


# print the results
print(f"Number of Big 12 universities: {n_universities}")
print(f"Total enrollment in Big 12: {n_students}")
print(f"Average enrollment in Big 12: {n_students / n_universities}")

Number of Big 12 universities: 10
Total enrollment in Big 12: 280990
Average enrollment in Big 12: 28099.0


<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> I explored two distinct methods for aggregating data: using a list to store values and then calculating statistics versus using accumulator variables within a loop. Compare and contrast the two approaches. Explain a scenario where one approach might be preferred over the other.
  </span>
</div>
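
As a starting point for that comparison, the sketch below runs both approaches side by side on three enrollment values copied from the `fbs_teams` snippet shown earlier (these three schools are not Big 12 members; they are used only as sample numbers). The variable names are illustrative.

```python
# three enrollments copied from the fbs_teams snippet (sample values only)
sample_enrollments = [77881, 29068, 12863]

# list approach: the individual values are kept, so other summaries remain possible
total_from_list = sum(sample_enrollments)
count_from_list = len(sample_enrollments)
largest = max(sample_enrollments)

# accumulator approach: only the running totals are kept
total = 0
count = 0
for value in sample_enrollments:
    total += value
    count += 1

print(total_from_list == total)  # True
print(count_from_list == count)  # True
print(largest)                   # 77881
```

Both approaches agree on the total and the count. The list additionally supports functions such as `max` and `min` after the loop, while the accumulators avoid storing every value.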

### Task 5: Highlighting Areas for Further Investigation

Based on your initial findings, what are 2-3 key questions or areas that the athletic department should explore further? What additional data would be valuable to gather?

Remember, you can use ChatGPT to brainstorm and structure your recommendations in a clear, professional manner!


* Questions:
  * Does enrollment size correlate with football team performance or recruitment success?
  * How does conference size and average enrollment affect media exposure and revenue?
  * What is the geographic or demographic diversity of the student body in relation to team support?

* Additional data to gather:
  * Win/loss records for each team over the past 5 years
  * Recruitment budgets and outcomes
  * Athletic department revenue by sport
  * Geographic distribution of student enrollment


## LevelUp

Yet another way of storing the Division I universities information is by using a dictionary with *dictionaries* for values. In the task above you had a dictionary with a *list* for values. To get to the desired value, you would use the appropriate index.

In the case of a dictionary with dictionary values, you would instead use the *key* associated with the desired value.

A snippet of the `fbs_teams_dict` would look like:

```python
{...
 'Arizona': {'nickname': 'Wildcats',
             'city': 'Tucson',
             'state': 'Arizona',
             'enrollment': 49471,
             'conference': 'Pac-12'},
 'Arizona State': {'nickname': 'Sun Devils',
                   'city': 'Tempe',
                   'state': 'Arizona',
                   'enrollment': 77881,
                   'conference': 'Pac-12'},
 'Arkansas': {'nickname': 'Razorbacks',
              'city': 'Fayetteville',
              'state': 'Arkansas',
              'enrollment': 29068,
              'conference': 'SEC'},
...
```

<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      To import this data, run the following cell.
  </span>
</div>

In [17]:
# import data
from ncaa_fbs import fbs_teams_dict

Compare this to the `fbs_teams` structure you used for Task 4. Store all the data for Purdue University in a variable called `purdue_data`. Print `purdue_data` to the screen.


In [18]:
# Print out all the data for Purdue University
purdue_data = fbs_teams_dict['Purdue']
print(purdue_data)

{'nickname': 'Boilermakers', 'city': 'West Lafayette', 'state': 'Indiana', 'enrollment': 49639, 'conference': 'Big Ten'}


Notice that `purdue_data` is itself a dictionary. If we want to get the enrollment for Purdue University, we still use `'Purdue'` as the outer dictionary key, but now we specify `'enrollment'` as the inner dictionary key:


In [20]:
# use the appropriate key to get the enrollment for Purdue University.
# recall that before we used the appropriate *index* to get the enrollment.
print(purdue_data['enrollment'])

49639
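
To make the contrast with Task 4 concrete, the sketch below stores Purdue's data both ways and reads the enrollment from each. The dictionary is copied from the output above; the list is built by hand in the index order described in Task 4, so treat it as an assumption about what `fbs_teams['Purdue']` contains. Both variable names are illustrative.

```python
# list form: values are found by position (assumed to mirror fbs_teams['Purdue'])
purdue_list = ['Boilermakers', 'West Lafayette', 'Indiana', 49639, 'Big Ten']

# dictionary form: values are found by name (copied from the output above)
purdue_dict = {'nickname': 'Boilermakers',
               'city': 'West Lafayette',
               'state': 'Indiana',
               'enrollment': 49639,
               'conference': 'Big Ten'}

print(purdue_list[3])             # 49639
print(purdue_dict['enrollment'])  # 49639
```

The key `'enrollment'` documents itself, whereas the index `3` only makes sense if you remember the order of the list.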


Your task is to replicate the conference enrollment analysis from Task 4, but using the `fbs_teams_dict` structure. This exercise will reinforce your understanding of nested dictionaries and data manipulation.

Specifically, calculate the following for the 'Big 12' conference:

1.  The number of universities.
2.  The total student enrollment.
3.  The average student enrollment.


<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
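<strong>Note: </strong>The key order is really important when accessing nested dictionaries! The university name comes first and the feature second. Attempting to access the conference first will raise a <code>KeyError</code>, because the outer dictionary is keyed by university name.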
</span>
</div>
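
The key-order note can be checked with a small experiment. This sketch uses a single entry copied by hand from the `fbs_teams_dict` snippet above; `teams_sample` is a made-up name, not part of `ncaa_fbs`.

```python
# one hand-copied entry from the nested dictionary (illustrative name)
teams_sample = {'Arkansas': {'nickname': 'Razorbacks',
                             'city': 'Fayetteville',
                             'state': 'Arkansas',
                             'enrollment': 29068,
                             'conference': 'SEC'}}

# correct order: university first, then the feature
print(teams_sample['Arkansas']['conference'])  # SEC

# wrong order: 'conference' is not a key of the outer dictionary
try:
    teams_sample['conference']['Arkansas']
except KeyError as err:
    message = f"KeyError: {err}"
    print(message)  # KeyError: 'conference'
```

The `try`/`except` is only there so the cell keeps running after the error; in your own solution you simply use the correct order.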



In [21]:
# Task 4 -- approach 2: using two numeric variables to accumulate the statistics
n_students = 0
n_universities = 0

# loop over universities, and accumulate values
for school in fbs_teams_dict:
    if fbs_teams_dict[school]['conference'] == 'Big 12':
        n_students += fbs_teams_dict[school]['enrollment']
        n_universities += 1

# print the statistics
print(f"Number of Big 12 universities: {n_universities}")
print(f"Total enrollment in Big 12: {n_students}")
print(f"Average enrollment in Big 12: {n_students / n_universities}")


Number of Big 12 universities: 10
Total enrollment in Big 12: 280990
Average enrollment in Big 12: 28099.0
