# Exploratory Data Analysis

## Audience/Stakeholders

- Clearly identify who you are writing your final report for

The

## Problem Statement

- Create a concise and compelling problem or question that guides your analysis. Example:
  - "The men's basketball team had a worse performance this year. Did preparation around games change? How did PlayerLoad, jumps, high accelerations, and changes of direction compare before games over the current season and the previous season?"

The

Ian Proposals: 

1. As the season progressed, did the physical output of UVA's men's basketball team, measured by PlayerLoad, jumps, accelerations, and changes of direction, display any noticeable trends? Did these trends correlate with shifts in game performance or outcomes?

2. Were periods of elevated team workload throughout the season followed by declines in on-court performance metrics (shooting percentage, turnovers, defensive rating, etc.)?

Brian Proposal:

1. The UVA men's basketball team lost multiple games that it led at halftime over the last two seasons, especially late in each season. Did the team's physical exertion, as measured by _, _, and _, decrease in the second halves of games? How did second-half physical performance vary over the course of each season?


## Important Variables

- List which ones are important for your analysis and why.

The

## Merging and Cleaning the Dataset

- Clean the data: Remove duplicates, handle missing values, correct data types

- Your final dataset should include only variables relevant to your problem
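
The checklist above can be sketched on a small toy frame. This is only an illustration under stated assumptions: the rows are made up, the column names are borrowed from the Catapult export, and dropping rows with a missing load is one possible way to handle missing values, not necessarily the right one for this analysis.

```python
import pandas as pd

# Hypothetical toy rows that mimic a few Catapult export columns
raw = pd.DataFrame(
    {
        "Date": ["2023-03-11", "2023-03-11", "2023-03-11", "2023-03-14"],
        "About": ["Athlete H", "Athlete H", "Athlete J", "Athlete I"],
        "Period": ["Period 1", "Period 1", "Period 1", "Period 2"],
        "Total Player Load": ["301.568", "301.568", "317.828", None],
        "event-uuid": ["a", "a", "b", "c"],
    }
)

# 1. Remove exact duplicate rows (rows 0 and 1 are identical)
clean = raw.drop_duplicates().copy()

# 2. Correct data types: dates to datetime, load to numeric
clean["Date"] = pd.to_datetime(clean["Date"], format="%Y-%m-%d")
clean["Total Player Load"] = pd.to_numeric(clean["Total Player Load"], errors="coerce")

# 3. Handle missing values (assumption: rows without a load are dropped)
clean = clean.dropna(subset=["Total Player Load"])

# 4. Keep only the variables relevant to the problem statement
clean = clean[["Date", "About", "Period", "Total Player Load"]].reset_index(drop=True)

print(clean.dtypes)
print(clean)
```

Passing an explicit `format` to `pd.to_datetime` is a design choice: it fails loudly on unexpected date strings rather than guessing.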

In [24]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Loading in data from season 1 and 2
s1 = pd.read_csv("../data/catapult season 1.csv")
s2 = pd.read_csv("../data/catapult season 2.csv")

# Adding column with season to each dataset
s1["Season"] = 1
s2["Season"] = 2

# Combining the two seasons of data into a single data frame
data = pd.concat([s1, s2])
# Converting date column to datetime.
# pd.to_datetime raises a UserWarning here because no explicit format is set and errors are not handled.
# Further examination of the Date column showed that this is not a problem, so the warning is
# suppressed for this call only rather than for the whole notebook.
with warnings.catch_warnings():
    warnings.simplefilter(action="ignore", category=UserWarning)
    data["Date"] = pd.to_datetime(data["Date"])

data.head(1)

Unnamed: 0,Date,About,Position,Period Number,Period,Total Acceleration Efforts,Total Player Load,Player Load Per Minute,IMA Accel Low,IMA Decel Low,...,Session Total Jump,Session Jumps Per Minute,Total CoD Left,Total CoD Right,Total High IMA,Total IMA,IMA/Min,event-uuid,group-uuid,Season
0,2023-03-14,Athlete I,Guard,1,1. Pre Practice,0,87.437,4.1,3,17,...,95.0,1.05,269.0,306.0,89.0,899.0,,c4e1f0fe-b87a-42ca-8f41-b5b0e4cdfab3,c4e1f0fe-b87a-42ca-8f41-b5b0e4cdfab3,1


In [25]:
# Keep only rows whose Period label contains "Period"
dg_filter1 = data[data["Period"].str.contains("Period")]
# Drop rows whose label contains "Play"; .copy() avoids a SettingWithCopyWarning on the assignments below
dg_filter2 = dg_filter1[~dg_filter1["Period"].str.contains("Play")].copy()
# Collapse label variants into three standard period names
for label in ["Period 1", "Period 2", "Period 3"]:
    dg_filter2.loc[dg_filter2["Period"].str.contains(label), "Period"] = label
# Drop auto-created periods and use the fully filtered frame as the games dataset
games = dg_filter2[~dg_filter2["Period"].str.contains("Auto")]
games.head()

Unnamed: 0,Date,About,Position,Period Number,Period,Total Acceleration Efforts,Total Player Load,Player Load Per Minute,IMA Accel Low,IMA Decel Low,...,Session Total Jump,Session Jumps Per Minute,Total CoD Left,Total CoD Right,Total High IMA,Total IMA,IMA/Min,event-uuid,group-uuid,Season
273,2023-03-11,Athlete H,Guard,2,Period 1,0,301.568,12.8,25,25,...,,,,,,,,6e1547d1-89bb-42fa-9104-2147a68ae4ca,6e1547d1-89bb-42fa-9104-2147a68ae4ca,1
274,2023-03-11,Athlete H,Guard,3,Period 2,0,351.628,10.2,26,26,...,,,,,,,,6e1547d1-89bb-42fa-9104-2147a68ae4ca,6e1547d1-89bb-42fa-9104-2147a68ae4ca,1
276,2023-03-11,Athlete J,Forward,2,Period 1,0,317.828,10.7,54,48,...,,,,,,,,6d5ee637-9c93-4a4e-9d43-43ec7a09bac4,6d5ee637-9c93-4a4e-9d43-43ec7a09bac4,1
277,2023-03-11,Athlete J,Forward,3,Period 2,0,270.702,10.9,37,32,...,,,,,,,,6d5ee637-9c93-4a4e-9d43-43ec7a09bac4,6d5ee637-9c93-4a4e-9d43-43ec7a09bac4,1
279,2023-03-11,Athlete B,Guard,2,Period 1,0,59.454,12.6,5,9,...,,,,,,,,d8bb1440-3849-41f3-b47b-5acc7fb521b4,0320909a-5222-4549-b23d-451f0e645c00,1


In [26]:
games.groupby(["Date", "Period"])[["Total Player Load", "Player Load Per Minute"]].agg(['mean'])

Date,Period,Total Player Load (mean),Player Load Per Minute (mean)
2022-11-07,Period 1,189.095333,9.066667
2022-11-07,Period 2,206.104000,9.600000
2022-11-11,Period 1,259.719222,12.755556
2022-11-11,Period 2,206.403100,12.650000
2022-11-18,Period 1,233.654222,12.100000
...,...,...,...
2024-03-02,Period 1,170.847600,10.730000
2024-03-02,Period 2,146.909091,10.872727
2024-03-09,Period 1,217.625875,13.712500


## Descriptive Statistics & Distributions

- Provide summaries of important variables

- Use visualizations to explore distributions
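
As a minimal sketch of the two bullets above, the following summarizes and plots a toy frame. The values are made up for illustration and `games_demo` is a hypothetical stand-in for the cleaned games data; only the column names come from the export.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values; games_demo stands in for the cleaned games data
games_demo = pd.DataFrame(
    {
        "Period": ["Period 1", "Period 2"] * 4,
        "Total Player Load": [301.6, 351.6, 317.8, 270.7, 189.1, 206.1, 259.7, 206.4],
        "Player Load Per Minute": [12.8, 10.2, 10.7, 10.9, 9.1, 9.6, 12.8, 12.7],
    }
)

# Overall summary: count, mean, std, min, quartiles, max
summary = games_demo[["Total Player Load", "Player Load Per Minute"]].describe()
print(summary)

# Summary split by period
by_period = games_demo.groupby("Period")["Total Player Load"].agg(["mean", "median", "std"])
print(by_period)

# Overlaid histograms, one per period
fig, ax = plt.subplots(figsize=(6, 4))
for period, grp in games_demo.groupby("Period"):
    ax.hist(grp["Total Player Load"], bins=5, alpha=0.6, label=period)
ax.set_xlabel("Total Player Load")
ax.set_ylabel("Count")
ax.set_title("Distribution of Total Player Load by period")
ax.legend()
fig.tight_layout()
```

With the real data, a larger bin count would likely be more informative; five bins is only sensible for a frame this small.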

## Examine Correlations (If Relevant)

- Interpret Findings: What variables appear related?
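
One way to see which variables appear related is a pairwise correlation matrix. The sketch below uses made-up numbers and assumes Pearson correlation is appropriate; the column names are taken from the export, but the values are not.

```python
import pandas as pd

# Hypothetical values for three workload columns
demo = pd.DataFrame(
    {
        "Total Player Load": [100.0, 200.0, 300.0, 400.0, 500.0],
        "Session Total Jump": [10, 22, 29, 41, 50],
        "Total IMA": [900, 700, 650, 400, 300],
    }
)

# Pairwise Pearson correlations between all numeric columns
corr = demo.corr(method="pearson")
print(corr.round(2))

# Pairs with a strong linear relationship (threshold of 0.8 is an arbitrary choice)
strong = corr.abs().gt(0.8) & corr.abs().lt(1.0)
print(strong)
```

Since the notebook already imports seaborn, the same matrix could be drawn with `sns.heatmap(corr, annot=True)`. Correlation here is descriptive only and says nothing about which variable drives the other.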

## Explore Relationships (If Relevant)

- Dig into potential causal or descriptive relationships

- Use visualizations and statistical summaries
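
For the second-half question raised in the problem statement, one possible descriptive summary is the per-game change in load from Period 1 to Period 2. The sketch below is an assumption-laden illustration: the values are made up, and it assumes that Period 1 and Period 2 correspond to the first and second halves.

```python
import pandas as pd

# Hypothetical per-period game averages
demo = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-11-07"] * 2 + ["2022-11-11"] * 2 + ["2024-03-02"] * 2
        ),
        "Season": [1, 1, 1, 1, 2, 2],
        "Period": ["Period 1", "Period 2"] * 3,
        "Player Load Per Minute": [9.0, 9.6, 12.8, 12.6, 10.7, 10.9],
    }
)

# One row per game, one column per period
per_game = demo.pivot_table(
    index=["Season", "Date"],
    columns="Period",
    values="Player Load Per Minute",
    aggfunc="mean",
)

# Positive values mean the team worked harder per minute in Period 2
per_game["Second half change"] = per_game["Period 2"] - per_game["Period 1"]
print(per_game)

# Average change per season
season_summary = per_game.groupby(level="Season")["Second half change"].mean()
print(season_summary)
```

Plotting `per_game["Second half change"]` against `Date` would show whether the change drifts as a season progresses.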