# Analyzing The Data

So far we have a lot of game data and very little understanding of it. Let's make some graphs and go through our variables to see which factors matter most when we look at the bigger picture.

## Stating the obvious

First, let's state the obvious: the team that finishes with the most points wins.

So what we are really trying to find is which team will have the most points at the end of a match.

Let's look at a small number of game recordings, eight games at a time. The first set is eight games played by the same team against eight unique opponents, to see what we can learn from their wins and losses. The second set is another eight games, this time between different teams, each of which should appear once in the set (eight games with a total of ten unique teams), to see what factors might be affecting them.

This should give us a nice place to start.

### The Atlanta Hawks

For our first set, one team's performance against different opponents, we'll look at the Atlanta Hawks. Their name sorts first alphabetically, which makes their games easy to find. We'll look at:

- two home games they lost,
- two home games they won,
- two away games they won,
- two away games they lost.

This is why we chose eight games: it lets us look at more than one example of each outcome in both environments a team plays in.
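
As a quick sanity check on that sampling plan, here is a minimal sketch that enumerates the venue/result combinations and counts the games. The names `GAMES_PER_CELL`, `venues` and `results` are made up for illustration and are not part of the original notebook.

```python
from itertools import product

GAMES_PER_CELL = 2  # two games for every venue/result combination

venues = ["home", "away"]
results = ["win", "loss"]

# Every (venue, result) pair we want represented in the sample
cells = list(product(venues, results))
total_games = len(cells) * GAMES_PER_CELL

for venue, result in cells:
    print(f"{GAMES_PER_CELL} {venue} games ending in a {result}")
print("total games:", total_games)
```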

In [22]:
# Initialize imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [23]:
# Initialize DataFrames

# Home wins
## ATL vs GSW
ATL_GSW = pd.read_csv('../finalized_scripts/datasets/game_data/atlanta-hawks-golden-state-warriors-2024-02-04.csv')
## ATL vs LAL
ATL_LAL = pd.read_csv('../finalized_scripts/datasets/game_data/atlanta-hawks-los-angeles-lakers-2024-01-31.csv')

# Home losses
## ATL vs CLV
ATL_CLV = pd.read_csv('../finalized_scripts/datasets/game_data/atlanta-hawks-cleveland-cavaliers-2024-01-21.csv')
## ATL vs MEM
ATL_MEM = pd.read_csv('../finalized_scripts/datasets/game_data/atlanta-hawks-memphis-grizzlies-2023-12-24.csv')

# Away wins
## ATL vs MIA
MIA_ATL = pd.read_csv('../finalized_scripts/datasets/game_data/miami-heat-atlanta-hawks-2024-01-20.csv')
## ATL vs MIL
MIL_ATL = pd.read_csv('../finalized_scripts/datasets/game_data/milwaukee-bucks-atlanta-hawks-2023-12-03.csv')

# Away losses
## ATL vs BOS
BOS_ATL = pd.read_csv('../finalized_scripts/datasets/game_data/boston-celtics-atlanta-hawks-2024-02-08.csv')
## ATL vs CHA
CHA_ATL = pd.read_csv('../finalized_scripts/datasets/game_data/charlotte-hornets-atlanta-hawks-2023-10-26.csv')
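
The file names above appear to follow a `<home team>-<away team>-<date>.csv` pattern (the Hawks come second in every away game). Assuming that pattern holds for the whole dataset, a small helper could build the paths instead of typing them out. `GAME_DATA_DIR` and `game_path` are hypothetical names, and this is only a sketch, not how the original cells were written.

```python
# Directory taken from the read_csv calls above
GAME_DATA_DIR = "../finalized_scripts/datasets/game_data"

def game_path(home: str, away: str, date: str) -> str:
    """Build a CSV path, assuming files are named <home>-<away>-<date>.csv."""
    return f"{GAME_DATA_DIR}/{home}-{away}-{date}.csv"

# Home game: the Hawks come first in the file name
print(game_path("atlanta-hawks", "golden-state-warriors", "2024-02-04"))
# Away game: the opponent comes first
print(game_path("miami-heat", "atlanta-hawks", "2024-01-20"))
```

The resulting string can be passed straight to `pd.read_csv`.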


For those not well versed in American city names and basketball teams, here is what these abbreviations mean:

- ATL: Atlanta (Hawks)
- GSW: Golden State Warriors (San Francisco)
- LAL: Los Angeles (Lakers)
- CLV: Cleveland (Cavaliers)
- MEM: Memphis (Grizzlies)
- MIA: Miami (Heat)
- MIL: Milwaukee (Bucks)
- BOS: Boston (Celtics)
- CHA: Charlotte (Hornets)

Now that we're all on the same page, let's try comparing these games. We'll compare one game of each result against a game from either the same environment or the opposite one.

Since we already know that more points means a win, we're going to exclude the 2M, 3M and 1M columns. The two-pointers, three-pointers and free throws a team made simply add up to its total points, so they tell us nothing new.
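
To make that concrete, the check below rebuilds `Pts` from the made-shot columns for three rows copied from the ATL_GSW box score printed later in this section. It is a small sketch on a hand-built DataFrame, not a cell from the original notebook.

```python
import pandas as pd

# Three rows copied from the ATL_GSW box score (two players and the team total)
box = pd.DataFrame({
    "PLAYER": ["TraeYoung", "OnyekaOkongwu", "AtlantaHawks"],
    "Pts": [35, 22, 141],
    "2M": [5, 6, 41],
    "3M": [7, 2, 14],
    "1M": [4, 4, 17],
})

# Points = 2 per two-pointer + 3 per three-pointer + 1 per free throw
reconstructed = 2 * box["2M"] + 3 * box["3M"] + box["1M"]
points_match = bool((reconstructed == box["Pts"]).all())
print(points_match)
```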

Instead I would like to look at 2A, 3A and 1A, since these are all attempts, as well as every other variable: rebounds, assists, blocks, fouls, +/- (I will get into what that statistic means), Eff (also very interesting) and To (turnovers).
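
One way to drop the made-shot columns is `DataFrame.drop`. The sketch below uses a single row copied from the box score printed later in this section, restricted to a handful of its columns; `MADE_COLS` is a name introduced here for illustration.

```python
import pandas as pd

MADE_COLS = ["2M", "3M", "1M"]  # columns that just add up to Pts

# One row from the ATL_GSW box score, with only some of its columns
row = pd.DataFrame([{
    "PLAYER": "TraeYoung", "Pts": 35,
    "2M": 5, "2A": 10, "3M": 7, "3A": 11, "1M": 4, "1A": 5,
    "Ast": 6, "To": 3, "Eff": 29,
}])

# Keep attempts and the other statistics, drop the made shots
features = row.drop(columns=MADE_COLS)
print(list(features.columns))
```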

In [24]:
ATL_GSW_home = ATL_GSW[ATL_GSW['Team'] == 'AtlantaHawks']
ATL_GSW_away = ATL_GSW[ATL_GSW['Team'] == 'GoldenStateWarriors']

ATL_CLV_home = ATL_CLV[ATL_CLV['Team'] == 'AtlantaHawks']
ATL_CLV_away = ATL_CLV[ATL_CLV['Team'] == 'ClevelandCavaliers']
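
One thing to watch for: in the box score printed below, the Hawks' rows end with a team-total row whose `PLAYER` equals the team name, so filtering on `Team` alone keeps that row too. If only individual players are wanted, a helper along these lines could remove it. `player_rows` is a hypothetical name, and the sketch assumes every game file marks its total rows the same way.

```python
import pandas as pd

def player_rows(game: pd.DataFrame, team: str) -> pd.DataFrame:
    """Rows for one team, without the team-total row (where PLAYER == Team)."""
    team_rows = game[game["Team"] == team]
    return team_rows[team_rows["PLAYER"] != team_rows["Team"]]

# Tiny stand-in for a game file: two players plus the team-total row
game = pd.DataFrame({
    "PLAYER": ["TraeYoung", "PattyMills", "AtlantaHawks"],
    "Pts": [35, 0, 141],
    "Team": ["AtlantaHawks"] * 3,
})

hawks_players = player_rows(game, "AtlantaHawks")
print(hawks_players)
```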

In [31]:

ATL_GSW

Unnamed: 0,PLAYER,Pts,Reb,2M,2A,3M,3A,1M,1A,Ast,...,1%,Or,Dr,To,Stl,Blk,Fo,+/-,Eff,Team
0,TraeYoung,35,0,5,10,7,11,4,5,6,...,80.0,0,0,3,1,0,3,5,29,AtlantaHawks
1,OnyekaOkongwu,22,16,6,8,2,4,4,4,0,...,100.0,6,10,0,0,2,4,9,36,AtlantaHawks
2,JalenT.Johnson,21,13,7,14,2,6,1,1,8,...,100.0,3,10,0,0,1,3,-3,32,AtlantaHawks
3,DejounteMurray,19,5,9,18,0,6,1,1,7,...,100.0,0,5,2,1,0,5,4,15,AtlantaHawks
4,ClintCapela,17,15,8,11,0,0,1,2,2,...,50.0,6,9,1,0,0,0,4,29,AtlantaHawks
5,BogdanBogdanovic,13,4,3,6,2,10,1,1,4,...,100.0,0,4,2,2,0,3,1,10,AtlantaHawks
6,GarrisonMathews,8,3,1,1,1,4,3,3,0,...,100.0,1,2,0,0,1,1,14,9,AtlantaHawks
7,De&#039;AndreHunter,6,1,2,3,0,2,2,5,0,...,40.0,0,1,0,1,0,1,-5,2,AtlantaHawks
8,PattyMills,0,0,0,0,0,0,0,0,0,...,0.0,0,0,0,0,0,0,6,0,AtlantaHawks
9,AtlantaHawks,141,58,41,71,14,43,17,22,27,...,0.0,16,42,10,5,4,20,35,162,AtlantaHawks
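
The `PLAYER` column still carries an HTML entity from wherever the data was collected (`De&#039;AndreHunter`). If clean names are needed, for example to match players across games, the standard library's `html.unescape` can decode it. This is a sketch on a two-name Series, not a step from the original notebook.

```python
import html
import pandas as pd

# Two names as they appear in the PLAYER column above
players = pd.Series(["TraeYoung", "De&#039;AndreHunter"], name="PLAYER")

# Decode HTML entities such as &#039; back into plain characters
clean_players = players.map(html.unescape)
print(clean_players.tolist())
```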


In [26]:
ATL_GSW_home['FG%'].mean()

43.85

In [27]:
ATL_GSW_away['FG%'].mean()

35.89090909090909
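
A caveat about these two numbers: averaging the `FG%` column gives every row equal weight, including players who barely shot and the team-total row. Dividing total makes by total attempts gives the team's actual shooting percentage. The sketch below uses the Hawks' player rows from the table above and assumes `FG%` means (2M + 3M) / (2A + 3A), with 0.0 for a player who took no shots. Under those assumptions the mean that includes the total row comes out close to the 43.85 shown above.

```python
import pandas as pd

# Hawks player rows copied from the ATL_GSW box score (team-total row left out)
hawks = pd.DataFrame({
    "PLAYER": ["TraeYoung", "OnyekaOkongwu", "JalenT.Johnson", "DejounteMurray",
               "ClintCapela", "BogdanBogdanovic", "GarrisonMathews",
               "De&#039;AndreHunter", "PattyMills"],
    "2M": [5, 6, 7, 9, 8, 3, 1, 2, 0],
    "2A": [10, 8, 14, 18, 11, 6, 1, 3, 0],
    "3M": [7, 2, 2, 0, 0, 2, 1, 0, 0],
    "3A": [11, 4, 6, 6, 0, 10, 4, 2, 0],
})

made = hawks["2M"] + hawks["3M"]
attempted = hawks["2A"] + hawks["3A"]

# Per-player FG%; 0 attempts gives NaN, which we treat as 0.0 (assumption)
per_player = (100 * made / attempted).fillna(0.0)

# Every player weighted equally, regardless of how often they shot
mean_of_percentages = per_player.mean()

# Team shooting percentage: all makes over all attempts
pooled = 100 * made.sum() / attempted.sum()

# What the mean looks like if the team-total row is averaged in as a tenth row
mean_with_totals_row = (per_player.sum() + pooled) / (len(per_player) + 1)

print(round(mean_of_percentages, 2), round(pooled, 2), round(mean_with_totals_row, 2))
```

The pooled figure is the one to compare between teams; the mean of per-player percentages mostly reflects how the bench shot.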

In [28]:
three_point_GSW = ATL_GSW_away['3M'].sum()
three_pointA_GSW = ATL_GSW_away['3A'].sum()
(three_point_GSW/three_pointA_GSW)*100

32.142857142857146

In [29]:
three_point_ATL = ATL_GSW_home['3M'].sum()
three_pointA_ATL = ATL_GSW_home['3A'].sum()
(three_point_ATL/three_pointA_ATL)*100


32.55813953488372