# **NBA - Team and players analysis - season 2017**

#### For my first project on Kaggle, I looked for a dataset on a subject I am passionate about: the NBA.
#### Here, I analyze teams and try to understand what makes them valuable. Later on, I will analyze players, their performance, and more.

#### On a personal level, I'm doing this to practice both my visualization and my analysis skills in Python. I'm still learning, so any recommendation would be of great value to me.

#### Hope you like it!

## Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Palette

In [None]:
color = sns.diverging_palette(220, 20, as_cmap=True)
color

## Team valuations (2017)


### What makes a team valuable?
#### Possible answers to check:
##### 1. Their current performance.
##### 2. Their past performance.
##### 3. The number of people who attend the games.
##### 4. Their players.

In [None]:
team_valuation = pd.read_csv('../input/social-power-nba/nba_2017_team_valuations.csv')
team_valuation.head()

In [None]:
fig, ax = plt.subplots(figsize=(16,9))

sns.set(font_scale = 1.5)

sns.scatterplot(y=team_valuation['TEAM'], x=team_valuation['VALUE_MILLIONS'],
                palette=color,
                ec='black',linewidth=1,
                hue=team_valuation['VALUE_MILLIONS'],
                size=team_valuation['VALUE_MILLIONS'],
                sizes=(100,300)).set_title('NBA Team Valuations 2017')

ax.legend(title='Value in millions',bbox_to_anchor= (1,1))

for s in ['top','right','bottom','left']:
    ax.spines[s].set_visible(False)

### Most teams are valued between 750 and 1,500 million. Only 2 teams are valued at 3,000 million or more.
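
The ranges quoted above can be counted directly instead of being read off the plot. This is a minimal sketch, assuming a `VALUE_MILLIONS` column like the one in the file loaded above. The helper name `count_in_range` and the toy numbers are made up for illustration; they are not the real 2017 valuations.

```python
import pandas as pd

def count_in_range(df, low, high, col="VALUE_MILLIONS"):
    """Return how many rows have `col` between low and high (inclusive)."""
    return int(df[col].between(low, high).sum())

# Toy data for illustration only; these are NOT the real 2017 valuations.
toy = pd.DataFrame({
    "TEAM": ["A", "B", "C", "D", "E"],
    "VALUE_MILLIONS": [3300, 3000, 1200, 900, 800],
})

n_mid = count_in_range(toy, 750, 1500)                # teams in the 750-1500 band
n_top = int((toy["VALUE_MILLIONS"] >= 3000).sum())    # teams at 3000 or more
print(n_mid, n_top)
```

With the real data, the same call would be `count_in_range(team_valuation, 750, 1500)`.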

## ELO:
### Elo: "How do you rate an NBA team across decades of play? One method is Elo, a simple measure of strength based on game-by-game results. We calculated Elo ratings for every NBA (and ABA) franchise after every game in history — over 60,000 ratings in total".

### "Elo ratings have a simple formula; the only inputs are the final score of each game, and where and when it was played. Teams always gain Elo points for winning. But they get more credit for upset victories and for winning by larger margins. Elo ratings are zero-sum, however. When the Houston Rockets gained 49 Elo points by winning the final three games of their Western Conference semifinal during the 2014-15 playoffs, that meant the Los Angeles Clippers lost 49 Elo points".

##### source: https://projects.fivethirtyeight.com/complete-history-of-the-nba/#lakers
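
To make the "zero-sum" idea in the quote concrete, here is a sketch of the textbook Elo update. It is not FiveThirtyEight's exact formula: it leaves out the margin-of-victory and home-court adjustments the quote mentions, and the step size `k=20` is an assumed value chosen only for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(winner, loser, k=20.0):
    """Return (new_winner, new_loser) after one game.

    k is the update step size; 20 is an assumed value for this sketch.
    """
    gain = k * (1.0 - expected_score(winner, loser))
    # Whatever the winner gains, the loser gives up: the update is zero-sum.
    return winner + gain, loser - gain

# An upset (the lower-rated team wins) moves more points than an expected win.
upset_w, upset_l = elo_update(1400.0, 1600.0)
fav_w, fav_l = elo_update(1600.0, 1400.0)
print(round(upset_w - 1400.0, 2), round(fav_w - 1600.0, 2))
```

In both cases the sum of the two ratings is unchanged after the game, which is the property the Rockets and Clippers example describes.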

In [None]:
elo = pd.read_csv('../input/social-power-nba/nba_2017_elo.csv')
elo.head()

In [None]:
elo_val_att_ = pd.read_csv('../input/social-power-nba/nba_2017_att_val_elo.csv')
elo_val_att_.head(10)

## Value vs. ELO

### Are teams with a higher ELO more valuable?

In [None]:
fig, ax = plt.subplots(figsize=(16,6))

sns.scatterplot(data=elo_val_att_,
               x='ELO',
               y='VALUE_MILLIONS',
               palette=color,
               ec='black',linewidth=1,
               hue=elo_val_att_['VALUE_MILLIONS'],
               size=elo_val_att_['VALUE_MILLIONS'],
               sizes=(100,300))

ax.legend(title='Value in millions',bbox_to_anchor= (1,1))
plt.show()



### The graph shows that the teams with the highest ELO are not necessarily the ones that are worth the most. Keep in mind, however, that:

##### ELO is a historical score. Some teams did very well historically but poorly in recent years, or vice versa, and their valuation may reflect those historical results.

##### The age of each franchise is also a key variable. Teams that have played for many more years will have a higher or lower ELO (depending on their performance), but based on a larger sample than younger teams.

##### Two of the most valuable teams, the Lakers and the Knicks, have some of the lowest ELO ratings. This fits the theory that some teams are valuable because of their location or history, not because of their current performance.
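
One way to find teams that are valued far above what their ELO would suggest is to compare the two rankings. The sketch below assumes the `TEAM`, `VALUE_MILLIONS` and `ELO` columns used above. The helper `rank_gap` and the toy rows are hypothetical and exist only to show the idea.

```python
import pandas as pd

def rank_gap(df, value_col="VALUE_MILLIONS", elo_col="ELO"):
    """Add rank columns (1 = highest) and their difference.

    A large positive RANK_GAP means: ranked high on value but low on ELO.
    """
    out = df.copy()
    out["VALUE_RANK"] = out[value_col].rank(ascending=False, method="min")
    out["ELO_RANK"] = out[elo_col].rank(ascending=False, method="min")
    out["RANK_GAP"] = out["ELO_RANK"] - out["VALUE_RANK"]
    return out.sort_values("RANK_GAP", ascending=False)

# Toy data for illustration only; not the real teams or numbers.
toy = pd.DataFrame({
    "TEAM": ["A", "B", "C", "D"],
    "VALUE_MILLIONS": [3300, 2000, 1000, 800],
    "ELO": [1350, 1700, 1500, 1600],
})
gaps = rank_gap(toy)
print(gaps[["TEAM", "VALUE_RANK", "ELO_RANK", "RANK_GAP"]])
```

With the real data, `rank_gap(elo_val_att_).head()` would list the teams whose value rank most exceeds their ELO rank.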

In [None]:
elo_val_att_.sort_values(by='ELO', ascending=False).head()

## Attendance vs Value

### The plot below shows something interesting: the teams that are worth the most tend to be the ones with the biggest audiences, which makes sense from a business perspective. What makes less sense is why a team that plays poorly still draws so many people.

#### Why does this happen? Why do people go to see bad teams?
#### Possible reasons:
##### 1. Tourism. Because these are historic teams, tourists care less about the result than about the show and the experience of visiting those arenas.


In [None]:
fig, ax = plt.subplots(figsize=(19,7))

sns.scatterplot(data=elo_val_att_,
               x='TOTAL',
               y='VALUE_MILLIONS',
               palette = color,
               ec='black',linewidth=1,
               hue=elo_val_att_['VALUE_MILLIONS'],
               size='VALUE_MILLIONS',
               sizes=(100,300))

plt.axhline(np.mean(elo_val_att_['VALUE_MILLIONS']))


# ELO vs Attendance

In [None]:
fig, ax = plt.subplots(figsize=(19,7))

sns.scatterplot(data=elo_val_att_,
               x='TOTAL',
               y='ELO',
               palette = color,
               ec='black',linewidth=1,
               hue=elo_val_att_['VALUE_MILLIONS'],
               size='VALUE_MILLIONS',
               sizes=(100,300))

plt.axhline(np.mean(elo_val_att_['ELO']))

### What we can see here is that a team does not need to play well (high ELO) to bring people to its arena.

# Salary

## Here, I analyze salaries to see if there is a correlation between how much a team spends and other variables such as ELO, attendance, and the team's valuation.

In [None]:
salary = pd.read_csv('../input/social-power-nba/nba_2017_salary.csv')
salary = salary.sort_values(by='TEAM')

In [None]:
fig, ax = plt.subplots(figsize =(16,9))

sns.barplot(data = salary,
           y = 'TEAM',
           x = 'SALARY',
           color = ('#fdbb30'))

plt.title('Distribution of Team Salaries')

In [None]:
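# Sum the salaries per team so the result can be joined to the team-level data.
# Only the SALARY column is summed, so text columns are not carried along.

ts = salary.groupby('TEAM')[['SALARY']].sum()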
ts.head()

In [None]:
# Sort alphabetically by team so the rows line up with the grouped salaries.

t = elo_val_att_.sort_values(by='TEAM')
t.head()

In [None]:
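# Combine the two DataFrames side by side. Rows are matched by position,
# so this assumes both contain the same teams in the same alphabetical order.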

concatenated = pd.concat([t.reset_index(drop=True), ts.reset_index(drop=True)], axis=1)
concatenated.head()
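
The positional `pd.concat` above only gives correct rows if both tables list exactly the same teams with identical spelling. A more defensive alternative is to join on the `TEAM` key, sketched below. The helper `merge_team_salary`, the `NAME` column, and the toy rows are assumptions for illustration; only `TEAM` and `SALARY` are known from the code above.

```python
import pandas as pd

def merge_team_salary(teams, salary):
    """Join team-level data with total salary per team, matching on TEAM."""
    totals = salary.groupby("TEAM", as_index=False)["SALARY"].sum()
    # validate raises if a team appears more than once on either side.
    return teams.merge(totals, on="TEAM", how="left", validate="one_to_one")

# Toy data for illustration only.
toy_teams = pd.DataFrame({"TEAM": ["A", "B", "C"], "ELO": [1500, 1600, 1400]})
toy_salary = pd.DataFrame({
    "NAME": ["p1", "p2", "p3"],   # hypothetical player column
    "TEAM": ["B", "A", "B"],
    "SALARY": [10.0, 5.0, 7.0],
})

merged = merge_team_salary(toy_teams, toy_salary)
# Teams with no matching salary rows show up as NaN instead of being misaligned.
unmatched = merged.loc[merged["SALARY"].isna(), "TEAM"].tolist()
print(merged)
print("Teams without salary data:", unmatched)
```

A left join keeps every team from the first table, so any mismatch in team names becomes visible as a missing salary rather than a silently wrong row.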

## ELO, Salary and Team Valuation

In [None]:
fig, ax = plt.subplots(figsize=(19,7))

sns.scatterplot(y='SALARY', x='ELO', data=concatenated,
                hue=concatenated['VALUE_MILLIONS'],
                size='VALUE_MILLIONS',
                palette = color,
                ec = 'black',linewidth=1,
                sizes=(100,300),
                legend='brief')

plt.title('ELO distribution according to salaries', fontsize=20)
plt.axhline(np.mean(concatenated['SALARY']))



### Here we can see that just one of the most valuable teams has a salary higher than the average. The outlier, the Golden State Warriors, with more than 1700 ELO points, has a salary slightly below the average.

### Of the top 5 teams by ELO:
#### - Just 2 have a salary higher than the average.
#### - Just 2 have a valuation higher than the average.
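
Counts like these can be computed rather than read off the chart. This is a minimal sketch, assuming the column names used in `concatenated`. The helper `top_n_above_mean` and the toy numbers are hypothetical and do not reproduce the real figures.

```python
import pandas as pd

def top_n_above_mean(df, rank_col, n, check_cols):
    """For the top-n rows by rank_col, count how many exceed the
    all-team mean of each column in check_cols."""
    top = df.nlargest(n, rank_col)
    return {c: int((top[c] > df[c].mean()).sum()) for c in check_cols}

# Toy data for illustration only; not the real teams or numbers.
toy = pd.DataFrame({
    "TEAM": list("ABCDEF"),
    "ELO": [1700, 1650, 1600, 1500, 1450, 1400],
    "SALARY": [90, 120, 80, 130, 100, 80],
    "VALUE_MILLIONS": [2600, 1000, 900, 3300, 3000, 1200],
})
counts = top_n_above_mean(toy, "ELO", 3, ["SALARY", "VALUE_MILLIONS"])
print(counts)
```

The same helper works for other rankings, for example `top_n_above_mean(concatenated, "TOTAL", 10, ["SALARY", "VALUE_MILLIONS"])` for the top 10 teams by attendance.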

In [None]:
mean1 = np.mean(concatenated['VALUE_MILLIONS'])
mean1

## Attendance, Salaries and Team Valuation

In [None]:
fig, ax = plt.subplots(figsize=(19,7))

sns.scatterplot(data=concatenated,
                y='SALARY',
                x='TOTAL',
                hue=concatenated['VALUE_MILLIONS'],
                palette=color,
                ec='black', linewidth=1,
                size='VALUE_MILLIONS',
                sizes=(100,300))

plt.title('Total Attendance distribution according to salaries', fontsize=20)
plt.axhline(np.mean(concatenated['SALARY']))


### We can see that, of the top 10 teams by attendance:
#### - 3 have a valuation higher than the average.
#### - 2 have a salary higher than the average.
#### - The team with the lowest total salary is among them.
#### - The team with the highest total salary is among them.

In [None]:
Top_10_Attendance = concatenated.sort_values(by = 'TOTAL', ascending=False).head(10)
Top_10_Attendance

In [None]:
Top_10_Values = concatenated.sort_values(by = 'VALUE_MILLIONS', ascending=False).head(10)
Top_10_Values


In [None]:
concatenated.ELO.mean()

In [None]:
concatenated['VALUE_MILLIONS'].corr(concatenated['ELO'])

In [None]:
concatenated['VALUE_MILLIONS'].corr(concatenated['TOTAL'])

In [None]:
concatenated['ELO'].corr(concatenated['SALARY'])
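
The pairwise correlations above can also be viewed in a single table, which includes pairs not computed individually, such as ELO against attendance. The sketch assumes the four column names used in `concatenated`. The helper `correlation_table` and the toy values are made up for illustration.

```python
import pandas as pd

def correlation_table(df, cols=("VALUE_MILLIONS", "ELO", "TOTAL", "SALARY")):
    """Pearson correlation between every pair of the given columns."""
    return df[list(cols)].corr()

# Toy data for illustration only; not the real teams or numbers.
toy = pd.DataFrame({
    "VALUE_MILLIONS": [3300, 3000, 1200, 900, 800],
    "ELO": [1400, 1350, 1600, 1700, 1500],
    "TOTAL": [810000, 780000, 700000, 690000, 650000],
    "SALARY": [100, 95, 110, 105, 90],
})
table = correlation_table(toy)
print(table.round(2))
```

With only 30 teams, any single coefficient is a rough indication rather than strong evidence, so the table is best read alongside the scatter plots.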

# Conclusion

### A team's valuation is not determined by how well it plays (ELO). In this data, the best predictor of a team's value is its total attendance. Not every team that draws large crowds has a high valuation, but the most valuable teams generally draw large crowds.

### Attendance, for its part, does not depend on the team's performance. Some very good teams draw the same or smaller crowds than teams that play badly.

### Attendance appears to depend on the city. Among the city-level variables that affect it are tourism and the population of the state. As this list shows (https://www.infoplease.com/us/states/state-population-by-rank), the most populous states coincide with the locations of the most valuable teams: New York, Miami (Florida), Los Angeles and San Francisco (California), Chicago (Illinois), and the Texas teams. The only exception is the Boston Celtics, who do not draw a large crowd but are among the 10 most valuable teams.

### Attendance, which is driven by location, is the most important variable when valuing a basketball team.


