<a href="https://colab.research.google.com/github/noahgift/core-stats-datascience/blob/master/data_science_workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Walking through Social Power NBA Data Science Project

* *[Read related material covered in Chapter 6 of Pragmatic AI](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/ch06.xhtml#ch06)*

* *[Watch Video Lesson 9:  Walking through Social Power NBA EDA and ML Project](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_00)*

**Topics Covered**


* Data Collection Sources
* Importing and merging DataFrames in Pandas 
* Creating correlation heatmaps 
* Using seaborn lmplot 
* Using linear regression in Python
* Using ggplot in Python 
* Doing KMeans clustering 
* Doing PCA with scikit-learn 
* Doing ML classification prediction with scikit-learn 
* Doing ML Regression prediction with scikit-learn 
* Using Plotly for interactive Data Visualization



#### Data Collection Sources 

* *[Watch Video Lesson 9.1:  Data Collection of Social Media Data](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_01)*

![Collection of Data](https://user-images.githubusercontent.com/58792/40758183-e64ba7c4-6440-11e8-97c5-c408e0bc321e.png)

**Twitter Code:**

https://github.com/noahgift/socialpowernba/blob/master/socialpower/sptwitter.py

**Wikipedia Code:**

https://github.com/noahgift/socialpowernba/blob/master/socialpower/spwikipedia.py

#### Import and merge DataFrames in Pandas

* *[Watch Video Lesson 9.2:  Import and merge DataFrames in Pandas](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_02)*

In [0]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

In [0]:
attendance_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_attendance.csv")
attendance_df.head()

In [0]:
endorsement_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_endorsements.csv")
endorsement_df.head()

In [0]:
valuations_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_team_valuations.csv")
valuations_df.head()

In [0]:
salary_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_salary.csv")
salary_df.head()

In [0]:
pie_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_pie.csv")
pie_df.head()

In [0]:
plus_minus_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_real_plus_minus.csv")
plus_minus_df.head()

In [0]:
br_stats_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_br.csv")
br_stats_df.head()

In [0]:
elo_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_elo.csv")
elo_df.head()

### Exploratory Data Analysis (EDA)

In [0]:
attendance_valuation_df = attendance_df.merge(valuations_df, how="inner", on="TEAM")

In [0]:
attendance_valuation_df.head()

#### Understand correlation heatmaps and pairplots

Exploratory Data Analysis and Feature Engineering

* [Watch Video Lesson 9.3:  Understand correlation heatmaps and pairplots](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_03)


In [0]:
# Correlate only the numeric columns; TEAM is a string column
attendance_valuation_df.select_dtypes("number").corr()

**Correlation Heatmap**

In [0]:
# Correlate only the numeric columns; TEAM is a string column
corr = attendance_valuation_df.select_dtypes("number").corr()
sns.heatmap(corr, 
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)

**Correlation DataFrame Output**

In [0]:
corr
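
As a rough illustration of what each cell of the correlation matrix holds, the Pearson coefficient can be computed by hand. This is a minimal sketch in plain Python; the function name `pearson_r` and the sample numbers are made up for illustration and are not taken from the NBA data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only
attendance = [0.6, 0.7, 0.8, 0.9]
valuation = [800, 1100, 1500, 2000]
r = pearson_r(attendance, valuation)
print(round(r, 3))
```

A value near 1 means the two columns rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.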

**Creating a Pivot Table Based Heatmap in Seaborn**

A few patterns emerge. Look at the *three highest-valued signals*.

In [0]:
# Keyword arguments are required by recent pandas releases
valuations = attendance_valuation_df.pivot(index="TEAM", columns="TOTAL_MILLIONS", values="VALUE_MILLIONS")

In [0]:
# Reuse the axes returned by subplots instead of creating a second one
fig, ax = plt.subplots(figsize=(20,15))
ax.set_title("NBA Team AVG Attendance vs Valuation in Millions:  2016-2017 Season")
sns.heatmap(valuations, linewidths=.5, annot=True, fmt='g', ax=ax)

#### Using linear regression in Python

There is a signal here: attendance and valuation do seem to be related, but the residual values look non-uniform.

* *[Watch Video Lesson 9.4:  Use linear regression in Python](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_04)*

In [0]:
results = smf.ols('VALUE_MILLIONS ~TOTAL_MILLIONS', data=attendance_valuation_df).fit()

In [0]:
print(results.summary())
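
For a single predictor, the fitted line has a closed form. The sketch below shows that calculation in plain Python under the assumption of one explanatory variable; `fit_simple_ols` is a hypothetical helper name and the toy data is invented, so it only approximates what the summary table reports as `Intercept` and the slope coefficient.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the ratio of cross-deviation to the spread of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(xs, ys)

# Residuals are the gaps between actual and fitted values
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]
print(intercept, slope)
```

Plotting residuals against the predictor, as a residual plot does, is a quick check on whether a straight line is a reasonable model.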

In [0]:
sns.residplot(y="VALUE_MILLIONS", x="TOTAL_MILLIONS", data=attendance_valuation_df)

In [0]:
attendance_valuation_predictions_df = attendance_valuation_df.copy()

In [0]:
attendance_valuation_predictions_df["predicted"] = results.predict()
attendance_valuation_predictions_df

#### Use seaborn lmplot to plot predicted vs actual values



In [0]:
sns.lmplot(x="predicted", y="VALUE_MILLIONS", data=attendance_valuation_predictions_df)

##### Generating an RMSE (Root Mean Squared Error) for the Predictions

In [0]:
# Import the submodule explicitly rather than relying on it being loaded as a side effect
from statsmodels.tools import eval_measures
rmse = eval_measures.rmse(attendance_valuation_predictions_df["predicted"], attendance_valuation_predictions_df["VALUE_MILLIONS"])
rmse
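
The RMSE itself is a short formula: square each prediction error, average the squares, and take the square root. A minimal sketch in plain Python follows; `rmse_by_hand` is a hypothetical name and the numbers are invented.

```python
import math


def rmse_by_hand(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two predictions that miss by 3 and by 4 (illustrative values)
example = rmse_by_hand([0.0, 0.0], [3.0, 4.0])
print(example)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones. The result is in the same units as the target, here millions of dollars.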

#### Adding ELO (Strength of Schedule Ranking) to DataFrame

In [0]:
attendance_valuation_elo_df = attendance_valuation_df.merge(elo_df, how="inner", on="TEAM")

In [0]:
attendance_valuation_elo_df.head()

In [0]:
# Correlate only the numeric columns; TEAM and CONF are string columns
corr_elo = attendance_valuation_elo_df.select_dtypes("number").corr()
fig, ax = plt.subplots(figsize=(10,5))
ax.set_title("NBA Team Correlation Heatmap:  2016-2017 Season (ELO, AVG Attendance, VALUATION IN MILLIONS)")
sns.heatmap(corr_elo, 
            xticklabels=corr_elo.columns.values,
            yticklabels=corr_elo.columns.values,
            ax=ax)

In [0]:
corr_elo

In [0]:
ax = sns.lmplot(x="ELO", y="TOTAL_MILLIONS", data=attendance_valuation_elo_df, hue="CONF", height=6)  # `size` was renamed to `height` in seaborn 0.9
ax.set(xlabel='ELO Score', ylabel='TOTAL ATTENDANCE IN MILLIONS', title="NBA Team AVG Attendance vs ELO Ranking:  2016-2017 Season")

In [0]:
attendance_valuation_elo_df.groupby("CONF")["ELO"].median()


In [0]:
attendance_valuation_elo_df.groupby("CONF")["TOTAL_MILLIONS"].median()

In [0]:
results = smf.ols('TOTAL_MILLIONS ~ELO', data=attendance_valuation_elo_df).fit()


In [0]:
print(results.summary())
      


In [0]:
val_housing_win_df = pd.read_csv("https://raw.githubusercontent.com/noahgift/socialpowernba/master/data/nba_2017_att_val_elo_win_housing.csv")
val_housing_win_df.head()

In [0]:
val_housing_win_df.columns

In [0]:
results = smf.ols('VALUE_MILLIONS ~COUNTY_POPULATION_MILLIONS+TOTAL_ATTENDANCE_MILLIONS+MEDIAN_HOME_PRICE_COUNTY_MILLIONS', data=val_housing_win_df).fit()
print(results.summary())

#### Using ggplot in Python

* *[Watch Video Lesson 9.5:  Use ggplot in Python](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_05)*

In [0]:
!pip -q install ggplot

In [0]:
from ggplot import *
ggplot(val_housing_win_df, aes(x="TOTAL_ATTENDANCE_MILLIONS", y="VALUE_MILLIONS",
                               color="WINNING_SEASON")) + geom_point(size=400)

#### Use k-means clustering

**Unsupervised Machine Learning**

*   Unlabeled Data
*   "Discovers" Labels
*  Finds Hidden Patterns


**References:**



* *[Watch Video Lesson 9.6:  Use k-means clustering](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_06)*
*  [Read Chapter 6:  Pragmatic AI: Social Power and Influence](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/ch06.html#ch06) 
*   [Python Machine Learning](https://www.safaribooksonline.com/library/view/Python+Machine+Learning+-+Second+Edition/9781787125933/ch11.html#ch11lvl2sec114) Cluster and Silhouette plot examples.



*NBA Season Faceted Cluster Plot*

![Discovering Clusters in the NBA](https://user-images.githubusercontent.com/58792/40759110-6a93a2f8-6445-11e8-980b-ecbb1a2cc029.png)

#### Data Preparation for Clustering

* Clustering on four columns:  Attendance, ELO, Valuation and Median Home Prices
* Scaling the data


In [0]:
numerical_df = val_housing_win_df.loc[:,["TOTAL_ATTENDANCE_MILLIONS", "ELO", "VALUE_MILLIONS", "MEDIAN_HOME_PRICE_COUNTY_MILLIONS"]]

In [0]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
print(scaler.fit(numerical_df))
print(scaler.transform(numerical_df))
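
Min-max scaling maps each column onto the range 0 to 1 so that columns measured in very different units contribute comparably to distance-based clustering. The plain-Python sketch below shows the arithmetic for one column; `min_max_scale` is a hypothetical helper, the values are invented, and mapping a constant column to 0.0 is an assumption of this sketch.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has zero span; map it to 0.0 to avoid dividing by zero
    return [(v - lo) / span if span else 0.0 for v in values]


# Hypothetical ELO-like ratings, for illustration only
elo = [1400, 1500, 1700]
scaled = min_max_scale(elo)
print(scaled)
```

Without this step, a column such as ELO (values in the thousands) would swamp a column such as attendance in millions (values below 1) when distances are computed.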

In [0]:
from sklearn.cluster import KMeans
k_means = KMeans(n_clusters=3)
kmeans = k_means.fit(scaler.transform(numerical_df))
val_housing_win_df['cluster'] = kmeans.labels_
val_housing_win_df.head()

In [0]:
# Yellowbrick method
from yellowbrick.cluster import KElbowVisualizer

k_means = KMeans()
visualizer = KElbowVisualizer(k_means, k=(3,12))  # pass the fresh estimator created above, not the already fitted `kmeans`

visualizer.fit(scaler.transform(numerical_df))    # Fit the data to the visualizer
visualizer.poof()    # Draw/show/poof the data

The elbow method suggests that 3 clusters is a decent choice.

In [0]:
distortions = []
for i in range(1, 11):
    km = KMeans(n_clusters=i,
            init='k-means++',
            n_init=10,
            max_iter=300,
            random_state=0)
    km.fit(scaler.transform(numerical_df))
    distortions.append(km.inertia_)
    
plt.plot(range(1,11), distortions, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Distortion')
plt.title("Team Valuation Elbow Method Cluster Analysis")
plt.show()
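
The quantity plotted on the y-axis of an elbow chart is the within-cluster sum of squared distances. As a minimal sketch of that calculation in plain Python (the function name `inertia_by_hand`, the points, and the centroids are all invented for illustration):

```python
def inertia_by_hand(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to every centroid; keep the smallest
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


# Two well-separated pairs of points (illustrative only)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = inertia_by_hand(points, [(5.0, 1.0)])
two_clusters = inertia_by_hand(points, [(0.0, 1.0), (10.0, 1.0)])
print(one_cluster, two_clusters)  # 104.0 4.0
```

Adding clusters always lowers this number, so the "elbow" is the point where further clusters stop producing a large drop.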

##### Silhouette Plot



In [0]:
km = KMeans(n_clusters=3,
            init='k-means++',
            n_init=10,
            max_iter=300,
            random_state=0)
y_km = km.fit_predict(scaler.transform(numerical_df))

In [0]:
import numpy as np
from matplotlib import cm
from sklearn.metrics import silhouette_samples
cluster_labels = np.unique(y_km)
n_clusters = cluster_labels.shape[0]
silhouette_vals = silhouette_samples(scaler.transform(numerical_df),
                                     y_km,
                                     metric='euclidean')
y_ax_lower, y_ax_upper = 0, 0
yticks = []
for i, c in enumerate(cluster_labels):
    c_silhouette_vals = silhouette_vals[y_km == c]
    c_silhouette_vals.sort()
    y_ax_upper += len(c_silhouette_vals)
    color = cm.jet(float(i)/n_clusters)
    plt.barh(range(y_ax_lower, y_ax_upper), c_silhouette_vals, height=1.0, edgecolor='none',color=color)
    yticks.append((y_ax_lower + y_ax_upper)/2)
    y_ax_lower += len(c_silhouette_vals)
silhouette_avg = np.mean(silhouette_vals)
plt.axvline(silhouette_avg,
            color="red",
            linestyle="--")
plt.yticks(yticks, cluster_labels + 1)
plt.ylabel('Cluster')
plt.xlabel('Silhouette coefficient')
plt.title('Silhouette Plot Team Valuation')
plt.gcf().set_size_inches(20, 10)  # resize the current figure instead of opening a new, empty one
plt.show()
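
Each bar in a silhouette plot is one sample's silhouette coefficient, `(b - a) / max(a, b)`, where `a` is the mean distance to the other members of its own cluster and `b` is the mean distance to the nearest other cluster. The sketch below is a plain-Python illustration with invented points; `silhouette_by_hand` is a hypothetical name, and returning 0.0 for singleton clusters is an assumption of this sketch.

```python
import math


def silhouette_by_hand(points, labels):
    """Silhouette coefficient (b - a) / max(a, b) for every point."""
    values = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            values.append(0.0)  # singleton cluster or no other cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        values.append((b - a) / max(a, b))
    return values


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette_by_hand(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_by_hand(points, [0, 1, 0, 1])   # deliberately mixed grouping
print(good)
print(bad)
```

Values close to 1 indicate a sample sits well inside its cluster, and negative values suggest it may belong to a neighboring cluster.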

##### Agglomerative clustering (Hierarchical) vs KMeans clustering


In [0]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
km = KMeans(n_clusters=2,
            random_state=0)
X = scaler.transform(numerical_df)
y_km = km.fit_predict(X)
ax1.scatter(X[y_km==0,0],
            X[y_km==0,1],
            c='lightblue',
            edgecolor='black',
            marker='o',
            s=40,
            label='cluster 1')
ax1.scatter(X[y_km==1,0],
            X[y_km==1,1],
            c='red',
            edgecolor='black',
            marker='s',
            s=40,
            label='cluster 2')
ax1.set_title('NBA Team K-means clustering')
from sklearn.cluster import AgglomerativeClustering

X = scaler.transform(numerical_df)
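# Euclidean distance is the default, so it is not passed explicitly;
# the `affinity` keyword is no longer accepted by recent scikit-learn releases
ac = AgglomerativeClustering(n_clusters=2,
                             linkage='complete')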
y_ac = ac.fit_predict(X)
ax2.scatter(X[y_ac==0,0],
             X[y_ac==0,1],
             c='lightblue',
             edgecolor='black',
             marker='o',
            s=40,
            label='cluster 1')
ax2.scatter(X[y_ac==1,0],
            X[y_ac==1,1],
            c='red',
            edgecolor='black',
            marker='s',
            s=40,
            label='cluster 2')
ax2.set_title('NBA Team Agglomerative clustering')
plt.legend()
plt.show()

##### 3D Plot in R

![Valuation 3D Plot](https://user-images.githubusercontent.com/58792/36056809-7f87a266-0dbc-11e8-8877-9bb87905adbd.png)

Source Code:  https://github.com/noahgift/socialpowernba/blob/master/plot_team_cluster.R

#### Use PCA with sklearn

References:



1.  [PCA sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)





In [0]:
import pandas as pd
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(numerical_df)
X = pca.transform(numerical_df)
print(f"Before PCA Reduction {numerical_df.shape}")
print(f"After PCA Reduction {X.shape}")

##### Simple Scatter Plot of Reduced Dimensions

In [0]:
plt.scatter(X[:, 0], X[:, 1])
plt.show()
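
PCA ranks directions by how much variance they capture, so a column on a much larger numeric scale can dominate the first component when the data is not scaled first. The sketch below illustrates this for two columns in plain Python, using the closed-form eigenvalues of a 2x2 covariance matrix; `explained_variance_ratio_2d` is a hypothetical helper and the data is invented.

```python
import math


def explained_variance_ratio_2d(points):
    """Share of total variance captured by each principal axis of 2-D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix via trace and determinant
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    disc = math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    return (trace / 2 + disc) / trace, (trace / 2 - disc) / trace


# First column in the thousands, second below 1 (illustrative values)
unscaled = [(1400, 0.6), (1500, 0.9), (1600, 0.7), (1700, 0.8)]
first, second = explained_variance_ratio_2d(unscaled)
print(first, second)
```

With these numbers almost all of the variance lands on the first component simply because of the first column's units, which is one reason to consider running PCA on the scaled data instead.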


#### Using yellowbrick road for Feature Ranking

![yb viz](https://user-images.githubusercontent.com/58792/48080128-d7b8d900-e1a1-11e8-8f8c-aba2473bfc81.png)

Another "road" to travel

```python
!pip install yellowbrick
from yellowbrick.features import Rank2D

visualizer = Rank2D(algorithm="pearson")
visualizer.fit_transform(numerical_df)
visualizer.poof()
```

In [0]:
!pip -q install -U yellowbrick

In [0]:
from yellowbrick.features import Rank2D

visualizer = Rank2D(algorithm="pearson")
visualizer.fit_transform(numerical_df)
visualizer.poof()

#### Use ML classification prediction with scikit-learn

Create supervised classification prediction

In [0]:
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit?
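
To make the idea behind a nearest-neighbor classifier concrete, here is a minimal plain-Python sketch: find the `k` closest labeled samples and take a majority vote. The function `knn_predict_class`, the feature values, and the 0/1 labels (standing in for something like a losing or winning season) are all hypothetical.

```python
import math
from collections import Counter


def knn_predict_class(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training samples."""
    # Rank training indices by Euclidean distance to the query point
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Invented, already-scaled features with made-up labels
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.95)]
train_y = [0, 0, 0, 1, 1, 1]

low = knn_predict_class(train_X, train_y, (0.2, 0.2))
high = knn_predict_class(train_X, train_y, (0.9, 0.9))
print(low, high)
```

Because the prediction depends on distances, scaling the features first matters here for the same reason it does for clustering.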

#### ML Regression prediction with scikit-learn

Create supervised regression prediction

In [0]:
from sklearn.neighbors import KNeighborsRegressor
neigh = KNeighborsRegressor(n_neighbors=2)
neigh.fit?
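
Nearest-neighbor regression follows the same neighbor search as classification but averages the neighbors' numeric targets instead of voting. A minimal plain-Python sketch, with a hypothetical function name and invented numbers:

```python
import math


def knn_predict_value(train_X, train_y, query, k=2):
    """Predict a number as the mean target of the k nearest training samples."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


# One invented feature and an invented numeric target
train_X = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]

estimate = knn_predict_value(train_X, train_y, (0.4,))
print(estimate)  # 15.0
```

The query at 0.4 is closest to the samples at 0.0 and 1.0, so the estimate is the mean of their targets.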


#### ML auto-sklearn

Automated machine learning toolkit:  [AutoML drop-in replacement for a scikit-learn estimator](http://automl.github.io/auto-sklearn/stable/)

An emerging trend is to pick the right model automatically using "AutoML".


```

!pip install auto-sklearn

```



```python
import autosklearn.classification
```





#### Using Plotly for interactive Data Visualization

* *[Read related material covered in Chapter 10 of Pragmatic AI](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/ch10.xhtml#ch10)*

* *[Watch Video Lesson 9.10:  Use Plotly for interactive data visualization](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118/9780135261118-EMLA_01_09_10)*


Cell configuration to set up Plotly.
Further documentation is available from [Google on Plotly Colab Integration](https://colab.research.google.com/notebooks/charts.ipynb#scrollTo=YVhMPxwa-wmS).



In [0]:
def configure_plotly_browser_state():
  import IPython
  display(IPython.core.display.HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-1.5.1.min.js?noext',
            },
          });
        </script>
        '''))


##### Going Further with Real Estate Exploration

In [0]:
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
from sklearn.cluster import KMeans
color = sns.color_palette()
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
%matplotlib inline

In [0]:
df = pd.read_csv("https://raw.githubusercontent.com/noahgift/real_estate_ml/master/data/Zip_Zhvi_SingleFamilyResidence.csv")

In [0]:
df.describe()

**Clean Up DataFrame**

Rename RegionName to ZipCode and change the zip code to a string



In [0]:
df.rename(columns={"RegionName":"ZipCode"}, inplace=True)
df["ZipCode"]=df["ZipCode"].map(lambda x: "{:.0f}".format(x))
df["RegionID"]=df["RegionID"].map(lambda x: "{:.0f}".format(x))
df.head()

In [0]:
# Restrict the median to numeric columns; City, State, ZipCode, etc. are strings
median_prices = df.median(numeric_only=True)

In [0]:
median_prices.tail()

In [0]:
marin_df = df[df["CountyName"] == "Marin"].median(numeric_only=True)
sf_df = df[df["City"] == "San Francisco"].median(numeric_only=True)
palo_alto = df[df["City"] == "Palo Alto"].median(numeric_only=True)
df_comparison = pd.concat([marin_df, sf_df, palo_alto, median_prices], axis=1)
df_comparison.columns = ["Marin County", "San Francisco", "Palo Alto", "Median USA"]

**Plotly visualization**

[Shortcut view of plot if slow to load](http://nbviewer.jupyter.org/github/noahgift/real_estate_ml/blob/648361ce7392a0af29ce79780e6e5159c1a378e9/notebooks/explore_zillow_data_sets.ipynb)

In [0]:
import cufflinks as cf
cf.go_offline()

from plotly.offline import init_notebook_mode
configure_plotly_browser_state()
init_notebook_mode(connected=False)


df_comparison.iplot(title="Bay Area Median Single Family Home Prices 1996-2017",
                    xTitle="Year",
                    yTitle="Sales Price",
                   #bestfit=True, bestfit_colors=["pink"],
                   #subplots=True,
                   shape=(4,1),
                    #subplot_titles=True,
                    fill=True,)

**Cluster on Size Rank and Price**

In [0]:
from sklearn.preprocessing import MinMaxScaler

In [0]:
columns_to_drop = ['RegionID', 'ZipCode', 'City', 'State', 'Metro', 'CountyName']
df_numerical = df.dropna()
df_numerical = df_numerical.drop(columns_to_drop, axis=1)

In [0]:
df_numerical.describe()

In [0]:
scaler = MinMaxScaler()
scaled_df = scaler.fit_transform(df_numerical)
kmeans = KMeans(n_clusters=3, random_state=0).fit(scaled_df)
print(len(kmeans.labels_))

In [0]:
cluster_df = df.copy(deep=True)
cluster_df.dropna(inplace=True)
cluster_df.describe()
cluster_df['cluster'] = kmeans.labels_
cluster_df['appreciation_ratio'] = round(cluster_df["2017-09"]/cluster_df["1996-04"],2)
cluster_df['CityZipCodeAppRatio'] = cluster_df['City'].map(str) + "-" + cluster_df['ZipCode'] + "-" + cluster_df["appreciation_ratio"].map(str)
cluster_df.head()

**Create a 3D Plot**

[Shortcut view of plot if slow to load](http://nbviewer.jupyter.org/github/noahgift/real_estate_ml/blob/648361ce7392a0af29ce79780e6e5159c1a378e9/notebooks/explore_zillow_data_sets.ipynb)


In [0]:
import plotly.offline as py
import plotly.graph_objs as go

from plotly.offline import init_notebook_mode
configure_plotly_browser_state()
init_notebook_mode(connected=False)

trace1 = go.Scatter3d(
    x=cluster_df["appreciation_ratio"],
    y=cluster_df["1996-04"],
    z=cluster_df["2017-09"],
    mode='markers',
    text=cluster_df["CityZipCodeAppRatio"],
    marker=dict(
        size=12,
        color=cluster_df["cluster"],                # set color to an array/list of desired values
        colorscale='Viridis',   # choose a colorscale
        opacity=0.8
    )
)
#print(trace1)
data = [trace1]
layout = go.Layout(
    showlegend=False,
    title="30 Year History USA Real Estate Prices (Clusters Colored)",
    scene = dict(
        xaxis = dict(title='X: Appreciation Ratio'),
        yaxis = dict(title="Y:  1996 Prices"),
        zaxis = dict(title="Z:  2017 Prices"),
    ),
    width=1000,
    height=900,
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='3d-scatter-colorscale')

**NBA Player Endorsements Interactive Plotly Graph**

Reference:

*   https://plot.ly/~ngift/17/






In [0]:
from plotly.offline import init_notebook_mode
configure_plotly_browser_state()
init_notebook_mode(connected=False)


import plotly.offline as py
from plotly.graph_objs import *
trace1 = {
  "x": ["LeBron James", "Kevin Durant", "James Harden", "Russell Westbrook", "Carmelo Anthony", "Dwyane Wade", "Chris Paul", "Derrick Rose", "Kyrie Irving", "Stephen Curry"], 
  "y": [55, 36, 20, 15, 8, 13, 8, 14, 13, 35], 
  "name": "Endorsements in Millions", 
  "type": "bar", 
  "uid": "df2707", 
  "xsrc": "ngift:16:53adec", 
  "ysrc": "ngift:16:0e0504"
}
trace2 = {
  "x": ["LeBron James", "Kevin Durant", "James Harden", "Russell Westbrook", "Carmelo Anthony", "Dwyane Wade", "Chris Paul", "Derrick Rose", "Kyrie Irving", "Stephen Curry"], 
  "y": [14.7, 6.29, 3.28, 4.28, 3.77, 4.67, 2.69, 3.27, 4.8, 17.57], 
  "name": "Wikipedia Pageviews", 
  "type": "bar", 
  "uid": "c9d073", 
  "xsrc": "ngift:16:53adec", 
  "ysrc": "ngift:16:fea27a"
}
trace3 = {
  "x": ["LeBron James", "Kevin Durant", "James Harden", "Russell Westbrook", "Carmelo Anthony", "Dwyane Wade", "Chris Paul", "Derrick Rose", "Kyrie Irving", "Stephen Curry"], 
  "y": [20.43, 12.24, 15.54, 17.34, 5.26, 2.52, 13.48, 1.17, 8.28, 18.8], 
  "name": "Wins Attributed to Player", 
  "type": "bar", 
  "uid": "cfe1ac", 
  "xsrc": "ngift:16:53adec", 
  "ysrc": "ngift:16:f3c87e"
}
trace4 = {
  "x": ["LeBron James", "Kevin Durant", "James Harden", "Russell Westbrook", "Carmelo Anthony", "Dwyane Wade", "Chris Paul", "Derrick Rose", "Kyrie Irving", "Stephen Curry"], 
  "y": [30.96, 26.5, 26.5, 26.5, 24.56, 23.2, 22.87, 21.32, 17.64, 12.11], 
  "name": "Salary in Millions", 
  "type": "bar", 
  "uid": "f83635", 
  "xsrc": "ngift:16:53adec", 
  "ysrc": "ngift:16:2cdf3e"
}
trace5 = {
  "x": ["LeBron James", "Kevin Durant", "James Harden", "Russell Westbrook", "Carmelo Anthony", "Dwyane Wade", "Chris Paul", "Derrick Rose", "Kyrie Irving", "Stephen Curry"], 
  "y": [5.53, 1.43, 0.97, 2.13, 0.72, 0.35, 0.83, 1.86, 1.54, 12.28], 
  "name": "Twitter Favorite Count/1000", 
  "type": "bar", 
  "uid": "9d1aad", 
  "xsrc": "ngift:16:53adec", 
  "ysrc": "ngift:16:191da9"
}
data = [trace1, trace2, trace3, trace4, trace5]  # a plain list of traces; the Data wrapper is deprecated in newer Plotly releases
layout = {
  "barmode": "group", 
  "title": "2016-2017 NBA Season Endorsement and Social Power", 
  "xaxis": {
    "autorange": True, 
    "range": [-0.5, 9.5], 
    "type": "category"
  }, 
  "yaxis": {
    "autorange": True, 
    "range": [0, 57.8947368421], 
    "type": "linear"
  }
}
fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='nba-endorsement-social-power')