<div align='center'><picture><source srcset="https://plotly-marketing-website-2.cdn.prismic.io/plotly-marketing-website-2/Z7eNlZ7c43Q3gCJw_Plotly-Logo-White.svg" type="image/svg+xml"><img src="https://plotly-marketing-website-2.cdn.prismic.io/plotly-marketing-website-2/Z7eNlZ7c43Q3gCJw_Plotly-Logo-White.svg" width="300" height="300"></picture></div>

# **Article 126 : Plotly Graph Objects** [![Static Badge](https://img.shields.io/badge/Open%20in%20Colab%20-%20orange?style=plastic&logo=googlecolab&labelColor=grey)](https://colab.research.google.com/github/sshrizvi/DataScienceMastery/blob/main/DataVisualization/Notebooks/126_plotly_graph_objects.ipynb)

|🔴 **NOTE** 🔴|
|:-----------:|
| This notebook contains the practical implementations of the concepts discussed in the following article.|
| Here is Article 126 - [Plotly Graph Objects](../Articles/126_plotly_graph_objects.md) |

### 📦 **Importing Relevant Libraries**

In [2]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

### ⚠️ **Data Warning**  
For the visualizations ahead, we will be using the IPL Matches and Deliveries Data.

In [85]:
deliveries_df = pd.read_csv('../Resources/Data/IPL_Ball_by_Ball_2008_2022.csv')

In [83]:
matches_df = pd.read_csv('../Resources/Data/IPL_Matches_2008_2022.csv')

Preparing `ipl_df` by merging `deliveries_df` and `matches_df`.

In [88]:
ipl_df = matches_df.merge(deliveries_df, on='ID')
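
To see what this merge produces, here is a minimal sketch on two tiny hypothetical frames. The values are made up; only the `ID` key mirrors the real data. Each delivery row picks up the columns of its match, so the result has one row per delivery.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (values are made up).
toy_matches = pd.DataFrame({'ID': [1, 2], 'Season': ['2021', '2022']})
toy_deliveries = pd.DataFrame({'ID': [1, 1, 2],
                               'batter': ['A', 'B', 'A'],
                               'batsman_run': [4, 1, 6]})

# Default inner merge on the shared key: one output row per delivery,
# with the match-level columns repeated on each of them.
toy_ipl = toy_matches.merge(toy_deliveries, on='ID')
print(toy_ipl)
```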

### 🎯 **Scatter Plots**

**Problem :** We are going to draw a Scatter Plot between *Batsman Average (X-Axis)* and *Batsman Strike Rate (Y-Axis)* of the *TOP 50 Batsmen* in IPL (All Time)

First, we will find the TOP 50 Batsmen by total runs and keep only their deliveries from `ipl_df`.

In [89]:
top50 = ipl_df.groupby(by='batter')['batsman_run'].sum().sort_values(ascending=False).head(50).index.to_list()
top50_ipl_df = ipl_df[ipl_df['batter'].isin(top50)]

Next, we calculate the Batsman Average with the following formula:  
$$
\text{Average} = \frac{\text{Total Runs}}{\text{Number of Outs}}
$$

In [90]:
outs = ipl_df[ipl_df['player_out'].isin(top50)]
nouts = outs['player_out'].value_counts()
runs = top50_ipl_df.groupby('batter')['batsman_run'].sum()

average = runs / nouts
average = average.reset_index(name='average').rename({'index':'batter'}, axis=1)
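
The division above relies on pandas aligning the two Series by batter name. The sketch below uses a small hypothetical frame (made-up values) to show that alignment. It also shows that a batter who was never dismissed gets `NaN` rather than a number. `rename_axis` is used here as one possible way to name the index before `reset_index`.

```python
import pandas as pd

# Hypothetical ball-by-ball rows (values are made up for illustration).
toy = pd.DataFrame({
    'batter':      ['A', 'A', 'A', 'B', 'B', 'C'],
    'batsman_run': [4, 6, 0, 1, 2, 3],
    'player_out':  [None, None, 'A', None, 'B', None],
})

runs = toy.groupby('batter')['batsman_run'].sum()   # A=10, B=3, C=3
nouts = toy['player_out'].value_counts()            # A=1, B=1 (C was never out)

# Division aligns on the index labels; C has no dismissal count, so it becomes NaN.
average = (runs / nouts).rename_axis('batter').reset_index(name='average')
print(average)
```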

Next, we calculate the Batsman Strike Rate with the following formula:  
$$
\text{Strike Rate} = \frac{\text{Runs Scored}}{\text{Number of Balls Played}} \times 100
$$

In [91]:
runs = top50_ipl_df.groupby(by='batter')['batsman_run'].sum()
nballs = top50_ipl_df.groupby(by='batter')['batsman_run'].count()

strike_rate_df = (runs / nballs) * 100
strike_rate_df = strike_rate_df.reset_index(name='strike_rate')
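
The cell above counts every delivery row as a ball played. In cricket scoring, wides are normally not counted as balls faced, so a stricter variant would drop them first. The sketch below is only an illustration on made-up rows: the `extra_type` column name and its `'wides'` label are assumptions about the dataset and should be checked against the real columns before use.

```python
import pandas as pd

# Hypothetical deliveries; the 'extra_type' column and the 'wides' label
# are assumptions about the dataset, not confirmed by this notebook.
toy = pd.DataFrame({
    'batter':      ['A', 'A', 'A', 'A', 'B', 'B'],
    'batsman_run': [4, 0, 2, 0, 1, 0],
    'extra_type':  [None, 'wides', None, None, None, 'wides'],
})

# Keep only deliveries that count as a ball faced (everything except wides).
legal = toy[toy['extra_type'] != 'wides']

runs = legal.groupby('batter')['batsman_run'].sum()
nballs = legal.groupby('batter')['batsman_run'].count()
strike_rate = (runs / nballs * 100).reset_index(name='strike_rate')
print(strike_rate)
```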

Now, we merge `average` and `strike_rate_df` into a single DataFrame.

In [203]:
avg_sr_top50_df = average.merge(strike_rate_df, on='batter')

Now, plotting the Scatter Plot.

In [93]:
scatter_trace = go.Scatter(x=avg_sr_top50_df['average'],
                           y=avg_sr_top50_df['strike_rate'],
                           text=avg_sr_top50_df['batter'],
                           mode='markers',
                           marker={'color' : '#CF4B00'})
data = [scatter_trace]
layout = go.Layout(title='Batsman Average VS Strike Rate',
                   xaxis={'title' : 'Average'},
                   yaxis={'title' : 'Strike Rate'})
fig = go.Figure(data=data, layout=layout)
fig.show()

### 🎯 **Line Chart**

**Problem :** We will be plotting a Line Chart for Performance of a Particular Batsman across several years.

In [None]:
def batsman_year_summary(batter_name, df):
    batter_df = df[df['batter'] == batter_name]
    # Sort by Season (the index), not by runs, so the line is drawn in chronological order
    performance_summary = batter_df.groupby(by='Season')['batsman_run'].sum().sort_index()
    return performance_summary.reset_index()

In [None]:
batter_performance_df = batsman_year_summary('V Kohli', ipl_df)
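
A line trace joins its points in the order the rows arrive, so the order of the summary matters. The sketch below uses made-up rows to compare the season order after a plain `groupby` with the order after sorting by run total. The season labels are assumed to be strings, which the later `Season == '2017'` filter in this notebook suggests.

```python
import pandas as pd

# Hypothetical deliveries for one batter (values are made up).
toy = pd.DataFrame({
    'Season':      ['2019', '2018', '2019', '2020/21', '2018'],
    'batsman_run': [4, 6, 1, 2, 1],
})

by_season = toy.groupby('Season')['batsman_run'].sum()  # groupby sorts the keys
by_runs = by_season.sort_values()                       # reorders rows by run total

print(by_season.index.tolist())  # seasons in chronological order
print(by_runs.index.tolist())    # seasons scrambled by run total
```

With the second ordering, the X-axis would no longer read left to right in time.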

Now, plotting the Line Chart.

In [110]:
line_trace = go.Scatter(x=batter_performance_df['Season'],
                           y=batter_performance_df['batsman_run'],
                           marker={'color' : '#473472'})
data = [line_trace]
layout = go.Layout(title='Batter Performance History',
                   xaxis={'title' : 'Season'},
                   yaxis={'title' : 'Total Runs'})
fig = go.Figure(data=data, layout=layout)
fig.show()

### 🎯 **Bar Chart**

**Problem :** Let's make a Bar Chart representing Total Runs Scored by the TOP 10 Batsmen.

In [166]:
top10_batters = ipl_df.groupby(by='batter')['batsman_run'].sum().sort_values(ascending=False).head(10).index.to_list()

In [170]:
top10_batter_df = ipl_df[ipl_df['batter'].isin(top10_batters)]
top10_runs_summary = top10_batter_df.groupby(by='batter')['batsman_run'].sum().sort_values(ascending=False).reset_index()

In [171]:
bar_trace = go.Bar(x=top10_runs_summary['batter'],
                   y=top10_runs_summary['batsman_run'],
                   marker={'color' : '#F5AD18'},
                   name='TOP10')
data = [bar_trace]
layout = go.Layout(title='Bar Chart of TOP 10 Batters',
                   xaxis={'title' : 'Batter'},
                   yaxis={'title' : 'Total Runs'})
fig = go.Figure(data=data, layout=layout)
fig.show()

#### **1. Grouped Bar Chart**

**Problem :** We want to create a bar chart of the TOP 10 Batsmen that displays Total Runs scored in Innings 1 and Innings 2.

In [180]:
innings_runs = top10_batter_df.groupby(by=['batter', 'innings'])['batsman_run'].sum().reset_index()
first_inning = innings_runs[innings_runs['innings'] == 1].rename({'batsman_run' : 'first_inning'}, axis=1)
second_inning = innings_runs[innings_runs['innings'] == 2].rename({'batsman_run' : 'second_inning'}, axis=1)
final_innings_runs = first_inning.merge(second_inning, on='batter')[['batter', 'first_inning', 'second_inning']]

To create a grouped bar chart, create one `go.Bar` trace per group and pass `barmode='group'` to `go.Layout()`.

In [195]:
inning1_trace = go.Bar(x=final_innings_runs['batter'],
                   y=final_innings_runs['first_inning'],
                   marker={'color' : '#F5AD18'},
                   name='Inning 1')
inning2_trace = go.Bar(x=final_innings_runs['batter'],
                   y=final_innings_runs['second_inning'],
                   marker={'color' : '#9E1C60'},
                   name='Inning 2')
data = [inning1_trace, inning2_trace]
layout = go.Layout(title='Grouped Bar Chart of TOP 10 Batters Innings Wise',
                   xaxis={'title' : 'Batter'},
                   yaxis={'title' : 'Total Runs'},
                   barmode='group')
fig = go.Figure(data=data, layout=layout)
fig.show()

#### **2. Stacked Bar Chart**

To create a stacked bar chart, create one `go.Bar` trace per segment, exactly as for the grouped chart, and pass `barmode='stack'` to `go.Layout()`.

In [194]:
inning1_trace = go.Bar(x=final_innings_runs['batter'],
                   y=final_innings_runs['first_inning'],
                   marker={'color' : '#F5AD18'},
                   name='Inning 1')
inning2_trace = go.Bar(x=final_innings_runs['batter'],
                   y=final_innings_runs['second_inning'],
                   marker={'color' : '#9E1C60'},
                   name='Inning 2')
data = [inning1_trace, inning2_trace]
layout = go.Layout(title='Stacked Bar Chart of TOP 10 Batters Innings Wise',
                   xaxis={'title' : 'Batter'},
                   yaxis={'title' : 'Total Runs'},
                   barmode='stack')
fig = go.Figure(data=data, layout=layout)
fig.show()

#### **3. Overlay Bar Chart**

To create an overlay bar chart, create the same `go.Bar` traces and pass `barmode='overlay'` to `go.Layout()`. The bars are drawn on top of each other, so a later trace can hide an earlier one wherever its bar is taller.

In [193]:
inning1_trace = go.Bar(x=final_innings_runs['batter'],
                   y=final_innings_runs['first_inning'],
                   marker={'color' : '#F5AD18'},
                   name='Inning 1')
inning2_trace = go.Bar(x=final_innings_runs['batter'],
                   y=final_innings_runs['second_inning'],
                   marker={'color' : '#9E1C60'},
                   name='Inning 2')
data = [inning1_trace, inning2_trace]
layout = go.Layout(title='Overlay Bar Chart of TOP 10 Batters Innings Wise',
                   xaxis={'title' : 'Batter'},
                   yaxis={'title' : 'Total Runs'},
                   barmode='overlay')
fig = go.Figure(data=data, layout=layout)
fig.show()

### 🎯 **Bubble Chart**

**Problem :** We want to create a bubble chart where the *size of the bubble represents the number of sixes* hit by a batsman. We will be using the TOP50 Batsman Dataset.

In [196]:
top50 = ipl_df.groupby(by='batter')['batsman_run'].sum().sort_values(ascending=False).head(50).index.to_list()
top50_ipl_df = ipl_df[ipl_df['batter'].isin(top50)]

In [200]:
sixes_count = top50_ipl_df[top50_ipl_df['batsman_run'] == 6].groupby(by='batter')[
    'batsman_run'].count().sort_values(ascending=False).reset_index(name='sixes_count')

We will reuse the `avg_sr_top50_df` DataFrame from the Scatter Plot section of this notebook.  
Now, let's merge it with `sixes_count` to add the number of sixes for each batter.

In [204]:
avg_sr_top50_df = avg_sr_top50_df.merge(sixes_count, on='batter')

Keep in mind that a Bubble Chart is just a Scatter Plot in which an extra variable controls the size of the markers, which makes them look like bubbles.

In [211]:
bubble_trace = go.Scatter(x=avg_sr_top50_df['average'],
                           y=avg_sr_top50_df['strike_rate'],
                           text=avg_sr_top50_df['batter'],
                           mode='markers',
                           marker={'color' : '#CF4B00',
                                   'size' : avg_sr_top50_df['sixes_count']})
data = [bubble_trace]
layout = go.Layout(title='Batsman Average VS Strike Rate',
                   xaxis={'title' : 'Average'},
                   yaxis={'title' : 'Strike Rate'},
                   height=700)
fig = go.Figure(data=data, layout=layout)
fig.show()

### 🎯 **Box Plot**

**Problem :** We want to create a Box Plot of Runs Scored in all matches of IPL.

First, we will find out runs scored in every match.

In [219]:
match_runs = ipl_df.groupby(by='ID')['total_run'].sum().sort_values(ascending=False).reset_index()

Second, we will find out the Season of the match.

In [229]:
match_summary_df = match_runs.merge(ipl_df[['ID', 'Season']], on='ID').drop_duplicates()

Now, let's plot a Box Plot of the runs scored in each IPL match.

In [233]:
box_trace = go.Box(x=match_summary_df['total_run'],
                   name='All Seasons',
                   marker={'color' : '#41A67E'})
data = [box_trace]
layout = go.Layout(title='Box Plot of Runs Scored in Matches of IPL',
                   xaxis={'title' : 'Total Runs'},
                   yaxis={'title' : 'Seasons'})
fig = go.Figure(data=data, layout=layout)
fig.show()

Now, let's plot multiple Box Plots: one for the runs scored per match in Season 2017 and one for Season 2018.

In [244]:
box_trace1 = go.Box(x=match_summary_df[match_summary_df['Season'] == '2017']['total_run'],
                   name='2017',
                   marker={'color' : '#41A67E'})
box_trace2 = go.Box(x=match_summary_df[match_summary_df['Season'] == '2018']['total_run'],
                   name='2018',
                   marker={'color' : '#1055C9'})
data = [box_trace1, box_trace2]
layout = go.Layout(title='Box Plot of Runs Scored in Matches of IPL',
                   xaxis={'title' : 'Total Runs'},
                   yaxis={'title' : 'Seasons'})
fig = go.Figure(data=data, layout=layout)
fig.show()

### 🎯 **Histograms**

**Problem :** We want to plot a Histogram to analyze the Strike Rate Distribution of the TOP 50 Batsmen.

We will use the `strike_rate` column of the `avg_sr_top50_df` DataFrame to plot this histogram.

Now, let's plot the Histogram.

In [255]:
hist_trace = go.Histogram(x=avg_sr_top50_df['strike_rate'],
                   marker={'color' : '#E5C95F'})
data = [hist_trace]
layout = go.Layout(title='Histogram of Strike Rate of Batsman',
                   xaxis={'title' : 'Strike Rate'},
                   yaxis={'title' : 'No. of Batsmen'})
fig = go.Figure(data=data, layout=layout)
fig.show()

### 🎯 **Distplot**

A distplot, as drawn by `create_distplot` with its default settings, combines a histogram, a kernel density estimate curve, and a rug plot of the same data.

In [260]:
import plotly.figure_factory as ff

hist_data = [avg_sr_top50_df['average']]
group_labels = ['Average']
colors = ['#BF124D']
fig = ff.create_distplot(hist_data=hist_data, group_labels=group_labels,
                         colors=colors)
fig.show()

Multiple Distplots: pass one series per group in `hist_data`, with a matching label and color for each.

In [262]:
import plotly.figure_factory as ff

hist_data = [avg_sr_top50_df['average'], avg_sr_top50_df['strike_rate']]
group_labels = ['Average', 'Strike Rate']
colors = ['#BF124D', '#67B2D8']
fig = ff.create_distplot(hist_data=hist_data, group_labels=group_labels,
                         colors=colors)
fig.show()

### 🎯 **Heatmaps**

**Problem :** We want to create a Heatmap representing the number of sixes hit by each IPL team in every over.

In [296]:
sixes_deliveries = deliveries_df[deliveries_df['batsman_run'] == 6]
sixes_count_df = sixes_deliveries.groupby(by=['BattingTeam', 'overs'])['batsman_run'].count().reset_index(name='sixes_count')

Now, let's plot the Heatmap using `sixes_count_df`.

In [295]:
heatmap_trace = go.Heatmap(x=sixes_count_df['BattingTeam'],
                           y=sixes_count_df['overs'],
                           z=sixes_count_df['sixes_count'],
                           text=sixes_count_df['sixes_count'],
                           colorscale='Reds')
data = [heatmap_trace]

# Manual Annotation Logic
annotations = []
for i, count in enumerate(sixes_count_df['sixes_count']):
        annotations.append(
            go.layout.Annotation(
                text=str(count),
                x=sixes_count_df['BattingTeam'][i],
                y=sixes_count_df['overs'][i],
                xref='x',
                yref='y',
                showarrow=False,
                font=dict(color='white' if count >= 80 else 'black')
            )
        )

layout = go.Layout(title='Sixes Count Heatmap',
                   xaxis={'title' : 'Batting Team'},
                   yaxis={'title' : 'Overs'},
                   height=700,
                   annotations=annotations)
fig = go.Figure(data=data, layout=layout)
fig.show()

### 🎯 **Subplots**

**Problem :** We want to plot multiple plots side by side.  
**Solution :** Of course, the solution is Subplots.  
To create Subplots in Plotly, we import the `subplots` module from `plotly` and call its `make_subplots` function.

In [307]:
from plotly import subplots

Here is our `sixes_count_df` for the first Heatmap.

In [302]:
sixes_deliveries = deliveries_df[deliveries_df['batsman_run'] == 6]
sixes_count_df = sixes_deliveries.groupby(by=['BattingTeam', 'overs'])['batsman_run'].count().reset_index(name='sixes_count')

And here is our `dots_count_df` for the second Heatmap.

In [303]:
dots_deliveries = deliveries_df[deliveries_df['batsman_run'] == 0]
dots_count_df = dots_deliveries.groupby(by=['BattingTeam', 'overs'])['batsman_run'].count().reset_index(name='dots_count')

Now, let's plot two Heatmaps side by side using Subplots.

In [328]:
sixes_heatmap_trace = go.Heatmap(x=sixes_count_df['BattingTeam'],
                                 y=sixes_count_df['overs'],
                                 z=sixes_count_df['sixes_count'],
                                 text=sixes_count_df['sixes_count'],
                                 colorscale='Reds',
                                 name='Sixes')
dots_heatmap_trace = go.Heatmap(x=dots_count_df['BattingTeam'],
                                y=dots_count_df['overs'],
                                z=dots_count_df['dots_count'],
                                text=dots_count_df['dots_count'],
                                colorscale='Greens',
                                name='Dots')
fig = subplots.make_subplots(rows=1, cols=2, subplot_titles=['Sixes', 'Dots'],
                             shared_yaxes=True,
                             shared_xaxes=True)
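# add_trace with row/col places each heatmap in its grid cell (append_trace is deprecated)
fig.add_trace(sixes_heatmap_trace, row=1, col=1)
fig.add_trace(dots_heatmap_trace, row=1, col=2)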
fig.show()