# Session on Plotly Graph Objects (GO)

In [1]:
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objects as go

## Introduction to Plotly Graph Objects (plotly.go)

Plotly Graph Objects, referred to in this session as **plotly.go** (the `plotly.graph_objects` module, conventionally imported as `go`), is the low-level, object-oriented interface of the Plotly visualization library in Python. It allows explicit construction and control of visualizations by defining individual components such as traces, layouts, axes, annotations, and interactions.

Unlike high-level interfaces, plotly.go exposes the full Plotly figure schema, making it suitable for complex, highly customized, and production-oriented visualizations.

---

## Position of plotly.go in the Plotly Ecosystem

Plotly provides multiple abstraction layers for data visualization:

* **plotly.express (px)**: High-level interface designed for simplicity and speed
* **plotly.graph_objects (go)**: Low-level interface designed for precision and control

So far, visualizations have been created using **plotly.express**, which internally generates figures composed of graph objects. Learning plotly.go means working directly with these underlying objects rather than relying on automated defaults.

---

## Comparison: Plotly Express vs Plotly Graph Objects

### Level of Abstraction

* **Plotly Express**
  A high-level API that abstracts away most configuration details, enabling rapid visualization with minimal code.

* **Plotly Graph Objects**
  A low-level API where each component of the figure is explicitly defined, offering complete control.

---

### Code Structure and Style

* **Plotly Express**
  Declarative and concise, well-suited for exploratory data analysis and quick insights.

* **Plotly Graph Objects**
  Object-oriented and more verbose, suitable for complex visualizations and application-level usage.

---

### Customization Capabilities

* **Plotly Express**
  Limited to predefined parameters. Advanced customization often requires transitioning to graph objects.

* **Plotly Graph Objects**
  Near-unlimited customization, including detailed styling, layout control, and interactivity.

---

### Typical Use Cases

* **Plotly Express**

  * Exploratory Data Analysis
  * Rapid prototyping
  * Learning and academic use

* **Plotly Graph Objects**

  * Enterprise dashboards
  * Research and publication-grade visuals
  * Visualization components in applications

---

## Relationship Between plotly.express and plotly.go

Plotly Express is built on top of Plotly Graph Objects. Every figure created using plotly.express ultimately results in a `go.Figure` object containing one or more trace objects.

This implies:

* plotly.express simplifies figure creation
* plotly.go provides direct access to the underlying figure structure

Understanding plotly.go enables developers to modify, extend, and fine-tune figures initially created using plotly.express.

---

## plotly.go and Live Data Limitations

Plotly Graph Objects **do not directly support live or continuously updating data sources**. Once rendered, the figure remains static unless explicitly re-rendered.

For use cases involving:

* Real-time data updates
* Interactive callbacks
* Dynamic user-driven behavior

a separate framework is required.

---

## Introduction to Dash and Its Dependency on plotly.go

**Dash** is a Python framework used to build interactive and real-time analytical web applications. It is built on:

* Plotly for visualizations
* Flask for backend logic
* React.js for frontend rendering

Dash components such as `dcc.Graph` take Plotly figures (`go.Figure` objects) as input, so a working knowledge of plotly.go makes it much easier to build and update figures inside callbacks. For that reason, plotly.go is covered here before moving on to Dash.

---

## Recommended Learning Progression

1. Use plotly.express for quick and efficient visualizations
2. Learn plotly.graph_objects for full control and customization
3. Advance to Dash for building interactive and live dashboards

This progression aligns well with industry practices and is particularly relevant for data science, analytics engineering, and applied research roles.

---

## Summary

* plotly.go is the core, low-level API of the Plotly visualization library
* It offers precise control over figure construction and styling
* plotly.express is a higher-level wrapper built on top of plotly.go
* plotly.go alone does not handle live data sources
* Dash extends plotly.go to enable interactive and real-time applications

Mastering plotly.go bridges the gap between exploratory visualization and production-grade analytical systems.

In [2]:
match = pd.read_csv('datasets/matches.csv')
delivery = pd.read_csv('datasets/deliveries.csv')

ipl = delivery.merge(match, left_on='match_id', right_on='id')
ipl.head()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,non_striker,bowler,is_super_over,...,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,1,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,2,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
2,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,3,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
3,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,4,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
4,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,5,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,


## A Scatter Plot Using Plotly Graph Objects (plotly.go)

A **scatter plot** is used to visualize the relationship between two **continuous variables**. In this example, Plotly Graph Objects are used to construct a scatter plot that analyzes the performance of the **top 50 batsmen of all time (by total runs)**.

* **X-axis**: Batsman Average
* **Y-axis**: Strike Rate

This type of visualization is commonly used in sports analytics to study trade-offs between consistency (average) and scoring speed (strike rate).

---

## Data Preparation

Before plotting, the required metrics—**Strike Rate** and **Average**—must be computed for the top 50 batsmen.

### Selecting the Top 50 Batsmen

```python
# Avg vs SR graph of the top 50 batsmen (in terms of total runs)

# fetching a new dataframe with the top 50 batsmen
top50 = ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(50).index
new_ipl = ipl[ipl['batsman'].isin(top50)]
```

The dataset is filtered to include only the top 50 batsmen based on total runs scored across all seasons.

---

## Calculating Strike Rate (SR)

Strike Rate is defined as:

$$
\text{Strike Rate} = \left(\frac{\text{Total Runs}}{\text{Total Balls Faced}}\right) \times 100
$$

```python
# calculating SR
# SR = [(number of runs scored) / (number of balls played)] * 100
runs = new_ipl.groupby('batsman')['batsman_runs'].sum()
balls = new_ipl.groupby('batsman')['batsman_runs'].count()

sr = (runs / balls) * 100

sr = sr.reset_index()
sr
```

This computation derives the scoring speed of each batsman.

---

## Calculating Batting Average

Batting Average is defined as:

$$
\text{Average} = \frac{\text{Total Runs}}{\text{Number of Outs}}
$$

```python
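# calculating Avg
# Avg = Total number of runs / Number of outs

# number of outs for the top 50 batsmen
out = ipl[ipl['player_dismissed'].isin(top50)]

nouts = out['player_dismissed'].value_counts()

avg = runs / nouts

# the division returns a Series with no name and an unnamed index,
# so reset_index() yields the columns 'index' and 0
avg = avg.reset_index()
avg = avg.rename(columns={'index': 'batsman', 0: 'avg'})
# after the merge, 'batsman_runs' holds the strike rate taken from `sr`
avg = avg.merge(sr, on='batsman')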
```

At this stage, the final dataframe contains:

* Batsman name
* Batting average
* Strike rate (stored in the `batsman_runs` column carried over from `sr`)

This dataframe is now ready for visualization.
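
The same two metrics can be sketched on a tiny, made-up ball-by-ball table. This is only an illustration under stated assumptions: the toy values and the column name `strike_rate` are not part of the original notebook, and the columns are named explicitly instead of relying on the default `index` and `0` labels.

```python
import pandas as pd

# Toy ball-by-ball data (made-up values, not from the IPL dataset)
balls_df = pd.DataFrame({
    'batsman': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'batsman_runs': [4, 0, 6, 2, 1, 1, 0, 4],
    'player_dismissed': [None, None, None, 'A', None, 'B', None, 'B'],
})

runs = balls_df.groupby('batsman')['batsman_runs'].sum()     # A: 12, B: 6
balls = balls_df.groupby('batsman')['batsman_runs'].count()  # A: 4,  B: 4
outs = balls_df['player_dismissed'].value_counts()           # A: 1,  B: 2

# Building the frame from named Series keeps the column names explicit
summary = pd.DataFrame({
    'avg': runs / outs,
    'strike_rate': runs / balls * 100,
}).rename_axis('batsman').reset_index()

print(summary)
```

With explicit names, the plotting code can refer to `summary['strike_rate']` directly, which avoids a strike-rate column that is still labelled `batsman_runs`.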

---

## Creating the Scatter Plot with plotly.go

```python
# plotting scatter plot

trace = go.Scatter(
    x=avg['avg'],
    y=avg['batsman_runs'],
    mode='markers',
    text=avg['batsman'],
    marker={'color': '#00a65a', 'size': 10}
)

data = [trace]

layout = go.Layout(
    title='Batsman Avg vs SR',
    xaxis={'title': 'Batsman Average'},
    yaxis={'title': 'Batsman Strike Rate'}
)

fig = go.Figure(data=data, layout=layout)

# fig.show()
pyo.plot(fig, filename='customFilename.html')
```

This code explicitly defines:

* A **Scatter trace**
* Marker styling
* Axis titles
* Figure layout

This level of control is a defining feature of Plotly Graph Objects.

---

## Displaying the Plot: fig.show() vs pyo.plot()

### Using `fig.show()`

* Displays the plot **inline within the notebook cell**
* Best suited for:

  * Exploratory Data Analysis
  * Quick experimentation
  * Jupyter Notebook environments

**Pros**

* Fast and convenient
* No external files created
* Ideal for iterative analysis

**Cons**

* Limited portability
* Output is tied to the notebook environment

---

### Using `pyo.plot(fig)`

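* Generates a standalone **HTML file** (`temp-plot.html` by default, or the name passed as `filename`)
* Saves it to the current working directory
* Automatically opens the plot in a web browser (`auto_open=True` by default)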

**Pros**

* Portable and shareable as a standalone HTML file
* Suitable for reports, demos, and deployment
* Independent of notebook environments

**Cons**

* Creates additional files
* Slightly slower for rapid iteration

---

## Summary

* Scatter plots in plotly.go are built using explicit trace and layout definitions
* The example visualizes the relationship between batting average and strike rate
* plotly.go provides fine-grained control over styling and structure
* `fig.show()` is best for quick, notebook-based analysis
* `pyo.plot()` is better suited for sharing and production-level visualization

This example highlights why plotly.go is preferred when precision, customization, and deployment-readiness are required.

In [3]:
# Avg vs SR graph of the top 50 batsmen (in terms of total runs)

# fetching a new dataframe with the top 50 batsmen
top50 = ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(50).index
new_ipl = ipl[ipl['batsman'].isin(top50)]

In [4]:
# calculating SR
# SR = [(number of runs scored) / (number of balls played)] * 100
runs = new_ipl.groupby('batsman')['batsman_runs'].sum()
balls = new_ipl.groupby('batsman')['batsman_runs'].count()

sr = (runs / balls) * 100

sr = sr.reset_index()
sr

Unnamed: 0,batsman,batsman_runs
0,AB de Villiers,145.129059
1,AC Gilchrist,133.054662
2,AJ Finch,126.299213
3,AM Rahane,117.486549
4,AT Rayudu,123.014257
5,BB McCullum,126.318203
6,BJ Hodge,121.422376
7,CH Gayle,144.194313
8,DA Miller,137.709251
9,DA Warner,138.318401


In [5]:
# calculating Avg
# Avg = Total number of runs / Number of outs

# number of outs for the top 50 batsmen
out = ipl[ipl['player_dismissed'].isin(top50)]

nouts = out['player_dismissed'].value_counts()

avg = runs / nouts

avg = avg.reset_index()
avg = avg.rename(columns={'index': 'batsman', 0: 'avg'})
avg = avg.merge(sr, on='batsman')

In [6]:
# plotting scatter plot

trace = go.Scatter(x=avg['avg'], y= avg['batsman_runs'],
                   mode='markers',
                   text=avg['batsman'],
                   marker={'color': '#00a65a', 'size': 10})
data = [trace]
layout = go.Layout(title='Batsman Avg vs SR',
                   xaxis={'title': 'Batsman Average'},
                   yaxis={'title': 'Batsman Strike Rate'})
fig = go.Figure(data=data, layout=layout)
fig.show()
# pyo.plot(fig, filename='customFilename.html')

## Line Chart

A **line chart** is an extension of a scatter plot and is primarily used to represent **time-series (temporal) data**. Similar to a scatter plot, individual data points are plotted on the graph; however, in a line chart, these points are connected using lines to emphasize the **trend or progression of a variable over time**.

Line charts are especially useful for performance analysis, trend detection, and comparative temporal studies.

---

## Single Line Chart

A single line chart is used to analyze how a variable changes over time for **one entity**. In this example, the year-by-year batting performance of a single batsman is visualized.

```python
# year by year batsman performance

single = ipl[ipl['batsman'] == 'V Kohli']
performance = single.groupby('season')['batsman_runs'].sum().reset_index()

# plot line chart

trace = go.Scatter(
    x=performance['season'],
    y=performance['batsman_runs'],
    mode='lines + markers',
    marker={'color': "#008da6"}
)

data = [trace]

layout = go.Layout(
    title='Year by Year Performance',
    xaxis={'title': 'Season'},
    yaxis={'title': 'Total Runs'}
)

fig = go.Figure(data=data, layout=layout)

fig.show()
# pyo.plot(fig, filename='singleComparison.html')
```

This visualization highlights how the batsman’s run-scoring performance evolves across different seasons.

---

## Multiple Line Charts in a Single Plot (Static)

Multiple line charts can be plotted on the same graph to **compare trends across multiple entities**. In this static approach, each player is manually defined and plotted.

```python
# multiple line chart

# p1 df
single = ipl[ipl['batsman'] == 'V Kohli']
performance = single.groupby('season')['batsman_runs'].sum().reset_index()

# p2 df
single1 = ipl[ipl['batsman'] == 'RG Sharma']
performance1 = single1.groupby('season')['batsman_runs'].sum().reset_index()

# plot line chart

# trace p1
trace = go.Scatter(
    x=performance['season'],
    y=performance['batsman_runs'],
    mode='lines + markers',
    marker={'color': "#008da6"},
    name='V Kohli'
)

# trace p2
trace1 = go.Scatter(
    x=performance1['season'],
    y=performance1['batsman_runs'],
    mode='lines + markers',
    name='RG Sharma'
)

data = [trace, trace1]

layout = go.Layout(
    title='Year by Year Performance',
    xaxis={'title': 'Season'},
    yaxis={'title': 'Total Runs'}
)

fig = go.Figure(data=data, layout=layout)

fig.show()
# pyo.plot(fig, filename='singleComparison.html')
```

This approach is effective when the number of entities to be compared is small and known in advance.

---

## Multiple Line Charts (Dynamic / Variable)

For scalable comparisons, a **dynamic approach** can be used where line charts are generated automatically based on user input. This method is more flexible and suitable for analytical tools and dashboards.

```python
# automatically plotting line charts for multiple players

def batsman_comp(*players):
    data = []
    for name in players:
        single = ipl[ipl['batsman'] == name]
        performance = single.groupby('season')['batsman_runs'].sum().reset_index()

        trace = go.Scatter(
            x=performance['season'],
            y=performance['batsman_runs'],
            mode='lines + markers',
            name=name
        )
        
        data.append(trace)

    layout = go.Layout(
        title="Batsman Record Comparator",
        xaxis={'title': 'Season'},
        yaxis={'title': 'Runs'}
    )
    
    fig = go.Figure(data, layout)
    fig.show()
#   pyo.plot(fig)

batsman_comp('V Kohli', 'RG Sharma', 'MS Dhoni')
```

This approach enables:

* Reusability of code
* Easy comparison of any number of players
* Better scalability for analytical applications

---

## Summary

* Line charts are an extension of scatter plots designed to show trends over time
* Single line charts analyze temporal performance for one entity
* Multiple line charts enable comparison across entities
* Static multiple line charts are suitable for fixed comparisons
* Dynamic line charts provide flexibility and scalability

Line charts constructed using plotly.go offer precise control over traces and layouts, making them well-suited for analytical, research, and production-grade visualizations.

In [7]:
# year by year batsman performance

single = ipl[ipl['batsman'] == 'V Kohli']
performance = single.groupby('season')['batsman_runs'].sum().reset_index()
performance

Unnamed: 0,season,batsman_runs
0,2008,165
1,2009,246
2,2010,307
3,2011,557
4,2012,364
5,2013,639
6,2014,359
7,2015,505
8,2016,973
9,2017,308


In [8]:
# plot line chart

trace = go.Scatter(x=performance['season'], y=performance['batsman_runs'],
                   mode='lines + markers',
                   marker={'color': "#008da6"})

data = [trace]

layout = go.Layout(title='Year by Year Performance',
                   xaxis={'title': 'Season'},
                   yaxis={'title': 'Total Runs'})

fig = go.Figure(data=data, layout=layout)

fig.show()
# pyo.plot(fig, filename='singleComparison.html')

In [9]:
# multiple line chart

# p1 df
single = ipl[ipl['batsman'] == 'V Kohli']
performance = single.groupby('season')['batsman_runs'].sum().reset_index()

# p2 df
single1 = ipl[ipl['batsman'] == 'RG Sharma']
performance1 = single1.groupby('season')['batsman_runs'].sum().reset_index()

# plot line chart

# trace p1
trace = go.Scatter(x=performance['season'], y=performance['batsman_runs'],
                   mode='lines + markers',
                   marker={'color': "#008da6"},
                   name='V Kohli')

# trace p2
trace1 = go.Scatter(x=performance1['season'], y=performance1['batsman_runs'],
                   mode='lines + markers',
                   name='RG Sharma')

data = [trace, trace1]

layout = go.Layout(title='Year by Year Performance',
                   xaxis={'title': 'Season'},
                   yaxis={'title': 'Total Runs'})

fig = go.Figure(data=data, layout=layout)

fig.show()
# pyo.plot(fig, filename='singleComparison.html')

In [10]:
# automatically plotting line charts for multiple players

def batsman_comp(*players):
  data = []
  for name in players:
    single = ipl[ipl['batsman'] == name]
    performance = single.groupby('season')['batsman_runs'].sum().reset_index()

    trace = go.Scatter(x=performance['season'], y=performance['batsman_runs'],
                       mode='lines + markers', name=name)
    
    data.append(trace)

  layout = go.Layout(title="Batsman Record Comparator",
                     xaxis={'title': 'Season'},
                     yaxis={'title': 'Runs'})
  
  fig = go.Figure(data, layout)
  fig.show()
#   pyo.plot(fig)

batsman_comp('V Kohli', 'RG Sharma', 'MS Dhoni')

## Bar Plot

A **bar plot** is used to represent the relationship between **one categorical variable** and **one numerical variable**. Each category is represented by a rectangular bar whose height (or length) corresponds to the numerical value.

Bar plots are commonly used for:

* Category-wise comparisons
* Ranking entities
* Aggregated summaries

---

## Simple Bar Plot

In this example, a bar plot is created to visualize the **top 10 IPL batsmen based on total runs scored**.

```python
top10 = ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(10).index
top10_df = ipl[ipl['batsman'].isin(top10)]

top10_score = top10_df.groupby('batsman')['batsman_runs'].sum().reset_index()

# plot bar graph
trace = go.Bar(
    x=top10_score['batsman'],
    y=top10_score['batsman_runs'],
    marker={'color': '#008da6'}
)

data = [trace]

layout = go.Layout(
    title='Top 10 IPL Batsman',
    xaxis={'title': 'Batsman'},
    yaxis={'title': 'Total Runs'}
)

fig = go.Figure(data=data, layout=layout)

fig.show()
# pyo.plot(fig)
```

This bar plot enables quick comparison of total runs scored by the top-performing batsmen.

---

## Types of Bar Graphs

In Plotly Graph Objects, bar graphs can be broadly categorized into three types:

1. **Nested (Grouped) Bar Graph**
2. **Stacked Bar Graph**
3. **Overlayed Bar Graph**

Each type serves a different analytical purpose depending on how multiple numerical variables are to be compared across categories.

---

## Summary

* Bar plots represent categorical vs numerical relationships
* Simple bar plots are ideal for ranking and direct comparison
* Plotly supports nested, stacked, and overlayed bar graphs
* Overlayed bar graphs can suffer from data hiding and visual ambiguity
* For accurate multi-variable comparison, stacked or grouped bars are often preferred

Bar plots in plotly.go offer explicit control over structure and layout, making them suitable for analytical and presentation-grade visualizations when used appropriately.

In [11]:
top10 = ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(10).index
top10_df = ipl[ipl['batsman'].isin(top10)]

top10_score = top10_df.groupby('batsman')['batsman_runs'].sum().reset_index()
top10_score

Unnamed: 0,batsman,batsman_runs
0,AB de Villiers,3486
1,CH Gayle,3651
2,DA Warner,4014
3,G Gambhir,4132
4,MS Dhoni,3560
5,RG Sharma,4207
6,RV Uthappa,3778
7,S Dhawan,3561
8,SK Raina,4548
9,V Kohli,4423


In [12]:
# plot bar graph
trace = go.Bar(x=top10_score['batsman'], y=top10_score['batsman_runs'],
               marker={'color': '#008da6'})

data = [trace]

layout = go.Layout(title='Top 10 IPL Batsman',
                   xaxis={'title': 'Batsman'},
                   yaxis={'title': 'Total Runs'})

fig = go.Figure(data=data, layout=layout)

fig.show()
# pyo.plot(fig)

## Overlayed Bar Graph

An **overlayed bar graph** plots multiple bars for the same category **on top of each other** using transparency or color distinction. In Plotly, this is achieved using:

```python
barmode='overlay'
```

### Key Limitation

A major drawback of overlayed bar graphs is **occlusion**: when the bar drawn in front is taller than the bar behind it, the bar at the back is partially or completely hidden.

This makes overlayed bar graphs **less reliable for precise comparison** and is considered a significant flaw of this visualization type.

---

## Data Preparation for Overlayed Bar Graph

The following steps compute **inning-wise runs** for the top 10 batsmen.

```python
iw = top10_df.groupby(['batsman', 'inning'])['batsman_runs'].sum().reset_index()

mask = iw['inning'] == 1
mask2 = iw['inning'] == 2

one = iw[mask]
two = iw[mask2]

one = one.rename(columns={'batsman_runs': '1st Inning'})
two = two.rename(columns={'batsman_runs': '2nd Inning'})

final = one.merge(two, on='batsman')[['batsman', '1st Inning', '2nd Inning']]
final
```

The resulting dataset contains inning-wise run totals for each batsman and is suitable for comparative visualization.
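
The same wide table can also be sketched with `unstack` instead of two masks and a merge. This is an alternative under stated assumptions, shown on toy data; unlike the inner merge, it keeps a batsman who has runs in only one inning (the missing value becomes `NaN`), and any inning values other than 1 and 2 would appear as extra columns.

```python
import pandas as pd

# Toy ball-level rows (made-up values)
toy = pd.DataFrame({
    'batsman': ['P1', 'P1', 'P1', 'P2', 'P2'],
    'inning': [1, 2, 1, 1, 2],
    'batsman_runs': [4, 6, 1, 2, 3],
})

final = (toy.groupby(['batsman', 'inning'])['batsman_runs'].sum()
            .unstack('inning')                      # one column per inning
            .rename(columns={1: '1st Inning', 2: '2nd Inning'})
            .rename_axis(columns=None)              # drop the 'inning' label on the columns
            .reset_index())
print(final)
```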

---

## Creating an Overlayed Bar Graph with plotly.go

```python
trace1 = go.Bar(
    x=final['batsman'],
    y=final['1st Inning'],
    name='1st Inning',
    marker={'color': '#00a65a'}
)

trace2 = go.Bar(
    x=final['batsman'],
    y=final['2nd Inning'],
    name='2nd Inning',
    marker={'color': '#a6a65a'}
)

data = [trace1, trace2]

layout = go.Layout(
    title="Inning Wise Scores",
    xaxis={'title': 'Batsman'},
    yaxis={'title': 'Runs'},
    barmode='overlay'
)

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)
```

## Stacked Bar Graph

A **stacked bar graph** is used to represent **both the total value and its internal composition** across categories. Multiple numerical variables are stacked vertically on top of each other for each category.

In Plotly Graph Objects, stacking is enabled using:

```python
barmode='stack'
```

```python
trace1 = go.Bar(
    x=final['batsman'],
    y=final['1st Inning'],
    name='1st Inning',
    marker={'color': '#00a65a'}
)

trace2 = go.Bar(
    x=final['batsman'],
    y=final['2nd Inning'],
    name='2nd Inning',
    marker={'color': '#a6a65a'}
)

data = [trace1, trace2]

layout = go.Layout(
    title="Inning Wise Scores",
    xaxis={'title': 'Batsman'},
    yaxis={'title': 'Runs'},
    barmode='stack'
)

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)
```

### How a Stacked Bar Graph Works

* Bars representing different variables are placed **on top of each other**
* The total height of the bar equals the **sum of all components**
* Individual segments show the **contribution of each variable** to the total

### Suitable Use Cases

* Analyzing **part-to-whole relationships**
* Understanding **total output along with its breakdown**
* Comparing cumulative values across categories

---

## Nested Bar Graph

A **nested bar graph** (also known as a grouped bar graph) places multiple bars **side by side** for each category, allowing direct comparison between variables.

In Plotly Graph Objects, nested bars are created by default when multiple bar traces are used without specifying `barmode`, because the default value is `barmode='group'`.

```python
trace1 = go.Bar(
    x=final['batsman'],
    y=final['1st Inning'],
    name='1st Inning',
    marker={'color': '#00a65a'}
)

trace2 = go.Bar(
    x=final['batsman'],
    y=final['2nd Inning'],
    name='2nd Inning',
    marker={'color': '#a6a65a'}
)

data = [trace1, trace2]

layout = go.Layout(
    title="Inning Wise Scores",
    xaxis={'title': 'Batsman'},
    yaxis={'title': 'Runs'}
)

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)
```

### How a Nested Bar Graph Works

* Bars for different variables are placed **adjacent to each other**
* Each bar starts from the same baseline
* Heights of bars can be directly compared

### Suitable Use Cases

* Comparing **multiple metrics side by side**
* Highlighting **differences between variables**
* Scenarios where precise comparison is more important than cumulative totals

---

## Key Differences: Stacked vs Nested Bar Graphs

| Aspect              | Stacked Bar Graph             | Nested Bar Graph       |
| ------------------- | ----------------------------- | ---------------------- |
| Bar Placement       | Vertically stacked            | Side by side           |
| Emphasis            | Total + composition           | Direct comparison      |
| Baseline            | Varies for each segment       | Common baseline        |
| Total Visibility    | Clearly visible               | Not emphasized         |
| Comparison Accuracy | Lower for individual segments | High for each variable |

---

## When to Use Which

* Use a **stacked bar graph** when:

  * The focus is on **overall totals**
  * You want to show **how components contribute** to a whole

* Use a **nested bar graph** when:

  * Direct comparison between variables is required
  * Accurate visual comparison is critical

Understanding this distinction ensures that bar graphs are chosen appropriately for analytical clarity and effective communication.


In [13]:
iw = top10_df.groupby(['batsman', 'inning'])['batsman_runs'].sum().reset_index()
mask = iw['inning'] == 1
mask2 = iw['inning'] == 2
one = iw[mask]
two = iw[mask2]

one = one.rename(columns={'batsman_runs' : '1st Inning'})
two = two.rename(columns={'batsman_runs' : '2nd Inning'})

final = one.merge(two, on='batsman')[['batsman', '1st Inning', '2nd Inning']]
final

Unnamed: 0,batsman,1st Inning,2nd Inning
0,AB de Villiers,2128,1345
1,CH Gayle,2003,1623
2,DA Warner,2118,1896
3,G Gambhir,1699,2433
4,MS Dhoni,2232,1328
5,RG Sharma,2344,1863
6,RV Uthappa,1516,2262
7,S Dhawan,2262,1299
8,SK Raina,2647,1893
9,V Kohli,2391,2027


In [14]:
trace1 = go.Bar(x=final['batsman'], y=final['1st Inning'],
                name='1st Inning',
                marker={'color': '#00a65a'})

trace2 = go.Bar(x=final['batsman'], y=final['2nd Inning'],
                name='2nd Inning',
                marker={'color': '#a6a65a'})

data = [trace1, trace2]

layout = go.Layout(title="Inning Wise Scores",
                   xaxis={'title': 'Batsman'},
                   yaxis={'title': 'Runs'},
                   barmode='overlay')

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)

In [15]:
trace1 = go.Bar(x=final['batsman'], y=final['1st Inning'],
                name='1st Inning',
                marker={'color': '#00a65a'})

trace2 = go.Bar(x=final['batsman'], y=final['2nd Inning'],
                name='2nd Inning',
                marker={'color': '#a6a65a'})

data = [trace1, trace2]

layout = go.Layout(title="Inning Wise Scores",
                   xaxis={'title': 'Batsman'},
                   yaxis={'title': 'Runs'},
                   barmode='stack')

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)

In [16]:
trace1 = go.Bar(x=final['batsman'], y=final['1st Inning'],
                name='1st Inning',
                marker={'color': '#00a65a'})

trace2 = go.Bar(x=final['batsman'], y=final['2nd Inning'],
                name='2nd Inning',
                marker={'color': '#a6a65a'})

data = [trace1, trace2]

layout = go.Layout(title="Inning Wise Scores",
                   xaxis={'title': 'Batsman'},
                   yaxis={'title': 'Runs'})

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)

## Bubble Plot

A **bubble plot** is an extension of the **scatter plot**. Similar to a scatter plot, data points are positioned using two continuous variables on the **x-axis** and **y-axis**. The key distinction is that a bubble plot incorporates **marker size** to represent an additional numerical variable, thereby adding another visual dimension to the plot.

Markers belonging to the same group can share the same color (as in a standard scatter plot), while the **size of each marker varies based on a third factor**.

### Dimensional Representation in a Bubble Plot

In total, a bubble plot can visually encode **four different variables**:

1. **X-axis** → First numerical variable
2. **Y-axis** → Second numerical variable
3. **Marker (bubble) size** → Third numerical variable
4. **Marker color** → Group or categorical variable

This makes bubble plots particularly useful when analyzing **multi-dimensional relationships** in a compact visual form.

---

## Example: Bubble Plot Using plotly.go

In the following example:

* **X-axis** represents batting average
* **Y-axis** represents strike rate
* **Bubble size** represents the number of sixes hit
* **Hover text** displays detailed contextual information for each batsman

### Data Preparation

```python
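# filter into a new frame so that new_ipl stays intact for the earlier SR/Avg steps
sixes_df = new_ipl[new_ipl['batsman_runs'] == 6]

six = sixes_df.groupby('batsman')['batsman_runs'].count().reset_index()

# both frames have a 'batsman_runs' column, so the merge adds suffixes:
# batsman_runs_x = strike rate, batsman_runs_y = number of sixes
x = avg.merge(six, on='batsman')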
```

Here:

* Only six-run deliveries are selected
* The total number of sixes per batsman is calculated
* The result is merged with the average and strike-rate dataset

---

## Creating the Bubble Plot

```python
trace = go.Scatter(
    x=x['avg'],
    y=x['batsman_runs_x'],
    mode='markers',
    marker={'size': x['batsman_runs_y']},
    text=x['batsman'],
    hovertemplate=
        'Batsman: %{text}<br>' +
        'Average: %{x}<br>' +
        'Strike Rate: %{y}<br>' +
        'Sixes: %{marker.size}<extra></extra>'
)

data = [trace]

layout = go.Layout(
    title='Bubble Chart',
    xaxis={'title': 'Average'},
    yaxis={'title': 'SR'}
)

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)
```

---

## Interpretation and Use Cases

* Larger bubbles indicate batsmen who hit more sixes
* Position on the plot shows the trade-off between average and strike rate
* Hover information enhances interpretability without cluttering the chart

---

## Summary

* Bubble plots extend scatter plots by introducing variable marker sizes
* They allow visualization of up to four dimensions of data simultaneously
* Marker size encodes a numerical variable, while color can encode grouping
* Bubble plots are effective for exploratory analysis of multi-dimensional datasets

When used judiciously, bubble plots provide a powerful way to uncover patterns that are not visible in two-dimensional visualizations.

In [17]:
# filter into a new frame so that new_ipl stays intact for the earlier SR/Avg cells
sixes_df = new_ipl[new_ipl['batsman_runs'] == 6]

six = sixes_df.groupby('batsman')['batsman_runs'].count().reset_index()

x = avg.merge(six, on='batsman')

trace = go.Scatter(x=x['avg'], y=x['batsman_runs_x'], mode='markers',
                   marker={'size': x['batsman_runs_y']},
                   text=x['batsman'],
                   hovertemplate=
                     'Batsman: %{text}<br>' + 'Average: %{x}<br>' +
                     'Strike Rate: %{y}<br>' + 'Sixes: %{marker.size}<extra></extra>'
                   )

data = [trace]

layout = go.Layout(title='Bubble Chart',
                   xaxis={'title': 'Average'},
                   yaxis={'title': 'SR'})

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)

## Box Plot

A **box and whisker plot**, commonly referred to as a **box plot**, is a statistical visualization used to display the **distribution of a numerical dataset** through its **five-number summary**. It provides a concise way to understand central tendency, spread, variability, and the presence of outliers.

Box plots are especially useful for:

* Comparing distributions
* Identifying skewness
* Detecting outliers
* Summarizing large datasets

---

## The Five-Number Summary

A box plot is constructed using the following five statistical values:

1. **Minimum**
2. **First Quartile (Q1)**
3. **Median (Q2)**
4. **Third Quartile (Q3)**
5. **Maximum**

These values divide the dataset into **four quartiles**, each representing 25% of the data.
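
As a quick illustration, the five values can be computed with NumPy. The sample below reuses the nine match totals visible in the `season_wise` output of this notebook. NumPy's default interpolation is an assumption here, and Plotly computes quartiles with its own method, so a box's hover values may differ slightly:

```python
import numpy as np

# Nine match totals visible in the season_wise output of this notebook
scores = np.array([379, 371, 367, 327, 299, 277, 317, 302, 325])

# The 0th, 25th, 50th, 75th and 100th percentiles form the five-number summary
minimum, q1, median, q3, maximum = np.percentile(scores, [0, 25, 50, 75, 100])

print(minimum, q1, median, q3, maximum)
```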

---

## Quartiles Explained

A dataset is divided into four equal parts after sorting in ascending order:

* **First Quartile (Q1)**
  The value below which 25% of the data falls.

* **Second Quartile (Q2 / Median)**
  The middle value of the dataset.
  50% of the data lies below and 50% lies above this value.

* **Third Quartile (Q3)**
  The value below which 75% of the data falls.

* **Fourth Quartile (Q4)**
  Extends from Q3 to the maximum value and contains the upper 25% of the data.

---

## Interquartile Range (IQR)

The **Interquartile Range (IQR)** measures the spread of the middle 50% of the data and is calculated as:

$$
\text{IQR} = Q_3 - Q_1
$$

IQR is a robust measure of variability because it depends only on the middle 50% of the data, so extreme values have little effect on it.
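
A small sketch of that robustness, comparing the IQR with the full range before and after one extreme value is appended. The value `900` is hypothetical, and `iqr` is a helper defined here, not a library function:

```python
import numpy as np

base = np.array([277, 299, 302, 317, 325, 327, 367, 371, 379])
with_extreme = np.append(base, 900)   # 900 is a hypothetical extreme total

def iqr(values):
    """Interquartile range: Q3 minus Q1 (NumPy's default interpolation)."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

def full_range(values):
    """Maximum minus minimum, which is sensitive to any single extreme."""
    return values.max() - values.min()

print(iqr(base), iqr(with_extreme))                # barely changes
print(full_range(base), full_range(with_extreme))  # jumps sharply
```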

---

## Components of a Box Plot

### The Box

* Extends from **Q1 to Q3**
* Represents the **middle 50%** of the data
* The length of the box indicates the variability of this middle range

### Median Line

* A line inside the box at **Q2**
* Shows the central tendency of the dataset

### Whiskers

* Lines extending from the box to the **minimum and maximum non-outlier values**
* Indicate the overall spread of the data excluding outliers

### Outliers

* Data points lying outside the whiskers
* Typically defined as values:

  * Less than $Q_1 - 1.5 \times \text{IQR}$
  * Greater than $Q_3 + 1.5 \times \text{IQR}$

Outliers represent unusually high or low values and may indicate anomalies or special cases.
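
The 1.5 × IQR rule can be sketched directly with NumPy. The data reuses match totals from this notebook plus one hypothetical extreme (`900`). The whisker ends are taken as the most extreme values still inside the fences, which matches the description above but may differ in detail from Plotly's own quartile method:

```python
import numpy as np

values = np.array([277, 299, 302, 317, 325, 327, 367, 371, 379, 900])  # 900 is hypothetical

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences: anything beyond them is flagged as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

is_outlier = (values < lower_fence) | (values > upper_fence)
outliers = values[is_outlier]

# Whiskers stop at the most extreme non-outlier values
inside = values[~is_outlier]
whisker_low, whisker_high = inside.min(), inside.max()

print(outliers.tolist(), whisker_low, whisker_high)
```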

---

## How to Interpret a Box Plot

* **Median position** indicates skewness

  * Median closer to Q1 → right-skewed distribution
  * Median closer to Q3 → left-skewed distribution

* **Box length** reflects variability

  * Longer box → higher variability
  * Shorter box → lower variability

* **Whisker length** shows spread beyond the central range

* **Outliers** highlight extreme observations that may require further investigation
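
The median-position rule can be turned into a rough check. The sketch below is a heuristic, not a formal skewness test; `box_skew` is a hypothetical helper and the sample values are invented:

```python
import numpy as np

def box_skew(values):
    """Classify skew from where the median sits inside the box."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    lower_half = median - q1   # width of the box below the median
    upper_half = q3 - median   # width of the box above the median
    if upper_half > lower_half:
        return 'right-skewed'  # median sits closer to Q1
    if lower_half > upper_half:
        return 'left-skewed'   # median sits closer to Q3
    return 'roughly symmetric'

long_right_tail = np.array([1, 1, 2, 2, 3, 4, 6, 9, 14, 30])  # invented values

print(box_skew(long_right_tail))     # right-skewed
print(box_skew(-long_right_tail))    # mirrored data: left-skewed
print(box_skew(np.arange(1, 10)))    # evenly spaced: roughly symmetric
```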

---

## Single Box Plot Example

The following example analyzes the distribution of **total match scores across all seasons**.

### Data Preparation

```python
match_agg = delivery.groupby(['match_id'])['total_runs'].sum().reset_index()
season_wise = match_agg.merge(
    match,
    left_on='match_id',
    right_on='id'
)[['match_id', 'total_runs', 'season']]
```

### Box Plot Construction

```python
trace = go.Box(
    x=season_wise['total_runs'],
    name='All Seasons',
    marker={'color': '#008da6'}
)

data = [trace]

layout = go.Layout(
    title='Total Score Analysis',
    xaxis={'title': 'Total Score'}
)

fig = go.Figure(data, layout)

fig.show()
```

This plot summarizes the overall distribution of match scores and highlights variability and outliers.

---

## Multiple Box Plots

Multiple box plots can be constructed **side by side** to compare distributions across different categories or time periods.

```python
# make multiple box plots

trace1 = go.Box(
    x=season_wise[season_wise['season'] == 2017]['total_runs'],
    name='2017',
    marker={'color': '#008da6'}
)

trace2 = go.Box(
    # filter to the 2008 season so the data matches the trace name
    x=season_wise[season_wise['season'] == 2008]['total_runs'],
    name='2008'
)

data = [trace1, trace2]

layout = go.Layout(
    title='Total Score Analysis',
    xaxis={'title': 'Total Score'}
)

fig = go.Figure(data, layout)

fig.show()
# pyo.plot(fig)
```

---

## When to Use Box Plots

* To compare distributions across multiple categories
* To detect outliers and data anomalies
* To understand spread and skewness
* To summarize large datasets efficiently

---

## Summary

* Box plots visualize data using the five-number summary
* They divide data into four quartiles using Q1, Q2, and Q3
* IQR captures the variability of the middle 50% of data
* Whiskers show the non-outlier range
* Outliers highlight extreme observations
* Multiple box plots enable comparative distribution analysis

Box plots in plotly.go provide a statistically rigorous and visually compact way to analyze and compare data distributions.

In [18]:
match_agg = delivery.groupby(['match_id'])['total_runs'].sum().reset_index()
season_wise = match_agg.merge(match, left_on='match_id', right_on='id')[['match_id', 'total_runs', 'season']]
season_wise

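| | match_id | total_runs | season |
|---|---|---|---|
| 0 | 1 | 379 | 2017 |
| 1 | 2 | 371 | 2017 |
| 2 | 3 | 367 | 2017 |
| 3 | 4 | 327 | 2017 |
| 4 | 5 | 299 | 2017 |
| ... | ... | ... | ... |
| 631 | 632 | 277 | 2016 |
| 632 | 633 | 317 | 2016 |
| 633 | 634 | 302 | 2016 |
| 634 | 635 | 325 | 2016 |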


In [None]:
# plot box plot
trace = go.Box(x=season_wise['total_runs'], name='All Seasons', marker={'color': '#008da6'})

data = [trace]

layout = go.Layout(title='Total Score Analysis',
                   xaxis={'title': 'Total Score'})

fig = go.Figure(data, layout)

fig.show()

In [22]:
# make multiple box plots

trace1 = go.Box(x=season_wise[season_wise['season'] == 2017]['total_runs'], name='2017', marker={'color': '#008da6'})

trace2 = go.Box(x=season_wise[season_wise['season'] == 2008]['total_runs'], name='2008')

data = [trace1, trace2]

layout = go.Layout(title='Total Score Analysis',
                   xaxis={'title': 'Total Score'})

fig = go.Figure(data, layout)

fig.show()
# pyo.plot(fig)

## Distplot (Distribution Plot)

A **distribution plot (distplot)** is used to visualize the distribution of a **single continuous variable**. In Plotly, a distplot is a **composite visualization** that combines three different plots in one figure: the histogram and the density curve share one set of axes, and the rug plot sits in a narrow panel directly beneath them. Together they provide both statistical and visual insight into the data distribution.

The three components of a distplot are:

1. **Histogram**
2. **Kernel Density Estimate (KDE) Plot**
3. **Rug Plot**

Each component contributes a different perspective on how the data is distributed.

---

## Components of a Distplot

### Histogram

* Displays the **frequency distribution** of the data
* The x-axis is divided into bins (intervals)
* The height of each bar represents the number of observations within that interval
* Provides a coarse but intuitive view of the data shape

---

### Kernel Density Estimate (KDE) Plot

* A smooth, continuous curve that represents the **probability density function** of the data
* Eliminates the discontinuities introduced by histogram bins
* Helps identify:

  * Distribution shape
  * Peaks (modes)
  * Skewness

The KDE plot is particularly useful for comparing distributions visually.
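
To make the idea concrete, a KDE can be sketched by hand: place one Gaussian bump on each observation and average them. The sample reuses strike rates (rounded) from the `sr` table in this notebook. The fixed `bandwidth` is an assumption chosen for illustration, whereas libraries normally select it automatically:

```python
import numpy as np

# Strike rates rounded from the sr table shown in this notebook
samples = np.array([89.8, 99.5, 104.9, 105.2, 111.9, 113.9, 124.7, 138.9, 142.9])
bandwidth = 10.0   # assumed smoothing width; larger values give a smoother curve

# Evaluate the density on a grid that extends well past the data
grid = np.linspace(samples.min() - 4 * bandwidth, samples.max() + 4 * bandwidth, 500)

# One Gaussian kernel per observation, averaged over all observations
z = (grid[:, None] - samples[None, :]) / bandwidth
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

# Sanity check: a probability density should integrate to roughly 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Unlike a histogram, the curve does not depend on where bin edges fall, only on the bandwidth.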

---

### Rug Plot

* A rug plot places **small tick marks** (usually along the x-axis) at the position of each data point
* Each tick represents a single observation

#### Why Rug Plots Are Useful

* Reveal the **exact location of individual data points**
* Help identify **clustering** and **gaps** in the data
* Provide transparency about data density without aggregating values

Rug plots are especially helpful when the dataset is small to medium in size, as they expose the raw data underlying the histogram and KDE curve.

---

## Distplot Example Using Plotly

```python
# plot distplot

import plotly.figure_factory as ff

hist_data = [avg['avg'], avg['batsman_runs']]
group_labels = ['Average', 'Strike Rate']

fig = ff.create_distplot(hist_data, group_labels, bin_size=[10, 20])

fig.show()
# pyo.plot(fig)
```

In this example:

* Multiple distributions are plotted together
* Each variable has its own histogram, KDE curve, and rug plot
* Bin sizes are customized separately for better representation

---

## Histogram

A **histogram** is a statistical plot used to discover and visualize the **underlying frequency distribution (shape)** of a set of continuous data.

---

## How a Histogram Works

1. The range of the data is divided into **bins (intervals)**
2. Each observation is assigned to a bin
3. The height of each bar represents the **count of observations** within that bin

Histograms help answer questions such as:

* Where is the data concentrated?
* Is the distribution symmetric or skewed?
* Are there multiple peaks?
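
The three steps can be sketched in plain Python. The values are strike rates rounded from the `sr` table in this notebook; the bin start and width are assumptions chosen to give four bins:

```python
# Strike rates rounded from the sr table shown in this notebook
strike_rates = [89.8, 99.5, 104.9, 105.2, 111.9, 113.9, 124.7, 138.9, 142.9]

# Step 1: divide the range into bins (assumed: start at 80, width 20, four bins)
start, width, n_bins = 80, 20, 4
counts = [0] * n_bins

# Step 2: assign each observation to a bin by integer division
for value in strike_rates:
    bin_index = int((value - start) // width)
    counts[bin_index] += 1

# Step 3: the bar heights are the per-bin counts
for i, count in enumerate(counts):
    low = start + i * width
    print(f'[{low}, {low + width}): {count}')
```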

---

## Histogram Example Using plotly.go

```python
# plot histogram

trace = go.Histogram(
    x=sr['batsman_runs'],
    name='Strike Rate Variations',
    xbins={'size': 2}
)

data = [trace]

layout = go.Layout(
    title='Strike Rate Analysis',
    xaxis={'title': 'Strike Rates'}
)

fig = go.Figure(data, layout)

fig.show()
# pyo.plot(fig)
```

---

## Bin Size Customization Using `xbins`

The `xbins` parameter controls how the data is divided along the x-axis.

In this example:

* A bin size of **2** is used
* Smaller bin sizes increase **granularity** and reveal finer detail, but they also make the histogram noisier
* Larger bin sizes reduce noise but may hide details

To further control the histogram range, additional arguments can be provided:

```python
xbins={'size': 2, 'start': 50, 'end': 100}
```

### `xbins` Parameters

* **size**: Width of each bin
* **start**: Lower bound of the histogram
* **end**: Upper bound of the histogram

This allows precise control over readability and focus on specific value ranges.
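
The effect of bin size can be checked numerically. The sketch below mimics `start`, `end`, and `size` with NumPy bin edges rather than calling Plotly; `bin_counts` is a hypothetical helper, and Plotly's own edge handling may differ in detail:

```python
import numpy as np

# Strike rates rounded from the sr table shown in this notebook
strike_rates = np.array([89.8, 99.5, 104.9, 105.2, 111.9, 113.9, 124.7, 138.9, 142.9])

def bin_counts(values, size, start, end):
    """Count observations in bins of width `size` between `start` and `end`."""
    edges = np.arange(start, end + size, size)
    counts, _ = np.histogram(values, bins=edges)
    return counts

fine = bin_counts(strike_rates, size=2, start=80, end=160)     # 40 narrow bins
coarse = bin_counts(strike_rates, size=20, start=80, end=160)  # 4 wide bins

print(len(fine), int((fine == 0).sum()))   # many bins, most of them empty
print(coarse.tolist())                     # few bins, smoother shape
```

With only a handful of observations, the narrow bins are mostly empty, which is the noise that a larger bin size smooths away.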

---

## Summary

* Distplots visualize the distribution of a continuous variable using multiple layered plots
* Histograms show frequency distribution
* KDE plots provide a smooth density estimate
* Rug plots expose individual data points
* Histograms rely heavily on bin size selection
* `xbins` allows control over bin width and range

Distplots and histograms together provide both aggregated and granular insight into data distribution, making them essential tools in exploratory data analysis.

In [31]:
# plot distplot

import plotly.figure_factory as ff

hist_data = [avg['avg'], avg['batsman_runs']]

group_labels = ['Average', 'Strike Rate']

fig = ff.create_distplot(hist_data, group_labels, bin_size=[10, 20])

fig.show()
# pyo.plot(fig)

In [32]:
x = delivery.groupby('batsman')['batsman_runs'].count()>150
x = x[x].index.tolist()

new = delivery[delivery['batsman'].isin(x)]

runs = new.groupby('batsman')['batsman_runs'].sum()
balls = new.groupby('batsman')['batsman_runs'].count()

sr = (runs/balls)*100
sr = sr.reset_index()
sr

| | batsman | batsman_runs |
|---|---|---|
| 0 | A Ashish Reddy | 142.857143 |
| 1 | A Mishra | 89.756098 |
| 2 | A Symonds | 124.711908 |
| 3 | AA Jhunjhunwala | 99.541284 |
| 4 | AB Agarkar | 111.875000 |
| ... | ... | ... |
| 169 | Y Nagar | 105.166052 |
| 170 | Y Venugopal Rao | 113.872832 |
| 171 | YK Pathan | 138.860326 |
| 172 | YV Takawale | 104.918033 |


In [36]:
# plot histogram
trace = go.Histogram(x = sr['batsman_runs'], name='Strike Rate Variations', xbins={'size': 2})
data = [trace]
layout = go.Layout(title='Strike Rate Analysis',
                   xaxis={'title': 'Strike Rates'},
                   )

fig = go.Figure(data, layout)

fig.show()
# pyo.plot(fig)

## Heatmap

A **heatmap** is a graphical representation of data in which the individual values of a matrix are represented using **colors**. Heatmaps are particularly effective for identifying patterns, intensities, and concentrations across two dimensions.

In a heatmap, **three pieces of information are represented simultaneously**:

1. **X-axis** → First categorical or numerical variable
2. **Y-axis** → Second categorical or numerical variable
3. **Color intensity (hue / saturation / gradient)** → Magnitude of the third numerical variable

This makes heatmaps suitable for multi-dimensional pattern analysis.

---

## Example: Heatmap Using plotly.go

In this example, the heatmap visualizes the **number of sixes hit per over by each batting team**.

### Data Preparation

```python
six = delivery[delivery['batsman_runs'] == 6]
six = six.groupby(['batting_team', 'over'])['batsman_runs'].count().reset_index()
```

Here:

* Only deliveries where **six runs** were scored are selected
* Data is grouped by `batting_team` and `over`
* The count represents the frequency of sixes
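
The grouped result is in long form: one row per (team, over) pair. The heatmap effectively arranges these rows into a grid, which can be previewed with a pivot. The four rows below are copied from the `six` output in this notebook. Filling missing cells with `0` is an assumption meaning "no sixes recorded in that over":

```python
import pandas as pd

# Four rows copied from the six output shown in this notebook
six_demo = pd.DataFrame({
    'batting_team': ['Chennai Super Kings', 'Chennai Super Kings',
                     'Sunrisers Hyderabad', 'Sunrisers Hyderabad'],
    'over': [1, 2, 18, 19],
    'batsman_runs': [9, 21, 49, 58],
})

# Rows become overs, columns become teams, cells hold the six count
grid = six_demo.pivot(index='over', columns='batting_team', values='batsman_runs')
grid = grid.fillna(0)   # assumption: a missing pair means zero sixes

print(grid)
```

The resulting `grid.values`, `grid.columns`, and `grid.index` can also be passed to `go.Heatmap` as `z`, `x`, and `y` when an explicit matrix is preferred.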

---

## Creating a Heatmap

```python
# plot heatmap
trace = go.Heatmap(
    x=six['batting_team'],
    y=six['over'],
    z=six['batsman_runs']
)

data = [trace]

layout = go.Layout(title='Sixes per Over Heatmap')

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)
```

In this plot:

* Teams are displayed on the x-axis
* Overs are displayed on the y-axis
* Color intensity represents the number of sixes

---

## Multiple Heatmaps in the Same Plot

Sometimes, comparing multiple heatmaps side by side is required. For example, comparing **six-hitting patterns** with **dot-ball patterns**.

---

## What Is the `dots` DataFrame?

```python
dots = delivery[delivery['batsman_runs'] == 0]
dots = dots.groupby(['batting_team', 'over'])['batsman_runs'].count().reset_index()
```

The `dots` dataframe represents:

* Deliveries where **zero runs** were scored
* Count of **dot balls per over per batting team**

This enables a contrast between aggressive (sixes) and defensive (dot balls) phases of play.

---

## Why `make_subplots` Is Needed

To place multiple heatmaps **side by side within the same figure**, subplot functionality is required. Plotly provides this through `make_subplots`, which lives in the `plotly.subplots` module. Older code imports it from `plotly.tools`, but that path is deprecated.

```python
from plotly.subplots import make_subplots
```

### Purpose of `make_subplots`

* Enables creation of **complex figure layouts**
* Allows multiple plots in a single figure using subplots
* Supports shared axes, titles, and layout alignment

Previously, `make_subplots` was not required because:

* All earlier visualizations contained **only a single plot per figure**
* `go.Figure()` was sufficient for simple layouts

When multiple plots must coexist in the same canvas, subplot utilities become necessary.

---

## Creating Multiple Heatmaps Side by Side

```python
from plotly.subplots import make_subplots

trace1 = go.Heatmap(
    x=six['batting_team'],
    y=six['over'],
    z=six['batsman_runs'].values.tolist()
)

trace2 = go.Heatmap(
    x=dots['batting_team'],
    y=dots['over'],
    z=dots['batsman_runs'].values.tolist()
)

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=["6's", "0's"],
    shared_yaxes=True
)

# add_trace with row/col replaces the deprecated append_trace
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)

fig.show()
# pyo.plot(fig)
```

---

## Explanation of New Concepts in This Code

### `make_subplots()`

* Creates a figure layout with multiple subplots
* `rows=1, cols=2` places two plots horizontally
* `subplot_titles` assigns titles to each heatmap
* `shared_yaxes=True` ensures both plots use the same y-axis scale

### `add_trace()` with `row` and `col`

* Adds an individual trace to a specific subplot location
* Arguments specify the trace, row number, and column number
* Replaces the older `append_trace()`, which is deprecated

### `.values.tolist()`

* Converts a Pandas Series into a native Python list
* Optional here: Plotly also accepts a Series directly, as the single-heatmap example shows

---

## Summary

* Heatmaps visualize three variables simultaneously using color intensity
* X and Y axes define the grid structure
* Color gradient represents the magnitude of values
* Subplot creation requires `make_subplots` from `plotly.subplots` (the `plotly.tools` path is deprecated)
* `dots` represents dot-ball frequency for comparative analysis
* Multiple heatmaps enable side-by-side pattern comparison

Heatmaps in plotly.go are powerful tools for uncovering spatial and temporal patterns that are difficult to detect using standard charts.

In [37]:
six = delivery[delivery['batsman_runs'] == 6]
six = six.groupby(['batting_team', 'over'])['batsman_runs'].count().reset_index()

six

| | batting_team | over | batsman_runs |
|---|---|---|---|
| 0 | Chennai Super Kings | 1 | 9 |
| 1 | Chennai Super Kings | 2 | 21 |
| 2 | Chennai Super Kings | 3 | 49 |
| 3 | Chennai Super Kings | 4 | 45 |
| 4 | Chennai Super Kings | 5 | 53 |
| ... | ... | ... | ... |
| 290 | Sunrisers Hyderabad | 16 | 31 |
| 291 | Sunrisers Hyderabad | 17 | 25 |
| 292 | Sunrisers Hyderabad | 18 | 49 |
| 293 | Sunrisers Hyderabad | 19 | 58 |


In [41]:
# plot heatmap
trace = go.Heatmap(x=six['batting_team'], y=six['over'], z=six['batsman_runs'])
data = [trace]

layout = go.Layout(title='Sixes per Over Heatmap')

fig = go.Figure(data, layout)
fig.show()
# pyo.plot(fig)

In [44]:
# multiple heatmaps side by side

dots = delivery[delivery['batsman_runs'] == 0]
dots = dots.groupby(['batting_team', 'over'])['batsman_runs'].count().reset_index()

# plotly.tools.make_subplots is deprecated; import from plotly.subplots instead
from plotly.subplots import make_subplots

trace1 = go.Heatmap(x=six['batting_team'], y=six['over'],
                    z=six['batsman_runs'].values.tolist())

trace2 = go.Heatmap(x=dots['batting_team'], y=dots['over'],
                    z=dots['batsman_runs'].values.tolist())

fig = make_subplots(rows=1, cols=2, subplot_titles=["6's", "0's"], shared_yaxes=True)

# add_trace with row/col replaces the deprecated append_trace
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)

fig.show()
# pyo.plot(fig)

