# Histograms

A histogram is an approximate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. It differs from a bar graph in that a bar graph relates two variables, whereas a histogram relates only one. To construct a histogram, the first step is to *"bin"* (or "bucket") the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size. [Histogram on Wikipedia](https://en.wikipedia.org/wiki/Histogram)
 ![Histogram](https://upload.wikimedia.org/wikipedia/commons/c/c3/Histogram_of_arrivals_per_minute.svg)
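
The two construction steps described above, binning and then counting, can be reproduced by hand with NumPy. This is only a minimal sketch: the sample values and bin edges are made up for illustration, and `np.histogram` stands in for whatever a plotting library does internally.

```python
import numpy as np

# Made-up sample values, for illustration only
values = np.array([1.2, 1.9, 2.5, 3.1, 3.3, 4.8, 5.0])

# Consecutive, non-overlapping, adjacent bins: [1, 2), [2, 3), [3, 4), [4, 5]
# (NumPy makes only the last bin closed on the right)
edges = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Count how many values fall into each interval
counts, _ = np.histogram(values, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n}")
```

Each printed count is the height a bar would have in the corresponding bin.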

* Histograms are a special form of bar chart where the data represent continuous rather than discrete categories. 
* For example, a histogram could be used to present the average number of hours of exercise carried out by people of different ages, because age is a continuous rather than a discrete category.
* However, because a continuous category may have a large number of possible values, the data are often grouped to reduce the number of data points.
* For example, instead of drawing a bar for each individual age between 0 and 65, the data could be grouped into a series of continuous age ranges such as 16-24, 25-34, 35-44 etc.
* Unlike a bar chart, in a histogram both the x- and y-axes have a scale. 
* This means that it is the area of the bar that is proportional to the size of the category represented and not just its height. [Source](https://www.le.ac.uk/oerresources/ssds/numeracyskills/page_32.htm)
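
The point about area can be checked numerically. The sketch below uses hypothetical ages and age ranges of unequal width (none of these numbers come from the source) and picks each bar height so that height times width equals the count in that range.

```python
import numpy as np

# Hypothetical ages, grouped into ranges of unequal width
ages = np.array([17, 19, 22, 23, 26, 30, 33, 38, 41, 47, 52, 58, 61, 64])
edges = np.array([16, 25, 35, 45, 65])

counts, _ = np.histogram(ages, bins=edges)
widths = np.diff(edges)

# Bar height chosen so that height * width (the area) equals the count
heights = counts / widths
areas = heights * widths

print("counts :", counts)
print("heights:", np.round(heights, 3))
```

In this made-up data the widest range holds the most people but does not get the tallest bar, which is why the height alone can be misleading when bins differ in width.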


## Histogram with Plotly Express
In Plotly, a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count). The data to be binned can be numerical, but also categorical or date data.

In [None]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
#fig = px.histogram(df, x="total_bill", nbins=20) # choose number of bins
# Here we use a column with categorical data
#fig = px.histogram(df, x="day", nbins=2)
fig.show()

## Type of Normalization and Other Options
* The default mode is to represent the count of samples in each bin.
* With the *histnorm* argument, it is also possible to represent:
  * the percentage or fraction of samples in each bin (*histnorm='percent'* or *'probability'*),
  * a density histogram (the sum of all bar areas equals the total number of sample points, *'density'*),
  * or a probability density histogram (the sum of all bar areas equals 1, *'probability density'*).
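
To make the difference between these modes concrete, here is a small NumPy sketch that computes each normalization from raw counts. It follows the definitions in the list above and does not call Plotly; the bin edges NumPy picks may differ from the ones Plotly would choose.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)

counts, edges = np.histogram(x, bins=20)
widths = np.diff(edges)
total = counts.sum()

percent = 100 * counts / total                   # bar heights sum to 100
probability = counts / total                     # bar heights sum to 1
density = counts / widths                        # bar areas sum to the sample size
probability_density = counts / (total * widths)  # bar areas sum to 1

print(percent.sum(), probability.sum())
print((density * widths).sum(), (probability_density * widths).sum())
```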

In [None]:
import plotly.express as px
df = px.data.tips()
#fig = px.histogram(df, x="total_bill", histnorm='probability')
fig = px.histogram(df, x="total_bill",
                   title='Histogram of bills',
                   labels={'total_bill':'total bill'}, # can specify one label per df column
                   opacity=0.8,
                   log_y=True, # represent bars with log scale
                   color_discrete_sequence=['indianred'] # color of histogram bars
                   )
fig.show()

## Several histograms for the different values of one column

In [None]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex", opacity=0.8) 
fig.show()

## Visualizing the distribution
* With the *marginal* keyword, a subplot is drawn alongside the histogram, visualizing the distribution.
* More on distribution plots later

In [None]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex",
                   marginal="box", # can be `rug`, `box`, `violin` or `histogram`
                   hover_data=df.columns)
fig.show()

## Histograms using Graph Objects

In [None]:
# Basic Histogram
import plotly.graph_objects as go

import numpy as np
np.random.seed(1)

x = np.random.randn(500)

#fig = go.Figure(data=[go.Histogram(x=x)])
#Normalized Histogram
fig = go.Figure(data=[go.Histogram(x=x, histnorm='probability')])
fig.show()

## Horizontal Histogram

In [None]:
import plotly.graph_objects as go

import numpy as np

y = np.random.randn(500)
# Use `y` argument instead of `x` for horizontal histogram

fig = go.Figure(data=[go.Histogram(y=y)])
fig.show()

## Overlaid Histogram

In [None]:
import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(500)
# Add 1 to shift the mean of the Gaussian distribution
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# Overlay both histograms
fig.update_layout(barmode='overlay')
# Reduce opacity to see both histograms
fig.update_traces(opacity=0.55)
fig.show()

## Stacked Histograms

In [None]:
import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(2000)
x1 = np.random.randn(2000) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# Stack the two histograms on top of one another
fig.update_layout(barmode='stack')
fig.update_traces(opacity=0.75)

fig.show()

## Styled Histogram

In [None]:
import plotly.graph_objects as go

import numpy as np
x0 = np.random.randn(500)
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(
    x=x0,
    histnorm='percent',
    name='control', # name used in legend and hover labels
    xbins=dict( # bins used for histogram
        start=-4.0,
        end=3.0,
        size=0.5
    ),
    marker_color='#EB89B5',
    opacity=0.75
))
fig.add_trace(go.Histogram(
    x=x1,
    histnorm='percent',
    name='experimental',
    xbins=dict(
        start=-3.0,
        end=4,
        size=0.5
    ),
    marker_color='#330C73',
    opacity=0.75
))

fig.update_layout(
    title_text='Sampled Results', # title of plot
    xaxis_title_text='Value', # xaxis label
    yaxis_title_text='Percent', # yaxis label (both traces use histnorm='percent')
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)

fig.show()

## Cumulative Histogram
The cumulative histogram is a histogram in which the vertical axis gives not just the counts for a single bin, but rather gives the counts for that bin plus all bins for smaller values of the response variable. [Source](https://mipav.cit.nih.gov/pubwiki/index.php/Cumulative_Histogram)

In [None]:
import plotly.graph_objects as go

import numpy as np

x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x, cumulative_enabled=True)])

fig.show()
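
The definition above amounts to a running sum over the per-bin counts. A minimal NumPy sketch, using made-up counts:

```python
import numpy as np

# Made-up per-bin counts, ordered from the smallest to the largest values
counts = np.array([2, 1, 2, 2])

# Each entry is the count for that bin plus all bins to its left
cumulative = np.cumsum(counts)
print(cumulative)  # [2 3 5 7]
```

The last cumulative value always equals the total number of samples.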

## Custom Binning

* For custom binning along the *x-axis*, use the attribute *nbinsx*.
* Please note that the autobin algorithm will choose a *'nice'* round bin size that may result in somewhat fewer than *nbinsx* total bins.
* Alternatively, you can set exact values for *xbins*, along with *autobinx = False*.


In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

x = ['1970-01-01', '1970-01-01', '1970-02-01', '1970-04-01', '1970-01-02',
     '1972-01-31', '1970-02-13', '1971-04-19']

fig = make_subplots(rows=3, cols=2)

trace0 = go.Histogram(x=x, nbinsx=4)
trace1 = go.Histogram(x=x, nbinsx=8)
trace2 = go.Histogram(x=x, nbinsx=10)
trace3 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M18'), # M18 stands for 18 months
                      autobinx=False
                      )
trace4 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M4'), # 4 months bin size
                      autobinx=False
                      )
trace5 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M2'), # 2 months
                      autobinx=False
                      )

# add_trace with row/col replaces the deprecated append_trace
fig.add_trace(trace0, row=1, col=1)
fig.add_trace(trace1, row=1, col=2)
fig.add_trace(trace2, row=2, col=1)
fig.add_trace(trace3, row=2, col=2)
fig.add_trace(trace4, row=3, col=1)
fig.add_trace(trace5, row=3, col=2)

fig.show()
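
For numeric data, the way explicit *xbins* settings translate into bin edges can be sketched in plain Python. The helper `bin_edges` below is hypothetical and not part of Plotly; it assumes, following the description of *xbins* in the Plotly reference, that edges are incremented by `size` from `start` until `end` is reached or exceeded. Date sizes such as `'M4'` are not covered.

```python
def bin_edges(start, end, size):
    """Return edges start, start + size, ... until `end` is reached or exceeded."""
    edges = [start]
    k = 0
    while edges[-1] < end:
        k += 1
        edges.append(start + k * size)
    return edges

# Settings taken from the 'control' trace of the styled histogram
edges = bin_edges(-4.0, 3.0, 0.5)
print(len(edges) - 1, "bins from", edges[0], "to", edges[-1])

# When `end` is not a whole number of steps from `start`,
# the last edge overshoots it
print(bin_edges(0.0, 1.2, 0.5))
```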

## Share bins between histograms
In this example, both histograms have compatible bin settings thanks to the *bingroup* attribute. Note that traces on the same subplot and with the same *barmode* ("stack", "relative", "group") are forced into the same *bingroup*; however, traces with *barmode = "overlay"* and on different axes (of the same axis type) can also have compatible bin settings.

In [None]:
import plotly.graph_objects as go
import numpy as np

fig = go.Figure(go.Histogram(
    x=np.random.randint(7, size=100),
    bingroup=1))

fig.add_trace(go.Histogram(
    x=np.random.randint(7, size=20),
    bingroup=1))

fig.update_layout(
    barmode="overlay",
    bargap=0.1)

fig.show()
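
Why shared bins matter can be seen without Plotly. In the NumPy sketch below, both samples are counted against the same edges, so bar *i* of one histogram covers exactly the same interval as bar *i* of the other. The edges are chosen by hand here and only imitate what a shared *bingroup* achieves; they are not necessarily the ones Plotly would pick.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(7, size=100)  # integers 0..6, as in the cell above
b = rng.integers(7, size=20)

# One set of edges for both samples: unit-wide bins centred on 0, 1, ..., 6
edges = np.arange(-0.5, 7.5, 1.0)

counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)

print(counts_a)
print(counts_b)
```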

## 2D Histogram of a Bivariate Normal Distribution

In [None]:
import plotly.graph_objects as go

import numpy as np

x = np.random.randn(500)
y = np.random.randn(500)+1

fig = go.Figure(go.Histogram2d(x=x, y=y, histnorm='probability',
        autobinx=False,
        xbins=dict(start=-3, end=3, size=0.1),
        autobiny=False,
        ybins=dict(start=-2.5, end=4, size=0.1),
        colorscale=[[0, 'rgb(12,51,131)'], [0.25, 'rgb(10,136,186)'], [0.5, 'rgb(242,211,56)'], [0.75, 'rgb(242,143,56)'], [1, 'rgb(217,30,30)']]
    ))
fig.show()
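
The quantity behind a 2D histogram can be computed with `np.histogram2d`. The sketch below is an approximation of the cell above, not a reproduction of Plotly's internals: it uses coarser bins (0.5 instead of 0.1) and normalizes over the points that fall inside the grid.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500) + 1

# Same ranges as the xbins/ybins above, with a bin size of 0.5
x_edges = np.linspace(-3, 3, 13)
y_edges = np.linspace(-2.5, 4, 14)

# counts[i, j] is the number of points in x-bin i and y-bin j
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

# Fraction of the counted points in each cell
probability = counts / counts.sum()
print(counts.shape, probability.sum())
```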

## Sharing bin settings between 2D Histograms

This example shows how to use the *bingroup* attribute to give both histograms compatible bin settings. To define the *start*, *end* and *size* values for the *x-axis* and *y-axis* separately, set *xbins* and *ybins*.

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(2,2)
fig.add_trace(go.Histogram2d(
    x = [ 1, 2, 2, 3, 4 ],
    y = [ 1, 2, 2, 3, 4 ],
    coloraxis = "coloraxis",
    xbins = {'start':1, 'size':1}), 1,1)
fig.add_trace(go.Histogram2d(
    x = [ 4, 5, 5, 5, 6 ],
    y = [ 4, 5, 5, 5, 6 ],
    coloraxis = "coloraxis",
    ybins = {'start': 3, 'size': 1}),1,2)
fig.add_trace(go.Histogram2d(
    x = [ 1, 2, 2, 3, 4 ],
    y = [ 1, 2, 2, 3, 4 ],
    bingroup = 1,
    coloraxis = "coloraxis",
    xbins = {'start':1, 'size':1}), 2,1)
fig.add_trace(go.Histogram2d(
    x = [ 4, 5, 5, 5, 6 ],
    y = [ 4, 5, 5, 5, 6 ],
    bingroup = 1,
    coloraxis = "coloraxis",
    ybins = {'start': 3, 'size': 1}),2,2)
fig.show()