## Histograms

#### Histograms are great for visualizing the distribution of data, that is, how many values fall within certain boundaries. A histogram looks a lot like a bar graph, but instead of plotting each individual value it groups the data into bins.
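To make the idea of bins concrete, here is a minimal sketch that counts a few ages into ten-year bins with `numpy.histogram`. The five ages are the rows shown by `data.head()` further down; the bin edges are an assumption chosen for illustration.

```python
import numpy as np

# The five ages shown by data.head() in this notebook
sample_ages = [14, 19, 28, 22, 30]

# Illustrative bin edges (an assumption): 10-20, 20-30, 30-40
edges = [10, 20, 30, 40]

# counts[i] is the number of ages that fall into the i-th bin
counts, returned_edges = np.histogram(sample_ages, bins=edges)

for left, right, count in zip(returned_edges[:-1], returned_edges[1:], counts):
    print(f"{left}-{right}: {count}")
```

A bar graph of the same data would draw five bars, one per respondent; the histogram draws three, one per bin.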

In [1]:
# import libraries
from matplotlib import pyplot as plt
%matplotlib notebook
from pandas import read_csv

In [2]:
# choose style
# note: on matplotlib 3.6+ this style is named 'seaborn-v0_8'
plt.style.use('seaborn')

In [3]:
# parse data to use
data = read_csv('data.csv')
data.head()

Unnamed: 0,Responder_id,Age
0,1,14
1,2,19
2,3,28
3,4,22
4,5,30


In [4]:
ages = data['Age']

In [5]:
# this time, our function is called hist
# it simply takes the x argument and creates the histogram

plt.hist(ages, edgecolor='#000000')

plt.title('Ages Of Submitters')
plt.xlabel('Age Range')
plt.ylabel('Total Submitters')
plt.show()


In [6]:
# As we can see, hist picks the bins for us (10 equal-width bins by default)
# We can also pass the bin edges we want explicitly

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

plt.hist(ages, bins=bins, edgecolor='#000000')

plt.title('Ages Of Submitters')
plt.xlabel('Age Range')
plt.ylabel('Total Submitters')
plt.show()

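When the edges are given by hand, it helps to know how values that sit exactly on an edge are counted. The sketch below uses `numpy.histogram`, which follows the same rule as `plt.hist`: every bin includes its left edge and excludes its right edge, except the last bin, which includes both. The `edge_cases` values are made up for illustration.

```python
import numpy as np

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# The same edges can be generated instead of typed out
assert list(range(10, 101, 10)) == bins

# Hypothetical ages chosen to sit on or just outside the bin edges
edge_cases = [9, 10, 20, 99, 100, 101]

counts, _ = np.histogram(edge_cases, bins=bins)

# 9 and 101 lie outside the outer edges and are dropped,
# 10 lands in 10-20, 20 lands in 20-30, and both 99 and 100 land in 90-100
print(counts)
```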

### Log scale histogram

In [7]:
# When the frequencies differ greatly between bins, the small bins are dwarfed by the large ones.
# Plotting the histogram on a logarithmic scale solves this problem. To do so:

plt.hist(ages, bins=bins, edgecolor='#000000', log=True)

plt.title('Ages Of Submitters')
plt.xlabel('Age Range')
plt.ylabel('Total Submitters')
plt.show()

# That way, we revealed hidden information, which in my opinion is very interesting.

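A rough way to see why the log scale helps is to compare how tall each bar is relative to the tallest one on a linear axis and on a logarithmic axis. The counts below are made up, and the log-axis share assumes the axis starts at 1, so treat this as a sketch of the idea rather than what matplotlib draws exactly.

```python
import math

# Hypothetical bin counts spanning several orders of magnitude
counts = [5, 400, 20000, 900, 12]

tallest = max(counts)

# Bar height as a fraction of the tallest bar on a linear axis
linear_shares = [count / tallest for count in counts]

# Bar height as a fraction of the tallest bar on a log axis starting at 1
log_shares = [math.log10(count) / math.log10(tallest) for count in counts]

for count, linear, log in zip(counts, linear_shares, log_shares):
    print(f"count={count:>6}  linear={linear:.4f}  log={log:.4f}")
```

The bin with 5 entries is practically invisible on the linear axis but takes up a readable share of the log axis.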

### Adding extra graph information
#### axvline - axhline functions

In [8]:
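# find median: take the middle element of the sorted ages
# (for an even number of rows this is the upper of the two middle values;
# ages.median() would average them instead)
median_age = ages.sort_values().values[ages.shape[0]//2]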
median_age

29
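The middle-element approach and the usual median agree for an odd number of values and can differ for an even number. A small sketch with the standard library's `statistics` module shows the difference; `middle_element` is a hypothetical helper that mirrors the cell above, and the even-length sample is made up.

```python
import statistics


def middle_element(values):
    """Sort the values and take the element at index n // 2, as in the cell above."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


odd_sample = [14, 19, 28, 22, 30]  # the five ages shown by data.head()
even_sample = [14, 19, 28, 22]     # hypothetical even-length sample

# Odd length: both approaches give the same value
print(middle_element(odd_sample), statistics.median(odd_sample))

# Even length: the middle element is the upper of the two middle values,
# while statistics.median averages them
print(middle_element(even_sample), statistics.median(even_sample))
```

For even-length data the middle element matches `statistics.median_high`. With pandas, `ages.median()` averages the two middle values, like `statistics.median`.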

In [9]:
plt.hist(ages, bins=bins, edgecolor='#000000', log=True)

# The vertical line marks the median age, so we can see which age groups fall below and above the midpoint of the people who answered the survey
plt.axvline(median_age, color='#FC4F36', label='Median Age', linewidth=3)

plt.legend()

plt.title('Ages Of Submitters')
plt.xlabel('Age Range')
plt.ylabel('Total Submitters')
plt.show()

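The heading also mentions `axhline`, which draws a horizontal line in the same way `axvline` draws a vertical one. The sketch below is one possible use, marking the mean number of respondents per bin; that reference level, the shortened bin list, and the five sample ages (the rows shown by `data.head()`) are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs as a plain script; omit in a notebook
from matplotlib import pyplot as plt

sample_ages = [14, 19, 28, 22, 30]  # the five ages shown by data.head()
bins = [10, 20, 30, 40]             # shortened bin list for this small sample

plt.figure()

# plt.hist returns the count per bin, the bin edges, and the drawn bars
counts, edges, _ = plt.hist(sample_ages, bins=bins, edgecolor='#000000')

# Hypothetical reference level: the mean number of respondents per bin
mean_count = counts.mean()
hline = plt.axhline(mean_count, color='#FC4F36', label='Mean Count Per Bin', linewidth=3)

plt.legend()

plt.title('Ages Of Submitters')
plt.xlabel('Age Range')
plt.ylabel('Total Submitters')
```

Inside the notebook, finish with `plt.show()` as in the other cells.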