## Histograms

Creating histograms is a common way to explore your data because they give you a general idea of how the values are distributed. A histogram summarizes the variation in a measured variable: it shows how many observations fall into each interval (bin) of values. In other words, a histogram is a graphical form of a frequency distribution.

Histograms work by binning the entire range of values into a series of intervals and then counting how many values fall into each interval. While the intervals are often of equal size, they are not required to be.
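The binning step can be illustrated with NumPy's `np.histogram`, which returns the count in each interval together with the bin edges. This is a minimal sketch, and the ten values are made up for illustration. The first call splits their range into three equal-width bins, and the second passes explicit, unequal edges.

```python
import numpy as np

# Ten made-up measurements (illustrative only)
values = np.array([1, 2, 2, 3, 5, 7, 8, 8, 9, 10])

# Equal-width bins: the range [1, 10] is split into 3 intervals of width 3,
# giving edges [1, 4, 7, 10]; every bin except the last is half-open [a, b)
equal_counts, equal_edges = np.histogram(values, bins=3)
print(equal_counts)  # [4 1 5]

# Unequal bins: pass the edges yourself; here the widths are 2, 3, and 5
unequal_counts, unequal_edges = np.histogram(values, bins=[0, 2, 5, 10])
print(unequal_counts)  # [1 3 6]
```

With unequal widths, raw counts can make wide bins look more important than they are. Passing `density=True` to `np.histogram` divides each count by the total number of values and by the bin width, so the bar areas sum to 1.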

In [None]:
# The `inline` backend makes figures appear directly below the code cell in the notebook.
%matplotlib inline

import pandas as pd

# `plt` is an alias for the `matplotlib.pyplot` module
import matplotlib.pyplot as plt

### Load Data

To demonstrate histograms, we will use the House Sales in King County, USA dataset (https://www.kaggle.com/harlfoxem/housesalesprediction).

In [None]:
df = pd.read_csv('data/kingCountyHouseData.csv')

df.head()

## Histograms using Pandas

The goal here is to make a histogram of the `price` column. Along the way, you will see that creating a data visualization is often an iterative process.

In [None]:
df['price'].head()

In [None]:
# The default settings (10 bins) produce a plot that is hard to read.
# Keep in mind that creating a visualization is an iterative process.
df['price'].hist()

In [None]:
# One solution is to rotate the x-axis tick labels
df['price'].hist()
plt.xticks(rotation=90)
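pandas can also rotate the tick labels for you: `Series.hist` accepts an `xrot` argument, which avoids the separate `plt.xticks` call. The sketch below uses a small synthetic `Series` (made-up prices, not the King County data) in place of `df['price']` so that it runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic prices standing in for df['price'] (illustrative only)
prices = pd.Series([250_000, 320_000, 410_000, 475_000, 530_000, 680_000, 900_000, 1_450_000])

# xrot rotates the x-axis tick labels; hist returns the matplotlib Axes it drew on
fig, ax = plt.subplots()
prices.hist(ax=ax, xrot=90)
```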

In [None]:
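# For a quick way to make the x-axis tick labels more readable,
# try changing the plot style (this applies to all later plots).
# matplotlib 3.6+ renamed the 'seaborn' style to 'seaborn-v0_8'.
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
plt.style.use(style)
df['price'].hist()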
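If you would rather not change the style for every later plot in the notebook, `plt.style.context` applies a style only inside a `with` block. This sketch prints the style names available in your installation and then draws one histogram with `'ggplot'`, a style that ships with matplotlib. The prices are synthetic.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Names accepted by plt.style.use and plt.style.context in this installation
print(plt.style.available)

# Synthetic prices (illustrative only)
prices = pd.Series([250_000, 320_000, 410_000, 475_000, 530_000, 680_000, 900_000, 1_450_000])

# The style is active only inside the with block; rcParams are restored afterwards
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    prices.hist(ax=ax, bins=5)
```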

In [None]:
# Increase the number of bins (the default is 10)
# This looks better, but much of the plot is still empty space,
# because a few very expensive houses stretch the x-axis
df['price'].hist(bins=30)
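Besides an integer, `bins` also accepts a sequence of bin edges. This lets you choose a bin width that is meaningful for the data, and the intervals do not have to be equal. The sketch below assumes a bin width of 100,000 and uses synthetic prices rather than the King County data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic prices (illustrative only)
prices = pd.Series([250_000, 320_000, 410_000, 475_000, 530_000, 680_000, 900_000, 1_450_000])

# Edges 0, 100,000, ..., 1,500,000: sixteen edges, so fifteen bins each 100,000 wide
bin_edges = np.arange(0, 1_500_001, 100_000)

fig, ax = plt.subplots()
prices.hist(ax=ax, bins=bin_edges, edgecolor='black')
```

Fixed-width edges make the bars easier to interpret ("houses per 100,000 band") than the automatically computed edges you get from an integer `bins`.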

In [None]:
# Visualize a subset of the data: houses priced at or below 3,000,000
price_filter = df.loc[:, 'price'] <= 3000000
df.loc[price_filter, 'price'].hist(bins=30)
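A boolean mask, as in the cell above, removes the expensive houses before plotting. A similar effect comes from passing matplotlib's `range` argument through `Series.hist`: values outside `range` are ignored, and the bins are spread evenly over that interval instead of over the filtered data's minimum and maximum. The sketch uses synthetic prices, two of which lie above the 3,000,000 cutoff.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic prices (illustrative only); two values exceed 3,000,000
prices = pd.Series([250_000, 400_000, 650_000, 900_000, 1_200_000, 2_500_000, 4_000_000, 7_700_000])

# Option 1: boolean mask, then plot only the rows that pass
price_filter = prices <= 3_000_000
print(price_filter.sum(), 'of', len(prices), 'prices kept')  # 6 of 8 prices kept

# Option 2: let matplotlib ignore values outside range=(low, high)
fig, ax = plt.subplots()
prices.hist(ax=ax, bins=30, range=(0, 3_000_000))
```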

In [None]:
# Draw black edges around the bars and set their line width
price_filter = df.loc[:, 'price'] <= 3000000
df.loc[price_filter, 'price'].hist(bins=30,
                                   edgecolor='black',
                                   linewidth=1)
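A natural next iteration is to label the axes. One way, sketched below with synthetic data, is to create the `Axes` yourself with `plt.subplots`, pass it to `Series.hist` through `ax`, and then set the labels and title. The label text and figure size are only suggestions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic prices (illustrative only)
prices = pd.Series([250_000, 320_000, 410_000, 475_000, 530_000, 680_000, 900_000, 1_450_000])

fig, ax = plt.subplots(figsize=(8, 5))
prices.hist(ax=ax, bins=30, edgecolor='black', linewidth=1)

# Label the axes so the reader knows what the bars count
ax.set_xlabel('Price (USD)')
ax.set_ylabel('Number of houses')
ax.set_title('Distribution of house prices')
```

Working with an explicit `Axes` object also makes it easy to place several histograms side by side later, since each call can target its own `ax`.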