# Data Visualization

Data visualization is an important part of machine learning. It belongs to exploratory data analysis (EDA) and is useful for:
* Discovering patterns in data
* Discovering missing data
* Visualizing correlations

One common visualization package is Matplotlib.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# world population (in billions) for selected years
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.192, 4.263, 6.972]

# line plot
_ = plt.plot(year, pop)
plt.show()

As seen above, `plot()` draws a line chart. It takes two array-like sequences of equal length, such as Python lists, NumPy arrays, or pandas Series. The first is plotted on the x-axis and the second on the y-axis.
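As a minimal sketch of the accepted input types, the same call works when the lists are converted to NumPy arrays. The names `year_arr`, `pop_arr`, and `lines` are illustrative and not part of the original notebook. The sketch also shows that `plot()` returns a list of `Line2D` objects, which is why the cells here assign the result to `_`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same data as above, converted to NumPy arrays (names are illustrative)
year_arr = np.array([1950, 1970, 1990, 2010])
pop_arr = np.array([2.519, 3.192, 4.263, 6.972])

# plot() returns a list with one Line2D object per plotted line
lines = plt.plot(year_arr, pop_arr)
print(len(lines))  # 1

plt.show()
```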

`scatter()` draws a scatter plot and takes the same two parameters.

In [None]:
# scatter plot
_ = plt.scatter(year, pop)
plt.show()

`bar()` draws a bar chart. The first argument gives the bar positions on the x-axis and the second gives the bar heights.

In [None]:
# bar plot
_ = plt.bar(year, pop)
plt.show()
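With numeric x values such as years, `bar()` uses a default bar width of 0.8 in data units, so bars placed 20 years apart come out very thin. One possible workaround, sketched here under the assumption that each year should read as its own category, is to pass the years as strings. The name `year_labels` is illustrative.

```python
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.192, 4.263, 6.972]

# String x values are treated as categories and spaced evenly,
# so each bar gets a readable width
year_labels = [str(y) for y in year]
bars = plt.bar(year_labels, pop)

plt.show()
```

Passing `width=` to `bar()` while keeping the numeric years is another option when the spacing between years should stay proportional.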

`hist()` draws a histogram. It requires only one argument: the array or series whose distribution you would like to plot.

In [None]:
# histogram
values = [0,0.6,1.4,1.6,2.2,2.5,2.6,3.2,3.5,3.9,4.2,6]
_ = plt.hist(values)
plt.show()
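As a small sketch of what the call produces, `hist()` returns the count in each bin, the bin edges, and the drawn bar patches. With the default of 10 bins there are 11 edges. The names `counts`, `edges`, and `patches` are illustrative.

```python
import matplotlib.pyplot as plt

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

# hist() returns (counts per bin, bin edges, drawn patches)
counts, edges, patches = plt.hist(values)

print(len(counts), len(edges))  # 10 bins are bounded by 11 edges
print(int(counts.sum()))        # every value lands in exactly one bin

plt.show()
```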

You can specify the number of bins in your histogram with the `bins` parameter.

In [None]:
# bin the histogram
_ = plt.hist(values, bins=5)
plt.show()
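`bins` also accepts a sequence of bin edges instead of a count, which is useful when the bins should fall on round numbers. The edges `[0, 2, 4, 6]` below are chosen purely for illustration. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import matplotlib.pyplot as plt

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

# Explicit bin edges: [0, 2), [2, 4), [4, 6]
edges = [0, 2, 4, 6]
counts, used_edges, _ = plt.hist(values, bins=edges)

print(counts)  # number of values in each of the three bins

plt.show()
```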

You can label the axes with `xlabel()` and `ylabel()` and give the chart a title with `title()`.

In [None]:
# label axes
_ = plt.plot(year, pop)
_ = plt.xlabel('year')
_ = plt.title('World Population')
_ = plt.ylabel('pop')
plt.show()
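The same labels can also be set through Matplotlib's object-oriented interface, where each call is made on an explicit `Axes` object rather than on `plt`. This is an alternative sketch, not the approach the cell above uses. The names `fig` and `ax` follow common convention.

```python
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.192, 4.263, 6.972]

# subplots() returns a Figure and a single Axes to draw on
fig, ax = plt.subplots()
ax.plot(year, pop)

# set_xlabel / set_ylabel / set_title mirror plt.xlabel / plt.ylabel / plt.title
ax.set_xlabel('year')
ax.set_ylabel('pop')
ax.set_title('World Population')

plt.show()
```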

Set the tick positions on the y-axis explicitly with `yticks()`, here every 2 billion from 0 to 10.

In [None]:
_ = plt.plot(year, pop)
_ = plt.xlabel('year')
_ = plt.title('World Population')
_ = plt.ylabel('pop')
_ = plt.yticks([0, 2, 4, 6, 8, 10])
plt.show()

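The population is measured in billions, so the tick labels should say so. Pass a second list to `yticks()` to replace each tick's label, here adding a `B` suffix.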

In [None]:
_ = plt.plot(year, pop)
_ = plt.xlabel('year')
_ = plt.title('World Population')
_ = plt.ylabel('pop')
_ = plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B'])
plt.show()
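Typing the labels by hand gets tedious as the number of ticks grows. A minimal sketch of building them from the tick positions instead, with `ticks` and `labels` as illustrative names:

```python
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.192, 4.263, 6.972]

ticks = [0, 2, 4, 6, 8, 10]
# Append "B" (billions) to every tick except zero
labels = [f"{t}B" if t else "0" for t in ticks]

plt.plot(year, pop)
plt.xlabel('year')
plt.ylabel('pop')
plt.title('World Population')
plt.yticks(ticks, labels)

plt.show()
```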