# [Pandas-Visualization](http://pandas.pydata.org/)

Pandas has a rich set of built-in visualizations based on [matplotlib](http://matplotlib.org/), the long-standing de facto standard for plotting and visualization in Python.
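
Because the built-in plots sit on top of matplotlib, the object returned by `.plot()` can be adjusted with the usual matplotlib calls. The following is a minimal sketch of that idea. The toy `heights` series is hypothetical and not part of this notebook's data, and the `Agg` backend is used only so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; the notebook itself uses nbagg
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not taken from the notebook's files.
heights = pd.Series([3, 1, 2], index=["a", "b", "c"], name="height")

# Series.plot returns a matplotlib Axes object.
ax = heights.plot(kind="bar")

# From here on, plain matplotlib methods apply.
ax.set_ylabel("height (arbitrary units)")
ax.set_title("Customized through the matplotlib API")
```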

In addition to the built-in visualizations, almost all Python visualization packages, such as [seaborn](https://stanford.edu/~mwaskom/software/seaborn/) and [bokeh](http://bokeh.pydata.org/en/latest/), can work directly with Pandas data structures. 

In this notebook, I demonstrate some of the built-in visualization tools.

In [1]:
%matplotlib nbagg

In [2]:
import os
import sqlite3 as sqlite
import time
DATADIR = os.path.join("..", "Resources")

In [3]:
import pandas as pd

In [4]:
import numpy as np

In [5]:
elevationA = pd.read_table(os.path.join(DATADIR, "elevation2.txt"),
                           thousands=",", index_col='State')

In [6]:
elevationA

| State | Unnamed: 0 | Rank | Highest elevation | Lowest elevation | Average elevation |
|---|---|---|---|---|---|
| Colorado | 0 | 1 | 14440 | 3315 | 6800 |
| Wyoming | 1 | 2 | 13804 | 3099 | 6700 |
| Utah | 2 | 3 | 13528 | 2000 | 6100 |
| New Mexico | 3 | 4 | 13161 | 2842 | 5700 |
| Nevada | 4 | 5 | 13140 | 479 | 5500 |
| Idaho | 5 | 6 | 12662 | 710 | 5000 |
| Arizona | 6 | 7 | 12633 | 70 | 4100 |
| Montana | 7 | 8 | 12799 | 1800 | 3400 |
| Oregon | 8 | 9 | 11239 | 0 | 3300 |
| Hawaii | 9 | 10 | 13796 | 0 | 3030 |
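
The `thousands=","` argument is what turns text such as `14,440` into the integer `14440`. Here is a small sketch of its effect. It assumes a tab-separated layout and uses a hypothetical two-row sample instead of the real `elevation2.txt`. `pd.read_csv` with `sep="\t"` stands in for `pd.read_table`.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample; the real file layout is an assumption.
sample = (
    "State\tHighest elevation\tLowest elevation\n"
    "Colorado\t14,440\t3,315\n"
    "Oregon\t11,239\t0\n"
)

# With thousands=",", grouped digits are parsed as integers.
with_sep = pd.read_csv(io.StringIO(sample), sep="\t",
                       thousands=",", index_col="State")

# Without it, values containing commas remain text.
without_sep = pd.read_csv(io.StringIO(sample), sep="\t", index_col="State")

print(with_sep.dtypes)
print(without_sep.dtypes)
```

Numeric columns matter here because the plotting methods only draw numeric data.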


### Basic line plot

In [7]:
print(elevationA['Highest elevation'])

State
Colorado          14440
Wyoming           13804
Utah              13528
New Mexico        13161
Nevada            13140
Idaho             12662
Arizona           12633
Montana           12799
Oregon            11239
Hawaii            13796
California        14494
Nebraska           5424
South Dakota       7242
Kansas             4039
Alaska            20320
North Dakota       3506
Washington        14410
Texas              8749
West Virginia      4863
Oklahoma           4973
Minnesota          2301
Pennsylvania       3213
Iowa               1670
Wisconsin          1951
New Hampshire      6288
New York           5344
Vermont            4393
Virginia           5729
Tennessee          6643
Michigan           1979
Ohio               1549
Missouri           1772
Kentucky           4139
North Carolina     6684
Indiana            1257
Arkansas           2753
Maine              5276
Georgia            4784
Illinois           1235
Massachusetts      3487
Alabama            2407
Connecticu

In [8]:
elevationA["Average elevation"].plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f2b558de390>

### Bar plot

In [9]:
elevationA["Highest elevation"].plot(kind="bar")

<matplotlib.axes._subplots.AxesSubplot at 0x7f2b535c26a0>
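
A bar plot of every state gets crowded, so one option is to select and sort the values before plotting. The sketch below uses a handful of the highest-elevation values printed above; the variable names and the choice of a horizontal bar chart are my own illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import pandas as pd

# A few values copied from the "Highest elevation" output above.
highest = pd.Series(
    {"Colorado": 14440, "Wyoming": 13804, "California": 14494,
     "Alaska": 20320, "Washington": 14410, "Iowa": 1670},
    name="Highest elevation",
)

# Keep the three largest values, already sorted in descending order.
top = highest.nlargest(3)

# Horizontal bars leave more room for the state labels.
ax = top.plot(kind="barh")
ax.set_xlabel("Highest elevation")
```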

In [10]:
bp = pd.read_csv(os.path.join(DATADIR,"abp_all.csv"),
                 na_values=[0]).dropna()
bp

FileNotFoundError: File b'../Resources/abp_all.csv' does not exist

In [None]:
bp.hist(column=["VALUE1NUM","VALUE2NUM"], bins=50)

## We can create numpy arrays from text data
### We can generate a data frame from numpy arrays

In [None]:
systolic = np.genfromtxt(os.path.join(DATADIR,"systolic.txt"),
                         delimiter=",")
diastolic = np.genfromtxt(os.path.join(DATADIR,"diastolic.txt"),
                         delimiter=",")

In [None]:
blood_pressure = pd.DataFrame.from_dict({'systolic': systolic,
                                         'diastolic': diastolic})

In [None]:
blood_pressure = blood_pressure[blood_pressure != 0].dropna()
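
The filtering step works in two stages: indexing with the boolean frame `blood_pressure != 0` keeps the shape but replaces every zero with `NaN`, and `dropna()` then removes any row that contains a `NaN`. A minimal sketch on made-up readings (the numbers are hypothetical, not from `systolic.txt` or `diastolic.txt`):

```python
import numpy as np
import pandas as pd

# Hypothetical readings; a zero stands for a missing measurement.
systolic = np.array([120.0, 0.0, 135.0, 110.0])
diastolic = np.array([80.0, 75.0, 0.0, 70.0])
bp_demo = pd.DataFrame({"systolic": systolic, "diastolic": diastolic})

# Stage 1: boolean masking turns zeros into NaN, shape unchanged.
masked = bp_demo[bp_demo != 0]

# Stage 2: drop every row that has at least one NaN.
cleaned = masked.dropna()

print(masked)
print(cleaned)
```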

In [None]:
blood_pressure

### We can make histograms with ``plot()``

In [None]:
blood_pressure.plot(kind="hist", alpha=0.5)

In [None]:
blood_pressure.plot(kind="hist", bins=100,
                    color=['red', 'black'], alpha=0.5)
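
To try these options without the data files, the sketch below draws overlapping histograms from synthetic values and uses `np.histogram` to show what `bins` controls: the number of equal-width intervals the value range is cut into. The distributions and their parameters are assumptions chosen for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in data; means and spreads are made up.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "systolic": rng.normal(120, 15, 500),
    "diastolic": rng.normal(80, 10, 500),
})

# alpha < 1 keeps both overlapping histograms visible.
ax = demo.plot(kind="hist", bins=30, alpha=0.5, color=["red", "black"])

# The same binning idea, computed directly: 30 bins have 31 edges.
counts, edges = np.histogram(demo.to_numpy().ravel(), bins=30)
print(len(edges), counts.sum())
```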

### Kernel-Density Estimates

In [None]:
blood_pressure.plot(kind="kde")
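
A kernel-density estimate places a small bell curve on every observation and averages them into a smooth density. As far as I know, `plot(kind="kde")` needs SciPy for this, so the sketch below spells out the computation with NumPy alone. The function name `gaussian_kde_1d`, the synthetic samples, and the Scott's-rule bandwidth are my assumptions for illustration; it is not the code pandas runs.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel-density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for a one-dimensional bandwidth.
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One standard-normal kernel per sample, evaluated at each grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Synthetic stand-in data; the parameters are made up.
rng = np.random.default_rng(0)
samples = rng.normal(120, 15, 300)
grid = np.linspace(samples.min() - 60, samples.max() + 60, 1000)

density = gaussian_kde_1d(samples, grid)

# A density should integrate to roughly 1 (simple Riemann sum).
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```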


### Box plots

In [None]:
blood_pressure.plot(kind="box")
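
A box plot summarizes each column by its quartiles. With matplotlib's default settings the whiskers reach to the most extreme points within 1.5 times the interquartile range, and anything beyond is drawn as an individual outlier. The sketch below computes those numbers by hand for a hypothetical series, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical values: 1..9 plus one obvious outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], name="demo")

# Quartiles, using pandas' default linear interpolation.
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences would be plotted as outliers.
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, median, q3, outliers.tolist())
```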

### Scatter plots

In [None]:
blood_pressure.plot(kind='scatter',x='diastolic',y='systolic')

In [None]:
blood_pressure.plot(kind='hexbin',
                    x='diastolic',
                    y='systolic',
                    gridsize=25)
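
With many points a scatter plot overplots badly, while a hexbin plot counts the points falling into each hexagonal cell and colors the cell by that count. The sketch below tries both on synthetic, correlated values. The data-generating formula is an assumption made up for the demonstration and says nothing about real blood-pressure data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import numpy as np
import pandas as pd

# Synthetic correlated data; the relationship is invented for illustration.
rng = np.random.default_rng(0)
diastolic = rng.normal(80, 10, 2000)
systolic = 1.2 * diastolic + rng.normal(25, 8, 2000)
demo = pd.DataFrame({"diastolic": diastolic, "systolic": systolic})

# Scatter: one marker per row.
ax_scatter = demo.plot(kind="scatter", x="diastolic", y="systolic", alpha=0.2)

# Hexbin: larger gridsize means smaller hexagons.
ax_hex = demo.plot(kind="hexbin", x="diastolic", y="systolic", gridsize=25)

correlation = demo["diastolic"].corr(demo["systolic"])
print(round(correlation, 2))
```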

In [None]:
from pandas.plotting import scatter_matrix

In [None]:
scatter_matrix(blood_pressure, diagonal='kde')
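
`scatter_matrix` draws one panel per pair of columns and returns the panels as a 2-D array of matplotlib axes, so individual panels can be adjusted afterwards. A minimal sketch on synthetic data follows. It uses `diagonal="hist"` instead of `"kde"` so that it does not depend on SciPy, and the data are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic stand-in data; the parameters are invented.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "systolic": rng.normal(120, 15, 200),
    "diastolic": rng.normal(80, 10, 200),
})

# Two columns give a 2 x 2 grid: histograms on the diagonal,
# pairwise scatter plots elsewhere.
axes = scatter_matrix(demo, diagonal="hist", alpha=0.5)

# Individual panels are ordinary matplotlib Axes.
axes[0, 1].set_title("systolic vs. diastolic")
print(axes.shape)
```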