# GA Data Science 19 (DAT19) - Class 5
## Developing Mastery of Pandas, Numpy & Bokeh

Justin Breucop (with parts from Craig Sakuma)

## Lab goals

- NumPy: Entering the Matrix
- Pandas: DataFrames as Bamboo
- Bokeh: Picture-Perfect Visuals

## NumPy
As we've seen in lecture, linear algebra is the branch of mathematics that studies vectors, matrices, and the linear maps between vector spaces. It matters here because a big part of data cleansing is converting data into different formats, and certain algorithms require their input to have a specific shape.

NumPy is a package for scientific computing in Python, built around a fast N-dimensional array object (`ndarray`).

### Creating an array

In [1]:
import numpy as np
a = np.arange(25).reshape(5,5)
# arange(n) creates a 1-D array of the integers 0 through n-1
# reshape(M,N) rearranges an array's elements into an M x N array (M*N must equal the element count)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
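
The shape rules behind `reshape` are worth a quick check. A minimal sketch (the names `flat`, `grid`, and `inferred` are just for illustration): `reshape` only succeeds when the new dimensions multiply to the original element count, and `-1` asks NumPy to infer one dimension.

```python
import numpy as np

# arange(12) yields the 1-D array [0, 1, ..., 11]
flat = np.arange(12)

# reshape keeps the 12 elements and only changes the shape;
# rows * columns must equal 12
grid = flat.reshape(3, 4)

# passing -1 for one dimension lets NumPy infer it from the total size
inferred = flat.reshape(2, -1)

print(grid.shape)      # (3, 4)
print(inferred.shape)  # (2, 6)

# a shape whose product differs from 12 raises ValueError
try:
    flat.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)
```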

We can also convert lists to arrays. Note, however, that unlike a list, every element of an array must have the same data type.

In [2]:
alist = [[ 0,  1,  2,  3,  4],[ 5,  6,  7,  8,  9],[10, 11, 12, 13, 14],[15, 16, 17, 18, 19],[20, 21, 22, 23, 24]]
type(alist)

list

In [3]:
np.array(alist)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
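
To see the same-datatype rule in action, here is a small sketch with made-up values: when the inputs mix types, NumPy picks one common dtype for the whole array instead of storing each element's own type.

```python
import numpy as np

# Mixed ints and floats: NumPy upcasts everything to float
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)  # float64

# Mixing numbers and strings: every element becomes a string
mixed_types = np.array([1, "a"])
print(mixed_types.dtype.kind)  # 'U' (Unicode string)

# Python lists, by contrast, keep each element's own type
as_list = [1, 2.5, "a"]
print([type(v).__name__ for v in as_list])  # ['int', 'float', 'str']
```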

In [4]:
biga = a*10
biga

array([[  0,  10,  20,  30,  40],
       [ 50,  60,  70,  80,  90],
       [100, 110, 120, 130, 140],
       [150, 160, 170, 180, 190],
       [200, 210, 220, 230, 240]])

In [5]:
print(biga.mean())
print(biga.mean(0))  # average per column
print(biga.mean(1))  # average per row
type(biga.mean(1))

120.0
[ 100.  110.  120.  130.  140.]
[  20.   70.  120.  170.  220.]


numpy.ndarray

In [6]:
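# np.matrix is a strictly 2-D subclass in which * means matrix multiplication;
# NumPy no longer recommends it for new code
bigm = np.matrix(biga-20)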
bigm

matrix([[-20, -10,   0,  10,  20],
        [ 30,  40,  50,  60,  70],
        [ 80,  90, 100, 110, 120],
        [130, 140, 150, 160, 170],
        [180, 190, 200, 210, 220]])

In [7]:
bigm * biga  # bigm is an np.matrix, so * is matrix multiplication: same as np.matrix(bigm) * np.matrix(biga)

matrix([[  5000,   5000,   5000,   5000,   5000],
        [ 30000,  32500,  35000,  37500,  40000],
        [ 55000,  60000,  65000,  70000,  75000],
        [ 80000,  87500,  95000, 102500, 110000],
        [105000, 115000, 125000, 135000, 145000]])
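
With regular arrays, `*` is element-wise, so matrix multiplication needs a different operator. A sketch of the plain-array way to compute the same product, assuming Python 3.5+ for the `@` operator; `a` and `b` rebuild the values of `biga` and `bigm` so the block runs on its own:

```python
import numpy as np

a = np.arange(25).reshape(5, 5) * 10   # same values as biga above
b = a - 20                              # same values as bigm above

# On plain arrays, * is element-wise: entry (i, j) is b[i, j] * a[i, j]
elementwise = b * a

# Matrix multiplication on arrays uses the @ operator or np.dot
matmul = b @ a
same = np.dot(b, a)

print(matmul[0])  # [5000 5000 5000 5000 5000], matching the np.matrix result
```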

In [8]:
# Caution: biga-20 is singular (rank 2), so the "inverse" below is floating-point noise
np.linalg.inv(biga-20)

array([[ -2.81474977e+13,  -1.52777778e-03,   5.62949953e+13,
         -2.22222222e-02,  -2.81474977e+13],
       [  3.51843721e+13,   2.25000000e-02,  -5.27765581e+13,
         -3.51843721e+13,   5.27765581e+13],
       [ -4.22212465e+13,   9.38249922e+13,  -7.97512434e+13,
          4.69124961e+13,  -1.87649984e+13],
       [  9.14793674e+13,  -1.87649984e+14,   9.26521798e+13,
          1.17281240e+13,  -8.20968682e+12],
       [ -5.62949953e+13,   9.38249922e+13,  -1.64193736e+13,
         -2.34562481e+13,   2.34562481e+12]])
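
To confirm that `biga - 20` has no true inverse, check its rank with `np.linalg.matrix_rank`. The sketch below also shows the Moore-Penrose pseudo-inverse `np.linalg.pinv`, a swapped-in technique (not something this lab uses) that is well defined even for singular matrices:

```python
import numpy as np

m = np.arange(25).reshape(5, 5) * 10 - 20   # the same matrix as biga - 20

# Every row is an arithmetic progression, so each row is a combination of
# [1, 1, 1, 1, 1] and [0, 1, 2, 3, 4]: the matrix has rank 2, not 5
rank = np.linalg.matrix_rank(m)
print(rank)  # 2

# A rank-deficient square matrix has no inverse. Depending on rounding,
# np.linalg.inv either raises LinAlgError or returns huge, meaningless values.
# The pseudo-inverse is defined for any matrix:
m_pinv = np.linalg.pinv(m)

# pinv satisfies m @ m_pinv @ m == m (up to floating-point error)
print(np.allclose(m @ m_pinv @ m, m))  # True
```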

In [9]:
print(bigm)
print(biga)
bigm * biga

[[-20 -10   0  10  20]
 [ 30  40  50  60  70]
 [ 80  90 100 110 120]
 [130 140 150 160 170]
 [180 190 200 210 220]] 
[[  0  10  20  30  40]
 [ 50  60  70  80  90]
 [100 110 120 130 140]
 [150 160 170 180 190]
 [200 210 220 230 240]]


matrix([[  5000,   5000,   5000,   5000,   5000],
        [ 30000,  32500,  35000,  37500,  40000],
        [ 55000,  60000,  65000,  70000,  75000],
        [ 80000,  87500,  95000, 102500, 110000],
        [105000, 115000, 125000, 135000, 145000]])

#### Slices

In [10]:
bigm = np.array(bigm)  # convert the np.matrix back to a plain ndarray
bigm[0]

array([-20, -10,   0,  10,  20])

In [11]:
# Same thing, but demonstrating the full slice with a colon
bigm[0,:]

array([-20, -10,   0,  10,  20])

In [12]:
print(biga)
biga[:,3]

[[  0  10  20  30  40]
 [ 50  60  70  80  90]
 [100 110 120 130 140]
 [150 160 170 180 190]
 [200 210 220 230 240]]


array([ 30,  80, 130, 180, 230])

The same slicing rules extend to arrays with more than two dimensions:

In [13]:
compa = np.arange(30).reshape(5,3,2)
compa

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]],

       [[24, 25],
        [26, 27],
        [28, 29]]])

In [14]:
# let's describe it
print(compa.shape)
print(compa.ndim)
print(compa.dtype)

(5, 3, 2)
3
int64


In [15]:
compa[3,:,1]

array([19, 21, 23])

In [16]:
compa[0,0,0]

0

In [17]:
compa[0,0,0] = 5.9
compa[0,0,0]

5

Because `compa` holds integers, NumPy silently truncated 5.9 to 5, sometimes to our dismay. To store fractional values, convert the array to floats with `astype` first:

In [18]:
compa = compa.astype(float)
compa[0,0,0] = 5.75
compa[0,0,0]
type(compa[1,1,1])

numpy.float64
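
A small sketch, with an illustrative array `ints`, of what that silent cast does: floats assigned into an integer array lose their fractional part by truncation toward zero, not by rounding.

```python
import numpy as np

ints = np.zeros(3, dtype=int)

# Assigning floats into an integer array silently casts them:
# the fractional part is dropped (truncation toward zero, not rounding)
ints[0] = 5.9
ints[1] = -5.9
print(ints)  # holds 5, -5, 0

# Convert to float first if you need to keep the fraction
floats = ints.astype(float)
floats[2] = 5.75
print(floats)
```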

#### Random Numbers
Random numbers are helpful, and at times necessary, for testing data pipelines and running statistical analyses. Functions for creating random values live in the `numpy.random` module.

In [19]:
# Create a 5x5 array of random numbers drawn uniformly from [0, 1)
rm = np.random.rand(5,5)
rm

array([[ 0.13351861,  0.55480941,  0.07942376,  0.56972252,  0.3267746 ],
       [ 0.69708476,  0.53650936,  0.79583082,  0.01526535,  0.81012134],
       [ 0.84529082,  0.38274455,  0.57474519,  0.81924727,  0.49087668],
       [ 0.81173757,  0.78744983,  0.28613107,  0.9953735 ,  0.4897338 ],
       [ 0.39528399,  0.41122346,  0.13808138,  0.76427907,  0.99712528]])

In [20]:
np.random.rand()  # a single random float in [0, 1); in Jupyter, press Shift+Tab to see the function's documentation

0.373317357853136

In [21]:
rm.shape

(5, 5)

In [22]:
np.random.normal(0,10,50)  # (mean, standard deviation, number of values)

array([ 17.85358809,  -8.75587241,  11.43815005,   7.42940955,
         1.1597673 ,   1.06351058,  -5.70488944,   0.71933389,
         3.4758788 ,   6.43868367,   5.08922225,  -1.57733213,
       -12.76254188,  16.50844251,   8.15882513,  12.3626234 ,
         0.72009271, -11.84449629,  35.18217529,  -8.21622895,
        -3.2042497 ,  -8.23253297, -10.1235747 ,  11.96806863,
       -13.36898686,  -3.98777191,  -6.06837214,  17.65400078,
       -11.89193487, -22.09653353,   9.55702017,  28.70426807,
        -9.40055863,  -2.98238098,  -8.73609983,  -9.06529916,
        -1.5493297 ,  -9.5302563 ,  20.70740255, -14.37383222,
        11.82175684,  -6.88630116,  -1.13756423,  10.8214503 ,
        -4.17257428,   3.57189083,   6.5768641 ,   1.64582067,
       -24.03103756,  -2.93945158])

In [23]:
print(rm.mean())
print(rm.mean(0))  # average per column
print(rm.mean(1))  # average per row

0.548335360121
[ 0.57658315  0.53454732  0.37484245  0.63277754  0.62292634]
[ 0.33284978  0.57096233  0.6225809   0.67408515  0.54119864]


In [24]:
# rand draws uniformly from [0, 1); for a normal distribution with a chosen mean and std, use np.random.normal
rm = np.random.normal(5,9,(30,30))
rm

array([[  1.28411646e+01,   5.99430183e+00,   9.70124939e+00,
          8.56542715e-01,   2.52737320e+00,  -2.11363699e+00,
         -2.48901406e+00,  -2.97522401e-01,   8.94977089e+00,
          9.07853023e+00,   1.36513871e+01,  -5.00674602e+00,
          5.79425981e+00,   1.90660971e+01,   2.91882937e+00,
         -1.19534644e+01,   1.28043660e+01,   1.36755905e+01,
          3.04664474e+00,   6.84526167e+00,  -9.05510609e+00,
          4.10788705e+00,   1.28321972e+01,   7.22753258e+00,
          6.82680638e+00,   3.53859527e+00,  -9.90696064e+00,
          9.46426692e+00,   1.22635305e+01,  -2.74516696e-01],
       [  1.48378432e+01,   1.79080187e+00,   1.03093001e+01,
         -8.72594539e+00,   1.63050797e+00,   6.95761242e+00,
          1.33947508e+01,  -9.87737010e+00,   2.40490447e+00,
          1.24841335e+01,  -5.99073865e+00,  -2.69938950e+00,
          2.97805796e+00,  -3.92100501e+00,  -3.47970973e+00,
          1.26367438e+01,   1.02434201e+01,  -3.02952092e+00,
         ...

In [25]:
print(rm.mean(), "which is hopefully close to the input mean of 5")
print(rm.var(), "which should be close to 81, since variance = stdev squared (9**2)")
print(np.median(rm))

4.60047658255 which is hopefully close to the input mean of 5
83.7055702693 which should be close to 81, since variance = stdev squared (9**2)
4.4558416341


Find more distributions and random functions here: http://docs.scipy.org/doc/numpy/reference/routines.random.html
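
The cells above produce different numbers on every run. When a test or analysis must be reproducible, seed the generator first. A minimal sketch; `default_rng` assumes NumPy 1.17 or newer, while `np.random.seed` works with the older global functions used in this lab:

```python
import numpy as np

# Newer NumPy style: an explicit Generator object created from a seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draws_a = rng_a.normal(0, 10, 50)   # mean 0, standard deviation 10, 50 values
draws_b = rng_b.normal(0, 10, 50)
print(np.array_equal(draws_a, draws_b))  # True: same seed, same numbers

# Legacy style matching the cells above: seed the global generator
np.random.seed(42)
first = np.random.rand(5, 5)
np.random.seed(42)
second = np.random.rand(5, 5)
print(np.array_equal(first, second))  # True
```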

### Exercise 1
1) Create a 4x5 array of integers numbering 0 to 19.

In [26]:
np.arange(20).reshape(4,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

2) Create a 50x500 array with a mean of 20 and a variance of 100. Save it to a variable called `biggie`.

In [27]:
biggie = np.random.normal(20,10,(50,500))  # std 10 gives variance 10**2 = 100
print(biggie.shape)
print(biggie.mean())
print(biggie.var())

(50, 500)
20.0348469652
100.213434382


3) Change the mean of the array to a value within 1 of 0 and the variance within 1 of 25. Think about what the mean and the variance represent and try using various mathematical operations.

In [28]:
morph = (biggie - 20)/2  # subtracting 20 moves the mean to 0; dividing by 2 divides the variance by 4
print(morph.mean())
print(morph.var())

0.0174234825769
25.0533585955
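
The answer to part 3 follows from how the mean and variance respond to shifting and scaling. For any random variable $X$ and constants $a$ and $b$:

$$
\mathbb{E}[aX + b] = a\,\mathbb{E}[X] + b, \qquad \operatorname{Var}(aX + b) = a^2\,\operatorname{Var}(X)
$$

Writing $(X - 20)/2$ as $aX + b$ with $a = \tfrac{1}{2}$ and $b = -10$, and using $\mathbb{E}[X] = 20$, $\operatorname{Var}(X) = 100$:

$$
\mathbb{E}\left[\tfrac{X - 20}{2}\right] = \tfrac{20}{2} - 10 = 0, \qquad \operatorname{Var}\left(\tfrac{X - 20}{2}\right) = \tfrac{1}{4} \cdot 100 = 25
$$

The same identities hold exactly for the sample mean and variance, which is why the printed variance 25.0533... is exactly one quarter of `biggie`'s 100.213....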


## Pandas: DataFrames as Bamboo
You've already seen DataFrames in previous labs, so let's dig into how to work with them.

In [29]:
import pandas as pd

data = pd.read_csv("../data/titanic.csv")
data.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 |  | S |


In [30]:
data.describe() 

|   | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 |
| mean | 446.0 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.0 | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 223.5 | 0.0 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 |
| 50% | 446.0 | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 668.5 | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 |
| max | 891.0 | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |


In [31]:
# Assigning a scalar to a column overwrites every row, e.g.
# data['Fare'] = 0
# (left commented out because it would wipe out the real fares)

In [32]:
data[data.Age>65]  # the boolean condition keeps only the rows where it is True

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 34 | 0 | 2 | Wheadon, Mr. Edward H | male | 66.0 | 0 | 0 | C.A. 24579 | 10.5 |  | S |
| 96 | 97 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 | 0 | 0 | PC 17754 | 34.6542 | A5 | C |
| 116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.75 |  | Q |
| 493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 |  | C |
| 630 | 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 | 0 | 0 | 27042 | 30.0 | A23 | S |
| 672 | 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5 |  | S |
| 745 | 746 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 | 1 | 1 | WE/P 5735 | 71.0 | B22 | S |
| 851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 | 0 | 0 | 347060 | 7.775 |  | S |


In [33]:
data[data.Age<65]  # column names are case-sensitive: data.Age works, data.age does not

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 |  | S |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 |  | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 |  | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 |  | C |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |


In [34]:
data[(data.Age==11)&(data.SibSp==5)]

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | 60 | 0 | 3 | Goodwin, Master. William Frederick | male | 11 | 5 | 2 | CA 2144 | 46.9 |  | S |


In [35]:
data[['Name','Age']][data.Age>65]  # same result as data[data.Age>65][['Name','Age']]
# the inner [] is a list of column names, so the outer [] keeps only those columns;
# data.loc[data.Age>65, ['Name','Age']] does the row and column selection in one step

|   | Name | Age |
|---|---|---|
| 33 | Wheadon, Mr. Edward H | 66.0 |
| 96 | Goldschmidt, Mr. George B | 71.0 |
| 116 | Connors, Mr. Patrick | 70.5 |
| 493 | Artagaveytia, Mr. Ramon | 71.0 |
| 630 | Barkworth, Mr. Algernon Henry Wilson | 80.0 |
| 672 | Mitchell, Mr. Henry Michael | 70.0 |
| 745 | Crosby, Capt. Edward Gifford | 70.0 |
| 851 | Svensson, Mr. Johan | 74.0 |


In [36]:
data[(data.Age==11)|(data.SibSp==5)]

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | 60 | 0 | 3 | Goodwin, Master. William Frederick | male | 11 | 5 | 2 | CA 2144 | 46.9 |  | S |
| 71 | 72 | 0 | 3 | Goodwin, Miss. Lillian Amy | female | 16 | 5 | 2 | CA 2144 | 46.9 |  | S |
| 386 | 387 | 0 | 3 | Goodwin, Master. Sidney Leonard | male | 1 | 5 | 2 | CA 2144 | 46.9 |  | S |
| 480 | 481 | 0 | 3 | Goodwin, Master. Harold Victor | male | 9 | 5 | 2 | CA 2144 | 46.9 |  | S |
| 542 | 543 | 0 | 3 | Andersson, Miss. Sigrid Elisabeth | female | 11 | 4 | 2 | 347082 | 31.275 |  | S |
| 683 | 684 | 0 | 3 | Goodwin, Mr. Charles Edward | male | 14 | 5 | 2 | CA 2144 | 46.9 |  | S |
| 731 | 732 | 0 | 3 | Hassan, Mr. Houssein G N | male | 11 | 0 | 0 | 2699 | 18.7875 |  | C |
| 802 | 803 | 1 | 1 | Carter, Master. William Thornton II | male | 11 | 1 | 2 | 113760 | 120.0 | B96 B98 | S |
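
In the two combined filters above, each comparison sits in its own parentheses. That is required: in Python, `&` and `|` bind more tightly than `==`, so without parentheses the expression is parsed the wrong way. A sketch on a tiny made-up DataFrame, also showing `~` for negation and `DataFrame.query` as a string-based alternative:

```python
import pandas as pd

df = pd.DataFrame({"Age": [11, 11, 16, 30], "SibSp": [5, 0, 5, 1]})

# Each comparison gets its own parentheses; without them Python would
# evaluate 11 & df.SibSp before the == comparisons
both = df[(df.Age == 11) & (df.SibSp == 5)]    # rows matching both conditions
either = df[(df.Age == 11) | (df.SibSp == 5)]  # rows matching at least one
negated = df[~(df.Age == 11)]                  # ~ means "not"

# DataFrame.query expresses the same AND filter as a string
via_query = df.query("Age == 11 and SibSp == 5")

print(len(both), len(either), len(negated))  # 1 3 2
```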


In [37]:
data.values  # the underlying data as a NumPy array (not displayed: only a cell's last expression is shown)
data.columns

Index([u'PassengerId', u'Survived', u'Pclass', u'Name', u'Sex', u'Age',
       u'SibSp', u'Parch', u'Ticket', u'Fare', u'Cabin', u'Embarked'],
      dtype='object')

### Cleaning Data

In [38]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 90.5+ KB


#### Working with nulls
Exclude rows with missing values:

In [39]:
# data[data.Age.isnull()]
data[data.Age.notnull()]

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 |  | S |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 |  | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 |  | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 |  | C |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
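
`dropna` is the built-in shortcut for the same filter. A sketch on a small made-up frame; note that `dropna()` without `subset` drops a row if *any* column is missing, which on this dataset would throw away most rows, since `Cabin` has only 204 non-null entries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [22.0, np.nan, 26.0], "Cabin": [np.nan, "C85", np.nan]})

# Keep rows where Age is present (same idea as data[data.Age.notnull()])
by_mask = df[df.Age.notnull()]
by_dropna = df.dropna(subset=["Age"])

# Without subset, any missing value in any column drops the row
any_missing_dropped = df.dropna()

print(len(by_mask), len(by_dropna), len(any_missing_dropped))  # 2 2 0
```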


In [40]:
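# You can also replace the nulls. fillna returns a new Series and leaves data unchanged;
# selecting only the null rows first makes the replaced values easy to see
data.Age[data.Age.isnull()].fillna(0)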

5      0
17     0
19     0
26     0
28     0
29     0
31     0
32     0
36     0
42     0
45     0
46     0
47     0
48     0
55     0
64     0
65     0
76     0
77     0
82     0
87     0
95     0
101    0
107    0
109    0
121    0
126    0
128    0
140    0
154    0
      ..
718    0
727    0
732    0
738    0
739    0
740    0
760    0
766    0
768    0
773    0
776    0
778    0
783    0
790    0
792    0
793    0
815    0
825    0
826    0
828    0
832    0
837    0
839    0
846    0
849    0
859    0
863    0
868    0
878    0
888    0
Name: Age, dtype: float64

In [41]:
# Replace with the mean: this keeps the column mean unchanged, though it shrinks the spread
avg_age = data.Age[data.Age.notnull()].mean()
print(avg_age)
data.Age.fillna(avg_age)

29.6991176471
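
Mean imputation keeps the mean intact but puts every filled value at the center, so the spread of the column shrinks. A small sketch with made-up ages shows both effects:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, np.nan, 38.0, 26.0, np.nan, 35.0])

# mean() skips NaN by default, so this is the mean of the observed ages
avg_age = ages.mean()
filled = ages.fillna(avg_age)

# The mean is unchanged by mean imputation...
print(avg_age, filled.mean())
# ...but the standard deviation drops, because the new values add no spread
print(ages.std(), filled.std())
```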

#### Replace with random values from a normal distribution

In [None]:
# Get values of mean and standard deviation
data.Age[data.Age.notnull()].describe()

In [None]:
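# Replace each null with its own random draw from a normal distribution using the observed
# mean and std (a single scalar draw would give every missing passenger the same age)
missing = data.Age.isnull()
data.loc[missing, 'Age'] = np.clip(np.random.normal(29.7, 14.5, missing.sum()), 0, None)  # clip: no negative ages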

In [None]:
data.Age.describe()  # the mean and std should stay close to 29.7 and 14.5

### Convert categorical data to numerical

In [None]:
data.Sex=='female'

In [None]:
data.rename(columns={'Sex':'Is Female'}, inplace=True)
# overwrite the renamed column with booleans: True for 'female', False otherwise
data['Is Female'] = data['Is Female'] == 'female'
data.head()

In [None]:
# get unique values of Embarked
data.Embarked.unique()

In [None]:
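# replace values with numbers, assigning the result back to the column
# (inplace=True on a single column may not update data in newer pandas)
data['Embarked'] = data.Embarked.replace(['S', 'C', 'Q'], [1, 2, 3])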
data.head()
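
Numeric codes like 1, 2, 3 imply an order that ports of embarkation don't have. Two common alternatives, sketched on a made-up Series: `map` with an explicit dictionary, and one-hot encoding with `pd.get_dummies`, which creates one indicator column per category (the `prefix` value is just an illustrative choice):

```python
import pandas as pd

embarked = pd.Series(["S", "C", "Q", "S"])

# map with a dict: an explicit code per category (unlisted values become NaN)
codes = embarked.map({"S": 1, "C": 2, "Q": 3})

# One-hot encoding: one indicator column per category, with no implied order
dummies = pd.get_dummies(embarked, prefix="Embarked")

print(codes.tolist())           # [1, 2, 3, 1]
print(sorted(dummies.columns))  # ['Embarked_C', 'Embarked_Q', 'Embarked_S']
```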

### Selecting with .loc, .iloc, & .ix

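Selecting data in pandas can be tricky. The main takeaway is that `.loc` selects by index label, `.iloc` selects by integer position, and `.ix` accepted a mix of both. `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so use `.loc` or `.iloc` in new code.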

In [None]:
df = pd.DataFrame(np.random.randn(6,4),index=list('abcdef'),columns=list('ABCD'))
df

In [None]:
df.loc['f']

In [None]:
df.iloc[len(df.index)-1]

In [None]:
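# .ix was removed in pandas 1.0: .loc looks up the label 'f', .iloc[-1] takes the last row by position
df.A.loc['f'] == df.A.iloc[-1]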

In [None]:
# negative positions count back from the end, as with Python lists: 'cookies'[-4] is 'k'
cc = list('cookies')
cc[-4]
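
The label/position distinction matters most when the index itself holds integers that don't match the row order, which is exactly where `.ix` was ambiguous. A minimal sketch with a made-up Series:

```python
import pandas as pd

# An integer index that does NOT match row positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s.loc[0])    # 30: the row whose LABEL is 0
print(s.iloc[0])   # 10: the row at POSITION 0
print(s.iloc[-1])  # 30: negative positions count from the end, like lists
```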

### Group by

In [None]:
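# Find the average age of passengers by class and survival (0 = died, 1 = survived)
data.groupby(['Pclass','Survived'])['Age'].mean()
# groupby splits the whole dataset on both keys, then averages Age within each group
# To sort the result, store it in a variable and sort it:
# titanic_status = ...
# titanic_status.sort_values()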

In [None]:
# Count passengers by sex (True = female)
data.groupby('Is Female')['PassengerId'].count()

In [None]:
data.groupby(['Survived','Pclass'])['PassengerId'].count()
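
Two follow-ups that often come next: `agg` computes several statistics per group in one call, and `unstack` pivots the inner group key into columns. A sketch on a tiny made-up frame standing in for the Titanic data:

```python
import pandas as pd

df = pd.DataFrame({
    "Pclass":   [1, 1, 3, 3, 3],
    "Survived": [1, 0, 0, 0, 1],
    "Fare":     [80.0, 50.0, 7.0, 8.0, 9.0],
})

# Several statistics per group in one call
summary = df.groupby("Pclass")["Fare"].agg(["count", "mean", "max"])

# unstack turns the inner group key into columns: a class x survival table
counts = df.groupby(["Pclass", "Survived"])["Fare"].count().unstack()

print(summary)
print(counts)
```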

### Apply

In [None]:
# Convert ticket prices to USD, using 1.6 as the exchange rate.
# apply returns a new Series; data itself is not modified.
data.Fare.apply(lambda x: x*1.6)

In [None]:
data.Name

In [None]:
# keep the text before the comma, i.e. the surname
data.Name.apply(lambda x: x.split(",")[0])
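
`apply` runs a Python function once per value. pandas also provides vectorized string methods through the `.str` accessor; a sketch with two names from the table above shows they give the same surnames. One difference worth knowing: `.str` methods return NaN for missing values, whereas the lambda would raise an error on a NaN.

```python
import pandas as pd

names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"])

# apply calls the Python function once per element
surnames_apply = names.apply(lambda x: x.split(",")[0])

# The .str accessor does the same split with pandas' string methods
surnames_str = names.str.split(",").str[0]

print(surnames_str.tolist())  # ['Braund', 'Heikkinen']
```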

### Concatenate

In [None]:
data_first_half = data.iloc[0:10,:]  # rows at positions 0 through 9
data_first_half.info()

In [None]:
data_second_half = data.iloc[10:,:]

remake_data = pd.concat([data_first_half,data_second_half])
remake_data.info()
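
Because `iloc` slices keep their original index labels, concatenating the pieces rebuilds the original frame exactly. A sketch on a made-up frame, including `axis=1` for joining side by side:

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": range(1, 16), "Fare": [7.25] * 15})

first_part = df.iloc[0:10, :]   # rows at positions 0-9
second_part = df.iloc[10:, :]   # rows at positions 10 to the end

rebuilt = pd.concat([first_part, second_part])

# The original index labels are kept, so the pieces fit back together exactly
print(rebuilt.equals(df))  # True

# axis=1 concatenates side by side instead, aligning rows on the index
wide = pd.concat([df[["PassengerId"]], df[["Fare"]]], axis=1)
print(wide.shape)  # (15, 2)
```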

### Exercise 2
1) Replace Pclass numbers with 'First Class', 'Second Class', 'Third Class'

In [None]:
data['Pclass'] = data.Pclass.replace([1,2,3], ['First Class', 'Second Class', 'Third Class'])
data.head()

2) What was the average ticket price for survivors vs. dead passengers?

In [None]:
data.groupby(['Survived'])['Fare'].mean()

### Bonus!!!
Round all ages to the nearest year using `apply`
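
One possible solution, sketched on a few made-up ages rather than the full dataset. `np.round` is used instead of Python's built-in `round`, because `round(x)` raises an error on NaN. Note that `np.round` rounds exact halves to the nearest even number:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, 28.5, 0.42, np.nan, 70.5])

# np.round leaves NaN as NaN; built-in round(x) would raise on NaN
rounded = ages.apply(np.round)

# exact halves go to the nearest EVEN number: 28.5 -> 28.0, 70.5 -> 70.0
print(rounded.tolist())
```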

## Bokeh: Picture-Perfect Visuals

To install Bokeh, go to a terminal and type:

`conda install bokeh` 

Bokeh was built by the same company that created Anaconda (Continuum Analytics) and is designed for web display out of the box, which makes it well suited to quickly creating presentation-ready, interactive visuals. Labs in this course will be shown in Bokeh. Check out http://bokeh.pydata.org/en/latest/docs/quickstart.html#concepts to see the range of its capabilities.

In [None]:
from bokeh.plotting import figure, output_notebook,show,vplot
output_notebook()

In [None]:
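import datetime
# pandas.io.data was removed from later pandas releases (its readers moved to the separate
# pandas-datareader package), and Yahoo's data feed has since changed, so this cell may
# need a different data source on a current setup
import pandas.io.data
fb = pd.io.data.get_data_yahoo('FB',
                               start=datetime.datetime(2015, 4, 1),
                               end=datetime.datetime(2015, 4, 28))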


In [None]:
fb.High.mean()

In [None]:
fb.Low * fb.High.mean() / fb.Low.mean()  # line through the origin and the point (mean Low, mean High)

In [None]:
# prepare some data
x = fb.Low
y = fb.High

# create a new plot with a title and axis labels
p = figure(title="Stock High vs. Low", x_axis_label='Low', y_axis_label='High')

# These are glyphs
p.circle(x, y, size=90, alpha=0.5)
# reference line through the origin with slope mean(High) / mean(Low)
p.line(x, x*y.mean()/x.mean())

# show the results
show(p)

In [None]:
fb.Low

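At its core, Bokeh builds visuals from plots and glyphs. A plot is created with the `figure` function, and glyphs (circles, lines, bars and so on) are the visual marks added to it with methods such as `p.circle` and `p.line`. The resulting visuals are scalable, interactive, and savable. Glyph properties such as color and size can even be vectorized, with one value per data point.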

In [None]:
# prepare some data
N = 4000
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
# one hex color per point: red grows with x, green with y, blue fixed at 150
colors = ["#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(np.floor(50+2*x), np.floor(30+2*y))]

# older Bokeh versions also offered a 'resize' tool, which newer versions reject
TOOLS="crosshair,pan,wheel_zoom,box_zoom,reset,box_select,lasso_select"

# create a new plot with the tools above, and explicit ranges
p = figure(tools=TOOLS, x_range=(0,100), y_range=(0,100))

# add a circle renderer with vectorized colors and sizes
p.circle(x,y, radius=radii, fill_color=colors, fill_alpha=0.6, line_color=None)

# show the results
show(p)

In [None]:
from bokeh.layouts import column  # stacks plots vertically (older Bokeh used vplot)

p1 = figure(title="Titanic Ages Dead", x_axis_label='Age', y_axis_label='Count')
# construct the histogram; without density=True, np.histogram returns raw counts per bin
hist, edges = np.histogram(data.Age[data.Survived==0].dropna().values, bins=50)
# add the bars: one quad per bin, from its left edge to its right edge
p1.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color='black')

p2 = figure(title="Titanic Ages Survived", x_axis_label='Age', y_axis_label='Count')
hist, edges = np.histogram(data.Age[data.Survived==1].dropna().values, bins=50)
p2.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color='black')

# x axis values for a placeholder line, with one y value per x value
x = np.linspace(data.Age.min(), data.Age.max(), 100)
dummy_line = list(range(len(x)))
p2.line(x, dummy_line)

show(column(p1, p2))
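
Why can't a `density=True` histogram simply be multiplied by the number of values to get counts? A density histogram divides each count by (number of values × bin width), so recovering counts needs the bin width as well. A sketch with synthetic ages (made-up, generated only for illustration) verifies the relationship:

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.normal(30, 14, 500)   # synthetic ages, only for illustration

counts, edges = np.histogram(ages, bins=50)
density, _ = np.histogram(ages, bins=50, density=True)

bin_width = np.diff(edges)

# density = count / (total * bin width), so this recovers the raw counts
recovered = density * len(ages) * bin_width
print(np.allclose(recovered, counts))  # True
print(counts.sum() == len(ages))       # True: every value lands in a bin
```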

In [None]:
%matplotlib inline
data.Age.hist()