# Lab 03 - Numpy Intro

In [None]:
# If you don't have the seaborn package, install it with conda or pip from the command line

In [None]:
# Standard imports for data analysis packages in Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# This enables inline Plots
%matplotlib inline

# Limit rows displayed in notebook - This is just my setup. We will talk more about Pandas next week
pd.set_option('display.max_rows', 10)
pd.set_option('display.precision', 2)

In [None]:
print('Pandas Version: ', pd.__version__)
print('Numpy Version: ', np.__version__)

## Numpy

* Pandas is built on top of Numpy.  Each Column in a Pandas DataFrame is backed by a Numpy Array
* Plotting functions (matplotlib) take Numpy Arrays as input
* Scikit-Learn needs Numpy Arrays as input for Features and Labels when building Models

Let's do an overview of Numpy Data Structures and Functions.

### Data Structures

* Arrays
* Matrices

#### Arrays

In [None]:
# One Dimensional Array
arr1 = np.array([1, 2, 3, 4, 5, 6])
print('Shape of Array: ', arr1.shape)
print('Type of object: ', type(arr1))
print('Type of contents: ', arr1.dtype)
print('Size (elements): ', arr1.size)
print('ndim (number of dim): ', arr1.ndim)
print(arr1)

In [None]:
# Selecting a single element from an array (indexing starts at 0)
arr1[1]

In [None]:
# Selecting a range (slicing) - notice that it starts at the lower limit and goes up to, but does not include, the upper limit.  Same behavior as a Python list
arr1[1:3]

#### Assigning Values

In [None]:
# You can assign values individually
arr1[0] = 10
arr1

In [None]:
# Or assign a single value to a range
arr1[1:] = 20  # From index 1 (the second element) to the end
arr1

In [None]:
# Two Dimensional Array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print('Shape of Array: ', arr2.shape)
print('Type of object: ', type(arr2))
print('Type of contents: ', arr2.dtype)
print('Size (elements): ', arr2.size)
print('ndim (number of dim): ', arr2.ndim)
print(arr2)

In [None]:
# Reshape One-D to Two-D Array
arr1.reshape(2, 3)

In [None]:
arr1.reshape((3, 2))

In [None]:
arr1

In [None]:
# Flatten back to 1-D; -1 tells Numpy to infer the length
arr1.reshape(-1)
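
The cells above suggest that `reshape` hands back a new array object and leaves the original array's shape alone. A minimal sketch of that behavior follows; the name `demo` is introduced here purely for illustration so that `arr1` is not disturbed.

```python
import numpy as np

demo = np.array([1, 2, 3, 4, 5, 6])

# reshape returns a new array object; demo keeps its original shape
reshaped = demo.reshape(2, 3)
print(demo.shape)      # (6,)
print(reshaped.shape)  # (2, 3)

# Passing -1 asks Numpy to infer that dimension from the total size
flat = reshaped.reshape(-1)
print(flat.shape)      # (6,)
```

To keep the new shape, assign the result back, for example `demo = demo.reshape(2, 3)`.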

In [None]:
# When you add a scalar to an array, it adds it to each element in the array.
# This is called broadcasting
arr1 + 5

In [None]:
# Same thing with other operators on Scalar.  All operations broadcast to each element
arr1 * 5

In [None]:
# It works on 2-D Arrays too
arr2 * 10

In [None]:
# Now, lets add two Numpy Arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

arr1 + arr2

In [None]:
# Multiply Two Numpy Arrays
arr1 * arr2

Main Idea: Arithmetic operators on Numpy Arrays are element-wise operations.

* Adding two Numpy Arrays results in element-wise addition
* Multiplying two Numpy Arrays results in element-wise multiplication
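
Element-wise operations pair up elements by position, so the shapes of the two arrays have to be compatible. The sketch below repeats the addition and multiplication from above and then shows what happens when the shapes do not match. The names `a`, `b` and `c` are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Same shape: each element is combined with its counterpart
print(a + b)  # [5 7 9]
print(a * b)  # [ 4 10 18]

# Shapes (3,) and (2,) cannot be paired up, so Numpy raises a ValueError
c = np.array([1, 2])
try:
    a + c
except ValueError:
    print('Cannot combine shapes', a.shape, 'and', c.shape)
```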

#### Exercise 1:  Now do this with Python lists.  Do you see any difference?

* Add two Python lists
* Multiply two Python lists
* Add a scalar to a Python list
* Multiply a Python list by a scalar

In [None]:
seq1 = [1, 2, 3]
seq2 = [4, 5, 6]

In [None]:
# Add two Python lists
# your code here

In [None]:
# Multiply two Python lists
# your code here

In [None]:
# Add a scalar to a Python list
# Hint: seq1 + <number>

In [None]:
# Multiply a Python list by a scalar
# Hint: seq1 * <number>

In [None]:
# How about slicing, does Numpy Array work the same as List?

But when you try to add a scalar value to a list, you will get a `TypeError`.

Compare Numpy Array to Python List operations.  

* Do you see any difference in the behavior of the operations?
* How about slicing?

You will see that Numpy's element-wise operations and broadcasting underpin most of the data analysis we will do in Python.

#### Matrices

* A Matrix is like a 2-D Array, except that operators such as `*` perform Matrix operations (matrix multiplication) instead of element-wise operations

In [None]:
# All Matrix objects are 2-dimensional.  Compare this shape with the 1-D Array at the top
mat1 = np.matrix([1, 2, 3])
print('Matrix Shape: ', mat1.shape)
print(mat1)

# A 3 x 3 Matrix
mat2 = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('Matrix Shape: ', mat2.shape)
print(mat2)

#### Matrices are like Arrays, but you can do Matrix Manipulations

In [None]:
# Addition is element-wise addition
mat1 + mat1

In [None]:
# Matrix Multiplication
mat1 * mat2

We will mostly be using Numpy Arrays.  In fact, you can do matrix operations on Arrays too, for example with `np.dot`.  I just wanted to let you know about the Matrix datatype in Numpy.
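
Matrix multiplication is also available for plain Arrays through `np.dot`. The sketch below contrasts it with the element-wise `*` operator, using 2-D Arrays that hold the same values as `mat1` and `mat2`. The names `row` and `square` are illustrative only.

```python
import numpy as np

row = np.array([[1, 2, 3]])                            # shape (1, 3)
square = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # shape (3, 3)

# Element-wise product: row is broadcast against every row of square
print(row * square)

# Matrix product: same result as mat1 * mat2 with np.matrix
print(np.dot(row, square))  # [[30 36 42]]
```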

### Numpy Functions

In [None]:
# Create a Numpy array from 1-100
np.arange(1, 101)

In [None]:
# Python list from 1 to 100.  Notice the similarities
print(list(range(1, 101)))

In [None]:
# Enough of similarities.  Let's show some awesomeness
# Create 10 evenly spaced elements between 0 and 1 (both endpoints included)
np.linspace(0, 1, 10)

In [None]:
# Now, let's look at some distributions

In [None]:
# Normal Dist
# Mean of 1, std of 0.1, 10 elements
np.random.normal(1, 0.1, 10)

In [None]:
# Standard Normal Dist
np.random.randn(1, 10)
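
The two calls above are closely related: a draw from the standard normal distribution can be shifted and scaled into a draw with any mean and standard deviation. A small sketch, assuming the same mean of 1 and std of 0.1 used earlier; the names `z` and `samples` are illustrative only.

```python
import numpy as np

mean, std = 1.0, 0.1

# randn takes the output shape as separate arguments and draws from N(0, 1);
# randn(10) gives a 1-D array, while randn(1, 10) gives a 1 x 10 array
z = np.random.randn(10)

# Shifting and scaling turns standard normal draws into draws
# with the requested mean and standard deviation
samples = mean + std * z
print(samples.shape)  # (10,)
```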

In [None]:
# Look at other distributions - gamma, binomial etc.
# np.random.

* Remember the imputation of Age we did for Titanic last week with the Mean Age value?  Now that you know about Numpy distributions, you can impute the missing Age values by drawing from a Normal Distribution centered on the Mean Age with the Std dev of Age.  How cool is that?
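
One way this imputation might look is sketched below. It uses a small made-up `ages` array rather than the actual Titanic data, and the clipping step at the end is an added assumption, included because a normal draw can come out negative.

```python
import numpy as np

# Hypothetical ages; np.nan marks a missing value
ages = np.array([22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, 27.0])

missing = np.isnan(ages)
mean_age = np.nanmean(ages)  # mean of the known ages
std_age = np.nanstd(ages)    # std dev of the known ages

# Draw one value per missing entry from a normal distribution
imputed = ages.copy()
imputed[missing] = np.random.normal(mean_age, std_age, missing.sum())

# Assumption: ages below zero make no sense, so clip them to zero
imputed = np.clip(imputed, 0, None)
print(imputed)
```

Compared with filling every gap with the mean, this keeps some of the spread of the original Age values.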

## Matplotlib

- [Matplotlib](http://matplotlib.org)
- [Matplotlib Gallery](http://matplotlib.org/gallery.html)
- [Matplotlib Examples](http://matplotlib.org/examples)

In [None]:
# Let's plot the random Normal Distribution

# Create a 100 Element Array with mean 1 and std 0.1
arr1 = np.random.normal(1, 0.1, 100)

In [None]:
fig, ax = plt.subplots(1, 1)
ax.hist(arr1)

In [None]:
# Add a semicolon if you don't want to see the returned bin values in the output
fig, ax = plt.subplots(1, 1)
ax.hist(arr1, bins=20);

#### Exercise: Now increase the number of elements to 10000.  Does it change the shape of the distribution?

In [None]:
# Your Code here to draw a distribution of 10000 element Numpy Array