# Connect Intensive - Machine Learning Nanodegree

## Week 1. Python Crash Course  

## Objectives    

- Jupyter notebook 
- Basic Python programming  
- Numpy
- Pandas 
- Data visualization with Matplotlib and Seaborn 

## Prerequisites   

 - You should have **Python 2.7** installed (if not, please [download and install Python 2.7](https://www.python.org/downloads/))
 - You should also install (and perhaps upgrade) the following packages, if you haven't already:
    - [numpy](http://www.numpy.org/)
    - [pandas](http://pandas.pydata.org/)
    - [matplotlib](http://matplotlib.org/)  
    - [seaborn](http://seaborn.pydata.org)  

---

## 2 | Numpy

Numpy is a linear algebra library for Python. Almost all of the libraries in the PyData ecosystem rely on Numpy as one of their main building blocks. Numpy is also very fast, because its core routines are implemented in C. 

Some basic topics we cover here include:
- Creating Numpy arrays  
- Built-in Numpy array methods
- Array indexing / selection / slicing
- Broadcasting 
- Array operations 

> **Additional reference:** 
> Check out [this Stack Overflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists) on why you might use Numpy arrays instead of Python lists, and [this Numpy cheat sheet](https://www.dataquest.io/blog/numpy-cheat-sheet/). 
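As a minimal illustration of the list-versus-array difference (not taken from the linked post), the same operators behave differently on Python lists and Numpy arrays. This sketch is written to run under both Python 2.7 and Python 3.

```python
import numpy as np

my_list = [1, 2, 3]
arr = np.array(my_list)

# On a list, * repeats the sequence and + concatenates
list_times_two = my_list * 2        # [1, 2, 3, 1, 2, 3]
list_plus_list = my_list + my_list  # [1, 2, 3, 1, 2, 3]

# On an array, the same operators work element-wise
arr_times_two = arr * 2             # array([2, 4, 6])
arr_plus_arr = arr + arr            # array([2, 4, 6])

# Element-wise work on a plain list needs an explicit loop or comprehension
list_doubled = [x * 2 for x in my_list]  # [2, 4, 6]
```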

In [5]:
# import numpy as a library
import numpy as np

### Create Numpy Array from Python List

In [6]:
my_list = [1, 2, 3]
my_list

[1, 2, 3]

In [7]:
# create numpy array from list
np.array(my_list)

array([1, 2, 3])

In [8]:
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
my_matrix

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [9]:
np.array(my_matrix)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
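A nested list turns into a multi-dimensional array, and Numpy picks one dtype for the whole array. The sketch below inspects `ndim`, `shape` and `dtype`; the `mixed` and `as_float` names are just for illustration.

```python
import numpy as np

my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat = np.array(my_matrix)

# A nested list of equal-length rows becomes a 2-D array
print(mat.ndim)   # 2
print(mat.shape)  # (3, 3)

# Mixing ints and floats in the source list upcasts the whole array to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# The dtype can also be requested explicitly
as_float = np.array([1, 2, 3], dtype=float)
```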

### Built-in Methods

**arange**

In [10]:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:
np.arange(0, 10, 3) # a step size of 3

array([0, 3, 6, 9])
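`np.arange` follows the same half-open rule as Python's built-in `range`: the start is included and the stop is excluded. A small sketch comparing the two:

```python
import numpy as np

# start included, stop excluded, exactly like range()
a = np.arange(0, 10, 3)
print(a.tolist() == list(range(0, 10, 3)))  # True

# Unlike range(), arange also accepts a float step
b = np.arange(0, 1, 0.25)   # array([0.  , 0.25, 0.5 , 0.75])

# A single argument is treated as stop, with start=0 and step=1
c = np.arange(5)            # array([0, 1, 2, 3, 4])
```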

**linspace**   

In [11]:
# np.linspace(start, stop, num)
np.linspace(0, 10, 3) # generate 3 evenly spaced numbers

array([  0.,   5.,  10.])

In [12]:
np.linspace(0, 100, 3) # num defaults to 50 when omitted

array([   0.,   50.,  100.])
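Because `linspace` includes the stop value by default, the gap between neighbouring values is `(stop - start) / (num - 1)`. The `retstep` and `endpoint` keyword arguments make this visible:

```python
import numpy as np

start, stop, num = 0, 10, 3

# With the default endpoint=True both ends are included,
# so the spacing is (stop - start) / (num - 1)
values, step = np.linspace(start, stop, num, retstep=True)
print(step)  # 5.0

# endpoint=False drops stop, so the spacing becomes (stop - start) / num
open_values = np.linspace(start, stop, num, endpoint=False)
```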

**zeros and ones**

In [13]:
np.zeros(5)

array([ 0.,  0.,  0.,  0.,  0.])

In [14]:
np.zeros((5, 5))

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [15]:
np.ones(5)

array([ 1.,  1.,  1.,  1.,  1.])

In [16]:
np.ones((5, 5))

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
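The outputs above are floats because `zeros` and `ones` default to `float64`. A short sketch of the `dtype` argument and the `*_like` variants, which copy the shape and dtype of an existing array (`template` is an illustrative name):

```python
import numpy as np

# Pass dtype to get integers instead of the default float64
int_zeros = np.zeros((2, 3), dtype=int)

# ones_like / zeros_like reuse the shape and dtype of an existing array
template = np.arange(6).reshape(2, 3)
ones_same_shape = np.ones_like(template)
```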

**random**

In [17]:
# Create an array of the given shape and populate it with random samples from a
# uniform distribution over [0, 1): 0 inclusive, 1 exclusive
np.random.rand(2)

array([ 0.78045813,  0.14108678])

In [18]:
# Return an array of the given shape populated with samples from the
# standard normal distribution (mean 0, standard deviation 1).
np.random.randn(3)

array([ 2.05988145,  1.25918596, -0.04299633])

In [19]:
# Return random integers from low (inclusive) to high (exclusive).
np.random.randint(1, 10, 3) # specify numbers with np.random.randint(low, high, size)

array([9, 7, 7])
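Your random numbers will differ from the ones shown above. Seeding the global generator with `np.random.seed` makes the draws repeatable, which helps when re-running a notebook; the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)
first = np.random.randint(1, 10, 3)

np.random.seed(42)
second = np.random.randint(1, 10, 3)

# Same seed, same sequence of draws
print((first == second).all())  # True

# rand/randn take the dimensions as separate arguments
grid = np.random.rand(2, 3)     # shape (2, 3), values in [0, 1)
```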

**dtype** 

In [20]:
arr = np.random.randint(1, 10, 5)
arr.dtype

dtype('int64')
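The default integer dtype can differ by platform (`int64` on most 64-bit Linux and macOS systems). To change the dtype, `astype` returns a converted copy; this sketch shows that converting floats to ints truncates toward zero.

```python
import numpy as np

arr = np.array([1, 5, 9])

# astype returns a new array; arr keeps its integer dtype
as_float = arr.astype(float)

# Converting floats to int drops the fractional part (truncates toward zero)
truncated = np.array([1.7, -1.7]).astype(int)  # array([ 1, -1])
```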

**max, min, argmax, argmin**

In [21]:
arr = np.array([10, 2, 3, 6, 7])
arr

array([10,  2,  3,  6,  7])

In [30]:
arr.max()

10

In [31]:
arr.argmax()

0

In [32]:
arr.min()

2

In [33]:
arr.argmin()

1
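On a 2-D array, `argmax` without an axis returns an index into the flattened array. A sketch of converting it back to a (row, col) pair with `np.unravel_index`, and of using `axis` instead (the matrix `m` is made up for illustration):

```python
import numpy as np

m = np.array([[3, 8, 1],
              [9, 2, 7]])

# Without an axis, argmax works on the flattened array [3, 8, 1, 9, 2, 7]
flat_idx = m.argmax()                            # 3
row, col = np.unravel_index(flat_idx, m.shape)   # (1, 0)

# With axis=0, it returns the row index of the maximum in each column
per_column = m.argmax(axis=0)                    # array([1, 0, 1])
```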

**shape, reshape**

In [22]:
arr = np.random.randn(4, 4)
arr

array([[ 2.69202834, -0.23120011,  0.17247382, -0.49201637],
       [ 0.5270961 ,  0.11492956, -1.17766137,  0.77317362],
       [ 0.51537375, -0.79692671,  0.5777161 , -0.87193397],
       [-0.69034726,  0.88556005,  0.10482444, -0.39104396]])

In [23]:
arr.shape

(4, 4)

In [25]:
arr.reshape(1, 16)

array([[ 2.69202834, -0.23120011,  0.17247382, -0.49201637,  0.5270961 ,
         0.11492956, -1.17766137,  0.77317362,  0.51537375, -0.79692671,
         0.5777161 , -0.87193397, -0.69034726,  0.88556005,  0.10482444,
        -0.39104396]])

In [27]:
# pass -1 for one dimension and Numpy infers its size from the array's length and the remaining dimensions
arr.reshape(2, -1)

array([[ 2.69202834, -0.23120011,  0.17247382, -0.49201637,  0.5270961 ,
         0.11492956, -1.17766137,  0.77317362],
       [ 0.51537375, -0.79692671,  0.5777161 , -0.87193397, -0.69034726,
         0.88556005,  0.10482444, -0.39104396]])

In [None]:
# reshape returns a new array; the original arr keeps its shape
arr.shape
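Since `reshape` does not modify the array in place, keep the result by assigning it to a name. The total number of elements must stay the same, so an incompatible shape raises `ValueError`. A minimal sketch (`arr2` and `reshape_failed` are illustrative names):

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

# reshape returns a new array object; arr itself is unchanged
flat = arr.reshape(1, 16)
print(arr.shape)  # (4, 4)

# To keep the new shape, assign the result to a name
arr2 = arr.reshape(2, -1)  # shape (2, 8)

# 3 * 5 != 16, so this shape is rejected
try:
    arr.reshape(3, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```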

### Array Indexing / Slicing / Selection

In [28]:
arr = np.arange(0, 11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [29]:
# Get a value at an index
arr[8]

8

In [30]:
# Get values in a range
arr[1: 5]

array([1, 2, 3, 4])

In [31]:
# Select elements by a condition
arr[arr > 5]

array([ 6,  7,  8,  9, 10])
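Conditional selection works in two steps: the comparison produces a boolean array (a mask), and indexing with the mask keeps the positions where it is `True`. Multiple conditions are combined with `&` and `|` rather than `and`/`or`:

```python
import numpy as np

arr = np.arange(0, 11)

# The condition itself evaluates to a boolean array of the same shape
mask = arr > 5

# Indexing with the mask keeps only the positions where it is True
selected = arr[mask]

# Combine conditions with & and |, wrapping each one in parentheses
between = arr[(arr > 2) & (arr < 6)]  # array([3, 4, 5])
```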

In [32]:
arr_2d = np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9]))
arr_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [33]:
# Index a single row
arr_2d[1]

array([4, 5, 6])

In [34]:
# Get individual element value [row][col]
arr_2d[1][2]

6

In [35]:
# Get individual element value [row, col]
arr_2d[1, 2]

6

In [36]:
# Get slice of 2d array
arr_2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [37]:
arr_2d[2, :]

array([7, 8, 9])
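The same `[row, col]` syntax selects columns when `:` is placed in the row position, and a list of indices picks specific rows in a chosen order. A short sketch:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# ":" in the row position selects every row, so this is column 1
col = arr_2d[:, 1]        # array([2, 5, 8])

# Negative indices count from the end: the last column
last_col = arr_2d[:, -1]  # array([3, 6, 9])

# A list of indices selects rows in the given order
rows = arr_2d[[2, 0]]     # array([[7, 8, 9], [1, 2, 3]])
```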

### Broadcasting

In [50]:
arr = np.arange(0, 10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [51]:
# broadcast a scalar to every position in an index range
arr[0:5] = 99
arr

array([99, 99, 99, 99, 99,  5,  6,  7,  8,  9])
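The cell above broadcasts a scalar into a slice. More generally, broadcasting lets arithmetic combine arrays of different shapes: shapes are compared from the rightmost dimension, and two dimensions are compatible when they are equal or one of them is 1. A minimal sketch with a column vector and a row vector:

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,)

# Aligned from the right, (3, 1) and (3,) broadcast to (3, 3):
# the size-1 axis of col is stretched across the columns and row is
# reused for every row, without first building repeated copies of the inputs
table = col + row
# array([[ 1,  2,  3],
#        [11, 12, 13],
#        [21, 22, 23]])

# A scalar broadcasts against any shape
scaled = row * 10  # array([10, 20, 30])
```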

In [52]:
# Reset array
arr = np.arange(0,11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [53]:
# create a slice of the array; the slice is a view, not a copy
arr_slice = arr[0:5]
arr_slice

array([0, 1, 2, 3, 4])

In [54]:
# broadcast a scalar to every element of the slice
arr_slice[:] = 99
arr_slice

array([99, 99, 99, 99, 99])

In [55]:
# ORIGINAL ARRAY
arr

array([99, 99, 99, 99, 99,  5,  6,  7,  8,  9, 10])

**Changes also occur in our original array!** This is because slicing does not copy the data: the sliced array is just a view of the original array, which avoids duplicating large arrays in memory. Use `.copy()` when you need an independent array. 

In [56]:
# Make a copy of an array
arr_2 = arr.copy()
arr_2

array([99, 99, 99, 99, 99,  5,  6,  7,  8,  9, 10])
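To check whether an array is a view or an independent copy, look at its `base` attribute (the array that owns the data, or `None` for an owner) or call `np.may_share_memory`. A sketch:

```python
import numpy as np

arr = np.arange(0, 11)
arr_slice = arr[0:5]         # basic slice: a view of arr
arr_copy = arr[0:5].copy()   # independent copy

# A view records the array that owns its data in .base
print(arr_slice.base is arr)  # True
print(arr_copy.base is None)  # True

# np.may_share_memory is a quick, conservative overlap check
print(np.may_share_memory(arr, arr_slice))  # True
print(np.may_share_memory(arr, arr_copy))   # False

# Writing to the copy leaves arr untouched
arr_copy[:] = -1
```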

### Array Operations

In [38]:
arr = np.arange(0, 11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [39]:
arr + arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [40]:
arr * arr

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

In [41]:
arr / arr # integer division under Python 2.7; expect a RuntimeWarning for 0/0 at index 0

array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [42]:
1 / arr # integer division under Python 2.7; expect a RuntimeWarning for 1/0 at index 0

array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
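The integer results above come from Python 2.7, where `/` between integer arrays is integer division. Under Python 3, `/` is true division and returns floats, with `nan` for 0/0 and `inf` for 1/0. The sketch below makes the choice explicit so it behaves the same on both versions, and uses `np.errstate` to silence the warnings we already expect:

```python
import numpy as np

arr = np.arange(0, 11)

# np.errstate temporarily suppresses divide-by-zero / invalid-value
# warnings inside the with-block
with np.errstate(divide='ignore', invalid='ignore'):
    # // is floor division and keeps the integer dtype
    floor_result = arr // arr
    # np.true_divide always returns floats
    ratio = np.true_divide(arr, arr)     # 0/0 -> nan at index 0
    reciprocal = np.true_divide(1, arr)  # 1/0 -> inf at index 0
```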

In [43]:
arr * 2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [44]:
arr**2

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

In [45]:
np.sqrt(arr) # square root

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ,
        3.16227766])

In [46]:
np.exp(arr) # e raised to the power of each element

array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03,   2.20264658e+04])

In [47]:
np.log(arr) # natural log; log(0) is -inf and triggers a RuntimeWarning

array([       -inf,  0.        ,  0.69314718,  1.09861229,  1.38629436,
        1.60943791,  1.79175947,  1.94591015,  2.07944154,  2.19722458,
        2.30258509])
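The `-inf` and the warning come from `log(0)`. Two ways to avoid them are sketched below: take the log of the positive elements only, or keep the full shape and compute the log only where `arr > 0` using the standard ufunc `out` and `where` keyword arguments (supported in reasonably recent Numpy versions). Positions skipped by `where` keep whatever `out` already held, so it is pre-filled with `nan` here as an assumed placeholder.

```python
import numpy as np

arr = np.arange(0, 11)

# Option 1: drop the zero first; the result has 10 finite values
logs_positive = np.log(arr[arr > 0])

# Option 2: keep the original shape, writing results only where arr > 0
safe = np.full(arr.shape, np.nan)
np.log(arr, out=safe, where=arr > 0)
```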

In [48]:
np.max(arr) # same as arr.max()

10
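Aggregate functions such as `sum`, `max` and `mean` reduce over every element by default; on 2-D arrays the `axis` argument chooses which dimension to collapse. A short sketch:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# Without an axis, the aggregate covers every element
total = np.sum(m)          # 21, same as m.sum()

# axis=0 collapses the rows (one result per column),
# axis=1 collapses the columns (one result per row)
col_max = m.max(axis=0)    # array([4, 5, 6])
row_mean = m.mean(axis=1)  # array([2., 5.])
```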