### Introduction to NumPy (Numerical Python)
- We can think of all data as arrays of numbers
    - images: 2d array of numbers representing pixel brightness
    - sound clips: 1d arrays of intensity vs. time
- The first step to making data "analyzable" is to transform it into arrays of numbers
- NumPy arrays form the core of almost all data science tools in Python


### Understanding Data Types in Python
- How can we store and manipulate data?
- **Note:** in Python, data types are *dynamically inferred*, unlike Java, where you must declare every variable's type explicitly
    - Python variables have "type-flexibility": each one carries type information in addition to its value
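
As a small illustration of this type flexibility, the sketch below rebinds one name to values of two different types. The helper names `type_as_int` and `type_as_str` are only for illustration.

```python
# The same name can be bound to objects of different types.
x = 100
type_as_int = type(x)   # int

x = "one hundred"
type_as_str = type(x)   # str

print(type_as_int, type_as_str)
```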

What is actually happening when I type:


In [1]:
x = 100

- x becomes a pointer to a *compound C structure* with these components
    - ob_refcnt: reference count for handling memory
    - ob_type: encodes type of variable
    - ob_size: size of the data members
    - ob_digit: contains the actual integer value the variable should represent

**This means there is a lot of overhead even when you just assign an int!**

- A Python integer is more like an object than a primitive type: the variable is a pointer to a position in memory that holds all of the object's data
- Consider a list with LOTS of data in it.
    - In Python, the elements don't have to be the same type because each variable keeps track of lots of info to make it dynamic
    - If all of the elements in our list are going to be the same type, that's a lot of overhead. We should use something more efficient
        - A **NumPy array** is a fixed-type array
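
A rough way to see this overhead is to compare the size of one Python int object with the bytes a NumPy array uses per element. This is a minimal sketch assuming CPython; the exact size reported by `sys.getsizeof` depends on the interpreter and platform, so it is only printed here. The variable names are illustrative.

```python
import sys
import numpy as np

# A Python list can mix types because every element is a full object.
mixed = [1, "two", 3.0, True]
element_types = [type(item) for item in mixed]

# Each Python int carries object overhead beyond its raw value.
# The exact size is interpreter- and platform-dependent.
int_object_bytes = sys.getsizeof(100)

# A NumPy array stores one fixed type; int64 is chosen explicitly here
# so that each element takes exactly 8 bytes.
arr = np.array([2, 4, 6, 8], dtype=np.int64)

print(element_types)
print(int_object_bytes, "bytes for one Python int object")
print(arr.itemsize, "bytes per array element")
```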

### Fixed-Type Arrays in Python
#### Creating Arrays from Python Lists

In [2]:
import numpy as np

#create an array
#remember: all elements of an array must have the same type
np.array([2,4,6,8])

array([2, 4, 6, 8])

In [25]:
#What happens if we mix different types?
np.array(['hi', 4, True])

array(['hi', '4', 'True'], 
      dtype='<U4')

In [26]:
#NumPy will upcast if possible! Above, each element was cast to a string.
#Here, the ints should be cast to floats
np.array([4.8, 9, 7.2, 3, 5, 6])

array([ 4.8,  9. ,  7.2,  3. ,  5. ,  6. ])
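
One way to check which type NumPy settled on is the array's `dtype` attribute. The sketch below inspects the `kind` code rather than the full dtype, because the default integer width can differ between platforms. The variable names are illustrative.

```python
import numpy as np

ints = np.array([2, 4, 6, 8])
floats = np.array([4.8, 9, 7.2, 3, 5, 6])
strings = np.array(['hi', 4, True])

# dtype.kind is a one-letter code: 'i' integer, 'f' float, 'U' unicode string.
print(ints.dtype.kind)     # i
print(floats.dtype.kind)   # f
print(strings.dtype.kind)  # U
```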

In [5]:
#You can also explicitly choose the data type
#Note: I'm casting floats to ints. The fractional part is dropped (4.8 becomes 4), not rounded
np.array([4.8, 9, 7.2, 3, 5, 6], dtype= 'int')

array([4, 9, 7, 3, 5, 6])
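
To make the truncation explicit, the sketch below compares a direct cast with rounding first. It uses `astype` and `np.rint`, which the notes above do not mention, and a negative value is included to show that the cast truncates toward zero.

```python
import numpy as np

values = np.array([4.8, 7.2, -4.8])

# Converting to int drops the fractional part (truncates toward zero).
truncated = values.astype(int)

# Rounding first gives the nearest integer instead.
rounded = np.rint(values).astype(int)

print(truncated)  # values 4, 7, -4
print(rounded)    # values 5, 7, -5
```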

In [6]:
#You can make numpy arrays multidimensional
#Inner lists are treated as rows of the 2d array
np.array([[1,2,3], [4,5,6], [7,8,9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
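
As a quick check that the inner lists really become rows, the sketch below looks at the array's shape and pulls out one row and one element. The name `grid` is only for illustration.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(grid.shape)   # (3, 3): three rows, three columns
print(grid[1])      # second row: values 4, 5, 6
print(grid[1, 2])   # row index 1, column index 2: 6
```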

#### Creating Arrays from Scratch
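- np.zeros(shape) creates an array of zeros
- np.ones(shape) creates an array of ones
- np.full(shape, n) creates an array in which every element is n
- np.arange(start, finish, step_size) creates an array filled with a linear sequence that starts at start and advances by step_size, stopping before finish (finish itself is excluded)
- np.linspace(start, finish, n) creates an array of n values evenly spaced from start to finish, with both endpoints included by default

etc.

np.zeros(), np.ones(), and np.full() take the shape first, followed by an optional dtype: ((n,m), dtype= ). np.full() takes its fill value right after the shape.

A shape of (n,m) creates an nxm matrix of the specified dtype; a single integer n creates a 1d array of length n.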

In [7]:
#Examples
np.zeros((5,2), int)

array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]])

In [8]:
np.ones(6, float)

array([ 1.,  1.,  1.,  1.,  1.,  1.])

In [13]:
np.full((5,5), 13)

array([[13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13]])

In [19]:
np.linspace(1,16,4, dtype=int)

array([ 1,  6, 11, 16])
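
np.arange() is listed above but not demonstrated, so here is a minimal sketch contrasting it with np.linspace(). The start, finish, and step values are arbitrary choices for illustration.

```python
import numpy as np

# arange steps from start by step_size and stops before finish.
stepped = np.arange(0, 10, 2)

# linspace returns exactly n values and includes finish by default.
spaced = np.linspace(0, 10, 5)

print(stepped)  # values 0, 2, 4, 6, 8
print(spaced)   # values 0, 2.5, 5, 7.5, 10
```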

### DataCamp
- Lists are great, but you can't do arithmetic over an entire list at once!
    - Element-wise calculation over whole collections is essential to data science
- NumPy arrays can perform calculations fast over entire arrays
    - NumPy arrays can only hold ONE type of variable


In [21]:
#Example 1: Calculating BMI

height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

#NumPy performs the following calculation element by element, pairing values at matching indexes in the two arrays
bmi = weight/(height**2)
print(bmi)

[ 21.85171573  20.97505669  21.75028214  24.7473475   21.44127836]
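
For contrast, the sketch below tries the same calculation with plain lists (using the first three values from the example). Dividing one list by another raises a TypeError, so the list version has to loop over pairs explicitly. The variable names are illustrative.

```python
import numpy as np

height_list = [1.73, 1.68, 1.71]
weight_list = [65.4, 59.2, 63.6]

# Plain lists do not support element-wise arithmetic.
try:
    weight_list / height_list
    list_division_failed = False
except TypeError:
    list_division_failed = True

# With lists, the calculation has to be written element by element.
bmi_from_lists = [w / h ** 2 for w, h in zip(weight_list, height_list)]

# With arrays, one expression covers every element.
bmi_from_arrays = np.array(weight_list) / np.array(height_list) ** 2

print(list_division_failed)
print(np.allclose(bmi_from_lists, bmi_from_arrays))
```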


In [33]:
#We can still index np.arrays like a list
print(bmi[3])

#Now, though, we can use booleans to filter the values of an array!
#The following returns an array of booleans. True means the element is greater than 21
print(bmi>21)

#You can use this comparison as an index to get the matching values of the array instead of the booleans
#Use the result of a comparison to make a selection from your data!
print(bmi[bmi>21])
print(bmi[bmi>23])

24.7473474987
[ True False  True  True  True]
[ 21.85171573  21.75028214  24.7473475   21.44127836]
[ 24.7473475]
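
Building on the filtering above, here is a minimal sketch of two common follow-ups: counting the matches and combining two conditions. The BMI values are copied from the printed output above, and the variable names are illustrative.

```python
import numpy as np

bmi = np.array([21.85171573, 20.97505669, 21.75028214, 24.7473475, 21.44127836])

# Store the comparison result so the mask can be reused.
above_21 = bmi > 21

# True counts as 1, so summing the mask counts the matches.
count_above_21 = above_21.sum()

# Combine conditions with & (and), wrapping each comparison in parentheses.
between_21_and_23 = bmi[(bmi > 21) & (bmi < 23)]

print(count_above_21)      # 4
print(between_21_and_23)
```

The parentheses matter because & binds more tightly than the comparison operators.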


In [None]:
#Example 2: 