---
# Crash Course Python for Data Science - Intro to Python  
---
# 06 - Numpy
---



## What is Numpy, anyway?

Numpy, short for Numerical Python, is a staple in every data scientist's tool kit. It's a powerful Python library that comes in handy when doing complex calculations on lots of data.

## Why am I learning it?
A couple of reasons:  

1.   **It's fast** - Numpy is designed to work incredibly fast. This is made possible by an object we mentioned briefly at the end of the last module: an *array*. Think of it as a more powerful Python list.
2.   **Python lists are limited** - Python can't handle certain operations on lists without requiring you to write some annoyingly complex functions. As we've already learned, however, that's where libraries come in! Numpy is a perfect example. 


In [None]:
# Let's illustrate with an example. 
# We have a list of weights in kgs, but we want to turn them into lbs.
# In order to do so, we need to multiply the values by 2.205.

kgs = [70, 22.2, 59, 86, 68.8]
constant = 2.205

# This line raises a TypeError: a list can't be multiplied by a float.
kgs * constant

In [None]:
# First, we install Numpy (skip this step if it's already installed)
!pip install numpy

In [None]:
# Now let's see what Numpy can do!
import numpy as np

np_kgs = np.array([70, 22.2, 59, 86, 68.8])

np_kgs * constant
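
To see what Numpy is saving us from, here is a small sketch of the same conversion done both ways. It reuses the `kgs` and `constant` values from above; the names `lbs_list` and `lbs_array` are made up for this illustration.

```python
import numpy as np

kgs = [70, 22.2, 59, 86, 68.8]
constant = 2.205

# A plain list refuses to be multiplied by a float.
try:
    kgs * constant
except TypeError as error:
    print("List version failed:", error)

# Plain Python workaround: multiply each element one at a time.
lbs_list = [kg * constant for kg in kgs]

# Numpy: multiply the whole array in one step.
lbs_array = np.array(kgs) * constant

print(lbs_list)
print(lbs_array)
```

Both approaches give the same numbers. The Numpy version just reads more like the math we had in mind.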

### That's not all. What if we want to do calculations with multiple lists?
Let's try and calculate the basal metabolic rate (BMR) of five men. Here's the formula:  

88.362 + (13.397 x Weight in kg) + (4.799 x Height in cm) - (5.677 x Age in years)

In [None]:
# Let's create lists for weight, height, and age:

weights = [70.2, 85.6, 75.9, 79.0, 82.7]
heights = [170.9, 188.5, 181.2, 179.9, 174.3]
ages = [23, 35, 31, 28, 43]


# You'll notice that if we use Python lists, the following line throws a TypeError
88.362 + (13.397 * weights) + (4.799 * heights) - (5.677 * ages)

In [None]:
# Numpy has a nifty function to convert regular Python lists into Numpy arrays:
np_weights = np.asarray(weights)
np_heights = np.asarray(heights)
np_ages = np.asarray(ages)

88.362 + (13.397 * np_weights) + (4.799 * np_heights) - (5.677 * np_ages)
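
For comparison, here is a sketch of how the same BMR calculation could be written with plain lists. It assumes the `weights`, `heights`, and `ages` lists from above; `bmr_list` and `bmr_array` are names invented for this example.

```python
import numpy as np

weights = [70.2, 85.6, 75.9, 79.0, 82.7]
heights = [170.9, 188.5, 181.2, 179.9, 174.3]
ages = [23, 35, 31, 28, 43]

# Plain Python: walk through the three lists in parallel with zip().
bmr_list = [
    88.362 + (13.397 * w) + (4.799 * h) - (5.677 * a)
    for w, h, a in zip(weights, heights, ages)
]

# Numpy: the same formula applied to whole arrays at once.
bmr_array = (
    88.362
    + (13.397 * np.asarray(weights))
    + (4.799 * np.asarray(heights))
    - (5.677 * np.asarray(ages))
)

print(bmr_list)
print(bmr_array)
```

The list version works, but we have to spell out the loop ourselves. With arrays, the formula looks almost exactly like the one written above.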

### To Recap:
Numpy is designed for mathematical operations that Python normally can't handle without special functions. Moreover, it's super-duper fast!
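
If you'd like to check the speed claim yourself, here is a rough sketch using Python's built-in `timeit` module. The data and the variable names are made up for this illustration, and the exact timings will vary from machine to machine.

```python
import timeit
import numpy as np

# Hypothetical sample data: 100,000 values, as a list and as an array.
values_list = list(range(100_000))
values_array = np.arange(100_000)

# Double every value 20 times over, first with a list, then with an array.
list_seconds = timeit.timeit(lambda: [v * 2 for v in values_list], number=20)
array_seconds = timeit.timeit(lambda: values_array * 2, number=20)

print(f"list:  {list_seconds:.4f} seconds")
print(f"array: {array_seconds:.4f} seconds")
```

On most machines the array version should finish noticeably sooner, though how much sooner depends on your setup.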


## Different type. Different behavior.
Numpy is great, but it's not perfect. Think of Numpy arrays as a new type of object with their own methods and behavior.  

In [None]:
# For example, check out these two operations:
ages + ages

In [None]:
np_ages + np_ages

In the first example, the two regular lists are concatenated (joined) together.  

In the second example, Numpy performs an element-wise addition.
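
Multiplication shows the same split in behavior. Here is a quick sketch that reuses the `ages` list from above; `doubled_list` and `doubled_array` are names made up for this example.

```python
import numpy as np

ages = [23, 35, 31, 28, 43]
np_ages = np.asarray(ages)

doubled_list = ages * 2      # repeats the list, giving 10 elements
doubled_array = np_ages * 2  # doubles each value, still 5 elements

print(len(doubled_list))       # 10
print(doubled_array.tolist())  # [46, 70, 62, 56, 86]
```

Same operator, different type, different result. It's worth checking which type you're holding before doing math on it.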

## Subsetting & Numpy Arrays
How can we retrieve certain values from our arrays?

Using the index is the same for lists and arrays:

In [None]:
ages[1]

In [None]:
np_ages[1]

But arrays also accept what's known as **boolean indexing**.
Remember booleans? Booleans are a binary data type (as in, True or False). 

In [None]:
# Let's declare an array to hold our BMR values:
bmr = 88.362 + (13.397 * np_weights) + (4.799 * np_heights) - (5.677 * np_ages)
bmr

In [None]:
# Tells us which BMRs meet the condition "Greater than 1900"
# by returning an array with boolean values
bmr > 1900

In [None]:
condition = bmr > 1850
print(condition)

In [None]:
bmr[condition]

In [None]:
# Similar, but the square brackets translate to this: "In bmr, return the element(s) that
# are greater than 1850."
bmr[bmr > 1850]
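
Here is boolean indexing broken down into its steps, as a small sketch. The `bmr` values are hard-coded here (rounded to two decimals from the formula and data above) so the example runs on its own; `mask` and `selected` are names invented for this illustration.

```python
import numpy as np

# BMR values for the five men, rounded to two decimals.
bmr = np.array([1718.41, 1941.06, 1798.79, 1851.11, 1788.65])

# Step 1: the comparison produces one True/False per element.
mask = bmr > 1850
print(mask.tolist())      # [False, True, False, True, False]

# Step 2: the mask keeps only the elements where it is True.
selected = bmr[mask]
print(selected.tolist())  # [1941.06, 1851.11]

# Bonus: True counts as 1, so summing the mask counts the matches.
print(mask.sum())         # 2
```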

## Dimensionality

Nested lists are helpful for organizing our values in the form of a grid, and grid-like forms can be extremely valuable for holding data. Think about spreadsheets, which have rows and columns. If we organize our data in a similar way, we can do operations in Python that are very similar to what we could do with spreadsheets or even matrices.

In [None]:
## What are the dimensions of this list?
heights

There's only one dimension, or 1D for short. 

In [None]:
## How about this nested list?

x = [
    [4, 8, 16],
    [5, 10, 20]
]

x

This is a list inside of a list, or a nested list (or even a list of lists). That's two dimensions, or 2D for short.

In [None]:
np_x = np.asarray(x)
np_x

In [None]:
type(np_x)

"ndarray" refers to N-dimensional array.

In [None]:
np_x.shape

Two rows, three columns!

Wait...so is **shape** a method?

No! Shape is an _attribute_. Think of attributes as features of a data structure. And remember that functions and methods are called with () at the end, while attributes are not.

*  np.asarray() is a Numpy function that turns a regular Python list into a Numpy array
*  shape is an ndarray attribute (as in np_x.shape) that returns the dimensions of said ndarray
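
Here is a short sketch of the difference in practice, using the same nested list `x` as above. `ndim` and `size` are two more ndarray attributes, and `sum()` is an ndarray method, shown here only for contrast.

```python
import numpy as np

x = [
    [4, 8, 16],
    [5, 10, 20],
]
np_x = np.asarray(x)

# Attributes: no parentheses.
print(np_x.shape)  # (2, 3) -> two rows, three columns
print(np_x.ndim)   # 2      -> number of dimensions
print(np_x.size)   # 6      -> total number of elements

# Method: parentheses required.
print(np_x.sum())  # 63
```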

## Subsetting, continued

In [None]:
# Numpy also has really awesome random number generators:
np.random.randint(5, size=(2, 4))

# Every time you run this cell, it'll output different numbers.

# Confused about what that was? Remember the built-in help() function!

In [None]:
# Let's create a random (3,6) array and assign it to a variable to keep its values.

# I'm going to set a "random seed" so that I get the *same* random numbers every time.
np.random.seed(42)

y = np.random.randint(5, size=(3, 6))

In [None]:
y
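
To convince yourself that the seed really pins down the "random" numbers, here is a small sketch. The names `first` and `second` are made up for this illustration.

```python
import numpy as np

np.random.seed(42)
first = np.random.randint(5, size=(3, 6))

np.random.seed(42)  # reset to the same seed
second = np.random.randint(5, size=(3, 6))

# Same seed, same numbers.
print(np.array_equal(first, second))  # True

# randint(5, ...) draws whole numbers from 0 up to, but not including, 5.
print(first.min(), first.max())
```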

So we have three rows and six columns. How might we subset a specific value now?

Let's say we want the value in the 3rd row and the 2nd column (value is 1).

In [None]:
y[2,1]

Remember that Python is 0-indexed! (Since it starts at 0, the 3rd element is at index position 2.)

In [None]:
# Alternatively, we could also do...
y[2][1]
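
Both forms return the same value, but they get there differently. Here is a sketch using a hypothetical stand-in array called `grid` (the numbers 0 through 17) so the values are easy to follow.

```python
import numpy as np

# Stand-in for y: 0-17 arranged in 3 rows and 6 columns.
grid = np.arange(18).reshape(3, 6)

# grid[2][1] happens in two steps: first pull out the whole third row...
third_row = grid[2]
print(third_row.tolist())  # [12, 13, 14, 15, 16, 17]

# ...then take the second element of that row.
print(third_row[1])        # 13

# grid[2, 1] looks up row and column in a single step.
print(grid[2, 1])          # 13
```

The comma form is the usual choice with Numpy arrays. Note that it won't work on a regular nested list, which only understands the two-step form.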

### Slicing, revisited!

In [None]:
# Ok, let's say we want only the first two columns of each row:
y[:,0:2]

Remember that slicing ranges have an inclusive start and an exclusive end. Meaning the start value is included in the slicing, but the end value is not.  

A colon on its own, on the other hand, selects everything along that dimension (every row or every column).

In [None]:
# Now let's just get the third row:
y[2,:]
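
Here are the same slicing rules on an array whose values are easy to predict. This is a sketch with a made-up stand-in array called `grid`; the other variable names are also invented for this illustration.

```python
import numpy as np

# Stand-in for y: 0-17 arranged in 3 rows and 6 columns.
grid = np.arange(18).reshape(3, 6)

# Every row, columns 0 and 1 (the end value 2 is excluded).
first_two_columns = grid[:, 0:2]
print(first_two_columns.tolist())  # [[0, 1], [6, 7], [12, 13]]

# Third row, every column.
third_row = grid[2, :]
print(third_row.tolist())          # [12, 13, 14, 15, 16, 17]

# Slices can be used on both dimensions at once.
middle_block = grid[0:2, 2:4]
print(middle_block.tolist())       # [[2, 3], [8, 9]]
```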

## Summary Statistics
A numerical library wouldn't be much good if it couldn't give us some basic stats about our data, now would it?

In [None]:
# Calculate the average of an array

np.mean(bmr)

In [None]:
# Median
np.median(bmr)

In [None]:
# Standard Deviation
np.std(bmr)
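
To see what these functions compute, here is a sketch that checks them by hand on a tiny made-up array called `values`. One thing to keep in mind: by default `np.std` divides by the number of elements (the population standard deviation). Passing `ddof=1` gives the sample version instead.

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Mean: the sum divided by the number of elements.
manual_mean = values.sum() / values.size
print(manual_mean, np.mean(values))  # 5.0 5.0

# Median: the middle of the sorted values (here, halfway between 4 and 5).
print(np.median(values))             # 4.5

# Standard deviation: square root of the average squared distance from the mean.
manual_std = np.sqrt(((values - manual_mean) ** 2).sum() / values.size)
print(manual_std, np.std(values))    # 2.0 2.0
```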

### Remember to put it all together in practice by completing the exercise!