## Numpy lab

Numpy is one of the main libraries for scientific computing in Python. It provides a high-performance multi-dimensional array object, along with tools for working with these arrays.

A numpy array stores a grid of values, all of the same type. Numpy arrays are n-dimensional, and the number of dimensions is called the *rank* of the array. The shape of an array is a tuple of integers that holds the size of the array along each dimension.
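As a quick preview of these terms, here is a minimal sketch on a small array. The name `grid` and its values are made up for illustration and are not part of the lab data.

```python
import numpy as np

# A small 2-D array: 2 rows, 3 columns (arbitrary illustration values)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

rank = grid.ndim      # number of dimensions
shape = grid.shape    # size along each dimension

print(rank)           # 2
print(shape)          # (2, 3)
print(grid.dtype)     # one integer type shared by all elements
```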

For more information on numpy, see http://www.numpy.org/.


To work with numpy arrays, you should import `numpy` first. The naming convention for `numpy` is to import it as `np`.

## Numpy array creation and basic operations

In [2]:
import numpy as np

Now, create your first numpy array. Often, you start from a list and then convert it to a numpy array. Begin with a list containing the populations of the five biggest American cities (in millions): 8.6, 3.9, 2.7, 2.1 and 1.6 respectively. Store it in `city_pop`.

For your information, these cities are NYC, LA, Chicago, Houston and Philadelphia respectively.

In [18]:
city_pop = [8.6, 3.9, 2.7, 2.1, 1.6]

Look at the type of `city_pop` and confirm that it is a list.

In [19]:
type(city_pop)

list

Now, convert `city_pop` to a numpy array and store it in `city_pop_np`.

In [20]:
city_pop_np = np.array(city_pop)

Confirm the data type again.

In [36]:
type(city_pop_np)

numpy.ndarray

Now we want the array to express the population as an actual head count rather than in millions. You can easily do this by multiplying the entire array by 1000000.

In [37]:
city_pop_m_np = city_pop_np * 1000000
city_pop_m_np

array([8600000., 3900000., 2700000., 2100000., 1600000.])
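For contrast, the sketch below shows what the `*` operator does to a plain list compared with a numpy array. The short list `values` is a made-up two-element example, not the full lab data.

```python
import numpy as np

values = [8.6, 3.9]

# On a list, * repeats the list instead of scaling its elements
repeated = values * 2
print(repeated)                 # [8.6, 3.9, 8.6, 3.9]

# On a numpy array, * multiplies every element by the scalar
scaled = np.array(values) * 2
print(scaled)                   # [17.2  7.8]
```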

This was easy! Now let's look at another numpy array which stores the surface area of these five cities (in square miles). 

In [38]:
city_size_np = np.array([468, 503, 234, 627, 142])
city_size_np

array([468, 503, 234, 627, 142])

We're interested in finding the population density for the five cities. This is one of the perks of numpy arrays: you simply divide one array by the other, and the division is applied element by element!

In [40]:
city_density_np = city_pop_m_np / city_size_np
city_density_np

array([18376.06837607,  7753.47912525, 11538.46153846,  3349.28229665,
       11267.6056338 ])

It is no surprise that NYC is very densely populated, whereas the population density of Houston is much lower.
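To see why element-wise division is convenient, the sketch below computes the same densities twice: once with plain lists and once with numpy arrays. The names `pop`, `area`, `density_list` and `density_np` are chosen for this illustration only.

```python
import numpy as np

pop = [8600000.0, 3900000.0, 2700000.0, 2100000.0, 1600000.0]
area = [468, 503, 234, 627, 142]

# With plain lists, the division has to be spelled out pair by pair
density_list = [p / a for p, a in zip(pop, area)]

# With numpy arrays, the / operator already works element-wise
density_np = np.array(pop) / np.array(area)

print(np.allclose(density_list, density_np))  # True
```

Dividing the two plain lists directly (`pop / area`) would raise a `TypeError`, which is why the list version needs the explicit loop.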

## Numpy array subsetting

Imagine you want to select certain elements from a numpy array. This can easily be done using square brackets. Now, go ahead and select the fourth element in the `city_density_np` array.

In [48]:
density_houston_np = city_density_np[3]
density_houston_np

3349.282296650718

Similarly, you can use colons to select a range of items. Use a colon to select the first three densities.

In [52]:
first_three_np = city_density_np[0:3]
first_three_np 

array([18376.06837607,  7753.47912525, 11538.46153846])
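A few more slicing patterns, sketched on a rounded copy of the densities (the array `densities` below is re-typed by hand for this example):

```python
import numpy as np

# Rounded densities for NYC, LA, Chicago, Houston and Philadelphia
densities = np.array([18376.07, 7753.48, 11538.46, 3349.28, 11267.61])

first_three = densities[:3]    # same as densities[0:3]; index 3 is excluded
last_two = densities[-2:]      # negative indices count from the end
every_other = densities[::2]   # a third value sets the step

print(first_three)
print(last_two)
print(every_other)
```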

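Next, we want to select the densities that are higher than 10000 and store them in `high_density_np`. Comparing an array with a number returns an array of `True`/`False` values, which can then be used to select elements.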

In [47]:
high = city_density_np > 10000
high_density_np = city_density_np[high]
high_density_np

array([18376.06837607, 11538.46153846, 11267.6056338 ])

Great!
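A boolean array built from one array can also select from another array of the same length. The sketch below uses this to look up which cities pass a density condition; `city_names`, `mask` and the rounded `densities` are names and values introduced here for illustration.

```python
import numpy as np

city_names = np.array(["NYC", "LA", "Chicago", "Houston", "Philadelphia"])
densities = np.array([18376.07, 7753.48, 11538.46, 3349.28, 11267.61])

mask = densities > 10000            # one True/False value per city
high_density_names = city_names[mask]
print(high_density_names)           # ['NYC' 'Chicago' 'Philadelphia']

# Conditions can be combined with & (and) and | (or); the parentheses are required
mid_density_names = city_names[(densities > 5000) & (densities < 12000)]
print(mid_density_names)            # ['LA' 'Chicago' 'Philadelphia']
```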

## Higher dimensional numpy arrays

`city_pop_m_np`, `city_size_np`, and `city_density_np` are one-dimensional numpy arrays. You can verify this with the `.shape` attribute.

In [56]:
city_pop_m_np.shape

(5,)

This indicates that we have a 1-dimensional numpy array with 5 elements. Now, let's group our three arrays together (population, then size, then density) and see what happens if we call `.shape` again.

In [61]:
cities_np = np.array([city_pop_m_np, city_size_np, city_density_np])

In [62]:
cities_np.shape

(3, 5)

This numpy array has 3 rows and 5 columns!
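One detail worth noticing when grouping arrays: because all values in a numpy array must share one type, integer rows are converted to floats when they are combined with float rows. A minimal sketch with shortened two-city arrays (`pop`, `area` and `stacked` are illustrative names):

```python
import numpy as np

pop = np.array([8600000., 3900000.])   # float64
area = np.array([468, 503])            # integer dtype

stacked = np.array([pop, area])

print(stacked.ndim)    # 2
print(stacked.shape)   # (2, 2)
print(stacked.dtype)   # float64: the integers were converted to floats
```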

## Subsetting in higher dimensional numpy arrays

In higher dimensional numpy arrays, you can use subsetting as well. However, square brackets with a single entry select an entire row. Let's see how it works.

In [64]:
first_row = cities_np[0]
first_row

array([8600000., 3900000., 2700000., 2100000., 1600000.])
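To reach columns or single values, you can pass a row index and a column index separated by a comma. The sketch below rebuilds a two-row version of the array under the hypothetical name `cities`, so that it runs on its own.

```python
import numpy as np

cities = np.array([
    [8600000., 3900000., 2700000., 2100000., 1600000.],  # population
    [468., 503., 234., 627., 142.],                      # area in square miles
])

nyc_column = cities[:, 0]         # all rows, first column (NYC)
houston_area = cities[1, 3]       # row 1 (area), column 3 (Houston)
first_two_cities = cities[:, :2]  # all rows, first two columns

print(nyc_column)
print(houston_area)
print(first_two_cities.shape)     # (2, 2)
```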

## Sources

http://cs231n.github.io/python-numpy-tutorial/#numpy

http://www.numpy.org/

