# Numpy



In [0]:
# import the library to get started
import numpy as np

Numpy is the workhorse underneath many of the tools that we will be using.  At their core, numpy arrays look a lot like the `lists` that we worked through last week, except that they have additional properties:

1.  numpy arrays can be multi-dimensional (think: Multiple columns in a spreadsheet)

2.  numpy arrays allow us to do calculations on every element of the array at the same time (we can't do this with lists).  These are **element-wise** calculations.

3.  unlike lists, numpy arrays hold (or will be converted to) a single **type**

>  ***From above, you see that I imported numpy as np.  This is the standard convention.***
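As a quick taste of the three properties listed above, here is a minimal sketch. The names `grid`, `doubled`, and `mixed` are made up for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# 1. multi-dimensional: a list of lists becomes a 2-D array
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)    # number of dimensions

# 2. element-wise: the calculation is applied to every element
doubled = grid * 2
print(doubled)

# 3. single type: mixing ints and a float converts everything to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)
```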

# Lists to Numpy arrays

In [0]:
# create a normal list
my_list = [1,2,4,5,6,7]

## print out the type and the values
print(type(my_list))
print(my_list)

In [0]:
# create a new list and perform some math on it
new_list = [1,2,3]
new_list

In [0]:
# multiply the list by 3
new_list * 3

The above shows that, just as `+` joins two lists together, `*` repeats the list rather than multiplying each element.
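To make the difference concrete, the sketch below puts `+` and `*` on plain lists side by side, and shows one way to multiply each element without numpy. `new_list` comes from the cell above; `joined`, `repeated`, and `tripled` are names invented for this example.

```python
# plain Python lists: + joins lists, * repeats them
new_list = [1, 2, 3]

joined = new_list + [4, 5]    # concatenation
repeated = new_list * 3       # repetition, not multiplication

# to multiply each element of a list, we have to loop ourselves
tripled = [value * 3 for value in new_list]

print(joined)
print(repeated)
print(tripled)
```

Having to write that loop by hand every time is part of the motivation for numpy arrays.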

In [0]:
# try this again
multiplier = [3]
new_list * multiplier

In [0]:
# We see that the above throws an error telling us we can't do this.

In [0]:
# numpy to the rescue
np_list = np.array(new_list)
np_list

In [0]:
# we converted our list to a numpy array. now let's try it
np_list * 3

While lists and tuples are powerful in base python, and we will use them later in the course, when it really comes to analyzing data, the foundations of the tools that we will use are in numpy.

Simply, the higher level tools use numpy under the hood, but it's important for us to have a baseline understanding of numpy before we dive into pandas.

# Numpy Arrays - Single Type Only

Because numpy arrays are built to do calculations, helping us with analyzing data, each array must be of a single type. 

In [0]:
# create a normal list
normal_list = [1, 'a', True]
normal_list

This does what we expected: it printed a list. If we wanted, we could even have a list within a list, as we learned last week.

In [0]:
# convert the list to numpy, as we saw above with np.array(<list>)
np_convert = np.array(normal_list)
np_convert

## Numpy Conversion Discussion

Look at what we have above.  Numpy needs to have all elements in an array be of the same type.  But it didn't error out.



> ***Discussion:  What did numpy do to the values?***
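One way to check your answer to the discussion question is to ask the array for its `dtype`. This is a small sketch that rebuilds `np_convert` from the cell above so it can run on its own.

```python
import numpy as np

np_convert = np.array([1, 'a', True])

# dtype tells us the single type numpy settled on
print(np_convert.dtype)

# converting back to a list shows what each element became
print(np_convert.tolist())
```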

# Core numpy features

## Creating NP arrays

In [0]:
# we have already seen a numpy array
np.array( [1,2,4] )


In [0]:
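# let's try bare values (note: this is NOT a tuple, it is three separate arguments)
np.array(1,2,3)

In [0]:
# numpy yells at the call above because the 2 and the 3 are read as extra arguments, not as data
# an actual tuple works just like a list
tup = 1,2,3
print(type(tup))
np.array(tup)

> Takeaway: numpy wants a single sequence (a list or a tuple) as its first argument.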



---



## nd arrays

In [0]:
# remember a list of lists?
lol = [ [1,2], [3,4], [5,6] ]
lol

In [0]:
# bring this into numpy
lol_np = np.array(lol)
lol_np

In [0]:
# what shape is our array
lol_np.shape

> Do you remember when we saw this last week? It's not exactly the same, but for now we can think of this as a 3 row / 2 column Excel worksheet.
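The worksheet analogy can be sketched by unpacking the shape. The array is rebuilt here so the example runs by itself; `n_rows` and `n_cols` are names made up for this illustration.

```python
import numpy as np

lol_np = np.array([[1, 2], [3, 4], [5, 6]])

# shape is a tuple with one entry per dimension: (rows, columns) for a 2-D array
n_rows, n_cols = lol_np.shape
print(n_rows, n_cols)
```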



In [0]:
# how many dimensions within the array
lol_np.ndim

In [0]:
# how many elements total, or the size
lol_np.size

In [0]:
# remember, we can always get help
lol_np?



---





## Slicing

In [0]:
# lets slice the array - include everything
lol_np[:]

In [0]:
# get just the first row (the first inner array)
lol_np[0]

Multidimensional arrays can be sliced using a comma-separated set of indices or slices, one per dimension.
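As a rough sketch of that pattern, think of the brackets as `[rows, columns]`. The variable names on the left are invented for this example.

```python
import numpy as np

lol_np = np.array([[1, 2], [3, 4], [5, 6]])

# one index or slice per dimension, separated by a comma: [rows, columns]
single_value = lol_np[2, 0]       # third row, first column
second_column = lol_np[:, 1]      # every row, second column
first_two_rows = lol_np[:2, :]    # first two rows, every column

print(single_value)
print(second_column)
print(first_two_rows)
```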

In [0]:
# return just the 2 (first row, second column), as a one-element array
lol_np[:1, 1]

Let's look at a larger nd array.

In [0]:
# generate a larger array
np.random.seed(0) # for reproducing the example
x2 = np.random.randint(10, 20, size=(3, 4))
x2

In [0]:
# rows/columns
x2.shape

In [0]:
# all "rows" and first "column"
x2[:, 0]

In [0]:
# 2 rows, 3 cols
x2[:2, :3]

In [0]:
# just get one array
x2[0]

In [0]:
# want to reverse?
x2[::-1, ::-1]
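Because each dimension gets its own slice, rows and columns can be reversed independently. The sketch below uses a small fixed array (a made-up `small`, not the random `x2`) so the result is easy to follow.

```python
import numpy as np

# a small fixed array so the effect of each slice is easy to see
small = np.array([[1, 2, 3], [4, 5, 6]])

rows_reversed = small[::-1, :]       # flip the order of the rows only
cols_reversed = small[:, ::-1]       # flip the order of the columns only
both_reversed = small[::-1, ::-1]    # flip both

print(rows_reversed)
print(cols_reversed)
print(both_reversed)
```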



---


# Useful methods

In [0]:
# generate an array of numbers with bounds and spacing
x = np.arange(2, 10, 2)
x
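A note on the bounds: like `range` and slicing, `np.arange` leaves out the stop value. A minimal sketch, where `first_five` is a name made up for this example:

```python
import numpy as np

# np.arange(start, stop, step): the stop value itself is not included
x = np.arange(2, 10, 2)
print(x)    # prints [2 4 6 8]; 10 itself is left out

# with only one argument, counting starts at 0 with a step of 1
first_five = np.arange(5)
print(first_five)
```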

In [0]:
# do some math and finding locations
print("the mean of the array is {}".format(str( x.mean() )))   # x.mean()
print("the largest value in the array is {}".format(str( x.max() )))   # x.max()
print("the index for the largest value in the array is {}".format(str( x.argmax() )))   # x.argmax()


In [0]:
# we can generate an array of booleans for subsetting (helpful with pandas, which we start tonight!)
x > 5

In [0]:
# we can also do complex logic - logical and
np.logical_and(x > 0, x < 5)

In [0]:
# logical or
np.logical_or(x > 5, x < 7)
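The boolean arrays above become useful once they are placed inside the brackets to subset. This is a sketch of that idea; `mask`, `big_values`, `between`, and `outside` are illustrative names. On boolean arrays, `&` and `|` give the same result as `np.logical_and` and `np.logical_or`.

```python
import numpy as np

x = np.arange(2, 10, 2)

# a boolean array inside [] keeps only the positions that are True
mask = x > 5
big_values = x[mask]

# & and | combine conditions (the parentheses around each comparison are required)
between = x[(x > 2) & (x < 8)]
outside = x[(x < 4) | (x > 6)]

print(big_values)
print(between)
print(outside)
```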

We saw from above that we can generate random numbers, but there are so many options and distributions that we can pull from.

> Why?  Often times when we are starting out, it's helpful to generate fake datasets to show an example, make a case, or just play around with other features, like plotting.
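For instance, a small fake dataset might be sketched like this. It uses `np.random.default_rng`, the newer generator interface (this assumes numpy 1.17 or later), rather than the `np.random.seed` call used earlier. The heights and dice rolls are invented purely for illustration.

```python
import numpy as np

# a seeded generator gives the same "random" numbers on every run
rng = np.random.default_rng(seed=0)

# a made-up dataset: 1,000 heights (cm) drawn from a normal distribution
heights = rng.normal(loc=170, scale=10, size=1000)

# 1,000 dice rolls: integers from 1 up to (but not including) 7
rolls = rng.integers(1, 7, size=1000)

print(heights.mean())
print(rolls.min(), rolls.max())
```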

In [0]:
# what random data can we generate?
np.random?

In [0]:
# for example, let's generate a histogram

# import matplotlib
import matplotlib.pyplot as plt

# generate a normally distributed array (dataset or "column" in Excel) that has 10K rows 
norm = np.random.normal(size=10000)

In [0]:
# plot the distribution -- future lesson, but for a taste of "why"

# generate the plot object
plt.hist(norm)

# some options
plt.title("This is a normal distribution histogram")

# plot inline within the notebook
plt.show()

# Summary

In this notebook, we learned 

- numpy generates arrays that are very powerful for analysis, and can be thought of as enhanced lists

- numpy arrays must be of one type, else they will be converted to a singular type

- numpy arrays can be multidimensional

- numpy arrays can be sliced just like lists, with dimensional slicing done via `[:, :]`

- we can 