# Notebook 3.4: Introduction to Numpy

This notebook is associated with the following accompanying reading: 

+ The Python Data Science Handbook **Chapter 2** https://jakevdp.github.io/PythonDataScienceHandbook/


## Learning objectives: 

By the end of this module you should be able to:

1. Find and use scientific and numerical functions in `numpy`.
2. Generate arrays of data and compute values on them. 
3. Understand the difference between lists and numpy arrays.
4. Calculate N50 contig size. 


## Introduction to numpy

Complete this notebook while reading Chapter 2 of the Python Data Science Handbook, which was assigned as your reading. The numpy library is a *third-party* Python library, meaning that it is not included by default with every Python installation. It is easy to install, however, and provides a huge suite of tools for scientific computing. I think that the assigned chapter introduces numpy very well, so this notebook mostly consists of exercises to test your comprehension of the reading. 

In [None]:
# start by importing numpy 
import numpy as np

### Create a numpy array
There are many ways to create a numpy array. Numpy has several built-in functions for generating arrays that are composed entirely of one value, or of a range of values, such as `np.zeros()`, `np.ones()`, and `np.arange()`. We can also generate an array by passing a list to `np.array()`, as in the last two examples below. 

In [None]:
# create an array with ten items in it that are all zeros
np.zeros(10)

In [None]:
# create an array with ten items that are all zeros as integers
np.zeros(10, dtype=int)

In [None]:
# create an array with a range of values from 0 up to, but not including, 10
np.arange(0, 10)

In [None]:
# create an array from a list
np.array([0, 3, 4, 10, 2, 2, 2, 2])

In [None]:
# create an array that is made of 0 and 1 alternating
np.array([0, 1] * 5)
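The text above also mentions `np.ones()`, which none of the cells demonstrate. The short sketch below shows it next to `np.arange()` called with an optional third *step* argument; the array lengths and step size are arbitrary choices for illustration.

```python
import numpy as np

# np.ones() works like np.zeros(), but every element is 1
ones = np.ones(5)                  # floats by default
ones_int = np.ones(5, dtype=int)   # request integers instead

# np.arange() accepts an optional step as its third argument;
# the stop value (10) is still excluded
evens = np.arange(0, 10, 2)

print(ones)       # [1. 1. 1. 1. 1.]
print(ones_int)   # [1 1 1 1 1]
print(evens)      # [0 2 4 6 8]
```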

### However, the datatype is important... 
When you create a numpy array from a list, numpy tries to infer the datatype from the contents of the list. Above, when we passed in a list of all int elements, it created an int array. However, when we pass it a list below where some elements are ints and some are strings, it converts everything to strings. **This is because numpy works most efficiently by storing all data in an array as a single datatype**. You can create arrays that hold mixed datatypes (by passing `dtype=object`), but you lose much of the efficiency of numpy when you do so. 

In [None]:
# mixed type lists will be converted to a single dtype array
np.array([0, 1, "apple", "orange"])
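To see the inferred datatype directly, you can inspect an array's `.dtype` attribute. This sketch compares an all-int list, the mixed list from the cell above, and the same mixed list created with `dtype=object`. The exact string dtype name (for example `<U21`) can vary between numpy versions, so the comments describe it loosely.

```python
import numpy as np

ints = np.array([0, 1, 2])
mixed = np.array([0, 1, "apple", "orange"])

print(ints.dtype)       # an integer dtype, e.g. int64 on most platforms
print(mixed.dtype)      # a fixed-width unicode string dtype, e.g. <U21
print(mixed[0] == "0")  # True: the integer 0 has become the string "0"

# dtype=object stores each element as its original Python object,
# at the cost of numpy's speed advantages
objs = np.array([0, 1, "apple", "orange"], dtype=object)
print(type(objs[0]))    # <class 'int'>
```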

## Dimensions and indexing

Numpy arrays can be indexed just like Python list objects to select particular elements from them. Whereas lists are indexed along a single dimension, arrays can be indexed in multiple dimensions to select rows and columns at the same time, and many numpy functions can be applied along a chosen dimension (called an *axis*) as well. If you need a refresher on how to index and slice objects in Python, review how indexing and slicing work with lists before continuing.

In [None]:
# create a 2-dimensional array 
np.zeros((4, 4))

In [None]:
# create a 3-dimensional array
np.zeros((5, 3, 3))
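Every numpy array records its own dimensions, which is handy for checking that you built the array you intended. The sketch below recreates the two arrays from the cells above and prints three standard attributes: `.ndim` (number of dimensions), `.shape` (length along each dimension), and `.size` (total number of elements).

```python
import numpy as np

grid = np.zeros((4, 4))
cube = np.zeros((5, 3, 3))

# .ndim = number of dimensions, .shape = length along each dimension,
# .size = total number of elements (the product of the shape)
print(grid.ndim, grid.shape, grid.size)  # 2 (4, 4) 16
print(cube.ndim, cube.shape, cube.size)  # 3 (5, 3, 3) 45
```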

In [None]:
# select the element at index 3 (the fourth element, since indexing starts at 0)
np.arange(100)[3]

In [None]:
# slice the elements at indices 3 through 9 (the stop index, 10, is excluded)
np.arange(100)[3:10]
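The indexing examples above are all 1-dimensional. A minimal sketch of the multi-dimensional indexing and axis-wise functions described earlier, using a small 3 x 4 array chosen only for illustration: `[row, column]` selects from both dimensions at once, and the `axis` argument of `.sum()` chooses which dimension to collapse.

```python
import numpy as np

# a 3 x 4 array holding the values 0-11, filled row by row
grid = np.arange(12).reshape((3, 4))

# [row, column] indexing selects a single element
print(grid[1, 2])        # 6

# slices work along each dimension; ":" means "everything"
print(grid[:, 0])        # first column: [0 4 8]
print(grid[0, 1:3])      # row 0, columns 1-2: [1 2]

# many functions take an axis argument to work along one dimension
print(grid.sum(axis=0))  # column sums: [12 15 18 21]
print(grid.sum(axis=1))  # row sums: [ 6 22 38]
```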

### Testing your skills

<div class="alert alert-success">
    <b>Action [5]:</b> 
    Create a 2-dimensional array with 3 rows and 5 columns that is composed entirely of cells with the integer value 35. 
</div>

<div class="alert alert-success">
    <b>Action [6]:</b> 
    Create a 2-dimensional array of shape (3, 5) that is composed of random integers generated by numpy.
</div>

<div class="alert alert-success">
    <b>Action [7]:</b> 
    Return the values [24, 25, 26] from the array below by using slicing. 
</div>

<div class="alert alert-success">
    <b>Action [8]:</b> 
    Return the values [14, 16, 18, 20, 22] from the array below by slicing. 
</div>

In [None]:
arr = np.arange(36).reshape((3, 12))

## Filling an array with data

When learning about lists and dictionaries, we saw that a convenient way to use these objects is to start with an empty list or dictionary and fill it as you iterate, appending items to the list or adding new key/value pairs to the dictionary. In numpy you have to do things a bit differently: you start by creating an array of the full size you plan to work with, initialized with a placeholder value such as zero, and then you update its values with new data. 

The difference between these two approaches may seem subtle, but it can lead to huge speed improvements when done correctly in numpy. A numpy `array` reserves a fixed block of your computer's memory for all of its elements up front, whereas a `list` that grows as you extend it must periodically reallocate and move its contents in memory. As explained in your reading, keeping data in a fixed-size block of memory with a single datatype is what allows numpy to perform calculations using compiled functions written in faster languages like C and Fortran; it's a workaround that allows us to write pretty Python code but still benefit from super fast speed. 

These details only begin to matter when you work with large datasets or computationally heavy tasks, but it's worth learning the motivation for why we use numpy, and why the code looks the way that it does.  
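To see what numpy's compiled functions buy you, here is a minimal sketch that squares every element of an array twice: once with a Python `for` loop that handles one element per step, and once with a single vectorized expression that numpy runs in compiled code. The array size (100,000) and the use of `time.perf_counter()` for timing are choices made for illustration; exact timings depend on your machine, but the vectorized version is typically far faster.

```python
import time

import numpy as np

# int64 is set explicitly so the squares cannot overflow on any platform
values = np.arange(100_000, dtype=np.int64)

# loop approach: preallocate the result, then fill it one element at a time
start = time.perf_counter()
squares_loop = np.zeros(len(values), dtype=np.int64)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2
loop_seconds = time.perf_counter() - start

# vectorized approach: one expression over the whole array
start = time.perf_counter()
squares_vec = values ** 2
vec_seconds = time.perf_counter() - start

print(np.array_equal(squares_loop, squares_vec))  # True
print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
```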

In [None]:
# common with lists: start with an empty list and fill it as you iterate
empty = []
for i in range(100):
    empty.append(i)

In [None]:
# filling an empty array doesn't work like with lists
empty = np.array([])
for i in range(100):
    # ... arrays have no in-place .append() method like lists do.
    # ... np.append() exists, but it returns a brand-new, larger copy of the
    # ... array on every call, which is slow inside a loop like this one.
    pass

In [None]:
# instead, create the full-sized array with placeholder values (zeros) and update them by indexing
empty = np.zeros(100)
for i in range(100):
    empty[i] = i
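For comparison, the sketch below builds the same 100 values both ways: by repeatedly calling `np.append()` (which copies the whole array on each call and never modifies its input, so the result must be reassigned) and by filling a preallocated array. It also shows that for this particular sequence `np.arange()` produces the same values with no loop at all. The variable names are illustrative only.

```python
import numpy as np

# np.append() leaves its input unchanged and returns a new, longer array
original = np.zeros(3)
longer = np.append(original, 1.0)
print(original.shape, longer.shape)  # (3,) (4,)

# growing: each np.append() call copies every existing element again
grown = np.array([])
for i in range(100):
    grown = np.append(grown, i)

# preallocating: allocate once, then fill by index
filled = np.zeros(100)
for i in range(100):
    filled[i] = i

print(np.array_equal(grown, filled))           # True
print(np.array_equal(filled, np.arange(100)))  # True
```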