# Introduction to NumPy: Numerical Python

Sarah records her second-grade class’s grades in an online spreadsheet. Her web browser records that she visited that spreadsheet, in addition to every other site she’s visited. Those sites record her location, the time she spent on them, and where she visited next. The world is chock-full of all sorts of different datasets, and learning how to create, analyze, and manipulate these datasets can give us some insight and control over our digital surroundings.

In this lesson, we’ll be constructing and manipulating single-variable datasets. One way to think of a single-variable dataset is that it contains answers to a question. For instance, we might ask 100 people, “How tall are you?” Their heights in inches would form our dataset.

To work with our datasets, we’ll be using a powerful Python module known as NumPy, which stands for Numerical Python.

NumPy has many uses including:

- Efficiently working with many numbers at once
- Generating random numbers
- Performing many different numerical functions (e.g., calculating sin, cos, tan, mean, median, etc.)

In the following exercises, we’ll learn how to construct one- and two-dimensional arrays and perform basic array operations.

# Working with NumPy

NumPy is great at storing and manipulating numerical data in arrays.

Let's take a look at an example. Twice Charred is a (mostly) fictional movie review site where four good friends and movie reviewers, Lorie, Marty, Tori, and Kurtz, watch movies and give them ratings on a scale of 0 to 100.

In [2]:
import numpy as np

When the gang rates a movie, we can store their ratings in a NumPy array, movie_ratings:

In [7]:
movie_ratings = np.array([63.0, 54.0, 70.0, 50.0])

But they see more than one movie, so we need a two-dimensional array in which each row holds their ratings for a specific movie.

In [8]:

movie_ratings = np.array([[63.0, 54.0, 70.0, 50.0],
                          [94.0, 85.0, 89.0, 95.0],
                          [64.0, 90.0, 73.0, 85.0]])

We can also convert every rating to a five-star scale at once by dividing the whole array by 20:

In [11]:
movie_ratings_stars = movie_ratings / 20
movie_ratings_stars

array([[3.15, 2.7 , 3.5 , 2.5 ],
       [4.7 , 4.25, 4.45, 4.75],
       [3.2 , 4.5 , 3.65, 4.25]])

Now let's say the ratings are always in the same order (Lorie, Marty, Tori, Kurtz). If we wanted to create an array that only had Tori's ratings, we could select that column from our movie_ratings array.

In [10]:
tori_ratings = movie_ratings[:, 2]
tori_ratings

array([70., 89., 73.])

Now, say we find that we have very similar taste to Marty, so we only want to see movies that he rates highly. We can use logic to select those movies.

Let's select all of Marty's ratings that are over 80:

In [12]:
marty_ratings = movie_ratings[:, 1]
marty_ratings[marty_ratings > 80]

array([85., 90.])
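
The same condition can also pick out whole rows rather than just Marty's scores. As a small sketch (the names `liked_by_marty` and `recommended` are illustrative and not part of the original example), the mask built from Marty's column keeps every movie, with all four ratings, that he scored above 80:

```python
import numpy as np

# Ratings from the example above: one row per movie,
# columns ordered Lorie, Marty, Tori, Kurtz.
movie_ratings = np.array([[63.0, 54.0, 70.0, 50.0],
                          [94.0, 85.0, 89.0, 95.0],
                          [64.0, 90.0, 73.0, 85.0]])

# Boolean mask: True for each movie Marty rated above 80.
liked_by_marty = movie_ratings[:, 1] > 80

# Indexing the rows with the mask keeps only those movies.
recommended = movie_ratings[liked_by_marty]
print(recommended)
```

Here the second and third movies are kept, since those are the rows where Marty's rating is 85 and 90.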

# Importing NumPy
To use NumPy with Python, import it at the top of your file using the following line:

    import numpy as np

Writing as np allows us to use np as shorthand for NumPy, which saves us time whenever we call a NumPy function (less typing = fewer errors!).

# NumPy Arrays
NumPy includes a powerful data structure known as an array. A NumPy array behaves like a special type of list: it’s a data structure that organizes multiple items. The items can be strings, numbers, or even other arrays, although NumPy stores all the elements of a given array with a single data type.

Arrays are most powerful when they are used to store numbers. This is because arrays give us special ways of performing mathematical operations that are both simpler to write and more efficient computationally. We’ll get more into this later.

A NumPy array looks a lot like a Python list:

    my_array = np.array([1, 2, 3, 4, 5, 6])
    
We can transform a regular list into a NumPy array by using np.array() and saving the value to a new variable:

    my_list = [1, 2, 3, 4, 5, 6]
    my_array = np.array(my_list)
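
To see how the converted array differs from the original list, we can inspect its type and the data type of its elements. This is a minimal sketch; the `mixed` array is an illustrative addition, and the exact integer type name printed for `my_array` depends on the platform.

```python
import numpy as np

my_list = [1, 2, 3, 4, 5, 6]
my_array = np.array(my_list)

print(type(my_list))    # a built-in Python list
print(type(my_array))   # a NumPy ndarray
print(my_array.dtype)   # an integer type (name varies by platform)

# Mixing integers and floats yields one shared element type:
# the integers are converted to floats.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)      # float64
```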

# Creating an Array from a CSV
Typically, you won’t be entering data directly into an array. Instead, you’ll be importing the data from somewhere else.

We can transform CSV (comma-separated values) files into arrays using the np.genfromtxt() function.

Consider the following CSV, sample.csv,

    34,9,12,11,7
    
We can import this into a NumPy array using the following code:

    csv_array = np.genfromtxt('sample.csv', delimiter=',')
    
Note that in this case, our file sample.csv has values separated by commas, so we use delimiter=','. Sometimes you’ll find files with other delimiters, the most common being tabs or colons.

Once imported, the CSV produces the following array. By default, np.genfromtxt() reads the values as floats:

    >>> csv_array
    array([34.,  9., 12., 11.,  7.])
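
The following self-contained sketch writes a small comma-separated file and a tab-separated file to a temporary directory and loads both with np.genfromtxt(). The file name sample.tsv and the temporary paths are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# Use a throwaway directory so the example does not rely on existing files.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    with open(csv_path, "w") as f:
        f.write("34,9,12,11,7\n")

    # The same values, separated by tabs instead of commas.
    tsv_path = os.path.join(tmp_dir, "sample.tsv")
    with open(tsv_path, "w") as f:
        f.write("34\t9\t12\t11\t7\n")

    # The delimiter argument must match the separator used in the file.
    csv_array = np.genfromtxt(csv_path, delimiter=",")
    tsv_array = np.genfromtxt(tsv_path, delimiter="\t")

print(csv_array)
print(csv_array.dtype)  # float64
```

Both calls produce the same one-dimensional array of floats; only the delimiter argument changes.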

In [17]:
import numpy as np

test_1 = np.array([92, 94, 88, 91, 87])

test_2 = np.genfromtxt('test_2.csv', delimiter=',')
test_2

array([ 79., 100.,  86.,  93.,  91.])

# Operations with NumPy Arrays
Generally, NumPy arrays are more efficient than lists. One reason is that they allow you to do element-wise operations. An element-wise operation allows you to quickly perform an operation, such as addition, on each element in an array.

Let’s compare how to add a number to each value in a Python list versus a NumPy array:

    # With a list
    l = [1, 2, 3, 4, 5]
    l_plus_3 = []
    for i in range(len(l)):
        l_plus_3.append(l[i] + 3)
        
    # With an array
    a = np.array(l)
    a_plus_3 = a + 3
    
As we can see, if we were to add 3 to every number in a list, we would have to use a for loop or a list comprehension. With an array, we can just add 3. The same is true for subtraction, multiplication, and division.
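
As a quick sketch of the other arithmetic operations mentioned above, alongside the list-comprehension version of the loop (the variable names are illustrative):

```python
import numpy as np

l = [1, 2, 3, 4, 5]

# List comprehension: a shorter alternative to the for loop.
l_plus_3 = [x + 3 for x in l]

a = np.array(l)

# Each operation is applied to every element of the array.
a_minus_1 = a - 1   # [0 1 2 3 4]
a_times_2 = a * 2   # [ 2  4  6  8 10]
a_over_2 = a / 2    # [0.5 1.  1.5 2.  2.5]

print(a_minus_1)
print(a_times_2)
print(a_over_2)
```

Note that division with / produces floats even when the array holds integers.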

We can also use NumPy Arrays to find the squares or square roots of each value.

Squaring each value:

    >>> a ** 2
    array([ 1,  4,  9, 16, 25])

(Note: ** is the exponent notation in Python. For example, 3 squared can be calculated using 3 ** 2.)

Taking the square root of each value:

    >>> np.sqrt(a)
    array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798])

In [18]:
import numpy as np

test_1 = np.array([92, 94, 88, 91, 87])
test_2 = np.array([79, 100, 86, 93, 91])
test_3 = np.array([87, 85, 72, 90, 92])

test_3_fixed = test_3 + 2

print(test_3_fixed)

[89 87 74 92 94]


# Operations with NumPy Arrays II
Arrays can also be added to or subtracted from each other in NumPy, assuming the arrays have the same number of elements.

When adding or subtracting arrays in NumPy, each element is added to or subtracted from its matching element in the other array.

    >>> a = np.array([1, 2, 3, 4, 5])
    >>> b = np.array([6, 7, 8, 9, 10])
    >>> a + b
    array([ 7,  9, 11, 13, 15])
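
Subtraction works the same way, and a short sketch also shows what happens when the lengths differ. The array `c` and the flag `shapes_matched` are illustrative names introduced here.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

# Element-wise subtraction: 6 - 1, 7 - 2, and so on.
difference = b - a
print(difference)  # [5 5 5 5 5]

# Arrays with five and three elements cannot be matched up.
c = np.array([1, 2, 3])
try:
    a + c
    shapes_matched = True
except ValueError as error:
    shapes_matched = False
    print("Could not add arrays:", error)
```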

In [22]:
import numpy as np

test_1 = np.array([92, 94, 88, 91, 87])
test_2 = np.array([79, 100, 86, 93, 91])
test_3 = np.array([87, 85, 72, 90, 92])
test_3_fixed = test_3 + 2

total_grade = test_1 + test_2 + test_3_fixed

final_grade = total_grade / 3

print(final_grade)

[86.66666667 93.66666667 82.66666667 92.         90.66666667]


# Two-Dimensional Arrays

In Python, we can create lists that are made up of other lists. Similarly, in NumPy we can create an array of arrays. If the arrays that make up our bigger array are all the same size, then it has a special name: a two-dimensional array.

In the previous exercises we had stored the students’ test scores in separate one-dimensional arrays for each test:

    test_1 = np.array([92, 94, 88, 91, 87])
    test_2 = np.array([79, 100, 86, 93, 91])
    test_3 = np.array([87, 85, 72, 90, 92])
    
But we could have also stored all of this data in a single, two-dimensional array:

    np.array([[92, 94, 88, 91, 87], 
              [79, 100, 86, 93, 91],
              [87, 85, 72, 90, 92]])
              
Here, each row represents a test, and each column represents a student. This allows us to store all of our data in a single array without losing any of its organization.

As we mentioned, a two-dimensional array is built from a list of lists where each list has the same number of elements. Here are some examples that are not two-dimensional arrays.

This code will not create a two-dimensional array because the lists have different numbers of elements. Older versions of NumPy run it and build a one-dimensional array of list objects, while NumPy 1.24 and later raise a ValueError:

    np.array([[29, 49,  6], 
              [77,  1]])
              
This code will not run because the outer [] around the two lists is missing:

    np.array([68, 16, 73],
             [61, 79, 30])
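
One way to check that an array really is two-dimensional is to inspect its ndim and shape attributes. The sketch below builds the test-score array from the three one-dimensional arrays; the name `all_tests` is illustrative.

```python
import numpy as np

test_1 = np.array([92, 94, 88, 91, 87])
test_2 = np.array([79, 100, 86, 93, 91])
test_3 = np.array([87, 85, 72, 90, 92])

# Three arrays of equal length combine into one two-dimensional array.
all_tests = np.array([test_1, test_2, test_3])

print(all_tests.ndim)   # 2: the number of dimensions
print(all_tests.shape)  # (3, 5): 3 rows (tests), 5 columns (students)
```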

# Instructions
1.
In statistics, we often use two-dimensional arrays to represent a set of samples. For instance, if we flip a coin we can represent each head as a 1 and each tail as a 0.

Create a one-dimensional array for a coin toss experiment that results in heads, tails, tails, heads, tails, and save it to the variable coin_toss.


2.
We run the experiment again and get the following outcome: tails, tails, heads, heads, heads. Create a new array that represents both outcomes as a single experiment. Save the new array to coin_toss_again.

In [23]:
import numpy as np

coin_toss = np.array([1, 0, 0, 1, 0])

coin_toss_again = np.array([[1, 0, 0, 1, 0], [0, 0, 1, 1, 1]])

# Selecting Elements from a 1-D Array
NumPy allows us to select elements from an array using their indices. Consider the one-dimensional array

    a = np.array([5, 2, 7, 0, 11])
    
If we wanted to select the first element in this array, we would call:

    >>> a[0]
    5 
    
In typical Python fashion, the indices for an array start at 0. This is known as zero-indexed numbering. In the array above, 5 is known as the zeroth element, a[0]. It follows that 2 is the first element, a[1].

We can also use negative indices, which count from the opposite end of the array and start at -1. This is particularly useful when you want to access the last element or two of an array:

    >>> a[-1]
    11
    >>> a[-2]
    0
    
If we want to select multiple elements in the array, we can define a range, such as a[1:3], which selects the elements from a[1] up to, but not including, a[3].

    >>> a[1:3]
    array([2, 7])
    
Similarly, if we wanted to select all elements before a[3] we would use:

    >>> a[:3]
    array([5, 2, 7])

We can also use negative indices to select multiple elements. Let’s say we want to select the last 3 elements in an array:

    >>> a[-3:]
    array([ 7,  0, 11])
    
Notice that when we select multiple elements, we get an array.
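
The sketch below pulls these rules together on the same array. It also checks two relationships implied above: a negative index -k refers to the same element as index len(a) - k, and a[:3] and a[3:] split the array without overlap. The variable names are illustrative.

```python
import numpy as np

a = np.array([5, 2, 7, 0, 11])

first = a[0]     # 5
last = a[-1]     # 11, the same element as a[len(a) - 1]
middle = a[1:3]  # array([2, 7]); the stop index 3 is excluded

# Everything before index 3, then everything from index 3 onward.
head = a[:3]     # array([5, 2, 7])
tail = a[3:]     # array([ 0, 11])

print(first, last)
print(middle, head, tail)
```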

In [25]:
import numpy as np

test_1 = np.array([92, 94, 88, 91, 87])
test_2 = np.array([79, 100, 86, 93, 91])
test_3 = np.array([87, 85, 72, 90, 92])

jeremy_test_2 = test_2[3]

manual_adwoa_test_1 = test_1[1:3]

manual_adwoa_test_1

array([94, 88])

# Selecting Elements from a 2-D Array

Selecting elements from a 2-d array is very similar to selecting them from a 1-d array; we just have two indices to select from. The syntax for selecting from a 2-d array is a[row,column], where a is the array.

It’s important to note that when we work with arrays that have more than one dimension, the relationship between the interior arrays is defined in terms of axes. A two-dimensional array has two axes. Axis 0 runs down the rows, so values that share the same position in each interior array (the same column) line up along axis 0. Axis 1 runs across the columns, so values that share an interior array (the same row) line up along axis 1. This is illustrated below.

<img src='images/NumPy+Array+fixed.svg' width=400>

Diagram showing the axes in an array
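
One way to get a feel for the axes is to total the values along each one. The sketch below uses the array method sum() with its axis argument, which this lesson does not otherwise cover, on a small made-up array.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.shape)  # (2, 3): length 2 along axis 0, length 3 along axis 1

# Summing along axis 0 combines values in the same column.
column_totals = a.sum(axis=0)  # [5 7 9]

# Summing along axis 1 combines values in the same row.
row_totals = a.sum(axis=1)     # [ 6 15]

print(column_totals)
print(row_totals)
```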

Consider the array

    a = np.array([[32, 15, 6, 9, 14], 
                  [12, 10, 5, 23, 1],
                  [2, 16, 13, 40, 37]])
                  
We can select specific elements using their indices:

    >>> a[2,1]
    16
    
Let’s say we want to select an entire column. We can insert : as the row index:

    # selects the first column
    >>> a[:,0]
    array([32, 12,  2])
    
The same works if we want to select an entire row:

    # selects the second row
    >>> a[1,:]
    array([12, 10,  5, 23,  1])
    
We can further narrow it down and select a range from a specific row:

    # selects the first three elements of the first row
    >>> a[0,0:3]
    array([32, 15,  6])
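
Ranges can be used for both indices at once, and negative indices work here too. This is a small sketch on the same array; the names `block` and `last_column` are illustrative.

```python
import numpy as np

a = np.array([[32, 15, 6, 9, 14],
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])

# Rows 0 and 1, columns 1 and 2 (stop indices are excluded).
block = a[0:2, 1:3]
print(block)        # [[15  6]
                    #  [10  5]]

# Every row, last column.
last_column = a[:, -1]
print(last_column)  # [14  1 37]
```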

In [26]:
import numpy as np

student_scores = np.array([[92, 94, 88, 91, 87],
                           [79, 100, 86, 93, 91],
                           [87, 85, 72, 90, 92]])



tanya_test_3 = student_scores[2,0]
cody_test_scores = student_scores[:,4]

print(tanya_test_3)
print(cody_test_scores)

87
[87 91 92]


# Logical Operations with Arrays

Another useful thing that arrays can do is perform element-wise logical operations. For instance, suppose we want to know which elements in an array are greater than 5. We can easily write some code that checks whether this statement evaluates to True for each item in the array, without having to use a for loop:

    >>> a = np.array([10, 2, 2, 4, 5, 3, 9, 8, 9, 7])
    >>> a > 5
    array([ True, False, False, False, False, False,  True,  True,  True,  True])
    
We can then use logical operators to evaluate and select items based on certain criteria. To select all elements from the previous array that are greater than 5, we’d write the following:

    >>> a[a > 5]
    array([10,  9,  8,  9,  7])
    
We can also combine logical statements to further specify our criteria. To do so, we place each statement in parentheses and use boolean operators like & (and) and | (or).

In our example, we can use combined statements to find the elements that are greater than five or less than two. No element of a is less than two, so the result matches the previous selection:

    >>> a[(a > 5) | (a < 2)]
    array([10,  9,  8,  9,  7])
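
As a further sketch on the same array, we can count how many elements satisfy a condition and try combined conditions where both parts contribute to the result. Counting with np.sum() relies on True being treated as 1, and the thresholds used here are chosen only for illustration.

```python
import numpy as np

a = np.array([10, 2, 2, 4, 5, 3, 9, 8, 9, 7])

# Each True counts as 1, so the sum is the number of matches.
count_over_5 = np.sum(a > 5)
print(count_over_5)  # 5

# Greater than 8 OR less than 3: both conditions match some elements.
either = a[(a > 8) | (a < 3)]
print(either)        # [10  2  2  9  9]

# Greater than 3 AND less than 9.
both = a[(a > 3) & (a < 9)]
print(both)          # [4 5 8 7]
```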

# Instructions
1.
Today we’re visiting the Goldilocks Porridge Festival, sampling a selection of breakfast cereals and judging them based on their temperature (listed in Fahrenheit).

Create a logical condition that selects samples in the porridge array that are less than 60, and save them to a variable named cold.


2.
Create a logical condition that finds all the samples that are higher than 80 and save them to a variable named hot.


3.
Create a logical condition that finds all the samples that are between 60 and 80 and save them to a variable named just_right.


4.
Print each array to the terminal.

In [27]:
import numpy as np

porridge = np.array([79, 65, 50, 63, 56, 90, 85, 98, 79, 51])

cold = porridge[porridge < 60]

hot = porridge[porridge > 80]

just_right = porridge[(porridge > 60) & (porridge < 80)]

print(cold)
print(hot)
print(just_right)

[50 56 51]
[90 85 98]
[79 65 63 79]


# Review
Let’s take a second and review. In this lesson, you learned the basics of the NumPy package. Here are some key points:

- Arrays are a special type of list that allows us to store values in an organized manner.

- An array can be created by either defining it directly using np.array() or by importing a CSV using np.genfromtxt('file.csv', delimiter=',').

- An operation (such as addition) can be performed on every element in an array by simply performing it on the array itself.

- Elements can be selected from arrays using their indices, which start at 0. In a two-dimensional array, both the row index and the column index start at 0.

- Logical operations can be used to create new, more focused arrays out of larger arrays.

The next lesson will explore how to analyze these arrays and use means, medians, and standard deviations to tell a story. But first, practice what you’ve learned by working through the following checkpoints.