# Introduction to NumPy

**Library Alias**

In [1]:
import numpy as np

## Built-In Help


### Exercise

In [None]:
# Place your cursor after the period and press <TAB>:
np.



### Exercise

In [None]:
# Replace 'add' below with a few different NumPy function names and look over the documentation:
np.add?

## NumPy arrays: a specialized data structure for analysis

> **Learning goal:** By the end of this subsection, you should have a basic understanding of what NumPy arrays are and how they differ from the other Python data structures you have studied thus far.

### Lists in Python


In [None]:
myList = list(range(10))
myList

**List Comprehension with Types**

In [None]:
[type(item) for item in myList]

**Share**

In [None]:
myList2 = [True, "2", 3.0, 4]
[type(item) for item in myList2]

### Fixed-type arrays in Python

#### Creating NumPy arrays method 1: using Python lists

In [None]:
# Create an integer array:
np.array([1, 4, 2, 5, 3])

**Think, Pair, Share**

In [None]:
np.array([3.14, 4, 2, 3])

### Exercise

In [None]:
# What happens if you construct an array using a list that contains a combination of integers, floats, and strings?


**Explicit Typing**

In [None]:
np.array([1, 2, 3, 4], dtype='float32')

### Exercise

In [None]:
# Try this using a different dtype.
# Remember that you can always refer to the documentation with the command np.array?


**Multi-Dimensional Array**

**Think, Pair, Share**

In [None]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

#### Creating NumPy arrays method 2: building from scratch

In [None]:
np.zeros(10, dtype=int)

In [None]:
np.ones((3, 5), dtype=float)

In [None]:
np.full((3, 5), 3.14)

In [None]:
np.arange(0, 20, 2)

In [None]:
np.linspace(0, 1, 5)

In [None]:
np.random.random((3, 3))

In [None]:
np.random.normal(0, 1, (3, 3))

In [None]:
np.random.randint(0, 10, (3, 3))

In [None]:
np.eye(3)

In [None]:
np.empty(3)

> **Takeaway:** NumPy arrays are a data structure similar to Python lists that provide high performance when storing and working on large amounts of homogeneous data, which is precisely the kind of data you will encounter frequently in data science. NumPy arrays support many data types beyond those discussed in this course, but you don't need to memorize all of the NumPy dtypes. **It's usually enough to know the general kind of data you're dealing with: floating point, integer, Boolean, string, or general Python object.**
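
To make the "general kind of data" idea concrete, here is a minimal sketch that builds one small array per kind and inspects its dtype. The variable names (`examples`, `kinds`) and the sample values are made up for illustration; the exact integer and float widths printed can vary by platform.

```python
import numpy as np

# One small array per general kind of data named in the takeaway.
examples = {
    "integer": np.array([1, 2, 3]),
    "floating point": np.array([1.0, 2.5]),
    "Boolean": np.array([True, False]),
    "string": np.array(["a", "bc"]),
    "Python object": np.array([1, "a", None], dtype=object),
}

# dtype.kind is a one-character code: 'i' integer, 'f' float, 'b' Boolean,
# 'U' Unicode string, 'O' general Python object.
kinds = {name: arr.dtype.kind for name, arr in examples.items()}

for name, arr in examples.items():
    print(f"{name:15s} dtype={arr.dtype}  kind={arr.dtype.kind}")
```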

## Working with NumPy arrays: the basics

> **Learning goal:** By the end of this subsection, you should be comfortable working with NumPy arrays in basic ways.

**Similar to Lists:**
- **Array attributes**: Assessing the size, shape, and data types of arrays
- **Indexing arrays**: Getting and setting the value of individual array elements
- **Slicing arrays**: Getting and setting smaller subarrays within a larger array
- **Reshaping arrays**: Changing the shape of a given array
- **Joining and splitting arrays**: Combining multiple arrays into one and splitting one array into multiple arrays
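
Of the operations listed above, reshaping can be sketched as follows. This is a minimal illustration rather than a full treatment; the names `flat`, `grid`, and `column` are made up for this example.

```python
import numpy as np

flat = np.arange(1, 10)        # nine values: 1 through 9
grid = flat.reshape((3, 3))    # same values arranged as 3 rows x 3 columns
print(grid)

# The total number of elements must stay the same, so nine values
# cannot be arranged as a 2 x 5 grid.
try:
    flat.reshape((2, 5))
except ValueError as err:
    print("reshape failed:", err)

# Indexing with np.newaxis is another way to add a dimension.
column = flat[:3, np.newaxis]
print(column.shape)
```

Reshaping only works when the new shape holds exactly as many elements as the original array.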

### Array attributes

In [None]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

a1 = np.random.randint(10, size=6)  # One-dimensional array
a2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
a3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

**Array Types**

In [None]:
print("dtype:", a3.dtype)



### Exercise

In [None]:
# Change the values in this code snippet to look at the attributes for a1, a2, and a3:
print("a3 ndim: ", a3.ndim)
print("a3 shape:", a3.shape)
print("a3 size: ", a3.size)

### Exercise

In [None]:
# Explore the dtype for the other arrays.
# What dtypes do you predict them to have?
print("dtype:", a3.dtype)

### Indexing arrays

**Quick Review**

In [None]:
a1

In [None]:
a1[0]

In [None]:
a1[4]

In [None]:
a1[-1]

In [None]:
a1[-2]

**Multi-Dimensional Arrays**

In [None]:
a2

In [None]:
a2[0, 0]

In [None]:
a2[2, 0]

In [None]:
a2[2, -1]

In [None]:
a2[0, 0] = 12
a2

In [None]:
a1[0] = 3.14159
a1

### Exercise

In [None]:
# What happens if you try to insert a string into a1?
# Hint: try both a string like '3' and one like 'three'


### Slicing arrays

#### One-dimensional slices

In [None]:
a = np.arange(10)
a

In [None]:
a[:5]

In [None]:
a[5:]

In [None]:
a[4:7]

**Slicing With a Step**

In [None]:
a[::2]

In [None]:
a[1::2]

In [None]:
a[::-1]

In [None]:
a[5::-2]

#### Multidimensional slices

In [None]:
a2

In [None]:
a2[:2, :3]

In [None]:
a2[:3, ::2]

In [None]:
a2[::-1, ::-1]

#### Accessing array rows and columns

In [None]:
print(a2[:, 0])

In [None]:
print(a2[0, :])

In [None]:
print(a2[0])

#### Slices are no-copy views

In [None]:
print(a2)

In [None]:
a2_sub = a2[:2, :2]
print(a2_sub)

In [None]:
a2_sub[0, 0] = 99
print(a2_sub)

In [None]:
print(a2)

#### Copying arrays


In [None]:
a2_sub_copy = a2[:2, :2].copy()
print(a2_sub_copy)

In [None]:
a2_sub_copy[0, 0] = 42
print(a2_sub_copy)

In [None]:
print(a2)
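
One way to check whether two arrays share data is `np.shares_memory`. The sketch below uses a small made-up array (`grid`) instead of `a2` so that it runs on its own; the names `view` and `copy` are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))

view = grid[:2, :2]          # a slice: no data is copied
copy = grid[:2, :2].copy()   # an explicit, independent copy

print(np.shares_memory(grid, view))  # the view reuses grid's buffer
print(np.shares_memory(grid, copy))  # the copy has its own buffer

view[0, 0] = 99   # this change is visible through grid
copy[0, 1] = 42   # this change is not
print(grid[0, :2])
```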

### Joining and splitting arrays

#### Joining arrays

In [None]:
a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
np.concatenate([a, b])

In [None]:
c = [99, 99, 99]
print(np.concatenate([a, b, c]))

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
np.concatenate([grid, grid])

#### Splitting arrays
**Think, Pair, Share**

In [None]:
# Note: this reassigns a1, a2, and a3 from the earlier cells.
a = [1, 2, 3, 99, 99, 3, 2, 1]
a1, a2, a3 = np.split(a, [3, 5])
print(a1, a2, a3)

> **Takeaway:** Manipulating datasets is a fundamental part of preparing data for analysis. The skills you learned and practiced here will form building blocks for the more sophisticated data manipulation you will learn in later sections of this course.

## Sorting arrays

In [None]:
a = np.array([2, 1, 4, 3, 5])
np.sort(a)

In [None]:
print(a)

In [None]:
a.sort()
print(a)

### Sorting along rows or columns

In [None]:
rand = np.random.RandomState(42)
table = rand.randint(0, 10, (4, 6))
print(table)

In [None]:
np.sort(table, axis=0)

In [None]:
np.sort(table, axis=1)

### NumPy Functions vs Python Built-In Operators

| Operator      | Equivalent ufunc    | Description                           |
|:--------------|:--------------------|:--------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
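
The correspondence in the table can be checked directly. The sketch below applies each operator and its ufunc to the same small array and compares the results; the array `x` and the dictionary `pairs` are made up for illustration.

```python
import numpy as np

x = np.arange(1, 5)  # array([1, 2, 3, 4])

# Each pair applies the same element-wise operation two ways.
pairs = {
    "+": (x + 2, np.add(x, 2)),
    "-": (x - 2, np.subtract(x, 2)),
    "unary -": (-x, np.negative(x)),
    "*": (x * 2, np.multiply(x, 2)),
    "/": (x / 2, np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % 2, np.mod(x, 2)),
}

for symbol, (with_operator, with_ufunc) in pairs.items():
    print(f"{symbol:8s}", with_operator, np.array_equal(with_operator, with_ufunc))
```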

#### Exponents and logarithms

In [None]:
a = [1, 2, 3]
print("a     =", a)
print("e^a   =", np.exp(a))
print("2^a   =", np.exp2(a))
print("3^a   =", np.power(3, a))

In [None]:
a = [1, 2, 4, 10]
print("a        =", a)
print("ln(a)    =", np.log(a))
print("log2(a)  =", np.log2(a))
print("log10(a) =", np.log10(a))

In [None]:
a = [0, 0.001, 0.01, 0.1]
print("exp(a) - 1 =", np.expm1(a))
print("log(1 + a) =", np.log1p(a))

#### Specialized Functions

In [None]:
from scipy import special

In [None]:
# Gamma functions (generalized factorials) and related functions
a = [1, 5, 10]
print("gamma(a)     =", special.gamma(a))
print("ln|gamma(a)| =", special.gammaln(a))
print("beta(a, 2)   =", special.beta(a, 2))

> **Takeaway:** Universal functions in NumPy provide you with computational functions that are faster than regular Python functions, particularly when working on large datasets that are common in data science. This speed is important because it can make you more efficient as a data scientist and it makes a broader range of inquiries into your data tractable in terms of time and computational resources.
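
As a rough illustration of the speed claim, the sketch below computes reciprocals once with a plain Python loop and once with a ufunc, then checks that the results agree. The helper `reciprocal_loop` and the array size are assumptions made for this example, and the timings it prints will vary from machine to machine.

```python
import time
import numpy as np

rng = np.random.RandomState(0)
values = rng.random_sample(100_000) + 1.0  # shift away from zero

def reciprocal_loop(data):
    """Compute 1/x one element at a time with a plain Python loop."""
    out = np.empty(len(data))
    for i in range(len(data)):
        out[i] = 1.0 / data[i]
    return out

start = time.perf_counter()
looped = reciprocal_loop(values)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized = np.divide(1.0, values)  # the ufunc behind the / operator
ufunc_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"ufunc: {ufunc_seconds:.4f} s")
print("same result:", np.allclose(looped, vectorized))
```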

## Aggregations

> **Learning goal:** By the end of this subsection, you should be comfortable aggregating data in NumPy.

### Summing the values of an array

In [None]:
myList = np.random.random(100)
np.sum(myList)

**NumPy vs Python Functions**

In [None]:
large_array = np.random.rand(1000000)
%timeit sum(large_array)
%timeit np.sum(large_array)

### Minimum and maximum

In [None]:
np.min(large_array), np.max(large_array)

In [None]:
print(large_array.min(), large_array.max(), large_array.sum())

## Computation on arrays with broadcasting

> **Learning goal:** By the end of this subsection, you should have a basic understanding of how broadcasting works in NumPy (and why NumPy uses it).

In [None]:
first_array = np.array([3, 6, 8, 1])
second_array = np.array([4, 5, 7, 2])
first_array + second_array

In [None]:
first_array + 5

In [None]:
one_dim_array = np.ones((1))
one_dim_array

In [None]:
two_dim_array = np.ones((2, 2))
two_dim_array

In [None]:
one_dim_array + two_dim_array

**Think, Pair, Share**

In [None]:
horizontal_array = np.arange(3)
vertical_array = np.arange(3)[:, np.newaxis]

print(horizontal_array)
print(vertical_array)

In [None]:
horizontal_array + vertical_array
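
To see what broadcasting does to each operand before the addition, `np.broadcast_arrays` returns both arrays stretched to their common shape. This is a minimal sketch; the names `row`, `column`, `row_b`, and `column_b` are made up for illustration.

```python
import numpy as np

row = np.arange(3)                     # shape (3,)
column = np.arange(3)[:, np.newaxis]   # shape (3, 1)

# Both operands are stretched to the common shape (3, 3).
row_b, column_b = np.broadcast_arrays(row, column)
print(row_b.shape, column_b.shape)
print(row_b)
print(column_b)

# Adding the stretched arrays matches adding the originals directly.
print(np.array_equal(row_b + column_b, row + column))
```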

## Comparisons, masks, and Boolean logic in NumPy

> **Learning goal:** By the end of this subsection, you should be comfortable with and understand how to use Boolean masking in NumPy in order to answer basic questions about your data.

### Example: Counting Rainy Days

Let's see masking in practice by examining the monthly rainfall statistics for Seattle. The data is in a CSV file from data.gov. To load the data, we will use pandas, which we will formally introduce in Section 4.

In [None]:
import numpy as np
import pandas as pd

# Use pandas to extract rainfall as a NumPy array
rainfall_2003 = pd.read_csv('Data/Observed_Monthly_Rain_Gauge_Accumulations_-_Oct_2002_to_May_2017.csv')['RG01'][2:14].values
rainfall_2003

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
plt.bar(np.arange(1, len(rainfall_2003) + 1), rainfall_2003)

### Boolean operators

In [None]:
np.sum((rainfall_2003 > 0.5) & (rainfall_2003 < 1))

In [None]:
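# Without parentheses, & is evaluated before the comparisons, as written here, and raises a TypeError:
rainfall_2003 > (0.5 & rainfall_2003) < 1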

In [None]:
np.sum(~((rainfall_2003 <= 0.5) | (rainfall_2003 >= 1)))
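
The three cells above can be tried without the CSV file. The sketch below uses a small array of made-up values (not the Seattle measurements) to show that the `&` form and the `~(... | ...)` form select the same elements, and that leaving out the parentheses fails.

```python
import numpy as np

# Made-up values for illustration only; these are not the Seattle measurements.
values = np.array([0.0, 0.3, 0.6, 0.9, 1.2, 2.5])

between = (values > 0.5) & (values < 1)            # element-wise AND
not_outside = ~((values <= 0.5) | (values >= 1))   # NOT (too low OR too high)

print(between)
print(np.array_equal(between, not_outside))
print(np.sum(between))  # True counts as 1, so the sum is a count

# Without parentheses, & is applied first (0.5 & values), which fails for floats.
try:
    values > 0.5 & values < 1
except TypeError as err:
    print("TypeError:", err)
```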

In [None]:
print("Number of months without rain:", np.sum(rainfall_2003 == 0))
print("Number of months with rain:   ", np.sum(rainfall_2003 != 0))
print("Months with more than 1 inch: ", np.sum(rainfall_2003 > 1))
print("Rainy months with < 1 inch:   ", np.sum((rainfall_2003 > 0) &
                                              (rainfall_2003 < 1)))

## Boolean arrays as masks

In [None]:
rand = np.random.RandomState(0)
two_dim_array = rand.randint(10, size=(3, 4))
two_dim_array

In [None]:
two_dim_array < 5

**Masking**

In [None]:
two_dim_array[two_dim_array < 5]

In [None]:
# Construct a mask of all rainy months
rainy = (rainfall_2003 > 0)

# Construct a mask of all summer months (June through September)
months = np.arange(1, 13)
summer = (months > 5) & (months < 10)

print("Median precip in rainy months in 2003 (inches):   ", 
      np.median(rainfall_2003[rainy]))
print("Median precip in summer months in 2003 (inches):  ", 
      np.median(rainfall_2003[summer]))
print("Maximum precip in summer months in 2003 (inches): ", 
      np.max(rainfall_2003[summer]))
print("Median precip in non-summer rainy months (inches):", 
      np.median(rainfall_2003[rainy & ~summer]))

> **Takeaway:** By combining Boolean operations, masking operations, and aggregates, you can quickly answer questions about any dataset, much like those we posed about the Seattle rainfall data. Operations like these will form the basis for the data exploration and preparation for analysis that will be our primary concerns in Sections 4 and 5.