In [1]:
import numpy as np

## Statistics Functions on Arrays

**NumPy** is a Python package that, among other things, has many useful statistics **functions**. These take any array-like object (such as a list) as input and can be found in the **np** namespace. Sometimes the same functionality is available both as a NumPy function and as an array method, giving you the choice of how you'd like to use it.


```python
>>> np.mean([1, 2, 3, 4])
2.5

>>> np.ptp([1, 2, 3, 4])
3
```
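As a small sketch of that choice (the variable names here are only for illustration), the same mean can be computed with the function form or the method form. Plain Python lists don't have these methods, so the list is converted to an array first.

```python
import numpy as np

values = np.array([1, 2, 3, 4])  # illustrative data

# Function form: accepts any array-like input, including plain lists
mean_from_function = np.mean(values)

# Method form: only available on NumPy arrays
mean_from_method = values.mean()

print(mean_from_function, mean_from_method)  # 2.5 2.5
```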

Lists of the functions available in NumPy can be found here:
  - Math:  https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html
  - Statistics: https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html

We'll see more later.

**Exercise**: Calculate the statistics on the following numbers:

In [4]:
data = [2, 8, 5, 9, 2, 4, 6]
data

[2, 8, 5, 9, 2, 4, 6]

1. Get the mean of the data.

2. What is the sum of the data?

3. What is the minimum of the data?

4. The standard deviation?

5. The difference between the data's maximum and minimum? ("peak-to-peak")

6. The data's median?

7. The median of the data below?

In [6]:
data = [3, 4, 6, np.nan, 2, 1]  # nan stands for "not a number". It's often used as a placeholder for missing or invalid data.
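A rough sketch of how nan behaves in statistics functions, using made-up values rather than the exercise data: the regular functions return nan when any input is nan, while nan-aware variants such as `np.nanmedian()` skip the missing values.

```python
import numpy as np

values = [1.0, np.nan, 3.0, 5.0]  # made-up data with one missing value

plain_median = np.median(values)         # nan propagates into the result
nan_aware_median = np.nanmedian(values)  # median of the remaining [1.0, 3.0, 5.0]

print(plain_median)      # nan
print(nan_aware_median)  # 3.0
```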

## Arrays and Vectorization

**Arrays** are sequences of same-type data points (most often numbers). NumPy lets us work with the whole sequence at once, without writing a for-loop, using a technique called **vectorization** or **broadcasting**.
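A minimal sketch of both ideas, with made-up values: mixing integers and a float produces an array with a single shared type, and one expression is applied to every element.

```python
import numpy as np

values = np.array([1, 2, 3.5])  # illustrative values; the integers become floats

# Every element of an array shares one data type
print(values.dtype)  # float64

# One expression operates on all elements, with no explicit loop
doubled = values * 2
print(doubled)  # [2. 4. 7.]
```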

Besides the array type itself (created with **array()**), NumPy also includes many math functions, which make analysis much easier. Let's try some out!

## Numpy Exercises

### Building Arrays

NumPy has some convenient array-building functions as well. Some commonly used examples are **arange()**, **linspace()**, **zeros()**, and the random number generation functions in **random**.

| Function | Purpose | Example |
| :-----------: | :-------------: | :-------------: |
| **np.arange()** | Makes an array of the integers from a start value up to, but not including, a stop value | np.arange(2, 7) |
| **np.linspace()** | Makes an array of a given length, evenly spaced between two values (both included) | np.linspace(2, 3, 10) |
| **np.zeros()** | Makes an array of all zeros | np.zeros(5) |
| **np.ones()** | Makes an array of all ones | np.ones(3) |
| **np.random.random()** | Makes an array of random numbers, uniformly distributed between 0 and 1 | np.random.random(100) |
| **np.random.randn()** | Makes an array of normally-distributed random numbers | np.random.randn(100) |
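The functions in the table can be tried out as follows. This is only a sketch with small, arbitrary sizes so the output stays readable. The variable names are illustrative.

```python
import numpy as np

integers = np.arange(2, 7)
print(integers)  # [2 3 4 5 6]  (the stop value, 7, is excluded)

evenly_spaced = np.linspace(2, 3, 5)
print(evenly_spaced)  # [2.   2.25 2.5  2.75 3.  ]  (both endpoints included)

zeros = np.zeros(3)
print(zeros)  # [0. 0. 0.]

ones = np.ones(3)
print(ones)  # [1. 1. 1.]

# The random functions return different values on every run,
# so only the shapes are shown here.
uniform_randoms = np.random.random(4)
normal_randoms = np.random.randn(4)
print(uniform_randoms.shape, normal_randoms.shape)  # (4,) (4,)
```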


1. Make an array containing the integers from 1 to 15.

1.5. Make an array of exactly 6 evenly-spaced numbers between 1 and 10.

2. Make an array containing 20 zeros.

3. Make an array containing 20 ones!

4. Generate an array of 10 random numbers from Numpy's **random** submodule, using any function you want.

5. What is the standard deviation of the integers between 2 and 20?

6. What is the standard deviation of the numbers generated from the np.random.randn() function?  

## Statistics Methods on Arrays

Arrays have many useful math methods.  For example, to get the mean of an array of numbers:

```python
data = np.random.random(100)
data.mean()
```

**Exercise**: Calculate the statistics on the following numbers:

In [3]:
data = np.arange(2, 7)
data

array([2, 3, 4, 5, 6])

1. Get the mean of the data.

2. What is the sum of the data?

3. The maximum of the data?

4. The standard deviation of the data?

## Arithmetic with Arrays

Arrays can also be added, subtracted, multiplied, and divided.  

For example, to add 10 to all values in an array:

```python
data = np.random.randn(5)
data + 10
```

Here is how to multiply two arrays together, element by element:

```python
data * data
```
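Because the examples above use random data, their results change on every run. Here is a small sketch with fixed, made-up values (the names `a` and `b` are illustrative) that shows what the element-wise operations produce:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

added = a + 10    # the single number is applied to every element
product = a * b   # arrays of equal length are combined element by element
quotient = b / a  # division always produces floats

print(added)     # [11 12 13]
print(product)   # [10 40 90]
print(quotient)  # [10. 10. 10.]
```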



**Exercises**: Modify the following array using the math operators (+, -, *, /)

In [7]:
data = np.arange(-3, 5)
data

array([-3, -2, -1,  0,  1,  2,  3,  4])

1. Multiply the data by 100

2. Add 40 to each value in the array.

3. Divide the numbers by 100

4. Subtract the data from itself.

5. Subtract the data's mean from each value (a.k.a. "mean-centering" the values)

6. Make an array of 10 values, all of them 2!

7. Calculate the square of all the numbers from 0 to 8

8. Calculate the square roots of all the numbers from 0 to 8.

9. Make an array of 20 values, all of them 2's.

### Translating Algorithms into Code

Calculate the standard deviation of an array's values, without using the numpy.std() function.  (Formula can be found here: http://www.mathsisfun.com/data/standard-deviation-formulas.html)

1. Work out the Mean (the simple average of the numbers)
2. Then for each number: subtract the Mean and square the result
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!
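
Written as a single formula, the four steps above describe the population standard deviation. The symbols are introduced here for illustration and do not appear in the original steps: $x_i$ are the values, $N$ is how many there are, and $\mu$ is their mean.

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
$$

This version divides by $N$ rather than $N - 1$, which matches the default behavior of `np.std()`, so your result can be checked against it.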


In [None]:
data = np.random.randn(100)


## Making Pictures: Plotting Data

Let's explore Numpy's random data generation functions!  To plot a histogram of the data, we'll use the library **matplotlib**.  

Here's how to import it and plot a histogram:

```python
import matplotlib.pyplot as plt
plt.hist([1, 2, 3, 4, 5])
```
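A histogram draws one bar per bin, with the bar's height showing how many values fall into that bin. As a rough sketch of what gets counted, `np.histogram()` computes the same kind of counts without drawing anything. The choice of 5 bins is an assumption made to keep the output short.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Count how many values fall into each of 5 equal-width bins
counts, bin_edges = np.histogram(values, bins=5)

print(counts)     # [1 1 1 1 1]
print(bin_edges)  # [1.  1.8 2.6 3.4 4.2 5. ]
```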

**Exercise:** Make 5 different histograms, each from a different statistical distribution:

1.

2.

3. 

4. 

5. 

## Broadcasting vs. For-Loops
All of these tasks can also be done without NumPy, using the built-in Python ints, floats, lists, etc. To tell Python to do the same task repeatedly, however, some extra syntax is needed: the for-loop.

Here, we'll look at a special version of the loop called a **list comprehension**, which builds a new list out of an old list, performing some modification to each value along the way:

```python
>>> data = [1, 2, 3, 4, 5]
>>> [x ** 2 for x in data]
[1, 4, 9, 16, 25]
```

```python
>>> data2 = [x - 2 for x in data]
>>> data2
[-1, 0, 1, 2, 3]
```
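For comparison, here is a short sketch of the same squaring operation done both ways. The variable names are illustrative. Note that the list comprehension returns a list, while the broadcast expression returns an array.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# List comprehension: works on a plain list and returns a new list
squared_list = [x ** 2 for x in values]

# Broadcasting: works on an array and returns a new array
squared_array = np.array(values) ** 2

print(squared_list)   # [1, 4, 9, 16, 25]
print(squared_array)  # [ 1  4  9 16 25]
```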

### Exercises

Perform the following transformations on the array below, using both a list comprehension and a NumPy operation:

In [7]:
data = np.arange(1, 11)
data

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

1. Multiply all the values by 10

with broadcasting:

with a list comprehension:

2. Add each value to itself

with broadcasting:

with a list comprehension:

3. Calculate the square root of each value

with broadcasting:

with a list comprehension:

# Filtering Data

Sometimes you want to remove certain values from your dataset. In NumPy, this can be done with **logical indexing**, and in normal Python it is done with an **if statement**.

## With Logical Indexing

### Step 1: Create a Logical Numpy Array

We can convert all of the values in an array to True/False values at once with a single logical expression. This is broadcasting, the same as with the math operations we saw earlier:

```python
>>> data = np.array([1, 2, 3, 4, 5])
>>> data < 3
array([ True,  True, False, False, False])
```

**Exercises**: Make arrays of True/False values that answer the following questions about the dataset below for each element.

In [22]:
import numpy as np

list_of_values = [3, 7, 10, 2, 1, 7, 20, -5]
data = np.array(list_of_values)

1. Which values are greater than zero?

2. Which values are equal to 7?

3. Which values are greater than or equal to 7?

4. Which values are not equal to 7?

### Step 2: Filter with Logical Indexing

If an array of True/False values is used to *index* another array, and both arrays are the same size, it will return all of the values that correspond to the True values of the indexing array:

```python
>>> data = np.array([1, 2, 3, 4, 5])
>>> is_big = data > 3
>>> is_big
array([False, False, False,  True,  True])

>>> data[is_big]
array([4, 5])
```


**Exercises**: Using the data below, extract only the values that correspond to each question.

In [28]:
data = np.array([3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7])
data

array([ 3,  1, -6,  8, 20,  2,  7,  1,  9,  7,  7, -7])

1. The values that are less than 0

2. The values that are greater than 3

3. The values equal to 7

4. The values not equal to 7

5. The values equal to 20

### Step 2.5: Combine Step 1 and Step 2 into a single line

Both steps can be done in a single expression.  Sometimes this can make things clearer!


```python
>>> data = np.array([1, 2, 3, 4, 5])
>>> data[data > 3]
array([4, 5])
```



**Exercises**: Do the same as in the previous section, this time in a single line.

In [28]:
data = np.array([3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7])
data

array([ 3,  1, -6,  8, 20,  2,  7,  1,  9,  7,  7, -7])

1. The values that are less than 0

2. The values that are greater than 3

3. The values equal to 7

4. The values not equal to 7

5. The values equal to 20

**Extra Exercises**: Using the following dataset, have Python calculate the answers to the questions below:

In [29]:
data = np.array([3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7])
data

array([ 3,  1, -6,  8, 20,  2,  7,  1,  9,  7,  7, -7])

1. How many values are greater than 4 in this dataset?  (*Hint:* the len() function is useful here)

2. How many values are equal to 7 in this dataset?

3. What is the mean value of the positive numbers in this dataset?

4. What is the mean value of the negative numbers in this dataset?

5. What proportion of the values in this dataset are positive?

6. What proportion of the values in this dataset are less than or equal to 8?
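
As a hedged sketch of the counting techniques these questions call for, shown on made-up data rather than the exercise dataset: `len()` counts the values that survive a filter, and because True counts as 1 and False as 0, the True/False array itself can be summed or averaged.

```python
import numpy as np

values = np.array([5, -2, 0, 9, -4, 1, 7, 3])  # made-up example data

is_negative = values < 0

# len() of the filtered array counts the matching values
n_negative = len(values[is_negative])

# sum() of the True/False array gives the same count,
# and mean() gives the proportion of True values
n_negative_from_sum = is_negative.sum()
proportion_negative = is_negative.mean()

print(n_negative, n_negative_from_sum)  # 2 2
print(proportion_negative)              # 0.25
```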

## With If-Statements

List comprehensions also support using an **if** statement, which lets you do the filtering in a single step as well:

```python
>>> data = [1, 2, 3, 4, 5]
>>> [x for x in data if x > 3]
[4, 5]
```


**Exercises**: Filter the list below, this time using list comprehensions instead.

In [28]:
data = [3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7]
data

[3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7]

1. The values that are less than 0

2. The values that are greater than 3

3. The values equal to 7

4. The values not equal to 7

5. The values equal to 20

# Comparing NumPy with Core Python: Demonstration and Discussion

## Speed
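
One possible way to start the speed comparison is to time the same operation done with a list comprehension and with broadcasting. This is only a sketch: the array size and repeat count are arbitrary choices, and the timings depend on the machine, so no expected numbers are shown. For large inputs, the NumPy version is typically the faster of the two.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size, chosen for illustration
values_list = list(range(n))
values_array = np.arange(n)

# Time 20 repetitions of doubling every value, each way
list_seconds = timeit.timeit(lambda: [x * 2 for x in values_list], number=20)
array_seconds = timeit.timeit(lambda: values_array * 2, number=20)

print(f"list comprehension: {list_seconds:.4f} s")
print(f"NumPy broadcasting: {array_seconds:.4f} s")
```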

## Memory

## Flexibility