## Numpy

NumPy, short for 'Numerical Python', is a Python package for numerical computing.

1. **[Introduction to Numpy](#python_numpy)**
2. **[Indexing an Array](#indexing)**
3. **[Slicing an Array](#slicing)**
4. **[Operations on Array](#operations)**
5. **[Arithmetic Functions in Numpy](#arithmetic)**
6. **[Concatenation of Array](#concatenation)**
7. **[Splitting of Array](#splitting)**
8. **[Exercises](#exercises)**
9. **[Conclusion](#conclusion)**



Reference Guide Numpy: <a href="https://docs.google.com/document/d/1VixbONq4RgYZgCxmebQg_RAmidiMFC1W/edit?usp=drive_link">link</a>

<a id="python_numpy"> </a>
### 1. Introduction to Numpy

<table align="left">
    <tr>
        <td>
            <div align="left" style="font-size:120%">
                <font color="#21618C">
                    <b><br>
                    NumPy is the core library for scientific computing in Python. It provides a powerful N-dimensional array object, along with tools for integrating code written in C, C++, etc.<br>
                    </b>
                </font>
            </div>
        </td>
    </tr>
</table>

### Python NumPy Array vs List
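NumPy arrays are generally preferred over Python lists for numerical work for three reasons:
1. Less memory: elements are stored in one contiguous block with a single data type.<br>
2. Faster: operations run in compiled code rather than in element-by-element Python loops.<br>
3. More convenient: arithmetic and comparisons apply to the whole array at once.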
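
The memory claim can be checked with a rough sketch like the one below. It assumes 64-bit integers and relies on `sys.getsizeof`, which only approximates a list's true footprint. The variable names are illustrative and not part of the original notebook.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores references to separate Python int objects,
# so count the container plus every element it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values in one contiguous buffer.
array_bytes = np_array.nbytes  # 1000 elements * 8 bytes each

print("list bytes (approx.):", list_bytes)
print("array bytes:", array_bytes)
```

The exact list figure varies between Python versions, but it should come out several times larger than the array's 8000 bytes.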

**The N-dimensional array**<br>
A simple way to create an array from data or from simple Python data structures like a list is to use the `np.array()` function.

In [2]:
# import the numpy package as np
import numpy as np

# Create 2 new lists height and weight
person_height = [5.2,  5.4, 4.4, 4.5, 5.6, 6]
person_weight = [81, 55, 65, 70, 45, 44]

# Create 2 numpy arrays from height and weight
person_height = np.array(person_height)
person_weight = np.array(person_weight)

print(type(person_weight))

<class 'numpy.ndarray'>
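
One detail worth checking when building arrays from lists: wrapping a list in an extra pair of brackets adds a dimension. The sketch below uses an illustrative `weights` list and the `shape` attribute (the number of elements along each dimension) to show the difference.

```python
import numpy as np

weights = [81, 55, 65, 70, 45, 44]

flat = np.array(weights)      # 1D array with 6 elements
nested = np.array([weights])  # extra brackets: 2D array with 1 row and 6 columns

print(flat.shape)    # (6,)
print(nested.shape)  # (1, 6)
```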


<a id="indexing"> </a>
### 2. Indexing an Array

**Indexing in 1 dimension**

Each element in the array can be accessed by passing the positional index of the element.
* Indexing starts from 0

In [3]:
# given array
my_array = np.array([11, 22, 33, 24, 57, 473])

# get the third element
print(my_array[2]) 

33


**Indexing in 2 dimensions**

We can retrieve an element of the 2D array using two indices i and j - i selects the row, and j selects the column:

In [4]:
my_2darray = np.array([[101, 231, 321],
              [412, 512, 622],
              [712, 821, 912]])

print(my_2darray)

# get the element in 3rd row (i) and 2nd column (j)
print(my_2darray[2, 1])

[[101 231 321]
 [412 512 622]
 [712 821 912]]
821


We can also select a single row or column.

In [5]:
# pick the third ROW from the array
print(my_2darray[2])

# pick the third COLUMN from the array
print(my_2darray[:,2])

[712 821 912]
[321 622 912]


In [17]:
# Create a small 1D array to inspect
arr = np.array([1, 2, 3])

# The shape attribute returns the number of elements in each dimension
# of an array.
print("Shape: ", arr.shape)

# The ndim attribute returns the number of dimensions in an array.
print("Dim: ", arr.ndim)

Shape:  (3,)
Dim:  1
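
The same attributes apply to 2D arrays. As a small sketch (the name `grid` is illustrative), selecting a row or a column from a 2D array yields a 1D array, and selecting a single element yields a scalar with zero dimensions:

```python
import numpy as np

grid = np.array([[101, 231, 321],
                 [412, 512, 622],
                 [712, 821, 912]])

print(grid.shape, grid.ndim)   # (3, 3) 2

row = grid[2]       # third row: 1D
col = grid[:, 2]    # third column: 1D
print(row.shape, col.shape)    # (3,) (3,)

# A single element has no dimensions left
print(np.ndim(grid[2, 1]))     # 0
```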


<a id="slicing"> </a>
### 3. Slicing an Array

**Slicing a 1D array**

The slice notation specifies a start and end value [start:end], where 'start' is inclusive but 'end' is exclusive.

In [6]:
my_array = np.array([101, 121, 112, 123, 114])
print(my_array)

# pick the second, third, and fourth element from the array 
new_array = my_array[1:4]  
print(new_array)

# first three elements
c = my_array[:3]
print(c)

# all the elements from element 112 forward
d = my_array[2:]    
print(d)

# get all the elements from start to end of array
e = my_array[:]
print(e)

[101 121 112 123 114]
[121 112 123]
[101 121 112]
[112 123 114]
[101 121 112 123 114]


**Slicing a 2D array**

In [7]:
my_2darray = np.array([[101, 131, 122, 113, 143],
               [145, 165, 137, 318, 193],
               [240, 241, 252, 253, 324],
               [225, 126, 727, 928, 129]])

print(my_2darray)

# select all rows except the 1st row
# select 3rd and 4th column
print(my_2darray[1:, 2:4])

[[101 131 122 113 143]
 [145 165 137 318 193]
 [240 241 252 253 324]
 [225 126 727 928 129]]
[[137 318]
 [252 253]
 [727 928]]


**Note:** Indexing with a single position returns one element of the array, while slicing returns a sub-array containing the selected elements.
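
One consequence of slicing a NumPy array is worth seeing in code: a basic slice is a view that shares memory with the original array, so writing to the slice changes the original. The sketch below (variable names are illustrative) contrasts a view with an explicit `copy()`.

```python
import numpy as np

original = np.array([101, 121, 112, 123, 114])

view = original[1:4]     # basic slice: shares memory with 'original'
view[0] = -1             # this also overwrites original[1]
print(original[1])       # -1

independent = original[1:4].copy()   # explicit copy: separate memory
independent[0] = 999                 # 'original' is not affected
print(original[1])       # still -1

print(np.shares_memory(view, original))         # True
print(np.shares_memory(independent, original))  # False
```

Use `.copy()` whenever the sub-array should be modified without touching the source data.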

<a id="operations"> </a>
### 4. Operations on Arrays

In [3]:
# Lists cannot be multiplied together.
list_a = [1, 2, 3]
list_b = [2, 4, 6]

# Convert lists to arrays.
array_a = np.array(list_a)
array_b = np.array(list_b)

# Perform element-wise multiplication between the arrays.
array_a * array_b

array([ 2,  8, 18])
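
For comparison, here is a short sketch of what the same operators do on plain lists. With lists, `+` concatenates and `*` with an integer repeats, while `*` between two lists raises a `TypeError`. The `products` name is illustrative.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [2, 4, 6]

print(list_a + list_b)   # concatenation: [1, 2, 3, 2, 4, 6]
print(list_a * 2)        # repetition: [1, 2, 3, 1, 2, 3]

try:
    list_a * list_b      # not defined for two lists
except TypeError as err:
    print("TypeError:", err)

# Without NumPy, an element-wise product needs an explicit comprehension
products = [a * b for a, b in zip(list_a, list_b)]
print(products)                              # [2, 8, 18]

# With NumPy, the operator itself is element-wise
print(np.array(list_a) * np.array(list_b))   # [ 2  8 18]
```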

In [9]:
# Arrays cast every element they contain as the same data type.
arr = np.array([1, 2, 'coconut'])
arr

array(['1', '2', 'coconut'], dtype='<U21')

In [11]:
# The dtype attribute returns the data type of an array's contents.
arr = np.array([1, 2, 3])
arr.dtype

dtype('int64')
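
The data type can also be chosen or changed explicitly. The following sketch, with illustrative variable names, shows automatic upcasting of mixed numbers, the `dtype` argument of `np.array()`, and conversion with `astype()`.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])    # integers are upcast to float
print(mixed.dtype)               # float64

as_float32 = np.array([1, 2, 3], dtype=np.float32)   # request a dtype explicitly
print(as_float32.dtype)          # float32

truncated = mixed.astype(np.int64)   # float -> int drops the fractional part
print(truncated)                 # [1 2 3]
```

Note that `astype()` returns a new array and leaves `mixed` unchanged.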

In [4]:
my_array1 = np.array([120, 230, 310, 410, 150])
my_array2 = np.arange(5)

print(my_array1)
print(my_array2)

[120 230 310 410 150]
[0 1 2 3 4]


In [9]:
# Adding the two arrays
my_array3 = my_array1 + my_array2
my_array3

array([120, 231, 312, 413, 154])

In [11]:
# Sum 2 arrays with different number of elements
my_array4 = np.array([1,2,3,5])
my_array1 + my_array4

ValueError: operands could not be broadcast together with shapes (5,) (4,) 

<table align="left">
    <tr>
        <td>
            <div align="left">
                <font color="#21618C">
                    <b>Adding two 1D arrays with different numbers of elements raises a ValueError, unless one of them has exactly one element.
                    </b>
                </font>
            </div>
        </td>
    </tr>
</table>
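
Shapes do not always have to match exactly. NumPy's broadcasting rules stretch a scalar, or a dimension of length 1, to fit the other operand. The sketch below uses illustrative names (`values`, `column`, `table`) to show cases that succeed.

```python
import numpy as np

values = np.array([120, 230, 310, 410, 150])   # shape (5,)

print(values + 10)               # a scalar is applied to all 5 elements
print(values + np.array([10]))   # shape (1,) is stretched the same way

column = np.array([[0], [100]])  # shape (2, 1)
table = column + values          # (2, 1) combined with (5,) gives (2, 5)
print(table.shape)               # (2, 5)
```

Shapes (5,) and (4,) fail because neither length is 1, so there is no unambiguous way to stretch one array to match the other.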

**Multiplication and Square**<br>
Use `**` to raise each element to a power.

In [12]:
# multiply each element in the array by 4 
print(my_array1*4)

# get square of each element
print(my_array1**2)

[ 480  920 1240 1640  600]
[ 14400  52900  96100 168100  22500]


In [21]:
# Create new array
arr = np.array([1, 2, 3, 4, 5])

# np.mean() returns the mean of the elements in an array
print(np.mean(arr))

# np.log() returns the natural logarithm of each element in an array
print("log() ", np.log(arr))

# np.floor() rounds a number down to the nearest integer
print(np.floor(5.7))

# np.ceil() rounds a number up to the nearest integer
print(np.ceil(5.3))

3.0
log()  [0.         0.69314718 1.09861229 1.38629436 1.60943791]
5.0
6.0


**Using Numpy with Comparison Expressions**

In [13]:
my_array = np.array([34, 45, 67, 45, 23])

# check which elements are greater than or equal to 40
# the comparison condition gives boolean output
new_array = my_array >= 40
new_array

array([False,  True,  True,  True, False])

Pass the above boolean array to the main array to fetch the values that satisfy the comparison condition.

In [14]:
# elements greater than or equal to 40
print(my_array[new_array])

# Rather than creating a separate array of booleans, 
# you can specify the comparison operation directly on the main array.
print(my_array[my_array >= 40])

[45 67 45]
[45 67 45]
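
Several conditions can be combined in one mask. A minimal sketch, assuming the same sample values as above under the illustrative name `readings`:

```python
import numpy as np

readings = np.array([34, 45, 67, 45, 23])

# Combine masks with & (and), | (or), and ~ (not). Parentheses are required
# because these operators bind more tightly than the comparisons.
in_range = (readings >= 40) & (readings < 60)
print(readings[in_range])    # [45 45]

# Invert the mask to get everything else
print(readings[~in_range])   # [34 67 23]

# True counts as 1 when summed, so this counts the matches
print(in_range.sum())        # 2
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why the symbolic operators are used here.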


<a id="arithmetic"> </a>
### 5. Arithmetic Functions in Numpy

**sum():**<br>
The sum() function adds all the values in the array and returns a scalar.

**min():**<br>
The min() function finds the lowest value in the array.

**power():**<br>
The power() function raises each element of the array to the given exponent.

In [15]:
# given array
my_array = np.array([5,7,8,2,4])
my_array

# add all the elements of 'my_array'
print(my_array.sum())

# find minimum of 'my_array'
print(my_array.min())

# get cube of elements of 'my_array'
np.power(my_array, 3)

26
2


array([125, 343, 512,   8,  64])
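
On a 2D array, these functions accept an `axis` argument that controls the direction of the reduction. The sketch below uses an illustrative 2 x 3 `matrix` that is not part of the original notebook.

```python
import numpy as np

matrix = np.array([[5, 7, 8],
                   [2, 4, 6]])

print(matrix.sum())         # 32: all elements
print(matrix.sum(axis=0))   # [ 7 11 14]: one total per column
print(matrix.sum(axis=1))   # [20 12]: one total per row
print(matrix.min(axis=1))   # [5 2]: smallest value in each row
```

A helpful way to remember it: `axis=0` collapses the rows, leaving one result per column, and `axis=1` collapses the columns, leaving one result per row.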

<a id="concatenation"> </a>
### 6. Concatenation of Array

Arrays can be concatenated only if they have the same shape, except in the dimension corresponding to the axis of concatenation.

**Concatenate 1D array**

In [16]:
# concatenate two 1D arrays
array_x = np.array([11, 22, 13])
array_y = np.array([23, 22, 12])
print(np.concatenate([array_x, array_y]))

# you can also concatenate more than two arrays at once
array_z = np.array([23,45])
print(np.concatenate([array_x, array_y, array_z]))

[11 22 13 23 22 12]
[11 22 13 23 22 12 23 45]


**Concatenate 2D array**

In [17]:
# create a 2D array
my_array = np.array([[1, 2, 3],
                 [4, 5, 6]])
my_array

array([[1, 2, 3],
       [4, 5, 6]])

**Concatenate along the first axis and second axis**

In [18]:
# by default concatenate() is along 'axis = 0'
print(np.concatenate([my_array, my_array]))

# for second axis - columns, use axis =1
print(np.concatenate([my_array, my_array], axis=1))

[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]]


**Concatenate 1D and 2D array**

In [19]:
# concatenate the 1D and 2D arrays
# consider a 1D array -- 'array_x'
# consider a 2D array -- 'my_array'
np.concatenate((array_x, my_array), axis = 0)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)

<table align="left">
    <tr>
        <td>
            <div align="left">
                <font color="#21618C">
                    <b>Note: Arrays with different numbers of dimensions cannot be concatenated directly.
                    </b>
                </font>
            </div>
        </td>
    </tr>
</table>
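
One possible workaround, sketched here with the same values as above, is to reshape the 1D array into a single-row 2D array before concatenating. `np.vstack()` performs the same promotion automatically.

```python
import numpy as np

array_x = np.array([11, 22, 13])     # shape (3,)
my_array = np.array([[1, 2, 3],
                     [4, 5, 6]])     # shape (2, 3)

row = array_x.reshape(1, -1)         # shape (1, 3): one row, three columns
stacked = np.concatenate((row, my_array), axis=0)
print(stacked.shape)                 # (3, 3)

# np.vstack treats 1D input as a single row, so the result is identical
same = np.vstack((array_x, my_array))
print(np.array_equal(stacked, same)) # True
```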

<a id="splitting"> </a>
### 7. Splitting of Array

Splitting divides an array into multiple sub-arrays. It is the opposite of concatenation and is implemented by functions such as split(), hsplit(), and vsplit().

**array_split():** It is used to split the array into sub-arrays. It takes the integer 'N' as the number of splits, even if 'N' does not divide the array into sub-arrays of equal length. In the example, we split an array of length 8 into 3 sub-arrays; the function 'array_split()' returns <i>8 % 3 (=2)</i> sub-arrays of size <i>8//3 + 1 (=3)</i> and the rest (i.e. one sub-array) of size <i>8//3 (=2)</i>.<br>

**split():**<br>
The split() function splits an array either into 'N' sub-arrays of equal length or at a given list of indices. It raises an error if 'N' does not divide the array evenly.

**vsplit():**<br>
The vsplit() function is used to split an array into multiple sub-arrays vertically (row-wise).

**hsplit():**<br>
The hsplit() function is used to split an array into multiple sub-arrays horizontally (column-wise).

In [20]:
# given array
array_x = np.arange(8) 

# split 'array_x' into 3 sub-arrays using 'array_split'
np.array_split(array_x, 3)

[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]
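
The size rule for `array_split()` can be verified with a short sketch. The names `n`, `parts`, `sizes`, and `expected` are illustrative.

```python
import numpy as np

n, parts = 8, 3
pieces = np.array_split(np.arange(n), parts)

sizes = [len(p) for p in pieces]
print(sizes)   # [3, 3, 2]

# n % parts pieces get one extra element; the remaining pieces get n // parts
expected = [n // parts + 1] * (n % parts) + [n // parts] * (parts - n % parts)
print(sizes == expected)   # True
```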

In [21]:
# split into sub-arrays
print(np.split(array_x, 2))

# split at indices 5 and 7, giving elements [0:5], [5:7], and [7:]
array_y = np.split(array_x,[5,7])
print(array_y)

[array([0, 1, 2, 3]), array([4, 5, 6, 7])]
[array([0, 1, 2, 3, 4]), array([5, 6]), array([7])]


In [22]:
# split 'array_x' into 3 sub-arrays
np.split(array_x, 3)

ValueError: array split does not result in an equal division

<table align="left">
    <tr>
        <td>
            <div align="left">
                <font color="#21618C">
                    <b>The split() function accepts an integer N as the number of splits only if N divides the array into sub-arrays of equal length; otherwise it raises a ValueError.
                    </b>
                </font>
            </div>
        </td>
    </tr>
</table>

In [23]:
my_array = np.arange(20.0).reshape(4,5)
print(my_array)

# split vertically
print(np.vsplit(my_array, 2))

# split horizontally
new_array = np.arange(16.0).reshape(4,4)
print(new_array)
print(np.hsplit(new_array, 2))

[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]]
[array([[0., 1., 2., 3., 4.],
       [5., 6., 7., 8., 9.]]), array([[10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]])]
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]]
[array([[ 0.,  1.],
       [ 4.,  5.],
       [ 8.,  9.],
       [12., 13.]]), array([[ 2.,  3.],
       [ 6.,  7.],
       [10., 11.],
       [14., 15.]])]
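
Both helpers correspond to `np.split()` along a specific axis: `vsplit()` splits along `axis=0` and `hsplit()` along `axis=1` for 2D input. The sketch below checks this on an illustrative 4 x 4 `grid` and also shows a split at an explicit column index.

```python
import numpy as np

grid = np.arange(16.0).reshape(4, 4)

top, bottom = np.vsplit(grid, 2)
top_alt, bottom_alt = np.split(grid, 2, axis=0)
print(np.array_equal(top, top_alt))     # True

left, right = np.hsplit(grid, 2)
left_alt, right_alt = np.split(grid, 2, axis=1)
print(np.array_equal(left, left_alt))   # True

# Splitting at an explicit column index gives pieces of unequal width
first_col, rest = np.hsplit(grid, [1])
print(first_col.shape, rest.shape)      # (4, 1) (4, 3)
```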


<a id="exercises"> </a>
### 8. Exercises: Numpy

#### Introduction 

Your work as a data professional for the U.S. Environmental Protection Agency (EPA) requires you to analyze air quality index data collected from the United States and Mexico.

The air quality index (AQI) is a number that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 or below represents good air quality, while an AQI value over 300 represents hazardous air quality. Refer to this guide from [AirNow.gov](https://www.airnow.gov/aqi/aqi-basics/) for more information.

In this lab, you will work with NumPy arrays to perform calculations and evaluations with data they contain. Specifically, you'll be working with just the data from the numerical AQI readings.

#### Exercise 1: Create an array using NumPy

The EPA has compiled some AQI data where each AQI report has the state name, county name, and AQI. Refer to the table below as an example.

| state_name | county_name | aqi |
| ------- | ------- | ------ |
| Arizona | Maricopa | 18 |
| California | Alameda | 11 |
| California | Butte | 6 |
| Texas | El Paso | 40 |
| Florida | Duval | 15 |

<br/>

##### 1a: Create an array of AQI data

You are given an ordered `list` of AQI readings called `aqi_list`.

1. Use a NumPy function to convert the list to an `ndarray`. Assign the result to a variable called `aqi_array`.
2. Print the length of `aqi_array`.
3. Print the first five elements of `aqi_array`.

*Expected result:*

```
[OUT] 1725
      [18.  9. 20. 11.  6.]
```

In [22]:
import numpy as np

import ada_c2_labs as lab
aqi_list = lab.fetch_epa('aqi')

In [23]:
# 1. ### YOUR CODE HERE
aqi_array = np.array(aqi_list)

# 2. ### YOUR CODE HERE
print(len(aqi_array))

# 3. ### YOUR CODE HERE
print(aqi_array[:5])

1725
[18.  9. 20. 11.  6.]


#### Exercise 2: Calculate summary statistics

Now that you have the AQI data stored in an array, use NumPy functions to calculate some summary statistics about it.

* Use built-in NumPy functions to print the following values from `aqi_array`:
    1. Maximum value
    2. Minimum value
    3. Median value
    4. Standard deviation

*Expected result:*

```
[OUT] Max = 93.0
      Min = 0.0
      Median = 8.0
      Std = 10.382982538847708
```

In [24]:
### YOUR CODE HERE ###
print('Max =', np.max(aqi_array))
print('Min =', np.min(aqi_array))
print('Median =', np.median(aqi_array))
print('Std =', np.std(aqi_array))

Max = 93.0
Min = 0.0
Median = 8.0
Std = 10.382982538847708


#### Exercise 3: Calculate percentage of readings with cleanest AQI

You are interested in how many air quality readings in the data represent the cleanest air, which we'll consider **readings of 5 or less.**

To perform this calculation, you'll make use of one of the properties of arrays that make them so powerful: their element-wise operability. For example, when you add an integer to an `ndarray` using the `+` operator, it performs an element-wise addition on the whole array.

```
[IN]  my_array = np.array([1, 2, 3])
      my_array = my_array + 10
      print(my_array)

[OUT] [11 12 13]
```

**The same concept applies to comparison operators used on an `ndarray`.** With this in mind:

* Calculate the percentage of AQI readings that are considered cleanest:
    1. Use a comparison statement to get an array of Boolean values that is the same length as `aqi_array`. Assign the result to a variable called `boolean_aqi`.
    2. Calculate the number of `True` values in `boolean_aqi` and divide this number by the total number of values in the array. Assign the result (a fraction between 0 and 1) to a variable named `percent_under_6` and print it.

*Expected result:*

```
[OUT] 0.3194202898550725
```


In [25]:
# 1. ### YOUR CODE HERE ###
boolean_aqi = (aqi_array <= 5)

# 2. ### YOUR CODE HERE ###
percent_under_6 = boolean_aqi.sum() / len(boolean_aqi)
percent_under_6

0.3194202898550725

<a id="conclusion"> </a>
### 9. Conclusion

NumPy:
* Python packages contain functions to perform specific tasks.
    * The NumPy package has functions used for working with arrays and performing mathematical operations.
* Arrays are similar to lists, but only store one type of data per array.
    * Processing data stored in an array is much quicker than processing data stored in traditional lists.
* Arrays are useful for performing element-wise operations, including arithmetic and comparisons.
