<h1>Exemplar: Vectors and arrays with NumPy</h1>

## Introduction 

Your work as a data professional for the U.S. Environmental Protection Agency (EPA) requires you to analyze air quality index data collected from the United States and Mexico.

The air quality index (AQI) is a number that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 or below represents good air quality, while an AQI value over 300 represents hazardous air quality. Refer to this guide from [AirNow.gov](https://www.airnow.gov/aqi/aqi-basics/) for more information.

In this lab, you will use NumPy arrays to perform calculations and evaluations on the data they contain. Specifically, you'll work with only the numerical AQI readings.

# Task 1: Create an array using NumPy

The EPA has compiled AQI data in which each report includes the state name, county name, and AQI value. The table below shows an example.

| state_name | county_name | aqi |
| ------- | ------- | ------ |
| Arizona | Maricopa | 18 |
| California | Alameda | 11 |
| California | Butte | 6 |
| Texas | El Paso | 40 |
| Florida | Duval | 15 |

<br/>

## 1a: Import NumPy

Import NumPy using its standard alias.

In [None]:
import numpy as np

## 1b: Create an array of AQI data

You are given an ordered `list` of AQI readings called `aqi_list`.

1. Use a NumPy function to convert the list to an `ndarray`. Assign the result to a variable called `aqi_array`.
2. Print the length of `aqi_array`.
3. Print the first five elements of `aqi_array`.

*Expected result:*

```
[OUT] 1725
      [18.  9. 20. 11.  6.]
```

In [None]:
import ada_c2_labs as lab
aqi_list = lab.fetch_epa('aqi')

In [None]:
aqi_array = np.array(aqi_list)
print(len(aqi_array))
print(aqi_array[:5])

1725
[18.  9. 20. 11.  6.]
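
Beyond `len()`, an `ndarray` exposes attributes that describe its structure. The following is a minimal sketch using a small hypothetical list (not the EPA data) to show what `np.array` returns and how to inspect it:

```python
import numpy as np

# Hypothetical readings for illustration only; not the EPA data.
small_list = [18.0, 9.0, 20.0]
small_array = np.array(small_list)

print(type(small_array).__name__)  # class name of the result
print(small_array.shape)           # tuple of dimension sizes
print(small_array.size)            # total number of elements
print(small_array.dtype)           # data type shared by all elements
```

For a one-dimensional array such as this one, `len()` and `.size` return the same number.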


# Task 2: Calculate summary statistics

Now that you have the AQI data stored in an array, use NumPy functions to calculate some summary statistics about it.

* Use built-in NumPy functions to print the following values from `aqi_array`:
    1. Maximum value
    2. Minimum value
    3. Median value
    4. Standard deviation

*Expected result:*

```
[OUT] Max = 93.0
      Min = 0.0
      Median = 8.0
      Std = 10.382982538847708
```

In [None]:
print('Max =', np.max(aqi_array))
print('Min =', np.min(aqi_array))
print('Median =', np.median(aqi_array))
print('Std =', np.std(aqi_array))

Max = 93.0
Min = 0.0
Median = 8.0
Std = 10.382982538847708
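
Note that `np.std` computes the population standard deviation by default (`ddof=0`). The sketch below uses a small hypothetical array, not the EPA data, to show how the sample standard deviation (`ddof=1`) differs:

```python
import numpy as np

# Hypothetical readings for illustration only; not the EPA data.
sample_readings = np.array([18.0, 9.0, 20.0, 11.0, 6.0])

# Default (ddof=0): population standard deviation, dividing by n.
population_std = np.std(sample_readings)

# ddof=1: sample standard deviation, dividing by n - 1.
sample_std = np.std(sample_readings, ddof=1)

print('Population std =', population_std)
print('Sample std =', sample_std)
```

With 1,725 readings in `aqi_array`, the two values would differ only slightly, but the distinction matters for small samples.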


# Task 3: Calculate percentage of readings with cleanest AQI

You want to know what share of the air quality readings in the data represent the cleanest air, defined here as **readings of 5 or less**.

To perform this calculation, you'll make use of one of the properties of arrays that make them so powerful: their element-wise operability. For example, when you add an integer to an `ndarray` using the `+` operator, it performs an element-wise addition on the whole array.

```
[IN]  my_array = np.array([1, 2, 3])
      my_array = my_array + 10
      print(my_array)

[OUT] [11 12 13]
```

**The same concept applies to comparison operators used on an `ndarray`.** With this in mind:

* Calculate the proportion of AQI readings that are considered cleanest:
    1. Use a comparison statement to get an array of Boolean values that is the same length as `aqi_array`. Assign the result to a variable called `boolean_aqi`.
    2. Count the `True` values in `boolean_aqi` and divide that count by the total number of values in the array. Assign the result to a variable named `percent_under_6` and print it. (The result is a proportion between 0 and 1; multiply by 100 to express it as a percentage.)

*Expected result:*

```
[OUT] 0.3194202898550725
```




In [None]:
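# Element-wise comparison: True where the reading is 5 or less
boolean_aqi = (aqi_array <= 5)

# True counts as 1, so the sum is the number of cleanest readings
percent_under_6 = boolean_aqi.sum() / len(boolean_aqi)
print(percent_under_6)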

0.3194202898550725
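
Because `True` counts as 1 and `False` as 0, the same proportion can also be obtained as the mean of the Boolean array. The sketch below uses a small hypothetical array, not the EPA data, to compare the two approaches:

```python
import numpy as np

# Hypothetical readings for illustration only; not the EPA data.
readings = np.array([18.0, 9.0, 20.0, 11.0, 6.0, 5.0, 3.0, 0.0])

# Element-wise comparison yields a Boolean array of the same length.
is_cleanest = readings <= 5

# Counting the True values and dividing by the length...
proportion_by_sum = is_cleanest.sum() / len(is_cleanest)

# ...gives the same result as taking the mean directly.
proportion_by_mean = is_cleanest.mean()

print(is_cleanest)
print(proportion_by_sum, proportion_by_mean)
```

Here three of the eight hypothetical readings are 5 or less, so both expressions evaluate to 0.375.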

# Conclusion

**What are your key takeaways from this lab?**

Python packages provide specialized functions for particular tasks. NumPy, for example, offers tools for array manipulation and mathematical computation. Unlike lists, arrays store elements of a single data type, which allows operations on them to run much faster. Arrays are well suited to element-wise operations, including arithmetic and comparisons.
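
As a brief illustration of these points, the sketch below uses hypothetical values (not from the lab data) to contrast how a list and an array handle mixed types and the `*` operator:

```python
import numpy as np

# Hypothetical values for illustration only.
mixed_values = [1, 2.5, 3]

# A list keeps each element's own type.
list_types = [type(v).__name__ for v in mixed_values]

# An ndarray stores one dtype, so the integers are upcast to float64.
mixed_array = np.array(mixed_values)

# Arithmetic on the array is element-wise; no explicit loop is needed.
doubled_array = mixed_array * 2

# The same operator on a list repeats the list instead.
doubled_list = mixed_values * 2

print(list_types)
print(mixed_array.dtype)
print(doubled_array)
print(doubled_list)
```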