#### Last Name: Upadhyay
#### ID: XT81177


## Instructions for Quiz on Python Data Structures and Basic Data Analysis

This quiz is designed to assess your knowledge of Python data structures and your ability to perform data analysis using core Python features such as lists, tuples, dictionaries, loops, conditionals, and functions.

### Part I: Understanding Data Structures
The first part of the quiz focuses on understanding and manipulating data structures through a series of problems. You'll work with small datasets to apply concepts related to lists, tuples, dictionaries, and more.

### Part II: Fetching and Manipulating Data from an API
In the second part, you will fetch data from a public dataset using the Air Quality API ([https://aqicn.org/api/](https://aqicn.org/api/)). The challenge is to perform basic data manipulation without relying on the Pandas or NumPy libraries.

**Please clearly write your last name and ID below:**

_Last Name:_  
_ID:_

Feel free to add your own code and conduct any extra analysis. Bonus points will be awarded for additional insights and creativity in your solutions.


## I) Python Data Structures and Analysis 


**Question 1: Dictionary Conversion**
- Given a list of air quality measurements as tuples: `[('PM2.5', 35), ('PM10', 40), ('NO2', 20)]`
- Write a Python script to convert this list into a dictionary where the first element of each tuple is the key and the second element is the value.

In [1]:
# Your code here
measurements = [('PM2.5', 35), ('PM10', 40), ('NO2', 20)]
# Convert to dictionary
measurements_dic = {key: value for key, value in measurements}
print(measurements_dic)

{'PM2.5': 35, 'PM10': 40, 'NO2': 20}
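
As an alternative to the comprehension above, the built-in `dict()` constructor accepts any iterable of `(key, value)` pairs directly. The following is a minimal sketch; the names `measurements_dict` and `pairs_again` are illustrative only.

```python
measurements = [('PM2.5', 35), ('PM10', 40), ('NO2', 20)]

# dict() consumes an iterable of (key, value) pairs
measurements_dict = dict(measurements)
print(measurements_dict)

# .items() yields the pairs back, so the conversion can be reversed
pairs_again = list(measurements_dict.items())
print(pairs_again == measurements)
```

Both forms give the same dictionary; the comprehension is the better fit when keys or values need to be transformed along the way.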


**Question 2: Updating Dictionary Values**
- Write a Python function update_quality_index(air_quality, pollutant, index) that updates the air quality index of a given pollutant in a dictionary air_quality.

In [2]:
# Define the function here
def update_quality_index(air_quality, pollutant, index):
    """
    Update the air quality index of a given pollutant in a dictionary.

    Args:
    - air_quality (dict): Dictionary containing air quality measurements.
    - pollutant (str): Name of the pollutant to update.
    - index (int): New air quality index value.

    Returns:
    - None: Modifies the 'air_quality' dictionary in place.
    """
    # A single assignment covers both cases: it overwrites the value if the
    # pollutant already exists and adds a new entry if it does not.
    air_quality[pollutant] = index
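
A short usage sketch may help show the in-place behavior. The function is restated in compact form so the block runs on its own, and the readings are illustrative values rather than real measurements.

```python
def update_quality_index(air_quality, pollutant, index):
    """Set the index for `pollutant`, adding the key if it is missing."""
    air_quality[pollutant] = index


# Illustrative readings (not real measurements)
air_quality = {'PM2.5': 35, 'PM10': 40}

update_quality_index(air_quality, 'PM2.5', 42)  # existing key: value overwritten
update_quality_index(air_quality, 'O3', 18)     # new key: entry added
print(air_quality)
```

Because dictionaries are mutable, the caller sees the change without the function returning anything.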
    

**Question 3: Set Intersection**
- Given two sets of cities, cities_with_good_air = {'CityA', 'CityB', 'CityC'} and cities_with_poor_air = {'CityB', 'CityD', 'CityE'}
- Write a Python script to find the cities that are in both sets.

In [3]:
# Your code here to find common cities
cities_with_good_air = {'CityA', 'CityB', 'CityC'}
cities_with_poor_air = {'CityB', 'CityD', 'CityE'}
# Find common cities
common_cities = cities_with_good_air & cities_with_poor_air # We are using & operator 

print("Common cities:", common_cities)

Common cities: {'CityB'}
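
For comparison, the other common set operations follow the same pattern. This is a small sketch using the same two sets; the variable names `both`, `only_good`, and `either` are illustrative.

```python
cities_with_good_air = {'CityA', 'CityB', 'CityC'}
cities_with_poor_air = {'CityB', 'CityD', 'CityE'}

# Method form of the & operator
both = cities_with_good_air.intersection(cities_with_poor_air)

# Cities that appear only in the first set
only_good = cities_with_good_air - cities_with_poor_air

# Cities that appear in at least one set
either = cities_with_good_air | cities_with_poor_air

# sorted() gives a stable display order, since sets are unordered
print(sorted(both), sorted(only_good), sorted(either))
```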


**Question 4: NumPy Array Creation and Mean Calculation**
- Demonstrate how to create a NumPy array from a list of average PM2.5 values [25, 35, 45, 20] and calculate the mean value.

In [5]:
# Your code here for NumPy operations
import numpy as np 
pm_values = [25, 35, 45, 20]
# Convert list to NumPy array
pm_array = np.array(pm_values)
# Calculate the mean of the array
pm_mean = np.mean(pm_array)

print("Mean PM2.5 value:", pm_mean)

Mean PM2.5 value: 31.25
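
Since Part II asks for data manipulation without NumPy, the same mean can be computed with plain Python. The sketch below shows two options, a manual `sum`/`len` division and the standard library's `statistics.mean`.

```python
from statistics import mean

pm_values = [25, 35, 45, 20]

# Manual calculation: total divided by the number of readings
manual_mean = sum(pm_values) / len(pm_values)

# Standard-library equivalent
stdlib_mean = mean(pm_values)

print(manual_mean, stdlib_mean)
```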


**Question 5: Python and R Data Structures**


R and Python are both powerful programming languages for data analysis, but they handle data types differently.

Vectors in R: The fundamental data structure in R is the vector. Atomic vectors (logical, integer, double, character, complex) are homogeneous, meaning all elements must be of the same type. Lists are generic vectors and can be heterogeneous.

Data Frames in R: A data frame in R is similar to a table in a relational database or a dataframe in Python's pandas library. It's a list of vectors of equal length, making it a two-dimensional structure that can store different types of data (numeric, character, logical, etc.) in each column, similar to a spreadsheet.

Lists in Python: Python lists are heterogeneous, meaning they can contain elements of different types. Lists are ordered collections and are mutable.

Dictionaries in Python: A dictionary in Python is a collection of key-value pairs. Keys are unique within a dictionary, and the values can be of any data type. Since Python 3.7, dictionaries are guaranteed by the language to preserve insertion order; in CPython 3.6 this was only an implementation detail, and earlier versions treated dictionaries as unordered.
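
The Python side of these properties can be checked with a few lines. This is a minimal sketch with made-up values; `mixed` and `readings` are illustrative names.

```python
# A list can hold elements of different types and can be changed in place
mixed = [10, 'PM2.5', 3.5, True]
mixed.append(None)       # lists are mutable: grow in place
mixed[0] = 'changed'     # and elements can be reassigned
types = [type(x).__name__ for x in mixed]
print(types)

# A dictionary keeps keys in the order they were inserted (Python 3.7+)
readings = {}
readings['pm25'] = 13
readings['o3'] = 23.3
readings['co'] = 2.3
keys = list(readings)
print(keys)
```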

**Question: List and Vector Operations**

- Suppose you have a Python list `py_list = [10, 20, 30, 40]` and an R vector `r_vector <- c(10, 20, 30, 40)`.

- In Python, add a new element 50 to py_list and demonstrate how to iterate over py_list to print each element's value multiplied by 2.

- In R, perform the same operations: add a new element 50 to r_vector and write a loop to print each element's value multiplied by 2.

- Discuss the differences in syntax and approach between adding elements and iterating over the collection in R and Python.

In [7]:
# Python
py_list = [10, 20, 30, 40] # Creating the list
py_list.append(50) # Appending the new element into the list
# Iterating over py_list to print each element's value multiplied by 2
for item in py_list:
    print(item * 2)


20
40
60
80
100


In [None]:
# R
r_vector <- c(10, 20, 30, 40) # Creating the vector
r_vector <- c(r_vector, 50) # Adding a new element to r_vector
# Looping over r_vector to print each element's value multiplied by 2
for(item in r_vector) {
  print(item * 2)
}

In [None]:
#Discuss the differences in syntax and approach between adding elements and iterating over the collection in R and Python.

Differences in syntax and approach between R and Python:

1. **Adding Elements**:
   - In Python, to add an element to a list, we use the `append()` method, which modifies the list in place.
   - In R, to add an element to a vector, we use the combine function `c()`, which returns a new vector that must be assigned back to the variable.

2. **Iterating Over Collection**:
   - In Python, we will typically use a `for` loop with `in` keyword to iterate over a list.
   - In R, we will also use a `for` loop, but the syntax is different. The loop iterates over each element of the vector using `for(item in vector)`.

3. **Indexing**:
   - In Python, indexing starts from 0 (zero-based indexing).
   - In R, indexing starts from 1 (one-based indexing).

4. **Syntax**:
   - Python uses a colon (`:`) to open a code block and indentation to define where the block ends.
   - R wraps the loop header in parentheses and uses curly braces (`{}`) to delimit code blocks; indentation is optional.

Overall, Python and R have different syntax and conventions for adding elements to collections and iterating over them. However, both languages provide similar functionalities for working with collections.
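
The points above can be seen side by side in a short Python sketch, with the corresponding R expressions noted in comments for comparison.

```python
py_list = [10, 20, 30, 40]
py_list.append(50)            # R: r_vector <- c(r_vector, 50)

# Zero-based indexing: the first element is at position 0
first = py_list[0]            # R: r_vector[1]
last = py_list[-1]            # negative indices count from the end in Python

# A list comprehension doubles every element in one expression
doubled = [item * 2 for item in py_list]   # R: r_vector * 2 (vectorized)

# enumerate() pairs each element with its zero-based position
for position, item in enumerate(py_list):
    print(position, item * 2)
```

Note that in R, arithmetic on a vector is applied element by element, so the explicit loop is often unnecessary there, whereas a plain Python list needs a loop or a comprehension.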

## II) Python Basic Functionality Application

 - **Dataset: Air Quality Data Analysis**

1. Accessing Data from the Air Quality API

  This section covers how to make the API request and how to handle the response. Here's a step-by-step guide:

### Understanding the API Request

- **API Endpoint**: This is the URL to which you send the request. In this case, `https://api.waqi.info/feed/here/?token=YOUR_TOKEN` targets the "here" endpoint of the WAQI API, which automatically determines the location based on the IP address of the request. The `token` parameter is your API key, which authenticates your request.
  
- **Requests Library**: Python's `requests` library simplifies making HTTP requests. The `get` method is used to send a GET request to the API.

### Making the GET Request

```python
response = requests.get(url)
```

- This line sends a GET request to the API endpoint and stores the response in a variable named `response`.

### Checking the Response Status

```python
if response.status_code == 200:
```

- HTTP status code 200 indicates success. This condition checks if the request was successful before proceeding.

### Parsing the JSON Response

```python
data = response.json()
```

- The `.json()` method converts the JSON response from the API into a Python dictionary, making it easy to access the data programmatically.

### Accessing Specific Data

```python
if 'data' in data and data['data'] is not None:
    aqi = data['data'].get('aqi')
    city = data['data'].get('city', {}).get('name')
```

- This checks if the key `'data'` exists and is not `None`. It then extracts the Air Quality Index (AQI) and the city name from the nested structure of the response data.

### Printing the Results

```python
print(f"Air Quality Index (AQI) for {city}: {aqi}")
```

- Finally, this prints the AQI along with the city name to the console. In the full script, an `else` branch prints an alternative message when the data for the requested location is not available.
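
The parsing steps above can also be collected into one small helper that works on the dictionary returned by `response.json()`. This is a sketch under assumptions: the name `extract_aqi` is hypothetical, and the error payload shown (a `status` other than `'ok'` with a non-dictionary `data` field) is an assumed shape used only to illustrate the guard.

```python
def extract_aqi(payload):
    """Return (city, aqi) from a parsed feed payload, or None if it is unusable."""
    body = payload.get('data')
    # Assumption: a failed request may carry a non-dict 'data' field
    if payload.get('status') != 'ok' or not isinstance(body, dict):
        return None
    # 'city' may be missing or None, so fall back to an empty dict
    city = (body.get('city') or {}).get('name')
    return city, body.get('aqi')


# Hand-written payloads; no network call is made here
ok_payload = {'status': 'ok',
              'data': {'aqi': 19, 'city': {'name': 'Padonia, Maryland, USA'}}}
error_payload = {'status': 'error', 'data': 'Invalid key'}  # assumed error shape

print(extract_aqi(ok_payload))
print(extract_aqi(error_payload))
```

Keeping the parsing separate from the HTTP call makes it possible to test the logic with sample dictionaries before spending API requests.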



In [10]:
# The API SCRIPT
import requests

# API endpoint with the token (replace the token with your own API key)
url = "https://api.waqi.info/feed/here/?token=88870b41d8fa20dcbb42be4d50920eab0f29bbf4"

# Make a GET request to the WAQI API
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()
    
    # Print the entire response data
    print(data)
    
    # Example of accessing specific data
    # Check if the response contains the 'data' key and it's not None
    if 'data' in data and data['data'] is not None:
        aqi = data['data'].get('aqi')
        city = data['data'].get('city', {}).get('name')
        print(f"Air Quality Index (AQI) for {city}: {aqi}")
    else:
        print("Data not available for the requested location.")
else:
    print(f"Failed to fetch data: HTTP {response.status_code}")


{'status': 'ok', 'data': {'aqi': 19, 'idx': 8700, 'attributions': [{'url': 'http://www.mde.state.md.us/Pages/Home.aspx', 'name': 'Air Now - US EPA - Maryland state', 'logo': 'US-Maryland.png'}, {'url': 'http://www.airnow.gov/', 'name': 'Air Now - US EPA'}, {'url': 'https://waqi.info/', 'name': 'World Air Quality Index Project'}], 'city': {'geo': [39.462002, -76.631599], 'name': 'Padonia, Maryland, USA', 'url': 'https://aqicn.org/city/usa/maryland/padonia', 'location': ''}, 'dominentpol': 'o3', 'iaqi': {'h': {'v': 93}, 'o3': {'v': 19.2}, 'p': {'v': 1011}, 'pm25': {'v': 9}, 't': {'v': 10.5}, 'w': {'v': 1}, 'wg': {'v': 5.2}}, 'time': {'s': '2024-03-06 15:00:00', 'tz': '-05:00', 'v': 1709737200, 'iso': '2024-03-06T15:00:00-05:00'}, 'forecast': {'daily': {'o3': [{'avg': 11, 'day': '2024-03-04', 'max': 20, 'min': 2}, {'avg': 7, 'day': '2024-03-05', 'max': 16, 'min': 1}, {'avg': 4, 'day': '2024-03-06', 'max': 9, 'min': 1}, {'avg': 8, 'day': '2024-03-07', 'max': 12, 'min': 5}, {'avg': 8, 'day'

In [12]:
#SAMPLE SCRIPTS

# Given data dictionary
data = {
    'status': 'ok',
    'data': {
        'aqi': 23,
        'idx': 7422,
        'attributions': [
            {'url': 'http://www.mde.state.md.us/Pages/Home.aspx', 'name': 'Air Now - US EPA - Maryland state', 'logo': 'US-Maryland.png'},
            {'url': 'http://www.airnow.gov/', 'name': 'Air Now - US EPA'},
            {'url': 'https://waqi.info/', 'name': 'World Air Quality Index Project'}
        ],
        'city': {
            'geo': [39.055302, -76.878304],
            'name': 'HU-Beltsville, Maryland, USA',
            'url': 'https://aqicn.org/city/usa/maryland/hu-beltsville',
            'location': ''
        },
        'dominentpol': 'o3',
        'iaqi': {
            'co': {'v': 2.3},
            'h': {'v': 42.5},
            'no2': {'v': 8.5},
            'o3': {'v': 23.3},
            'p': {'v': 1017.6},
            'pm25': {'v': 13},
            'so2': {'v': 0.2},
            't': {'v': 1.9},
            'w': {'v': 4.5},
            'wg': {'v': 10.5}
        },
        'time': {
            's': '2024-02-18 10:00:00',
            'tz': '-05:00',
            'v': 1708250400,
            'iso': '2024-02-18T10:00:00-05:00'
        },
        'forecast': {
            'daily': {
                'o3': [
                    {'avg': 9, 'day': '2024-02-16', 'max': 16, 'min': 1},
                    # More o3 forecast data...
                ],
                'pm10': [
                    {'avg': 14, 'day': '2024-02-16', 'max': 29, 'min': 2},
                    # More pm10 forecast data...
                ],
                'pm25': [
                    {'avg': 38, 'day': '2024-02-16', 'max': 74, 'min': 5},
                    # More pm25 forecast data...
                ]
            }
        },
    },
    'debug': {'sync': '2024-02-19T00:18:22+09:00'}
}

# Checking if the status of the response is 'ok'
if data['status'] == 'ok':
    # Extracting and printing general information
    general_info = data['data']
    city_info = general_info['city']
    print(f"City: {city_info['name']}")
    print(f"Current AQI: {general_info['aqi']}")
    print(f"Dominant Pollutant: {general_info['dominentpol']}")
    
    # Printing attributions
    print("\nAttributions:")
    for attribution in general_info['attributions']:
        print(f"- {attribution['name']} ({attribution['url']})")
    
    # Extracting and printing individual pollutant indices
    print("\nIndividual Pollutant Indices:")
    for pollutant, value in general_info['iaqi'].items():
        print(f"- {pollutant.upper()}: {value['v']}")
    
    # Forecast data (simplified for brevity)
    print("\nForecast Data:")
    for pollutant, forecasts in general_info['forecast']['daily'].items():
        print(f"\n{pollutant.upper()} Levels:")
        for forecast in forecasts:
            print(f"- {forecast['day']}: Avg: {forecast['avg']}, Max: {forecast['max']}, Min: {forecast['min']}")

else:
    print("Failed to fetch data or status not ok.")


City: HU-Beltsville, Maryland, USA
Current AQI: 23
Dominant Pollutant: o3

Attributions:
- Air Now - US EPA - Maryland state (http://www.mde.state.md.us/Pages/Home.aspx)
- Air Now - US EPA (http://www.airnow.gov/)
- World Air Quality Index Project (https://waqi.info/)

Individual Pollutant Indices:
- CO: 2.3
- H: 42.5
- NO2: 8.5
- O3: 23.3
- P: 1017.6
- PM25: 13
- SO2: 0.2
- T: 1.9
- W: 4.5
- WG: 10.5

Forecast Data:

O3 Levels:
- 2024-02-16: Avg: 9, Max: 16, Min: 1

PM10 Levels:
- 2024-02-16: Avg: 14, Max: 29, Min: 2

PM25 Levels:
- 2024-02-16: Avg: 38, Max: 74, Min: 5


**Question 1: Write a Function to Extract City Name**
- Write a Python function named `get_city_name` that takes the air quality data dictionary as an argument and returns the name of the city.

In [14]:
def get_city_name(data):
    """
    Extracts the name of the city from the air quality data dictionary.

    Args:
    - data (dict): Air quality data dictionary obtained from the API.

    Returns:
    - str: Name of the city.
    """
    # Check if the 'city' key exists in the data dictionary
    if 'data' in data and 'city' in data['data']:
        city_name = data['data']['city']['name']
        return city_name
    else:
        return None  # Return None if city name is not found

**Question 2: List Comprehension to Get All Pollution Values**
- Using list comprehension, extract all the pollutant values (`v`) from the `iaqi` section of the air quality data. Assume the data structure is stored in a variable named `data`. Your answer should produce a list of values.

In [16]:
# Each entry under 'iaqi' is a dict of the form {'v': value}; collect the 'v' values
pollutant_values = [reading['v'] for reading in data['data']['iaqi'].values()]
print(pollutant_values)

[2.3, 42.5, 8.5, 23.3, 1017.6, 13, 0.2, 1.9, 4.5, 10.5]

**Question 3: Write a Function to Find the Maximum AQI Forecast**
- Write a function named `max_aqi_forecast` that takes the air quality data dictionary as input and returns the maximum AQI forecast value for PM2.5. If there are no PM2.5 forecasts, the function should return `None`.

In [17]:
def max_aqi_forecast(data):
    """
    Finds the maximum AQI forecast value for PM2.5 from the air quality data dictionary.

    Args:
    - data (dict): Air quality data dictionary obtained from the API.

    Returns:
    - int or None: Maximum AQI forecast value for PM2.5, or None if there are no forecasts.
    """
    # PM2.5 forecasts are a list of daily entries under data -> forecast -> daily -> pm25
    pm25_forecasts = (
        data.get('data', {})
            .get('forecast', {})
            .get('daily', {})
            .get('pm25', [])
    )
    # Each entry carries 'avg', 'day', 'max', and 'min'; collect the daily 'max' values
    max_values = [entry['max'] for entry in pm25_forecasts if 'max' in entry]
    # max() with default=None returns None when there are no PM2.5 forecasts
    return max(max_values, default=None)
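
In the sample data, the daily forecast entries sit under `data['data']['forecast']['daily']`, and each entry has `avg`, `day`, `max`, and `min` keys. A related sketch, assuming that structure, finds which day has the highest PM2.5 `max`. The helper name `peak_pm25_day` and the second forecast entry are hypothetical, added only so the example has more than one day to compare.

```python
sample = {'data': {'forecast': {'daily': {'pm25': [
    {'avg': 38, 'day': '2024-02-16', 'max': 74, 'min': 5},
    {'avg': 41, 'day': '2024-02-17', 'max': 89, 'min': 12},  # hypothetical entry
]}}}}


def peak_pm25_day(data):
    """Return (day, max) for the PM2.5 forecast entry with the largest 'max'."""
    entries = (data.get('data', {}).get('forecast', {})
                   .get('daily', {}).get('pm25', []))
    if not entries:
        return None  # no PM2.5 forecasts available
    # key= tells max() to compare entries by their 'max' field
    peak = max(entries, key=lambda entry: entry['max'])
    return peak['day'], peak['max']


print(peak_pm25_day(sample))
print(peak_pm25_day({}))
```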


**Question 4: Conditional Statements to Categorize AQI**
- Write a Python script that categorizes the current AQI (`aqi` value from the data) into "Good", "Moderate", "Unhealthy for Sensitive Groups", "Unhealthy", "Very Unhealthy", and "Hazardous" based on the AQI value ranges [0-50, 51-100, 101-150, 151-200, 201-300, 301+].

In [18]:
# Assume `aqi` variable holds the current AQI value
aqi = 215

# Conditional statements to categorize AQI
if aqi >= 0 and aqi <= 50:
    category = "Good"
elif aqi >= 51 and aqi <= 100:
    category = "Moderate"
elif aqi >= 101 and aqi <= 150:
    category = "Unhealthy for Sensitive Groups"
elif aqi >= 151 and aqi <= 200:
    category = "Unhealthy"
elif aqi >= 201 and aqi <= 300:
    category = "Very Unhealthy"
else:
    category = "Hazardous"

print("AQI Category:", category)

AQI Category: Very Unhealthy
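
The same ranges can also be expressed as a lookup table, which keeps the thresholds in one place. The sketch below uses `bisect_left` from the standard library; the function name `categorize_aqi` and the choice to reject negative values are assumptions, not part of the quiz.

```python
from bisect import bisect_left

# Upper bound of each category except the last, which is open-ended
UPPER_BOUNDS = [50, 100, 150, 200, 300]
CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def categorize_aqi(aqi):
    """Map an AQI value to its category name."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")  # assumed policy
    # bisect_left returns the index of the first bound that is >= aqi
    return CATEGORIES[bisect_left(UPPER_BOUNDS, aqi)]


for value in (0, 50, 51, 215, 301):
    print(value, categorize_aqi(value))
```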


**Question 5: Loop Through Attributions and Create a URL List**
- Using a loop, iterate through the `attributions` section of the air quality data and create a list of URLs. Assume the data structure is stored in a variable named `data`.

In [31]:
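# 'attributions' is nested under the top-level 'data' key
attributions = data['data'].get('attributions', [])
urls = []
# Your loop to fill the `urls` list
for attribution in attributions:
    url = attribution.get('url')
    if url:
        urls.append(url)

# Print the list of URLs
print("List of URLs:")
for url in urls:
    print(url)

List of URLs:
http://www.mde.state.md.us/Pages/Home.aspx
http://www.airnow.gov/
https://waqi.info/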


**Question 6: Advanced Function for Pollution Summary**
- Create a function named `pollution_summary` that takes the air quality data dictionary as an argument and returns a summary dictionary. The summary should contain the number of pollutants measured, the highest pollutant value, and its corresponding pollutant name.

In [32]:
def pollution_summary(data):
    """
    Generate a summary dictionary containing the number of pollutants measured,
    the highest pollutant value, and its corresponding pollutant name.

    Args:
    - data (dict): Air quality data dictionary obtained from the API.

    Returns:
    - dict: Summary dictionary containing the following keys:
            - 'total_pollutants': Total number of pollutants measured.
            - 'highest_pollutant': Name of the pollutant with the highest value.
            - 'highest_value': Highest pollutant value.
    """
    # Initialize variables to store summary information
    total_pollutants = 0
    highest_pollutant = None
    highest_value = None  # Stays None if no readings are found

    # Check if 'iaqi' key exists in the data dictionary
    if 'data' in data and 'iaqi' in data['data']:
        # Count the number of entries measured
        total_pollutants = len(data['data']['iaqi'])

        # Iterate through the entries and find the highest value
        for pollutant, values in data['data']['iaqi'].items():
            if 'v' in values:
                pollutant_value = values['v']
                if highest_value is None or pollutant_value > highest_value:
                    highest_value = pollutant_value
                    highest_pollutant = pollutant

    # Return the summary dictionary
    return {"total_pollutants": total_pollutants, "highest_pollutant": highest_pollutant, "highest_value": highest_value}
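
With the sample data, the largest value across all `iaqi` entries is `p` at 1017.6, which looks like a weather reading rather than a pollutant. A possible refinement, sketched below, restricts the summary to pollutant codes. The set `POLLUTANT_KEYS` and the function name `pollutant_only_summary` are assumptions; the API documentation should be checked for the authoritative list of codes.

```python
# Assumption: these codes denote pollutants; other iaqi keys are weather readings
POLLUTANT_KEYS = {'co', 'no2', 'o3', 'pm10', 'pm25', 'so2'}


def pollutant_only_summary(data):
    """Summarize only the iaqi entries whose key is an assumed pollutant code."""
    iaqi = data.get('data', {}).get('iaqi', {})
    values = {name: reading['v'] for name, reading in iaqi.items()
              if name in POLLUTANT_KEYS and 'v' in reading}
    # max() over the keys, ranked by their value; None when nothing qualifies
    highest = max(values, key=values.get) if values else None
    return {
        'total_pollutants': len(values),
        'highest_pollutant': highest,
        'highest_value': values.get(highest),
    }


sample = {'data': {'iaqi': {
    'co': {'v': 2.3}, 'h': {'v': 42.5}, 'no2': {'v': 8.5}, 'o3': {'v': 23.3},
    'p': {'v': 1017.6}, 'pm25': {'v': 13}, 'so2': {'v': 0.2}, 't': {'v': 1.9},
    'w': {'v': 4.5}, 'wg': {'v': 10.5},
}}}

print(pollutant_only_summary(sample))
```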

