# Welcome to the Jupyter Notebook! A Jupyter Notebook is an interactive coding environment that runs in your web browser. You will use this notebook to learn more about air quality data and how to analyse it.

In the code blocks, anything after a # is a comment: Python ignores it when the code runs. Look out for the comments in each block of code; they explain what's going on.

# Part 1

In [1]:
#%%capture
# ^ if you would like to suppress the output from this block, uncomment the "%%capture" line above. Try it out! It only works as the very first line of a block.

#This block loads the libraries - external pieces of Python code we use to support the operations in this notebook.
import csv
import pytz
import time
import pandas as pd
from datetime import datetime
import os
from matplotlib import pyplot as plt
import folium
from folium import plugins

In [3]:
#below, we read in the data.
#we will start with static data collected from a test sensor we sent to EKTU, which was deployed on campus.

aq = "ektu_static_data.csv" 

data = pd.read_csv(aq, engine='python') #contents of EKTU STATIC DATA

#then we record the local timezone (the sensor is in Kazakhstan), and print ourselves a message to verify we read the file in correctly
tiz = pytz.timezone('Asia/Almaty')
print("aq read")

#if any rows were accidentally duplicated in the sheet, we keep only one copy of each here
data.drop_duplicates(keep='first', inplace=True)

aq read
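A note on `drop_duplicates`: its `keep` argument decides what happens to repeated rows. With `keep=False`, pandas removes every copy of a duplicated row, including the original, while `keep='first'` (the default) keeps one copy. The sketch below uses a small made-up table to show the difference.

```python
import pandas as pd

# A toy table (made-up values) in which the second row is an exact copy of the first
readings = pd.DataFrame({
    "localtime": ["2021-11-01 10:00", "2021-11-01 10:00", "2021-11-01 10:01"],
    "PM25": [12.0, 12.0, 14.5],
})

# keep='first' (the default) keeps one copy of each duplicated row
kept_one = readings.drop_duplicates(keep="first")

# keep=False throws away every copy, including the original
kept_none = readings.drop_duplicates(keep=False)

print(len(kept_one))   # 2
print(len(kept_none))  # 1
```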


In [4]:
#here, we filter out any erroneous lat/lon values, like zeroes
data = data.loc[(data[['latitude', 'longitude']] != 0).all(axis=1)]
data = data[data[['latitude', 'longitude']].notnull().all(1)]
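The two filtering lines above work together. The sketch below uses made-up coordinates to show why both are needed: `NaN != 0` is `True`, so a row with a missing coordinate passes the zero check and is only removed by the null check. It also shows `dropna(subset=...)`, a shorter way to write the null check.

```python
import numpy as np
import pandas as pd

# Made-up GPS fixes: a good one, a (0, 0) row, and a row with a missing longitude
fixes = pd.DataFrame({
    "latitude": [49.95, 0.0, 49.96],
    "longitude": [82.61, 0.0, np.nan],
})

# Step 1: keep rows where BOTH coordinates are non-zero.
# NaN != 0 evaluates to True, so the missing-longitude row survives this step.
nonzero = fixes.loc[(fixes[["latitude", "longitude"]] != 0).all(axis=1)]
print(len(nonzero))  # 2

# Step 2: keep rows where both coordinates are present.
# dropna(subset=...) does the same job as .notnull().all(axis=1).
clean = nonzero.dropna(subset=["latitude", "longitude"])
print(len(clean))  # 1
```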

In [5]:
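#now, we drop rows with impossible timestamps (anything dated before 2021 cannot come from this deployment)
#this text comparison works because the timestamps are written year-first (YYYY-MM-DD ...)
data = data.loc[data['localtime'] > '2021-01-01']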

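Comparing `localtime` to the text `'2021-01-01'` works only if the timestamps are stored as year-first text (for example `2021-11-03 08:15:00`), because that format sorts alphabetically in time order. Here's a minimal sketch with made-up timestamps, assuming that format, that compares the text approach with parsing the column using `pd.to_datetime` first, which does not depend on how the text sorts.

```python
import pandas as pd

# Made-up timestamps; the 1970 one stands in for a bad reading
times = pd.Series(["1970-01-01 00:00:05", "2021-11-03 08:15:00", "2021-11-03 08:16:00"])

# Text comparison: relies on "YYYY-MM-DD hh:mm:ss" sorting in time order
as_text = times[times > "2021-01-01"]

# Datetime comparison: parse first, then compare against a Timestamp
parsed = pd.to_datetime(times)
as_datetime = times[parsed > pd.Timestamp("2021-01-01")]

print(as_text.tolist() == as_datetime.tolist())  # True
```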
In [6]:
#last, we drop physically implausible readings, which point to sensor errors
#(the PM lower bound of -0.0001 keeps readings of exactly 0)
data = data.loc[(data['humidity'] > 0) & (data['humidity'] < 100)]

data = data.loc[(data['temperature'] > -50) & (data['temperature'] < 80)]

data = data.loc[(data['PM25'] > -0.0001) & (data['PM25'] < 200)]
data = data.loc[(data['PM1'] > -0.0001) & (data['PM1'] < 200)]
data = data.loc[(data['PM10'] > -0.0001) & (data['PM10'] < 200)]

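If you find yourself adding more range checks, one option is to keep the limits in a dictionary and apply them in a loop. The sketch below uses a hypothetical helper, `filter_ranges`, and a few made-up rows; the bounds copy the humidity, temperature, and PM2.5 limits used above.

```python
import pandas as pd

# (lower bound, upper bound) for each column; both bounds are exclusive
VALID_RANGES = {
    "humidity": (0, 100),
    "temperature": (-50, 80),
    "PM25": (-0.0001, 200),
}

def filter_ranges(df, ranges):
    """Hypothetical helper: keep rows whose values fall strictly inside every range."""
    mask = pd.Series(True, index=df.index)
    for column, (low, high) in ranges.items():
        mask &= (df[column] > low) & (df[column] < high)
    return df.loc[mask]

# Made-up readings: row 1 has an impossible humidity, row 2 a PM2.5 value above 200
sample = pd.DataFrame({
    "humidity": [55.0, 120.0, 60.0],
    "temperature": [-3.2, -2.8, -1.5],
    "PM25": [18.0, 20.0, 450.0],
})
print(filter_ranges(sample, VALID_RANGES))  # only row 0 remains
```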
In [7]:
#we're all set up! now let's look at how some parameters from the deployment change over time - let's start with temperature! 
#this data is from the month of November.

#first, let's get some basic information about the temperature during this deployment:
#the average temperature the sensor saw
#the maximum temperature the sensor saw
#the minimum temperature the sensor saw
tempavg = data['temperature'].mean()
tempmax = data['temperature'].max()
tempmin = data['temperature'].min()

#now that we have those values, let's print them out so we can take a look
print("Avg Temp",tempavg, "Max Temp",tempmax, "Min Temp",tempmin)

Avg Temp -0.8205761645392594 Max Temp 21.91 Min Temp -16.74
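pandas can also compute several summary statistics in one call with `Series.agg`. Here is a short sketch on made-up temperatures; `Series.describe()` is another option that adds the count, standard deviation, and quartiles.

```python
import pandas as pd

# Made-up temperatures in °C
temperature = pd.Series([-10.0, -2.5, 0.5, 12.0])

# agg() computes several statistics in one call and returns them as a labelled Series
summary = temperature.agg(["mean", "max", "min"])
print(summary)
```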


In [None]:
#great! now let's plot the temperature over the whole period we collected data for.
#pd.to_datetime turns the text timestamps into real dates so the x-axis stays readable

plt.plot(pd.to_datetime(data["localtime"]), data["temperature"], 'r')
plt.xlabel("Local time")
plt.ylabel("Temperature (°C)")
plt.show()

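With many readings, a full time series plot can look crowded. One option, sketched below with made-up readings, is to convert `localtime` to real datetimes with `pd.to_datetime` and average each day with `resample('D').mean()`. This assumes the `localtime` text can be parsed by `pd.to_datetime`.

```python
import pandas as pd

# Made-up twice-daily readings spanning two days
readings = pd.DataFrame({
    "localtime": ["2021-11-01 00:00", "2021-11-01 12:00", "2021-11-02 00:00", "2021-11-02 12:00"],
    "temperature": [-4.0, 2.0, -6.0, 0.0],
})

# Parse the text timestamps so pandas (and matplotlib) treat them as real times
readings["localtime"] = pd.to_datetime(readings["localtime"])

# Average all readings that fall on the same calendar day
daily = readings.set_index("localtime")["temperature"].resample("D").mean()
print(daily)
# To plot it: plt.plot(daily.index, daily.values, 'r')
```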
Now you know how to get temperature information for the deployment. What other information can we visualize? Let's try to get some other parameters.

In [None]:
#in this block, write some code to find the average, minimum, and maximum humidity! 
#the parameter is 'humidity'

humavg = '''fill this in!'''
hummax = '''fill this in!'''
hummin = '''fill this in!'''

#what should we print? code it here!

In [None]:
#now let's plot the humidity over the whole period we collected data for.

plt.plot(pd.to_datetime(data["localtime"]), data["humidity"], 'g')
plt.xlabel("Local time")
plt.ylabel("Relative Humidity (%)")
plt.show()

In [None]:
#in this block, write some code to find the average, minimum, and maximum PM2.5! 
#the parameter is 'PM25'

pm25avg = '''fill this in!'''
pm25max = '''fill this in!'''
pm25min = '''fill this in!'''

#what should we print? code it here!

In [None]:
#can you write the code to visualize this data?

plt.plot('''fill this in!''', 'b')
plt.xlabel('''fill this in!''')
plt.ylabel("PM2.5")

In [None]:
#for the last section of part 1, let's see where this data is coming from!

#we're going to make a map. we start by setting a center point for the map to display the data
coords = data.loc[:,['latitude','longitude']].values
start_point=coords[0]
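The block above picks the first reading, `coords[0]`, as the map's center point. If you'd rather center the map on the middle of all the readings, you could average the coordinates instead. A minimal sketch with made-up coordinates:

```python
import numpy as np

# Made-up (latitude, longitude) rows, the same shape as
# data.loc[:, ['latitude', 'longitude']].values
coords = np.array([
    [49.90, 82.60],
    [49.92, 82.64],
    [49.94, 82.62],
])

first_point = coords[0]             # the first reading
center_point = coords.mean(axis=0)  # average latitude and average longitude

print(first_point.tolist(), center_point.round(4).tolist())
```

Averaging is fine for readings spread over a small area such as a campus or a city, but it would give a misleading center for points on both sides of the 180° meridian.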

In [None]:
#here, we set up the specifications for the map; tiles sets the background map style
ektumap = folium.Map(location=start_point, tiles='OpenStreetMap', zoom_start=14)

#this will loop through the data and show us where it's coming from
for i,row in data.iterrows():
    folium.CircleMarker((row.latitude,row.longitude), radius=4, weight=1, color='blue', fill_color='blue', fill_opacity=.5).add_to(ektumap)

#this will display our map! 
ektumap

#here we save an html version of the map - you can zoom in and out of it and interact with it!
#uncomment and run if you want to use
#ektumap.save('ektumap.html')

# Part 2

Amazing! We've gotten a lot of information, including time series, from a static sensor in Kazakhstan. How does the data look different when we have more than one sensor, and when those sensors are moving? Let's find out by examining data from a different deployment, in New York City.

In [8]:
#let's read in our new data below

aq1 = "AQ_orgfid.csv" 

data = pd.read_csv(aq1, engine='python') #contents of AQ_orgfid.csv

#then we record the local timezone, and print ourselves a message to verify we read the file in correctly
tiz = pytz.timezone('America/New_York')
print("aq1 read")

#if any rows were accidentally duplicated in the sheet, we keep only one copy of each here
data.drop_duplicates(keep='first', inplace=True)

aq1 read
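The notebook stores the timezone in `tiz`, but never attaches it to the timestamps. If the timestamps are naive local times (an assumption; check your file), pandas can attach a timezone with `tz_localize` and convert between zones with `tz_convert`. A minimal sketch with made-up times:

```python
import pandas as pd

# Made-up naive timestamps, assumed to be New York local time
times = pd.to_datetime(pd.Series(["2021-11-15 12:00", "2021-07-15 12:00"]))

# Attach the timezone, then convert to UTC
local = times.dt.tz_localize("America/New_York")
utc = local.dt.tz_convert("UTC")

# EST (winter) is UTC-5 and EDT (summer) is UTC-4
print(utc.dt.hour.tolist())  # [17, 16]
```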


In [9]:
#let's do the same data cleaning process as before - filter out any erroneous lat/lon values, like zeroes
data = data.loc[(data[['lat', 'long']] != 0).all(axis=1)]
data = data[data[['lat', 'long']].notnull().all(1)]
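Part 1 and Part 2 run the same cleaning steps on columns with different names (`latitude`/`longitude` versus `lat`/`long`). One way to avoid repeating the code is a function that takes the column names as arguments. The sketch below is a hypothetical helper, `clean_coordinates`, that keeps one copy of each duplicated row (the pandas default, `keep='first'`), then drops rows with zero or missing coordinates.

```python
import numpy as np
import pandas as pd

def clean_coordinates(df, lat_col, lon_col):
    """Hypothetical helper: drop duplicate rows, then rows whose coordinates are 0 or missing.

    Taking the column names as arguments lets the same code handle
    'latitude'/'longitude' (Part 1) and 'lat'/'long' (Part 2).
    """
    df = df.drop_duplicates(keep="first")
    cols = [lat_col, lon_col]
    df = df.loc[(df[cols] != 0).all(axis=1)]
    return df.dropna(subset=cols)

# Made-up rows using the Part 2 column names
sample = pd.DataFrame({
    "lat": [40.71, 40.71, 0.0, np.nan],
    "long": [-74.00, -74.00, 0.0, -73.99],
    "pm25": [8.0, 8.0, 9.0, 10.0],
})
print(clean_coordinates(sample, "lat", "long"))  # one row left
```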

In [10]:
#once we do that, we set a center point for the map we are going to make to display the data
coords = data.loc[:,['lat','long']].values
start_point=coords[0]

We've finished the setup portion! Now, we will build our map in the next code block.

In [11]:
#here, we set up the specifications for the map; tiles sets the background map style
Pm25map = folium.Map(location=start_point, tiles='OpenStreetMap', zoom_start=14)

#run the block as-is once to see every reading, then uncomment the plot_data = data.loc[...] line below and run again to see only the PM2.5 hotspots (readings above 100)
plot_data = data
#plot_data = data.loc[(data['pm25'] > 100)]

#now we plot the PM2.5 readings on the map
for i,row in plot_data.iterrows():
    #after you run this code block once, try changing these parameters to see what happens! 
    folium.CircleMarker((row.lat,row.long), radius=4, weight=1, color='red', fill_color='red', fill_opacity=.5).add_to(Pm25map)

#this will display our map! 
Pm25map

#here we save an html version of the map - you can zoom in and out of it and interact with it!
#uncomment and run if you want to use
#Pm25map.save('Pm25map.html')
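Instead of coloring every marker red, you could pick the color from each reading's PM2.5 value. The sketch below is a hypothetical helper, `pm25_color`; its thresholds are illustrative placeholders, not an official scale, so swap in the breakpoints from whichever air quality guideline you use.

```python
def pm25_color(value, thresholds=(12.0, 35.4, 55.4)):
    """Hypothetical helper: pick a marker color from a PM2.5 reading.

    The default thresholds are illustrative only.
    """
    colors = ["green", "yellow", "orange", "red"]
    # Return the first color whose upper limit the value does not exceed
    for color, limit in zip(colors, thresholds):
        if value <= limit:
            return color
    # Above every threshold
    return colors[-1]

print([pm25_color(v) for v in (5, 20, 40, 150)])  # ['green', 'yellow', 'orange', 'red']
```

Inside the map loop, you would then pass `color=pm25_color(row.pm25), fill_color=pm25_color(row.pm25)` to `folium.CircleMarker`.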
