![Before and after lockdown](https://i.insider.com/5e9f6084e61f342bbd063f47?width=600&format=jpeg)

# Task: How to improve the air quality in three cities in India?

### Which city should we invest in to reduce air pollution in the long run?
 * You'll have support for 3 years.
 * Then, if you are successful, I will give the same amount for 2 more cities.
 
### Provide a rough plan
 * Which city / geographical area to focus on
 * Why that city
 * How to make improvements
 * How to track progress


The lockdown presented an opportunity to improve air quality and, with it, the health of many people in India.
What can we do to keep the air clean?

### Below you can see what is possible in terms of air quality.


## 3 cities are selected for the clean air program, based on the analysis further below.

### Pilot project: Ahmedabad
**Ahmedabad** has the most polluted and hazardous air of the Indian cities in this data set, and therefore the highest potential to become a clean air city.
* This is our pilot project, because if Ahmedabad gets clean, it is a great success that can inspire every other city.
* The next step will be to analyse the contributors to pollution.
* These might be the unfiltered exhausts of transportation, heating and industry.

### Follow up cities
The air is unhealthy and tens of millions of people are affected.
* **Delhi**
* **Mumbai** 

### First steps are taken
The first steps have been taken, as you can see in the % difference to the previous year.
The air is getting better, so let's keep going.

* Get filters to clean the air and enjoy beautiful India.
* Support new types of transportation
* Set up modern waste management

Further below, you can see why I arrived at this conclusion.

In [None]:
import pandas as pd

#plotting library
import matplotlib.pyplot as plt
import seaborn as sns             


# interactive plotting library
import plotly.express as px       
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly.offline import iplot
from plotly.subplots import make_subplots


import pandas_profiling # library for automatic EDA
#%pip install autoviz # installing and importing autoviz, another library for automatic data visualization
#from autoviz.AutoViz_Class import AutoViz_Class


from IPython.display import HTML
from IPython.display import display # display from IPython.display


import os

from scipy import stats # statistical library
from statsmodels.stats.weightstats import ztest # statistical library for hypothesis testing
from itertools import cycle # function used for cycling over values

In [None]:
pd.set_option('display.max_columns', 100) # Setting pandas to display a N number of columns
pd.set_option('display.max_rows', 20) # Setting pandas to display a N number rows
pd.set_option('display.width', 1000) # Setting pandas dataframe display width to N

In [None]:
home = '../input/air-quality-data-in-india'
try:
    city_day = pd.read_csv(os.path.join(home, 'city_day.csv'))
    city_hour = pd.read_csv(os.path.join(home, 'city_hour.csv'))
    station_day = pd.read_csv(os.path.join(home, 'station_day.csv'))
    station_hour = pd.read_csv(os.path.join(home, 'station_hour.csv'))
    stations = pd.read_csv(os.path.join(home, 'stations.csv'))
except FileNotFoundError as err:
    # Catch only the missing-file case and show which path failed
    print(f'File names have changed! ({err})')

In [None]:
home2 = '../input/city-hour-pollutants-pop'
try:
    city_hour_pollutants_pop = pd.read_csv(os.path.join(home2, 'city_hour_pollutants_pop.csv'))
except FileNotFoundError as err:
    # Catch only the missing-file case and show which path failed
    print(f'File names have changed as well! ({err})')

# When is air quality bad enough that you should stay inside?

An AQI under 50 means that the air quality is good. At this low AQI level, a person can spend time outdoors and air pollution will pose very little risk to their health. As the AQI number increases, so does the risk to human health. 

Pollutants can be grouped into:
* **acidifying substances**
* **particulates**
* **ozone precursors**

*(See the chart below for a summary of the AQI levels of health concern.)*


<img src="https://scijinks.gov/review/air-quality/air-quality3.png" width="600" align = "left"/>
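
The bands in the chart can be turned into a small lookup. This is a minimal sketch, assuming the US EPA scale that the chart above uses; `aqi_category` and the two constants are illustrative names that do not appear elsewhere in this notebook. India's national AQI uses different band names and cut-offs, so the lists would need adjusting before they are applied to the Indian data.

```python
import bisect
import math

# Upper bound of each band on the US EPA AQI scale (assumption: the chart above).
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQI_LABELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_category(aqi):
    """Return the level of health concern for a single AQI value."""
    if math.isnan(aqi):
        return "Unknown"  # missing measurement
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left finds the first band whose upper bound is >= aqi
    return AQI_LABELS[bisect.bisect_left(AQI_UPPER_BOUNDS, aqi)]
```

Applied to a column, this could look like `city_day["AQI"].map(aqi_category)`.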


# Let's check the average air quality in the cities

In [None]:
%%html
<div class='tableauPlaceholder' id='viz1599471771237' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020-&#47;CityvsAQI2018-2020&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AirQualityDatainIndia2015-2020-&#47;CityvsAQI2018-2020' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020-&#47;CityvsAQI2018-2020&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='de' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1599471771237');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.5)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

In [None]:
# Keep the city, the AQI, the individual pollutants and the year
city_hour_pollutants_pop_selected = city_hour_pollutants_pop[['City','AQI', 'PM2.5', 'PM10', 'CO', 'O3', 'NO2', 'SO2','Date_year']]
# 2018 to 2020 inclusive (the original "> 2018" silently dropped 2018)
city_hour_pollutants_pop_selected_2018_2020 = city_hour_pollutants_pop_selected.loc[city_hour_pollutants_pop_selected['Date_year'] >= 2018]
city_pollutants_2018_2020 = city_hour_pollutants_pop_selected_2018_2020[['City','AQI', 'PM2.5', 'PM10', 'CO', 'O3', 'NO2', 'SO2']]
# Mean value per city, most polluted city first
city_pollutants_2018_2020.groupby(["City"]).mean().sort_values(by=['AQI'],ascending = False).style.background_gradient('YlGn')

# Most harmful pollutants you’re breathing every day

* **Particulate matter (PM10, PM2.5)** has the highest impact on AQI.
* **NO2** is a precursor for **O3** and **particulate matter**.
* **CO** is a killer pollutant.
* **O3** is toxic and creates poor visibility.
* **SO2** is caused by industrial production sites; it causes acid rain and poor visibility.
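
One rough way to check which pollutant moves together with AQI is a correlation table. The sketch below assumes a frame with the pollutant columns used in this notebook plus `AQI`; `correlation_with_aqi` is a made-up helper name. Correlation is only a proxy here, not the way AQI is actually calculated.

```python
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "CO", "O3", "NO2", "SO2"]


def correlation_with_aqi(df, pollutants=POLLUTANTS):
    """Pearson correlation of each pollutant column with AQI, strongest first."""
    columns = list(pollutants) + ["AQI"]
    # corr() ignores missing values pairwise, so no dropna() is needed here
    corr = df[columns].corr()["AQI"].drop("AQI")
    return corr.sort_values(ascending=False)
```

With the data loaded above, the call would be `correlation_with_aqi(city_hour_pollutants_pop)`.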

### Particulate matter (particle pollution)
> * Particulate matter (also called airborne particles or PM) consists of particles in the air, including dirt, dust, smoke, and tiny drops of liquid.
> * **PM10 (coarse particles)** are inhalable particles with a diameter between 2.5 and 10 microns. The dust floating around your attic and the smoke billowing from a wildfire are examples of PM10 particles that you can see. These airborne particles can affect your throat, eyes, and nose, and can cause serious health effects.
> * **PM2.5 (fine particles)** are inhalable particles with a diameter of less than 2.5 microns, which means they can only be seen under a microscope. Common sources of fine particulate matter include pet dander, dust mites, bacteria, and dust from construction and demolition sites.
> * Long-term exposure to PM2.5 can reduce both your lung function and your life expectancy.

### Carbon Monoxide (CO)
> * Carbon monoxide (CO), known as the “invisible killer,” is an odorless, colorless gas that frequently goes undetected.
> * Carbon monoxide is typically created from combustion processes, like the burning of wood, oil, coal, charcoal, natural gas, and propane, but it can also be found indoors from:
>   * unvented kerosene and gas heaters
>   * leaking chimneys and fireplaces
>   * back-drafting from furnaces and water heaters

### Ozone (O3)
> * Ozone is a naturally occurring gas found in the Earth’s upper atmosphere, where it helps block out harmful ultraviolet light from the sun. However, when ozone is found at ground level, it’s toxic to human beings.
> * Ground-level ozone is formed when pollutants emitted by cars, power plants, refineries, and other sources chemically react in the presence of sunlight. Ever wonder why there’s more of that unsightly smog during hot summer days? That’s because the hotter the day and the stronger the sun, the more ozone is formed.

### Nitrogen Dioxide (NO2)
> * Nitrogen dioxide (NO2) is a harsh-smelling gas formed as a result of road traffic and other fossil fuel combustion processes.
> * To make matters worse, nitrogen dioxide is also a precursor for ozone and particulate matter, and it plays a role in the formation of acid rain.

### Sulfur Dioxide (SO2)
> * Sulfur dioxide (SO2) is a colorless gas or liquid with a strong, pungent odor.
> * Unfortunately, the presence of sulfur dioxide in the air is almost exclusively man-made. Sulfur dioxide is produced when fossil fuels such as coal and oil are burned in industrial processes, and when mineral ores like aluminum are smelted.
> * This noxious gas is also frequently responsible for causing poor visibility and acid rain.


# Density plot on pollutants

How does the pollution look over time?

*Visuals below are made with Tableau.*

In [None]:
%%html

<div class='tableauPlaceholder' id='viz1599472088850' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;QG&#47;QGD5DFD6M&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='path' value='shared&#47;QGD5DFD6M' /> <param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;QG&#47;QGD5DFD6M&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='de' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1599472088850');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.5)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>
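
The text does not say how the Tableau view above was built. For readers who want a comparable density curve inside the notebook, here is a minimal sketch that uses a normalised histogram as a simple stand-in for a density estimate. `pollutant_density` is an illustrative name, and the default bin count is an arbitrary assumption.

```python
import numpy as np
import pandas as pd


def pollutant_density(values, bins=30):
    """Return (bin_centres, density) for one pollutant, ignoring missing values."""
    clean = pd.Series(values, dtype="float64").dropna()
    # density=True scales the bars so that they integrate to 1
    density, edges = np.histogram(clean, bins=bins, density=True)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, density
```

The result can be drawn with `plt.plot(centres, density)`, once per pollutant or per year, to compare the shapes.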

# Let's drill down into every city's air quality to check what's going on

In [None]:
%%html
<div class='tableauPlaceholder' id='viz1599306316342' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020v4&#47;AQIperweekdayovermonthandyearsallcities&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AirQualityDatainIndia2015-2020v4&#47;AQIperweekdayovermonthandyearsallcities' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020v4&#47;AQIperweekdayovermonthandyearsallcities&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='de' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1599306316342');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

In [None]:
%%html

<div class='tableauPlaceholder' id='viz1599479793706' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020-&#47;AverageAQIanddifferenceinallcities-&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AirQualityDatainIndia2015-2020-&#47;AverageAQIanddifferenceinallcities-' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020-&#47;AverageAQIanddifferenceinallcities-&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='de' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1599479793706');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='650px';vizElement.style.height='887px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='650px';vizElement.style.height='887px';} else { vizElement.style.width='100%';vizElement.style.height='727px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>
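
The view above shows the average AQI per city and its difference to the previous year. The same figures can be approximated in pandas, which also gives a simple way to track progress. This is a sketch under the assumption that the frame has `City`, `Date_year` and `AQI` columns, as `city_hour_pollutants_pop` does; `yearly_aqi_change` is a hypothetical helper name.

```python
import pandas as pd


def yearly_aqi_change(df):
    """Mean AQI per city and year, plus the % change versus the previous row."""
    yearly = (
        df.groupby(["City", "Date_year"])["AQI"]
        .mean()
        .rename("mean_AQI")
        .reset_index()
        .sort_values(["City", "Date_year"])
    )
    # Previous available year for the same city; if a year is missing in the
    # data, the comparison spans the gap rather than one calendar year.
    previous = yearly.groupby("City")["mean_AQI"].shift(1)
    yearly["pct_change"] = (yearly["mean_AQI"] - previous) / previous * 100
    return yearly
```

A negative `pct_change` means the air got cleaner than in the previous year; the first year of each city has no value.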

# AQI does not vary over the course of the day, except in one city

In [None]:
%%html
<div class='tableauPlaceholder' id='viz1599310121339' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020v4&#47;DaytimedifferenceofAQI&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AirQualityDatainIndia2015-2020v4&#47;DaytimedifferenceofAQI' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ai&#47;AirQualityDatainIndia2015-2020v4&#47;DaytimedifferenceofAQI&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='de' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1599310121339');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='80%';vizElement.style.height=(divElement.offsetWidth*0.5)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>
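
To put a number on the daytime pattern, one option is to compare the highest and lowest hourly mean AQI per city. The sketch below assumes the `City`, `Date_hour` and `AQI` columns of `city_hour_pollutants_pop`; `hourly_aqi_spread` is an illustrative name.

```python
import pandas as pd


def hourly_aqi_spread(df):
    """Per city: the gap between the highest and lowest hourly mean AQI."""
    # Rows are cities, columns are hours of the day, values are mean AQI
    hourly = df.groupby(["City", "Date_hour"])["AQI"].mean().unstack("Date_hour")
    spread = hourly.max(axis=1) - hourly.min(axis=1)
    return spread.sort_values(ascending=False).rename("AQI_spread")
```

Cities with a spread near zero have the same air quality around the clock; a city with a large spread stands out at the top of the result.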

In [None]:
#pd.set_option("display.max_rows", 999)

def explorer(data):
    print("\nHEAD",80*"_")
    display(data.head(5))
    print("\nINFO",80*"_")
    data.info()  # prints the summary itself; wrapping it in print() also printed "None"
    #print("\nCOLUMNS",80*"_")
    #print(data.columns)
    print("\nSHAPE",80*"_")
    print(data.shape)
    print("\nUNIQUE VALUES PER COLUMNS",80*"_")
    frame = data.nunique().to_frame()
    display(frame)
    print("\nDESCRIBE",80*"_")
    display(data.describe(include = "all"))

In [None]:
explorer(city_hour_pollutants_pop)

In [None]:
city_hour_pollutants_pop["City"].value_counts()

In [None]:
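major_cities = ["Ahmedabad","Delhi","Bengaluru","Mumbai","Patna","Chennai","Lucknow"]
cities = city_hour_pollutants_pop["City"].unique()  # every city in the data set

# Rows with a known year (not used further below)
data_1 = city_hour_pollutants_pop[city_hour_pollutants_pop['Date_year'].notnull()]
# Keeps all cities for now; pass major_cities instead of cities to restrict the plot
data_2 = city_hour_pollutants_pop.loc[city_hour_pollutants_pop['City'].isin(cities)]
# Drop rows with any missing value and order them chronologically for the animation
data_2 = data_2.dropna().sort_values(["Date_year","Date_month_number","Date_dayofmonth","Date_hour"])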

In [None]:
city_hour_pollutants_pop.columns

In [None]:
import plotly.express as px



fig = px.scatter(data_2, 
                 x="Date_dayofmonth", 
                 y="Date_month_number", 
                 animation_frame="Date_hour", 
                 animation_group="City",
                 size="AQI", 
                 color="City", 
                 facet_col="Date_year",
                 hover_name="City",
                 range_x=[0, 32],  # x is the day of the month (1 to 31)
                 range_y=[0, 13],  # y is the month number (1 to 12)
                 size_max=50
                )
fig.show()

# Time series analysis: Seasonality (WIP)

In [None]:
#data_for_time = data_for_time.set_index("Date")

In [None]:
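city = ["Delhi"]
# .copy() so the date conversion below does not touch the original frame
data_for_time = city_hour_pollutants_pop.loc[city_hour_pollutants_pop['City'].isin(city)].copy()
# Assumes 'Date' holds parseable date strings; a real datetime axis gives readable ticks
data_for_time['Date'] = pd.to_datetime(data_for_time['Date'])

ax = data_for_time.plot(x ='Date', y='AQI', kind = 'line')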
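
A first look at seasonality does not need a full decomposition: the mean AQI per calendar month already shows which months are worst. This is a minimal sketch, assuming the `City`, `Date_month_number` and `AQI` columns used earlier in this notebook; `monthly_aqi_profile` is a hypothetical helper.

```python
import pandas as pd


def monthly_aqi_profile(df, city):
    """Mean AQI per calendar month for one city, indexed by month number."""
    one_city = df.loc[df["City"] == city]
    # Averages over all years, so a strong trend can blur the seasonal shape
    return one_city.groupby("Date_month_number")["AQI"].mean()
```

For Delhi the call would be `monthly_aqi_profile(city_hour_pollutants_pop, "Delhi")`, and `.plot(kind="bar")` on the result gives a quick seasonal chart.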

In [None]:
pd.plotting.lag_plot(data_for_time["AQI"], lag = 1)
plt.title('Lag Plot for Delhi AQI')
#plt.xlabel('categories')
#plt.ylabel('values')
plt.show()
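
The lag plot shows visually how strongly each reading depends on the one before it. The same relationship can be summarised as a single number, the lag autocorrelation. A minimal sketch using pandas' `Series.autocorr`; `lag_autocorrelation` is an illustrative wrapper, not part of the original analysis.

```python
import pandas as pd


def lag_autocorrelation(series, lag=1):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    # Pairs in which either value is missing are left out of the correlation
    return pd.Series(series, dtype="float64").autocorr(lag=lag)
```

A value close to 1 for `lag_autocorrelation(data_for_time["AQI"])` matches a lag plot whose points lie along the diagonal. Trying larger lags, for example 24 on hourly data, is one way to probe daily cycles.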