# EDA of Air Pollution in Seoul

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image

In [None]:
Image.open('../input/seoul-image/seoul_image.jpg')

About this Dataset

Context

This dataset covers air pollution measurements in Seoul, South Korea. The Seoul Metropolitan Government publishes a wide range of public data, including air pollution information, through its 'Open Data Plaza'. I built a structured dataset by collecting and adjusting the various air pollution datasets that the Seoul Metropolitan Government provides.

Content

This data provides average values for six pollutants (SO2, NO2, CO, O3, PM10, PM2.5).

- Data were measured every hour between 2017 and 2019.
- Data were measured for 25 districts in Seoul.

Description of the factors available in the data


SO2 - Sulfur dioxide is a gas. It is invisible and has a nasty, sharp smell. It reacts easily with other substances to form harmful compounds, such as sulfuric acid, sulfurous acid and sulfate particles.

About 99% of the sulfur dioxide in air comes from human sources. The main source is industrial activity that processes sulfur-containing materials, e.g. generating electricity from coal, oil or gas that contains sulfur. Some mineral ores also contain sulfur, and sulfur dioxide is released when they are processed. In addition, industrial activities that burn sulfur-containing fossil fuels can be important sources of sulfur dioxide.

Sulfur dioxide is also present in motor vehicle emissions, as the result of fuel combustion. In the past, motor vehicle exhaust was an important, but not the main, source of sulfur dioxide in air. However, this is no longer the case.

NO2 - Nitrogen dioxide is part of a group of gaseous air pollutants produced as a result of road traffic and other fossil fuel combustion processes. Its presence in air contributes to the formation and modification of other air pollutants, such as ozone and particulate matter, and to acid rain.

CO - CO is a colorless, odorless gas that can be harmful when inhaled in large amounts. CO is released when something is burned. The greatest sources of CO to outdoor air are cars, trucks and other vehicles or machinery that burn fossil fuels. A variety of items in your home such as unvented kerosene and gas space heaters, leaking chimneys and furnaces, and gas stoves also release CO and can affect air quality indoors.

O3 - Ozone (O3) is a gas that can form and react under the action of light and that is present in two layers of the atmosphere. High up in the atmosphere, ozone forms a layer that shields the Earth from ultraviolet rays. However, at ground level, ozone is considered a major air pollutant.

PM10 & PM2.5 - PM stands for Particulate Matter. PM2.5 and PM10 are minute particles present in the air, and exposure to them is very harmful to health. When the level of these particles rises and they penetrate deep into the lungs, you can experience a number of health effects, such as breathing problems and a burning sensation in the eyes.

In [None]:

data_air= pd.read_csv('../input/air-pollution-in-seoul/AirPollutionSeoul/Measurement_summary.csv')


In [None]:
data_air.head()

## Data cleaning

In [None]:
data_air.isnull()

In [None]:
%matplotlib inline

In [None]:
data_measure=pd.read_csv('../input/air-pollution-in-seoul/AirPollutionSeoul/Original Data/Measurement_item_info.csv')

In [None]:
data_measure  # Standard (threshold) values for each of the factors

In [None]:
# Check whether the data has any missing values
sns.heatmap(data_air.isnull(),yticklabels=False,cbar=False,cmap='viridis')

We can conclude that the data has no missing values, because the heatmap shows a plain, unbroken bar for every factor.
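A numeric count per column is a compact complement to the visual check. The sketch below runs on a small made-up frame (the name `sample` and all of its values are illustrative, not taken from the dataset); the same two expressions could be applied to `data_air`. The negative-value count is an extra sanity check added here on the assumption that a concentration can never be negative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_air; values are invented for illustration.
sample = pd.DataFrame({
    "SO2": [0.004, -1.0, 0.005, np.nan],
    "PM10": [73.0, 71.0, -1.0, 70.0],
})

# Number of missing (NaN) entries in each column.
missing_counts = sample.isnull().sum()

# Number of negative entries in each column. A concentration cannot be
# negative, so such values would be placeholders, not real measurements.
negative_counts = (sample < 0).sum()

print(missing_counts)
print(negative_counts)
```

A column can pass the missing-value check and still contain unusable readings, which is why the two counts are reported separately.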

In [None]:
data_air.info()

In [None]:
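# .copy() makes req_data an independent frame, so adding columns later
# does not trigger pandas' SettingWithCopyWarning
req_data=data_air[['SO2','NO2','O3','CO','PM10','PM2.5']].copy()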

In [None]:
req_data.describe()

Conclusions from the description of the data:

SO2 has a mean value of '-0.001795', which is well below the Good standard. A concentration cannot be negative, so a negative mean suggests that the column contains negative placeholder values that are worth checking.

NO2 has a mean value of '0.022519', which is also below the Good standard for NO2.

O3 has a mean value of '0.017979', which is also below the Good standard for O3.

CO has a mean value of '0.017979', which is also below the Good standard for CO.

PM10 has a mean value of '43.708051', which falls under the Normal standard for PM10.

PM2.5 has a mean value of '25.411995', which also falls under the Normal standard for PM2.5.

So overall, judged by these averages, Seoul has good air quality.
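The comparison of each mean against its standard can also be done programmatically. The following is a minimal sketch, assuming the band limits used in this notebook's threshold cells; the helper `classify` and the dictionary names are hypothetical, and the means are copied verbatim from the conclusions above.

```python
from bisect import bisect_left

LABELS = ["good", "normal", "bad", "very bad"]

# Upper bounds of the 'good', 'normal' and 'bad' bands, as used in this notebook.
THRESHOLDS = {
    "SO2": [0.02, 0.05, 0.15],
    "NO2": [0.03, 0.06, 0.20],
    "O3": [0.02, 0.05, 0.15],
    "CO": [2.00, 9.00, 15.00],
    "PM10": [30.0, 80.0, 150.0],
    "PM2.5": [15.0, 35.0, 75.0],
}

# Mean values copied verbatim from the conclusions above.
MEANS = {
    "SO2": -0.001795,
    "NO2": 0.022519,
    "O3": 0.017979,
    "CO": 0.017979,
    "PM10": 43.708051,
    "PM2.5": 25.411995,
}


def classify(value, bounds):
    """Return the label of the first band whose upper bound is >= value."""
    # bisect_left gives the index of the first bound that is >= value,
    # which matches the 'x <= bound' comparisons used in this notebook.
    return LABELS[bisect_left(bounds, value)]


results = {name: classify(mean, THRESHOLDS[name]) for name, mean in MEANS.items()}
for name, label in results.items():
    print(f"{name}: {label}")
```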

In [None]:
plt.figure(figsize=(12,8))
sns.boxplot(data=req_data)

In [None]:
# Map each 'SO2' value to its quality standard
def measure(x):
    if x<=0.02:
        return 'good'
    elif 0.02<x<=0.05:
        return 'normal'
    elif 0.05<x<=0.15:
        return 'bad'
    else:
        return 'very bad'
    
SO2_measure=list(map(measure,req_data['SO2']))

In [None]:
# Map each 'NO2' value to its quality standard
def measure(x):
    if x<=0.03:
        return 'good'
    elif 0.03<x<=0.06:
        return 'normal'
    elif 0.06<x<=0.20:
        return 'bad'
    else:
        return 'very bad'
    
NO2_measure=list(map(measure,req_data['NO2']))

In [None]:
# Map each 'O3' value to its quality standard
def measure(x):
    if x<=0.02:
        return 'good'
    elif 0.02<x<=0.05:
        return 'normal'
    elif 0.05<x<=0.15:
        return 'bad'
    else:
        return 'very bad'
    
O3_measure=list(map(measure,req_data['O3']))

In [None]:
# Map each 'CO' value to its quality standard
def measure(x):
    if x<=2.00:
        return 'good'
    elif 2.00<x<=9.00:
        return 'normal'
    elif 9.00<x<=15.00:
        return 'bad'
    else:
        return 'very bad'
    
CO_measure=list(map(measure,req_data['CO']))

In [None]:
# Map each 'PM10' value to its quality standard
def measure(x):
    if x<=30.00:
        return 'good'
    elif 30.00<x<=80.00:
        return 'normal'
    elif 80.00<x<=150.00:
        return 'bad'
    else:
        return 'very bad'
    
PM10_measure=list(map(measure,req_data['PM10']))

In [None]:
# Map each 'PM2.5' value to its quality standard
def measure(x):
    if x<=15.00:
        return 'good'
    elif 15.00<x<=35.00:
        return 'normal'
    elif 35.00<x<=75.00:
        return 'bad'
    else:
        return 'very bad'
    
PM25_measure=list(map(measure,req_data['PM2.5']))
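The six cells above repeat the same logic with different limits. As an alternative sketch (not the approach this notebook uses), the limits can be kept in one dictionary and applied with `pandas.cut`, whose default right-closed bins reproduce the `x <= bound` comparisons. The names `THRESHOLDS`, `LABELS`, `to_standard` and the sample values are hypothetical; the limits themselves are copied from the cells above.

```python
import numpy as np
import pandas as pd

# Upper bounds of the 'good', 'normal' and 'bad' bands, copied from the cells above.
THRESHOLDS = {
    "SO2": [0.02, 0.05, 0.15],
    "NO2": [0.03, 0.06, 0.20],
    "O3": [0.02, 0.05, 0.15],
    "CO": [2.00, 9.00, 15.00],
    "PM10": [30.0, 80.0, 150.0],
    "PM2.5": [15.0, 35.0, 75.0],
}
LABELS = ["good", "normal", "bad", "very bad"]


def to_standard(values, bounds):
    """Bin values into right-closed intervals: x <= bounds[0] is 'good', and so on."""
    edges = [-np.inf, *bounds, np.inf]
    return pd.cut(values, bins=edges, labels=LABELS).astype(str)


# Hypothetical sample values, invented for illustration.
sample = pd.DataFrame({
    "SO2": [0.004, 0.02, 0.03, 0.2],
    "PM10": [30.0, 45.0, 120.0, 400.0],
})

for column in ["SO2", "PM10"]:
    sample[f"{column}_standard"] = to_standard(sample[column], THRESHOLDS[column])

print(sample)
```

Keeping the limits in one place makes it harder for a copy-pasted cell to carry the wrong numbers for a pollutant.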

In [None]:
req_data['SO2_standard']=SO2_measure
req_data['NO2_standard']=NO2_measure
req_data['O3_standard']=O3_measure
req_data['CO_standard']=CO_measure
req_data['PM10_standard']=PM10_measure
req_data['PM2.5_standard']=PM25_measure



In [None]:
req_data

The table above lists every observation recorded at the different stations, together with the quality standard assigned to each factor.

## Analysis of the quality standard of all the factors

In [None]:
sns.countplot(x='SO2_standard',data=req_data)

In [None]:
sns.countplot(x='NO2_standard',data=req_data)

In [None]:
sns.countplot(x='O3_standard',data=req_data)

In [None]:
sns.countplot(x='CO_standard',data=req_data)

In [None]:
sns.countplot(x='PM10_standard',data=req_data)

In [None]:
sns.countplot(x='PM2.5_standard',data=req_data)

From the figures above, SO2 and CO levels look good: every observation falls in the 'good' standard.
O3 is noticeably worse than these two factors, with some observations in the 'normal' and 'bad' standards.
PM10 and PM2.5 need attention, because 'very bad' observations occur for both.
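Count plots show absolute numbers, which makes rare categories hard to read. A minimal sketch of the same comparison as percentages follows; the series `labels` is made up for illustration and stands in for a column such as `req_data['PM10_standard']`.

```python
import pandas as pd

# Hypothetical category labels, invented for illustration.
labels = pd.Series(
    ["good", "normal", "normal", "bad", "normal", "very bad", "good", "normal"]
)

order = ["good", "normal", "bad", "very bad"]

# Share of observations in each category, as a percentage, in a fixed order.
# reindex fills categories that never occur with 0 instead of dropping them.
shares = (
    labels.value_counts(normalize=True).reindex(order, fill_value=0.0) * 100
).round(1)

print(shares)
```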



In [None]:
plt.figure(figsize=(8,8))
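# Correlate only the numeric columns; the '*_standard' columns hold strings
sns.heatmap(req_data.select_dtypes(include='number').corr(),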
            vmin=0,
            cmap='coolwarm',
            annot=True);

The heatmap above shows that the highest correlation is between SO2 and O3, and the lowest is between O3 and PM2.5.
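Reading the strongest and weakest pairs off a heatmap by eye is error-prone, so the pairs can also be ranked directly. This is a sketch on invented numbers: the frame `sample` and its columns `A`, `B`, `C` are hypothetical and only demonstrate the technique.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns, invented for illustration.
sample = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 3, 4, 1, 2],
})

corr = sample.corr()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# One row per pair of columns, sorted from highest to lowest correlation.
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

print(pairs)
```

Sorting by `pairs.abs()` instead would rank the pairs by strength regardless of sign.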

In [None]:
req_data['Address']=data_air["Address"]

In [None]:
req_data

## Analysis by address in Seoul

### Because 'PM10' and 'PM2.5' affect air quality the most, we look at how these factors vary across the districts of Seoul

In [None]:
data_address=data_air["Address"].value_counts()
data_address

In [None]:
plt.figure(figsize=(19,10))

sns.countplot(x="PM10_standard",data=req_data,hue="Address",orient="v")


In [None]:
plt.figure(figsize=(19,15))

sns.countplot(x="PM2.5_standard",data=req_data,hue="Address",orient="v")
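With 25 addresses as hue levels, grouped count plots get crowded. A table of category shares per address is a possible alternative view. The sketch below uses `pandas.crosstab` on a small invented frame; the address names and labels are hypothetical.

```python
import pandas as pd

# Hypothetical addresses and category labels, invented for illustration.
sample = pd.DataFrame({
    "Address": ["District A"] * 4 + ["District B"] * 4,
    "PM10_standard": [
        "good", "normal", "normal", "bad",
        "good", "good", "normal", "very bad",
    ],
})

# Rows are addresses, columns are categories; normalize="index" turns
# each row into fractions that sum to 1.
share = pd.crosstab(sample["Address"], sample["PM10_standard"], normalize="index")

print(share.round(2))
```

Calling `share.plot(kind="bar", stacked=True)` on the result would draw one stacked bar per address.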