# Data Analysis Project: Air Quality Dataset
- **Name:** Efrado Suryadi
- **Email:** efradosuryadi@gmail.com
- **Dicoding ID:** efrado_suryadi_tPYl

## Defining the Business Questions

- Question 1
- Question 2

## Import All Packages/Libraries Used

Imports for working with data and visualization:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Imports for dealing with file paths:

In [2]:
import os

## Data Wrangling

### Data information

#### Data source

The data for this project was downloaded from https://github.com/marceloreis/HTI/tree/master. The repository contains air quality monitoring data from the following stations in Beijing, China:

- Aotizhongxin
- Changping
- Dingling
- Dongsi
- Guanyuan
- Gucheng
- Huairou
- Nongzhanguan
- Shunyi
- Tiantan
- Wanliu
- Wanshouxigong

#### Data features

Features in the data and their explanations:

- `No`: A row number that identifies each record.
- `year`: The year of the recorded data.
- `month`: The month of the recorded data.
- `day`: The day of the month of the recorded data.
- `hour`: The hour of the day (0-23) when the observation was made.
- `PM2.5`: Concentration of particulate matter with a diameter of 2.5 micrometers or smaller (measured in µg/m³). Higher levels are more dangerous because they can harm health, especially respiratory conditions.
- `PM10`: Concentration of particulate matter with a diameter of 10 micrometers or smaller (measured in µg/m³). Similar to PM2.5 but includes larger particles; important for assessing overall air quality.
- `SO2`: Concentration of sulfur dioxide (measured in µg/m³). A pollutant that can come from industrial processes and fossil fuel combustion; high levels can cause respiratory problems.
- `NO2`: Concentration of nitrogen dioxide (measured in µg/m³). A pollutant from vehicle emissions and other sources; contributes to smog and can affect lung function.
- `CO`: Concentration of carbon monoxide (measured in µg/m³). A colorless, odorless gas produced by incomplete combustion; can be harmful at high levels.
- `O3`: Concentration of ozone (measured in µg/m³). Ground-level ozone is a key component of smog and can harm health and the environment; usually forms in the presence of sunlight.
- `TEMP`: Temperature (measured in degrees Celsius). Provides context for air quality readings; can influence pollutant concentrations and reactions.
- `PRES`: Atmospheric pressure (measured in hPa or millibars). Important for understanding weather patterns and conditions affecting air quality.
- `DEWP`: Dew point temperature (measured in degrees Celsius). Indicates humidity levels and can help understand weather conditions affecting air quality.
- `RAIN`: Amount of rainfall (measured in mm). Rain can help clear pollutants from the air, so it's important for understanding air quality variations.
- `wd`: Wind direction, recorded in this dataset as a compass-point abbreviation (for example `N`, `NW`, or `NNW`). Provides information about the source of air pollutants and can affect dispersion.
- `WSPM`: Wind speed (measured in meters per second). Important for understanding how pollutants disperse in the atmosphere.
- `station`: Identifier or name of the monitoring station where the data was collected. Helps identify the geographical location of the measurements, which is crucial for spatial analysis of air quality.
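
Because the timestamp is split across `year`, `month`, `day`, and `hour`, a single datetime column is often more convenient for time-based analysis. The following is a minimal sketch, assuming two illustrative rows that follow the column layout described above; the `sample` frame and the `datetime` column name are assumptions made for this example, not part of the dataset.

```python
import pandas as pd

# Two illustrative rows that mimic the column layout described above.
sample = pd.DataFrame({
    "year": [2013, 2017],
    "month": [3, 2],
    "day": [1, 28],
    "hour": [0, 22],
    "PM2.5": [4.0, 21.0],
})

# pd.to_datetime can assemble a timestamp from columns named
# year, month, day (required) and hour (optional).
sample["datetime"] = pd.to_datetime(sample[["year", "month", "day", "hour"]])

print(sample[["datetime", "PM2.5"]])
```

A combined column like this could later serve as an index for resampling by day or month, if the analysis calls for it.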

### Gathering Data

#### Reading data from the `data` directory

Get the path of the `data` directory, which contains all of the `.csv` files:

In [4]:
data_path = os.path.join(os.getcwd(), "data")

Get the names of all `.csv` files as a list:

In [5]:
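# os.listdir returns every entry in arbitrary order, so keep only .csv files and sort them
data_list = sorted(f for f in os.listdir(data_path) if f.endswith(".csv"))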
data_list

['PRSA_Data_Aotizhongxin_20130301-20170228.csv',
 'PRSA_Data_Changping_20130301-20170228.csv',
 'PRSA_Data_Dingling_20130301-20170228.csv',
 'PRSA_Data_Dongsi_20130301-20170228.csv',
 'PRSA_Data_Guanyuan_20130301-20170228.csv',
 'PRSA_Data_Gucheng_20130301-20170228.csv',
 'PRSA_Data_Huairou_20130301-20170228.csv',
 'PRSA_Data_Nongzhanguan_20130301-20170228.csv',
 'PRSA_Data_Shunyi_20130301-20170228.csv',
 'PRSA_Data_Tiantan_20130301-20170228.csv',
 'PRSA_Data_Wanliu_20130301-20170228.csv',
 'PRSA_Data_Wanshouxigong_20130301-20170228.csv']
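
The file names in the listing share one pattern: a `PRSA_Data_` prefix, the station name, and a start and end date. As a rough sketch, that pattern can be parsed with a regular expression; the helper name `parse_filename` and the pattern constant are hypothetical and introduced only for illustration.

```python
import re

# Pattern inferred from the file names in the listing above.
FILENAME_PATTERN = re.compile(
    r"^PRSA_Data_(?P<station>.+)_(?P<start>\d{8})-(?P<end>\d{8})\.csv$"
)

def parse_filename(name):
    """Return (station, start, end) for a matching file name, else None."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("station"), match.group("start"), match.group("end")

print(parse_filename("PRSA_Data_Aotizhongxin_20130301-20170228.csv"))
# ('Aotizhongxin', '20130301', '20170228')
```

Parsing the name this way gives a quick check that every file covers the same date range before any of them is opened.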

Example of opening one `.csv` file from the `data` directory (`Aotizhongxin` in this case):

In [6]:
example_df = pd.read_csv(os.path.join(data_path, data_list[0]))
example_df

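|  | No | year | month | day | hour | PM2.5 | PM10 | SO2 | NO2 | CO | O3 | TEMP | PRES | DEWP | RAIN | wd | WSPM | station |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 2013 | 3 | 1 | 0 | 4.0 | 4.0 | 4.0 | 7.0 | 300.0 | 77.0 | -0.7 | 1023.0 | -18.8 | 0.0 | NNW | 4.4 | Aotizhongxin |
| 1 | 2 | 2013 | 3 | 1 | 1 | 8.0 | 8.0 | 4.0 | 7.0 | 300.0 | 77.0 | -1.1 | 1023.2 | -18.2 | 0.0 | N | 4.7 | Aotizhongxin |
| 2 | 3 | 2013 | 3 | 1 | 2 | 7.0 | 7.0 | 5.0 | 10.0 | 300.0 | 73.0 | -1.1 | 1023.5 | -18.2 | 0.0 | NNW | 5.6 | Aotizhongxin |
| 3 | 4 | 2013 | 3 | 1 | 3 | 6.0 | 6.0 | 11.0 | 11.0 | 300.0 | 72.0 | -1.4 | 1024.5 | -19.4 | 0.0 | NW | 3.1 | Aotizhongxin |
| 4 | 5 | 2013 | 3 | 1 | 4 | 3.0 | 3.0 | 12.0 | 12.0 | 300.0 | 72.0 | -2.0 | 1025.2 | -19.5 | 0.0 | N | 2.0 | Aotizhongxin |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35059 | 35060 | 2017 | 2 | 28 | 19 | 12.0 | 29.0 | 5.0 | 35.0 | 400.0 | 95.0 | 12.5 | 1013.5 | -16.2 | 0.0 | NW | 2.4 | Aotizhongxin |
| 35060 | 35061 | 2017 | 2 | 28 | 20 | 13.0 | 37.0 | 7.0 | 45.0 | 500.0 | 81.0 | 11.6 | 1013.6 | -15.1 | 0.0 | WNW | 0.9 | Aotizhongxin |
| 35061 | 35062 | 2017 | 2 | 28 | 21 | 16.0 | 37.0 | 10.0 | 66.0 | 700.0 | 58.0 | 10.8 | 1014.2 | -13.3 | 0.0 | NW | 1.1 | Aotizhongxin |
| 35062 | 35063 | 2017 | 2 | 28 | 22 | 21.0 | 44.0 | 12.0 | 87.0 | 700.0 | 35.0 | 10.5 | 1014.4 | -12.9 | 0.0 | NNW | 1.2 | Aotizhongxin |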
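
Since every station file appears to share the same columns, one possible next step is to read all of them and stack the results into a single DataFrame. The sketch below is an assumption about how that could be done, not the method used in this notebook; the helper `load_all_stations` is hypothetical, and the demo writes two tiny stand-in files to a temporary directory so that it runs without the real data.

```python
import os
import tempfile

import pandas as pd

def load_all_stations(data_dir):
    """Read every .csv file in data_dir and stack them into one DataFrame."""
    csv_names = sorted(f for f in os.listdir(data_dir) if f.endswith(".csv"))
    if not csv_names:
        raise FileNotFoundError(f"no .csv files found in {data_dir}")
    frames = [pd.read_csv(os.path.join(data_dir, name)) for name in csv_names]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)

# Demo with two small stand-in files (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    for station in ["Aotizhongxin", "Changping"]:
        stand_in = pd.DataFrame({"No": [1, 2], "station": [station] * 2})
        file_name = f"PRSA_Data_{station}_20130301-20170228.csv"
        stand_in.to_csv(os.path.join(tmp, file_name), index=False)
    demo_df = load_all_stations(tmp)

print(demo_df.shape)  # (4, 2)
```

With the real files in place, calling `load_all_stations(data_path)` would be expected to return one frame in which the `station` column distinguishes the source of each row.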


**Insight:**
- xxx
- xxx

### Assessing Data

**Insight:**
- xxx
- xxx

### Cleaning Data

**Insight:**
- xxx
- xxx

## Exploratory Data Analysis (EDA)

### Explore ...

**Insight:**
- xxx
- xxx

## Visualization & Explanatory Analysis

### Question 1:

### Question 2:

**Insight:**
- xxx
- xxx

## Advanced Analysis (Optional)

## Conclusion

- Conclusion for question 1
- Conclusion for question 2