# Comprehensive Analysis for Environmental Dataset
This notebook provides a comprehensive analysis of the 'TEMP', 'pm2.5', and 'PRES' columns, including:
1. Loading the dataset.
2. Descriptive statistics, computed one at a time (mean, standard deviation, minimum, maximum, median).
3. Shapiro-Wilk normality test.
4. Yearly mean and standard deviation.

## 1. Loading the Dataset
We begin by loading the dataset and inspecting its first few rows.


In [3]:
import pandas as pd
file_path = 'Enviromental_dataset.csv'
df = pd.read_csv(file_path)
df.head()

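| | time | year | month | day | hour | pm2.5 | DEWP | TEMP | PRES | cbwd | Iws | Is | Ir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/01/2010 00:00 | 2010 | 1 | 1 | 0 | NaN | -21 | -11.0 | 1021.0 | NW | 1.79 | 0 | 0 |
| 1 | 01/01/2010 01:00 | 2010 | 1 | 1 | 1 | NaN | -21 | -12.0 | 1020.0 | NW | 4.92 | 0 | 0 |
| 2 | 01/01/2010 02:00 | 2010 | 1 | 1 | 2 | NaN | -21 | -11.0 | 1019.0 | NW | 6.71 | 0 | 0 |
| 3 | 01/01/2010 06:00 | 2010 | 1 | 1 | 6 | NaN | -19 | -9.0 | 1017.0 | NW | 19.23 | 0 | 0 |
| 4 | 01/01/2010 03:00 | 2010 | 1 | 1 | 3 | NaN | -21 | -14.0 | 1019.0 | NW | 9.84 | 0 | 0 |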


## 2. Mean for 'TEMP', 'pm2.5', and 'PRES'
We calculate the mean for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [11]:
columns_of_interest = ['pm2.5', 'TEMP', 'PRES']


In [13]:
# Column-wise mean (NaN values are skipped by default)
df[columns_of_interest].mean()

pm2.5      97.784018
TEMP       12.448521
PRES     1016.447654
dtype: float64
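
The mean above is computed with pandas' default `skipna=True`, which matters here because the first rows of the dataset have missing `pm2.5` readings. The following is a minimal sketch of that behaviour on a toy frame; the values in `toy` are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one missing pm2.5 reading
toy = pd.DataFrame({
    'pm2.5': [np.nan, 10.0, 20.0, 30.0],
    'TEMP': [-11.0, -12.0, -11.0, -14.0],
})

missing_counts = toy.isna().sum()       # number of NaN values per column
mean_skipna = toy.mean()                # default: NaN values are ignored
mean_strict = toy.mean(skipna=False)    # NaN propagates into the result

print(missing_counts)
print(mean_skipna)
print(mean_strict)
```

Running `df[columns_of_interest].isna().sum()` on the real data would show how many readings each column is missing before the means are interpreted.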

## 3. Standard Deviation for 'TEMP', 'pm2.5', and 'PRES'
We calculate the standard deviation for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [15]:
# Column-wise sample standard deviation (pandas uses ddof=1 by default)
df[columns_of_interest].std()

pm2.5    91.398542
TEMP     12.198613
PRES     10.268698
dtype: float64
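
Note that `DataFrame.std()` returns the sample standard deviation (`ddof=1`), whereas `numpy.std` defaults to the population form (`ddof=0`). With tens of thousands of rows the difference is negligible, but it is visible on small samples. A small sketch on toy values (not from the dataset):

```python
import numpy as np
import pandas as pd

# Toy values chosen so that the population standard deviation is exactly 2
s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = s.std()               # pandas default, divides by n - 1
population_std = s.std(ddof=0)     # divides by n
numpy_std = np.std(s.to_numpy())   # NumPy default is ddof=0

print(sample_std, population_std, numpy_std)
```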

## 4. Minimum for 'TEMP', 'pm2.5', and 'PRES'
We calculate the minimum values for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [17]:
# Column-wise minimum
df[columns_of_interest].min()

pm2.5      0.0
TEMP     -19.0
PRES     991.0
dtype: float64

## 5. Maximum for 'TEMP', 'pm2.5', and 'PRES'
We calculate the maximum values for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [19]:
# Column-wise maximum
df[columns_of_interest].max()

pm2.5     994.0
TEMP       42.0
PRES     1046.0
dtype: float64

## 6. 50% Quantile for 'TEMP', 'pm2.5', and 'PRES'
We calculate the 50% quantile (equivalent to the median) for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [23]:
# 50% quantile of each column
df[columns_of_interest].quantile(0.5)

pm2.5      72.0
TEMP       14.0
PRES     1016.0
Name: 0.5, dtype: float64

## 7. Median for 'TEMP', 'pm2.5', and 'PRES'
We calculate the median values for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [27]:
# Column-wise median
df[columns_of_interest].median()

pm2.5      72.0
TEMP       14.0
PRES     1016.0
dtype: float64
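
Sections 6 and 7 produce identical numbers because the 50% quantile is the median by definition. The only visible difference is that the `quantile` result carries the name `0.5`. A quick check on toy data (hypothetical values, not from the dataset):

```python
import pandas as pd

# Toy frame with an even number of rows, so the median is interpolated
toy = pd.DataFrame({
    'pm2.5': [129.0, 148.0, 159.0, 181.0],
    'TEMP': [-4.0, -4.0, -5.0, -5.0],
})

q50 = toy.quantile(0.5)   # Series named 0.5
med = toy.median()        # Series with no name

same_values = (q50.values == med.values).all()
print(q50)
print(med)
print(same_values)
```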

## 8. Yearly Mean and Standard Deviation for 'TEMP', 'pm2.5', and 'PRES'
Finally, we calculate the yearly mean and standard deviation for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [8]:
#Your Solution

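
The cell above is left empty. One possible way to compute the yearly statistics is to group by the `year` column (present in the dataset header) and aggregate each column of interest with `mean` and `std`. The sketch below runs on a toy stand-in for `df` with made-up values, so the numbers it prints are not results for the real dataset.

```python
import pandas as pd

columns_of_interest = ['pm2.5', 'TEMP', 'PRES']

# Toy stand-in for df (hypothetical values); the real frame also has a 'year' column
toy = pd.DataFrame({
    'year':  [2010, 2010, 2011, 2011],
    'pm2.5': [100.0, 120.0, 80.0, 60.0],
    'TEMP':  [10.0, 14.0, 11.0, 13.0],
    'PRES':  [1015.0, 1017.0, 1016.0, 1020.0],
})

# One row per year; columns form a (variable, statistic) MultiIndex
yearly_stats = toy.groupby('year')[columns_of_interest].agg(['mean', 'std'])
print(yearly_stats)
```

Replacing `toy` with `df` should give one row per year in the dataset. A single value can be selected with a tuple key, for example `yearly_stats.loc[2010, ('TEMP', 'mean')]`.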

In [29]:
# Alternative: describe() returns count, mean, std, min, quartiles, and max for each column in one call

df[columns_of_interest].describe()

| | pm2.5 | TEMP | PRES |
|---|---|---|---|
| count | 43800.0 | 43824.0 | 43824.0 |
| mean | 97.784018 | 12.448521 | 1016.447654 |
| std | 91.398542 | 12.198613 | 10.268698 |
| min | 0.0 | -19.0 | 991.0 |
| 25% | 29.0 | 2.0 | 1008.0 |
| 50% | 72.0 | 14.0 | 1016.0 |
| 75% | 136.0 | 23.0 | 1025.0 |
| max | 994.0 | 42.0 | 1046.0 |


## Appendix. Additional Summary Statistics (Skewness, Kurtosis, IQR)
We calculate additional summary statistics including skewness, kurtosis, and the interquartile range (IQR) for the 'TEMP', 'pm2.5', and 'PRES' columns.


In [31]:
additional_stats = pd.DataFrame({
    'Skewness': df[columns_of_interest].skew(),
    'Kurtosis': df[columns_of_interest].kurt(),
    'IQR': df[columns_of_interest].quantile(0.75) - df[columns_of_interest].quantile(0.25)
})
additional_stats

| | Skewness | Kurtosis | IQR |
|---|---|---|---|
| pm2.5 | 1.823656 | 5.032758 | 107.0 |
| TEMP | -0.163304 | -1.110977 | 21.0 |
| PRES | 0.098207 | -0.846462 | 17.0 |
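
When reading these values, it helps to know that pandas' `kurt()` reports excess kurtosis (Fisher's definition, where a normal distribution scores 0). The negative values for 'TEMP' and 'PRES' therefore indicate lighter tails than a normal distribution, and the positive skewness of 'pm2.5' indicates a long right tail. A small sketch of how the signs behave on toy series (hypothetical values):

```python
import pandas as pd

symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])      # evenly spread, symmetric
right_skewed = pd.Series([1.0, 1.0, 1.0, 2.0, 10.0])  # one large value on the right

sym_skew = symmetric.skew()        # zero (up to rounding) for a symmetric sample
sym_kurt = symmetric.kurt()        # negative: flatter than a normal distribution
right_skew = right_skewed.skew()   # positive: long right tail

# IQR computed the same way as in the cell above
sym_iqr = symmetric.quantile(0.75) - symmetric.quantile(0.25)

print(sym_skew, sym_kurt, right_skew, sym_iqr)
```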
