<a href="https://colab.research.google.com/github/vitaliy-sharandin/data_science_projects/blob/master/portfolio/eda/Global_climate_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Datasets

#### Global
* NASA's Global Climate Change Datasets
* NOAA's Climate Datasets
* WHO Air Pollution Data Portal
* Climate Change Knowledge Portal
* Climate Change Initiative
* Citizen Weather Observer Program (CWOP/APRS)

* https://ourworldindata.org/co2-and-greenhouse-gas-emissions
* https://github.com/owid/co2-data

#### Krakow
* Krakow temperatures
  * https://www.visualcrossing.com/weather/weather-data-services
* Krakow pollution levels
  * https://powietrze.gios.gov.pl/pjp/archives
* Air corridors
  * https://reasonstobecheerful.world/stuttgart-ventilation-corridors-green-cool-air/
  * What are the prevailing direction and average speed of wind in Krakow?

# Questions
1. Temperature Changes:

  * How has the global average temperature changed over the years?
  * Are there specific regions where the temperature is rising faster?
  * What is the correlation between greenhouse gas concentration and global temperature rise?

2. Greenhouse Gas Emissions:
  * Which countries are the largest emitters of greenhouse gases, and how has this changed over time?
  * What sectors (energy, agriculture, industrial processes, etc.) contribute most to greenhouse gas emissions globally and locally?
  * How do per capita emissions vary across different countries?

3. Effects on Biodiversity:
  * How is climate change impacting biodiversity in different ecosystems?
  * Are certain species more affected than others?
  * What is the correlation between climate change and the extinction rate?

4. Climate Policies and Agreements:
  * How have different countries' emissions changed after major climate agreements (like the Paris Agreement)?
  * Are countries meeting their Nationally Determined Contributions (NDCs)?
  * How do the policies and efforts of different countries compare?

5. Extreme Weather Events:
  * Has the frequency of extreme weather events (like hurricanes, heatwaves, heavy rain) changed over time?
  * What regions are most affected by these extreme events?
  * What's the economic impact of these events?
  




* Temperature by city/country
  * https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
  * https://climatedata.imf.org/datasets/4063314923d74187be9596f10d034914_0/explore
* Emissions by city/country
  * https://www.kaggle.com/datasets/thedevastator/global-fossil-co2-emissions-by-country-2002-2022
  * https://ourworldindata.org/co2-and-greenhouse-gas-emissions
  * https://github.com/owid/co2-data
* Krakow temperatures
  * https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/miesieczne/klimat/
* Krakow emissions
  * https://powietrze.gios.gov.pl/pjp/archives

In [1]:
!pip install -U -q datasets
!pip install -U -q ydata-profiling


In [2]:
import pandas as pd
from datasets import load_dataset, Dataset
from ydata_profiling import ProfileReport

In [13]:
global_temp_anomaly_df = load_dataset("vitaliy-sharandin/climate-global-temp-anomaly")['train'].to_pandas().set_index('dt')
country_temp_df = load_dataset("vitaliy-sharandin/climate-global-temp-country")['train'].to_pandas().set_index('dt')
krakow_temp_df = load_dataset("vitaliy-sharandin/climate-krakow-temp-monthly")['train'].to_pandas().set_index('dt')

pollution_region_df = load_dataset("vitaliy-sharandin/pollution-by-region")['train'].to_pandas().set_index('dt')
pollution_variation_df = load_dataset("vitaliy-sharandin/pollution-absolute-variation-co2")['train'].to_pandas().set_index('dt')
pollution_krakow_df = load_dataset("vitaliy-sharandin/pollution-krakow-no2-co")['train'].to_pandas().set_index('dt')


In [19]:
global_temp_anomaly_df # yearly anomaly relative to the 1961-1990 baseline
country_temp_df # yearly, relative to the 1951-1980 baseline
krakow_temp_df # monthly; to be re-expressed relative to the Krakow 1951-1980 average

# pollution_region_df # Yearly 'European Union (27)','Poland','United States', 'India', 'China'
# pollution_variation_df.pivot(columns='Entity', values='Annual CO₂ emissions growth (abs)').columns # Yearly
pollution_krakow_df.index # Monthly

DatetimeIndex(['2000-01-01 00:00:00+00:00', '2000-02-01 00:00:00+00:00',
               '2000-03-01 00:00:00+00:00', '2000-04-01 00:00:00+00:00',
               '2000-05-01 00:00:00+00:00', '2000-06-01 00:00:00+00:00',
               '2000-07-01 00:00:00+00:00', '2000-08-01 00:00:00+00:00',
               '2000-09-01 00:00:00+00:00', '2000-10-01 00:00:00+00:00',
               ...
               '2022-03-01 00:00:00+00:00', '2022-04-01 00:00:00+00:00',
               '2022-05-01 00:00:00+00:00', '2022-06-01 00:00:00+00:00',
               '2022-07-01 00:00:00+00:00', '2022-08-01 00:00:00+00:00',
               '2022-09-01 00:00:00+00:00', '2022-10-01 00:00:00+00:00',
               '2022-11-01 00:00:00+00:00', '2022-12-01 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', name='dt', length=276, freq=None)
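
The index printed above is timezone-aware (UTC), while the Krakow temperature tables further down are printed with plain dates. If the temperature index is indeed timezone-naive, the two frames should be brought to a common index type before they are joined or subtracted. The sketch below shows one way to do that on synthetic stand-in data; the helper name `to_naive_index` and the column names `NO2` and `temp` are hypothetical and not taken from the datasets.

```python
import pandas as pd


def to_naive_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose DatetimeIndex carries no timezone."""
    out = df.copy()
    if out.index.tz is not None:
        # Normalise to UTC first, then drop the timezone information
        out.index = out.index.tz_convert("UTC").tz_localize(None)
    return out


# Synthetic stand-ins for the pollution (tz-aware) and temperature (tz-naive) frames
aware = pd.DataFrame(
    {"NO2": [30.0, 28.0, 25.0]},
    index=pd.date_range("2000-01-01", periods=3, freq="MS", tz="UTC", name="dt"),
)
naive = pd.DataFrame(
    {"temp": [-1.0, 0.5, 4.0]},
    index=pd.date_range("2000-01-01", periods=3, freq="MS", name="dt"),
)

# After normalisation both indexes have the same dtype and align row by row
combined = naive.join(to_naive_index(aware))
print(combined)
```

Dropping the timezone is reasonable here only because the data are monthly values stamped at midnight UTC; for sub-daily data the conversion would need more care.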

In [89]:
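# Baseline window. Note: the slice ends at 1980-01-01, so only January of 1980 is included;
# end it at '1980-12-31' to cover the full 1951-1980 period.
average_slice = krakow_temp_df.loc['1951-01-01':'1980-01-01']
# Mean of the yearly means over the baseline window
avg_1951_1980 = average_slice.resample('Y').mean().mean()

yearly_averages = krakow_temp_df.resample('Y').mean()

# Yearly anomaly relative to the baseline, for the complete years up to 2022
relative_yearly_df = yearly_averages.loc[:'2022-12-31'] - avg_1951_1980

# 2023 is incomplete, so compare January-June 2023 with the January-June baseline
# (month <= 6 selects six months, despite the "7months" in the variable names)
avg_7months_1951_1980 = average_slice.loc[average_slice.index.month <= 6].resample('Y').mean().mean()
relative_7months_2023 = krakow_temp_df.loc['2023-01-01':'2023-06-01'] - avg_7months_1951_1980

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
# The index must be list-like, so the date is wrapped in a list.
relative_yearly_df = pd.concat([
    relative_yearly_df,
    pd.DataFrame([relative_7months_2023.mean()], index=pd.to_datetime(['2023-06-01'])),
])

relative_yearly_df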


| dt | Absolute maximum temperature [°C] | Absolute minimum temperature [°C] | Average monthly temperature [°C] |
|---|---|---|---|
| 1951-12-31 | 0.700556 | 3.108611 | 1.795000 |
| 1952-12-31 | -0.982778 | 1.391944 | 0.536667 |
| 1953-12-31 | 0.242222 | 0.783611 | 1.253333 |
| 1954-12-31 | -0.182778 | 0.533611 | 0.203333 |
| 1955-12-31 | -0.374444 | 1.266944 | 0.420000 |
| ... | ... | ... | ... |
| 2019-12-31 | 4.333889 | 4.008611 | 3.053333 |
| 2020-12-31 | 2.467222 | 3.575278 | 2.570000 |
| 2021-12-31 | 3.267222 | 1.958611 | 1.453333 |
| 2022-12-31 | 3.408889 | 2.833611 | 2.445000 |


In [61]:
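# Baseline window (as written, only January of 1980 falls inside the slice)
baseline = krakow_temp_df.loc['1951-01-01':'1980-01-01']
# Climatology: mean of each calendar month over the baseline window
monthly_averages = baseline.groupby(baseline.index.month).mean()
# Repeat the 12 climatology rows so that they line up with every row of the full series
mapped_averages_df = monthly_averages.reindex(krakow_temp_df.index.month).set_axis(krakow_temp_df.index)
# Monthly anomaly: observed value minus the baseline mean for the same calendar month
krakow_monthly_relative_temp = krakow_temp_df - mapped_averages_df
krakow_monthly_relative_temp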

| dt | Absolute maximum temperature [°C] | Absolute minimum temperature [°C] | Average monthly temperature [°C] |
|---|---|---|---|
| 1951-01-01 | 1.450000 | 2.960000 | 2.093333 |
| 1951-02-01 | 3.006897 | 3.096552 | 2.565517 |
| 1951-03-01 | 1.906897 | 3.127586 | -0.210345 |
| 1951-04-01 | 0.506897 | 2.600000 | 1.179310 |
| 1951-05-01 | -2.196552 | 2.868966 | -0.200000 |
| ... | ... | ... | ... |
| 2023-02-01 | 1.806897 | 4.496552 | 3.365517 |
| 2023-03-01 | 5.006897 | 4.127586 | 3.289655 |
| 2023-04-01 | 0.506897 | -1.500000 | -0.120690 |
| 2023-05-01 | -1.896552 | -0.531034 | -0.200000 |
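
A quick sanity check for a baseline window is to count how many observations each calendar month contributes to it. The sketch below does this on a synthetic monthly index (the frame `toy` and the helper `months_per_calendar_month` are hypothetical); it suggests that a slice ending at `'1980-01-01'` gives January one more year than the other months, whereas ending at `'1980-12-31'` gives every month the same weight.

```python
import pandas as pd

# Synthetic monthly frame covering the same span as the Krakow series (values are irrelevant here)
idx = pd.date_range("1951-01-01", "2023-05-01", freq="MS", name="dt")
toy = pd.DataFrame({"temp": 0.0}, index=idx)


def months_per_calendar_month(df: pd.DataFrame, start: str, end: str) -> pd.Series:
    """Count how many rows each calendar month (1-12) has inside [start, end]."""
    window = df.loc[start:end]
    return window.groupby(window.index.month).size()


counts_short = months_per_calendar_month(toy, "1951-01-01", "1980-01-01")
counts_full = months_per_calendar_month(toy, "1951-01-01", "1980-12-31")

print(counts_short.to_dict())  # January is counted once more than the other months
print(counts_full.to_dict())   # every month is counted the same number of times
```

Running the same count on `krakow_temp_df` would also expose any months missing from the real data.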


Here are the climate datasets in a zip file.

There are 3 for temperature and weather: one for the city of Krakow, one global, and one global by country.

There are 2 for pollution levels: one global by country and one for Krakow.

I need to analyze the following topics and answer the following questions:
1. Temperature Changes

   1. Global temperature change
   2. Temperature change per country
   3. Global, Poland and Krakow yearly temperature rise

2. Greenhouse Gas Emissions

   1. Global emissions
   2. Emissions per country
   3. Krakow emissions

3. Correlation

   1. Global temperature and emissions correlation
   2. Emissions per country correlation
   3. Krakow emissions correlation
   4. Krakow monthly analysis of emissions
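
The correlation items in the plan above could start from something like the following sketch. It aligns two yearly series on their common years and computes a Pearson coefficient. The data are synthetic and the helper name `yearly_correlation` is hypothetical, so this only illustrates the mechanics, not a result for the real datasets.

```python
import numpy as np
import pandas as pd


def yearly_correlation(temp: pd.Series, emissions: pd.Series) -> float:
    """Pearson correlation of two yearly series over the years they share."""
    joined = pd.concat({"temp": temp, "emissions": emissions}, axis=1, join="inner").dropna()
    return joined["temp"].corr(joined["emissions"])


# Synthetic example: emissions rise linearly, temperature follows them with noise
years = pd.Index(range(1990, 2020), name="year")
rng = np.random.default_rng(0)
emissions = pd.Series(np.linspace(20.0, 35.0, len(years)), index=years)
temp = pd.Series(0.03 * emissions.to_numpy() + rng.normal(0, 0.05, len(years)), index=years)

r = yearly_correlation(temp, emissions)
print(f"Pearson r = {r:.2f}")
```

Two trending series will correlate strongly almost regardless of any causal link, so it may be worth also correlating year-over-year differences (`Series.diff()`) before drawing conclusions.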
