<a href="https://colab.research.google.com/github/vitaliy-sharandin/data_science_projects/blob/master/portfolio/eda/Global_climate_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Datasets

#### Global
* NASA's Global Climate Change Datasets
* NOAA's Climate Datasets
* WHO Air Pollution Data Portal
* Climate Change Knowledge Portal
* Climate Change Initiative
* Citizen Weather Observer Program (CWOP/APRS)

* https://ourworldindata.org/co2-and-greenhouse-gas-emissions
* https://github.com/owid/co2-data

#### Krakow
* Krakow temperatures: https://www.visualcrossing.com/weather/weather-data-services
* Krakow pollution levels: https://powietrze.gios.gov.pl/pjp/archives
* Air corridors: https://reasonstobecheerful.world/stuttgart-ventilation-corridors-green-cool-air/
* What are the prevailing wind direction and average wind speed in Krakow?

# Questions
1. Temperature Changes:

  * How has the global average temperature changed over the years?
  * Are there specific regions where the temperature is rising faster?
  * What is the correlation between greenhouse gas concentration and global temperature rise?

2. Greenhouse Gas Emissions:
  * Which countries are the largest emitters of greenhouse gases, and how has this changed over time?
  * What sectors (energy, agriculture, industrial processes, etc.) contribute most to greenhouse gas emissions globally and locally?
  * How do per capita emissions vary across different countries?

3. Effects on Biodiversity:
  * How is climate change impacting biodiversity in different ecosystems?
  * Are certain species more affected than others?
  * What is the correlation between climate change and the extinction rate?

4. Climate Policies and Agreements:
  * How have different countries' emissions changed after major climate agreements (like the Paris Agreement)?
  * Are countries meeting their Nationally Determined Contributions (NDCs)?
  * How do the policies and efforts of different countries compare?

5. Extreme Weather Events:
  * Has the frequency of extreme weather events (like hurricanes, heatwaves, heavy rain) changed over time?
  * What regions are most affected by these extreme events?
  * What's the economic impact of these events?
  




* Temperature by city/country
  * https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
  * https://climatedata.imf.org/datasets/4063314923d74187be9596f10d034914_0/explore
* Emissions by city/country
  * https://www.kaggle.com/datasets/thedevastator/global-fossil-co2-emissions-by-country-2002-2022
  * https://ourworldindata.org/co2-and-greenhouse-gas-emissions
  * https://github.com/owid/co2-data
* Krakow temperatures
  * https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/miesieczne/klimat/
* Krakow emissions
  * https://powietrze.gios.gov.pl/pjp/archives

In [1]:
!pip install -U -q datasets
!pip install -U -q ydata-profiling


In [15]:
import pandas as pd
from datasets import load_dataset
from ydata_profiling import ProfileReport

In [45]:
global_temp_anomaly_df = load_dataset("vitaliy-sharandin/climate-global-temp-anomaly")['train'].to_pandas()
world_region_temp_df = load_dataset("vitaliy-sharandin/climate-world-region")['train'].to_pandas()
country_temp_df = load_dataset("vitaliy-sharandin/climate-global-temp-country")['train'].to_pandas()
krakow_temp_df = load_dataset("vitaliy-sharandin/climate-krakow-temp-monthly")['train'].to_pandas()

pollution_region_df = load_dataset("vitaliy-sharandin/pollution-by-region")['train'].to_pandas()
pollution_variation_df = load_dataset("vitaliy-sharandin/pollution-absolute-variation-co2")['train'].to_pandas()
pollution_krakow_df = load_dataset("vitaliy-sharandin/pollution-krakow-no2-co")['train'].to_pandas()

In [50]:
global_temp_anomaly_df['dt'] = pd.to_datetime(global_temp_anomaly_df['dt'], errors="coerce", utc=True)
global_temp_anomaly_df.set_index('dt',drop=True,inplace=True)

In [51]:
world_region_temp_df['Date'] = pd.to_datetime(world_region_temp_df['Date'], errors="coerce", utc=True)
world_region_temp_df.set_index('Date',drop=True,inplace=True)

In [None]:
country_temp_df['dt'] = pd.to_datetime(country_temp_df['dt'], errors="coerce", utc=True)
country_temp_df.set_index('dt',drop=True,inplace=True)
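Because the cells above parse dates with `errors="coerce"`, any value that fails to parse silently becomes `NaT`. The sketch below shows one way to make that visible. The helper `parse_datetime_index` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def parse_datetime_index(df, column):
    """Parse `column` as UTC datetimes, count coerced values, and index by it."""
    parsed = pd.to_datetime(df[column], errors="coerce", utc=True)
    # Values present in the source that could not be parsed end up as NaT.
    n_failed = int((parsed.isna() & df[column].notna()).sum())
    out = df.assign(**{column: parsed}).set_index(column).sort_index()
    return out, n_failed

# Toy frame standing in for a real dataset (values are made up).
toy = pd.DataFrame({
    "dt": ["1900-02-01", "not a date", "1900-01-01"],
    "anomaly": [0.2, 0.3, 0.1],
})
toy_indexed, n_failed = parse_datetime_index(toy, "dt")
print(f"{n_failed} value(s) could not be parsed")
```

Returning the count alongside the frame leaves the decision to drop or investigate the affected rows with the caller.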

In [53]:
krakow_temp_df['datetime'] = pd.to_datetime(krakow_temp_df[['Year', 'Month']].assign(DAY=1))
krakow_temp_df.set_index('datetime', inplace=True)
krakow_temp_df = krakow_temp_df[['Absolute maximum temperature [°C]','Absolute minimum temperature [°C]','Average monthly temperature [°C]']]
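The cell above builds a monthly timestamp from separate `Year` and `Month` columns by adding a constant day. A minimal sketch of the same idea on made-up data, followed by one possible way to aggregate to annual means (the aggregation step is an assumption about a likely next step, not something the notebook does here):

```python
import pandas as pd

# Toy monthly records standing in for the Krakow data (values are made up).
toy = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021],
    "Month": [1, 7, 1, 7],
    "Average monthly temperature [°C]": [0.0, 20.0, -2.0, 22.0],
})

# to_datetime assembles dates from year/month/day columns; the constant day
# column pins every record to the first of its month.
toy["datetime"] = pd.to_datetime(toy[["Year", "Month"]].assign(DAY=1))
toy = toy.set_index("datetime")

# Annual mean temperature, grouped by the calendar year of the index.
annual_mean = toy["Average monthly temperature [°C]"].groupby(toy.index.year).mean()
print(annual_mean)
```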

In [57]:
pollution_region_df['Year'] = pd.to_datetime(pollution_region_df['Year'], errors="coerce", utc=True)
pollution_region_df.set_index('Year',drop=True,inplace=True)

In [58]:
pollution_variation_df['Year'] = pd.to_datetime(pollution_variation_df['Year'], errors="coerce", utc=True)
pollution_variation_df.set_index('Year',drop=True,inplace=True)
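One thing worth checking in the two cells above: if the `Year` column holds integer years (an assumption, since its dtype is not shown here), `pd.to_datetime` treats the integers as nanoseconds since the Unix epoch, so every row lands in 1970. A small sketch of the pitfall and one possible workaround:

```python
import pandas as pd

years = pd.Series([2002, 2003])  # made-up integer years

# Integers are read as nanoseconds since 1970-01-01, not as calendar years.
naive = pd.to_datetime(years, errors="coerce", utc=True)

# Converting to strings and giving an explicit format parses them as years.
explicit = pd.to_datetime(years.astype(str), format="%Y", errors="coerce", utc=True)

print(naive.dt.year.tolist())
print(explicit.dt.year.tolist())
```

If the column already contains full date strings, the explicit `%Y` format would not apply, so inspecting `df['Year'].dtype` first is the safer order of operations.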

In [59]:
pollution_krakow_df['Datetime'] = pd.to_datetime(pollution_krakow_df['Datetime'], errors="coerce", utc=True)
pollution_krakow_df.set_index('Datetime',drop=True,inplace=True)
pollution_krakow_df[['NO2','CO']] = pollution_krakow_df[['NO2','CO']].astype(int)
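The `astype(int)` cast above truncates fractional readings and raises an error if a column contains missing values. Assuming the pollution readings may have gaps (not confirmed by the notebook), a sketch using pandas' nullable `Int64` dtype on made-up values looks like this:

```python
import pandas as pd

# Made-up readings with one missing value.
readings = pd.DataFrame({"NO2": [12.7, None, 3.2], "CO": [410.0, 395.5, None]})

# Round first so the values are integer-valued, then cast to the nullable
# integer dtype, which keeps missing entries as <NA> instead of failing.
as_int = readings[["NO2", "CO"]].round().astype("Int64")
print(as_int)
```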

Here are the climate datasets in a zip file.

Three cover temperature and weather: one for the city of Krakow, one global, and one global by country.

Two cover pollution levels: one global by country and one for Krakow.

I need to analyze the following topics and answer the following questions:
1. Temperature Changes
  
  1.1. How has the global average temperature changed over the years?
  
  1.2. Are there specific regions where the temperature is rising faster?
  
  1.3. How much have temperatures risen in Poland and Krakow?
  
  1.4. What is the correlation between temperature rise globally, in Poland, and in Krakow?
2. Greenhouse Gas Emissions
  
  2.1. Which countries are the largest emitters of greenhouse gases, and how has this changed over time?

  2.2. What sectors (energy, agriculture, industrial processes, etc.) contribute most to greenhouse gas emissions globally and locally?
  
  2.3. How do per capita emissions vary across different countries?
  
  2.4. What is the correlation between greenhouse gas concentration and global temperature rise?
  
  2.5. What is the correlation between greenhouse gas concentration and Krakow temperature rise?
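
Several of the questions above come down to two computations: a warming rate and a correlation between series. The sketch below illustrates both on synthetic annual data. The column names, the numbers, and the helper `trend_per_decade` are hypothetical and would need to be replaced with the real datasets.

```python
import numpy as np
import pandas as pd

# Synthetic annual series (made-up numbers, for illustration only).
rng = np.random.default_rng(0)
years = np.arange(1990, 2020)
elapsed = years - years[0]
annual = pd.DataFrame({
    "global_temp": 14.0 + 0.02 * elapsed + rng.normal(0, 0.1, len(years)),
    "krakow_temp": 8.0 + 0.04 * elapsed + rng.normal(0, 0.1, len(years)),
    "co2_ppm": 350.0 + 1.8 * elapsed,
}, index=years)

def trend_per_decade(series):
    """Least-squares linear trend of an annual series, in units per decade."""
    x = series.index.to_numpy(dtype=float)
    x = x - x.mean()  # centre the years to keep the fit well conditioned
    slope, _intercept = np.polyfit(x, series.to_numpy(dtype=float), 1)
    return slope * 10

# Warming rate of each temperature series.
trends = annual[["global_temp", "krakow_temp"]].apply(trend_per_decade)

# Pairwise Pearson correlation between all three series.
correlations = annual.corr()

print(trends)
print(correlations)
```

Correlations between series that all trend upward mostly reflect the shared trend, so comparing year-over-year differences (`annual.diff().corr()`) may be a useful complementary check.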
