# Weather
The weather data comes from https://opendata.chmi.cz/meteorology/climate/historical_csv/. I work only with daily data from a single measuring station, the professional station Praha Karlov:
- wsi: 0-20000-0-11519
- gh_id: P1PKAR01
- coordinates (longitude, latitude): 14.4186, 50.0675
- elevation: 260.5 m a.s.l.
(source: meta1.csv)

I am interested in these quantities:
- Wind speed (F), m/s, 8.5 m above ground, mean of the readings at 07:00, 14:00, and 21:00
- Snow depth (SCE), cm, 0 m above ground, measured at 06:00 (see dly-0-20000-0-11519-SCE.csv)
- Precipitation (SRA), mm, 1.11 m above ground, measured from 06:00 of the given day to 06:00 of the following day
- Sunshine duration (SSV), h, 1.5 m above ground, measured from 00:00 to 24:00 of the given day
- Temperature (T), °C, 1.99 m above ground, mean of the readings at 07:00, 14:00, and 21:00
(source: meta2.csv)
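
The per-element files loaded below follow the naming pattern visible in the `read_csv` calls (`dly-<wsi>-<element>.csv`). A minimal sketch of building those file names from the station ID and the element codes listed above; the names `WSI`, `ELEMENTS`, and `daily_file` are illustrative and not part of the original notebook:

```python
# Station identifier and element codes taken from the lists above.
WSI = "0-20000-0-11519"
ELEMENTS = {
    "F": "wind speed (m/s)",
    "SCE": "snow depth (cm)",
    "SRA": "precipitation (mm)",
    "SSV": "sunshine duration (h)",
    "T": "temperature (°C)",
}


def daily_file(element: str, wsi: str = WSI) -> str:
    """Return the daily CSV file name for one element (hypothetical helper)."""
    return f"dly-{wsi}-{element}.csv"


# Map each element code to the file it would be read from.
files = {element: daily_file(element) for element in ELEMENTS}
```

Keeping the codes in one mapping makes it easier to loop over the elements instead of repeating the file names by hand.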

In [2]:
import pandas as pd
import datetime
from ydata_profiling import ProfileReport

## Wind

In [3]:
df_wind = pd.read_csv('dly-0-20000-0-11519-F.csv')
profile_wind = ProfileReport(df_wind, title="Profiling Report")
profile_wind

In [4]:
# Drop columns not needed downstream (station ID, element code, flag, unnamed trailing column), then save.
df_wind = df_wind.drop(columns=['WSI', 'EG_EL_ABBREVIATION', 'FLAG1', 'Unnamed: 7'])
df_wind.to_csv('wind.csv', index=False)

## Snow

In [5]:
df_snow = pd.read_csv('dly-0-20000-0-11519-SCE.csv')
profile_snow = ProfileReport(df_snow, title="Profiling Report")
profile_snow

In [6]:
# Drop columns not needed downstream (station ID, element code, time label, unnamed trailing column), then save.
df_snow = df_snow.drop(columns=['WSI', 'EG_EL_ABBREVIATION', 'TIME', 'Unnamed: 7'])
df_snow.to_csv('snow.csv', index=False)

## Rain

In [7]:
df_rain = pd.read_csv('dly-0-20000-0-11519-SRA.csv')
profile_rain = ProfileReport(df_rain, title="Profiling Report")
profile_rain

The FLAG1 column contains only the value T (= trace, an unmeasurable amount) or nothing. A trace amount corresponds to VALUE = 0.0, so FLAG1 carries no extra information and can be removed (it is dropped when the tables are combined at the end).
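
The claim that every `T` flag coincides with `VALUE = 0.0` can be checked directly. A minimal sketch on a small made-up frame with the same two columns (the sample values are placeholders, not real measurements); on the real data, `sample` would be replaced by `df_rain`:

```python
import pandas as pd

# Tiny made-up sample in the shape of the precipitation file (not real data).
sample = pd.DataFrame({
    "VALUE": [0.0, 1.2, 0.0, 0.0],
    "FLAG1": ["T", None, None, "T"],
})

# Rows flagged as trace precipitation.
trace = sample["FLAG1"] == "T"

# True when every flagged row reports zero precipitation.
flag_is_redundant = bool((sample.loc[trace, "VALUE"] == 0.0).all())

# Any other non-empty flag value would mean FLAG1 carries more than the trace marker.
other_flags = sample.loc[~trace, "FLAG1"].dropna().unique().tolist()
```

If `flag_is_redundant` is true and `other_flags` is empty, dropping the column loses nothing beyond the distinction between "no precipitation" and "trace".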

In [8]:
# Drop unused columns and save; FLAG1 is kept here and dropped when the tables are combined.
df_rain = df_rain.drop(columns=['WSI', 'EG_EL_ABBREVIATION', 'TIME', 'Unnamed: 7'])
df_rain.to_csv('rain.csv', index=False)

## Sunshine

In [9]:
df_sun = pd.read_csv('dly-0-20000-0-11519-SSV.csv')
profile_sun = ProfileReport(df_sun, title="Profiling Report")
profile_sun

In [10]:
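# Drop columns not needed downstream (station ID, element code, time label, flag, unnamed trailing column), then save.
df_sun = df_sun.drop(columns=['WSI', 'EG_EL_ABBREVIATION', 'TIME', 'FLAG1', 'Unnamed: 7'])
df_sun.to_csv('sun.csv', index=False)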

## Temperature

In [11]:
df_temp = pd.read_csv('dly-0-20000-0-11519-T.csv')
profile_temp = ProfileReport(df_temp, title="Profiling Report")
profile_temp

In [12]:
# Drop columns not needed downstream (station ID, element code, flag, unnamed trailing column), then save.
df_temp = df_temp.drop(columns=['WSI', 'EG_EL_ABBREVIATION', 'FLAG1', 'Unnamed: 7'])
df_temp.to_csv('temp.csv', index=False)
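
The five cleaning cells differ only in which optional columns they drop. A sketch of one shared helper, assuming the raw files have the eight-column layout implied by the `drop` calls above; `COMMON_DROP` and `clean_daily` are hypothetical names, and the one-row frame holds placeholder values:

```python
import pandas as pd

# Columns dropped for every element above; FLAG1 and TIME are dropped only
# for some elements, so they are passed in explicitly.
COMMON_DROP = ["WSI", "EG_EL_ABBREVIATION", "Unnamed: 7"]


def clean_daily(df: pd.DataFrame, extra_drop=()) -> pd.DataFrame:
    """Drop unused columns; names missing from the frame are ignored."""
    return df.drop(columns=[*COMMON_DROP, *extra_drop], errors="ignore")


# Made-up one-row frame; column order and cell values are assumptions.
raw = pd.DataFrame({
    "WSI": ["0-20000-0-11519"], "EG_EL_ABBREVIATION": ["T"],
    "DT": ["2020-01-01"], "TIME": ["AVG"], "VALUE": [8.5],
    "FLAG1": [None], "QUALITY": [0.0], "Unnamed: 7": [None],
})
cleaned = clean_daily(raw, extra_drop=["FLAG1"])
```

Using `errors="ignore"` means the helper does not fail if a file lacks one of the optional columns, at the cost of silently accepting a misspelled column name.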

## Conclusion

In [13]:
df_rain.columns

Index(['DT', 'VALUE', 'FLAG1', 'QUALITY'], dtype='object')

In [14]:
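# Start from the rain table and keep only the calendar date.
df = df_rain.copy()
df['DATE'] = pd.to_datetime(df.DT).dt.date
df = df.drop(columns=['DT', 'VALUE', 'FLAG1', 'QUALITY'])
# Columns are aligned by row index, so all five tables must cover the same dates in the same order.
df['RAIN'] = df_rain.VALUE.copy()
df['SNOW'] = df_snow.VALUE.copy()
df['SUN'] = df_sun.VALUE.copy()
df['TEMP'] = df_temp.VALUE.copy()
df['WIND'] = df_wind.VALUE.copy()
# Keep data from 2020 onward.
df = df[df['DATE'] >= datetime.date(2020, 1, 1)].reset_index(drop=True)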
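
The cell above combines the columns by row position, which is correct only when all five tables list the same dates in the same order. A sketch of aligning on the date instead, assuming each table has `DT` and `VALUE` columns with one row per day; `combine_by_date` is a hypothetical helper, and the two small frames are made up to show the behaviour:

```python
import pandas as pd


def combine_by_date(frames: dict) -> pd.DataFrame:
    """Join the VALUE column of each frame on its calendar date."""
    series = []
    for name, frame in frames.items():
        # Normalize timestamps to midnight so all frames share one daily key.
        dates = pd.to_datetime(frame["DT"]).dt.normalize()
        series.append(pd.Series(frame["VALUE"].to_numpy(), index=pd.Index(dates), name=name))
    # Concatenating along columns keeps dates missing from some frames as NaN.
    combined = pd.concat(series, axis=1).sort_index()
    return combined.rename_axis("DATE").reset_index()


# Made-up frames with different lengths to show the alignment.
rain = pd.DataFrame({"DT": ["2020-01-01", "2020-01-02"], "VALUE": [0.0, 0.1]})
temp = pd.DataFrame({"DT": ["2020-01-02"], "VALUE": [9.4]})
merged = combine_by_date({"RAIN": rain, "TEMP": temp})
```

With this approach a missing day in one file shows up as a `NaN` in that column rather than shifting every later value by one row.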

In [15]:
display(df)
df.info()  # info() prints its summary directly and returns None

| | DATE | RAIN | SNOW | SUN | TEMP | WIND |
|---|---|---|---|---|---|---|
| 0 | 2020-01-01 | 0.0 | 0.0 | 0.0 | 8.5 | 3.0 |
| 1 | 2020-01-02 | 0.0 | 0.0 | 0.0 | 9.4 | 3.7 |
| 2 | 2020-01-03 | 0.1 | 0.0 | 5.8 | 7.1 | 5.7 |
| 3 | 2020-01-04 | 0.1 | 0.0 | 0.9 | 5.9 | 8.3 |
| 4 | 2020-01-05 | 0.1 | 0.0 | 0.0 | 4.7 | 8.7 |
| ... | ... | ... | ... | ... | ... | ... |
| 1822 | 2024-12-27 | 0.0 | 0.0 | 0.1 | 5.2 | 8.0 |
| 1823 | 2024-12-28 | 0.0 | 0.0 | 0.0 | 4.4 | 4.0 |
| 1824 | 2024-12-29 | 0.0 | 0.0 | 3.9 | -0.8 | 1.0 |
| 1825 | 2024-12-30 | 0.0 | 0.0 | 2.2 | -2.1 | 1.3 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1827 entries, 0 to 1826
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   DATE    1827 non-null   object 
 1   RAIN    1827 non-null   float64
 2   SNOW    1827 non-null   float64
 3   SUN     1827 non-null   float64
 4   TEMP    1827 non-null   float64
 5   WIND    1827 non-null   float64
dtypes: float64(5), object(1)
memory usage: 85.8+ KB

In [16]:
df['DATE'] = pd.to_datetime(df.DATE)
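
Once `DATE` is a proper datetime column, it is easy to confirm that no day is missing. A sketch under the assumption that the table should cover every day from 2020-01-01 through 2024-12-31, which is consistent with the 1827 rows reported by `df.info()`; `missing_days` is a hypothetical helper and the sample dates are made up:

```python
import pandas as pd


def missing_days(dates: pd.Series, start: str, end: str) -> pd.DatetimeIndex:
    """Return calendar days in [start, end] that do not occur in `dates`."""
    expected = pd.date_range(start, end, freq="D")
    return expected.difference(pd.DatetimeIndex(dates))


# Number of calendar days from 2020-01-01 through 2024-12-31 (two leap years).
expected_rows = len(pd.date_range("2020-01-01", "2024-12-31", freq="D"))

# Made-up example with one gap.
sample_dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-03"]))
gaps = missing_days(sample_dates, "2020-01-01", "2020-01-03")
```

On the real table the call would be `missing_days(df["DATE"], "2020-01-01", "2024-12-31")`, and an empty result would mean the series is complete.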

In [17]:
profile = ProfileReport(df, title="Profiling Report")
profile

In [18]:
df.to_csv('weather.csv', index=False)
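
CSV does not store column types, so `DATE` comes back as plain text unless it is parsed on load. A small round-trip sketch using an in-memory buffer in place of `weather.csv`; the two rows repeat the first two rows of the table shown earlier, and `df_out` and `df_in` are illustrative names:

```python
import io

import pandas as pd

# Two rows with the same columns as weather.csv.
df_out = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "RAIN": [0.0, 0.0],
    "SNOW": [0.0, 0.0],
    "SUN": [0.0, 0.0],
    "TEMP": [8.5, 9.4],
    "WIND": [3.0, 3.7],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime type of the DATE column on load.
df_in = pd.read_csv(buffer, parse_dates=["DATE"])
```

For the saved file the equivalent call would be `pd.read_csv('weather.csv', parse_dates=['DATE'])`.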