```
From: https://github.com/ksatola
Version: 1.0.0
```

# ETL Weather Data Pipeline

## Table of Contents

- [Introduction, Methodology and Comments](#intro)
- [Data Web Scraping](#web)
- [Data Transformation](#trans)

---
<a id='intro'></a>

## Introduction

The data comes from the website of the [Polish Institute of Meteorology and Water Management - National Research Institute](http://site.imgw.datatask.net/sites/default/files/IMGW_About_2019.pdf) (Instytut Meteorologii i Gospodarki Wodnej, IMGW). Among other datasets, the [IMGW data archive](https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/) contains hourly measurements and observations from synoptic stations (from 1960 to 2020).

The synoptic data from 2001 to 2019 was downloaded on Feb 19th, 2020.

## Methodology

For the download, I use web scraping techniques. The ETL logic is defined as follows:

- Locate the links to the files we want to download inside the multiple levels of HTML tags. An example link to a file with partial weather data from 2001: `https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2001/2001_100_s.zip`. There are many such files for each year.
- Download the synoptic data from 2001 to 2019.
- Extract all ZIP files to a common folder.
- For each year:
    - Combine the separate data and metadata files into an analytical view with a time-series index,
    - Filter the data to keep only measurements taken at observation stations in the Krakow area. The list of stations can be taken from the `wykaz_stacji.csv` file (downloaded manually from https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/),
    - Build a datetime index from the separate columns representing the moment the measurement was taken,
    - Add the yearly data, filtered by the stations in scope, to the final dataset.
- Correct the time data according to the instructions provided by IMGW. The time of the hourly sunshine duration measurement (czas pomiaru usłonecznienia godzinowego) taken before March 2015 needs to be adjusted (subtract one hour) to match the later data, whose collection time is specified in UTC.
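
The last two steps (building a timestamp from separate columns and shifting older measurement times) can be sketched with the standard library alone. This is a minimal illustration, not the code in `prepare`; the function names and the exact cutoff instant (midnight on 1 March 2015) are assumptions, since the text only says "before March 2015".

```python
from datetime import datetime, timedelta

# Assumption: the cutoff is the first instant of March 2015; the source only
# says "before March 2015", so adjust this if IMGW specifies an exact date.
CUTOFF = datetime(2015, 3, 1)


def build_timestamp(year: int, month: int, day: int, hour: int) -> datetime:
    """Combine the separate date/time columns into one timestamp."""
    return datetime(year, month, day, hour)


def correct_measurement_time(ts: datetime) -> datetime:
    """Shift pre-cutoff timestamps back by one hour; leave later ones as-is."""
    if ts < CUTOFF:
        return ts - timedelta(hours=1)
    return ts


old = correct_measurement_time(build_timestamp(2005, 1, 1, 0))
new = correct_measurement_time(build_timestamp(2019, 6, 1, 12))
print(old)  # 2004-12-31 23:00:00
print(new)  # 2019-06-01 12:00:00
```

Note that the shift can move a value into the previous day (or year), which is why it is applied to a full timestamp rather than to the hour column alone.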

## Comments

- The archive consists of multi-year folders (covering 1960-2000) and yearly folders (from 2001), with multiple ZIP files within each. The folder also contains a metadata description in a TXT file (`s_t_format.txt`).
- The extracted CSV files use the Central European (`Windows 1250`) encoding. The final encoding is changed to `utf-8`.
- The structure of the metadata file (`s_t_format.txt`) is not consistent across rows. It consists of two columns without any distinguishable separator, and the first column (containing the dataset column names) consists of multiple words separated by a varying number of white spaces (some names contain more spaces between words than there are between the columns). Because of this, I had to clean the file manually so that the remaining step (extracting the column names) could be automated. The cleanup removed:
    - multiple spaces between the words in the first column,
    - empty rows,
    - non-standard rows (additional comments).
- After the cleanup, 107 rows remained. The output file's name is `s_t_format_corrected_input.txt`. This TXT file is processed automatically to derive the column names.
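
For illustration, the whitespace part of that cleanup could also be done in code. The sketch below is an assumption-heavy example, not the author's procedure: it assumes the second column is a single token with no spaces in it, and the sample lines are made up rather than copied from `s_t_format.txt`.

```python
import re


def split_metadata_line(line: str):
    """Return (column_name, trailing_token), or None for a blank line."""
    # Collapse every run of whitespace into a single space.
    cleaned = re.sub(r"\s+", " ", line).strip()
    if not cleaned:
        return None
    # Assumption: the second column is one whitespace-free token at the end.
    name, _, token = cleaned.rpartition(" ")
    if not name:
        # Only one token on the line: treat it as a name with no second column.
        return (token, "")
    return (name, token)


# Made-up sample lines, not copied from s_t_format.txt.
sample = ["Kod   stacji          9", "", "Nazwa stacji   30"]
rows = [r for r in map(split_metadata_line, sample) if r is not None]
print(rows)  # [('Kod stacji', '9'), ('Nazwa stacji', '30')]
```

Splitting from the right avoids having to guess how many spaces separate the two columns, which is the inconsistency described above.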



---
<a id='web'></a>

In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

In [3]:
import sys
sys.path.insert(0, '../src')

In [4]:
import pandas as pd
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import os
from pathlib import Path
import random
import re

In [5]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 1000)

In [6]:
from prepare import (
    extract_archived_data,
    get_imgw_yearly_weather_data_files,
    parse_imgw_metadata,
    build_imgw_analytical_view
)

## Data Web Scraping

In [7]:
# Set the url to the website and access the site with our requests library
url = 'https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2001'
response = requests.get(url)
response

<Response [200]>

In [8]:
# Get HTML content of the page
soup = BeautifulSoup(response.text, "html.parser")

In [9]:
# We use the .find_all method (findAll is its older, deprecated alias) to locate all of our <a> tags
soup.find_all('a')[:10]

[<a href="?C=N;O=D">Name</a>,
 <a href="?C=M;O=A">Last modified</a>,
 <a href="?C=S;O=A">Size</a>,
 <a href="?C=D;O=A">Description</a>,
 <a href="/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/">Parent Directory</a>,
 <a href="2001_100_s.zip">2001_100_s.zip</a>,
 <a href="2001_105_s.zip">2001_105_s.zip</a>,
 <a href="2001_115_s.zip">2001_115_s.zip</a>,
 <a href="2001_120_s.zip">2001_120_s.zip</a>,
 <a href="2001_125_s.zip">2001_125_s.zip</a>]

In [10]:
# Extract the actual link that we want. Let's test it on the first file link (index 5, after the sorting and parent-directory links)
one_a_tag = soup.find_all('a')[5]
filename = one_a_tag['href']
filename

'2001_100_s.zip'
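
Picking index 5 by hand works for a single test. To collect every file link from a listing page, one option is to keep only the `href` values ending in `.zip`. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages; `ZipLinkParser` and `zip_urls` are hypothetical names, and the HTML snippet is a shortened version of the listing shown above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collect href values of <a> tags that end with .zip."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".zip"):
            self.links.append(href)


def zip_urls(html: str, page_url: str) -> list:
    """Return absolute URLs for all .zip links found in the page."""
    parser = ZipLinkParser()
    parser.feed(html)
    # A trailing slash makes urljoin append to the folder
    # instead of replacing its last path segment.
    base = page_url.rstrip("/") + "/"
    return [urljoin(base, href) for href in parser.links]


listing = (
    '<a href="?C=N;O=D">Name</a>'
    '<a href="/data/dane_pomiarowo_obserwacyjne/">Parent Directory</a>'
    '<a href="2001_100_s.zip">2001_100_s.zip</a>'
    '<a href="2001_105_s.zip">2001_105_s.zip</a>'
)
page = "https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2001"
urls = zip_urls(listing, page)
for u in urls:
    print(u)
```

Filtering on the file extension skips the column-sorting links and the parent-directory link without depending on their position in the page.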

In [11]:
path_to_save = "/Users/ksatola/Documents/git/air-polution/data/imgw/etl"
path_to_save

'/Users/ksatola/Documents/git/air-polution/data/imgw/etl'

In [12]:
fullfilename = os.path.join(path_to_save, filename)
fullfilename

'/Users/ksatola/Documents/git/air-polution/data/imgw/etl/2001_100_s.zip'

In [13]:
# Use urllib.request.urlretrieve to download the file to the local file system.
# It takes two parameters: the file URL and the target file name.
# fullfilename (built with os.path.join above) keeps the path separator before the file name.
urllib.request.urlretrieve(url + '/' + filename, fullfilename)

('/Users/ksatola/Documents/git/air-polution/data/imgw/etl/2001_100_s.zip',
 <http.client.HTTPMessage at 0x11ee1f978>)

### Download IMGW data files

In [19]:
# Years 2001..2019 as strings (the range end is exclusive)
years = [str(year) for year in range(2001, 2020)]

In [20]:
download_base_url = 'https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop'
path_to_save = "/Users/ksatola/Documents/git/air-polution/data/imgw/etl"

In [21]:
%%time

get_imgw_yearly_weather_data_files(years, download_base_url, path_to_save)

ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_100_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_105_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_115_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_120_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_125_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_135_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_155_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop/2019/2019_160_s.zip
ok: 200 https://dane.imgw.pl/data/dane_pomiarowo
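
The implementation of `get_imgw_yearly_weather_data_files` lives in the `prepare` module and is not shown here. As a rough sketch of the bookkeeping such a download involves, the hypothetical helper below pairs each file URL with a local path in a per-year folder. The function name, its arguments, and the relative target folder are assumptions made for this example, and the actual download calls are left as comments because they need network access.

```python
from pathlib import Path


def build_download_plan(filenames_by_year, base_url, target_root):
    """Return (url, local_path) pairs, one per file, grouped in yearly folders."""
    plan = []
    for year, filenames in filenames_by_year.items():
        for name in filenames:
            url = f"{base_url.rstrip('/')}/{year}/{name}"
            local_path = Path(target_root) / year / name
            plan.append((url, local_path))
    return plan


plan = build_download_plan(
    {"2019": ["2019_100_s.zip", "2019_105_s.zip"]},
    "https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/terminowe/synop",
    "data/imgw/etl",  # hypothetical relative target folder
)
for url, local_path in plan:
    print(url, "->", local_path)
    # To actually download (network access required), something like:
    # local_path.parent.mkdir(parents=True, exist_ok=True)
    # urllib.request.urlretrieve(url, local_path)
    # time.sleep(random.uniform(1, 3))  # pause between requests; interval chosen arbitrarily
```

Separating the plan from the download makes it easy to inspect what would be fetched before sending any request to the server.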

---
<a id='trans'></a>

## Data Transformation

### Tests

In [22]:
# List all files in a directory using scandir()
basepath = '.'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)

numpy_random_numbers_normal_distribution.png
.DS_Store
010_Project_Environment_Setup.ipynb
200_References.ipynb
test.csv
028_ETL_Clean_Complete.ipynb
025_ETL_Pollution.ipynb
030_EDA.ipynb
Distributions.ipynb
tmp_remove.csv
020_ETL_Weather.ipynb
005_General_Notes.ipynb


In [23]:
# List all subdirectories using scandir()
basepath = '.'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)

.ipynb_checkpoints


In [24]:
from datetime import datetime, timezone
from os import scandir

def convert_date(timestamp):
    # Interpret the POSIX timestamp as UTC and format it, e.g. '19 Feb 2020'
    d = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    formatted_date = d.strftime('%d %b %Y')
    return formatted_date

def get_files(dir: str):
    # Print each regular file in `dir` together with its last-modified date
    with scandir(dir) as dir_entries:
        for entry in dir_entries:
            if entry.is_file():
                info = entry.stat()
                print(f'{entry.name}\t Last Modified: {convert_date(info.st_mtime)}')

In [25]:
get_files('.')

numpy_random_numbers_normal_distribution.png	 Last Modified: 14 Feb 2020
.DS_Store	 Last Modified: 19 Feb 2020
010_Project_Environment_Setup.ipynb	 Last Modified: 19 Feb 2020
200_References.ipynb	 Last Modified: 19 Feb 2020
test.csv	 Last Modified: 16 Feb 2020
028_ETL_Clean_Complete.ipynb	 Last Modified: 19 Feb 2020
025_ETL_Pollution.ipynb	 Last Modified: 19 Feb 2020
030_EDA.ipynb	 Last Modified: 19 Feb 2020
Distributions.ipynb	 Last Modified: 14 Feb 2020
tmp_remove.csv	 Last Modified: 16 Feb 2020
020_ETL_Weather.ipynb	 Last Modified: 19 Feb 2020
005_General_Notes.ipynb	 Last Modified: 19 Feb 2020


In [26]:
from pathlib import Path
# Create a single directory
p = Path('example_directory')
p.mkdir(exist_ok=True) # do not raise error if the dir exists

In [27]:
# Create nested sub-directories, including any missing parents
p = Path('2018/10/05')
p.mkdir(parents=True, exist_ok=True) # do not raise an error when re-running the cell

In [28]:
! tree

.
├── 005_General_Notes.ipynb
├── 010_Project_Environment_Setup.ipynb
├── 020_ETL_Weather.ipynb
├── 025_ETL_Pollution.ipynb
├── 028_ETL_Clean_Complete.ipynb
├── 030_EDA.ipynb
├── 200_References.ipynb
├── 2018
│   └── 10
│       └── 05
├── Distributions.ipynb
├── example_directory
├── numpy_random_numbers_normal_distribution.png
├── test.csv
└── tmp_remove.csv

4 directories, 11 files


In [29]:
# Find all files matching filter
base_dir = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/2019'

import fnmatch

# Get .zip files
for f_name in os.listdir(base_dir):
    #if f_name.endswith('.zip'):
    if fnmatch.fnmatch(f_name, '*.zip'): # wildcard search for files
        print(f_name)

2019_560_s.zip
2019_235_s.zip
2019_270_s.zip
2019_155_s.zip
2019_195_s.zip
2019_600_s.zip
2019_272_s.zip
2019_566_s.zip
2019_250_s.zip
2019_580_s.zip
2019_418_s.zip
2019_540_s.zip
2019_135_s.zip
2019_660_s.zip
2019_625_s.zip
2019_330_s.zip
2019_375_s.zip
2019_500_s.zip
2019_295_s.zip
2019_585_s.zip
2019_210_s.zip
2019_465_s.zip
2019_424_s.zip
2019_115_s.zip
2019_400_s.zip
2019_230_s.zip
2019_520_s.zip
2019_310_s.zip
2019_100_s.zip
2019_399_s.zip
2019_185_s.zip
2019_570_s.zip
2019_415_s.zip
2019_469_s.zip
2019_120_s.zip
2019_435_s.zip
2019_205_s.zip
2019_550_s.zip
2019_280_s.zip
2019_360_s.zip
2019_510_s.zip
2019_488_s.zip
2019_200_s.zip
2019_595_s.zip
2019_160_s.zip
2019_125_s.zip
2019_670_s.zip
2019_497_s.zip
2019_385_s.zip
2019_455_s.zip
2019_300_s.zip
2019_345_s.zip
2019_495_s.zip
2019_575_s.zip
2019_530_s.zip
2019_650_s.zip
2019_628_s.zip
2019_105_s.zip
2019_690_s.zip


In [30]:
# Walking a directory tree and printing the names of the directories and files

base_dir = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl'
#base_dir = '.'

for dirpath, dirnames, files in os.walk(base_dir, topdown=True, followlinks=False): # no symbolic links following
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

Found directory: /Users/ksatola/Documents/git/air-polution/data/imgw/etl
.DS_Store
Found directory: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013
2013_310_s.zip
2013_520_s.zip
2013_230_s.zip
2013_400_s.zip
2013_115_s.zip
2013_424_s.zip
2013_465_s.zip
2013_210_s.zip
2013_585_s.zip
2013_295_s.zip
2013_500_s.zip
2013_330_s.zip
2013_375_s.zip
2013_660_s.zip
2013_625_s.zip
2013_135_s.zip
2013_540_s.zip
2013_418_s.zip
2013_580_s.zip
2013_250_s.zip
2013_566_s.zip
2013_272_s.zip
2013_600_s.zip
2013_195_s.zip
2013_155_s.zip
2013_235_s.zip
2013_270_s.zip
2013_560_s.zip
2013_690_s.zip
2013_105_s.zip
2013_650_s.zip
2013_575_s.zip
2013_530_s.zip
2013_495_s.zip
2013_300_s.zip
2013_345_s.zip
2013_455_s.zip
2013_385_s.zip
2013_497_s.zip
2013_670_s.zip
2013_160_s.zip
2013_125_s.zip
2013_200_s.zip
2013_488_s.zip
2013_510_s.zip
2013_360_s.zip
2013_280_s.zip
2013_550_s.zip
2013_205_s.zip
2013_435_s.zip
2013_120_s.zip
2013_469_s.zip
2013_415_s.zip
2013_570_s.zip
2013_185_s.zip
2013_399_s.zip

In [31]:
import zipfile

zip_dir = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/2019/'
zip_file = zip_dir+'2019_100_s.zip'

with zipfile.ZipFile(zip_file, 'r') as zipobj:
    contents = zipobj.namelist()
    print(contents)
    first_file = zipobj.getinfo(contents[0])
    print(first_file)
    print(first_file.filename)
    print(first_file.file_size)
    print(first_file.date_time)
    zipobj.extractall(path=zip_dir+'extracted/')

['s_t_100_2019.csv']
<ZipInfo filename='s_t_100_2019.csv' compress_type=deflate external_attr=0x20 file_size=3627030 compress_size=369035>
s_t_100_2019.csv
3627030
(2020, 1, 28, 8, 16, 40)


### Extract Data

In [32]:
%%time

source_dir = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/'
target_dir = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/'
file_search_pattern = '*.zip'

extract_archived_data(source_dir, target_dir, file_search_pattern)

Found directory: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/
Found directory: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_310_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_520_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_230_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_400_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_115_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_424_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_465_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_210_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/2013/2013_585_s.zip
Extracting: /Users/ksatola/Documents/git/air-polution/da
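
`extract_archived_data` is imported from `prepare`, so its body is not visible in this notebook. A minimal sketch of the same idea, combining the `os.walk`, `fnmatch` and `zipfile` calls tested earlier, might look as follows. `extract_all` is a hypothetical name, and the demo runs on a temporary folder with one tiny archive rather than on the IMGW files.

```python
import fnmatch
import os
import tempfile
import zipfile


def extract_all(source_dir, target_dir, pattern="*.zip"):
    """Extract every archive matching pattern under source_dir into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    target_abs = os.path.abspath(target_dir)
    extracted = []
    for dirpath, dirnames, files in os.walk(source_dir):
        # Do not descend into the target folder if it lives inside source_dir.
        dirnames[:] = [d for d in dirnames
                       if os.path.abspath(os.path.join(dirpath, d)) != target_abs]
        for name in sorted(fnmatch.filter(files, pattern)):
            with zipfile.ZipFile(os.path.join(dirpath, name)) as archive:
                archive.extractall(path=target_dir)
                extracted.extend(archive.namelist())
    return extracted


# Self-contained demo on a throwaway folder with one tiny archive.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2019"))
    with zipfile.ZipFile(os.path.join(root, "2019", "2019_100_s.zip"), "w") as z:
        z.writestr("s_t_100_2019.csv", "demo")
    names = extract_all(root, os.path.join(root, "extracted"))
    found = os.path.exists(os.path.join(root, "extracted", "s_t_100_2019.csv"))
print(names, found)  # ['s_t_100_2019.csv'] True
```

Because every archive is extracted into one common folder, files with the same name in different archives would overwrite each other; the IMGW names seen above (`s_t_<code>_<year>.csv`) appear to be unique per archive.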

### Parse metadata file

In [33]:
%%timeit

file_input = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/metadata/s_t_format_corrected_input.txt'
file_output = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/metadata/s_t_format_corrected_output.csv'

parse_imgw_metadata(file_input, file_output, input_encoding="cp1250", output_encoding="utf-8")

855 µs ± 56.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
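
The `input_encoding` and `output_encoding` arguments reflect the conversion from Windows 1250 to UTF-8 mentioned in the comments at the top. How `parse_imgw_metadata` handles this internally is not shown; the sketch below only illustrates the re-encoding step itself, with a hypothetical `reencode` helper and a made-up one-line file.

```python
import tempfile
from pathlib import Path


def reencode(src, dst, src_encoding="cp1250", dst_encoding="utf-8"):
    """Rewrite a text file in a different encoding."""
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding=dst_encoding)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.txt"
    dst = Path(tmp) / "out.txt"
    # Write Polish column names as Windows-1250 bytes, then convert.
    src.write_bytes("Miesiąc;Dzień".encode("cp1250"))
    reencode(src, dst)
    result = dst.read_text(encoding="utf-8")
print(result)  # Miesiąc;Dzień
```

Reading the file with the wrong encoding would garble characters such as `ą` or `ń`, which is why the source encoding is passed explicitly.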


### Build hourly (1g) analytical view

In [34]:
columns = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/metadata/s_t_format_corrected_output.csv'
source_dir = '/Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/'
file_search_pattern = '*.csv'

# Synoptic station codes in the Krakow area
sms_codes = [
    "250190410", # "KRAKÓW HISTORYCZNE"
    "350190566", # "KRAKÓW-BALICE"
    "250199987", # "KRAKÓW-BIELANY-KLASZTOR"
    "250209979", # "KRAKÓW-ŁĘG"
    "250190390", # "KRAKÓW-OBSERWATORIUM"
    "250199984", # "KRAKÓW-SWOSZOWICE"
    "250190470" # "KRAKÓW-WOLA JUSTOWSKA"
]

In [35]:
# Read columns file
cols = pd.read_csv(columns, encoding='utf-8', sep=",")
cols.columns.tolist()

['Kod stacji',
 'Nazwa stacji',
 'Rok',
 'Miesiąc',
 'Dzień',
 'Godzina',
 'Wysokość podstawy chmur CL CM szyfrowana [kod]',
 'Status pomiaru HPOD',
 'Wysokość podstawy niższej [m]',
 'Status pomiaru HPON',
 'Wysokość podstawy wyższej [m]',
 'Status pomiaru HPOW',
 'Wysokość podstawy tekstowy [opis]',
 'Pomiar przyrzadem 1 (niższa) [P]',
 'Pomiar przyrzadem 2 (wyższa) [P]',
 'Widzialność [kod]',
 'Status pomiaru WID',
 'Widzialność operatora [m]',
 'Status pomiaru WIDO',
 'Widzialność automat [m]',
 'Status pomiaru WIDA',
 'Zachmurzenie ogólne [oktanty]',
 'Status pomiaru NOG',
 'Kierunek wiatru [°]',
 'Status pomiaru KRWR',
 'Prędkość wiatru [m/s]',
 'Status pomiaru FWR',
 'Poryw wiatru [m/s]',
 'Status pomiaru PORW',
 'Temperatura powietrza [°C]',
 'Status pomiaru TEMP',
 'Temperatura termometru zwilżonego [°C]',
 'Status pomiaru TTZW',
 'Wskaźnik wentylacji [W/N]',
 'Wskaźnik lodu [L/W]',
 'Ciśnienie pary wodnej [hPa]',
 'Status pomiaru CPW',
 'Wilgotność względna [%]',
 'Status pom

In [36]:
%%time

df = build_imgw_analytical_view(source_dir, columns, file_search_pattern, sms_codes)

0001 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/s_t_330_2013.csv, total rows: 0
0002 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/s_t_488_2008.csv, total rows: 107
0003 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/s_t_125_2018.csv, total rows: 214
0004 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/s_t_330_2007.csv, total rows: 321
0005 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/s_t_650_2011.csv, total rows: 428
0006 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/s_t_650_2005.csv, total rows: 535
0007 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/etl/extracted/s_t_560_2003.csv, total rows: 642
0008 -> Shape: (0, 107), file: /Users/ksatola/Documents/git/air-polution/data/imgw/et
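
`build_imgw_analytical_view` does the filtering and indexing with pandas inside `prepare`; its code is not part of this notebook. To make the two core operations concrete, here is a plain-Python stand-in that keeps only rows for the wanted station codes and builds a timestamp from the year, month, day and hour fields. The column order is an assumption based on the column list above, `rows_for_stations` is a hypothetical name, and the two demo rows are made up.

```python
import csv
import io
from datetime import datetime


def rows_for_stations(csv_file, station_codes):
    """Yield (timestamp, row) for rows whose first field is a wanted station code.

    Assumed column order, following the column list above:
    station code, station name, year, month, day, hour, then measurements.
    """
    for row in csv.reader(csv_file):
        if not row or row[0] not in station_codes:
            continue
        year, month, day, hour = (int(v) for v in row[2:6])
        yield datetime(year, month, day, hour), row


# Two made-up rows in the assumed layout; only the first is a Krakow station.
demo = io.StringIO(
    "350190566,KRAKÓW-BALICE,2005,1,1,0,2.3\n"
    "999999999,ELSEWHERE,2005,1,1,0,5.0\n"
)
kept = list(rows_for_stations(demo, {"350190566"}))
print(kept[0][0], kept[0][1][1])  # 2005-01-01 00:00:00 KRAKÓW-BALICE
```

Station codes are compared as strings here, matching the quoted codes in `sms_codes`; with pandas the same comparison depends on whether the code column was read as text or as integers.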

In [37]:
df.shape

(166536, 107)

In [38]:
df.head()

Datetime,Kod stacji,Nazwa stacji,year,month,day,hour,Wysokość podstawy chmur CL CM szyfrowana [kod],Status pomiaru HPOD,Wysokość podstawy niższej [m],Status pomiaru HPON,Wysokość podstawy wyższej [m],Status pomiaru HPOW,Wysokość podstawy tekstowy [opis],Pomiar przyrzadem 1 (niższa) [P],Pomiar przyrzadem 2 (wyższa) [P],Widzialność [kod],Status pomiaru WID,Widzialność operatora [m],Status pomiaru WIDO,Widzialność automat [m],Status pomiaru WIDA,Zachmurzenie ogólne [oktanty],Status pomiaru NOG,Kierunek wiatru [°],Status pomiaru KRWR,Prędkość wiatru [m/s],Status pomiaru FWR,Poryw wiatru [m/s],Status pomiaru PORW,Temperatura powietrza [°C],Status pomiaru TEMP,Temperatura termometru zwilżonego [°C],Status pomiaru TTZW,Wskaźnik wentylacji [W/N],Wskaźnik lodu [L/W],Ciśnienie pary wodnej [hPa],Status pomiaru CPW,Wilgotność względna [%],Status pomiaru WLGW,Temperatura punktu rosy [°C],Status pomiaru TPTR,Ciśnienie na pozimie stacji [hPa],Status pomiaru PPPS,Ciśnienie na pozimie morza [hPa],Status pomiaru PPPM,Charakterystyka tendencji [kod],Wartość tendencji [wartość],Status pomiaru APP,Opad za 6 godzin [mm],Status pomiaru WO6G,Rodzaj opadu za 6 godzin [kod],Status pomiaru ROPT,Pogoda bieżąca [kod],Pogoda ubiegła [kod],Zachmurzenie niskie [oktanty],Status pomiaru CLCM,Chmury CL [kod],Status pomiaru CHCL,Chmury CL tekstem,Chmury CM [kod],Status pomiaru CHCM,Chmury CM tekstem,Chmury CH [kod],Status pomiaru CHCH,Chmury CH tekstem,Stan gruntu [kod],Status pomiaru SGRN,Niedosyt wilgotności [hPa],Status pomiaru DEFI,Usłonecznienie,Status pomiaru USLN,Wystąpienie rosy [0/1],Status pomiaru ROSW,Poryw maksymalny za okres WW [m/s],Status pomiaru PORK,Godzina wystąpienia porywu,Minuta wystąpienia porywu,Temperatura gruntu -5 [°C],Status pomiaru TG05,Temperatura gruntu -10 [°C],Status pomiaru TG10,Temperatura gruntu -20 [°C],Status pomiaru TG20,Temperatura gruntu -50 [°C],Status pomiaru TG50,Temperatura gruntu -100 [°C],Status pomiaru TG100,Temperatura minimalna za 12 godzin [°C],Status pomiaru TMIN,Temperatura maksymalna za 12 godzin [°C],Status pomiaru TMAX,Temperatura minimalna przy gruncie za 12 godzin [°C],Status pomiaru TGMI,Równoważnik wodny śniegu [mm/cm],Status pomiaru RWSN,Wysokość pokrywy śnieżnej [cm],Status pomiaru PKSN,Wysokość świeżo spadłego śniegu [cm],Status pomiaru HSS,Wysokość śniegu na poletku [cm],Status pomiaru GRSN,Gatunek śniegu [kod],Ukształtowanie pokrywy [kod],Wysokość próbki [cm],Status pomiaru HPRO,Ciężar próbki [g],Status pomiaru CIPR
2005-01-01 00:00:00,350190566,KRAKÓW-BALICE,2005,1,1,0,5,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,7,,210,,1,,0,,2.3,,1.5,,W,W,6.3,,87,,0.3,,993.6,,1023.4,,7,-0.1,,0.0,9.0,0,9.0,10,2,7,,5,,,/,,,/,,,1,,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 01:00:00,350190566,KRAKÓW-BALICE,2005,1,1,1,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,230,,2,,0,,2.0,,1.1,,W,W,6.0,,85,,-0.2,,993.2,,1023.0,,8,-0.4,,0.0,8.0,0,8.0,10,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 02:00:00,350190566,KRAKÓW-BALICE,2005,1,1,2,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,220,,3,,0,,2.0,,1.1,,W,W,6.0,,85,,-0.2,,992.5,,1022.3,,8,-1.2,,0.0,8.0,0,8.0,10,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 03:00:00,350190566,KRAKÓW-BALICE,2005,1,1,3,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,210,,2,,0,,2.4,,1.7,,W,W,6.4,,89,,0.7,,992.6,,1022.3,,5,-1.0,,0.0,8.0,0,8.0,10,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 04:00:00,350190566,KRAKÓW-BALICE,2005,1,1,4,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,220,,2,,0,,2.4,,1.8,,W,W,6.5,,90,,1.0,,992.3,,1022.0,,7,-0.9,,0.0,8.0,0,8.0,60,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0


In [39]:
df.tail()

Datetime,Kod stacji,Nazwa stacji,year,month,day,hour,Wysokość podstawy chmur CL CM szyfrowana [kod],Status pomiaru HPOD,Wysokość podstawy niższej [m],Status pomiaru HPON,Wysokość podstawy wyższej [m],Status pomiaru HPOW,Wysokość podstawy tekstowy [opis],Pomiar przyrzadem 1 (niższa) [P],Pomiar przyrzadem 2 (wyższa) [P],Widzialność [kod],Status pomiaru WID,Widzialność operatora [m],Status pomiaru WIDO,Widzialność automat [m],Status pomiaru WIDA,Zachmurzenie ogólne [oktanty],Status pomiaru NOG,Kierunek wiatru [°],Status pomiaru KRWR,Prędkość wiatru [m/s],Status pomiaru FWR,Poryw wiatru [m/s],Status pomiaru PORW,Temperatura powietrza [°C],Status pomiaru TEMP,Temperatura termometru zwilżonego [°C],Status pomiaru TTZW,Wskaźnik wentylacji [W/N],Wskaźnik lodu [L/W],Ciśnienie pary wodnej [hPa],Status pomiaru CPW,Wilgotność względna [%],Status pomiaru WLGW,Temperatura punktu rosy [°C],Status pomiaru TPTR,Ciśnienie na pozimie stacji [hPa],Status pomiaru PPPS,Ciśnienie na pozimie morza [hPa],Status pomiaru PPPM,Charakterystyka tendencji [kod],Wartość tendencji [wartość],Status pomiaru APP,Opad za 6 godzin [mm],Status pomiaru WO6G,Rodzaj opadu za 6 godzin [kod],Status pomiaru ROPT,Pogoda bieżąca [kod],Pogoda ubiegła [kod],Zachmurzenie niskie [oktanty],Status pomiaru CLCM,Chmury CL [kod],Status pomiaru CHCL,Chmury CL tekstem,Chmury CM [kod],Status pomiaru CHCM,Chmury CM tekstem,Chmury CH [kod],Status pomiaru CHCH,Chmury CH tekstem,Stan gruntu [kod],Status pomiaru SGRN,Niedosyt wilgotności [hPa],Status pomiaru DEFI,Usłonecznienie,Status pomiaru USLN,Wystąpienie rosy [0/1],Status pomiaru ROSW,Poryw maksymalny za okres WW [m/s],Status pomiaru PORK,Godzina wystąpienia porywu,Minuta wystąpienia porywu,Temperatura gruntu -5 [°C],Status pomiaru TG05,Temperatura gruntu -10 [°C],Status pomiaru TG10,Temperatura gruntu -20 [°C],Status pomiaru TG20,Temperatura gruntu -50 [°C],Status pomiaru TG50,Temperatura gruntu -100 [°C],Status pomiaru TG100,Temperatura minimalna za 12 godzin [°C],Status pomiaru TMIN,Temperatura maksymalna za 12 godzin [°C],Status pomiaru TMAX,Temperatura minimalna przy gruncie za 12 godzin [°C],Status pomiaru TGMI,Równoważnik wodny śniegu [mm/cm],Status pomiaru RWSN,Wysokość pokrywy śnieżnej [cm],Status pomiaru PKSN,Wysokość świeżo spadłego śniegu [cm],Status pomiaru HSS,Wysokość śniegu na poletku [cm],Status pomiaru GRSN,Gatunek śniegu [kod],Ukształtowanie pokrywy [kod],Wysokość próbki [cm],Status pomiaru HPRO,Ciężar próbki [g],Status pomiaru CIPR
2008-12-31 19:00:00,350190566,KRAKÓW-BALICE,2008,12,31,19,9,,0,8.0,0,8.0,,,,4,,0,8.0,0,8.0,0,,0,,0,,0,,-6.2,,0.0,8.0,U,,3.5,,92,,-7.3,,996.4,,1027.2,,7,-1.6,,0.0,8.0,0,8.0,10,0,0,,0,,,0,,,0,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2008-12-31 20:00:00,350190566,KRAKÓW-BALICE,2008,12,31,20,9,,0,8.0,0,8.0,,,,4,,0,8.0,0,8.0,0,,200,,1,,0,,-7.3,,0.0,8.0,U,,3.3,,94,,-8.1,,996.1,,1027.1,,6,-1.4,,0.0,8.0,0,8.0,10,0,0,,0,,,0,,,0,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2008-12-31 21:00:00,350190566,KRAKÓW-BALICE,2008,12,31,21,9,,0,8.0,0,8.0,,,,4,,0,8.0,0,8.0,0,,0,,0,,0,,-7.2,,0.0,8.0,U,,3.4,,94,,-8.0,,995.4,,1026.3,,7,-1.5,,0.0,8.0,0,8.0,10,0,0,,0,,,0,,,0,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2008-12-31 22:00:00,350190566,KRAKÓW-BALICE,2008,12,31,22,9,,0,8.0,0,8.0,,,,4,,0,8.0,0,8.0,0,,230,,1,,0,,-8.4,,0.0,8.0,U,,3.1,,94,,-9.2,,995.3,,1026.4,,7,-1.1,,0.0,8.0,0,8.0,10,0,0,,0,,,0,,,0,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2008-12-31 23:00:00,350190566,KRAKÓW-BALICE,2008,12,31,23,9,,0,8.0,0,8.0,,,,4,,0,8.0,0,8.0,0,,240,,1,,0,,-8.6,,0.0,8.0,U,,3.0,,94,,-9.4,,994.8,,1025.9,,7,-1.3,,0.0,8.0,0,8.0,10,0,0,,0,,,0,,,0,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0


In [40]:
df.sample(5)

Unnamed: 0_level_0,Kod stacji,Nazwa stacji,year,month,day,hour,Wysokość podstawy chmur CL CM szyfrowana [kod],Status pomiaru HPOD,Wysokość podstawy niższej [m],Status pomiaru HPON,Wysokość podstawy wyższej [m],Status pomiaru HPOW,Wysokość podstawy tekstowy [opis],Pomiar przyrzadem 1 (niższa) [P],Pomiar przyrzadem 2 (wyższa) [P],Widzialność [kod],Status pomiaru WID,Widzialność operatora [m],Status pomiaru WIDO,Widzialność automat [m],Status pomiaru WIDA,Zachmurzenie ogólne [oktanty],Status pomiaru NOG,Kierunek wiatru [°],Status pomiaru KRWR,Prędkość wiatru [m/s],Status pomiaru FWR,Poryw wiatru [m/s],Status pomiaru PORW,Temperatura powietrza [°C],Status pomiaru TEMP,Temperatura termometru zwilżonego [°C],Status pomiaru TTZW,Wskaźnik wentylacji [W/N],Wskaźnik lodu [L/W],Ciśnienie pary wodnej [hPa],Status pomiaru CPW,Wilgotność względna [%],Status pomiaru WLGW,Temperatura punktu rosy [°C],Status pomiaru TPTR,Ciśnienie na pozimie stacji [hPa],Status pomiaru PPPS,Ciśnienie na pozimie morza [hPa],Status pomiaru PPPM,Charakterystyka tendencji [kod],Wartość tendencji [wartość],Status pomiaru APP,Opad za 6 godzin [mm],Status pomiaru WO6G,Rodzaj opadu za 6 godzin [kod],Status pomiaru ROPT,Pogoda bieżąca [kod],Pogoda ubiegła [kod],Zachmurzenie niskie [oktanty],Status pomiaru CLCM,Chmury CL [kod],Status pomiaru CHCL,Chmury CL tekstem,Chmury CM [kod],Status pomiaru CHCM,Chmury CM tekstem,Chmury CH [kod],Status pomiaru CHCH,Chmury CH tekstem,Stan gruntu [kod],Status pomiaru SGRN,Niedosyt wilgotności [hPa],Status pomiaru DEFI,Usłonecznienie,Status pomiaru USLN,Wystąpienie rosy [0/1],Status pomiaru ROSW,Poryw maksymalny za okres WW [m/s],Status pomiaru PORK,Godzina wystąpienia porywu,Minuta wystąpienia porywu,Temperatura gruntu -5 [°C],Status pomiaru TG05,Temperatura gruntu -10 [°C],Status pomiaru TG10,Temperatura gruntu -20 [°C],Status pomiaru TG20,Temperatura gruntu -50 [°C],Status pomiaru TG50,Temperatura gruntu -100 [°C],Status pomiaru TG100,Temperatura minimalna za 12 godzin [°C],Status pomiaru TMIN,Temperatura maksymalna za 12 godzin [°C],Status pomiaru TMAX,Temperatura minimalna przy gruncie za 12 godzin [°C],Status pomiaru TGMI,Równoważnik wodny śniegu [mm/cm],Status pomiaru RWSN,Wysokość pokrywy śnieżnej [cm],Status pomiaru PKSN,Wysokość świeżo spadłego śniegu [cm],Status pomiaru HSS,Wysokość śniegu na poletku [cm],Status pomiaru GRSN,Gatunek śniegu [kod],Ukształtowanie pokrywy [kod],Wysokość próbki [cm],Status pomiaru HPRO,Ciężar próbki [g],Status pomiaru CIPR
Datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 
100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1
2015-08-29 02:00:00,350190566,KRAKÓW-BALICE,2015,8,29,2,7,,1600,,0,8.0,1600,,,8,,30000,,0,,7,,65,,2,,0,9.0,20.1,,0.0,8.0,U,,19.0,,81,,16.7,,993.8,,1022.1,,1,1.4,,0.0,8.0,0,8.0,3,2,7,,5,,Sc str pe,/,,/,/,,/,0,8.0,4.5,,0.0,8.0,0.0,,0,9.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2011-08-18 19:00:00,350190566,KRAKÓW-BALICE,2011,8,18,19,9,,0,8.0,0,8.0,>2500,,,8,,20000,,0,,1,,0,,0,,0,9.0,20.5,,0.0,8.0,U,,19.5,,81,,17.1,,989.3,,1017.5,,3,0.5,,0.0,8.0,0,8.0,3,0,1,,0,,.,3,,Ac str pe,0,,.,0,8.0,4.6,,0.0,8.0,,8.0,0,9.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2010-03-19 14:00:00,350190566,KRAKÓW-BALICE,2010,3,19,14,9,,0,8.0,0,8.0,,,,8,,0,8.0,0,8.0,7,,240,,6,,0,,14.5,,0.0,8.0,U,,6.3,,38,,0.3,,991.7,,1020.1,,6,-1.0,,0.0,8.0,0,8.0,2,2,0,,0,,,0,,,8,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2008-08-29 10:00:00,350190566,KRAKÓW-BALICE,2008,8,29,10,5,,0,8.0,0,8.0,,,,7,,0,8.0,0,8.0,7,,260,,7,,0,,19.8,,0.0,8.0,U,,15.4,,67,,13.5,,986.3,,1014.1,,8,-0.4,,0.0,8.0,0,8.0,1,2,3,,5,,,3,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2013-05-08 12:00:00,350190566,KRAKÓW-BALICE,2013,5,8,12,6,,1000,,0,8.0,1000,,,8,,30000,,0,,2,,44,,4,,0,9.0,25.2,,0.0,8.0,U,,17.3,,54,,15.2,,987.5,,1015.2,,8,-1.4,,0.0,9.0,0,9.0,3,4,1,,2,,Cu con,0,,.,2,,Ci spi2,1,,14.7,,0.0,8.0,,8.0,0,9.0,,,15.7,,14.2,,12.8,,11.2,,9.0,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0


In [41]:
# Create the save directory if it does not exist
save_dir = '/Users/ksatola/Documents/git/air-polution/data/final'
Path(save_dir).mkdir(parents=True, exist_ok=True)

In [42]:
# Save the combined dataset to CSV, keeping the Datetime index as the first column
imgw_all_file = Path(save_dir) / 'imgw_all.csv'
df.to_csv(imgw_all_file, encoding="utf-8", index=True)

In [43]:
# Test read: load the saved file back to confirm it is readable

# Without low_memory=False, pandas emits a DtypeWarning here:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.errors.DtypeWarning.html
#   DtypeWarning: Columns (6,12,13,14,34,52,53,54,56,58,59,61,62,64,101,102) have mixed types.
#   Specify dtype option on import or set low_memory=False.

df_read = pd.read_csv(imgw_all_file, encoding='utf-8', sep=",", index_col="Datetime", low_memory=False)
df_read.head()

Datetime,Kod stacji,Nazwa stacji,year,month,day,hour,Wysokość podstawy chmur CL CM szyfrowana [kod],Status pomiaru HPOD,Wysokość podstawy niższej [m],Status pomiaru HPON,Wysokość podstawy wyższej [m],Status pomiaru HPOW,Wysokość podstawy tekstowy [opis],Pomiar przyrzadem 1 (niższa) [P],Pomiar przyrzadem 2 (wyższa) [P],Widzialność [kod],Status pomiaru WID,Widzialność operatora [m],Status pomiaru WIDO,Widzialność automat [m],Status pomiaru WIDA,Zachmurzenie ogólne [oktanty],Status pomiaru NOG,Kierunek wiatru [°],Status pomiaru KRWR,Prędkość wiatru [m/s],Status pomiaru FWR,Poryw wiatru [m/s],Status pomiaru PORW,Temperatura powietrza [°C],Status pomiaru TEMP,Temperatura termometru zwilżonego [°C],Status pomiaru TTZW,Wskaźnik wentylacji [W/N],Wskaźnik lodu [L/W],Ciśnienie pary wodnej [hPa],Status pomiaru CPW,Wilgotność względna [%],Status pomiaru WLGW,Temperatura punktu rosy [°C],Status pomiaru TPTR,Ciśnienie na pozimie stacji [hPa],Status pomiaru PPPS,Ciśnienie na pozimie morza [hPa],Status pomiaru PPPM,Charakterystyka tendencji [kod],Wartość tendencji [wartość],Status pomiaru APP,Opad za 6 godzin [mm],Status pomiaru WO6G,Rodzaj opadu za 6 godzin [kod],Status pomiaru ROPT,Pogoda bieżąca [kod],Pogoda ubiegła [kod],Zachmurzenie niskie [oktanty],Status pomiaru CLCM,Chmury CL [kod],Status pomiaru CHCL,Chmury CL tekstem,Chmury CM [kod],Status pomiaru CHCM,Chmury CM tekstem,Chmury CH [kod],Status pomiaru CHCH,Chmury CH tekstem,Stan gruntu [kod],Status pomiaru SGRN,Niedosyt wilgotności [hPa],Status pomiaru DEFI,Usłonecznienie,Status pomiaru USLN,Wystąpienie rosy [0/1],Status pomiaru ROSW,Poryw maksymalny za okres WW [m/s],Status pomiaru PORK,Godzina wystąpienia porywu,Minuta wystąpienia porywu,Temperatura gruntu -5 [°C],Status pomiaru TG05,Temperatura gruntu -10 [°C],Status pomiaru TG10,Temperatura gruntu -20 [°C],Status pomiaru TG20,Temperatura gruntu -50 [°C],Status pomiaru TG50,Temperatura gruntu -100 [°C],Status pomiaru TG100,Temperatura minimalna za 12 godzin [°C],Status pomiaru TMIN,Temperatura maksymalna za 12 godzin [°C],Status pomiaru TMAX,Temperatura minimalna przy gruncie za 12 godzin [°C],Status pomiaru TGMI,Równoważnik wodny śniegu [mm/cm],Status pomiaru RWSN,Wysokość pokrywy śnieżnej [cm],Status pomiaru PKSN,Wysokość świeżo spadłego śniegu [cm],Status pomiaru HSS,Wysokość śniegu na poletku [cm],Status pomiaru GRSN,Gatunek śniegu [kod],Ukształtowanie pokrywy [kod],Wysokość próbki [cm],Status pomiaru HPRO,Ciężar próbki [g],Status pomiaru CIPR
2005-01-01 00:00:00,350190566,KRAKÓW-BALICE,2005,1,1,0,5,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,7,,210,,1,,0,,2.3,,1.5,,W,W,6.3,,87,,0.3,,993.6,,1023.4,,7,-0.1,,0.0,9.0,0,9.0,10,2,7,,5,,,/,,,/,,,1,,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 01:00:00,350190566,KRAKÓW-BALICE,2005,1,1,1,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,230,,2,,0,,2.0,,1.1,,W,W,6.0,,85,,-0.2,,993.2,,1023.0,,8,-0.4,,0.0,8.0,0,8.0,10,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 02:00:00,350190566,KRAKÓW-BALICE,2005,1,1,2,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,220,,3,,0,,2.0,,1.1,,W,W,6.0,,85,,-0.2,,992.5,,1022.3,,8,-1.2,,0.0,8.0,0,8.0,10,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 03:00:00,350190566,KRAKÓW-BALICE,2005,1,1,3,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,210,,2,,0,,2.4,,1.7,,W,W,6.4,,89,,0.7,,992.6,,1022.3,,5,-1.0,,0.0,8.0,0,8.0,10,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
2005-01-01 04:00:00,350190566,KRAKÓW-BALICE,2005,1,1,4,4,,0,8.0,0,8.0,,,,6,,0,8.0,0,8.0,8,,220,,2,,0,,2.4,,1.8,,W,W,6.5,,90,,1.0,,992.3,,1022.0,,7,-0.9,,0.0,8.0,0,8.0,60,2,7,,5,,,/,,,/,,,0,8.0,0.0,8.0,0.0,8.0,,8.0,0,8.0,,,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0.0,8.0,0,8.0,0,8.0,0,8.0,,,0,8.0,0,8.0
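
The `DtypeWarning` quoted in the test-read cell offers two remedies: `low_memory=False`, which the notebook uses, or an explicit `dtype` mapping. The sketch below illustrates the second option on a tiny in-memory stand-in for `imgw_all.csv`. The choice of `Chmury CL tekstem` as a text column and the name `text_columns` are assumptions made for illustration only; the actual mixed-type columns are the ones listed by position in the warning.

```python
import io

import pandas as pd

# Small stand-in for imgw_all.csv with one numeric and one free-text column.
# This is illustrative data, not the real file.
csv_text = (
    "Datetime,Kod stacji,Chmury CL tekstem,Temperatura powietrza [°C]\n"
    "2005-01-01 00:00:00,350190566,,2.3\n"
    "2005-01-01 01:00:00,350190566,Sc str pe,2.0\n"
)

# Hypothetical mapping: declare free-text columns as strings up front,
# so pandas does not have to infer their type chunk by chunk.
text_columns = {"Chmury CL tekstem": str}

df_typed = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    index_col="Datetime",
    dtype=text_columns,
)

print(df_typed.dtypes)
```

Declaring types explicitly keeps the default chunked parsing while making the result deterministic; `low_memory=False` is simpler when the set of affected columns is not known in advance.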


In [44]:
df_read.shape

(166536, 107)

In [45]:
# Sanity check: the CSV round trip preserves the number of columns
assert df.shape[1] == df_read.shape[1]
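
The column-count assertion can be extended into a slightly stricter round-trip check. The sketch below is a minimal illustration on a two-row frame written to a temporary directory, not on the real dataset; `df_small`, `df_back` and the file name `imgw_small.csv` are hypothetical. It also passes `parse_dates=True`, an addition relative to the notebook's read call, so that the `Datetime` index comes back as timestamps rather than strings.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two-row stand-in for the final dataset, indexed by timestamp.
df_small = pd.DataFrame(
    {
        "Kod stacji": [350190566, 350190566],
        "Temperatura powietrza [°C]": [2.3, 2.0],
    },
    index=pd.DatetimeIndex(
        ["2005-01-01 00:00:00", "2005-01-01 01:00:00"], name="Datetime"
    ),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "imgw_small.csv"
    # Write with the index, then read it back as the index column.
    df_small.to_csv(csv_path, encoding="utf-8", index=True)
    df_back = pd.read_csv(
        csv_path, encoding="utf-8", index_col="Datetime", parse_dates=True
    )

# Compare full shape, column order, and index values, not only column count.
assert df_back.shape == df_small.shape
assert list(df_back.columns) == list(df_small.columns)
assert list(df_back.index) == list(df_small.index)
```

Comparing the full shape and the index values catches dropped rows or a misparsed index, which a column-count check alone would miss.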