# Heatmap visualization of shift in seasons for Budapest

With this notebook, I illustrate shift in seasons for the Hungarian captial, Budapest.<br>
So let's get started!<br>
As a start, I import the needed libraries.

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis

#libraries for visualization
import seaborn as sb
import matplotlib.pyplot as plt
%matplotlib

import re # reqexp will be needed for some string manipulations

Using matplotlib backend: Qt5Agg


For the analysis, I will be using data available for the public at the Országos Meteorológiai Szolgálat (OMSZ) website.<br>
OMSZ stands for Countrywide Meteorolgy Service and in particular, I will be using their dataset called 'BP_M_tx.txt':<br><br>
https://www.met.hu/eghajlat/magyarorszag_eghajlata/eghajlati_adatsorok/Budapest/adatok/havi_adatok/
https://www.met.hu/downloads.php?id=12&file=eghajlati_adatsor_1901-2019+&no=Budapest
<br><br>
In a few words, the dataset contains historical data for Budapest monthly maximum temperatures, back to 1901, January.

Here are some sample rows:<br>
![Sample rows](https://raw.githubusercontent.com/pszakos/heatchart_shift_in_seasons/master/md_dtx_bp_sample.PNG)

The full description of of the dataset - in Hungarian - is the following:<br>
![Dataset description](https://raw.githubusercontent.com/pszakos/heatchart_shift_in_seasons/master/defintion_m_dtx_bp.PNG)

For the visualization, I will be using these columns:
- m_dtx0: number of winter days, when maximum temperautre is <= 0 Celsius
- m_dtx25: number of summer days, when maximum temperature is >= 25 Celsius
- m_dtx30: number of heat days, when maximum temperature is >= 30 Celsius
- m_dtx35: number of hot days, when maximum temperature is >= 35 Celsius

Let's read 'BP_M_tx.txt', into a dataframe:

In [2]:
df = pd.read_csv('SZO_M_tx.txt', delimiter=";")
df

Unnamed: 0,#datum,m_txx,m_txxd,m_txa,m_dtx0,m_dtx25,m_dtx30,m_dtx35
0,1901-01,9.0,1901-01-23,-1.6,17,0,0,0
1,1901-02,5.8,1901-02-05,0.7,12,0,0,0
2,1901-03,18.0,1901-03-19,9.0,0,0,0,0
3,1901-04,24.0,1901-04-09,16.5,0,0,0,0
4,1901-05,29.0,1901-05-31,21.8,0,4,0,0
...,...,...,...,...,...,...,...,...
1423,2019-08,32.0,2019-08-12,28.0,0,26,9,0
1424,2019-09,31.7,2019-09-01,22.5,0,5,1,0
1425,2019-10,26.6,2019-10-21,18.4,0,1,0,0
1426,2019-11,19.9,2019-11-03,10.8,0,0,0,0


For the visualization, I will be using 'm_dtx0' and 'm_dtx25', let's give them a more descriptive column name and drop unnecessary columns.

In [3]:
df["winter days"] = df["m_dtx0"]
df["summer days"] = df["m_dtx25"]
df = df.drop(df.columns[1:8], axis=1)
df

Unnamed: 0,#datum,winter days,summer days
0,1901-01,17,0
1,1901-02,12,0
2,1901-03,0,0
3,1901-04,0,0
4,1901-05,0,4
...,...,...,...
1423,2019-08,0,26
1424,2019-09,0,5
1425,2019-10,0,1
1426,2019-11,0,0


We will need to split years and columns into different columns.

In [4]:
df[["Year", "#Month"]] = df["#datum"].str.split("-", expand = True)
df

Unnamed: 0,#datum,winter days,summer days,Year,#Month
0,1901-01,17,0,1901,01
1,1901-02,12,0,1901,02
2,1901-03,0,0,1901,03
3,1901-04,0,0,1901,04
4,1901-05,0,4,1901,05
...,...,...,...,...,...
1423,2019-08,0,26,2019,08
1424,2019-09,0,5,2019,09
1425,2019-10,0,1,2019,10
1426,2019-11,0,0,2019,11


Create month names column

In [5]:
df["Month"] = df["#Month"]

df["Month"]= df["Month"].str.replace("01","Jan")
df["Month"]= df["Month"].str.replace("02","Feb")
df["Month"]= df["Month"].str.replace("03","Mar")
df["Month"]= df["Month"].str.replace("04","Apr")
df["Month"]= df["Month"].str.replace("05","May")
df["Month"]= df["Month"].str.replace("06","Jun")
df["Month"]= df["Month"].str.replace("07","Jul")
df["Month"]= df["Month"].str.replace("08","Aug")
df["Month"]= df["Month"].str.replace("09","Sep")
df["Month"]= df["Month"].str.replace("10","Oct")
df["Month"]= df["Month"].str.replace("11","Nov")
df["Month"]= df["Month"].str.replace("12","Dec")

df["Year"] = df["Year"].astype(int)
df["#Month"] = df["#Month"].astype(int)
df

Unnamed: 0,#datum,winter days,summer days,Year,#Month,Month
0,1901-01,17,0,1901,1,Jan
1,1901-02,12,0,1901,2,Feb
2,1901-03,0,0,1901,3,Mar
3,1901-04,0,0,1901,4,Apr
4,1901-05,0,4,1901,5,May
...,...,...,...,...,...,...
1423,2019-08,0,26,2019,8,Aug
1424,2019-09,0,5,2019,9,Sep
1425,2019-10,0,1,2019,10,Oct
1426,2019-11,0,0,2019,11,Nov


We will create year bins for grouping the dataset into 10-year groups.<br>
There will be 12 year bins overall.

In [6]:
bins = np.array([*range(1910,2030,10)])
labels = bins.astype(str)
df["Year bin"] = np.digitize(df["Year"], bins)
df

Unnamed: 0,#datum,winter days,summer days,Year,#Month,Month,Year bin
0,1901-01,17,0,1901,1,Jan,0
1,1901-02,12,0,1901,2,Feb,0
2,1901-03,0,0,1901,3,Mar,0
3,1901-04,0,0,1901,4,Apr,0
4,1901-05,0,4,1901,5,May,0
...,...,...,...,...,...,...,...
1423,2019-08,0,26,2019,8,Aug,11
1424,2019-09,0,5,2019,9,Sep,11
1425,2019-10,0,1,2019,10,Oct,11
1426,2019-11,0,0,2019,11,Nov,11


We create the labels for the bins.

In [7]:
df["Year bin label"] = labels[df["Year bin"]]
df

Unnamed: 0,#datum,winter days,summer days,Year,#Month,Month,Year bin,Year bin label
0,1901-01,17,0,1901,1,Jan,0,1910
1,1901-02,12,0,1901,2,Feb,0,1910
2,1901-03,0,0,1901,3,Mar,0,1910
3,1901-04,0,0,1901,4,Apr,0,1910
4,1901-05,0,4,1901,5,May,0,1910
...,...,...,...,...,...,...,...,...
1423,2019-08,0,26,2019,8,Aug,11,2020
1424,2019-09,0,5,2019,9,Sep,11,2020
1425,2019-10,0,1,2019,10,Oct,11,2020
1426,2019-11,0,0,2019,11,Nov,11,2020


Now we can start transforming the dataset for the heatchart.<br>
We transponse the month labels to indicatior columns with onehot coding.

In [8]:
onehot = pd.get_dummies(df["Month"])
df[['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']] = \
onehot[['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']]
df

Unnamed: 0,#datum,winter days,summer days,Year,#Month,Month,Year bin,Year bin label,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
0,1901-01,17,0,1901,1,Jan,0,1910,1,0,0,0,0,0,0,0,0,0,0,0
1,1901-02,12,0,1901,2,Feb,0,1910,0,1,0,0,0,0,0,0,0,0,0,0
2,1901-03,0,0,1901,3,Mar,0,1910,0,0,1,0,0,0,0,0,0,0,0,0
3,1901-04,0,0,1901,4,Apr,0,1910,0,0,0,1,0,0,0,0,0,0,0,0
4,1901-05,0,4,1901,5,May,0,1910,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1423,2019-08,0,26,2019,8,Aug,11,2020,0,0,0,0,0,0,0,1,0,0,0,0
1424,2019-09,0,5,2019,9,Sep,11,2020,0,0,0,0,0,0,0,0,1,0,0,0
1425,2019-10,0,1,2019,10,Oct,11,2020,0,0,0,0,0,0,0,0,0,1,0,0
1426,2019-11,0,0,2019,11,Nov,11,2020,0,0,0,0,0,0,0,0,0,0,1,0


We multiply the number of summer/winter days with the month indicator columns.<br>
Please note, we multiply the winter days with -1, so these will be presented on the chart with the negative scale.

In [9]:
for idx in range(8, 20):
    df[df.columns[idx]] = 0 + df[df.columns[idx]]*df["winter days"]*-1 + df[df.columns[idx]]*df["summer days"]
df

Unnamed: 0,#datum,winter days,summer days,Year,#Month,Month,Year bin,Year bin label,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
0,1901-01,17,0,1901,1,Jan,0,1910,-17,0,0,0,0,0,0,0,0,0,0,0
1,1901-02,12,0,1901,2,Feb,0,1910,0,-12,0,0,0,0,0,0,0,0,0,0
2,1901-03,0,0,1901,3,Mar,0,1910,0,0,0,0,0,0,0,0,0,0,0,0
3,1901-04,0,0,1901,4,Apr,0,1910,0,0,0,0,0,0,0,0,0,0,0,0
4,1901-05,0,4,1901,5,May,0,1910,0,0,0,0,4,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1423,2019-08,0,26,2019,8,Aug,11,2020,0,0,0,0,0,0,0,26,0,0,0,0
1424,2019-09,0,5,2019,9,Sep,11,2020,0,0,0,0,0,0,0,0,5,0,0,0
1425,2019-10,0,1,2019,10,Oct,11,2020,0,0,0,0,0,0,0,0,0,1,0,0
1426,2019-11,0,0,2019,11,Nov,11,2020,0,0,0,0,0,0,0,0,0,0,0,0


We drop the unused columns.

In [10]:
df = df.drop(columns = ["Year","Year bin", "#datum", "winter days", "summer days", "#Month", "Month"])
df

Unnamed: 0,Year bin label,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
0,1910,-17,0,0,0,0,0,0,0,0,0,0,0
1,1910,0,-12,0,0,0,0,0,0,0,0,0,0
2,1910,0,0,0,0,0,0,0,0,0,0,0,0
3,1910,0,0,0,0,0,0,0,0,0,0,0,0
4,1910,0,0,0,0,4,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1423,2020,0,0,0,0,0,0,0,26,0,0,0,0
1424,2020,0,0,0,0,0,0,0,0,5,0,0,0
1425,2020,0,0,0,0,0,0,0,0,0,1,0,0
1426,2020,0,0,0,0,0,0,0,0,0,0,0,0


We group the dataset using the 10 year bins. <br>
Please note, this way we have all months' data for the same bin in a single row.

In [11]:
#we divide by 10, so we will have the average number of winter/summer days per month
grouped_df = df.groupby("Year bin label").sum()/10
grouped_df

Unnamed: 0_level_0,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
Year bin label,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1910,-10.8,-4.3,-0.1,0.4,5.4,14.2,18.1,14.8,3.9,0.0,-1.5,-4.8
1920,-10.0,-7.0,-0.2,0.4,4.2,12.7,15.7,15.8,4.7,0.0,-1.8,-3.0
1930,-9.4,-7.0,-0.6,0.2,6.2,8.6,18.0,14.4,5.4,0.3,-2.5,-9.5
1940,-10.8,-4.0,-1.0,1.1,3.8,13.7,20.8,15.9,6.3,0.3,-0.7,-9.2
1950,-16.9,-8.0,-1.0,1.2,6.4,11.5,18.6,18.7,9.4,0.7,-0.5,-8.1
1960,-10.6,-8.4,-2.0,0.5,4.8,11.2,18.7,16.5,6.7,0.3,-0.7,-5.2
1970,-14.1,-5.6,-1.6,1.2,4.2,12.7,15.7,13.5,4.9,0.1,-1.5,-11.6
1980,-9.1,-3.2,-1.1,0.3,5.0,10.5,16.7,13.9,5.3,0.2,-0.9,-6.8
1990,-11.6,-6.7,-1.4,0.4,3.7,9.3,20.0,17.2,6.2,0.2,-1.4,-5.8
2000,-9.4,-3.6,-0.2,0.7,6.6,12.6,21.4,21.2,6.0,0.5,-2.0,-8.4


Now, let's reverse the row order, so we have the years in descending order.

In [12]:
grouped_df = grouped_df[::-1]
grouped_df

Unnamed: 0_level_0,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
Year bin label,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
2020,-8.3,-4.3,-0.5,2.0,6.0,18.4,24.2,21.6,7.6,0.8,-0.9,-4.2
2010,-9.7,-2.6,-0.8,0.7,8.4,17.3,21.8,20.7,6.3,0.7,-0.3,-7.2
2000,-9.4,-3.6,-0.2,0.7,6.6,12.6,21.4,21.2,6.0,0.5,-2.0,-8.4
1990,-11.6,-6.7,-1.4,0.4,3.7,9.3,20.0,17.2,6.2,0.2,-1.4,-5.8
1980,-9.1,-3.2,-1.1,0.3,5.0,10.5,16.7,13.9,5.3,0.2,-0.9,-6.8
1970,-14.1,-5.6,-1.6,1.2,4.2,12.7,15.7,13.5,4.9,0.1,-1.5,-11.6
1960,-10.6,-8.4,-2.0,0.5,4.8,11.2,18.7,16.5,6.7,0.3,-0.7,-5.2
1950,-16.9,-8.0,-1.0,1.2,6.4,11.5,18.6,18.7,9.4,0.7,-0.5,-8.1
1940,-10.8,-4.0,-1.0,1.1,3.8,13.7,20.8,15.9,6.3,0.3,-0.7,-9.2
1930,-9.4,-7.0,-0.6,0.2,6.2,8.6,18.0,14.4,5.4,0.3,-2.5,-9.5


Now, we have the dataframe ready for the heatchart.<br>
Let's plot the chart.

In [13]:
heat_map = sb.heatmap(grouped_df, cmap = "coolwarm", center = 0, vmin = -20, vmax = 30)
plt.xlabel('Month')
plt.ylabel('Years')
plt.title('Average number of winter/summer days\n in last 10 years, Szombathely')
heat_map.set_yticklabels(heat_map.get_yticklabels(), rotation=0)

[Text(0, 0.5, '2020'),
 Text(0, 1.5, '2010'),
 Text(0, 2.5, '2000'),
 Text(0, 3.5, '1990'),
 Text(0, 4.5, '1980'),
 Text(0, 5.5, '1970'),
 Text(0, 6.5, '1960'),
 Text(0, 7.5, '1950'),
 Text(0, 8.5, '1940'),
 Text(0, 9.5, '1930'),
 Text(0, 10.5, '1920'),
 Text(0, 11.5, '1910')]

And with that, we are finished with this notebook.<br>
Thank you for watching!<br>
In case you are interested, you can find...