This file describes the contents of the file 
'stockholm_daily_mean_temperature_1756_2018.txt', which contains daily mean
temperatures at the Stockholm Old Astronomical Observatory (59.35N, 18.05E)
for 1756-2018.

Contact person:    

Anders Moberg    
Department of Physical Geography    
Stockholm University    
SE-10691 Stockholm, Sweden    
anders.moberg@natgeo.su.se    

2019-05-03    


References:    

Moberg A, Bergström H, Ruiz Krigsman J, Svanered O. 2002: Daily air temperature     
and pressure series for Stockholm (1756-1998). Climatic Change 53: 171-212    

Moberg A, Alexandersson H, Bergström H, Jones PD. 2003: Were Southern Swedish     
summer temperatures before 1860 as warm as measured?     
Int. J. Climatol. 23: 1495-1521    

Project:    

IMPROVE - Improved Understanding of Past Climatic Variability from Early Daily     
European Instrumental Sources. EU 4th Framework Programme. Contract    
ENV4-CT97-0511. 1998-1999. Co-ordinator: Dario Camuffo, Consiglio Nazionale    
delle Ricerche, Istituto di Scienze dell'Atmosfera e del Clima, Padova, Italy.    
PI at Stockholm University: Anders Moberg    

Original data sources:    

1756-1858: Archive of the Royal Swedish Academy of Sciences    
1859-now:  Swedish Meteorological and Hydrological Institute (SMHI)    

Data for 1756-1838 were digitized at the SMHI. Data for 1839-1960 were digitized    
within IMPROVE. Data for 1961-2012 were provided digitally by the SMHI.    
Data for 2013 and later are obtained from SMHI Öppna Data:    
http://opendata-download-metobs.smhi.se/explore/#    


--------------------------------------------------------------------------------    

Contents of the file 'stockholm_daily_mean_temperature_1756_2018.txt':    

Daily temperature data for Stockholm Old Astronomical Observatory     
1 Jan 1756 -  31 Dec 2018.    


column		data    

1-3          Year, month, day    
4            Daily average temperature according to observations.     
               Unit: C, Missing values: -999.0    
5            Daily average temperature after homogenization and    
               with gaps filled in using data from Uppsala.     
	       (see Moberg et al. 2002)    
6            Daily average temperatures after adjustment before September 1858     
	       for a supposed warm bias of May-August temperatures.    
	       (see Moberg et al. 2003)    
7            Data id no. meaning data from:    
               1=Stockholm    
               2=Uppsala (adjusted to represent Stockholm)
               3=Stockholm, automatic station (used from 2013 onwards)    
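
The missing-value code -999.0 in column 4 would distort any mean or plot if left in place. A minimal sketch of one way to mask it, assuming the data are already in a pandas DataFrame; the column name `daily_temp`, the helper `mask_missing`, and the sample rows are illustrative assumptions, not part of the data file:

```python
import numpy as np
import pandas as pd

MISSING = -999.0  # missing-value code stated in the column description

def mask_missing(df, columns=("daily_temp",)):
    """Return a copy of df with the -999.0 code replaced by NaN."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].replace(MISSING, np.nan)
    return out

# Made-up rows for illustration only, not values from the data file
sample = pd.DataFrame({"daily_temp": [-8.7, -999.0, 1.5],
                       "ID": [1, 2, 1]})
cleaned = mask_missing(sample)
```

The function works on a copy, so the original frame keeps the -999.0 code if it is needed later.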

Period       Typical number of observations per day,     
	     used in calculation of daily mean temp.
1757-1760    2    
1761-1858    3    
1859-1946    3 (& Tmin in summer)    
1947-2018    3 & Tmin & Tmax    

During 1756-1875 the thermometer was hung in the free air outside a north-facing    
window on the second floor of the old astronomical observatory building in    
Stockholm. No detailed description of this site is available.

During 1876-1960 the thermometer was placed outside a north-facing window    
on the first floor of the old astronomical observatory building in Stockholm.    
A window screen was in use from 1878.

From 1961 to summer 2006 the thermometer was placed in an SMHI-screen
(Stevenson-type screen) about ten metres north-east of the former position.

Since summer 2006, a platinum resistance thermometer in a modern cylindrical     
screen close to the SMHI-screen replaced the mercury thermometer in the    
SMHI-screen.    

From January 2013 onwards, the data used here are based on observations recorded    
at the automatic station, which is of the same type as the manually observed    
resistance thermometer.    

The following adjustments are made to the data column 5:    

17560403-17561031	The observed temperatures during this period are judged    
			to be too high, and of poor quality. They have been    
			replaced with data extrapolated from Uppsala through    
			linear regression.     

17630225-17630228	No observations were made.     
			Daily temperatures have been extrapolated from Uppsala.    

18190801-18250113	Correction by -0.7 C.     
			Incorrect thermometer probably used.    

18700111-20181231	Correction for urban heat island trend     
			and other inhomogeneities.    
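
Column 5 already contains these corrections, so nothing needs to be applied when reading the file. Purely as an illustration of what a constant correction over a date range looks like, here is a minimal sketch using the 1819-08-01 to 1825-01-13 offset of -0.7 C; the helper `apply_offset` and the sample values are hypothetical:

```python
import pandas as pd

def apply_offset(series, start, end, offset):
    """Return a copy with `offset` added to every value from start to end (inclusive)."""
    out = series.copy()
    out.loc[start:end] = out.loc[start:end] + offset
    return out

# Made-up values for illustration only, not values from the data file
idx = pd.date_range("1819-07-30", periods=4, freq="D")
raw = pd.Series([18.0, 18.5, 19.0, 19.5], index=idx)

# Only the days from 1819-08-01 onwards are shifted by -0.7 C
corrected = apply_offset(raw, "1819-08-01", "1825-01-13", -0.7)
```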

The following additional adjustments are made to the data column 6:    

1756-1858, May:  day 1-4:   0.0 C, day 5-9: -0.1 C, day 10-13: -0.2 C,     
		 day 14-18: -0.3 C, day 19-22: -0.4 C, day 23-27: -0.5 C,     
		 day 28-31: -0.6 C.     
           Jun:  day 1-30: -0.7 C    
           Jul:  day 1-31: -0.7 C    
           Aug:  day 1-4:  -0.6 C, day 5-9: -0.5 C, day 10-13: -0.4 C,     
		 day 14-18: -0.3 C, day 19-22: -0.2 C, day 23-27: -0.1 C,     
		 day 28-31: 0.0 C.    

This gives an average adjustment of -0.3 C for both May and August and -0.7 C for
June and July. This adjustment is in agreement with the conclusions drawn by Moberg
et al. (2003), but has been determined on an ad hoc basis rather than from a
strict statistical analysis.
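
The stepwise schedule for column 6 can be written as a lookup, which also makes the stated monthly averages easy to check. The following is a sketch only: the function names are hypothetical, and treating all of 1756-1858 uniformly is a simplification of "before September 1858" that gives the same result because the schedule ends in August.

```python
import calendar

# (first_day, last_day, adjustment in C), copied from the schedule above
MAY_STEPS = [(1, 4, 0.0), (5, 9, -0.1), (10, 13, -0.2), (14, 18, -0.3),
             (19, 22, -0.4), (23, 27, -0.5), (28, 31, -0.6)]
AUG_STEPS = [(1, 4, -0.6), (5, 9, -0.5), (10, 13, -0.4), (14, 18, -0.3),
             (19, 22, -0.2), (23, 27, -0.1), (28, 31, 0.0)]

def summer_adjustment(year, month, day):
    """Return the column 6 adjustment (C) for a given date, 0.0 outside the schedule."""
    if not 1756 <= year <= 1858:
        return 0.0
    if month in (6, 7):
        return -0.7
    for first, last, adj in {5: MAY_STEPS, 8: AUG_STEPS}.get(month, []):
        if first <= day <= last:
            return adj
    return 0.0

def monthly_mean_adjustment(year, month):
    """Average the daily adjustments over all days of one month."""
    ndays = calendar.monthrange(year, month)[1]
    total = sum(summer_adjustment(year, month, d) for d in range(1, ndays + 1))
    return total / ndays
```

For May and August the daily steps sum to -9.3 C over 31 days, which rounds to the -0.3 C average quoted in the text.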


#### Download and read data

In [1]:
import os, requests, re
from zipfile import ZipFile

source_dir = os.path.join("data","source")
url = "https://www.smhi.se/polopoly_fs/1.2864!/stockholm_daily_mean_temperature_1756_2018.zip"
txt = "stockholm_daily_mean_temperature_1756_2018.txt"

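# Make sure the target directory exists before writing the archive
os.makedirs(source_dir, exist_ok=True)

request = requests.get(url, allow_redirects=True)
request.raise_for_status()  # stop early if the download failed
request_file = os.path.join(source_dir, 'tmp.zip')
with open(request_file, 'wb') as zip_file:
    zip_file.write(request.content)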

raw_data = []
with ZipFile(request_file) as tmpzip:
    with tmpzip.open(txt) as file:
        for line in file:
            # Split each row on whitespace into its seven fields
            linesplit = line.decode().split()
            if linesplit:  # skip blank lines
                raw_data.append(linesplit)

os.remove(request_file)

#### Convert data

In [2]:
import pandas as pd
import numpy as np
columns = ('year','month','day','daily_temp','daily_temp_homo','daily_temp_adj','ID')

dtypes = {'year':int,
          'month':int,
          'day':int,
          'daily_temp':float,
          'daily_temp_homo':float,
          'daily_temp_adj':float,
          'ID':int}

raw_data_array = np.array(raw_data)
sthlm_dataset1 = pd.DataFrame(columns=columns)
# Fill each column from the matching field, cast to its declared dtype
for i, column in enumerate(columns):
    sthlm_dataset1[column] = raw_data_array[:,i].astype(dtypes[column])

sthlm_dataset1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96059 entries, 0 to 96058
Data columns (total 7 columns):
year               96059 non-null int64
month              96059 non-null int64
day                96059 non-null int64
daily_temp         96059 non-null float64
daily_temp_homo    96059 non-null float64
daily_temp_adj     96059 non-null float64
ID                 96059 non-null int64
dtypes: float64(3), int64(4)
memory usage: 5.1 MB
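
As an alternative to splitting lines by hand, pandas can parse whitespace-separated columns and infer the dtypes in one call. A minimal sketch, assuming the file has no header row and exactly seven columns as described above; the helper `read_stockholm` and the sample rows are illustrative only:

```python
import io
import pandas as pd

columns = ["year", "month", "day", "daily_temp",
           "daily_temp_homo", "daily_temp_adj", "ID"]

# Made-up rows in the seven-column layout, for illustration only
sample_text = (
    "1756  1  1   -8.7   -8.7   -8.7 1\n"
    "1756  1  2   -9.2   -9.2   -9.2 1\n"
)

def read_stockholm(buffer):
    """Parse whitespace-separated rows into a typed DataFrame."""
    return pd.read_csv(buffer, sep=r"\s+", header=None, names=columns)

df = read_stockholm(io.StringIO(sample_text))
```

The same function should accept the file object opened from the zip archive, since `pd.read_csv` takes file-like objects, though that has not been checked against the real file here.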


#### Encapsulate as a timeindex DataFrame

In [3]:
timeindex = []
for year, month, day in zip(sthlm_dataset1["year"], sthlm_dataset1["month"], sthlm_dataset1["day"]):
    timeindex.append(pd.Timestamp(year=year, month=month, day=day))
datetimeindex = pd.DatetimeIndex(timeindex)

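# Keep the four data columns and index them by date
sthlm_df = pd.DataFrame(sthlm_dataset1.iloc[:,3:]).set_index(datetimeindex)
# Column 2 of sthlm_df is daily_temp_adj (column 6 of the source file)
sthlm_df1 = pd.DataFrame(np.array(sthlm_df.iloc[:,2]), columns=["temperature"]).set_index(datetimeindex)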
sthlm_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 96059 entries, 1756-01-01 to 2018-12-31
Data columns (total 4 columns):
daily_temp         96059 non-null float64
daily_temp_homo    96059 non-null float64
daily_temp_adj     96059 non-null float64
ID                 96059 non-null int64
dtypes: float64(3), int64(1)
memory usage: 3.7 MB
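
The date index can also be built without an explicit loop, because `pd.to_datetime` assembles dates from a frame with `year`, `month` and `day` columns. A minimal sketch with made-up rows; the variable names are illustrative:

```python
import pandas as pd

# Made-up rows for illustration only
parts = pd.DataFrame({"year": [1756, 1756, 2018],
                      "month": [1, 1, 12],
                      "day": [1, 2, 31]})

# to_datetime returns a Series of timestamps; wrap it to get an index
index = pd.DatetimeIndex(pd.to_datetime(parts[["year", "month", "day"]]))
```

All dates in this series fall inside the range that nanosecond-resolution timestamps can represent, so 1756 is not a problem.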


#### Quality data

In [30]:
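def smhi_reader(csv_path):
    # Reads a semicolon-separated file: skips the first 10 lines, then takes
    # the date from field 3 and the temperature from field 4 of each row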
    datetimelist = []
    temperature = []
    with open(csv_path, 'rt') as file:
        for line in file.readlines()[10:]:
            linesplit = line.split(';')
            date = linesplit[2]
            temp = linesplit[3]
            datetimelist.append(pd.Timestamp(date))
            temperature.append(float(temp))
    
    df = pd.DataFrame(temperature, columns=["temperature"]).set_index(pd.DatetimeIndex(datetimelist))
    return df

def smhi_concater(hist_df, latest_df):
    from_date = hist_df.index[-1] + pd.Timedelta(1, 'D')
    from_date = str(from_date).split(' ')[0]
    df = pd.concat([hist_df, latest_df[from_date:]])
    return df

def smhi_temperature_data(hist_path, latest_path):
    # Read from the paths passed in rather than hard-coded file names
    hist_df = smhi_reader(hist_path)
    latest_df = smhi_reader(latest_path)
    df = smhi_concater(hist_df, latest_df)
    print("Data from",df.index[0].date(),"to",df.index[-1].date())
    return df

def merge(hist_df, latest_df):
    # Keep hist_df only up to the day before latest_df starts, so that
    # dates covered by both frames are not duplicated in the result
    to_date = latest_df.index[0] - pd.Timedelta(1, 'D')
    df = pd.concat([hist_df.loc[:to_date], latest_df])
    print("Data from",df.index[0].date(),"to",df.index[-1].date())
    return df

In [31]:
sthlm_hist_csv = os.path.join(source_dir, "sthlm_historical.csv")
sthlm_latest_csv = os.path.join(source_dir, "sthlm_latest.csv")
sthlm_df2 = smhi_temperature_data(sthlm_hist_csv, sthlm_latest_csv)

Data from 1859-01-01 to 2020-02-01


In [32]:
sthlm_full = merge(sthlm_df1, sthlm_df2)

Data from 1756-01-01 to 2020-02-01
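
When two daily series overlap in time, a plain concatenation keeps both copies of every overlapping date. A minimal sketch of one way to splice them and verify the result, assuming the later series should take precedence; the helper `splice` and the sample values are hypothetical:

```python
import pandas as pd

def splice(early, late):
    """Keep `early` up to the day before `late` starts, then append `late`."""
    cutoff = late.index[0] - pd.Timedelta(days=1)
    return pd.concat([early.loc[:cutoff], late])

# Made-up values for illustration only; the two frames overlap on two days
early = pd.DataFrame({"temperature": [1.0, 2.0, 3.0, 4.0]},
                     index=pd.date_range("1858-12-30", periods=4, freq="D"))
late = pd.DataFrame({"temperature": [30.0, 40.0, 50.0]},
                    index=pd.date_range("1859-01-01", periods=3, freq="D"))

combined = splice(early, late)

# A spliced daily series should have unique, increasing dates
is_clean = combined.index.is_unique and combined.index.is_monotonic_increasing
```

Checking `is_unique` and `is_monotonic_increasing` on the real merged frame is a quick way to confirm that no date appears twice.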


#### Save to disk

In [33]:
def save_to_disk(df, file_name, compression="bz2"):
    pickle_path = os.path.join("data", file_name + '.' + compression)
    df.to_pickle(pickle_path, compression=compression)

In [29]:
save_to_disk(sthlm_df, "sthlm_uni_full")
save_to_disk(sthlm_df1, "sthlm_uni_mini")
save_to_disk(sthlm_df2, "smhi_daily")
save_to_disk(sthlm_full, "su_smhi_daily_full")
pickle_path = os.path.join("data", "su_smhi_daily_full.bz2")
pd.read_pickle(pickle_path)

Unnamed: 0,temperature
1756-01-01,-8.7
1756-01-02,-9.2
1756-01-03,-8.6
1756-01-04,-7.7
1756-01-05,-7.2
...,...
2020-01-28,3.4
2020-01-29,3.3
2020-01-30,2.4
2020-01-31,3.0
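
A round trip through a temporary directory is a simple way to confirm that a compressed pickle restores the frame unchanged. A minimal sketch, using the first two rows shown above as sample data; the file name `example.bz2` is arbitrary:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"temperature": [-8.7, -9.2]},
                  index=pd.date_range("1756-01-01", periods=2, freq="D"))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.bz2")
    df.to_pickle(path, compression="bz2")
    # read_pickle infers bz2 compression from the file extension by default
    restored = pd.read_pickle(path)

round_trip_ok = restored.equals(df)
```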
