# Merging all Austrian Data
Includes:
- Holzeinschlagsmeldung (Annual Logging report)
- Dokumentation der Waldschädigungsfaktoren (Documentation of forest damage factors)
- Waldinventur (Forest inventory)
- BOKU's Improved Forest Structure Data Set (based on remote sensing data)
- state-level reports on logging or state of the forest
- Wikipedia: district area, population, population density
- Coordinates: Google Maps
- Elevation: Google Maps and Copernicus E-OBS

## Settings

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import xarray as xr
import os
import re

from scipy import stats

In [2]:
# All directories
root       = "C:/Users/freiste/OneDrive - Ilmatieteen laitos/Documents/IIASA YSSP 2023"
this_dir   =  os.getcwd()

input_dir1 = f"{root}/02 - Data/EU/Copernicus_E-OBS_Weather_Postprocessed"
input_dir2 = f"{root}/02 - Data/AUT"
output_dir = input_dir2

## Global Methods

In [3]:
# List all files of a given type (matched by file extension) inside a folder

def show_all_files(input_dir, typ='csv'):

    regexp = re.compile(fr"\.{re.escape(typ)}$")
    files  = [path.name for path in os.scandir(input_dir) if path.is_file() and regexp.search(path.name)]
    
    return files


In [4]:
# pandas truncates long DataFrames in the display (display.max_rows defaults to 60).
# switch=True shows all rows and columns; switch=False switches back to a compact 10 x 10 view.

def show_entire_df(switch=True):
    
    if switch:
        pd.set_option('display.max_rows', None)
        pd.set_option('display.max_columns', None)
    else:
        pd.set_option('display.max_rows', 10)
        pd.set_option('display.max_columns', 10)


In [5]:
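def check_empty_cells(column):
    # Return the positions of string cells that are empty placeholders.
    # Note: str.strip() removes a *set* of characters (' ', '-', 'N', 'a') from both ends,
    # so '', ' - ', '-', 'NaN' and also 'Na' count as empty.
    empty_cells = []
    
    for i, el in enumerate(column):
        
        if isinstance(el, str):
            el = el.strip(" - NaN")
            if el == '':
                empty_cells.append(i)
        
    return empty_cells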

In [6]:
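def correct_empty_cells(column):
    # Replace placeholder strings (see check_empty_cells) with NaN, in place
    suspects = check_empty_cells(column)
    
    for i in suspects:
        # check_empty_cells returns positions, so use positional .iat instead of label-based .at
        column.iat[i] = np.nan

    return column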
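For comparison, a vectorized sketch of the same cleaning step. It reuses the character-set rule of `check_empty_cells` but selects cells by label rather than position, so it is safe for any index. `blank_to_nan` is a hypothetical helper name, not part of the notebook; the toy column shows that 'Na' is also treated as empty while 'n/a' is not.

```python
import numpy as np
import pandas as pd


def blank_to_nan(column: pd.Series) -> pd.Series:
    """Return a copy of `column` with placeholder strings replaced by NaN.

    A cell counts as a placeholder if it is a string that becomes empty after
    stripping the characters ' ', '-', 'N' and 'a' (same rule as check_empty_cells).
    """
    is_blank = column.map(lambda v: isinstance(v, str) and v.strip(" - NaN") == "")
    return column.mask(is_blank.astype(bool), np.nan)


# Hypothetical toy column with a non-default index
s = pd.Series(["1.5", " - ", "NaN", "", "Na", "n/a", 3.0], index=list("abcdefg"))
cleaned = blank_to_nan(s)
print(cleaned)
```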

In [7]:
# Fill in missing years
def resample_years(df):
    mux = pd.MultiIndex.from_product([ df.ForestryDistrict.unique() , range(df.Year.min(), df.Year.max() + 1)], 
                                     names=['ForestryDistrict', 'Year'])

    return df.set_index(['ForestryDistrict', 'Year']).reindex(mux).reset_index()

In [8]:
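def linreg(x):
    # Evaluate the fitted line; uses the global `slope` and `intercept`
    # set by stats.linregress in the regression check of the DFDF section
    return slope * x + intercept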

In [9]:
def total_to_agriusable_area(ha):
    # Linear regression performed on Austrian state-level data (total area -> usable area, in ha)
    slope     =   0.03460584797747747
    intercept = 201.98643840068985
    rvalue    =   0.9551544308177479   # correlation coefficient of the fit (for reference only)
    
    return slope * ha + intercept

In [10]:
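# Fill in missing years for every forestry district
# (defaults to the first and last year present in df)
def resample_years(df, firstyear=None, lastyear=None):
    firstyear = df.Year.min() if firstyear is None else firstyear
    lastyear  = df.Year.max() if lastyear  is None else lastyear
    mux = pd.MultiIndex.from_product([df.ForestryDistrict.unique(), range(firstyear, lastyear + 1)], 
                                     names=['ForestryDistrict', 'Year'])

    return df.set_index(['ForestryDistrict', 'Year']).reindex(mux).reset_index()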
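A small check of what `resample_years` does, on a hypothetical toy table: every district gets one row per year in the requested range, and years without a record are NaN in all non-index columns. That includes static columns such as `FederalState` or coordinates, which then have to be filled per district (here with `ffill`/`bfill`) or joined back afterwards.

```python
import pandas as pd


def resample_years(df, firstyear, lastyear):
    # Same logic as the notebook helper: full district x year grid, then reindex
    mux = pd.MultiIndex.from_product(
        [df.ForestryDistrict.unique(), range(firstyear, lastyear + 1)],
        names=['ForestryDistrict', 'Year'])
    return df.set_index(['ForestryDistrict', 'Year']).reindex(mux).reset_index()


# Hypothetical toy data: district A lacks 2004, district B only has 2004
toy = pd.DataFrame({
    'ForestryDistrict': ['A', 'A', 'B'],
    'FederalState':     ['X', 'X', 'Y'],
    'Year':             [2003, 2005, 2004],
    'beetle_salvage_1': [10.0, 30.0, 5.0],
})

filled = resample_years(toy, 2003, 2005)

# Static columns are NaN in the added rows; fill them within each district
filled['FederalState'] = filled.groupby('ForestryDistrict')['FederalState'].transform(
    lambda s: s.ffill().bfill())
print(filled)
```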

# Import Data

#### Forest Inventory (FI)

In [14]:
FI = pd.read_csv(f'{input_dir2}/Data_BWF_ForestInventory_Postprocessed.csv')
FI.drop(columns=['Period', 'PeriodLength', 'Unnamed: 0'], inplace=True)
FI

| | ForestryDistrict | FederalState | Year | Area | DistrictShareAustrForest | DistrictShareStateForest | TotalForestShare | TotalForestArea | ErtragswaldShare | ErtragswaldArea | ... | SpruceArea-SD | SpruceArea2 | SpruceStockShare | SpruceStock | DWStShare | DWStStock | DWSt-SD | DeadSpruceStShare | DeadSpruceArea | DeadSpruceDensity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Eisenstadt+Rust | Burgenland | 1996 | 516.03 | 0.004017 | 0.122137 | 0.310059 | 16000.000000 | 0.950 | 13236.169500 | ... | 500.0 | 0.000000 | 0.213627 | 400000.0 | 0.028 | 6.800000 | 1.3 | 0.00280 | 3.706127 | 0.01652 |
| 1 | Eisenstadt+Rust | Burgenland | 1997 | 516.03 | 0.004033 | 0.122145 | 0.310867 | 16041.666667 | 0.950 | 13236.169500 | ... | 500.0 | 146.520147 | 0.213627 | 400000.0 | 0.028 | 6.800000 | 1.3 | 0.00280 | 3.706127 | 0.01652 |
| 2 | Eisenstadt+Rust | Burgenland | 1998 | 516.03 | 0.004050 | 0.122152 | 0.311674 | 16083.333333 | 0.950 | 13236.169500 | ... | 500.0 | 293.040293 | 0.213627 | 400000.0 | 0.028 | 6.800000 | 1.3 | 0.00280 | 3.706127 | 0.01652 |
| 3 | Eisenstadt+Rust | Burgenland | 1999 | 516.03 | 0.004066 | 0.122159 | 0.312482 | 16125.000000 | 0.950 | 13236.169500 | ... | 500.0 | 439.560440 | 0.213627 | 400000.0 | 0.028 | 6.800000 | 1.3 | 0.00280 | 3.706127 | 0.01652 |
| 4 | Eisenstadt+Rust | Burgenland | 2000 | 516.03 | 0.004082 | 0.122166 | 0.313289 | 16166.666667 | 0.950 | 13236.169500 | ... | 500.0 | 586.080586 | 0.213627 | 400000.0 | 0.028 | 6.800000 | 1.3 | 0.00280 | 3.706127 | 0.01652 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2153 | Vorarlberg | Vorarlberg_total | 2017 | 2601.67 | 0.024404 | 1.000000 | 0.373612 | 97201.638667 | 0.637 | 61981.665746 | ... | 4000.0 | 9285.115207 | 0.725265 | 15499000.0 | 0.025 | 10.633333 | 2.8 | 0.01595 | 462.550000 | 0.16588 |
| 2154 | Vorarlberg | Vorarlberg_total | 2018 | 2601.67 | 0.024411 | 1.000000 | 0.373709 | 97226.843500 | 0.637 | 61981.665746 | ... | 4000.0 | 6963.836406 | 0.725265 | 15499000.0 | 0.025 | 10.575000 | 2.8 | 0.01595 | 462.550000 | 0.16588 |
| 2155 | Vorarlberg | Vorarlberg_total | 2019 | 2601.67 | 0.024417 | 1.000000 | 0.373806 | 97252.048333 | 0.637 | 61981.665746 | ... | 4000.0 | 4642.557604 | 0.725265 | 15499000.0 | 0.025 | 10.516667 | 2.8 | 0.01595 | 462.550000 | 0.16588 |
| 2156 | Vorarlberg | Vorarlberg_total | 2020 | 2601.67 | 0.024423 | 1.000000 | 0.373903 | 97277.253167 | 0.637 | 61981.665746 | ... | 4000.0 | 2321.278802 | 0.725265 | 15499000.0 | 0.025 | 10.458333 | 2.8 | 0.01595 | 462.550000 | 0.16588 |


#### Documentation of Forest Damage Factors (DFDF)

In [12]:
# Documentation of Forest Damage Factors (source: BWF; German abbreviation: DWF)

DFDF = pd.read_excel(f'{input_dir2}/Data_BWF_DocumentationOfForestdamagefactors.xlsx', 
                     usecols=list(range(11)))

DFDF.drop(index=0, columns=['PeriodLength'], inplace=True)  # drop row 0 (source and unit information)

DFDF

| | ForestryDistrict | FederalState | Year | Area | YrlyStormDamage | BBDamageYrly | IpsTypographusShare | IpsTypDamageYrly | IpsTypAreaDamage | IpsTypAreaDamage2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Burgenland | Burgenland_total | 1989 | 3965.2 | | 5960.265 | 0.95 | 6794.7021 | | |
| 2 | Burgenland | Burgenland_total | 1990 | 3965.2 | | 5960.265 | 0.95 | 6794.7021 | | |
| 3 | Burgenland | Burgenland_total | 1991 | 3965.2 | | 5960.265 | 0.95 | 6794.7021 | | |
| 4 | Burgenland | Burgenland_total | 1992 | 3965.2 | | 29801.325 | 0.95 | 33973.5105 | | |
| 5 | Burgenland | Burgenland_total | 1993 | 3965.2 | | 41721.854 | 0.95 | 47562.91356 | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 276 | Tirol | Tyrol_total | 2020 | 12648.38 | | 81000 | 0.9 | 87480 | | 293 |
| 277 | Tirol | Tyrol_total | 2021 | 12648.38 | | 196000 | 0.94 | 221088 | | 614 |
| 278 | Tirol | Tyrol_total | 2022 | 12648.38 | | 1280000 | 0.98 | 1505280 | | |
| 279 | Lienz | Tirol | 2021 | 2020.08 | | 98700 | 0.94 | 111333.6 | | |


In [None]:
# Check the linear regression between affected total area (ha_aff) and affected usable area (ha_red)

# Prepare Data
A = DFDF[DFDF['IpsTypAreaDamage'].notnull() & DFDF['IpsTypAreaDamage2'].notnull()]

x1 = A['IpsTypAreaDamage'].to_list()
y1 = A['IpsTypAreaDamage2'].to_list()

# Make linear regression model
slope, intercept, r, p, std_err = stats.linregress(x1, y1)
# Calculate modelled y-points
ha_aff_red = list(map(linreg, x1))


# Plot
fig, ax = plt.subplots(1,1, dpi=200)

plt.scatter(x1, y1)
plt.plot(x1, ha_aff_red)

plt.title('Affected total area vs. affected usable area\nAustrian state level')
plt.xlabel('ha$_{affected}$', fontsize=14)
plt.ylabel('ha$_{red}$', fontsize=14)


plt.text(0, 7000, f" y  = {slope :.3f} * x + {intercept :.3f}", fontsize=14)
plt.text(0, 6200, f"R$^{2}$ = {r**2 :.3f}", fontsize=14)


In [None]:
print(DFDF['BBDamageYrly'].isnull().values.any())
DFDF[DFDF['BBDamageYrly'].isnull()]

#### Annual Logging Reports (ALR)

In [13]:
# Annual Logging Reports (BMLRT, Federal Ministry of Agriculture, Regions and Tourism; German abbreviation: HEM)

ALR = pd.read_excel(f'{input_dir2}/Data_BMLRT_AnnualLoggingReports.xlsx')

ALR.drop(index=0, columns=['beetle_salvage_2', 'beetle_salvage_3', 'storm_salvage_2', 'storm_salvage_3'], inplace=True)

ALR = ALR.sort_values(by=['FederalState', 'ForestryDistrict', 'Year'])
alr_order = ['ForestryDistrict', 'FederalState', 'Year', 'beetle_salvage_1', 'storm_salvage_1']
ALR = ALR[alr_order]

ALR

| | ForestryDistrict | FederalState | Year | beetle_salvage_1 | storm_salvage_1 |
|---|---|---|---|---|---|
| 1 | Eisenstadt+Rust | Burgenland | 2003 | 2160.848636 | 574.902042 |
| 2 | Eisenstadt+Rust | Burgenland | 2004 | 2988.799699 | 1224.941506 |
| 3 | Eisenstadt+Rust | Burgenland | 2005 | 3074.588787 | 964.695414 |
| 4 | Eisenstadt+Rust | Burgenland | 2006 | 2032.452887 | 1984.376452 |
| 5 | Eisenstadt+Rust | Burgenland | 2007 | 1941.769791 | 370.217339 |
| ... | ... | ... | ... | ... | ... |
| 1799 | Wien | Wien | 2018 | 401 | 8 |
| 1800 | Wien | Wien | 2019 | 207 | 111 |
| 1801 | Wien | Wien | 2020 | 717 | 4 |
| 1802 | Wien | Wien | 2021 | 16 | 4 |


In [None]:
print(ALR['beetle_salvage_1'].isnull().values.any())
print(ALR['storm_salvage_1'].isnull().values.any())

In [None]:
print(ALR[ALR['storm_salvage_1'].isnull()].ForestryDistrict.unique())
print(ALR[ALR['storm_salvage_1'].isnull()].Year.unique())
# State level (Bundesland): storm data are missing for 1988-2022

#### BOKU's forest structure data set (FSDS)

In [None]:
# TODO: sum up all data points within each district
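A minimal sketch of that aggregation, assuming the FSDS points have already been assigned to a forestry district (e.g. by a spatial join) and carry per-point area columns. The column names `spruce_area_ha` and `forest_area_ha` and all values are hypothetical. The share is computed from the summed areas rather than by averaging per-point shares, so each point is weighted by its forest area.

```python
import pandas as pd

# Hypothetical point-level table: one row per remote-sensing data point
points = pd.DataFrame({
    'ForestryDistrict': ['Weiz', 'Weiz', 'Lienz', 'Lienz', 'Lienz'],
    'spruce_area_ha':   [1.0, 2.5, 0.5, 0.0, 4.0],
    'forest_area_ha':   [3.0, 3.0, 1.0, 2.0, 4.0],
})

# Sum all points within each district
fsds_district = (points.groupby('ForestryDistrict', as_index=False)
                 [['spruce_area_ha', 'forest_area_ha']].sum())

# District-level share from the sums (area-weighted)
fsds_district['spruce_share'] = fsds_district['spruce_area_ha'] / fsds_district['forest_area_ha']
print(fsds_district)
```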

#### Geographical Data

In [None]:
# Geographical Details

Geo = pd.read_excel(f'{input_dir2}/Data_Geography.xlsx')
    
Geo.drop(columns='lat_center1,lon_center1,elev_center1,lat_center2,lon_center2,elev_center2'.split(','), 
         index=[0, 85],         # drop units line
         inplace=True)

Geo

#### Climate Data

In [None]:
# Match climate data to the forestry districts' LAT & LON (xarray is imported in the first cell)

In [None]:
postpr_clim_files = show_all_files(typ='nc', input_dir=input_dir1)
postpr_clim_files

In [None]:
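# Build a district x year skeleton to be filled with climate data.
# A cross join keeps the static columns (state, coordinates) for every year;
# reindexing a frame whose Year column is empty (Year=None) would leave them all NaN.
years = range(int(ALR.Year.min()), int(ALR.Year.max()) + 1)
Clim = (Geo[['ForestryDistrict', 'FederalState', 'lat_center', 'lon_center']]
        .merge(pd.DataFrame({'Year': list(years)}), how='cross'))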

In [None]:
ds = xr.open_dataset(f'{input_dir1}/cumulative_relevant_degreedays_europe_1980-2022_0.25deg.nc')
ds

In [None]:
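# Variables to copy from the gridded climate dataset into Clim
addlist1 = ['relevant_degreedays', 'max_generations', 'season_start', 'season_end']

# Pointwise ("vectorized") lookup: indexers that share the dimension 'points' select
# the nearest grid cell for each Clim row instead of the full outer product
points = dict(
    longitude = xr.DataArray(Clim.lon_center.astype(float).values, dims='points'),
    latitude  = xr.DataArray(Clim.lat_center.astype(float).values, dims='points'),
    year      = xr.DataArray(Clim.Year.astype(int).values,         dims='points'),
)
nearest = ds[addlist1].sel(**points, method='nearest')

# One value per row, so the columns keep proper dtypes (e.g. datetime64 for the season dates)
for par in addlist1:
    Clim[par] = nearest[par].values

Clim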

In [None]:
Clim['season_length'] = (Clim['season_end'] - Clim['season_start']).dt.days

In [None]:
# Quick check of date arithmetic: difference between two dates in days
d0 = dt.date(2008, 8, 18)
d1 = dt.date(2008, 9, 26)
delta = d1 - d0
print(delta.days)   # 39

In [None]:
Clim

## Merge Data

In [None]:
m1 = pd.merge(ALR, DFDF, on=['ForestryDistrict', 'FederalState', 'Year'], how='outer')
# The *_salvage_2/3 columns and PeriodLength were already dropped on import
m1.drop(columns=['Area', 'IpsTypDamageYrly'], inplace=True)
m1
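With `how='outer'`, it helps to see which district-years come from only one source. A sketch on hypothetical toy frames standing in for ALR and DFDF, using pandas' `indicator=True` (adds a `_merge` column with `left_only`, `right_only` or `both`) and `validate='one_to_one'` (raises a `MergeError` if a key is duplicated in either frame).

```python
import pandas as pd

keys = ['ForestryDistrict', 'FederalState', 'Year']

# Hypothetical toy frames (values are made up)
alr = pd.DataFrame({'ForestryDistrict': ['Wien', 'Wien'],
                    'FederalState':     ['Wien', 'Wien'],
                    'Year':             [2020, 2021],
                    'beetle_salvage_1': [700.0, 20.0]})
dfdf = pd.DataFrame({'ForestryDistrict': ['Wien', 'Lienz'],
                     'FederalState':     ['Wien', 'Tirol'],
                     'Year':             [2021, 2021],
                     'BBDamageYrly':     [25.0, 100.0]})

merged = pd.merge(alr, dfdf, on=keys, how='outer',
                  indicator=True,            # adds '_merge'
                  validate='one_to_one')     # each key at most once per frame

# How many district-years are covered by one or both sources
coverage = merged['_merge'].value_counts()
print(coverage)
```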

# Data Visualization

In [None]:
# Spruce area per forestry district (state totals and selected districts excluded)
fig, ax = plt.subplots(1, 1, dpi=200)

FIg = FI.groupby('ForestryDistrict')

excluded = ['Steiermark', 'Kärnten', 'Tirol', 'Niederösterreich', 'Oberösterreich',
            'Salzburg', 'Vorarlberg', 'Innsbruck-Stadt', 'Weiz']

for place in FI.ForestryDistrict.unique():
    if place not in excluded:
        # other possible y columns: 'SpruceShareTotalForest', 'TotalForestShare', 'TotalForestArea'
        FIg.get_group(place).sort_values(by='Year').plot(x='Year', y='SpruceArea', ax=ax, label=place, legend=False)

# TODO: data are needed for every year and location between 1988 and 2022

In [None]:
fig, ax = plt.subplots(1,1,dpi=200)

for place in ['Steiermark', 'Kärnten', 'Tirol', 'Niederösterreich', 'Oberösterreich', 'Salzburg', 'Vorarlberg']:
    FIg.get_group(place).sort_values(by='Year').plot(x='Year', y='SpruceArea', ax=ax, label=place)

plt.legend(bbox_to_anchor=(1,1))
plt.title('Spruce area in Austrian forests (state level)')
plt.ylabel('Spruce area [ha]')

## How to compute a correlation matrix

In [None]:
dat = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(dat)

corr_matrix = df.corr()
corr_matrix

In [None]:
# Show visually
import seaborn as sn

sn.heatmap(corr_matrix, annot=True)
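As suggested in the notes at the end, the correlation matrix can then be used to drop predictors that are strongly correlated with each other. A greedy sketch: for every pair whose absolute correlation exceeds a threshold, the later column is dropped. The threshold of 0.8 is an arbitrary assumption, `drop_correlated` is a hypothetical helper, and the outcome variable should be excluded from `X` before filtering.

```python
import numpy as np
import pandas as pd


def drop_correlated(X, threshold=0.8):
    """Greedy filter: for each pair with |r| > threshold, drop the later column."""
    corr = X.corr().abs()
    # Keep only the upper triangle (each pair once, diagonal excluded)
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop


dat = {'A': [45, 37, 42, 35, 39],
       'B': [38, 31, 26, 28, 33],
       'C': [10, 15, 17, 21, 12]}
X = pd.DataFrame(dat)

X_reduced, dropped = drop_correlated(X, threshold=0.8)
print(dropped)
```

Which column of a correlated pair is dropped is arbitrary here; in practice, one would keep the variable that is easier to interpret or better measured.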

In [None]:
# Build the final data table: merge the HEM (logging reports),
# Waldinventur (forest inventory) and DWF (damage factors) data.
data3 = pd.merge(data2, forest_table, on=['Distrctforestryoffice', 'FederalState', 'Year'], how='outer')


#new_index = [1803] + list(data3.index[1::])
#data3 = data3.reindex(new_index).reset_index(drop=True)


data3.drop(index=[0, 1803], inplace=True)
data3.sort_values(['FederalState', 'Distrctforestryoffice', 'Year'], inplace=True)
data3.reset_index(inplace=True, drop=True)




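# Convert all data columns to float; placeholder strings (' - ', 'NaN', ...) become NaN first
for col in data3.columns[3:]:
    
    try:
        data3[col] = data3[col].astype(float)
        
    except (ValueError, TypeError):
        suspects = check_empty_cells(data3[col])
        # after reset_index(drop=True), labels equal positions, so .at is safe here
        for i in suspects:
            data3.at[i, col] = np.nan

        data3[col] = data3[col].astype(float)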
        

In [None]:
# Fill in missing period lengths (1 year) and area and population data for districts
data3['PeriodLength'] = data3['PeriodLength'].fillna(1)

cols = ['DistrictArea', 'Population', 'PopulationDensity']
data3[cols] = data3.groupby('Distrctforestryoffice')[cols].ffill()
data3.drop(columns='Area', inplace=True)

In [None]:
# Combine the two bark beetle and the two storm damage sources (covering different years):
# the mean where both exist, otherwise the one available value
data3['BB_Damaged_Salvage'] = data3[['BBDamageYrly', 'Beetle_damage']].mean(axis=1)
data3['Storm_Damaged_Salvage'] = data3[['YrlyStormDamage', 'Storm_damage']].mean(axis=1)

data3.drop(columns=['Beetle_damage','BBDamageYrly',
                    'Storm_damage', 'YrlyStormDamage'], 
           inplace=True)
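How `mean(axis=1)` treats gaps, on hypothetical toy numbers: missing values are skipped by default, so a year covered by only one source takes that source's value, and a year covered by neither stays NaN. If one source should take precedence instead of being averaged, `Series.combine_first` is an alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values for two overlapping sources
df = pd.DataFrame({'BBDamageYrly':  [100.0, np.nan, 300.0, np.nan],
                   'Beetle_damage': [120.0,  50.0, np.nan, np.nan]})

# Row-wise mean; NaNs are skipped (skipna=True is the default)
df['BB_Damaged_Salvage'] = df[['BBDamageYrly', 'Beetle_damage']].mean(axis=1)

# Alternative: prefer BBDamageYrly, fall back to Beetle_damage only where it is missing
preferred = df['BBDamageYrly'].combine_first(df['Beetle_damage'])

print(df.assign(preferred=preferred))
```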

In [None]:
# Reorder the columns and rename 'Year' to 'Years'
new_order2 = [data3.columns[i] for i in [0,6,1,2,3,4,5] + list(range(7, len(data3.columns)))]
data3 = data3[new_order2]
data3.rename(columns={'Year' : 'Years'}, inplace=True)

In [None]:
data3

In [None]:
data3.columns

In [None]:
# Still gotta try to fill up the NaNs as much as possible...
data3[pd.notna(data3.TotalForestArea)].head(25)

In [None]:
list(data3.Years.unique())


In [None]:
data3[(data3.Years.isin(range(1992,1996+1))) | (data3.Years == '1992-1996')]

In [None]:
# Optional: a units row can be added to the table from this dictionary

units_dict = {
 'Year'                                           :   'year(s)',
 'FederalState'                                   :   'name',
 'Distrctforestryoffice'                          :   'name',
 'DistrictArea'                                   :   'km2',
 'Population'                                     :   'ppl',
 'PopulationDensity'                              :   'ppl/km2',
 'Beetle_damage'                                  :   'harvest-m3',
 'Storm_damage'                                   :   'harvest-m3',
 'PeriodLength'                                   :   'years',
 'Area'                                           :   'km2',
 'DistrictShareAustrForest'                       :   'percent(ha)',
 'DistrictShareStateForest'                       :   'percent(ha)',
 'TotalForestShare'                               :   'percent(ha)',
 'TotalForestArea'                                :   'ha',
 'ErtragswaldShare'                               :   'percent(ha)',
 'ErtragswaldArea'                                :   'ha',
 'ErtragswaldStock'                               :   'stock-m3/ha',
 'ErtragswaldTotalStock'                          :   'stock-m3',
 'ConiferousShare'                                :   'percent(ha)',
 'ConiferousShare2'                               :   'percent(stock-m3)',
 'ConiferousArea'                                 :   'ha',
 'ConiferousStock'                                :   'stock-m3',
 'SpruceShareTotalArea'                           :   'percent(ha)',
 'SpruceShareTotalForest'                         :   'percent(ha)',
 'SpruceEWShare'                                  :   'percent(stock-m3)',
 'SpruceConifShare'                               :   'percent(ha)',
 'SpruceArea'                                     :   'ha',
 'SpruceArea-SD'                                  :   'ha',
 'SpruceArea2'                                    :   'ha',
 'SpruceStockShare'                               :   'percent(stock-m3)',
 'SpruceStock'                                    :   'stock-m3',
 'DWStShare'                                      :   'percent',
 'DWStStock'                                      :   'stock-m3/ha',
 'DWSt-SD'                                        :   'stock-m3/ha',
 'DeadSpruceStShare'                              :   'percent',
 'DeadSpruceArea'                                 :   'ha',
 'DeadSpruceDensity'                              :   'stock-m3/ha',
 'YrlyStormDamage'                                :   'harvest-m3',
 'BBDamageYrly'                                   :   'harvest-m3',
 'IpsTypographusShare'                            :   'percent(BB)',
 'IpsTypDamageYrly'                               :   'stock-m3',
 'IpsTypAreaDamage'                               :   'ha',
 'IpsTypAreaDamage2'                              :   'ha_red',
 'Indicator1: SprArea, StromDmg'                  :   np.nan,
 'Indicator2: SprShareConif, StormDmg, Deadwood'  :   np.nan,
 'Indicator3: SprArea, StormDmg, Deadwood'        :   np.nan
}

#data3.iloc[0] = units_dict.values()

# Export

In [None]:
data3.to_csv("BBDamage_ForestStructure_AUT_Districts.csv", sep=';')

### Access postprocessed data

In [None]:
obs_path = input_dir1   # assumption: the post-processed E-OBS files are stored in input_dir1
pp_regex = re.compile(r'_1980-2022_')
pp_files = [path.name for path in os.scandir(obs_path) if path.is_file() and pp_regex.search(path.name)]
pp_files

In [None]:
data = xr.open_dataset(f"{obs_path}/{pp_files[1]}")
data.fg.isel(time=0).plot()

##### Notes

In [None]:
# Swarm days


In [None]:
# Too hot days


In [None]:
# We need a spruce mask! Ideally we would only consider the grid points where spruce grows.
# And then, of course, the percentage of spruce in the forest for each grid point.
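A minimal sketch of both ideas on a hypothetical 3 x 3 grid in NumPy: (1) a spruce mask that ignores grid points without spruce, and (2) weighting each grid point by its spruce share. The arrays and values are made up; the notebook does not have a spruce-share grid yet. With xarray, the same steps would be `da.where(spruce_share > 0).mean()` and `da.weighted(spruce_share).mean()`, given a spruce-share DataArray on the same grid as the climate data.

```python
import numpy as np

# Hypothetical climate field (e.g. degree-days) and spruce share per grid cell
degree_days = np.array([[900.0,  950.0, 1000.0],
                        [850.0, 1100.0, 1200.0],
                        [800.0, 1050.0, 1300.0]])
spruce_share = np.array([[0.0, 0.2, 0.6],
                         [0.0, 0.5, 0.0],
                         [0.1, 0.0, 0.4]])

# (1) Spruce mask: grid points without spruce become NaN and are ignored
masked = np.where(spruce_share > 0, degree_days, np.nan)
masked_mean = np.nanmean(masked)

# (2) Weighted mean: each grid point counts in proportion to its spruce share
weighted_mean = np.sum(degree_days * spruce_share) / np.sum(spruce_share)

print(masked_mean, weighted_mean)
```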

########################

In [None]:
# For reference (and because this was so much work, I didn't want to just delete it)
full_list_of_districts = {
 'Amstetten'              : 'LowerAustria' ,
 'Baden'                  : 'LowerAustria' ,
 'Bludenz'                : 'Vorarlberg'   ,
 'Braunau/Inn'            : 'UpperAustria' ,
 'Bregenz'                : 'Vorarlberg'   ,
 'Bruck/Leitha+Mödling'   : 'LowerAustria' ,
 'Bruck/Mur+Mürzzuschlag' : 'Styria'       ,
 'Deutschlandsberg'       : 'Styria'       ,
 'Dornbirn'               : 'Vorarlberg'   ,
 'Eferding'               : 'UpperAustria' ,
 'Eisenstadt+Rust'        : 'Burgenland'   ,
 'Feldkirch'              : 'Vorarlberg'   ,
 'Feldkirchen'            : 'Carinthia'    ,
 'Freistadt'              : 'UpperAustria' ,
 'Gmunden'                : 'UpperAustria' ,
 'Gmünd+Waidhofen/Thaya'  : 'LowerAustria' ,
 'Graz'                   : 'Styria'       ,
 'Grieskirchen'           : 'UpperAustria' ,
 'Gänserndorf+Mistelbach' : 'LowerAustria' ,
 'Güssing'                : 'Burgenland'   ,
 'Hallein'                : 'Salzburg'     ,
 'Hartberg+Fürstenfeld'   : 'Styria'       ,
 'Hermagor'               : 'Carinthia'    ,
 'Horn+Hollabrunn'        : 'LowerAustria' ,
 'Imst'                   : 'Tyrol'        ,
 'Innsbruck-Land'         : 'Tyrol'        ,   # incl. Hall, Telfs and Steinach
 'Innsbruck-Stadt'        : 'Tyrol'        ,
 'Jennersdorf'            : 'Burgenland'   ,
 'Kirchdorf/Krems'        : 'UpperAustria' ,
 'Kitzbühel'              : 'Tyrol'        ,
 'Klagenfurt'             : 'Carinthia'    ,
 'Korneuburg+Tulln'       : 'LowerAustria' ,
 'Krems'                  : 'LowerAustria' ,
 'Kufstein'               : 'Tyrol'        ,
 'Landeck'                : 'Tyrol'        ,
 'Leibnitz'               : 'Styria'       ,
 'Leoben'                 : 'Styria'       ,
 'Lienz'                  : 'Tyrol'        ,   # East Tyrol (Osttirol)
 'Liezen'                 : 'Styria'       ,
 'Lilienfeld'             : 'LowerAustria' ,
 'Linz'                   : 'UpperAustria' ,
 'Mattersburg'            : 'Burgenland'   ,
 'Melk'                   : 'LowerAustria' ,
 'Murau'                  : 'Styria'       ,
 'Murtal'                 : 'Styria'       ,   # incl. Judenburg and Knittelfeld
 'Neunkirchen'            : 'LowerAustria' ,
 'Neusiedl/See'           : 'Burgenland'   ,
 'Oberpullendorf'         : 'Burgenland'   , 
 'Oberwart'               : 'Burgenland'   ,
 'Perg'                   : 'UpperAustria' ,
 'Reutte'                 : 'Tyrol'        ,   # incl. Lechtal
 'Ried/Innkreis'          : 'UpperAustria' ,
 'Rohrbach'               : 'UpperAustria' ,
 'Salzburg-Umgebung'      : 'Salzburg'     ,
 'Scheibbs'               : 'LowerAustria' ,
 'Schwaz'                 : 'Tyrol'        ,
 'Schärding'              : 'UpperAustria' ,
 'Spittal/Drau'           : 'Carinthia'    ,
 'St.Johann'              : 'Salzburg'     ,
 'St.Pölten'              : 'LowerAustria' ,
 'St.Veit/Glan'           : 'Carinthia'    ,
 'Steyr'                  : 'UpperAustria' ,
 'Südoststeiermark'       : 'Styria'       ,   # incl. Feldbach and Radkersburg
 'Tamsweg'                : 'Salzburg'     ,
 'Urfahr'                 : 'UpperAustria' ,
 'Villach'                : 'Carinthia'    ,
 'Voitsberg'              : 'Styria'       ,
 'Vöcklabruck'            : 'UpperAustria' ,
 'Völkermarkt'            : 'Carinthia'    ,
 'Weiz'                   : 'Styria'       ,
 'Wels'                   : 'UpperAustria' ,
 'WienerNeustadt'         : 'LowerAustria' ,
 'Wolfsberg'              : 'Carinthia'    ,
 'Zell/See'               : 'Salzburg'     ,
 'Zwettl'                 : 'LowerAustria' ,
}

# Notes

In [None]:
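PCA (Principal Component Analysis) --> find which combinations of variables explain most of the variance (note: PCA ignores the outcome, so it shows structure among the predictors, not their impact on the outcome)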
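A minimal PCA sketch using NumPy's SVD on standardized data (scikit-learn's `sklearn.decomposition.PCA` is the library equivalent). `explained` gives the share of the predictors' total variance captured by each component; to judge the impact on the outcome, the components (or the original variables) still have to go into a regression. The data are simulated and hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the standardized data matrix X (rows = observations)."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)        # standardize: predictors have different units
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)          # share of total variance per component
    loadings = Vt[:n_components]             # rows: components, columns: original variables
    scores = Xs @ loadings.T                 # observations projected onto the components
    return scores, loadings, explained


# Hypothetical example: 3 predictors, the first two nearly collinear
rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a, a + 0.1 * rng.normal(size=100), rng.normal(size=100)])

scores, loadings, explained = pca(X)
print(np.round(explained, 3))
print(np.round(loadings, 3))
```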

In [None]:
Mixed-effects modelling (a.k.a. multilevel / hierarchical models) / regularized regression (penalizing the number or size of the parameters)
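A minimal closed-form ridge regression sketch in NumPy, as one example of regularized regression (not a method chosen in this notebook). Ridge shrinks all coefficients towards zero as `alpha` grows; lasso (L1 penalty) is the variant that sets some coefficients exactly to zero, i.e. effectively reduces the number of parameters. Predictors should be standardized first so that the penalty treats them equally. For the mixed-effects variant, statsmodels provides `statsmodels.formula.api.mixedlm(formula, data, groups=...)`, e.g. with the federal state as the grouping variable. The data below are simulated.

```python
import numpy as np


def ridge(X, y, alpha=1.0):
    """Minimize ||y - b0 - X b||^2 + alpha * ||b||^2 (intercept b0 not penalized)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering keeps the intercept out of the penalty
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


# Hypothetical data: only the first two of four predictors matter
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=80)

for alpha in (0.0, 10.0, 100.0):
    beta, b0 = ridge(X, y, alpha)
    print(alpha, np.round(beta, 3))
```

With `alpha=0` this reduces to ordinary least squares with an intercept.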

In [None]:
Sensitivity analysis: make sure to increase the frequency

In [None]:
scipy / sklearn (scikit-learn)

In [None]:
cross validation --> don't be that person who skips it = separating the data into a training and a validation dataset
and you have to separate the dataset into meaningful subgroups
--> Nabin will help
--> Esther: trained on a subset of countries and used the left-out countries for validation (group k-fold)
--> random k-fold is done automatically in Python (e.g. by scikit-learn)
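A sketch of group k-fold splitting in plain NumPy, assuming the grouping variable is the federal state (or district); scikit-learn's `sklearn.model_selection.GroupKFold` is the library equivalent. No group ever appears in both the training and the validation part of a fold, so the score reflects how well the model transfers to unseen regions. Unlike `GroupKFold`, this hypothetical `group_kfold` helper does not balance fold sizes by number of rows.

```python
import numpy as np


def group_kfold(groups, n_splits=3):
    """Yield (train_idx, test_idx); every group is held out in exactly one fold."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of groups")
    for held_out in np.array_split(unique_groups, n_splits):
        test_mask = np.isin(groups, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical example: rows are district-years, grouped by federal state
states = np.array(['Tirol', 'Tirol', 'Wien', 'Salzburg', 'Salzburg', 'Wien', 'Kärnten'])
folds = list(group_kfold(states, n_splits=3))

for train_idx, test_idx in folds:
    print(sorted(set(states[test_idx])), len(train_idx), len(test_idx))
```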


In [None]:
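Start with:
Correlation matrix --> put in all x and the y, and eliminate predictors (x) that are strongly correlated with each other (multicollinearity)!
draw scatter plots -->  

Multiple regression analysis

backward elimination --> remove one variable at a time from the multiple regression equation
forward selection --> the other way round (add one variable at a time)
--> do both
AIC (Akaike information criterion) = how good is the fit, penalized by the number of parameters
BIC (Bayesian information criterion) = like AIC, but with a stronger penalty that grows with the sample size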
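A sketch of both criteria and of backward elimination in NumPy, on hypothetical simulated data. For a linear model with Gaussian errors, up to an additive constant, AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln n, with k the number of coefficients (including the intercept); lower is better, and since ln n > 2 once n ≥ 8, BIC penalizes extra variables more. statsmodels' OLS results expose `.aic` and `.bic`, which differ from these values only by a constant for a given dataset, so model rankings agree. Forward selection is the mirror image: start with no predictors and add the one that lowers the criterion most.

```python
import numpy as np


def ols_ic(X, y):
    """AIC and BIC of an OLS fit with intercept (Gaussian likelihood, constants dropped)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    k = A.shape[1]                                   # coefficients incl. intercept
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic


def backward_elimination(X, y, names, use_bic=False):
    """Repeatedly drop the predictor whose removal lowers the criterion most; stop when none does."""
    crit = 1 if use_bic else 0
    keep = list(range(X.shape[1]))
    best = ols_ic(X[:, keep], y)[crit]
    while len(keep) > 1:
        candidates = [(ols_ic(X[:, [j for j in keep if j != d]], y)[crit], d) for d in keep]
        score, drop = min(candidates)
        if score >= best:
            break
        best = score
        keep.remove(drop)
    return [names[j] for j in keep]


# Hypothetical data: x2 is pure noise
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

selected = backward_elimination(X, y, ['x0', 'x1', 'x2'], use_bic=True)
print(selected)
```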

In [None]:
Check other epidemiological models and forest models (use whatever others use, e.g. physically based models)
Methods for estimating forest damage