### Economic Geography of Interwar Poland
This notebook uses the administrative database and files with economic data to compute the GDP of Interwar Poland at the district level and to present an overview of Polish economic geography in the Interwar Period.

It showcases the whole toolkit. An instance of the AdministrativeHistory class, constructed from inputs describing the administrative history, serves as an API for accessing, imputing, and harmonizing the data in the database. In the mature form of the toolkit, a PostgreSQL database with economic data will be created and hosted online, and a pip-installable package called interwar_poland_database will be publicly shared. The package will reconstruct the administrative base locally and connect to the PostgreSQL database as a read-only user to import the necessary data. In this preliminary version of the toolkit, the economic data is stored in CSV files.

In [1]:
%matplotlib inline

In [2]:
# Necessary imports
import pandas as pd
from core.core import AdministrativeHistory
from utils.helper_functions import extract_date_parts, load_config

In [3]:
# Load the configuration
config = load_config("config.json")

# Create an administrative history object
administrative_history = AdministrativeHistory(config, load_geometries=True)

Loading changes list...
✅ Loaded 302 validated changes in 0.06 seconds.
Loading initial state...
✅ Loaded initial state.
Loading initial district registry...
✅ Loaded 292 validated districts in 0.02 seconds. Set their initial state timespands to (1921-02-19, 1939-09-01).
Loading initial region registry...
✅ Loaded 19 validated regions in 0.00 seconds. Set their initial state timespands to (1921-02-19, 1939-09-01)
Creating administrative history (sequentially applying changes)...
✅ Successfully applied all changes in 10.07 seconds. Administrative history database created.
Loading territories...
Loaded: powiaty_1921_corrected_names.shp (276 rows)
Loaded: powiaty_1931_corrected_names.shp (283 rows)
Loaded: powiaty_1938_corrected_names_modified.shp (264 rows)
No match found for district 'None' on 1931-04-18
No match found for district 'None' on 1931-04-18
No match found for district 'None' on 1931-04-18
No match found for district 'None' on 1931-04-18
No match found for district 'None' on 

In [4]:
# Use administrative history to harmonize all the input data tables to one administrative state
administrative_history.harmonize_data()

Harmonizing example data in the 'input/harmonization_input/data_test_2/' folder.
Constructing conversion matrix between two administrative states:
Administrative State from: <AdministrativeState timespan=(1932-05-18, 1933-04-01), regions=19, districts=282>
Administrative State to: <AdministrativeState timespan=(1938-04-01, 1938-10-01), regions=19, districts=282>
✅ Successfully constructed conversion dict in 17.68 seconds.
Constructing conversion matrix based on the dict.
✅ Successfully constructed conversion matrix in 17.70 seconds.
Harmonizing csv file 'input/harmonization_input/data_test_2/9.12.1931-occupation_central_eastern_voivodships_ready_clean.csv' from 1933-01-01 to 1938-04-01.
Original borders: <AdministrativeState timespan=(1932-05-18, 1933-04-01), regions=19, districts=282>.
Target borders: <AdministrativeState timespan=(1938-04-01, 1938-10-01), regions=19, districts=282>.
Attempting to read: input/harmonization_input/data_test_2/9.12.1931-occupation_central_eastern_voivods
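The log above reports that a conversion matrix is built between the 1932-1933 and the 1938 administrative states. The toolkit's internals are not shown in this notebook; as a minimal sketch, assuming the matrix holds the share of each source district that is allocated to each target district (for example by area), harmonizing a count variable reduces to a matrix product. This is one way fractional counts, such as those later shown for WILEŃSKO-TROCKI, can arise. District names and shares below are hypothetical.

```python
import pandas as pd

# Hypothetical example: part of source district "B" is transferred to "A"
# in the target administrative state. Names and shares are invented.
source_districts = ["A", "B"]
target_districts = ["A", "B"]

# Each column describes how one source district is split across target
# districts, so every column sums to 1 and totals are preserved.
conversion = pd.DataFrame(
    [[1.0, 0.25],
     [0.0, 0.75]],
    index=target_districts,    # rows: target districts
    columns=source_districts,  # columns: source districts
)

source_data = pd.DataFrame({"workers": [1000.0, 400.0]}, index=source_districts)

# Reallocate counts to the target borders: target = conversion @ source
target_data = conversion @ source_data.loc[conversion.columns]
print(target_data)
```

Share-based reallocation like this is only appropriate for additive quantities (counts, sums); rates or prices would need a weighted average instead.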

In [5]:
administrative_history.post_organization_reorganize_data_tables()

Beginning post-processing. Total number of methods to apply: 2
Calling sum_up_data_tables method...
🟡 Starting sum_up_data_tables: ['9.12.1931-occupation_central_eastern_voivodships_ready_clean', '9.12.1931-occupation_southern_western_voivodships_ready_clean'] -> 1931_occupation_ready_clean.csv
✅ Finished sum_up_data_tables: Output written to output/harmonized_data/1931_occupation_ready_clean.csv
Calling create_dist_area_dataset method...
🟡 Starting create_dist_area_dataset (adm. state for 1938-04-01)
✅ Finished create_dist_area_dataset: Metadata and output added to the database.
🎉 All post-processing methods applied successfully.
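The post-processing log shows `sum_up_data_tables` merging the two regional 1931 occupation tables into a single table. Its implementation is not shown here; a minimal pandas sketch of the same idea, assuming both tables share a `District` key column and numeric data columns, stacks them and sums by district. The helper name `sum_tables` and the data are hypothetical.

```python
import pandas as pd

def sum_tables(tables, key="District"):
    """Stack district tables and add up numeric columns for rows with the same key.

    Columns missing from a table become NaN after concatenation and are
    skipped by the sum, so they contribute 0.
    """
    stacked = pd.concat(tables, ignore_index=True)
    return stacked.groupby(key, as_index=False, sort=False).sum(numeric_only=True)

# Hypothetical inputs standing in for the two regional 1931 occupation tables
table_a = pd.DataFrame({"District": ["D1", "D2"], "Total": [10.0, 20.0]})
table_b = pd.DataFrame({"District": ["D2", "D3"], "Total": [5.0, 30.0]})

combined = sum_tables([table_a, table_b])
print(combined)
```

Summing (rather than simply concatenating) matters if, after harmonization, a district receives contributions from both regional tables.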


In [12]:
example_df, example_data_table_metadata, example_adm_state_date = administrative_history.load_data_table(
    data_table_id='1931_occupation_ready_clean', version='harmonized'
)

In [13]:
example_df

| | District | Total - all : Together | Total - m: Together | Total - f: Together | 08-00 Section B. Horticulture, fishing, forestry - all: Together | 08-00 Section B. Horticulture, fishing, forestry - m: Together | 08-00 Section B. Horticulture, fishing, forestry - f: Together | 08 Horticulture and beekeeping: Together | 09 Fishing: Together | 00 Forestry and hunting: Together | ... | 813 Hairdressing and beauty salons: Together | 814 Laundries and ironing services: Together | 815 Street and square cleaning: Together | 816 Cemeteries and funeral services: Together | 82 Social welfare institutions: Together | R8 Other branches of Section I: Together | 9 Domestic service: Together | x Persons living without gainful employment: Together | x1 Capitalists and rentiers: Together | n Section N. Persons with unspecified occupation, delinquent persons, and persons with unknown occupation: Together |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AUGUSTOWSKI | 6774.000000 | 4846.000000 | 1928.000000 | 450.000000 | 426.0 | 24.0 | 37.0 | 83.0 | 330.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 515.000000 | 433.000000 | 0.0 | 144.000000 |
| 1 | BIAŁOSTOCKI | 15892.000000 | 12131.000000 | 3761.000000 | 610.000000 | 554.0 | 56.0 | 0.0 | 0.0 | 456.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 134.000000 | 566.000000 | 953.000000 | 33.0 | 255.000000 |
| 2 | BIELSKI (BIELSK PODLASKI) | 17639.000000 | 13531.000000 | 4108.000000 | 1452.000000 | 1378.0 | 74.0 | 0.0 | 0.0 | 1321.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 198.000000 | 1088.000000 | 588.000000 | 0.0 | 113.000000 |
| 3 | GRODZIEŃSKI | 30844.000000 | 21297.000000 | 9547.000000 | 714.000000 | 627.0 | 87.0 | 272.0 | 68.0 | 374.000000 | ... | 199.0 | 0.0 | 0.0 | 0.0 | 72.0 | 0.000000 | 2323.000000 | 2063.000000 | 42.0 | 1521.000000 |
| 4 | ŁOMŻYŃSKI | 20056.000000 | 14186.000000 | 5870.000000 | 325.000000 | 276.0 | 49.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 55.0 | 0.000000 | 1221.000000 | 1744.000000 | 86.0 | 946.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 259 | DZIŚNIEŃSKI | 11835.000000 | 8341.000000 | 3494.000000 | 246.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 149.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 0.000000 | 1025.000000 | 469.000000 | 0.0 | 133.000000 |
| 260 | WILEJSKI | 7726.000000 | 5421.000000 | 2305.000000 | 213.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 176.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 798.000000 | 688.000000 | 0.0 | 73.000000 |
| 261 | WILEŃSKO-TROCKI | 15995.502168 | 11637.656164 | 4357.846004 | 894.620865 | 0.0 | 0.0 | 0.0 | 0.0 | 702.098707 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.794115 | 1328.266695 | 1012.623271 | 0.0 | 525.533036 |
| 262 | POSTAWSKI | 4247.000000 | 2807.000000 | 1440.000000 | 210.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 130.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 485.000000 | 263.000000 | 0.0 | 52.000000 |
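Harmonized tables can be sanity-checked before further use. In the rows displayed above, for instance, the Section B male and female columns are 0.0 for districts 259-262 while the corresponding "all" column is positive. A small check, sketched under the assumption that each "all" column should equal the sum of its "m" and "f" counterparts, flags such rows; the helper name and sample data are hypothetical.

```python
import pandas as pd

def check_sex_split(df, all_col, m_col, f_col, tol=1e-6):
    """Return the rows where the male and female counts do not add up to the total."""
    diff = (df[m_col] + df[f_col] - df[all_col]).abs()
    return df.loc[diff > tol]

# Hypothetical sample with one inconsistent row (D2)
sample = pd.DataFrame({
    "District": ["D1", "D2"],
    "all": [100.0, 50.0],
    "m": [60.0, 0.0],
    "f": [40.0, 0.0],
})

mismatches = check_sex_split(sample, "all", "m", "f")
print(mismatches["District"].tolist())
```

The tolerance allows for the fractional values that harmonization produces. On the real table, the three Section B column names would be passed as `all_col`, `m_col`, and `f_col`.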


In [17]:
print(example_df.columns)

Index(['District', 'Total - all  : Together', 'Total - m: Together',
       'Total - f: Together',
       '08-00 Section B. Horticulture, fishing, forestry - all: Together',
       '08-00 Section B. Horticulture, fishing, forestry - m: Together',
       '08-00 Section B. Horticulture, fishing, forestry - f: Together',
       '08 Horticulture and beekeeping: Together', '09 Fishing: Together',
       '00 Forestry and hunting: Together',
       ...
       '813 Hairdressing and beauty salons: Together',
       '814 Laundries and ironing services: Together',
       '815 Street and square cleaning: Together',
       '816 Cemeteries and funeral services: Together',
       '82 Social welfare institutions: Together',
       'R8 Other branches of Section I: Together',
       '9 Domestic service: Together',
       'x Persons living without gainful employment: Together',
       'x1 Capitalists and rentiers: Together',
       'n Section N. Persons with unspecified occupation, delinquent persons, an

In [16]:
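# The stored header has two spaces before the colon ('Total - all  : Together', see the printed columns),
# so 'Total - all : Together' raised a KeyError. Collapse repeated whitespace in the headers first.
example_df.columns = example_df.columns.str.replace(r"\s+", " ", regex=True).str.strip()
administrative_history.plot_dataset(df=example_df, col_name='Total - all : Together', adm_level='District', adm_state_date=example_adm_state_date)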
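The column name passed to `plot_dataset` must match the stored header exactly, and the printed index shows `'Total - all  : Together'` with two spaces before the colon. Instead of renaming the DataFrame, a lookup that compares whitespace-normalized names can return the exact stored label. This is a minimal sketch; `normalize_whitespace` and `find_column` are hypothetical helpers, not part of the toolkit.

```python
import re

import pandas as pd

def normalize_whitespace(name):
    """Collapse runs of whitespace to a single space and trim the ends."""
    return re.sub(r"\s+", " ", str(name)).strip()

def find_column(df, wanted):
    """Return the actual column label whose normalized form matches `wanted`."""
    target = normalize_whitespace(wanted)
    matches = [col for col in df.columns if normalize_whitespace(col) == target]
    if not matches:
        raise KeyError(wanted)
    if len(matches) > 1:
        raise KeyError(f"Ambiguous column name: {wanted!r}")
    return matches[0]

df = pd.DataFrame(columns=["District", "Total - all  : Together"])
# Prints the stored label, which keeps its two spaces before the colon
print(find_column(df, "Total - all : Together"))
```

Usage in the notebook would be `col_name=find_column(example_df, 'Total - all : Together')`, which leaves the original headers untouched.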

In [11]:
# Crop columns: wheat, rye, barley, oats, potatoes
columns_to_select = ['Pszenica: Together', 'Żyto: Together', 'Jęczmień: Together', 'Owies: Together', 'Ziemniaki: Together']
print(f"Select columns {columns_to_select} where present.")
# Load the crop size data tables
crops_size_data_table_ids = [data_table.data_table_id for data_table in administrative_history.harmonized_data_metadata if data_table.category == 'Crops - Size']
print(len(crops_size_data_table_ids))
crops_size_dfs = [administrative_history.load_data_table(data_table_id=crops_data_table_id, version='harmonized') for crops_data_table_id in crops_size_data_table_ids]
# Keep tables dated in August (month '08') and tables without a year
crops_size_year_df_pairs = []
for df, metadata, _ in crops_size_dfs:
    year, month = extract_date_parts(metadata.date)[:2]
    if month == '08' or year is None:
        crops_size_year_df_pairs.append((year, df))
crops_size_by_year = []
for year, df in crops_size_year_df_pairs:
    print(f"Year: {year}, columns:")
    # Print all column names in the loaded data table
    print(f"All columns: {df.columns}")
    existing_columns = [col for col in columns_to_select if col in df.columns]
    df = df[existing_columns]
    print(f"After selection: {df.columns}")
    # Drop the ': Together' suffix, e.g. 'Pszenica: Together' -> 'Pszenica'
    df = df.rename(columns=lambda col: col.split(':')[0].strip())
    print(f"After stripping: {df.columns}")
    crops_size_by_year.append((year, df))

Select columns ['Pszenica: Together', 'Żyto: Together', 'Jęczmień: Together', 'Owies: Together', 'Ziemniaki: Together'] where present.
0
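The crops and livestock cells repeat the same two steps: select the crop columns that are present, then drop the `: Together` suffix. A small helper, sketched here under a hypothetical name, factors this out. Unlike the loop above, it also keeps a `District` column when one exists (an assumption based on the occupation table), so that the trimmed tables can later be joined by district.

```python
import pandas as pd

def select_and_strip(df, columns_to_select, key="District"):
    """Keep the requested columns that exist and drop the ': Together'-style suffix."""
    existing = [col for col in columns_to_select if col in df.columns]
    keep = ([key] if key in df.columns else []) + existing
    selected = df[keep]
    # 'Pszenica: Together' -> 'Pszenica'; a key column without ':' is unchanged
    return selected.rename(columns=lambda col: col.split(":")[0].strip())

# Hypothetical table: 'Żyto: Together' is requested but absent, 'Other' is not requested
sample = pd.DataFrame({
    "District": ["D1"],
    "Pszenica: Together": [1.0],
    "Owies: Together": [2.0],
    "Other: Together": [3.0],
})
trimmed = select_and_strip(sample, ["Pszenica: Together", "Żyto: Together", "Owies: Together"])
print(trimmed.columns.tolist())
```

Requested columns that are missing are skipped silently, mirroring the "where present" behaviour of the cell above.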


In [None]:
# Load the crop price data tables
# Crop columns: wheat, rye, barley, oats, potatoes
columns_to_select = ['Pszenica: Together', 'Żyto: Together', 'Jęczmień: Together', 'Owies: Together', 'Ziemniaki: Together']
crops_prices_data_table_ids = [data_table.data_table_id for data_table in administrative_history.harmonized_data_metadata if data_table.category == 'Crops prices']
crops_prices_dfs = [administrative_history.load_data_table(data_table_id=crops_data_table_id, version='harmonized') for crops_data_table_id in crops_prices_data_table_ids]
crops_prices_year_df_pairs = [(extract_date_parts(metadata.date)[0], df) for df, metadata, _ in crops_prices_dfs]
crops_prices_by_year = []
for year, df in crops_prices_year_df_pairs:
    existing_columns = [col for col in columns_to_select if col in df.columns]
    df = df[existing_columns]
    print(f"Year: {year}, columns:")
    # Print the selected column names before and after dropping the ': Together' suffix
    print(f"Before stripping: {df.columns}")
    df = df.rename(columns=lambda col: col.split(':')[0].strip())
    print(f"After stripping: {df.columns}")
    crops_prices_by_year.append((year, df))
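Since the notebook's aim is district-level GDP, crop sizes and prices would presumably be combined into a value of crop output. A minimal sketch, assuming both tables have been trimmed to a `District` column plus one column per crop, and that "size" denotes the harvested quantity in the units the prices refer to (if it is sown area, a yield factor would be needed first). The function name and numbers are hypothetical.

```python
import pandas as pd

def crop_output_value(sizes, prices, key="District"):
    """Multiply crop quantities by unit prices per district and sum across crops."""
    sizes = sizes.set_index(key)
    prices = prices.set_index(key)
    crops = sizes.columns.intersection(prices.columns)  # crops present in both tables
    values = sizes[crops] * prices[crops]               # aligned on district and crop
    # min_count=1 keeps NaN for districts with no matching crop data at all
    return values.sum(axis=1, min_count=1).rename("crop_output_value")

# Hypothetical quantities and unit prices for two districts
sizes = pd.DataFrame({"District": ["D1", "D2"], "Pszenica": [10.0, 5.0], "Owies": [2.0, 0.0]})
prices = pd.DataFrame({"District": ["D1", "D2"], "Pszenica": [3.0, 4.0], "Owies": [1.5, 1.0]})

value = crop_output_value(sizes, prices)
print(value)
```

Districts present in only one of the two tables produce NaN rather than a silently understated value.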

In [None]:
# Load livestock datasets
livestock_data_table_ids = [data_table.data_table_id for data_table in administrative_history.harmonized_data_metadata if data_table.category=='Livestock']
livestock_dfs = [administrative_history.load_data_table(data_table_id = livestock_data_table_id, version = 'harmonized') for livestock_data_table_id in livestock_data_table_ids]
livestock_year_df_pairs = [(extract_date_parts(metadata.date)[0], df) for df, metadata, date in livestock_dfs]
for year, df in livestock_year_df_pairs
for year, df in crops_prices_year_df_pairs:
    columns_to_select = ['Pszenica: Together', 'Żyto: Together', 'Jęczmień: Together', 'Owies: Together', 'Ziemniaki: Together']
    existing_columns = [col for col in columns_to_select if col in df.columns]
    df = df[existing_columns]
    # Print all column names in the loaded data table
    print(f"Year: {year}, columns:")
    print(f"Before stripping: {df.columns}")
    df.columns = [col.split(':')[0].strip() for col in df.columns]
    print(f"After stripping: {df.columns}")
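Each category yields a list of `(year, DataFrame)` pairs. To compare years, the pairs can be stacked into one long table with one row per district, year, and variable. The sketch below assumes every table has a `District` column; the helper name and the `horses` column are hypothetical.

```python
import pandas as pd

def stack_years(year_df_pairs, key="District"):
    """Concatenate (year, DataFrame) pairs into one long table with a 'year' column."""
    frames = [df.assign(year=year) for year, df in year_df_pairs]
    stacked = pd.concat(frames, ignore_index=True)
    # One row per (district, year, variable), convenient for filtering and plotting
    return stacked.melt(id_vars=[key, "year"], var_name="variable", value_name="value")

# Hypothetical yearly tables
pairs = [
    ("1930", pd.DataFrame({"District": ["D1"], "horses": [5.0]})),
    ("1931", pd.DataFrame({"District": ["D1"], "horses": [7.0]})),
]
long_table = stack_years(pairs)
print(long_table)
```

Tables with different column sets can still be stacked; variables absent in a given year simply appear as NaN values.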