Notebook in 4 sections:

(1) Import county codes

(2) Population by powiat:
- Aim: person-hours worked (not possible). The next best measure is employed people by powiat, as this is the denominator for the average wage measure. Technically it should be employed people at firms of size greater than 10 (**I think - check wage measure**), but this can be mentioned in the notes.
- Various measures explored, as listed in the Excel sheet (**will upload**)

(3) Population by sex-age by powiat

(4) Population by sex-education by powiat

In [550]:
import sys
from pathlib import Path

p = Path.cwd().resolve()
repo_root = next((parent for parent in [p] + list(p.parents) if (parent / ".git").exists()), None)
if repo_root is None:
    raise RuntimeError("Repo root not found. Open the repo folder in VS Code.")

sys.path.insert(0, str(repo_root))
print("Repo root:", repo_root)

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

Repo root: C:\Users\harri\OneDrive - Imperial College London\Year 3 Group Project\Group_Project_Y3


****SECTION 1 - IMPORT COUNTY CODES****

In [551]:
# Get the county codes table
county_codes = pd.read_csv(repo_root / "cleaned/00_codes/county_codes.csv")
print(county_codes.shape)
county_codes.head()

(380, 3)


Unnamed: 0,county_code,county_kts,county_name
0,201,10030210101000,Powiat bolesławiecki
1,202,10030210302000,Powiat dzierżoniowski
2,203,10030210203000,Powiat głogowski
3,204,10030210204000,Powiat górowski
4,205,10030210105000,Powiat jaworski


****SECTION 2 - POPULATION BY POWIAT****

**Option 1 - NC 2021 (as in Barcelona)**

Uses P4315 / P4181 (both give the same 13+ population) to get the 2021 census (NC) population for each powiat. As in Barcelona, population is then treated as constant and used as the denominator for the unemployment rate.

In [552]:
p4181 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_nc_sex_age_p4181.csv", index_col=0)
p4181.head()

p4315 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_nc_sex_ed_p4315.csv", index_col=0)
p4315.head()

# Filter to ages 13 upwards and total for sex
age_filter = ['total', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
p4181 = p4181[
    ~(p4181["age"].isin(age_filter)) & (p4181["sex"]=="total")
]
# p4181

# Filter to just total education and sex
p4315 = p4315[
    (p4315["education"]=="total") & (p4315["sex"]=="total")
]
# p4315

# Check they give the same powiat totals
p4181_total = p4181.groupby("code")["count"].sum()
p4315_total = p4315.groupby("code")["count"].sum()

(p4181_total == p4315_total).sum()

np.int64(380)
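The equality count above only shows that 380 codes agree. A minimal sketch of a stricter check follows, assuming toy stand-ins for `p4181_total` and `p4315_total` (the names `totals_a`, `totals_b` and `aligned` are hypothetical). It aligns the two series on the powiat code first, so a code missing from one table shows up as a NaN row instead of being silently skipped.

```python
import pandas as pd

# Toy stand-ins for p4181_total and p4315_total (first three powiats from the output below).
totals_a = pd.Series({201000: 76739, 202000: 86543, 203000: 75191}, name="count")
totals_b = pd.Series({201000: 76739, 202000: 86543, 203000: 75191}, name="count")

# Align on the powiat code; a code present in only one table becomes NaN in the other column.
aligned = pd.concat({"p4181": totals_a, "p4315": totals_b}, axis=1)

same_codes = bool(aligned.notna().all(axis=1).all())              # every code present in both
same_values = bool((aligned["p4181"] == aligned["p4315"]).all())  # every total identical
mismatches = aligned[aligned["p4181"] != aligned["p4315"]]        # rows to inspect if not

print(same_codes, same_values, len(mismatches))
```

If `mismatches` is non-empty, it lists exactly the powiats where the two census tables disagree.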

In [553]:
# They are consistent so will go with the first
ptot_option1 = pd.DataFrame(p4181_total).reset_index()

ptot_option1["merge_code"] = ptot_option1["code"].apply(lambda x: int(str(x)[:-3]))

ptot_option1 = ptot_option1.merge(
    county_codes,
    how="left",
    left_on="merge_code",
    right_on="county_code"
)

ptot_option1

Unnamed: 0,code,count,merge_code,county_code,county_kts,county_name
0,201000,76739,201,201,10030210101000,Powiat bolesławiecki
1,202000,86543,202,202,10030210302000,Powiat dzierżoniowski
2,203000,75191,203,203,10030210203000,Powiat głogowski
3,204000,29042,204,204,10030210204000,Powiat górowski
4,205000,42493,205,205,10030210105000,Powiat jaworski
...,...,...,...,...,...,...
375,3217000,44491,3217,3217,10023216417000,Powiat wałecki
376,3218000,30007,3218,3218,10023216418000,Powiat łobeski
377,3261000,93083,3261,3261,10023216361000,Powiat m. Koszalin
378,3262000,349790,3262,3262,10023216562000,Powiat m. Szczecin


**Option 2 - Yearly powiat population:**

Uses p2137 - yearly population by sex and age group by powiat. We will filter to 15+

In [554]:
p2137 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_yr_sex_agegr_p2137.csv", index_col=0)

p2137

Unnamed: 0,code,powiat,year,sex,age_group,count
0,201000,Powiat bolesławiecki,1995,total,total,89407.0
1,202000,Powiat dzierżoniowski,1995,total,total,113810.0
2,203000,Powiat głogowski,1995,total,total,91373.0
3,204000,Powiat górowski,1995,total,total,37826.0
4,205000,Powiat jaworski,1995,total,total,54914.0
...,...,...,...,...,...,...
721975,3217000,Powiat wałecki,2024,females,0-14,3279.0
721976,3218000,Powiat łobeski,2024,females,0-14,2178.0
721977,3261000,City with powiat status Koszalin,2024,females,0-14,6713.0
721978,3262000,City with powiat status Szczecin,2024,females,0-14,24100.0


In [555]:
print(p2137.age_group.unique())
print(p2137.sex.unique())

# Filter to just relevant (total sex and age group 15+)
age_filter = ['total', '0-4', '5-9', '10-14', '0-14', '70 and more']
p2137 = p2137[
    ~(p2137["age_group"].isin(age_filter)) & (p2137["sex"]=="total")
].copy()
p2137

['total' '0-4' '5-9' '10-14' '15-19' '20-24' '25-29' '30-34' '35-39'
 '40-44' '45-49' '50-54' '55-59' '60-64' '65-69' '70 and more' '70-74'
 '75-79' '80-84' '85 and more' '0-14']
['total' 'males' 'females']


Unnamed: 0,code,powiat,year,sex,age_group,count
137520,201000,Powiat bolesławiecki,1995,total,15-19,7577.0
137521,202000,Powiat dzierżoniowski,1995,total,15-19,9301.0
137522,203000,Powiat głogowski,1995,total,15-19,9744.0
137523,204000,Powiat górowski,1995,total,15-19,3236.0
137524,205000,Powiat jaworski,1995,total,15-19,4761.0
...,...,...,...,...,...,...
664675,3217000,Powiat wałecki,2024,total,85 and more,846.0
664676,3218000,Powiat łobeski,2024,total,85 and more,673.0
664677,3261000,City with powiat status Koszalin,2024,total,85 and more,3031.0
664678,3262000,City with powiat status Szczecin,2024,total,85 and more,10557.0


In [556]:
p2137.age_group.unique()

array(['15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45-49',
       '50-54', '55-59', '60-64', '65-69', '70-74', '75-79', '80-84',
       '85 and more'], dtype=object)

In [557]:
p2137["merge_code"] = p2137["code"].apply(lambda x: int(str(x)[:-3]))

p2137 = p2137.merge(
    county_codes,
    how="left",
    left_on="merge_code",
    right_on="county_code"
)

p2137

Unnamed: 0,code,powiat,year,sex,age_group,count,merge_code,county_code,county_kts,county_name
0,201000,Powiat bolesławiecki,1995,total,15-19,7577.0,201,201.0,1.003021e+13,Powiat bolesławiecki
1,202000,Powiat dzierżoniowski,1995,total,15-19,9301.0,202,202.0,1.003021e+13,Powiat dzierżoniowski
2,203000,Powiat głogowski,1995,total,15-19,9744.0,203,203.0,1.003021e+13,Powiat głogowski
3,204000,Powiat górowski,1995,total,15-19,3236.0,204,204.0,1.003021e+13,Powiat górowski
4,205000,Powiat jaworski,1995,total,15-19,4761.0,205,205.0,1.003021e+13,Powiat jaworski
...,...,...,...,...,...,...,...,...,...,...
171895,3217000,Powiat wałecki,2024,total,85 and more,846.0,3217,3217.0,1.002322e+13,Powiat wałecki
171896,3218000,Powiat łobeski,2024,total,85 and more,673.0,3218,3218.0,1.002322e+13,Powiat łobeski
171897,3261000,City with powiat status Koszalin,2024,total,85 and more,3031.0,3261,3261.0,1.002322e+13,Powiat m. Koszalin
171898,3262000,City with powiat status Szczecin,2024,total,85 and more,10557.0,3262,3262.0,1.002322e+13,Powiat m. Szczecin


In [558]:
p2137[p2137["county_code"].isna()]

Unnamed: 0,code,powiat,year,sex,age_group,count,merge_code,county_code,county_kts,county_name
28,263000,City with powiat status Wałbrzych to 2002,1995,total,15-19,11449.0,263,,,
168,1431000,Powiat warszawski,1995,total,15-19,117361.0,1431,,,
410,263000,City with powiat status Wałbrzych to 2002,1996,total,15-19,11621.0,263,,,
550,1431000,Powiat warszawski,1996,total,15-19,116256.0,1431,,,
792,263000,City with powiat status Wałbrzych to 2002,1997,total,15-19,11807.0,263,,,
...,...,...,...,...,...,...,...,...,...,...
170922,1431000,Powiat warszawski,2022,total,85 and more,,1431,,,
171164,263000,City with powiat status Wałbrzych to 2002,2023,total,85 and more,,263,,,
171304,1431000,Powiat warszawski,2023,total,85 and more,,1431,,,
171546,263000,City with powiat status Wałbrzych to 2002,2024,total,85 and more,,263,,,
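The unmatched codes listed above can also be surfaced at merge time rather than by filtering on NaN afterwards. This is a sketch on toy rows, assuming the same column names as the notebook; `pop`, `codes` and `unmatched` are hypothetical names. It uses the `indicator` and `validate` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy rows; codes and names are copied from tables shown in this notebook.
pop = pd.DataFrame({
    "code": [201000, 263000, 1431000],
    "powiat": ["Powiat bolesławiecki",
               "City with powiat status Wałbrzych to 2002",
               "Powiat warszawski"],
})
codes = pd.DataFrame({
    "county_code": [201, 265],
    "county_name": ["Powiat bolesławiecki", "Powiat m. Wałbrzych"],
})

# Integer division drops the trailing three digits: 201000 -> 201.
pop["merge_code"] = pop["code"] // 1000

merged = pop.merge(
    codes,
    how="left",
    left_on="merge_code",
    right_on="county_code",
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",  # raises if county_code is duplicated in codes
)

# Codes present in the population table but absent from the county code table.
unmatched = sorted(merged.loc[merged["_merge"] == "left_only", "merge_code"].unique().tolist())
print(unmatched)
```

On these toy rows the unmatched codes are 263 and 1431, the same two codes that appear in the NaN rows above.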


In [559]:
p2137[p2137["merge_code"]==1431].head(10)

Unnamed: 0,code,powiat,year,sex,age_group,count,merge_code,county_code,county_kts,county_name
168,1431000,Powiat warszawski,1995,total,15-19,117361.0,1431,,,
550,1431000,Powiat warszawski,1996,total,15-19,116256.0,1431,,,
932,1431000,Powiat warszawski,1997,total,15-19,116149.0,1431,,,
1314,1431000,Powiat warszawski,1998,total,15-19,117247.0,1431,,,
1696,1431000,Powiat warszawski,1999,total,15-19,128590.0,1431,,,
2078,1431000,Powiat warszawski,2000,total,15-19,121749.0,1431,,,
2460,1431000,Powiat warszawski,2001,total,15-19,114527.0,1431,,,
2842,1431000,Powiat warszawski,2002,total,15-19,,1431,,,
3224,1431000,Powiat warszawski,2003,total,15-19,,1431,,,
3606,1431000,Powiat warszawski,2004,total,15-19,,1431,,,


In [560]:
# 263 (Wałbrzych to 2002) has no entry in county_codes - map it to the current Wałbrzych code (265)
missing = p2137["merge_code"]==263
p2137.loc[missing, "county_code"] = 265
p2137.loc[missing, "county_kts"] = 10030210365000
p2137.loc[missing, "county_name"] = "Powiat m. Wałbrzych"

# 1431 only exists up to 2001 - we will just drop
p2137 = p2137.dropna(subset=["county_code"]).copy()

In [561]:
ptot_option2 = pd.DataFrame(p2137.groupby(["county_code", "county_kts", "year"])["count"].sum()).reset_index()
ptot_option2

Unnamed: 0,county_code,county_kts,year,count
0,201.0,1.003021e+13,1995,63763.0
1,201.0,1.003021e+13,1996,64107.0
2,201.0,1.003021e+13,1997,64719.0
3,201.0,1.003021e+13,1998,65261.0
4,201.0,1.003021e+13,1999,63818.0
...,...,...,...,...
11395,3263.0,1.002322e+13,2020,35382.0
11396,3263.0,1.002322e+13,2021,34960.0
11397,3263.0,1.002322e+13,2022,34619.0
11398,3263.0,1.002322e+13,2023,34317.0


**Option 3 - NC 2021 (a) Employed, (b) Economically Active:**

From P4292 - economic activity of the population aged 15 years and more by sex by powiat

In [562]:
# Forward slashes so the path also resolves outside Windows
p4292 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/nc_activity_table_p4292.csv", index_col=0)

# Filter to just total
p4292_total = p4292[p4292["sex"]=="total"].copy()

# Merge with county codes
p4292_total["merge_code"] = p4292_total["code"].apply(lambda x: int(str(x)[:-3]))
p4292_total = p4292_total.merge(
    county_codes,
    how="left",
    left_on="merge_code",
    right_on="county_code"
)

# Take columns for the option
ptot_option3 = p4292_total[["county_code", "economically active population", "employed"]]

ptot_option3

Unnamed: 0,county_code,economically active population,employed
0,201,40280.0,38792.0
1,202,41553.0,39657.0
2,203,37825.0,36209.0
3,204,14860.0,13765.0
4,205,21874.0,20715.0
...,...,...,...
375,3217,22675.0,21486.0
376,3218,14465.0,13463.0
377,3261,48337.0,46160.0
378,3262,186667.0,179608.0


**Option 4 - 2021 NC Rate x Yearly Population Series: (a - employment rate, b - activity rate):**

From P4292 (activity table) and P2137 (yearly population) as above
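Option 4 holds each powiat's 2021 census rate fixed and applies it to every year of the population series. A small sketch of that calculation is below. The populations are the powiat 201 values from the option 2 output; the two rates are illustrative values chosen to reproduce the option 4 output for that powiat, not figures read from P4292. `yearly_pop`, `nc_rates` and `est` are hypothetical names.

```python
import pandas as pd

# Yearly 15+ population for powiat 201 (from the option 2 output above).
yearly_pop = pd.DataFrame({
    "county_code": [201, 201],
    "year": [1995, 1996],
    "count": [63763.0, 64107.0],
})

# One row per powiat with its 2021 census rates in percent (illustrative values).
nc_rates = pd.DataFrame({
    "county_code": [201],
    "employment rate": [56.2],
    "activity rate": [58.4],
})

# Each year of the population series picks up the same, time-invariant 2021 rate.
est = yearly_pop.merge(nc_rates, how="left", on="county_code", validate="many_to_one")
est["employed"] = (est["count"] * est["employment rate"] / 100).round()
est["active"] = (est["count"] * est["activity rate"] / 100).round()

print(est[["county_code", "year", "employed", "active"]])
```

Because the rate is constant within a powiat, all year-to-year variation in this measure comes from the population series.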

In [563]:
# Select p4292 rate columns
p4292_total_rates = p4292_total[["county_code", "employment rate", "activity rate"]]

# Merge with p2137 measure of yearly population
ptot_option4 = ptot_option2.merge(
    p4292_total_rates,
    how="left",
    on="county_code"
)

# Construct columns wanted
ptot_option4["employed"] = ptot_option4["count"].mul(ptot_option4["employment rate"]).div(100).round()
ptot_option4["active"] = ptot_option4["count"].mul(ptot_option4["activity rate"]).div(100).round()

# Save pre filter copy for later
ptot_option4_full = ptot_option4.copy()

ptot_option4 = ptot_option4[["county_code", "year", "employed", "active"]]

ptot_option4

Unnamed: 0,county_code,year,employed,active
0,201.0,1995,35835.0,37238.0
1,201.0,1996,36028.0,37438.0
2,201.0,1997,36372.0,37796.0
3,201.0,1998,36677.0,38112.0
4,201.0,1999,35866.0,37270.0
...,...,...,...,...
11395,3263.0,2020,19036.0,19991.0
11396,3263.0,2021,18808.0,19752.0
11397,3263.0,2022,18625.0,19560.0
11398,3263.0,2023,18463.0,19389.0


**Option 5 - Vo Rates x Yearly Population (a - employment, b - activity, c - employment (total - 2019 on))**

P4113 - Employment rate by vo (NOTE - the 18-59/64 series is used for option a; the total rate is only populated from 2019 on)

P4108 - Activity Rate by vo

P2137 - Population in powiat

In [564]:
p4113 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/lfs_vo_employ_rate_p4113.csv", index_col=0)

p4113

Unnamed: 0,code,voivodeship,year,age,rate
0,200000,DOLNOŚLĄSKIE,2010,total,
1,400000,KUJAWSKO-POMORSKIE,2010,total,
2,600000,LUBELSKIE,2010,total,
3,800000,LUBUSKIE,2010,total,
4,1000000,ŁÓDZKIE,2010,total,
...,...,...,...,...,...
3115,2400000,ŚLĄSKIE,2024,60-89,12.4
3116,2600000,ŚWIĘTOKRZYSKIE,2024,60-89,12.6
3117,2800000,WARMIŃSKO-MAZURSKIE,2024,60-89,14.8
3118,3000000,WIELKOPOLSKIE,2024,60-89,14.5


In [565]:
p4113[(p4113["voivodeship"]=="DOLNOŚLĄSKIE") & (p4113["age"].isin(["total", "18-59/64"]))]

Unnamed: 0,code,voivodeship,year,age,rate
0,200000,DOLNOŚLĄSKIE,2010,total,
16,200000,DOLNOŚLĄSKIE,2011,total,
32,200000,DOLNOŚLĄSKIE,2012,total,
48,200000,DOLNOŚLĄSKIE,2013,total,
64,200000,DOLNOŚLĄSKIE,2014,total,
80,200000,DOLNOŚLĄSKIE,2015,total,
96,200000,DOLNOŚLĄSKIE,2016,total,
112,200000,DOLNOŚLĄSKIE,2017,total,
128,200000,DOLNOŚLĄSKIE,2018,total,
144,200000,DOLNOŚLĄSKIE,2019,total,56.6


In [566]:
a = p4113.groupby(["year", "age"]).count()
a[a["rate"]>0].head(40)

Unnamed: 0_level_0,Unnamed: 1_level_0,code,voivodeship,rate
year,age,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010,15-24,16,16,16
2010,18-59/64,16,16,16
2010,50-89,16,16,16
2011,15-24,16,16,16
2011,18-59/64,16,16,16
2011,50-89,16,16,16
2012,15-24,16,16,16
2012,18-59/64,16,16,16
2012,50-89,16,16,16
2013,15-24,16,16,16


In [567]:
p4108 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/lfs_vo_activity_rate_p4108.csv", index_col=0)

p4108

Unnamed: 0,code,voivodeship,year,age,rate
0,200000,DOLNOŚLĄSKIE,2010,total,53.9
1,400000,KUJAWSKO-POMORSKIE,2010,total,53.8
2,600000,LUBELSKIE,2010,total,52.9
3,800000,LUBUSKIE,2010,total,55.2
4,1000000,ŁÓDZKIE,2010,total,54.6
...,...,...,...,...,...
1915,2400000,ŚLĄSKIE,2024,50-89,33.1
1916,2600000,ŚWIĘTOKRZYSKIE,2024,50-89,35.7
1917,2800000,WARMIŃSKO-MAZURSKIE,2024,50-89,36.0
1918,3000000,WIELKOPOLSKIE,2024,50-89,37.5


In [568]:
a = p4108.groupby(["year", "age"]).count()
a[a["rate"]>0].head(40)

Unnamed: 0_level_0,Unnamed: 1_level_0,code,voivodeship,rate
year,age,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010,15-24,16,16,16
2010,18-59/64,16,16,16
2010,50-89,16,16,16
2010,total,16,16,16
2011,15-24,16,16,16
2011,18-59/64,16,16,16
2011,50-89,16,16,16
2011,total,16,16,16
2012,15-24,16,16,16
2012,18-59/64,16,16,16


Option A - vo employment rate (18-59/64) x p2137 population aged 18-64 (constructed)

- Could insert analysis here on how much of the population we are missing under this measure
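A rough sketch of the coverage analysis mentioned above, assuming the 18-64 population is built by keeping 2/5 of the 15-19 group (an even spread across single ages). The figures are the powiat 201 values that appear in this notebook's outputs; the column names `pop_15_plus`, `pop_18_64` and `share_not_covered` are hypothetical.

```python
import pandas as pd

# Linear split: 18-19 is two of the five single ages in 15-19.
count_15_19 = 7577.0                 # powiat 201, 1995, from the p2137 output
count_18_19 = count_15_19 * 2 / 5

# Powiat 201 populations as they appear in the option 6 table of this notebook.
pop = pd.DataFrame({
    "county_code": [201, 201],
    "year": [2010, 2011],
    "pop_15_plus": [76519.0, 76691.0],  # all ages 15+
    "pop_18_64": [62083.0, 62156.4],    # constructed 18-64 population
})

# Share of the 15+ population that the 18-64 measure leaves out.
pop["share_not_covered"] = 1 - pop["pop_18_64"] / pop["pop_15_plus"]

print(count_18_19)
print(pop[["year", "share_not_covered"]])
```

For this powiat the 18-64 measure leaves out a little under a fifth of the 15+ population in both years.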

In [569]:
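# P2137
# Keep the age groups covering 15-64 (15-19 is scaled down to 18-19 below)
# .copy() so the in-place scaling below does not write to a view of p2137
ptot_option5a = p2137[p2137["age_group"].isin(['15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45-49',
       '50-54', '55-59', '60-64'])].copy()

# Linear approximation: take 2/5 of the 15-19 group as the 18-19 population
ptot_option5a.loc[ptot_option5a["age_group"]=="15-19", "count"] *= 2/5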

# Group to get population
ptot_option5a = ptot_option5a.groupby(["county_code", "county_kts", "county_name", "year"])["count"].sum().reset_index()

# Convert kts to string for easier matching
ptot_option5a["county_kts"] = ptot_option5a["county_kts"].apply(lambda x: str(x)[:-2])

# Construct voivodeship matching key
ptot_option5a["voivodeship"] = ptot_option5a["county_kts"].apply(lambda x: int(x[4:6]))

# P4113 - EMPLOYMENT RATE
# Construct voi matching key
p4113["voivodeship"] = p4113["code"].apply(lambda x: int(str(x)[:-5]))

# Merge
ptot_option5a = ptot_option5a.merge(
       p4113[p4113["age"]=="18-59/64"],
       how="left",
       on=["year", "voivodeship"]
)

# Drop years with no merged data
ptot_option5a.dropna(subset=["code"], inplace=True)

# Calculate employed (18-64)
ptot_option5a["employed_18-64"] = ptot_option5a["count"].mul(ptot_option5a["rate"]).div(100).round()

# Save copy for option 6
ptot_option6a = ptot_option5a.copy()

# Filter down columns
ptot_option5a = ptot_option5a[["county_code", "year", "employed_18-64"]]
ptot_option5a

Unnamed: 0,county_code,year,employed_18-64
15,201.0,2010,38802.0
16,201.0,2011,39159.0
17,201.0,2012,39143.0
18,201.0,2013,39290.0
19,201.0,2014,40936.0
...,...,...,...
11395,3263.0,2020,18426.0
11396,3263.0,2021,18423.0
11397,3263.0,2022,18431.0
11398,3263.0,2023,18407.0


Options B and C - vo rate (total) x p2137 population (15+ total)

In [570]:
# Start with total p2137 measure
# Convert county kts to vo code
ptot_option5bc = ptot_option2.copy()
ptot_option5bc["voivodeship"] = ptot_option5bc["county_kts"].apply(lambda x: (int(str(x)[4:6])))

# Now merge with p4113
ptot_option5bc = ptot_option5bc.merge(
    p4113[p4113["age"]=="total"],
    how="left",
    on=["voivodeship", "year"]
)

ptot_option5bc = ptot_option5bc.rename(
    columns={"rate": "employ_rate"}
)

# Now merge with p4108 - activity rate
p4108["voivodeship"] = p4108["code"].apply(lambda x: int(str(x)[:-5]))
ptot_option5bc = ptot_option5bc.merge(
    p4108[p4108["age"]=="total"],
    how="left",
    on=["voivodeship", "year"]
)

ptot_option5bc = ptot_option5bc.rename(
    columns={"rate": "activity_rate"}
)

# Drop non merged rows
ptot_option5bc.dropna(subset=["code_x", "code_y"], inplace=True)

# Construct outcome vars
ptot_option5bc["employed_total"] = ptot_option5bc["count"].mul(ptot_option5bc["employ_rate"]).div(100).round()
ptot_option5bc["active_total"] = ptot_option5bc["count"].mul(ptot_option5bc["activity_rate"]).div(100).round()

# Save copy for later
ptot_option6bc = ptot_option5bc.copy()

# Filter to relevant cols
ptot_option5bc = ptot_option5bc[["county_code", "county_kts", "year", "employed_total", "active_total"]]

ptot_option5bc

Unnamed: 0,county_code,county_kts,year,employed_total,active_total
15,201.0,1.003021e+13,2010,,41244.0
16,201.0,1.003021e+13,2011,,40876.0
17,201.0,1.003021e+13,2012,,41287.0
18,201.0,1.003021e+13,2013,,41477.0
19,201.0,1.003021e+13,2014,,42427.0
...,...,...,...,...,...
11395,3263.0,1.002322e+13,2020,18823.0,19460.0
11396,3263.0,1.002322e+13,2021,18878.0,19578.0
11397,3263.0,1.002322e+13,2022,19214.0,19664.0
11398,3263.0,1.002322e+13,2023,19218.0,19664.0


OPTION 6:
- Takes option 5
- And uses the 2021 NC powiat rate (employment / activity) (from option 4 full)
- To adjust the yearly vo rate for powiat variation
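The adjustment in the list above amounts to a per-powiat scaling factor: the 2021 NC powiat rate divided by the 2021 vo rate, applied to the vo rate in every year. A minimal sketch for powiat 3263 follows. The populations and vo rates are taken from the output further down; the NC rate of 53.8 is an illustrative value consistent with the factor shown there, and `yearly`, `nc_2021`, `base` and `out` are hypothetical names.

```python
import pandas as pd

# Yearly 15+ population and vo employment rate (percent) for powiat 3263.
yearly = pd.DataFrame({
    "county_code": [3263, 3263, 3263],
    "year": [2020, 2021, 2022],
    "pop": [35382.0, 34960.0, 34619.0],
    "vo_employ_rate": [53.2, 54.0, 55.5],
})

# 2021 census (NC) employment rate for the powiat (illustrative value).
nc_2021 = pd.DataFrame({"county_code": [3263], "employment rate": [53.8]})

# Factor = powiat rate (NC 2021) / vo rate (2021); one value per powiat.
base = yearly.loc[yearly["year"] == 2021, ["county_code", "vo_employ_rate"]].merge(
    nc_2021, how="left", on="county_code"
)
base["vo_e_factor"] = base["employment rate"] / base["vo_employ_rate"]

# Scale the yearly vo rate by the powiat factor, then apply it to population.
out = yearly.merge(base[["county_code", "vo_e_factor"]], how="left", on="county_code")
out["employed"] = (out["pop"] * out["vo_employ_rate"] * out["vo_e_factor"] / 100).round()

print(out[["year", "vo_e_factor", "employed"]])
```

By construction the 2021 estimate equals population times the NC powiat rate, so options 4 and 6 coincide in 2021 and differ only through the time path of the vo rate.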


In [571]:
# First merge option 5s into one table for easier analysis
ptot_option6a = ptot_option6a.rename(
    columns={
        "count": "pop_18-64",
        "rate": "vo_employ_rate_18-64"
    }
).loc[:, ["county_code", "county_kts", "year", "pop_18-64", "vo_employ_rate_18-64"]]

ptot_option6bc = ptot_option6bc.rename(
    columns={
        "count": "pop",
        "employ_rate": "vo_employ_rate",
        "activity_rate": "vo_activity_rate"
    }
).loc[:, ["county_code", "year", "pop", "vo_employ_rate", "vo_activity_rate"]]

ptot_option6 = ptot_option6a.merge(
    ptot_option6bc,
    how="left",
    on=["county_code", "year"]
)

# Now we have to merge in the 2021 powiat rate
# First create a helper table:
# powiat -> ratio of the powiat rate (NC 2021) to the vo rate (2021)
ptot_option6_helper = ptot_option6[ptot_option6["year"]==2021][["county_code", "vo_employ_rate_18-64", "vo_employ_rate", "vo_activity_rate"]]
ptot_option4_full = ptot_option4_full[ptot_option4_full["year"]==2021]
ptot_option6_helper = ptot_option6_helper.merge(
    ptot_option4_full[["county_code", "employment rate", "activity rate"]],
    how="left",
    on="county_code"
)

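# NOTE: the 18-64 factor divides the NC 2021 powiat rate (ages 15+) by the vo 18-59/64 rate,
# so it also absorbs the difference in age coverage (hence factors well below 1)
ptot_option6_helper["vo_e_18-64_factor"] = ptot_option6_helper["employment rate"].div(ptot_option6_helper["vo_employ_rate_18-64"])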
ptot_option6_helper["vo_e_factor"] = ptot_option6_helper["employment rate"].div(ptot_option6_helper["vo_employ_rate"])
ptot_option6_helper["vo_a_factor"] = ptot_option6_helper["activity rate"].div(ptot_option6_helper["vo_activity_rate"])

# Merge back onto option 6
ptot_option6 = ptot_option6.merge(
    ptot_option6_helper[["county_code", "vo_e_18-64_factor", "vo_e_factor", "vo_a_factor"]],
    how="left",
    on="county_code"
)

# Construct outcomes
ptot_option6["employed_18-64"] = ptot_option6["pop_18-64"].mul(ptot_option6["vo_employ_rate_18-64"]).mul(ptot_option6["vo_e_18-64_factor"]).div(100).round()
ptot_option6["employed"] = ptot_option6["pop"].mul(ptot_option6["vo_employ_rate"]).mul(ptot_option6["vo_e_factor"]).div(100).round()
ptot_option6["active"] = ptot_option6["pop"].mul(ptot_option6["vo_activity_rate"]).mul(ptot_option6["vo_a_factor"]).div(100).round()

ptot_option6

Unnamed: 0,county_code,county_kts,year,pop_18-64,vo_employ_rate_18-64,pop,vo_employ_rate,vo_activity_rate,vo_e_18-64_factor,vo_e_factor,vo_a_factor,employed_18-64,employed,active
0,201.0,10030210101000,2010,62083.0,62.5,76519.0,,53.9,0.71867,0.985965,0.983165,27886.0,,40549.0
1,201.0,10030210101000,2011,62156.4,63.0,76691.0,,53.3,0.71867,0.985965,0.983165,28142.0,,40188.0
2,201.0,10030210101000,2012,61838.0,63.3,76742.0,,53.8,0.71867,0.985965,0.983165,28131.0,,40592.0
3,201.0,10030210101000,2013,61487.2,63.9,76810.0,,54.0,0.71867,0.985965,0.983165,28237.0,,40779.0
4,201.0,10030210101000,2014,61097.8,67.0,76861.0,,55.2,0.71867,0.985965,0.983165,29419.0,,41713.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5695,3263.0,10023216663000,2020,25344.6,72.7,35382.0,53.2,55.0,0.72118,0.996296,1.008929,13288.0,18754.0,19634.0
5696,3263.0,10023216663000,2021,24695.8,74.6,34960.0,54.0,56.0,0.72118,0.996296,1.008929,13286.0,18808.0,19752.0
5697,3263.0,10023216663000,2022,24124.2,76.4,34619.0,55.5,56.8,0.72118,0.996296,1.008929,13292.0,19142.0,19839.0
5698,3263.0,10023216663000,2023,23508.4,78.3,34317.0,56.0,57.3,0.72118,0.996296,1.008929,13275.0,19146.0,19839.0


In [572]:
# Filter to relevant outcomes
ptot_option6 = ptot_option6[["county_code", "year", "employed_18-64", "employed", "active"]]
ptot_option6

Unnamed: 0,county_code,year,employed_18-64,employed,active
0,201.0,2010,27886.0,,40549.0
1,201.0,2011,28142.0,,40188.0
2,201.0,2012,28131.0,,40592.0
3,201.0,2013,28237.0,,40779.0
4,201.0,2014,29419.0,,41713.0
...,...,...,...,...,...
5695,3263.0,2020,13288.0,18754.0,19634.0
5696,3263.0,2021,13286.0,18808.0,19752.0
5697,3263.0,2022,13292.0,19142.0,19839.0
5698,3263.0,2023,13275.0,19146.0,19839.0


Now merge the options together to get final table of population by powiat

In [573]:
# option 1
ptot = ptot_option1.copy().set_index(["county_code"])[["county_kts", "county_name", "count"]].rename(columns={"count": "tp1_nc_pop"})

# option 2
ptot_merge = ptot_option2.copy().set_index(["county_code", "year"])[["count"]].rename(columns={"count": "tp2_yr_pop"})
ptot = ptot.join(ptot_merge)

# option 3
ptot_merge = ptot_option3.copy().set_index(["county_code"])[["economically active population", "employed"]].rename(
    columns={"economically active population": "tp3b_nc_active", 
             "employed": "tp3a_nc_employed"}
)
ptot = ptot.join(ptot_merge)

# option 4
ptot_merge = ptot_option4.copy().set_index(["county_code", "year"])[["employed", "active"]].rename(columns={"employed": "tp4a_employed", "active": "tp4b_active"})
ptot = ptot.join(ptot_merge)

# option 5a
ptot_merge = ptot_option5a.copy().set_index(["county_code", "year"])[["employed_18-64"]].rename(columns={"employed_18-64":"tp5a_employed_18-64"})
ptot = ptot.join(ptot_merge)

# option 5b,c
ptot_merge = ptot_option5bc.copy().set_index(["county_code", "year"])[["employed_total","active_total"]].rename(columns={"employed_total":"tp5c_employed", "active_total":"tp5b_active"})
ptot = ptot.join(ptot_merge)

# option 6
ptot_merge = ptot_option6.copy().set_index(["county_code", "year"])[["employed_18-64", "employed", "active"]].rename(
    columns={"employed_18-64": "tp6a_employed_18-64", "employed": "tp6c_employed", "active": "tp6b_active"}
)
ptot = ptot.join(ptot_merge)
ptot = ptot.reset_index()

# Save
ptot.to_csv(repo_root / "cleaned/03_01_outcome_tables/population_powiat.csv")
ptot

Unnamed: 0,county_code,year,county_kts,county_name,tp1_nc_pop,tp2_yr_pop,tp3b_nc_active,tp3a_nc_employed,tp4a_employed,tp4b_active,tp5a_employed_18-64,tp5c_employed,tp5b_active,tp6a_employed_18-64,tp6c_employed,tp6b_active
0,201.0,1995,10030210101000,Powiat bolesławiecki,76739,63763.0,40280.0,38792.0,35835.0,37238.0,,,,,,
1,201.0,1996,10030210101000,Powiat bolesławiecki,76739,64107.0,40280.0,38792.0,36028.0,37438.0,,,,,,
2,201.0,1997,10030210101000,Powiat bolesławiecki,76739,64719.0,40280.0,38792.0,36372.0,37796.0,,,,,,
3,201.0,1998,10030210101000,Powiat bolesławiecki,76739,65261.0,40280.0,38792.0,36677.0,38112.0,,,,,,
4,201.0,1999,10030210101000,Powiat bolesławiecki,76739,63818.0,40280.0,38792.0,35866.0,37270.0,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11395,3263.0,2020,10023216663000,Powiat m. Świnoujście,35988,35382.0,17737.0,16905.0,19036.0,19991.0,18426.0,18823.0,19460.0,13288.0,18754.0,19634.0
11396,3263.0,2021,10023216663000,Powiat m. Świnoujście,35988,34960.0,17737.0,16905.0,18808.0,19752.0,18423.0,18878.0,19578.0,13286.0,18808.0,19752.0
11397,3263.0,2022,10023216663000,Powiat m. Świnoujście,35988,34619.0,17737.0,16905.0,18625.0,19560.0,18431.0,19214.0,19664.0,13292.0,19142.0,19839.0
11398,3263.0,2023,10023216663000,Powiat m. Świnoujście,35988,34317.0,17737.0,16905.0,18463.0,19389.0,18407.0,19218.0,19664.0,13275.0,19146.0,19839.0


Quick analysis of these measures:

In [574]:
ignore_cols = ["county_code", "year", "county_kts", "county_name"]
measure_cols = [c for c in ptot.columns if c not in ignore_cols]

ptot[measure_cols].corr()

Unnamed: 0,tp1_nc_pop,tp2_yr_pop,tp3b_nc_active,tp3a_nc_employed,tp4a_employed,tp4b_active,tp5a_employed_18-64,tp5c_employed,tp5b_active,tp6a_employed_18-64,tp6c_employed,tp6b_active
tp1_nc_pop,1.0,0.92013,0.997617,0.997063,0.922899,0.922901,0.994155,0.997535,0.9958,0.993719,0.996924,0.996705
tp2_yr_pop,0.92013,1.0,0.90964,0.908336,0.997193,0.997842,0.995736,0.997929,0.997395,0.993629,0.997441,0.997211
tp3b_nc_active,0.997617,0.90964,1.0,0.999944,0.917242,0.916667,0.992511,0.997816,0.99369,0.994887,0.998995,0.996235
tp3a_nc_employed,0.997063,0.908336,0.999944,1.0,0.916534,0.915831,0.992059,0.997548,0.99315,0.994799,0.998985,0.995898
tp4a_employed,0.922899,0.997193,0.917242,0.916534,1.0,0.999925,0.996281,0.998154,0.997292,0.997319,0.999626,0.999043
tp4b_active,0.922901,0.997842,0.916667,0.915831,0.999925,1.0,0.99647,0.998362,0.997577,0.997108,0.999576,0.999108
tp5a_employed_18-64,0.994155,0.995736,0.992511,0.992059,0.996281,0.99647,1.0,0.999494,0.999035,0.998242,0.999048,0.99843
tp5c_employed,0.997535,0.997929,0.997816,0.997548,0.998154,0.998362,0.999494,1.0,0.999965,0.997037,0.998506,0.998767
tp5b_active,0.9958,0.997395,0.99369,0.99315,0.997292,0.997577,0.999035,0.999965,1.0,0.9966,0.998486,0.998722
tp6a_employed_18-64,0.993719,0.993629,0.994887,0.994799,0.997319,0.997108,0.998242,0.997037,0.9966,1.0,0.999456,0.9988


****SECTION 3 - POPULATION BY POWIAT SEX-AGE:****

**Option 1 - NC 2021 Sex-age (as in Barcelona):**

P4181 - Population by powiat sex and single age

In [575]:
# Re read p4181 (we dropped sex earlier)
p4181 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_nc_sex_age_p4181.csv", index_col=0)
p4181_backup = p4181.copy()

# drop total sex
p4181 = p4181[~(p4181["sex"]=="total")]
# drop total age (copy so the column assignments below do not write to a view)
p4181 = p4181[~(p4181["age"]=="total")].copy()

# bins & labels for age grouping
bins   = [0, 25, 35, 45, 55, float("inf")]  
labels = ['under 25 years', '25-34', '35-44', '45-54', '55 and more']

# Convert 90+ to 90 and age column to numeric for grouping
p4181["age"] = pd.to_numeric(p4181["age"].replace("90 and more", 90))
# set minimum age (copy so age_group can be added without a SettingWithCopyWarning)
p4181 = p4181[p4181["age"]>=15].copy()

p4181["age_group"] = pd.cut(
    p4181["age"],
    bins=bins,
    labels=labels,
    right=False,          # left-closed, right-open: [25,35) means 25–34
    include_lowest=True
)

# Now create the population table required
psa_option1 = p4181.groupby(["code", "year", "sex", "age_group"], observed=False)["count"].sum().reset_index()

# Create county_code for merging
psa_option1["county_code"] = psa_option1["code"].apply(lambda x: int(str(x)[:-3]))
psa_option1.drop(columns=["code"], inplace=True)
psa_option1.rename(columns={"count": "psa1_nc_pop"}, inplace=True)

psa_option1

Unnamed: 0,year,sex,age_group,psa1_nc_pop,county_code
0,2021,females,under 25 years,4001,201
1,2021,females,25-34,5751,201
2,2021,females,35-44,7153,201
3,2021,females,45-54,5805,201
4,2021,females,55 and more,16198,201
...,...,...,...,...,...
3795,2021,males,under 25 years,1489,3263
3796,2021,males,25-34,2464,3263
3797,2021,males,35-44,3441,3263
3798,2021,males,45-54,2795,3263


**Option 2 - Yearly population series**

P2137 - population by sex and age group

Bucket these into the RU groups
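The bucketing can be sketched on a few labels, assuming the same `bins` and `labels` as the census cell above. Each five-year label is reduced to its start age, and `pd.cut` with `right=False` assigns it to a left-closed interval. `five_year`, `start_age` and `mapping` are hypothetical names.

```python
import pandas as pd

bins = [0, 25, 35, 45, 55, float("inf")]
labels = ["under 25 years", "25-34", "35-44", "45-54", "55 and more"]

# A handful of five-year labels as they appear in p2137.
five_year = pd.Series(["15-19", "20-24", "25-29", "50-54", "55-59", "80-84"])

# Start age of each band: "25-29" -> 25.
start_age = five_year.str.split("-").str[0].astype(int)

# right=False makes intervals left-closed: [25, 35) holds start ages 25 and 30.
grouped = pd.cut(start_age, bins=bins, labels=labels, right=False)

mapping = dict(zip(five_year, grouped.astype(str)))
print(mapping)
```

This works because every five-year band sits entirely inside one target bucket, so its start age is enough to place it. Open-ended labels such as "85 and more" have no hyphen and need handling before the integer conversion.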

In [576]:
# Re read (earlier dropped total)
p2137 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_yr_sex_agegr_p2137.csv", index_col=0)

# drop total sex
p2137 = p2137[~(p2137["sex"]=="total")]

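# drop unwanted age groups
# NOTE: "85 and more" is dropped along with "70 and more", so the "55 and more" bucket below only covers ages 55-84
age_filter = p2137["age_group"].isin(["total", "70 and more", "85 and more", "0-4", "5-9", "10-14", "0-14"])
p2137 = p2137[~age_filter].copy()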

# now want to map these to groups
p2137["age"] = p2137["age_group"].apply(lambda x: int(x.split("-")[0]))

# p2137[["age", "age_group"]].value_counts() # uncomment to see these map properly

# using the same bins and labels as above
p2137["new_age_group"] = pd.cut(
    p2137["age"],
    bins=bins,
    labels=labels,
    right=False,          # left-closed, right-open: [25,35) means 25–34
    include_lowest=True
)

# p2137[["new_age_group", "age", "age_group"]].value_counts() # uncomment to see these map properly

# Convert to required population table form
psa_option2 = p2137.groupby(["code", "year", "sex", "new_age_group"], observed=False)["count"].sum().reset_index()

# rename column
psa_option2 = psa_option2.rename(columns={"new_age_group": "age_group"})

# create county_code col for merging
psa_option2["county_code"] = psa_option2["code"].apply(lambda x: int(str(x)[:-3]))
# Sums over missing counts come out as 0: treat these as missing and drop them
# Then, as earlier, convert 263 -> 265 and drop 1431
psa_option2["count"] = psa_option2["count"].replace(0, np.nan)
psa_option2.dropna(subset=["count"], inplace=True)
psa_option2["county_code"] = psa_option2["county_code"].replace(263, 265)
psa_option2 = psa_option2[~(psa_option2["county_code"]==1431)].copy()

# Drop original code and rename cols
psa_option2.drop(columns=["code"], inplace=True)
psa_option2.rename(columns={"count": "psa2_yr_pop"}, inplace=True)

# psa_option2[(psa_option2["county_code"]==265) & (psa_option2["year"]==2021)]
psa_option2

Unnamed: 0,year,sex,age_group,psa2_yr_pop,county_code
0,1995,females,under 25 years,6879.0,201
1,1995,females,25-34,5740.0,201
2,1995,females,35-44,7825.0,201
3,1995,females,45-54,5051.0,201
4,1995,females,55 and more,6775.0,201
...,...,...,...,...,...
114595,2024,males,under 25 years,1556.0,3263
114596,2024,males,25-34,1898.0,3263
114597,2024,males,35-44,3070.0,3263
114598,2024,males,45-54,3132.0,3263


**Option 3a - Yearly Population x NC 2021 powiat sex-age employment rate:**

In [577]:
p4407 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/emp_nc_sex_age_p4407.csv", index_col=0)

# remove sex total
p4407 = p4407[~(p4407["sex"]=="total")]
p4407

Unnamed: 0,code,powiat,year,sex,age,count
2660,201000,Powiat bolesławiecki,2021,males,total,20869
2661,202000,Powiat dzierżoniowski,2021,males,total,21238
2662,203000,Powiat głogowski,2021,males,total,19946
2663,204000,Powiat górowski,2021,males,total,7834
2664,205000,Powiat jaworski,2021,males,total,11289
...,...,...,...,...,...,...
7975,3217000,Powiat wałecki,2021,females,65 and more,287
7976,3218000,Powiat łobeski,2021,females,65 and more,184
7977,3261000,City with powiat status Koszalin,2021,females,65 and more,1281
7978,3262000,City with powiat status Szczecin,2021,females,65 and more,4510


In [578]:
p4407.age.value_counts()

age
total          760
15-24          760
25-34          760
35-44          760
45-54          760
55-64          760
65 and more    760
Name: count, dtype: int64

In [579]:
# remove age total and then bin by unemployment bins
p4407 = p4407[~(p4407["age"]=="total")].copy()

# Convert age column to start age - then bin as before
p4407["age"] = p4407["age"].replace("65 and more", "65")
p4407["age_start"] = p4407["age"].apply(lambda x: int(x.split("-")[0]))

# using the same bins and labels as above
p4407["age_group"] = pd.cut(
    p4407["age_start"],
    bins=bins,
    labels=labels,
    right=False,          # left-closed, right-open: [25,35) means 25–34
    include_lowest=True
)

# p4407[["age", "age_start", "age_group"]].value_counts() # uncomment to see these map properly

# Create county_code
p4407["county_code"] = p4407["code"].apply(lambda x: int(str(x)[:-3]))

# Convert to required output format
psa_option3_temp = p4407.groupby(["county_code", "year", "sex", "age_group"], observed=False)["count"].sum().reset_index()
# psa_option3.county_code.value_counts()
psa_option3_temp.rename(columns={"count":"employed"}, inplace=True)

psa_option3_temp

Unnamed: 0,county_code,year,sex,age_group,employed
0,201,2021,females,under 25 years,1431
1,201,2021,females,25-34,4012
2,201,2021,females,35-44,4946
3,201,2021,females,45-54,4252
4,201,2021,females,55 and more,3282
...,...,...,...,...,...
3795,3263,2021,males,under 25 years,460
3796,3263,2021,males,25-34,1737
3797,3263,2021,males,35-44,2267
3798,3263,2021,males,45-54,1841
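
As a small check of the binning step, the sketch below applies `pd.cut` with left-closed bins to the six census age bands. The names `bins_demo` and `labels_demo` are assumptions: the real `bins` and `labels` are defined earlier in the notebook, and the values here are only inferred from the age groups that appear in the outputs.

```python
import numpy as np
import pandas as pd

# Assumed values: inferred from the age groups shown in the outputs,
# not copied from the notebook's own `bins` / `labels`.
bins_demo = [0, 25, 35, 45, 55, np.inf]
labels_demo = ["under 25 years", "25-34", "35-44", "45-54", "55 and more"]

ages = pd.Series(["15-24", "25-34", "35-44", "45-54", "55-64", "65 and more"])

# Take the lower bound of each band, treating "65 and more" as starting at 65
age_start = ages.replace("65 and more", "65").apply(lambda x: int(x.split("-")[0]))

# right=False makes each bin left-closed, so 25 falls in [25, 35) -> "25-34"
age_group = pd.cut(age_start, bins=bins_demo, labels=labels_demo, right=False)

mapping = dict(zip(ages, age_group.astype(str)))
print(mapping)
```

Under these assumed bins, both `55-64` and `65 and more` collapse into `55 and more`, which is why the counts are summed in the `groupby` after binning.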


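We now have employed counts by powiat, year, sex and age group. What do we want to do with this? Construct a 2021 employment rate for each powiat-sex-age cell by dividing the employed count by the 2021 population, then multiply this rate by psa2 (the yearly population series). Note: the intended denominator was option 1 (population 15+, psa1), but the cell below takes it from the 2021 rows of psa2 (`psa2_yr_pop`).  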
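
A minimal worked sketch of the two steps, using a few rows for county 201 (females) taken from this notebook's outputs. It assumes the 2021 row of the yearly series is the denominator; the frame names `employed_2021`, `pop` and `out` are illustrative only.

```python
import pandas as pd

# Toy rows for county 201, females (values taken from the notebook outputs)
employed_2021 = pd.DataFrame({
    "county_code": [201, 201],
    "sex": ["females", "females"],
    "age_group": ["under 25 years", "25-34"],
    "employed": [1431, 4012],
})
pop = pd.DataFrame({
    "county_code": [201] * 4,
    "year": [1995, 1995, 2021, 2021],
    "sex": ["females"] * 4,
    "age_group": ["under 25 years", "25-34"] * 2,
    "psa2_yr_pop": [6879.0, 5740.0, 4051.0, 5577.0],
})
keys = ["county_code", "sex", "age_group"]

# Step 1: 2021 employment rate = census employed / 2021 population
rate = employed_2021.merge(pop[pop["year"] == 2021], on=keys, how="left")
rate["po_emp_rate"] = rate["employed"] / rate["psa2_yr_pop"]

# Step 2: hold that rate fixed and apply it to every year's population
out = pop.merge(rate[keys + ["po_emp_rate"]], on=keys, how="left")
out["psa3_employed"] = (out["psa2_yr_pop"] * out["po_emp_rate"]).round()
print(out[["year", "age_group", "psa2_yr_pop", "po_emp_rate", "psa3_employed"]])
```

By construction the 2021 rows reproduce the census employed counts, so any movement in the series over time comes from population change only.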

In [580]:
psa_option3_temp = psa_option3_temp.merge(
    psa_option2[psa_option2["year"]==2021],
    how="left",
    on=["county_code", "year", "age_group", "sex"]
)

psa_option3_temp["po_emp_rate"] = psa_option3_temp["employed"].div(psa_option3_temp["psa2_yr_pop"])

psa_option3_temp

Unnamed: 0,county_code,year,sex,age_group,employed,psa2_yr_pop,po_emp_rate
0,201,2021,females,under 25 years,1431,4051.0,0.353246
1,201,2021,females,25-34,4012,5577.0,0.719383
2,201,2021,females,35-44,4946,7133.0,0.693397
3,201,2021,females,45-54,4252,5967.0,0.712586
4,201,2021,females,55 and more,3282,14914.0,0.220062
...,...,...,...,...,...,...,...
3795,3263,2021,males,under 25 years,460,1482.0,0.310391
3796,3263,2021,males,25-34,1737,2363.0,0.735083
3797,3263,2021,males,35-44,2267,3394.0,0.667943
3798,3263,2021,males,45-54,1841,2859.0,0.643931


In [581]:
# Now we merge back onto yearly population (psa2) to create psa3
psa_option3 = psa_option2.merge(
    psa_option3_temp[["county_code", "sex", "age_group", "po_emp_rate"]],
    how="left",
    on=["county_code", "sex", "age_group"]
)

In [582]:
# Now create psa_option 3
psa_option3["psa3_employed"] = psa_option3["psa2_yr_pop"].mul(psa_option3["po_emp_rate"]).round()
psa_option3

Unnamed: 0,year,sex,age_group,psa2_yr_pop,county_code,po_emp_rate,psa3_employed
0,1995,females,under 25 years,6879.0,201,0.353246,2430.0
1,1995,females,25-34,5740.0,201,0.719383,4129.0
2,1995,females,35-44,7825.0,201,0.693397,5426.0
3,1995,females,45-54,5051.0,201,0.712586,3599.0
4,1995,females,55 and more,6775.0,201,0.220062,1491.0
...,...,...,...,...,...,...,...
113335,2024,males,under 25 years,1556.0,3263,0.310391,483.0
113336,2024,males,25-34,1898.0,3263,0.735083,1395.0
113337,2024,males,35-44,3070.0,3263,0.667943,2051.0
113338,2024,males,45-54,3132.0,3263,0.643931,2017.0


In [583]:
# Select only relevant columns
psa_option3 = psa_option3[["county_code", "year", "sex", "age_group", "psa3_employed"]]
psa_option3

Unnamed: 0,county_code,year,sex,age_group,psa3_employed
0,201,1995,females,under 25 years,2430.0
1,201,1995,females,25-34,4129.0
2,201,1995,females,35-44,5426.0
3,201,1995,females,45-54,3599.0
4,201,1995,females,55 and more,1491.0
...,...,...,...,...,...
113335,3263,2024,males,under 25 years,483.0
113336,3263,2024,males,25-34,1395.0
113337,3263,2024,males,35-44,2051.0
113338,3263,2024,males,45-54,2017.0


**Option 3b - Scaled to match powiat-sex economically active in NC 2021**

Uses p4292
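
A minimal sketch of the scaling idea, assuming one factor per powiat-sex cell computed in 2021 and applied to every year and age group. The numbers are those for county 201 (females) in this notebook's outputs; the frame names `totals_2021`, `yearly` and `scaled` are illustrative only.

```python
import pandas as pd

# 2021 totals for county 201, females
totals_2021 = pd.DataFrame({
    "county_code": [201],
    "sex": ["females"],
    "psa3_employed": [17923.0],  # option 3a summed over age groups in 2021
    "active_pop": [18598.0],     # census economically active population
})

# One scaling factor per powiat-sex cell, so the 2021 totals match
totals_2021["scale"] = totals_2021["active_pop"] / totals_2021["psa3_employed"]

yearly = pd.DataFrame({
    "county_code": [201, 201],
    "year": [1995, 1995],
    "sex": ["females", "females"],
    "age_group": ["under 25 years", "35-44"],
    "psa3_employed": [2430.0, 5426.0],
})

# Apply the same factor to every year and age group of that powiat-sex cell
scaled = yearly.merge(
    totals_2021[["county_code", "sex", "scale"]],
    how="left",
    on=["county_code", "sex"],
)
scaled["psa3b_active"] = (scaled["psa3_employed"] * scaled["scale"]).round()
print(scaled)
```

Because the factor does not vary by age group, the age composition of option 3b is the same as that of option 3a.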

In [584]:
# Construct relevant dataset from p4292 (copy so that later assignments do not act on a slice)
p4292_fil = p4292.loc[p4292["sex"] != "total", ["code", "sex", "economically active population"]].copy()
p4292_fil = p4292_fil.rename(columns={"economically active population": "active_pop"})
# County code is the code without its last three digits
p4292_fil["county_code"] = p4292_fil["code"].apply(lambda x: int(str(x)[:-3]))
p4292_fil = p4292_fil.drop(columns=["code"])

# Take psa_option3 and create 2021 only totals by sex
psa_option3_temp2 = psa_option3[psa_option3["year"]==2021].groupby(["county_code", "sex"])["psa3_employed"].sum().reset_index()

# We want a scaling factor such that these two totals are equal
psa_option3_temp2 = psa_option3_temp2.merge(
    p4292_fil,
    how="left",
    on=["county_code", "sex"]
)

psa_option3_temp2["scale"] = psa_option3_temp2["active_pop"].div(psa_option3_temp2["psa3_employed"])
psa_option3_temp2

Unnamed: 0,county_code,sex,psa3_employed,active_pop,scale
0,201,females,17923.0,18598.0,1.037661
1,201,males,20869.0,21682.0,1.038957
2,202,females,18419.0,19229.0,1.043976
3,202,males,21238.0,22324.0,1.051135
4,203,females,16263.0,17168.0,1.055648
...,...,...,...,...,...
755,3261,males,23793.0,24995.0,1.050519
756,3262,females,88122.0,91046.0,1.033181
757,3262,males,91486.0,95621.0,1.045198
758,3263,females,8232.0,8556.0,1.039359


In [585]:
# Now we merge this onto psa_option3 and scale for psa_option 3b
psa_option3 = psa_option3.merge(
    psa_option3_temp2[["county_code", "sex", "scale"]],
    how="left",
    on=["county_code", "sex"]   
)

# Create active proxy
psa_option3["psa3b_active"] = psa_option3["psa3_employed"].mul(psa_option3["scale"]).round()
# Drop scale col and rename first to 3a
psa_option3.rename(columns={"psa3_employed": "psa3a_employed"}, inplace=True)
psa_option3.drop(columns=["scale"],inplace=True)
psa_option3

Unnamed: 0,county_code,year,sex,age_group,psa3a_employed,psa3b_active
0,201,1995,females,under 25 years,2430.0,2522.0
1,201,1995,females,25-34,4129.0,4285.0
2,201,1995,females,35-44,5426.0,5630.0
3,201,1995,females,45-54,3599.0,3735.0
4,201,1995,females,55 and more,1491.0,1547.0
...,...,...,...,...,...,...
113335,3263,2024,males,under 25 years,483.0,511.0
113336,3263,2024,males,25-34,1395.0,1477.0
113337,3263,2024,males,35-44,2051.0,2171.0
113338,3263,2024,males,45-54,2017.0,2135.0


**Option 4 - Vo rate (yearly) sex age x population yearly sex age:**

P2137 - Po population yearly. 

P4437 - Vo Activity rate by age and sex yearly -> (b)

P4112 - Vo Employment rate by age and sex yearly -> (a)

First (b)
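
A minimal sketch of the matching step, using a dictionary lookup in place of the numeric `matching_group` column built in the cells below (a swapped-in technique, not the notebook's own). The band choices follow the notebook's comments for the activity rate; `band_for_group` and the toy frames are illustrative names, and the values are the 2024 male rows for voivodeship 32 from the outputs.

```python
import pandas as pd

# Voivodeship-level age band standing in for each powiat age group
band_for_group = {
    "under 25 years": "15-24",
    "25-34": "20-64",
    "35-44": "20-64",
    "45-54": "20-64",
    "55 and more": "50-89",
}

pop = pd.DataFrame({
    "voivodeship_code": [32, 32],
    "year": [2024, 2024],
    "sex": ["males", "males"],
    "age_group": ["under 25 years", "25-34"],
    "psa2_yr_pop": [1556.0, 1898.0],
})
rates = pd.DataFrame({
    "voivodeship_code": [32, 32],
    "year": [2024, 2024],
    "sex": ["males", "males"],
    "age": ["15-24", "20-64"],
    "rate": [30.7, 86.7],  # activity rate in percent
})

# Look up the band for each row, then merge the rate on the shared keys
pop["age"] = pop["age_group"].map(band_for_group)
merged = pop.merge(rates, how="left", on=["voivodeship_code", "year", "sex", "age"])

# Rates are percentages, so divide by 100 before rounding to whole people
merged["psa4b_active"] = (merged["psa2_yr_pop"] * merged["rate"] / 100).round()
print(merged[["age_group", "age", "rate", "psa4b_active"]])
```

The three middle age groups all receive the same 20-64 rate, so any differences between them in option 4 come from population alone.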

In [586]:
# First build the base of option 4: p2137 (option 2) merged with county_codes, from which the voivodeship code is derived
psa_option4 = psa_option2.merge(
    county_codes,
    how="left",
    on="county_code"
)
psa_option4["voivodeship_code"] = psa_option4["county_kts"].apply(lambda x: int(str(x)[4:6]))

psa_option4

Unnamed: 0,year,sex,age_group,psa2_yr_pop,county_code,county_kts,county_name,voivodeship_code
0,1995,females,under 25 years,6879.0,201,10030210101000,Powiat bolesławiecki,2
1,1995,females,25-34,5740.0,201,10030210101000,Powiat bolesławiecki,2
2,1995,females,35-44,7825.0,201,10030210101000,Powiat bolesławiecki,2
3,1995,females,45-54,5051.0,201,10030210101000,Powiat bolesławiecki,2
4,1995,females,55 and more,6775.0,201,10030210101000,Powiat bolesławiecki,2
...,...,...,...,...,...,...,...,...
113335,2024,males,under 25 years,1556.0,3263,10023216663000,Powiat m. Świnoujście,32
113336,2024,males,25-34,1898.0,3263,10023216663000,Powiat m. Świnoujście,32
113337,2024,males,35-44,3070.0,3263,10023216663000,Powiat m. Świnoujście,32
113338,2024,males,45-54,3132.0,3263,10023216663000,Powiat m. Świnoujście,32


In [587]:
p4437 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/lfs_vo_activity_rate_sex_age_p4437.csv", index_col=0)

p4437 = p4437[~(p4437["sex"]=="total")].copy()  # copy so the new columns are not set on a slice
p4437["voivodeship_code"] = p4437["code"].apply(lambda x: int(str(x)[:-5]))

p4437.age.value_counts()

# Match 15-24 to under 25 years
# Match 20-64 to 25-34 through 45-54
# Match 50-89 to 55 and more

age
total       192
15-24       192
15-64       192
20-24       192
20-64       192
55-64       192
18-59/64    192
50-89       192
Name: count, dtype: int64

In [588]:
p4437

Unnamed: 0,code,voivodeship,year,sex,age,rate,voivodeship_code
96,200000,DOLNOŚLĄSKIE,2019,males,total,66.3,2
97,400000,KUJAWSKO-POMORSKIE,2019,males,total,64.8,4
98,600000,LUBELSKIE,2019,males,total,61.3,6
99,800000,LUBUSKIE,2019,males,total,64.6,8
100,1000000,ŁÓDZKIE,2019,males,total,64.2,10
...,...,...,...,...,...,...,...
2299,2400000,ŚLĄSKIE,2024,females,50-89,27.4,24
2300,2600000,ŚWIĘTOKRZYSKIE,2024,females,50-89,29.5,26
2301,2800000,WARMIŃSKO-MAZURSKIE,2024,females,50-89,29.3,28
2302,3000000,WIELKOPOLSKIE,2024,females,50-89,30.2,30


In [589]:
# Set up matching groups and merge
p4437["matching_group"] = np.nan
p4437.loc[(p4437["age"]=="15-24"), "matching_group"] = 0
p4437.loc[(p4437["age"]=="20-64"), "matching_group"] = 1
p4437.loc[(p4437["age"]=="50-89"), "matching_group"] = 2

psa_option4["matching_group"] = np.nan
psa_option4.loc[(psa_option4["age_group"]=="under 25 years"), "matching_group"] = 0
psa_option4.loc[
    (psa_option4["age_group"].isin(["25-34", "35-44", "45-54"])), "matching_group"
] = 1
psa_option4.loc[(psa_option4["age_group"]=="55 and more"), "matching_group"] = 2

psa_option4 = psa_option4.merge(
    p4437[["voivodeship_code", "year", "sex", "matching_group", "rate"]],
    how="left",
    on=["voivodeship_code", "year", "sex", "matching_group"]
)
psa_option4

Unnamed: 0,year,sex,age_group,psa2_yr_pop,county_code,county_kts,county_name,voivodeship_code,matching_group,rate
0,1995,females,under 25 years,6879.0,201,10030210101000,Powiat bolesławiecki,2,0.0,
1,1995,females,25-34,5740.0,201,10030210101000,Powiat bolesławiecki,2,1.0,
2,1995,females,35-44,7825.0,201,10030210101000,Powiat bolesławiecki,2,1.0,
3,1995,females,45-54,5051.0,201,10030210101000,Powiat bolesławiecki,2,1.0,
4,1995,females,55 and more,6775.0,201,10030210101000,Powiat bolesławiecki,2,2.0,
...,...,...,...,...,...,...,...,...,...,...
113335,2024,males,under 25 years,1556.0,3263,10023216663000,Powiat m. Świnoujście,32,0.0,30.7
113336,2024,males,25-34,1898.0,3263,10023216663000,Powiat m. Świnoujście,32,1.0,86.7
113337,2024,males,35-44,3070.0,3263,10023216663000,Powiat m. Świnoujście,32,1.0,86.7
113338,2024,males,45-54,3132.0,3263,10023216663000,Powiat m. Świnoujście,32,1.0,86.7


In [590]:
# Now construct the measure - for p2137 x p4437 (vo sex age activity rate)
psa_option4["psa4b_active"] = psa_option4["psa2_yr_pop"].mul(psa_option4["rate"]).div(100).round()

# Drop some working columns
psa_option4.drop(columns=["county_kts", "county_name"], inplace=True)

psa_option4.rename(columns={"rate": "vo_activity_rate"}, inplace=True)

psa_option4

Unnamed: 0,year,sex,age_group,psa2_yr_pop,county_code,voivodeship_code,matching_group,vo_activity_rate,psa4b_active
0,1995,females,under 25 years,6879.0,201,2,0.0,,
1,1995,females,25-34,5740.0,201,2,1.0,,
2,1995,females,35-44,7825.0,201,2,1.0,,
3,1995,females,45-54,5051.0,201,2,1.0,,
4,1995,females,55 and more,6775.0,201,2,2.0,,
...,...,...,...,...,...,...,...,...,...
113335,2024,males,under 25 years,1556.0,3263,32,0.0,30.7,478.0
113336,2024,males,25-34,1898.0,3263,32,1.0,86.7,1646.0
113337,2024,males,35-44,3070.0,3263,32,1.0,86.7,2662.0
113338,2024,males,45-54,3132.0,3263,32,1.0,86.7,2715.0


Now for (a):

P4112 - employ rate by age sex

In [591]:
p4112 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/lfs_vo_employ_rate_age_sex_p4112.csv", index_col=0)

p4112 = p4112[~(p4112["sex"]=="total")].copy()  # copy so the new columns are not set on a slice
p4112["voivodeship_code"] = p4112["code"].apply(lambda x: int(str(x)[:-5]))

p4112.groupby("age")["rate"].count()

# Match 15-24 to under 25 years
# Match 25-54 to 25-34 through 45-54
# Match 50-89 to 55 and more

age
15-24       480
15-29       480
15-64       480
18-59/64    192
20-24       192
20-64       480
25-54       480
30-39       480
40-49       480
50-89       480
55-64       480
total       192
Name: rate, dtype: int64

In [592]:
p4112

Unnamed: 0,code,voivodeship,year,sex,age,rate,voivodeship_code
240,200000,DOLNOŚLĄSKIE,2010,males,total,,2
241,400000,KUJAWSKO-POMORSKIE,2010,males,total,,4
242,600000,LUBELSKIE,2010,males,total,,6
243,800000,LUBUSKIE,2010,males,total,,8
244,1000000,ŁÓDZKIE,2010,males,total,,10
...,...,...,...,...,...,...,...
8635,2400000,ŚLĄSKIE,2024,females,18-59/64,74.3,24
8636,2600000,ŚWIĘTOKRZYSKIE,2024,females,18-59/64,74.5,26
8637,2800000,WARMIŃSKO-MAZURSKIE,2024,females,18-59/64,73.9,28
8638,3000000,WIELKOPOLSKIE,2024,females,18-59/64,75.5,30


In [593]:
# Set up matching groups and merge
p4112["matching_group"] = np.nan
p4112.loc[(p4112["age"]=="15-24"), "matching_group"] = 0
p4112.loc[(p4112["age"]=="25-54"), "matching_group"] = 1
p4112.loc[(p4112["age"]=="50-89"), "matching_group"] = 2

# psa_option4["matching_group"] = np.nan
# psa_option4.loc[(psa_option4["age_group"]=="under 25 years"), "matching_group"] = 0
# psa_option4.loc[
#     (psa_option4["age_group"].isin(["25-34", "35-44", "45-54"])), "matching_group"
# ] = 1
# psa_option4.loc[(psa_option4["age_group"]=="55 and more"), "matching_group"] = 2

psa_option4 = psa_option4.merge(
    p4112[["voivodeship_code", "year", "sex", "matching_group", "rate"]],
    how="left",
    on=["voivodeship_code", "year", "sex", "matching_group"]
)
psa_option4

Unnamed: 0,year,sex,age_group,psa2_yr_pop,county_code,voivodeship_code,matching_group,vo_activity_rate,psa4b_active,rate
0,1995,females,under 25 years,6879.0,201,2,0.0,,,
1,1995,females,25-34,5740.0,201,2,1.0,,,
2,1995,females,35-44,7825.0,201,2,1.0,,,
3,1995,females,45-54,5051.0,201,2,1.0,,,
4,1995,females,55 and more,6775.0,201,2,2.0,,,
...,...,...,...,...,...,...,...,...,...,...
113335,2024,males,under 25 years,1556.0,3263,32,0.0,30.7,478.0,29.3
113336,2024,males,25-34,1898.0,3263,32,1.0,86.7,1646.0,91.3
113337,2024,males,35-44,3070.0,3263,32,1.0,86.7,2662.0,91.3
113338,2024,males,45-54,3132.0,3263,32,1.0,86.7,2715.0,91.3


In [594]:
# Now construct the measure - for p2137 x p4112 (vo sex age employment rate)
psa_option4["psa4a_employed"] = psa_option4["psa2_yr_pop"].mul(psa_option4["rate"]).div(100).round()

psa_option4.rename(columns={"rate": "vo_employ_rate"}, inplace=True)

psa_option4

Unnamed: 0,year,sex,age_group,psa2_yr_pop,county_code,voivodeship_code,matching_group,vo_activity_rate,psa4b_active,vo_employ_rate,psa4a_employed
0,1995,females,under 25 years,6879.0,201,2,0.0,,,,
1,1995,females,25-34,5740.0,201,2,1.0,,,,
2,1995,females,35-44,7825.0,201,2,1.0,,,,
3,1995,females,45-54,5051.0,201,2,1.0,,,,
4,1995,females,55 and more,6775.0,201,2,2.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...
113335,2024,males,under 25 years,1556.0,3263,32,0.0,30.7,478.0,29.3,456.0
113336,2024,males,25-34,1898.0,3263,32,1.0,86.7,1646.0,91.3,1733.0
113337,2024,males,35-44,3070.0,3263,32,1.0,86.7,2662.0,91.3,2803.0
113338,2024,males,45-54,3132.0,3263,32,1.0,86.7,2715.0,91.3,2860.0


**Construct final table for population_sex_age by powiat**

In [595]:
psa = psa_option1.set_index(["county_code", "year", "sex", "age_group"]).copy()

psa = psa.join(psa_option2.set_index(["county_code", "year", "sex", "age_group"]).copy(), how="outer")

psa = psa.join(psa_option3.set_index(["county_code", "year", "sex", "age_group"]).copy(), how="outer")

psa = psa.join(psa_option4.set_index(["county_code", "year", "sex", "age_group"])[["psa4a_employed", "psa4b_active"]].copy(), how="outer")

psa = psa.join(county_codes.set_index("county_code"))

psa = psa.reset_index()

psa.to_csv(repo_root / "cleaned/03_01_outcome_tables/population_powiat_sex_age.csv")
psa

Unnamed: 0,county_code,year,sex,age_group,psa1_nc_pop,psa2_yr_pop,psa3a_employed,psa3b_active,psa4a_employed,psa4b_active,county_kts,county_name
0,201,1995,females,under 25 years,,6879.0,2430.0,2522.0,,,10030210101000,Powiat bolesławiecki
1,201,1995,females,25-34,,5740.0,4129.0,4285.0,,,10030210101000,Powiat bolesławiecki
2,201,1995,females,35-44,,7825.0,5426.0,5630.0,,,10030210101000,Powiat bolesławiecki
3,201,1995,females,45-54,,5051.0,3599.0,3735.0,,,10030210101000,Powiat bolesławiecki
4,201,1995,females,55 and more,,6775.0,1491.0,1547.0,,,10030210101000,Powiat bolesławiecki
...,...,...,...,...,...,...,...,...,...,...,...,...
113335,3263,2024,males,under 25 years,,1556.0,483.0,511.0,456.0,478.0,10023216663000,Powiat m. Świnoujście
113336,3263,2024,males,25-34,,1898.0,1395.0,1477.0,1733.0,1646.0,10023216663000,Powiat m. Świnoujście
113337,3263,2024,males,35-44,,3070.0,2051.0,2171.0,2803.0,2662.0,10023216663000,Powiat m. Świnoujście
113338,3263,2024,males,45-54,,3132.0,2017.0,2135.0,2860.0,2715.0,10023216663000,Powiat m. Świnoujście


In [596]:
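# Which county codes appear in 2013 but not in 2012?
psa_temp_1 = set(psa[psa["year"]==2012]["county_code"].unique())
psa_temp_2 = set(psa[psa["year"]==2013]["county_code"].unique())
# Only the last expression in a cell is displayed, so these two lengths are not shown
len(psa_temp_1)
len(psa_temp_2)
psa_temp_2.difference(psa_temp_1)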

{np.int64(265)}
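
The check above compares two specific years. A sketch of the same idea across every consecutive pair of years is given below, on a toy panel in which county 265 only appears from 2013; the names `toy`, `codes_by_year` and `changes` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy panel: county 265 is absent in 2012 and present in 2013
toy = pd.DataFrame({
    "county_code": [201, 202, 201, 202, 265],
    "year": [2012, 2012, 2013, 2013, 2013],
})

# Set of county codes observed in each year
codes_by_year = toy.groupby("year")["county_code"].apply(set).sort_index()

# Record the years in which the set of codes changes
changes = {}
years = list(codes_by_year.index)
for prev, curr in zip(years, years[1:]):
    added = codes_by_year.loc[curr] - codes_by_year.loc[prev]
    dropped = codes_by_year.loc[prev] - codes_by_year.loc[curr]
    if added or dropped:
        changes[curr] = {"added": added, "dropped": dropped}

print(changes)
```

Run on `psa[["county_code", "year"]]`, this would flag any year in which a powiat enters or leaves the panel, which matters when a fixed 2021 rate is applied to years where the unit did not exist.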