Notebook in 4 sections:

(1) Import county codes

(2) Population by powiat:
- Aim: person-hours worked (not available). The next best measure is employed people by powiat, as this is the denominator of the average wage measure. Technically this should be people employed at firms of size greater than 10 (**I think - check wage measure**), but this can be mentioned in the notes.
- Various measures explored, as listed in the Excel sheet (**will upload**)

(3) Population by sex-age by powiat

(4) Population by sex-education by powiat

In [40]:
import sys
from pathlib import Path

p = Path.cwd().resolve()
repo_root = next((parent for parent in [p] + list(p.parents) if (parent / ".git").exists()), None)
if repo_root is None:
    raise RuntimeError("Repo root not found. Open the repo folder in VS Code.")

sys.path.insert(0, str(repo_root))
print("Repo root:", repo_root)

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

Repo root: C:\Users\harri\OneDrive - Imperial College London\Year 3 Group Project\Group_Project_Y3
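The repo-root search above can be wrapped in a reusable function so that other notebooks can share it. This is a minimal sketch. It assumes a `.git` folder marks the root, as in the cell above. `find_repo_root` is a hypothetical name, not an existing helper in the repo.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = ".git") -> Path:
    """Walk up from `start` and return the first directory containing `marker`.

    Raises RuntimeError if neither `start` nor any of its parents contains it.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise RuntimeError(f"No directory containing {marker!r} found above {start}")
```

In the notebook this would replace the inline search with `repo_root = find_repo_root(Path.cwd())`.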


**SECTION 1 - IMPORT COUNTY CODES**

In [41]:
# Get the county codes table
county_codes = pd.read_csv(repo_root / "cleaned/00_codes/county_codes.csv")
print(county_codes.shape)
county_codes.head()

(380, 3)


Unnamed: 0,county_code,county_kts,county_name
0,201,10030210101000,Powiat bolesławiecki
1,202,10030210302000,Powiat dzierżoniowski
2,203,10030210203000,Powiat głogowski
3,204,10030210204000,Powiat górowski
4,205,10030210105000,Powiat jaworski


**SECTION 2 - POPULATION BY POWIAT**

**Option 1 - NC 2021 (as in Barcelona)**

Uses P4315 / P4181 (both give the same 13+ population) to obtain the 2021 census population for each powiat. In that approach, population is then treated as constant over time and used as the denominator for the unemployment rate.

In [42]:
# Census 2021 population tables: by sex and age (P4181) and by sex and education (P4315)
p4181 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_nc_sex_age_p4181.csv", index_col=0)
p4315 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_nc_sex_ed_p4315.csv", index_col=0)

# P4181: keep ages 13 and over (drop the age total and ages 0-12), both sexes combined
age_filter = ['total'] + [str(a) for a in range(13)]
p4181 = p4181[
    ~(p4181["age"].isin(age_filter)) & (p4181["sex"]=="total")
]

# P4315: keep the education total, both sexes combined
p4315 = p4315[
    (p4315["education"]=="total") & (p4315["sex"]=="total")
]

# Check they give the same powiat totals: count of powiats where the sums match
p4181_total = p4181.groupby("code")["count"].sum()
p4315_total = p4315.groupby("code")["count"].sum()

(p4181_total == p4315_total).sum()

np.int64(380)
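The age filter above lists the excluded labels by hand. A hedged alternative parses the leading number of each label and keeps ages at or above a threshold. This assumes P4181 age labels are either `"total"` or start with an integer (e.g. `"13"`, or an open-ended label such as `"85 and more"`). `filter_min_age` is a hypothetical helper, and the toy rows are made up.

```python
import pandas as pd


def filter_min_age(df: pd.DataFrame, min_age: int, age_col: str = "age") -> pd.DataFrame:
    """Keep rows whose age label starts with a number >= min_age.

    Labels with no leading number (e.g. "total") are dropped. Open-ended
    labels such as "85 and more" are kept using their leading number.
    """
    # Leading integer of each label as a string, NaN where there is none
    lower = df[age_col].astype(str).str.extract(r"^(\d+)", expand=False)
    # NaN compares as False in .ge, so "total" rows are excluded
    return df[pd.to_numeric(lower).ge(min_age)]


# Toy data in the assumed P4181 layout (code, sex, age, count)
toy = pd.DataFrame({
    "code": [201000] * 4,
    "sex": ["total"] * 4,
    "age": ["total", "12", "13", "85 and more"],
    "count": [100, 10, 20, 5],
})
kept = filter_min_age(toy[toy["sex"] == "total"], 13)
print(kept["age"].tolist())  # ['13', '85 and more']
```

For the consistency check, `pd.testing.assert_series_equal(p4181_total, p4315_total)` is stricter than summing `==`. It raises if any value, index label, dtype or name differs, instead of returning a count that has to be compared against 380 by eye.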

In [43]:
# They are consistent so will go with the first
ptot_option1 = pd.DataFrame(p4181_total).reset_index()

ptot_option1["merge_code"] = ptot_option1["code"].apply(lambda x: int(str(x)[:-3]))

ptot_option1 = ptot_option1.merge(
    county_codes,
    how="left",
    left_on="merge_code",
    right_on="county_code"
)

ptot_option1

Unnamed: 0,code,count,merge_code,county_code,county_kts,county_name
0,201000,76739,201,201,10030210101000,Powiat bolesławiecki
1,202000,86543,202,202,10030210302000,Powiat dzierżoniowski
2,203000,75191,203,203,10030210203000,Powiat głogowski
3,204000,29042,204,204,10030210204000,Powiat górowski
4,205000,42493,205,205,10030210105000,Powiat jaworski
...,...,...,...,...,...,...
375,3217000,44491,3217,3217,10023216417000,Powiat wałecki
376,3218000,30007,3218,3218,10023216418000,Powiat łobeski
377,3261000,93083,3261,3261,10023216361000,Powiat m. Koszalin
378,3262000,349790,3262,3262,10023216562000,Powiat m. Szczecin
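The same code-to-county merge is repeated for P2137 and P4292 below, so it could live in one shared helper. This sketch uses the hypothetical name `attach_county_codes`. It assumes `code` is an integer GUS code whose last three digits should be dropped. Integer division by 1000 gives the same `merge_code` as `int(str(x)[:-3])`. The `validate` and `indicator` arguments of `DataFrame.merge` make duplicated or unmatched codes visible, so they don't silently turn into NaNs. The data below is made up.

```python
import pandas as pd


def attach_county_codes(df: pd.DataFrame, county_codes: pd.DataFrame,
                        code_col: str = "code") -> pd.DataFrame:
    """Derive the county code from a GUS code and left-merge the county table.

    201000 // 1000 == 201, the same result as int(str(x)[:-3]) for these codes.
    validate="many_to_one" raises if county_codes has duplicate county_code rows;
    the _merge column marks rows with no match as "left_only".
    """
    out = df.copy()
    out["merge_code"] = out[code_col] // 1000
    return out.merge(
        county_codes,
        how="left",
        left_on="merge_code",
        right_on="county_code",
        validate="many_to_one",
        indicator=True,
    )


codes = pd.DataFrame({"county_code": [201, 202], "county_name": ["A", "B"]})
data = pd.DataFrame({"code": [201000, 202000, 263000], "count": [1, 2, 3]})
merged = attach_county_codes(data, codes)
print(merged[["code", "merge_code", "_merge"]])
```

Filtering on `merged["_merge"] == "left_only"` would list the unmatched codes directly, such as 263 and 1431 for P2137.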


**Option 2 - Yearly powiat population:**

Uses P2137: yearly population by sex and age group, by powiat. We filter to ages 15+.

In [44]:
p2137 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/pop_yr_sex_agegr_p2137.csv", index_col=0)

p2137

Unnamed: 0,code,powiat,year,sex,age_group,count
0,201000,Powiat bolesławiecki,1995,total,total,89407.0
1,202000,Powiat dzierżoniowski,1995,total,total,113810.0
2,203000,Powiat głogowski,1995,total,total,91373.0
3,204000,Powiat górowski,1995,total,total,37826.0
4,205000,Powiat jaworski,1995,total,total,54914.0
...,...,...,...,...,...,...
721975,3217000,Powiat wałecki,2024,females,0-14,3279.0
721976,3218000,Powiat łobeski,2024,females,0-14,2178.0
721977,3261000,City with powiat status Koszalin,2024,females,0-14,6713.0
721978,3262000,City with powiat status Szczecin,2024,females,0-14,24100.0


In [45]:
print(p2137.age_group.unique())
print(p2137.sex.unique())

# Keep only the both-sexes total and age groups 15+
age_filter = ['total', '0-4', '5-9', '10-14', '0-14']
p2137 = p2137[
    ~(p2137["age_group"].isin(age_filter)) & (p2137["sex"]=="total")
].copy()
p2137

['total' '0-4' '5-9' '10-14' '15-19' '20-24' '25-29' '30-34' '35-39'
 '40-44' '45-49' '50-54' '55-59' '60-64' '65-69' '70 and more' '70-74'
 '75-79' '80-84' '85 and more' '0-14']
['total' 'males' 'females']


Unnamed: 0,code,powiat,year,sex,age_group,count
137520,201000,Powiat bolesławiecki,1995,total,15-19,7577.0
137521,202000,Powiat dzierżoniowski,1995,total,15-19,9301.0
137522,203000,Powiat głogowski,1995,total,15-19,9744.0
137523,204000,Powiat górowski,1995,total,15-19,3236.0
137524,205000,Powiat jaworski,1995,total,15-19,4761.0
...,...,...,...,...,...,...
664675,3217000,Powiat wałecki,2024,total,85 and more,846.0
664676,3218000,Powiat łobeski,2024,total,85 and more,673.0
664677,3261000,City with powiat status Koszalin,2024,total,85 and more,3031.0
664678,3262000,City with powiat status Szczecin,2024,total,85 and more,10557.0


In [46]:
p2137.age_group.unique()

array(['15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45-49',
       '50-54', '55-59', '60-64', '65-69', '70 and more', '70-74',
       '75-79', '80-84', '85 and more'], dtype=object)
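The printed groups contain both the open-ended `'70 and more'` and the finer `'70-74'`, `'75-79'`, `'80-84'`, `'85 and more'`. If both breakdowns carry counts for the same powiat-year, summing every 15+ group counts people aged 70+ twice. The output above doesn't show which years use which breakdown, so here is a hedged check. It uses a hypothetical helper `overlapping_70_plus` and toy data.

```python
import pandas as pd

# Finer groups assumed to partition the open-ended "70 and more" group
FINE_70_PLUS = ["70-74", "75-79", "80-84", "85 and more"]


def overlapping_70_plus(df: pd.DataFrame) -> pd.DataFrame:
    """Return (code, year) pairs where '70 and more' and a finer 70+ group both have counts."""
    has = df.dropna(subset=["count"])
    coarse = has.loc[has["age_group"] == "70 and more", ["code", "year"]].drop_duplicates()
    fine = has.loc[has["age_group"].isin(FINE_70_PLUS), ["code", "year"]].drop_duplicates()
    # Inner merge keeps only powiat-years present in both breakdowns
    return coarse.merge(fine, on=["code", "year"])


toy = pd.DataFrame({
    "code": [201000] * 5 + [202000] * 2,
    "year": [1995] * 5 + [2024] * 2,
    "age_group": ["70 and more", "70-74", "75-79", "80-84", "85 and more",
                  "70 and more", "70-74"],
    "count": [100, 40, 30, 20, 10, float("nan"), 50],
})
print(overlapping_70_plus(toy))
```

On the filtered `p2137`, an empty result means the Option 2 sum is safe. Otherwise, one of the two 70+ breakdowns should be dropped before summing.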

In [47]:
p2137["merge_code"] = p2137["code"].apply(lambda x: int(str(x)[:-3]))

p2137 = p2137.merge(
    county_codes,
    how="left",
    left_on="merge_code",
    right_on="county_code"
)

p2137

Unnamed: 0,code,powiat,year,sex,age_group,count,merge_code,county_code,county_kts,county_name
0,201000,Powiat bolesławiecki,1995,total,15-19,7577.0,201,201.0,1.003021e+13,Powiat bolesławiecki
1,202000,Powiat dzierżoniowski,1995,total,15-19,9301.0,202,202.0,1.003021e+13,Powiat dzierżoniowski
2,203000,Powiat głogowski,1995,total,15-19,9744.0,203,203.0,1.003021e+13,Powiat głogowski
3,204000,Powiat górowski,1995,total,15-19,3236.0,204,204.0,1.003021e+13,Powiat górowski
4,205000,Powiat jaworski,1995,total,15-19,4761.0,205,205.0,1.003021e+13,Powiat jaworski
...,...,...,...,...,...,...,...,...,...,...
183355,3217000,Powiat wałecki,2024,total,85 and more,846.0,3217,3217.0,1.002322e+13,Powiat wałecki
183356,3218000,Powiat łobeski,2024,total,85 and more,673.0,3218,3218.0,1.002322e+13,Powiat łobeski
183357,3261000,City with powiat status Koszalin,2024,total,85 and more,3031.0,3261,3261.0,1.002322e+13,Powiat m. Koszalin
183358,3262000,City with powiat status Szczecin,2024,total,85 and more,10557.0,3262,3262.0,1.002322e+13,Powiat m. Szczecin


In [48]:
p2137[p2137["county_code"].isna()]

Unnamed: 0,code,powiat,year,sex,age_group,count,merge_code,county_code,county_kts,county_name
28,263000,City with powiat status Wałbrzych to 2002,1995,total,15-19,11449.0,263,,,
168,1431000,Powiat warszawski,1995,total,15-19,117361.0,1431,,,
410,263000,City with powiat status Wałbrzych to 2002,1996,total,15-19,11621.0,263,,,
550,1431000,Powiat warszawski,1996,total,15-19,116256.0,1431,,,
792,263000,City with powiat status Wałbrzych to 2002,1997,total,15-19,11807.0,263,,,
...,...,...,...,...,...,...,...,...,...,...
182382,1431000,Powiat warszawski,2022,total,85 and more,,1431,,,
182624,263000,City with powiat status Wałbrzych to 2002,2023,total,85 and more,,263,,,
182764,1431000,Powiat warszawski,2023,total,85 and more,,1431,,,
183006,263000,City with powiat status Wałbrzych to 2002,2024,total,85 and more,,263,,,


In [49]:
p2137[p2137["merge_code"]==1431].head(10)

Unnamed: 0,code,powiat,year,sex,age_group,count,merge_code,county_code,county_kts,county_name
168,1431000,Powiat warszawski,1995,total,15-19,117361.0,1431,,,
550,1431000,Powiat warszawski,1996,total,15-19,116256.0,1431,,,
932,1431000,Powiat warszawski,1997,total,15-19,116149.0,1431,,,
1314,1431000,Powiat warszawski,1998,total,15-19,117247.0,1431,,,
1696,1431000,Powiat warszawski,1999,total,15-19,128590.0,1431,,,
2078,1431000,Powiat warszawski,2000,total,15-19,121749.0,1431,,,
2460,1431000,Powiat warszawski,2001,total,15-19,114527.0,1431,,,
2842,1431000,Powiat warszawski,2002,total,15-19,,1431,,,
3224,1431000,Powiat warszawski,2003,total,15-19,,1431,,,
3606,1431000,Powiat warszawski,2004,total,15-19,,1431,,,


In [50]:
missing = p2137["merge_code"] == 263  # Wałbrzych city (code 263, "to 2002") -> current code 265, as before
p2137.loc[missing, "county_code"] = 265
p2137.loc[missing, "county_kts"] = 10030210365000
p2137.loc[missing, "county_name"] = "Powiat m. Wałbrzych"

# Powiat warszawski (1431) only has data up to 2001, so drop it
p2137 = p2137.dropna(subset=["county_code"])
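An alternative is to remap legacy codes before the merge, so that `county_kts` and `county_name` come from `county_codes` instead of being hard-coded. This also restores integer county codes, which the left merge turned into floats (`201.0` above). This is a sketch under assumptions: the mapping dict and `merge_with_legacy` are hypothetical, and it assumes code 265 is present in `county_codes`.

```python
import pandas as pd

# Hypothetical mapping from legacy GUS county codes to current codes.
# 263 -> 265 mirrors the Wałbrzych fix; extend as more cases turn up.
LEGACY_COUNTY_CODES = {263: 265}


def merge_with_legacy(df: pd.DataFrame, county_codes: pd.DataFrame,
                      legacy=None) -> pd.DataFrame:
    """Remap legacy codes, merge the county table, drop rows with no current county."""
    legacy = LEGACY_COUNTY_CODES if legacy is None else legacy
    out = df.copy()
    out["merge_code"] = (out["code"] // 1000).replace(legacy)
    merged = out.merge(county_codes, how="left",
                       left_on="merge_code", right_on="county_code")
    # Codes with no current equivalent (e.g. 1431) have no match and are dropped
    merged = merged.dropna(subset=["county_code"]).copy()
    # The NaNs made county_code float during the merge; restore integers
    merged["county_code"] = merged["county_code"].astype(int)
    return merged


codes = pd.DataFrame({"county_code": [201, 265], "county_name": ["A", "Powiat m. Wałbrzych"]})
data = pd.DataFrame({"code": [201000, 263000, 1431000], "count": [1.0, 2.0, 3.0]})
print(merge_with_legacy(data, codes))
```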

In [51]:
ptot_option2 = pd.DataFrame(p2137.groupby(["county_code", "year"])["count"].sum()).reset_index()
ptot_option2

Unnamed: 0,county_code,year,count
0,201.0,1995,68528.0
1,201.0,1996,69108.0
2,201.0,1997,69943.0
3,201.0,1998,70777.0
4,201.0,1999,69443.0
...,...,...,...
11395,3263.0,2020,41252.0
11396,3263.0,2021,41035.0
11397,3263.0,2022,40940.0
11398,3263.0,2023,40897.0
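By default, `groupby(...).sum()` skips missing counts. A powiat-year where every age group is missing therefore sums to 0 rather than NaN, and Option 4 would then use that 0 as a population. Passing `min_count=1` keeps such groups as NaN. This is a toy illustration, and the values are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "county_code": [265, 265, 265, 265],
    "year": [2002, 2002, 2005, 2005],
    "count": [11449.0, 9000.0, np.nan, np.nan],
})

# Default: an all-NaN group sums to 0, which looks like a real (zero) population
default_sum = toy.groupby(["county_code", "year"])["count"].sum()

# min_count=1: a group needs at least one non-NaN value, otherwise the result is NaN
safe_sum = toy.groupby(["county_code", "year"])["count"].sum(min_count=1)

print(default_sum)
print(safe_sum)
```

In the notebook, this would be `p2137.groupby(["county_code", "year"])["count"].sum(min_count=1)`.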


**Option 3 - NC 2021 (a) Employed, (b) Economically Active:**

From P4292: economic activity of the population aged 15 and over, by sex and powiat.

In [52]:
p4292 = pd.read_csv(repo_root / "cleaned/03_01_outcome_data/nc_activity_table_p4292.csv", index_col=0)

# Filter to just total
p4292_total = p4292[p4292["sex"]=="total"].copy()

# Merge with county codes
p4292_total["merge_code"] = p4292_total["code"].apply(lambda x: int(str(x)[:-3]))
p4292_total = p4292_total.merge(
    county_codes,
    how="left",
    left_on="merge_code",
    right_on="county_code"
)

# Take the columns needed for this option
ptot_option3 = p4292_total[["county_code", "economically active population", "employed"]]

ptot_option3

Unnamed: 0,county_code,economically active population,employed
0,201,40280.0,38792.0
1,202,41553.0,39657.0
2,203,37825.0,36209.0
3,204,14860.0,13765.0
4,205,21874.0,20715.0
...,...,...,...
375,3217,22675.0,21486.0
376,3218,14465.0,13463.0
377,3261,48337.0,46160.0
378,3262,186667.0,179608.0


**Option 4 - 2021 NC Rate x Yearly Population Series (a - employment rate, b - activity rate):**

From P4292 (activity table) and P2137 (yearly population) as above
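In symbols, the cell below computes the following for powiat $c$ and year $t$. Here $N^{15+}_{c,t}$ is the P2137 population aged 15+, and $e^{2021}_c$, $a^{2021}_c$ are the 2021 NC employment and activity rates (in %) from P4292, held fixed across years:

```latex
\text{employed}_{c,t} = \operatorname{round}\!\left(N^{15+}_{c,t}\cdot\frac{e^{2021}_{c}}{100}\right),
\qquad
\text{active}_{c,t} = \operatorname{round}\!\left(N^{15+}_{c,t}\cdot\frac{a^{2021}_{c}}{100}\right)
```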

In [53]:
# Select p4292 rate columns
p4292_total_rates = p4292_total[["county_code", "employment rate", "activity rate"]]

# Merge with p2137 measure of yearly population
ptot_option4 = ptot_option2.merge(
    p4292_total_rates,
    how="left",
    on="county_code"
)

# Construct the required columns
ptot_option4["employed"] = ptot_option4["count"].mul(ptot_option4["employment rate"]).div(100).round()
ptot_option4["active"] = ptot_option4["count"].mul(ptot_option4["activity rate"]).div(100).round()

ptot_option4 = ptot_option4[["county_code", "year", "employed", "active"]]

ptot_option4

Unnamed: 0,county_code,year,employed,active
0,201.0,1995,38513.0,40020.0
1,201.0,1996,38839.0,40359.0
2,201.0,1997,39308.0,40847.0
3,201.0,1998,39777.0,41334.0
4,201.0,1999,39027.0,40555.0
...,...,...,...,...
11395,3263.0,2020,22194.0,23307.0
11396,3263.0,2021,22077.0,23185.0
11397,3263.0,2022,22026.0,23131.0
11398,3263.0,2023,22003.0,23107.0
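Options 3 and 4 share the P4292 rates, so Option 4 in 2021 should be close to the census counts in Option 3. Any gap comes from differences between the P2137 15+ population and the population base behind the census rates. The following is a hedged sanity check. `compare_2021` is a hypothetical helper, and the toy frames mimic the `ptot_option3` / `ptot_option4` column names.

```python
import pandas as pd


def compare_2021(opt3: pd.DataFrame, opt4: pd.DataFrame) -> pd.DataFrame:
    """Relative gap of Option 4 (year 2021) against the census counts in Option 3."""
    o4 = opt4[opt4["year"] == 2021].astype({"county_code": int})
    merged = opt3.merge(o4, on="county_code", suffixes=("_nc", "_opt4"))
    merged["employed_gap"] = merged["employed_opt4"] / merged["employed_nc"] - 1
    merged["active_gap"] = merged["active"] / merged["economically active population"] - 1
    return merged[["county_code", "employed_gap", "active_gap"]]


opt3 = pd.DataFrame({"county_code": [201],
                     "economically active population": [40000.0],
                     "employed": [38000.0]})
opt4 = pd.DataFrame({"county_code": [201.0, 201.0], "year": [2020, 2021],
                     "employed": [1.0, 38380.0], "active": [1.0, 40400.0]})
print(compare_2021(opt3, opt4))
```

Running `compare_2021(ptot_option3, ptot_option4).describe()` would summarise how far the two options diverge across powiats.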


**Option 5 - Voivodeship (vo) Rates x Yearly Population (a - employment, b - activity)**

P4113 - Employment Rate by vo

P4108 - Activity Rate by vo

P2317 - Population in powiat
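A minimal sketch of how Option 5 could be assembled. It relies on several assumptions that are not shown in this notebook:
- P4113 and P4108 have already been cleaned and combined into one frame with hypothetical columns `vo_code`, `year`, `employment rate`, `activity rate` (in %).
- The yearly 15+ powiat population is in the `ptot_option2` layout.
- The voivodeship code is the county code divided by 100, following the TERYT convention that a powiat code is the two-digit voivodeship code followed by two digits.

`option5` is a hypothetical name.

```python
import pandas as pd


def option5(pop: pd.DataFrame, vo_rates: pd.DataFrame) -> pd.DataFrame:
    """Yearly powiat 15+ population x yearly voivodeship employment/activity rates.

    pop:      columns county_code, year, count (as in ptot_option2)
    vo_rates: columns vo_code, year, employment rate, activity rate (assumed layout)
    """
    out = pop.copy()
    # TERYT: powiat code = 2-digit voivodeship code + 2 digits, e.g. 3262 -> 32
    out["vo_code"] = out["county_code"].astype(int) // 100
    out = out.merge(vo_rates, how="left", on=["vo_code", "year"], validate="many_to_one")
    out["employed"] = (out["count"] * out["employment rate"] / 100).round()
    out["active"] = (out["count"] * out["activity rate"] / 100).round()
    return out[["county_code", "year", "employed", "active"]]


pop = pd.DataFrame({"county_code": [201.0, 3262.0], "year": [2021, 2021],
                    "count": [1000.0, 2000.0]})
rates = pd.DataFrame({"vo_code": [2, 32], "year": [2021, 2021],
                      "employment rate": [50.0, 55.0], "activity rate": [52.0, 58.0]})
print(option5(pop, rates))
```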