<h1>Retired People and “Retired Places”: Italy, Data Preprocessing<span class="tocSkip"></span></h1>
<div class="toc">
  <ol class="toc-item">
    <li>
      <span>
        <a href="#Loading-input-datasets" data-toc-modified-id="Loading-input-datasets-1">
          <span class="toc-item-num"></span>Loading input datasets
        </a>
      </span>
    </li>
    <li>
      <span>
        <a href="#Processing-housing-data" data-toc-modified-id="Processing-housing-data-2">
          <span class="toc-item-num"></span>Processing housing data
        </a>
      </span>
    </li>
    <li>
      <span>
        <a href="#Processing-population-data" data-toc-modified-id="Processing-population-data-3">
          <span class="toc-item-num"></span>Processing population data
        </a>
      </span>
    </li>
    <li>
      <span>
        <a href="#Merging-into-final-dataset" data-toc-modified-id="Merging-into-final-dataset-4">
          <span class="toc-item-num"></span>Merging into final dataset
        </a>
      </span>
    </li>
  </ol>
</div>



## Loading input datasets <a id="Loading-input-datasets"></a>



In [27]:
import pandas as pd
from pathlib import Path

print('Libraries imported')

Libraries imported


In [28]:
PROJECT_ROOT = Path("..").resolve().parent

PROCESSED = PROJECT_ROOT / "data" / "processed"

PROCESSED

PosixPath('/Users/eugenia/Desktop/Open Access/project/retired_places/data/processed')

In [29]:
df_population_it = pd.read_csv(PROCESSED / "pop_reg_it_clean.csv")
df_housing_it = pd.read_csv(PROCESSED / "homes_it_clean.csv")

In [30]:
df_housing_it.head()

Unnamed: 0,region,homes_occupied,homes_unoccupied,homes_total
0,Piemonte,1964108,827768,2791876
1,Valle d'Aosta / Vallée d'Aoste,59616,75948,135564
2,Liguria,746686,431321,1178007
3,Lombardia,4415364,1184728,5600092
4,Trentino Alto Adige / Südtirol,463305,219888,683193


In [31]:
df_population_it.head(110)

Unnamed: 0,region_code,region,age,pop_male,pop_female,pop_total
0,13,Abruzzo,0.0,3842.0,3577.0,7419.0
1,13,Abruzzo,1.0,4010.0,3653.0,7663.0
2,13,Abruzzo,2.0,4260.0,3873.0,8133.0
3,13,Abruzzo,3.0,4298.0,4163.0,8461.0
4,13,Abruzzo,4.0,4456.0,4163.0,8619.0
...,...,...,...,...,...,...
105,17,Basilicata,3.0,1755.0,1608.0,3363.0
106,17,Basilicata,4.0,1864.0,1674.0,3538.0
107,17,Basilicata,5.0,1929.0,1749.0,3678.0
108,17,Basilicata,6.0,1904.0,1819.0,3723.0


## Processing housing data <a id="Processing-housing-data"></a>

In [32]:
df_housing_it

Unnamed: 0,region,homes_occupied,homes_unoccupied,homes_total
0,Piemonte,1964108,827768,2791876
1,Valle d'Aosta / Vallée d'Aoste,59616,75948,135564
2,Liguria,746686,431321,1178007
3,Lombardia,4415364,1184728,5600092
4,Trentino Alto Adige / Südtirol,463305,219888,683193
5,Provincia Autonoma Bolzano / Bozen,226675,67100,293775
6,Provincia Autonoma Trento,236630,152788,389418
7,Veneto,2076568,584378,2660946
8,Friuli-Venezia Giulia,557109,173363,730472
9,Emilia-Romagna,1993088,554077,2547165


In [33]:
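# Normalize region names so that both datasets spell them the same way
def normalize_region_name(s: str) -> str:
    s = s.strip()                      # drop leading/trailing whitespace
    s = s.replace(" / ", "/")          # "A / B" -> "A/B"
    s = s.replace(" - ", "-")          # "A - B" -> "A-B"
    s = s.replace("–", "-")            # en dash -> plain hyphen
    return s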

df_housing_it["region_norm"] = (
    df_housing_it["region"]
    .astype(str)
    .apply(normalize_region_name)
)

df_housing_it

Unnamed: 0,region,homes_occupied,homes_unoccupied,homes_total,region_norm
0,Piemonte,1964108,827768,2791876,Piemonte
1,Valle d'Aosta / Vallée d'Aoste,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste
2,Liguria,746686,431321,1178007,Liguria
3,Lombardia,4415364,1184728,5600092,Lombardia
4,Trentino Alto Adige / Südtirol,463305,219888,683193,Trentino Alto Adige/Südtirol
5,Provincia Autonoma Bolzano / Bozen,226675,67100,293775,Provincia Autonoma Bolzano/Bozen
6,Provincia Autonoma Trento,236630,152788,389418,Provincia Autonoma Trento
7,Veneto,2076568,584378,2660946,Veneto
8,Friuli-Venezia Giulia,557109,173363,730472,Friuli-Venezia Giulia
9,Emilia-Romagna,1993088,554077,2547165,Emilia-Romagna
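As a quick illustration of what this normalization does and does not change, the sketch below copies the function and applies it to a few sample strings. The `samples` list and the final comparison are illustrative additions, not part of the original pipeline. Note that the function unifies separators but does not insert a missing hyphen, so "Trentino Alto Adige / Südtirol" still differs from the hyphenated spelling.

```python
def normalize_region_name(s: str) -> str:
    s = s.strip()               # drop leading/trailing whitespace
    s = s.replace(" / ", "/")   # "A / B" -> "A/B"
    s = s.replace(" - ", "-")   # "A - B" -> "A-B"
    s = s.replace("–", "-")     # en dash -> plain hyphen
    return s

# Sample inputs (illustrative); the last two spellings appear in the housing table.
samples = [
    "  Piemonte ",
    "Valle d'Aosta / Vallée d'Aoste",
    "Trentino Alto Adige / Südtirol",
]
normalized = [normalize_region_name(s) for s in samples]

# The normalized name still lacks the hyphen used in the hyphenated spelling.
matches_hyphenated = normalized[2] == "Trentino-Alto Adige/Südtirol"

for before, after in zip(samples, normalized):
    print(repr(before), "->", repr(after))
print("matches hyphenated spelling:", matches_hyphenated)
```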


In [34]:
# Dictionary of region codes, used as the key for merging the datasets
region_codes = {
    "Piemonte": 1,
    "Valle d'Aosta/Vallée d'Aoste": 2,
    "Lombardia": 3,
    "Trentino-Alto Adige/Südtirol": 4,
    "Veneto": 5,
    "Friuli-Venezia Giulia": 6,
    "Liguria": 7,
    "Emilia-Romagna": 8,
    "Toscana": 9,
    "Umbria": 10,
    "Marche": 11,
    "Lazio": 12,
    "Abruzzo": 13,
    "Molise": 14,
    "Campania": 15,
    "Puglia": 16,
    "Basilicata": 17,
    "Calabria": 18,
    "Sicilia": 19,
    "Sardegna": 20,
}

# Map the normalized region names to their region codes
df_housing_it["region_code"] = df_housing_it["region_norm"].map(region_codes)

In [35]:
# Check which rows did not receive a region code
print(df_housing_it[df_housing_it["region_code"].isna()][["region", "region_norm"]].head())

                                 region                       region_norm
4      Trentino Alto Adige / Südtirol        Trentino Alto Adige/Südtirol
5  Provincia Autonoma Bolzano / Bozen    Provincia Autonoma Bolzano/Bozen
6           Provincia Autonoma Trento           Provincia Autonoma Trento


The data providers report Trentino-Alto Adige/Südtirol as three separate rows: the region as a whole plus its two autonomous provinces, Bolzano/Bozen and Trento. None of the three matched a key in our region_codes dictionary: the two provinces have no entry of their own, and the regional row is spelled without the hyphen (“Trentino Alto Adige/Südtirol”). We therefore added these three spellings manually, all with region code 4.

In [36]:
region_codes.update({
    "Trentino Alto Adige/Südtirol": 4,
    "Provincia Autonoma Bolzano/Bozen": 4,
    "Provincia Autonoma Trento": 4,
})

df_housing_it["region_code"] = df_housing_it["region_norm"].map(region_codes)
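A small helper can make this kind of fix repeatable by listing names that have no key in the dictionary, before and after the update. The sketch below is a minimal illustration on a three-name subset; the helper `unmapped` and the reduced `codes` dictionary are hypothetical and only mirror the notebook's region_codes.

```python
import pandas as pd

# Toy subset of normalized region names taken from the housing table.
names = pd.Series([
    "Piemonte",
    "Trentino Alto Adige/Südtirol",
    "Provincia Autonoma Trento",
])

# Hypothetical stand-in for region_codes (subset only).
codes = {
    "Piemonte": 1,
    "Trentino-Alto Adige/Südtirol": 4,
}

def unmapped(series: pd.Series, mapping: dict) -> list:
    """Return the distinct names that have no key in the mapping."""
    return sorted(series[~series.isin(list(mapping))].unique())

missing_before = unmapped(names, codes)

# Add the variant spellings, as the notebook does, then re-check.
codes.update({
    "Trentino Alto Adige/Südtirol": 4,
    "Provincia Autonoma Trento": 4,
})
missing_after = unmapped(names, codes)

print(missing_before)
print(missing_after)
```

An empty list after the update indicates that every name in the column now maps to a code.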

In [37]:
df_housing_it

Unnamed: 0,region,homes_occupied,homes_unoccupied,homes_total,region_norm,region_code
0,Piemonte,1964108,827768,2791876,Piemonte,1
1,Valle d'Aosta / Vallée d'Aoste,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste,2
2,Liguria,746686,431321,1178007,Liguria,7
3,Lombardia,4415364,1184728,5600092,Lombardia,3
4,Trentino Alto Adige / Südtirol,463305,219888,683193,Trentino Alto Adige/Südtirol,4
5,Provincia Autonoma Bolzano / Bozen,226675,67100,293775,Provincia Autonoma Bolzano/Bozen,4
6,Provincia Autonoma Trento,236630,152788,389418,Provincia Autonoma Trento,4
7,Veneto,2076568,584378,2660946,Veneto,5
8,Friuli-Venezia Giulia,557109,173363,730472,Friuli-Venezia Giulia,6
9,Emilia-Romagna,1993088,554077,2547165,Emilia-Romagna,8


We dropped the rows for Provincia Autonoma Bolzano/Bozen and Provincia Autonoma Trento because their values sum exactly to the regional total for Trentino-Alto Adige/Südtirol, and keeping all three would double-count the same housing units in our regional-level analysis.
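The equality stated above can be verified directly. The sketch below rebuilds the three relevant rows with the values shown in the housing table and compares the sum of the two provinces with the regional row; the variable names are illustrative.

```python
import pandas as pd

# Values copied from the housing table shown earlier in this notebook.
df = pd.DataFrame({
    "region_norm": [
        "Trentino Alto Adige/Südtirol",
        "Provincia Autonoma Bolzano/Bozen",
        "Provincia Autonoma Trento",
    ],
    "homes_occupied": [463305, 226675, 236630],
    "homes_unoccupied": [219888, 67100, 152788],
    "homes_total": [683193, 293775, 389418],
})

value_cols = ["homes_occupied", "homes_unoccupied", "homes_total"]
is_province = df["region_norm"].str.startswith("Provincia Autonoma")

# Column-wise sum of the two provinces vs. the single regional row.
province_sum = df.loc[is_province, value_cols].sum()
region_row = df.loc[~is_province, value_cols].iloc[0]

provinces_match_region = bool((province_sum == region_row).all())
print(provinces_match_region)
```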

In [38]:
mask = df_housing_it["region_norm"].str.contains("Provincia Autonoma", na=False)

df_housing_it = df_housing_it[~mask].reset_index(drop=True)


df_housing_it

Unnamed: 0,region,homes_occupied,homes_unoccupied,homes_total,region_norm,region_code
0,Piemonte,1964108,827768,2791876,Piemonte,1
1,Valle d'Aosta / Vallée d'Aoste,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste,2
2,Liguria,746686,431321,1178007,Liguria,7
3,Lombardia,4415364,1184728,5600092,Lombardia,3
4,Trentino Alto Adige / Südtirol,463305,219888,683193,Trentino Alto Adige/Südtirol,4
5,Veneto,2076568,584378,2660946,Veneto,5
6,Friuli-Venezia Giulia,557109,173363,730472,Friuli-Venezia Giulia,6
7,Emilia-Romagna,1993088,554077,2547165,Emilia-Romagna,8
8,Toscana,1627013,506892,2133905,Toscana,9
9,Umbria,376747,126922,503669,Umbria,10


In [39]:
# Compute the share of unoccupied homes (in percent)
df_housing_it["share_unoccupied"] = (
    df_housing_it["homes_unoccupied"] / df_housing_it["homes_total"] * 100
)

df_housing_it

Unnamed: 0,region,homes_occupied,homes_unoccupied,homes_total,region_norm,region_code,share_unoccupied
0,Piemonte,1964108,827768,2791876,Piemonte,1,29.649168
1,Valle d'Aosta / Vallée d'Aoste,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste,2,56.023723
2,Liguria,746686,431321,1178007,Liguria,7,36.614468
3,Lombardia,4415364,1184728,5600092,Lombardia,3,21.15551
4,Trentino Alto Adige / Südtirol,463305,219888,683193,Trentino Alto Adige/Südtirol,4,32.185341
5,Veneto,2076568,584378,2660946,Veneto,5,21.961287
6,Friuli-Venezia Giulia,557109,173363,730472,Friuli-Venezia Giulia,6,23.733011
7,Emilia-Romagna,1993088,554077,2547165,Emilia-Romagna,8,21.752694
8,Toscana,1627013,506892,2133905,Toscana,9,23.754197
9,Umbria,376747,126922,503669,Umbria,10,25.199486


In [40]:
it_free_homes = PROCESSED / "free_homes_it.csv"
df_housing_it.to_csv(it_free_homes, index=False)

print(f"saved to: {it_free_homes}")

saved to: /Users/eugenia/Desktop/Open Access/project/retired_places/data/processed/free_homes_it.csv


## Processing population data <a id="Processing-population-data"></a>


In [41]:
# Extract each region's total population (rows with age == 999 hold the all-ages total)
it_total_age = df_population_it[df_population_it["age"] == 999]
it_total_age

Unnamed: 0,region_code,region,age,pop_male,pop_female,pop_total
101,13,Abruzzo,999.0,622936.0,645494.0,1268430.0
203,17,Basilicata,999.0,262604.0,267293.0,529897.0
305,18,Calabria,999.0,899712.0,932435.0,1832147.0
407,15,Campania,999.0,2726809.0,2848216.0,5575025.0
509,8,Emilia-Romagna,999.0,2194241.0,2271437.0,4465678.0
611,6,Friuli-Venezia Giulia,999.0,584758.0,609337.0,1194095.0
713,12,Lazio,999.0,2771470.0,2938802.0,5710272.0
815,7,Liguria,999.0,731614.0,778294.0,1509908.0
917,3,Lombardia,999.0,4946391.0,5089090.0,10035481.0
1019,11,Marche,999.0,726773.0,754479.0,1481252.0


In [42]:
# add total region population as a separate column
totals_for_merge = (
    it_total_age[["region_code", "pop_total"]]
    .rename(columns={"pop_total": "pop_total_all_ages"})
)

df_population_it = df_population_it.merge(
    totals_for_merge,
    on="region_code",
    how="left"
)

df_population_it

Unnamed: 0,region_code,region,age,pop_male,pop_female,pop_total,pop_total_all_ages
0,13,Abruzzo,0.0,3842.0,3577.0,7419.0,1268430.0
1,13,Abruzzo,1.0,4010.0,3653.0,7663.0,1268430.0
2,13,Abruzzo,2.0,4260.0,3873.0,8133.0,1268430.0
3,13,Abruzzo,3.0,4298.0,4163.0,8461.0,1268430.0
4,13,Abruzzo,4.0,4456.0,4163.0,8619.0,1268430.0
...,...,...,...,...,...,...,...
2035,5,Veneto,97.0,601.0,2244.0,2845.0,4851851.0
2036,5,Veneto,98.0,350.0,1524.0,1874.0,4851851.0
2037,5,Veneto,99.0,221.0,1023.0,1244.0,4851851.0
2038,5,Veneto,100.0,267.0,1654.0,1921.0,4851851.0


In [43]:
# Drop the all-ages total rows (age == 999), then keep pop_total only for ages 65+ (0 otherwise)
df_population_it = df_population_it[df_population_it["age"] != 999].copy()

df_population_it["pop_65plus_tmp"] = df_population_it["pop_total"].where(
    df_population_it["age"] >= 65,
    0
)

df_population_it.head(70)

Unnamed: 0,region_code,region,age,pop_male,pop_female,pop_total,pop_total_all_ages,pop_65plus_tmp
0,13,Abruzzo,0.0,3842.0,3577.0,7419.0,1268430.0,0.0
1,13,Abruzzo,1.0,4010.0,3653.0,7663.0,1268430.0,0.0
2,13,Abruzzo,2.0,4260.0,3873.0,8133.0,1268430.0,0.0
3,13,Abruzzo,3.0,4298.0,4163.0,8461.0,1268430.0,0.0
4,13,Abruzzo,4.0,4456.0,4163.0,8619.0,1268430.0,0.0
...,...,...,...,...,...,...,...,...
65,13,Abruzzo,65.0,8625.0,9412.0,18037.0,1268430.0,18037.0
66,13,Abruzzo,66.0,8259.0,9010.0,17269.0,1268430.0,17269.0
67,13,Abruzzo,67.0,8105.0,8812.0,16917.0,1268430.0,16917.0
68,13,Abruzzo,68.0,8100.0,8573.0,16673.0,1268430.0,16673.0


In [44]:
# Sum the population aged 65+ in every region
df_region_65_it = (
    df_population_it
    .groupby(["region_code", "region"], as_index=False)
    .agg(
        pop_65plus=("pop_65plus_tmp", "sum"),        
        tot_pop=("pop_total_all_ages", "first"),    
    )
)

df_region_65_it.head()

Unnamed: 0,region_code,region,pop_65plus,tot_pop
0,1,Piemonte,1142793.0,4255702.0
1,2,Valle d'Aosta/Vallée d'Aoste,31686.0,122714.0
2,3,Lombardia,2394067.0,10035481.0
3,4,Trentino-Alto Adige/Südtirol,244707.0,1086095.0
4,5,Veneto,1208173.0,4851851.0


In [45]:
# Compute the share of residents aged 65+ (in percent)
df_region_65_it["share_65plus"] = df_region_65_it["pop_65plus"] / df_region_65_it["tot_pop"] * 100

In [46]:
df_region_65_it.head(20)

Unnamed: 0,region_code,region,pop_65plus,tot_pop,share_65plus
0,1,Piemonte,1142793.0,4255702.0,26.85322
1,2,Valle d'Aosta/Vallée d'Aoste,31686.0,122714.0,25.821015
2,3,Lombardia,2394067.0,10035481.0,23.856026
3,4,Trentino-Alto Adige/Südtirol,244707.0,1086095.0,22.530902
4,5,Veneto,1208173.0,4851851.0,24.90128
5,6,Friuli-Venezia Giulia,328115.0,1194095.0,27.478132
6,7,Liguria,440645.0,1509908.0,29.183566
7,8,Emilia-Romagna,1112536.0,4465678.0,24.913037
8,9,Toscana,977876.0,3660834.0,26.711837
9,10,Umbria,232730.0,851954.0,27.317203
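The steps above (flag ages 65+, sum by region, divide by the regional total) can be sketched compactly on toy data. The numbers below are made up for illustration and are not ISTAT figures. As a simplifying assumption, the regional total is obtained here by summing all ages, whereas the notebook takes it from the age == 999 rows.

```python
import pandas as pd

# Made-up toy data in the same long format (one row per region and age).
toy = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "B"],
    "age": [30, 65, 80, 30, 64, 90],
    "pop_total": [600, 300, 100, 700, 200, 100],
})

is_65plus = toy["age"] >= 65

summary = (
    # Keep pop_total for ages 65+ and replace it with 0 elsewhere.
    toy.assign(pop_65plus=toy["pop_total"].where(is_65plus, 0))
    .groupby("region", as_index=False)
    .agg(pop_65plus=("pop_65plus", "sum"), tot_pop=("pop_total", "sum"))
)
summary["share_65plus"] = summary["pop_65plus"] / summary["tot_pop"] * 100

print(summary)
```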


In [47]:
it_65_path = PROCESSED / "region_65_it.csv"
df_region_65_it.to_csv(it_65_path, index=False)

print(f"saved to: {it_65_path}")

saved to: /Users/eugenia/Desktop/Open Access/project/retired_places/data/processed/region_65_it.csv


## Merging into final dataset <a id="Merging-into-final-dataset"></a>

In [48]:
df_italy_merged = df_region_65_it.merge(
    df_housing_it,
    on="region_code",
    how="left"
)

df_italy_merged = df_italy_merged.drop(columns=["region_y"])
df_italy_merged = df_italy_merged.rename(columns={"region_x": "region"})

df_italy_merged

Unnamed: 0,region_code,region,pop_65plus,tot_pop,share_65plus,homes_occupied,homes_unoccupied,homes_total,region_norm,share_unoccupied
0,1,Piemonte,1142793.0,4255702.0,26.85322,1964108,827768,2791876,Piemonte,29.649168
1,2,Valle d'Aosta/Vallée d'Aoste,31686.0,122714.0,25.821015,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste,56.023723
2,3,Lombardia,2394067.0,10035481.0,23.856026,4415364,1184728,5600092,Lombardia,21.15551
3,4,Trentino-Alto Adige/Südtirol,244707.0,1086095.0,22.530902,463305,219888,683193,Trentino Alto Adige/Südtirol,32.185341
4,5,Veneto,1208173.0,4851851.0,24.90128,2076568,584378,2660946,Veneto,21.961287
5,6,Friuli-Venezia Giulia,328115.0,1194095.0,27.478132,557109,173363,730472,Friuli-Venezia Giulia,23.733011
6,7,Liguria,440645.0,1509908.0,29.183566,746686,431321,1178007,Liguria,36.614468
7,8,Emilia-Romagna,1112536.0,4465678.0,24.913037,1993088,554077,2547165,Emilia-Romagna,21.752694
8,9,Toscana,977876.0,3660834.0,26.711837,1627013,506892,2133905,Toscana,23.754197
9,10,Umbria,232730.0,851954.0,27.317203,376747,126922,503669,Umbria,25.199486
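A left merge silently keeps population rows that have no housing match, and duplicated keys would multiply rows. One possible safeguard, sketched here on toy data, uses the `validate` and `indicator` parameters of `DataFrame.merge`. The two small frames and their values are illustrative and do not come from the datasets.

```python
import pandas as pd

# Toy stand-ins for the two regional tables; values are illustrative only.
pop = pd.DataFrame({"region_code": [1, 2, 3], "share_65plus": [26.9, 25.8, 23.9]})
homes = pd.DataFrame({"region_code": [1, 2], "share_unoccupied": [29.6, 56.0]})

merged = pop.merge(
    homes,
    on="region_code",
    how="left",
    validate="one_to_one",  # raises MergeError if a code is duplicated on either side
    indicator=True,         # adds a _merge column: "both" or "left_only"
)

# Region codes present in the population table but missing from the housing table.
unmatched_codes = merged.loc[merged["_merge"] == "left_only", "region_code"].tolist()
print(unmatched_codes)
```

If the list is empty, every region in the population table found its housing row.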


In [49]:
# Add a macro-region label for each region
macro_map = {
    "Piemonte": "North",
    "Valle d'Aosta/Vallée d'Aoste": "North",
    "Lombardia": "North",
    "Trentino-Alto Adige/Südtirol": "North",
    "Veneto": "North",
    "Friuli-Venezia Giulia": "North",
    "Liguria": "North",
    "Emilia-Romagna": "North",
    "Toscana": "Centre",
    "Umbria": "Centre",
    "Marche": "Centre",
    "Lazio": "Centre",
    "Abruzzo": "South",
    "Molise": "South",
    "Campania": "South",
    "Puglia": "South",
    "Basilicata": "South",
    "Calabria": "South",
    "Sicilia": "Islands",
    "Sardegna": "Islands",
}

df_italy_merged["macro_region"] = df_italy_merged["region"].map(macro_map)

df_italy_merged

Unnamed: 0,region_code,region,pop_65plus,tot_pop,share_65plus,homes_occupied,homes_unoccupied,homes_total,region_norm,share_unoccupied,macro_region
0,1,Piemonte,1142793.0,4255702.0,26.85322,1964108,827768,2791876,Piemonte,29.649168,North
1,2,Valle d'Aosta/Vallée d'Aoste,31686.0,122714.0,25.821015,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste,56.023723,North
2,3,Lombardia,2394067.0,10035481.0,23.856026,4415364,1184728,5600092,Lombardia,21.15551,North
3,4,Trentino-Alto Adige/Südtirol,244707.0,1086095.0,22.530902,463305,219888,683193,Trentino Alto Adige/Südtirol,32.185341,North
4,5,Veneto,1208173.0,4851851.0,24.90128,2076568,584378,2660946,Veneto,21.961287,North
5,6,Friuli-Venezia Giulia,328115.0,1194095.0,27.478132,557109,173363,730472,Friuli-Venezia Giulia,23.733011,North
6,7,Liguria,440645.0,1509908.0,29.183566,746686,431321,1178007,Liguria,36.614468,North
7,8,Emilia-Romagna,1112536.0,4465678.0,24.913037,1993088,554077,2547165,Emilia-Romagna,21.752694,North
8,9,Toscana,977876.0,3660834.0,26.711837,1627013,506892,2133905,Toscana,23.754197,Centre
9,10,Umbria,232730.0,851954.0,27.317203,376747,126922,503669,Umbria,25.199486,Centre


To move from two continuous indicators to an interpretable typology, we defined binary flags for “high” ageing and “high” vacancy, using the median values of share_65plus and share_unoccupied as simple, data-driven thresholds; a region at or above the median is flagged as high. We then combined the two flags into a four-category variable, category_2x2, which assigns each region to one of four types: “Old & Empty” (high ageing, high vacancy), “Old & Lived-in” (high ageing, low vacancy), “Younger but Emptying” (low ageing, high vacancy), and “Younger & Lived-in” (low ageing, low vacancy). This discrete typology makes it easier to communicate and compare where regions sit along the joint dimensions of demographic ageing and housing under-use.

In [50]:
# Use the median values as simple, data-driven thresholds
# for defining "high" ageing and "high" vacancy.
thr_65 = df_italy_merged["share_65plus"].median()          # threshold for high share of 65+ residents
thr_vac = df_italy_merged["share_unoccupied"].median()     # threshold for high share of unoccupied homes

# Create binary flags: is this region above or below each threshold?
df_italy_merged["high_65"] = df_italy_merged["share_65plus"] >= thr_65
df_italy_merged["high_vac"] = df_italy_merged["share_unoccupied"] >= thr_vac

# Map each combination of (high_65, high_vac) into a human-readable 2×2 category.
def cat(row):
    # High ageing and high vacancy: both people and places are "retired"
    if row["high_65"] and row["high_vac"]:
        return "Old & Empty"
    # High ageing but low vacancy: many older people, but housing is still actively used
    if row["high_65"] and not row["high_vac"]:
        return "Old & Lived-in"
    # Low ageing but high vacancy: relatively younger population, but many empty homes
    if not row["high_65"] and row["high_vac"]:
        return "Younger but Emptying"
    # Low ageing and low vacancy: younger population and intensively used housing stock
    return "Younger & Lived-in"

# Apply the function row by row to assign each region to one of the four types.
df_italy_merged["category_2x2"] = df_italy_merged.apply(cat, axis=1)

df_italy_merged

Unnamed: 0,region_code,region,pop_65plus,tot_pop,share_65plus,homes_occupied,homes_unoccupied,homes_total,region_norm,share_unoccupied,macro_region,high_65,high_vac,category_2x2
0,1,Piemonte,1142793.0,4255702.0,26.85322,1964108,827768,2791876,Piemonte,29.649168,North,True,False,Old & Lived-in
1,2,Valle d'Aosta/Vallée d'Aoste,31686.0,122714.0,25.821015,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste,56.023723,North,False,True,Younger but Emptying
2,3,Lombardia,2394067.0,10035481.0,23.856026,4415364,1184728,5600092,Lombardia,21.15551,North,False,False,Younger & Lived-in
3,4,Trentino-Alto Adige/Südtirol,244707.0,1086095.0,22.530902,463305,219888,683193,Trentino Alto Adige/Südtirol,32.185341,North,False,True,Younger but Emptying
4,5,Veneto,1208173.0,4851851.0,24.90128,2076568,584378,2660946,Veneto,21.961287,North,False,False,Younger & Lived-in
5,6,Friuli-Venezia Giulia,328115.0,1194095.0,27.478132,557109,173363,730472,Friuli-Venezia Giulia,23.733011,North,True,False,Old & Lived-in
6,7,Liguria,440645.0,1509908.0,29.183566,746686,431321,1178007,Liguria,36.614468,North,True,True,Old & Empty
7,8,Emilia-Romagna,1112536.0,4465678.0,24.913037,1993088,554077,2547165,Emilia-Romagna,21.752694,North,False,False,Younger & Lived-in
8,9,Toscana,977876.0,3660834.0,26.711837,1627013,506892,2133905,Toscana,23.754197,Centre,True,False,Old & Lived-in
9,10,Umbria,232730.0,851954.0,27.317203,376747,126922,503669,Umbria,25.199486,Centre,True,False,Old & Lived-in
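The same 2×2 assignment can also be expressed as a lookup on the pair of flags instead of a chain of `if` statements. The sketch below is one possible alternative, not the notebook's method. It uses four rows copied from the merged table, and the medians are computed on this subset only, so the thresholds differ from those of the full table.

```python
import pandas as pd

# Shares copied from four rows of the merged table.
df = pd.DataFrame({
    "region": ["Piemonte", "Lombardia", "Liguria", "Trentino-Alto Adige/Südtirol"],
    "share_65plus": [26.85322, 23.856026, 29.183566, 22.530902],
    "share_unoccupied": [29.649168, 21.15551, 36.614468, 32.185341],
})

# Labels keyed by (high_65, high_vac).
labels = {
    (True, True): "Old & Empty",
    (True, False): "Old & Lived-in",
    (False, True): "Younger but Emptying",
    (False, False): "Younger & Lived-in",
}

# Thresholds are the medians of this four-row subset (an assumption of the sketch).
high_65 = df["share_65plus"] >= df["share_65plus"].median()
high_vac = df["share_unoccupied"] >= df["share_unoccupied"].median()

df["category_2x2"] = [
    labels[pair] for pair in zip(high_65.tolist(), high_vac.tolist())
]

print(df[["region", "category_2x2"]])
```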


To compare regions on a relative scale, we computed separate rank positions for ageing and vacancy. We ranked regions by the share of residents aged 65+ (rank_65) and by the share of unoccupied homes (rank_vac); rank 1 goes to the lowest value, so a higher rank means a higher value of the indicator, and tied regions receive the average of their ranks. We then calculated a simple divergence measure, rank_diff = rank_vac - rank_65. Positive values identify regions that rank higher on vacancy than on ageing (i.e. “emptier than their level of ageing”), while negative values identify regions that rank higher on ageing than on vacancy (i.e. “more aged than their level of housing under-use”).


In [51]:
# Compute rank positions for ageing and vacancy to compare regions on a relative scale.
# A higher rank means a higher value of the indicator (more ageing / more vacancy).

# Rank of each region by share of 65+ residents
df_italy_merged["rank_65"] = df_italy_merged["share_65plus"].rank(method="average")

# Rank of each region by share of unoccupied homes
df_italy_merged["rank_vac"] = df_italy_merged["share_unoccupied"].rank(method="average")

# Difference in ranks: positive values mean the region is "emptier" than expected
# given its ageing rank; negative values mean it is "older" than expected
# given its vacancy rank.
df_italy_merged["rank_diff"] = df_italy_merged["rank_vac"] - df_italy_merged["rank_65"]

df_italy_merged

Unnamed: 0,region_code,region,pop_65plus,tot_pop,share_65plus,homes_occupied,homes_unoccupied,homes_total,region_norm,share_unoccupied,macro_region,high_65,high_vac,category_2x2,rank_65,rank_vac,rank_diff
0,1,Piemonte,1142793.0,4255702.0,26.85322,1964108,827768,2791876,Piemonte,29.649168,North,True,False,Old & Lived-in,15.0,10.0,-5.0
1,2,Valle d'Aosta/Vallée d'Aoste,31686.0,122714.0,25.821015,59616,75948,135564,Valle d'Aosta/Vallée d'Aoste,56.023723,North,False,True,Younger but Emptying,10.0,20.0,10.0
2,3,Lombardia,2394067.0,10035481.0,23.856026,4415364,1184728,5600092,Lombardia,21.15551,North,False,False,Younger & Lived-in,5.0,2.0,-3.0
3,4,Trentino-Alto Adige/Südtirol,244707.0,1086095.0,22.530902,463305,219888,683193,Trentino Alto Adige/Südtirol,32.185341,North,False,True,Younger but Emptying,2.0,13.0,11.0
4,5,Veneto,1208173.0,4851851.0,24.90128,2076568,584378,2660946,Veneto,21.961287,North,False,False,Younger & Lived-in,8.0,4.0,-4.0
5,6,Friuli-Venezia Giulia,328115.0,1194095.0,27.478132,557109,173363,730472,Friuli-Venezia Giulia,23.733011,North,True,False,Old & Lived-in,19.0,5.0,-14.0
6,7,Liguria,440645.0,1509908.0,29.183566,746686,431321,1178007,Liguria,36.614468,North,True,True,Old & Empty,20.0,16.0,-4.0
7,8,Emilia-Romagna,1112536.0,4465678.0,24.913037,1993088,554077,2547165,Emilia-Romagna,21.752694,North,False,False,Younger & Lived-in,9.0,3.0,-6.0
8,9,Toscana,977876.0,3660834.0,26.711837,1627013,506892,2133905,Toscana,23.754197,Centre,True,False,Old & Lived-in,14.0,6.0,-8.0
9,10,Umbria,232730.0,851954.0,27.317203,376747,126922,503669,Umbria,25.199486,Centre,True,False,Old & Lived-in,17.0,8.0,-9.0
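To make the rank arithmetic concrete, here is a small worked example on made-up values for four hypothetical regions, including a tie to show how `method="average"` behaves. The toy numbers are assumptions for illustration only.

```python
import pandas as pd

# Made-up shares for four hypothetical regions; B and C tie on share_65plus.
toy = pd.DataFrame({
    "region": ["A", "B", "C", "D"],
    "share_65plus": [22.0, 25.0, 25.0, 29.0],
    "share_unoccupied": [32.0, 21.0, 24.0, 36.0],
})

# Rank 1 = lowest value; tied values share the average of their positions.
toy["rank_65"] = toy["share_65plus"].rank(method="average")
toy["rank_vac"] = toy["share_unoccupied"].rank(method="average")
toy["rank_diff"] = toy["rank_vac"] - toy["rank_65"]

print(toy)
```

Region A ranks lowest on ageing but third on vacancy, so its rank_diff is positive; B and C share rank 2.5 on ageing because of the tie.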


In [52]:
# Save the final merged DataFrame
it_homes_pop_path = PROCESSED / "homes_pop_it.csv"
df_italy_merged.to_csv(it_homes_pop_path, index=False)

print(f"saved to: {it_homes_pop_path}")

saved to: /Users/eugenia/Desktop/Open Access/project/retired_places/data/processed/homes_pop_it.csv
