In [1]:
from datetime import datetime

# Get the current date and time
current_datetime = datetime.now()

print(current_datetime)

2024-10-31 16:17:36.328748


# Spatio Temporal Logistic
This notebook is a first test of a spatio-temporal logistic model that links the **built-up surface** of a given region to its quality of life (proxied by **GDP/cap**) and its **population density**.

In [2]:
import numpy as np
from Region import Region
import pandas as pd
import geopandas as gpd
import os

In [3]:
from concurrent.futures import ThreadPoolExecutor
import itertools
from colorama import Fore, Style  # init

In [4]:
max_workers = 16

## Part 0 : Theory
First, we define some classical functions that will be used later in our modelling.

### The logistic function
$f(t) = \frac{S}{1+e^{(-k(t-t_0))}}$

with: 
- *S* : the saturation level
- *k* : the slope (steepness of the transition)
- $t_0$ : the half-height value, i.e. the value of $t$ at which $f$ reaches $S/2$ (left-right alignment).

The logistic function serves both as a predictive equation and as the engine of our dynamical models. At the country scale, we observe that as time passes, GDP/cap increases, and so do observables such as the number of cars or the number of m2/cap. In other words, as the quality of life improves, so do the stocks consumed by a region. This growth is exponential at first but does not continue indefinitely: a saturation is observed even while GDP/cap keeps increasing, because people usually do not own two or three cars even when they can afford them. A logistic function, also known as the S-curve, is one way to model this observation.
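As a quick numerical check of the role of each parameter, the sketch below evaluates the logistic at its midpoint and far on either side of it. The parameter values are hypothetical and chosen only for illustration; they are not fitted to any data.

```python
import numpy as np

def logistic(t, S, k, t0):
    """Logistic curve with saturation S, slope k and half-height value t0."""
    return S / (1 + np.exp(-k * (t - t0)))

# Hypothetical parameters, chosen only for illustration.
S, k, t0 = 100.0, 0.5, 10.0

half_height = logistic(t0, S, k, t0)        # at t = t0 the curve sits at S / 2
far_right = logistic(t0 + 40.0, S, k, t0)   # well past t0 the curve approaches S
far_left = logistic(t0 - 40.0, S, k, t0)    # well before t0 the curve approaches 0

print(half_height, far_right, far_left)
```

Increasing `k` makes the transition between the two plateaus sharper, while changing `t0` only shifts the curve along the horizontal axis.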

### The exponential decay function
$f(x) = a\, e^{-b\,x}+c$

with:
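- *a* : the amplitude (the height above *c* at $x = 0$)
- *b* : the decay rate (the half-life is $\ln(2)/b$)
- *c* : the bias (the value approached for large $x$).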

Exponential decay is a decreasing exponential commonly observed in nature (for example in radioactive decay, where the number of undecayed nuclei falls exponentially over time). It is one of our assumptions for modelling the decrease observed in the **built-up surface/cap** as the **population density** of a region increases.
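The relation between the decay rate *b* and the half-life can be checked numerically. The following is a minimal sketch with hypothetical parameter values: after one half-life, $\ln(2)/b$, the part of the curve above the bias *c* has been halved.

```python
import numpy as np

def exponential_decay(x, a, b, c):
    """Decreasing exponential with amplitude a, decay rate b and bias c."""
    return a * np.exp(-b * x) + c

# Hypothetical parameters, chosen only for illustration.
a, b, c = 200.0, 0.01, 30.0

half_life = np.log(2) / b                              # x at which a * exp(-b x) has halved
at_zero = exponential_decay(0.0, a, b, c)              # equals a + c
at_half_life = exponential_decay(half_life, a, b, c)   # equals a / 2 + c

print(half_life, at_zero, at_half_life)
```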

In [5]:
def logistic(x, a, b, c):
    return a / (1 + np.exp(-b*(x-c)))

def exponential_decay(X, a, b, c):
    return a * np.exp(-b * X) + c

def exponential_decay2(X, a, b):
    return a * np.exp(-b * X) 

### The Spatio Temporal Logistic function
$f(t,x) = \frac{a\,e^{-b\,x}+c}{1 + e^{(-k(t-t_0))}}$

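The Spatio Temporal Logistic function (**STL**) combines the classical logistic with the exponential decay: the constant saturation level *S* is replaced by an exponential decay in $x$. In the code below, the parameters `d` and `e` play the roles of $k$ and $t_0$.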

In [6]:
def STL(X, a, b, c, d, e):
    # STL stands for Spatio Temporal Logistic. It is a refinement of a logistic, dependent on time (here the GDP per capita
    # serves as a proxy for time), with a saturation level that is a function of space (here the population density of a region).
    x1, x2 = X
    saturation = exponential_decay(x2, a, b, c)
    return saturation / (1 + np.exp(-d*(x1-e)))
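A function of this form can be fitted with `scipy.optimize.curve_fit`, which accepts a `(2, M)` array as independent variable. The sketch below is only an illustration on synthetic, noise-free data: the "true" parameters, the value ranges and the initial guess `p0` are all hypothetical, and both predictors are expressed in rescaled units of order one to keep the fit well conditioned. The two functions are repeated so that the block runs on its own.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_decay(X, a, b, c):
    return a * np.exp(-b * X) + c

def STL(X, a, b, c, d, e):
    # X stacks the two predictors: x1 (time proxy) and x2 (space proxy)
    x1, x2 = X
    saturation = exponential_decay(x2, a, b, c)
    return saturation / (1 + np.exp(-d * (x1 - e)))

# Synthetic data: hypothetical parameters and rescaled units, for illustration only.
rng = np.random.default_rng(0)
true_params = (3.0, 2.0, 0.5, 1.5, 2.0)
x1 = rng.uniform(0.0, 6.0, size=200)   # stands in for GDP/cap (rescaled)
x2 = rng.uniform(0.0, 2.0, size=200)   # stands in for population density (rescaled)
X = np.vstack([x1, x2])                # shape (2, M), as expected by STL
y = STL(X, *true_params)

# Fit from a rough initial guess; p0 matters for this kind of nonlinear model.
p0 = (2.5, 1.5, 0.4, 1.0, 1.5)
popt, pcov = curve_fit(STL, X, y, p0=p0, maxfev=10000)

max_residual = np.max(np.abs(STL(X, *popt) - y))
print(popt, max_residual)
```

With real data the predictors differ by several orders of magnitude, so rescaling them (or supplying a carefully chosen `p0`) is likely to matter for convergence.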

## PART 1 : Initialisation
Initialisation of the analysis parameters.

- **regions_names** : (*list of strings*) the countries to study, identified by their ISO3 codes
- **years** : (*list of strings*) the years to study
- **raster_str** : (*string*) name of the layer used in the **GHSL** dataset (S, S_NRES, POP, ...)
- **lvl** : (*int*) the level of our administrative data (GDP and population)
- **subregion_borders** : (*string*) the path to the administrative border shapefile used to cut the subregions
- **identifier** : (*string*) the column name used to match the region names between the administrative data and the GIS data

Each region is associated with a **DataFrame** (*oecd_DF_merged*) holding the administrative observables (GDP, population, etc.) of its subregions.

In [7]:
lvl = 2

# GHSL type
raster_str = "Built_S"
with_parents_computation = False

In [8]:
data_folder = "/data/mineralogie/hautervo/data/"
preprocessing_folder = data_folder + "Preprocessed_regions/GHSL_OECD/" + raster_str + "/TL" + str(lvl) + "/"

In [9]:
# Make the OECD complete DF
oecd_gdp_per_cap_file = data_folder + r"OECD/GDP per capita/TL" + str(lvl) + r"/Gross Domestic Product per capita, in USD.csv"
oecd_population_file = data_folder + r"OECD/Population/TL" + str(lvl) + r"/Resident population.csv"

oecd_gdp_per_capita_df = pd.read_csv(oecd_gdp_per_cap_file, skiprows=0, header=1)
oecd_population_df = pd.read_csv(oecd_population_file, skiprows=0, header=1)

df_list = []
df_list.append(oecd_gdp_per_capita_df)
df_list.append(oecd_population_df)

subregion_col = "tl"+str(lvl)+"_id"
parent_col = "iso3"

for df in df_list:
    df[subregion_col] = df["Code"]
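The `subregion_col` column added above is what later allows a subregion to be matched to its OECD values by code and year. As a minimal sketch of that lookup, the toy table below uses the same column layout (a `Code` column and one column per year); the numbers are made up for illustration and `lookup` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

lvl = 2
subregion_col = "tl" + str(lvl) + "_id"

# Toy table with made-up values, mimicking the layout of the OECD csv files.
toy_df = pd.DataFrame({
    "Code": ["FRK", "FRY1"],
    "2000": [25000.0, 15000.0],
    "2020": [38000.0, 21000.0],
})
toy_df[subregion_col] = toy_df["Code"]

def lookup(df, code, year):
    """Return the value for (code, year), or None when no row matches."""
    rows = df.loc[df[subregion_col] == code, year]
    return rows.iloc[0] if not rows.empty else None

found = lookup(toy_df, "FRK", "2020")
missing = lookup(toy_df, "XXX", "2020")
print(found, missing)
```

Indexing the first element of an empty selection raises an `IndexError`, so a lookup of this kind has to handle subregions that are absent from the OECD tables.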

In [10]:
# The OECD admin units

oecd_admin_units = data_folder + "OECD/admin_units/TL" + str(lvl) + "/OECD_TL" + str(lvl) + "_2020_ESRI54009.shp"
gpd_oecd_admin_units = gpd.read_file(oecd_admin_units)

In [11]:
# Countries to ignore from our study (not enough data)
country_to_pop = ["SRB", "CRI", "ISR", "CYP", "ISL", "ALB", "LIE","MNE","MKD"]

In [12]:
regions_names = list(oecd_gdp_per_capita_df["Country"].unique())

#exclude the countries
regions_names = [name for name in regions_names if name not in country_to_pop]

# regions_names = ["FRA", "DEU", "GBR", "BEL", "ITA", "LUX", "ESP", "USA", "JPN", "CAN", "AUS"] # to remove
# regions_names = ["FRA", "DEU", "GBR", "BEL", "ITA", "LUX", "ESP"] # to remove
regions_names = ["FRA"]

# years = ["1975", "1990", "2000", "2010", "2020"] 
years = ["2000", "2010", "2020"]
# years = ["2020"]



regions = []

for name in regions_names:
    new_region = Region(name, lvl-1)
    regions.append(new_region)

    for y in years:        
        new_region.add_gis(data_folder + "GHSL/"+ raster_str + "/E" + y + "_100m_Global/subregions/" + name + ".tif", raster_str + "_" + y, str(y), lvl-1) 
        new_region.add_gis(data_folder + "GHSL/Built_POP/E" + y + "_100m_Global/subregions/" + name + ".tif", "Built_POP_" + y, str(y), lvl-1) 



In [13]:
output_csv_path = preprocessing_folder + "csv/"

## PART 2 : Computation of the observables
Now that all the parameters of our study are defined, we cut the regions into their respective subregions (*make_subregions*).

We then compute, for each subregion's GIS layers, the geographical observables, which we store in its oecd_DF_merged.
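To make the observables concrete, the sketch below computes them on a tiny hand-made raster with NumPy. It assumes that each pixel covers 100 m x 100 m (hence the factor 1e4 m² per pixel) and that the pixel value is the built-up surface in m² inside that pixel; the array and the population figure are hypothetical, and the behaviour of the `gis` helper methods used in this notebook is not reproduced here.

```python
import numpy as np

# Hypothetical 3 x 3 raster tile: built-up surface (m²) per 100 m x 100 m pixel.
tile = np.array([
    [0.0, 2500.0, 10000.0],
    [500.0, 0.0, 0.0],
    [0.0, 0.0, 1000.0],
])
pixel_area_m2 = 1e4            # assumed pixel footprint: 100 m x 100 m
population = 70                # hypothetical population of the tile

built_up_surface = tile.sum()                        # total built-up surface (m²)
total_surface = tile.size * pixel_area_m2            # total surface of the tile (m²)
built_up_fraction = built_up_surface / total_surface
built_up_per_capita = built_up_surface / population  # m² of built-up surface per inhabitant
population_density = population / total_surface      # inhabitants per m²

print(built_up_surface, total_surface, built_up_fraction)
print(built_up_per_capita, population_density)
```

The last two quantities are the ones that enter the STL: the built-up surface per capita as the dependent variable and the population density as the spatial predictor.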

In [14]:
def GHSL_observables_computation_parallel(subregion, overwrite):    
    csv_path = os.path.join(output_csv_path, subregion.parent_name, subregion.name, '_'.join(years))+".csv"
    if not os.path.isfile(csv_path) or overwrite:
        subregion.output_df = pd.DataFrame({"year": years, "GDP per capita":None, "Population_OECD": None, "Population_GHSL": None, "Built up surface GHSL/Population_OECD": None, "Population_OECD/Total surface": None})

        for y in years:
            # first use the Built_S gis
            gis = next((gis for gis in subregion.gis_list if gis.name == raster_str + "_" + y), None)
            if gis is not None:
                subregion.output_df.loc[subregion.output_df["year"]==y, "Built up surface GHSL"] = int(gis.get_total_sum_pixel_values())
                subregion.output_df.loc[subregion.output_df["year"]==y, "Total surface"] = int(1e4 * gis.get_total_number_pixels() )
                subregion.output_df.loc[subregion.output_df["year"]==y, "Built up surface fraction"] = subregion.output_df.loc[subregion.output_df["year"]==y, "Built up surface GHSL"] / subregion.output_df.loc[subregion.output_df["year"]==y, "Total surface"]
            else:
                print(Fore.RED, "Gis ", raster_str + "_" + y, " not found.", Style.RESET_ALL)

            # Second use the Built_POP gis
            gis = next((gis for gis in subregion.gis_list if gis.name == "Built_POP_" + y), None)
            if gis is not None:
                subregion.output_df.loc[subregion.output_df["year"]==y, "Population_GHSL"] = int(gis.get_total_sum_pixel_values())
            else:
                print(Fore.RED, "Gis ", "Built_POP_" + y, " not found.", Style.RESET_ALL)

            ### fill the DF
            try:
                subregion.output_df.loc[subregion.output_df["year"]==y, "GDP per capita"] = oecd_gdp_per_capita_df.loc[oecd_gdp_per_capita_df[subregion_col]==subregion.name, y].values[0]
            except (KeyError, IndexError):
                print("Missing: ", subregion.name, " ", y, " GDP per capita.")
            
            try:
                subregion.output_df.loc[subregion.output_df["year"]==y, "Population_OECD"] = oecd_population_df.loc[oecd_population_df[subregion_col]==subregion.name, y].values[0]               
                # output
                if pd.notna(subregion.output_df.loc[subregion.output_df["year"]==y, "Population_OECD"]).all():
                    subregion.output_df.loc[subregion.output_df["year"]==y, "Population_OECD/Total surface"] = subregion.output_df.loc[subregion.output_df["year"]==y, "Population_OECD"] / subregion.output_df.loc[subregion.output_df["year"]==y, "Total surface"]
                    subregion.output_df.loc[subregion.output_df["year"]==y, "Built up surface GHSL/Population_OECD"] = subregion.output_df.loc[subregion.output_df["year"]==y, "Built up surface GHSL"] / subregion.output_df.loc[subregion.output_df["year"]==y, "Population_OECD"]
            except Exception:
                print("Missing: ", subregion.name, " ", y, " Population_OECD.")

        # save the new df
        os.makedirs(os.path.dirname(csv_path), exist_ok=True)
        subregion.output_df.to_csv(csv_path, index=False)
    else:
        #use a precomputed csv
        subregion.output_df = pd.read_csv(csv_path)

In [15]:
def preprocess():
    #Step 1 : make subregions
    overwrite = False

    print(Fore.GREEN + "Starting make_subregions()" + Style.RESET_ALL)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(lambda region: region.make_subregions(gpd_oecd_admin_units, subregion_col, parent_col, overwrite=overwrite), regions))

    #Step 2.1 : Computation
    overwrite = False

    subregions_list_parallel = []

    for region in regions:
        for subregion in region.subregions:
            subregions_list_parallel.append(subregion)
    
    print(Fore.GREEN + "Starting observable_computation()" + Style.RESET_ALL)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # consume the iterator so that exceptions raised in the workers are not silently dropped
        list(executor.map(GHSL_observables_computation_parallel, subregions_list_parallel, itertools.repeat(overwrite)))

    del subregions_list_parallel # not needed anymore

    if with_parents_computation:
        print(Fore.GREEN + "Starting computing region DF" + Style.RESET_ALL)
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # consume the iterator so that exceptions raised in the workers are not silently dropped
            list(executor.map(lambda region: region.compute_own_df(years, "GHSL_OECD"), regions))
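One detail of `ThreadPoolExecutor.map` is worth keeping in mind when reading `preprocess()`: it returns a lazy iterator, and an exception raised inside a worker is only re-raised when the corresponding result is retrieved. The sketch below illustrates this with a hypothetical task, `may_fail`, that is not part of the notebook.

```python
from concurrent.futures import ThreadPoolExecutor

def may_fail(n):
    """Hypothetical task: fails for one specific input."""
    if n == 2:
        raise ValueError(f"task {n} failed")
    return n * n

# Results never consumed: all tasks run, but the failure goes unnoticed.
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(may_fail, range(4))
unconsumed_completed = True  # reached without any exception

# Results consumed with list(): the failure is re-raised in the caller.
try:
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(may_fail, range(4)))
    consumed_raised = False
except ValueError as err:
    consumed_raised = True
    error_message = str(err)

print(unconsumed_completed, consumed_raised)
```

Wrapping the call in `list(...)` therefore acts as a simple way to make worker failures visible; whether a failure in one subregion should stop the whole run is a separate design choice.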

In [16]:
preprocess()   

Starting make_subregions()
Something went wrong while trying to cut a raster for
 FRY1 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY1 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY1 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY1 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY1 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY1 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY2 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY2 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY2 Input shapes do not overlap raster.
Something went wrong while trying to cut a raster for
 FRY2 Input shapes do not overlap raster.
Some

In [17]:
from datetime import datetime

# Get the current date and time
current_datetime = datetime.now()
print(Fore.GREEN, "Ended normally.", Style.RESET_ALL)
print(current_datetime)

 Ended normally. 
2024-10-31 16:17:53.803042


# TEST ZONE

In [18]:
for region in regions:
    for subregion in region.subregions:
        if subregion.name=="FRK":
            subregion.get_osm_by_tag("building", "FRK", subregion.geometry, crs=subregion.crs)

  multi_poly_proj = utils_geo._consolidate_subdivide_geometry(poly_proj)
  overpass_settings = _make_overpass_settings()
  yield _overpass_request(data={"data": query_str})
  this_pause = _get_overpass_pause(overpass_endpoint)


KeyboardInterrupt: 