<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Importing-external-functions" data-toc-modified-id="Importing-external-functions-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Importing external functions</a></span></li><li><span><a href="#Geofabrik/OSM" data-toc-modified-id="Geofabrik/OSM-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Geofabrik/OSM</a></span></li><li><span><a href="#Eubucco" data-toc-modified-id="Eubucco-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Eubucco</a></span></li><li><span><a href="#DBSM" data-toc-modified-id="DBSM-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>DBSM</a></span></li><li><span><a href="#Microsoft" data-toc-modified-id="Microsoft-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Microsoft</a></span></li><li><span><a href="#Boundaries-statistics" data-toc-modified-id="Boundaries-statistics-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Boundaries statistics</a></span></li></ul></div>

In [1]:
"""
This notebook loads several building datasets, merges each of them with the GISCO layer that carries the NUTS 3 level province codes, and then computes the area in square meters and the number of vertices of every building polygon.

First, the building geometries and the GISCO layer with the NUTS 3 codes are loaded.

Each dataset is then merged with the GISCO layer, so that every building carries the code of its NUTS 3 province.

Next, the area in square meters of each building is calculated from the polygon geometry that represents it.

Finally, the vertices of each building polygon are counted from the coordinates that compose it, excluding the starting and ending points, which coincide.
"""
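The area and vertex-count steps described in the docstring above can be illustrated with a small library-free sketch. It is not the code in `0-[_functions]-0_methods.ipynb`: the names `count_vertices` and `ring_area_sqm` are hypothetical, the coincident start/end point is assumed to be counted once, and the ring is assumed to be already projected to a metric CRS (for example EPSG:3035), because an area computed directly on longitude/latitude degrees is not in square meters.

```python
def count_vertices(ring):
    """Count the distinct vertices of a closed ring.

    `ring` is a list of (x, y) tuples whose last point repeats the first,
    as in a polygon exterior; the repeated point is counted only once.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        return len(ring) - 1
    return len(ring)


def ring_area_sqm(ring):
    """Shoelace area of a closed ring whose coordinates are in meters."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# A 10 m x 20 m rectangle, closed by repeating the first point.
rectangle = [(0.0, 0.0), (10.0, 0.0), (10.0, 20.0), (0.0, 20.0), (0.0, 0.0)]
print(count_vertices(rectangle), ring_area_sqm(rectangle))
```

With geopandas, the usual route to the same area would be to reproject the layer with `GeoDataFrame.to_crs` to an equal-area CRS and read the `.area` attribute.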

# Importing external functions

In [17]:
#importing common functions
%run 0-[_functions]-0_methods.ipynb
import warnings
import datetime
warnings.filterwarnings("ignore")

# Geofabrik/OSM
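Values of the `building` key (taginfo, Italy Centro extract): https://taginfo.geofabrik.de/europe:italy:centro/keys/building#values<BR>
EUBUCCO tutorial on downloading and parsing OSM data: https://github.com/ai4up/eubucco/blob/main/tutorials/downloading-parsing-osm.ipynb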

In [74]:
for country in ['DK','SE','EL','BE']:
    # load the Geofabrik/OSM buildings and tag them with the country code
    osm=load_osm_data(country)
    osm['country']=country
    # map each building to its NUTS region
    osm_enriched=process_buildings_to_lau_parallel(osm)
    print(datetime.datetime.now(),'exporting results')
    osm_enriched.to_file('/mnt/CAS/20240101_foss4g/datasets/in/countries/%s/osm_enriched.gpkg'%(country),driver='GPKG')
    # per-region statistics, exported to Excel
    osm_stat=aggregation(osm_enriched)
    osm_stat.to_excel('/mnt/CAS/20240101_foss4g/datasets/out/%s_osm_output.xlsx'%(country),index=False)
    print(osm_enriched.head(2))
    print('----')
    print(osm_stat)
    print(datetime.datetime.now(),'total number of regions computed:',len(osm_stat))
    print('******\n')
    # release the large frames before loading the next country
    osm=None
    osm_enriched=None

['osm-latest-free.shp.zip']
2024-02-06 16:27:21.294043 importing osm data for DK - osm-latest-free.shp.zip
2024-02-06 16:31:40.003453 imported 3660690 rows
2024-02-06 16:31:40.004212 ['building']
2024-02-06 16:54:37.799194 total number of regions computed: 11
     osm-id osm-code osm-fclass        osm-name  osm-type  \
0   4250592     1500   building  Rigshospitalet  hospital   
1  14309996     1500   building        Nykredit    office   

                                            geometry source    dataset  \
0  POLYGON ((12.56488 55.69608, 12.56522 55.69626...    osm  geofabrik   
1  POLYGON ((12.57508 55.66973, 12.57572 55.67008...    osm  geofabrik   

  country      area_sqm num_vertices nuts_id urban_type       nuts_name  
0      DK  43669.531812           39   DK011          1  Byen København  
1      DK   9814.474012            4   DK011          1  Byen København  
----
      dataset source nuts_id             nuts_name  urban_type country  \
0   geofabrik    osm   DK011    

2024-02-06 17:38:27.601258 imported 6232885 rows
2024-02-06 17:38:27.601994 ['building']
2024-02-06 18:30:49.982497 total number of regions computed: 44
     osm-id osm-code osm-fclass                   osm-name osm-type  \
0   4414121     1500   building                       None     None   
1  10317312     1500   building  Technicum Noord-Antwerpen   school   

                                            geometry source    dataset  \
0  POLYGON ((4.45072 51.18915, 4.45072 51.18916, ...    osm  geofabrik   
1  POLYGON ((4.41164 51.23124, 4.41164 51.23151, ...    osm  geofabrik   

  country     area_sqm num_vertices nuts_id urban_type       nuts_name  
0      BE  11399.86164           38   BE211          1  Arr. Antwerpen  
1      BE   5732.97058           11   BE211          1  Arr. Antwerpen  
----
      dataset source nuts_id  \
0   geofabrik    osm   BE100   
1   geofabrik    osm   BE211   
2   geofabrik    osm   BE212   
3   geofabrik    osm   BE213   
4   geofabrik    osm   BE2

# Eubucco

In [75]:
for country in ['DK','SE','EL','BE']:
    # load the Eubucco buildings and tag them with the country code
    eubucco=load_eubucco_data(country)
    eubucco['country']=country
    # map each building to its NUTS region
    eubucco_enriched=process_buildings_to_lau_parallel(eubucco)
    print(datetime.datetime.now(),'exporting results')
    eubucco_enriched.to_file('/mnt/CAS/20240101_foss4g/datasets/in/countries/%s/eubucco_enriched.gpkg'%(country),driver='GPKG')
    # per-region statistics, exported to Excel
    eubucco_stat=aggregation(eubucco_enriched)
    eubucco_stat.to_excel('/mnt/CAS/20240101_foss4g/datasets/out/%s_eubucco_output.xlsx'%(country),index=False)
    print(eubucco_enriched.head(2))
    print('----')
    print(eubucco_stat.head(2))
    print(datetime.datetime.now(),'total number of regions computed:',len(eubucco_stat))
    print('******\n')
    # release the large frames before loading the next country
    eubucco=None
    eubucco_enriched=None


2024-02-06 18:30:51.524369 importing Eubucco data for DK
2024-02-06 18:43:27.981934 imported 5691756 rows
2024-02-06 21:41:27.471037 total number of regions computed: 44
                 id height   age  type                id_source type_source  \
0  v0.1-DNK.1.1_1-0   None  None  None  dk.bu-core2d.1005280530        None   
1  v0.1-DNK.1.1_1-1   None  None  None  dk.bu-core2d.1104978398        None   

                                            geometry   source  dataset  \
0  POLYGON ((12.33130 55.65694, 12.33124 55.65691...  eubucco  eubucco   
1  POLYGON ((12.34381 55.66785, 12.34377 55.66773...  eubucco  eubucco   

  country        eubucco-id     area_sqm num_vertices nuts_id urban_type  \
0      DK  v0.1-DNK.1.1_1-0  2712.656064            8   DK012          1   
1      DK  v0.1-DNK.1.1_1-1  1022.052761           10   DK012          1   

          nuts_name  
0  Københavns omegn  
1  Københavns omegn  
----
     dataset source nuts_id  \
0  geofabrik    osm   BE100   
1  geof

In [None]:
#only aggregation: reuse the enriched files exported by the previous cell
for country in ['DK','SE','EL','BE']:
    dataset='eubucco'
    print('processing',country)
    eubucco_enriched=load_enriched_data(country,dataset)
    eubucco_stat=aggregation(eubucco_enriched)
    eubucco_stat.to_excel('/mnt/CAS/20240101_foss4g/datasets/out/%s_eubucco_output.xlsx'%(country),index=False)
    print(eubucco_enriched.head(2))
    print('----')
    print(eubucco_stat.head(2))
    print(datetime.datetime.now(),'total number of regions computed:',len(eubucco_stat))
    print('******\n')
    # release the large frame before loading the next country
    eubucco_enriched=None

# DBSM
In DBSM the same NUTS region may be covered by several sources, so the per-source statistics need to be aggregated afterwards to obtain one total per region.
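
A minimal sketch of that later aggregation, assuming the statistics carry the `nuts_id`, `source`, `nr_buildings`, `area_sqm` and `num_vertices` columns seen in the output of `aggregation`. The rows are illustrative values only and `merge_sources` is a hypothetical helper, not a function of this notebook.

```python
from collections import defaultdict


def merge_sources(rows):
    """Sum per-source statistics into one record per NUTS region."""
    totals = defaultdict(
        lambda: {"nr_buildings": 0, "area_sqm": 0.0, "num_vertices": 0}
    )
    for row in rows:
        region = totals[row["nuts_id"]]
        region["nr_buildings"] += row["nr_buildings"]
        region["area_sqm"] += row["area_sqm"]
        region["num_vertices"] += row["num_vertices"]
    return dict(totals)


# Illustrative values only: two sources contribute to the same region.
rows = [
    {"nuts_id": "DK011", "source": "esm", "nr_buildings": 5, "area_sqm": 1000.0, "num_vertices": 40},
    {"nuts_id": "DK011", "source": "osm", "nr_buildings": 3, "area_sqm": 500.0, "num_vertices": 20},
    {"nuts_id": "DK012", "source": "osm", "nr_buildings": 2, "area_sqm": 250.0, "num_vertices": 12},
]
merged = merge_sources(rows)
print(merged["DK011"])
```

On the Excel outputs, the same result could be obtained in pandas by grouping on `nuts_id` and summing the numeric columns.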

In [76]:
for country,country_extended in zip(['DK','SE','EL','BE'],['denmark','sweden','greece','belgium']):
    # load the JRC DBSM buildings (the loader needs both the code and the extended country name)
    dbsm=load_dbsm_data(country,country_extended)
    dbsm['country']=country
    # map each building to its NUTS region
    dbsm_enriched=process_buildings_to_lau_parallel(dbsm)
    print(datetime.datetime.now(),'exporting results')
    dbsm_enriched.to_file('/mnt/CAS/20240101_foss4g/datasets/in/countries/%s/jrc-dbsm_enriched.gpkg'%(country),driver='GPKG')

    # per-region statistics, exported to Excel
    dbsm_stat=aggregation(dbsm_enriched)
    dbsm_stat.to_excel('/mnt/CAS/20240101_foss4g/datasets/out/%s_jrc-dbsm_output.xlsx'%(country),index=False)
    print(dbsm_enriched.head(2))
    print('----')
    print(dbsm_stat.head(2))
    print(datetime.datetime.now(),'total number of regions computed:',len(dbsm_stat))
    print('******\n')
    # release the large frames before loading the next country
    dbsm=None
    dbsm_enriched=None


2024-02-07 09:22:54.785395 importing JRC-dbsm data for DK : denmark
2024-02-07 09:27:24.103625 imported 3770334 rows
2024-02-07 10:04:43.321481 total number of regions computed: 33
  source                                           geometry dataset country  \
0    osm  POLYGON ((8.79647 54.97307, 8.79656 54.97293, ...     jrc      DK   
1    osm  POLYGON ((8.76750 54.97691, 8.76769 54.97693, ...     jrc      DK   

  jrc-id     area_sqm num_vertices nuts_id urban_type   nuts_name  
0      0  1392.354379           10   DK032          2  Sydjylland  
1      1  1173.425553           10   DK032          2  Sydjylland  
----
  dataset source nuts_id         nuts_name  urban_type country  nr_buildings  \
0     jrc    esm   DK011    Byen København           1      DK           521   
1     jrc    esm   DK012  Københavns omegn           1      DK           508   

       area_sqm  num_vertices  
0  2.308576e+06         25735  
1  1.136397e+06         17467  
2024-02-07 10:04:43.503473 total nu

# Microsoft
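The cell in this section reads every Microsoft link with `pd.read_json(..., lines=True)` and converts the `geometry` field with `shapely.geometry.shape`, which implies one JSON record per line with a GeoJSON-like geometry. As a library-free illustration of that input format, the sketch below parses a small in-memory sample with the standard library. The two sample records and the `iter_exterior_rings` helper are hypothetical, and real records may carry additional fields.

```python
import io
import json

# Hypothetical sample: one GeoJSON-like feature per line.
sample = io.StringIO(
    '{"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", '
    '"coordinates": [[[23.7, 37.9], [23.8, 37.9], [23.8, 38.0], [23.7, 37.9]]]}}\n'
    '{"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", '
    '"coordinates": [[[23.0, 38.0], [23.1, 38.0], [23.1, 38.1], [23.0, 38.1], [23.0, 38.0]]]}}\n'
)


def iter_exterior_rings(lines):
    """Yield the exterior ring of every Polygon record, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        geometry = json.loads(line)["geometry"]
        if geometry["type"] == "Polygon":
            # GeoJSON stores the exterior ring first, followed by any holes.
            yield geometry["coordinates"][0]


ring_sizes = [len(ring) for ring in iter_exterior_rings(sample)]
print(ring_sizes)
```

Each ring includes the closing coordinate, so the sizes printed here are one larger than the number of distinct vertices.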

In [3]:
import datetime
import warnings

import geopandas as gpd
import pandas as pd
from shapely.geometry import shape

warnings.filterwarnings("ignore")

def get_data(location,country):
    # location: name of the geography as listed in the Location column of dataset-links.csv; update to meet your needs
    # country: country code stored in the 'country' column and used in the output path
    dataset_links = pd.read_csv("https://minedbuildings.blob.core.windows.net/global-buildings/dataset-links.csv")
    location_links = dataset_links[dataset_links.Location == location]
    parts=[]
    for i,(_, row) in enumerate(location_links.iterrows(),start=1):
        print(datetime.datetime.now(),country,'importing link nr:',i)
        # each link holds one JSON record per line
        df = pd.read_json(row.Url, lines=True)
        # turn the geometry dictionaries into shapely geometries
        df['geometry'] = df['geometry'].apply(shape)
        df['country']=country
        df['dataset']='microsoft'
        df['source']='microsoft'
        tmp=gpd.GeoDataFrame(df, crs=4326)
        print(datetime.datetime.now(),country,'processing link nr:',i,'\n')
        parts.append(tmp)

    # merging all links into a single GeoDataFrame
    print(datetime.datetime.now(),'merging geometries')
    gdf=pd.concat(parts)
    print(datetime.datetime.now(),'mapping geometries to nuts')
    gdf_enriched=process_buildings_to_lau_parallel(gdf)
    # saving the enriched data
    print(datetime.datetime.now(),'exporting geometries')
    gdf_enriched.to_file('/mnt/CAS/20240101_foss4g/datasets/in/countries/%s/microsoft_enriched.gpkg'%(country),driver='GPKG')
    return gdf_enriched

for location,country in zip(['Denmark','Greece','Belgium'],['DK','EL','BE']):
    microsoft_enriched=get_data(location,country)
    print(datetime.datetime.now(),'computing aggregation by NUTS')
    microsoft_enriched_stat=aggregation(microsoft_enriched)
    microsoft_enriched_stat.to_excel('/mnt/CAS/20240101_foss4g/datasets/out/%s_microsoft_output.xlsx'%(country),index=False)
    print(microsoft_enriched.head(2))
    print('----')
    print(microsoft_enriched_stat.head(2))
    print(datetime.datetime.now(),'total number of regions computed:',len(microsoft_enriched_stat))
    print('******')
    


2024-02-07 18:57:47.793739 DK importing link nr: 1
2024-02-07 18:57:51.663130 DK processing link nr: 1 

2024-02-07 18:57:51.663390 DK importing link nr: 2
2024-02-07 18:57:53.551007 DK processing link nr: 2 

2024-02-07 18:57:53.551579 DK importing link nr: 3
2024-02-07 18:57:56.536998 DK processing link nr: 3 

2024-02-07 18:57:56.537677 DK importing link nr: 4
2024-02-07 18:58:14.566115 DK processing link nr: 4 

2024-02-07 18:58:14.566430 DK importing link nr: 5
2024-02-07 18:58:15.991211 DK processing link nr: 5 

2024-02-07 18:58:15.991444 DK importing link nr: 6
2024-02-07 18:58:17.920598 DK processing link nr: 6 

2024-02-07 18:58:17.921551 DK importing link nr: 7
2024-02-07 18:58:23.019314 DK processing link nr: 7 

2024-02-07 18:58:23.020312 DK importing link nr: 8
2024-02-07 18:58:30.067240 DK processing link nr: 8 

2024-02-07 18:58:30.068017 DK importing link nr: 9
2024-02-07 18:58:39.742793 DK processing link nr: 9 

2024-02-07 18:58:39.743063 DK importing link nr: 10
202

2024-02-07 19:43:30.577964 EL processing link nr: 15 

2024-02-07 19:43:30.578863 EL importing link nr: 16
2024-02-07 19:43:44.132858 EL processing link nr: 16 

2024-02-07 19:43:44.133708 EL importing link nr: 17
2024-02-07 19:44:01.495219 EL processing link nr: 17 

2024-02-07 19:44:01.495550 EL importing link nr: 18
2024-02-07 19:44:22.528055 EL processing link nr: 18 

2024-02-07 19:44:22.528764 EL importing link nr: 19
2024-02-07 19:44:45.245371 EL processing link nr: 19 

2024-02-07 19:44:45.246337 EL importing link nr: 20
2024-02-07 19:44:48.039545 EL processing link nr: 20 

2024-02-07 19:44:48.039846 EL importing link nr: 21
2024-02-07 19:45:09.050238 EL processing link nr: 21 

2024-02-07 19:45:09.050537 EL importing link nr: 22
2024-02-07 19:45:29.373223 EL processing link nr: 22 

2024-02-07 19:45:29.374001 EL importing link nr: 23
2024-02-07 19:45:58.631407 EL processing link nr: 23 

2024-02-07 19:45:58.632407 EL importing link nr: 24
2024-02-07 19:46:09.103107 EL process

2024-02-07 19:59:10.121673 EL processing link nr: 92 

2024-02-07 19:59:10.122444 EL importing link nr: 93
2024-02-07 19:59:41.126772 EL processing link nr: 93 

2024-02-07 19:59:41.127084 EL importing link nr: 94
2024-02-07 19:59:42.590147 EL processing link nr: 94 

2024-02-07 19:59:42.590429 EL importing link nr: 95
2024-02-07 19:59:47.537900 EL processing link nr: 95 

2024-02-07 19:59:47.538733 EL importing link nr: 96
2024-02-07 19:59:49.529912 EL processing link nr: 96 

2024-02-07 19:59:49.530755 EL importing link nr: 97
2024-02-07 19:59:51.373403 EL processing link nr: 97 

2024-02-07 19:59:51.374085 EL importing link nr: 98
2024-02-07 19:59:58.536665 EL processing link nr: 98 

2024-02-07 19:59:58.537664 EL importing link nr: 99
2024-02-07 20:00:01.748263 EL processing link nr: 99 

2024-02-07 20:00:01.749058 EL importing link nr: 100
2024-02-07 20:00:04.138342 EL processing link nr: 100 

2024-02-07 20:00:04.138549 EL importing link nr: 101
2024-02-07 20:00:21.660143 EL proc

# Boundaries statistics

In [63]:
import geopandas as gpd
zipped_shapefile_path = '/mnt/CAS/20240101_foss4g/datasets/in/NUTS_RG_01M_2016_4326.shp.zip'
gdf_nuts = gpd.read_file('zip://' + zipped_shapefile_path)
print('nr of NUTS 3 regions per country')
# LEVL_CODE==3 selects the NUTS 3 level; the NUTS_ID column of the result holds the count per country
gdf_nuts_stat=gdf_nuts[gdf_nuts['LEVL_CODE']==3].groupby('CNTR_CODE')['NUTS_ID'].count().reset_index()
gdf_nuts_stat.sort_values(by='NUTS_ID',ascending=False)

nr of NUTS 3 regions per country


   CNTR_CODE  NUTS_ID
7         DE      401
36        UK      179
18        IT      110
13        FR      101
35        TR       81
28        PL       73
11        ES       59
10        EL       52
2         BE       44
30        RO       42
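
These per-country counts can serve as a reference when reading the `total number of regions computed` lines printed in the earlier sections. The sketch below shows one possible way to flag countries whose computed region count differs from the NUTS 3 count. `coverage_gaps` is a hypothetical helper; the expected values for EL and BE come from the table above, the computed value for BE is the one logged for OSM, and the computed value for EL is made up for illustration.

```python
def coverage_gaps(expected, computed):
    """Return {country: (computed, expected)} for countries whose counts differ."""
    return {
        country: (computed.get(country, 0), n_expected)
        for country, n_expected in expected.items()
        if computed.get(country, 0) != n_expected
    }


expected = {"EL": 52, "BE": 44}  # NUTS 3 counts from the boundaries table
computed = {"EL": 50, "BE": 44}  # BE as logged for OSM; EL is a made-up value
gaps = coverage_gaps(expected, computed)
print(gaps)
```

An empty result would mean that every listed country has statistics for all of its NUTS 3 regions.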
