# Dem4Cli: check countries included

WCDE provides demographic information for 202 countries/administrative units plus the world average, for the years 1950-2100 and for three SSP scenarios (ssp1, ssp2, ssp3). Of these, 197 are available in the isipedia country mask.

**1. What countries are included/excluded?**
   
These 197 countries/administrative units correspond to 184 UN member/observer countries, plus 13 countries/administrations that are not sovereign UN member countries (disputed territories, special administrative regions and overseas administrative units, e.g., Hong Kong, French Guiana, Taiwan, Puerto Rico, La Réunion...).

**2. How much global population is omitted?**

Data are matched based on these 197 countries/administrative units. This corresponds to ~99.97% of the global population in 2000, i.e. the areas omitted from the analysis amount to only 0.03% of the global population (0.05% in 2020).


In [1]:
import numpy as np
import xarray as xr
import pandas as pd
import geopandas as gpd
import pickle as pk
from scipy import interpolate
import regionmask
import glob, os, re
import openpyxl
import matplotlib.pyplot as plt
import warnings


pd.set_option('display.max_rows', 80)
%matplotlib inline 

from population_demographics import * 

## Part 1. What countries are included/excluded?

### load country metadata

In [3]:
df_metadata = load_country_metadata()
df_metadata

Unnamed: 0,country,country_iso3,country_code,region,income_group
0,Afghanistan,AFG,3,South Asia,Low income
1,Albania,ALB,103,Europe & Central Asia,Upper middle income
2,Algeria,DZA,203,Middle East & North Africa,Upper middle income
3,Andorra,AND,403,Europe & Central Asia,High income
4,Angola,AGO,503,Sub-Saharan Africa,Lower middle income
...,...,...,...,...,...
190,Venezuela (Bolivarian Republic of),VEN,22503,Latin America & Caribbean,
191,Viet Nam,VNM,22603,East Asia & Pacific,Lower middle income
192,Yemen,YEM,23003,Middle East & North Africa,Low income
193,Zambia,ZMB,23103,Sub-Saharan Africa,Lower middle income


In [4]:
df_countries_matched = match_country_names_all_mask_frac()

Unmatched ISIMIP countries (without WCDE data) after all merges:
                   country           country_wb
194                Andorra              Andorra
196               Dominica             Dominica
199                  Palau                Palau
7            Liechtenstein        Liechtenstein
8         Marshall Islands     Marshall Islands
9                   Monaco               Monaco
10                   Nauru                Nauru
12   Saint Kitts and Nevis  St. Kitts and Nevis
13              San Marino           San Marino
14                  Tuvalu               Tuvalu
Unmatched WCDE countries after all merges:
                                   country_wcde
2                                         Aruba
3                               Channel Islands
4                                       Curaçao
5  Macao Special Administrative Region of China
6                                         World
Unmatched ISIMIP mask countries (geojson + frac mask) after all merges:
    

In [5]:
df_countries_matched

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group
0,Afghanistan,Afghanistan,Afghanistan,Afghanistan,AFG,AFG,AFG,3.0,South Asia,Low income
1,Albania,Albania,Albania,Albania,ALB,ALB,ALB,103.0,Europe & Central Asia,Upper middle income
2,Algeria,Algeria,Algeria,Algeria,DZA,DZA,DZA,203.0,Middle East & North Africa,Upper middle income
3,Andorra,Andorra,,Andorra,AND,AND,AND,403.0,Europe & Central Asia,High income
4,Angola,Angola,Angola,Angola,AGO,AGO,AGO,503.0,Sub-Saharan Africa,Lower middle income
...,...,...,...,...,...,...,...,...,...,...
228,,,,,,,MNP,,,
229,,,,,,,PSID,,,
230,,,,,,,SXM,,,
231,,,,,,,TCA,,,


In [79]:
#UN_countries = pd.read_csv('./data-new/ignore/un-countries.csv',sep=',').rename(columns={'country':'country_un'})

# Load UN country list from Wikipedia with sovereignty status (2024 July) 

UN_countries = pd.read_excel('./data-new/ignore/ISO-wikipedia.xlsx').rename(columns={'World Factbook[6]':'country_un','ISO 3166-1-A3':'iso3_un' })[['country_un','Sovereignty','iso3_un']]

In [80]:
UN_countries = UN_countries.dropna(subset=['iso3_un'])
UN_countries

Unnamed: 0,country_un,Sovereignty,iso3_un
0,Islamic Republic of Afghanistan,UN member,AFG
1,Åland,Finland,ALA
2,Republic of Albania,UN member,ALB
3,People's Democratic Republic of Algeria,UN member,DZA
4,Territory of American Samoa,United States,ASM
...,...,...,...
269,Territory of the Wallis and Futuna Islands,France,WLF
270,Sahrawi Arab Democratic Republic,Disputed [ak],ESH
271,Republic of Yemen,UN member,YEM
272,Republic of Zambia,UN member,ZMB


In [81]:
#merge datasets

df_merge = pd.merge(df_countries_matched,UN_countries, left_on='iso3_frac',right_on='iso3_un',how='outer',indicator='merge_iso')
df_merge

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso
0,Afghanistan,Afghanistan,Afghanistan,Afghanistan,AFG,AFG,AFG,3.0,South Asia,Low income,Islamic Republic of Afghanistan,UN member,AFG,both
1,Albania,Albania,Albania,Albania,ALB,ALB,ALB,103.0,Europe & Central Asia,Upper middle income,Republic of Albania,UN member,ALB,both
2,Algeria,Algeria,Algeria,Algeria,DZA,DZA,DZA,203.0,Middle East & North Africa,Upper middle income,People's Democratic Republic of Algeria,UN member,DZA,both
3,Andorra,Andorra,,Andorra,AND,AND,AND,403.0,Europe & Central Asia,High income,Principality of Andorra,UN member,AND,both
4,Angola,Angola,Angola,Angola,AGO,AGO,AGO,503.0,Sub-Saharan Africa,Lower middle income,Republic of Angola,UN member,AGO,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
255,,,,,,,,,,,Collectivity of Saint-Martin,France,MAF,right_only
256,,,,,,,,,,,Republic of San Marino,UN member,SMR,right_only
257,,,,,,,,,,,Tokelau,New Zealand,TKL,right_only
258,,,,,,,,,,,"Baker Island, Howland Island, Jarvis Island, J...",United States,UMI,right_only
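The merge above relies on the `indicator` option of `pd.merge`, which labels every row as `both`, `left_only` or `right_only`. The following is a minimal sketch on toy data (the two small frames and their values are illustrative, not the real tables) showing how these labels arise. Note that pandas matches missing keys against each other, so dropping rows without an ISO code from one side before merging, as done for `UN_countries` above, avoids spurious matches.

```python
import pandas as pd

# Toy stand-ins for the matched-country table and the UN list (illustrative values only)
left = pd.DataFrame({
    "iso3_frac": ["AFG", "ALB", None],
    "country_wcde": ["Afghanistan", "Albania", "World"],
})
right = pd.DataFrame({
    "iso3_un": ["AFG", "ALB", "ALA"],
    "Sovereignty": ["UN member", "UN member", "Finland"],
})

# Outer merge keeps all rows from both sides; the indicator column records their origin
merged = pd.merge(left, right, left_on="iso3_frac", right_on="iso3_un",
                  how="outer", indicator="merge_iso")

counts = merged["merge_iso"].value_counts().to_dict()
print(counts)
```

Rows labelled `left_only` or `right_only` are the candidates for a second matching pass on a different key.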


In [82]:
df_merge[df_merge['merge_iso']=='both']


Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso
0,Afghanistan,Afghanistan,Afghanistan,Afghanistan,AFG,AFG,AFG,3.0,South Asia,Low income,Islamic Republic of Afghanistan,UN member,AFG,both
1,Albania,Albania,Albania,Albania,ALB,ALB,ALB,103.0,Europe & Central Asia,Upper middle income,Republic of Albania,UN member,ALB,both
2,Algeria,Algeria,Algeria,Algeria,DZA,DZA,DZA,203.0,Middle East & North Africa,Upper middle income,People's Democratic Republic of Algeria,UN member,DZA,both
3,Andorra,Andorra,,Andorra,AND,AND,AND,403.0,Europe & Central Asia,High income,Principality of Andorra,UN member,AND,both
4,Angola,Angola,Angola,Angola,AGO,AGO,AGO,503.0,Sub-Saharan Africa,Lower middle income,Republic of Angola,UN member,AGO,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
226,,,,,,,CUW,,,,Country of Curaçao,Netherlands,CUW,both
228,,,,,,,MNP,,,,Commonwealth of the Northern Mariana Islands,United States,MNP,both
230,,,,,,,SXM,,,,Sint Maarten,Netherlands,SXM,both
231,,,,,,,TCA,,,,Turks and Caicos Islands,United Kingdom,TCA,both


In [86]:
unmatched_lx = df_merge[df_merge['merge_iso']=='left_only']
unmatched_lx

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso
100,Liechtenstein,Liechtenstein,,,LIE,,,11603.0,Europe & Central Asia,High income,,,,left_only
101,Monaco,Monaco,,,MCO,,,13503.0,Europe & Central Asia,High income,,,,left_only
102,San Marino,San Marino,,,SMR,,,18003.0,Europe & Central Asia,High income,,,,left_only
103,,,Aruba,,,,,,,,,,,left_only
104,,,Channel Islands,,,,,,,,,,,left_only
105,,,Curaçao,,,,,,,,,,,left_only
106,,,Macao Special Administrative Region of China,,,,,,,,,,,left_only
107,,,World,,,,,,,,,,,left_only
225,,,,,,,CSID,,,,,,,left_only
227,,,,,,,IOSID,,,,,,,left_only


In [87]:
unmatched_rx = df_merge[df_merge['merge_iso']=='right_only']
unmatched_rx

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso
233,,,,,,,,,,,Åland,Finland,ALA,right_only
234,,,,,,,,,,,Anguilla,United Kingdom,AIA,right_only
235,,,,,,,,,,,Antarctica (all land and ice shelves south of ...,Antarctic Treaty,ATA,right_only
236,,,,,,,,,,,Aruba,Netherlands,ABW,right_only
237,,,,,,,,,,,"Bonaire, Sint Eustatius and Saba",Netherlands,BES,right_only
238,,,,,,,,,,,Bouvet Island,Norway,BVT,right_only
239,,,,,,,,,,,British Indian Ocean Territory,United Kingdom,IOT,right_only
240,,,,,,,,,,,Territory of Christmas Island,Australia,CXR,right_only
241,,,,,,,,,,,Territory of Cocos (Keeling) Islands,Australia,CCK,right_only
242,,,,,,,,,,,Cook Islands,New Zealand,COK,right_only


In [92]:
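# Second pass: match the left-only rows to the remaining UN entries on country_iso3.
# Note: iloc[:, 0:9] stops before income_group, so that column is taken from the
# right-only rows (where it is empty) and shows up as NaN in the result below.
second_merge = pd.merge(unmatched_lx.iloc[:,0:9], unmatched_rx.iloc[:,9:], left_on='country_iso3', right_on='iso3_un',how='outer',indicator='merge_iso2')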
second_merge

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso,merge_iso2
0,Liechtenstein,Liechtenstein,,,LIE,,,11603.0,Europe & Central Asia,,Principality of Liechtenstein,UN member,LIE,right_only,both
1,Monaco,Monaco,,,MCO,,,13503.0,Europe & Central Asia,,Principality of Monaco,UN member,MCO,right_only,both
2,San Marino,San Marino,,,SMR,,,18003.0,Europe & Central Asia,,Republic of San Marino,UN member,SMR,right_only,both
3,,,Aruba,,,,,,,,,,,,left_only
4,,,Channel Islands,,,,,,,,,,,,left_only
5,,,Curaçao,,,,,,,,,,,,left_only
6,,,Macao Special Administrative Region of China,,,,,,,,,,,,left_only
7,,,World,,,,,,,,,,,,left_only
8,,,,,,,CSID,,,,,,,,left_only
9,,,,,,,IOSID,,,,,,,,left_only
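A second-pass merge like this one can also select columns by name instead of by position, which makes it explicit which side each column comes from. The sketch below uses toy frames with a reduced set of columns (the lists `cols_left` and `cols_right` are hypothetical names introduced here for illustration); it is one possible variant, not the notebook's own code.

```python
import pandas as pd

# Columns kept from each side (hypothetical, reduced selection for illustration)
cols_left = ["country", "country_iso3", "iso3_frac", "income_group"]
cols_right = ["country_un", "Sovereignty", "iso3_un"]

# Toy left-only rows: the UN columns are empty because the first merge found no match
unmatched_left = pd.DataFrame({
    "country": ["Liechtenstein", None],
    "country_iso3": ["LIE", None],
    "iso3_frac": [None, "CSID"],
    "income_group": ["High income", None],
    "country_un": [None, None],
    "Sovereignty": [None, None],
    "iso3_un": [None, None],
})

# Toy right-only rows: only the UN columns carry information
unmatched_right = pd.DataFrame({
    "country": [None, None],
    "country_iso3": [None, None],
    "iso3_frac": [None, None],
    "income_group": [None, None],
    "country_un": ["Principality of Liechtenstein", "Tokelau"],
    "Sovereignty": ["UN member", "New Zealand"],
    "iso3_un": ["LIE", "TKL"],
})

# Retry the match on the fallback key, keeping each column from the side that holds it
rescued = pd.merge(unmatched_left[cols_left], unmatched_right[cols_right],
                   left_on="country_iso3", right_on="iso3_un",
                   how="outer", indicator="merge_iso2")

lie = rescued[rescued["iso3_un"] == "LIE"].iloc[0]
print(lie["merge_iso2"], lie["income_group"])
```

Selecting by name keeps `income_group` attached to the rows that actually carry it.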


In [112]:
df_combined = pd.concat([df_merge[df_merge['merge_iso']=='both'],second_merge]).reset_index(drop=True)
df_combined

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso,merge_iso2
0,Afghanistan,Afghanistan,Afghanistan,Afghanistan,AFG,AFG,AFG,3.0,South Asia,Low income,Islamic Republic of Afghanistan,UN member,AFG,both,
1,Albania,Albania,Albania,Albania,ALB,ALB,ALB,103.0,Europe & Central Asia,Upper middle income,Republic of Albania,UN member,ALB,both,
2,Algeria,Algeria,Algeria,Algeria,DZA,DZA,DZA,203.0,Middle East & North Africa,Upper middle income,People's Democratic Republic of Algeria,UN member,DZA,both,
3,Andorra,Andorra,,Andorra,AND,AND,AND,403.0,Europe & Central Asia,High income,Principality of Andorra,UN member,AND,both,
4,Angola,Angola,Angola,Angola,AGO,AGO,AGO,503.0,Sub-Saharan Africa,Lower middle income,Republic of Angola,UN member,AGO,both,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
252,,,,,,,,,,,"Saint Helena, Ascension and Tristan da Cunha",United Kingdom,SHN,right_only,right_only
253,,,,,,,,,,,Collectivity of Saint-Martin,France,MAF,right_only,right_only
254,,,,,,,,,,,Tokelau,New Zealand,TKL,right_only,right_only
255,,,,,,,,,,,"Baker Island, Howland Island, Jarvis Island, J...",United States,UMI,right_only,right_only


In [113]:
for col in ['iso3_frac', 'country_iso3', 'iso3_un']:
    value_counts = df_combined[col].value_counts()
    filtered_counts = value_counts[value_counts > 1]
    if not filtered_counts.empty:
        print(f"Column: {col}")
        print(filtered_counts)

# no duplicates 

In [115]:
df_combined['Sovereignty'].value_counts()

Sovereignty
UN member           193
France               12
United Kingdom       12
United States         6
Netherlands           4
Australia             4
British Crown         3
New Zealand           3
Denmark               2
Norway                2
China                 2
UN observer           2
Disputed [ak]         1
Disputed [aa]         1
Finland               1
Antarctic Treaty      1
Name: count, dtype: int64

In [114]:
df_combined[df_combined['Sovereignty']=='UN member']

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso,merge_iso2
0,Afghanistan,Afghanistan,Afghanistan,Afghanistan,AFG,AFG,AFG,3.0,South Asia,Low income,Islamic Republic of Afghanistan,UN member,AFG,both,
1,Albania,Albania,Albania,Albania,ALB,ALB,ALB,103.0,Europe & Central Asia,Upper middle income,Republic of Albania,UN member,ALB,both,
2,Algeria,Algeria,Algeria,Algeria,DZA,DZA,DZA,203.0,Middle East & North Africa,Upper middle income,People's Democratic Republic of Algeria,UN member,DZA,both,
3,Andorra,Andorra,,Andorra,AND,AND,AND,403.0,Europe & Central Asia,High income,Principality of Andorra,UN member,AND,both,
4,Angola,Angola,Angola,Angola,AGO,AGO,AGO,503.0,Sub-Saharan Africa,Lower middle income,Republic of Angola,UN member,AGO,both,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,Zambia,Zambia,Zambia,Zambia,ZMB,ZMB,ZMB,23103.0,Sub-Saharan Africa,Lower middle income,Republic of Zambia,UN member,ZMB,both,
191,Zimbabwe,Zimbabwe,Zimbabwe,Zimbabwe,ZWE,ZWE,ZWE,23203.0,Sub-Saharan Africa,Lower middle income,Republic of Zimbabwe,UN member,ZWE,both,
222,Liechtenstein,Liechtenstein,,,LIE,,,11603.0,Europe & Central Asia,,Principality of Liechtenstein,UN member,LIE,right_only,both
223,Monaco,Monaco,,,MCO,,,13503.0,Europe & Central Asia,,Principality of Monaco,UN member,MCO,right_only,both


In [116]:
df_combined[df_combined['Sovereignty']=='UN observer']

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso,merge_iso2
132,"Palestine, State of",West Bank and Gaza,Occupied Palestinian Territory,,PSE,,PSE,15603.0,Middle East & North Africa,Lower middle income,State of Palestine,UN observer,PSE,both,
245,,,,,,,,,,,The Holy See,UN observer,VAT,right_only,right_only


### dem4cli includes a total of 197 countries: 184 UN members/observers + 13 others


In [120]:

len(df_combined[~pd.isna(df_combined['country_wcde']) & ~pd.isna(df_combined['iso3_frac']) & ((df_combined['Sovereignty']=='UN member')|(df_combined['Sovereignty']=='UN observer'))])

184

In [121]:
len(df_combined[~pd.isna(df_combined['country_wcde']) & ~pd.isna(df_combined['iso3_frac']) & ((df_combined['Sovereignty']!='UN member')&(df_combined['Sovereignty']!='UN observer'))])

13

In [122]:
184+13

197
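The two counts above apply the same availability filter (WCDE data present and a fractional-mask ISO code present) and split it by sovereignty status. A minimal sketch of that logic with named boolean masks, on a toy table whose rows are illustrative only, could look like this:

```python
import pandas as pd

# Toy table with the three columns used in the filter (illustrative rows only)
df = pd.DataFrame({
    "country_wcde": ["Afghanistan", "Hong Kong", None, "Aruba", "Occupied Palestinian Territory"],
    "iso3_frac": ["AFG", "HKG", "AND", None, "PSE"],
    "Sovereignty": ["UN member", "China", "UN member", "Netherlands", "UN observer"],
})

# A unit is usable only if it has both WCDE data and a fractional-mask ISO code
has_data = df["country_wcde"].notna() & df["iso3_frac"].notna()
# UN members and observers are grouped together, as in the counts above
is_un = df["Sovereignty"].isin(["UN member", "UN observer"])

n_un = int((has_data & is_un).sum())
n_other = int((has_data & ~is_un).sum())
print(n_un, n_other, n_un + n_other)
```

Because the two groups are complements within `has_data`, their sum always equals the number of usable units.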

In [123]:
df_combined[~pd.isna(df_combined['country_wcde']) & ~pd.isna(df_combined['iso3_frac']) & ((df_combined['Sovereignty']!='UN member')&(df_combined['Sovereignty']!='UN observer'))]

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso,merge_iso2
74,"Hong Kong, China (SAR)","Hong Kong SAR, China",Hong Kong Special Administrative Region of China,Hong Kong,HKG,HKG,HKG,8903.0,East Asia & Pacific,High income,Hong Kong Special Administrative Region of Chi...,China,HKG,both,
192,,,French Guiana,French Guiana,,GUF,GUF,,,,Guyane,France,GUF,both,
193,,,French Polynesia,French Polynesia,,PYF,PYF,,,,French Polynesia,France,PYF,both,
194,,,Guadeloupe,Guadeloupe,,GLP,GLP,,,,Guadeloupe,France,GLP,both,
195,,,Guam,Guam,,GUM,GUM,,,,Territory of Guam,United States,GUM,both,
196,,,Martinique,Martinique,,MTQ,MTQ,,,,Martinique,France,MTQ,both,
197,,,Mayotte,Mayotte,,MYT,MYT,,,,Department of Mayotte,France,MYT,both,
198,,,New Caledonia,New Caledonia,,NCL,NCL,,,,New Caledonia,France,NCL,both,
199,,,Puerto Rico,Puerto Rico,,PRI,PRI,,,,Commonwealth of Puerto Rico,United States,PRI,both,
200,,,Reunion,Réunion,,REU,REU,,,,Réunion,France,REU,both,


### The ISIMIP isipedia-countries list has 195 countries: it includes Hong Kong and excludes the Vatican

In [124]:
len(df_combined[ ~pd.isna(df_combined['country_iso3']) & ((df_combined['Sovereignty']=='UN member')|(df_combined['Sovereignty']=='UN observer'))])

194

In [125]:
len(df_combined[ ~pd.isna(df_combined['country_iso3'])])

195

In [126]:
df_combined[ ~pd.isna(df_combined['country_iso3']) & (~(df_combined['Sovereignty']=='UN member')& ~(df_combined['Sovereignty']=='UN observer'))]

Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso,merge_iso2
74,"Hong Kong, China (SAR)","Hong Kong SAR, China",Hong Kong Special Administrative Region of China,Hong Kong,HKG,HKG,HKG,8903.0,East Asia & Pacific,High income,Hong Kong Special Administrative Region of Chi...,China,HKG,both,


In [130]:
len(df_combined[(df_combined['Sovereignty']=='UN member')|(df_combined['Sovereignty']=='UN observer')])

195

In [134]:
df_combined[((df_combined['Sovereignty']=='UN member')|(df_combined['Sovereignty']=='UN observer'))&pd.isna(df_combined['country_iso3']) ]



Unnamed: 0,country,country_wb,country_wcde,country_mask,country_iso3,iso3_mask,iso3_frac,country_code,region,income_group,country_un,Sovereignty,iso3_un,merge_iso,merge_iso2
245,,,,,,,,,,,The Holy See,UN observer,VAT,right_only,right_only


## Part 2. How much population is covered?

More than 99.9% of the population is covered by the dem4cli processing.

The missing population is 0.03% in 2000 and 0.05% in 2020.

The missing population lies in areas where the WCDE data and the fractional country masks used do not overlap.
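The missing share is the difference between the total gridded population and the sum over age groups of the gridded demographics, divided by the total population. The sketch below reproduces that arithmetic with plain NumPy on a tiny synthetic grid. The array names, the grid values and the choice of axis 0 for ages are assumptions made for illustration; the notebook itself works with xarray objects and named dimensions.

```python
import numpy as np

# Synthetic total population on a 2x2 (lat, lon) grid, in people per cell
pop_total = np.array([[100.0, 200.0],
                      [300.0, 400.0]])

# Synthetic demographics with shape (ages, lat, lon): two age groups splitting each cell
pop_by_age = np.stack([pop_total * 0.4, pop_total * 0.6])

# Pretend one cell lies outside every matched country, so it receives no demographics
pop_by_age[:, 0, 0] = 0.0

# Population covered by the demographic data: sum over the age axis
covered = pop_by_age.sum(axis=0)

# Missing share: population not covered, relative to the global total
missing_fraction = (pop_total - covered).sum() / pop_total.sum()
print(f"{missing_fraction:.2%}")
```

In this toy case the uncovered cell holds 100 of 1000 people, so the missing share is 10%.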

In [2]:
da_pop_demographics = population_demographics_gridscale_global(startyear=2020,
                                                                    endyear=2021,
                                                                    isimip_round=3, 
                                                                    ssp=3)

opening isimip3 - ssp3
loading country masks
interpolating cohort sizes per country
calculating gridscale demographics



In [8]:
da_population = load_population(isimip_round=3,ssp=3, startyear=2020,endyear=2021,).compute()

opening isimip3 - ssp3


In [9]:
(da_population.sel(time=2020) - da_pop_demographics.sel(time=2020).sum(dim=['ages'])).sum() / da_population.sel(time=2020).sum(dim=['lat','lon'])

# in 2020, 0.05% of the population is missing

In [10]:
da_pop_demographics = population_demographics_gridscale_global(startyear=2000,
                                                                    endyear=2001,
                                                                    isimip_round=3, 
                                                                    ssp=3)

da_population = load_population(isimip_round=3,ssp=3, startyear=2000,endyear=2001,).compute()

opening isimip3 - ssp3
loading country masks
interpolating cohort sizes per country
calculating gridscale demographics



opening isimip3 - ssp3


In [11]:
(da_population.sel(time=2000) - da_pop_demographics.sel(time=2000).sum(dim=['ages'])).sum() / da_population.sel(time=2000).sum(dim=['lat','lon'])

# in 2000, 0.03% of the population is missing