# Demystifying ICP Purchasing Power Parity calculations: the Global Linking procedure

### Author: Giovanni Tonutti

---
Contents
- [Overview](#Overview) 
- [Load required Python libraries](#Libraries)  
- [Load input data](#InputData) 
- [Global Linking factors](#BHPPP)  
- [Above-basic heading PPPs](#aBHPPP)  
- [Aggregation through CAR-method](#CAR)

---

## Overview <a class="anchor" id="Overview"></a>
This notebook provides the accompanying code for the World Bank blog "Demystifying ICP Purchasing Power Parity calculations: the Global Linking procedure". Its purpose is to lay out the calculation steps and showcase the implementation of the main formulas needed to estimate ICP purchasing power parities (PPPs) at the global level. The blog post is publicly available [here](https://blogs.worldbank.org/opendata/demystifying-icp-purchasing-power-parity-calculations-using-python). 

*Note*: Because the target audience may include users unfamiliar with Python or programming in general, we opted to show each calculation step as explicitly as possible, at the cost of less modularized and less computationally efficient code.

## Load required Python libraries <a class="anchor" id="Libraries"></a>
The code requires the following well-known Python libraries: `pandas`, `numpy` and `statsmodels`.

In [1]:
## Load libraries 
import pandas as pd
import numpy as np 
import statsmodels.api as sm

## Load input data <a class="anchor" id="InputData"></a>

We start by loading the input datasets containing mock average price data, regional basic headings PPPs and other relevant country-level information. 

In [2]:
#Load price data
data="price_data.csv"
prices=pd.read_csv(data) 
prices # Show full dataset 

Unnamed: 0,country,bh,item,price,imp,region,ppp_reg
0,country1,garment,garment1,4500.380000,3.0,A,9.7435
1,country1,garment,garment2,11583.390000,3.0,A,9.7435
2,country1,garment,garment3,7000.940000,1.0,A,9.7435
3,country1,pork,pork1,2500.710000,1.0,A,13.8749
4,country1,pork,pork2,3561.450000,1.0,A,13.8749
...,...,...,...,...,...,...,...
72,country11,pork,pork1,7.235374,3.0,C,1.0000
73,country11,pork,pork2,8.947315,3.0,C,1.0000
74,country11,garment,garment1,307.252181,3.0,C,1.0000
75,country11,garment,garment2,65.921715,3.0,C,1.0000


In [3]:
#Load ppp data
datappp="ppp_reg.csv"
ppp_reg=pd.read_csv(datappp) 
ppp_reg # Show full dataset 

Unnamed: 0,country,bh,region,ppp_reg
0,country1,garment,A,9.7435
1,country1,pork,A,13.8749
2,country1,rice,A,14.0847
3,country1,total,A,12.5684
4,country2,garment,A,1.0
5,country2,pork,A,1.0
6,country2,rice,A,1.0
7,country2,total,A,1.0
8,country3,garment,A,20.3606
9,country3,pork,A,18.9851


This mock dataset contains 11 countries ('country'), each belonging to one of three regions ('region'), and three basic headings ('bh'): garment, rice, and pork. ‘Basic headings’ in the ICP literature refer to detailed expenditure categories containing similar item varieties; for example, the ‘Rice’ basic heading contains several rice varieties. The basic heading is also the lowest level of aggregation at which PPPs are calculated. The item varieties in each basic heading are listed in the ‘item’ column; for example, ‘garment’ contains three item varieties, identified as ‘garment1’, ‘garment2’, and ‘garment3’. Finally, each item has an average price in the country's local currency unit ('price') and an importance indicator ('imp') reflecting the item's relative importance in the country's consumption within its basic heading. The 'ppp_reg' column holds the country's regional PPP for the basic heading. Following the guidelines provided by the [ICP Technical Advisory Group](https://www.worldbank.org/en/programs/icp#3), countries assign a weight of '3' to items identified as 'important' within a given basic heading and a weight of '1' to items deemed unimportant.

It should be highlighted that in practice the full [ICP classification](http://pubdocs.worldbank.org/en/708531575560035925/pdf/ICP-Classification-description-2019-1205.pdf) consists of 155 basic headings with the number of items within each varying from one basic heading to another. Also, not all countries are able to report prices for all items. These two realities are reflected in the example: some basic headings contain more items than others, and prices for some items are missing in some countries.


## Global linking factors <a class="anchor" id="BHPPP"></a>

Regions represent the first building block in the process of cross-country comparisons within the ICP framework. The Regional Implementing Agencies are responsible for collecting price data and national accounts expenditures from countries and for providing regional PPP estimates. Each region designates one of its countries as numeraire, and regional PPPs are calculated in relation to this regional numeraire.


Regional PPPs are then linked to the global numeraire via so-called ‘linking factors’. These are scalars estimated for each region via a regression method known as the ‘weighted region product dummy’ (RPD-W).
The RPD-W method is carried out within each basic heading by regressing the logarithm of the observed country item prices, converted into a common regional numeraire using the country's regional basic heading PPPs, on item dummies (one for each item) and region dummies (one for each region other than the region of the global numeraire). The RPD-W method also incorporates the country-reported item-level importance indicators, with the idea of ‘down-weighting’ less representative items during the calculation.
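To make the regression concrete, here is a minimal sketch on made-up numbers. The prices, labels and variable names are illustrative assumptions, not ICP data, and the sketch uses plain `numpy` weighted least squares instead of `statsmodels`. The prices are constructed so that region A is exactly twice as expensive as reference region C, so the exponentiated region coefficient should recover a linking factor of 2.

```python
import numpy as np

# Hypothetical prices, already converted to each region's numeraire,
# for two items observed in reference region C and in region A.
items = ["item1", "item2", "item1", "item2"]
regions = ["C", "C", "A", "A"]
price = np.array([10.0, 40.0, 20.0, 80.0])  # region A is 2x region C by construction
imp = np.array([3.0, 1.0, 3.0, 1.0])        # importance weights (3 or 1)

# Design matrix: one dummy per item, plus one dummy for non-reference region A
X = np.array([[it == "item1", it == "item2", rg == "A"]
              for it, rg in zip(items, regions)], dtype=float)
y = np.log(price)

# Weighted least squares: scale rows by sqrt(weight), then solve by ordinary least squares
sw = np.sqrt(imp)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# The exponentiated region coefficient is the linking factor of region A
linking_factor_A = np.exp(beta[2])
print(round(linking_factor_A, 4))
```

Because the toy prices fit the model exactly, the importance weights do not change the estimate here; with real, noisy prices they shift the fit towards the items flagged as important.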

###  Select the base or numeraire currency 

The first step is to identify a global numeraire country. In our example, Country 11 acts as the global numeraire and region C as the reference region.

In [4]:
## Select the base or numeraire currency
numeraire = 'C' 
numeraire_c = 'country11' 

###  Prep the inputs to run the RPD-W

In [5]:
## Prep
## Drop country-item observations without a price
prices = prices[prices['price'].notnull()]

## Prepare design matrix for RPD-W
d_region=pd.get_dummies(prices['region'], dtype=int) #0/1 region dummies
d_region.drop(numeraire, axis=1, inplace=True) #drop reference region
d_region = d_region.add_prefix('r_') #add prefix to regions
d_item=pd.get_dummies(prices['item'],drop_first=False, dtype=int) #include all item dummies
d_item = d_item.add_prefix('i_') #add prefix to items
prices=pd.concat([prices,d_region,d_item],axis=1) # Concatenate the new cols

## Create empty arrays to store results
l_coef= [] # to store exp(beta_hats)
l_bh= [] # to store bh labels

prices

Unnamed: 0,country,bh,item,price,imp,region,ppp_reg,r_A,r_B,i_garment1,i_garment2,i_garment3,i_pork1,i_pork2,i_rice1,i_rice2
0,country1,garment,garment1,4500.380000,3.0,A,9.7435,1,0,1,0,0,0,0,0,0
1,country1,garment,garment2,11583.390000,3.0,A,9.7435,1,0,0,1,0,0,0,0,0
2,country1,garment,garment3,7000.940000,1.0,A,9.7435,1,0,0,0,1,0,0,0,0
3,country1,pork,pork1,2500.710000,1.0,A,13.8749,1,0,0,0,0,1,0,0,0
4,country1,pork,pork2,3561.450000,1.0,A,13.8749,1,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,country11,pork,pork1,7.235374,3.0,C,1.0000,0,0,0,0,0,1,0,0,0
73,country11,pork,pork2,8.947315,3.0,C,1.0000,0,0,0,0,0,0,1,0,0
74,country11,garment,garment1,307.252181,3.0,C,1.0000,0,0,1,0,0,0,0,0,0
75,country11,garment,garment2,65.921715,3.0,C,1.0000,0,0,0,1,0,0,0,0,0


###  Run the RPD-W on each basic heading and store results

In [6]:
for bh in prices.bh.unique():
    tempdf=prices[prices.bh == bh] 
    X=tempdf.loc[:, [x for x in tempdf.columns if x.startswith(('r_', 'i_'))]]
    y = np.log(tempdf['price']/tempdf['ppp_reg']) 
    wts=tempdf['imp']

    wts_cpd=sm.WLS(y, X,weights=wts)
    res=wts_cpd.fit()
    res_eparams=np.exp(res.params)
    
    print("\n","Basic Heading:", bh, "\n")
    print('Exponentiated Parameters:',"\n",
          res_eparams)
    
    l_coef.append(res_eparams)
    l_bh.append(bh)

coef = np.array(l_coef, dtype=float)
coef = np.round(coef,4) # round to 4 decimals
cols = list(X) # store column heads of X as a list
coef[coef == 1] = np.nan # replace coefficients that were exp(0)=1 with np.nan


 Basic Heading: garment 

Exponentiated Parameters: 
 r_A             7.371785
r_B             1.389717
i_garment1    160.202106
i_garment2     89.885375
i_garment3     64.471554
i_pork1         1.000000
i_pork2         1.000000
i_rice1         1.000000
i_rice2         1.000000
dtype: float64

 Basic Heading: pork 

Exponentiated Parameters: 
 r_A           24.600424
r_B            9.627597
i_garment1     1.000000
i_garment2     1.000000
i_garment3     1.000000
i_pork1        7.497127
i_pork2        9.864433
i_rice1        1.000000
i_rice2        1.000000
dtype: float64

 Basic Heading: rice 

Exponentiated Parameters: 
 r_A           37.316681
r_B            6.339249
i_garment1     1.000000
i_garment2     1.000000
i_garment3     1.000000
i_pork1        1.000000
i_pork2        1.000000
i_rice1        1.970095
i_rice2        1.870425
dtype: float64


The results above show the exponentiated coefficients from the RPD-W method for each of the three basic headings. Of particular interest are the coefficients on the region dummies (denoted by the prefix 'r_'): each estimated coefficient is the natural log of a region's linking factor for the basic heading in question, so the exponentiated values printed above are the linking factors themselves. Item dummies that do not belong to the basic heading being estimated receive a coefficient of zero and therefore appear as 1.000000.

###  Gather and display the estimated LFs 

In [7]:
#Create dataframe of PPP results from numpy arrays
#dimension = "# BHs" x "# coef"
df_bhppp=pd.DataFrame(data = coef, index = l_bh, columns = cols)
r_numeraire=f"r_{numeraire}"
df_bhppp.insert(0, r_numeraire, 1.000) #insert column of 1s for numeraire

df_bhppp=df_bhppp.loc[:, [x for x in df_bhppp.columns if x.startswith(('r_'))]] #subsetting to keep only the region-level linking factors
df_bhppp.columns = df_bhppp.columns.str.replace('^r_', '', regex=True) #strip the 'r_' prefix (pattern is a regular expression)

df_bhppp['bh'] = df_bhppp.index

df_bhppp=df_bhppp.melt(id_vars="bh",var_name="region", value_name="lf")
df_bhppp





Unnamed: 0,bh,region,lf
0,garment,C,1.0
1,pork,C,1.0
2,rice,C,1.0
3,garment,A,7.3718
4,pork,A,24.6004
5,rice,A,37.3167
6,garment,B,1.3897
7,pork,B,9.6276
8,rice,B,6.3392


## Above-basic heading PPPs <a class="anchor" id="aBHPPP"></a>

Next, regional PPPs are linked across regions through the estimated linking factors at the basic heading level and successively aggregated using national accounts expenditure converted in regional PPPs for each country as weights.

The aggregation method involves constructing bilateral PPPs for each pair of countries, using basic heading-level national accounts expenditure values as weights from each country in turn. First, a Laspeyres-type bilateral PPP is calculated between each pair of countries and then a Paasche-type bilateral PPP. The geometric mean of the Laspeyres- and Paasche-type bilateral PPPs gives us the Fisher-type bilateral PPP between each pair of countries in the dataset. 
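As an illustration of these three index types, the following sketch works through a made-up case with two countries and two basic headings. All numbers and names are hypothetical; the indexing convention (row country priced relative to column country, weighted by the column country's expenditures) is assumed to mirror the loops used in this notebook.

```python
import numpy as np

# Hypothetical linked basic heading PPPs (rows: 2 basic headings, cols: countries X, Y)
ppp = np.array([[1.0, 2.0],
                [1.0, 8.0]])
# Hypothetical expenditures in local currency units, same layout
expenditure = np.array([[60.0, 100.0],
                        [40.0, 300.0]])

n = ppp.shape[1]
lp = np.zeros((n, n))
for row in range(n):
    for col in range(n):
        # price relatives of 'row' against 'col', weighted by the expenditures of 'col'
        lp[row, col] = np.average(ppp[:, row] / ppp[:, col],
                                  weights=expenditure[:, col])

pa = 1.0 / lp.T        # Paasche-type: reciprocal of the transposed Laspeyres-type matrix
fi = np.sqrt(lp * pa)  # Fisher-type: geometric mean of the two

print(np.round(fi, 4))
```

In this toy case the Fisher-type PPP of Y relative to X lies between the Laspeyres-type and Paasche-type values, and multiplying it by the PPP of X relative to Y gives 1.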

### Linking regional basic heading PPPs across regions

In [8]:
#Merging the PPPreg data with the estimated Linking Factors from the RPD-W procedure
ppp_regLF=pd.merge(ppp_reg, df_bhppp, how='inner', on=('bh', 'region'))
ppp_regLF['ppp_linked']=ppp_regLF['ppp_reg']*ppp_regLF['lf']
ppp_regLF = ppp_regLF.drop(['ppp_reg'], axis=1)
ppp_regLF=ppp_regLF.pivot(index="bh",
              columns="country",
              values="ppp_linked").reset_index()


ppp_regLF.set_index(ppp_regLF['bh'], drop=True, append=False, inplace=True)
ppp_regLF = ppp_regLF.drop(['bh'], axis=1)
#Column sorting function
def sorting(first_col, df):
    columns = df.columns.tolist()
    columns.remove(first_col)
    columns.insert(0,first_col)
    return df.reindex(columns, axis=1)

#Sort cols with numeraire as col1
ppp_regLF=sorting(numeraire_c,ppp_regLF)


###  Load and display basic heading expenditure values
As a second step in the aggregation process, we load the basic heading level national accounts expenditure values in local currency unit for each country.

In [9]:
#Load basic heading expenditure values
#Should contain bh and countries with prefix c
code="bhdata_exp.csv"
df_bh=pd.read_csv(code,index_col="icp_bh")
df_bh

icp_bh,c_country11,c_country1,c_country10,c_country2,c_country3,c_country4,c_country5,c_country6,c_country7,c_country8,c_country9
bhppp_rice,1274367000.0,12055360000.0,668376900.0,2120000000000.0,19456580000.0,6900940414,397692000000.0,2262702119,2026660000000.0,312671000000.0,8132679639
bhppp_pork,33605210000.0,81607990000.0,2201039000.0,19714540000.0,27687870000.0,7189988899,804271000000.0,10979789273,19417580000.0,108973000000.0,9742060884
bhppp_garment,292686000000.0,710876100.0,25905630000.0,1270000000000.0,100000000000.0,1446676800,1309840000000.0,56690242002,1620590000000.0,67778000000.0,26480272339


In [10]:
#sort rows alphabetically 
df_bh=df_bh.sort_values('icp_bh')

print("\n","Basic Heading Expenditure Values in Local Currency Units","\n")
print(df_bh, "\n")


 Basic Heading Expenditure Values in Local Currency Units 

                c_country11    c_country1   c_country10    c_country2  \
icp_bh                                                                  
bhppp_garment  2.926860e+11  7.108761e+08  2.590563e+10  1.270000e+12   
bhppp_pork     3.360521e+10  8.160799e+10  2.201039e+09  1.971454e+10   
bhppp_rice     1.274367e+09  1.205536e+10  6.683769e+08  2.120000e+12   

                 c_country3  c_country4    c_country5   c_country6  \
icp_bh                                                               
bhppp_garment  1.000000e+11  1446676800  1.309840e+12  56690242002   
bhppp_pork     2.768787e+10  7189988899  8.042710e+11  10979789273   
bhppp_rice     1.945658e+10  6900940414  3.976920e+11   2262702119   

                 c_country7    c_country8   c_country9  
icp_bh                                                  
bhppp_garment  1.620590e+12  6.777800e+10  26480272339  
bhppp_pork     1.941758e+10  1.089730e+11   9742060

###  Check the basic heading PPP and basic heading expenditure matrices
Before proceeding, it is important to check that both the basic heading PPP and basic heading expenditure matrices have the same dimensions. It is also important to check that the matrix of basic heading PPPs is complete. If the dimensions of the two matrices do not match or the basic heading PPP matrix is incomplete then aggregation at higher aggregate levels is not possible using the formulas employed by the ICP. 
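These checks can also be made programmatically. The sketch below uses two small stand-in data frames (`ppp_demo` and `exp_demo`, both hypothetical, with invented values) and tests for matching shape, matching country order and completeness. Country order is included on the assumption that the later loops address columns by position.

```python
import pandas as pd

# Hypothetical stand-ins for the linked PPP matrix and the expenditure matrix
ppp_demo = pd.DataFrame({"country11": [1.0, 1.0], "country1": [71.8, 341.3]},
                        index=["garment", "pork"])
exp_demo = pd.DataFrame({"country11": [2.9e11, 3.4e10], "country1": [7.1e8, 8.2e10]},
                        index=["garment", "pork"])

same_shape = ppp_demo.shape == exp_demo.shape                     # same dimensions
same_countries = list(ppp_demo.columns) == list(exp_demo.columns) # same column order
ppp_complete = not ppp_demo.isna().any().any()                    # no missing PPPs

print(same_shape, same_countries, ppp_complete)
```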

In [11]:
df_bh.columns = df_bh.columns.str.replace('^c_', '', regex=True) #strip the 'c_' prefix (pattern is a regular expression)

print("Dimensions of Matrices (No. of headings x No. of countries):","\n")
print("BH Purchasing Power Parities (PPPs)  = ",ppp_regLF.shape)
print("BH Nominal Expenditures in LCUs      = ", df_bh.shape)

Dimensions of Matrices (No. of headings x No. of countries): 

BH Purchasing Power Parities (PPPs)  =  (3, 11)
BH Nominal Expenditures in LCUs      =  (3, 11)




###  Calculate bilateral PPPs (Laspeyres-, Paasche-, and Fisher-type)

Calculate the Laspeyres-type bilateral PPPs

In [12]:
#Calculate Laspeyres bilateral PPPs 
shape = (len(df_bh.columns),len(df_bh.columns))
lp = np.zeros(shape) #square matrix: country x country
nrow= len(lp)  # gets the number of rows
ncol = len(lp[0]) #get the number of cols

for row in range(nrow):
    for col in range(ncol):
        #weighted means by looping over df rows
        lp[row][col]= np.average((ppp_regLF.iloc[:,row]/ppp_regLF.iloc[:,col]),weights=df_bh.iloc[:,col])

lp_ppp = lp
lp_ppp=pd.DataFrame(data = lp_ppp, index = df_bh.columns, columns = df_bh.columns)
lp_ppp = round(lp_ppp,3)

Square ('country x country') matrix of Laspeyres-type (bilateral) PPPs

In [13]:
print("\n", "Laspeyres-type bilateral PPPs:","\n")
print(lp_ppp, "\n")


 Laspeyres-type bilateral PPPs: 

           country11  country1  country10  country2  country3  country4  \
country11      1.000     0.003      0.801     0.067     0.005     0.516   
country1     101.241     1.000     75.117    12.467     0.640   172.687   
country10      1.300     0.006      1.000     0.084     0.007     0.833   
country2       9.256     0.072      7.045     1.000     0.056    12.639   
country3     183.552     1.294    135.813    14.229     1.000   185.295   
country4       0.865     0.006      0.647     0.078     0.005     1.000   
country5       2.227     0.014      1.707     0.192     0.013     2.221   
country6       2.254     0.026      1.495     0.178     0.012     3.283   
country7       6.614     0.070      4.726     0.719     0.040    10.801   
country8       4.001     0.042      2.824     0.397     0.023     6.137   
country9     101.034     0.250     81.388     6.603     0.528    46.189   

           country5  country6  country7  country8  country9  
co

Derive the Paasche-type bilateral PPPs by taking the reciprocal of the transpose of the Laspeyres-type bilateral PPP matrix.

In [14]:
#Calculate Paasche bilateral PPPs 
pa = np.transpose(np.reciprocal(lp))
pa_ppp=pd.DataFrame(data = pa, index = df_bh.columns, columns = df_bh.columns)
pa_ppp = round(pa_ppp,3)

Square ('country x country') matrix of Paasche-type (bilateral) PPPs

In [15]:
print("\n", "Paasche-type bilateral PPPs:","\n")
print(pa_ppp, "\n")


 Paasche-type bilateral PPPs: 

           country11  country1  country10  country2  country3  country4  \
country11      1.000     0.010      0.769     0.108     0.005     1.156   
country1     347.062     1.000    171.262    13.857     0.773   156.306   
country10      1.249     0.013      1.000     0.142     0.007     1.547   
country2      14.832     0.080     11.966     1.000     0.070    12.889   
country3     189.831     1.563    147.923    17.899     1.000   203.452   
country4       1.939     0.006      1.200     0.079     0.005     1.000   
country5       2.727     0.017      1.968     0.212     0.012     2.423   
country6       1.654     0.020      1.344     0.204     0.010     2.189   
country7       8.465     0.058      6.900     0.694     0.045     8.548   
country8       9.289     0.033      6.719     0.436     0.032     5.762   
country9      94.780     0.410     65.200     5.334     0.342    65.326   

           country5  country6  country7  country8  country9  
coun

Derive the Fisher-type bilateral PPPs by taking the geometric mean of the Laspeyres-type 
and Paasche-type bilateral PPPs for the aggregate

In [16]:
#Create geomean function
def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    inverse_valids = 1. / np.sum(~np.isnan(arr), axis=axis)  # could be a problem for all-nan-axis
    rhs = inverse_valids * np.nansum(np.log(arr), axis=axis)
    return np.exp(rhs)

#Calculate Fisher bilateral PPPs 
fi = np.zeros(shape)
nrow=len(fi)
ncol=len(fi[0])

for row in range(nrow):
    for col in range(ncol):
        fi[row][col]= nangmean([lp[row][col],pa[row][col]])

fi_ppp=pd.DataFrame(data = fi, index = df_bh.columns, columns = df_bh.columns)
fi_ppp = round(fi_ppp,3)

Square ('country x country') matrix of Fisher-type (bilateral) PPPs

In [17]:
print("Fisher-type bilateral PPPs:","\n")
print(fi_ppp, "\n")

Fisher-type bilateral PPPs: 

           country11  country1  country10  country2  country3  country4  \
country11      1.000     0.005      0.785     0.085     0.005     0.772   
country1     187.448     1.000    113.422    13.143     0.703   164.292   
country10      1.274     0.009      1.000     0.109     0.007     1.135   
country2      11.717     0.076      9.181     1.000     0.063    12.763   
country3     186.665     1.422    141.739    15.959     1.000   194.162   
country4       1.295     0.006      0.881     0.078     0.005     1.000   
country5       2.464     0.015      1.833     0.202     0.012     2.320   
country6       1.931     0.023      1.418     0.191     0.011     2.681   
country7       7.482     0.064      5.711     0.707     0.042     9.609   
country8       6.096     0.037      4.356     0.416     0.028     5.947   
country9      97.857     0.320     72.846     5.935     0.425    54.930   

           country5  country6  country7  country8  country9  
country

###  Calculate GEKS PPPs

As a next step, the Gini-Éltető-Köves-Szulc (GEKS) method is applied to the matrix of Fisher-type bilateral PPPs. GEKS PPPs are calculated for each country relative to the numeraire or base country. To this end, the first step is to divide each country row of the Fisher-type bilateral PPP matrix, element by element, by the row of the numeraire country. Each resulting row then contains two direct PPPs (each country to itself and directly to the numeraire country) and n−2 indirect PPPs (each country to the numeraire country via each of the other third countries), where n equals the total number of countries in the dataset. Finally, the GEKS PPP for each country relative to the numeraire is given by the geometric mean of the direct and indirect PPPs in each respective country row.

GEKS PPPs are considered 'multilateral' because the GEKS procedure uses both direct and indirect PPPs and thus takes into account the relative prices between all the countries as a group. The GEKS method is needed to make the Fisher-type bilateral PPPs transitive and base country-invariant. Transitivity means that the PPP between any two countries should be the same whether it is computed directly or indirectly through a third country. Base country-invariant means that the PPPs between any two countries should be the same regardless of the choice of base or numeraire country.
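The transitivity property can be checked on a small made-up example. The 3x3 Fisher-type matrix below is hypothetical and deliberately non-transitive (going from country 0 to country 2 via country 1 gives 4, while the direct PPP is 5). The sketch applies the same geometric-mean step as the notebook and confirms that the direct and indirect GEKS PPPs coincide.

```python
import numpy as np

# Hypothetical Fisher-type matrix for 3 countries;
# entry [j, k] is the PPP of country j with country k as base.
fi = np.array([[1.0, 0.5, 0.2],
               [2.0, 1.0, 0.5],
               [5.0, 2.0, 1.0]])

n = fi.shape[0]
geks = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        # geometric mean, over all bridge countries, of fi[j, l] / fi[k, l]
        geks[j, k] = np.exp(np.mean(np.log(fi[j] / fi[k])))

direct = geks[2, 0]                 # country 2 relative to country 0
indirect = geks[2, 1] * geks[1, 0]  # same comparison via country 1

print(round(direct, 4), round(indirect, 4))
```

The GEKS value for country 2 relative to country 0 falls between the direct Fisher-type PPP (5) and the indirect one via country 1 (4), which is the compromise the method is designed to produce.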

In [18]:
#Calculate GEKS multilateral ppps 
##requires the earlier nangmean function 
geks = np.zeros(shape)  # zero 'country x country' matrix
nrow=len(geks)  # gets the number of rows
ncol=len(geks[0])

for row in range(nrow):
    for col in range(ncol):
        geks[row][col]= nangmean(fi[row]/fi[col])     

geks_vec = np.zeros(shape=(1,len(df_bh.columns))) # as we need a vector of ppps, not a matrix
j=len(geks_vec[0])
for col in range(j):#..one PPP per country, or col of bhexp df
    geks_vec[:,col]=nangmean(geks[col,0]/geks[0,0]) #GEKS PPP of each country relative to the numeraire (col1), rebased so the numeraire equals 1

geks_ppp = np.array(geks_vec)

In [19]:
geks_ppp = pd.DataFrame(geks_ppp)
geks_ppp.columns = df_bh.columns
geks_ppp = round(geks_ppp,3)

print("\n","GEKS Multilateral PPPs:","\n")
print(geks_ppp.to_string(index=False), "\n")


 GEKS Multilateral PPPs: 

 country11  country1  country10  country2  country3  country4  country5  country6  country7  country8  country9
       1.0   160.003      1.362    12.711   203.339     1.068     2.483     2.463     8.911     5.759    76.097 



## Aggregation through CAR-method <a class="anchor" id="CAR"></a>

The final step in the process of global PPPs estimation consists in the Country Aggregation with Redistribution (CAR) procedure. This step is undertaken to guarantee the principle of fixity. Fixity implies that the relative volumes in the global comparisons between any pair of countries belonging to a given region should be identical to the relative volumes of the two countries established in the regional comparisons to which they belong. 

In order to adhere to this principle, regional volume totals in the global comparison are obtained by summing the GEKS-adjusted volumes for individual countries in each region. These regional totals are then redistributed across countries according to their volume shares in the regional comparison. Finally, PPPs in the world numeraire for each country are derived indirectly by dividing each country's nominal expenditure by its redistributed volume.
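A minimal numerical sketch of the redistribution, using one hypothetical region with two countries (all values invented for illustration), shows how fixity is preserved: the redistributed volumes keep the ratio found in the regional comparison while still summing to the regional total from the global comparison.

```python
import numpy as np

# Hypothetical region with two countries
total_exp = np.array([1000.0, 600.0])  # nominal expenditure, local currency units
ppp_reg = np.array([1.0, 3.0])         # aggregate regional PPPs
geks = np.array([2.0, 5.0])            # global GEKS PPPs

# Volume shares from the regional comparison
vol_reg = total_exp / ppp_reg
volshare = vol_reg / vol_reg.sum()

# Regional volume total from the global comparison
vol_geks_total = (total_exp / geks).sum()

# Redistribute the regional total using the regional shares
vol_adj = vol_geks_total * volshare

# Indirect global PPPs: nominal expenditure over redistributed volume
ppp_global = total_exp / vol_adj

print(np.round(vol_adj, 3), np.round(ppp_global, 4))
```

The ratio between the two resulting PPPs equals the ratio between the regional PPPs, so within-region relativities are unchanged; only the overall level of the region is set by the global comparison.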

In [20]:
#reshaping the global GEKS data frame 
geks_ppp=geks_ppp.melt(var_name="country", value_name="geks")
geks_ppp

Unnamed: 0,country,geks
0,country11,1.0
1,country1,160.003
2,country10,1.362
3,country2,12.711
4,country3,203.339
5,country4,1.068
6,country5,2.483
7,country6,2.463
8,country7,8.911
9,country8,5.759


In [21]:
#Preparing a dataframe with aggregate regional PPPs and total expenditures 

volshare_df = ppp_reg[ppp_reg['bh'] == 'total']

df_bhtotal=df_bh.sum().to_frame().T 
df_bhtotal=df_bhtotal.melt(var_name="country", value_name="total_exp")

volshare_df=pd.merge(volshare_df, df_bhtotal, how='inner', on='country')
volshare_df['exp_ppp']=volshare_df['total_exp']/volshare_df['ppp_reg']
volshare_df['reg_total']= volshare_df['exp_ppp'].groupby(volshare_df['region']).transform('sum')
volshare_df['volshare']=volshare_df['exp_ppp']/volshare_df['reg_total']
volshare_df

Unnamed: 0,country,bh,region,ppp_reg,total_exp,exp_ppp,reg_total,volshare
0,country1,total,A,12.5684,94374220000.0,7508849000.0,3622872000000.0,0.002073
1,country2,total,A,1.0,3409715000000.0,3409715000000.0,3622872000000.0,0.941164
2,country3,total,A,16.4047,147144400000.0,8969652000.0,3622872000000.0,0.002476
3,country4,total,A,0.079,15537610000.0,196678600000.0,3622872000000.0,0.054288
4,country5,total,B,0.671468,2511803000000.0,3740763000000.0,5202107000000.0,0.719086
5,country6,total,B,1.0,69932730000.0,69932730000.0,5202107000000.0,0.013443
6,country7,total,B,3.147503,3666668000000.0,1164945000000.0,5202107000000.0,0.223937
7,country8,total,B,2.161128,489422000000.0,226466000000.0,5202107000000.0,0.043534
8,country9,total,C,96.160274,44355010000.0,461261300.0,342803700000.0,0.001346
9,country10,total,C,1.947303,28775050000.0,14776870000.0,342803700000.0,0.043106


In [22]:
#Merging the expenditure and global geks dataframe
car_df=pd.merge(volshare_df, geks_ppp, how='inner', on=('country'))
#Converting the total exp using the global geks
car_df['exp_gek']=car_df['total_exp']/car_df['geks']
#Calculating the total regional expenditure in geks adjusted units
car_df['exp_gek_reg']=car_df['exp_gek'].groupby(car_df['region']).transform('sum')
#Applying the regional volume share to the total expenditure and re-basing it on the numeraire PPP
car_df['exp_adj']=car_df['exp_gek_reg']*car_df['volshare']
car_df['PPPglobal']=car_df['total_exp']/car_df['exp_adj']
#Divide by the numeraire country's own PPP (about 0.981471 here) so that it equals 1
ppp_numeraire=car_df.loc[car_df['country']==numeraire_c,'PPPglobal'].iloc[0]
car_df['PPPglobal_num']=car_df['PPPglobal']/ppp_numeraire
car_df.set_index(car_df['country'], drop=True, append=False, inplace=True)

print("\n","Global linked PPPs:","\n")
print(car_df.PPPglobal_num, "\n")


 Global linked PPPs: 

country
country1     163.292996
country2      12.992346
country3     213.135531
country4       1.026395
country5       2.316368
country6       3.449708
country7      10.857965
country8       7.455260
country9      96.160254
country10      1.947303
country11      1.000000
Name: PPPglobal_num, dtype: float64 



In the above example we showcased the main steps to calculate PPPs.  Information about the overall ICP methodology is provided on the [ICP website](https://www.worldbank.org/en/programs/icp/brief/methodology-calculation). 