Replicate Excel VLOOKUP, HLOOKUP, XLOOKUP in Python (DAY 30!!)
https://pythoninoffice.com/replicate-excel-vlookup-hlookup-xlookup-in-python/

VLOOKUP implementation in Python in three simple steps
https://towardsdatascience.com/vlookup-implementation-in-python-in-three-simple-steps-93b5a290fd72

How to Do a vLookup in Python using pandas
https://www.geeksforgeeks.org/how-to-do-a-vlookup-in-python-using-pandas/

Country codes
https://laendercode.net/en/3-letter-list.html

In [None]:
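import pandas as pd

def xlookup(lookup_value, lookup_array, return_array, if_not_found: str = ''):
    # Keep the entries of return_array whose lookup_array value equals lookup_value
    match_value = return_array.loc[lookup_array == lookup_value]
    if match_value.empty:
        # Mimic XLOOKUP's optional if_not_found argument, with a default message
        return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found
    # Like Excel, return only the first match
    return match_value.tolist()[0]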

Formula complete, now “drag down”
Since we are doing everything in code and there’s no GUI, we can’t just double-click to “drag down” the formula. But “dragging down” is essentially a loop: we need to apply the xlookup function to every row of the table df1. And we should avoid iterating over a dataframe’s rows with an explicit for loop (for example with iterrows()), which is slow and verbose.

apply() method instead of for loop
It turns out that pandas provides a method for exactly this: .apply(). Let’s look at its syntax. Below is a simplified list of arguments; for the full list, check out the official pandas documentation on apply.

dataframe.apply(func, axis=0, args=())

 - func: the function we are applying
 - axis: whether func is applied to each column or to each row. The default, axis=0, passes each column to func; axis=1 passes each row
 - args=(): a tuple of extra positional arguments passed to func after the column or row itself
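To make the axis behaviour concrete, here is a small sketch on a made-up two-column DataFrame. The function scaled_sum and the data are hypothetical, chosen only so the results are easy to check by hand; factor is supplied through args.

```python
import pandas as pd

# Made-up data, used only to show which slices func receives
demo = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def scaled_sum(values, factor):
    # values is a Series: one whole column (axis=0) or one whole row (axis=1);
    # factor arrives through the args tuple
    return values.sum() * factor


# axis=0 (default): called once per column, result indexed by column name
per_column = demo.apply(scaled_sum, axis=0, args=(2,))

# axis=1: called once per row, result indexed by row label
per_row = demo.apply(scaled_sum, axis=1, args=(2,))

print(per_column.tolist())  # [12, 120]
print(per_row.tolist())     # [22, 44, 66]
```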
 
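Here’s how we can apply the xlookup function to an entire column of a dataframe. Because df1['User Name'] is a single column (a Series), this uses Series.apply, which calls xlookup once per value: each user name becomes lookup_value, and the args tuple supplies lookup_array and return_array.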

In [None]:
df1['purchase'] = df1['User Name'].apply(xlookup, args = (df2['Customer'], df2['purchase']))
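Since df1 and df2 are not shown here, the sketch below runs the same apply call on small hypothetical stand-ins (the names and purchase amounts are invented). It repeats the xlookup definition so it runs on its own, and includes one user with no match to show both the default not-found message and the optional if_not_found argument.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2; all values are invented
df1 = pd.DataFrame({"User Name": ["Alice", "Bob", "Carol"]})
df2 = pd.DataFrame({"Customer": ["Bob", "Alice", "Dave"],
                    "purchase": [250, 100, 75]})


def xlookup(lookup_value, lookup_array, return_array, if_not_found: str = ''):
    # Keep the entries of return_array whose lookup_array value matches
    match_value = return_array.loc[lookup_array == lookup_value]
    if match_value.empty:
        return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found
    # Like Excel, return only the first match
    return match_value.tolist()[0]


# Each user name becomes lookup_value; args supplies the remaining positional arguments
df1['purchase'] = df1['User Name'].apply(xlookup, args=(df2['Customer'], df2['purchase']))

# A third item in args fills if_not_found, like XLOOKUP's fourth argument
df1['purchase_or_na'] = df1['User Name'].apply(
    xlookup, args=(df2['Customer'], df2['purchase'], 'n/a'))

print(df1)
```

Each call scans the whole Customer column, so on large tables a vectorized lookup (for example Series.map against a Series indexed by customer) will usually be faster.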

### Three easy steps to implement it in Python
Here I implement the same lookup as in the Excel example above, this time in Python, in three simple steps.
First, we create a dataframe called df_target to hold the subset of data we want. Initially, it contains only the names of the nine countries of interest. We then create empty columns for the indicators we want to return from df: CO2 Emissions (tonnes) and Population.

In [25]:
import pandas as pd
# Start with only the names of the nine countries to look up
df_target = pd.DataFrame({"countries": ["Bhutan", "Germany", "Japan", "Nepal", "Netherlands", "South Africa", "United States", "Russian Federation", "China"]})
df_target

|   | countries |
|---|---|
| 0 | Bhutan |
| 1 | Germany |
| 2 | Japan |
| 3 | Nepal |
| 4 | Netherlands |
| 5 | South Africa |
| 6 | United States |
| 7 | Russian Federation |
| 8 | China |


#### Step 1: Create empty columns for the desired indicators

In [26]:
df_target["CO2 Emissions (tonnes)"] = ""
df_target["Population"] = ""
df_target          

|   | countries | CO2 Emissions (tonnes) | Population |
|---|---|---|---|
| 0 | Bhutan |  |  |
| 1 | Germany |  |  |
| 2 | Japan |  |  |
| 3 | Nepal |  |  |
| 4 | Netherlands |  |  |
| 5 | South Africa |  |  |
| 6 | United States |  |  |
| 7 | Russian Federation |  |  |
| 8 | China |  |  |


#### Step 2: Set the column that df_target shares with df as the index

In [27]:
df_target.set_index("countries", inplace = True)
df_target

| countries | CO2 Emissions (tonnes) | Population |
|---|---|---|
| Bhutan |  |  |
| Germany |  |  |
| Japan |  |  |
| Nepal |  |  |
| Netherlands |  |  |
| South Africa |  |  |
| United States |  |  |
| Russian Federation |  |  |
| China |  |  |


#### Step 3: Mapping
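This is the main step: we map the index of df_target against df to fetch the values for the required columns. For example, the values in the kt CO2 column of df, multiplied by 1000 to convert kilotonnes to tonnes, are returned for the CO2 Emissions (tonnes) column of df_target. The map() function substitutes each value with another value taken from a function, a dictionary or a Series. Here it is called on df_target's index, so each country name is looked up in the index of the Series passed to it (df["kt CO2"] or df["Population"]); this relies on df also being indexed by country name. Finally, dividing emissions by population gives per-capita emissions in the t CO2/capita column.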
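Because df itself is not shown here, this sketch runs the same three mapping lines on a tiny hypothetical df. The country names (Atlantis, Utopia, Narnia) and all numbers are placeholders, not real data; "Narnia" is deliberately left out of df to show that a label with no match comes back as NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical source table indexed by country; names and numbers are placeholders
df = pd.DataFrame({"kt CO2": [10.0, 20.0], "Population": [1000, 4000]},
                  index=["Atlantis", "Utopia"])

# Target table indexed by the shared key; "Narnia" is deliberately absent from df
df_target = pd.DataFrame(index=pd.Index(["Utopia", "Narnia", "Atlantis"], name="countries"))

# Index.map(Series) looks each label up in the Series' index; missing labels give NaN
df_target["CO2 Emissions (tonnes)"] = df_target.index.map(df["kt CO2"]) * 1000
df_target["Population"] = df_target.index.map(df["Population"])
df_target["t CO2/capita"] = df_target["CO2 Emissions (tonnes)"] / df_target["Population"]

print(df_target)
```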

In [29]:
df_target["CO2 Emissions (tonnes)"] = df_target.index.map(df["kt CO2"]) * 1000
df_target["Population"] = df_target.index.map(df["Population"])
df_target["t CO2/capita"] = df_target["CO2 Emissions (tonnes)"] / df_target["Population"]
df_target

| countries | CO2 Emissions (tonnes) | Population | t CO2/capita |
|---|---|---|---|
| Bhutan | 1380000 | 754396 | 1.829278 |
| Germany | 709540000 | 82905782 | 8.55839 |
| Japan | 1106150000 | 126529100 | 8.742258 |
| Nepal | 12030000 | 28095712 | 0.428179 |
| Netherlands | 151170000 | 17231624 | 8.772824 |
| South Africa | 433250000 | 57792520 | 7.496645 |
| United States | 4981300000 | 326838199 | 15.240875 |
| Russian Federation | 1607550000 | 144477859 | 11.126618 |
| China | 10313460000 | 1402760000 | 7.352263 |
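The same lookup can also be expressed with pandas' merge as a left join on the country column. This is an alternative technique, not the one used in the three steps above. A sketch on the same kind of made-up data (placeholder names and numbers), with the key kept as an ordinary column instead of the index:

```python
import pandas as pd

# Hypothetical source table with the key as an ordinary column; values are placeholders
df = pd.DataFrame({"countries": ["Atlantis", "Utopia"],
                   "kt CO2": [10.0, 20.0],
                   "Population": [1000, 4000]})
df_target = pd.DataFrame({"countries": ["Utopia", "Narnia", "Atlantis"]})

# how="left" keeps every df_target row, in order; unmatched keys get NaN
result = df_target.merge(df, on="countries", how="left")
result["CO2 Emissions (tonnes)"] = result["kt CO2"] * 1000
result["t CO2/capita"] = result["CO2 Emissions (tonnes)"] / result["Population"]
result = result.drop(columns="kt CO2").set_index("countries")

print(result)
```

One behavioural difference from VLOOKUP: if a key appeared more than once in df, merge would return one row per match instead of only the first one.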
