# Fathom Consulting Trade Analysis Task

#### Excel Sheet:
- The input data is export values (USD thousands) by country and by year across a range of sectors.
    - Note: For the purposes of this task "world" values are assumed to be the sum of all countries listed in the data.
- The output data is the Revealed Comparative Advantage (RCA) of a country's exports in each sector and year.
    - The RCA metric is a measure of a country’s relative specialism of a given export.

#### Task:
- Replicate this methodology in one of the following coding languages: Python, R, SQL (I decided to use Python).
    - The script should read in the input data in the same structure as provided and write out a csv with the output.
- Analyse the data (both input and output) and pick out two insights that you feel are interesting.
    - This could involve further calculations if you think necessary. (Please add these calculations to your script if this is the case)
    - These insights should be presented in a PowerPoint file, with one slide per insight, using visualizations to support each point.

## Exploratory Data Analysis for the Input

Exploring the input data with pandas to better understand it.

### Basic EDA

In [336]:
# imports

import pandas as pd

In [337]:
# read in the data

data = pd.read_excel("Trade analysis task.xlsx", sheet_name='Input - Trade values')

In [338]:
# check the data

data.head()

Unnamed: 0,Country,period,Arms and Ammunition,Building materials,Charges for the use of intellectual property n.i.e,Chemicals and minerals,Clothing and accessories,Construction,Financial services,Food and drink,...,Metals,Other,Other business services,"Personal, cultural, and recreational services",Plastics and rubber,"Telecommunications, computer, and information services",Transport,Transport equip,Travel,TOTAL
0,Afghanistan,2005,,,,,,,,,...,,,,,,,,,,0.0
1,Albania,2005,504.0,23914.0,1000.0,37395.0,394057.0,3000.0,16000.0,53893.0,...,101756.0,15403.0,57000.0,18000.0,4022.0,74000.0,126000.0,2182.0,854000.0,1823093.0
2,Algeria,2005,,21989.0,,45574415.0,11401.0,167000.0,48000.0,70490.0,...,267077.0,3471.0,637000.0,4000.0,23669.0,96000.0,851000.0,12136.0,184000.0,48508722.0
3,Andorra,2005,,,,,,,,,...,,,,,,,,,,0.0
4,Angola,2005,40.0,1252.0,49000.0,18561.0,1125.0,,,42598.0,...,1050511.0,3581.0,2000.0,5000.0,190.0,14000.0,18000.0,2008.0,88000.0,1331948.0


In [339]:
# find the number of rows and columns

data.shape

(2354, 23)

In [340]:
# number of countries in the data

data['Country'].nunique()

214

In [341]:
# how many distinct years are there in the data?

data['period'].nunique()

11

In [342]:
# is any country missing from one of the years?
# if not, every country should appear exactly 11 times (once for each of the 11 years)

data['Country'].value_counts().value_counts()


count
11    214
Name: count, dtype: int64

In [343]:
# find which countries have the most missing values

data['missing_values'] = data.isnull().sum(axis=1)

data.groupby('Country')['missing_values'].sum().sort_values(ascending=False)

Country
Cayman Islands                220
Curacao                       220
Uzbekistan                    220
Congo, Democratic Republic    205
Tajikistan                    190
                             ... 
Lithuania                       0
Denmark                         0
Tapei, Chinese                  0
Thailand                        0
Latvia                          0
Name: missing_values, Length: 214, dtype: int64

<mark>A count of 220 means that all of a country's data is missing *(20 export categories x 11 years = 220)*</mark>
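
The same check can be made explicit by comparing each country's missing-cell count with the number of cells it could have. The sketch below is a minimal illustration on a toy frame, not the real sheet; `toy`, `export_cols` and the values in it are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the input sheet: 2 export columns, 2 years (hypothetical values)
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "period": [2005, 2006, 2005, 2006],
    "Machinery": [1.0, 2.0, np.nan, np.nan],
    "Travel": [3.0, np.nan, np.nan, np.nan],
})
export_cols = ["Machinery", "Travel"]

# Count missing cells per country, looking only at the export columns
missing_per_country = toy[export_cols].isna().groupby(toy["Country"]).sum().sum(axis=1)

# A country has no data at all when its count equals (export columns x years)
cells_per_country = len(export_cols) * toy["period"].nunique()
fully_missing = missing_per_country[missing_per_country == cells_per_country].index.tolist()
print(fully_missing)
```

On the real data the threshold would be 20 x 11 = 220, matching the counts listed above.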

In [344]:
# let's remove the 'missing_values' column we just added

data.drop('missing_values', axis=1, inplace=True)

### Finding the top five countries with the highest values for each export

In [345]:
# countries with highest 'Arms and Ammunition' export

data.groupby('Country')['Arms and Ammunition'].sum().sort_values(ascending=False).head()

Country
United States of America    36850020.0
Italy                        6253871.0
Germany                      4549828.0
Norway                       4062342.0
France                       3379198.0
Name: Arms and Ammunition, dtype: float64

In [346]:
# countries with highest 'Building materials' export

data.groupby('Country')['Building materials'].sum().sort_values(ascending=False).head()

Country
China                       626618311.0
Germany                     572069807.0
United States of America    502246000.0
Canada                      340688134.0
Djibouti                    276298012.0
Name: Building materials, dtype: float64

In [347]:
# countries with highest 'Charges for the use of intellectual property n.i.e' export

data.groupby('Country')['Charges for the use of intellectual property n.i.e'].sum().sort_values(ascending=False).head()

Country
United States of America    1.046610e+09
Japan                       3.013450e+08
Netherlands                 2.299490e+08
Switzerland                 1.762160e+08
United Kingdom              1.733682e+08
Name: Charges for the use of intellectual property n.i.e, dtype: float64

In [348]:
# countries with highest 'Chemicals and minerals' export

data.groupby('Country')['Chemicals and minerals'].sum().sort_values(ascending=False).head()

Country
Russian Federation          3.140169e+09
Saudi Arabia                2.781857e+09
United States of America    2.641245e+09
Germany                     2.065715e+09
Belgium                     1.540946e+09
Name: Chemicals and minerals, dtype: float64

In [349]:
# countries with highest 'Clothing and accessories' export

data.groupby('Country')['Clothing and accessories'].sum().sort_values(ascending=False).head()

Country
China       3.054780e+09
Italy       6.279256e+08
Djibouti    4.844045e+08
Germany     4.392089e+08
India       3.643407e+08
Name: Clothing and accessories, dtype: float64

In [350]:
# countries with highest 'Construction' export

data.groupby('Country')['Construction'].sum().sort_values(ascending=False).head()

Country
South Korea    148778600.0
Japan          117559000.0
Germany        115957000.0
China          114650000.0
France          45216000.0
Name: Construction, dtype: float64

In [351]:
# countries with highest 'Financial services' export

data.groupby('Country')['Financial services'].sum().sort_values(ascending=False).head()

Country
United States of America    970071000.0
United Kingdom              727633280.0
Luxembourg                  483760000.0
Germany                     260521000.0
Switzerland                 255948000.0
Name: Financial services, dtype: float64

In [352]:
# countries with highest 'Food and drink' export

data.groupby('Country')['Food and drink'].sum().sort_values(ascending=False).head()

Country
United States of America    1.280731e+09
Netherlands                 8.923862e+08
Germany                     7.956509e+08
France                      7.197702e+08
Brazil                      6.831843e+08
Name: Food and drink, dtype: float64

In [353]:
# countries with highest 'Government goods and services n.i.e' export

data.groupby('Country')['Government goods and services n.i.e'].sum().sort_values(ascending=False).head()

Country
United States of America    219807000.0
Germany                      60455000.0
United Kingdom               42630635.0
Japan                        31270000.0
Netherlands                  23070000.0
Name: Government goods and services n.i.e, dtype: float64

In [354]:
# top five countries with highest 'Insurance and pension services' export

data.groupby('Country')['Insurance and pension services'].sum().sort_values(ascending=False).head()

Country
United Kingdom              221124205.0
United States of America    149508000.0
Ireland                     120571000.0
Germany                      91434000.0
Switzerland                  63872000.0
Name: Insurance and pension services, dtype: float64

In [355]:
# top five countries with highest 'Machinery' export

data.groupby('Country')['Machinery'].sum().sort_values(ascending=False).head()

Country
China                       7.655152e+09
Germany                     4.005751e+09
United States of America    3.823271e+09
Japan                       2.829440e+09
South Korea                 1.774268e+09
Name: Machinery, dtype: float64

In [356]:
# top five countries with highest 'Metals' export

data.groupby('Country')['Metals'].sum().sort_values(ascending=False).head()

Country
China                       1.665804e+09
Germany                     1.323814e+09
United States of America    1.322341e+09
Japan                       7.960982e+08
United Kingdom              6.765773e+08
Name: Metals, dtype: float64

In [357]:
# top five countries with highest 'Other' export

data.groupby('Country')['Other'].sum().sort_values(ascending=False).head()

Country
United States of America    2.229587e+09
China                       1.746298e+09
Germany                     1.601250e+09
Djibouti                    9.745987e+08
Netherlands                 8.869224e+08
Name: Other, dtype: float64

In [358]:
# top five countries with highest 'Other business services' export

data.groupby('Country')['Other business services'].sum().sort_values(ascending=False).head()

Country
United States of America    1.247912e+09
United Kingdom              9.741582e+08
Germany                     8.438850e+08
France                      7.353770e+08
China                       7.245070e+08
Name: Other business services, dtype: float64

In [359]:
# top five countries with highest 'Personal, cultural, and recreational services' export

data.groupby('Country')['Personal, cultural, and recreational services'].sum().sort_values(ascending=False).head()

Country
United States of America    200824000.0
United Kingdom               48854908.0
France                       30889000.0
Canada                       25790000.0
Luxembourg                   21534000.0
Name: Personal, cultural, and recreational services, dtype: float64

In [360]:
# top five countries with highest 'Plastics and rubber' export

data.groupby('Country')['Plastics and rubber'].sum().sort_values(ascending=False).head()

Country
Germany                     763409156.0
United States of America    714641557.0
China                       621702957.0
Japan                       394657709.0
Belgium                     368972458.0
Name: Plastics and rubber, dtype: float64

In [361]:
# top five countries with highest 'Telecommunications, computer, and information services' export

data.groupby('Country')['Telecommunications, computer, and information services'].sum().sort_values(ascending=False).head()

Country
India                       438910000.0
Ireland                     355150000.0
United States of America    307801000.0
Germany                     235246000.0
United Kingdom              139079424.0
Name: Telecommunications, computer, and information services, dtype: float64

In [362]:
# top five countries with highest 'Transport' export

data.groupby('Country')['Transport'].sum().sort_values(ascending=False).head()

Country
United States of America    849928000.0
Germany                     592769000.0
France                      451507000.0
Japan                       435316000.0
Denmark                     422666000.0
Name: Transport, dtype: float64

In [363]:
# top five countries with highest 'Transport equip' export

data.groupby('Country')['Transport equip'].sum().sort_values(ascending=False).head()

Country
Germany                     2.873224e+09
Japan                       1.828012e+09
United States of America    1.563069e+09
France                      1.087883e+09
South Korea                 1.037240e+09
Name: Transport equip, dtype: float64

In [364]:
# top five countries with highest 'Travel' export

data.groupby('Country')['Travel'].sum().sort_values(ascending=False).head()

Country
United States of America    1.494555e+09
Spain                       6.417183e+08
France                      6.018450e+08
China                       4.659790e+08
Italy                       4.546850e+08
Name: Travel, dtype: float64
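
The twenty near-identical cells above could be collapsed into one grouped sum followed by a loop over the export columns. This is a minimal sketch on toy data; `toy`, `export_cols` and the values are hypothetical, and on the real frame the column list would be the 20 export categories.

```python
import pandas as pd

# Toy stand-in for the input data (hypothetical values)
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "A", "B", "C"],
    "period": [2005, 2005, 2005, 2006, 2006, 2006],
    "Machinery": [10.0, 30.0, 20.0, 16.0, 25.0, 5.0],
    "Travel": [7.0, 1.0, 4.0, 8.0, 2.0, 3.0],
})
export_cols = ["Machinery", "Travel"]

# Sum every export column per country once, then rank each column separately
totals = toy.groupby("Country")[export_cols].sum()
top_n = {col: totals[col].nlargest(2) for col in export_cols}

for col, ranking in top_n.items():
    print(col, ranking.to_dict())
```

Using `nlargest(5)` instead of `nlargest(2)` would give the same top-five view as the individual cells.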

### Top five countries with the highest value for exports (TOTAL)

In [365]:
# top five countries with highest 'TOTAL' export

data.groupby('Country')['TOTAL'].sum().sort_values(ascending=False).head()

Country
United States of America    2.096525e+10
China                       1.979091e+10
Germany                     1.717110e+10
Japan                       9.210458e+09
United Kingdom              8.243027e+09
Name: TOTAL, dtype: float64

### Average export value for each export

In [366]:
# export categories ranked by average export value (mean over countries of each country's yearly mean)

data.drop(["period", "TOTAL"], axis = 1).groupby('Country').mean().mean().sort_values(ascending=False)

Chemicals and minerals                                    1.932150e+07
Machinery                                                 1.764181e+07
Metals                                                    7.738373e+06
Transport equip                                           7.567866e+06
Other                                                     7.151514e+06
Food and drink                                            5.982532e+06
Other business services                                   5.576436e+06
Travel                                                    5.190841e+06
Transport                                                 4.295660e+06
Clothing and accessories                                  4.187526e+06
Plastics and rubber                                       3.152705e+06
Building materials                                        2.536557e+06
Financial services                                        2.181019e+06
Charges for the use of intellectual property n.i.e        2.130481e+06
Teleco

### Best performing years for all countries

In [367]:
# which years had the highest export values?

data.drop(["Country", "TOTAL"], axis = 1).groupby('period').sum().sum(axis=1).sort_values(ascending=False)

period
2014    2.420023e+10
2013    2.369681e+10
2012    2.295461e+10
2011    2.269758e+10
2015    2.140012e+10
2008    2.023700e+10
2010    1.918031e+10
2007    1.756501e+10
2009    1.605744e+10
2006    1.511179e+10
2005    1.299551e+10
dtype: float64

## Script to generate output

### RCA Formula:

Country ***A*** is said to have a revealed comparative advantage in a given product ***i*** when its ratio of exports of product i to its total exports of all goods (products) exceeds the same ratio for the world as a whole.

That is,

<img src="RCA.svg" width="250"/>

- $P$ is the set of all products (with $i \in P$)
- $X_{Ai}$ is country A's exports of product $i$
- $X_{wi}$ is the world's exports of product $i$
- $\sum_{j \in P} X_{Aj}$ is country A's total exports (of all products $j$ in $P$)
- $\sum_{j \in P} X_{wj}$ is the world's total exports (of all products $j$ in $P$)
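
Written out with the symbols defined above, and assuming the image shows the standard ratio-of-shares form that the opening sentence describes, the index reads:

$$
\mathrm{RCA}_{Ai} = \frac{X_{Ai} \,/\, \sum_{j \in P} X_{Aj}}{X_{wi} \,/\, \sum_{j \in P} X_{wj}}
$$

The numerator is country A's export share of product $i$; the denominator is the world's export share of the same product.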

When a country has a revealed comparative advantage for a given product (RCA >1), it is inferred to be a competitive producer and exporter of that product relative to a country producing and exporting that good at or below the world average. A country with a revealed comparative advantage in product i is considered to have an export strength in that product. The higher the value of a country’s RCA for product i, the higher its export strength in product i.
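
As a small worked example of the ratio, the sketch below computes the RCA for two made-up countries and two products with vectorized pandas operations. It is an illustration only: `toy` and `export_cols` are hypothetical, country totals are taken as row sums rather than from a `TOTAL` column, and missing values or zero totals are not handled.

```python
import pandas as pd

# Toy data: two countries, one year, two products (hypothetical values)
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "period": [2005, 2005],
    "Machinery": [30.0, 10.0],
    "Travel": [10.0, 50.0],
})
export_cols = ["Machinery", "Travel"]

# Country totals, and "world" figures as the sum over all listed countries per year
country_total = toy[export_cols].sum(axis=1)
world_by_product = toy.groupby("period")[export_cols].transform("sum")
world_total = world_by_product.sum(axis=1)

# RCA = (country's share of product i) / (world's share of product i)
country_share = toy[export_cols].div(country_total, axis=0)
world_share = world_by_product.div(world_total, axis=0)
rca = country_share / world_share
print(rca.round(3))
```

Here country A sends 75% of its exports as machinery against a world share of 40%, so its machinery RCA is 0.75 / 0.40 = 1.875, which indicates a revealed comparative advantage.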


In [368]:
# imports

import math

In [369]:
# create a copy of the data to store our output
output = data.copy()

# drop the columns we don't need, i.e. 'TOTAL'
output.drop(["TOTAL"], axis=1, inplace=True)

# define 'export_columns' with the names of the 20 export categories
export_columns = data.drop(["Country", "period", "TOTAL"], axis=1).columns

# calculate the RCA for each country and export per year
for idx, row in data.iterrows():

    # store the year of the current row
    year = row['period']

    # initialize the RCA list
    rca = []

    for column in export_columns:

        # defining the variables in the RCA formula
        X_Ai = pd.to_numeric(row[column])                                      # country's exports of this product
        X_Aj = pd.to_numeric(row["TOTAL"])                                     # country's total exports
        X_Wi = pd.to_numeric(data[column].loc[data["period"] == year]).sum()   # world's exports of this product
        X_Wj = pd.to_numeric(data["TOTAL"].loc[data["period"] == year]).sum()  # world's total exports

        # if neither total is zero, calculate the RCA
        if X_Aj != 0 and X_Wj != 0:
            calc = (X_Ai / X_Aj) / (X_Wi / X_Wj)

            # if the RCA is NaN, set it to 0, as that's what's in the Excel output file
            if math.isnan(calc):
                calc = 0.0

            # round the RCA to 2 decimal places, again, as in the Excel output file
            # (note: the f-string stores the value as text, not as a float)
            calc = round(calc, 2)
            calc = f"{calc:.2f}"

        # if the country's (or the world's) total exports are zero, set the RCA to None
        else:
            calc = None

        rca.append(calc)

    # store the RCA values in the output dataframe
    output.loc[idx, export_columns] = rca

# export the output to an Excel file
output.to_excel("Output.xlsx", index=False)


In [370]:
# check if the generated output is the same as the expected output

expected_output = pd.read_excel("Trade analysis task.xlsx", sheet_name='Output - RCA')

output.equals(expected_output)

False

In [371]:
# check if the shapes for both the dataframes are the same

output.shape == expected_output.shape

True

In [372]:
# check where the generated output is different from the expected output

output.compare(expected_output)

Unnamed: 0_level_0,Arms and Ammunition,Arms and Ammunition,Building materials,Building materials,Charges for the use of intellectual property n.i.e,Charges for the use of intellectual property n.i.e,Chemicals and minerals,Chemicals and minerals,Clothing and accessories,Clothing and accessories,...,Plastics and rubber,Plastics and rubber,"Telecommunications, computer, and information services","Telecommunications, computer, and information services",Transport,Transport,Transport equip,Transport equip,Travel,Travel
Unnamed: 0_level_1,self,other,self,other,self,other,self,other,self,other,...,self,other,self,other,self,other,self,other,self,other
1,0.69,0.691057,0.40,0.399797,0.05,0.054296,0.11,0.114696,4.35,4.351126,...,0.07,0.065060,4.77,4.771366,1.62,1.624175,0.01,0.013494,8.97,8.973401
2,0.00,0.000000,0.01,0.013816,0.00,0.000000,5.25,5.253451,0.00,0.004731,...,0.01,0.014389,0.23,0.232633,0.41,0.412269,0.00,0.002821,0.07,0.072662
4,0.08,0.075070,0.03,0.028649,3.64,3.641518,0.08,0.077922,0.02,0.017003,...,0.00,0.004207,1.24,1.235551,0.32,0.317582,0.02,0.016997,1.27,1.265620
5,0.00,0.000000,0.22,0.216991,0.00,0.000000,0.20,0.197046,0.12,0.121143,...,0.03,0.027994,1.05,1.052686,0.42,0.420901,0.08,0.083496,14.75,14.753176
6,0.00,0.000000,0.12,0.118329,0.00,0.000000,0.82,0.820729,0.08,0.079570,...,0.01,0.013160,1.61,1.614039,3.43,3.428421,0.27,0.271504,10.16,10.159461
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2349,0.00,0.001242,0.05,0.049941,0.00,0.000000,5.53,5.530508,0.02,0.022372,...,0.03,0.028536,0.11,0.105293,0.45,0.450353,0.00,0.004908,0.26,0.264892
2350,0.03,0.030553,1.09,1.090106,0.00,0.001225,0.31,0.306707,5.15,5.154977,...,0.89,0.887040,0.12,0.119323,0.36,0.363141,0.21,0.213207,0.78,0.783817
2351,0.01,0.014952,0.13,0.132340,0.06,0.063632,0.07,0.072547,0.20,0.201180,...,0.05,0.054840,2.64,2.644211,1.76,1.759710,0.98,0.984794,5.96,5.958841
2352,3.41,3.413364,0.39,0.387707,0.00,0.000000,0.33,0.329821,0.20,0.201670,...,0.07,0.073529,0.18,0.175799,0.13,0.130889,0.06,0.058048,1.45,1.446024


<mark>In the table above, 'self' is our generated output and 'other' is the output sheet from the Excel file.</mark>

<mark>When the sheet is opened in Excel, every value appears rounded to two decimal places, but the table above suggests that the stored values are not actually rounded.</mark>

<mark>Let's run our script again, this time without rounding to two decimal places.</mark>
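
There is a second reason the first attempt could not match: formatting with `f"{calc:.2f}"` stores each value as text. The minimal sketch below uses one value taken from the comparison table to show the difference between rounding and formatting; the variable names are illustrative only.

```python
import pandas as pd

value = 0.691057

rounded = round(value, 2)      # still a float
formatted = f"{value:.2f}"     # a string of digits

print(type(rounded).__name__, type(formatted).__name__)

# A column of formatted strings never equals a column of floats,
# even when the digits are the same
as_strings = pd.Series([formatted])
as_floats = pd.Series([rounded])
print(as_strings.equals(as_floats))
```

Dropping both the rounding and the string formatting keeps the output numeric, which is what the next cell does.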


In [374]:
# create a copy of the data to store our output
output = data.copy()

# drop the columns we don't need, i.e. 'TOTAL'
output.drop(["TOTAL"], axis=1, inplace=True)

# define 'export_columns' with the names of the 20 export categories
export_columns = data.drop(["Country", "period", "TOTAL"], axis=1).columns

# calculate the RCA for each country and export per year
for idx, row in data.iterrows():

    # store the year of the current row
    year = row['period']

    # initialize the RCA list
    rca = []

    for column in export_columns:

        # defining the variables in the RCA formula
        X_Ai = pd.to_numeric(row[column])                                      # country's exports of this product
        X_Aj = pd.to_numeric(row["TOTAL"])                                     # country's total exports
        X_Wi = pd.to_numeric(data[column].loc[data["period"] == year]).sum()   # world's exports of this product
        X_Wj = pd.to_numeric(data["TOTAL"].loc[data["period"] == year]).sum()  # world's total exports

        # if neither total is zero, calculate the RCA
        if X_Aj != 0 and X_Wj != 0:
            calc = (X_Ai / X_Aj) / (X_Wi / X_Wj)

            # if the RCA is NaN, set it to 0, as that's what's in the Excel output file
            if math.isnan(calc):
                calc = 0.0

        # if the country's (or the world's) total exports are zero, set the RCA to None
        else:
            calc = None

        rca.append(calc)

    # store the RCA values in the output dataframe
    output.loc[idx, export_columns] = rca

In [375]:
# check if the generated output is the same as the expected output

expected_output = pd.read_excel("Trade analysis task.xlsx", sheet_name='Output - RCA')

output.equals(expected_output)

False

In [376]:
# check where the generated output is different from the expected output

output.compare(expected_output)

Unnamed: 0_level_0,Arms and Ammunition,Arms and Ammunition,Building materials,Building materials,Charges for the use of intellectual property n.i.e,Charges for the use of intellectual property n.i.e,Chemicals and minerals,Chemicals and minerals,Clothing and accessories,Clothing and accessories,...,Plastics and rubber,Plastics and rubber,"Telecommunications, computer, and information services","Telecommunications, computer, and information services",Transport,Transport,Transport equip,Transport equip,Travel,Travel
Unnamed: 0_level_1,self,other,self,other,self,other,self,other,self,other,...,self,other,self,other,self,other,self,other,self,other
14,,,,,,,,,,,...,,,,,,,,,,
206,,,,,,,1.171359,1.171359,,,...,,,,,,,,,,
215,3.628112,3.628112,0.429869,0.429869,0.046025,0.046025,0.141322,0.141322,4.186567,4.186567,...,0.085394,0.085394,2.654982,2.654982,1.729986,1.729986,0.010697,0.010697,9.063114,9.063114
216,,,0.010117,0.010117,0.003693,0.003693,4.946556,4.946556,0.007696,0.007696,...,0.036667,0.036667,0.195267,0.195267,0.354246,0.354246,0.002103,0.002103,0.079040,0.079040
217,3.080766,3.080766,1.844256,1.844256,,,0.180017,0.180017,1.568095,1.568095,...,0.168748,0.168748,,,,,1.652356,1.652356,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2349,0.001242,0.001242,,,,,5.530508,5.530508,,,...,0.028536,0.028536,,,,,0.004908,0.004908,,
2350,0.030553,0.030553,,,,,0.306707,0.306707,,,...,0.887040,0.887040,,,,,0.213207,0.213207,,
2351,0.014952,0.014952,,,,,0.072547,0.072547,,,...,0.054840,0.054840,,,,,0.984794,0.984794,,
2352,3.413364,3.413364,,,,,0.329821,0.329821,,,...,0.073529,0.073529,,,,,0.058048,0.058048,,


<mark>Now fewer values differ between the two outputs.</mark>

<mark>But most values are still reported as different, even though they look identical at the displayed precision.</mark>

<mark>This could be because Excel and Python differ slightly in floating-point precision, so the values disagree only in their last few digits.</mark>
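
One way to test this explanation is to compare the two outputs with a tolerance instead of exact equality. The sketch below is a minimal illustration on toy frames; `generated`, `expected` and the tolerances are assumptions chosen for the example, not values taken from the task.

```python
import numpy as np
import pandas as pd

# Toy frames standing in for the generated and expected outputs (hypothetical values)
generated = pd.DataFrame({"Machinery": [0.1 + 0.2, np.nan], "Travel": [1.5, 2.0]})
expected = pd.DataFrame({"Machinery": [0.3, np.nan], "Travel": [1.5, 2.0]})

# Exact comparison fails because 0.1 + 0.2 is not exactly 0.3 in binary floating point
exact_match = generated.equals(expected)
print(exact_match)

# Tolerance-based comparison treats tiny differences (and matching NaNs) as equal
close = np.isclose(
    generated.to_numpy(), expected.to_numpy(),
    rtol=1e-9, atol=1e-12, equal_nan=True,
)
print(close.all())
```

On the real frames this would presumably be applied to the export columns only, since `Country` is not numeric. `pandas.testing.assert_frame_equal` also accepts `rtol` and `atol` if an assertion-style check is preferred.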

## Exploratory Data Analysis for the Output

Exploring the output data with pandas to better understand it.