# Finding scope 3 emissions of a Pharma company.
- Scope 3 category: downstream transportation and distribution
- Data taken from data.world
- The distance-based methodology will be used for the calculations
    - According to the GHG Protocol: "If sub-contractor fuel data cannot be easily obtained in order to use the fuel-based method, then the distance-based method should be used."

# Data points required for calculations of scope 3
1. Total distance in KM 
    - Coordinates of places
2. Size of packages
3. Mode of Transport
4. Emission Factors
5. Calculation formulas
    1. Emissions from road transport = ∑ (mass of goods purchased (tonnes) × distance travelled in transport leg (km or miles) × emission factor of transport mode or vehicle type (kg CO2e/tonne-km or kg CO2e/tonne-mile))
    2. Emissions from air transport = ∑ (quantity of goods purchased (tonnes) × distance travelled in transport leg (km or miles) × emission factor of transport mode or vehicle type (kg CO2e/tonne-km or kg CO2e/tonne-mile))
    3. Emissions from sea transport = ∑ (quantity of goods purchased (tonnes) × distance travelled in transport leg (km or miles) × emission factor of transport mode or vehicle type (kg CO2e/tonne-km or kg CO2e/tonne-mile))
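
The three formulas share one shape: mass × distance × emission factor, summed over the transport legs. A minimal sketch of that calculation follows. The helper names `leg_emissions` and `total_emissions` are hypothetical, and the shipment values and emission factors are placeholders, not figures from the dataset.

```python
def leg_emissions(mass_tonnes, distance, emission_factor):
    """Emissions of one transport leg: mass x distance x emission factor.

    `distance` and `emission_factor` must use the same distance unit
    (both per km or both per mile).
    """
    return mass_tonnes * distance * emission_factor


def total_emissions(legs):
    """Sum the emissions over all transport legs of a shipment."""
    return sum(leg_emissions(mass, dist, ef) for mass, dist, ef in legs)


# Hypothetical shipment: 2 tonnes, 100 km by road and then 1,000 km by sea.
# The factors 0.1 and 0.01 kg CO2e/tonne-km are placeholders for illustration.
legs = [(2.0, 100.0, 0.1), (2.0, 1000.0, 0.01)]
print(total_emissions(legs))
```

Keeping each leg as its own tuple makes it easy to apply a different factor per transport mode, which is what the rest of this notebook does.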

## Data columns
1. Country - Destination location
2. Mode - Mode of transport (Air, Ocean, Road)
3. Weight - Kilogram of weight
4. Delivery Date - Date of delivery
5. Manufacturing Site - Source location

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import geopandas as gpd

# For point datatype of locations
from shapely.geometry import Point
# For geolocation
from geopy.geocoders import ArcGIS
# Importing the great_circle module for calculation of distance for air transport
from geopy.distance import great_circle
# For calculation of distance for road transport
import openrouteservice
# For calculation of distance for ocean transport
import searoute as sr

# For calculation of source location airport/port
from math import radians, sin, cos, sqrt, atan2, isnan

# Calculating the time
import time

# For converting country name to iso_2 codes
import country_converter as coco

In [16]:
print(pd.__version__)

1.5.3


In [2]:
# Collecting data
raw_df = pd.read_csv("./Data/SCMS_Delivery_History_Dataset_20150929.csv")

In [3]:
raw_df.head(5)

Unnamed: 0,ID,Project Code,PQ #,PO / SO #,ASN/DN #,Country,Managed By,Fulfill Via,Vendor INCO Term,Shipment Mode,...,Unit of Measure (Per Pack),Line Item Quantity,Line Item Value,Pack Price,Unit Price,Manufacturing Site,First Line Designation,Weight (Kilograms),Freight Cost (USD),Line Item Insurance (USD)
0,1,100-CI-T01,Pre-PQ Process,SCMS-4,ASN-8,Côte d'Ivoire,PMO - US,Direct Drop,EXW,Air,...,30,19,551.0,29.0,0.97,Ranbaxy Fine Chemicals LTD,Yes,13,780.34,
1,3,108-VN-T01,Pre-PQ Process,SCMS-13,ASN-85,Vietnam,PMO - US,Direct Drop,EXW,Air,...,240,1000,6200.0,6.2,0.03,"Aurobindo Unit III, India",Yes,358,4521.5,
2,4,100-CI-T01,Pre-PQ Process,SCMS-20,ASN-14,Côte d'Ivoire,PMO - US,Direct Drop,FCA,Air,...,100,500,40000.0,80.0,0.8,ABBVIE GmbH & Co.KG Wiesbaden,Yes,171,1653.78,
3,15,108-VN-T01,Pre-PQ Process,SCMS-78,ASN-50,Vietnam,PMO - US,Direct Drop,EXW,Air,...,60,31920,127360.8,3.99,0.07,"Ranbaxy, Paonta Shahib, India",Yes,1855,16007.06,
4,16,108-VN-T01,Pre-PQ Process,SCMS-81,ASN-55,Vietnam,PMO - US,Direct Drop,EXW,Air,...,60,38000,121600.0,3.2,0.05,"Aurobindo Unit III, India",Yes,7590,45450.08,


In [15]:
raw_df['Delivery Recorded Date'].unique()

array(['2-Jun-06', '14-Nov-06', '27-Aug-06', ..., '29-Mar-13', '7-Dec-13',
       '8-May-15'], dtype=object)

In [5]:
# Filtering out required columns from the main data.

filtered_df = raw_df[['ID','Country','Shipment Mode','Manufacturing Site','Weight (Kilograms)','Delivery Recorded Date','Item Description']].copy()
filtered_df.rename(columns={"Shipment Mode":"Mode","Weight (Kilograms)":"Weight","Delivery Recorded Date":"Delivery_Date",'Manufacturing Site':'Manufacturing_site','Country':'Destination_Country'},inplace=True)

In [6]:
filtered_df

Unnamed: 0,ID,Destination_Country,Mode,Manufacturing_site,Weight,Delivery_Date,Item Description
0,1,Côte d'Ivoire,Air,Ranbaxy Fine Chemicals LTD,13,2-Jun-06,"HIV, Reveal G3 Rapid HIV-1 Antibody Test, 30 T..."
1,3,Vietnam,Air,"Aurobindo Unit III, India",358,14-Nov-06,"Nevirapine 10mg/ml, oral suspension, Bottle, 2..."
2,4,Côte d'Ivoire,Air,ABBVIE GmbH & Co.KG Wiesbaden,171,27-Aug-06,"HIV 1/2, Determine Complete HIV Kit, 100 Tests"
3,15,Vietnam,Air,"Ranbaxy, Paonta Shahib, India",1855,1-Sep-06,"Lamivudine 150mg, tablets, 60 Tabs"
4,16,Vietnam,Air,"Aurobindo Unit III, India",7590,11-Aug-06,"Stavudine 30mg, capsules, 60 Caps"
...,...,...,...,...,...,...,...
10319,86818,Zimbabwe,Truck,"Mylan, H-12 & H-13, India",See DN-4307 (ID#:83920),20-Jul-15,"Lamivudine/Nevirapine/Zidovudine 30/50/60mg, d..."
10320,86819,Côte d'Ivoire,Truck,Hetero Unit III Hyderabad IN,See DN-4313 (ID#:83921),7-Aug-15,"Lamivudine/Zidovudine 150/300mg, tablets, 60 Tabs"
10321,86821,Zambia,Truck,Cipla Ltd A-42 MIDC Mahar. IN,Weight Captured Separately,3-Sep-15,Efavirenz/Lamivudine/Tenofovir Disoproxil Fuma...
10322,86822,Zimbabwe,Truck,Mylan (formerly Matrix) Nashik,1392,11-Aug-15,"Lamivudine/Zidovudine 150/300mg, tablets, 60 Tabs"


In [11]:
# Converting Delivery Date to datetime datatype
# Raw values look like '2-Jun-06': day, abbreviated month name, two-digit year
filtered_df['Delivery_Date'] = pd.to_datetime(filtered_df['Delivery_Date'],format="%d-%b-%y")


In [12]:
filtered_df['Delivery_Date']

0       2006-06-02
1       2006-11-14
2       2006-08-27
3       2006-09-01
4       2006-08-11
           ...    
10319   2015-07-20
10320   2015-08-07
10321   2015-09-03
10322   2015-08-11
10323   2015-08-11
Name: Delivery_Date, Length: 10324, dtype: datetime64[ns]
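
The raw dates are strings such as '2-Jun-06', i.e. day, abbreviated month name and two-digit year. A small standard-library sketch of the matching format string, `%d-%b-%y`, is shown here. It uses `datetime.strptime` rather than pandas so it can be run on its own, and it assumes English month abbreviations.

```python
from datetime import datetime

RAW_FORMAT = "%d-%b-%y"  # day, abbreviated month name, two-digit year

# Sample values taken from the 'Delivery Recorded Date' output above
samples = ["2-Jun-06", "14-Nov-06", "8-May-15"]
parsed = [datetime.strptime(value, RAW_FORMAT).date() for value in samples]

for raw, day in zip(samples, parsed):
    print(raw, "->", day.isoformat())
```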

# Exploratory Data Analysis

In [6]:
# Checking total number of data points for each year
filtered_df['Delivery_Date'].groupby(by=filtered_df['Delivery_Date'].dt.year).size()

Delivery_Date
2006      65
2007     670
2008    1109
2009    1192
2010    1176
2011    1049
2012    1250
2013    1205
2014    1599
2015    1009
Name: Delivery_Date, dtype: int64

In [7]:
# Check for null values
filtered_df.isna().sum()

ID                       0
Destination_Country      0
Mode                   360
Manufacturing_site       0
Weight                   0
Delivery_Date            0
Item Description         0
dtype: int64

Null values found in the mode of transport. 

In [8]:
# Checking Mode column's null values

# filtered_df[filtered_df['Mode'].isna()]

# Counting the number of null values per year
filtered_df[filtered_df['Mode'].isna()].groupby(by=filtered_df['Delivery_Date'].dt.year).size()

Delivery_Date
2006      2
2007    264
2008     94
dtype: int64

Since the mode of transport is missing for some rows in 2006, 2007 and 2008, I will not consider these years as our base year.
Although 2006 has only 2 rows with a missing Mode, it has just 65 data points in total, which is too few for a base year compared with the other years.

In [9]:
# Checking Weight column values
filtered_df[filtered_df['Weight'].str.isnumeric() == False]

Unnamed: 0,ID,Destination_Country,Mode,Manufacturing_site,Weight,Delivery_Date,Item Description
8,46,Nigeria,Air,"Aurobindo Unit III, India",See ASN-93 (ID#:1281),2006-12-07,"Stavudine 30mg, capsules, 60 Caps"
12,62,Nigeria,Air,"EY Laboratories, USA",Weight Captured Separately,2007-01-10,"HIV 1/2, InstantChek HIV 1+2 Kit, 100 Tests"
15,68,Zimbabwe,Air,"BMS Meymac, France",Weight Captured Separately,2007-03-19,"#102198**Didanosine 200mg [Videx], tablets, 60..."
16,69,Nigeria,,ABBVIE GmbH & Co.KG Wiesbaden,Weight Captured Separately,2007-05-07,"HIV 1/2, Determine Complete HIV Kit, 100 Tests"
31,262,South Africa,,GSK Mississauga (Canada),Weight Captured Separately,2008-01-29,"Zidovudine 10mg/ml [Retrovir], oral solution, ..."
...,...,...,...,...,...,...,...
10318,86817,Zimbabwe,Truck,"Cipla, Goa, India",See DN-4307 (ID#:83920),2015-07-20,"Lamivudine/Nevirapine/Zidovudine 30/50/60mg, d..."
10319,86818,Zimbabwe,Truck,"Mylan, H-12 & H-13, India",See DN-4307 (ID#:83920),2015-07-20,"Lamivudine/Nevirapine/Zidovudine 30/50/60mg, d..."
10320,86819,Côte d'Ivoire,Truck,Hetero Unit III Hyderabad IN,See DN-4313 (ID#:83921),2015-08-07,"Lamivudine/Zidovudine 150/300mg, tablets, 60 Tabs"
10321,86821,Zambia,Truck,Cipla Ltd A-42 MIDC Mahar. IN,Weight Captured Separately,2015-09-03,Efavirenz/Lamivudine/Tenofovir Disoproxil Fuma...


As you can see above, the Weight column contains non-numeric values. For rows with 'Weight Captured Separately' the weight is not available. Values like 'See ASN-93 (ID#:1281)' can be resolved by looking up the row with that ID and taking its weight.
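
The referenced ID can be pulled out of such a string with a regular expression. This is a minimal sketch: `referenced_id` is a hypothetical helper, and the pattern assumes every reference ends in the form `(ID#:<digits>)`, as in the rows shown above.

```python
import re

# Matches the trailing "(ID#:1281)" part and captures the digits
ID_PATTERN = re.compile(r"\(ID#:(\d+)\)")


def referenced_id(weight_text):
    """Return the referenced shipment ID as an int, or None if there is none."""
    match = ID_PATTERN.search(weight_text)
    return int(match.group(1)) if match else None


print(referenced_id("See ASN-93 (ID#:1281)"))
print(referenced_id("See DN-4307 (ID#:83920)"))
print(referenced_id("Weight Captured Separately"))
```

Capturing the digits avoids assuming a fixed ID length, since the IDs in this dataset have both four and five digits.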

In [10]:
# Counting the number of 'Weight Captured Separately' values in weight columns per year.

filtered_df[(filtered_df['Weight'].str.isnumeric() == False) & (filtered_df['Weight'] == 'Weight Captured Separately')].groupby(by=filtered_df['Delivery_Date'].dt.year).size()

Delivery_Date
2006      6
2007     28
2008    233
2009    262
2010     67
2011     56
2012     33
2013     80
2014    386
2015    356
dtype: int64

2012 has the fewest 'Weight Captured Separately' values (33) and a good number of data points (1250), so we can use it as our base year.
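
To make the comparison between years explicit, the share of rows without a usable weight can be computed from the two outputs above. This is only a sketch of the reasoning: the counts are copied by hand from those outputs, and 2006 to 2008 are left out because of the missing Mode values.

```python
# Rows per year and 'Weight Captured Separately' counts, copied from the outputs above
rows_per_year = {2009: 1192, 2010: 1176, 2011: 1049, 2012: 1250,
                 2013: 1205, 2014: 1599, 2015: 1009}
missing_weight = {2009: 262, 2010: 67, 2011: 56, 2012: 33,
                  2013: 80, 2014: 386, 2015: 356}

# Share of rows per year whose weight is not available
share_missing = {year: missing_weight[year] / rows_per_year[year]
                 for year in rows_per_year}
base_year = min(share_missing, key=share_missing.get)

for year, share in sorted(share_missing.items()):
    print(year, f"{share:.1%}")
print("Candidate base year:", base_year)
```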

# Data Cleaning

## Fixing Weight column

In [8]:
# Filtering out 'Weight Captured Separately' rows from Final dataset
final_df = filtered_df[(filtered_df['Delivery_Date'].dt.year == 2012) & (filtered_df['Weight'] != 'Weight Captured Separately')].copy()
final_df.reset_index(inplace = True, drop = True)

In [12]:
# Function to resolve references such as "See DN-2947 (ID#:83642)" in the weight column
# by looking up the weight recorded for the referenced ID
import re

def mapWeights(weight):
    try:
        if weight.isnumeric():
            return float(weight)
        # The referenced ID can have any number of digits, e.g. "(ID#:1281)" or "(ID#:83642)"
        match = re.search(r'\(ID#:(\d+)\)', weight)
        if match is None:
            return None
        weight_returned = filtered_df[filtered_df['ID'] == int(match.group(1))]['Weight'].iloc[0]
        if weight_returned == 'Weight Captured Separately':
            return None
        return float(weight_returned)
    except Exception as e:
        # The referenced row is missing or its weight is itself not numeric
        print(f'Error == {e} for weight value {weight!r}')
        return None

In [13]:
weights = final_df['Weight'].apply(mapWeights)

In [14]:
# Adding weights list to the df
final_df.loc[:,'Weight'] = weights


In [15]:
# Removing None value rows for weight

final_df = final_df[final_df['Weight'].isna() == False]

In [16]:
# Converting Weights from KG to Tonnes
# 1 kilogram = 0.001 tonne

final_df['Weight'] = final_df['Weight'] * 0.001

## Fixing Mode column

In [17]:
final_df['Mode'].unique()

array(['Air', 'Ocean', 'Truck', 'Air Charter'], dtype=object)

As you can see, 'Air Charter' appears as a fourth mode of transport. We can merge it into 'Air' and rename 'Truck' to 'Road'.

In [18]:
final_df.loc[final_df['Mode'] == 'Air Charter','Mode'] = 'Air'
final_df.loc[final_df['Mode'] == 'Truck','Mode'] = 'Road'

## Getting Location coordinates of Source Location and Destination Location

## To-Do
1. Source 
- If the mode of transport is 
        1. Air and Ocean, then we get the Airport/Port coordinates of the nearest Airport/Port to the manufacturing site.
        2. Road, then we get the coordinates of the manufacturing site.
2. Destination
- If the mode of transport is 
        1. Air and Road, then we get the nearest Airport coordinates to the Capital City of the Country given.
        2. Ocean, then we get the coordinates of the port nearest to the capital city of the country given.

In [19]:
# Read airports data
raw_airport_df = pd.read_csv('Data/airports.csv')
# Read ports data
raw_ports_df = gpd.read_file('Data/attributed_ports.geojson')

In [20]:
# Data cleaning for airports and ports data
df = raw_airport_df[['type','latitude_deg','longitude_deg','name','iso_country','municipality']]
final_airport_df = df[df['type'].str.lower().str.contains('airport')].copy()
final_airport_df['continent'] = coco.convert(final_airport_df['iso_country'],src='ISO2',to = 'continent', not_found=None)

final_ports_df = raw_ports_df[['Country','Name','geometry']].copy()
final_ports_df['latitude_deg'] = final_ports_df.geometry.apply(lambda p: p.y)
final_ports_df['longitude_deg'] = final_ports_df.geometry.apply(lambda p: p.x)
final_ports_df.drop(['geometry'], axis=1,inplace=True)
final_ports_df['iso_country'] = coco.convert(final_ports_df['Country'],to='ISO2', not_found=None)
final_ports_df['continent'] = coco.convert(final_ports_df['iso_country'],src='ISO2',to = 'continent', not_found=None)

In [21]:
final_ports_df.head(2)

Unnamed: 0,Country,Name,latitude_deg,longitude_deg,iso_country,continent
0,United Arab Emirates,Abu Dhabi,24.466667,54.366667,AE,Asia
1,United Arab Emirates,Ar Ruways,24.116667,52.733333,AE,Asia


In [22]:
final_airport_df.head(2)

Unnamed: 0,type,latitude_deg,longitude_deg,name,iso_country,municipality,continent
1,small_airport,38.704022,-101.473911,Aero B Ranch Airport,US,Leoti,America
2,small_airport,59.947733,-151.692524,Lowell Field,US,Anchor Point,America


### Finding source location coordinates

Getting all manufacturing_site and mode of transport from the main df

In [23]:
source_df = final_df[['Manufacturing_site','Mode']].copy()

In [24]:
len(source_df)

1202

Dropping duplicates

In [25]:
source_df.drop_duplicates(inplace=True)

In [26]:
# Initialising geolocation object
nom = ArcGIS()
cc = coco.CountryConverter()
# Creating geopandas DF to append Source location
data = {'Manufacturing_site':[],'source':[],'departure':[],'source_mode':[],'departure_mode':[],'src_num_transport':[],'src_country':[]}
source_locations =  gpd.GeoDataFrame(data)

In [27]:
# Functions to get the distance in KM
def distance(lat1, lon1, lat2, lon2):
    # Calculate the distance between two coordinates using the haversine formula
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.    

    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km
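
As a quick sanity check of the haversine formula, here is a standalone sketch using only the `math` module. It uses the commonly quoted mean Earth radius of 6371 km, whereas the helper above uses 6367 km. The difference is well under 0.1% and does not affect which airport or port is nearest. The name `haversine_km` is hypothetical.

```python
from math import radians, sin, cos, asin, sqrt, pi

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
# Two antipodal points on the equator: half of the circumference
half_circumference = haversine_km(0.0, 0.0, 0.0, 180.0)
print(round(one_degree, 1), round(half_circumference, 1))
```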

In [28]:
len(final_airport_df)

44441

In [29]:
"""
Function to find the minimum distance.
"""
def minDistance(transport_df,location,iso2_country,continent):
    min_distance = float('inf')
    min_latitude = float('inf')
    min_longitude = float('inf')
    
    """
    Getting all airports/ports lying inside the country. If no airports/ports are found inside the country then, 
    we check for all the airport/ports in the continent.
    """
    transport_df_country = transport_df[transport_df['iso_country'] == iso2_country]
    if len(transport_df_country) == 0:
        transport_df_country = transport_df[transport_df['continent'] == continent]
        print('Looking at continent now', len(transport_df_country))

    for index,row in transport_df_country.iterrows():
        dist = distance(location.latitude, location.longitude, row['latitude_deg'],row['longitude_deg'])
        if dist <= min_distance:
            min_distance = dist
            min_latitude = row['latitude_deg']
            min_longitude = row['longitude_deg']

    return min_latitude,min_longitude
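
The lookup strategy (same country first, then the whole continent as a fallback) can be sketched without pandas as follows. The site list is made up for illustration and `nearest_site` is a hypothetical name. Unlike `minDistance`, this sketch returns `None` instead of infinite coordinates when no candidate exists.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (mean Earth radius of 6371 km assumed)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_site(sites, lat, lon, iso_country, continent):
    """Return the closest site in the country, else in the continent, else None."""
    candidates = [s for s in sites if s["iso_country"] == iso_country]
    if not candidates:
        # No airport/port in the country: fall back to the continent
        candidates = [s for s in sites if s["continent"] == continent]
    if not candidates:
        return None
    return min(candidates, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


# Made-up sites; country and continent codes are placeholders
sites = [
    {"name": "A", "iso_country": "AA", "continent": "X", "lat": 0.0, "lon": 0.0},
    {"name": "B", "iso_country": "AA", "continent": "X", "lat": 10.0, "lon": 10.0},
    {"name": "C", "iso_country": "BB", "continent": "X", "lat": 1.0, "lon": 1.0},
]
print(nearest_site(sites, 9.0, 9.0, "AA", "X")["name"])
```

Note that the country filter can exclude a site that is physically closer but across a border, which is the intended behaviour here.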

In [30]:
# Function to get the coordinates of the location
# Now that we have location of the manufacturing site coordinates. We need to get the Airport/Port location from which the item was transported
def getSourceLocation(data):
    place = data['Manufacturing_site']
    departure_mode = data['Mode']
    try:
        number_of_transport = 1
        location = nom.geocode(place)
        min_latitude = location.latitude
        min_longitude = location.longitude
        country = nom.reverse(str(location.latitude) + ',' + str(location.longitude)).address.split(',')[-1].replace(" ", "")
        source_mode = 'Road'
        
        # Getting iso_2 code for the country
        iso2_country = cc.convert(country, to='ISO2',not_found=None)
        continent = cc.convert(iso2_country,src='ISO2',to = 'continent', not_found=None)
        
        # Now that we have location of the manufacturing site. We need to get the Airport/Port location from which the item was transported
        if departure_mode == 'Air':
            number_of_transport = number_of_transport + 1
            min_latitude,min_longitude= minDistance(final_airport_df,location,iso2_country,continent)
        elif departure_mode == 'Ocean':
            number_of_transport = number_of_transport + 1
            min_latitude,min_longitude = minDistance(final_ports_df,location,iso2_country,continent)
            
        source_locations.loc[len(source_locations.index)] = [place,Point(location.latitude,location.longitude),Point(min_latitude,min_longitude),source_mode,departure_mode,number_of_transport,country]
        
    except Exception as e:
        print(e, place)
        source_locations.loc[len(source_locations.index)] = [place,None,None,None,departure_mode,number_of_transport,None]

In [31]:
source_locations.head(1)

Unnamed: 0,Manufacturing_site,source,departure,source_mode,departure_mode,src_num_transport,src_country


In [None]:
# Adding source_latitude and source_longitude columns
# store starting time
begin = time.time()
source_df.apply(getSourceLocation,axis=1)
end = time.time()
# total time taken
print(f"Total runtime of the program is {end - begin}")

The code took around 74 seconds to complete the computations

Merging the source_location_df to the final_df

In [33]:
final_df = final_df.merge(source_locations, how='outer', left_on=['Manufacturing_site','Mode'],right_on=['Manufacturing_site','departure_mode'])

Checking for any NONE or INF values in source and departure columns

In [34]:
indexes = final_df.loc[(final_df['source'].apply(lambda p: p == None or p.x == float('inf') or p.y == float('inf'))) | (final_df['departure'].apply(lambda p: p == None or p.x == float('inf') or p.y == float('inf')))].index

Since coordinates of these manufacturing sites were not found, I will drop these.

In [35]:
final_df.drop(index=indexes,inplace=True)

### Getting Destination location coordinates

## TO-DO:
Destination
- If the mode of transport is 
        1. Air and Road, 
            1. We first search for the Airport coordinates of the Capital City of the Country given.
            2. If no airport is found in the capital then we get the coordinates of the capital and find the nearest airport.
        2. Ocean, then we get the coordinates of the main port of the country.

In [36]:
# Getting ISO_2 codes for each country
final_df['destination_iso_country'] = coco.convert(final_df['Destination_Country'],to='ISO2', not_found=None)

Loading the country capital list and cleaning the data

In [37]:
country_capital = pd.read_csv('Data/country-capital-list.csv')
country_capital.drop(columns=['type'],axis=1,inplace=True)
country_capital['iso_country'] = coco.convert(country_capital['country'],to='ISO2', not_found=None)

Abkhazia not found in regex
Akrotiri and Dhekelia not found in regex
Ascension Island not found in regex
Easter Island not found in regex
Nagorno-Karabakh Republic not found in regex
Scotland not found in regex
South Ossetia not found in regex
Transnistria not found in regex
Tristan da Cunha not found in regex
Wales not found in regex


Merging the main_df and country capital df to get the required capital cities.

In [38]:
destination_locations = country_capital.merge(final_df[['Mode','destination_iso_country']],right_on='destination_iso_country',left_on='iso_country').drop(columns=['destination_iso_country'])

Dropping duplicate rows

In [39]:
destination_locations.drop_duplicates(inplace=True)

Adding continents name to destination_locations df 

In [40]:
destination_locations['continent'] = coco.convert(destination_locations['country'],to = 'continent', not_found=None)

In [41]:
# DF to append Destination location
data = {'iso_country':[],'arrival':[],'destination':[],'arrival_mode':[],'destination_mode':[],'des_num_transport':[]}
destination_result = gpd.GeoDataFrame(data)

In [42]:
'''
Function to look up the airport and port df to find coordinates of destination location
'''
def getDestinationCoordinates(df):
    iso_country = df['iso_country']
    arrival_mode = df['Mode']
    destination_mode = 'Road'
    try:   
        capital = df['capital']
        continent = df['continent']
        location = nom.geocode(capital)
        min_latitude = location.latitude
        min_longitude = location.longitude
        number_of_transport = 1
        
        if arrival_mode == 'Air':
            # Check if airport is available for this capital
            number_of_transport = number_of_transport + 1
            min_latitude,min_longitude = minDistance(final_airport_df,location,iso_country,continent)    
        elif arrival_mode == 'Ocean':
            number_of_transport = number_of_transport + 1
            min_latitude,min_longitude = minDistance(final_ports_df,location,iso_country,continent)
        
        destination_result.loc[len(destination_result.index)] = [iso_country,Point(min_latitude,min_longitude),Point(location.latitude,location.longitude),arrival_mode,destination_mode,number_of_transport]
        
    except Exception as e:
        print(e, '==', iso_country, arrival_mode)
        # The lookup failed, so no coordinates are available for this row.
        # Referencing location or min_latitude here could raise a NameError if geocoding itself failed.
        destination_result.loc[len(destination_result.index)] = [iso_country,None,None,arrival_mode,destination_mode,None]

In [None]:
# store starting time
begin = time.time()
destination_locations.apply(getDestinationCoordinates,axis=1)
end = time.time()
# total time taken
print(f"Total runtime of the program is {end - begin}")

Check if any inf or none values in arrival and destination columns

In [44]:
destination_result.loc[(destination_result['arrival'].apply(lambda p: p == None or p.x == float('inf') or p.y == float('inf'))) | (destination_result['destination'].apply(lambda p: p == None or p.x == float('inf') or p.y == float('inf')))]

Unnamed: 0,iso_country,arrival,destination,arrival_mode,destination_mode,des_num_transport


Since there are no inf or None values, we can merge the destination result with the final_df

In [45]:
final_df = final_df.merge(destination_result,left_on=['destination_iso_country','Mode'],right_on=['iso_country','arrival_mode'])

Finding out the total number of transports used.

In [46]:
final_df['total_num_of_transports'] = final_df['src_num_transport'] + final_df['des_num_transport'] - 1
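
The subtraction of 1 is there because the main leg is counted once on the source side and once on the destination side. A small sketch of that counting logic, with hypothetical helper names:

```python
def side_transport_count(main_mode):
    """Transports counted on one side (source or destination) of a shipment.

    A road leg is always counted; an Air or Ocean shipment adds the main leg.
    """
    return 2 if main_mode in ("Air", "Ocean") else 1


def total_transports(main_mode):
    # The main leg is counted on both sides, so it is subtracted once
    return 2 * side_transport_count(main_mode) - 1


for mode in ("Air", "Ocean", "Road"):
    print(mode, total_transports(mode))
```

An Air or Ocean shipment therefore has three legs (road, main leg, road), while a Road shipment is treated as a single leg from source to destination.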

In [47]:
final_df.drop(columns=['arrival_mode','departure_mode','iso_country','src_num_transport','des_num_transport'],inplace=True)

In [48]:
# Rearranging the columns in the DF
final_df = final_df[['ID','Item Description','Weight','Delivery_Date','Destination_Country','destination_iso_country','Manufacturing_site','Mode','source', 'departure',
       'source_mode', 'arrival', 'destination',
       'destination_mode', 'total_num_of_transports']]

In [49]:
final_df.head(1)

Unnamed: 0,ID,Item Description,Weight,Delivery_Date,Destination_Country,destination_iso_country,Manufacturing_site,Mode,source,departure,source_mode,arrival,destination,destination_mode,total_num_of_transports
0,12973,"Lamivudine/Nevirapine/Stavudine 30/50/6mg, dis...",0.021,2012-06-12,Haiti,HT,"Cipla, Goa, India",Air,POINT (15.359210000000076 73.93967000000004),POINT (15.3808 73.831398),Road,POINT (18.58 -72.292503),POINT (18.543490000000077 -72.33880999999997),Road,3


## Finding the total distance between source to departure, departure to arrival and arrival to destination.

In [56]:
"""
Function to find the distance between coordinates for each transportation mode 'Road', 'Ocean', 'Air'
"""
def calculateDistance(data):
    source_loc = data['source']
    departure_loc = data['departure']
    arrival_loc = data['arrival']
    destination_loc = data['destination']
    source_mode = data['source_mode']
    destination_mode = data['destination_mode']
    main_mode = data['Mode']
    num_of_transports = data['total_num_of_transports']
    try:
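        # Finding road distance
        ## This service is free.
        # The key is read from an environment variable instead of being hard-coded in the notebook
        # (the variable name ORS_API_KEY is an assumption, use whatever name you export)
        import os
        client = openrouteservice.Client(key=os.environ['ORS_API_KEY'])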
#         print(num_of_transports)
        
        if num_of_transports == 1:
            data['dist_source_to_departure'] = 0
            data['dist_arrival_to_destination'] = 0
            try:
#                 print(num_of_transports)
                # Complete road travel
                coords = ((source_loc.y,source_loc.x), (destination_loc.y,destination_loc.x))
                res = client.directions(coords)
                distance = res['routes'][0]['summary']['distance']/1000
                data['dist_departure_to_arrival'] = distance/1.60934
#                 print('Transportation from source to destination in km', distance)
            except Exception as e:
                print(e, data['source'])
                data['dist_departure_to_arrival'] = None
                
        else:
            # Transportation from source to departure
            if source_mode == 'Road':
                try:
                    coords = ((source_loc.y,source_loc.x),(departure_loc.y,departure_loc.x))
                    res = client.directions(coords)
                    distance = res['routes'][0]['summary']['distance']/1000
#                     print('Transportation from source to departure distance in km', distance)
                    data['dist_source_to_departure'] = distance/1.60934
                except Exception as e:
#                     print(e, data['source'])
                    data['dist_source_to_departure'] = None
                
            # Transportation from departure to arrival
            if main_mode == 'Air':
                # great_circle expects (latitude, longitude); the Points store latitude in x and longitude in y
                distance = great_circle((departure_loc.x,departure_loc.y), (arrival_loc.x,arrival_loc.y)).km
#                 print("Aerial Distance", distance)
                # Kilometres converted to miles
                data['dist_departure_to_arrival'] = distance/1.60934
            else:
                # searoute expects [longitude, latitude]
                origin = [departure_loc.y,departure_loc.x]
                destination = [arrival_loc.y,arrival_loc.x]
                route = sr.searoute(origin, destination)
                # Route length converted from kilometres to miles
                distance = route.properties['length']/1.60934
#                 print("Ocean Distance {:.1f} {}".format(route.properties['length'], route.properties['units']))
                data['dist_departure_to_arrival'] = distance
                
            # Transportation from arrival to destination
            if destination_mode == 'Road':
                try:
                    coords = ((arrival_loc.y,arrival_loc.x), (destination_loc.y,destination_loc.x))
                    res = client.directions(coords)
                    distance = res['routes'][0]['summary']['distance']/1000
#                     print('Transportation from arrival to destination distance in km', distance)
                    data['dist_arrival_to_destination'] = distance/1.60934
                except Exception as e:
#                     print(e, data['destination'])
                    data['dist_arrival_to_destination'] = None
        return data
    except Exception as e:
#         print(e, data['source'])
        data['dist_source_to_departure'] = None
        data['dist_arrival_to_destination'] = None
        data['dist_departure_to_arrival'] = None
        return data

In [51]:
# Getting all the required values and removing duplicates before the computations
distance_df = final_df[['Destination_Country','Manufacturing_site','Mode', 'source', 'departure', 'source_mode', 'arrival', 'destination',
       'destination_mode', 'total_num_of_transports']]

Since the Point datatype is unhashable, I can't call drop_duplicates on these columns directly.
The workaround is to convert all the data to str, drop the duplicates, take the index of the remaining unique rows, and then select those rows from the main df.
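
The same idea can be sketched without pandas: compare rows by their string form and keep the position of the first occurrence. The `Location` class is only a stand-in for an unhashable geometry object, and `unique_positions` is a hypothetical helper.

```python
class Location:
    """Stand-in for an unhashable geometry object (hypothetical)."""
    __hash__ = None  # makes instances unhashable, like the case described above

    def __init__(self, lat, lon):
        self.lat, self.lon = lat, lon

    def __str__(self):
        return f"POINT ({self.lat} {self.lon})"


def unique_positions(rows):
    """Return positions of the first occurrence of each row, compared as strings."""
    seen = set()
    keep = []
    for position, row in enumerate(rows):
        key = tuple(str(value) for value in row)
        if key not in seen:
            seen.add(key)
            keep.append(position)
    return keep


rows = [
    ("Haiti", "Air", Location(18.58, -72.29)),
    ("Haiti", "Air", Location(18.58, -72.29)),
    ("Zambia", "Road", Location(-15.42, 28.28)),
]
print(unique_positions(rows))
```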

In [52]:
indexes = distance_df.astype(str).drop_duplicates().index

In [53]:
# The values in indexes are index labels, so select with .loc
distance_df = distance_df.loc[indexes]

In [None]:
# store starting time
begin = time.time()
results = distance_df.apply(calculateDistance,axis=1)
end = time.time()
# total time taken
print(f"Total runtime of the program is {end - begin}")

### Understanding the results from calculation of distance

In [None]:
results.isna().sum()

In [None]:
results[(results['dist_arrival_to_destination'].isna()) & (results['dist_source_to_departure'].isna())][['destination_mode','source_mode']]

There are many None values in 'dist_source_to_departure' and 'dist_arrival_to_destination'. The mode of transport for these legs is 'Road', and the missing values come from a limitation of the openrouteservice API, which cannot resolve some road routes. This could be improved by using the Google Distance Matrix API.

Let's examine why we have None values in 'dist_departure_to_arrival' column

In [None]:
results[results['dist_departure_to_arrival'].isna()]['Mode'].unique()

The mode of transport for the None values in dist_departure_to_arrival is 'Road'. This is another limitation of the openrouteservice API: 'The approximated route distance must not be greater than 6000000.0 meters'. Road routes longer than roughly 6,000 km therefore can't be calculated.

Merging result of distance calculation with the final_df

In [None]:
# Creating an array of columns to use for merging to avoid duplicate columns
cols_to_use = ['Destination_Country', 'Manufacturing_site', 'Mode',
       'dist_arrival_to_destination', 'dist_departure_to_arrival',
       'dist_source_to_departure',]

In [None]:
final_df = final_df.merge(results[cols_to_use],on=['Destination_Country','Manufacturing_site','Mode'])

In [None]:
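# Emission Factors
# The distances computed above are stored in miles (km / 1.60934),
# so these factors are applied per tonne-mile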

aircraft_EF = 1.165
waterborne_EF = 0.041
truck_EF = 0.211
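
The notebook does not state the units of these factors. Since the distances are converted to miles, one plausible reading is kg CO2e per short ton-mile. Under that assumption (and only under it), the factors could be converted to kg CO2e per tonne-km as sketched here, which would match weights in tonnes and distances in km. The helper name is hypothetical.

```python
KM_PER_MILE = 1.609344             # international mile
TONNES_PER_SHORT_TON = 0.90718474  # US short ton expressed in metric tonnes


def per_ton_mile_to_per_tonne_km(factor):
    """Convert kg CO2e per short ton-mile into kg CO2e per tonne-km."""
    return factor / (TONNES_PER_SHORT_TON * KM_PER_MILE)


# Values from the cell above; treating them as per short ton-mile is an assumption
factors_per_ton_mile = {"Air": 1.165, "Ocean": 0.041, "Road": 0.211}
factors_per_tonne_km = {mode: per_ton_mile_to_per_tonne_km(factor)
                        for mode, factor in factors_per_ton_mile.items()}

for mode, factor in factors_per_tonne_km.items():
    print(mode, round(factor, 4))
```

Whichever unit system is used, the weight, distance and factor units need to agree before the three are multiplied.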

In [None]:
"""
Function to calculate the emissions of each leg of a shipment
"""

def calculateEmissions(data):
    weight = data['Weight']
    source_mode = data['source_mode']
    destination_mode = data['destination_mode']
    main_mode = data['Mode']
    distance_leg1 = data['dist_source_to_departure']
    distance_leg2 = data['dist_departure_to_arrival']
    distance_leg3 = data['dist_arrival_to_destination']
    total_emissions = 0
    
    # pd.isna handles both None and NaN, whereas math.isnan raises on None
    if pd.isna(distance_leg1):
        data['emissions_source_to_departure'] = None
    else:
        if source_mode == 'Road':
            data['emissions_source_to_departure'] = weight * distance_leg1 * truck_EF
            total_emissions = total_emissions + data['emissions_source_to_departure']
    
    if pd.isna(distance_leg2):
        data['emissions_departure_to_arrival'] = None
    else:
        # The factor of the main leg depends on the shipment mode, not on source_mode
        if main_mode == 'Air':
            data['emissions_departure_to_arrival'] = weight * distance_leg2 * aircraft_EF
        elif main_mode == 'Ocean':
            data['emissions_departure_to_arrival'] = weight * distance_leg2 * waterborne_EF
        else:
            # Complete road travel from source to destination
            data['emissions_departure_to_arrival'] = weight * distance_leg2 * truck_EF
        total_emissions = total_emissions + data['emissions_departure_to_arrival']
    
    if pd.isna(distance_leg3):
        data['emissions_arrival_to_destination'] = None
    else:
        if destination_mode == 'Road':
            data['emissions_arrival_to_destination'] = weight * distance_leg3 * truck_EF
            total_emissions = total_emissions + data['emissions_arrival_to_destination']
    
    # No emissions could be computed for any leg: report the total as missing rather than zero
    if total_emissions == 0:
        total_emissions = None
    data['total_emissions'] = total_emissions
    
    return data

In [None]:
final_df = final_df.apply(calculateEmissions,axis=1)
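
The per-leg logic can also be written as a small standalone function, which makes the handling of missing distances easier to test. This is a sketch and not the notebook's implementation: `shipment_emissions` and `is_missing` are hypothetical helpers, the factors are the ones defined above, and the weight and distances in the example are made up.

```python
from math import isnan


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and isnan(value))


def shipment_emissions(weight, legs, factors):
    """Return (per-leg emissions, total) for legs given as (mode, distance) pairs.

    A leg with a missing distance yields None; the total is None
    when no leg could be computed at all.
    """
    per_leg = [None if is_missing(distance) else weight * distance * factors[mode]
               for mode, distance in legs]
    known = [value for value in per_leg if value is not None]
    total = sum(known) if known else None
    return per_leg, total


factors = {"Air": 1.165, "Ocean": 0.041, "Road": 0.211}
# Made-up shipment: the road legs are known, the air leg distance is missing
per_leg, total = shipment_emissions(1.0, [("Road", 10.0), ("Air", None), ("Road", 5.0)], factors)
print(per_leg, total)
```

Returning `None` for the total only when every leg is missing keeps partially known shipments in the result while still flagging rows with no usable distance.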

## Description of each field in the final_df

0. ID - Unique ID for each transport
1. Destination_Country - The country where the final goods are transported for use.
2. Mode - This is the mode of transport from the country of manufacturing_site to the destination_country location
3. Manufacturing_site - Name of the manufacturer where the items are made.
4. Weight - The total weight of the item being transported, in tonnes (converted from KG)
5. Delivery_Date - The date on which the delivery was completed
6. Item Description - Description of the item being transported
7. source - The coordinates of the manufacturing place (Latitude, Longitude)
8. departure - The coordinates of the Airport/Port nearest to the manufacturing site from which the items are transported to the destination_country (Latitude, Longitude)
9. source_mode - The mode of transport used to transfer the items from manufacturing site to the Airport/Port location
10. destination_iso_country - ISO2 name for the destination country.
11. arrival - The coordinates of the Airport/Port nearest to the destination.(Latitude, Longitude)
12. destination - The coordinates of the destination place. (Latitude, Longitude)
13. destination_mode - The mode of transport used to transfer the items from arrival location to destination.
14. total_num_of_transports - Total number of transports used to reach the destination from source.
15. dist_arrival_to_destination - Total distance from arrival to destination location, in miles.
16. dist_departure_to_arrival - Total distance from departure to arrival location, in miles.
17. dist_source_to_departure - Total distance from source to departure location, in miles.
18. emissions_source_to_departure - Total emissions from source to departure location.
19. emissions_departure_to_arrival - Total emissions from departure to arrival location.
20. emissions_arrival_to_destination - Total emissions from arrival to destination location.
21. total_emissions - Summation of the total emissions from source to destination location.

In [None]:
final_df.to_csv('./Output/pharma_scope3_category9_emissions.csv')  

In [None]:
final_df.head(5)