## Calculating the distance Anthony Bourdain traveled

### Data from: https://data.world/makeovermonday/2018w33-anthony-bourdains-travels

In [None]:
import numpy as np  # Needed for the NaN check when computing distances
import pandas as pd  # Data processing
import mpu  # Haversine distance calculation

def Calc_Dist(row):
    # mpu.haversine_distance returns the distance in kilometers between two (latitude, longitude) points.
    distance = mpu.haversine_distance(
        (row.Latitude, row.Longitude),
        (row.Prev_Latitude, row.Prev_Longitude))

    # The result is in kilometers, so convert it to miles and round to two decimal places.
    distance = round(distance * 0.621371, 2)

    return distance
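
For anyone curious what the haversine calculation does under the hood, here is a minimal pure-Python sketch of the standard formula. It is not mpu's actual implementation: the function name `haversine_km` is made up for illustration, and the Earth radius of 6371 km is an assumed mean value, so results may differ slightly from the library's.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the library may use a slightly different constant

def haversine_km(origin, destination):
    """Great-circle distance in kilometers between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine term: squared half-chord length between the two points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Convert the angular distance to kilometers
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Two example coordinate pairs, converted to miles the same way Calc_Dist does
first_stop = (35.689487, 139.691706)
second_stop = (35.096276, 139.071705)
km = haversine_km(first_stop, second_stop)
miles = round(km * 0.621371, 2)
print(km, miles)
```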

In [None]:
data = pd.read_csv("Map_data.csv")
data.head()

The data is very thorough, and luckily we don't have to do much to shape it.

Still, we need two new variables to calculate the haversine distance: the latitude and longitude of the previous episode's location, which is where Bourdain traveled from. We can achieve this by using the shift function on our pandas data frame. This function offsets the data by a given number of rows. Depending on the direction, the results are usually referred to as lag or lead variables.

In this case, we're going to shift the data down one row within each show, so we're lagging the longitude and latitude.
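
To make the lag/lead idea concrete, here is a small sketch on a made-up frame (the `toy` data and the `Next_Latitude` column are hypothetical, purely for illustration). Grouping by show before shifting means the first row of each show gets NaN rather than a value carried over from a different show.

```python
import pandas as pd

# Made-up data: three episodes of show "A", two of show "B"
toy = pd.DataFrame({
    "Show": ["A", "A", "A", "B", "B"],
    "Latitude": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# shift(1) pulls the value from the previous row in the same show (a lag)
toy["Prev_Latitude"] = toy.groupby("Show").Latitude.shift(1)
# shift(-1) pulls the value from the next row in the same show (a lead)
toy["Next_Latitude"] = toy.groupby("Show").Latitude.shift(-1)

print(toy)
```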

In [42]:
data['Prev_Latitude'] = data.groupby('Show').Latitude.shift(1)
data['Prev_Longitude'] = data.groupby('Show').Longitude.shift(1)
data[['Latitude', 'Longitude', 'Prev_Latitude', 'Prev_Longitude']].head()

| | Latitude | Longitude | Prev_Latitude | Prev_Longitude |
|---|---|---|---|---|
| 0 | 35.689487 | 139.691706 | NaN | NaN |
| 1 | 35.096276 | 139.071705 | 35.689487 | 139.691706 |
| 2 | 10.823099 | 106.629664 | 35.096276 | 139.071705 |
| 3 | 15.933589 | 103.449284 | 10.823099 | 106.629664 |
| 4 | 11.556374 | 104.92821 | 15.933589 | 103.449284 |


As you can see, the data has been shifted so that each row shows what the previous location was. Before we move on, I'm going to shift some other variables that I'll use in the visualization, and create a new variable that numbers each episode within its show.

In [None]:
data['Prev_City'] = data.groupby('Show').City.shift(1)
data['Prev_State'] = data.groupby('Show').State.shift(1)
data['Prev_Country'] = data.groupby('Show').Country.shift(1)
data['Episode In Series'] = data.groupby('Show').cumcount() + 1
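
As a quick illustration of how the episode counter works, here is a sketch on a made-up frame (the `toy` data is hypothetical). `cumcount()` numbers the rows of each group starting at 0 in their existing order, so adding 1 gives a 1-based episode number, even when the shows are interleaved.

```python
import pandas as pd

# Made-up data with two shows whose episodes are interleaved
toy = pd.DataFrame({"Show": ["A", "A", "B", "A", "B"]})

# Number each row within its show, starting from 1
toy["Episode In Series"] = toy.groupby("Show").cumcount() + 1

print(toy)
```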

Finally, we are ready to calculate the distance. The function we created above will work fine, unless it's the first episode of a show. In that case there is no previous location, so we need to return 0 since Bourdain hasn't traveled anywhere yet.

We can do this with a simple conditional expression within a list comprehension. We iterate over each row of the data, and if the previous longitude is not a number (NaN), we return 0; otherwise we calculate the distance. The resulting list becomes a new column in our dataframe. Then we write the data to a CSV, and visualize!

In [41]:
data['Distance'] = [0 if np.isnan(row.Prev_Longitude) else Calc_Dist(row) for _, row in data.iterrows()]

data.to_csv("DataWithDistance.csv")
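
The row-by-row loop above is fine for a dataset of this size. For larger data, one possible alternative is to compute every distance at once with NumPy. This is only a sketch of that swapped-in approach, not the method used above: `haversine_miles` is a hypothetical helper, the 6371 km Earth radius is an assumed value (so results may differ slightly from mpu's), and the `demo` frame just reuses the first three rows from the table shown earlier.

```python
import numpy as np
import pandas as pd

KM_TO_MILES = 0.621371
EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Vectorized haversine distance in miles; inputs are array-likes of degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(x, dtype=float)) for x in (lat1, lon1, lat2, lon2)
    )
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a)) * KM_TO_MILES

# First three rows of the shifted data shown earlier
demo = pd.DataFrame({
    "Latitude": [35.689487, 35.096276, 10.823099],
    "Longitude": [139.691706, 139.071705, 106.629664],
    "Prev_Latitude": [np.nan, 35.689487, 35.096276],
    "Prev_Longitude": [np.nan, 139.691706, 139.071705],
})

# NaN inputs propagate to NaN distances, which fillna(0) turns into 0
# for the first episode of a show
distances = haversine_miles(
    demo.Latitude, demo.Longitude, demo.Prev_Latitude, demo.Prev_Longitude
)
demo["Distance"] = pd.Series(distances, index=demo.index).fillna(0).round(2)

print(demo)
```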