### CH3/05 Calculate distance

In [3]:
```python
import pandas as pd
import numpy as np
```

Let's calculate how slowly I jog. For **speed we need time and distance**:

$\text{Speed} = \frac{\text{Distance}}{\text{Time}}$

We have latitude and longitude, and we're going to **cheat and use Euclidean distance instead of distance on a sphere**. 
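For comparison, here is a minimal sketch of the distance on a sphere that the notebook skips: the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km, and `haversine_km` is a hypothetical helper, not part of the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree along a meridian is about 111 km on this model
print(round(haversine_km(0, 0, 1, 0), 1))  # 111.2
```

For the few-metre hops between consecutive GPS fixes, a flat approximation is usually close enough, which is why the notebook can get away with the cheat.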

So first we load the data with `pd.read_csv`, parsing the `time` column as dates. 

In [4]:
```python
csv_file = 'track.csv'
df = pd.read_csv(csv_file, parse_dates=['time'])
df
```

| Unnamed: 0 | time | lat | lng | height |
|---|---|---|---|---|
| 0 | 2015-08-20 03:48:07.235 | 32.519585 | 35.015021 | 136.199997 |
| 1 | 2015-08-20 03:48:24.734 | 32.519606 | 35.014954 | 126.599998 |
| 2 | 2015-08-20 03:48:25.660 | 32.519612 | 35.014871 | 123.000000 |
| 3 | 2015-08-20 03:48:26.819 | 32.519654 | 35.014824 | 120.500000 |
| 4 | 2015-08-20 03:48:27.828 | 32.519689 | 35.014776 | 118.900002 |
| ... | ... | ... | ... | ... |
| 735 | 2015-08-20 04:20:28.982 | 32.517020 | 35.014387 | 104.800003 |
| 736 | 2015-08-20 04:20:29.923 | 32.517035 | 35.014355 | 105.199997 |
| 737 | 2015-08-20 04:20:32.863 | 32.517087 | 35.014279 | 102.900002 |
| 738 | 2015-08-20 04:20:33.994 | 32.517098 | 35.014264 | 102.400002 |
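The `Unnamed: 0` header in the output suggests that `track.csv` was saved together with its index. A minimal sketch of how `parse_dates` and `index_col=0` behave, using a few rows copied from the output as an in-memory stand-in for the file (assumption: the file's first column is an unnamed index):

```python
import io

import pandas as pd

# Hypothetical in-memory CSV mimicking the first rows of track.csv
# (assumption: the file starts with an unnamed index column).
csv_text = """,time,lat,lng,height
0,2015-08-20 03:48:07.235,32.519585,35.015021,136.199997
1,2015-08-20 03:48:24.734,32.519606,35.014954,126.599998
2,2015-08-20 03:48:25.660,32.519612,35.014871,123.000000
"""

# parse_dates turns 'time' into datetime64 values;
# index_col=0 reuses the unnamed first column as the index instead of a data column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'], index_col=0)
print(df.dtypes)
```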


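To work in kilometers we need the size of a degree. A degree of **latitude** is about **111 kilometers** anywhere on Earth, while a degree of **longitude** shrinks toward the poles: it is about 111 km × cos(latitude), roughly **94 kilometers** at this track's latitude of about 32.5°. The cell below uses **92 km for latitude** and **111 km for longitude**, so the two constants are effectively swapped; the outputs that follow were computed with these values. Either way it's a rough approximation, but it's good enough for a sanity-check estimate of jogging speed. 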

In [5]:
```python
lat_km = 92
lng_km = 111
```
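The longitude scale depends on where you are. A minimal sketch, assuming a spherical Earth where a degree of latitude is about 111 km and a degree of longitude is that times cos(latitude); `km_per_degree` is a hypothetical helper, not from the notebook:

```python
import math

KM_PER_DEG_LAT = 111.0  # approx. km per degree of latitude (nearly constant)


def km_per_degree(lat_deg):
    """Return approximate (km per degree of latitude, km per degree of longitude).

    Spherical-Earth approximation: degrees of longitude shrink by cos(latitude).
    """
    lng_km = KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    return KM_PER_DEG_LAT, lng_km


# Latitude of the track, taken from the first row of the data
lat_km, lng_km = km_per_degree(32.519585)
print(round(lat_km), round(lng_km))  # 111 94
```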

Next we **define a `distance` function: it takes the deltas in latitude and longitude, multiplies them by these constants**, and then uses NumPy's `np.hypot` to get the Euclidean distance. Running this cell produces no output, but now we have the function.

In [6]:
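```python
def distance(lat1, lng1, lat2, lng2):
    """Approximate distance in km between two (lat, lng) points, treated as flat x/y."""
    delta_lat = (lat1 - lat2) * lat_km  # degrees -> km
    delta_lng = (lng1 - lng2) * lng_km
    return np.hypot(delta_lat, delta_lng)  # sqrt(delta_lat**2 + delta_lng**2)
```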
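Because `np.hypot` is a NumPy ufunc, the same `distance` function accepts plain floats or whole arrays (and pandas Series) and works element-wise. A small sketch with made-up coordinates, reusing the notebook's constants:

```python
import numpy as np

lat_km = 92   # same constants as the notebook
lng_km = 111


def distance(lat1, lng1, lat2, lng2):
    delta_lat = (lat1 - lat2) * lat_km
    delta_lng = (lng1 - lng2) * lng_km
    return np.hypot(delta_lat, delta_lng)  # sqrt(delta_lat**2 + delta_lng**2)


# Works on plain floats...
print(distance(1.0, 0.0, 0.0, 0.0))  # 92.0

# ...and element-wise on whole arrays, with no Python loop
lats = np.array([0.0, 1.0, 2.0])
lngs = np.array([0.0, 0.0, 1.0])
steps = distance(lats[1:], lngs[1:], lats[:-1], lngs[:-1])
print(steps)
```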

Let's take the latitude and longitude of **one point** and of the **next one**, rows 200 and 201, and calculate the distance between them.

In [7]:
```python
lat1, lng1 = df.loc[200, 'lat'], df.loc[200, 'lng']
lat2, lng2 = df.loc[201, 'lat'], df.loc[201, 'lng']
distance(lat1, lng1, lat2, lng2)
```

```
np.float64(0.009249671616168792)
```

That's about 0.009 km, roughly 9 meters. Now we need the distance between every pair of consecutive rows. Remember, we want to **avoid for loops as much as possible**, so let's use the `shift` method. 

Here's a small example of how `shift` works. We create a **series of five elements**, zero to four:

In [8]:
```python
s = pd.Series(np.arange(5))
s
```

```
0    0
1    1
2    2
3    3
4    4
dtype: int32
```

If we run `shift()`, the first element becomes `NaN` and every element is **shifted downward** by one. The dtype changes to `float64` because `NaN` is a float value. 

In [9]:
```python
s.shift()
```

```
0    NaN
1    0.0
2    1.0
3    2.0
4    3.0
dtype: float64
```

Shift also works in the other direction. **If we pass -1, the `NaN` ends up at the bottom** and every element is shifted up. 

In [10]:
```python
s.shift(-1)
```

```
0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
dtype: float64
```
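Subtracting a shifted copy pairs each element with the previous one, which is exactly what `diff` computes (the notebook uses `diff` on the timestamps further on). A quick check on a small series:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5)) ** 2  # 0, 1, 4, 9, 16

# Each element minus the one before it (first result is NaN)
manual = s - s.shift()

# diff() gives the same values, NaN included
print(manual.equals(s.diff()))  # True
```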

Now we use `shift` to get all the distances at once: we pass the latitude and longitude together with the **shifted latitude and longitude**, so each row is compared with the previous one. Running this gives `NaN` again for the first value, followed by distances in kilometers. 

In [11]:
```python
dist = distance(
    df['lat'], df['lng'],
    df['lat'].shift(), df['lng'].shift(),
)
dist[:5]
```

```
0         NaN
1    0.007684
2    0.009230
3    0.006492
4    0.006225
dtype: float64
```
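To see that the `shift` trick computes the same thing as a loop over consecutive rows, here is a small check on the first three fixes from the track (the loop is there only for comparison; the constants are the notebook's):

```python
import numpy as np
import pandas as pd

lat_km, lng_km = 92, 111  # same constants as the notebook


def distance(lat1, lng1, lat2, lng2):
    return np.hypot((lat1 - lat2) * lat_km, (lng1 - lng2) * lng_km)


# First three fixes from the track
df = pd.DataFrame({
    'lat': [32.519585, 32.519606, 32.519612],
    'lng': [35.015021, 35.014954, 35.014871],
})

# Vectorized: compare each row with the previous one via shift()
vectorized = distance(df['lat'], df['lng'], df['lat'].shift(), df['lng'].shift())

# Loop version, for comparison only
looped = [np.nan] + [
    distance(df.loc[i, 'lat'], df.loc[i, 'lng'], df.loc[i - 1, 'lat'], df.loc[i - 1, 'lng'])
    for i in range(1, len(df))
]

print(np.allclose(vectorized, looped, equal_nan=True))  # True
```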

As a sanity check, let's **sum all of the distances: about 4.7 kilometers.** That seems about right. 

In [12]:
```python
dist.sum()
```

```
np.float64(4.693669332948701)
```
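The leading `NaN` from `shift` doesn't spoil the total because `Series.sum` skips missing values by default (`skipna=True`). A tiny illustration with made-up distances:

```python
import numpy as np
import pandas as pd

dist = pd.Series([np.nan, 0.5, 1.25])

print(dist.sum())              # 1.75, the NaN is skipped
print(dist.sum(skipna=False))  # nan
```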

Now we'd like to **calculate the differences in times.** For this we use `diff`. Running it, we again get `NaT` (not a time) for the first value, followed by the **time differences: about 17.5 seconds, 0.9 seconds, 1.2 seconds, and so on.** 

In [13]:
```python
times = df['time'].diff()
times[:5]
```

```
0                      NaT
1   0 days 00:00:17.499000
2   0 days 00:00:00.926000
3   0 days 00:00:01.159000
4   0 days 00:00:01.009000
Name: time, dtype: timedelta64[ns]
```
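If you'd rather see the differences as plain numbers of seconds, `Series.dt.total_seconds()` converts a timedelta column to floats, with `NaT` becoming `NaN`. A sketch using the first two timestamps from the data:

```python
import pandas as pd

# First two timestamps from the track
times = pd.Series(pd.to_datetime(['2015-08-20 03:48:07.235',
                                  '2015-08-20 03:48:24.734']))

deltas = times.diff()                 # NaT, then a Timedelta of 17.499 s
secs = deltas.dt.total_seconds()      # NaN, then ~17.499 as a float
print(secs.tolist())
```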

Let's sanity-check the times as well by summing the differences: the whole run took **32 minutes and 35 seconds**.

In [14]:
```python
times.sum()
```

```
Timedelta('0 days 00:32:35.094000')
```
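The two totals already give a rough overall pace. A sketch that plugs in the totals printed by `dist.sum()` and `times.sum()`:

```python
import pandas as pd

# Totals reported by the notebook cells
total_km = 4.693669332948701
total_time = pd.Timedelta('0 days 00:32:35.094000')

# Timedelta / Timedelta gives a plain float: here, the duration in hours
total_hours = total_time / pd.Timedelta(1, 'hour')
avg_speed = total_km / total_hours
print(f'{avg_speed:.1f} km/h')  # 8.6 km/h
```

That works out to roughly 8.6 km/h, or about 7 minutes per kilometer.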

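To get speed in kilometers per hour, we need the time differences in hours, so we divide `times` by `pd.Timedelta(1, 'hour')`. **Now we see the numbers as fractions of an hour.** Finally, dividing the distances by these values gives the speed in kilometers per hour. 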

In [15]:
```python
times_hour = times / pd.Timedelta(1, 'hour')
times_hour[:5]
```

```
0         NaN
1    0.004861
2    0.000257
3    0.000322
4    0.000280
Name: time, dtype: float64
```
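Putting the pieces together, here is a minimal, self-contained sketch of the speed calculation itself: distance divided by time in hours. It uses the first three rows of the track as sample data and the notebook's constants.

```python
import numpy as np
import pandas as pd

lat_km, lng_km = 92, 111  # same constants as the notebook


def distance(lat1, lng1, lat2, lng2):
    return np.hypot((lat1 - lat2) * lat_km, (lng1 - lng2) * lng_km)


# First three rows of the track
df = pd.DataFrame({
    'time': pd.to_datetime(['2015-08-20 03:48:07.235',
                            '2015-08-20 03:48:24.734',
                            '2015-08-20 03:48:25.660']),
    'lat': [32.519585, 32.519606, 32.519612],
    'lng': [35.015021, 35.014954, 35.014871],
})

dist = distance(df['lat'], df['lng'], df['lat'].shift(), df['lng'].shift())  # km
times_hour = df['time'].diff() / pd.Timedelta(1, 'hour')                     # hours

speed = dist / times_hour  # km/h; the first row is NaN
print(speed.round(2).tolist())
```

The first step comes out around 1.6 km/h, but the second, over a 0.9-second gap, comes out around 36 km/h, far faster than a jog. Point-to-point speeds from raw GPS fixes are noisy, likely because of a few metres of position jitter.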

[Context_Python_Scientific_Stack](./../../Context_Python_Scientific_Stack.md)