<a href="https://colab.research.google.com/github/rahiakela/hands-on-explainable-ai-xai-with-python/blob/main/1-explaining-artificial-intelligence-with-python/2_google_location_history.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Google Location History

We could feed the raw location history data into a black-box AI process to obtain a quick diagnosis. However, most users do not trust AI systems that explain nothing, especially in life-and-death situations. We must therefore build a component that can explain how and why we used Google's Location History data.



## Setup

In [None]:
%%shell

apt install -y proj-bin libproj-dev libgeos-dev
pip install https://github.com/matplotlib/basemap/archive/v1.1.0.tar.gz
pip install -U git+https://github.com/matplotlib/basemap.git

In [2]:
import pandas as pd
import numpy as np
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from datetime import datetime as dt

In [None]:
!wget https://github.com/PacktPublishing/Hands-On-Explainable-AI-XAI-with-Python/raw/master/Chapter01/Location_History.zip

In [4]:
import zipfile

with zipfile.ZipFile('/content/Location_History.zip', 'r') as zip_ref:
  zip_ref.extractall('/content/')

We now read the file and display the number of rows in the data file.

In [5]:
df_gps = pd.read_json("/content/Location_History.json")
print("There are {:,} rows in the location history dataset".format(len(df_gps)))

There are 123,143 rows in the location history dataset


## Processing the data for XAI and basemap

Before we can display the location history records, we must parse the nested records, convert their values, and drop the columns we no longer need.

We will parse the latitudes, longitudes, and timestamps stored inside the `locations` column:

In [7]:
df_gps["lat"] = df_gps["locations"].map(lambda x: x["latitudeE7"])
df_gps["lon"] = df_gps["locations"].map(lambda x: x["longitudeE7"])
df_gps["timestamp_ms"] = df_gps["locations"].map(lambda x: x["timestampMs"])
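
To make the shape of the nested records concrete, here is a minimal sketch that runs on a two-record in-memory sample rather than the real file. The sample values are copied from rows of this dataset, the names `sample`, `df_sample`, and `flat` are illustrative, and the real export may carry more fields per record. It also shows `pd.json_normalize` as one possible alternative to the three `.map` calls.

```python
import pandas as pd

# Hypothetical two-record sample mimicking the structure of Location_History.json
sample = {
    "locations": [
        {"timestampMs": "1468992488806", "latitudeE7": 482688285, "longitudeE7": 41040263},
        {"timestampMs": "1468992760000", "latitudeE7": 482922011, "longitudeE7": 41480267},
    ]
}

# One column named "locations" whose cells are dicts
df_sample = pd.DataFrame(sample)

# Same extraction pattern as the cell above: pull one key out of each dict
df_sample["lat"] = df_sample["locations"].map(lambda x: x["latitudeE7"])

# Alternative: flatten every key of every record into its own column in one call
flat = pd.json_normalize(sample["locations"])
print(flat.columns.tolist())
```

The `.map` approach keeps only the keys we ask for, which is convenient when records do not all share the same fields; `json_normalize` is shorter when we want every field.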

In [8]:
df_gps

| | locations | lat | lon | timestamp_ms |
|---|---|---|---|---|
| 0 | {'timestampMs': '1468992488806', 'latitudeE7':... | 482688285 | 41040263 | 1468992488806 |
| 1 | {'timestampMs': '1468992524778', 'latitudeE7':... | 482688285 | 41040263 | 1468992524778 |
| 2 | {'timestampMs': '1468992760000', 'latitudeE7':... | 482922011 | 41480267 | 1468992760000 |
| 3 | {'timestampMs': '1468992775000', 'latitudeE7':... | 482906350 | 41523165 | 1468992775000 |
| 4 | {'timestampMs': '1468992924000', 'latitudeE7':... | 482960132 | 41664239 | 1468992924000 |
| ... | ... | ... | ... | ... |
| 123138 | {'timestampMs': '1553429840319', 'latitudeE7':... | 506184756 | 30720526 | 1553429840319 |
| 123139 | {'timestampMs': '1553430033166', 'latitudeE7':... | 506187677 | 30719292 | 1553430033166 |
| 123140 | {'timestampMs': '1553430209458', 'latitudeE7':... | 506189316 | 30719503 | 1553430209458 |
| 123141 | {'timestampMs': '1553514237945', 'latitudeE7':... | 482745797 | 40919255 | 1553514237945 |


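As you can see, the data must be transformed before we can pass it to basemap. The latitudes and longitudes are stored as integers scaled by 10^7 (the `E7` suffix), and the timestamps are strings of milliseconds since the epoch. In this form, the data meets neither the standard of XAI nor the input requirements of basemap.

We need decimal degrees for the latitudes and longitudes. We also need to convert the timestamps to date-time values.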

In [9]:
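# Convert E7 integers to decimal degrees
df_gps["lat"] = df_gps["lat"] / 10. ** 7
df_gps["lon"] = df_gps["lon"] / 10. ** 7

# Convert millisecond strings to seconds since the epoch
df_gps["timestamp_ms"] = df_gps["timestamp_ms"].astype(float) / 1000

# Format each timestamp as a date-time string (dt.fromtimestamp uses the local time zone)
df_gps["datetime"] = df_gps["timestamp_ms"].map(lambda x: dt.fromtimestamp(x).strftime("%Y-%m-%d %H:%M:%S"))
# Year span of the dataset as "first-last"
date_range = "{}-{}".format(df_gps["datetime"].min()[:4], df_gps["datetime"].max()[:4])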
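
As a point of comparison, the same two conversions can be sketched in vectorized form. This is a minimal example under stated assumptions, not the notebook's method: `df_demo` is a hypothetical two-row sample in the pre-conversion format (values copied from the rows shown earlier), and it deliberately differs from the cell above by producing timezone-aware UTC timestamps instead of local-time strings.

```python
import pandas as pd

# Hypothetical sample in the pre-conversion format
df_demo = pd.DataFrame({
    "lat": [482688285, 482922011],
    "lon": [41040263, 41480267],
    "timestamp_ms": ["1468992488806", "1468992760000"],
})

# E7 integers -> decimal degrees
df_demo["lat"] = df_demo["lat"] / 1e7
df_demo["lon"] = df_demo["lon"] / 1e7

# Millisecond strings -> UTC timestamps, without a Python-level lambda
df_demo["datetime"] = pd.to_datetime(
    df_demo["timestamp_ms"].astype("int64"), unit="ms", utc=True
)

print(df_demo[["lat", "lon", "datetime"]])
```

Keeping the column as real timestamps rather than formatted strings makes later filtering by date straightforward, while fixing the time zone to UTC makes the result independent of the machine that runs the notebook.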

Before displaying some of the records in our location history, we will drop the columns we no longer need.

In [10]:
df_gps = df_gps.drop(columns=["locations", "timestamp_ms"])

df_gps[1000:1005]

| | lat | lon | datetime |
|---|---|---|---|
| 1000 | 49.010427 | 2.567411 | 2016-07-29 21:16:01 |
| 1001 | 49.011505 | 2.567486 | 2016-07-29 21:16:31 |
| 1002 | 49.011341 | 2.566974 | 2016-07-29 21:16:47 |
| 1003 | 49.011596 | 2.568414 | 2016-07-29 21:17:03 |
| 1004 | 49.011756 | 2.570905 | 2016-07-29 21:17:19 |
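
Before handing these coordinates to basemap, a quick range check can catch a missed E7 conversion. The following is a minimal sketch on the five rows displayed above; the helper name `check_degrees` is hypothetical and not part of the notebook.

```python
import pandas as pd

# The five rows displayed above, re-entered by hand for a self-contained example
rows = pd.DataFrame({
    "lat": [49.010427, 49.011505, 49.011341, 49.011596, 49.011756],
    "lon": [2.567411, 2.567486, 2.566974, 2.568414, 2.570905],
})

def check_degrees(df):
    """Return True if every lat is in [-90, 90] and every lon is in [-180, 180]."""
    lat_ok = df["lat"].between(-90, 90).all()
    lon_ok = df["lon"].between(-180, 180).all()
    return bool(lat_ok and lon_ok)

ok_converted = check_degrees(rows)
# Raw E7 integers fall far outside the valid range, so the same check flags them
ok_raw = check_degrees(rows * 1e7)
print(ok_converted, ok_raw)
```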
