# <div align="center">GOOGLE LOCATIONS</div>

To download your Google location history, go to https://takeout.google.com/settings/takeout

Scroll down and select only Location history here:

![Imgur](https://i.ibb.co/64F8fq3/Untitled.png)

After that, wait a few minutes for the download link to arrive in your Gmail inbox, then download the archive.

The archive consists of one file and one directory:

- **Location History.json**
- **Semantic Location History directory**

The first .json file contains every location Google has recorded for you since you activated the Location History service in your account. It is a JSON file that, for the most part, has the following structure:

*`{
  "locations" : [ {
    "timestampMs" : "1575051363494",
    "latitudeE7" : 575202012,
    "longitudeE7" : 227607747,
    "accuracy" : 299,
    "altitude" : 26,
    "verticalAccuracy" : 52,
    "activity" : [ {
      "timestampMs" : "1575051362890",
      "activity" : [ {
        "type" : "IN_VEHICLE",
        "confidence" : 90
      }, {
        "type" : "IN_ROAD_VEHICLE",
        "confidence" : 90
      }, {
        "type" : "IN_FOUR_WHEELER_VEHICLE",
        "confidence" : 52
      }, {
        "type" : "IN_TWO_WHEELER_VEHICLE",
        "confidence" : 37
      }, {
        "type" : "IN_CAR",
        "confidence" : 27
      }, {
        "type" : "IN_BUS",
        "confidence" : 25
      }, {
        "type" : "IN_RAIL_VEHICLE",
        "confidence" : 5
      }, {
        "type" : "UNKNOWN",
        "confidence" : 2
      }, {
        "type" : "ON_BICYCLE",
        "confidence" : 1
      }, {
        "type" : "ON_FOOT",
        "confidence" : 1
      }, {
        "type" : "WALKING",
        "confidence" : 1
      }, {
        "type" : "STILL",
        "confidence" : 1
      } ]
    }`*

Essentially, I will be extracting the following values:
- **timestampMs**: time in milliseconds since 1970-01-01 00:00:00 UTC (the Unix epoch), stored as a string
- **latitudeE7**: integer that has to be divided by 1e7 to get decimal degrees
- **longitudeE7**: same as latitudeE7
- **accuracy**: location accuracy in meters (the radius within which Google believes my location to be)
- **altitude**: altitude of that location (not sure how Google measures it)
- **velocity**: movement speed (not present in every record)
- **heading**: presumably the direction of travel in degrees (I am not sure about this one; not present in every record)
- **activity_type**: type of activity, sorted by confidence (I will be taking the first one if it exists)
- **activity_confidence**: confidence of that activity type
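
As a quick illustration of the conversions listed above, here is a minimal sketch that decodes a single raw record using only the standard library. The helper name `decode_record` and the constant `EPOCH` are my own and are not part of the export; the sample values are copied from the JSON excerpt above, with the activity list shortened to its first entry.

```python
from datetime import datetime, timedelta, timezone

# Reference point for timestampMs: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_record(record):
    """Convert one raw location record into human-readable values (hypothetical helper)."""
    # "activity" is optional; take the first listed entry if there is one.
    activities = record.get("activity") or [{}]
    top = (activities[0].get("activity") or [{}])[0]
    return {
        # timestampMs is a string of milliseconds, so cast it to int first.
        "date": EPOCH + timedelta(milliseconds=int(record["timestampMs"])),
        # E7 values are degrees multiplied by 10**7.
        "latitude": record["latitudeE7"] / 1e7,
        "longitude": record["longitudeE7"] / 1e7,
        "activity_type": top.get("type"),
        "activity_confidence": top.get("confidence"),
    }

# Values copied from the sample record above (activity list shortened).
sample = {
    "timestampMs": "1575051363494",
    "latitudeE7": 575202012,
    "longitudeE7": 227607747,
    "activity": [{
        "timestampMs": "1575051362890",
        "activity": [{"type": "IN_VEHICLE", "confidence": 90}],
    }],
}

decoded = decode_record(sample)
print(decoded)
```

Records without an `activity` key simply get `None` for both activity fields here, which plays the same role as the `NaN` fallback in the pandas version.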

Here is the function that forms the dataset:

In [20]:
import pandas as pd
import json
import numpy as np

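def extract_activity(record):
    # "activity" is optional; fall back to NaN when it is missing or empty.
    try:
        return record["activity"][0]["activity"][0]["type"]
    except (KeyError, IndexError, TypeError):
        return np.nan

def extract_activity_confidence(record):
    # Same lookup as above, but returns the confidence of the first activity.
    try:
        return record["activity"][0]["activity"][0]["confidence"]
    except (KeyError, IndexError, TypeError):
        return np.nan
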
def location_extractor(path):
    # The argument is named `path` so that it does not shadow the imported json module.
    df = pd.DataFrame()
    file = pd.read_json(path)
    locations = file.locations
    # timestampMs is stored as a string, so cast it to int before converting.
    df["date"] = locations.apply(lambda x: pd.to_datetime(int(x["timestampMs"]), unit="ms"))
    # E7 coordinates are degrees multiplied by 1e7.
    df["longitude"] = locations.apply(lambda x: x["longitudeE7"] / 1e7)
    df["latitude"] = locations.apply(lambda x: x["latitudeE7"] / 1e7)
    df["location_accuracy"] = locations.apply(lambda x: x["accuracy"])
    # The remaining fields are optional, so missing values become NaN.
    df["velocity"] = locations.apply(lambda x: x.get("velocity", np.nan))
    df["heading"] = locations.apply(lambda x: x.get("heading", np.nan))
    df["altitude"] = locations.apply(lambda x: x.get("altitude", np.nan))
    df["activity_type"] = locations.apply(extract_activity)
    df["activity_confidence"] = locations.apply(extract_activity_confidence)
    return df
 
location_history = location_extractor(r"\Google locations\Location History\Location History.json")
location_history.head(5)

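|   | date | longitude | latitude | location_accuracy | velocity | heading | altitude | activity_type | activity_confidence |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-09-27 22:38:10.599 | 24.071285 | 56.962056 | 1172 | NaN | NaN | NaN | NaN | NaN |
| 1 | 2013-09-27 22:39:10.708 | 24.071285 | 56.962056 | 1172 | NaN | NaN | NaN | NaN | NaN |
| 2 | 2013-09-27 22:40:10.734 | 24.071285 | 56.962056 | 1172 | NaN | NaN | NaN | NaN | NaN |
| 3 | 2013-09-27 22:41:10.743 | 24.071285 | 56.962056 | 1172 | NaN | NaN | NaN | NaN | NaN |
| 4 | 2013-09-27 22:42:10.863 | 24.071285 | 56.962056 | 1172 | NaN | NaN | NaN | NaN | NaN |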


The structure of files in Semantic Location History is slightly more complex and looks like this:

*`{
  "timelineObjects" : [ {
    "activitySegment" : {
      "startLocation" : {
        "latitudeE7" : 569574458,
        "longitudeE7" : 241346403
      },
      "endLocation" : {
        "latitudeE7" : 569776090,
        "longitudeE7" : 241367443
      },
      "duration" : {
        "startTimestampMs" : "1572589201469",
        "endTimestampMs" : "1572590899942"
      },
      "distance" : 2492,
      "activityType" : "WALKING",
      "confidence" : "UNKNOWN_CONFIDENCE",
      "activities" : [ {
        "activityType" : "WALKING",
        "probability" : 95.6399142742157
      }, {
        "activityType" : "IN_TRAM",
        "probability" : 1.3046897947788239
      }, {
        "activityType" : "STILL",
        "probability" : 1.209926512092352
      } ],
      "waypointPath" : {
        "waypoints" : [ {
          "latE7" : 569576797,
          "lngE7" : 241342391
        }, {
          "latE7" : 569576339,
          "lngE7" : 241321563
        }, {
          "latE7" : 569581108,
          "lngE7" : 241311073
        } ]
      },
      "simplifiedRawPath" : {
        "points" : [ {
          "latE7" : 569581149,
          "lngE7" : 241311210,
          "timestampMs" : "1572589443606",
          "accuracyMeters" : 24
        } ]
      }
    }
  }, {
    "placeVisit" : {
      "location" : {
        "latitudeE7" : 569776726,
        "longitudeE7" : 241367168,
        "placeId" : "ChIJt6oH5K7P7kYRSmmiyZQSLlQ",
        "address" : "Duntes iela 6\nVidzemes priekšpilsēta, Rīga, LV-1013\nLatvija",
        "name" : "Circle K Business Centre",
        "sourceInfo" : {
          "deviceTag" : 113987519
        },
        "locationConfidence" : 95.3105
      },
      "duration" : {
        "startTimestampMs" : "1572590899942",
        "endTimestampMs" : "1572623228427"
      },
      "placeConfidence" : "HIGH_CONFIDENCE",
      "centerLatE7" : 569780429,
      "centerLngE7" : 241369450,
      "visitConfidence" : 93,
      "otherCandidateLocations" : [ {
        "latitudeE7" : 569783000,
        "longitudeE7" : 241360100,
        "placeId" : "ChIJ7RQLH6_P7kYRPGaLbztEwXA",
        "locationConfidence" : 1.2885653
      }, {
        "latitudeE7" : 569793400,
        "longitudeE7" : 241365700,
        "placeId" : "ChIJffu9Lq_P7kYRTxdXGqcpwlE",
        "locationConfidence" : 0.71848404
      }, {
        "latitudeE7" : 569780320,
        "longitudeE7" : 241367160,
        "placeId" : "ChIJQz5wzajP7kYRDQMEicbxJrA",
        "locationConfidence" : 0.58375335
      }, {
        "latitudeE7" : 569780320,
        "longitudeE7" : 241367161,
        "placeId" : "ChIJeWnNLcjP7kYRDI1Qb2tXFsY",
        "locationConfidence" : 0.4604294
      } ],
      "editConfirmationStatus" : "NOT_CONFIRMED"
    }`*
    
From these data I will be extracting the following (all from the *`placeVisit`* portion):

- **startTimestampMs**: again, time in milliseconds since 1970-01-01 00:00:00 UTC
- **endTimestampMs**: same as above
- **longitudeE7**: integer that has to be divided by 1e7 to get decimal degrees
- **latitudeE7**: same as longitudeE7
- **name**: name of the location from Google Maps
- **address**: address of the location
- **locationConfidence**: how confident Google's engine is about the location (as a percentage)
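
Before the full function, the per-record mapping can be illustrated on a single `placeVisit` entry. This is only a sketch using the standard library; `summarize_visit` and `EPOCH` are hypothetical names of mine, and the sample values are copied from the excerpt above.

```python
from datetime import datetime, timedelta, timezone

# Reference point for the millisecond timestamps: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def summarize_visit(place_visit):
    """Flatten one placeVisit dict into the fields listed above (hypothetical helper)."""
    location = place_visit["location"]
    duration = place_visit["duration"]
    # Timestamps are strings of milliseconds, so cast them to int first.
    start_ms = int(duration["startTimestampMs"])
    end_ms = int(duration["endTimestampMs"])
    return {
        "start_date": EPOCH + timedelta(milliseconds=start_ms),
        "end_date": EPOCH + timedelta(milliseconds=end_ms),
        "duration_sec": round((end_ms - start_ms) / 1000),
        "longitude": location["longitudeE7"] / 1e7,
        "latitude": location["latitudeE7"] / 1e7,
        "location_name": location.get("name"),
        # Addresses contain line breaks; join them into a single line.
        "location_address": (location.get("address") or "").replace("\n", " "),
        "location_confidence": location.get("locationConfidence"),
    }

# Values copied from the placeVisit excerpt above.
sample_visit = {
    "location": {
        "latitudeE7": 569776726,
        "longitudeE7": 241367168,
        "address": "Duntes iela 6\nVidzemes priekšpilsēta, Rīga, LV-1013\nLatvija",
        "name": "Circle K Business Centre",
        "locationConfidence": 95.3105,
    },
    "duration": {
        "startTimestampMs": "1572590899942",
        "endTimestampMs": "1572623228427",
    },
}

visit = summarize_visit(sample_visit)
print(visit["location_name"], visit["duration_sec"])
```

For this sample the visit lasts roughly nine hours (32328 seconds after rounding).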

Here is the function to do so:

In [37]:
import pandas as pd
import json
import os

def get_links(path):
    file_list = []
    for root, dirs, files in os.walk(path):
        for file in files:
            file_location = os.path.join(root, file)
            file_list.append(file_location)
    return file_list

def placeVisit_filter(path):
    files = get_links(path)
    placeVisit_data = []
    for file in files:
        json_file = pd.read_json(file)
        location_dicts = json_file.timelineObjects
        for placeVisit_dict in location_dicts:
            # Entries are either "activitySegment" or "placeVisit"; keep only the latter.
            if isinstance(placeVisit_dict, dict) and placeVisit_dict.get("placeVisit"):
                placeVisit_data.append(placeVisit_dict["placeVisit"])
    return placeVisit_data

def df_creator(path):
    data = placeVisit_filter(path)
    df = pd.DataFrame()
    df["start_date"] = pd.to_datetime(list(map(lambda x: x.get("duration").get("startTimestampMs"), data)), unit="ms")
    df["end_date"] = pd.to_datetime(list(map(lambda x: x.get("duration").get("endTimestampMs"), data)), unit="ms")
    df["duration_sec"] = round((df["end_date"] - df["start_date"]).dt.total_seconds()).astype(dtype="int64")
    df["longitude"] = list(map(lambda x: x.get("location").get("longitudeE7") / 1e7, data))
    df["latitude"] = list(map(lambda x: x.get("location").get("latitudeE7") / 1e7, data))
    df["location_name"] = list(map(lambda x: x.get("location").get("name"), data))
    df["location_address"] = list(map(lambda x: x.get("location").get("address"), data))
    df["location_address"] = df["location_address"].str.replace("\n", " ")
    df["location_confidence"] = list(map(lambda x: x.get("location").get("locationConfidence"), data))
    df["location_confidence"] = df["location_confidence"].round(1)
    return df.dropna().sort_values(by="start_date").reset_index(drop=True)

location_names = df_creator(r"\Google locations\Location History\Semantic Location History")
location_names.head(5)

|   | start_date | end_date | duration_sec | longitude | latitude | location_name | location_address | location_confidence |
|---|---|---|---|---|---|---|---|---|
| 0 | 2013-09-27 23:10:10.817 | 2013-09-28 11:24:19.297 | 44048 | 24.056465 | 56.97221 | Dzirciema iela 115 | Dzirciema iela 115 Kurzemes rajons, Rīga, LV-1... | 99.5 |
| 1 | 2013-09-28 13:04:18.764 | 2013-09-28 13:17:09.724 | 771 | 24.036267 | 56.929672 | Shopping Centre Spice | Lielirbes iela 29 Zemgales priekšpilsēta, Rīga... | 66.7 |
| 2 | 2013-09-28 17:50:14.861 | 2013-09-29 14:18:19.643 | 73685 | 24.056465 | 56.97221 | Dzirciema iela 115 | Dzirciema iela 115 Kurzemes rajons, Rīga, LV-1... | 99.5 |
| 3 | 2013-09-29 20:37:05.962 | 2013-09-30 05:10:31.854 | 30806 | 24.056465 | 56.97221 | Dzirciema iela 115 | Dzirciema iela 115 Kurzemes rajons, Rīga, LV-1... | 99.6 |
| 4 | 2013-09-30 06:25:56.386 | 2013-09-30 11:33:33.334 | 18457 | 24.137327 | 56.977953 | Hospitāļu iela 55 | Hospitāļu iela 55 Vidzemes priekšpilsēta, Rīga... | 99.8 |


Both datasets are loaded into Microsoft Power BI and visualized. In the report below you can select any period or single date. For that selection the report counts all unique locations and groups them by hour intervals (more locations = more activity), puts the geo tags on the map, and displays how much time was spent at the locations that Google could identify:

In [38]:
from IPython.display import IFrame
pbi_report = IFrame('https://bit.ly/3bnjNFQ', width=1215, height=720)
pbi_report
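
The hourly count shown in the report can be approximated directly in pandas. The sketch below is not how the Power BI report is built; it only illustrates the idea of counting distinct coordinate pairs per hour of day for a selected period. The function name `unique_locations_per_hour` is hypothetical, and the small `sample` frame is made up for illustration (it reuses the column names produced by `location_extractor`).

```python
import pandas as pd

def unique_locations_per_hour(df, start, end):
    """Count distinct (latitude, longitude) pairs per hour of day within [start, end)."""
    mask = (df["date"] >= pd.Timestamp(start)) & (df["date"] < pd.Timestamp(end))
    period = df.loc[mask, ["date", "latitude", "longitude"]].copy()
    period["hour"] = period["date"].dt.hour
    # A location counts once per hour, no matter how many times it was logged.
    distinct = period.drop_duplicates(subset=["hour", "latitude", "longitude"])
    return distinct.groupby("hour").size()

# Illustrative data only: four readings at two distinct coordinates.
sample = pd.DataFrame({
    "date": pd.to_datetime([
        "2013-09-27 22:38:10", "2013-09-27 22:39:10",
        "2013-09-27 22:40:10", "2013-09-27 23:10:10",
    ]),
    "latitude": [56.962056, 56.962056, 56.972210, 56.972210],
    "longitude": [24.071285, 24.071285, 24.056465, 24.056465],
})

hourly = unique_locations_per_hour(sample, "2013-09-27", "2013-09-28")
print(hourly)
```

Note that this treats coordinates as distinct only when they match exactly; rounding them first would merge nearby readings, which may be closer to what one wants for noisy data.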

661K locations over the last 7 years, of which only about 5.5 years were actually tracked. It is scary to think what FANG corporations have on us.