# Visualizing Mapillary Data on GCP

This short notebook demonstrates how to access and visualize data that was imported from Mapillary and stored on GCP.


### Authenticating with Google Cloud

Before we can run this notebook on a new device, we have to authenticate with Google Cloud. To do so, run the following command in a terminal window (you shouldn't need to do this if you have already authenticated on this device):

```
gcloud auth application-default login 
```

You will also need to create a service account JSON key. This can be done as follows (for example):

```
gcloud iam service-accounts keys create ~/su-sa-default-private-key.json --iam-account=555121831052-compute@developer.gserviceaccount.com
```

Now we can continue with importing the Python modules that we will use.

### Setting Python Directory

Because the `src` module is loaded from this repository rather than from an installed package, we need to add the absolute path of the repo to `sys.path`. This can be done using:

In [None]:
# note, to load the local module, it might be necessary to add the repository directory to the sys path:
import os
import sys
sys.path.insert(0, os.path.abspath(".."))
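
If the cell above is re-run several times, the same directory is inserted into `sys.path` repeatedly. A minimal sketch of an idempotent variant is shown below; the helper name `add_to_sys_path` is hypothetical and not part of this repository, and `".."` assumes the notebook lives one level below the repository root.

```python
import os
import sys


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path unless it is already listed."""
    absolute = os.path.abspath(directory)
    if absolute not in sys.path:
        sys.path.insert(0, absolute)
    return absolute


# ".." assumes the notebook sits one level below the repository root.
repo_root = add_to_sys_path("..")
print(repo_root in sys.path)
```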

Now we can import the required Python modules and classes.

In [None]:
from src.controller import MapillaryImage
from src.visualizer import Visualize
from src.utils import DataUtils

## Setting Environment Variables

For the importer module to access both Google Storage and the Mapillary API, the following environment variables must be set. Replace the placeholders in `<>` with actual values before running the next block.

In [None]:
%env DATABASE_URL=<DATABASE_URL>
%env SERVICE_ACCOUNT_PRIVATE_KEY=<PATH_TO_PRIVATE_KEY>
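
Before querying, it can help to confirm that both variables were actually set and that no `<PLACEHOLDER>` text was left in place. The following is a minimal sketch using only the standard library; the helper `missing_env_vars` is hypothetical and not part of the `src` module.

```python
import os

REQUIRED_VARS = ("DATABASE_URL", "SERVICE_ACCOUNT_PRIVATE_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset, empty, or still a <PLACEHOLDER>."""
    environ = os.environ if environ is None else environ
    missing = []
    for name in names:
        value = environ.get(name, "").strip()
        # Treat values such as "<DATABASE_URL>" as not yet filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


# An empty list means both variables look usable.
print(missing_env_vars())
```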

## Querying Data

To visualize the data, we need to query the database and load the result into a Python data structure. The `MapillaryImage` controller class contains functions that return data from the database in various formats. We will use it to construct a GeoPandas dataframe of images that all belong to the same sequence.

First, initialize the controller:

In [None]:
imgs = MapillaryImage()

Then, query by sequence ID:

In [None]:
data = imgs.select_by_sequence_id('vSDgMY3PXlEz6wLvvUcoCQ')

We can observe the total number of rows of the data and take a look at the first few rows:

In [None]:
print(f"The number of rows returned is: {len(data.index)}")
data.head()

## Visualize Data

We will use the `Visualize` class and Folium maps to visualize the data. Before we can get started, we must initialize the `Visualize` class:

In [None]:
vis = Visualize(data)

Then, it is simply a matter of calling `map`. Note that if this notebook is being accessed from an unauthorized connection (i.e. not from a signed-in Google Chrome session), it will be necessary to set `use_signed_urls=True` when calling `map`. By default, `use_signed_urls` is `False` because generating signed URLs creates significant overhead and slows down the mapping function considerably.

Some image details can be viewed when clicking on map markers.

In [None]:
vis = Visualize(data)
vis.map(use_signed_urls=True)

Note that the map renders poorly for large datasets, so please query, or subset, only the data you want to visualize. As a rule of thumb, keep the number of rows below about 5000 when not using signed URLs, or below about 1000 when using signed URLs.
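
One way to stay within the rule of thumb above is to keep only every n-th row of a large result. The sketch below computes such a step from the row count. The limits are taken from the text, while the helper `thinning_step` and the idea of thinning by a fixed step are illustrative assumptions, not something the `Visualize` class does for you.

```python
import math

SIGNED_URL_LIMIT = 1000    # approximate limit when use_signed_urls=True
UNSIGNED_URL_LIMIT = 5000  # approximate limit when use_signed_urls=False


def thinning_step(n_rows, use_signed_urls=False):
    """Return n such that keeping every n-th row stays within the limit."""
    limit = SIGNED_URL_LIMIT if use_signed_urls else UNSIGNED_URL_LIMIT
    return max(1, math.ceil(n_rows / limit))


step = thinning_step(12000, use_signed_urls=True)
print(step)  # 12
```

With a dataframe in hand, the step could then be applied as `data.iloc[::step].reset_index(drop=True)` before passing the result to `Visualize`.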

## Visualize Subsets

It is also possible to filter the queried data using GeoPandas. For example, we can take a larger query (such as one over a larger bounding box) and visualize only the points where the difference between the Mapillary default geometry and the Mapillary computed geometry is greater than 20 meters. We can also choose to plot both the `geometry` markers and the `computed_geometry` markers using the `additional_geometries` parameter.

In [None]:
# set parameters and query data
bbox = [141.028,42.292,141.121,42.444]
data = imgs.select_within_bbox(bbox)

subset = data.loc[data.to_crs(3857).distance(data['computed_geometry'].to_crs(3857)) > 20]
print(f"The number of rows returned is: {len(subset.index)}")

# it is necessary to reset the index when passing a subset of a geopandas dataframe to the Visualize class
# (reassigning avoids modifying a slice of `data` in place)
subset = subset.reset_index(drop=True)

# visualize
vis = Visualize(subset)
vis.map(use_signed_urls=True, additional_geometries=['computed_geometry'])
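
A note on units: EPSG:3857 (Web Mercator) stretches distances by roughly `1 / cos(latitude)`, so at the latitude of this bounding box a threshold of 20 projected units likely corresponds to less than 20 m on the ground. The sketch below, which uses only the standard library, shows the scale factor and a great-circle (haversine) distance for comparison. Both helper names are hypothetical, and the two sample points are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def web_mercator_scale(lat):
    """Approximate factor by which EPSG:3857 inflates distances at `lat`."""
    return 1.0 / math.cos(math.radians(lat))


# Mid-latitude of the bounding box used above.
mid_lat = (42.292 + 42.444) / 2
scale = web_mercator_scale(mid_lat)
print(f"scale factor at {mid_lat:.3f} degrees: {scale:.3f}")
print(f"20 projected units are about {20 / scale:.1f} m on the ground")

# Two made-up points inside the bounding box.
print(f"{haversine_m(141.0500, 42.3500, 141.0503, 42.3500):.1f} m")
```

If a true 20 m threshold matters, one option is to reproject to a local projected CRS (for example the appropriate UTM zone) before calling `distance`.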

It is also possible to pass different map tiles to the `map` function. The `map` function is just a wrapper for `folium.Map()`, so all tiles that are available in Folium can be used here. For example:

In [None]:
vis.map(use_signed_urls=True, additional_geometries=['computed_geometry'], tiles='Stamen Toner', zoom_start=16)

## Download Images

Finally, it is also possible to download images from GCP storage to your local device. You can use the utility module either to download a single image or to create a list of images that you can then download with `gsutil`. If you need to download more than 10 images, using `gsutil` is strongly advised.

To download imagery within a particular bounding box and time range, first create the dataframe, then download the data. Using the same `bbox` defined above, we can define the time range and then initialize the `DataUtils` class:

In [None]:
start_date = '2019-10-22'
end_date = '2019-10-24'

data_for_download = imgs.select_within_bbox_dates(bbox, start_date, end_date)

dutil = DataUtils(data_for_download)

print(f'There are {len(data_for_download.index)} images that can be downloaded')
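
Because the query takes a raw list and two date strings, a small sanity check before calling it can catch swapped coordinates or dates early. The sketch below assumes the bounding box order is `[min_lon, min_lat, max_lon, max_lat]` (which matches the values used here but is not confirmed by the controller) and that dates are ISO `YYYY-MM-DD` strings. Both helpers are hypothetical.

```python
from datetime import date


def validate_bbox(bbox):
    """Check a bbox, assuming [min_lon, min_lat, max_lon, max_lat] in degrees."""
    min_lon, min_lat, max_lon, max_lat = bbox
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError(f"invalid longitude range: {min_lon}, {max_lon}")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError(f"invalid latitude range: {min_lat}, {max_lat}")
    return list(bbox)


def validate_date_range(start_date, end_date):
    """Parse two ISO date strings and check that start is not after end."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start, end


checked_bbox = validate_bbox([141.028, 42.292, 141.121, 42.444])
start, end = validate_date_range("2019-10-22", "2019-10-24")
print(checked_bbox, (end - start).days)
```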

To download all 5081 images, you first need to create a text file that lists every image to download:

In [None]:
dutil.write_image_list('./foo.txt')

Then, you can download the images with `gsutil` like so:

```
cat foo.txt | gsutil -m cp -I ./foo
```
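
The `-I` flag makes `gsutil cp` read the objects to copy from standard input, one per line. Before starting a large download, the list file can be checked with a short script such as the sketch below. It assumes that `write_image_list` writes one `gs://` URI per line, which is not confirmed here; the helper `check_image_list` is hypothetical, and the bucket and object names in the demo are made up.

```python
import tempfile
from pathlib import Path


def check_image_list(path):
    """Return (number of entries, entries that do not look like gs:// URIs)."""
    lines = Path(path).read_text().splitlines()
    uris = [line.strip() for line in lines if line.strip()]
    malformed = [uri for uri in uris if not uri.startswith("gs://")]
    return len(uris), malformed


# Demo with a throwaway file; the bucket and object names are made up.
with tempfile.TemporaryDirectory() as tmp_dir:
    list_path = Path(tmp_dir) / "foo.txt"
    list_path.write_text(
        "gs://example-bucket/images/1.jpg\n"
        "gs://example-bucket/images/2.jpg\n"
        "not-a-uri\n"
    )
    count, malformed = check_image_list(list_path)

print(count, malformed)
```

For the real file, `check_image_list('./foo.txt')` should report the same count as the dataframe and an empty list of malformed entries. The destination directory (`./foo` above) may also need to exist before running `gsutil`.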

You can also download a single image using the API. For this, you need to know the ID of the image you want to download. Note that the function only looks in the data that was used to initialize the `DataUtils` instance.

In [None]:
dutil.download_gcp_image(804730633805031, './foo')