# Tutorial 1: Loading, Processing, Visualizing, and Storing Data

This short notebook demonstrates how `landlens_db` can be used to load, process, visualize, and store street-view imagery from local directories and from Mapillary.

## Pre-requisites

Before getting started, you will need to have [PostgreSQL](https://www.postgresql.org/download/) and [PostGIS](https://postgis.net/documentation/getting_started/) installed. 

### PostgreSQL and PostGIS
Once PostgreSQL and PostGIS are installed, you will need a PostGIS-enabled PostgreSQL database to work with. To create one, use:

```bash
createdb <database_name> && psql <database_name> -c "CREATE EXTENSION POSTGIS"
```

Be sure to replace `<database_name>` with the name you want to give your database. For example:

```bash
createdb landlens && psql landlens -c "CREATE EXTENSION POSTGIS"
```

### Mapillary Token
You will also need a Mapillary API token. This isn't necessary to use the library itself, but you will need it to follow the tutorial, which includes examples of connecting to Mapillary and downloading images.

### .env File
Finally, you will need to save this information into a `.env` file. To create a `.env` file, use:

```bash
touch .env
```

Then, paste the following:

```
MLY_TOKEN=<token>
LOCAL_IMAGES=<path_to_local_images>
DOWNLOAD_DIR=<path_to_download_images_to>
DATABASE_URL=<PostgreSQL_database_url>
DB_TABLE=<table_name>
```

Replace the text enclosed in angle brackets `<>` with your own values. For example, `MLY_TOKEN` is the Mapillary token you can obtain from a Mapillary account; these usually start with `MLY|`. The path variables are absolute paths to where files should be read from or written to. If the database was created using the earlier instructions, the database URL would look like `postgresql://localhost:5432/landlens`, assuming PostgreSQL is served on localhost on port 5432 (the default). `DB_TABLE` is the name of the table used in this tutorial and can be anything, for example `mapillary_images`.
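If you are unsure whether `DATABASE_URL` is well formed, Python's standard `urllib.parse` module can split it into its parts before you hand it to `landlens_db`. The helper below, `describe_database_url`, is a hypothetical sanity check of our own and not part of the library. It assumes the URL follows the `postgresql://[user@]host[:port]/database` pattern used above.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL connection URL into host, port, database, and user.

    Hypothetical helper (not part of landlens_db), standard library only.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # fall back to PostgreSQL's default port
        "database": parts.path.lstrip("/"),
        "user": parts.username,  # None when no user is given in the URL
    }


print(describe_database_url("postgresql://localhost:5432/landlens"))
# {'host': 'localhost', 'port': 5432, 'database': 'landlens', 'user': None}
```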

Once this is done, you should be ready to proceed with the tutorial.

If you don't have `landlens_db` installed, you can do so with `pip install landlens_db`.

```python
from landlens_db.handlers.cloud import Mapillary
from landlens_db.handlers.image import Local
from landlens_db.process.snap import create_bbox, get_osm_lines, snap_to_road_network
from landlens_db.handlers.db import Postgres
from landlens_db.geoclasses.geoimageframe import GeoImageFrame
```

Before we get started, we need to load our Mapillary API token and the other environment variables. For simplicity, we use the `python-dotenv` package (imported as `dotenv`), so please install it and create the `.env` file described above to follow this tutorial. You will also need pandas and geopandas installed to manipulate some of the data used in the tutorial.

```python
import os
import glob

import geopandas as gpd
import pandas as pd
from dotenv import load_dotenv

# Read the variables defined in .env into the process environment
load_dotenv()

MLY_TOKEN = os.environ.get("MLY_TOKEN")
LOCAL_IMAGES = os.environ.get("LOCAL_IMAGES")
DOWNLOAD_DIR = os.environ.get("DOWNLOAD_DIR")
DATABASE_URL = os.environ.get("DATABASE_URL")
DB_TABLE = os.environ.get("DB_TABLE")
```
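`os.environ.get` silently returns `None` for any variable missing from `.env`, and that usually shows up later as a confusing error. A small check like the one below reports missing values up front. `missing_env_vars` is a hypothetical helper and is not part of `landlens_db`.

```python
import os

# The variables this tutorial reads from .env
REQUIRED_VARS = ("MLY_TOKEN", "LOCAL_IMAGES", "DOWNLOAD_DIR", "DATABASE_URL", "DB_TABLE")


def missing_env_vars(names=REQUIRED_VARS, environ=os.environ):
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing from .env:", ", ".join(missing))
```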

# Loading Images

`landlens_db` provides two simple ways to load images for the first time. Once loaded, the images can be processed and stored for further analysis.

## 1. Loading images from local directory

To load images from a local directory, call `Local.load_images` with the source directory to read from. Currently, only JPEG images are supported, and it is best to provide the full (absolute) path to the directory.
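Before calling `load_images`, it can help to check which files in the directory count as JPEGs. The sketch below uses `list_jpegs`, a hypothetical helper rather than a `landlens_db` function. It assumes that both `.jpg` and `.jpeg` count, in any letter case, which matches the listing below containing both `IMG_0408.jpeg` and `R0013708.JPG`. It only looks at the top level of the directory. We make no assumption about whether `load_images` also searches subdirectories.

```python
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def list_jpegs(directory):
    """List JPEG files directly inside `directory`, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in JPEG_SUFFIXES
    )


# Example (commented out so the snippet runs without the tutorial's .env):
# print(len(list_jpegs(LOCAL_IMAGES)), "JPEG files found")
```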

```python
local_images = Local.load_images(LOCAL_IMAGES)
local_images
```

| | name | altitude | camera_type | camera_parameters | captured_at | compass_angle | exif_orientation | image_url | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 0 | IMG_0408.jpeg | 234.415619 | perspective | | 2023-03-06T11:04:19+03:00 | 316.262604 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79483 -16.33181) |
| 1 | IMG_0404.jpeg | 235.402679 | perspective | | 2023-03-06T11:04:15+03:00 | 76.32843 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79484 -16.33181) |
| 2 | IMG_0405.jpeg | 235.890045 | perspective | | 2023-03-06T11:04:15+03:00 | 45.822021 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79484 -16.33181) |
| 3 | IMG_0409.jpeg | 234.390961 | perspective | | 2023-03-06T11:04:20+03:00 | 271.465851 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79483 -16.33181) |
| 4 | R0013708.JPG | 221.110001 | 360-degree | | 2023-03-10T11:39:47+03:00 | | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.80393 -16.32318) |
| 5 | IMG_0403.jpeg | 235.64389 | perspective | | 2023-03-06T11:04:14+03:00 | 97.850014 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79484 -16.33182) |
| 6 | IMG_0410.jpeg | 234.289215 | perspective | | 2023-03-06T11:04:21+03:00 | 254.171234 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79483 -16.33181) |
| 7 | IMG_0406.jpeg | 235.161835 | perspective | | 2023-03-06T11:04:16+03:00 | 36.9245 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79484 -16.33181) |
| 8 | IMG_0407.jpeg | 234.766693 | perspective | | 2023-03-06T11:04:18+03:00 | 350.140991 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79484 -16.33181) |
| 9 | IMG_0411.jpeg | 234.289215 | perspective | | 2023-03-06T11:04:22+03:00 | 248.655853 | 1.0 | /Users/iosefa/repos/misc/SU_GCPsystem/notebook... | POINT (46.79483 -16.33181) |


The result is a `GeoImageFrame`, a simple extension of a GeoPandas `GeoDataFrame` with a few required columns and additional methods for visualization and data verification.

## 2. Loading images from Mapillary

`landlens_db` was built to work with Mapillary data. It includes helper functions that call the Mapillary API and download Mapillary data in a format `landlens_db` can use.

To fetch data from Mapillary with `landlens_db`, first initialize a Mapillary connection using your Mapillary API token.

```python
importer = Mapillary(MLY_TOKEN)
```

`landlens_db` offers a few functions for filtering Mapillary data through its API. For more advanced filtering, however, we recommend using the `mapillary-python-sdk` and converting the resulting data into a `GeoImageFrame`.
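If you fetch images with another tool, such as the Mapillary SDK, most of the work is reshaping each result into the columns a `GeoImageFrame` needs. The sketch below assumes GeoJSON-like point features whose `properties` carry an `id` and a thumbnail URL field. `features_to_records` is a hypothetical helper. The `mly|<id>` naming mirrors the `name` column in this tutorial's outputs and is not a documented rule.

```python
def features_to_records(features, url_field="thumb_1024_url"):
    """Flatten GeoJSON-like point features into records with the `name`,
    `image_url`, and `geometry` keys a GeoImageFrame needs.

    Hypothetical helper: the 'mly|<id>' naming copies the outputs shown in
    this tutorial and is an assumption, not a documented convention.
    """
    records = []
    for feature in features:
        props = feature["properties"]
        lon, lat = feature["geometry"]["coordinates"][:2]
        records.append({
            "mly_id": props["id"],
            "name": f"mly|{props['id']}",
            "image_url": props.get(url_field),  # None if the URL field is absent
            "geometry": (lon, lat),  # (x, y) = (longitude, latitude)
        })
    return records
```

The resulting records could then be turned into a GeoDataFrame, for example with shapely `Point` geometries, and wrapped in a `GeoImageFrame` in the same way this tutorial later wraps a file read with GeoPandas.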

Here is an example of how to load data using the `fetch_by_id` method of `landlens_db`:

```python
image_id = 915374089313107
image = importer.fetch_by_id(image_id)
image
```

| | altitude | atomic_scale | camera_parameters | camera_type | captured_at | compass_angle | computed_altitude | computed_compass_angle | computed_geometry | computed_rotation | ... | height | merge_cc | mesh | sequence | sfm_cluster | width | detections | mly_id | name | image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41.782 | 1.002665 | 0.61739578749889,0.26131500830183,0.1242660260... | fisheye | 2019-10-23T22:29:42+09:00 | 99.299232 | 1.795589 | 102.951814 | POINT (140.95153462743 42.329677227362) | -1.0627190885041,-0.84029284280692,-1.15538369... | ... | 3000 | 1.926644e+18 | {'id': '313263440182706', 'url': 'https://scon... | emgV_2cwMSoW9w7fkg7xJQ | {'id': '169747341731652', 'url': 'https://scon... | 4000 | {'data': [{'id': '916266259223890'}, {'id': '9... | 915374089313107 | mly\|915374089313107 | https://scontent-itm1-1.xx.fbcdn.net/m1/v/t6/A... |


By default, `landlens_db` downloads all fields from the Mapillary image endpoint and uses `thumb_1024_url` as the `image_url`. You can instead request a subset of fields with the `fields` argument, and only those fields are downloaded. Note that you must supply at least `id`, `geometry`, and one of the image URL fields.
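You can check this requirement before making any API calls. The sketch below lists the thumbnail URL fields exposed by the Mapillary API v4 image endpoint. We are assuming that `landlens_db` accepts every one of them as the image URL field. `check_fields` is a hypothetical helper.

```python
# Thumbnail URL fields exposed by the Mapillary API v4 image endpoint.
# Assumption: landlens_db accepts any of them as the image URL field.
IMAGE_URL_FIELDS = ("thumb_256_url", "thumb_1024_url", "thumb_2048_url", "thumb_original_url")


def check_fields(fields):
    """Raise ValueError if `fields` lacks `id`, `geometry`, or an image URL field."""
    missing = [f for f in ("id", "geometry") if f not in fields]
    if not any(f in fields for f in IMAGE_URL_FIELDS):
        missing.append("one of " + ", ".join(IMAGE_URL_FIELDS))
    if missing:
        raise ValueError("fields is missing: " + "; ".join(missing))
    return list(fields)


# The field list used in the next cell passes the check
check_fields(['id', 'altitude', 'captured_at', 'camera_type', 'thumb_1024_url',
              'compass_angle', 'computed_compass_angle', 'computed_geometry', 'geometry'])
```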

For example, using the `fetch_within_bbox` method of `landlens_db`:

```python
bbox = [139.59, 35.865358, 139.719, 35.882781]  # [min_lon, min_lat, max_lon, max_lat]
start = '2022-03-16'
end = '2022-03-16'  # same start and end date to restrict the query to one day
fields = ['id', 'altitude', 'captured_at', 'camera_type', 'thumb_1024_url',
          'compass_angle', 'computed_compass_angle', 'computed_geometry', 'geometry']

images = importer.fetch_within_bbox(bbox, start_date=start, end_date=end, fields=fields)
images.head()
```

| | altitude | captured_at | camera_type | compass_angle | computed_compass_angle | computed_geometry | geometry | mly_id | name | image_url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 38.64 | 2022-03-16T02:46:44.229000+09:00 | perspective | 346.376743 | 350.233731 | POINT (139.62202815944 35.87962303497) | POINT (139.62201 35.87961) | 1056391831896042 | mly\|1056391831896042 | https://scontent-itm1-1.xx.fbcdn.net/m1/v/t6/A... |
| 1 | 40.539 | 2022-03-16T18:44:07.106000+09:00 | perspective | 264.977394 | 260.561395 | POINT (139.62272895993 35.882495429369) | POINT (139.62280 35.88253) | 117242294104865 | mly\|117242294104865 | https://scontent-itm1-1.xx.fbcdn.net/m1/v/t6/A... |
| 2 | 32.324 | 2022-03-16T02:39:30.831000+09:00 | perspective | 23.902162 | 27.52459 | POINT (139.60894199878 35.866945069785) | POINT (139.60890 35.86694) | 153264953775899 | mly\|153264953775899 | https://scontent-itm1-1.xx.fbcdn.net/m1/v/t6/A... |
| 3 | 35.907 | 2022-03-16T02:48:56.295000+09:00 | perspective | 77.650447 | 80.105901 | POINT (139.6226116354 35.882501362048) | POINT (139.62258 35.88251) | 173595095002718 | mly\|173595095002718 | https://scontent-itm1-1.xx.fbcdn.net/m1/v/t6/A... |
| 4 | 43.166 | 2022-03-16T18:48:37.162000+09:00 | perspective | 159.181296 | 163.441132 | POINT (139.61837370823 35.877372117104) | POINT (139.61833 35.87740) | 1131077224318050 | mly\|1131077224318050 | https://scontent-itm1-1.xx.fbcdn.net/m1/v/t6/A... |


Note also that Mapillary image URLs are not permanent. For this reason, `landlens_db` offers a method that downloads Mapillary images and returns a new `GeoImageFrame` with `image_url` updated to each image's new local location.

```python
images = images.download_images_to_local(DOWNLOAD_DIR, filename_column='name')
images.head()
```

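Conceptually, this step fetches each URL, writes the bytes to disk, and rewrites `image_url`. The standard-library sketch below applies that idea to plain dictionaries. Several details are our own assumptions, and `landlens_db` may name files differently:

- `download_to_local` is a hypothetical helper.
- Files get a `.jpg` suffix.
- `|` in names (as in `mly|...`, which is not a valid character in Windows filenames) is replaced.

```python
import shutil
import urllib.request
from pathlib import Path


def download_to_local(records, download_dir, filename_key="name", suffix=".jpg"):
    """Copy each record's `image_url` into `download_dir` and return new records
    whose `image_url` points at the local file. Input records are not modified.

    Hypothetical sketch: the filename sanitizing and suffix are assumptions.
    """
    out_dir = Path(download_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    updated = []
    for record in records:
        # '|' (as in 'mly|...') is not allowed in Windows filenames
        safe_name = str(record[filename_key]).replace("|", "_") + suffix
        target = out_dir / safe_name
        with urllib.request.urlopen(record["image_url"]) as response, open(target, "wb") as fh:
            shutil.copyfileobj(response, fh)  # stream bytes to disk
        updated.append({**record, "image_url": str(target.resolve())})
    return updated
```

Because `urllib.request.urlopen` also accepts `file://` URLs, the function can be tried offline on local files.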

## Loading data from arbitrary sources
You can also read data from any vector file format that GeoPandas can read, such as ESRI Shapefile, GeoJSON, and GeoPackage. Alternatively, you can create a `GeoImageFrame` from data the same way you would create a GeoPandas `GeoDataFrame`, as long as the data has `name`, `image_url`, and `geometry` columns.
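When data comes from another source, a quick way to see whether it can become a `GeoImageFrame` is to compare its column names against the three required ones named above. `missing_columns` is a hypothetical helper for illustration.

```python
REQUIRED_COLUMNS = ("name", "image_url", "geometry")


def missing_columns(columns):
    """Return the required GeoImageFrame columns absent from `columns`."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS if c not in present]


# e.g. a table exported from another tool that calls its URL column "url"
print(missing_columns(["name", "url", "geometry"]))  # ['image_url']
```

You can often fix a missing column by renaming before constructing the frame, for example with pandas `DataFrame.rename(columns={'url': 'image_url'})`.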

Data can also be imported from a PostGIS-enabled PostgreSQL database. More information on creating and exporting PostgreSQL tables for `landlens_db` is given below.

When reading from PostgreSQL, it is often beneficial to load only a subset of the data, especially when the database holds tens of thousands of images or more. For this purpose, `landlens_db` provides several database utility and query functions that select only part of the data in the database.

# Processing Images

Now that we have loaded some data, we can perform some simple processing on the images. See the documentation for the processing functions currently available. Here is an example of how `landlens_db` can be used to snap images to a road network.

First, we need a road network to snap the images to. `landlens_db` also offers a helper function to download road networks from OpenStreetMap within a given bounding box.

```python
# total_bounds gives (minx, miny, maxx, maxy) over all image points
bbox = images['geometry'].total_bounds
network = get_osm_lines(bbox)
```
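`total_bounds` fits tightly around the image points. A road just outside the outermost images may therefore be missed by the download, even though it is the nearest road for some of those images. One option is to grow the box by a small margin before passing it to `get_osm_lines`.

The sketch below is hypothetical and uses plain `(x, y)` tuples. With longitude/latitude coordinates, the margin is in degrees. We assume `get_osm_lines` accepts the same `(minx, miny, maxx, maxy)` order that `total_bounds` returns, since that is what the call above passes it.

```python
def bounds_with_margin(points, margin=0.0):
    """(minx, miny, maxx, maxy) of (x, y) points, grown by `margin` on every side."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


# With a GeoImageFrame of point geometries, for example:
# bbox = bounds_with_margin(list(zip(images.geometry.x, images.geometry.y)), margin=0.001)
```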

Then, calling `snap_to_road_network` snaps every point to the closest road in the network, within the provided threshold distance. It also adds a new geometry column, `snapped_geometry`, to the `GeoImageFrame` to hold the snapped points.

```python
snap_to_road_network(images, 100, network)  # 100 is the threshold distance
```
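The core of snapping is projecting a point onto the nearest segment of a polyline. The sketch below illustrates that idea in plain Python and is not `landlens_db`'s implementation. It assumes planar coordinates, for example a projected CRS in meters. With longitude/latitude, the distances and the threshold would be in degrees.

```python
import math


def snap_point(point, lines, threshold):
    """Return ((x, y), distance) for the closest location on any polyline,
    or None if that location is farther than `threshold`.

    `lines` is a list of polylines, each a list of (x, y) vertices.
    Planar coordinates only (hypothetical sketch).
    """
    px, py = point
    best = None
    for line in lines:
        for (ax, ay), (bx, by) in zip(line, line[1:]):
            dx, dy = bx - ax, by - ay
            seg_len2 = dx * dx + dy * dy
            if seg_len2 == 0:
                t = 0.0  # degenerate segment: both ends coincide
            else:
                # Position of the perpendicular foot along the segment, clamped to [0, 1]
                t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
                t = max(0.0, min(1.0, t))
            sx, sy = ax + t * dx, ay + t * dy
            dist = math.hypot(px - sx, py - sy)
            if best is None or dist < best[1]:
                best = ((sx, sy), dist)
    if best is None or best[1] > threshold:
        return None
    return best


print(snap_point((1, 1), [[(0, 0), (2, 0)]], threshold=5))  # ((1.0, 0.0), 1.0)
```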

## Snapping to a local road network

You can also load your own road network and snap to it. When doing this, make sure that every image point is within a reasonable distance of some road in your network and that the threshold is set appropriately. If some images are far from any road but still need to be snapped to the closest one, set the threshold high enough to reach them.

Here is an example of how this can be achieved:

```python
roads_path = 'data/roads/*.shp'
road_files = glob.glob(roads_path)  # every shapefile in data/roads/
roads = [gpd.read_file(road) for road in road_files]
# Combine into one road GeoDataFrame; the files should share a CRS
network = pd.concat(roads, ignore_index=True)
snap_to_road_network(images, 100, network, realign_camera=True)
```

# Visualizing Images

`landlens_db` provides a simple way to visualize `GeoImageFrame`s interactively using Folium. The `map` method of a `GeoImageFrame` plots every image as a marker on a map. Clicking a marker displays the image together with any metadata columns listed in the `additional_properties` argument. Geometries listed in `additional_geometries` are plotted as extra markers.

```python
images.map(
    additional_properties=['altitude', 'camera_type'],
    additional_geometries=[
        {'geometry': 'computed_geometry', 'angle': 'computed_compass_angle', 'label': 'Computed'},
        {'geometry': 'snapped_geometry', 'angle': 'snapped_angle', 'label': 'Snapped'},
    ]
)
```
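Each entry in `additional_geometries` pairs a geometry column with an angle column. To draw a viewing direction at a point, one option is to compute the end point of a short line along the compass bearing. The helper below is a hypothetical sketch and not necessarily how `map` draws directions. It assumes angles are compass bearings in degrees clockwise from north, the convention of Mapillary's `compass_angle`. It works in planar coordinates, which is usually tolerable for short display arrows in longitude/latitude away from the poles.

```python
import math


def bearing_endpoint(x, y, compass_angle_deg, length):
    """End point of a segment of `length` starting at (x, y) and pointing along a
    compass bearing (degrees clockwise from north). Planar coordinates."""
    theta = math.radians(compass_angle_deg)
    # North is +y and east is +x, so sin gives the x offset and cos the y offset
    return (x + length * math.sin(theta), y + length * math.cos(theta))
```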

# Storing Images

`GeoImageFrame` data can be stored in a variety of formats. Because it is built on the GeoPandas `GeoDataFrame` class, any `GeoDataFrame` method for saving data can be used. For instance, to save a table as a GeoPackage, we simply call:

```python
images.to_file('data/images_tutorial.gpkg')
```

However, in the current version, a vector file read back from disk is a plain `GeoDataFrame`, so you must wrap it in a `GeoImageFrame` to use the features of `landlens_db`. For example:

```python
images_gdf = gpd.read_file('data/images_tutorial.gpkg')
images = GeoImageFrame(images_gdf)
```

## Saving to a PostgreSQL Database

`landlens_db` can also store data in a PostGIS-enabled PostgreSQL database by extending the GeoPandas `to_postgis` method. When storing data, some constraints are applied automatically, such as unique `image_url` values, along with some data validity checks. See the documentation for details.
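Duplicate `image_url` values violate the uniqueness constraint, so checking for them first can save a failed write. The sketch below does this with the standard library; `duplicate_values` is a hypothetical helper. On a `GeoImageFrame`, the pandas call `images['image_url'].duplicated()` gives a boolean mask for the same check.

```python
from collections import Counter


def duplicate_values(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)  # Counter keeps insertion order (Python 3.7+)
    return [v for v in counts if counts[v] > 1]


# Usage on a GeoImageFrame (commented out so the snippet runs on its own):
# dupes = duplicate_values(images['image_url'])
```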

To save a `GeoImageFrame` to a PostgreSQL table, you first need to open a connection to a PostgreSQL database. You can do this with the `Postgres` class:

```python
db_con = Postgres(DATABASE_URL)
```

This database must already exist and have the PostGIS extension enabled.

Then, you can save using `to_postgis`:

```python
images.to_postgis(DB_TABLE, db_con.engine, if_exists="replace")
```

### Updating an Existing Table

When saving to PostgreSQL, you can choose how to handle existing tables. `to_postgis` offers the same `fail`, `replace`, and `append` options for `if_exists` that GeoPandas offers. However, `append` requires that none of the incoming data conflicts with existing data. Alternatively, you can "upsert" (insert or update) data into an existing table using the `upsert_images` method of the `Postgres` class. To update conflicting records, pass `"update"` as the `conflict` argument; to skip them, pass `"nothing"`.

```python
db_con.upsert_images(images, DB_TABLE, conflict='update')
```
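In PostgreSQL, an upsert like this is typically expressed as `INSERT ... ON CONFLICT`, with `DO UPDATE` or `DO NOTHING` matching the two `conflict` choices. The sketch below builds such a statement. It is not `landlens_db`'s actual SQL. The conflict target is assumed to be the unique `image_url` column, and `build_upsert_sql` is a hypothetical helper. Table and column names are interpolated directly, so they must come from trusted code.

```python
import sqlite3


def build_upsert_sql(table, columns, key="image_url", conflict="update", placeholder="%s"):
    """Build an INSERT ... ON CONFLICT statement; `conflict` is "update" or "nothing".

    Identifiers are not escaped: only pass trusted table and column names.
    """
    cols = ", ".join(columns)
    values = ", ".join([placeholder] * len(columns))
    sql = f"INSERT INTO {table} ({cols}) VALUES ({values}) ON CONFLICT ({key}) "
    if conflict == "nothing":
        return sql + "DO NOTHING"
    if conflict == "update":
        # EXCLUDED refers to the row that failed to insert
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
        return sql + "DO UPDATE SET " + updates
    raise ValueError("conflict must be 'update' or 'nothing'")


# SQLite (3.24+) accepts the same upsert syntax with '?' placeholders,
# which makes the behavior easy to try locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (image_url TEXT UNIQUE, altitude REAL)")
upsert = build_upsert_sql("images", ["image_url", "altitude"], placeholder="?")
conn.execute(upsert, ("a.jpg", 10.0))
conn.execute(upsert, ("a.jpg", 55.0))  # same image_url: altitude is updated
print(conn.execute("SELECT image_url, altitude FROM images").fetchall())
# [('a.jpg', 55.0)]
```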

### Querying an Existing Table

You can also load and filter data from existing PostgreSQL tables. `landlens_db` offers simple filter functions that query a table and return a subset of its data, which is important when working with very large datasets. For example, to load all images with an altitude greater than 50:

```python
# Query only the rows whose altitude exceeds 50
high_altitude_images = db_con.table(DB_TABLE).filter(altitude__gt=50).all()

high_alt_map = high_altitude_images.map(
    additional_properties=['altitude', 'camera_type'],
    additional_geometries=[
        {'geometry': 'geometry', 'angle': 'compass_angle', 'label': 'Base Geometry'},
    ],
)
# Save the interactive map to an HTML file
high_alt_map.save('data/query_map.html')
```

```python
high_alt_map
```
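The `altitude__gt=50` keyword follows the `column__operator` naming style familiar from Django lookups. The sketch below shows how such keywords could be translated into a parameterized SQL `WHERE` clause. `lookups_to_where` is hypothetical, and the operator names other than `gt` are our assumptions, not `landlens_db`'s documented behavior. Column names are quoted as PostgreSQL identifiers but must still come from trusted code. Values are passed separately as query parameters.

```python
OPERATORS = {"exact": "=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def lookups_to_where(lookups, placeholder="%s"):
    """Translate {'altitude__gt': 50}-style lookups into (where_clause, params)."""
    clauses, params = [], []
    for key, value in lookups.items():
        column, _, op = key.partition("__")
        op = op or "exact"  # a bare column name means equality
        if op not in OPERATORS:
            raise ValueError(f"unsupported lookup: {key!r}")
        clauses.append(f'"{column}" {OPERATORS[op]} {placeholder}')
        params.append(value)
    return " AND ".join(clauses), params


where, params = lookups_to_where({"altitude__gt": 50})
print(f"SELECT * FROM images WHERE {where}", params)
# SELECT * FROM images WHERE "altitude" > %s [50]
```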