Author: Luca Pappalardo

Geospatial Analytics, Master degree in Data Science and Business Informatics, University of Pisa

# Geospatial Analytics - Lesson 2: Fundamental Concepts

In this lesson, we will learn how to handle spatial data in Python using Shapely, Geopandas, and scikit-mobility.

1. [Shapely](#shapely)
    - [Point](#point)
    - [LineString](#linestring)
    - [Polygon](#polygon)
    - [Geometry Collections](#geometrycollections)
    - [Practice](#practice)
2. [Geopandas](#geopandas)
    - [Reading a shapefile](#readingshapefile)
    - [Geometries](#geometriesgeopandas)
    - [Writing data](#writingdata)
    - [Practice](#practicegeopandas)
3. [Folium](#folium)
    - [Creating a map](#creatingmap)
    - [Markers](#markers)
    - [Overlays](#overlays)
4. [scikit-mobility](#scikitmobility)
    - [Data Structures](#datastructures)
        - [TrajDataFrame](#trajdataframe)
        - [Tessellation](#tessellation)
        - [FlowDataFrame](#flowdataframe)

<a id='shapely'></a>
# Shapely

In [**Shapely**](https://shapely.readthedocs.io/en/stable/manual.html), the most fundamental geometric objects are `Points`, `Lines` and `Polygons`, the basic ingredients when working with spatial data in vector data format. 

Python has a specific module called `Shapely` for doing various geometric operations. Basic knowledge of using Shapely is fundamental for understanding how geometries are stored and handled in `GeoPandas` and `scikit-mobility`.

Geometric objects consist of coordinate tuples where:

- `Point` object: represents a single point in space. Points can be either two-dimensional $(x, y)$ or three dimensional $(x, y, z)$

- `LineString` object: (i.e., a line) represents a sequence of points joined together to form a line. Hence, a line consists of a list of at least two coordinate tuples

- `Polygon` object: represents a filled area that consists of a list of at least three coordinate tuples that form the exterior ring and an optional list of holes

It is also possible to have a collection of geometric objects (e.g., `Polygon`s with multiple parts):

- `MultiPoint` object: represents a collection of `Point`s and consists of a list of coordinate-tuples

- `MultiLineString` object: represents a collection of `LineString`s and consists of a list of line-like sequences

- `MultiPolygon` object: represents a collection of `Polygon`s and consists of a list of polygon-like sequences, each made of an exterior ring and an optional list of holes

![](https://autogis-site.readthedocs.io/en/latest/_images/SpatialDataModel.PNG)

<a id="point"></a>
## Point
Creating a point is easy: pass the $x$ and $y$ coordinates (and optionally a $z$ coordinate) to the `Point()` constructor:

In [None]:
# Import necessary geometric objects from shapely module
from shapely.geometry import Point, LineString, Polygon

In [None]:
# Create Point geometric object(s) with coordinates
point1 = Point(2.2, 4.2)
point2 = Point(7.2, -25.1)
point3 = Point(9.26, -2.456)
point3D = Point(9.26, -2.456, 0.57)

In [None]:
point1

In [None]:
type(point1)

The type of the point is Shapely’s `Point`. Shapely geometries are built on GEOS, a C++ library that is one of the standard libraries behind various GIS software, e.g., QGIS.

In [None]:
print(point1)

In [None]:
print(point3D)

### Point attributes and functions
`Point`s and other Shapely objects have useful built-in attributes and methods. Using the available attributes, we can for example extract the coordinate values of a `Point` and calculate the Euclidean distance between points.

The `geom_type` attribute contains information about the geometry type of the Shapely object:

In [None]:
point1.geom_type

Extracting the coordinates of a `Point` can be done in a couple of different ways:

- The `coords` attribute contains the coordinates as a `CoordinateSequence`, a list-like Shapely data type
- The `x` and `y` attributes return the coordinates directly as plain decimal numbers

In [None]:
# Get xy coordinate tuple
list(point1.coords)

In [None]:
# Read x and y coordinates separately
x = point1.x
y = point1.y
x, y

It is also possible to calculate the distance between two objects using the `distance` method. In our example, the distance is calculated in a Cartesian coordinate system. When working with real GIS data, the distance is expressed in the units of the coordinate reference system in use (e.g., meters for many projected systems, degrees for WGS84).

Let’s calculate the distance between `point1` and `point2`:

In [None]:
# Calculate the distance between point1 and point2
dist = point1.distance(point2)

# Print out a nicely formatted info message
print(f"Distance between the points is {dist} units")
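The `distance` method returns the planar (Euclidean) distance between the coordinates. As a minimal sketch, the same value can be recomputed with the standard library alone; the helper name `euclidean_distance` is made up for this illustration and is not part of Shapely.

```python
import math

def euclidean_distance(p, q):
    """Planar distance between two (x, y) coordinate tuples."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Same coordinates as point1 and point2 above
dist_check = euclidean_distance((2.2, 4.2), (7.2, -25.1))
print(f"Distance computed by hand: {dist_check} units")
```

The result should match `point1.distance(point2)` up to floating-point rounding.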

<a id="linestring"></a>
## LineString
Creating `LineString` objects is fairly similar to creating Shapely `Point`s.

Now, instead of using a single coordinate tuple, we can construct the line using either a list of Shapely `Point` objects or a list of coordinate tuples:

In [None]:
# Create a LineString from our Point objects
line = LineString([point1, point2, point3])

# It is also possible to produce the same outcome using coordinate tuples
line2 = LineString([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])

# Check if lines are identical
line == line2 

In [None]:
# Check data type of the line object
type(line)

In [None]:
# Check geometry type of the line object
line.geom_type

In [None]:
line

In [None]:
print(line)

### LineString attributes and functions
`LineString` object has many useful built-in attributes and functionalities. It is for instance possible to extract the coordinates or the length of a `LineString` (line), calculate the centroid of the line, create points along the line at specific distance, calculate the closest distance from a line to specified `Point` and simplify the geometry. 

We can extract the coordinates of a `LineString` similarly as with `Point`:

In [None]:
# Get xy coordinate tuples
list(line.coords)

If you would need to access all x-coordinates or all y-coordinates of the line, you can do it directly using the `xy` attribute:

In [None]:
# Extract x and y coordinates separately
xcoords = list(line.xy[0])
ycoords = list(line.xy[1])

print(xcoords)
print(ycoords)

It is possible to retrieve specific attributes such as the `length` of the line and center of the line (`centroid`) straight from the `LineString` object itself:

In [None]:
# Get the length of the line
l_length = line.length
print(f"Length of our line: {l_length} units")

In [None]:
# Get the centroid of the line
print(line.centroid)
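To make these two attributes less of a black box, here is a minimal pure-Python sketch of what they represent: the length is the sum of the segment lengths, and the centroid of a line is the length-weighted mean of the segment midpoints. The variable names are chosen for this illustration only.

```python
import math

# Same vertices as the line above
line_coords = [(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)]

# Consecutive vertex pairs are the segments of the line
segments = list(zip(line_coords[:-1], line_coords[1:]))
seg_lengths = [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in segments]

# Length of the line = sum of the segment lengths
total_length = sum(seg_lengths)

# Centroid of the line = length-weighted mean of the segment midpoints
cx = sum(w * (x1 + x2) / 2 for w, ((x1, _), (x2, _)) in zip(seg_lengths, segments)) / total_length
cy = sum(w * (y1 + y2) / 2 for w, ((_, y1), (_, y2)) in zip(seg_lengths, segments)) / total_length

print(f"Length: {total_length}")
print(f"Centroid: ({cx}, {cy})")
```

The values should agree with `line.length` and `line.centroid` up to floating-point rounding.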

<a id="polygon"></a>
## Polygon
Creating a `Polygon` object follows the same logic as `Point` and `LineString`, but here we pass a sequence of coordinates as input.

`Polygon` needs at least three coordinate tuples (three points are required to form a surface):

In [None]:
# Create a Polygon from the coordinates
poly = Polygon([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])

In [None]:
print(poly)

In [None]:
poly.area
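For a simple polygon in planar coordinates, the area can be derived from the vertices with the shoelace formula. The following is a minimal sketch of that formula, not Shapely's internal implementation; `shoelace_area` is a hypothetical helper name.

```python
def shoelace_area(ring):
    """Unsigned area of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Same vertices as the polygon above
triangle_area = shoelace_area([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])
print(f"Area computed by hand: {triangle_area}")
```

The result should match `poly.area` up to floating-point rounding.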

In [None]:
# Data type
type(poly)

In [None]:
# Geometry type
poly.geom_type

In [None]:
poly

We can also build the polygon from the Shapely `Point` objects created earlier. Depending on the Shapely version, the `Polygon` constructor may not accept `Point` objects directly, so a portable approach is to pass the $x, y$ coordinate pairs as a sequence, built here with a list comprehension.

In [None]:
# Create a Polygon based on information from the Shapely points
poly2 = Polygon([[p.x, p.y] for p in [point1, point2, point3]])

In [None]:
poly2

In [None]:
poly == poly2

In [None]:
# Define the outer border
border = [(-180, 90), (-180, -90), (180, -90), (180, 90)]

In [None]:
# Outer polygon
world = Polygon(shell=border)
print(world)

In [None]:
world

### Polygon attributes and functions
We can again access different attributes directly from the `Polygon` object itself that can be really useful for many analyses, such as `area`, `centroid`, bounding box (`bounds`), `exterior`, and exterior-length (`exterior.length`). 

Here, we can see a few of the available attributes and how to access them:

In [None]:
# Print the outputs
print(f"Polygon centroid: {world.centroid}")
print(f"Polygon Area: {world.area}")
print(f"Polygon Bounding Box: {world.bounds}")
print(f"Polygon Exterior: {world.exterior}")
print(f"Polygon Exterior Length: {world.exterior.length}")

<a id="geometrycollections"></a>
## Geometry collections
On some occasions it is useful to store multiple geometries (for example, several points or several polygons) in a single feature. For example, when a country is composed of several islands, the polygons share the same country-level attributes, and it might be reasonable to store that country as a geometry collection that contains all the polygons. The attribute table would then contain one row of country-level attributes, and the geometry related to those attributes would represent several polygons.

In Shapely, collections of `Point`s are implemented by using a `MultiPoint` object, collections of `LineString`s by using a `MultiLineString` object, and collections of `Polygon`s by a `MultiPolygon` object.

In [None]:
from shapely.geometry import Point, LineString, Polygon
from shapely.geometry import MultiPoint, MultiLineString, MultiPolygon

In [None]:
point1, point2, point3 = (2.2, 4.2), (7.2, -25.1), (9.26, -2.456)

# Create a MultiPoint object of our points 1,2 and 3
multi_point = MultiPoint([point1, point2, point3])

# It is also possible to pass coordinate tuples inside
multi_point2 = MultiPoint([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])

# We can also create a MultiLineString with two lines
line1 = LineString([point1, point2])
line2 = LineString([point2, point3])
multi_line = MultiLineString([line1, line2])

# Print object definitions
print(multi_point)
print(multi_line)

In [None]:
multi_point

In [None]:
multi_line

`MultiPolygon`s are constructed in a similar manner. Let’s create a bounding box for “the world” by combining two separate polygons that represent the western and eastern hemispheres.

In [None]:
# Let's create the exterior of the western part of the world
west_exterior = [(-180, 90), (-180, -90), (0, -90), (0, 90)]

# Let's create a hole --> remember there can be multiple holes, thus we need to have a list of hole(s). 
# Here we have just one.
west_hole = [[(-170, 80), (-170, -80), (-10, -80), (-10, 80)]]

# Create the Polygon
west_poly = Polygon(shell=west_exterior, holes=west_hole)

# Print object definition
print(west_poly)

In [None]:
west_poly
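A hole is subtracted from the filled area of the polygon. As a small check, the sketch below rebuilds the same polygon and compares its area with the area of the exterior ring minus the area of the holes, using the `exterior` and `interiors` attributes.

```python
from shapely.geometry import Polygon

west_exterior = [(-180, 90), (-180, -90), (0, -90), (0, 90)]
west_hole = [[(-170, 80), (-170, -80), (-10, -80), (-10, 80)]]
west_poly = Polygon(shell=west_exterior, holes=west_hole)

# Area enclosed by the exterior ring alone
shell_area = Polygon(west_poly.exterior).area

# Total area of all holes (here there is just one)
hole_area = sum(Polygon(ring).area for ring in west_poly.interiors)

print(f"Exterior: {shell_area}, holes: {hole_area}, polygon: {west_poly.area}")
```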

Shapely also has a tool for creating a bounding box based on minimum and maximum $x$ and $y$ coordinates. Instead of using the `Polygon` constructor, let’s use the box constructor for creating the polygon:

In [None]:
from shapely.geometry import box

In [None]:
# Specify the bbox extent (lower-left corner coordinates and upper-right corner coordinates)
min_x, min_y = 0, -90
max_x, max_y = 180, 90

# Create the polygon using Shapely
east_poly = box(minx=min_x, miny=min_y, maxx=max_x, maxy=max_y)

# Print object definition
print(east_poly)

In [None]:
east_poly

Finally, we can combine the two polygons into a `MultiPolygon`:

In [None]:
# Let's create our MultiPolygon. We can pass multiple Polygon objects to MultiPolygon as a list
multi_poly = MultiPolygon([west_poly, east_poly])

# Print object definition
print(multi_poly)

In [None]:
multi_poly

We can check whether we have a "valid" `MultiPolygon`, i.e., whether the individual polygons do not overlap or share an edge with each other. Here, because the two polygons share a boundary along the 0-meridian, we should NOT have a valid `MultiPolygon`.

The `is_valid` attribute tells whether a geometry is valid, for example whether polygons or lines intersect each other improperly. This is really useful when looking for topological errors in your data:

In [None]:
print(f"Is polygon valid?: {multi_poly.is_valid}")
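When a `MultiPolygon` is invalid because its parts share an edge, one possible repair is to merge the parts into a single geometry. The sketch below uses two simplified hemispheres without holes (an assumption made to keep the example short), asks Shapely why the collection is invalid with `explain_validity`, and merges the parts with `unary_union`.

```python
from shapely.geometry import MultiPolygon, box
from shapely.ops import unary_union
from shapely.validation import explain_validity

# Two simplified hemispheres that share the edge along x = 0
west = box(-180, -90, 0, 90)
east = box(0, -90, 180, 90)

touching = MultiPolygon([west, east])
print(f"Valid? {touching.is_valid}")
print(f"Reason: {explain_validity(touching)}")

# Parts that do not touch form a valid MultiPolygon
separate = MultiPolygon([box(0, 0, 1, 1), box(2, 2, 3, 3)])
print(f"Separate parts valid? {separate.is_valid}")

# Merging the touching parts dissolves the shared edge
merged = unary_union([west, east])
print(merged.geom_type, merged.is_valid)
```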

### Convex hull and envelope
The convex hull is the smallest convex polygon that contains all objects in a collection. Along with the minimum bounding box, the convex hull is a useful shape for describing the extent of your data.

In [None]:
# Check input geometry
multi_point

In [None]:
# Convex Hull (smallest polygon around the geometry collection)
multi_point.convex_hull

In [None]:
# Envelope (smallest axis-aligned rectangle around the geometry collection):
multi_point.envelope
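The relationship between the two shapes can be checked directly. In this short sketch (variable names are illustrative), the envelope has the same bounds as the input points, while the convex hull is at most as large as the envelope and still covers every point.

```python
from shapely.geometry import MultiPoint

pts = MultiPoint([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])
hull = pts.convex_hull
env = pts.envelope

print(f"Hull: {hull.geom_type}, area {hull.area}")
print(f"Envelope bounds: {env.bounds}, area {env.area}")

# Every input point lies inside or on the border of the hull
all_covered = all(hull.covers(p) for p in pts.geoms)
print(f"Hull covers all points: {all_covered}")
```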

<a id="practice"></a>
## Practice

### Practice 1
Plot these shapes using Shapely!

- Pentagon, example coords: $(30, 2.01), (31.91, 0.62), (31.18, -1.63), (28.82, -1.63), (28.09, 0.62)$
- Triangle
- Square
- Circle

In [None]:
pentagon = Polygon([(30, 2.01), (31.91, 0.62), (31.18, -1.63), (28.82, -1.63), (28.09, 0.62)])
pentagon

In [None]:
triangle = Polygon([(0,0), (1, 1), (1, 0)])
triangle

In [None]:
square = Polygon([(0,0), (0, 1), (1, 1), (1, 0)])
square

In [None]:
# Circle (using a buffer around a point)
point = Point((0,0))
point.buffer(1)

### Practice 2

In this problem you will create custom-made functions for creating geometries. We start with a very simple function, and proceed to creating functions that can handle invalid input values.

1. Create a function called `create_point_geom()` that has two parameters (`x_coord`, `y_coord`). The function should create and return a shapely `Point` geometry object.

In [None]:
def create_point_geom(x_coord, y_coord):
    return Point(x_coord, y_coord)

Test your function by running these code cells:

In [None]:
# Demonstrate the usage of the function
point1 = create_point_geom(0.0, 1.1)

In [None]:
print(point1)
print(point1.geom_type)

2. Create a function called `create_line_geom()` that takes a list of Shapely `Point` objects as a parameter called `points_list` and returns a `LineString` object of those input points. In addition, the function should guard against incorrect usage:

    - Inside the function, first check with `assert` that the input is a list. If anything other than a list is passed to the function, raise an error with the message: `"Input should be a list!"`
    - Also check with `assert` that the input list contains at least two values. If not, raise an error with the message: `"LineString object requires at least two Points!"`

In [None]:
def create_line_geom(points_list):
    assert isinstance(points_list, list), "Input should be a list!"
    assert len(points_list) >= 2, "LineString object requires at least two Points!"
    return LineString(points_list)

In [None]:
create_line_geom([(1, 2), (2, 2)])

Create a line object with two points: `Point(45.2, 22.34)` and `Point(100.22, -3.20)` and store the result in a variable called `line1`:

In [None]:
point1, point2 = Point(45.2, 22.34), Point(100.22, -3.20)
line1 = create_line_geom([point1, point2])

Run these code cells to check your solution:

In [None]:
print(line1)
print(line1.geom_type)

3. Create a function called `create_poly_geom()` that has one parameter called `coords`, which should contain a list of coordinate tuples. The function should create and return a `Polygon` object based on these coordinates.

    - Inside the function, first check with `assert` that the input is a list. If anything other than a list is passed to the function, raise an error with the message: `"Input should be a list!"`
    - Also check with `assert` that the input list contains at least three values. If not, raise an error with the message: `"Polygon object requires at least three Points!"`
    - Check the data type of the objects in the input list. All values in the input list should be tuples. If not, raise an error with the message: `"All list values should be coordinate tuples!"` using `assert`.

In [None]:
def create_poly_geom(coords):
    assert isinstance(coords, list), "Input should be a list!"
    assert len(coords) >= 3, "Polygon object requires at least three Points!"
    for p in coords:
        assert isinstance(p, tuple), "All list values should be coordinate tuples!" 
    return Polygon(coords)

In [None]:
# The last element (4) is not a tuple, so this call raises an AssertionError
create_poly_geom([(1, 2), (2, 2), (3, 3), 4])
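A failed `assert` raises an `AssertionError` that carries the message given after the comma. The sketch below repeats the function so that it runs on its own and shows one way to catch the error and read its message. Note that Python skips `assert` statements when started with the `-O` flag, so they are best suited to exercises and debugging checks like this one.

```python
from shapely.geometry import Polygon

def create_poly_geom(coords):
    assert isinstance(coords, list), "Input should be a list!"
    assert len(coords) >= 3, "Polygon object requires at least three Points!"
    for p in coords:
        assert isinstance(p, tuple), "All list values should be coordinate tuples!"
    return Polygon(coords)

message = None
try:
    # The last element is not a tuple, so the third check fails
    create_poly_geom([(1, 2), (2, 2), (3, 3), 4])
except AssertionError as err:
    message = str(err)

print(f"Rejected input: {message}")
```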

Demonstrate the usage of the function. For example, create a `Polygon` with three points: `(45.2, 22.34)`, `(100.22, -3.20)`,  `(70.0, 10.20)`.

In [None]:
coords = [(45.2, 22.34), (100.22, -3.20), (70.0, 10.20)]
poly1 = create_poly_geom(coords)

In [None]:
print(coords)
print(poly1)
print(poly1.geom_type)

### Practice 3
1. Create a function called `get_centroid()` that has one parameter called `geom`. The function should take any kind of Shapely geometric object as input and return the centroid of that geometry. In addition, the function should guard against incorrect usage:

    - Inside the function, first check with `assert` that the input is a Shapely `Point`, `LineString` or `Polygon` geometry. If anything else is passed to the function, raise an error with the message: `"Input should be a Shapely geometry!"`

In [None]:
def get_centroid(geom):
    assert isinstance(geom, (Point, LineString, Polygon)), "Input should be a Shapely geometry!"
    return geom.centroid

Test and demonstrate the usage of the function. You can, for example, create Shapely objects using the functions you created in the previous exercise and print out information about their centroids:

In [None]:
point1, point2 = Point(45.2, 22.34), Point(100.22, -3.20)
line1 = create_line_geom([point1, point2])
print(get_centroid(line1))

2. Create a function called `get_area()` with one parameter called `polygon`. The function should take a Shapely `Polygon` object as input and return the area of that geometry.

    - Inside the function, first check with `assert` that the input is a Shapely `Polygon` geometry. If anything else is passed to the function, raise an error with the message: `"Input should be a Shapely Polygon object!"`

In [None]:
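def get_area(polygon):
    # Reject anything that is not a Shapely Polygon, as the exercise requires
    assert isinstance(polygon, Polygon), "Input should be a Shapely Polygon object!"
    return polygon.area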

In [None]:
def create_poly_geom(coords):
    assert isinstance(coords, list), "Input should be a list!"
    assert len(coords) >= 3, "Polygon object requires at least three Points!"
    for p in coords:
        assert isinstance(p, tuple), "All list values should be coordinate tuples!" 
    return Polygon(shell=coords)

In [None]:
points_list = [(-180, 90), (-180, -90), (180, -90), (180, 90)]
poly1 = create_poly_geom(points_list)
print(poly1)

In [None]:
poly1.area

3. Create a function called `get_length()` with a parameter called `geom`. The function should accept either a Shapely `LineString` or `Polygon` object as input. It should check the type of the input and return the length of the line if the input is a `LineString`, and the length of the exterior ring if the input is a `Polygon`. If something else is passed to the function, raise an error with the message `"'geom' should be either LineString or Polygon!"` (use `assert`).

In [None]:
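def get_length(geom):
    assert isinstance(geom, (LineString, Polygon)), "'geom' should be either LineString or Polygon!"
    # For a Polygon, measure only the exterior ring (geom.length would also include holes)
    if isinstance(geom, Polygon):
        return geom.exterior.length
    return geom.length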

In [None]:
line_length = get_length(line1)
print("Line length:", round(line_length,2))

In [None]:
poly_exterior_length = get_length(poly1)
print("Polygon exterior length:", round(poly_exterior_length,2))

<a id="geopandas"></a>
# Geopandas

[**Geopandas**](http://geopandas.org/) makes it possible to work with geospatial data in Python in a relatively easy way. Geopandas combines the capabilities of the data analysis library pandas with other packages like Shapely and fiona for managing spatial data.

The main data structures in geopandas are `GeoSeries` and `GeoDataFrame` which extend the capabilities of `Series` and `DataFrame`s from pandas. This means that we can use all our pandas skills also when working with geopandas!

The main difference between `GeoDataFrame`s and pandas `DataFrame`s is that a `GeoDataFrame` should contain one column for geometries. By default, the name of this column is `'geometry'`. The geometry column is a `GeoSeries` which contains the geometries (`Point`, `LineString`, `Polygon`) as shapely objects.

![](https://autogis-site.readthedocs.io/en/latest/_images/geodataframe.png)

In [None]:
import geopandas as gpd

<a id="readingshapefile"></a>
## Reading a Shapefile

In [None]:
fp = "data/L2_data/NLS/2018/L4/L41/L4132R.shp/m_L4132R_p.shp"
# Read file using gpd.read_file()
data = gpd.read_file(fp)
data.head()

In [None]:
type(data)

In [None]:
data.columns.values

As you might guess, the column names are in Finnish. Let’s select only the useful columns and rename them into English:

In [None]:
data = data[['RYHMA', 'LUOKKA',  'geometry']]

In [None]:
colnames = {'RYHMA':'GROUP', 'LUOKKA':'CLASS'}

In [None]:
# Assign the result instead of renaming in place, since data is a column subset of the original frame
data = data.rename(columns=colnames)

In [None]:
data.head()

Here we see that our `data` variable is a `GeoDataFrame`. `GeoDataFrame` extends the functionalities of `pandas.DataFrame` so that spatial data can be handled with similar approaches and data structures as in pandas (hence the name geopandas).

It is always a good idea to explore your data also on a map. Creating a simple map from a GeoDataFrame is really easy: you can use `.plot()` function from geopandas that creates a map based on the geometries of the data. Geopandas actually uses matplotlib for plotting.

In [None]:
import matplotlib.pyplot as plt

In [None]:
fig = plt.figure(figsize=(6, 6))
ax = plt.axes()
data.plot(ax=ax)

<a id="geometriesgeopandas"></a>
## Geometries in Geopandas
Geopandas takes advantage of Shapely’s geometric objects. Geometries are stored in a column called geometry that is a default column name for storing geometric information in geopandas.

In [None]:
data['geometry'].head()

The geometry column contains familiar looking values, namely Shapely `Polygon` objects. Since the spatial data is stored as Shapely objects, it is possible to use Shapely methods when dealing with geometries in geopandas.

In [None]:
# Access the geometry on the first row of data
data.at[0, "geometry"]

In [None]:
# Print information about the area 
print("Area:", round(data.at[0, "geometry"].area, 0), "square meters")

Iterate over the GeoDataFrame rows using the `iterrows()` function. For each row, print the area of the polygon:

In [None]:
for i, row in data.iterrows():
    area = row['geometry'].area
    print("Area:", round(area, 0), "square meters")

As you see from here, all pandas methods, such as the `iterrows()` function, are directly available in Geopandas without the need to call pandas separately because Geopandas is an extension for pandas.

In practice, it is not necessary to use the `iterrows()` approach to calculate the area for all features. `GeoDataFrame`s and `GeoSeries` have an `area` attribute that returns the area of every feature at once:

In [None]:
data.area

Let’s next create a new column into our GeoDataFrame where we calculate and store the areas of individual polygons:

In [None]:
# Create a new column called 'area' 
data['area'] = data.area

In [None]:
data.head()
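The equivalence between the row-by-row loop and the vectorized `area` attribute can be verified on a tiny `GeoDataFrame` built from scratch. This is a minimal sketch: the two boxes and the column values are made up for the illustration and are unrelated to the shapefile above.

```python
import geopandas as gpd
from shapely.geometry import box

# A toy GeoDataFrame with two square polygons (side 1 and side 2)
toy = gpd.GeoDataFrame(
    {"name": ["small", "large"]},
    geometry=[box(0, 0, 1, 1), box(0, 0, 2, 2)],
)

# Row-by-row approach
loop_areas = [row["geometry"].area for _, row in toy.iterrows()]

# Vectorized approach: one call for all features
toy["area"] = toy.area

print(loop_areas)
print(toy[["name", "area"]])
```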

<a id="writingdata"></a>
## Writing data into a shapefile
It is possible to export `GeoDataFrame`s into various data formats using the `to_file()` method. In our case, we want to export subsets of the data into Shapefiles (one file for each feature class).

Let’s first select one class (class number 36200, “Lake water”) from the data as a new `GeoDataFrame`:

In [None]:
# Select a class
selection = data.loc[data["CLASS"]==36200]

In [None]:
fig = plt.figure(figsize=(6, 6))
ax = plt.axes()
selection.plot(ax=ax)

Write this layer into a new Shapefile using the `to_file()` method of the `GeoDataFrame`:

In [None]:
# Create a output path for the data
output_fp = "created_files/Class_36200.shp"
# Write those rows into a new file (the default output file format is Shapefile)
selection.to_file(output_fp)

In [None]:
temp = gpd.read_file(output_fp)
temp.head()

In [None]:
# Write the same selection as GeoJSON by specifying the driver
selection.to_file('created_files/Class_36200.geojson', driver='GeoJSON')

In [None]:
temp = gpd.read_file('created_files/Class_36200.geojson')
temp.head()

In [None]:
import geopandas as gpd
from fiona.drvsupport import supported_drivers

# Check supported format drivers
supported_drivers

<a id="practicegeopandas"></a>
## Practice

### Practice 4: Points to map
The aim is to plot a map based on a set of longitude and latitude coordinates that are stored in a CSV file. The coordinates are in WGS84 decimal degrees (`EPSG:4326`), and the data is stored in the comma-separated file `some_posts.csv` in the folder `data`.

1. Import the needed modules
    - Read the data from `some_posts.csv` into a pandas `DataFrame` called `data`
    - Create an empty column called `geometry` where you will store shapely `Point` objects
    - Insert `Point` objects into the column `geometry` based on the coordinate columns

In [None]:
import pandas as pd
import geopandas as gpd

In [None]:
df = pd.read_csv('data/some_posts.csv')
df.head()

In [None]:
df['geometry'] = None

In [None]:
df.head()

In [None]:
points = []
for i, row in df.iterrows():
    # Shapely expects (x, y), i.e., (longitude, latitude)
    points.append(Point(row['lon'], row['lat']))
df['geometry'] = gpd.GeoSeries(points)

In [None]:
df.head()

Next:
- Convert that `DataFrame` into a `GeoDataFrame` using the `geopandas.GeoDataFrame` constructor
- Set the coordinate reference system (CRS) to WGS84 (i.e., EPSG code 4326)
- Save the data into a Shapefile called `Kruger_posts.shp`

In [None]:
data = gpd.GeoDataFrame(df)
data.crs = 'EPSG:4326'
print(type(data))
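The loop over rows can also be replaced by `geopandas.points_from_xy`, which builds all the points at once. The sketch below uses two made-up rows with the same `lat` and `lon` column names as the CSV file; the coordinate values are hypothetical. Note the argument order: $x$ is the longitude and $y$ is the latitude.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sample rows; the real file contains more rows and columns
posts = pd.DataFrame({"lat": [-24.98, -25.50], "lon": [31.48, 31.51]})

geo_posts = gpd.GeoDataFrame(
    posts,
    geometry=gpd.points_from_xy(posts["lon"], posts["lat"]),  # x = lon, y = lat
    crs="EPSG:4326",
)

print(geo_posts.crs)
print(geo_posts.head())
```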

In [None]:
data.to_file('created_files/Kruger_posts.shp')

Finally:
- Create a simple map of the points using the `plot()` function.

In [None]:
fig = plt.figure(figsize=(6, 6))
ax = plt.axes()
data.plot(ax=ax)

<a id="folium"></a>
# Folium

[**folium**](http://python-visualization.github.io/folium/) builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the `leaflet.js` library. Manipulate your data in Python, then visualize it on a Leaflet map via folium.

### Concepts
folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations and the passing of rich vector/raster/HTML visualizations as markers on the map.

The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. folium supports Image, Video, GeoJSON, and TopoJSON overlays.

See also [this link](https://python-visualization.github.io/folium/quickstart.html#).

<a id='creatingmap'></a>
### Creating a map

To create a base map, simply pass your starting coordinates to Folium:

In [1]:
import folium

In [4]:
map_f = folium.Map(location=[45.5236, -122.6750])

To display it in a Jupyter notebook, simply ask for the object representation:

In [5]:
map_f

To save it to a file:

In [7]:
map_f.save("index.html")

The default tiles are OpenStreetMap, but other tilesets such as Stamen Terrain, Stamen Toner, Mapbox Bright, and Mapbox Control Room are also available, depending on the installed folium version.

In [12]:
folium.Map(location=[45.5236, -122.6750], tiles="Stamen Terrain")

In [13]:
folium.Map(location=[45.5236, -122.6750], zoom_start=13)

<a id='markers'></a>
### Markers
There are numerous marker types, starting with a simple Leaflet style location marker with a popup and tooltip HTML.

In [19]:
map_f = folium.Map(location=[45.372, -121.6972], zoom_start=12, tiles="Stamen Terrain")

In [20]:
tooltip = "Click me!"
folium.Marker([45.3288, -121.6625], popup="<i>Mt. Hood Meadows</i>", tooltip=tooltip).add_to(map_f)
folium.Marker([45.3311, -121.7113], popup="<b>Timberline Lodge</b>", tooltip=tooltip).add_to(map_f)

In [21]:
map_f

There is built-in support for colors and marker icon types from Bootstrap.

In [22]:
map_f = folium.Map(location=[45.372, -121.6972], zoom_start=12, tiles="Stamen Terrain")

In [23]:
folium.Marker(
    location=[45.3288, -121.6625],
    popup="Mt. Hood Meadows",
    icon=folium.Icon(icon="cloud"),
).add_to(map_f)

folium.Marker(
    location=[45.3311, -121.7113],
    popup="Timberline Lodge",
    icon=folium.Icon(color="green"),
).add_to(map_f)

folium.Marker(
    location=[45.3300, -121.6823],
    popup="Some Other Location",
    icon=folium.Icon(color="red", icon="info-sign"),
).add_to(map_f)

In [24]:
map_f

Leaflet’s `Circle` and `CircleMarker`, implemented to reflect radii in units of meters and pixels respectively, are available as features.

In [25]:
map_f = folium.Map(location=[45.5236, -122.6750], tiles="Stamen Toner", zoom_start=13)

In [28]:
folium.Circle(
    radius=100,
    location=[45.5244, -122.6699],
    popup="The Waterfront",
    color="crimson",
    fill=False,
).add_to(map_f)

folium.CircleMarker(
    location=[45.5215, -122.6261],
    radius=50,
    popup="Laurelhurst Park",
    color="#3186cc",
    fill=True,
    fill_color="#3186cc",
).add_to(map_f)

In [29]:
map_f

folium also offers a convenience function to enable lat/lon popovers. This can help users find a location by interactively browsing the map.

In [30]:
map_f = folium.Map(location=[46.1991, -122.1889], tiles="Stamen Terrain", zoom_start=13)
map_f.add_child(folium.LatLngPopup())
map_f

There is also click-for-marker functionality for on-the-fly placement of markers:

In [31]:
map_f = folium.Map(location=[46.8527, -121.7649], tiles="Stamen Terrain", zoom_start=13)
folium.Marker([46.8354, -121.7325], popup="Camp Muir").add_to(map_f)
map_f.add_child(folium.ClickForMarker(popup="Waypoint"))
map_f

<a id="overlays"></a>
### Overlays

Both GeoJSON and TopoJSON layers can be passed to the map as an overlay, and multiple layers can be visualized on the same map:

In [32]:
url = (
    "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
)
antarctic_ice_edge = f"{url}/antarctic_ice_edge.json"
antarctic_ice_shelf_topo = f"{url}/antarctic_ice_shelf_topo.json"

In [36]:
import json
import requests

In [40]:
map_f = folium.Map(
    location=[-59.1759, -11.6016],
    tiles="cartodbpositron",
    zoom_start=2,
)

folium.GeoJson(antarctic_ice_edge, name="geojson").add_to(map_f)

folium.TopoJson(
    json.loads(requests.get(antarctic_ice_shelf_topo).text),
    "objects.antarctic_ice_shelf",
    name="topojson",
).add_to(map_f)

folium.LayerControl().add_to(map_f)

In [41]:
map_f

<a id="cloroplethmaps"></a>
### Choropleth maps

In [42]:
import pandas as pd

In [43]:
url = (
    "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
)
state_geo = f"{url}/us-states.json"
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)

In [45]:
map_f = folium.Map(location=[48, -102], zoom_start=3)

folium.Choropleth(
    geo_data=state_geo,
    name="choropleth",
    data=state_data,
    columns=["State", "Unemployment"],
    key_on="feature.id",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Unemployment Rate (%)",
).add_to(map_f)

folium.LayerControl().add_to(map_f)

In [46]:
map_f

<a id="scikitmobility"></a>
# scikit-mobility

![GitHub Repo stars](https://img.shields.io/github/stars/scikit-mobility/scikit-mobility?style=social)
![GitHub](https://img.shields.io/github/license/scikit-mobility/scikit-mobility)
![GitHub release (latest by date)](https://img.shields.io/github/v/release/scikit-mobility/scikit-mobility)

[**scikit-mobility**](https://github.com/scikit-mobility/scikit-mobility) is a Python library that provides scientists and practitioners with an environment to:

1. load and represent mobility data, both at the individual and the collective level, through easy-to-use data structures (`TrajDataFrame` and `FlowDataFrame`);
2. visualize trajectories and flows on interactive maps;
3. clean and preprocess mobility data using state-of-the-art techniques, such as trajectory clustering, compression, segmentation, and filtering;
4. analyze mobility data by using the main measures characterizing mobility patterns both at the individual and at the collective level, such as the computation of travel and characteristic distances, object and location entropies, location frequencies, waiting times, origin-destination matrices, and more;
5. run the most popular mechanistic generative models to simulate individual mobility, such as the Exploration and Preferential Return model (EPR) and its variants, and commuting and migratory flows, such as the Gravity Model and the Radiation Model;
6. estimate the privacy risk associated with the analysis of a given mobility dataset through the simulation of the reidentification risk associated with a vast repertoire of privacy attacks.

- scikit-mobility is publicly available on GitHub at the following link: https://github.com/scikit-mobility/scikit-mobility.

- the documentation describing all the classes and functions of scikit-mobility is available at https://scikit-mobility.github.io/scikit-mobility/.

The paper describing scikit-mobility may be found at: https://www.jstatsoft.org/article/view/v103i04

In [6]:
# import the library
import skmob

skmob.__version__

import pandas as pd
import geopandas as gpd

<a id="datastructures"></a>
## Data Structures

scikit-mobility provides two data structures to deal with raw trajectories and flows between places: 
- `TrajDataFrame`, for spatio-temporal trajectories; 
- `FlowDataFrame`, for mobility flows.

Both data structures extend the DataFrame implemented in the data analysis library [pandas](https://pandas.pydata.org/). Thus, both `TrajDataFrame` and `FlowDataFrame` inherit all the functionalities provided by the `DataFrame`, as well as its efficient optimizations for reading and writing tabular data (e.g., mobility datasets). 

The current version of the library is designed to work with the latitude and longitude system (`epsg:4326`). Therefore, the Haversine formula is used by default when the library’s functions compute distances. 
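
As a reference for what the Haversine formula computes, here is a minimal standalone sketch using only the standard library. It is not scikit-mobility's own implementation; the function name `haversine_km` and the Earth radius constant are assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Distance between two GPS points recorded one second apart in Beijing
d_km = haversine_km(39.984094, 116.319236, 39.984198, 116.319322)
print(round(d_km * 1000, 1), "meters")
```

Because the formula works directly on angles, it needs no projection step, which is why it pairs naturally with coordinates expressed in `epsg:4326`.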

<a id="trajdataframe"></a>
### The `TrajDataFrame`

Mobility data describe the movements of a set of objects during a period of observation. The objects may represent individuals, private vehicles, boats, and even players on a sports field. 

Mobility data are generally collected automatically as a by-product of human activity on electronic devices (e.g., mobile phones, GPS devices, social networking platforms, video cameras) and stored as trajectories, i.e., temporally ordered sequences of spatio-temporal points where an object stopped or passed through. 

A `TrajDataFrame` is an extension of the pandas DataFrame that has specific column names and data types. Each row in a `TrajDataFrame` describes a point of a trajectory and contains the following columns:

- `lat` - latitude of the point
- `lng` - longitude of the point
- `datetime` - date and time of the point

For multi-user data sets, there are two optional columns:

- `uid` - identifier of the user to whom the trajectory belongs
- `tid` - identifier for the trajectory
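
To make the idea of a trajectory as a temporally ordered sequence of points concrete, here is a minimal sketch with plain Python structures that groups points by `uid` and sorts them by `datetime`. It only illustrates the data layout described above and does not use scikit-mobility; the records for user 2 are hypothetical.

```python
from datetime import datetime
from itertools import groupby

# Toy records in (uid, lat, lng, datetime) order, deliberately shuffled in time.
# The point of user 2 is hypothetical.
records = [
    (1, 39.984224, 116.319402, "2008-10-23 13:53:11"),
    (2, 39.984211, 116.319389, "2008-10-23 13:53:16"),
    (1, 39.984094, 116.319236, "2008-10-23 13:53:05"),
    (1, 39.984198, 116.319322, "2008-10-23 13:53:06"),
]

# One dictionary per point, with the timestamp parsed into a datetime object
points = [
    {"uid": uid, "lat": lat, "lng": lng,
     "datetime": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")}
    for uid, lat, lng, ts in records
]

# Sort by user and time, then split into one trajectory per user
points.sort(key=lambda p: (p["uid"], p["datetime"]))
trajectories = {
    uid: list(group) for uid, group in groupby(points, key=lambda p: p["uid"])
}

for uid, trajectory in trajectories.items():
    print(uid, [p["datetime"].strftime("%H:%M:%S") for p in trajectory])
```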


#### Creating a `TrajDataFrame`

A `TrajDataFrame` can be created from:

- a Python list or NumPy array
- a Python dictionary
- a pandas DataFrame
- a text file

##### From a Python list

In [None]:
# From a list
data_list = [[1, 39.984094, 116.319236, '2008-10-23 13:53:05'],
             [1, 39.984198, 116.319322, '2008-10-23 13:53:06'],
             [1, 39.984224, 116.319402, '2008-10-23 13:53:11'],
             [1, 39.984211, 116.319389, '2008-10-23 13:53:16']]
data_list

In [None]:
type(data_list)

We must set the indexes of the mandatory columns using arguments `latitude`, `longitude` and `datetime`.

In [None]:
tdf = skmob.TrajDataFrame(data_list, 
                          latitude=1, longitude=2, 
                          datetime=3)
print(type(tdf))
tdf

##### From a pandas DataFrame

In [None]:
# build a dataframe from the 2D list
data_df = pd.DataFrame(data_list, columns=['user', 'lat', 'lng', 'hour'])
print(type(data_df)) # type of the structure
data_df.head() # head of the DataFrame

Note that:

- the names of some columns in `data_df` (`user`, `hour`) do not match the required names (`uid`, `datetime`)
- you must map each non-matching column with the corresponding argument, here `user_id` and `datetime`; the arguments `latitude` and `longitude` work the same way

In [None]:
# Create a TrajDataFrame from a DataFrame
tdf = skmob.TrajDataFrame(data_df, datetime='hour', user_id='user')
print(type(tdf))
tdf.head()

The columns of a `TrajDataFrame` have specific data types:

In [None]:
# In the DataFrame
print(type(data_df))
data_df.dtypes

In [None]:
print(type(tdf)) # In the TrajDataFrame
tdf.dtypes

In [None]:
tdf['lat'].head()

##### From a URL

In [None]:
# create a TrajDataFrame from a dataset of trajectories 
url = "https://github.com/scikit-mobility/tutorials/raw/master/mda_masterbd2020/data/geolife_sample.txt.gz"
tdf = skmob.TrajDataFrame.from_file(url)
print(type(tdf))
tdf.head()

#### Attributes of a TrajDataFrame
- `crs`: the coordinate reference system. Default: `epsg:4326` (lat/long)
- `parameters`: a dictionary in which you can store any additional properties you need

In [None]:
# WGS84 datum
print(tdf.crs)
print(tdf.parameters)

In [None]:
# add your own parameter
tdf.parameters['analyzed'] = 1
tdf.parameters

In [None]:
### Visualizing a `TrajDataFrame`
tdf.plot_trajectory()

<a id="tessellation"></a>
### Tessellation
In mobility tasks, the geography is often discretized by mapping the coordinates to a *tessellation*, i.e., a covering of the two-dimensional space using a countable number of geometric shapes (e.g., squares, hexagons), called tiles, with no overlaps and no gaps. 

For instance, for the analysis or prediction of mobility flows, a spatial tessellation is used to aggregate flows of people moving among locations (the tiles of the tessellation). 
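
The mapping from coordinates to tiles can be sketched for the simplest case, a regular square grid. This is only an illustration of the idea, not how scikit-mobility builds its tessellations: the function `tile_of` is hypothetical, and the tile size is expressed in degrees rather than meters to keep the arithmetic short.

```python
from math import floor


def tile_of(lat, lng, min_lat, min_lng, tile_deg):
    """Return the (row, col) of the square tile that contains the point.

    The grid starts at (min_lat, min_lng) and each tile spans tile_deg degrees.
    Using floor on half-open intervals places every point in exactly one tile,
    so the tiles have no overlaps and no gaps.
    """
    row = floor((lat - min_lat) / tile_deg)
    col = floor((lng - min_lng) / tile_deg)
    return row, col


# Hypothetical grid origin and tile size
origin_lat, origin_lng, size = 39.98, 116.31, 0.005

print(tile_of(39.984094, 116.319236, origin_lat, origin_lng, size))
print(tile_of(39.9905, 116.3105, origin_lat, origin_lng, size))
```

Points that fall in the same tile receive the same `(row, col)` pair, which is what makes it possible to count movements between tiles.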

#### Creating tessellations given a city name and a tile size

##### Squared tessellations

In [None]:
from skmob.tessellation.tilers import tiler
from skmob.utils.plot import plot_gdf

In [None]:
tess_squared = tiler.get('squared', base_shape='Florence, Italy', meters=1000)
print("tiles = %s" %len(tess_squared))
tess_squared.head()

In [None]:
plot_gdf(tess_squared, zoom=11)

In [None]:
tess_squared = tiler.get('squared', base_shape='Florence, Italy', meters=200)
print("tiles = %s" %len(tess_squared))
tess_squared.head()

In [None]:
plot_gdf(tess_squared, zoom=11)

##### Hexagonal tessellation

In [None]:
tess_h3 = tiler.get('h3_tessellation', base_shape='Florence, Italy', meters=1000)
print("tiles = %s" %len(tess_h3))
tess_h3.head()

In [None]:
plot_gdf(tess_h3, zoom=11)

In [None]:
tess_h3 = tiler.get('h3_tessellation', base_shape='Florence, Italy', meters=200)
print("tiles = %s" %len(tess_h3))
tess_h3.head()

In [None]:
plot_gdf(tess_h3, zoom=11)

##### Voronoi tessellations

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi,voronoi_plot_2d
from geovoronoi import voronoi_regions_from_coords, points_to_coords
import numpy as np
import random
from shapely.geometry import Polygon, Point
import folium

In [None]:
def get_convex_hull(tess):
    polygon = tess.at[0, 'geometry']
    for tile in tess['geometry']:
        polygon = polygon.union(tile)
    return polygon.convex_hull

In [None]:
poly_ch = get_convex_hull(tess_squared)
print(type(poly_ch))
poly_ch

In [None]:
# Generate random points inside a polygon: draw candidates in its bounding box and keep those that fall within it
def polygon_random_points(poly, num_points):
    min_x, min_y, max_x, max_y = poly.bounds
    points = []
    while len(points) < num_points:
        random_point = Point([random.uniform(min_x, max_x), random.uniform(min_y, max_y)])
        if (random_point.within(poly)):
            points.append([random_point.x, random_point.y])
    return np.array(points)

In [None]:
# Choose the number of points desired. This example uses 5 points. 
points = polygon_random_points(poly_ch, 5)
points[:10]

In [None]:
def to_GeoDataFrame(region_polys):
    # region_polys is indexed by region number (0, 1, ..., n-1)
    n = len(region_polys)
    names = ['cell ' + str(i) for i in range(1, n + 1)]
    geometries = [region_polys[i] for i in range(n)]
    # build the GeoDataFrame in one step; 'EPSG:4326' replaces the deprecated {'init': ...} syntax
    gdf = gpd.GeoDataFrame({'name': names}, geometry=geometries, crs='EPSG:4326')
    return gdf


def get_voronoi_tessellation(poly_ch, points):
    vor = Voronoi(points, qhull_options='Qbb Qc Qx')
    #fig = voronoi_plot_2d(vor)
    #plt.show()
    region_polys, region_pts = voronoi_regions_from_coords(points, poly_ch)
    tess_voronoi = to_GeoDataFrame(region_polys)
    return tess_voronoi

In [None]:
tess_voronoi = get_voronoi_tessellation(poly_ch, points)
tess_voronoi.head()

In [None]:
plot_gdf(tess_voronoi, zoom=12)
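
The rule behind a Voronoi tessellation is that each tile contains the locations that are closer to its seed point than to any other seed. Here is a minimal sketch of that membership rule, assuming planar Euclidean distance and hypothetical seed coordinates; it does not build the tile polygons, which is the job of `voronoi_regions_from_coords` in the cells above.

```python
def nearest_seed(point, seeds):
    """Return the index of the seed closest to `point` (planar Euclidean distance)."""
    px, py = point
    return min(
        range(len(seeds)),
        key=lambda i: (seeds[i][0] - px) ** 2 + (seeds[i][1] - py) ** 2,
    )


# Hypothetical seed points, one per Voronoi tile
seeds = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

for point in [(1.0, 0.5), (3.0, 0.5), (0.5, 3.5)]:
    print(point, "-> tile of seed", nearest_seed(point, seeds))
```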

<a id="flowdataframe"></a>
### The `FlowDataFrame`

Origin-destination matrices, aka *flows*, are another common representation of mobility data. While trajectories refer to movements of single objects, flows refer to aggregated movements of objects between a set of locations. An example of flows is the daily commuting flows between the neighbourhoods of a city.

In scikit-mobility, an origin-destination matrix is described by a `FlowDataFrame`, an extension of the pandas DataFrame that has specific column names and data types. 

A row in a `FlowDataFrame` represents a flow of objects between two locations, described by three mandatory columns: 
- `origin` (any type), 
- `destination` (any type),
- `flow` (type: integer). 
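
To illustrate how individual movements become flows, the sketch below counts trips between pairs of locations and produces rows with the three mandatory columns. The trips and location names are hypothetical, and the code uses only the standard library rather than the `FlowDataFrame` class.

```python
from collections import Counter

# Hypothetical trips, each one an (origin, destination) pair of location identifiers
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")]

# Aggregate: one count per distinct (origin, destination) pair
flow_counts = Counter(trips)

# One row per pair, with the column names used by a FlowDataFrame
flows = [
    {"origin": origin, "destination": destination, "flow": count}
    for (origin, destination), count in sorted(flow_counts.items())
]

for row in flows:
    print(row)
```

Note that the direction matters: the flow from `A` to `B` and the flow from `B` to `A` are separate rows.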

As described in the Tessellation section above, mobility flows are aggregated among locations that are the tiles of a spatial tessellation, i.e., a covering of the territory with tiles that have no overlaps and no gaps. 

For this reason, each `FlowDataFrame` is associated with a spatial tessellation, a [geopandas](https://geopandas.org/) GeoDataFrame that contains two mandatory columns: 
- `tile_ID` (any type) indicates the identifier of a location; 
- `geometry` indicates the geometric shape that describes the location on a territory (e.g., a square, a hexagon, the shape of a neighborhood).

Each location identifier in the origin and destination columns of a `FlowDataFrame` must be present in the associated spatial tessellation. Otherwise, the library raises an exception. 

Similarly, scikit-mobility raises an exception if the type of the `origin` and `destination` columns in the `FlowDataFrame` and the type of
the `tile_ID` column in the associated tessellation are different.
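
The two consistency rules above can be sketched as a small standalone check. This is an illustration of the rules only, not the validation code of scikit-mobility: the function `check_flows`, the exception types, and the identifiers are assumptions made for this example.

```python
def check_flows(flows, tile_ids):
    """Raise if a flow refers to a location that the tessellation cannot match.

    flows: iterable of (origin, destination, flow) tuples
    tile_ids: iterable of tile identifiers of the tessellation
    """
    known = set(tile_ids)
    known_types = {type(tile_id) for tile_id in known}
    for origin, destination, _count in flows:
        for location in (origin, destination):
            # Rule 2: identifiers must have the same type as the tile identifiers
            if type(location) not in known_types:
                raise TypeError(f"{location!r} has a different type than the tile identifiers")
            # Rule 1: every identifier must exist in the tessellation
            if location not in known:
                raise ValueError(f"{location!r} is not a tile of the tessellation")


tile_ids = ["36001", "36003", "36005"]  # hypothetical identifiers

check_flows([("36001", "36003", 12)], tile_ids)  # consistent: no exception

try:
    check_flows([(36001, "36003", 12)], tile_ids)  # integer vs. string identifier
except TypeError as error:
    print("type mismatch:", error)
```

A frequent cause of the type mismatch is reading identifiers as integers from a CSV file while the tessellation stores them as strings.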

#### Creating a `FlowDataFrame`

Each `FlowDataFrame` is paired with a spatial tessellation. So, we must first create or load a spatial tessellation, which is a geopandas GeoDataFrame.



In [None]:
url = "https://raw.githubusercontent.com/scikit-mobility/tutorials/master/mda_masterbd2020/data/NY_counties_2011.geojson"
tessellation = gpd.read_file(url) # load a tessellation
tessellation.head()

In [None]:
plot_gdf(tessellation, zoom=6)

#### Tip
Once you have a `GeoDataFrame` or a `GeoSeries` (i.e., just the `geometry` column), you can build a squared tessellation on top of it.
(The h3 tessellation, instead, has a bug in this case.)

In [None]:
ny_tess_squared = tiler.get('squared', base_shape=tessellation, meters=10000)
print("tiles = %s" %len(ny_tess_squared))
ny_tess_squared.head()

In [None]:
plot_gdf(ny_tess_squared, zoom=7)

Then, we can create a `FlowDataFrame` from a file/url, specifying the spatial tessellation it refers to using argument `tessellation`. 

Also, you must specify the name of the column in the tessellation `GeoDataFrame` containing the identifier of the locations.

In [None]:
url = "https://github.com/scikit-mobility/tutorials/raw/master/mda_masterbd2020/data/NY_commuting_flows_2011.csv"
fdf = skmob.FlowDataFrame.from_file(url, tessellation=tessellation, tile_id='tile_id')
fdf.head()

In [None]:
fdf.dtypes

In [None]:
type(fdf)

You can access the spatial tessellation associated with the created `FlowDataFrame` using the attribute `.tessellation`.

In [None]:
# The tessellation is an attribute of the FlowDataFrame
fdf.tessellation.head()

In [None]:
fdf['origin'].unique()

In [None]:
tessellation['tile_id'].unique()

In [None]:
fdf.plot_flows()

In [None]:
fdf.plot_tessellation()

In [None]:
map_f = fdf.plot_tessellation()
fdf.plot_flows(map_f=map_f)