In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
import plotly.express as px
import os
import numpy.linalg as la

import sklearn.linear_model

import matplotlib.pyplot as plt
%matplotlib inline

# Trip distribution

The purpose of this notebook is to learn how to work with simple gravity models for trip distribution.

First, let's load the population, commuting, and municipality data from the first exercise:

In [None]:
df_employment = pd.read_parquet("employment.parquet")
df_commutes = pd.read_parquet("commutes.parquet")

# geopandas
df_municipalities = gpd.read_parquet("municipalities.parquet")

**Task**: As before, reduce all data sets to the area of Île-de-France.
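
As a rough illustration of the filtering step, the sketch below works on a toy table rather than the real files. The column name `municipality_id` and the toy values are assumptions; the only fact relied on is that the first two characters of a five-character municipality code identify the department.

```python
import pandas as pd

idf_departments = ["75", "92", "93", "94", "95", "77", "91", "78"]

# Toy data; "municipality_id" is a hypothetical column name
df_toy = pd.DataFrame({
    "municipality_id": ["75101", "92004", "69001", "13055", "77288"],
    "employment": [1000, 400, 800, 600, 50],
})

# Keep only rows whose code starts with one of the Île-de-France prefixes
in_idf = df_toy["municipality_id"].str[:2].isin(idf_departments)
df_toy_idf = df_toy[in_idf]
```

The same boolean-mask pattern would apply to both the origin and the destination column of a commuting table, so that only flows inside the region remain.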

In [None]:
idf_departments = ["75", "92", "93", "94", "95", "77", "91", "78"]

In [None]:
# Insert code here ...
# ...


**Task**: Keeping track of the order of the data will be important. Set up a fixed list of municipalities and adjust the indices of all data sets accordingly. Take particular care with the commuting data set.

Hint: Make use of `pd.MultiIndex.from_product`
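
To show what the hint is getting at, here is a minimal sketch on a three-zone toy example. The column names `origin_id` and `destination_id` are assumptions (only `weight` is named in this notebook), and the zone labels are made up.

```python
import pandas as pd

municipalities_toy = ["A", "B", "C"]  # fixed, ordered list of zones

# Toy commuting table; "origin_id" and "destination_id" are hypothetical names
df_commutes_toy = pd.DataFrame({
    "origin_id": ["A", "A", "C"],
    "destination_id": ["A", "B", "A"],
    "weight": [10.0, 5.0, 2.0],
})

# Every (origin, destination) pair, with the destination varying fastest
full_index = pd.MultiIndex.from_product(
    [municipalities_toy, municipalities_toy],
    names=["origin_id", "destination_id"],
)

# Reindexing fixes the row order and adds rows for pairs that were not observed
df_full = (
    df_commutes_toy
    .set_index(["origin_id", "destination_id"])
    .reindex(full_index)
)
```

Because the row order is now fully determined by `municipalities_toy`, later steps can convert the `weight` column to a matrix without looking up labels again.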

In [None]:
# Insert code here ...
# ...

# municipalities = 

# df_employment = 
# df_commutes = 


Have a look at your data sets after reindexing. Do you notice anything special?

**Task**: How many flow values can we theoretically have (between all zones) in Île-de-France? For how many do we have actual values?

In [None]:
# Insert code here ...
# ...


**Task**: Replace missing values by zero (zero commuters).
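
The counting and zero-filling steps can be sketched on a toy series as follows. The values are made up and stand in for the `weight` column after reindexing three zones.

```python
import numpy as np
import pandas as pd

# Toy reindexed flows for 3 zones: 9 possible pairs, unobserved ones are NaN
weights = pd.Series([10.0, 5.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan])

n_possible = len(weights)                 # number of zone pairs, here 3 * 3
n_observed = int(weights.notna().sum())   # pairs that carry an actual value

# Treat unobserved pairs as zero commuters
weights_filled = weights.fillna(0.0)
```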

In [None]:
# Insert code here ...
# ...


## Friction term

The first step in setting up our model is to obtain the friction term.

**Task**: The gravity model relates the different places in the study area to one another. The friction term describes how easy it is to reach one municipality from another. The first step is, therefore, to obtain the distances between all zones. Complete the following code to set up a distance matrix `distance_matrix`.
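
For orientation, a minimal sketch of the row-wise distance computation on three made-up centroids is given below. It assumes the coordinates are in a projected system with metres as units, so that Euclidean distances are meaningful.

```python
import numpy as np
import numpy.linalg as la

# Toy centroids (x, y) in metres, one row per zone
centroids_toy = np.array([[0.0, 0.0], [3000.0, 4000.0], [6000.0, 8000.0]])

n = len(centroids_toy)
distance_toy = np.zeros((n, n))

for k in range(n):
    # Offsets from zone k to every zone, shape (n, 2)
    offsets = centroids_toy - centroids_toy[k]
    # The norm along the coordinate axis gives one distance per zone
    distance_toy[k, :] = la.norm(offsets, axis=1)
```

The result is symmetric with zeros on the diagonal, which is a quick sanity check for the real matrix as well.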

In [None]:
centroids = df_municipalities["geometry"].centroid
centroids = np.array([centroids.x, centroids.y] ).T

distance_matrix = np.zeros((len(municipalities), len(municipalities)))

for k in range(len(municipalities)):
    ###  Insert code here
    # ...
    
    # distance_matrix[k,:] = # Calculate the Euclidean distance, you may also try numpy.linalg.norm
    

**Task:** Plot the distance matrix (it may take a while) using `matplotlib`'s `pcolor`.

In [None]:
# Insert code here ...
# ...


**Task**: Analogously to the distance matrix, we need a flow matrix indicating all observed movements (`weight`) between all zones. Obtain this matrix by transforming the commuting data set into a matrix.

Hint: Have a look at `numpy.ndarray.reshape`
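
A small sketch of the reshape, assuming the flows are already ordered by a `pd.MultiIndex.from_product` index over (origin, destination). The level names and values are made up.

```python
import numpy as np
import pandas as pd

zones = ["A", "B", "C"]
index = pd.MultiIndex.from_product(
    [zones, zones], names=["origin_id", "destination_id"]  # hypothetical names
)
weights = pd.Series(np.arange(9, dtype=float), index=index)

n = len(zones)

# Row-major reshape: each row is one origin, each column one destination
flow_toy = weights.to_numpy().reshape((n, n))
```

This only works because the destination level varies fastest in the index; with any other row order the matrix entries would end up in the wrong cells.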

In [None]:
# Insert code here ...
# ...

# flow_matrix = 


**Task**: Now we obtain the data to set up the friction model:
- Bin the distances into about twenty distance bins and sum up the commuters you find in each distance bin
- Plot how much flow occurs at every distance bin
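
The binning step can be sketched as follows, using four toy bins instead of twenty. The column name `distance_class` is an assumption introduced for the example.

```python
import numpy as np
import pandas as pd

distance_classes_toy = np.arange(4) * 5000  # lower bin edges: 0, 5000, 10000, 15000

df_toy = pd.DataFrame({
    "distance": [0.0, 1200.0, 4999.0, 5000.0, 7500.0, 12000.0, 30000.0],
    "flow": [50.0, 20.0, 10.0, 8.0, 6.0, 3.0, 1.0],
})

# np.digitize returns i with edges[i-1] <= x < edges[i]; subtracting 1 gives
# a zero-based bin index. The last bin is open-ended.
df_toy["distance_class"] = np.digitize(
    df_toy["distance"].to_numpy(), distance_classes_toy
) - 1

# Sum the flow per bin; bins without any observation get zero
flow_per_class = (
    df_toy.groupby("distance_class")["flow"].sum()
    .reindex(range(len(distance_classes_toy)), fill_value=0.0)
)
```

Dividing `flow_per_class` by its total would then give the empirical distribution over distance bins.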

In [None]:
df_friction = pd.DataFrame({
    "distance": distance_matrix.flatten(),
    "flow": flow_matrix.flatten()
})

distance_classes = np.arange(20) * 5000

# Hint: Check numpy.digitize

# Insert code here ...
# ...


**Task**: Now divide the flow obtained in each bin by the total flow to obtain an empirical probability density function (pdf). Plot the function on a linear scale and with the probability on a logarithmic scale. What do you observe?

In [None]:
# Insert code here ...
# ...

# pdf = 


**Task**: In logarithmic space, manually (or automatically, if you like), fit a linear function on the graph that you see.
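
One possible way to do the fit automatically is a degree-1 `np.polyfit` on the logarithm of the pdf. The sketch below uses synthetic data with a known exponential decay, so the numbers have no relation to the real commuting data.

```python
import numpy as np

# Toy bin centres (metres) and a synthetic, exponentially decaying pdf
distance_toy = np.arange(10) * 5000.0 + 2500.0
pdf_toy = np.exp(-1.0 - 1e-4 * distance_toy)
pdf_toy = pdf_toy / pdf_toy.sum()

# Empty bins would give log(0) = -inf, so only positive values are fitted
mask = pdf_toy > 0

# A degree-1 polyfit returns [slope, intercept]
b, a = np.polyfit(distance_toy[mask], np.log(pdf_toy[mask]), 1)

# Back in linear space the fitted model is exponential in distance
fitted_pdf = np.exp(a + b * distance_toy)
```

A straight line in logarithmic space corresponds to an exponential decay in linear space, which is why the last line exponentiates the fitted line.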

In [None]:
# Insert code here ...
# ...

# a = ?
# b = ?

# logy = a + b * distance  # linear in distance; logy approximates np.log(pdf)


**Task**: Now plot the initial data along with your fitted curve in linear space. What does your friction model look like?

In [None]:
# Insert code here ...
# ...


**Task**: Based on your distance matrix and your friction model, calculate a friction matrix:
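
Assuming the friction model is the exponential of a line fitted in logarithmic space, it can be applied element-wise to the whole distance matrix. The parameter values below are hypothetical.

```python
import numpy as np

# Hypothetical fit parameters: intercept a and slope b (per metre)
a, b = -2.0, -1e-4

distance_toy = np.array([
    [0.0, 5000.0],
    [5000.0, 0.0],
])

# NumPy broadcasts the scalars, so no loop over zone pairs is needed
friction_toy = np.exp(a + b * distance_toy)
```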

In [None]:
# Insert code here ...
# ...

# friction_matrix = 


## Single-constrained gravity model

Based on the friction model, it is now possible to set up a single-constrained gravity model.

As in the example during the lecture, we assume the following attraction model:

$$
A_i = w_i^\lambda
$$

with $w_i$ being the employment in zone $i$ and $\lambda$ the parameter we need to obtain.

**Task**: For a parameter of $\lambda = 0.5$ calculate the attraction term. Treat NaN values as "no employment" (= 0).

In [None]:
# Insert code here ...
# ...

# attraction = 


**Task**: The single-constrained gravity model is defined as

$$
F_{ij} = \frac{A_j \cdot \rho_{ij}}{\sum_k A_k \cdot \rho_{ik}} O_i
$$

The friction term $\rho_{ij}$ is already known in our example. $A_j$ has been calculated in the last task for one specific parameter $\lambda$. As the next exercise, calculate $F_{ij}$ according to the formula above.
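
To make the formula concrete, here is a sketch on three toy zones with made-up employment, friction, and origin totals. It follows the row-by-row structure of the formula: each origin distributes its total $O_i$ over the destinations in proportion to $A_j \cdot \rho_{ij}$.

```python
import numpy as np

employment_toy = np.array([100.0, 400.0, np.nan])  # NaN = no employment
friction_toy = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.5],
    [0.2, 0.5, 1.0],
])
origins_toy = np.array([30.0, 60.0, 10.0])

lbda = 0.5
# Attraction A_j = w_j ** lambda, with missing employment treated as zero
attraction_toy = np.nan_to_num(employment_toy, nan=0.0) ** lbda

n = len(origins_toy)
F_toy = np.zeros((n, n))

for i in range(n):
    numerator = attraction_toy * friction_toy[i, :]  # A_j * rho_ij for all j
    denominator = numerator.sum()                    # sum over destinations
    F_toy[i, :] = numerator / denominator * origins_toy[i]
```

By construction every row of `F_toy` sums to the corresponding origin total, while the column sums are not constrained.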

In [None]:
origins = np.sum(flow_matrix, axis = 1)

F = np.zeros((len(municipalities), len(municipalities)))

# Insert code here ...
# ...

#for i in range(len(municipalities)):
    # numerator =
    # denominator =

    # F[i,:] = 


**Task**: Create a scatter plot where you compare the obtained flows $F_{ij}$ with the reference flows. Think about how to reshape the matrix.

In [None]:
df_comparison = df_commutes.copy()

# Insert code here ...
# ...


**Task**: Now wrap the code of the last cells in a loop and test various values for $\lambda$. Plot the difference with the reference data $\sum_{ij} (F_{ij} - \hat F_{ij})^2$ as a function of $\lambda$.
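
A sketch of such a parameter sweep is shown below. The helper `single_constrained_flows` is a hypothetical name, and the reference flows are generated synthetically with $\lambda = 1$ so that the expected minimum is known in advance.

```python
import numpy as np

def single_constrained_flows(employment, friction, origins, lbda):
    """Hypothetical helper: vectorized single-constrained gravity model."""
    attraction = employment ** lbda
    weights = attraction[np.newaxis, :] * friction  # A_j * rho_ij
    shares = weights / weights.sum(axis=1, keepdims=True)
    return shares * origins[:, np.newaxis]

employment_toy = np.array([100.0, 400.0, 900.0])
friction_toy = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.5],
    [0.2, 0.5, 1.0],
])
origins_toy = np.array([30.0, 60.0, 10.0])

# Synthetic reference flows, generated with a known lambda of 1.0
reference_toy = single_constrained_flows(employment_toy, friction_toy, origins_toy, 1.0)

lambdas = np.linspace(0.1, 2.0, 20)
objectives = []

for lbda in lambdas:
    F = single_constrained_flows(employment_toy, friction_toy, origins_toy, lbda)
    objectives.append(np.sum((F - reference_toy) ** 2))  # squared error

best_lambda = lambdas[int(np.argmin(objectives))]
```

Plotting `objectives` against `lambdas` shows the shape of the error curve; with real data the minimum will generally not be zero.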

In [None]:
# Insert code here ...
# ...

# lambdas = np.linspace(0.1, 2.0, 10)
# objectives = []

# for lbda in lambdas:
    # ...


**Task**: Using the obtained parameter, calculate the resulting flow matrix from your model, then perform a scatter plot again to see the model fit. 

In [None]:
# Insert code here ...
# ...


What do you observe? Which municipalities could be the outliers at the bottom?

**Task**: Provide the same plot and color in red the flows that go from a municipality to itself.

In [None]:
# Insert code here ...
# ...


**Task**: Try to estimate a new model using the following modified friction term:

$$
F_{ij} = \begin{cases}
    w_i^\lambda & \text{if } i \neq j \\
    w_i^\lambda + \gamma & \text{if } i = j
\end{cases}
$$

Which parameters $\lambda$ and $\gamma$ work best?

Hint: Keep your existing friction matrix in `friction_matrix` and create new matrices on the fly for testing.

In [None]:
# Insert code here ...
# ...


**Task**: Show the fit of your new model in a scatter plot.

In [None]:
# Insert code here ...
# ...


**Task**: Also create scatter plots for the fit in terms of origin counts and destination counts. What do you expect? What do you observe?

In [None]:
# Insert code here ...
# ...


What do you observe?

## Double-constrained gravity model

Let's move on to the double-constrained model. In that model, both the origin and destination flows $O_i$ and $D_j$ are known and we aim to automatically find the attraction and production terms $A_j$ and $P_i$.

The model is defined as follows:

$$
F_{ij} = \frac{O_i \cdot D_j \cdot \rho_{ij}}{(\sum_i P_i \cdot \rho_{ij})\cdot (\sum_j A_j \cdot \rho_{ij})}
$$

The attraction and production terms are obtained by iteratively executing:

$$
P_i = \frac{O_i}{\sum_j A_j \cdot \rho_{ij}}
$$
$$
A_j = \frac{D_j}{\sum_i P_i \cdot \rho_{ij}}
$$

**Task**: Implement the double-constrained gravity model to calculate the production and attraction terms.
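
The alternating updates can be sketched on a three-zone toy example as below. All numbers are made up, the origin and destination totals are chosen to have the same sum, and matrix-vector products replace the explicit loops over $i$ and $j$. The flows are assembled as $P_i \cdot A_j \cdot \rho_{ij}$, the form under which the two update equations reproduce the row and column totals. The sketch also records the sums of both terms per iteration, which can be plotted afterwards.

```python
import numpy as np

friction_toy = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.5],
    [0.2, 0.5, 1.0],
])
origins_toy = np.array([30.0, 60.0, 10.0])
destinations_toy = np.array([20.0, 50.0, 30.0])  # same total as the origins

production = np.ones(len(origins_toy))
attraction = np.ones(len(destinations_toy))

production_sums = []
attraction_sums = []

for iteration in range(500):
    # P_i = O_i / sum_j A_j * rho_ij
    production = origins_toy / (friction_toy @ attraction)
    # A_j = D_j / sum_i P_i * rho_ij
    attraction = destinations_toy / (friction_toy.T @ production)

    production_sums.append(production.sum())
    attraction_sums.append(attraction.sum())

# Resulting flows F_ij = P_i * A_j * rho_ij
F_toy = production[:, np.newaxis] * attraction[np.newaxis, :] * friction_toy
```

Checking `F_toy.sum(axis=1)` against the origins and `F_toy.sum(axis=0)` against the destinations is a simple way to verify that both constraints hold after the iterations.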

In [None]:
### Insert code here
# ...

# origins = # Format properly
# destinations = # Format properly

# production = # Initialize to one
# attraction = # Initialize to one

# for iteration in range(500):
#    for i in range(len(municipalities)):
        # ...

#    for j in range(len(municipalities)):
        # ...


**Task**: Extend the example from above and plot the sum of the attraction term and the sum of the production term over the iterations. What do you observe? Do they stabilize?

In [None]:
### Insert code here
# ...


**Task**: Calculate the resulting flows of your model. Compare the flows with the reference data, and also compare origin and destination flows in two additional plots.

In [None]:
### Insert code here
# ...


**Task**: Do you remember the initial data frame `df_commutes`? Add a new column to this data frame into which you write your latest modeling results. Show the data frame.
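
Writing a flow matrix back into a pair-indexed data frame could look like the sketch below. The column name `model_flow` and the index level names are assumptions, and the approach relies on the data frame being ordered by (origin, destination) with the destination varying fastest.

```python
import numpy as np
import pandas as pd

zones = ["A", "B"]
index = pd.MultiIndex.from_product(
    [zones, zones], names=["origin_id", "destination_id"]  # hypothetical names
)
df_toy = pd.DataFrame({"weight": [10.0, 5.0, 2.0, 8.0]}, index=index)

# Toy model result: rows are origins, columns are destinations
F_toy = np.array([
    [9.0, 6.0],
    [3.0, 7.0],
])

# Row-major flattening matches the from_product ordering of the index
df_toy["model_flow"] = F_toy.flatten()
```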

In [None]:
### Insert code here
# ...


**Congratulations!** You can now solve exercise 2.3 of the course project.