# Node radius study

This notebook investigates *how to approximate a node's radius*.

**Why?** The radius of a node determines which tweets get associated with that node, so it must be chosen carefully to keep the error low. A radius that is too large could collect too many tweets and bias the analysis, whereas a radius that is too small would lose information about potential flows.
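
To make the role of the radius concrete, here is a minimal sketch of how a tweet could be matched to a node. It assumes that each node is a circle (center plus radius) and that a tweet falling inside several circles goes to the nearest center. The `(name, lat, lon, radius_km)` tuples and the `assign_tweet` helper are hypothetical and are not the project's `Node` API. Distances use the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_tweet(tweet_lat, tweet_lon, nodes):
    """Return the name of the closest node whose circle contains the tweet, or None.

    `nodes` is a hypothetical list of (name, lat, lon, radius_km) tuples.
    """
    best_name, best_dist = None, float("inf")
    for name, lat, lon, radius_km in nodes:
        d = haversine_km(tweet_lat, tweet_lon, lat, lon)
        # Keep the node only if the tweet is inside its circle and closer than any previous match
        if d <= radius_km and d < best_dist:
            best_name, best_dist = name, d
    return best_name


# Illustrative nodes: centers taken from the node list printed below, 10 km radius each
example_nodes = [
    ("Lausanne", 46.516, 6.63282, 10),
    ("Geneve", 46.20222, 6.14569, 10),
]
print(assign_tweet(46.516, 6.63282, example_nodes))  # -> Lausanne
print(assign_tweet(46.36, 6.39, example_nodes))      # -> None (about 25 km from both centers)
```

Picking the nearest center is only one way to resolve overlapping circles. With radii that are too large, overlaps become frequent and this tie-breaking rule starts to drive the results.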

## Techniques


### Exact boundaries

One might wonder why we look for an approximation of a node's surface, since plenty of online tools and APIs provide the exact boundaries of a city.

Take this [Lausanne map](http://nominatim.openstreetmap.org/search.php?q=lausanne&polygon=1&viewbox=) as an example: Lausanne's borders are irregular and hard to summarize with a simple overall shape.
Furthermore, our ultimate goal is not to locate tweets exactly within their respective cities. We want to associate each tweet with one of the main nodes of Switzerland. For instance, tweets from Renens or Ecublens should go to the Lausanne node. Exact boundaries cannot do this, because these towns lie outside Lausanne's city limits.

### Approximation from exact boundaries

If using the exact boundaries of a city is not reasonable, we could instead try to derive an approximation from them.

One way would be to find the boundary point furthest from the center and use that distance as the node's radius.
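
A minimal sketch of this "furthest boundary point" radius follows. It assumes the boundary is available as a plain list of `(lat, lon)` vertices and the center as a `(lat, lon)` tuple; the `radius_from_boundary` helper is hypothetical. If the polygon comes from a GeoJSON source, note that GeoJSON orders coordinates as `[longitude, latitude]`, so the pairs would need swapping first.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_from_boundary(center, boundary):
    """Radius in km = distance from `center` to the furthest vertex of `boundary`.

    `center` is a (lat, lon) tuple; `boundary` is a list of (lat, lon) vertices
    (hypothetical input format, e.g. extracted from a city polygon).
    """
    lat0, lon0 = center
    return max(haversine_km(lat0, lon0, lat, lon) for lat, lon in boundary)


# Toy polygon around (0, 0), purely illustrative: the furthest vertices are 1 degree away
toy_boundary = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-0.5, 0.0)]
print(round(radius_from_boundary((0.0, 0.0), toy_boundary), 1))  # -> 111.2
```

Only the vertices are checked, which is enough here because the furthest point of a polygon from any fixed point is always one of its vertices.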

This time, take a [Geneva map](http://nominatim.openstreetmap.org/search.php?q=geneve&polygon=1&viewbox=) as a counterexample. Its boundaries are fairly circular, which is convenient. But with this technique we would only cover the city of Geneva itself, omitting the surrounding towns that could contribute many tweets to the same node.

### Approximation from the population

We use this [tool](http://obeattie.github.io/gmaps-radius/?lat=47.125275&lng=6.962085&z=11&u=km&r=5) to explore the different nodes and find the radius that best fits each one.

In [1]:
import sys
sys.path.append('../swiss_flows')
from node import Node

In [2]:
nodes = Node.generate_nodes(n_swiss_nodes=15, n_foreign_nodes=3, pop_threshold=15000)

In [5]:
for node in nodes:
    print(node)

[Node] Zurich, CH, ZH, (47.36667, 8.55), 15 km, 341730 people
[Node] Geneve, CH, GE, (46.202220000000004, 6.14569), 15 km, 183981 people
[Node] Basel, CH, BS, (47.55839, 7.57327), 15 km, 164488 people
[Node] Bern, CH, BE, (46.94809, 7.447439999999999), 15 km, 121631 people
[Node] Lausanne, CH, VD, (46.516000000000005, 6.63282), 15 km, 116751 people
[Node] Winterthur, CH, ZH, (47.50564, 8.72413), 15 km, 91908 people
[Node] Sankt Gallen, CH, SG, (47.42391, 9.37477), 15 km, 70572 people
[Node] Luzern, CH, LU, (47.05048, 8.30635), 15 km, 57066 people
[Node] Biel/Bienne, CH, BE, (47.13713, 7.24608), 15 km, 48614 people
[Node] Thun, CH, BE, (46.75118, 7.62166), 15 km, 42136 people
[Node] Koniz, CH, BE, (46.92436, 7.414569999999999), 15 km, 37196 people
[Node] La Chaux-de-Fonds, CH, NE, (47.09993, 6.8258600000000005), 15 km, 36825 people
[Node] Rapperswil, CH, SG, (47.225570000000005, 8.822280000000001), 15 km, 34776 people
[Node] Schaffhausen, CH, SH, (47.69732, 8.63493), 15 km, 33863 people

We tried different radii for each node. The aim was to maximize its coverage by including the very close small towns, while staying within reasonable bounds of the actual city area.

Relating the resulting radii to the city populations listed above gives the following scheme:

$$
r(pop) =
\begin{cases}
12\,\text{km} & \text{if } pop \geq 300\,000 \\
10\,\text{km} & \text{if } 100\,000 \leq pop < 300\,000 \\
8\,\text{km} & \text{if } 40\,000 \leq pop < 100\,000 \\
5\,\text{km} & \text{if } pop < 40\,000
\end{cases}
$$
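
The scheme above can be sketched as a plain function. This is a standalone illustration, not necessarily how the `Node` class implements it internally; the name `radius_from_population` is hypothetical.

```python
def radius_from_population(pop):
    """Return a node radius in km from the city population, following the scheme above."""
    if pop >= 300_000:
        return 12
    if pop >= 100_000:
        return 10
    if pop >= 40_000:
        return 8
    return 5


# Populations taken from the node list printed above
print(radius_from_population(341730))  # Zurich -> 12
print(radius_from_population(42136))   # Thun -> 8
print(radius_from_population(37196))   # Koniz -> 5
```

The thresholds are checked from the largest down, so each `if` only needs the lower bound of its interval.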

After modifying the code to apply this scheme:

In [3]:
for node in nodes:
    print(node)

[Node] Zurich, CH, ZH, (47.36667, 8.55), 12 km, 341730 people
[Node] Geneve, CH, GE, (46.202220000000004, 6.14569), 10 km, 183981 people
[Node] Basel, CH, BS, (47.55839, 7.57327), 10 km, 164488 people
[Node] Bern, CH, BE, (46.94809, 7.447439999999999), 10 km, 121631 people
[Node] Lausanne, CH, VD, (46.516000000000005, 6.63282), 10 km, 116751 people
[Node] Winterthur, CH, ZH, (47.50564, 8.72413), 8 km, 91908 people
[Node] Sankt Gallen, CH, SG, (47.42391, 9.37477), 8 km, 70572 people
[Node] Luzern, CH, LU, (47.05048, 8.30635), 8 km, 57066 people
[Node] Biel/Bienne, CH, BE, (47.13713, 7.24608), 8 km, 48614 people
[Node] Thun, CH, BE, (46.75118, 7.62166), 8 km, 42136 people
[Node] Koniz, CH, BE, (46.92436, 7.414569999999999), 5 km, 37196 people
[Node] La Chaux-de-Fonds, CH, NE, (47.09993, 6.8258600000000005), 5 km, 36825 people
[Node] Rapperswil, CH, SG, (47.225570000000005, 8.822280000000001), 5 km, 34776 people
[Node] Schaffhausen, CH, SH, (47.69732, 8.63493), 5 km, 33863 people
[Node] F