# French airports

## Data model

The data is extracted from the OpenFlights database:

* French airports
* Commercial flights between these airports


## Creating the airports

* Create the index
```cypher
CREATE INDEX ON  :Airport(iata)
```

* Import
```cypher
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/marcdexet-cnrs/graph_jupyter/master/french_airports.csv' as line
CREATE (:Airport {iata: line['iata'],
  city: line['city'],
  latitude: toFloat(line['latitude']),
  longitude: toFloat(line['longitude']),
  position: point({latitude: toFloat(line['latitude']), longitude: toFloat(line['longitude'])}),
  name: line['name']} )
```

## Creating the flights

Using an _idempotent_ `MERGE`, so that re-running the import does not duplicate relationships:

```cypher
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/marcdexet-cnrs/graph_jupyter/master/french_flights.csv' as line
MATCH (src:Airport {iata: line['source_airport']}), (dest:Airport {iata: line['destination_airport']})
WITH *, distance(src.position, dest.position) as distance
MERGE (src)-[r:TO {airlineId: line['airline_id'], distance: distance}]->(dest)
```
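
To make the `distance` property more concrete, here is a minimal Python sketch of the haversine formula, which is, as far as we understand, what `distance()` applies to WGS-84 points (result in metres). The function name `haversine_m`, the Earth radius and the sample coordinates are assumptions made for this illustration; they are not taken from the dataset.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two points given in degrees.

    radius_m is an assumed mean Earth radius; Neo4j may use a slightly
    different constant, so results can differ by a fraction of a percent.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to stay inside the domain of asin despite rounding.
    return 2 * radius_m * math.asin(math.sqrt(min(1.0, a)))

# Hypothetical coordinates, roughly Paris-Orly and Toulouse-Blagnac.
orly = (48.7233, 2.3794)
toulouse = (43.6291, 1.3638)
print(round(haversine_m(*orly, *toulouse) / 1000), "km")
```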

## Example queries

### Paris airports

```cypher
MATCH (p:Airport {city: 'Paris'}) RETURN p.name
```


### Airports whose IATA code starts with O

```cypher
MATCH (a:Airport) 
WHERE a.iata STARTS WITH 'O' 
RETURN a.name, a.iata
```

### Airports with the most departures

```cypher
MATCH (n:Airport)-[r:TO]->(m:Airport) 
WITH n.name as airport, count(r) AS nb 
RETURN airport, nb 
ORDER BY nb DESC
```
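
The query above computes the out-degree of each airport. As a rough illustration of the same aggregation outside the database, here is a small Python sketch; the `flights` list is made-up sample data, not an extract of the CSV file.

```python
from collections import Counter

# Hypothetical directed flights as (source, destination) IATA pairs.
flights = [
    ("ORY", "TLS"),
    ("ORY", "NCE"),
    ("CDG", "TLS"),
    ("ORY", "BOD"),
    ("TLS", "ORY"),
]

# Equivalent of count(r) grouped by the source airport.
departures = Counter(src for src, _ in flights)

# Equivalent of ORDER BY nb DESC.
ranking = departures.most_common()
print(ranking)
```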

### Routes from Orly to Toulouse in 1 to 3 legs

```cypher
MATCH p=(src:Airport {iata: 'ORY'})-[:TO*1..3]-(dest:Airport {iata: 'TLS'})
WHERE all(n in nodes(p)[1..-1] WHERE not n in  [src,dest])
RETURN DISTINCT length(p) as len, extract(n in nodes(p) | n.name)
ORDER BY len
```

### Airports not served from Orly

```cypher
MATCH (src:Airport {iata: 'ORY'}), (dest:Airport) 
WHERE NOT (src)-[:TO]->(dest) AND (dest)-[:TO]-() 
RETURN dest
```

The distances between airports would need to be computed.

### Shortest path between Tours and Avignon

```cypher
MATCH p=shortestpath((n:Airport {iata: "TUF"})-[r:TO*1..5]->(m:Airport {iata: "AVN"}))
RETURN length(p) as len, 
reduce(s='', rel in rels(p) |s+' ('+startnode(rel).name+')-['+id(rel)+']->('+endnode(rel).name+')') AS path
```
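
`shortestpath` minimises the number of legs, not the distance flown. The idea can be sketched in Python with a breadth-first search bounded to 5 hops, like the `*1..5` pattern above. The function `shortest_path` and the `graph` dictionary are hypothetical and only serve as an illustration; they do not reflect the real route network.

```python
from collections import deque

def shortest_path(graph, source, target, max_hops=5):
    """Return one path with the fewest legs from source to target, or None."""
    previous = {source: None}          # also serves as the visited set
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        if hops == max_hops:
            continue                   # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append((neighbour, hops + 1))
    return None

# Hypothetical directed network: airport -> airports reachable in one leg.
graph = {
    "TUF": ["MRS", "LYS"],
    "LYS": ["AVN", "MRS"],
    "MRS": ["ORY"],
    "ORY": ["AVN"],
}
path = shortest_path(graph, "TUF", "AVN")
print(path, len(path) - 1)
```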

### Taking the actual distance into account

```cypher
// Collect all IATA codes into a list
MATCH (s:Airport) WITH collect(s.iata) as iatalist

// Unwind the list
UNWIND(iatalist) AS iata
// Find the connected airports...
MATCH (s:Airport {iata: iata})-[:TO]->(t:Airport)
// ... that do not yet have a 'DISTANCIATED' relationship
WHERE NOT (s)-[:DISTANCIATED]-(t)

// Carry s and t forward together with the computed distance
WITH s, t, distance(s.position, t.position) as distance
// Create the 'DISTANCIATED' relationship
MERGE (s)-[:DISTANCIATED {distance: distance}]->(t)
```

### Shortest path by distance

```cypher
MATCH (n:Airport {iata: "TUF"}), (m:Airport {iata: "AVN"})
CALL apoc.algo.dijkstra(n, m, 'DISTANCIATED', 'distance') YIELD path as p, weight
RETURN length(p) as len, 
reduce(s='', rel in rels(p) |s+' ('+startnode(rel).name+')-['+id(rel)+']->('+endnode(rel).name+')') AS path, 
weight
```
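
For readers who want to see what a weighted shortest path does differently from a hop count, here is a minimal Python sketch of Dijkstra's algorithm. It is not the APOC implementation; the function `dijkstra`, the `graph` dictionary and its distances (in km) are invented for the example. In this sample the lightest route has three legs, although a two-leg route exists.

```python
import heapq

def dijkstra(graph, source, target):
    """Return (total_weight, path) of the lightest path, or (inf, []) if none."""
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            # Rebuild the path by following predecessors back to the source.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a better distance is already known
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf"), []

# Hypothetical weighted network: airport -> {neighbour: distance in km}.
graph = {
    "TUF": {"ORY": 200.0, "AJA": 900.0},
    "ORY": {"LYS": 390.0},
    "LYS": {"AVN": 200.0},
    "AJA": {"AVN": 400.0},
}
weight, path = dijkstra(graph, "TUF", "AVN")
print(weight, path)
```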

## Measures

### Betweenness

In graph theory and network theory, betweenness centrality (or simply betweenness) is a measure of the centrality of a vertex in a graph. It is equal to the number of times the vertex lies on the shortest path between any two other nodes of the graph. A node has high betweenness if it has a large influence on data transfers in the network, under the assumption that these transfers only follow shortest paths.


```cypher
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.betweenness(['TO'],airports, 'BOTH')
YIELD node, score
SET node.betweenness = score
RETURN node.name AS Airport, score ORDER BY score DESC LIMIT 200
```
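
The definition above can be sketched in Python with Brandes' algorithm for unweighted graphs. This is an illustration only, not what `apoc.algo.betweenness` runs internally. It counts ordered pairs of airports, and when several shortest paths exist between a pair, each intermediate node receives the fraction of those paths that pass through it. The function `betweenness` and the star-shaped `graph` are assumptions made for the example.

```python
from collections import deque

def betweenness(graph):
    """Betweenness of every node of an unweighted directed graph (Brandes)."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical hub-and-spoke network, with each route flown in both directions.
graph = {
    "ORY": ["TLS", "NCE", "BOD"],
    "TLS": ["ORY"],
    "NCE": ["ORY"],
    "BOD": ["ORY"],
}
score = betweenness(graph)
print(sorted(score.items(), key=lambda item: -item[1]))
```

In this sample every route between two spokes goes through the hub, so the hub collects the whole score and the spokes get none.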

# Appendix

## Data extraction


In [1]:
import pandas as pd
from IPython.display import display, HTML

flights_index = ['airline',
'airline_id',
'source_airport',
'source_airport_id',
'destination_airport',
'destination_airport_id',
'codeshare',
'stops',
'equipment']

airport_index = ['airport_id',
'name',
'city',
'country',
'iata',
'icao',
'latitude',
'longitude',
'altitude',
'timezone',
'dst',
'tz_database_time',
'zone_type',
'source']

airports = pd.read_csv('https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat',
                      names=airport_index)

flights = pd.read_csv('https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat', names=flights_index)




In [2]:
display(HTML(airports.head().to_html()))

| | airport_id | name | city | country | iata | icao | latitude | longitude | altitude | timezone | dst | tz_database_time | zone_type | source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Goroka Airport | Goroka | Papua New Guinea | GKA | AYGA | -6.08169 | 145.391998 | 5282 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 1 | 2 | Madang Airport | Madang | Papua New Guinea | MAG | AYMD | -5.20708 | 145.789001 | 20 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 2 | 3 | Mount Hagen Kagamuga Airport | Mount Hagen | Papua New Guinea | HGU | AYMH | -5.82679 | 144.296005 | 5388 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 3 | 4 | Nadzab Airport | Nadzab | Papua New Guinea | LAE | AYNZ | -6.569803 | 146.725977 | 239 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 4 | 5 | Port Moresby Jacksons International Airport | Port Moresby | Papua New Guinea | POM | AYPY | -9.44338 | 147.220001 | 146 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |


In [3]:
display(HTML(flights.head().to_html()))

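| | airline | airline_id | source_airport | source_airport_id | destination_airport | destination_airport_id | codeshare | stops | equipment |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2B | 410 | AER | 2965 | KZN | 2990 | | 0 | CR2 |
| 1 | 2B | 410 | ASF | 2966 | KZN | 2990 | | 0 | CR2 |
| 2 | 2B | 410 | ASF | 2966 | MRV | 2962 | | 0 | CR2 |
| 3 | 2B | 410 | CEK | 2968 | KZN | 2990 | | 0 | CR2 |
| 4 | 2B | 410 | CEK | 2968 | OVB | 4078 | | 0 | CR2 |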


### Simplifying the data

In [4]:
simple_flights = flights[['source_airport','destination_airport','airline_id','equipment']]
simple_airports = airports[['iata','name','city','country','latitude', 'longitude', 'altitude']]

### Restricting to French airports

In [5]:
french_airports = simple_airports[simple_airports['country'] == 'France']

### Cleaning up airport names

In [6]:
french_curated_airports = french_airports.copy()
french_curated_airports.name = french_curated_airports.name.apply(lambda s: s.replace('\\','').replace('"',''))

### Writing to a CSV file

In [7]:
valid_iata_names_mask = french_curated_airports['iata'].str.len() > 2

french_curated_airports[valid_iata_names_mask]\
[['iata','name','city','country','latitude','longitude','altitude']] \
.to_csv('french_airports.csv')

### Joining flights and airports

In [8]:
def join_flight_to_airport(flights, airports) :
    src = pd.merge(flights, airports, left_on=['source_airport'], right_on=['iata'])
    dest = pd.merge(src, airports, left_on=['destination_airport'], right_on=['iata'],suffixes=('_src','_dest'))
    return dest

In [9]:
france_flights = join_flight_to_airport(simple_flights, french_airports)

In [10]:
france_flights[['airline_id','source_airport', 'destination_airport']].to_csv('french_flights.csv')

## APOC

See the list of procedures at https://neo4j-contrib.github.io/neo4j-apoc-procedures/