In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import csv

### Reading in the data

### Airlines

As of January 2012, the OpenFlights Airlines Database contains 5888 airlines. Each entry contains the following information:

**Airline ID**	Unique OpenFlights identifier for this airline.

**Name**	Name of the airline.

**Alias**	Alias of the airline. For example, All Nippon Airways is commonly known as "ANA".

**IATA**	2-letter IATA code, if available.

**ICAO**	3-letter ICAO code, if available.

**Callsign**	Airline callsign.

**Country**	Country or territory where the airline is based. See Countries to cross-reference to ISO 3166-1 codes.

**Active**	"Y" if the airline is or has until recently been operational, "N" if it is defunct. This field is not reliable: in particular, major airlines that stopped flying long ago but have not had their IATA code reassigned (e.g. Ansett/AN) will incorrectly show as "Y".

The data is UTF-8 encoded. The special value `\N` is used for "NULL" to indicate that no value is available, and is understood automatically by MySQL if imported.

Notes: Airlines with null codes/callsigns/countries generally represent user-added airlines. Since the data is intended primarily for current flights, defunct IATA codes are generally not included. For example, "Sabena" is not listed with an SN IATA code, since "SN" is presently used by its successor, Brussels Airlines.
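Because `\N` marks missing values, pandas can be told to treat it as NaN at load time instead of keeping the literal string (as it does in the `airlines.head()` output below). The sketch below uses `read_csv`'s `na_values` parameter; the three records are modeled on the rows shown in that output and held in an in-memory buffer so the example runs without the data file. Converting `Active` to booleans is an extra step, not something the original notebook does.

```python
import io

import pandas as pd

# Three records in the airlines layout: no header row, \N marks NULL
# (a raw string keeps the backslashes literal)
sample = io.StringIO(r'''1,"Private flight",\N,"-",\N,\N,\N,"Y"
2,"135 Airways",\N,"","GNL","GENERAL","United States","N"
3,"1Time Airline",\N,"1T","RNX","NEXTIME","South Africa","Y"
''')

headers = ['ID', 'Name', 'Alias', 'IATA', 'ICAO', 'Callsign', 'Country', 'Active']

# na_values adds \N to the strings pandas already treats as missing
airlines = pd.read_csv(sample, names=headers, na_values=[r'\N'])

# Turn the Y/N flag into booleans (the field itself is not fully reliable)
airlines['Active'] = airlines['Active'].eq('Y')

print(airlines['Alias'].isna().all())  # True
print(airlines['ICAO'].notna().sum())  # 2
```

With `\N` mapped to NaN, methods such as `isna()`, `dropna()` and `count()` see those cells as missing.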

In [4]:
input_data = "DATA/airlines.txt"

headers = ['ID', 'Name', 'Alias', 'IATA', 'ICAO', 'Callsign', 'Country', 'Active']

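# The file has no header row, so the column names are supplied explicitly
airlines = pd.read_csv(input_data, sep=',', names=headers)
# Drop the first row, a placeholder "Unknown" airline (ID -1) in the OpenFlights data
airlines = airlines.iloc[1:]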

In [5]:
airlines.shape

(6161, 8)

In [6]:
airlines.head()

| | ID | Name | Alias | IATA | ICAO | Callsign | Country | Active |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Private flight | \N | - | | | | Y |
| 2 | 2 | 135 Airways | \N | | GNL | GENERAL | United States | N |
| 3 | 3 | 1Time Airline | \N | 1T | RNX | NEXTIME | South Africa | Y |
| 4 | 4 | 2 Sqn No 1 Elementary Flying Training School | \N | | WYT | | United Kingdom | N |
| 5 | 5 | 213 Flight Unit | \N | | TFU | | Russia | N |


### Airports

**Airport ID**	Unique OpenFlights identifier for this airport. 

**Name** Name of airport. May or may not contain the City name.

**City**	Main city served by airport. May be spelled differently from Name.

**Country**	Country or territory where airport is located. See Countries to cross-reference to ISO 3166-1 codes.

**IATA**	3-letter IATA code. Null if not assigned/unknown.

**ICAO**	4-letter ICAO code. Null if not assigned.

**Latitude**	Decimal degrees, usually to six significant digits. Negative is South, positive is North.

**Longitude**	Decimal degrees, usually to six significant digits. Negative is West, positive is East.

**Altitude**	In feet.

**Timezone**	Hours offset from UTC. Fractional hours are expressed as decimals, e.g. India is 5.5.

**DST**	Daylight saving time. One of E (Europe), A (US/Canada), S (South America), O (Australia), Z (New Zealand), N (None) or U (Unknown).

**Tz database timezone**	Timezone in "tz" (Olson) format, e.g. "America/Los_Angeles".

**Type**	Type of the airport. Value "airport" for air terminals, "station" for train stations, "port" for ferry terminals and "unknown" if not known. In airports.csv, only type=airport is included.

**Source**	Source of this data. "OurAirports" for data sourced from OurAirports, "Legacy" for old data not matched to OurAirports (mostly DAFIF), "User" for unverified user contributions. In airports.csv, only source=OurAirports is included.

The data is UTF-8 encoded.

Note: Rules for daylight saving time change from year to year and from country to country. The current data is an approximation for 2009, built on a country level. Most airports in DST-less regions in countries that generally observe DST (e.g. Arizona and Hawaii in the USA, the Northern Territory and Queensland in Australia, parts of Canada) are marked incorrectly.
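The coordinate and timezone fields are plain numbers, so a little arithmetic turns them into distances and clock offsets. This is an illustrative sketch, not part of the OpenFlights tooling: `haversine_km` and `utc_offset` are hypothetical helper names, the distance assumes a spherical Earth with a 6371 km mean radius, and `utc_offset` yields only the standard offset, because DST information in this dataset is approximate (see the note above).

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def utc_offset(hours):
    """Fixed tzinfo for a fractional-hour Timezone value such as 5.5 (ignores DST)."""
    return timezone(timedelta(hours=hours))


# Goroka (GKA) to Madang (MAG), using the coordinates from airports.head()
d = haversine_km(-6.08169, 145.391998, -5.20708, 145.789001)
print(f"GKA-MAG: {d:.0f} km")

# 12:00 UTC expressed in India's standard offset of 5.5 hours
noon_utc = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
print(noon_utc.astimezone(utc_offset(5.5)).strftime("%H:%M"))  # 17:30
```

Latitude and longitude are signed (South and West negative), so no extra sign handling is needed before converting to radians.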

In [7]:
input_data = "DATA/airports.txt"

headers = ['ID', 'Name', 'City', 'Country', 'IATA', 'ICAO', 'Latitude', 'Longitude', 'Altitude', 'Timezone', 'DST', 'Tz database timezone', 'Type', 'Source']

airports = pd.read_csv(input_data, sep=',', names=headers)

In [8]:
airports.shape

(7698, 14)

In [9]:
airports.head()

| | ID | Name | City | Country | IATA | ICAO | Latitude | Longitude | Altitude | Timezone | DST | Tz database timezone | Type | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Goroka Airport | Goroka | Papua New Guinea | GKA | AYGA | -6.08169 | 145.391998 | 5282 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 1 | 2 | Madang Airport | Madang | Papua New Guinea | MAG | AYMD | -5.20708 | 145.789001 | 20 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 2 | 3 | Mount Hagen Kagamuga Airport | Mount Hagen | Papua New Guinea | HGU | AYMH | -5.82679 | 144.296005 | 5388 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 3 | 4 | Nadzab Airport | Nadzab | Papua New Guinea | LAE | AYNZ | -6.569803 | 146.725977 | 239 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 4 | 5 | Port Moresby Jacksons International Airport | Port Moresby | Papua New Guinea | POM | AYPY | -9.44338 | 147.220001 | 146 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |


### Routes

As of June 2014, the OpenFlights/Airline Route Mapper Route Database contains 67663 routes between 3321 airports on 548 airlines spanning the globe. Each entry contains the following information:

**Airline**	2-letter (IATA) or 3-letter (ICAO) code of the airline.

**Airline ID**	Unique OpenFlights identifier for airline (see Airline).

**Source airport**	3-letter (IATA) or 4-letter (ICAO) code of the source airport.

**Source airport ID**	Unique OpenFlights identifier for source airport (see Airport).

**Destination airport**	3-letter (IATA) or 4-letter (ICAO) code of the destination airport.

**Destination airport ID**	Unique OpenFlights identifier for destination airport (see Airport).

**Codeshare**	"Y" if this flight is a codeshare (that is, operated not by the listed airline but by another carrier), empty otherwise.

**Stops**	Number of stops on this flight ("0" for direct).

**Equipment**	3-letter codes for plane type(s) generally used on this flight, separated by spaces.

The data is UTF-8 encoded. The special value `\N` is used for "NULL" to indicate that no value is available, and is understood automatically by MySQL if imported.

Notes:

- Routes are directional: if an airline operates services from A to B and from B to A, both A-B and B-A are listed separately.
- Routes where one carrier operates both its own and codeshare flights are listed only once.
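Because routes are directional, a return service shows up as two records. The sketch below uses plain Python to find airport pairs served in both directions and to split the space-separated `Equipment` field. The first five records copy the `routes.head()` output below; the sixth is an invented return leg (with an invented second aircraft code) added only so the two-way check has something to find.

```python
from collections import Counter

# (airline, source, destination, equipment); the first five mirror routes.head(),
# the last one is a hypothetical return leg added for illustration only
sample_routes = [
    ("2B", "AER", "KZN", "CR2"),
    ("2B", "ASF", "KZN", "CR2"),
    ("2B", "ASF", "MRV", "CR2"),
    ("2B", "CEK", "KZN", "CR2"),
    ("2B", "CEK", "OVB", "CR2"),
    ("2B", "KZN", "AER", "CR2 AT7"),  # hypothetical
]

# Each record is one direction, so A->B and B->A are distinct pairs
directed = {(src, dst) for _, src, dst, _ in sample_routes}

# Pairs whose reverse is also present are served in both directions
two_way = {tuple(sorted(p)) for p in directed if (p[1], p[0]) in directed}

# Equipment holds space-separated aircraft type codes
equipment = Counter(code for *_, eq in sample_routes for code in eq.split())

print(len(directed), len({frozenset(p) for p in directed}))  # 6 5
print(two_way)  # {('AER', 'KZN')}
print(equipment.most_common())  # [('CR2', 6), ('AT7', 1)]
```

The gap between 6 directed pairs and 5 undirected connections is exactly the one route listed in both directions.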

In [10]:
input_data = "DATA/routes.txt"

headers = ['Airline', 'Airline ID', 'Source', 'Source ID', 'Destination', 'Destination ID', 'Codeshare', 'Stops', 'Equipment']

routes = pd.read_csv(input_data, sep=',', names=headers)

In [11]:
routes.shape

(67663, 9)

In [12]:
routes.head()

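| | Airline | Airline ID | Source | Source ID | Destination | Destination ID | Codeshare | Stops | Equipment |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2B | 410 | AER | 2965 | KZN | 2990 | | 0 | CR2 |
| 1 | 2B | 410 | ASF | 2966 | KZN | 2990 | | 0 | CR2 |
| 2 | 2B | 410 | ASF | 2966 | MRV | 2962 | | 0 | CR2 |
| 3 | 2B | 410 | CEK | 2968 | KZN | 2990 | | 0 | CR2 |
| 4 | 2B | 410 | CEK | 2968 | OVB | 4078 | | 0 | CR2 |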


### Graph creation

In [13]:
import networkx as nx

In [14]:
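G = nx.DiGraph()

# One node per airport, keyed by its integer OpenFlights airport ID
for _, row in airports.iterrows():
    G.add_node(row['ID'], name=row['Name'], city=row['City'], country=row['Country'], iata=row['IATA'], icao=row['ICAO'], lat=row['Latitude'], lon=row['Longitude'])

# Route airport IDs can be \N (unknown), which makes these columns load as strings;
# coerce them to numbers so they match the airport node keys, and skip unknown endpoints
src = pd.to_numeric(routes['Source ID'], errors='coerce')
dst = pd.to_numeric(routes['Destination ID'], errors='coerce')
known = src.notna() & dst.notna()

# One directed edge per route; repeated A->B routes collapse into one edge (last airline wins)
for s, d, airline in zip(src[known].astype(int), dst[known].astype(int), routes.loc[known, 'Airline ID']):
    G.add_edge(s, d, airline=airline)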
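Once the graph exists, in-degree and out-degree count the route edges arriving at and leaving each airport, and path queries respect route direction. The sketch below runs on a toy `DiGraph` built from the five routes in `routes.head()` (all airline ID 410) rather than the full dataset; the `iata` mapping is copied from that output, and the same calls apply to the full `G`, whose nodes already carry an `iata` attribute from the loop above.

```python
import networkx as nx

# Routes from routes.head(), as (source airport ID, destination airport ID)
edges = [(2965, 2990), (2966, 2990), (2966, 2962), (2968, 2990), (2968, 4078)]
iata = {2965: "AER", 2966: "ASF", 2962: "MRV", 2968: "CEK", 2990: "KZN", 4078: "OVB"}

G = nx.DiGraph()
G.add_edges_from(edges, airline=410)
nx.set_node_attributes(G, iata, "iata")

# In-degree = number of incoming route edges; pick the airport with the most
busiest, n_in = max(G.in_degree, key=lambda node_deg: node_deg[1])
print(G.nodes[busiest]["iata"], n_in)  # KZN 3

# Direction matters: AER reaches KZN, but nothing leads back from KZN
print(nx.has_path(G, 2965, 2990), nx.has_path(G, 2990, 2965))  # True False

# Ignoring direction, all six airports form one connected component
print(nx.number_weakly_connected_components(G))  # 1
```

On the full graph, `sorted(G.in_degree, key=lambda nd: nd[1], reverse=True)[:10]` lists the ten airports with the most inbound edges. Because parallel routes collapse into a single edge in a `DiGraph`, this counts distinct origin airports, not individual airline routes.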