# Using RAPIDS cuGraph and cuSpatial to analyze airport and flight data
## Intro
We have airports and flights datasets.  We have cuGraph and cuSpatial.  What craziness can we get up to here?

We're going to use cuGraph and cuSpatial to answer these questions of our data:
1. Which airport is the most trafficked airport in our dataset?
1. What is the max number of plane rides (hops) you need to take to get from the most trafficked airport to any other airport in our dataset?
1. How many hops do you need to take to get from the most trafficked airport to one of the least trafficked airports?
1. How far is that distance really?
1. What is the topology of our airport network, based on our dataset and distance from one another?

Note: The airports data in this toy dataset uses hashed identifiers. This may throw you for a loop at first, but by the end of the notebook everything will be clearer.

## Imports and Data Gathering/Prep

In [1]:
import pandas as pd
import numpy as np
import cuspatial, cugraph, cudf, cuml

In [2]:
!wget https://raw.githubusercontent.com/rapidsai/cuDataShader/master/cudatashader-notebooks/data/airports.csv
!wget https://raw.githubusercontent.com/rapidsai/cuDataShader/master/cudatashader-notebooks/data/flights.csv

--2019-10-07 13:09:21--  https://raw.githubusercontent.com/rapidsai/cuDataShader/master/cudatashader-notebooks/data/airports.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.128.133, 151.101.0.133, 151.101.64.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.128.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 191508 (187K) [text/plain]
Saving to: ‘airports.csv’


2019-10-07 13:09:21 (7.98 MB/s) - ‘airports.csv’ saved [191508/191508]

--2019-10-07 13:09:21--  https://raw.githubusercontent.com/rapidsai/cuDataShader/master/cudatashader-notebooks/data/flights.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.64.133, 151.101.0.133, 151.101.192.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.64.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5015643 (4.8M) [text/plain]
Saving to: ‘flights.csv’


2019-10-

In [3]:
data_dir = './'
fdf = cudf.read_csv(data_dir+'flights.csv')
adf = cudf.read_csv(data_dir+'airports.csv')

In [4]:
fdf.head()

Unnamed: 0,PASSENGERS,ORIGIN_AIRPORT_ID,DEST_AIRPORT_ID
0,1,12264,10397
1,1,13541,12197
2,1,10599,12206
3,1,10792,10581
4,1,10792,14576


In [5]:
fdf.dtypes

PASSENGERS           int64
ORIGIN_AIRPORT_ID    int64
DEST_AIRPORT_ID      int64
dtype: object

### Prep
Since we'll be using cuGraph, which uses int32, and the above dtypes are int64, we recast each Series:

In [6]:
fdf['ORIGIN_AIRPORT_ID'] = fdf['ORIGIN_AIRPORT_ID'].astype(np.int32)
fdf['DEST_AIRPORT_ID'] = fdf['DEST_AIRPORT_ID'].astype(np.int32)
fdf['PASSENGERS'] = fdf['PASSENGERS'].astype(np.int32)

In [7]:
fdf.dtypes

PASSENGERS           int32
ORIGIN_AIRPORT_ID    int32
DEST_AIRPORT_ID      int32
dtype: object

Okay, better!  Now let's make some graphs.  Why?  Because graphs are fun and informative!
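One caveat on the int32 cast above: a downcast from int64 can silently wrap values that don't fit in 32 bits, so it is worth checking the range first. Here is a minimal CPU-only sketch of that check in plain Python. The helper name `fits_int32` is made up for illustration and is not part of cuDF or cuGraph.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def fits_int32(values):
    """Return True when every value can be stored in a signed 32-bit integer."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)

# Airport IDs taken from the first rows of the flights table
airport_ids = [12264, 13541, 10599, 10792]
print(fits_int32(airport_ids))   # these IDs are far below the int32 limit
print(fits_int32([2**31]))       # one past the largest int32 value
```

The airport IDs here are five-digit numbers, so the cast is safe for this dataset.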

## Graphs
Recall that we're going to ask these questions of our data:
1. Which airport is the most trafficked airport in our dataset?
1. What is the max number of plane rides (hops) you need to take to get from the most trafficked airport to any other airport in our dataset?
1. How many hops do you need to take to get from the most trafficked airport to one of the least trafficked airports?
1. How far is that distance really?
1. What is the topology of our airport network, based on our dataset and distance from one another?

**Let's get started!**

### Build the foundations

In [8]:
G = cugraph.Graph()
G.add_edge_list(fdf["ORIGIN_AIRPORT_ID"], fdf["DEST_AIRPORT_ID"])

In [9]:
fdf["ORIGIN_AIRPORT_ID"].value_counts()

13930    12020
11292    10297
10397     9137
13487     6848
11433     6715
11298     6352
12892     6157
12266     6075
11057     5942
11618     5634
12889     5300
14771     5202
14747     5001
14107     4894
13204     4762
14100     4757
14869     4329
10721     4146
12264     3990
11278     3803
12953     3748
11697     3470
12478     3463
10821     3290
10423     2987
15304     2931
14492     2886
10693     2844
14057     2819
14679     2725
         ...  
16628        1
16634        1
16658        1
16666        1
16669        1
16680        1
16683        1
16706        1
16715        1
16720        1
16724        1
16725        1
16729        1
16764        1
16770        1
16771        1
16776        1
16777        1
16790        1
16791        1
16792        1
16793        1
16807        1
16811        1
16812        1
16816        1
16823        1
16838        1
16839        1
16840        1
Name: ORIGIN_AIRPORT_ID, Length: 1147, dtype: int32

In [10]:
fdf["DEST_AIRPORT_ID"].value_counts()

13930    12082
11292    10425
10397     9179
13487     6913
11433     6651
11298     6527
12266     6211
12892     6014
11057     5957
11618     5830
12889     5236
14771     5129
14747     4940
13204     4785
14107     4781
14100     4737
14869     4284
12264     4012
10721     3980
11278     3891
12953     3870
12478     3659
11697     3523
10821     3286
10423     3075
15304     2886
10693     2812
14492     2797
14057     2792
10551     2692
         ...  
16569        1
16585        1
16588        1
16624        1
16628        1
16658        1
16666        1
16669        1
16680        1
16715        1
16724        1
16725        1
16744        1
16764        1
16770        1
16771        1
16790        1
16791        1
16792        1
16793        1
16795        1
16804        1
16807        1
16812        1
16816        1
16819        1
16820        1
16838        1
16839        1
16840        1
Name: DEST_AIRPORT_ID, Length: 1181, dtype: int32

### Question 1: Which airport is the most trafficked airport in our dataset?

The easiest way to find out which airport is the most trafficked is the same way Google does it for websites: PageRank!

In [11]:
df_page = cugraph.pagerank(G)

So now we have a dataframe of PageRank scores, `df_page`.  Great!  What does it look like?

In [12]:
df_page.head()

Unnamed: 0,vertex,pagerank
0,0,4.3e-05
1,1,4.3e-05
2,2,4.3e-05
3,3,4.3e-05
4,4,4.3e-05


The PageRank results are ordered by vertex number rather than by rank, but it is easy to find the max rank and sort the results.  Let's get our max and the top 10 airports in our dataset.

In [13]:
pr_max = df_page['pagerank'].max()
print(pr_max)

0.008008820936083794


In [14]:
sort_pr = df_page.sort_values('pagerank', ascending=False)
sort_pr.head(10)

Unnamed: 0,vertex,pagerank
13930,13930,0.008009
11292,11292,0.00747
10397,10397,0.006169
11298,11298,0.005
13487,13487,0.004667
11433,11433,0.004259
11057,11057,0.004206
12892,12892,0.004058
12266,12266,0.003992
11630,11630,0.00396


In [16]:
sort_pr = df_page.sort_values('pagerank', ascending=False) # Same sort as above; pass ascending=True instead to see the airports with the least traffic first
sort_pr.head(10)

Unnamed: 0,vertex,pagerank
13930,13930,0.008009
11292,11292,0.00747
10397,10397,0.006169
11298,11298,0.005
13487,13487,0.004667
11433,11433,0.004259
11057,11057,0.004206
12892,12892,0.004058
12266,12266,0.003992
11630,11630,0.00396


Those are the top 10 trafficked airports.  While it was easy to see from the origin and destination airport counts that 13930 would be the most trafficked, the order of the others in the list required a bit more work.  It is also interesting that no single airport's PageRank score reaches even 1% (0.01).
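For intuition about what those scores measure, here is a minimal CPU-only sketch of PageRank by power iteration. It is not cuGraph's implementation: the damping factor of 0.85, the fixed 100 iterations, the way dangling vertices are handled, and the toy edge list are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list (toy version)."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every vertex shares its rank equally among the vertices it points to
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical toy network: airport 1 is a hub with flights to and from everyone
toy_edges = [(1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (4, 1), (2, 3)]
scores = pagerank(toy_edges)
print(max(scores, key=scores.get))   # the hub gets the highest score
print(sum(scores.values()))          # scores in this sketch sum to (roughly) 1
```

In this sketch an airport scores highly when many well-connected airports fly into it, which is why the hub wins even though every airport has at least one inbound flight.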

### Question 2: max number of plane rides (hops)?

Let's run a breadth-first search (BFS) from the most popular airport, 13930, and see how many hops it takes to reach the airports farthest from it.  The BFS starts at 13930 and computes the hop count to every other vertex in one pass.

In [17]:
df = cugraph.bfs(G,13930)

In [18]:
df.count()

vertex         16841
distance       16841
predecessor    16841
dtype: int64

In [19]:
df['predecessor'].value_counts()

-1        15666
 13930      227
 11630       71
 10299       60
 15167       38
 12889       30
 13232       28
 12197       26
 10056       19
 13369       15
 11292       13
 14122       13
 10245       12
 10551       12
 14100       12
 13796       11
 10170       10
 10721       10
 11697       10
 13204       10
 14027       10
 10693        9
 12892        9
 12953        9
 14828        9
 10559        8
 11298        8
 12523        8
 13303        8
 14107        8
          ...  
 14112        1
 14193        1
 14256        1
 14307        1
 14572        1
 14576        1
 14670        1
 14672        1
 14695        1
 14709        1
 14711        1
 14794        1
 14986        1
 15011        1
 15069        1
 15070        1
 15096        1
 15154        1
 15160        1
 15195        1
 15236        1
 15295        1
 15446        1
 15447        1
 15579        1
 15779        1
 15855        1
 15919        1
 16101        1
 16498        1
Name: predecessor, Lengt

Hmmm...what's `-1`?  Why is its count so high?  Well, maybe it doesn't matter...let's get the max distance.

In [20]:
df["distance"].max()

2147483647

**Whoa!**  That distance value is unexpected...but really not.  In the BFS demo, Brad told us that this occurs because the isolated vertex, 0, is unreachable.  Whenever a graph contains disconnected components, the distance to the unreachable vertices will always be the maximum int32 value, 2147483647.  That also explains the `-1` predecessors: an unreachable vertex has no predecessor.  He also showed us how to fix it by dropping all insanely large distances.  We'll keep `df` untouched, in case we need it again, and make a second dataframe, `df2`.

In [21]:
# drop all large distances 
exp="distance < 100"
df2 = df.query(exp)

In [22]:
df2['predecessor'].value_counts()

13930    227
11630     71
10299     60
15167     38
12889     30
13232     28
12197     26
10056     19
13369     15
11292     13
14122     13
10245     12
10551     12
14100     12
13796     11
10170     10
10721     10
11697     10
13204     10
14027     10
10693      9
12892      9
12953      9
14828      9
10559      8
11298      8
12523      8
13303      8
14107      8
12127      7
        ... 
14112      1
14193      1
14256      1
14307      1
14572      1
14576      1
14670      1
14672      1
14695      1
14709      1
14711      1
14794      1
14986      1
15011      1
15069      1
15070      1
15096      1
15154      1
15160      1
15195      1
15236      1
15295      1
15446      1
15447      1
15579      1
15779      1
15855      1
15919      1
16101      1
16498      1
Name: predecessor, Length: 250, dtype: int32

That looks better!  The most common predecessor is now a real airport, and of course it's 13930.  Now, let's see what the real maximum graph distance is.

In [23]:
df2["distance"].max()

5

Okay great!  We know that, in this dataset, any airport you can reach from 13930 is no more than 5 flights away from it.
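The BFS behavior we just saw (a max-int distance and a `-1` predecessor for anything unreachable) can be reproduced with a small CPU-only sketch. This is plain Python, not cuGraph's implementation, and the toy graph is hypothetical.

```python
from collections import deque

UNREACHABLE = 2**31 - 1  # the same sentinel value seen in the output above (max int32)

def bfs(num_vertices, edges, start):
    """Breadth-first search returning hop distances and predecessors."""
    neighbors = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        neighbors[src].append(dst)

    distance = [UNREACHABLE] * num_vertices
    predecessor = [-1] * num_vertices
    distance[start] = 0
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if distance[nxt] == UNREACHABLE:  # the first visit is the shortest hop count
                distance[nxt] = distance[current] + 1
                predecessor[nxt] = current
                queue.append(nxt)
    return distance, predecessor

# Hypothetical toy graph: vertices 0 and 4 have no flights at all
toy_edges = [(1, 2), (2, 1), (2, 3), (3, 2)]
distance, predecessor = bfs(5, toy_edges, start=1)
reachable = [d for d in distance if d < UNREACHABLE]
print(distance)
print(predecessor)
print(max(reachable))  # the real max hop count once unreachable vertices are ignored
```

Filtering on `d < UNREACHABLE` plays the same role as the `distance < 100` query above.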

### Question 3: How many hops do you need to take to get from the most trafficked airport to one of the least trafficked airports?
Let's find out how many flights it takes to get us to a remote airport.  Let's pick one that has 1 flight from it.  I'm choosing `16838`, but you can change that value to another airport.  Also, there's a helper function to print the path nicely.

In [24]:
end_airport = 16838 # change to any other airport

In [25]:
def print_path(df, id):
    
    # Use the BFS predecessors and distances to trace the path
    # from vertex id back to the starting vertex (airport 13930 in this example)
    dist = df['distance'][id]
    lastVert = id
    for i in range(dist):
        nextVert = df['predecessor'][lastVert]
        d = df['distance'][lastVert]
        print("Airport " + str(lastVert) + " was reached from airport " + str(nextVert) + 
        " where the graph distance to Airport 13930 was " + str(d) )
        lastVert = nextVert

In [26]:
print_path(df, end_airport)

Airport 16838 was reached from airport 13418 where the graph distance to Airport 13930 was 3
Airport 13418 was reached from airport 12197 where the graph distance to Airport 13930 was 2
Airport 12197 was reached from airport 13930 where the graph distance to Airport 13930 was 1


If you used my number, it would take 3 flights.  So now we know which airports you would connect through between those two airports.  But that is the graph distance.  What about the real distances?

### Question 4:  How far is that distance really?
Well, for that, we need to bring in our other dataset, `adf`, which lists each airport's latitude and longitude, as well as the GPU accelerated `cuSpatial` library to compute the Haversine distances (distances along the surface of the globe [sphere] instead of a straight line).
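To make the Haversine distance concrete, here is a minimal CPU-only sketch of the formula for a single pair of points. It is not cuSpatial's implementation: the Earth radius of 6371 km is an assumed constant, and treating the result as kilometers is an assumption too. The two coordinate pairs are the ones this notebook's data lists for airports 12889 and 13244.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; cuSpatial's constant may differ

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Longitude and latitude of airports 12889 and 13244 from this notebook's data
print(haversine_km(-115.152222, 36.08, -89.978611, 35.049722))
```

The argument order (longitude first, then latitude) mirrors the `x1, y1, x2, y2` convention used with cuSpatial in this notebook.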

In [27]:
adf.head()

Unnamed: 0,AIRPORT_ID,LATITUDE,LONGITUDE
0,10001,58.109444,-152.906667
1,10003,65.548056,-161.071667
2,10004,68.083333,-163.166667
3,10005,67.57,-148.183889
4,10006,57.745278,-152.882778


Let's calculate the Haversine distance for all the flights at once.  This is a great time to use merge().  We'll do 2 merges, first on `ORIGIN_AIRPORT_ID` and then on `DEST_AIRPORT_ID`. To do the merge, we need a key column with the same name and dtype in both dataframes, so we cast the flight IDs back to int64 to match `adf`.

In [28]:
fdf['AIRPORT_ID'] = fdf['ORIGIN_AIRPORT_ID'].astype(np.int64) # create a common key with origin airport
hdf = fdf.merge(adf, on=['AIRPORT_ID'], how='left')
hdf.rename(columns = {'LATITUDE': 'LATITUDE_O', 'LONGITUDE': 'LONGITUDE_O'}, inplace=True) # Origin lat and long
hdf['AIRPORT_ID'] = hdf['DEST_AIRPORT_ID'].astype(np.int64) # recreate a common key with destination airport
hdf = hdf.merge(adf, on=['AIRPORT_ID'], how='left')
hdf.rename(columns = {'LATITUDE': 'LATITUDE_D', 'LONGITUDE': 'LONGITUDE_D'}, inplace=True) # Destination lat and long
hdf.head()

Unnamed: 0,PASSENGERS,ORIGIN_AIRPORT_ID,DEST_AIRPORT_ID,AIRPORT_ID,LATITUDE_O,LONGITUDE_O,LATITUDE_D,LONGITUDE_D
0,2,12889,13244,13244,36.08,-115.152222,35.049722,-89.978611
1,2,12892,10540,10540,33.9425,-118.408056,42.470833,-71.29
2,2,12892,14679,14679,33.9425,-118.408056,32.732778,-117.187222
3,2,12932,12339,12339,43.628056,-72.305833,39.728889,-86.281667
4,2,12953,13204,13204,40.779444,-73.875833,28.431667,-81.324722


In [29]:
x1 = hdf["LONGITUDE_O"]
y1 = hdf["LATITUDE_O"]
x2 = hdf["LONGITUDE_D"]
y2 = hdf["LATITUDE_D"]

hdf['H-distance'] = cuspatial.haversine_distance(x1, y1, x2, y2)
hdf.head(10)

Unnamed: 0,PASSENGERS,ORIGIN_AIRPORT_ID,DEST_AIRPORT_ID,AIRPORT_ID,LATITUDE_O,LONGITUDE_O,LATITUDE_D,LONGITUDE_D,H-distance
0,2,12889,13244,13244,36.08,-115.152222,35.049722,-89.978611,2273.552209
1,2,12892,10540,10540,33.9425,-118.408056,42.470833,-71.29,4169.058512
2,2,12892,14679,14679,33.9425,-118.408056,32.732778,-117.187222,175.941393
3,2,12932,12339,12339,43.628056,-72.305833,39.728889,-86.281667,1237.124452
4,2,12953,13204,13204,40.779444,-73.875833,28.431667,-81.324722,1531.448949
5,2,13232,10257,10257,41.785,-87.751944,42.745833,-73.805278,1151.284544
6,2,13232,10540,10540,41.785,-87.751944,42.470833,-71.29,1357.583284
7,2,13244,12889,12889,35.049722,-89.978611,36.08,-115.152222,2273.552209
8,2,13244,15167,15167,35.049722,-89.978611,40.849722,-74.062222,1534.327434
9,2,13303,10821,10821,25.7925,-80.286111,39.175556,-76.671389,1525.87945


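Let's get the actual distances that one must fly to get between those airports.  Note that BFS ignores the edge weights, so the path it finds minimizes the number of hops, not the distance flown.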

In [30]:
H = cugraph.Graph()
#hdf["ORIGIN_AIRPORT_ID_0"] = hdf["ORIGIN_AIRPORT_ID"] - 10001
#hdf["DEST_AIRPORT_ID_0"] = hdf["DEST_AIRPORT_ID"] - 10001
#hdf["data"] = 1.0
H.add_edge_list(hdf["ORIGIN_AIRPORT_ID"], hdf["DEST_AIRPORT_ID"], hdf["H-distance"])
hgdf = cugraph.bfs(H,13930)

**Fun Fact:** Deleting the rows with a `-1` predecessor throws off your indexes, so the path helpers no longer return a valid answer.  Try it if you'd like!

In [31]:
def print_dist_path(df, id):
    # Use the BFS predecessors and distances to trace the path
    # from vertex id back to the starting vertex (airport 13930 in this example)
    dist = df['distance'][id]
    hdist = 0
    print("Your overall flight has " + str(dist) + " hops")
    lastVert = id
    for i in range(dist):
        nextVert = df['predecessor'][lastVert]
        d = df['distance'][lastVert]
        a = hdf.query("ORIGIN_AIRPORT_ID == @nextVert and DEST_AIRPORT_ID == @lastVert")
        a.head()
        hdist = hdist+ a["H-distance"][0]
        print("Airport: " + str(lastVert) + " was reached from Airport " + str(nextVert) + 
        " and flight distance was " + str(a["H-distance"][0]) )
        lastVert = nextVert
    print("Your total flying distance was " + str(hdist))

In [32]:
print_dist_path(hgdf, 16838)

Your overall flight has 3 hops
Airport: 16838 was reached from Airport 13418 and flight distance was 235.3294186784242
Airport: 13418 was reached from Airport 12197 and flight distance was 416.77900332448866
Airport: 12197 was reached from Airport 13930 and flight distance was 1184.955067107391
Your total flying distance was 1837.0634891103039


Okay, pretty cool.  We now know the distance between these airports...but where are they in the world?  Normally, we'd use [cuDataShader](https://github.com/rapidsai/cuDataShader) for this, but it is not a library in this container.  [They've got a great example here that you can adapt to your needs](https://github.com/rapidsai/cuDataShader/blob/master/cudatashader-notebooks/cuDatashader%20Edge%20Bundling%20(US%20air%20traffic).ipynb).

### Question 5: What is the topology of our airport network?

Let's look at the topology of this network of airports.  One way to do that is to measure the modularity of our airport system!  To do that, we use Louvain.  However, we need to make some changes to our data, as Louvain requires our vertex IDs to start from 0.  It also requires weights.  Let's see how weights change our answer.  We will use our Haversine distances as the weights in one graph, and leave the other unweighted (every edge gets a weight of 1.0)!

In [33]:
L = cugraph.Graph()
L2 = cugraph.Graph()
hdf["ORIGIN_AIRPORT_ID_0"] = hdf["ORIGIN_AIRPORT_ID"] - 10001
hdf["DEST_AIRPORT_ID_0"] = hdf["DEST_AIRPORT_ID"] - 10001
hdf["data"]= 1.0
L2.add_edge_list(hdf["ORIGIN_AIRPORT_ID_0"], hdf["DEST_AIRPORT_ID_0"], hdf["data"]) # Unweighted Modularity
L.add_edge_list(hdf["ORIGIN_AIRPORT_ID_0"], hdf["DEST_AIRPORT_ID_0"], hdf["H-distance"]) # Distance Weighted Modularity

In [34]:
# Call Louvain on the graph
hgdf, mod = cugraph.louvain(L) 
hgdf2, mod2 = cugraph.louvain(L2)
# Print the modularity score
print('Modularity using Distance as a weight was {}'.format(mod))
print()
print('Modularity unweighted was {}'.format(mod2))
print()

Modularity using Distance as a weight was 0.009855977986124594

Modularity unweighted was 0.2059180297193165
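For intuition on what a modularity score measures, here is a minimal CPU-only sketch that computes it for an undirected, unweighted toy graph and a given partition. It only evaluates the score. It does not run Louvain, it is not cuGraph's implementation, and the toy graph and both partitions are hypothetical.

```python
def modularity(edges, partition):
    """Modularity Q of an undirected, unweighted graph for a vertex -> community map."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    internal = {}      # number of edges with both endpoints in the same community
    total_degree = {}  # sum of vertex degrees per community
    for u, v in edges:
        if partition[u] == partition[v]:
            internal[partition[u]] = internal.get(partition[u], 0) + 1
    for vertex, deg in degree.items():
        community = partition[vertex]
        total_degree[community] = total_degree.get(community, 0) + deg

    # Q = sum over communities of (internal edge share) - (expected share at random)
    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Hypothetical toy graph: two triangles joined by a single edge
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good_split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_blob = {v: 0 for v in range(6)}
print(modularity(toy_edges, good_split))  # one community per triangle
print(modularity(toy_edges, one_blob))    # everything lumped together
```

A score near 0 means the partition is no better than a random grouping, while higher scores mean more edges stay inside communities than chance would predict.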



In [35]:
hgdf.head(10)

Unnamed: 0,vertex,partition
0,0,0
1,1,1
2,2,2
3,3,3
4,4,1431
5,5,44
6,6,4
7,7,5
8,8,145
9,9,6


In [36]:
hgdf2.head(10)

Unnamed: 0,vertex,partition
0,0,0
1,1,1
2,2,2
3,3,3
4,4,4469
5,5,42
6,6,4
7,7,5
8,8,1054
9,9,3335


That's a high partition number for both graphs.  This is, of course, based on a small dataset of flights.  I'll be working on a larger one in notebooks_contrib that uses DOT 2015 Flight data and cuDataShader for graph visualizations.  Let's see how many partitions there are and what the value counts look like.

In [37]:
print(len(hgdf['partition'].unique()))
print(len(hgdf2['partition'].unique()))

6122
5685
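One possible contributor to these large partition counts (an assumption, not something verified against this dataset) is that subtracting 10001 still leaves big gaps in the ID range, and every unused ID in a gap can end up as a vertex with no edges. A common workaround is to renumber the airport IDs to a contiguous range first. Here is a minimal CPU-only sketch. The helper `renumber` is made up for illustration and is not a cuGraph function.

```python
def renumber(edges):
    """Map arbitrary vertex IDs to contiguous IDs 0..n-1 and return the lookup tables."""
    original_ids = sorted({v for edge in edges for v in edge})
    to_internal = {orig: i for i, orig in enumerate(original_ids)}
    renumbered = [(to_internal[src], to_internal[dst]) for src, dst in edges]
    return renumbered, to_internal, original_ids

# Airport pairs taken from the first rows of the flights table
flights = [(12264, 10397), (13541, 12197), (10599, 12206), (10792, 10581), (10792, 14576)]
renumbered, to_internal, original_ids = renumber(flights)
print(renumbered)
print(original_ids[renumbered[0][0]])  # maps an internal ID back to its airport ID
```

With contiguous IDs, only airports that actually appear in a flight become vertices, and `original_ids` translates results back to the real airport IDs.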


In [38]:
hgdf['partition'].value_counts()

1431    105
3376     92
256      90
2909     76
1138     52
694      50
1916     41
1370     30
44       23
1408     23
3599     18
3322     17
1927     14
2484     13
145      11
1743     10
3735      9
440       7
1994      7
2950      4
3620      4
1227      3
1718      3
3259      3
3371      3
3535      3
3722      3
4117      3
4953      3
5169      3
       ... 
6091      1
6092      1
6093      1
6094      1
6095      1
6096      1
6097      1
6098      1
6099      1
6100      1
6101      1
6103      1
6104      1
6105      1
6106      1
6107      1
6108      1
6109      1
6110      1
6111      1
6112      1
6113      1
6114      1
6115      1
6116      1
6117      1
6118      1
6119      1
6120      1
6121      1
Name: partition, Length: 6122, dtype: int32

In [39]:
hgdf2['partition'].value_counts()

3335    762
4469    179
415      75
2041     47
3354     43
42       28
2842     12
1054     11
278       2
1286      2
3654      2
4026      2
4245      2
5671      2
0         1
1         1
2         1
3         1
4         1
5         1
6         1
7         1
8         1
9         1
10        1
11        1
12        1
13        1
14        1
15        1
       ... 
5654      1
5655      1
5656      1
5657      1
5658      1
5659      1
5660      1
5661      1
5662      1
5663      1
5664      1
5665      1
5666      1
5667      1
5668      1
5669      1
5670      1
5672      1
5673      1
5674      1
5675      1
5676      1
5677      1
5678      1
5679      1
5680      1
5681      1
5682      1
5683      1
5684      1
Name: partition, Length: 5685, dtype: int32

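It seems that the unweighted graph splits into fewer partitions (5685 versus 6122), even though its modularity score is higher.  Most of those partitions hold a single vertex, so let's remove the partitions of size 1.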

In [40]:
def get_mod(df):
    val_counts = df['partition'].value_counts()
    relevant_partitions = val_counts[val_counts>1].index
    print(len(relevant_partitions))
    query = 'partition == '+ str(relevant_partitions[0])
    for i in range (1, len(relevant_partitions)):
            query += ' or partition == '+ str(relevant_partitions[i])
    return df.query(query)

In [41]:
# List the airports that belong to each partition
def get_partitions(df):
    part_ids = df["partition"].unique()
    for p in range(len(part_ids)):
        part = []
        for i in range(len(df)):
            #print(df['partition'][i])
            if (df['partition'][i] == part_ids[p]):
                part.append(df['vertex'][i] +1+10001)
        print("Partition " + str(part_ids[p]) + " contains these airports:")
        print(part)

In [42]:
print("Number of partitions > 1 in Distance Weighted Modularity:")
hgdf_1 = get_mod(hgdf)
print("Number of partitions > 1 in Unweighted Modularity:")
hgdf_2 = get_mod(hgdf2)

hgdf_1.head()

Number of partitions > 1 in Distance Weighted Modularity:
55
Number of partitions > 1 in Unweighted Modularity:
14


Unnamed: 0,vertex,partition
4,4,1431
5,5,44
8,8,145
13,13,1431
15,15,1927


In [43]:
print("------Distance Weighted Modularity------")
get_partitions(hgdf_1)
print("------Unweighted Modularity------")
get_partitions(hgdf_2)

------Distance Weighted Modularity------
Partition 44 contains these airports:
[10007, 10018, 10043, 10044, 10048, 10057, 10064, 10079, 10101, 10244, 10279, 11514, 12510, 12706, 12713, 12744, 12772, 12786, 12855, 12870, 13889, 15092, 15837]
Partition 145 contains these airports:
[10010, 10171, 10918, 10927, 11300, 11556, 12773, 12867, 13864, 13935, 14856]
Partition 192 contains these airports:
[10225, 15326]
Partition 256 contains these airports:
[10166, 10217, 10226, 10238, 10300, 10467, 10552, 10616, 10755, 10776, 10784, 10875, 11110, 11337, 11434, 11446, 11472, 11512, 11536, 11551, 11560, 11638, 11765, 11776, 11814, 11826, 11828, 11834, 11845, 11846, 11942, 11953, 12120, 12196, 12295, 12322, 12639, 12654, 12664, 12672, 12705, 12720, 12722, 12749, 12756, 12808, 12820, 12823, 12842, 13088, 13204, 13297, 13399, 13580, 13705, 13716, 13768, 13874, 13943, 13971, 14047, 14103, 14131, 14168, 14269, 14274, 14322, 14486, 14494, 14710, 14739, 14806, 14881, 14896, 14920, 15013, 15064, 15087, 15

Partition 1054 contains these airports:
[10010, 10171, 10244, 11300, 12722, 12773, 12867, 13864, 13935, 14856, 15733]
Partition 1286 contains these airports:
[11591, 12079]
Partition 2041 contains these airports:
[10205, 10698, 10887, 10927, 10962, 11231, 11368, 11402, 11485, 11546, 11620, 11721, 11998, 12172, 12176, 12253, 12254, 12524, 12611, 12634, 12729, 12774, 12816, 12820, 12848, 12858, 12984, 13505, 13694, 14029, 14063, 14230, 14257, 14271, 14798, 14829, 15232, 15742, 15755, 15794, 15842, 15886, 15992, 16341, 16344, 16346, 16347]
Partition 2842 contains these airports:
[10415, 10505, 10614, 11774, 13126, 13508, 13675, 13988, 14183, 15856, 16443, 16667]
Partition 3335 contains these airports:
[10011, 10012, 10136, 10137, 10141, 10142, 10147, 10155, 10156, 10158, 10159, 10172, 10186, 10195, 10199, 10209, 10217, 10225, 10256, 10258, 10268, 10269, 10276, 10280, 10299, 10301, 10310, 10323, 10326, 10328, 10330, 10334, 10346, 10349, 10362, 10373, 10379, 10386, 10398, 10401, 10409, 1042

Okay great!  Now that we know what each partition is, you can once again use [cuDataShader](https://github.com/rapidsai/cuDataShader) or [cuXFilter](https://github.com/rapidsai/cuxfilter) to visualize the results.  Let's make a pretty picture.  (That sound you just heard was Allan Enemark grinding his teeth :)  He's a friend, so I shouldn't come to any physical harm by his hands.  He also leads the team that does data visualizations and builds their libraries, such as [cuXFilter](https://github.com/rapidsai/cuxfilter) and [cuDataShader](https://github.com/rapidsai/cuDataShader).)