## Introduction to Big Data
### Segment 5 of 5

# Exploration

*Lesson Developer: Jayakrishnan Ajayakumar, jxa421@case.edu*

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout
import pandas as pd
import queue
import threading
import time
import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

## Reminder
<a href="#/slide-2-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

<br>
</br>
<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

## Solving the congestion!

From our previous experiment, we saw that our naive implementation of distance-based search cannot handle the volume and velocity of Spatial Big Data (SBD) in a real-world scenario. The reason is that the naive implementation performs many unnecessary distance calculations. An illustration is provided below.


<img src="supplementary/images/less_congestion.jpg" width=50%>
The red lines indicate unnecessary distance calculations. If our scan radius is 200 m and a point is 10,000 m away, checking whether it falls within the scan radius is wasted work. So how can we avoid such calculations? 

Enter another tree, <span STYLE="font-size:18.0pt;color:black">KD-Tree</span>, which is optimized for neighbor lookups. 

Again, we won't go into the details of the KD-Tree, as that would require a strong background in tree-based algorithms. An illustration is shown below. 

<img src="supplementary/images/kd_tree_illustration.jpg" width=50%>
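For the curious, the pruning idea can be sketched in a few lines of plain Python. This is only a toy illustration and not how SciPy implements its tree: the function names `build_kdtree` and `nearest`, the dictionary-based nodes, and the random test points are all assumptions made for this example. The sketch splits the points on alternating axes and skips a whole branch whenever the splitting line is already farther away than the best match found so far.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively split points on alternating axes (x, then y, then x, ...)."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None, stats=None):
    """Return (distance, point) for the stored point closest to target."""
    if node is None:
        return best
    if stats is not None:
        stats["distance_calls"] += 1
    d = math.dist(node["point"], target)
    if best is None or d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    diff = target[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best, stats)
    # Visit the far side only if the splitting line is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best, stats)
    return best

random.seed(0)
points = [(random.uniform(0, 10_000), random.uniform(0, 10_000)) for _ in range(2_000)]
tree = build_kdtree(points)
stats = {"distance_calls": 0}
distance, point = nearest(tree, (5_000, 5_000), stats=stats)
print(f"nearest point is {distance:.1f} m away, found with {stats['distance_calls']} distance calculations")
```

A brute-force search over the same data would compute all 2,000 distances, while the tree search typically touches only a small fraction of the points.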

Luckily, a KD-Tree implementation is available in Python through the SciPy library (a free and open-source Python library for scientific and technical computing).

You can import the KD-Tree implementation from SciPy using the following import statement:
<code>from scipy.spatial import cKDTree</code>. Let us first look at a small example using a KD-Tree.

In [None]:
import geopandas as gpd
from scipy.spatial import cKDTree
import pandas as pd
import numpy as np

In [None]:
#let's create a dataframe of some cities in the USA
cityFrame = pd.DataFrame([['New York City',40.712778,-74.006111],\
['Los Angeles',34.05,-118.25],['Chicago',41.881944, -87.627778],['Houston',29.762778, -95.383056],\
['Phoenix',33.448333, -112.073889],['Philadelphia',39.952778, -75.163611],['San Antonio',29.425, -98.493889],\
['San Diego',32.715, -117.1625],['Dallas',32.779167, -96.808889],['San Jose',37.336111, -121.890556]],columns=['City','latitude','longitude'])
cityFrame

Convert this to a geodataframe

In [None]:
cityGeoData = gpd.GeoDataFrame(cityFrame['City'],geometry=gpd.points_from_xy(cityFrame.longitude,cityFrame.latitude),crs='EPSG:4326')
cityGeoData.plot();

Since we want to run distance-based queries, it is beneficial to convert the geographic coordinates (in degrees) to projected coordinates (in meters).

In [None]:
cityGeoData = cityGeoData.to_crs('EPSG:9822')
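To see why raw latitude/longitude values are a poor fit for distance queries, the following sketch measures how long one degree of longitude is at different latitudes. It uses the haversine formula with a mean Earth radius of 6,371 km; the helper name `haversine_m` and the radius constant are assumptions made for this illustration and are not part of the lesson code.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Length of one degree of longitude at the equator, at Houston's latitude, and at Chicago's latitude
one_degree_at_equator = haversine_m(0, 0, 0, 1)
one_degree_at_houston = haversine_m(29.762778, -95.383056, 29.762778, -94.383056)
one_degree_at_chicago = haversine_m(41.881944, -87.627778, 41.881944, -86.627778)
print(round(one_degree_at_equator), round(one_degree_at_houston), round(one_degree_at_chicago))
```

Because the same difference in degrees corresponds to different distances on the ground, a radius expressed in meters only makes sense once the points are in a projected coordinate system.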

Now we need to create a KD-Tree from the coordinates in the form [[x1,y1],[x2,y2],...]. So let's extract the coordinates in this form from the geodataframe. We will use NumPy to achieve this.

In [None]:
coordinates = np.dstack((cityGeoData.geometry.x,cityGeoData.geometry.y))[0]
coordinates

Now we can create our KD tree from this set of coordinates

In [None]:
cityTree = cKDTree(coordinates);

Now we can start to query this tree for neighbors. Let us create a target geodataframe containing the locations to search for and then extract the coordinates. You have to make sure that the projection matches the one used for the tree.

In [None]:
targetPlacesFrame = pd.DataFrame([['My Home',41.50161178703802, -81.59157919814577]],columns=['Location','latitude','longitude'])
targetGeoData = gpd.GeoDataFrame(targetPlacesFrame['Location'],geometry=gpd.points_from_xy(targetPlacesFrame.longitude,targetPlacesFrame.latitude),crs='EPSG:4326')
targetGeoData = targetGeoData.to_crs('EPSG:9822')
targetCoordinates = np.dstack((targetGeoData.geometry.x,targetGeoData.geometry.y))[0]
targetCoordinates

Now we can query the tree. For example, find the city closest to the target location (in this case, the target location is in Cleveland).

In [None]:
cityTree.query(targetCoordinates)

The query method returns a tuple of two arrays: the first contains the distances, and the second contains the corresponding indices of the points in the tree. We can print the results like this:

In [None]:
closest = cityTree.query(targetCoordinates)
print ('The closest city is ',cityGeoData.loc[closest[1][0]].City,'and is ',closest[0][0],'meters away')

You can also query for multiple neighbors. For example, find the two closest cities.

In [None]:
closest = cityTree.query(targetCoordinates,k=2)
print ('The closest cities are ',cityGeoData.loc[closest[1][0]].City.values)

We can also query based on distance. For example, find up to 3 nearest cities that are within 200 miles (approximately 200*1600 m) of the target location.

In [None]:
closest = cityTree.query(targetCoordinates,k=3,distance_upper_bound=200*1600)
#We have to remove non-matches
matchingIds = closest[1][0][closest[1][0]!=len(cityGeoData)]
print ('The closest cities are ',cityGeoData.loc[matchingIds].City.values)
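The filtering step above relies on a convention: when fewer than `k` neighbors fall inside the distance bound, the missing slots are reported with an infinite distance and an index equal to the number of points in the tree. The brute-force stand-in below mimics that convention on toy data so the filter is easier to follow. The function `query_padded` and the toy coordinates are hypothetical, and the sketch does not try to match SciPy's exact treatment of points lying precisely on the bound.

```python
import math

def query_padded(points, target, k, distance_upper_bound=math.inf):
    """Brute-force stand-in for a k-nearest query with an upper distance bound.

    Missing neighbors are padded with an infinite distance and an index of
    len(points), which is the value the filtering step compares against.
    """
    ranked = sorted((math.dist(p, target), i) for i, p in enumerate(points))
    hits = [(d, i) for d, i in ranked[:k] if d < distance_upper_bound]
    padding = [(math.inf, len(points))] * (k - len(hits))
    distances, indices = zip(*(hits + padding))
    return list(distances), list(indices)

cities_xy = [(0, 0), (300, 0), (0, 900), (5_000, 5_000)]  # toy projected coordinates in meters
distances, indices = query_padded(cities_xy, target=(10, 10), k=3, distance_upper_bound=500)

# Keep only real matches, exactly like the comparison with len(cityGeoData)
matching = [i for i in indices if i != len(cities_xy)]
print(distances, indices, matching)
```

Only two toy cities lie within 500 m, so the third slot is padding and is dropped by the filter.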

So not a single city is within 200 miles of the target location. Let us try 1500 miles.

In [None]:
closest = cityTree.query(targetCoordinates,k=3,distance_upper_bound=1500*1600)
#We have to remove non-matches
matchingIds = closest[1][0][closest[1][0]!=len(cityGeoData)]
print ('The closest cities are ',cityGeoData.loc[matchingIds].City.values)
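The radii in these queries use 1,600 m as a round figure for one mile. If more precision is needed, the conversion can be made explicit. The helper name `miles_to_meters` below is made up for this sketch.

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition

def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE

rough = 200 * 1600            # the round figure used in the queries
exact = miles_to_meters(200)  # the same radius with the exact factor
print(rough, exact, exact - rough)
```

For a 200-mile radius the round figure is short by a little under 2 km, which does not change the results in this small example.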

We can also query with multiple target locations.

In [None]:
targetPlacesFrame = pd.DataFrame([['Earth Quake Location 1',33.96762282324769, -118.3389937350672],\
                    ['Earth Quake Location 2',34.01222880117181, -118.28794093440929],\
                    ['Earth Quake Location 3',29.420108950140303, -98.4839186031248]],columns=['Location','latitude','longitude'])
targetGeoData = gpd.GeoDataFrame(targetPlacesFrame['Location'],geometry=gpd.points_from_xy(targetPlacesFrame.longitude,targetPlacesFrame.latitude),crs='EPSG:4326')
targetGeoData = targetGeoData.to_crs('EPSG:9822')
targetCoordinates = np.dstack((targetGeoData.geometry.x,targetGeoData.geometry.y))[0]
targetCoordinates

Query the three locations for the nearest neighbor

In [None]:
nearest = cityTree.query(targetCoordinates,k=1)
for i,row in targetPlacesFrame.iterrows():
    print ('Nearest city to ',row.Location,'is',cityGeoData.loc[nearest[1][i]].City,' and it is',nearest[0][i],'meters away.')

Another method that you can use is query_ball_point, which finds all neighbors within a specific distance of the target points. Let us try that out: find all neighbors within 100 miles of each target location.

In [None]:
nearest = cityTree.query_ball_point(targetCoordinates,100*1600)
nearest

This returns an array of list objects, with each list holding the neighbors of one target location. If a target location has no neighbors, its list is empty. 

In [None]:
#neighbors within 2 miles
nearest = cityTree.query_ball_point(targetCoordinates,2*1600)
nearest
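The shape of this result (one list per target, possibly empty) can be reproduced with a brute-force loop on toy data. The function `ball_query` and the coordinates below are hypothetical and only illustrate the structure of the output; the tree-based method produces the same kind of result without comparing every pair.

```python
import math

def ball_query(points, targets, radius):
    """For each target, list the indices of the points within radius (brute force)."""
    return [
        [i for i, p in enumerate(points) if math.dist(p, t) <= radius]
        for t in targets
    ]

cities_xy = [(0, 0), (1_000, 0), (50_000, 50_000)]        # toy projected coordinates in meters
targets_xy = [(100, 100), (900, 50), (200_000, 200_000)]  # toy target locations
neighbors = ball_query(cities_xy, targets_xy, radius=1_500)
print(neighbors)  # one list of city indices per target
```

The first two targets each have two toy cities nearby, while the third is far from everything and gets an empty list.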

The query_ball_point results can also answer a different, even more powerful question: how many target locations are within x miles of each neighbor? Let us try that out. The question here is how many target locations are within 100 miles of each city. The target locations could be anything: earthquakes, Walmart stores, McDonald's restaurants, etc.

In [None]:
#neighbors within 100 miles
nearest = cityTree.query_ball_point(targetCoordinates,100*1600)
nearest

Now you can join the results together into a single list.

In [None]:
results = np.concatenate(nearest)
results

There are many ways to assign the results back to the city dataframe. The easiest is to first create a frequency count of the results. Then we create a new column in the city dataframe and assign the values using the frequency count keys (which are index values in the city dataframe). Let us go through this step by step.

In [None]:
from collections import Counter
cityCounter = Counter(results)
cityCounter

As you can see, it creates a Counter object (a subclass of dict) that stores the frequencies. Now we can create a new column in the cities geodataframe.

In [None]:
cityGeoData['counts'] = 0
cityGeoData

Now we just need to modify the counts based on index

In [None]:
cityGeoData.loc[list(cityCounter.keys()),'counts'] = list(cityCounter.values())
cityGeoData

So Los Angeles has two target locations within 100 miles and San Antonio has one. 
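The flatten, count, and assign steps can be condensed into a small pure-Python sketch. The toy lists below are made up to mirror the outcome above (two targets near the first city, one near the second) and stand in for the real query results.

```python
from collections import Counter
from itertools import chain

city_names = ["Los Angeles", "San Antonio", "Chicago"]  # toy city table, index = position
neighbors_per_target = [[0], [0], [1]]                  # one list of city indices per target

# Flatten the per-target lists and count how often each city index appears.
city_counter = Counter(chain.from_iterable(neighbors_per_target))

# Cities that never appear get a count of 0, like the pre-filled 'counts' column.
counts = [city_counter.get(i, 0) for i in range(len(city_names))]
print(dict(zip(city_names, counts)))
```

Starting every city at zero and then overwriting only the indices that appear in the counter is the same pattern used with the `counts` column.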

Now let's try this out on some real data so that we can gauge the performance.

For this small test we are going to use our city data and the earthquake data for the past 5 years (https://earthquake.usgs.gov/).

First, load the city data

In [None]:
cities = gpd.read_file(r'supplementary/data/USA_Major_Cities/USA_Major_Cities.shp').to_crs('EPSG:9822')
print (cities.shape)
cities.head()

So we have 3,886 cities. Now let's load the earthquake data. 

In [None]:
eqData  = pd.read_csv(r'supplementary/data/eq_data_from_2016.csv')
print (eqData.shape)
eqData.head()

So there are 17,907 records. Now convert this to a geodataframe

In [None]:
eqGeo  = gpd.GeoDataFrame(eqData,geometry = gpd.points_from_xy(eqData.longitude,eqData.latitude),crs='EPSG:4326').to_crs('EPSG:9822')

Now we will create the tree, using the cities as the source. Let us see how much time it takes to create the tree.

In [None]:
%%timeit
cityTree = cKDTree(np.dstack((cities.geometry.x,cities.geometry.y))[0])

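It took only 1.54 ms (about 0.0015 seconds) to create the tree, so that's really fast. Note that variables assigned inside a `%%timeit` cell are not kept afterwards, so run the statement once more without `%%timeit` if you want to reuse `cityTree` (the same applies to `nearest` below). Now let's see the heavyweight operation. Our goal is to find out how many earthquakes are within 30 miles of each city.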

In [None]:
%%timeit
nearest = cityTree.query_ball_point(np.dstack((eqGeo.geometry.x,eqGeo.geometry.y))[0],30*1600)

It took just 10.8 ms (about 0.01 seconds) to get all the neighbors. Now it's just a matter of packing them together. Let us add an eqCount column to our city geodataframe.

In [None]:
cities['eqCount'] = 0

Now we will build the frequency counter and update eqCount.

In [None]:
%%timeit
cityCounter=Counter(np.concatenate(nearest))
cities.loc[list(cityCounter.keys()),'eqCount'] = list(cityCounter.values())

That's just 314 microseconds (0.000314 seconds). So let's run the entire program.

In [None]:
%%timeit
cityTree = cKDTree(np.dstack((cities.geometry.x,cities.geometry.y))[0])
nearest = cityTree.query_ball_point(np.dstack((eqGeo.geometry.x,eqGeo.geometry.y))[0],30*1600)
cities['eqCount'] = 0
cityCounter=Counter(np.concatenate(nearest))
cities.loc[list(cityCounter.keys()),'eqCount'] = list(cityCounter.values())

Just 63.6 milliseconds (0.0636 seconds) to run the entire program (including the creation of the tree). That is fast, considering that a naive implementation would require

3886 * 17907 = 69,586,602 distance calculations!
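The size of the naive workload can be checked directly. The comparison figure for the tree is only a rough order-of-magnitude estimate under an assumed cost model (about log2(n) node visits per query); a radius query also spends time on every match it returns, so the real cost depends on the data.

```python
import math

n_cities = 3_886
n_earthquakes = 17_907

# Naive approach: one distance calculation for every city/earthquake pair.
naive_pairs = n_cities * n_earthquakes

# Rough, assumed cost model: each tree query inspects on the order of log2(n) nodes.
tree_steps = n_earthquakes * math.ceil(math.log2(n_cities))

print(f"{naive_pairs:,} pairwise distances vs roughly {tree_steps:,} tree steps")
```

Even allowing for a generous constant factor, the tree-based search does far less work than comparing every pair.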

OK, now we will look at the modified version of our previous experiment with traffic signals. Pay attention to the traffic time.

In [None]:
import json
import queue
import threading
import time

import geopandas as gpd
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

In [None]:
class GPSThread(threading.Thread):
    # this thread pushes the GPS data into a shared DataFrame and checks a shared status list for messages from the main thread
    def __init__(self, dataFrame,status):
        threading.Thread.__init__(self)
        self.dataFrame = dataFrame
        self.status = status
    def run(self):
        #load the data file
        data = pd.read_parquet(r'supplementary/data/taxi1hr_gps.parquet')
        #create an index based on seconds 
        data.set_index('sec',inplace=True)
        #now we need to loop through the dataset
        for sec in range(data.index.min(),data.index.max()):
            #kill switch: the main thread sets status[0] to 0 to stop the simulation
            if self.status[0]==0:
                break
            dat = data.loc[sec]
            gpsDat = dat[['id','lng','lat']].to_json(orient='records')
            self.dataFrame.loc[sec] = gpsDat
            #after one iteration sleep for a second......
            time.sleep(1)
            #remove data that is older than 5 minutes (300 seconds) with respect to the current time
            #self.dataFrame.drop(self.dataFrame.index[self.dataFrame.index<(sec-300)],inplace=True)
        #when the simulation is over, flag it by setting the status to 2
        self.status[0] = 2
        
class TrafficThread(threading.Thread):
    # this thread reads GPS data from the shared DataFrame, writes warnings to resultsDict, and checks the shared status list for messages from the main thread
    def __init__(self, gpsDataFrame,resultsDict,status,params):
        threading.Thread.__init__(self)
        self.gpsDataFrame = gpsDataFrame
        self.resultsDict = resultsDict
        self.status = status
        self.tree = None
        self.params = params
    #function to count, for each traffic signal in the KD-Tree, how many points of a GeoDataFrame lie within a given distance
    def findNearestNeighbors(self,data,distance=100):
        out = pd.DataFrame({'id':np.arange(len(self.tree.data),dtype=np.int16)})
        results = self.tree.query_ball_point(np.dstack((data.geometry.x,data.geometry.y))[0],distance+.0001)
        index,counts = np.unique(np.concatenate(results),return_counts=True)
        countData = pd.DataFrame({'id':index,'counts':counts})
        out = out.merge(countData,on='id',how='left').fillna(0)
        return out.counts.values
    
    def run(self):
        processed = []
        #load the traffic signals 
        data = gpd.read_file(r'supplementary/data/nyc__traffic_signals/nyc__traffic_signals.shp')
        outframe = pd.DataFrame({'id':data.id,'lat':data.geometry.y,'lng':data.geometry.x,'counts':[0]*len(data)})
        #we need to project the data for distance calculations
        data_projected = data.to_crs('EPSG:32618')
        #initialize a tree
        self.tree = cKDTree(np.dstack((data_projected.geometry.x,data_projected.geometry.y))[0])
        #Now we will monitor the gpsDataFrame continuously for changes
        while True:
            if self.status[0]==0 or self.status[0]==2:
                break
            #avoid the dirty-read problem by making a copy
            currentFrame = self.gpsDataFrame.copy()
            #retrieve the earliest element not yet processed from gpsDataFrame
            toProcess = currentFrame.loc[~currentFrame.index.isin(processed)]
            if len(toProcess)>0:
                currentGPS = toProcess.iloc[0]
                currentGPSData = pd.read_json(currentGPS.data)
                #we need to convert to GeoDataFrame and project the data for distance calculation
                currentGPSProjected = gpd.GeoDataFrame(currentGPSData['id'],geometry=gpd.points_from_xy(currentGPSData.lng,currentGPSData.lat),crs='EPSG:4326').to_crs('EPSG:32618')
                #now perform the nearest neighbor calculation and add it to the results dict
                #TODO, the result will be signals with id and count
                nearestNeighbors = self.findNearestNeighbors(currentGPSProjected,self.params['monitorDistance'])
                newData = outframe[['id','lat','lng']].assign(counts=nearestNeighbors)
                self.resultsDict[currentGPS.name] = newData.loc[newData.counts>=self.params['warningCount']]
                processed.append(currentGPS.name)
                

gpsData = None
trafficResults = None
status = [-1]
trafficParams = None
def start():
    global gpsData
    global trafficResults
    global status
    global trafficParams
    gpsData = pd.DataFrame(columns = ['sec','data'])
    gpsData.set_index('sec',inplace=True)
    trafficResults = {}
    status[0] = 1
    trafficParams = {'monitorDistance':500,'warningCount':2}
    #startup threads
    gpsThread = GPSThread(gpsData,status)
    gpsThread.start()
    trafficThread = TrafficThread(gpsData,trafficResults,status,trafficParams)
    trafficThread.start()
    #small delay for the thread to startup
    time.sleep(.5)
    #gpsThread.join()
    #trafficThread.join()
    return "started"

def modifyTrafficParams(param):
    global trafficParams
    trafficParams['monitorDistance'] = float(param['monitorDistance'])
    trafficParams['warningCount'] = int(param['warningCount'])
    return json.dumps(trafficParams)
    
def getGPSData(sec):
    if sec in gpsData.index:
        return gpsData.loc[sec].data+">>>currentGPSTime:"+str(max(gpsData.index))
    return "No Data" 

def getTrafficData(sec):
    if sec in trafficResults:
        return trafficResults.pop(sec).to_json(orient="records")
    return "No Data" 

def getStatus():
    global status
    if status[0] == 2 or status[0] == 0:
        return "sim over"
    return "running"

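def stop():
    global status
    status[0] = 0
    # cleanup() is a JavaScript function defined in the HTML cell, not a Python one;
    # the browser runs it once getStatus() reports "sim over"
    return "stopping"         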
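The two threads above communicate through a shared DataFrame and a one-element status list. A common alternative, which is not what this lesson uses, is to pass batches through a `queue.Queue` and signal shutdown with a `threading.Event`, so the consumer blocks while waiting instead of polling in a loop. The sketch below is a minimal stand-alone illustration; the function names and the payload are made up.

```python
import queue
import threading

def producer(out_queue, stop_event, n_ticks):
    """Push one batch per simulated second until done or asked to stop."""
    for sec in range(n_ticks):
        if stop_event.is_set():
            break
        out_queue.put((sec, [f"taxi-{sec}"]))  # made-up payload
    out_queue.put(None)                         # sentinel: no more data

def consumer(in_queue, results):
    """Process batches in arrival order, blocking until the next one arrives."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        sec, batch = item
        results[sec] = len(batch)

stop_event = threading.Event()
batches = queue.Queue()
results = {}
threads = [
    threading.Thread(target=producer, args=(batches, stop_event, 5)),
    threading.Thread(target=consumer, args=(batches, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Calling `stop_event.set()` from the main thread would play the same role as setting `status[0]` to 0 in the lesson code.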


In [None]:
%%html
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.8.0/dist/leaflet.css"/>
<div id="main">
    <div id="mapandcontrols">
        <div id="map"></div>
        <div id="controls">
            <button id="start_button" class ="controlbuttons" onclick = "run()">Start Simulation</button>
            <button id="stop_button" class ="controlbuttons" onclick = "stop()" disabled>Stop Simulation</button>
            <span id="active_cars" class ="controlbuttons">Active Cars:0</span>
            <span id="gpstime" class ="controlbuttons">GPSTime:0</span>
            <span id="traffictime" class ="controlbuttons">TrafficTime:0</span>
            <span id="realgpstime" class ="controlbuttons">RealGPSTime:0</span>
        </div>
    </div>
    <div id="params">
        <div class = "maxwidth" style="height:10%;">
            <span class="controlbuttons" style="width:50%">Scan Radius (m):</span>
            <input type="text" id="scanrad" class="controlbuttons " style="width:30%" value="200">
        </div>
        <div class = "maxwidth" style="height:10%;">
            <span class="controlbuttons" style="width:55%">Warning Car Count:</span>
            <select id="warncount" class="controlbuttons " style="width:25%"></select>
        </div>
    </div>
</div>
<style>
#main { height: 500px;width:800px; }
#mapandcontrols { height: 100%;width:65%;float:left}
#params { height: 100%;width:30%;float:left;margin-left:2%;}
#map { height:75%;width:100%; }
#controls { height: 20%;margin-top:4%; }
.maxwidth{width:100%;}
.maxheight{height:100%;}
.halfwidth{width:50%;}
.halfheight{height:50%;}
.controlbuttons{float:left;margin:1%;}
</style>
<script>
    var map, datInterval, current, currentTraffic, trafficDatInterval, statusInterval;
    require.config({
        paths: {
            d3: 'https://d3js.org/d3.v7.min',
            L: 'https://unpkg.com/leaflet@1.8.0/dist/leaflet'
        }
    });
    
    function updateParams(){
        require(['d3'], function(d3) {
            var warningCount = d3.select('#warncount').property('value');
            var scanRadius = d3.select('#scanrad').property('value');
        
            paramObj = JSON.stringify({'monitorDistance':scanRadius,'warningCount':warningCount});
            IPython.notebook.kernel.execute(
                "modifyTrafficParams("+paramObj+")", 
                {
                    iopub: {
                        output: function(response) {
                            var dataString = response.content.data['text/plain'];
                        }
                    }
                },
                {
                    silent: false, 
                    store_history: false, 
                    stop_on_error: true
                }
            );
        });
    }
    
    function cleanup(){
        clearInterval(statusInterval)
        clearInterval(datInterval)
        clearInterval(trafficDatInterval)
        currentTraffic = 0;
        current = 0;
        require(['d3'], function(d3) {
            d3.select("#trafficFrame").selectAll("circle").remove();
            d3.select("#gpsFrame").selectAll("circle").remove();
            d3.select('#warncount').property('value',20);
            d3.select('#scanrad').property('value',200);
            document.getElementById("start_button").disabled = false;
            document.getElementById("stop_button").disabled = true;
            document.getElementById("active_cars").innerText = "Active Cars:0";
            document.getElementById("traffictime").innerText = "TrafficTime:0";
            document.getElementById("gpstime").innerText = "GPSTime:0";
            document.getElementById("realgpstime").innerText = "RealGPSTime:0";
            document.getElementById("warncount").disabled = false;
            document.getElementById("scanrad").disabled = false;
        });
    }
    
    function getStatus(){
        IPython.notebook.kernel.execute(
            "getStatus()", 
            {
                iopub: {
                    output: function(response) {
                        var dataString = response.content.data['text/plain'];
                        if (dataString.includes("sim over")){
                            console.log('Time to clean up everything');
                            cleanup();
                        }
                    }
                }
            },
            {
                silent: false, 
                store_history: false, 
                stop_on_error: true
            }
        );    
    }

    function fetchTrafficData() {
        //first check whether the points have been loaded. If points are not loaded we need to retry
        require(['d3'], function(d3) {
            //if points are already loaded then we just need to update the traffic signals based on counts
            IPython.notebook.kernel.execute(
                "getTrafficData(" + currentTraffic + ")", {
                    iopub: {
                        output: function(response) {
                            // Print the return value of the Python code to the console
                            var dataString = response.content.data['text/plain'];
                            if (!(dataString.includes("sim over") || dataString.includes("No Data"))) {
                                var data = JSON.parse(dataString.slice(1, dataString.length - 1));
                                d3.select("#trafficFrame")
                                    .selectAll("circle")
                                    .data(data, d => d.id)
                                    .join(
                                        enter => enter.append('circle')
                                        .attr("cx", d => map.latLngToLayerPoint([d.lat, d.lng]).x)
                                        .attr("cy", d => map.latLngToLayerPoint([d.lat, d.lng]).y)
                                        .attr("r", 1)
                                        .style("fill", "red")
                                        .attr("stroke", "red")
                                        .attr("stroke-width", 1)
                                        .attr("fill-opacity", 1)
                                        .attr("opacity", 1)
                                        .transition()
                                        .duration(500)
                                        .attr("r", 5)
                                        .selection(),
                                        
                                        update => update
                                        .attr("r", 1)
                                        .transition()
                                        .duration(500)
                                        .attr("r", 5)
                                        .selection(),
                                        
                                        exit => exit
                                        .remove()
                                    )
                                document.getElementById("traffictime").innerText = "TrafficTime: " + currentTraffic;
                                currentTraffic += 1;
                            }
                        }
                    }
                }, {
                    silent: false,
                    store_history: false,
                    stop_on_error: true
                }
            );
        });
    }

    function fetchGPSData() {
        IPython.notebook.kernel.execute(
            "getGPSData(" + current + ")", {
                iopub: {
                    output: function(response) {
                        
                        // Print the return value of the Python code to the console
                        var dataString = response.content.data['text/plain'];
                        if (!(dataString.includes("sim over") || dataString.includes("No Data"))) {
                            require(['d3'], function(d3) {
                                var sections = dataString.split(">>>");
                                var data = JSON.parse(sections[0].slice(1, sections[0].length));
                                document.getElementById("active_cars").innerText = "Active Cars: " + data.length;
                                d3.select("#gpsFrame")
                                    .selectAll("circle")
                                    .data(data, d => d.id)
                                    .join(
                                        enter => enter.append('circle')
                                        .attr("cx", d => map.latLngToLayerPoint([d.lat, d.lng]).x)
                                        .attr("cy", d => map.latLngToLayerPoint([d.lat, d.lng]).y)
                                        .attr("r", 2)
                                        .attr("stroke", "green")
                                        .attr("stroke-width", 0.4)
                                        .attr("fill-opacity", 0.3)
                                        .style("fill", "green")
                                        .selection(),

                                        update => update
                                        .attr("cx", d => map.latLngToLayerPoint([d.lat, d.lng]).x)
                                        .attr("cy", d => map.latLngToLayerPoint([d.lat, d.lng]).y)
                                        .selection(),

                                        exit => exit
                                        .remove()
                                    )
                                document.getElementById("gpstime").innerText = "GPSTime: " + current;
                                document.getElementById("realgpstime").innerText = "RealGPSTime: " + sections[1].replace("'","").split(":")[1];
                                current += 1;
                            });
                        }
                    }
                }
            }, {
                silent: false,
                store_history: false,
                stop_on_error: true
            }
        );
    }

    function update() {
        require(['d3'], function(d3) {
            d3.selectAll("circle")
                .attr("cx", d => map.latLngToLayerPoint([d.lat, d.lng]).x)
                .attr("cy", d => map.latLngToLayerPoint([d.lat, d.lng]).y)
        });
    }

    function stop() {
        IPython.notebook.kernel.execute(
            "stop()", {
                iopub: {
                    output: function(response) {}
                }
            }, {
                silent: false,
                store_history: false,
                stop_on_error: true
            }
        )
    }

    function run() {
        current = 0;
        currentTraffic = 0;
        IPython.notebook.kernel.execute(
            "start()", {
                iopub: {
                    output: function(response) {
                        statusInterval = setInterval(getStatus, 500);
                        datInterval = setInterval(fetchGPSData, 500);
                        trafficDatInterval = setInterval(fetchTrafficData, 1000);
                        //disable the start button
                        document.getElementById("start_button").disabled = true;
                        document.getElementById("stop_button").disabled = false;
                        document.getElementById("warncount").disabled = true;
                        document.getElementById("scanrad").disabled = true;
                        updateParams();
                    }
                }
            }, {
                silent: false,
                store_history: false,
                stop_on_error: true
            }
        )
    }

    require(['d3', 'L'], function(d3, L) {
        map = L
            .map('map')
            .setView([40.763231753511604, -73.98383956127027], 10); // center position + zoom

        // Add a tile to the map = a background. Comes from OpenStreetmap
        L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
            maxZoom: 19,
            attribution: '© OpenStreetMap'
        }).addTo(map);
        // Add a svg layer to the map
        L.svg().addTo(map);
        map.on("moveend", update)
        d3.select("#map").select("svg").append("g").attr("id", "gpsFrame");
        d3.select("#map").select("svg").append("g").attr("id", "trafficFrame");
        var carcount = [];
        for(i=1;i<=1000;i++)
            carcount.push(i);
        d3.select('#warncount').selectAll('option').data(carcount).enter().append('option').property('value',function(d){return d;}).text(function(d){return d;});
        d3.select('#warncount').property('value',20);
    });
    
    
</script>

Now check the traffic time. You can clearly see that it keeps up with the GPS time, indicating that the scanning process is working in tandem with the GPS data. Why don't you see any warnings? Because this is a stringent criterion: 20 taxis must be within 200 meters of a traffic signal to generate a warning. It might take a while before the condition is met. What you can try instead is to increase the scan radius and reduce the warning car count. 

The key section of the code is:
```python
def findNearestNeighbors(self,data,distance=100):
    out = pd.DataFrame({'id':np.arange(len(self.tree.data),dtype=np.int16)})
    results = self.tree.query_ball_point(np.dstack((data.geometry.x,data.geometry.y))[0],distance+.0001)
    index,counts = np.unique(np.concatenate(results),return_counts=True)
    countData = pd.DataFrame({'id':index,'counts':counts})
    out = out.merge(countData,on='id',how='left').fillna(0)
    return out.counts.values
```

This is similar to our previous examples in this section: query the tree for the signals near each taxi, then count how often each signal appears.
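The counting and the warning threshold can be illustrated without any libraries. In the sketch below, the function `flag_signals` and the toy input are hypothetical; the input plays the role of the per-taxi lists of nearby signal ids, and the threshold corresponds to the "Warning Car Count" setting.

```python
def flag_signals(neighbors_per_taxi, n_signals, warning_count):
    """Count taxis near each signal and return the signals at or above the threshold."""
    counts = [0] * n_signals
    for signal_ids in neighbors_per_taxi:  # one list of nearby signal ids per taxi
        for signal_id in signal_ids:
            counts[signal_id] += 1
    return {i: c for i, c in enumerate(counts) if c >= warning_count}

# Four toy taxis: signal 2 has three taxis nearby, signals 0 and 1 have one each.
neighbors_per_taxi = [[0, 2], [2], [], [2, 1]]
print(flag_signals(neighbors_per_taxi, n_signals=4, warning_count=2))
```

Lowering the threshold or widening the scan radius makes more signals qualify, which is why those two settings control how quickly warnings appear.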

So again we see that 
><span STYLE="font-size:18.0pt;color:black">"With efficient algorithms we can tackle the Volume and Velocity challenges of Spatial Big Data"</span>.

# Congratulations!


**You have finished an Hour of CI!**


But, before you go ... 

1. Please fill out a very brief questionnaire to provide feedback and help us improve the Hour of CI lessons. It is fast and your feedback is very important to let us know what you learned and how we can improve the lessons in the future.
2. If you would like a certificate, then please type your name below and click "Create Certificate" and you will be presented with a PDF certificate.

<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="https://forms.gle/JUUBm76rLB8iYppN7">Take the questionnaire and provide feedback</a></font>

In [None]:

# This code cell loads the Interact Textbox that will ask users for their name
# Once they click "Create Certificate" then it will add their name to the certificate template
# And present them a PDF certificate
from PIL import Image
from PIL import ImageFont
from PIL import ImageDraw

from ipywidgets import interact

def make_cert(learner_name, lesson_name):
    cert_filename = 'hourofci_certificate.pdf'

    img = Image.open("../../supplementary/hci-certificate-template.jpg")
    draw = ImageDraw.Draw(img)

    cert_font   = ImageFont.truetype('../../supplementary/cruft.ttf', 150)
    cert_fontsm = ImageFont.truetype('../../supplementary/cruft.ttf', 80)
    
    _,_,w,h = cert_font.getbbox(learner_name)  
    draw.text( xy = (1650-w/2,1100-h/2), text = learner_name, fill=(0,0,0),font=cert_font)
    
    _,_,w,h = cert_fontsm.getbbox(lesson_name)
    draw.text( xy = (1650-w/2,1100-h/2 + 750), text = lesson_name, fill=(0,0,0),font=cert_fontsm)
    
    img.save(cert_filename, "PDF", resolution=100.0)   
    return cert_filename


interact_cert=interact.options(manual=True, manual_name="Create Certificate")

@interact_cert(name="Your Name")
def f(name):
    print("Congratulations",name)
    filename = make_cert(name, 'Intermediate Big Data')
    print("Download your certificate by clicking the link below.")
    
    
    

<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="hourofci_certificate.pdf?download=1" download="hourofci_certificate.pdf">Download your certificate</a></font>

