# Uber: Price estimates in Natal-RN
by <a href="https://github.com/adrianabenicio">Adriana Benício</a> and <a href="https://github.com/gisliany">Gisliany Alves</a>

### Uber API

<p style="text-align: justify">
First, we create a developer account on the <a href="https://developer.uber.com/">Developers | Uber</a> page. After logging in, we create an app, and the page returns a <b>server_token</b>, a <b>client_id</b>, and a <b>client_secret</b>. These are used to authenticate our application and the rider when calling the API. Next, we install the `uber-rides` package with the following command:
</p>

```
!pip install uber-rides
```

<p style="text-align: justify">
Now, we create a session with the server:
</p>

```python
from uber_rides.session import Session
from uber_rides.client import UberRidesClient

session = Session(server_token='our_server_token')
client = UberRidesClient(session)
```

<p style="text-align: justify">
To get the price estimates, we use the following method:
</p>

```python
client.get_price_estimates(start_latitude, start_longitude, end_latitude, end_longitude, seat_count)
```
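
As a rough illustration of what comes back, the sketch below walks a payload shaped like the `prices` list that the collection script reads later on. The field names are taken from the columns of the collected CSV shown further down; the `midpoint_by_product` helper and the `sample_payload` dictionary are hypothetical and exist only for this example.

```python
# Hypothetical sample: same field names as the CSV columns shown later,
# with values copied from its first two rows.
sample_payload = {
    "prices": [
        {"display_name": "uberX", "low_estimate": 15.0, "high_estimate": 19.0,
         "currency_code": "BRL", "distance": 5.25, "duration": 840},
        {"display_name": "UberSELECT", "low_estimate": 17.0, "high_estimate": 22.0,
         "currency_code": "BRL", "distance": 5.25, "duration": 840},
    ]
}


def midpoint_by_product(payload):
    """Map each product name to the midpoint of its low/high estimate."""
    result = {}
    for price in payload.get("prices") or []:
        low, high = price.get("low_estimate"), price.get("high_estimate")
        if low is None or high is None:
            continue  # skip entries without a numeric estimate
        result[price["display_name"]] = (low + high) / 2
    return result


print(midpoint_by_product(sample_payload))
```

Treating a missing `prices` key as an empty list keeps the helper safe to call on an error payload.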

### Choosing the origins and destinations

<p style="text-align: justify">
We chose two origins: Instituto Metrópole Digital (IMD) and Escola de Ciências e Tecnologia (ECT). For the destinations, we chose two random points in each of 20 neighborhoods of Natal. We can get a GeoJson file with the Natal neighborhoods by running the query below on the <a href="http://overpass-turbo.eu/">http://overpass-turbo.eu/</a> page:
</p>

```
[out:json][timeout:25];
{{geocodeArea:Natal RN Brasil}}->.searchArea;
(
  relation["admin_level"="10"](area.searchArea);
);
out body;
>;
out skel qt;
```
<p style="text-align: justify">
The file `natal.geojson` is also in the `geojson` folder of this repository. To get the random points, we run the code below. The GeoJson file holds information about all of Natal's neighborhoods, such as their names and the coordinates of their boundaries. These coordinates are passed to the Shapely `Polygon` class, which the algorithm uses to decide whether a randomly drawn point falls inside the neighborhood's shape. Finally, we use Basemap's `is_land()` method to check that the generated point is on land (not in the water).
</p>

In [6]:
## All necessary imports
import os
import json

import folium
import numpy as np
import pandas as pd
from branca.colormap import linear
from mpl_toolkits.basemap import Basemap  # used by generate_random below
from numpy import random
from shapely.geometry import Point, Polygon

In [1]:
###########################################################################################
def generate_random(number, polygon, neighborhood):
    """ Returns a number of points inside the polygon.

        Keyword arguments:
        - number:       number of random points to generate
        - polygon:      polygon object representing the neighborhood's geometry
        - neighborhood: neighborhood's name
    """
    
    list_of_points = []
    minx, miny, maxx, maxy = polygon.bounds
    counter = 0
    bm = Basemap(resolution='i') # instance of Basemap with intermediate resolution
    
    while counter < number:
        x = random.uniform(minx, maxx)
        y = random.uniform(miny, maxy)
        pnt = Point(x, y)
        
        # verify if the point is inside the neighborhood's polygon and
        # if the point is on land (avoid points on water)
        if polygon.contains(pnt) and bm.is_land(pnt.x, pnt.y): 
            list_of_points.append([x,y,neighborhood])
            counter += 1
    return list_of_points
##########################################################################################

number_of_points = 2
points_all = []

# import geojson file about natal neighborhood
natal_neigh = os.path.join('geojson', 'natal.geojson')

# load the data using UTF-8 encoding
with open(natal_neigh, encoding='UTF-8') as geojson_file:
    geo_json_natal = json.load(geojson_file)

for feature in geo_json_natal['features']:
    neighborhood = feature['properties']['name']
    geom = feature['geometry']['coordinates']
    polygon = Polygon(geom[0])  # outer ring (assumes a Polygon geometry)
    points = generate_random(number_of_points, polygon, neighborhood)
    points_all.extend(points)
    
points_all
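
The `generate_random` function above is a form of rejection sampling: draw a point uniformly from the polygon's bounding box and keep it only if it falls inside the shape. The sketch below shows the same idea with the standard library only. It swaps Shapely's `contains` for a small ray-casting test and leaves out the Basemap land check, so it is an illustration of the technique and not a replacement for the code above. The L-shaped ring and the helper names are hypothetical.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_inside(ring, count, rng):
    """Draw points uniformly from the ring's bounding box, keeping only the hits."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical L-shaped "neighborhood" in (longitude, latitude) order.
ring = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0), (1.0, 2.0), (0.0, 2.0)]
points = sample_inside(ring, 5, random.Random(42))
print(points)
```

The expected number of draws grows as the polygon fills less of its bounding box, which is acceptable here because only two points per neighborhood are needed.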

<p style="text-align: justify">
This code generates 72 destinations: two points for each of the 36 neighborhoods. We keep just 40 points (20 neighborhoods) to reduce the number of requests made to the API. The origins and destinations are:
</p>

In [8]:
# [longitude, latitude, name]
origins = [[-35.1995, -5.8434, 'Escola de Ciências e Tecnologia'], [-35.2054, -5.8323,'Instituto Metrópole Digital']]

destinations = [[-35.236981800685165, -5.870840666224352, 'Pitimbu'],
 [-35.239511057500785, -5.8623961914886396, 'Pitimbu'],
 [-35.24860515689553, -5.852370621667493, 'Planalto'],
 [-35.24739205909851, -5.8422971531192, 'Planalto'],
 [-35.15596978298069, -5.887973915383883, 'Ponta Negra'],
 [-35.168584855247744, -5.887199968737731, 'Ponta Negra'],
 [-35.20974352004816, -5.866488729132839, 'Neópolis'],
 [-35.216579352759986, -5.865620580955296, 'Neópolis'],
 [-35.20825434639825, -5.844952862777679, 'Capim Macio'],
 [-35.192032562474694, -5.865902667145231, 'Capim Macio'],
 [-35.25667890343955, -5.7640447137790165, 'Potengi'],
 [-35.25103068409885, -5.751524469672768, 'Potengi'],
 [-35.20293156188922, -5.799199687762013, 'Barro Vermelho'],
 [-35.21175595624902, -5.795702059430728, 'Barro Vermelho'],
 [-35.22380104989862, -5.848666799851974, 'Candelária'],
 [-35.2138541432291, -5.859300106086236, 'Candelária'],
 [-35.22215431544492, -5.749424954633537, 'Redinha'],
 [-35.2331018634999, -5.755193450124638, 'Redinha'],
 [-35.285105355750254, -5.741227853201468, 'Nossa Senhora da Apresentação'],
 [-35.263257762701215, -5.749344486262619, 'Nossa Senhora da Apresentação'],
 [-35.20405658674978, -5.774736461859913, 'Ribeira'],
 [-35.20448374240866, -5.7751467508506815, 'Ribeira'],
 [-35.20646534329146, -5.781229934058792, 'Cidade Alta'],
 [-35.20618356547286, -5.784264982577109, 'Cidade Alta'],
 [-35.218644869640706, -5.790558452384586, 'Alecrim'],
 [-35.22044104895825, -5.790315058307776, 'Alecrim'],
 [-35.2353263560941, -5.798918267396442, 'Quintas'],
 [-35.236522332120465, -5.791683775684841, 'Quintas'],
 [-35.23474121280764, -5.813442626757365, 'Nossa Senhora de Nazaré'],
 [-35.232292557638345, -5.816801834175864, 'Nossa Senhora de Nazaré'],
 [-35.219957385550096, -5.828918326434649, 'Lagoa Nova'],
 [-35.20587339928591, -5.813038977663765, 'Lagoa Nova'],
 [-35.19941251638022, -5.8210933802055225, 'Nova Descoberta'],
 [-35.200640402172816, -5.827364323950782, 'Nova Descoberta'],
 [-35.19683237652891, -5.8119112440893455, 'Tirol'],
 [-35.20220670728412, -5.796057712196255, 'Tirol'],
 [-35.19980727338861, -5.781753220028521, 'Petrópolis'],
 [-35.19721630188386, -5.783385852484189, 'Petrópolis'],
 [-35.237040337969404, -5.821493068463147, 'Cidade da Esperança'],
 [-35.23935530014809, -5.829009577179275, 'Cidade da Esperança']]

On the map, we can see the neighborhoods, the origins (dark blue points) and the destinations (red points).

In [9]:
# Editing the GeoJson variable to have just the 20 selected neighborhoods

features = []           # new array of features
selected_names = set()  # names of the neighborhoods already added

for lon, lat, neigh in destinations:                    # 1. loops over the 40 destinations (2 per neighborhood)
    if neigh in selected_names:                         # 2. skips neighborhoods that were already added
        continue
    for geo_feature in geo_json_natal['features']:      # 3. iterates over each feature (neighborhood's properties)
        if neigh in geo_feature['properties'].values(): # 4. verifies if the selected neighborhood exists in the GeoJson feature
            features.append(geo_feature)                # 5. appends the GeoJson feature into new features array
            selected_names.add(neigh)                   # 6. remembers the name so it is added only once
            break

geo_json_natal['features'] = features                   # 7. overwrites the features into GeoJson variable
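
The filtering step can be tried on toy data. Each neighborhood appears twice in `destinations`, so a filter of this kind has to compare against the names it has already kept, not against the feature dictionaries themselves, or it ends up with duplicate features. The sketch below tracks the kept names in a set. The `select_features` helper and the toy features are hypothetical.

```python
def select_features(features, wanted_names):
    """Keep one feature per wanted name, preserving the order of `features`."""
    wanted = set(wanted_names)
    seen = set()
    selected = []
    for feature in features:
        name = feature["properties"].get("name")
        if name in wanted and name not in seen:
            seen.add(name)
            selected.append(feature)
    return selected


# Hypothetical miniature GeoJSON features and destinations.
toy_features = [
    {"type": "Feature", "properties": {"name": "Pitimbu"}},
    {"type": "Feature", "properties": {"name": "Planalto"}},
    {"type": "Feature", "properties": {"name": "Not selected"}},
]
toy_destinations = [
    [-35.23, -5.87, "Pitimbu"], [-35.24, -5.86, "Pitimbu"],
    [-35.25, -5.85, "Planalto"], [-35.25, -5.84, "Planalto"],
]

kept = select_features(toy_features, [d[2] for d in toy_destinations])
print([f["properties"]["name"] for f in kept])
```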

In [10]:
# Folium instance
m = folium.Map(
    location=[-5.802, -35.212558],
    zoom_start=12,
    tiles='Stamen Terrain'
)

# adds the GeoJson with the 20 neighborhoods
folium.GeoJson(geo_json_natal).add_to(m)

# adds the origins
for lon, lat, name in origins:
    folium.CircleMarker([lat, lon], radius=2, popup=name, color='darkblue').add_to(m)

# adds the destinations
for i, (lon, lat, name) in enumerate(destinations):
    folium.CircleMarker([lat, lon], radius=2, popup='%s #%d' % (name, i), color='red').add_to(m)
    
m

### Getting the price estimates

<p style="text-align: justify;">
Now that we have the coordinates of the origins and destinations, we can use the `get_price_estimates` method to request the price estimates from each origin to each destination. The code below shows how we handle the requests to the Uber API. It is saved in the `collect.py` file and starts a new round of requests every 3 minutes for about a week (the loop stops after `timedelta(days=8)`). Each request is attempted up to three times: when it fails or times out, we try two more times. The responses are appended to a CSV file.
</p>

```python
import signal
from datetime import datetime, timedelta
from time import sleep

import pandas as pd

# `client`, `origins` and `destinations` are the objects defined in the previous sections


def handler(signum, frame):
    """
    Signals that the API took too long to answer
    """
    print("**Timeout**")
    raise Exception("API response timeout")


def getPriceEstimates(lat1, long1, lat2, long2):
    """
    Requests the price estimates from the Uber API
    """
    return client.get_price_estimates(
        start_latitude=lat1,
        start_longitude=long1,
        end_latitude=lat2,
        end_longitude=long2,
        seat_count=2)
###############################################################################################################

signal.signal(signal.SIGALRM, handler)                  # set the handler of a timeout function

writeHeader = True                                      # write the header into the csv file only once
end_time = datetime.now() + timedelta(days=8)           # stop collecting after eight days

while datetime.now() < end_time:
    initial = datetime.now()                            # start of this 3-minute round
    for o in origins:
        for d in destinations:
            response = None

            # up to three attempts per request
            for attempt in range(1, 4):
                try:
                    signal.alarm(3)  # requests that a SIGALRM signal be sent to the process in 3 seconds
                    response = getPriceEstimates(o[1], o[0], d[1], d[0])
                    break
                except Exception as inst:
                    print(type(inst))
                    print('attempt ' + str(attempt) + ' failed')
                finally:
                    signal.alarm(0)  # cancels the alarm, whether or not the response arrived in time

            try:
                prices = None
                # extracts the response if it's not none
                if response is not None and type(response) != str:
                    payload = response.json
                else:
                    payload = None
                    print(str(datetime.now()) + ' response none or str')

                if payload is not None:
                    prices = payload.get('prices')
                    if not prices:
                        print(str(datetime.now()) + ' prices none')
                else:
                    print(str(datetime.now()) + ' json none')

                # adds the date, origin and destination fields, then appends to the file
                if prices:
                    for p in prices:
                        p['date'] = str(datetime.now())
                        p['origin'] = o[2]
                        p['destination'] = d[2]
                    pd.DataFrame(data=prices).to_csv('uberdata.csv', mode='a', header=writeHeader)
                    writeHeader = False
            except Exception as inst:
                print(type(inst))
                print(inst)

    final = datetime.now()
    time_to_wait = 180 - (final - initial).total_seconds()

    # if the round lasted less than 3 minutes, wait for the remaining time
    if time_to_wait > 0:
        sleep(time_to_wait)
```
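
The two control-flow ideas in the script, retrying a failed request and pacing the rounds to one every 3 minutes, can be isolated into small helpers. The sketch below is a simplified illustration with hypothetical names (`call_with_retries`, `seconds_to_wait`, `flaky`). It leaves out the `SIGALRM` timeout, which is only available on Unix-like systems, and uses a simulated failing call in place of the Uber client.

```python
def call_with_retries(func, attempts=3):
    """Call func() up to `attempts` times; return its result, or None if every try fails."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:
            print("attempt %d failed: %s" % (attempt, error))
    return None


def seconds_to_wait(elapsed_seconds, period=180):
    """Seconds left in the current period, never negative."""
    return max(0.0, period - elapsed_seconds)


# Hypothetical flaky call: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated timeout")
    return {"prices": []}


result = call_with_retries(flaky)
print(result, calls["count"])
print(seconds_to_wait(42.5))
```

Returning `None` after the last failed attempt mirrors the script's behavior of logging the failure and moving on to the next origin and destination pair.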

### Reading the data and plotting the maps

We use the pandas library to work with the generated CSV file.

In [3]:
dataset_uber = os.path.join('data', 'uberdata.csv')
uber_data = pd.read_csv(dataset_uber)
uber_data.head(5)

| | Unnamed: 0 | currency_code | date | destination | display_name | distance | duration | estimate | high_estimate | localized_display_name | low_estimate | origin | product_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | BRL | 2017-10-27 15:50:30.795377 | Pitimbu | uberX | 5.25 | 840 | R$15-19 | 19.0 | uberX | 15.0 | ECT | 65cb1829-9761-40f8-acc6-92d700fe2924 |
| 1 | 1 | BRL | 2017-10-27 15:50:30.795430 | Pitimbu | UberSELECT | 5.25 | 840 | R$17-22 | 22.0 | UberSELECT | 17.0 | ECT | bf8f99ca-f5f2-40d4-8ffc-52f1e2b17138 |
| 2 | 0 | BRL | 2017-10-27 15:50:32.165398 | Pitimbu | uberX | 4.38 | 780 | R$13-17 | 17.0 | uberX | 13.0 | ECT | 65cb1829-9761-40f8-acc6-92d700fe2924 |
| 3 | 1 | BRL | 2017-10-27 15:50:32.165471 | Pitimbu | UberSELECT | 4.38 | 780 | R$15-20 | 20.0 | UberSELECT | 15.0 | ECT | bf8f99ca-f5f2-40d4-8ffc-52f1e2b17138 |
| 4 | 0 | BRL | 2017-10-27 15:50:33.845143 | Planalto | uberX | 5.59 | 1020 | R$16-21 | 21.0 | uberX | 16.0 | ECT | 65cb1829-9761-40f8-acc6-92d700fe2924 |


In [6]:
uber_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 611490 entries, 0 to 611489
Data columns (total 13 columns):
Unnamed: 0                611490 non-null int64
currency_code             611490 non-null object
date                      611490 non-null object
destination               611490 non-null object
display_name              611490 non-null object
distance                  611490 non-null float64
duration                  611490 non-null int64
estimate                  611490 non-null object
high_estimate             611490 non-null float64
localized_display_name    611490 non-null object
low_estimate              611490 non-null float64
origin                    611490 non-null object
product_id                611490 non-null object
dtypes: float64(3), int64(2), object(8)
memory usage: 60.6+ MB


<p style="text-align: justify;">
As the dataset above shows, two Uber services are available in Natal: uberX and UberSELECT (see the `display_name` column). The CSV file also has the `low_estimate` and `high_estimate` columns, so we can take the mean of these two values as the price of each trip. Next, we add the `price_mean` column to the dataset and build a pivot table grouped by the `display_name`, `origin` and `destination` columns, computing the mean of `price_mean` for each group.
</p>

In [11]:
uber_data['price_mean'] = (uber_data['high_estimate'] + uber_data['low_estimate'])/2
uber_prices = uber_data.pivot_table(index=['display_name', 'origin','destination'], values="price_mean", aggfunc=np.mean)['price_mean']
uber_prices

display_name  origin  destination                  
UberSELECT    ECT     Alecrim                          23.860348
                      Barro Vermelho                   21.696494
                      Candelária                       15.689626
                      Capim Macio                      13.389390
                      Cidade Alta                      25.179226
                      Cidade da Esperança              21.425497
                      Lagoa Nova                       16.339744
                      Neópolis                         19.121272
                      Nossa Senhora da Apresentação    43.674189
                      Nossa Senhora de Nazaré          19.352695
                      Nova Descoberta                  14.305534
                      Petrópolis                       24.152930
                      Pitimbu                          22.315215
                      Planalto                         26.097527
                      Ponta Negra     
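
To make the aggregation concrete, the sketch below repeats it by hand on the five rows printed by `head()` earlier, using only the standard library. It is a worked example of the same grouping and averaging, not the code used for the analysis. The tuple layout of `rows` is an assumption made for this illustration.

```python
from collections import defaultdict

# (display_name, origin, destination, low_estimate, high_estimate),
# copied from the five rows printed by head() above.
rows = [
    ("uberX", "ECT", "Pitimbu", 15.0, 19.0),
    ("UberSELECT", "ECT", "Pitimbu", 17.0, 22.0),
    ("uberX", "ECT", "Pitimbu", 13.0, 17.0),
    ("UberSELECT", "ECT", "Pitimbu", 15.0, 20.0),
    ("uberX", "ECT", "Planalto", 16.0, 21.0),
]

# price_mean of each row, collected per (display_name, origin, destination) group
groups = defaultdict(list)
for name, origin, destination, low, high in rows:
    groups[(name, origin, destination)].append((low + high) / 2)

# mean of price_mean within each group
mean_price = {key: sum(values) / len(values) for key, values in groups.items()}

for key in sorted(mean_price):
    print(key, mean_price[key])
```

For these rows, the two uberX estimates for Pitimbu (midpoints 17.0 and 15.0) average to 16.0.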

<p style="text-align: justify;">
To plot the results, we need a function that maps a value to an RGB color (of the form #RRGGBB). For this, we use the colormap tools from `branca.colormap`.
</p>

In [14]:
colormapUberX = linear.YlOrRd.scale(
    uber_prices['uberX'].values.min(),
    uber_prices['uberX'].values.max())

colormapUberSelect = linear.YlGnBu.scale(
    uber_prices['UberSELECT'].values.min(),
    uber_prices['UberSELECT'].values.max())
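
To show what such a colormap does, the sketch below maps a value to a `#RRGGBB` string by linear interpolation between two end colors. It is a simplified stand-in and not branca's implementation: the branca scales typically interpolate across several intermediate colors, while this one uses only two. The end colors, the price range and the helper names are assumptions chosen for illustration, and the code assumes `vmin < vmax`.

```python
def hex_to_rgb(color):
    """'#rrggbb' -> (r, g, b) as integers in 0..255."""
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


def make_linear_colormap(low_color, high_color, vmin, vmax):
    """Return a function mapping a number in [vmin, vmax] to '#rrggbb'."""
    low_rgb, high_rgb = hex_to_rgb(low_color), hex_to_rgb(high_color)

    def colormap(value):
        # position of value between vmin and vmax, clamped to [0, 1]
        t = (value - vmin) / (vmax - vmin)
        t = min(1.0, max(0.0, t))
        channels = [round(lo + t * (hi - lo)) for lo, hi in zip(low_rgb, high_rgb)]
        return "#%02x%02x%02x" % tuple(channels)

    return colormap


# Hypothetical price range and end colors, chosen only for illustration.
price_color = make_linear_colormap("#ffffb2", "#bd0026", vmin=10.0, vmax=40.0)
print(price_color(10.0), price_color(25.0), price_color(40.0))
```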

Now we use the Folium library to plot a choropleth map from the GeoJson file with Natal's neighborhoods, filling each neighborhood's polygon with a color whose intensity is given by the colormap. There is one choropleth layer per service (uberX or UberSELECT) and per origin (ECT or IMD).

In [15]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=12,
    tiles='Stamen Terrain'
)

# uberX layer
# ECT
folium.GeoJson(
    geo_json_natal,
    name='UberX Price Estimates - ECT',
    style_function=lambda feature: {
        'fillColor': colormapUberX(uber_prices['uberX']['ECT'][feature['properties']['name']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.9,
        'name': feature['properties']['name']
    }
).add_to(m)

# IMD
folium.GeoJson(
    geo_json_natal,
    name='UberX Price Estimates - IMD',
    style_function=lambda feature: {
        'fillColor': colormapUberX(uber_prices['uberX']['IMD'][feature['properties']['name']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.9,
        'name': feature['properties']['name']
    }
).add_to(m)

colormapUberX.caption = 'UberX Price Estimates'
colormapUberX.add_to(m)


# uberSELECT layer
# ECT
folium.GeoJson(
    geo_json_natal,
    name='UberSELECT Price Estimates - ECT',
    style_function=lambda feature: {
        'fillColor': colormapUberSelect(uber_prices['UberSELECT']['ECT'][feature['properties']['name']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.9
    }
).add_to(m)

#IMD
folium.GeoJson(
    geo_json_natal,
    name='UberSELECT Price Estimates - IMD',
    style_function=lambda feature: {
        'fillColor': colormapUberSelect(uber_prices['UberSELECT']['IMD'][feature['properties']['name']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.9
    }
).add_to(m)

colormapUberSelect.caption = 'UberSELECT Price Estimates'
colormapUberSelect.add_to(m)

folium.LayerControl().add_to(m)

m

Alternatively, we can build a choropleth map with Folium's `choropleth` method:

In [20]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=12,
    tiles='Stamen Terrain'
)

# thresholds of the legend: 5 evenly spaced integer values between the lowest and highest mean price
threshold = np.linspace(uber_prices[['uberX', 'UberSELECT']].values.min(),
                        uber_prices[['uberX', 'UberSELECT']].values.max(), 5, dtype=int).tolist()

# one value per service and neighborhood: mean of price_mean over all samples and both origins
# (passing the raw rows would leave many rows per destination)
mean_by_destination = uber_data.groupby(['display_name', 'destination'], as_index=False)['price_mean'].mean()

m.choropleth(
    geo_data=geo_json_natal,
    name='UberX',
    data=mean_by_destination[mean_by_destination['display_name'] == 'uberX'],
    columns=['destination', 'price_mean'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    legend_name='Price Estimates UberX',
    highlight=True,
    fill_opacity=0.9,
    threshold_scale=threshold
)

m.choropleth(
    geo_data=geo_json_natal,
    name='UberSELECT',
    data=mean_by_destination[mean_by_destination['display_name'] == 'UberSELECT'],
    columns=['destination', 'price_mean'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    legend_name='Price Estimates UberSELECT',
    highlight=True,
    fill_opacity=0.9,
    threshold_scale=threshold
)

folium.LayerControl().add_to(m)
m

The map above compares the prices of the uberX and UberSELECT services.