# Create queries for a simple Flask API with pandas

The goal is to replace the call to the Firestore database with a read from local disk.

That is, we will save a CSV file on the App Engine machine.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import pandas as pd
import sys
sys.path.append('../../')

In [None]:
api_data_dir = '../../api/data/'

file_name = 'wikivoyage_destinations.csv'

## Read data

Read the data that `feature_engineering.py` has prepared for the frontend.


In [None]:
df = pd.read_csv(api_data_dir + file_name).set_index("id", drop=False)

## Row lookup

The dataframe is indexed on the `id` column (the page id), so a single row can be looked up with `.loc`:

In [None]:
df.loc[146019].to_dict()

To fetch a random row:

In [None]:
df.sample(1).iloc[0].to_dict()
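`.loc` raises a `KeyError` when the id is not in the index, which the API will need to handle. A minimal sketch on toy data; the helper name `lookup` and the toy ids are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the destinations table; ids and titles are made up
toy = pd.DataFrame(
    {"id": [1, 2, 3], "title": ["A", "B", "C"]}
).set_index("id", drop=False)


def lookup(frame, page_id):
    """Return the row as a dict, or None when the id is not in the index."""
    try:
        return frame.loc[page_id].to_dict()
    except KeyError:
        return None


found = lookup(toy, 2)
missing = lookup(toy, 99)
```

Returning `None` is only one option; the API could equally translate the `KeyError` into a 404 response.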

## Build queries

### Based on geo

The request supplies `ne_lat`, `ne_lng`, `sw_lat`, and `sw_lng`, the north-east and south-west corners of the bounding box, as query parameters:

`&ne_lat=43.97363914475397&ne_lng=5.173845810128569&sw_lat=38.69043481932856&sw_lng=-0.5720037992464313`

In [None]:
ne_lat, ne_lng, sw_lat, sw_lng = 43.9, 5.17, 38.7, -0.57

Apply filter:

In [None]:
from api.resources.utils.selection import filter_on_geolocation

In [None]:
(
    df
    .pipe(filter_on_geolocation, ne_lat, ne_lng, sw_lat, sw_lng)
).head()
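The implementation of `filter_on_geolocation` is not shown in this notebook. As a rough idea of what a bounding-box filter can look like, here is a sketch under assumptions: the function name `bbox_filter` and the column names `lat` and `lng` are hypothetical, and the real function may differ.

```python
import pandas as pd


def bbox_filter(frame, ne_lat, ne_lng, sw_lat, sw_lng, lat_col="lat", lng_col="lng"):
    """Keep rows whose coordinates fall inside the bounding box.

    Hypothetical stand-in for filter_on_geolocation; column names are assumptions.
    Boxes that cross the antimeridian (sw_lng > ne_lng) are not handled.
    """
    in_lat = frame[lat_col].between(sw_lat, ne_lat)
    in_lng = frame[lng_col].between(sw_lng, ne_lng)
    return frame[in_lat & in_lng]


# Toy data with approximate coordinates
toy = pd.DataFrame({
    "title": ["Barcelona", "Paris", "Marseille"],
    "lat": [41.39, 48.86, 43.30],
    "lng": [2.17, 2.35, 5.37],
})

inside = bbox_filter(toy, 43.9, 5.17, 38.7, -0.57)
```

Because the function takes the dataframe as its first argument, it can be used with `.pipe` in the same way as above.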

### Sampling

What should happen if the area is too small? We need to handle the case where little or no data is available.

In [None]:
ne_lat, ne_lng, sw_lat, sw_lng = 48.9, 2.47, 48.82, 2.22

There are two cases:

1. Nothing can be found at all
2. Only a small number can be found, for example fewer than the 10 requested by `sample(10)`

Rather than sampling a fixed number of rows, let's shuffle the dataframe with `sample(frac=1)` and then pick the top x observations. This way, we can also work with offsets, as long as the ordering is reproducible. Setting a random seed makes it reproducible.

In [None]:
(
    df
    .pipe(filter_on_geolocation, ne_lat, ne_lng, sw_lat, sw_lng)
).sample(frac=1, random_state=1234)
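To illustrate the second case, the sketch below uses a toy dataframe with only three rows. Asking `sample` for 10 rows without replacement raises a `ValueError`, while shuffling and taking `head(10)` returns whatever is available. The toy data is made up.

```python
import pandas as pd

toy = pd.DataFrame({"id": [1, 2, 3]})

# Requesting more rows than exist fails when sampling without replacement
try:
    toy.sample(10)
    sample_failed = False
except ValueError:
    sample_failed = True

# Shuffle everything, then take at most 10 rows
shuffled_top = toy.sample(frac=1, random_state=1234).head(10)

# The same seed gives the same ordering on the same data
repeat = toy.sample(frac=1, random_state=1234).head(10)
same_order = shuffled_top["id"].tolist() == repeat["id"].tolist()
```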

### Error handling

Now, what is returned in case no records are found?

In [None]:
ne_lat, ne_lng, sw_lat, sw_lng = 48.8, 2.2, 48.82, 2.22

try:
    (
        df
        .pipe(filter_on_geolocation, ne_lat, ne_lng, sw_lat, sw_lng)
    ).sample(frac=1, random_state=1234)
except ValueError:
    print("Oops, ValueError! Must have at least one record. Return empty list?")
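An alternative to catching the exception is to check for an empty result explicitly before shuffling. A minimal sketch; the helper name `shuffled_records` is hypothetical, and returning an empty list is one possible answer to the question above:

```python
import pandas as pd


def shuffled_records(frame, seed=1234):
    """Return all rows in a seeded random order, or [] when the frame is empty."""
    if frame.empty:
        return []
    return frame.sample(frac=1, random_state=seed).to_dict(orient="records")


no_rows = shuffled_records(pd.DataFrame({"id": []}))
two_rows = shuffled_records(pd.DataFrame({"id": [1, 2]}))
```

The explicit check does not depend on whether the installed pandas and NumPy versions raise on an empty frame.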

### Offsets

Select a subset of the results when working with an offset.

In [None]:
n = 10
offset = 0
n_results = 3
subset = df.sample(frac=1, random_state=1234).head(n)

In [None]:
subset

In [None]:
subset.iloc[offset:offset+n_results]

Works.
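The same idea can be wrapped in a small helper. This is a sketch only; the name `paginate` and its parameters are made up here, and it relies on the seed and the underlying data staying the same between requests:

```python
import pandas as pd


def paginate(frame, offset, n_results, seed=1234):
    """Shuffle with a fixed seed, then return rows offset .. offset + n_results."""
    shuffled = frame.sample(frac=1, random_state=seed)
    return shuffled.iloc[offset:offset + n_results]


toy = pd.DataFrame({"id": range(10)})

page_1 = paginate(toy, offset=0, n_results=3)
page_2 = paginate(toy, offset=3, n_results=3)
past_end = paginate(toy, offset=50, n_results=3)  # empty frame, no error
```

Because `iloc` slicing does not raise on out-of-range positions, an offset past the end simply returns an empty dataframe.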

### Weighted sampling

To keep some randomness but still surface the more important destinations first, use the `weight` column created in one of the feature engineering notebooks.

In [None]:
(
    df
    .sample(frac=1, random_state=1234, weights='weight')
    .head(3)
)

In [None]:
df['weight'].value_counts()
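With `frac=1` every row is still returned exactly once; the weights only influence the order, so heavily weighted rows tend to come first. The sketch below checks this on toy data by counting how often the heavily weighted row lands on top across many seeds. The toy titles and weights are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["major", "minor_a", "minor_b", "minor_c"],
    "weight": [100, 1, 1, 1],
})


def weighted_order(frame, seed):
    """Return all rows in a weighted random order (no replacement)."""
    return frame.sample(frac=1, random_state=seed, weights="weight")


# Count how often the heavily weighted row comes first
n_seeds = 200
major_first = sum(
    weighted_order(toy, seed).iloc[0]["title"] == "major"
    for seed in range(n_seeds)
)
```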

### Convert to json

Provide the places, as well as some metadata.

In [None]:
from api.resources.utils.utils import prettify_n_results

In [None]:
# example of prettifying an X number of results for the front-end
prettify_n_results(3500)

In [None]:
subset = df.sample(2).to_dict(orient='records')

{
    "Results": len(subset),
    "Results_string": prettify_n_results(len(subset)),
    "Destinations": subset
}
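A dict like the one above still has to be serialised, and missing values (`NaN`) are not valid JSON. One possible approach, sketched here under assumptions, is to round-trip the rows through `to_json`, which writes `NaN` as `null`. The helper names `build_payload` and `format_count` are hypothetical, and `format_count` is only a stand-in for `prettify_n_results`, whose actual output format is not shown here.

```python
import json

import pandas as pd


def format_count(n):
    """Hypothetical stand-in for prettify_n_results: adds a thousands separator."""
    return f"{n:,}"


def build_payload(frame, n, seed=1234):
    """Sample up to n rows and wrap them with metadata for the frontend."""
    if frame.empty:
        records = []
    else:
        sampled = frame.sample(min(n, len(frame)), random_state=seed)
        # Round-trip through to_json: NaN becomes null, values become plain Python types
        records = json.loads(sampled.to_json(orient="records"))
    return {
        "Results": len(records),
        "Results_string": format_count(len(records)),
        "Destinations": records,
    }


# Toy data with one missing latitude
toy = pd.DataFrame({"id": [1, 2, 3], "lat": [41.4, float("nan"), 43.3]})

payload = build_payload(toy, 3)
body = json.dumps(payload)
```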

Done.