## Hotels Challenge I.

given a database of hotels and a set of input coordinates, find the hotel closest to each coordinate

a solution is represented by a directory

the directory must contain one [yaml](https://en.wikipedia.org/wiki/YAML) file:
`commands.yaml`

and in the file, on the top level, 4 keys can have values: `setup-env-command`, `etl-command`, `process-command`
and `cleanup-command`. 

- `setup-env-command` sets up the environment where the other commands can run. it can assume the presence of python3.7 and pip
- `etl-command` runs once the data is accessible to the solution, in this case as `hotel_table.csv` in the root of the solution. the etl command can do whatever it wants with the data to prepare it for the process command
- when `process-command` runs, an additional `inputs.json` file is also present in the solution root. your task is to make this command write out the answers to the queries found in inputs into an `outputs.json` file in the root of the solution, as fast as possible. this is the only mandatory value
- `cleanup-command` runs after everything is done
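
as an illustration of the layout described above, the sketch below writes a `commands.yaml` with all four top-level keys. the script names (`etl.py`, `process.py`), the `requirements.txt` file and the cleanup target are hypothetical placeholders, not part of the challenge; only `process-command` is mandatory.

```python
from pathlib import Path

# Hypothetical commands: the script and file names are placeholders
# for whatever your solution actually contains.
COMMANDS_YAML = """\
setup-env-command: pip install -r requirements.txt
etl-command: python etl.py
process-command: python process.py
cleanup-command: rm -f filtered.csv
"""


def write_commands_file(solution_dir="."):
    """Write commands.yaml into the root of the solution directory."""
    path = Path(solution_dir) / "commands.yaml"
    path.write_text(COMMANDS_YAML)
    return path
```

calling `write_commands_file()` from the solution directory produces the file; writing it by hand works just as well.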


solutions will be evaluated based on:
- scaling with size of input
- scaling with data size

there are 4 levels for evaluation:
- 10k hotels - 1, 2, 5, 10 queries
- 5 queries - 10k, 50k, 100k, 200k hotels
- 50k hotels - 1, 10, 100, 1000 queries
- 500k hotels - 1, 10, 100, 1000 queries
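
for local experiments, the levels above can be written down as (hotel count, query count) pairs and paired with a small timing helper. this is only a sketch: `EVALUATION_LEVELS` and `time_call` are names made up here, and how the official evaluator measures time is not specified in this notebook.

```python
import time

# The four evaluation levels listed above, as (hotel_count, query_count) pairs.
EVALUATION_LEVELS = (
    [(10_000, q) for q in (1, 2, 5, 10)]
    + [(h, 5) for h in (10_000, 50_000, 100_000, 200_000)]
    + [(50_000, q) for q in (1, 10, 100, 1000)]
    + [(500_000, q) for q in (1, 10, 100, 1000)]
)


def time_call(func, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call to func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```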

### install package for data downloading and evaluation

In [None]:
!pip install --upgrade git+https://github.com/endreMBorza/jkg_evaluators

In [None]:
from jkg_evaluators.challenges.data.hotels import get_hotel_data, dump_hotel_input
import shutil
import os

### download practice data

In [None]:
get_hotel_data()

### select one and move to notebook root

In [None]:
data_size_to_copy = 10000
shutil.copyfile(os.path.join("data", 
                             f"{data_size_to_copy}.csv"), 
                "data.csv")

### generate some inputs

In [None]:
dump_hotel_input(size=10, path="inputs.json")

## base solution ETL

In [None]:
%%time
import pandas as pd

data_file_path = "data.csv"

df = pd.read_csv(data_file_path)

# keep only the columns the process step needs
df.loc[:, ['lon','lat','name']].to_csv('filtered.csv', index=False)

## base solution process

In [None]:
%%time
import pandas as pd
import numpy as np
import json

with open('inputs.json', 'r') as f:
    input_locations = json.load(f)

df = pd.read_csv('filtered.csv')

answers = []

for place in input_locations:
    min_distance = np.inf
    closest_place = {}
    for idx,row in df.iterrows():
        distance = ((place['lon']-row['lon']) ** 2 + (place['lat']-row['lat']) ** 2) ** 0.5
        if distance < min_distance:
            min_distance = distance
            closest_place = row[['lon','lat','name']].to_dict()
    answers.append(closest_place.copy())

# the task description expects the answers in outputs.json
with open('outputs.json', 'w') as f:
    json.dump(answers, f)
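
the double loop above visits every hotel row in python for every query. one possible speedup, sketched here and not part of the base solution, is to compute all distances at once with numpy broadcasting. `closest_indices` is a hypothetical helper name; it uses the same planar lon/lat distance as the loop and, like the loop, keeps the first hotel on ties.

```python
import numpy as np


def closest_indices(hotel_lon, hotel_lat, query_lon, query_lat):
    """For each query point, return the index of the nearest hotel.

    Uses the same planar (lon, lat) distance as the base solution.
    """
    hotel_lon = np.asarray(hotel_lon, dtype=float)
    hotel_lat = np.asarray(hotel_lat, dtype=float)
    query_lon = np.asarray(query_lon, dtype=float)
    query_lat = np.asarray(query_lat, dtype=float)

    # Broadcasting builds a (n_queries, n_hotels) matrix of squared distances.
    d2 = ((query_lon[:, None] - hotel_lon[None, :]) ** 2
          + (query_lat[:, None] - hotel_lat[None, :]) ** 2)

    # The square root is monotonic, so argmin on squared distances suffices.
    return d2.argmin(axis=1)
```

with the `df` and `input_locations` from the cell above, `idx = closest_indices(df['lon'], df['lat'], [p['lon'] for p in input_locations], [p['lat'] for p in input_locations])` followed by `df.iloc[idx][['lon','lat','name']].to_dict('records')` should give the same list of answers. note that the distance matrix holds one float per (query, hotel) pair, so 1000 queries against 500k hotels needs roughly 4 GB; processing the queries in batches keeps memory bounded.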