## Hotels Challenge II

Given a database of hotels and a set of inputs, each made up of coordinates, a price range and a star rating, find for every input the hotel closest to the coordinates that has the given star rating and a price within the price range. If no hotel matches both the price range and the star rating, return the dict `{"missing": True}`.

A solution is represented the same way as in [challenge one](../../challenge-1-hotels/draft_notebooks/challenge-draft-v0.ipynb), and the input sizes are the same as well. The outputs must now also include the star rating and the price.
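To pin down the matching rule, here is a minimal pure-Python sketch for a single input. It assumes hotels are dicts with the columns kept by the ETL step (`lon`, `lat`, `name`, `stars`, `price`) and that each input carries the keys the base solution reads (`lon`, `lat`, `stars`, `min_price`, `max_price`). `closest_hotel` is a hypothetical helper name; the price range is treated as inclusive and distance is plain Euclidean distance on the raw coordinates, as in the base solution.

```python
import math


def closest_hotel(hotels, query):
    """Hypothetical helper: nearest hotel with matching stars and price, or a missing marker."""
    best, best_dist = None, math.inf
    for hotel in hotels:
        # the star rating must match exactly
        if hotel["stars"] != query["stars"]:
            continue
        # the price must lie in [min_price, max_price]; a NaN price fails this check
        if not (query["min_price"] <= hotel["price"] <= query["max_price"]):
            continue
        # Euclidean distance on raw lon/lat, as in the base solution
        dist = math.hypot(hotel["lon"] - query["lon"], hotel["lat"] - query["lat"])
        if dist < best_dist:
            best, best_dist = hotel, dist
    return dict(best) if best is not None else {"missing": True}


# made-up toy data
hotels = [
    {"lon": 0.0, "lat": 0.0, "name": "A", "stars": 3, "price": 100.0},
    {"lon": 1.0, "lat": 1.0, "name": "B", "stars": 3, "price": 250.0},
    {"lon": 5.0, "lat": 5.0, "name": "C", "stars": 4, "price": 120.0},
]
query = {"lon": 0.9, "lat": 0.9, "stars": 3, "min_price": 50, "max_price": 150}

print(closest_hotel(hotels, query)["name"])  # A (B is closer but over budget)
print(closest_hotel(hotels, {**query, "stars": 5}))  # {'missing': True}
```

The returned dict is the whole hotel record, so it already includes the star rating and the price that the outputs now require.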

### install package for data downloading and evaluation

In [1]:
from jkg_evaluators.challenges.data.hotels import get_hotel_data, dump_hotel_filter_input
import shutil
import os

### download practice data

In [5]:
get_hotel_data()

### select one dataset and copy it to the notebook root

In [2]:
data_size_to_copy = 10000
shutil.copyfile(os.path.join("data", f"{data_size_to_copy}.csv"), "data.csv")

'data.csv'

### generate some inputs

In [3]:
dump_hotel_filter_input(size=10, path="inputs.json")

## base solution ETL

In [4]:
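%%time
import pandas as pd

data_file_path = "data.csv"

df = pd.read_csv(data_file_path)

(
    df.drop_duplicates()
    # "current-price" is a string assumed to look like "$1,234": drop the
    # leading currency symbol and the thousands separators, then parse as float
    .assign(
        price=lambda _df: _df["current-price"]
        .str[1:]
        .str.replace(",", "")
        .astype(float)
    )
    # keep only the columns the lookup step needs
    .loc[:, ["lon", "lat", "name", "stars", "price"]]
    .to_pickle("filtered.pkl")
)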


CPU times: user 56.8 ms, sys: 8.12 ms, total: 64.9 ms
Wall time: 64.9 ms
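The price parsing above relies on an assumption about the `current-price` format: exactly one leading currency character and `,` as the thousands separator. A quick check of the same chain on made-up strings shows what it produces, including that a missing value comes out as `NaN` instead of raising (`regex=False` just makes the literal replacement explicit).

```python
import pandas as pd

# made-up values in the format the ETL cell assumes for "current-price"
raw = pd.Series(["$89", "$1,249", "€75.50", None])

price = (
    raw.str[1:]                            # drop the first character (currency symbol)
    .str.replace(",", "", regex=False)     # remove thousands separators
    .astype(float)                         # parse; missing values stay NaN
)
print(price.tolist())  # [89.0, 1249.0, 75.5, nan]
```

If the real data used a multi-character prefix such as `US$`, `.str[1:]` would leave part of it behind and `astype(float)` would raise.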


## base solution process

In [5]:
%%time
import json

import numpy as np
import pandas as pd

# each input dict has the keys lon, lat, stars, min_price and max_price
with open("inputs.json", "r") as f:
    input_dicts = json.load(f)

df = pd.read_pickle("filtered.pkl")

# best distance found so far and the matching answer, one slot per input
min_distances = [np.inf] * len(input_dicts)
answers = [{"missing": True} for _ in input_dicts]

for _, row in df.iterrows():

    for input_idx, input_dict in enumerate(input_dicts):
        # the star rating must match exactly
        if row["stars"] != input_dict["stars"]:
            continue
        # the price must lie in [min_price, max_price]; written this way,
        # a missing (NaN) price fails the check instead of passing it
        if not (input_dict["min_price"] <= row["price"] <= input_dict["max_price"]):
            continue
        # Euclidean distance on the raw lon/lat values
        distance = (
            (input_dict["lon"] - row["lon"]) ** 2
            + (input_dict["lat"] - row["lat"]) ** 2
        ) ** 0.5
        if distance < min_distances[input_idx]:
            min_distances[input_idx] = distance
            answers[input_idx] = row[["lon", "lat", "name", "stars", "price"]].to_dict()

with open("outputs.json", "w") as f:
    json.dump(answers, f)


CPU times: user 1.58 s, sys: 9.88 ms, total: 1.59 s
Wall time: 1.59 s
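Most of the 1.59 s goes into the Python-level loop over every (hotel, input) pair via `iterrows`. A possible speedup, sketched here under the assumption that `filtered.pkl` has the same five columns and the inputs the same keys, is to pull the columns out as NumPy arrays once and answer each input with one boolean mask plus one `argmin`. This is not the author's solution; `answer_queries` and `_native` are hypothetical names, and the toy frame is made up.

```python
import json

import numpy as np
import pandas as pd

COLS = ["lon", "lat", "name", "stars", "price"]


def _native(value):
    # numpy scalars such as np.int64 are not JSON-serialisable; unwrap them
    return value.item() if isinstance(value, np.generic) else value


def answer_queries(df, queries):
    """Hypothetical vectorised variant: one mask and one argmin per input."""
    lon = df["lon"].to_numpy(dtype=float)
    lat = df["lat"].to_numpy(dtype=float)
    stars = df["stars"].to_numpy()
    price = df["price"].to_numpy(dtype=float)

    answers = []
    for q in queries:
        # same filters as the loop: exact stars, inclusive price range;
        # comparisons against NaN are False, so unpriced hotels never match
        mask = (
            (stars == q["stars"])
            & (price >= q["min_price"])
            & (price <= q["max_price"])
        )
        candidates = np.flatnonzero(mask)
        if candidates.size == 0:
            answers.append({"missing": True})
            continue
        # squared distance has the same argmin as the Euclidean distance
        d2 = (lon[candidates] - q["lon"]) ** 2 + (lat[candidates] - q["lat"]) ** 2
        best = candidates[np.argmin(d2)]
        row = df.iloc[best]
        answers.append({col: _native(row[col]) for col in COLS})
    return answers


# made-up toy data for a quick check
toy = pd.DataFrame(
    {
        "lon": [0.0, 1.0, 5.0],
        "lat": [0.0, 1.0, 5.0],
        "name": ["A", "B", "C"],
        "stars": [3, 3, 4],
        "price": [100.0, 250.0, np.nan],
    }
)
queries = [
    {"lon": 0.9, "lat": 0.9, "stars": 3, "min_price": 50, "max_price": 150},
    {"lon": 0.9, "lat": 0.9, "stars": 3, "min_price": 50, "max_price": 300},
    {"lon": 5.0, "lat": 5.0, "stars": 4, "min_price": 0, "max_price": 1e9},
]
result = answer_queries(toy, queries)
print(json.dumps(result))
```

In the notebook it would replace the double loop with `answers = answer_queries(pd.read_pickle("filtered.pkl"), input_dicts)` before dumping `outputs.json`. Ties are broken by the first matching row, which is the same choice the strict `<` in the loop makes.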
