# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [None]:
# @title Notebook setup.
%cd ..
import random
from semantic_routing import benchmark  # Used below for the *_POI_SPECS_PATH constants.
from semantic_routing.benchmark import utils
from semantic_routing.benchmark.query_engines import labeled_query_engines
from semantic_routing.benchmark.query_engines import basic_query_engines

# Query Engines

To compute ground-truth optimal routes programmatically, every entry in the user query dataset is labeled with a ground-truth structured interpretation of its content.
Below, we provide examples of user text queries that can be found in, or generated by, the user query dataset.

These examples are sampled by calling `engine.sample_query(split, rng)` on an instantiated query engine `engine`.

In [2]:
# @title Examples of user queries from dataset.

poi_specs = utils.get_poi_specs()
rng = random.Random(0)

num_queries_to_generate = 5 # @param

engine = labeled_query_engines.HumanLabeledQueryEngine(poi_specs=poi_specs, splits=(0.95, 0, 0.05), seed=0)
engine.touring_prop = 0  # Proportion of trip planning queries; 0 samples only waypoint routing queries.
print("Examples of waypoint routing queries, generated by the query engine...")
for _ in range(num_queries_to_generate):
  query_data, query_text = engine.sample_query(0, rng)
  print("\"{}\"".format(query_text.strip()))

A basic synthetic query engine, which composes user queries using natural language templates, is also available for use and debugging.

In [3]:
# @title Examples of user queries from basic engine.

poi_specs = utils.get_poi_specs(benchmark.REDUCED_POI_SPECS_PATH)
rng = random.Random(0)

num_queries_to_generate = 5 # @param

engine = basic_query_engines.POIBasedRoutingQueryEngine(poi_specs=poi_specs, splits=(0.95, 0, 0.05), seed=0)
print("Examples of waypoint routing queries, generated by the query engine...")
for _ in range(num_queries_to_generate):
  query_data, query_text = engine.sample_query(0, rng)
  print("\"{}\"".format(query_text.strip()))
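
As a rough illustration of what "composing queries from natural language templates" means, here is a minimal, self-contained sketch. It is not the implementation of `POIBasedRoutingQueryEngine`: the templates, the `NEED_PHRASES` mapping, and the `compose_query` helper are all hypothetical, and the returned `query_data` only mirrors the `"pois"` and `"linear"` keys used elsewhere in this notebook.

```python
import random

# Hypothetical templates and phrases, for illustration only. The benchmark's
# real templates are not shown in this notebook.
TEMPLATES = (
    "I need to stop for {needs} on the way.",
    "Please route me past somewhere I can get {needs}.",
)
NEED_PHRASES = {"gas station": "gasoline", "grocery store": "groceries"}


def compose_query(pois, rng):
  """Returns a (query_data, query_text) pair for the requested POI categories."""
  needs = " and ".join(NEED_PHRASES[poi] for poi in pois)
  query_text = rng.choice(TEMPLATES).format(needs=needs)
  # Assumed label layout: requested POIs plus an (empty) road preference.
  query_data = {"pois": list(pois), "linear": ""}
  return query_data, query_text


query_data, query_text = compose_query(["gas station", "grocery store"], random.Random(0))
print(query_text)
```

Because the text is produced from the structured label rather than the other way around, the label is correct by construction, which is what makes such an engine convenient for debugging.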

## The Content of a Waypoint Routing Query

A waypoint routing query can communicate two types of information:

1. A "need" which can be fulfilled by stopping by a point-of-interest. For example, gasoline is a need which can be fulfilled by stopping at a gas station along the route.

2. A "driving preference" which should be taken into account, along with driving time, when weighing the cost of a route. For example, a user may dislike highways. Roughly 60% of queries express no road preference, 20% express a liking for highways, and 20% express a dislike of highways.

To programmatically evaluate how well a route satisfies a user query, and hence generate ground-truth routes for our dataset, we apply the following criteria.
1. If the route does not reach the desired destination, it is immediately disqualified.
2. If the route does not stop by POIs that satisfy all of the user's needs, it is immediately disqualified.
3. The smaller the route's total travel time, the better. If the user prefers or dislikes driving on highways, the travel time spent on highways is multiplied by 0.5 or 5, respectively. See `points_of_interest_and_driving_preferences_tutorial.ipynb` for details.
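
The three criteria above can be sketched as a single cost function. This is a simplified illustration, not the benchmark's scoring code: the `route_cost` helper, its arguments, and the representation of a route as `(travel_time, on_highway)` segments are assumptions made for this example. Only the 0.5 and 5 multipliers and the `"like highways"` / `"dislike highways"` labels come from this notebook.

```python
import math

# Highway travel-time multipliers from criterion 3.
HIGHWAY_MULTIPLIER = {"like highways": 0.5, "dislike highways": 5.0}


def route_cost(segments, reaches_destination, needed, visited, preference=""):
  """Returns the cost of a route (lower is better), or infinity if disqualified.

  segments: iterable of (travel_time, on_highway) pairs (assumed layout).
  needed / visited: POI categories requested by the user / stopped at en route.
  """
  if not reaches_destination:  # Criterion 1.
    return math.inf
  if not set(needed) <= set(visited):  # Criterion 2.
    return math.inf
  multiplier = HIGHWAY_MULTIPLIER.get(preference, 1.0)
  # Criterion 3: travel time, with highway time reweighted by the preference.
  return sum(
      time * (multiplier if on_highway else 1.0) for time, on_highway in segments
  )


segments = [(10.0, False), (4.0, True)]
print(route_cost(segments, True, ["gas station"], ["gas station"], "dislike highways"))
```

Treating disqualification as infinite cost keeps the comparison between routes a plain minimization.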

In [4]:
# @title Examples of waypoint routing queries from training and testing splits.

poi_specs = utils.get_poi_specs(benchmark.POI_SPECS_PATH)
rng = random.Random(0)
engine = labeled_query_engines.HumanLabeledQueryEngine(poi_specs=poi_specs, splits=(0.95, 0, 0.05), seed=0)
i = 0
while i < 5:
  query_data, query_text = engine.sample_query(0, rng)
  if "time_budget" not in query_data:
    print("From training split: \"{}\"".format(query_text.strip()))
    i += 1

i = 0
while i < 5:
  query_data, query_text = engine.sample_query(2, rng)
  if "time_budget" not in query_data:
    print("From testing split: \"{}\"".format(query_text.strip()))
    i += 1

In [5]:
# @title A sample of waypoint routing query statistics.

rng = random.Random(0)
road_preferences_count = {k: 0 for k in ("like highways", "dislike highways")}
poi_size_count = {k: 0 for k in range(8)}
for _ in range(100):
  query_data, query_text = engine.sample_query(0, rng)
  poi_size_count[len(query_data["pois"])] += 1
  if query_data["linear"]:
    road_preferences_count[query_data["linear"]] += 1
print("Training set road preference counts (of 100): ", road_preferences_count)
print("Training set POI request size counts (of 100): ", poi_size_count)
road_preferences_count = {k: 0 for k in ("like highways", "dislike highways")}
poi_size_count = {k: 0 for k in range(8)}
for _ in range(100):
  query_data, query_text = engine.sample_query(2, rng)
  poi_size_count[len(query_data["pois"])] += 1
  if query_data["linear"]:
    road_preferences_count[query_data["linear"]] += 1
print("Testing set road preference counts (of 100): ", road_preferences_count)
print("Testing set POI request size counts (of 100): ", poi_size_count)

## The Content of a Trip Planning Query

A trip planning query can communicate two types of information:

1. A "desire" which can be fulfilled by stopping by a point-of-interest. For example, sampling local wines is a desire which can be fulfilled by visiting a winery.

2. A time budget: the total amount of time the user can spend on the road. This excludes time spent at a venue, which is assumed to be at the user's discretion.

As with waypoint routing queries, the user queries in our dataset's testing split are significantly more challenging than those in the training split. In the training set, most queries specify at most one desire. In the test set, all queries specify at least two desires.

To programmatically evaluate how well an itinerary satisfies a user query, and hence generate ground-truth routes for our dataset, we apply the following criteria.
1. If it is impossible for the user to complete the itinerary within the time budget, it is immediately disqualified.
2. For every user desire that is satisfied by at least one POI in the itinerary, a reward of +1000 is awarded.
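
The two criteria above can be sketched as follows. This is a simplified illustration under stated assumptions, not the benchmark's scoring code: the `itinerary_reward` helper and its arguments are hypothetical, desires and POIs are both modeled as plain category labels, and disqualification is represented as negative infinity.

```python
import math

REWARD_PER_DESIRE = 1000  # From criterion 2.


def itinerary_reward(travel_time, time_budget, desires, visited):
  """Returns the reward of an itinerary (higher is better).

  travel_time: total time on the road, excluding time spent at venues.
  desires / visited: categories the user asked for / POI categories visited.
  """
  if travel_time > time_budget:  # Criterion 1: over budget, disqualified.
    return -math.inf
  # Criterion 2: each desire counts once, however many POIs satisfy it.
  satisfied = set(desires) & set(visited)
  return REWARD_PER_DESIRE * len(satisfied)


print(itinerary_reward(50, 60, ["winery", "museum"], ["winery", "winery"]))
```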

In [6]:
# @title Examples of trip planning queries from training and testing splits.

rng = random.Random(0)
engine = labeled_query_engines.HumanLabeledQueryEngine(poi_specs=poi_specs, splits=(0.95, 0, 0.05), seed=0)
engine.touring_prop = 1  # Proportion of trip planning queries; 1 samples only trip planning queries.
i = 0
while i < 3:
  query_data, query_text = engine.sample_query(0, rng)
  if "time_budget" in query_data:
    print("From training split: \"{}\"".format(query_text.strip()))
    i += 1

i = 0
while i < 3:
  query_data, query_text = engine.sample_query(2, rng)
  if "time_budget" in query_data:
    print("From testing split: \"{}\"".format(query_text.strip()))
    i += 1