Run the unit tests with:

> make fix-lint test

Your goal is to improve the classes under `ao3_disco_ranking/query/*.py`. The three things that would be nice, in order from most to least important, are:

1. Add unit tests.
2. Add basic docstrings in the Google docstring format (https://google.github.io/styleguide/pyguide.html).
3. Check that the input has reasonable values and raise an error otherwise. For example:
   - if the user specifies tag_type = "asdfasdf", which isn't a real type of tag, raise an error
   - check that the workID field is valid (naively, just check whether it can be parsed as an integer)
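Items 2 and 3 can be combined. Below is a minimal sketch of input validation written with Google-style docstrings. The function names and the `VALID_TAG_TYPES` set are hypothetical: only "fandom" appears in the notes on this page, so the other tag types are a guess and should be replaced with whatever the work dicts actually use.

```python
# Hypothetical validation helpers; the names and the tag-type set are assumptions.
VALID_TAG_TYPES = frozenset(
    {"rating", "warning", "category", "fandom", "relationship", "character", "freeform"}
)


def validate_tag(tag):
    """Checks that a (tag_type, value) pair uses a known tag type.

    Args:
        tag: A tuple of (tag_type, value), e.g. ("fandom", "Harry Potter").

    Returns:
        The tag, unchanged, if it is valid.

    Raises:
        ValueError: If the tag is not a 2-tuple or its tag_type is unknown.
    """
    if not isinstance(tag, tuple) or len(tag) != 2:
        raise ValueError(f"Expected a (tag_type, value) tuple, got {tag!r}")
    tag_type = tag[0]
    if tag_type not in VALID_TAG_TYPES:
        raise ValueError(
            f"Unknown tag_type {tag_type!r}; expected one of {sorted(VALID_TAG_TYPES)}"
        )
    return tag


def validate_work_id(work_id):
    """Checks that a work ID can be parsed as an integer (a naive validity test).

    Args:
        work_id: The work ID as a string, e.g. "10001435".

    Returns:
        The work ID, unchanged, if it is valid.

    Raises:
        ValueError: If work_id cannot be parsed as an integer.
    """
    try:
        int(work_id)
    except (TypeError, ValueError) as err:
        raise ValueError(f"Invalid workID {work_id!r}: not an integer") from err
    return work_id
```

Raising `ValueError` (rather than using `assert`) keeps the checks active even when Python runs with `-O`, which strips assert statements.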

The most important classes for adding documentation and tests, in order, are:

 - TagsFilter
 - GraphRanker
 - EmbeddingRanker (mock the model)
 - QueryHandler
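"Mock the model" can look like the sketch below. The real EmbeddingRanker interface isn't described here, so `ToyEmbeddingRanker`, its `rank` method, and the model's `embed(work_ids)` method are all hypothetical stand-ins; the point is that `unittest.mock.Mock` with `side_effect` returns fixed fake vectors, so the test never loads a real model. The test is written as a plain pytest-style function, which is an assumption about the test runner behind `make test`.

```python
from unittest import mock


class ToyEmbeddingRanker:
    """Hypothetical stand-in for EmbeddingRanker that ranks by embedding similarity.

    Args:
        model: Any object with an ``embed(work_ids)`` method that returns one
            vector (a list of floats) per work ID.
    """

    def __init__(self, model):
        self.model = model

    def rank(self, query_work_id, candidate_ids):
        """Sorts candidate work IDs by dot-product similarity to the query work.

        Args:
            query_work_id: Work ID to compare the candidates against.
            candidate_ids: List of candidate work IDs.

        Returns:
            The candidate IDs, most similar first.
        """
        query_vec = self.model.embed([query_work_id])[0]
        cand_vecs = self.model.embed(candidate_ids)
        scores = {
            cid: sum(q * c for q, c in zip(query_vec, vec))
            for cid, vec in zip(candidate_ids, cand_vecs)
        }
        return sorted(candidate_ids, key=scores.__getitem__, reverse=True)


def test_rank_with_mocked_model():
    fake_vectors = {"1": [1.0, 0.0], "2": [0.9, 0.1], "3": [0.0, 1.0]}
    model = mock.Mock()
    # side_effect makes embed() return the fake vectors instead of running a model.
    model.embed.side_effect = lambda ids: [fake_vectors[i] for i in ids]

    ranker = ToyEmbeddingRanker(model)

    assert ranker.rank("1", ["3", "2"]) == ["2", "3"]
    assert model.embed.call_count == 2
```

If the real class loads its model internally instead of receiving it as an argument, patching the loading function with `unittest.mock.patch` is the usual alternative.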

Here are the "works" and "collections" objects for your reference. Use fake values in the tests:

import pickle

with open("../data/works_collections.pkl", "rb") as fin:
    works, collections = pickle.load(fin)

print(len(works), len(collections))
print(type(works))
# dict views (keys(), values()) are not subscriptable, so convert to a list before slicing
print(list(works.keys())[:2])
print(list(works.values())[:2])
workIDs = list(works.keys())

268937 16815
<class 'dict'>

workIDs[0]  # each work ID is the /works/<...> part of the work's URL

'10001435'
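If tests or input handling ever need to go from a work URL to its ID, a small helper along these lines would do it. `work_id_from_url` is hypothetical and not part of `ao3_disco_ranking`; it assumes AO3-style URLs such as https://archiveofourown.org/works/10001435, optionally followed by a chapter path.

```python
import re

# Hypothetical helper: assumes the ID is the run of digits right after "/works/".
_WORK_URL_RE = re.compile(r"/works/(\d+)")


def work_id_from_url(url):
    """Extracts the work ID from a work URL.

    Args:
        url: A URL such as "https://archiveofourown.org/works/10001435".

    Returns:
        The work ID as a string, e.g. "10001435".

    Raises:
        ValueError: If the URL has no /works/<digits> segment.
    """
    match = _WORK_URL_RE.search(url)
    if match is None:
        raise ValueError(f"No work ID found in {url!r}")
    return match.group(1)
```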

import json

# Each work is represented as a dictionary; this prints the first one.
# The tags filter extracts each work's tags so that works can be filtered on them.
# It takes lists of tags to require or exclude, where each tag is a (tag_type, value)
# tuple, e.g. excluded_tags = [("fandom", "Harry Potter"), ...].
# Make sure the filtered results are correct.
print(json.dumps(works[workIDs[0]], indent=2))
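Below is a minimal sketch of the require/exclude logic the comments describe, plus a pytest-style test on fake works. It assumes, hypothetically, that each work dict keeps its tags under a `"tags"` key mapping tag type to a list of values; the real layout is not shown here, and the real TagsFilter interface may differ. Treat `extract_tags` and `filter_work_ids` as illustrations of the expected behavior, not as the actual implementation.

```python
# Hypothetical sketch: assumes work["tags"] maps tag type -> list of tag values.
# Check the real work dicts (and TagsFilter itself) before relying on this layout.


def extract_tags(work):
    """Returns the set of (tag_type, value) pairs attached to a work."""
    return {
        (tag_type, value)
        for tag_type, values in work.get("tags", {}).items()
        for value in values
    }


def filter_work_ids(works, required_tags=(), excluded_tags=()):
    """Keeps works that have every required tag and none of the excluded tags.

    Args:
        works: Mapping of work ID -> work dict.
        required_tags: Iterable of (tag_type, value) pairs that must all be present.
        excluded_tags: Iterable of (tag_type, value) pairs that must all be absent.

    Returns:
        A sorted list of matching work IDs.
    """
    required, excluded = set(required_tags), set(excluded_tags)
    matches = []
    for work_id, work in works.items():
        tags = extract_tags(work)
        # Subset check for required tags, empty intersection for excluded ones.
        if required <= tags and not (excluded & tags):
            matches.append(work_id)
    return sorted(matches)


# Fake works for tests (no real data needed).
FAKE_WORKS = {
    "1": {"tags": {"fandom": ["Harry Potter"], "character": ["Hermione Granger"]}},
    "2": {"tags": {"fandom": ["Harry Potter", "Discworld"]}},
    "3": {"tags": {"fandom": ["Discworld"]}},
}


def test_required_and_excluded_tags():
    result = filter_work_ids(
        FAKE_WORKS,
        required_tags=[("fandom", "Harry Potter")],
        excluded_tags=[("fandom", "Discworld")],
    )
    assert result == ["1"]
```

Fake works that overlap on some tags but not others, like the ones above, make it easy to see that a work is dropped for the right reason (missing a required tag versus having an excluded one).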

# Each collection is a tuple of work IDs and indicates that a user has bookmarked all those works
# The graph ranker uses this information to select works
collections[0]

('43954231', '37117771', '34923421', '20497667', '18572542')
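How the graph ranker turns collections into rankings isn't described here. As one hypothetical reading of "a user has bookmarked all those works", each collection links every pair of its works, and counting those links gives a simple co-bookmark graph. `co_bookmark_counts` and `neighbors` are illustrative names, not the GraphRanker API, and plain co-occurrence counting is a swapped-in technique, not necessarily what GraphRanker does; the sketch is mainly useful for building fake collections with a known expected answer.

```python
from collections import Counter
from itertools import combinations


def co_bookmark_counts(user_collections):
    """Counts how often each pair of works is bookmarked by the same user.

    Args:
        user_collections: Iterable of tuples of work IDs, one tuple per user.

    Returns:
        A Counter mapping sorted (work_id_a, work_id_b) pairs to counts.
    """
    counts = Counter()
    for collection in user_collections:
        # set() drops duplicate IDs; sorting makes each pair order-independent.
        for pair in combinations(sorted(set(collection)), 2):
            counts[pair] += 1
    return counts


def neighbors(counts, work_id):
    """Returns works co-bookmarked with work_id, most frequent first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == work_id:
            scores[b] += n
        elif b == work_id:
            scores[a] += n
    return [other for other, _ in scores.most_common()]
```

The number of pairs grows quadratically with collection size, so on the full data (16815 collections) a sparse-matrix or sampling approach may be preferable; on a handful of fake collections this direct version is easy to check by hand.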