Provides a method to search for matches within a list of documents using TF-IDF.
There are two main use cases: finding which documents are the most similar within the list; finding which document is the most related to a search query.
Convert each document into a tuple where the first item is an identifier and the second is a list of tokens. A tokenizer is not included, because you may want to write your own.
Example of document_list:
[{1, ~w(this is a a sample)},
{2, ~w(this example is another example)}]
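Since tokenizing is left to you, here is a minimal sketch of one way to build document_list from raw strings. The module TokenizerSketch and its functions are hypothetical helpers, not part of TruffleHog, and the splitting rule (lowercase, break on anything that is not a letter or digit) is only an assumption about what a simple tokenizer might do.

```elixir
defmodule TokenizerSketch do
  # Hypothetical helper, not part of TruffleHog: lowercases the text and
  # splits it on any run of characters that is not a letter or a digit.
  def tokenize(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
  end

  # Pairs each raw text with a 1-based identifier, giving the
  # {identifier, tokens} shape described above.
  def to_document_list(texts) do
    texts
    |> Enum.with_index(1)
    |> Enum.map(fn {text, id} -> {id, tokenize(text)} end)
  end
end

document_list =
  TokenizerSketch.to_document_list([
    "This is a a sample.",
    "This example is another example!"
  ])
```

For these two inputs the result has the same shape as the example above. A real tokenizer might also remove stop words or apply stemming, depending on your data.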
Create an index using the function index_documents.
index = document_list |> TruffleHog.index_documents()
Use find_matches to find the matches on the index.
matches = index |> TruffleHog.find_matches(["search", "items"], quantity)
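To give a feel for what a TF-IDF search does, the following is a minimal, self-contained sketch of ranking documents against a query. It is not TruffleHog's implementation, and it makes several assumptions: the module TfIdfSketch is hypothetical, term frequency is the share of a document's tokens, inverse document frequency is the natural log of (documents / documents containing the term), and quantity is taken to mean the number of results to return. The {id, score} return shape is also an assumption and may differ from what find_matches returns.

```elixir
defmodule TfIdfSketch do
  # Illustrative only: not TruffleHog's implementation or return format.

  # Term frequency: share of the document's tokens that equal `term`.
  def tf(_term, []), do: 0.0

  def tf(term, tokens) do
    Enum.count(tokens, &(&1 == term)) / length(tokens)
  end

  # Inverse document frequency: natural log of (number of documents /
  # number of documents containing the term); 0.0 for unseen terms.
  def idf(term, documents) do
    containing = Enum.count(documents, fn {_id, tokens} -> term in tokens end)

    if containing == 0 do
      0.0
    else
      :math.log(length(documents) / containing)
    end
  end

  # Scores each document by summing tf * idf over the query terms and
  # returns the `quantity` highest-scoring {id, score} pairs.
  def rank(documents, query, quantity) do
    documents
    |> Enum.map(fn {id, tokens} ->
      score =
        query
        |> Enum.map(fn term -> tf(term, tokens) * idf(term, documents) end)
        |> Enum.sum()

      {id, score}
    end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
    |> Enum.take(quantity)
  end
end

documents = [
  {1, ~w(this is a a sample)},
  {2, ~w(this example is another example)}
]

top = TfIdfSketch.rank(documents, ["example"], 1)
```

Here the query "example" ranks document 2 first, because the term appears only in that document. A term such as "this", which appears in every document, gets an inverse document frequency of zero and so contributes nothing to any score.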
If available in Hex, the package can be installed by adding truffle_hog to your list of dependencies in mix.exs:
def deps do
  [
    {:truffle_hog, "~> 0.1.0"}
  ]
end
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/truffle_hog.