Decoding a Neural Retriever's Latent Space for Query Suggestion

Leonard Adolphs, Michelle Chen Huebscher, Christian Buck, Sertan Girgin, Olivier Bachem, Massimiliano Ciaramita, Thomas Hofmann

Traversal Data

You can download and decompress the traversal data together with the associated msmarco corpus by running the provided download script:

bash download.sh

To merge the traversal data with the paragraphs from the MSMarco dataset, we provide a merge script that you can run as

python merge_data.py

The merged jsonl data has the following structure

[
    {"original": {
        "query": "the original query",
        "documents": [{
            "doc_id": "123",
            "score": 99, # the retrieval score
            "content": {"text": "The paragraph's text"}
            }, {...}, ... ]
    }, "variants": [ 
         # the list of reformulations with their search results
        {"query": "the first reformulated query",
        "documents": [{
            "doc_id": "124",
            "score": 100, # the retrieval score
            "content": {"text": "Another paragraph's text"}
            }, ... ]}, {...}, ...
        ]
    }, {...}, ...
]

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
README.md		README.md
download.sh		download.sh
merge_data.py		merge_data.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

README.md

README.md

download.sh

download.sh

merge_data.py

merge_data.py

Repository files navigation

Decoding a Neural Retriever's Latent Space for Query Suggestion

Traversal Data

About

Releases

Packages

Languages

leox1v/query_decoder

Folders and files

Latest commit

History

Repository files navigation

Decoding a Neural Retriever's Latent Space for Query Suggestion

Traversal Data

About

Resources

Stars

Watchers

Forks

Languages