
Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval - MultiDoc2Dial 2022 Challenge


Sharif-SLPL-NLP/Docalog-2022


Docalog-2022

Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval - MultiDoc2Dial 2022 Challenge.

This repository contains the notebooks we used during the challenge. They document our attempts to get the best predictions from the document retriever, so they may be useful as a reference for similar work.

IR_PLDA

This notebook contains our first attempts at building the document retriever, starting with the PLDA method. For our final (best) approach, see the DR_TEIT.ipynb file.

DR_TEIT

This notebook contains several methods we tested for the document retriever, listed in the table below. We call our best model "Document Retrieval with Title Embedding and IDF on Texts (DR.TEIT)". This method uses two scoring measures and aggregates them with a convex combination: λ * Similarity_{Title Embedding} + (1 - λ) * Similarity_{TextIDF}.
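The convex combination can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's code: the function name `combine_scores`, the example scores, and the value `lam=0.75` are all hypothetical, since the value of λ is not stated here.

```python
def combine_scores(title_sim, text_sim, lam):
    """Convex combination of two per-document score lists.

    title_sim[i] and text_sim[i] are the two similarity scores of
    document i for a single query; lam is the weight on the title score.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    if len(title_sim) != len(text_sim):
        raise ValueError("both score lists must cover the same documents")
    return [lam * t + (1.0 - lam) * x for t, x in zip(title_sim, text_sim)]


# Hypothetical scores for three candidate documents and one query.
title_sim = [0.9, 0.2, 0.4]
text_sim = [0.3, 0.8, 0.4]

scores = combine_scores(title_sim, text_sim, lam=0.75)  # lam is an assumed value
# Rank documents from highest to lowest combined score.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With λ close to 1 the ranking follows the title embedding score; with λ close to 0 it follows the text IDF score.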

We used the LaBSE model for our embeddings. To compute title embedding similarities, we used the cosine similarity between the query embedding and each document's title embedding. For the second part, we used character-level n-grams (2-grams to 8-grams). We also trained our TF-IDF transformation matrix on the MultiDoc2Dial 2022 documents.
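The text-side score can be illustrated with a small standard-library sketch of character n-gram TF-IDF and cosine similarity. It is an assumption-laden toy, not the notebook's implementation: the helper names, the smoothed IDF formula, the lowercasing step, and the three example documents are all hypothetical. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` with `analyzer="char"` and `ngram_range=(2, 8)` would be the usual choice.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=8):
    """All character n-grams of the lowercased text for n_min <= n <= n_max."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(documents):
    """Learn a smoothed IDF weight for every n-gram seen in the documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(char_ngrams(doc)))
    return {g: math.log((1 + n_docs) / (1 + df)) + 1.0
            for g, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; n-grams unseen during fitting are dropped."""
    counts = Counter(g for g in char_ngrams(text) if g in idf)
    return {g: tf * idf[g] for g, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical mini-corpus standing in for the challenge documents.
documents = [
    "How to renew a driver license online",
    "Applying for student financial aid",
    "Retirement benefits and eligibility",
]
idf = fit_idf(documents)
doc_vectors = [tfidf_vector(d, idf) for d in documents]

query_vector = tfidf_vector("renewing my driver license", idf)
text_sim = [cosine(query_vector, dv) for dv in doc_vectors]
best = max(range(len(documents)), key=lambda i: text_sim[i])
```

Character n-grams make the score tolerant of inflection and small spelling differences: "renewing" in the query still overlaps heavily with "renew" in the first document.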

| Method | @1 | @5 | @10 | @50 | @100 | MRR (mean, var) |
| --- | --- | --- | --- | --- | --- | --- |
| IDF - vanilla | 13% | 30% | 39% | 64% | 83% | (0.22, 0.11) |
| IDF - power-order | 15% | 31% | 41% | 65% | 83% | (0.23, 0.12) |
| IDF - power-order (softmax) | 10.7% | 23% | 31% | 57.6% | 78% | (0.18, 0.09) |
| IDF - self-attention | 13.9% | 29% | 38% | 62% | 82% | (0.22, 0.11) |
| DR.TEIT | 61.6% | 86% | 91% | 96% | 98% | (0.72, 0.13) |
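For readers who want to reproduce metrics of this kind, here is a minimal sketch. It assumes that the @k columns report the fraction of queries whose gold document appears in the top k results, and that the MRR pair is the mean and population variance of the reciprocal ranks; neither definition is stated in this README. The function names and the example ranks are hypothetical.

```python
from statistics import mean, pvariance


def hit_rate_at_k(ranks, k):
    """Fraction of queries whose gold document is ranked within the top k.

    ranks holds the 1-based rank of the gold document for each query.
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)


def reciprocal_ranks(ranks):
    """Reciprocal rank 1/r of the gold document for each query."""
    return [1.0 / r for r in ranks]


# Hypothetical gold-document ranks for five queries.
ranks = [1, 3, 1, 12, 2]

hit_at_1 = hit_rate_at_k(ranks, 1)
hit_at_5 = hit_rate_at_k(ranks, 5)

rr = reciprocal_ranks(ranks)
mrr_mean = mean(rr)
mrr_var = pvariance(rr)  # population variance is an assumption
```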

The main model, which stacks the document retriever and the span predictor, is in this repository (CAiRE).
