WSM_indri

This project implements Laplace smoothing in Indri and compares the retrieval performance of several ranking methods.

Key contribution of this experiment

Tuned the ranking functions listed below and implemented a Laplace smoothing function, which the original Indri code does not provide.
Wrote a C++ function that turns the title (topic) and description fields of the TREC queries into separate query sets, so that the ranking results obtained with each can be compared.
Modified the indexing methods as well.
Provided shell scripts that run the pipeline automatically, since searching with all queries can take a long time.

Introduction to Indri

This experiment runs the set of queries against the WT2g collection. For each query and each ranking function, it returns a ranked list of the top 1000 documents in the standard TREC run format, and then evaluates the ranked lists.
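TREC evaluation tools read runs as a six-column text file with one line per retrieved document: query id, the literal `Q0`, document number, rank, score, and a run tag. Assuming that is the output format used here, the following minimal C++ sketch shows how a ranked list sorted by score could be truncated to the top 1000 and written out. `ScoredDoc` and `to_trec_run` are hypothetical names and are not part of Indri.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// One retrieved document and its retrieval score.
struct ScoredDoc {
    std::string docno;  // document identifier from the collection
    double score;       // higher means more relevant
};

// Write a ranked list as TREC run lines "<qid> Q0 <docno> <rank> <score> <run_tag>",
// keeping only the first max_rank documents (1000 in this experiment).
std::string to_trec_run(const std::string& qid, const std::vector<ScoredDoc>& ranked,
                        const std::string& run_tag, std::size_t max_rank = 1000) {
    // The caller must pass documents sorted by descending score.
    assert(std::is_sorted(ranked.begin(), ranked.end(),
                          [](const ScoredDoc& a, const ScoredDoc& b) { return a.score > b.score; }));
    std::ostringstream out;
    out << std::fixed << std::setprecision(4);
    for (std::size_t i = 0; i < ranked.size() && i < max_rank; ++i) {
        out << qid << " Q0 " << ranked[i].docno << ' ' << (i + 1) << ' '
            << ranked[i].score << ' ' << run_tag << '\n';
    }
    return out.str();
}
```

For example, if `ranked` holds a hypothetical document `doc-A` with score -5.25 at the top, `to_trec_run("401", ranked, "laplace")` starts with the line `401 Q0 doc-A 1 -5.2500 laplace`.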
The implemented ranking functions include:

Vector space model, terms weighted by Okapi TF (see note) times an IDF value, and inner product similarity between vectors
Language modeling, maximum likelihood estimates with Laplace smoothing only, query likelihood
Language modeling, Jelinek-Mercer smoothing using the corpus, 0.8 of the weight attached to the background probability, query likelihood
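The three ranking functions can be sketched as follows. This is a minimal, self-contained C++ illustration under stated assumptions. It is not Indri's implementation or this project's code. The assumptions are:

- Okapi TF is taken as tf / (tf + 0.5 + 1.5 * |d| / avg|d|).
- IDF is taken as log(N / df).
- Query terms are weighted by their raw count in the vector space model.
- Laplace smoothing uses P(t|d) = (tf + alpha) / (|d| + alpha * |V|) with alpha = 1.
- Jelinek-Mercer uses P(t|d) = 0.2 * tf / |d| + 0.8 * cf / |C|.

All struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Illustrative per-document statistics.
struct DocStats {
    std::map<std::string, int> tf;  // term -> occurrences in this document
    int length = 0;                 // |d|: total term occurrences in the document
};

// Illustrative collection-wide statistics.
struct CollectionStats {
    std::map<std::string, int> df;   // term -> number of documents containing it
    std::map<std::string, long> cf;  // term -> occurrences in the whole collection
    long total_terms = 0;            // |C|
    int num_docs = 0;                // N
    double avg_doc_length = 0.0;     // average |d|
    long vocabulary_size = 0;        // |V|
};

// Count stored for `key`, or 0 if the term is absent.
template <typename Map>
typename Map::mapped_type count_of(const Map& m, const std::string& key) {
    auto it = m.find(key);
    return it == m.end() ? typename Map::mapped_type{} : it->second;
}

// Assumed Okapi TF: tf / (tf + 0.5 + 1.5 * |d| / avg|d|).
double okapi_tf(double tf, double doc_len, double avg_doc_len) {
    assert(avg_doc_len > 0.0);
    return tf / (tf + 0.5 + 1.5 * doc_len / avg_doc_len);
}

// Vector space model: inner product of the query vector (raw term counts, one
// contribution per query-term occurrence) with document weights okapi_tf * idf.
double vsm_okapi_score(const std::vector<std::string>& query, const DocStats& d,
                       const CollectionStats& c) {
    double score = 0.0;
    for (const std::string& t : query) {
        int tf = count_of(d.tf, t);
        int df = count_of(c.df, t);
        if (tf == 0 || df == 0) continue;  // zero weight in the inner product
        double idf = std::log(static_cast<double>(c.num_docs) / df);  // assumed IDF
        score += okapi_tf(tf, d.length, c.avg_doc_length) * idf;
    }
    return score;
}

// Query likelihood with Laplace (add-alpha) smoothing:
// log P(q|d) = sum over query terms of log((tf + alpha) / (|d| + alpha * |V|)).
double laplace_score(const std::vector<std::string>& query, const DocStats& d,
                     const CollectionStats& c, double alpha = 1.0) {
    assert(alpha > 0.0 && c.vocabulary_size > 0);
    double score = 0.0;
    for (const std::string& t : query) {
        double tf = count_of(d.tf, t);
        score += std::log((tf + alpha) / (d.length + alpha * c.vocabulary_size));
    }
    return score;
}

// Query likelihood with Jelinek-Mercer smoothing; lambda is the weight on the
// background (collection) model, 0.8 in this experiment.
double jm_score(const std::vector<std::string>& query, const DocStats& d,
                const CollectionStats& c, double lambda = 0.8) {
    assert(lambda >= 0.0 && lambda <= 1.0 && c.total_terms > 0);
    double score = 0.0;
    for (const std::string& t : query) {
        long cf = count_of(c.cf, t);
        if (cf == 0) continue;  // term unseen in the collection: skipped (an assumption)
        double p_doc = d.length > 0 ? static_cast<double>(count_of(d.tf, t)) / d.length : 0.0;
        double p_bg = static_cast<double>(cf) / c.total_terms;
        score += std::log((1.0 - lambda) * p_doc + lambda * p_bg);
    }
    return score;
}
```

A query term that is missing from a document still gets a finite Laplace score, log(alpha / (|d| + alpha * |V|)). The unsmoothed maximum-likelihood estimate would give log 0 instead, which is why smoothing is needed.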

Implementing Laplace smoothing in the codebase

Put the modified LaplaceTermScoreFunction.hpp in the include/indri directory.

In src/TermScoreFunctionFactory.cpp:

Add #include "indri/LaplaceTermScoreFunction.hpp" at the beginning of the file.
Add the following branch at line 61, in the if/else if chain that selects the smoothing method:

else if( method == "laplace" || method == "add_one" || method == "l" ) {
  // Laplace (add-alpha) smoothing; alpha defaults to 1.0, i.e. add-one smoothing.
  double alpha = spec.get( "alpha", 1.0 );
  return new indri::query::LaplaceTermScoreFunction( spec.get("index_path", ""), alpha );
}
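To see what the factory branch does in isolation, here is a self-contained sketch that mirrors it outside Indri. `TermScorer`, `LaplaceScorer`, `parse_rule`, and `make_scorer` are hypothetical stand-ins, not Indri classes. The rule string is assumed to follow Indri's `method:...,param:...` smoothing-rule syntax, for example `method:laplace,alpha:1`. Here the vocabulary size |V| is passed in directly. The snippet above passes an index path instead, presumably so the real class can read collection statistics.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a term scoring interface: scores one term given
// its count in the document and the document length.
class TermScorer {
public:
    virtual ~TermScorer() = default;
    virtual double scoreOccurrence(double occurrences, int contextLength) const = 0;
};

// Add-alpha (Laplace) smoothing: log((tf + alpha) / (|d| + alpha * |V|)).
class LaplaceScorer : public TermScorer {
public:
    LaplaceScorer(double alpha, double vocabularySize)
        : alpha_(alpha), vocabularySize_(vocabularySize) {
        assert(alpha_ > 0.0 && vocabularySize_ > 0.0);
    }
    double scoreOccurrence(double occurrences, int contextLength) const override {
        return std::log((occurrences + alpha_) / (contextLength + alpha_ * vocabularySize_));
    }

private:
    double alpha_;
    double vocabularySize_;
};

// Parse a rule string "key:value,key:value" into a key -> value map.
std::map<std::string, std::string> parse_rule(const std::string& rule) {
    std::map<std::string, std::string> spec;
    std::istringstream in(rule);
    std::string item;
    while (std::getline(in, item, ',')) {
        std::size_t colon = item.find(':');
        if (colon != std::string::npos) spec[item.substr(0, colon)] = item.substr(colon + 1);
    }
    return spec;
}

// Same dispatch as the factory branch: the method names select Laplace
// smoothing, and alpha falls back to 1.0 when the rule does not set it.
std::unique_ptr<TermScorer> make_scorer(const std::string& rule, double vocabularySize) {
    std::map<std::string, std::string> spec = parse_rule(rule);
    const std::string method = spec.count("method") ? spec["method"] : "";
    if (method == "laplace" || method == "add_one" || method == "l") {
        double alpha = spec.count("alpha") ? std::stod(spec["alpha"]) : 1.0;
        return std::make_unique<LaplaceScorer>(alpha, vocabularySize);
    }
    throw std::invalid_argument("unsupported smoothing method: " + method);
}
```

For example, with `make_scorer("method:laplace,alpha:1", 4)`, a term that occurs twice in a three-term document scores log(3/7).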

User Guide

Environment

This experiment was run on Ubuntu 20.04.

Download Indri-5.18

Link to download: https://sourceforge.net/p/lemur/wiki/Home/

Download Datasets

The dataset is a set of 50 TREC queries for the WT2g corpus in the standard TREC format, each with a topic title, a description, and a narrative. NIST assessors have judged documents from the corpus for relevance to these queries. The queries must be downloaded before proceeding with the following steps.

File tree

Run

./query_build.sh

Run this script to build Indri queries from the TREC queries.
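One way to perform that conversion is sketched below. It assumes topics in the usual TREC layout (`<top>`, `<num> Number:`, `<title>`, `<desc> Description:`, `<narr>`). It also assumes output in the IndriRunQuery parameter-file format, i.e. `<parameters>` containing one `<query>` with `<number>` and `<text>` per topic. This is not the project's own converter. `Topic`, `parse_topics`, and `to_indri_params` are hypothetical names. Stripping punctuation is an assumption, made so that characters such as commas and question marks do not end up in the query text.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Fields of one TREC topic that are used as queries.
struct Topic {
    std::string number;
    std::string title;
    std::string description;
};

// Text between the first `start` marker and the following `end` marker ("" if `start` is missing).
static std::string between(const std::string& block, const std::string& start,
                           const std::string& end) {
    std::size_t s = block.find(start);
    if (s == std::string::npos) return "";
    s += start.size();
    std::size_t e = block.find(end, s);
    return block.substr(s, e == std::string::npos ? std::string::npos : e - s);
}

// Lowercase, replace punctuation with spaces, and collapse runs of whitespace.
// The result contains no '<' or '&', so it needs no XML escaping.
static std::string normalize(const std::string& text) {
    std::string cleaned;
    for (unsigned char ch : text)
        cleaned += std::isalnum(ch) ? static_cast<char>(std::tolower(ch)) : ' ';
    std::istringstream words(cleaned);
    std::string word, out;
    while (words >> word) out += (out.empty() ? "" : " ") + word;
    return out;
}

// Split a TREC topic file into <top> ... </top> blocks and extract the fields.
std::vector<Topic> parse_topics(const std::string& text) {
    std::vector<Topic> topics;
    std::size_t pos = 0;
    while ((pos = text.find("<top>", pos)) != std::string::npos) {
        std::size_t end = text.find("</top>", pos);
        if (end == std::string::npos) break;  // unterminated topic: stop
        const std::string block = text.substr(pos, end - pos);
        Topic t;
        t.number = normalize(between(block, "Number:", "<title>"));
        t.title = normalize(between(block, "<title>", "<desc>"));
        t.description = normalize(between(block, "Description:", "<narr>"));
        topics.push_back(t);
        pos = end + 6;  // skip past "</top>"
    }
    return topics;
}

// Build a query parameter file from either the titles or the descriptions.
std::string to_indri_params(const std::vector<Topic>& topics, bool use_description) {
    std::ostringstream out;
    out << "<parameters>\n";
    for (const Topic& t : topics) {
        const std::string& text = use_description ? t.description : t.title;
        assert(!text.empty() && "topic field missing or empty");
        out << "  <query>\n"
            << "    <number>" << t.number << "</number>\n"
            << "    <text>" << text << "</text>\n"
            << "  </query>\n";
    }
    out << "</parameters>\n";
    return out.str();
}
```

Calling `to_indri_params(parse_topics(text), false)` yields title queries, and passing `true` yields description queries. These are the two query sets whose rankings the project compares.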

./run.sh

Running this script in a terminal executes the whole pipeline: it compiles the downloaded Indri C++ code with make, builds the index, runs the queries, and evaluates the returned search results. Note that the queries must be extracted in advance, before they are run.
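The evaluation step compares each ranked list against the NIST relevance judgments. A minimal C++ sketch of that computation follows. It assumes the reported measure is (mean) average precision, as is common for TREC runs, and that the judgments use the usual qrels layout `qid iteration docno relevance`. The function names are hypothetical, and this is not the evaluation tool that run.sh invokes. Two choices here are assumptions: queries absent from the run count as 0, and a query with no relevant documents gets an AP of 0.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Parse TREC qrels lines "<qid> <iteration> <docno> <relevance>" and keep,
// for every judged query, the set of documents with relevance > 0.
std::map<std::string, std::set<std::string>> read_qrels(const std::string& text) {
    std::map<std::string, std::set<std::string>> relevant;
    std::istringstream in(text);
    std::string qid, iteration, docno;
    int rel = 0;
    while (in >> qid >> iteration >> docno >> rel) {
        std::set<std::string>& docs = relevant[qid];  // registers the query even with no relevant docs
        if (rel > 0) docs.insert(docno);
    }
    return relevant;
}

// Average precision of one ranked list (best document first): sum of precision@k
// at each rank k holding a relevant document, divided by the number of relevant documents.
double average_precision(const std::vector<std::string>& ranked,
                         const std::set<std::string>& relevant) {
    // Each document may appear at most once in a ranked list.
    assert(std::set<std::string>(ranked.begin(), ranked.end()).size() == ranked.size());
    if (relevant.empty()) return 0.0;  // assumption: no relevant documents -> AP of 0
    int hits = 0;
    double sum_precision = 0.0;
    for (std::size_t k = 0; k < ranked.size(); ++k) {
        if (relevant.count(ranked[k]) > 0) {
            ++hits;
            sum_precision += static_cast<double>(hits) / static_cast<double>(k + 1);
        }
    }
    return sum_precision / static_cast<double>(relevant.size());
}

// Mean average precision over all judged queries; a query missing from the run scores 0.
double mean_average_precision(const std::map<std::string, std::vector<std::string>>& run,
                              const std::map<std::string, std::set<std::string>>& qrels) {
    if (qrels.empty()) return 0.0;
    double total = 0.0;
    for (const auto& entry : qrels) {
        auto it = run.find(entry.first);
        if (it != run.end()) total += average_precision(it->second, entry.second);
    }
    return total / static_cast<double>(qrels.size());
}
```

For example, if the relevant documents of a query appear at ranks 1 and 3 and there are two relevant documents in total, its AP is (1/1 + 2/3) / 2.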

Ranking results

See WSM_indri.pdf for detailed comparison.

sandy273040/WSM_indri

Folders and files

Latest commit

History

Repository files navigation

WSM_indri

Key contribution of this experiment

Introduction of indri

Implement Laplace smoothing in codebase

User Guide

Environment

Download Indri-5.18

Download Datasets

File tree

Run

Ranking results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages