Documents Retrieval - Project 2

We are asked to implement several different retrieval methods.

Some of these retrieval methods are implementations of the basic retrieval models studied in class (e.g., TF-IDF, BM25, and language models with different smoothing methods).

Various tools are built on top of the Lemur Project toolkits, including search engines, browser toolbars, text analysis tools, and data resources that support research and development in information retrieval and text mining.

Requirements

TODO: add requirements for this project.

While installing pyserini, the build of its nmslib dependency may fail. Here's a workaround for installing nmslib in a Python 3.11 environment.
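One commonly suggested workaround (an assumption here, not verified against this repository's setup) is to install nmslib from its GitHub source, whose master branch builds against newer Python versions:

pip install git+https://github.com/nmslib/nmslib.git#subdirectory=python_bindings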

Directory and Files

(assuming you have these files)

  • Document Corpus
    • WT2g/: a 2GB collection of Web documents. We use this corpus to test the retrieval algorithms and run experiments.
  • Queries
    • topics.401-450.txt: a set of 50 TREC queries for the corpus, in the standard TREC format with a topic title, description, and narrative (see the format sketch after this list). Documents from the corpus have been judged for relevance to these queries by NIST assessors.
    • qrels.trec8.small_web
    • qrels.401-450.txt
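For reference, each topic in the standard TREC format looks roughly like the sketch below (field contents elided), and each line of a qrels file has the form topic-id 0 docno relevance:

<top>
<num> Number: 401
<title> ...
<desc> Description:
...
<narr> Narrative:
...
</top>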

Evaluation

Evaluation tools:

  • trec_eval.pl - provides a number of statistics about how well the retrieval run in results_file did on the corresponding queries.
  • ireval.jar

To use trec_eval.pl, run the following command (the bracketed -q flag is optional; in standard trec_eval it prints per-query statistics):

perl trec_eval.pl -[q] qrel_file results_file
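For example, to score a run against the judgments shipped with this project (results_file is a placeholder for your own run file):

perl trec_eval.pl -q qrels.401-450.txt results_file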

To reproduce the results from PyTerrier, run the following command:

make pyterrier

The results will be saved to pyterrier_results.csv.
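For reference, the PyTerrier run boils down to something like the following minimal sketch. The index path, corpus file list, and metric choices are assumptions for illustration, not this repository's exact code:

import pyterrier as pt

pt.init()  # start the underlying Terrier JVM

# Index the WT2g TREC collection (paths here are hypothetical).
indexer = pt.TRECCollectionIndexer("./wt2g_index")
indexref = indexer.index(["WT2g/B01.gz"])  # pass the real corpus files
index = pt.IndexFactory.of(indexref)

# Load the 50 TREC topics (title field) and the relevance judgments.
topics = pt.io.read_topics("topics.401-450.txt")
qrels = pt.io.read_qrels("qrels.401-450.txt")

# Define the retrieval models to compare.
tfidf = pt.BatchRetrieve(index, wmodel="TF_IDF")
bm25 = pt.BatchRetrieve(index, wmodel="BM25")

# Run all topics against the index, evaluate against the qrels, and
# save the per-model scores (a pandas DataFrame) to CSV.
results = pt.Experiment(
    [tfidf, bm25],
    topics,
    qrels,
    eval_metrics=["map", "P_10", "ndcg"],
    names=["TF-IDF", "BM25"],
)
results.to_csv("pyterrier_results.csv", index=False)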

Works

We need to run the set of queries against the WT2g collection, return a ranked list of documents (the top 1000) in a particular format, and then evaluate the ranked lists. See WSM Project 2.pdf for the project report.
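Since the runs are scored with trec_eval, the ranked lists are presumably in the standard TREC run format, one line per retrieved document; the docno and score in the second line below are made-up examples:

query-id Q0 docno rank score run-tag
401 Q0 WT04-B21-1 1 12.87 bm25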
