mehranjeelani/Information-Retrieval

snlp-final-project

Final Project for Statistical Natural Language Processing Course

Tasks

  1. Baseline Document Retrieval Model
    • Extract text from corpus
    • Preprocess the texts from corpus and apply tokenisation
    • Compute idf
    • Compute tf
    • Represent the query as a list of term weights, each the product of the term's tf and idf values
    • Relevance based on cosine similarity
    • Sort similarity scores and output top 50 most relevant documents
    • Function to evaluate retrieval performance using precision at r, with r = 50
    • Test on test_questions.txt
  2. Advanced Document Retriever with Re-Ranking
    • Use the baseline model and return the top 1000 documents
    • Re-rank the top 1000 documents with a more advanced approach
  3. Sentence Ranker
    • Split the top 50 documents into sentences (sent_tokenize)
    • Treat the sentences like documents to rank them and return the top 50 sentences (same approach as above)
    • Evaluate performance using Mean Reciprocal Rank
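
The baseline pipeline and the two evaluation measures listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: all function names are hypothetical, the regex tokeniser is a stand-in for whatever preprocessing the project applies, tf is taken as the raw term count, and idf is assumed to be log(N / df), which is only one of several common variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (assumed stand-in tokeniser)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compute_idf(tokenized_docs):
    """idf(t) = log(N / df(t)), with df(t) the number of documents containing t."""
    n_docs = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse vector mapping each known term to (raw tf) * idf."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs, top_k=50):
    """Return up to top_k (doc_index, score) pairs, highest score first."""
    tokenized = [tokenize(d) for d in docs]
    idf = compute_idf(tokenized)
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine(q_vec, tfidf_vector(toks, idf)))
              for i, toks in enumerate(tokenized)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


def precision_at_r(ranked_ids, relevant_ids, r=50):
    """Fraction of the top r ranked items that are relevant (divides by r)."""
    hits = sum(1 for d in ranked_ids[:r] if d in relevant_ids)
    return hits / r


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average over queries of 1 / rank of the first relevant item (0 if none)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for position, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / position
                break
    return total / len(ranked_lists)
```

Under these assumptions, the sentence ranker in task 3 could reuse `rank` by passing sentences in place of documents, and the re-ranking step in task 2 would call `rank` with `top_k=1000` before applying a second scoring function.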
