Information Retrieval Engine

Requirements

Execution

java -jar IR-maven.jar <path to corpus folder> <path to stopword file> <max memory>

--

Task 1

Modelling: define the classes and main methods. a) Keep modularity and flexibility in mind. b) Describe your classes, main methods, and data flow in the report.

Task 2

Implement a simple corpus reader, tokenizer, and Boolean indexer. a) Develop your own tokenizer from scratch. Integrate the Porter stemmer (http://snowball.tartarus.org/) and a stopword filter in your code. b) Index a small corpus (to be defined later) and submit a text file with the resulting index, following the scheme: term,document frequency,list of documents
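As a rough illustration of the Boolean indexer and its output scheme, here is a minimal sketch. The class name `BooleanIndexSketch` and its methods are hypothetical, not this project's actual classes. The tokenizer simply lowercases and splits on non-alphanumeric characters, stemming is left as a comment where a Porter stemmer call would go, and the list of documents is assumed to be written as comma-separated document ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not the project's real indexer.
class BooleanIndexSketch {
    // term -> sorted set of ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private final Set<String> stopwords;

    BooleanIndexSketch(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Assumed tokenization rule: lowercase, split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void addDocument(int docId, String text) {
        for (String token : tokenize(text)) {
            if (stopwords.contains(token)) {
                continue; // stopword filter
            }
            // A Porter stemmer would be applied to `token` here.
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // One line per term: term,document frequency,list of documents
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, SortedSet<Integer>> e : postings.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey());
            sb.append(',').append(e.getValue().size());
            for (int id : e.getValue()) {
                sb.append(',').append(id);
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

With the single stopword "the", indexing document 1 as "The cat sat" and document 2 as "the cat ran" yields the lines `cat,2,1,2`, `ran,1,2`, and `sat,1,1`.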

Task 3

Implement an indexer based on the vector-space model, using the tf-idf weighting scheme and lnc.ltc strategy, as described in the slides. a) Write your index to disk so that the searcher module can efficiently load it. b) Index the corpus (to be defined later on).
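The slides are not reproduced here, so the following sketch assumes the usual SMART reading of lnc.ltc: documents use logarithmic tf, no idf, and cosine normalization (lnc), while queries use logarithmic tf, idf, and cosine normalization (ltc). The class `LncLtcSketch`, the base-10 logarithm, and the idf form log10(N/df) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lnc (document) and ltc (query) term weighting.
class LncLtcSketch {
    // "l": logarithmic term frequency, 1 + log10(tf) for tf > 0
    static double logTf(int tf) {
        return tf > 0 ? 1.0 + Math.log10(tf) : 0.0;
    }

    // "c": cosine normalization, dividing each weight by the vector length
    static Map<String, Double> cosineNormalize(Map<String, Double> weights) {
        double sumSquares = 0.0;
        for (double w : weights.values()) {
            sumSquares += w * w;
        }
        double norm = Math.sqrt(sumSquares);
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            out.put(e.getKey(), norm == 0.0 ? 0.0 : e.getValue() / norm);
        }
        return out;
    }

    // lnc: log tf, no idf ("n"), cosine-normalized; used for documents
    static Map<String, Double> lnc(Map<String, Integer> termFreqs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            weights.put(e.getKey(), logTf(e.getValue()));
        }
        return cosineNormalize(weights);
    }

    // ltc: log tf times idf ("t"), cosine-normalized; used for queries.
    // Query terms that never occur in the corpus are skipped.
    static Map<String, Double> ltc(Map<String, Integer> queryTermFreqs,
                                   Map<String, Integer> docFreqs,
                                   int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : queryTermFreqs.entrySet()) {
            Integer df = docFreqs.get(e.getKey());
            if (df == null || df == 0) {
                continue;
            }
            double idf = Math.log10((double) numDocs / df);
            weights.put(e.getKey(), logTf(e.getValue()) * idf);
        }
        return cosineNormalize(weights);
    }
}
```

Because document weights carry no idf under this reading, they depend only on the document itself and can be computed and stored at indexing time; the idf factor is applied once, on the query side.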

Task 4

Implement a ranked retrieval method. a) Load the index from disk.
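One possible shape for the searcher is sketched below. The on-disk format is not specified in this README, so the line layout `term,docId:weight,docId:weight,...` is purely an assumption, as are the class name `RankedRetrievalSketch` and its methods. Scoring is term-at-a-time: for each query term, the product of the query weight and the stored document weight is added to that document's accumulator, which gives the cosine score when both vectors are already length-normalized.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of loading a weighted index and ranking documents.
class RankedRetrievalSketch {
    // term -> (docId -> normalized document weight)
    final Map<String, Map<Integer, Double>> index = new HashMap<>();

    // Assumed line format: term,docId:weight,docId:weight,...
    static RankedRetrievalSketch load(Reader source) {
        RankedRetrievalSketch searcher = new RankedRetrievalSketch();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] parts = line.split(",");
                Map<Integer, Double> postings = new HashMap<>();
                for (int i = 1; i < parts.length; i++) {
                    String[] pair = parts[i].split(":");
                    postings.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
                }
                searcher.index.put(parts[0], postings);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return searcher;
    }

    // Returns up to k (docId, score) entries, highest score first.
    List<Map.Entry<Integer, Double>> search(Map<String, Double> queryWeights, int k) {
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> q : queryWeights.entrySet()) {
            Map<Integer, Double> postings = index.get(q.getKey());
            if (postings == null) {
                continue; // query term not in the index
            }
            for (Map.Entry<Integer, Double> p : postings.entrySet()) {
                scores.merge(p.getKey(), q.getValue() * p.getValue(), Double::sum);
            }
        }
        List<Map.Entry<Integer, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }
}
```

For a large corpus, sorting every accumulator is more work than needed; a bounded priority queue of size k would be a natural refinement, and a binary format would likely load faster than the text format assumed here.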

Authors