Corpus Search Engine

A Java implementation of a search engine built on Apache Lucene. The corpus for this engine is a collection of news articles aggregated from four sources:

Financial Times Limited (1991, 1992, 1993, 1994)
Federal Register (1994)
Foreign Broadcast Information Service (1996)
Los Angeles Times (1989, 1990)

Building the project

$ git clone https://github.com/httpdaniel/CorpusSE.git

$ cd CorpusSE

$ mvn package

Creating the index

$ java -cp target/CorpusSE-1.0-SNAPSHOT.jar CreateIndex

Querying the index

$ java -cp target/CorpusSE-1.0-SNAPSHOT.jar CorpusSearch

The results are written to a file named "Results.txt" in the corpus folder.
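trec_eval reads run files in the standard TREC format, with one whitespace-separated line per retrieved document: topic ID, the literal "Q0", document number, rank, score, and run tag. The sketch below shows how ranked hits could be turned into lines of that shape. It is not the project's CorpusSearch code: the class, record, and method names, the run tag "sketch", and the sample document IDs are all hypothetical. It assumes the hits arrive already sorted by descending score, so that the rank and score columns agree.

```java
import java.io.PrintWriter;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of CorpusSE): formats ranked hits as TREC run-file lines.
public class TrecRunWriter {

    // One ranked hit: the document's DOCNO and its retrieval score.
    record Hit(String docId, float score) {}

    // Builds one run line: topic, the literal "Q0", docno, rank, score, run tag.
    static String formatLine(String topicId, int rank, Hit hit, String runTag) {
        return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s",
                topicId, hit.docId(), rank, hit.score(), runTag);
    }

    // Writes every hit for one topic with ranks starting at 1.
    // Assumes the list is already sorted by descending score.
    static void writeTopic(PrintWriter out, String topicId, List<Hit> hits, String runTag) {
        for (int i = 0; i < hits.size(); i++) {
            out.println(formatLine(topicId, i + 1, hits.get(i), runTag));
        }
    }

    public static void main(String[] args) {
        // Illustrative topic and document IDs, not real search output.
        List<Hit> hits = List.of(new Hit("FT911-3", 12.5f), new Hit("LA010189-0001", 9.25f));
        PrintWriter out = new PrintWriter(System.out, true);
        writeTopic(out, "401", hits, "sketch");
        out.flush();
    }
}
```

Running main prints "401 Q0 FT911-3 1 12.5000 sketch" followed by the line for the second hit. Writing to a file instead only requires passing a PrintWriter opened on Results.txt. Locale.ROOT keeps the decimal separator a period regardless of the machine's locale.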

Evaluating the results

$ cd corpus
$ ./trec_eval-9.0.7/trec_eval qrels.assignment2.part1 Results.txt
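The qrels file holds the relevance judgments that trec_eval compares Results.txt against. Each line has four whitespace-separated fields: topic ID, iteration (usually 0), document number, and relevance. The sketch below is a hypothetical reader that shows what those judgments contain. It assumes that any relevance value above 0 counts as relevant, which matches trec_eval's default relevance threshold of 1. The class and method names and the sample lines are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: reads qrels lines ("topic iteration docno relevance")
// into a map from topic ID to the set of relevant document numbers.
public class QrelsReader {

    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> relevant = new HashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 4) continue; // skip blank or malformed lines
            String topic = f[0];
            String docno = f[2];
            int rel = Integer.parseInt(f[3]);
            // Every judged topic gets an entry, even if none of its documents is relevant.
            Set<String> docs = relevant.computeIfAbsent(topic, t -> new HashSet<>());
            if (rel > 0) docs.add(docno);
        }
        return relevant;
    }

    public static void main(String[] args) {
        // Illustrative judgments, not taken from qrels.assignment2.part1.
        List<String> sample = List.of("401 0 FT911-3 1", "401 0 LA010189-0001 0");
        System.out.println(parse(sample)); // prints {401=[FT911-3]}
    }
}
```

To read the real judgments, pass Files.readAllLines(Path.of("qrels.assignment2.part1")) to parse.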

To display only Mean Average Precision & Recall

$ ./trec_eval-9.0.7/trec_eval -m map -m recall qrels.assignment2.part1 Results.txt
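For a single topic, average precision (AP) is the sum of the precision values at each rank that holds a relevant document, divided by the total number of relevant documents for that topic. Mean average precision (MAP) is the mean of AP over topics. Recall at a cutoff k is the fraction of relevant documents that appear in the top k results, and trec_eval's recall measure reports it at a series of fixed cutoffs. The sketch below implements these textbook definitions so the output numbers are easier to interpret. It is a hypothetical illustration, not trec_eval's code: trec_eval's own handling of score ties, duplicate documents, and topics missing from the run may differ. The sketch assumes each document appears at most once in a ranking.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of per-topic average precision, recall at a cutoff, and their mean.
public class RankedMetrics {

    // AP = (sum of precision at each rank holding a relevant doc) / (total relevant docs).
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at rank i + 1
            }
        }
        return sum / relevant.size();
    }

    // Fraction of relevant docs that appear within the top k ranked results.
    static double recallAt(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) return 0.0;
        long found = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) found / relevant.size();
    }

    // MAP is the arithmetic mean of the per-topic AP values.
    static double mean(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Illustrative ranking and judgments: relevant docs sit at ranks 1 and 3, and d5 is never retrieved.
        List<String> ranked = List.of("d1", "d2", "d3", "d4");
        Set<String> relevant = Set.of("d1", "d3", "d5");
        System.out.println(averagePrecision(ranked, relevant)); // (1 + 2/3) / 3 = 5/9
        System.out.println(recallAt(ranked, relevant, 2));      // 1/3
        System.out.println(recallAt(ranked, relevant, 4));      // 2/3
    }
}
```

Dividing by the total number of relevant documents, rather than by the number retrieved, is what makes AP penalize relevant documents that never appear in the ranking.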
