Terrier 4.0 mod.
sauparna/Terrier
Terrier Mod

Sauparna Palchowdhury
sauparna.palc [at] gmail [dot] com

Terrier-4.0's README has been copied over to README-Terrier-4.0.txt.
Any rights, responsibilities and credits stem from the contents of
that file.

----------------------------------------------------------------------
DESCRIPTION

This is Terrier-4.0 with some additions and modifications for doing IR
experiments using TREC data. The purpose of distributing this piece of
software is to augment Terrier with better documentation. See
NOTES.txt.

To run the commands described below you will need the sample TREC
data:

    http://sauparna.sdf.org/Search/.files/ap.tgz

----------------------------------------------------------------------
COMPILING

Type "ant" in the shell.

----------------------------------------------------------------------
INDEXING

    bin/trec_terrier.sh -i \
        -Dcollection.spec=filelist.txt \
        -Dterrier.index.path=ap/AP \
        -Dstopwords.filename=ap/ser17.txt \
        -Dtermpipelines=Stop,SStemmer \
        -DTrecDocTags.doctag=DOC \
        -DTrecDocTags.idtag=DOCNO \
        -DTrecDocTags.process= \
        -DTrecDocTags.skip= \
        -DTrecDocTags.casesensitive=false

filelist.txt - A file containing a list of paths, one for each file of
the corpus. It can be generated by typing this in the shell:

    find -L corpus/* -type f >filelist.txt

ap/AP - This is a directory. In the sample test-collection ap.txt is
the only file in the corpus, and it has been placed inside a directory
named 'AP' because the script expects a path to a directory in which
to look for a corpus.
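
The file-list step above can be wrapped in a small helper that also
refuses to continue when the list comes out empty. This is only a
sketch: make_filelist is a hypothetical name, not something shipped
with Terrier or this mod, and the corpus directory is assumed to sit
outside the place where the list is written, so the list does not end
up naming itself.

```shell
#!/bin/sh
# Sketch only: make_filelist is a hypothetical helper, not part of Terrier.
# Usage: make_filelist <corpus-dir> <output-file>
make_filelist() {
    corpus_dir=$1
    out_file=$2
    # -L follows symbolic links; -type f keeps regular files only.
    # sort gives a stable order so repeated runs produce the same list.
    find -L "$corpus_dir" -type f | sort > "$out_file"
    # Fail (non-zero status) if no files were found.
    [ -s "$out_file" ]
}
```

A call such as "make_filelist corpus filelist.txt" would then produce
the file named by -Dcollection.spec.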
----------------------------------------------------------------------
RETRIEVAL

    bin/trec_terrier.sh -r \
        -q \
        -c i \
        -Dterrier.index.path=ap/AP \
        -Dtrec.topics=ap/query.txt \
        -DTrecQueryTags.doctag=TOP \
        -DTrecQueryTags.idtag=NUM \
        -DTrecQueryTags.process=TOP,NUM,DESC \
        -DTrecQueryTags.skip=TITLE,NARR \
        -DTrecQueryTags.casesensitive=false \
        -Dstopwords.filename=ap/ser17.txt \
        -Dtermpipelines=Stop,SStemmer \
        -Dtrec.model=TF_IDF \
        -Dquerying.postprocesses.controls=qe:QueryExpansion \
        -Dquerying.postprocesses.order=QueryExpansion \
        -Dtrec.qe.model=org.terrier.matching.models.queryexpansion.Bo1 \
        -Dexpansion.terms=10 \
        -Dexpansion.documents=3 \
        -Dtrec.results=./runs \
        -Dtrec.results.file=run.txt

The trec.results parameter points to a directory named 'runs'. The
file run.txt, written to that directory, holds the retrieval output in
TREC format.
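
A quick sanity check on the run file can catch an empty or malformed
run before it is passed to an evaluation tool. The sketch below
assumes the usual six-column TREC run layout (query id, the literal
Q0, document number, rank, score, run tag); the helper names
run_is_wellformed and run_summary are hypothetical and not part of
Terrier.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of Terrier.
# Assumed line layout: <qid> Q0 <docno> <rank> <score> <tag>

# Succeeds only if every line has exactly six whitespace-separated fields.
run_is_wellformed() {
    awk 'BEGIN { bad = 0 } NF != 6 { bad = 1 } END { exit bad }' "$1"
}

# Prints "<qid> <number of retrieved documents>" per query, sorted.
run_summary() {
    awk 'NF == 6 { count[$1]++ }
         END { for (q in count) print q, count[q] }' "$1" | sort
}
```

With the settings above, "run_summary runs/run.txt" would list how
many documents were retrieved for each topic in ap/query.txt.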