juliendoutre/beagle


Beagle

A text search-engine over the Stanford CS276 document collection.

Install

From the repository root, run

pip3 install -e .

to install the package in editable mode.

Usage

python3 -m beagle

Tests

pytest

Dataset

The collection can be downloaded here: http://web.stanford.edu/class/cs276/pa/pa1-data.zip.

This is a 170 MB corpus organized in 10 folders. Each file contains the tokenized contents of a web page.
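As a rough illustration of how such a corpus could be walked, here is a minimal sketch. It is not taken from the beagle code base: the function name `iter_documents`, the extraction directory `pa1-data`, and the assumption that tokens are separated by whitespace are all hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_documents(root: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (document id, tokens) for every file found under `root`.

    Hypothetical helper: the document id is the file path relative to
    `root`, and tokens are assumed to be whitespace-separated.
    """
    root_path = Path(root)
    # Sort the paths so that documents come out in a reproducible order.
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            # Ignore undecodable bytes rather than failing on one bad page.
            text = path.read_text(encoding="utf-8", errors="ignore")
            yield path.relative_to(root_path).as_posix(), text.split()


if __name__ == "__main__":
    # Assumes the archive was extracted into ./pa1-data (an assumption).
    for doc_id, tokens in iter_documents("pa1-data"):
        print(doc_id, len(tokens))
        break
```

Using a generator keeps memory usage flat, since only one document is held in memory at a time.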

Stop words

The English stop-word list we use (saved in stop_words.json) comes from this gist: https://gist.github.com/sebleier/554280.
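A minimal sketch of how a stop-word list like this might be applied to a token stream follows. It assumes that stop_words.json holds a flat JSON array of strings, which is not confirmed here, and the helper names `load_stop_words` and `remove_stop_words` are hypothetical rather than part of beagle.

```python
import json
from typing import Iterable, List, Set


def load_stop_words(path: str = "stop_words.json") -> Set[str]:
    """Load stop words into a set for constant-time membership tests.

    Assumption: the file is a JSON array of strings.
    """
    with open(path, encoding="utf-8") as handle:
        return {word.lower() for word in json.load(handle)}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens that are not stop words (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]
```

Storing the words in a set rather than a list matters on a corpus of this size, because every token in every document is checked against it.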

Report

More details can be found in the project report.
