Web Search Engines

A web search engine project started in Fall 2016 at New York University. It includes multiple rankers, evaluators, an indexer, and a query parser, with more to come.

TODO:

The rankers are currently broken because the DocumentFullScan.java, IndexFullScan.java, and Query.java representations changed in HW2.
RankerCosine, RankerLinear, RankerNumViews, and RankerPhrase have been removed. Put them back after refactoring against the HW2 indexing code, if time permits.

Getting Started

These instructions will get you up and running: compiling the code, generating the index file, and starting the search engine. You can then issue queries through various input channels and receive the output as an HTTP response or in a results file. You can also run the search engine evaluators on it.

Prerequisites

You need Git and the Java Development Kit (JDK) installed. On Windows, you also need to install cURL if you want to issue queries or generate results from the terminal/command line (you can also do both from a browser).

Configuring

Open a terminal and run the following command from the directory in which you want to clone this repo.

git clone https://github.com/praneethy91/websearchenginesnyu.git

In the terminal, move into the root directory of the repo. Run all subsequent commands from this directory.
Run the following command to compile all source files. jsoup is used to generate the HTML output.

javac -cp "external/" src/edu/nyu/cs/cs2580/*.java

Generate the index file.

java -cp src edu.nyu.cs.cs2580.SearchEngine --mode=index --options=conf/engine.conf

Serve the search engine.

java -cp src -Xmx512m edu.nyu.cs.cs2580.SearchEngine --mode=serve --port=25802 --options=conf/engine.conf

Generating Results File/Running Queries/Evaluation

Open a new terminal or command prompt and move into the root of the Git repository as before.
Generate results for a particular ranker as shown below. The results are written to a TSV file, named according to the ranker, in the [githubreporoot]/results directory.

curl "http://localhost:25802/search?queryfile=data%2Fqueries.tsv&ranker=<rankerType>&format=text&output=file"

If you just want to check the top 10 results of a ranker for a particular query, without generating a results file from a query file, run:

curl "http://localhost:25802/search?query=<yourQuery>&ranker=<rankerType>&format=text"
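
Query text has to be percent-encoded before it goes into the URL (a space becomes %20, a slash becomes %2F). As a minimal sketch, assuming you want to build the request URL from Java rather than by hand, the helper below does the encoding with the standard library. The class and method names (SearchUrlBuilder, encode, buildQueryUrl) are hypothetical and not part of this repo; only the /search path and the query, ranker, and format arguments come from the commands above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the repository.
final class SearchUrlBuilder {

    // Percent-encodes a single query-string value.
    static String encode(String value) {
        try {
            // URLEncoder writes spaces as '+'; switch to %20 to match the
            // examples in this README. A literal '+' is already encoded as
            // %2B at this point, so the replacement is safe.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Builds a single-query search URL with text output.
    static String buildQueryUrl(String host, int port, String query, String ranker) {
        return "http://" + host + ":" + port + "/search"
                + "?query=" + encode(query)
                + "&ranker=" + encode(ranker)
                + "&format=text";
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl("localhost", 25802, "web search", "ql"));
    }
}
```

The printed URL can be passed to curl in double quotes, exactly like the commands above.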

Ranker Types

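Use one of the following values for <rankerType>. Note that, per the TODO above, the cosine, phrase, numviews, and linear rankers are currently removed from the code.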

cosine
ql
phrase
numviews
linear

Input Types

You can pass a query directly with the following CGI argument:

query=your%20query

Alternatively, you can supply a .tsv file of queries, with the words of each query separated by tabs and one query per line:

queryfile=data%2fqueries.tsv
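
To make the file layout concrete, here is a minimal sketch that splits such a file into queries and query words. It assumes exactly the layout described above (tab-separated words, one query per line) and skips blank lines. QueryFileParser and parse are hypothetical names used for illustration only; this is not the parser the engine itself uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the query file layout, not repository code.
final class QueryFileParser {

    // Returns one list of words per non-blank line of the file contents.
    static List<List<String>> parse(String contents) {
        List<List<String>> queries = new ArrayList<>();
        // \R matches any line break (\n, \r\n, ...), so Windows files work too.
        for (String line : contents.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            queries.add(Arrays.asList(line.split("\t")));
        }
        return queries;
    }

    public static void main(String[] args) {
        String sample = "web\tsearch\nnew\tyork\tuniversity\n";
        System.out.println(parse(sample));
    }
}
```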

Output Types

Two output types are supported. The default is http if none is provided.

http - returns an HTTP response
file - writes to a results file in the [githubreporoot]/results directory

For example, pass the following CGI argument:

output=http

Output Format

Two output formats are supported. The default is text if none is provided. Note that html output only displays properly when the search is run from a browser rather than from cURL.

html
text

For example, pass the following CGI argument:

format=text

Number of ranked documents

You can pass a value that determines how many ranked documents are returned. The default is 10.

For example, pass the following CGI argument to display the top 30 results:

num=30

For example, pass the following CGI argument to display all documents in the corpus in ranked order:

num=all
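
The rules for num can be summarized in a few lines of code. The sketch below is only an illustration of the behaviour described above (default 10, a number, or "all"). NumResultsParser, resolve, and DEFAULT_NUM are hypothetical names, and capping the value at the corpus size and rejecting negative numbers are assumptions, not confirmed behaviour of the server.

```java
// Hypothetical illustration of how the num argument could be interpreted.
final class NumResultsParser {

    static final int DEFAULT_NUM = 10; // default stated in this README

    // raw is the value of the num argument, or null if it was not provided.
    static int resolve(String raw, int corpusSize) {
        if (raw == null || raw.isEmpty()) {
            return Math.min(DEFAULT_NUM, corpusSize);
        }
        if (raw.equalsIgnoreCase("all")) {
            return corpusSize; // every document, in ranked order
        }
        int requested = Integer.parseInt(raw); // throws NumberFormatException on bad input
        if (requested < 0) {
            throw new IllegalArgumentException("num must not be negative: " + raw);
        }
        // Assumption: never return more documents than the corpus holds.
        return Math.min(requested, corpusSize);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, 100));
        System.out.println(resolve("30", 100));
        System.out.println(resolve("all", 100));
    }
}
```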

Evaluation

To evaluate rankers, you first need to write each ranker's output to a file in the results folder. To achieve that, run the following command for each ranker type you want to evaluate:

curl "http://localhost:25802/search?queryfile=data%2Fqueries.tsv&ranker=<rankerType>&format=text&output=file"

Once the results have been generated in the results folder, you can run the evaluation with the following command:

java -cp src -Xmx512m edu.nyu.cs.cs2580.Evaluator data/labels.tsv

Authors

Praneeth Yenugutala
Sanketh Purwar
Mansi Virani

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

Cong Yu
Fernando Diaz
