IRWeb

This project is a search engine for ics.uci.edu. It works in two steps: a crawler first collects pages from the domain, and search queries are then answered from the indexed data.

The main characteristics are:

The Crawler4j library is used to crawl the ics.uci.edu domain.
The Google Gson library is used in comparing results for the same query using the NDCG metric.
Berkeley DB is used for persistence (crawled data and indexes).
The PageRank algorithm is used to rank the results.
A web interface is provided; a simple Servlet handles the search requests.
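The NDCG comparison mentioned in the list above can be illustrated with a minimal sketch. This is not the project's code: the class name `NdcgSketch`, the integer relevance grades, and the choice of the `rel / log2(position + 1)` form of DCG are all assumptions made for illustration. The source does not say how relevance grades are obtained or which DCG variant is used.

```java
import java.util.Arrays;

// Hypothetical sketch, not taken from the IRWeb sources.
class NdcgSketch {

    // Discounted cumulative gain: sum of rel_i / log2(i + 1), with positions i starting at 1.
    static double dcg(int[] relevances) {
        double sum = 0.0;
        for (int i = 0; i < relevances.length; i++) {
            // Array index i corresponds to position i + 1, so the discount is log2(i + 2).
            sum += relevances[i] / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    // Normalized DCG: DCG of the given ranking divided by the DCG of the
    // same grades sorted from most to least relevant (assumed ideal ordering).
    static double ndcg(int[] relevances) {
        int[] ideal = relevances.clone();
        Arrays.sort(ideal); // ascending order
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            int tmp = ideal[i]; // reverse in place to get descending order
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal);
        return idealDcg == 0.0 ? 0.0 : dcg(relevances) / idealDcg;
    }

    public static void main(String[] args) {
        // Assumed relevance grades for the top results of one query.
        int[] grades = {3, 1, 2, 0};
        System.out.printf("NDCG = %.4f%n", ndcg(grades));
    }
}
```

A ranking that already lists results from most to least relevant scores 1.0, and a ranking with no relevant results is reported as 0.0 here to avoid dividing by zero.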
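To make the PageRank item in the list above more concrete, here is a minimal power-iteration sketch. It is an illustration under stated assumptions, not the implementation used in this project: the class name `PageRankSketch`, the adjacency-array input format, the damping factor of 0.85, and the fixed iteration count are all hypothetical choices.

```java
import java.util.Arrays;

// Hypothetical sketch, not taken from the IRWeb sources.
class PageRankSketch {

    // links[i] holds the indices of the pages that page i links to (assumed input format).
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no outgoing links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share; // split rank evenly among outlinks
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // Random-jump term plus damped incoming rank; dangling rank is spread evenly.
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny assumed link graph: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2.
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        double[] rank = pageRank(links, 0.85, 50);
        for (int i = 0; i < rank.length; i++) {
            System.out.printf("page %d: %.4f%n", i, rank[i]);
        }
    }
}
```

In this small graph, page 2 receives links from every other page and ends up with the highest score, while page 3 has no incoming links and keeps only the random-jump share. In a search engine the scores would typically be computed once after crawling and stored alongside the index, though the source does not describe how this project does it.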

Authors: Joel Fuentes & Han Ke.

