GitHub - shaheming/spring19-cs221-project: Project Template for CS221 Spring 2019

CS221 Project - Peterman Search Engine

Project 1

In this project, we implement:

Punctuation Tokenizer
PorterStemmer
WordBreak (English version, using dynamic programming)
WordBreakCJK (Chinese and Japanese version)
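Here is a minimal sketch of the punctuation tokenizer. It assumes tokens are separated by whitespace and a small set of punctuation marks, and that each token is lowercased. The class name PunctuationTokenizerSketch and the exact delimiter set are hypothetical, and the project's tokenizer may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: the delimiter set below is an assumption, not the project's.
class PunctuationTokenizerSketch {
    // Whitespace plus a few common punctuation marks, used as token separators.
    private static final String DELIMITERS = "[\\s,.;?!]+";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(DELIMITERS)) {
            // split() yields a leading empty string when text starts with a delimiter.
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

For example, tokenize("Hello, World!") yields [hello, world].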

The Chinese and Japanese versions use the dictionaries dic_cn and dic_jp, respectively, under the resource directory. Both are covered by WordBreakCJKTokenizerTest (6 test cases: 3 for Chinese and 3 for Japanese).
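Dictionary-based word breaking with dynamic programming can be sketched as follows, assuming a plain set of known words (for instance, words loaded from a dictionary file such as dic_cn or dic_jp). The class WordBreakSketch is hypothetical. It returns one valid segmentation and does not score alternatives, which a real tokenizer might do with word frequencies. Because it works on Java chars, the same code applies to English and to CJK text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of dictionary-based word break with dynamic programming.
// Works on Java chars, so it applies equally to English and to CJK text in the BMP.
class WordBreakSketch {
    // Returns one segmentation of text into dictionary words, or an empty list if none exists.
    static List<String> wordBreak(String text, Set<String> dictionary) {
        int n = text.length();
        // canBreak[i] is true when text[0, i) can be split into dictionary words.
        boolean[] canBreak = new boolean[n + 1];
        // lastStart[i] records where the final word of that split begins.
        int[] lastStart = new int[n + 1];
        canBreak[0] = true;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (canBreak[start] && dictionary.contains(text.substring(start, end))) {
                    canBreak[end] = true;
                    lastStart[end] = start;
                    break; // the smallest start gives the longest final word
                }
            }
        }
        if (n == 0 || !canBreak[n]) {
            return Collections.emptyList();
        }
        // Walk the back-pointers from the end to rebuild the word sequence.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = lastStart[end]) {
            words.add(text.substring(lastStart[end], end));
        }
        Collections.reverse(words);
        return words;
    }
}
```

With the dictionary {cat, cats, and, sand, dog}, wordBreak("catsanddog", dictionary) returns [cat, sand, dog].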

Project 2

In this project, we implement:

Building on the previous project (the analyzer), which tokenizes and stems the input documents, we implement a disk-based index whose structure is based on the idea of an LSM tree (Log-Structured Merge tree). We store the word dictionary and the document IDs in one file. In addition, we use multi-threaded merging and searching to improve performance.
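The LSM idea can be sketched in memory. Documents accumulate in a sorted buffer, the buffer is frozen into an immutable segment once it holds a fixed number of documents, and segments are later merged. In this hypothetical LsmIndexSketch, segments stay in memory rather than on disk, the flush threshold is an assumed parameter, document IDs are assigned globally in increasing order, and merging is single-threaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory stand-in for an LSM-style inverted index.
// Real segments would be written to disk; here they are kept as sorted maps.
class LsmIndexSketch {
    private final int flushThreshold; // documents per in-memory buffer (assumed)
    private TreeMap<String, List<Integer>> buffer = new TreeMap<>();
    private int bufferedDocs = 0;
    private int nextDocId = 0;
    // Flushed, immutable segments, oldest first.
    final List<TreeMap<String, List<Integer>>> segments = new ArrayList<>();

    LsmIndexSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Adds an already tokenized and stemmed document; returns its document ID.
    int addDocument(List<String> terms) {
        int docId = nextDocId++;
        for (String term : terms) {
            List<Integer> postings = buffer.computeIfAbsent(term, t -> new ArrayList<>());
            // Avoid duplicate IDs when a term repeats within one document.
            if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                postings.add(docId);
            }
        }
        if (++bufferedDocs >= flushThreshold) {
            flush();
        }
        return docId;
    }

    // Freezes the buffer into a new segment (the disk write in a real index).
    void flush() {
        if (bufferedDocs == 0) return;
        segments.add(buffer);
        buffer = new TreeMap<>();
        bufferedDocs = 0;
    }

    // Merges two segments; IDs grow over time, so appending keeps postings sorted.
    static TreeMap<String, List<Integer>> merge(TreeMap<String, List<Integer>> older,
                                                TreeMap<String, List<Integer>> newer) {
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : older.entrySet()) {
            merged.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (Map.Entry<String, List<Integer>> e : newer.entrySet()) {
            merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return merged;
    }

    // Replaces the two oldest segments with their merge, like one compaction step.
    void mergeOldestPair() {
        if (segments.size() < 2) return;
        TreeMap<String, List<Integer>> merged = merge(segments.get(0), segments.get(1));
        segments.remove(1);
        segments.set(0, merged);
    }
}
```

Because IDs only grow, every ID in an older segment is smaller than every ID in a newer one, so merging a term's postings is a simple append rather than a full sort.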

write and read
merge
search (AND and OR)
delete
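AND and OR queries reduce to intersecting and unioning sorted posting lists. A minimal sketch, assuming each posting list is a sorted list of document IDs without duplicates; BooleanSearchSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: boolean AND / OR over sorted posting lists of document IDs.
class BooleanSearchSketch {
    // AND: keep IDs present in both lists (linear merge of two sorted lists).
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) {
                result.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    // OR: keep IDs present in either list, emitted once each and still sorted.
    static List<Integer> union(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            if (j == b.size() || (i < a.size() && a.get(i) < b.get(j))) {
                result.add(a.get(i++));
            } else if (i == a.size() || b.get(j) < a.get(i)) {
                result.add(b.get(j++));
            } else { // equal IDs: emit once
                result.add(a.get(i++));
                j++;
            }
        }
        return result;
    }
}
```

A query with more than two keywords can fold these pairwise, e.g. intersect(intersect(a, b), c).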

Project 3

In this project, we implement:

Building on the previous project, we add a positional list to each element of the inverted list, which allows users to search for keywords in a specific order. We also compress the data using delta encoding and variable-length encoding.
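Delta encoding and variable-length encoding can be combined as in this sketch, which assumes the input is a sorted list of non-negative integers, such as a list of document IDs or a positional list. The class DeltaVarIntSketch and the layout of 7 data bits per byte with a continuation bit are assumptions for illustration; the project's on-disk format may differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compress a sorted list of non-negative integers
// with delta encoding followed by variable-length byte encoding.
class DeltaVarIntSketch {
    static byte[] encode(List<Integer> sortedValues) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int value : sortedValues) {
            int delta = value - previous; // small gaps need fewer bytes than raw values
            previous = value;
            // Emit 7 bits at a time; the high bit marks "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    static List<Integer> decode(byte[] bytes) {
        List<Integer> values = new ArrayList<>();
        int previous = 0, delta = 0, shift = 0;
        for (byte b : bytes) {
            delta |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7; // continuation bit set: keep reading this number
            } else {
                previous += delta; // undo the delta encoding
                values.add(previous);
                delta = 0;
                shift = 0;
            }
        }
        return values;
    }
}
```

For example, [5, 300] becomes the gaps [5, 295], which take 3 bytes in total instead of 8 bytes as two raw 32-bit integers.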

To run this example:

Run mvn clean install -DskipTests from the command line.
Open IntelliJ, choose Open, and select the project directory. Wait for IntelliJ to finish importing and building.
To check that everything works, run the HelloWorld program in the src/main/java/edu.uci.ics.cs221 package.

Project 4

In this project, we implement ranking using TF-IDF and PageRank.
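Both ranking signals can be sketched in a few lines. The TF-IDF variant here (raw term frequency times log10(N / df)) and PageRank by power iteration with a damping factor are common textbook forms, assumed for illustration. RankingSketch is hypothetical, and the sketch does not show how the project combines the two scores.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two ranking signals; the formulas are common
// textbook variants, not necessarily the exact ones used in the project.
class RankingSketch {
    // TF-IDF weight: raw term frequency times log10(N / df).
    static double tfIdf(int termFrequency, int docFrequency, int totalDocs) {
        if (termFrequency == 0 || docFrequency == 0) return 0.0;
        return termFrequency * Math.log10((double) totalDocs / docFrequency);
    }

    // PageRank by power iteration. outLinks.get(i) lists the pages that page i links to.
    // Pages without out-links spread their rank evenly over all pages.
    static double[] pageRank(List<List<Integer>> outLinks, double damping, int iterations) {
        int n = outLinks.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n); // random-jump share
            for (int page = 0; page < n; page++) {
                List<Integer> targets = outLinks.get(page);
                if (targets.isEmpty()) {
                    for (int t = 0; t < n; t++) next[t] += damping * rank[page] / n;
                } else {
                    for (int target : targets) next[target] += damping * rank[page] / targets.size();
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

For example, a term that appears 3 times in a document and in 10 of 1,000 documents gets the weight 3 * log10(100) = 6.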
