GitHub - jiangts/magic-selector-baseline: magic-selector-baseline

index.js takes a dataset file and runs the baseline selector against each phrase for each website in it.

The baseline tokenizes each leaf element (adjusted by an element whitelist/blacklist) by its text and all of its attributes.
It then computes string overlap, weighting text nodes by 5 and attributes by 1.
Options: run the Porter stemmer, vary the weights, change the whitelist/blacklist, etc.
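The scoring described above can be sketched roughly as follows. This is a minimal illustration, not the code in index.js: the function names, the `{ text, attrs }` element shape, the tokenizer regex, and the choice to tokenize attribute values only are all assumptions. Only the 5/1 weights come from the description above.

```javascript
// Weights from the README: text tokens count 5, attribute tokens count 1.
const TEXT_WEIGHT = 5;
const ATTR_WEIGHT = 1;

// Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(str) {
  return String(str).toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Number of phrase tokens that also appear in `tokens`.
function overlap(phraseTokens, tokens) {
  const seen = new Set(tokens);
  return phraseTokens.filter((t) => seen.has(t)).length;
}

// Score one element. `el` is a hypothetical plain object { text, attrs },
// where attrs maps attribute names to string values.
function scoreElement(phrase, el) {
  const phraseTokens = tokenize(phrase);
  const textTokens = tokenize(el.text || '');
  const attrTokens = Object.values(el.attrs || {}).flatMap(tokenize);
  return TEXT_WEIGHT * overlap(phraseTokens, textTokens) +
         ATTR_WEIGHT * overlap(phraseTokens, attrTokens);
}

// Pick the highest-scoring element for a phrase (the first one wins ties).
function bestMatch(phrase, elements) {
  let best = null;
  let bestScore = -Infinity;
  for (const el of elements) {
    const score = scoreElement(phrase, el);
    if (score > bestScore) {
      best = el;
      bestScore = score;
    }
  }
  return best;
}

// Example data (made up for illustration).
const sampleElements = [
  { text: 'Sign in', attrs: { class: 'btn login' } },
  { text: '', attrs: { id: 'search-box', placeholder: 'Search' } },
];
const picked = bestMatch('search box', sampleElements);
```

A stemmer could be applied inside `tokenize` to both the phrase and the element tokens, so that variants such as "searching" and "search" overlap.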

converter.js converts output from index.js into a viewer-compatible format

selector.js holds the original, browser-based selector

preprocess.js takes the raw data files and outputs a list of documents from ALL websites, selected by the criteria above (leaves/whitelist/blacklist).
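A rough sketch of the selection step, under stated assumptions: the tree is a hypothetical plain-object DOM (`{ tag, text, attrs, children }`), whitelisted tags are emitted even when they are not leaves, and blacklisted tags are skipped together with their subtrees. The tag lists are placeholders, and the actual behavior of preprocess.js may differ.

```javascript
// Placeholder lists (assumptions, not the repository's actual lists).
const WHITELIST = new Set(['a', 'button']);
const BLACKLIST = new Set(['script', 'style']);

// Walk the tree depth-first and collect one "document" per selected element.
function collectDocuments(node, out = []) {
  if (BLACKLIST.has(node.tag)) return out; // assumed: drop the whole subtree
  const children = node.children || [];
  const isLeaf = children.length === 0;
  if (isLeaf || WHITELIST.has(node.tag)) {
    out.push({ tag: node.tag, text: node.text || '', attrs: node.attrs || {} });
  }
  for (const child of children) collectDocuments(child, out);
  return out;
}

// Example tree (made up for illustration).
const sampleTree = {
  tag: 'div',
  children: [
    { tag: 'a', attrs: { href: '/home' }, children: [{ tag: 'span', text: 'Home' }] },
    { tag: 'script', text: 'var x = 1;' },
    { tag: 'p', text: 'Hello' },
  ],
};
const sampleDocs = collectDocuments(sampleTree);
```

Here the `div` is dropped (not a leaf, not whitelisted), the `a` is kept via the whitelist, and the `script` is excluded by the blacklist.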

tfidf.js computes TF-IDF scores over all the documents output by preprocess.js. It outputs two files: one with the TF-IDF object, the other recording each document's membership in each page, since queries need to be scoped to a particular page.
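As an illustration of the two outputs, here is a minimal sketch that builds a TF-IDF table and a page-membership map in memory. It assumes each document is a hypothetical `{ page, tokens }` object and uses raw term counts times `log(N / df)`; the weighting variant and file format actually used by tfidf.js are not stated here, and writing the two structures to disk is omitted.

```javascript
// docs: array of { page, tokens } (assumed shape).
function buildTfidf(docs) {
  // Document frequency: number of documents containing each term.
  const df = new Map();
  for (const doc of docs) {
    for (const term of new Set(doc.tokens)) {
      df.set(term, (df.get(term) || 0) + 1);
    }
  }

  // Inverse document frequency over the whole corpus (all pages).
  const idf = new Map();
  for (const [term, n] of df) {
    idf.set(term, Math.log(docs.length / n));
  }

  // Per-document vectors: adding idf once per occurrence gives tf * idf.
  const vectors = docs.map((doc) => {
    const vec = new Map();
    for (const term of doc.tokens) {
      vec.set(term, (vec.get(term) || 0) + idf.get(term));
    }
    return vec;
  });

  // Page membership: page -> indices of the documents that came from it.
  const membership = {};
  docs.forEach((doc, i) => {
    (membership[doc.page] = membership[doc.page] || []).push(i);
  });

  return { idf, vectors, membership };
}

// Example corpus (made up for illustration).
const sampleCorpus = [
  { page: 'a.example', tokens: ['search', 'box'] },
  { page: 'a.example', tokens: ['login'] },
  { page: 'b.example', tokens: ['search'] },
];
const tfidfIndex = buildTfidf(sampleCorpus);
```

Keeping membership separate from the scores lets the IDF statistics come from every website while a query still only ranks documents from one page.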

query.js runs the TF-IDF baseline.
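A page-scoped query might look like the sketch below. The index shape (`vectors` as an array of term-to-weight maps, `membership` as page-to-document-indices) and the scoring rule (sum of the document's weights for the query terms) are assumptions for illustration; query.js may rank differently.

```javascript
// index: { vectors: Array<Map<string, number>>, membership: { [page]: number[] } }
// Returns { docId, score } for the best document on `page`, or null if the
// page has no documents.
function queryPage(index, page, queryTokens) {
  const candidates = index.membership[page] || [];
  let best = null;
  for (const docId of candidates) {
    const vec = index.vectors[docId];
    let score = 0;
    for (const term of queryTokens) {
      score += vec.get(term) || 0; // terms missing from the document add 0
    }
    if (best === null || score > best.score) {
      best = { docId, score };
    }
  }
  return best;
}

// Hand-built example index (weights are made up).
const sampleIndex = {
  vectors: [
    new Map([['search', 0.5], ['box', 1.0]]),
    new Map([['login', 1.0]]),
    new Map([['search', 0.5]]),
  ],
  membership: { 'a.example': [0, 1], 'b.example': [2] },
};
const onA = queryPage(sampleIndex, 'a.example', ['search', 'box']);
const onB = queryPage(sampleIndex, 'b.example', ['search', 'box']);
```

The same phrase resolves to different documents depending on the page, which is why the membership file is needed at query time.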

converter2.js converts that output to results...

Repository contents (19 commits):

dataset-v3
dataset
phrase-node-render-test2
vimium
.gitignore
converter.js
converter2.js
files.js
index.js
package.json
phrase-node-render-v2.tgz
phrasenode.tgz
preprocess.js
query.js
readme.md
results.txt
tfidf.js
utils.js
