duke-of-url

Predicts next URLs from browsing history using NuPIC.

Prerequisites

The texttable Python module (see the texttable home page)

Running

1. Extract the dataset into a file under this repo called data/raw.csv, as described in the Dataset section below.
2. Sanitize the data by running python py/sanitize.py
3. If your dataset is large, truncate the file to speed up swarming: head -n 1500 sanitized.csv > swarm.csv
4. If you did the step above, change description.py to point to swarm.csv.
5. Run a swarm over the dataset: $NUPIC/bin/run_swarm.py --overwrite permutations.py
6. If you changed it in step 4, update description.py to point back to sanitized.csv instead of swarm.csv.
7. Train the model by running python py/train.py
8. Run the interactive shell by running python py/url_predictor.py
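On systems without head (for example Windows), the truncation step can be done in Python instead. This is a minimal sketch, not part of the repo; the function name truncate_csv is hypothetical. It copies the first N lines verbatim, so any header rows at the top of sanitized.csv are kept, just as with head.

```python
import itertools


def truncate_csv(src_path, dst_path, n_lines=1500):
    """Copy the first n_lines lines of src_path to dst_path (like `head -n`).

    Any header rows at the top of the file are kept, since they are simply
    part of the first n_lines lines.
    """
    # newline="" keeps the original line endings untouched on read and write.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        dst.writelines(itertools.islice(src, n_lines))
```

Usage: truncate_csv("sanitized.csv", "swarm.csv"), then continue with step 4.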

Dataset

Chrome on Mac

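Export Chrome history into a pipe-delimited data file called data/raw.csv. The sqlite3 shell separates output columns with | by default, which is why the query replaces any | characters inside URLs: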

/usr/bin/sqlite3 ~/Library/Application\ Support/Google/Chrome/Default/History > data/raw.csv <<EOF
SELECT replace(urls.url, '|', 'b'), urls.visit_count, urls.typed_count, datetime((urls.last_visit_time/1000000)-11644473600, 'unixepoch', 'localtime'), urls.hidden, datetime((visits.visit_time/1000000)-11644473600, 'unixepoch', 'localtime') as visittime, visits.from_visit, visits.transition
FROM urls, visits
WHERE urls.id = visits.url
order by visittime asc;
EOF
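If you want to inspect data/raw.csv from Python before sanitizing it, the sketch below reads the pipe-delimited rows and shows where the constant 11644473600 in the query comes from: Chrome stores times as microseconds since 1601-01-01 UTC, and 11644473600 is the number of seconds between that epoch and the Unix epoch. This is an illustration only; the column names and helper functions are hypothetical and are not what py/sanitize.py does. The conversion here yields UTC, whereas the query additionally applies 'localtime'.

```python
import csv
from datetime import datetime, timedelta, timezone

# Hypothetical names for the columns, in the order produced by the SELECT above.
COLUMNS = [
    "url", "visit_count", "typed_count", "last_visit_time",
    "hidden", "visit_time", "from_visit", "transition",
]

# Seconds between the Chrome/WebKit epoch (1601-01-01 UTC) and the Unix epoch.
WEBKIT_TO_UNIX_SECONDS = 11644473600


def chrome_time_to_utc(microseconds):
    """UTC equivalent of datetime((t/1000000)-11644473600, 'unixepoch')."""
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return unix_epoch + timedelta(
        microseconds=microseconds - WEBKIT_TO_UNIX_SECONDS * 1_000_000)


def read_history(lines):
    """Yield one dict per visit from pipe-delimited sqlite3 output."""
    # QUOTE_NONE: treat '"' in URLs as ordinary characters, not CSV quoting.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip malformed lines rather than guessing at them
        rec = dict(zip(COLUMNS, row))
        for key in ("visit_count", "typed_count", "hidden", "from_visit", "transition"):
            rec[key] = int(rec[key])
        yield rec
```

Usage: with open("data/raw.csv", newline="") as f: visits = list(read_history(f)). The two datetime columns stay as the strings sqlite3 printed.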

To see which columns the urls table contains, run:

/usr/bin/sqlite3 ~/Library/Application\ Support/Google/Chrome/Default/History
> PRAGMA table_info(urls);
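The same inspection can be done with Python's built-in sqlite3 module. This is a minimal sketch; table_columns is a hypothetical helper, not part of the repo. Chrome may hold a lock on the History file while it is running, so running this against a copy of the file is a reasonable precaution.

```python
import sqlite3


def table_columns(conn, table):
    """Return (name, declared_type) pairs for `table`, like PRAGMA table_info."""
    # PRAGMA arguments cannot be bound as ? parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]
```

Usage: copy ~/Library/Application Support/Google/Chrome/Default/History somewhere (for example with shutil.copy), open the copy with sqlite3.connect(path), then call table_columns(conn, "urls"). A table that does not exist yields an empty list.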

TLDs

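Original source of the TLD data (Mozilla's effective_tld_names.dat, the file now published as the Public Suffix List): https://mxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat?raw=1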
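The README does not say how the TLD file is used. Assuming it serves to reduce hostnames to their registrable domain (for example www.example.co.uk to example.co.uk), the sketch below shows the standard Public Suffix List lookup on this file format: // comments and blank lines are skipped, *. marks a wildcard rule, ! marks an exception rule, exception rules win, and otherwise the matching rule with the most labels wins. The function names are hypothetical and this is not taken from the repo's code.

```python
def parse_rules(lines):
    """Split effective_tld_names.dat lines into normal/wildcard rules and exceptions."""
    rules, exceptions = set(), set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # blank line or comment
        rule = line.split()[0].lower()  # a rule is the first token on the line
        if rule.startswith("!"):
            exceptions.add(rule[1:])
        else:
            rules.add(rule)
    return rules, exceptions


def public_suffix(host, rules, exceptions):
    """Return the public suffix of `host` under the given rules."""
    labels = host.lower().strip(".").split(".")
    # Exception rules take precedence; the suffix is the rule minus its leftmost label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in exceptions:
            return ".".join(labels[i + 1:])
    # Otherwise the longest matching normal or wildcard rule wins (i=0 is longest).
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        wildcard = ".".join(["*"] + labels[i + 1:])
        if candidate in rules or wildcard in rules:
            return candidate
    # No rule matched: the implicit default rule "*" makes the last label the suffix.
    return labels[-1]


def registrable_domain(host, rules, exceptions):
    """Public suffix plus one more label, or None if host is itself a suffix."""
    labels = host.lower().strip(".").split(".")
    n = len(public_suffix(host, rules, exceptions).split("."))
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])
```

Usage: load the rules once with parse_rules(open(path, encoding="utf-8")) and call registrable_domain(hostname, rules, exceptions) for each URL's host.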

Repository: oomagnitude/duke-of-url

Folders and files

Latest commit

History

Repository files navigation

duke-of-url

Prerequisites

Running

Dataset

Chrome on Mac

TLDs

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages