
duke-of-url

Predicts next URLs from browsing history using NuPIC.

Prerequisites

The texttable Python module (installable with `pip install texttable`).

Running

  1. Extract the dataset into a file in this repo called data/raw.csv, as described below.
  2. Sanitize the data by running `python py/sanitize.py`.
  3. If your dataset is large, truncate the file to speed up swarming: `cat sanitized.csv | head -1500 > swarm.csv`
  4. If you did the step above, change description.py to point to swarm.csv.
  5. Run a swarm over the dataset: `$NUPIC/bin/run_swarm.py --overwrite permutations.py`
  6. Update description.py to point back to sanitized.csv instead of swarm.csv.
  7. Train the model by running `python py/train.py`.
  8. Run the interactive shell by running `python py/url_predictor.py`.
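The raw export produced in the Dataset section below is pipe-delimited, with columns in the order of the SQL SELECT there. A minimal sketch of reading such a file; the field names here are assumptions for illustration, and the actual py/sanitize.py may parse it differently:

```python
import csv

# Column order follows the SELECT in the export command in the Dataset
# section; these names are illustrative assumptions, not the repo's schema.
FIELDS = ["url", "visit_count", "typed_count", "last_visit",
          "hidden", "visit_time", "from_visit", "transition"]

def read_raw(path):
    """Yield one dict per row of the pipe-delimited history export."""
    with open(path) as f:
        for row in csv.reader(f, delimiter="|"):
            yield dict(zip(FIELDS, row))
```

Pipe is a safe delimiter here because the export replaces any `|` inside URLs before writing the file.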

Dataset

Chrome on Mac

Export Chrome history into a pipe-delimited data file called data/raw.csv:

/usr/bin/sqlite3 ~/Library/Application\ Support/Google/Chrome/Default/History > data/raw.csv <<EOF
SELECT replace(urls.url, '|', 'b'), urls.visit_count, urls.typed_count, datetime((urls.last_visit_time/1000000)-11644473600, 'unixepoch', 'localtime'), urls.hidden, datetime((visits.visit_time/1000000)-11644473600, 'unixepoch', 'localtime') as visittime, visits.from_visit, visits.transition
FROM urls, visits
WHERE urls.id = visits.url
order by visittime asc;
EOF
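The `(t/1000000) - 11644473600` arithmetic in the query above converts Chrome's timestamps, which are stored as microseconds since the WebKit epoch of 1601-01-01, into Unix time. The same conversion in Python, for checking values outside of sqlite:

```python
from datetime import datetime, timezone

# Seconds between the WebKit epoch (1601-01-01) and the Unix epoch
# (1970-01-01); matches the 11644473600 constant in the SQL above.
WEBKIT_EPOCH_OFFSET = 11644473600

def webkit_to_datetime(webkit_us):
    """Convert a Chrome/WebKit timestamp in microseconds to a UTC datetime."""
    unix_seconds = webkit_us / 1_000_000 - WEBKIT_EPOCH_OFFSET
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
```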

If you're curious what's in the urls table, try this:

/usr/bin/sqlite3 ~/Library/Application\ Support/Google/Chrome/Default/History
> PRAGMA table_info(urls);

TLDs

Original source: https://mxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat?raw=1
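The effective TLD list is a plain-text file with one public suffix per line, where `//` starts a comment line. A hypothetical sketch of loading it (`load_suffixes` is illustrative; the repo's own scripts may parse it differently):

```python
def load_suffixes(text):
    """Parse effective_tld_names.dat content into a set of suffixes.

    Lines starting with '//' are comments; blank lines are ignored.
    """
    suffixes = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            suffixes.add(line)
    return suffixes
```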
