
brainstorming limitations and features #13

Open
whilei opened this issue Jun 19, 2017 · 5 comments

whilei commented Jun 19, 2017

Features which may or may not already exist, need refining, or be in the works...

  • randomized (but not TOO randomized) intervals... as below, general pattern mimicry would be ideal; picking uniformly at random between 1-10 seconds is not. Humans are not just gravel; they are also rocks and boulders.
  • customizable word lists
    • as crazy as it sounds, a Chrome plugin to record actual searches and so seed the mimicry with real starting data might be effective (again, obfuscation vs. privation)
  • a variety of request types, i.e. POST, PATCH, DELETE... trickier, but filtering out everything except GETs would be the first thing I'd do when looking for real human logs
  • controlled variety of 'quest' depth. A Google search plus one click, followed by a completely unrelated search plus one click, is not convincing.

e.g., my computer visiting 1000 random websites per day at 5 pages per minute is not going to be anywhere near convincing, given that I normally visit a handful of sites in bursts (and that pattern has already been logged)


abstracted:

  • usage patterns that are not static randomness, but sporadic and clumpy, reasonably nonlinear
  • mimicry of actual/personalizable trends in content
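The "sporadic and clumpy" timing described above could be sketched as a simple two-state model: short log-normal pauses between pages while inside a browsing session, and long gaps between sessions. This is only an illustrative sketch; the state-transition probability and all distribution parameters are made up here, not fitted to real logs.

```python
import random

def bursty_intervals(n, rng=random):
    """Generate n wait times (in seconds) that cluster into bursts
    rather than being uniformly random.

    Hypothetical two-state model: while "in a session", pages arrive
    a few seconds apart; between sessions there are long gaps.
    """
    intervals = []
    in_session = True
    for _ in range(n):
        if in_session:
            # pages within a burst: a few seconds apart (log-normal)
            intervals.append(rng.lognormvariate(1.5, 0.6))
            # illustrative ~15% chance the session ends after each page
            if rng.random() < 0.15:
                in_session = False
        else:
            # gap between sessions: ten minutes to two hours
            intervals.append(rng.uniform(600, 7200))
            in_session = True
    return intervals
```

A seeded `random.Random` instance can be passed as `rng` to make runs reproducible while testing.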

really abstracted:

  • better to make a handful of knitting needles than a busload of thumbtacks

I've said enough. Please close issue and destroy GitHub after reading.
🍺


XayOn commented Jun 26, 2017

What about collecting actual browsing data from volunteers, analysing the different behaviours, and deciding what the best option is?

@NeuroWinter

@XayOn I think that would be the best idea. I will look into my browsing history today to see what I can see.


t-mullen commented Jul 5, 2017

To add to this, some of the modules can generate traffic that could be harmful, if not outright incriminating. Without some kind of "safe mode", users could be putting themselves at real risk.

The project could take advantage of services like Google SafeSearch or MyWOT, but this would probably make the real traffic easier to spot at the same time.
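One way such a "safe mode" might work without relying on an external service is to screen each generated URL against a local blocklist before requesting it. This is a minimal sketch under that assumption; the domain and keyword entries below are placeholders, not a vetted list.

```python
from urllib.parse import urlparse

# Placeholder blocklists -- a real safe mode would ship curated data.
BLOCKED_DOMAINS = {"example-bad-site.test"}
BLOCKED_KEYWORDS = {"torrent", "crack", "exploit"}

def is_safe(url):
    """Return False if the URL's domain is blocklisted (including
    subdomains) or its path/query contains a blocked keyword."""
    parsed = urlparse(url)
    domain = parsed.netloc.lower()
    if any(domain == d or domain.endswith("." + d) for d in BLOCKED_DOMAINS):
        return False
    haystack = (parsed.path + "?" + parsed.query).lower()
    return not any(kw in haystack for kw in BLOCKED_KEYWORDS)
```

The generator would simply skip any candidate link for which `is_safe` returns False, at the cost of making the synthetic traffic slightly more uniform (the trade-off noted above).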

eth0izzle (Owner) commented Jul 6, 2017

@XayOn @NeuroWinter sounds great! Nirsoft has a free tool at http://www.nirsoft.net/utils/browsing_history_view.html to extract history; if you anonymise and share your exports, we can start parsing them, finding patterns, etc. For Chrome this could be pretty helpful: https://chrome.google.com/webstore/detail/web-historian-web-history/chpcblajbmmlbhecpnnadmjmlbhkloji
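As a starting point for the parsing step, a minimal sketch of extracting timing patterns from a history dump. It assumes the export has been anonymised down to a CSV with hypothetical `domain` and `visited_at` (ISO 8601) columns; the actual BrowsingHistoryView export format differs and would need mapping first.

```python
import csv
from collections import defaultdict
from datetime import datetime

def visit_gaps(csv_path):
    """Return, per domain, the gaps in seconds between consecutive
    visits -- the raw material for modelling bursty behaviour.

    Assumed columns: 'domain', 'visited_at' (ISO 8601 timestamp).
    """
    visits = defaultdict(list)
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            visits[row["domain"]].append(
                datetime.fromisoformat(row["visited_at"]))
    gaps = {}
    for domain, times in visits.items():
        times.sort()
        gaps[domain] = [
            (b - a).total_seconds() for a, b in zip(times, times[1:])
        ]
    return gaps
```

A histogram of these gaps per volunteer would show the "clumpy" session structure discussed above and could be used to fit the interval generator.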

@rationalcoding yes, I agree. Would you mind creating an issue and taking ownership? Creating a list of English profanity words and cross-referencing it with the chosen words should do the trick for the majority of cases. Not sure how to tackle Alexa's top 1M as it contains a lot of porn sites.

@NeuroWinter

I have just got back from holiday and I am willing to work on this a bit now.

What sort of information are we looking for from a history dump?
