Monitoring web page changes and keeping track of changes. #4

Open
jmatsushita opened this issue Aug 20, 2015 · 6 comments

Comments

@jmatsushita
Member

Quite a number of pages besides terms of service would need to be monitored for changes. I'm wondering how Tosback2 does it. @pde @pierreozoux @hugoroy

Examples of such pages would be:

  • Release Notes or Change logs
  • Documentation (of various kinds: for users, developers, ...)
  • Build Documentation
  • Security Documentation (Threat model)
  • Community guidelines
  • UX Specs

@pierreozoux

I'm quite new to Tosback2, so I can't comment here.

@jmatsushita
Member Author

Maybe @Vinnl knows?

@Vinnl

Vinnl commented Aug 27, 2015

I wasn't involved with Tosback, but I looked into it a little bit. Actually, the README contains a pretty clear description (although the source code seems to have some old mess included): https://github.com/tosdr/tosback2

So basically, there's a configuration file for every website to crawl, which defines the location of the terms of service and, optionally, the DOM element that contains them. The software isn't that complicated; collecting the pages to monitor appears to be a manual process.
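To make the idea concrete, here is a minimal sketch in Python. It is not Tosback2's actual code or rule format (the real project is a Ruby script); `SiteRule`, `element_id`, and `extract_text` are names made up for illustration, and the example works on an HTML string instead of fetching the URL.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class SiteRule:
    """Hypothetical per-site rule: where the document lives and which element holds it."""
    site: str
    url: str
    element_id: Optional[str] = None  # None means "keep the whole page"


class _ElementText(HTMLParser):
    """Collects text inside the first element whose id matches (or the whole page)."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        # depth > 0 means "currently inside the region we want to keep"
        self.depth = 1 if element_id is None else 0
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # left the target element; ignore the rest

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(rule: SiteRule, html: str) -> str:
    """Return the text selected by the rule, one text node per line."""
    parser = _ElementText(rule.element_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = (
    '<html><body><nav>Menu</nav>'
    '<div id="tos"><h1>Terms</h1><p>Be nice.</p></div>'
    '<footer>Contact</footer></body></html>'
)
rule = SiteRule("example", "https://example.com/terms", element_id="tos")
print(extract_text(rule, page))
```

Selecting only the element that holds the document keeps navigation and footer changes out of the recorded history, which is presumably why the DOM element is part of the per-site configuration.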

Is that what you want to know?

@jmatsushita
Member Author

Yes, that is exactly what I needed. The fog of confusion has lifted and the README now makes sense! So on the server running this Ruby script, the output goes to a checked-out copy of the git repo and is regularly committed and pushed to GitHub? Then the web frontend just piggybacks on the GitHub diff view.
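If that guess about the workflow is right, the loop could look roughly like the sketch below. This is Python for illustration only, not the project's Ruby code; `save_snapshot`, `git_commands`, and `publish` are hypothetical names, and it assumes `repo_dir` is an existing git checkout with a configured remote.

```python
import subprocess
from pathlib import Path
from typing import List


def save_snapshot(repo_dir: Path, relative_path: str, text: str) -> bool:
    """Write the crawled text into the working tree; return True if it changed."""
    target = repo_dir / relative_path
    if target.exists() and target.read_text(encoding="utf-8") == text:
        return False  # nothing new, so nothing to commit
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return True


def git_commands(relative_path: str, message: str) -> List[List[str]]:
    """The git invocations a scheduled job would run after a change."""
    return [
        ["git", "add", relative_path],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def publish(repo_dir: Path, relative_path: str, text: str) -> bool:
    """Save the snapshot and, only if it changed, commit and push it."""
    changed = save_snapshot(repo_dir, relative_path, text)
    if changed:
        for argv in git_commands(relative_path, f"Update {relative_path}"):
            # check=True stops the run if any git step fails
            subprocess.run(argv, cwd=repo_dir, check=True)
    return changed
```

With this shape, every change to a monitored page becomes one commit, so the history and diff views of the hosting service double as the change log without any extra storage layer.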

I see that @JimmStout wrote most of the code! Can I ask if there has been a discussion on these design choices? Were there great ideas about where it could go? Are there other things out there that do similar things and that could be reused? I do like how this is such a small code base, but I'm curious about the context to help me decide if I should adopt it and work on it or expand my search.

@jmatsushita
Member Author

Seems like there's a workflow aspect (crawl_reviewed) that's not super clear. Maybe @hugoroy can shed some light from the user perspective?

@hugoroy

hugoroy commented Aug 28, 2015

Sorry, this is a bit too old for me to remember. Last time I looked into the technical aspects was in 2013 (right before Snowden!)

You may find some useful information on http://jimmstout.com/ (Jimm's blog)
