Screenshot / webshot online service #63

Closed
rufuspollock opened this issue Sep 12, 2013 · 16 comments

Comments

@rufuspollock
Owner

I want a service for taking screenshots.

Research

What exists? Is there something open and free?

Existing libs we could use

brenden/node-webshot#25 (comment)

@rossjones

I started something simple a while back ( https://github.com/rossjones/urlshotserver ) as a replacement for the ScraperWiki screenshot app (which uses PyQT and embedded webkit). It isn't complete, but I seem to recall it did actually work. Should only be a few more lines to get the callback working.

node-webshot is likely to be more complete.

@simong

simong commented Sep 21, 2013

Having worked with node-webshot and node.js extensively in the past, I feel this is something I could take on and contribute back. Is Heroku the desired platform for okfn-related services, or would something that could be run in a stand-alone mode be preferred?

I'm doing this partly to support @okfn and partly as a fun way to learn new stacks/services so I wouldn't mind doing it on a totally new stack.

@rufuspollock
Owner Author

@simong great to hear you could contribute here!

In terms of the setup, our strong preference would be nodejs or python, using the express framework for nodejs (and flask for python).

Let's assume you go with nodejs for the moment (seems a natural fit here - it's an IO-heavy, async-style app) then:

  • app should work on its own, in "standalone" mode (after all, that's how devs will test and use it)
  • in terms of deployment we're pretty flexible. We just use Heroku a lot for these because it's free and pretty simple to deploy. For Heroku deployment all you really do is add a Procfile (and use environment variables for essential config, which is pretty natural), so you aren't changing the style of the app
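
As a rough illustration of the Procfile-plus-environment-variables approach: the Procfile for a Node app is typically a single line such as "web: node server.js" (the file name server.js is an assumption here), and the app reads its config from the environment with defaults for standalone use. A minimal sketch, in which loadConfig and CACHE_SECONDS are hypothetical names; PORT is the variable Heroku sets for web processes:

```javascript
// Sketch: read essential settings from environment variables, falling back to
// defaults so the app still runs standalone on a developer machine.
// loadConfig and CACHE_SECONDS are hypothetical; PORT is provided by Heroku.
function loadConfig(env) {
  return {
    port: Number(env.PORT) || 3000,
    cacheSeconds: Number(env.CACHE_SECONDS) || 0,
  };
}

const config = loadConfig(process.env);
console.log(`Would listen on port ${config.port}`);
```

Because nothing in the function is Heroku-specific, the same code path is used locally and in deployment.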

If you are looking for existing nodejs apps which run on Heroku, here are a couple that people have built in Labs:

@opsb

opsb commented Sep 23, 2013

I happened to need something similar, so I built this on top of node-webshot over the weekend: https://github.com/opsb/node-webshot-server . It includes ImageMagick for resizing and Heroku config, and I use Amazon CloudFront in front of it for cache control.

@rufuspollock
Owner Author

@opsb would you be happy for us and @simong to reuse it? (Could you also add a license to your code?)

@opsb

opsb commented Sep 23, 2013

@rgrp sure, I've added a BSD license.

@rufuspollock
Owner Author

@simong what do you think of taking this on and building on @opsb's excellent work? We'd want to deploy at screenshot.okfnlabs.org

I can help get you access to necessary facilities including Heroku :-)

@simong

simong commented Oct 23, 2013

Hey @rgrp @opsb ,

Apologies for responding so late, this just got buried under some other work.

I think I can build upon @opsb's work. I'd be happy to deploy the app for okfn on Heroku.

In case we want something more "robust" we could also do the following things:

  • Store screenshots in S3 for re-use / linking
  • Generate more sizes / allow for size specifications in the API call
  • Add in an external queue (RabbitMQ?) so we can put in more worker nodes in case the service gets overloaded

WDYT?

@rufuspollock
Owner Author

@simong all sounds very good - suggest we start with simplest thing possible and progressively enhance.

@opsb

opsb commented Oct 23, 2013

Hey @rgrp @simong

I use Amazon CloudFront in front of the service, which makes reuse nice and fast (it allows the cache key to include the query string). I was thinking of using a standard format in the query string to pass options through to webshot, something along the lines of:

?webshot[windowSize.width]=300&webshot[windowSize.height]=150

You could make a prettier version, but this way the translation from query string to webshot options would be simple. Adding in a queue sounds like a good idea. I don't have any experience with integrating node.js with queues: is it possible to keep a request open while a queue worker does the job, or would you need to use a web hook for when the image was ready?
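
A minimal sketch of that query-string-to-options translation, assuming the webshot[path.to.option] format shown above. parseWebshotQuery is a hypothetical helper, not part of node-webshot, and converting purely numeric values to numbers is an assumption made for illustration:

```javascript
// Sketch: turn "?webshot[windowSize.width]=300" style parameters into a nested
// options object. parseWebshotQuery is a hypothetical helper.
function parseWebshotQuery(queryString) {
  const params = new URLSearchParams(queryString);
  const options = {};
  // Refuse path segments that could tamper with object prototypes.
  const forbidden = new Set(['__proto__', 'constructor', 'prototype']);

  for (const [key, value] of params) {
    const match = /^webshot\[([^\]]+)\]$/.exec(key);
    if (!match) continue; // ignore parameters outside the webshot[...] namespace

    const path = match[1].split('.');
    if (path.some((part) => part === '' || forbidden.has(part))) continue;

    // Walk (and create) intermediate objects, then assign the leaf value.
    let target = options;
    for (const part of path.slice(0, -1)) {
      if (typeof target[part] !== 'object' || target[part] === null) {
        target[part] = {};
      }
      target = target[part];
    }
    const leaf = path[path.length - 1];
    // Query values arrive as strings; convert purely numeric ones.
    target[leaf] = /^\d+$/.test(value) ? Number(value) : value;
  }
  return options;
}

console.log(
  parseWebshotQuery('?webshot[windowSize.width]=300&webshot[windowSize.height]=150')
);
```

In a real service the resulting object would still need checking against the options the screenshot library actually accepts before being passed through.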

@simong

simong commented Oct 23, 2013

Although I feel that we should be using POSTs for this kind of thing, GET certainly makes testing/using from the URL bar easier.
What about:

GET /webshot?url=http://www.google.com&width=300&height=150

OR

POST /webshot
url=http://www.google.com
width=300
height=150

I don't think it would be that much work to translate the query/form parameters into webshot parameters.
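
A rough sketch of that translation, assuming the flat url/width/height parameters proposed above and the windowSize option mentioned earlier in the thread. toWebshotRequest and parseDimension are hypothetical helpers, and the 1 to 4096 size limit is an arbitrary assumption:

```javascript
// Sketch: map flat url/width/height parameters onto a webshot-style options
// object. Both helpers and the size limits are assumptions for illustration.
function toWebshotRequest(params) {
  const url = params.url;
  if (typeof url !== 'string' || url.trim() === '') {
    throw new Error('The "url" parameter is required');
  }

  const options = {};
  const width = parseDimension(params.width, 'width');
  const height = parseDimension(params.height, 'height');
  if (width !== undefined || height !== undefined) {
    options.windowSize = {};
    if (width !== undefined) options.windowSize.width = width;
    if (height !== undefined) options.windowSize.height = height;
  }
  return { url: url.trim(), options };
}

// Accept only whole numbers in a sane range; treat missing values as "unset".
function parseDimension(raw, name) {
  if (raw === undefined || raw === '') return undefined;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 1 || value > 4096) {
    throw new Error(`"${name}" must be an integer between 1 and 4096`);
  }
  return value;
}

// The same function can serve GET (query string) and POST (form body) input:
const fromQuery = Object.fromEntries(
  new URLSearchParams('url=http://www.google.com&width=300&height=150')
);
console.log(toWebshotRequest(fromQuery));
```

Keeping the mapping in one plain function means the GET and POST handlers differ only in where they read the parameters from.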

Adding a queue in node can be done with something like amqp. The downside of adding a queue is that we'll need a way to transfer the image from the worker node to the user doing the request. This can open a bit of a can of worms, as then you need:

  • file storage
  • the user will need to poll until their URL(s) have been processed
  • some kind of session management
  • user management maybe? registration, etc ...

which moves away from the simple (and much lower-overhead) app that there is now.
None of the above is really hard, it's just extra work that needs to be done.
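
To make the polling flow concrete, here is a small sketch in which an in-memory Map stands in for both the external queue and the file storage. All names (submitJob, completeJob, pollJob) are hypothetical, and a real setup would replace the Map with something like an AMQP queue plus shared storage:

```javascript
// Sketch of the polling flow: submit a URL, get a job id, poll for status.
// An in-memory Map stands in for the queue and storage; names are hypothetical.
const jobs = new Map();
let nextId = 1;

function submitJob(url) {
  const id = String(nextId++);
  jobs.set(id, { url, status: 'pending', imagePath: null });
  return id;
}

// A worker would call this once the screenshot has been written to storage.
function completeJob(id, imagePath) {
  const job = jobs.get(id);
  if (!job) throw new Error(`Unknown job ${id}`);
  job.status = 'done';
  job.imagePath = imagePath;
}

// What the client sees on each poll.
function pollJob(id) {
  const job = jobs.get(id);
  if (!job) return { status: 'unknown' };
  return { status: job.status, imagePath: job.imagePath };
}
```

Even this toy version shows the extra moving parts: job ids, a status store, and somewhere for the finished image to live.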

@rgrp Deploying the app (as it is now) is just a matter of following @opsb's README.
I currently have it running @ http://node-webshot-server-simong.herokuapp.com/?url=google.com

@rufuspollock
Owner Author

@simong it would be nice to have the app at webshot.okfnlabs.org. A CNAME has been set up from webshot.okfnlabs.org to your herokuapp so all you need to do is:

heroku domains:add webshot.okfnlabs.org

Could you also do (so other labs folks can have access as needed):

heroku sharing:add sysadmin@okfn.org

Caching etc

@simong @opsb I'm wondering about caching - what happens if a website changes in a week? I'd want to get the webshot from today, not last week. If one wants to cache by default for ease of use, perhaps add some kind of refresh or latest flag to force a redo (e.g. ?refresh=1).

If one is storing the content into s3 (for permanence and caching) we might want to have some structure like: url/width/height/date. You could then just drop the date if you don't need it.
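
One possible reading of that url/width/height/date layout, as a sketch. cacheKey and shouldUseCache are hypothetical helpers; percent-encoding the URL so it stays a single path segment, and using the UTC calendar day for the date part, are both assumptions:

```javascript
// Sketch of the url/width/height/date key layout. Helper names and the
// encoding choices are assumptions for illustration.
function cacheKey({ url, width, height, date }) {
  const parts = [encodeURIComponent(url), String(width), String(height)];
  if (date) {
    // Use the UTC calendar day so every server computes the same key.
    parts.push(date.toISOString().slice(0, 10));
  }
  return parts.join('/');
}

// A refresh flag (e.g. ?refresh=1) would bypass the cached copy.
function shouldUseCache(query) {
  return query.refresh !== '1';
}

console.log(cacheKey({ url: 'http://www.google.com', width: 300, height: 150 }));
```

Dropping the date argument gives the shorter url/width/height key for callers who do not care which day the screenshot was taken.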

Feature idea - specify your own filename

Relating to caching stuff but somewhat different (and more of a feature) would be allowing users to specify a short name to save their screenshot at (a bit like bit.ly but for screenshots). E.g. you could do ?filename=... and then you could get that screenshot forever at:

webshot-service/f/{filename}.png
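
If user-chosen names are allowed, they would need validating before being used in a path. A minimal sketch, where namedScreenshotPath, the allowed character set and the 64-character limit are all assumptions:

```javascript
// Sketch: validate a user-supplied short name and build the /f/{filename}.png
// path. The helper name and the allowed characters are assumptions.
function namedScreenshotPath(filename) {
  // Letters, digits, hyphens and underscores only: no dots or slashes, so the
  // name cannot escape the /f/ prefix.
  if (typeof filename !== 'string' || !/^[A-Za-z0-9_-]{1,64}$/.test(filename)) {
    throw new Error('Invalid filename');
  }
  return `/f/${filename}.png`;
}

console.log(namedScreenshotPath('okfn-home'));
```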

Queue

Let's keep it simple for the moment. If we get a lot of traffic we can start worrying about it, but I think it should be fine for now.

@rufuspollock
Owner Author

@simong thanks for adding the domain alias.

I wonder whether it would be worth making the base page of the site a proper homepage with a short intro and instructions (perhaps with a form where you can post a url, plus instructions about the "api" ...)

@rufuspollock
Owner Author

@simong any thoughts re the above? Also where's your repo - if I or others would like to contribute it would be nice to know what repo to fork :-)

@rufuspollock
Owner Author

OK, the official repo for reporting issues/suggestions with webshot.okfnlabs.org etc. is now at https://github.com/okfn/webshot/issues

@rufuspollock
Owner Author

FIXED. Marking as fixed - we now have a functional service (major props to @simong and especially @opsb) and we have a repo where we can raise specific issues - viz https://github.com/okfn/webshot/issues - feel it is now time to mark as FIXED. w00t!
