Try out Python ETL scripts on Windows box in city #24
I can stop by and see if I can get this set up. Basically we'd just need to download/install: Cygwin http://www.cygwin.com/

Ugh, writing this all out makes me a sad panda. Maybe we will use Heroku.
I'm not sure if you guys have used Cygwin before, but I personally did not have a good experience with it. I think it might be easier to set up PuTTY and a DigitalOcean (DO) instance for $5/mo.

On March 3, 2014 at 10:24:14 AM, Dave Guarino wrote: "I can stop by and see if I can get this set up."
Hey Dave! I can download and install Cygwin, Python, and pip right now. We can test whether the script works and whether there's anything else we still need to do to get this all up and running. I'm free at work all day today and all day tomorrow. When works best for you to stop by? Thanks everybody!!!
@daguar I can volunteer my personal server to do this.
heroku++
(as it turns out, Heroku is a difficult platform to do Unixy things on, like
@ted27 I honestly think it might be more work than it's worth, because that isn't exactly Heroku's use case. You could probably do it with a message queue and a worker dyno, but that's $30 a month. I think using a DigitalOcean instance or someone's personal server (like mine!) is the best way forward.
Agreed, it's not really Heroku's use case, but you can do it for free with the job scheduler (https://devcenter.heroku.com/articles/scheduler): the dyno cost is simply the time it takes the job to run, so a nightly 5-minute task like this one will be way under the limit. We could also throw up a simple single-page Python service that just displays the contents of the S3 bucket the job saves to. @ted27: Thanks for getting started with an attempt on Heroku! I tried deploying and got
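A minimal sketch of that single-page listing service, using only the standard library's WSGI server. It assumes `list_keys` is any zero-argument function returning object names; in the real setup it would wrap an S3 listing call (for example boto3's `list_objects_v2`), which is left out here, so the demo lists the current folder instead. `render_index` and `make_app` are illustrative names.

```python
import html
import os
from wsgiref.simple_server import make_server


def render_index(keys):
    """Return an HTML page listing the given keys, sorted and HTML-escaped."""
    items = "\n".join(f"<li>{html.escape(k)}</li>" for k in sorted(keys))
    return (
        "<!doctype html>\n<title>ETL output</title>\n"
        f"<ul>\n{items}\n</ul>\n"
    )


def make_app(list_keys):
    """Build a WSGI app that re-reads the listing on every request."""
    def app(environ, start_response):
        body = render_index(list_keys()).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


if __name__ == "__main__":
    # Stand-in listing: files in the current folder, NOT the S3 bucket.
    app = make_app(lambda: os.listdir("."))
    # Heroku web dynos are told which port to bind via the PORT variable.
    port = int(os.environ.get("PORT", "8000"))
    make_server("", port, app).serve_forever()
```

Passing the listing function in keeps the page logic independent of where the files actually live.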
I had no idea Heroku had a free job scheduler. Cooool. Thanks for the
Yeah, it's pretty badass. This is actually the exact use case for Docker. But I'm a little more comfortable having it on a service we know will stick around and stay free, so I think futzing around with Heroku is the right call.
P.S., @ted27: one of the issues with Heroku is its ephemeral, non-writeable disk. You can, however, write to /tmp. This means that (a) the scripts can't write any files to the folder they're located in (which is how they're currently written), and (b) any data written to /tmp will be gone once the script completes. So the job I'd probably set up would be:
Alternatively, the scripts could be modified to always work in /tmp, but I think having them write to the current folder by default is the more scripty, reasonable-to-expect behavior.
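One way to keep the current-folder default while letting a Heroku job point the scripts at /tmp is an environment-variable override. This is only a sketch: `ETL_OUTPUT_DIR`, `output_dir`, and `save_text` are hypothetical names, not part of the netfile-etl scripts.

```python
import os
from pathlib import Path

# Hypothetical override variable; the existing scripts do not read it.
OUTPUT_DIR_VAR = "ETL_OUTPUT_DIR"


def output_dir(environ=None):
    """Return the folder to write ETL files into.

    Defaults to the current folder; a Heroku job could set
    ETL_OUTPUT_DIR=/tmp, since /tmp is writable there.
    """
    environ = os.environ if environ is None else environ
    target = Path(environ.get(OUTPUT_DIR_VAR) or Path.cwd())
    target.mkdir(parents=True, exist_ok=True)
    return target


def save_text(name, text, environ=None):
    """Write text to name inside output_dir() and return the full path."""
    path = output_dir(environ) / name
    path.write_text(text, encoding="utf-8")
    return path
```

Because /tmp is wiped once the dyno exits, the job would still need to upload the files (e.g. to S3) before the script finishes.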
Agreed, @daguar, on the general approach. By combining various buildpacks I was able to get wget and unzip to run. However, you have to compile everything yourself, and OpenSSL (required for wget to fetch files from an SSL server) is not compiled in by default. So, yeah, lots of manual tweaking to get it working. But the main benefit of Heroku (as I see it) is shared ownership of the project: you can invite people to collaborate on it with you. That way we don't rely on any single person's server or recurring attention for the data to populate!
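If the downloads move into Python instead of wget, the same OpenSSL question applies to the Python runtime itself. A quick check, sketched with an illustrative function name:

```python
def openssl_version():
    """Return the OpenSSL version string Python was built with, or None.

    Importing ssl raises ImportError when the interpreter was compiled
    without OpenSSL; in that case urllib cannot open https:// URLs.
    """
    try:
        import ssl
    except ImportError:
        return None
    return ssl.OPENSSL_VERSION


if __name__ == "__main__":
    version = openssl_version()
    print(version or "No OpenSSL: https:// downloads will fail")
```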
@ted27: Oh boy. So does your Heroku instance have it running now? (Just
If you'll be around tonight, we can hack on this and get it working.
@migurski also pointed me to his notes on getting packaged binaries working with Heroku: https://github.com/codeforamerica/heroku-buildpack-pygeo/blob/master/Build.md
Heroku has
Also, Python has built-in support for zip files (the standard-library zipfile module). It's pretty easy to use, so you could possibly skip compiling binaries altogether.
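A minimal sketch of replacing the unzip binary with the zipfile module; `unzip` is a hypothetical helper name chosen to mirror `unzip -d`:

```python
import zipfile
from pathlib import Path


def unzip(archive_path, dest_dir):
    """Extract every member of archive_path into dest_dir.

    Returns the extracted file paths (directory entries are skipped).
    Standard library only, so no unzip binary has to be built.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist() if not name.endswith("/")]
```

Per the Python docs, extraction strips absolute-path prefixes and `..` components from member names, though archives from untrusted sources should still be inspected first.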
@migurski: Thanks! And yeah, most of this is my laziness (I wrote these scripts super quickly, and I know all of it could actually be done in pure Python.) I used
Quick attempt to get unzip built on Heroku:
…and
…and the result, which should Just Work™: http://dbox.teczno.com/unzip.gz
Okay, @ted27, I've replaced wget with curl in my repo if you want to rebase. I'll take a look at Python vs. the unzip buildpack shortly.
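Along the same lines, the curl step could be done in pure Python with urllib, which fetches https:// URLs through Python's own ssl module rather than a separately compiled binary (assuming the runtime was built with OpenSSL). A sketch with a hypothetical `download` helper:

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest_path, timeout=60):
    """Fetch url and stream it to dest_path, roughly `curl -o dest_path url`.

    Raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)  # expose the file under its final name only when complete
    return dest
```

Writing to a `.part` file and renaming it at the end means a failed fetch never leaves a truncated file under the final name.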
I think I can actually save us all from the Heroku + S3 steps and run this on Lauren's comp, which now has Vagrant and an Ubuntu VM! I'm documenting the (incomplete) setup here: daguar/netfile-etl#2
We've got it working on Lauren's comp!!! Next steps for this are:
Dave:
Lauren:
Lauren: I'm adding this issue because I think a next step is to see whether you can run my Netfile data ETL scripts on your Windows box:
https://github.com/daguar/netfile-etl
An alternative is to run them on an external Unix-y server (like Heroku or elsewhere) and then set up a job to download their output every day to a computer within the city.
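For that second option, the city computer would only need a small scheduled script that pulls the day's output. A sketch under stated assumptions: `pull_daily` is a hypothetical helper, and the URLs stand in for wherever the external server publishes the ETL output, which this thread doesn't specify.

```python
import datetime as dt
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def pull_daily(urls, root, today=None):
    """Copy each URL's file into root/YYYY-MM-DD/ and return the local paths.

    Files already present for that day are skipped, so re-running is safe.
    """
    day = (today or dt.date.today()).isoformat()
    folder = Path(root) / day
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        name = Path(urlparse(url).path).name
        if not name:
            raise ValueError(f"URL has no file name: {url}")
        target = folder / name
        if not target.exists():
            part = target.with_name(name + ".part")
            with urllib.request.urlopen(url, timeout=60) as resp, open(part, "wb") as out:
                shutil.copyfileobj(resp, out)
            part.replace(target)  # rename only after a complete download
        saved.append(target)
    return saved


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python pull_daily.py <local-root> <url> [<url> ...]
    print(pull_daily(sys.argv[2:], sys.argv[1]))
```

On the Windows box this could be run once a day from Windows Task Scheduler; since completed files are skipped, a rerun after a failure only fetches what is missing.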