Divide scraping from saving #109

Closed
Vanuan opened this Issue Jul 10, 2014 · 5 comments


3 participants

@Vanuan
Contributor
Vanuan commented Jul 10, 2014

pupa is a great framework for scraping data. But I don't like the ever-changing database it uses. Yesterday it used MongoDB, today it uses PostGIS, tomorrow it'll use something else. It would be great if one could export scraped data to a CSV file (or some other portable destination).

@paultag
Member
paultag commented Jul 11, 2014

That's what the JSON is for :)

Additionally, we have zero intent to change DBs again (at least as far as I know).

@jamesturk
Member

To elaborate further, we do already use JSON as an intermediary; you can run w/ --scrape to avoid the DB import if you desire (but we don't commit strongly to intermediate format stability at this time).
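For anyone wanting the CSV export Vanuan asked about, a rough sketch of post-processing the JSON files a scrape-only run leaves behind (the output directory, file naming, and field names here are assumptions about a particular scraper's output, not part of pupa's documented API):

```python
import csv
import glob
import json
import os


def json_dir_to_csv(json_dir, csv_path, fields):
    """Flatten one JSON file per row into a CSV, keeping only the
    requested top-level scalar fields.

    Nested structures (lists of memberships, contact details, etc.)
    don't fit a flat CSV and would need their own tables, so they are
    skipped here.
    """
    with open(csv_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for path in sorted(glob.glob(os.path.join(json_dir, "*.json"))):
            with open(path) as f:
                record = json.load(f)
            # Keep only scalar values; anything nested is dropped.
            writer.writerow({k: v for k, v in record.items()
                             if isinstance(v, (str, int, float))})


# Hypothetical usage: "_data" is assumed to be where the scrape wrote
# its JSON; the field list depends entirely on what was scraped.
# json_dir_to_csv("_data", "people.csv", ["name", "district"])
```

Since the intermediate format is explicitly not stable, a script like this may need adjusting between pupa releases.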

Sorry if using pupa has been frustrating as we make changes, but as Paul said, we aren't changing DBs again as we build towards a stable release. There was never a truly stable release of a MongoDB pupa; it was still in the proof-of-concept phase. pupa 0.4 is imminent and will put us on a path towards stability. The DB format now is based on the OCD enhancement proposal process, so changes will be intentionally slow and require justification, meaning people can rely upon it.


@Vanuan
Contributor
Vanuan commented Jul 11, 2014

Thanks, very informative!

@Vanuan Vanuan closed this Jul 11, 2014
@Vanuan
Contributor
Vanuan commented Jul 11, 2014

Still, having to install the whole Django stack and set up a database just to write text files is kind of overkill.

@paultag
Member
paultag commented Jul 11, 2014

@Vanuan I agree, which is why you shouldn't need a Django database to scrape! :)

Just --import :)

If it currently does, it's a bug - we have an open issue #87 about that
