
Recursive function causing a stack overflow #23

Closed
esagara opened this issue Nov 5, 2013 · 5 comments

Comments

@esagara
Contributor

esagara commented Nov 5, 2013

https://github.com/propublica/upton/blob/master/lib/upton.rb#L314-L326

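This recursive pagination will cause a stack overflow with large paginations (more than roughly 2,300 pages), because every page adds another stack frame. Possible iterative solution: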

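```ruby
def get_instance(url, pagination_index = 0, options = {})
  resp = self.get_page(url, @debug, options)
  i = pagination_index.to_i
  next_resp = resp
  # Loop instead of recursing, so stack depth no longer grows with page count.
  # Stop on the first empty page; testing the accumulated resp would never end.
  until next_resp.empty?
    next_url = self.next_instance_page_url(url, i += 1)
    break if next_url == url # URL stopped changing: no further pages
    next_resp = self.get_page(next_url, @debug, options)
    resp += next_resp
  end
  resp
end
```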
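A self-contained way to check an iterative get_instance is to run it against a stubbed page source instead of real HTTP. In the sketch below, FakeScraper, its URL scheme, and the stub get_page and next_instance_page_url are hypothetical stand-ins for Upton's methods, not its actual implementation. The loop stops on the first empty page rather than testing the accumulated result (which stays non-empty once any page has been fetched), and it runs at constant stack depth however many pages there are.

```ruby
# Hypothetical stand-in for an Upton scraper: serves `page_count` numbered
# pages, then empty strings, so pagination can be exercised without HTTP.
class FakeScraper
  def initialize(page_count)
    @page_count = page_count
    @debug = false
  end

  # Stub: returns "page-N;" for N in 1..page_count, "" otherwise.
  # Assumes the page number is the trailing digits of the URL.
  def get_page(url, _stash = false, _options = {})
    n = url[/\d+\z/].to_i
    n.between?(1, @page_count) ? "page-#{n};" : ""
  end

  # Stub: builds the URL for pagination index i by swapping the trailing digits.
  def next_instance_page_url(url, i)
    url.sub(/\d+\z/, i.to_s)
  end

  # Iterative pagination: no recursion, so the stack does not grow per page.
  def get_instance(url, pagination_index = 0, options = {})
    resp = get_page(url, @debug, options)
    i = pagination_index.to_i
    next_resp = resp
    until next_resp.empty?              # stop on the first empty page
      next_url = next_instance_page_url(url, i += 1)
      break if next_url == url          # URL stopped changing: no more pages
      next_resp = get_page(next_url, @debug, options)
      resp += next_resp
    end
    resp
  end
end

result = FakeScraper.new(5_000).get_instance("http://example.com/list?page=1", 1)
puts result.count(";") # prints 5000
```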
@jeremybmerrill
Contributor

Makes total sense, and at first glance I think your solution will work. Can you send me a pull request?


@jeremybmerrill
Contributor

Yo Eric, did your PR #24 close this issue?

@esagara
Contributor Author

esagara commented Dec 19, 2013

I think so. There was an issue somewhere in that code where it would get caught in a loop. I am having another problem, though: sleep_time_between_requests does not seem to be working. Have you played around with it at all? Perhaps I am missing something in the syntax.

Eric


@esagara
Contributor Author

esagara commented Dec 19, 2013

Is it possible that the condition on the line below never evaluates to true? I can see that both @verbose and @sleep_time_between_requests are being passed to the scraper, but as far as I can tell the sleep is never applied.

https://github.com/propublica/upton/blob/master/lib/upton.rb#L223

Eric
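The thread doesn't show what upton.rb#L223 contained, so the following is only a hypothetical sketch of the kind of guard worth checking. ThrottledFetcher, its in-memory cache, and the injectable sleeper are illustrative names, not Upton's API. It sleeps only after a page is actually downloaded and only when the configured delay is positive, so two things silently skip the pause: pages served from a cache, and a delay that arrives as nil (nil.to_f is 0.0).

```ruby
# Hypothetical throttled fetcher (not Upton's API): sleeps between requests,
# but only when a page actually had to be downloaded.
class ThrottledFetcher
  attr_reader :network_fetches

  # sleeper is injectable so a test can record delays instead of waiting.
  def initialize(sleep_time_between_requests, sleeper: ->(secs) { sleep(secs) }, verbose: false)
    @sleep_time_between_requests = sleep_time_between_requests
    @sleeper = sleeper
    @verbose = verbose
    @cache = {}
    @network_fetches = 0
  end

  def get_page(url)
    return @cache[url] if @cache.key?(url) # cache hit: no request, so no delay

    body = download(url)
    @cache[url] = body
    # Guard: throttle only after a real request, and only for a positive delay.
    # A nil delay becomes 0.0 here, which skips the sleep entirely.
    if @sleep_time_between_requests.to_f > 0
      puts "sleeping #{@sleep_time_between_requests} secs" if @verbose
      @sleeper.call(@sleep_time_between_requests)
    end
    body
  end

  private

  # Stub standing in for an HTTP GET.
  def download(url)
    @network_fetches += 1
    "<html>#{url}</html>"
  end
end

delays = []
fetcher = ThrottledFetcher.new(2, sleeper: ->(secs) { delays << secs })
fetcher.get_page("http://example.com/a")
fetcher.get_page("http://example.com/a") # served from cache: no sleep
fetcher.get_page("http://example.com/b")
p delays # prints [2, 2]
```

Injecting the sleeper keeps the check fast: the example records the requested delays instead of actually waiting.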


@jeremybmerrill
Contributor

I'm not sure exactly what's happening, but I noted it in #28.

Will look into it in greater depth shortly. I'll write a test too :)
