Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

AWS::Files::each not called for Enumerable methods #596

Closed
kranzky opened this Issue Nov 9, 2011 · 26 comments

Comments

Projects
None yet
6 participants

kranzky commented Nov 9, 2011

I have a bucket that contains 5000 files. Here's the problem:

connection = Fog::Storage.new(:provider => 'AWS', ...)
directory = connection.directories.get('mybucket')
files = []
directory.files.each { |f| files << f.key }
puts files.count
files = directory.files.map { |f| f.key }
puts files.count

This outputs:

5000
1000

This causes AssetSync, which calls AWS::Files::map, to fail when there are more than 1000 files in a bucket.

Owner

geemus commented Nov 9, 2011

I would avoid using map, and instead do the first version. The idea is to avoid loading everything into memory at once, which each does by doing one page at a time. Map sort of necessarily loads more at once. Granted this should be better documented, but I'm not sure that trying to get all of the enumerable methods to paginate is the correct path.

kranzky commented Nov 9, 2011

That's fine; I was just tracing a bug in AssetSync. I don't know much about how Fog is implemented, but the presence of an each method implies (to me, and obviously to the author of AssetSync) that Enumerable methods, all of which build on top of each, are fair game.

Owner

geemus commented Nov 9, 2011

I agree that it isn't great. I guess we could try to implement all enumerable methods (or stop implementing each), but both of those has pretty obvious downsides. Maybe it just needs to be better documented.

kranzky commented Nov 10, 2011

Wouldn't including Enumerable magically implement all the methods?

Owner

geemus commented Nov 10, 2011

Unfortunately not. It used to be the case that if you referenced files it would actually load everything, but it was too terrible for performance with larger buckets. So instead we changed it to just load the first page and then added this special version of each that would iterate over it one page at a time. So simply including them wouldn't cut it. They would need to be implemented individually in a manner probably pretty similar to the way that each is now. That said, the way each is now already seems kind of hairy (not that it doesn't work, I just find it a bit hard to follow). So I think the only way would be reimplement things, since loading all pages automatically in to memory is kind of off the table (I think).

kranzky commented Nov 10, 2011

Ah, including Enumerable doesn't work because Fog::Collection derives from Array. I'm pretty convinced that the paginated each would work with the Enumerable methods otherwise. And totally agree that you want the pagination; that's awesome.

kranzky commented Nov 11, 2011

FWIW, here's an example of how this could work; note that this does not load all files into memory, but does allow you to iterate over all files:

class MakeEnumerable
  include Enumerable
  def initialize(files)
    @files = files
  end
  def each(&block)
    @files.each &block
  end
end

connection = Fog::Storage.new(:provider => 'AWS', ...)
directory = connection.directories.get('mybucket')
files = MakeEnumerable.new(directory.files).map { |f| f.key }
puts files.length

This outputs:

5000
Owner

geemus commented Nov 11, 2011

Interesting. That does seem relatively clean, but I think there is a weird edge case related to calling each without a block that we need to consider as well (see https://github.com/fog/fog/blob/master/lib/fog/aws/models/storage/files.rb#L46). I don't remember the context or what exactly it does. I guess maybe since you have it set to just pass the block through it might already do the right thing.

Owner

geemus commented Nov 11, 2011

I'd be happy to look at a pull request that fixed this up, as I'm not sure I'll get to it right away. Fixing it somehow would be good for sure, though I'm not sure how best to implement it.

Contributor

epdejager commented Jan 16, 2013

As an easy workaround for related Issue #584, one could do something like

all_files = @files.select { |f| f.key == 'whatever.txt' } #returns all files, bummer
files_a = @files.to_a
files_a.select { |f| f.key == 'whatever.txt' } #works as expected
Owner

geemus commented Jan 16, 2013

@epdejager - cool, thanks for the example code!

Contributor

jc00ke commented Jan 19, 2013

Wow, I just wasted 2 hours trying to figure this out. I independently discovered @epdejager's solution, but we should not have to do that. A custom #each is fine, but FFS don't break other Enumerable methods like #select, etc.

puh-puh-puh-please

Contributor

rbriski commented Mar 1, 2013

Lazy enumerators have been accepted into Ruby 2.0:
http://railsware.com/blog/2012/03/13/ruby-2-0-enumerablelazy/

Here's some work that's independent of Ruby versions:
https://github.com/yhara/enumerable-lazy/blob/master/lib/enumerable/lazy.rb

A solution may be to leave the current enumerables in place but deprecated them. Then add the .lazy notation for lazy evaluation of enumerables in the future.

Contributor

rbriski commented Mar 1, 2013

After a little more digging, it looks like the "backports" gem addresses this problem:
https://github.com/marcandre/backports

Would it be feasible to deprecate the current implementation of enumerables and just point people to backports if they want the feature?

Or we could include backports as a dependency and integrate it into the code.

Owner

geemus commented Mar 1, 2013

@rbriski - hmm, not super familiar with those (just browsed the links). I think in fog we want lazy by default (rather than opt-in lazy). Perhaps we should look at something like this so that we can stop being in the business of managing this (which is pretty far from fog's focus/core competency).

Contributor

rbriski commented Mar 7, 2013

What would you think if I added backports as a dependency and rewrote the enumerables to use .lazy implicitly. I'd add a configuration option to turn off lazy-loading and bring back the old behavior of loading only the first page in case people had come to depend on that broken functionality.

Seems like we're moving in the right direction without upsetting the current state of affairs too much. I'm just not sure how you feel about adding a dependency.

We could just copy the lazy evaluation code from backports but then we lose any future updates or bug fixes. Possibly not a problem as people eventually move to Ruby 2.0. Then again I work at a company that uses an 6 year old version of Python.

Owner

geemus commented Mar 8, 2013

@rbriski - it's tough, though I like the idea of not having to manage this backports scares the hell out of me. It just touches so much stuff that I'm not confident it wouldn't introduce other errors and/or make it more difficult to debug other things. Then again, as you mention, we are probably stuck with older version support for the foreseeable future (and probably beyond).

@wendorf wendorf added a commit to pivotal-cf/bookbinder that referenced this issue Mar 13, 2014

@wendorf wendorf Use Fog::AWS::Files#each instead of map because only the former pagin…
…ates correctly.

Note: This is rude behavior on fog's part. See fog/fog#596

[fixes #67513444]
1e9d40a
Member

plribeiro3000 commented Nov 10, 2014

Since Fog::Collection already does lazy evaluation (see here) as commented above, can this issue be closed?

Owner

geemus commented Nov 11, 2014

Closing for inactivity. Please comment if you have questions/concerns. Thanks!

@geemus geemus closed this Nov 11, 2014

Contributor

jc00ke commented Nov 11, 2014

Are #select and other Enumerable methods still affected by this?

Member

plribeiro3000 commented Nov 11, 2014

@jc00ke I don't think so as i mentioned before. Do you think this could still be an issue?

Contributor

jc00ke commented Nov 11, 2014

@plribeiro3000 per my earlier complaint yes, I think it's a big issue. If it's been fixed elsewhere, cool, but if not, I hope it will be.

Member

plribeiro3000 commented Nov 11, 2014

@jc00ke If you look at the link i mentioned you will see the lazy evalution code. I think this shouldn't be a issue anymore. Can you confirm that?

Contributor

jc00ke commented Nov 11, 2014

Confirm, cool!

Now, to not inherit from a core class :trollface: j/k, glad to see #select works as expected. Thanks!

Member

plribeiro3000 commented Nov 11, 2014

@jc00ke 👍

Owner

geemus commented Nov 14, 2014

Thanks for looking through and confirming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment