Need a way to disable the identity map #285

Closed
stve opened this Issue Jul 9, 2012 · 9 comments

Comments

Projects
None yet
2 participants
Collaborator

stve commented Jul 9, 2012

The identity map is resulting in a growing memory usage in TweetStream: intridea/tweetstream#87. As new streaming responses are received, each response is added to the identity map when Twitter::Status objects are created. In this case, the identity map is only going to continue to grow, resulting in the "leak".

It seems most logical to have a means to either disable it entirely or have a way to create Twitter::Base/Twitter::Identifiable objects without hooking into the identity map. The way the current code is structured doesn't easily allow for option 2 without some rejiggering. Prior to working on a solution, I wanted to put it forth here for discussion.

Owner

sferik commented Jul 9, 2012

I agree, there should be a way to initialize Twitter::Base objects without adding them to the identity map. Arguably, this should be the default behavior, so that there are fewer surprises for people upgrading from earlier versions.

Here's what I'd propose:

  • Twitter::Base.new initializes a new object without adding it to the identity map.
  • Twitter::Base.fetch retrieves an object from the identity map or returns nil if it's not found (this is no different from the current behavior but see the note below for a possible change).
  • Twitter::Base.store stores an object in the identity map and returns it (effectively, the current behavior of Twitter::Base.initialize).
  • Twitter::Base.fetch_or_new should be renamed to Twitter::Base.fetch_or_store.
  • Twitter::Base.from_response should be modified to use Twitter::Base.fetch_or_store.

Note: Since we're sort of aping the Hash API (but with class methods instead of instance methods), what would you think of making Twitter::Base.fetch behave more like Hash#fetch, so that it raises a KeyError if the key is not found, unless a second argument or a block is passed.

Here's the Rubinius implementation of Hash#fetch, in case we decide to go that route.

Collaborator

stve commented Jul 10, 2012

I liked your proposed changes, very clear and a logical separation. The changes to Twitter::Base were fairly minimal. I think mimicking Hash#fetch is a good idea, is what you had in mind something like this?

def self.fetch_or_store(attrs={})
  self.fetch(attrs) do
    self.store(attrs)
  end
end

If nothing was passed to fetch, it would raise?

Owner

sferik commented Jul 10, 2012

Something like that. I think Twitter::Base.store should only be responsible for storing the object, not creating it.

def self.fetch_or_store(attrs={})
  self.fetch(attrs) do
    object = self.new(attrs)
    self.store(object)
  end
end
Collaborator

stve commented Jul 10, 2012

That makes sense, I don't have it implemented that way, changing it now.

Owner

sferik commented Jul 10, 2012

I'm not in love with the name fetch_or_store but I'm not sure I have a better alternative. If we wanted to be maximally verbose/descriptive, the method would be named something like fetch_or_create_and_store. Maybe fetch_or_create would be better, since it indicates that it returns an object either way, but I'm not convinced. I guess we could do something like this:

def self.create(attrs={})
  object = self.new(attrs)
  self.store(object)
end

def self.fetch_or_create(attrs={})
  self.fetch(attrs) do
    self.create(attrs)
  end
end

What do you think?

Collaborator

stve commented Jul 10, 2012

I know what you mean about fetch_or_store being a bit of an odd name. I think create could get confusing too (i.e., what is the difference between create/new).

Owner

sferik commented Jul 10, 2012

I suppose defining a separate create class method could be confusing but I'd argue it's less confusing that what we have now. The idea is borrowed from ActiveRecord, where new means create an object in memory and create means create and store that object. I think most people understand the distinction in Rails, but I've been using Rails for a long time, so maybe it is confusing for beginners. Even so, Rails is so widely known within the Ruby community that I think it's acceptable to assume most of our users understand this distinction. It's much (much, much) more likely that a beginner will encounter Rails before encountering this library. Of course, we run the risk of people mistaking Twitter::Base objects for ActiveRecord::Base objects. Users might mistakenly believe that calling create will somehow store the object in their database instead of in an identity map, but I still think this is the right interface. At the moment, the identity map is implemented as an in-memory hash but it's conceivable that the IdentityMap implementation could use a SQLite database instead. In fact, early versions of this gem depended on SQLite. #history

Collaborator

stve commented Jul 10, 2012

I was thinking along the same lines that an adaptable interface could be swapped in to use with the IdentityMap.

I'm coming around to fetch_or_create and create, shall we roll with it?

Owner

sferik commented Jul 10, 2012

Let's roll. 🎢

One more thing: fetch_or_create should be aliased to fetch_or_new (at least until version 4) so that we don't break compatibility mid-version.

@sferik sferik closed this Jul 16, 2012

@stve stve referenced this issue in tweetstream/tweetstream Jul 17, 2012

Closed

Memory Usage #87

sferik added a commit that referenced this issue Sep 18, 2012

Disable identity map by default
The identity map was causing more issues than expected:
* #260
* #285
* #302
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment