Skip to content
This repository

router: transform extract keys from symbol to strings #134

Closed
wants to merge 1 commit into from

5 participants

Julien Ammous Ilya Grigorik dan sinclair Philip (flip) Kromer Hedgehog
Julien Ammous

if you use this in your api:

map("/item/:id", MyAPIClass)

It was accessible as params[:id], now you can access it as params['id'].
The reason for this changes is to match the behavior of the Params middleware (which fills the params hash with string keys) and allow the use of the new and shiny Param middleware to validate that those keys are present without resorting to this:

use Goliath::Rack::Validation::Param, :key => [:id]

whi is rather ugly and not really intuitive.

Ilya Grigorik
Owner

I don't think we want to duplicate the values though? Ideally, these should be aliased.

Julien Ammous

You mean keeping the symbol keys and adding the key string to the hash ?

Ilya Grigorik
Owner

I mean just no duplicating the values.. something like: params[key.to_s] if params[key].nil?

Imagine you have a massive post body. Last thing we want to do is duplicate the same data twice within the hash.

Julien Ammous

ok I thought I misunderstood since I do not see where I am duplicating the values:

h = {
  :id => "34",
  :somedata => "Nothing but random crap to fill some space"
}

h2 = Hash[ h.map{|k,v| [k.to_s, v] } ]

p h2
# {"id"=>"34", "somedata"=>"Nothing but random crap"}

h[:somedata] << " and even more !"

p h2
# {"id"=>"34", "somedata"=>"Nothing but random crap and even more !"}


p h[:somedata].object_id
p h2['somedata'].object_id
# 70287446366400
# 70287446366400

Unless I missed something there is only one string here, the only things I change (and duplicate in a sense) are the keys so it would only be a problem if you use a really big string as key what I doubt anyone would do.

Ilya Grigorik
Owner

I think we're talking about the same thing, but in different ways.. I guess what I'm after is that we shouldn't need to duplicate anything at all. I think what you're after is effectively this: http://as.rubyonrails.org/classes/HashWithIndifferentAccess.html

Note that within Goliath you can access query string / post body params via both symbol and string methods also. So, I agree we should make that consistent for router as well, but this should be handled by the hash itself, there is no need to duplicate keys or values.

(unless I'm still missing something here :))

Julien Ammous

I know about HashWithIndifferent access yes but my goal was not to allow both way to access, in my first example if I write this:

map("/item/:id", MyAPIClass)
use Goliath::Rack::Validation::Param, :key => 'id'

It will always fails to find the id key since it will look for params['id'] which does not return the same as params[:id].
even if we can access the query string and body params indifferently with symbols or string keys the params hash does not inherit this behavior which causes my problem above.

If you have another suggestion to fix the problem I am all ears :)

dan sinclair
Owner
dj2 commented

I think the consistent solution to this would be to fix Issue #138 and make the hash actually indifferent.

Then, move the indifferent_params and indifferent_hash method up to the class level and can use those two methods in the builder when we create and merge the router params.

Should keep everything consistent and keep the router in sync with params in the future.

Julien Ammous

performance wise wouldn't it be better to just settle on strings ? As long as the behavior is documented I doubt it would be a problem.
Allowing the params hash to be accessed by symbols and strings is nice but one may not be aware that accessing it with the wrong type may be 2 or 3 times slower, even if the numbers are really small the cost might not be worth it between writing params[:id] and params["id"] ;)

I just ran some numbers to compare the implementations and unless I have an errors lying around the results are interesting to say the least : https://gist.github.com/1632199

Philip (flip) Kromer
Collaborator

I'd recommend extlib's Mash over activesupport's HashWithBBQ

Activesupport pulls in a whole raft of things -- its more cherry-pickable than it was, but more heavyweight than extlib. It also forces you to not use anything else pre-3.0 from the rails suite.

If performance/simplicity were an issue, you could eject the convert_value biz from mash -- it's for recursively munging a hash into a mash, and looks like a multiple-times tax on any assignment.

Julien Ammous

I updated my gist to include Mash, its results are even worse than activesupport.
ActiveSupport is now way more modular than before allowing you to decide what to require but I don't see any point using it in goliath, not for this anyway.

Julien Ammous

so, what should we do about this ?

I tried Hashie as another alternative but it is even slower than Mash.

Ilya Grigorik
Owner

We shouldn't need any third party dependencies for this. We already have the code for this (indifferent_params), but it looks like there is a bug associated with it atm -- once that's fixed, this problem is resolved as well.

Julien Ammous

I really like this project and there are more ideas I want to implement but having an issue staying open for weeks without real activity is not really motivating...

I posted in the linked issue my attempts and I don't really think the hash as it is can really be made completely indifferent without adding a noticeable overhead so in the end unless someone has an idea on how to do this my solution still stands.

dan sinclair
Owner

I apologize for the slow response. After rereading the thread I'm leaning towards just saying env keys are strings. Then fixing any code saying they're symbols. @igrigorik what's your opinion on that?

Ilya Grigorik
Owner

I'm not averse to it.. there is something to be said for the simplicity of it. (to have everything as strings). @schmurfy, @mrflip: thoughts?

Having said that, for discussion above, we don't need any crazy dependencies, wouldn't the below just do the trick:

class IHash < Hash
  def [](key)
    case key
    when Symbol then super(key) || super(key.to_s) 
    when String then super(key) || super(key.to_sym)
    else super(key)
    end
  end

  def []=(key,v)
    super(key.to_sym, v)
  end
end


h = IHash.new
h[:a] = 1
h['b'] = 2

p [h[:a], h['a']]
p [h['b'], h[:b]]

h.merge!({:b => 3, 'c' => 4})


p [h[:a], h['a']]
p [h['b'], h[:b]]
p [h['c'], h[:c]]
Julien Ammous

if we go the hash with indifferent access way I like your solution the most for now.

But Having this abstraction means we have some "complex" (more complex than the current really simple indifferent_hash at least) code ran each time we access the hash versus only one time when the url is parsed with the current indifferent_hash and the inputs are converted to string keys.

The difference are really small I don't deny that but it adds up.

That said I will rally the majority on this one as long as we have a way to use the validation on parameters coming from the router.

PS: I know I may be annoying on this one ;)

dan sinclair
Owner

So, my vote is to just mandate strings and make sure the router is updated to send strings instead of symbols.

Ilya Grigorik
Owner

So, a few options, none are slam dunks:

1) Introduce "indifferent hash": preserves current API, will add (minor) overhead to each lookup
2) Force all keys in env to be strings, deprecate symbol syntax (there is something to be said for simplicity)

In theory, (2) could be softened by adding a deprecate warning and retiring in a later version, but given that goliath is <1.0, I'd actually vote for doing the "quick and painful" route.

We have one vote for (2) from @dj2. @mrflip, @schmurfy, what about you guys?

Julien Ammous

My vote also goes for (2) but the current indifferent_hash implementation could be kept, as long as we ensure that the keys inserted in it are strings they will be accessible by either a string or a symbol (the overhead of the current indifferent_hash is really small and can be nullified if you use string keys).

Wether indifferent_hash is kept or not is good for me, I only access the keys with string anyway.

Ilya Grigorik
Owner

Alright, let's go with (2).

@schmurfy: I'd vote to remove the previous indifferent_hash code entirely - if we're making a break, let's do a clean break, no reason to accumulate vestigial code.

With that, I think we can close this issue, and instead open another thread for removing the current indifferent_hash code?

Julien Ammous

How do we handle the output from http_router ? We still need to convert the keys in strings.

Ilya Grigorik
Owner

Any config options for that in the http_router itself? If not, then that sounds like a straightforward fix on our side.

Hedgehog

At the risk of being too late...

In choosing 2) I understand the desire to be consistent, but I do wonder about the choice of consistency.
My understanding was that hash lookups using symbol keys is the most efficient, @schmurfy's gist seems to confirm this. Consequently, for some time now I've adopted the practice of using symbol keys through out my projects.

Given that Ruby lib's are used in a, increasingly, wider context doesn't it make sense to make the more efficient choice - symbols.

Certainly, it is nice that all Goliath hash keys are strings. Users will now have to keep track of which hashes came from Goliath and which did not - at least if you made the all symbols choice you could offer as some defense/justification a point about better performance. Right now it seems I have to adopt more expensive hash-lookups and ones that are inconsistent with the rest of my apps.

I understand not all keys can be easily mapped to symbols, but isn't the lesson of Rails success, that adopting some conversion-convention may be well received if it improves matters. Matters being performance +consistency. I suspect in most Goliath apps hash-lookup happens sufficiently frequently to compensate for the cost of any key-conversion. But that is a highly subjective claim.

Apologies if I'm way off-track and out-of-line.

Julien Ammous

The difference between string and symbol is really really small, my benchmarks show a 100ms difference for 1 million accesses which is negligible compared to the ease of use:

:'a strange key'
"a strange key"

There is a problem (which may not be always one): they are never garbage collected (once created you will always get the same instance), so if you use user inputs to create them you may have a surprise.

Given that in the goliath the params hash is filled from user inputs that's one reason for me to avoid them.

Hedgehog

I don't dispute any of that. I just observe:

  • a real really really small number multiplied by a really really really large number can be a large number.
  • the key format issue is what I had in mind when I made the observations about conventions and key-conversion costs:, i.e

    "a strange key" -> :a_strange_key

I think it is worth noting too that, wrt my first point, it is more likely that Goliath will attract case cases where the numbers orders of magnitude larger comapred to what Rails or Padrino might consider plausible use-cases.

I also acknowledge that telling people to go away and study, then implement, the 1,000,000 user comet application is quite legitimate too :)

Julien Ammous

@igrigorik I will check what http_router does and can do exactly this weekend.

@hedgehog
After my messages here I can't say I don't like to optimize code (maybe a little too much sometimes ;) ) but how would you handle the possible "out of memory" attacks opened by storing symbols in the params hash ?

1_000_000.times do |n|
  request("http://ip/?rand#{n}=1")
end

I have no idea how much memory is used by a symbol but you get the idea.
Since the current implementation stores the key as string everything is fine but if it was storing them as symbols...

Ilya Grigorik
Owner

@hedgehog @schmurfy Personally, I think the performance argument here is a red herring - yes, there is a difference between using a string and a symbol, but for all intents and purposes it is more or less negligible. The more important aspect to me here is the consistency, and I can be swayed to go with either all strings or all symbols -- which one it is less important to me then the fact that we decide on one and enforce it.

The primary argument against symbols would be edge cases in params like "my%20var=12". I can legitimately expect env['my var'] to return 12, but it's not intuitive to me what that variable should be if its converted to a symbol. :my_var is a possibility, but it requires that I verify this and/or consult the docs. And we can easily construct more complicated cases of that kind.. Which is why, at least for now, the string choice seems like a safer bet.

Hedgehog

@igrigorik, whether a red herring or not depends on context. However, performance was raised simply because that was the reason I adopted the convention of using symbols throughout multiple projects. That Goliath might compel me to handle both throughout my code (or implement key translators) is the real objection.
So we agree consistency is the more important aspect, I simply argue that we might benefit by thinking in terms of a wider context than the code that exists/is-used within Goliath.

I simply observed that, in a toss-up choice, performance could be cited as the tie-breaking criteria. Of course one could cite other use-cases/scenarios as the tie breaker.

No matter how you try slice this, providing a generic solution inside Goliath looks awkward, expensive and ultimately I think will be proved specific.

Let's solve this in the time-honored fashion:
Do not try to make everyone happy. Rather simply give everyone the opportunity to make themselves happy.

Then, perhaps the thing to do is to give freedom and responsibility to the devs, as well as settle on a default behavior/convention:

Default behavior/convetion:

  • Goliath requires parameters be explicitly be declared, and given as symbols. Example

      use Goliath::Rack::Validation::Param, :key => :payload
    
  • Goliath converts a parsed parameter string using the std Ruby .to_sym.

  • If the symbolized parameter is not one that been declared as a key it is dropped, with a log entry.

For parameters that do not fit this use case, devs can be given the opportunity (encouraged) to either:

  1. Preferred: Write a middleware or plugin that is a subclass of Goliath::Rack::Validation::Param and implements method <name> (akin to the way the dev has to implement a response method in their Goliath::API sub-class)
  2. Discouraged, but possible for edge-cases: Use a proxy service that sanitizes/inoculates the parameters strings before they are seen by Goliath

I'm not familiar enough with Goliath internals to know how onerous it would be to allow parameter handling to accept middleware or plugins. Or even which is best suited to this problem: middleware or plugin?

Learning curves/barriers-to-entry could be flattened by providing a non-trivial eample implementation that deals with a common problem.

Thoughts?

Hedgehog

@schmurfy, like every good coward - I run away from them :)

Seriously, I think the way forward here is to make Goliath parameters as flexible as Goliath responses: See my response to igrigorik above.

Is this feasible given the current code?

Julien Ammous

@igrigorik I checked the http_router code and judging by the code it won't output strings.

@hedgehog I really don't like forcing validation for the api, one thing I just love in ruby is how fast you can prototype things, each time I want to test something I just open a new file in TextMate, write my code, run it and it works ! (in textmate you dont even have to save the file it will create a temporary file for you)

I prefer to have something simple and then makes it more robust and more complex than having to start directly with the complex one (that's also one reason I hate java so much).

Hedgehog

@schmurfy, ack that simplicity is desirable, but often the most difficult to achieve. As I point out in my original comment, strings everywhere seems great when you are writing a simple proof-of-concept in the way you just describe. When 'things' start getting passed around many components of an app, that prototype convenience you described bites back. Hard.

I think we are agreed that it is not feasible for Goliath to accept arbitrary parameters if it is to use symbol keys.
If dropping the undeclared keys makes you uncomfortable, then how about:

  • Goliath converts all parsed parameter strings using the std Ruby .to_sym.
  • If that is not suitable in your use case, you can write a Parameters middleware/plugin, (pointing to some example)

This way there is no validation. Instead a simple, std convention.

Better?

dan sinclair
Owner

I'd say, go for strings instead of symbols. For every library that requires symbols there is probably another that requires strings. The decision is arbitrary but the strings case requires less conversions for us in Goliath.

Ilya Grigorik
Owner
If dropping the undeclared keys makes you uncomfortable, then how about:
- Goliath converts all parsed parameter strings using the std Ruby .to_sym.
- If that is not suitable in your use case, you can write a Parameters middleware/plugin, (pointing to some example)

Validation / param middleware is an optional convenience, we shouldn't require that you explicitly list every single param. I'm with Dan on this one: let's go with strings. The one other point of reference here is that rack defaults all of its parsed variables and ENV params as strings also.

Hedgehog

White flag :)

Ilya Grigorik
Owner

@hedgehog: :-) .. your points are well taken, I think this is just one of those "no matter what you choose ..." scenarios.

Hedgehog

It seems that way, but I saw it as exercising a point of project philosophy: try to make a choice that allows Goliath to cover everything out-the-box (choose string params) vs choose simple default behavior consistent with Goliath object IO being one part a wider context, and give a path for people to easily add whatever permutation of bells and whistles they require (param symbols + plugin/middelware).

I take your point about Rack params, but when I say consider Goliath in a 'wider context' I do mean a wider context than even Rack apps, i.e. Goliath as just one part of a generic ruby stack.
It is not criticism, but I do think many project naturally tend to be driven by the developer workflows, such as that outlined by @schmurfy. I know, I do it myself, but in one project in particular it has limited the potential use-case so much that it likely won't see the light-of-day. But man, was it ever the most convenient thing while I was working on it.... yep for my use-case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Showing 1 unique commit by 1 author.

Jan 10, 2012
Julien Ammous schmurfy router: transform extract keys from symbol to strings 74077b5
This page is out of date. Refresh to see the latest.

Showing 1 changed file with 5 additions and 1 deletion. Show diff stats Hide diff stats

  1. +5 1 lib/goliath/rack/builder.rb
6 lib/goliath/rack/builder.rb
@@ -35,7 +35,11 @@ def self.build(klass, api)
35 35 route.to do |env|
36 36 builder = Builder.new
37 37 env['params'] ||= {}
38   - env['params'].merge!(env['router.params']) if env['router.params']
  38 +
  39 + if env['router.params']
  40 + # transform the keys into string
  41 + env['params'].merge!( Hash[env['router.params'].map{|k,v| [k.to_s, v] }] )
  42 + end
39 43
40 44 builder.params = builder.retrieve_params(env)
41 45 builder.instance_eval(&blk) if blk

Tip: You can add notes to lines in a file. Hover to the left of a line to make a note

Something went wrong with that request. Please try again.