-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tweak performance of quantify lib #32
Comments
Just a thought: Why not defining all units an app may use in a yml file, which gets parsed exactly once during startup. The parser then creates Unit singletons for all defined units... |
Just finished refactoring Dimensions class with its notoriously slow #to_hash method: https://github.com/cblock/quantify/ This brings down spec runtime from ~16 secs to ~5 secs already. The benchmark is even better (Quantity#to_si uses #to_hash internally):
Let me know what you think. Should I send you a pull request for the main branch? |
Hey Carsten. Sounds great, certainly send a pull request. I like the idea of specifying units once on initialization. That was my intention really when using the config files, although given the 'default' configuration (currently, all units), a custom configuration might involve dumping a load of previously defined units which I don't think is very nice. Trouble is, there has to be some default units (and dimensions), for example, the SI units against which all other units are referenced. I think I would prefer a system where a smaller set of required units is always instantiated and others can be pulled in if desired, perhaps in a modular fashion (require 'cgs', require 'imperial', etc.), or individually defined. YAML format sounds fine. Currently, you can override the default config by running additional config in an initializer (or elsewhere). I've basically never been able to decide on the best solution or anticipate what would suit others best (convention or configuration?). The Dimension#to_hash method is mostly used in operations (multiply, divide, etc) between two dimensions. I think in most cases it is passed in to the private #enumerate_base_quantities method which alters the base quantities on the receiver. The memoized hash would probably need refreshing at the end of that method. I have thought about using singletons to represent units but never got to a satisfactory point. At present, an instance of each method is held in the Unit.units 'module' variable and then cloned anytime it is requested. So we get many instances of each unit existing. Are you suggesting that we have only one (singleton) instance of, say, a kg unit, which is used in all operations involving a kg? That would be nice, but I think there would be complications where compound units and unit operations come in to play, so it might take a bit of work. There is a lot of fairly messy code in Unit.rb which provides the versatile functionality for accessing units via names, pluralised names and symbols for single and compound units and units within strings. Unit::escape_multi_word_units is part of this functionality. It basically allow multi word units to be recognised within larger strings. I have never been very happy with most of this parsing and regex code. Although it works fairly well, it just seems way to messy. One implication of removing the ability to configure units at runtime would be that a number of these methods (which basically build lists and regexen) would be static and could therefore be memoized. Cheers Andrew |
Hi Andrew, refactoring dimensions.rb was quite a challenge for me because there are so many back and forth references between 1.) I like your idea of always loading a minimal set of default units (convention) and optionally adding more units through configuration. I guess the minimal set should be unity + si_units, right? Then lets make these ones (immutable) singletons. 2.) Make all units immutable. A unit instance should never change its dimensions during its life. 3.) Rethink some of the class concepts: For example, For my current web project (where quantify_mongoid is used) I suffer heavily from Unit.for() being really, really slow. The scenario is a batch import of thousands of records, each holding several quantities. To ensure that all imported quantities come with valid units, I have a Unit.for() call for each entry. My idea right now is to speed things up using (cached) Unit#unit_label_regex, Unit#unit_symbol_regex etc. If that's not enough I will digg into Unit.rb next to speed it up too. Cheers, |
One more thing that comes to my mind: If there's a |
And another thing: |
Hi Andrew,
currently I'm in the process of performance tweaking quantify_mongoid. That's when I noticed that Quantity#to_si and Unit#for operations are very slow. Tracking it down further (see https://github.com/cblock/quantify/tree/perf/perf) I found out that Dimensions#to_hash is both expensive in terms of cpu cycles and frequently called when invoking #to_si on a quantity.
Same goes for Module#escape_multi_word_units when invoking Unit#for on complex units.
My feeling is that, e.g. #to_hash could be momoized but when I started refactoring I also noticed that you allow reconfiguration of dimensions and units during runtime. This in turn makes it necessary to invalidate memoized values for #to_hash and other operations.
So my question is: Is it really necessary to support reconfiguration during runtime? I found that you use that in quantify_amee to unregister some units. But wouldn't it be better to load everything once during initialization (e.g. into Singletons) and then not change it until app shuts down?
Also: I'm happy to help out refactoring quantify lib into that direction but I'm not too sure where to start. Any suggestions are greatly appreciated.
The text was updated successfully, but these errors were encountered: