-
Notifications
You must be signed in to change notification settings - Fork 209
Robust Aggregate Functions #211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Robust Aggregate Functions #211
Conversation
|
Wow, thanks a lot for this work. The patch is a lot to grok, but your description makes sense. On your commit that introduces a linked list, the commit message says that it's being introduced because of finalization problems when using a Ruby array. Can you explain this? I'm not seeing anything in the new mark / sweep functions that would indicate we can't use a Ruby array (but maybe I'm just not seeing it). |
|
Thank you for considering this merge request. Concerning the use of Ruby Arrays: you are probably right and a ruby array works fine. Being used to C but not to writing ruby extensions I already had a clear picture what the lifetimes should be and was uneasy about the GC possibly doing something I do not expect. I implemented it, checked with valgrind, still had problems (see below) and then -- you might call it a short-circuit reaction -- rewrote things in a way I am more familiar with. Unfortunately, I don't have the original code anymore. I rewrote this from memory, so please take it with a grain of salt. It looked like this: struct _sqlite3Ruby {
sqlite3 *db;
VALUE aggregators; /* = rb_ary_new(); rb_gc_mark() */
};
typedef struct rb_sqlite3_aggregator_wrapper {
/* back-ref to the aggregators array this wrapper lives in.
* Not marked to avoid cyclic reference. Stupid optimization in hindsight. */
VALUE parent_array;
/* the AggregateHandler class we are wrapping here */
VALUE handler_klass; /* rb_gc_mark() */
/* Ruby array of in-flight instances of the AggregateHandler klass.
* I did not get to the point where I actually allocated the instances. */
VALUE instances; /* = rb_ary_new(); rb_gc_mark() */
} aggregator_wapper_t;
/* AggregatorWrapper ruby class, hidden from scripts */
VALUE cAggregatorWrapper;
VALUE rb_sqlite3_aggregator_alloc(VALUE self) {
struct rb_sqlite3_aggregator_wrapper *x;
return TypedData_Make_Struct(self, aggregator_wrapper_t, /* type */, x);
}
/* called by sqlite3_create_function_v2's xDestroy-callback. */
void rb_sqlite3_aggregator_destroy(void *void_aw) {
aggregator_wrapper_t *aw = void_aw;
long i;
if (aw->parent_array == Qnil) return;
/* lookup my position in the aggregators array and remove reference to me */
for (i = 0; i < RARRAY_LEN(aw->parent_array); i++) {
aggregator_wrapper_t *x;
VALUE ruby_x = rb_ary_entry(aw->parent_array, i);
TypeData_Get_Struct(ruby_x, aggregator_wrapper_t, /* type */, x);
if (x == aw) {
rb_ary_delete_at(aw->parent_array, i);
aw->parent_array = Qnil;
return;
}
}
rb_fatal("not in aggregators");
}
/* called by sqlite3_rb_close() after sqlite3_close() and in deallocate() */
void rb_sqlite3_aggregator_destroy_all(sqlite3RubyPtr ctx) {
long i;
if (ctx->aggregators = Qnil) return;
for (i = 0; i < RARRAY_LEN(ctx->aggregators); i++) {
aggregator_wrapper_t *aw;
VALUE ruby_aw = rb_ary_entry(ctx->aggregators, i);
TypeData_Get_Struct(ruby_aw, aggregator_wrapper_t, /* type */, aw);
/* make sure rb_sqlite3_aggregator_destroy does not try to access
* the aggregators array any more. */
aw->parent_array = Qnil;
}
ctx->aggregators = Qnil;
}
void init() {
rb_gc_register_address(&cAggregatorWrapper);
cAggregatorWrapper = rb_class_new(rb_cObject);
rb_define_alloc_func(cAggregatorWrapper, rb_sqlite3_aggregator_alloc);
}Well, valgrind complained but I did not find a bug. I came up with the explanation that I may not access "member objects" in the finalizer The funny part is: I again got the same valgrind error after switching to my linked list, but this time I was sure my code works as intended. Only then I realized that That said I currently cannot say with certainty whether the ruby array approach is correct or not. Personally, I prefer the linked lists because it avoids the Data_Wrap_Struct / Data_Get_Struct hassle and lifetimes are explicit. If, however, the linked lists are not your taste, I would be willing to rewrite this to ruby arrays and try again with valgrind. You may wonder why I packed all the VALUEs in C-Structs instead of accessing them with |
Objects won't move. I'm working on a patch for the GC that will allow them to move, but it is conservative: anything marked with
Ruby scripts can only get access to instance variables that start with an "@", but the C API allows you to use any name you wish. It's probably not documented, but
I think using Arrays and IVs would drop the patch size so I'd prefer them. If fighting with the GC turns out to be too much hassle, I'd take this patch too. Again, thanks for the work on this. I appreciate it! |
For the upcoming tests, I need a third column in the 'foo' table -- or alternatively a second table. Extending the existing table in `test_integration.rb` let some other tests fail. Given that `test_integration.rb` is already lengthy, I thought it would make things clearer if I separate the aggregate tests into their own file with their own setup/teardown methods.
Added tests to demonstrate three issues with the way sqlite3-ruby currently handles aggregate functions: (1) Defining a second aggregate function and using both results in memory violation. The problem is that `rb_iv_set(self, "@agregator", aggregator);` (database.c:399) just overwrites the last reference to the first AggregateHandler, but SQLite still holds a pointer to it and will follow the pointer if the first aggregate function is used in a query. The most straight-forward fix is to insert the aggregator in a ruby array and never release it -- similar to how its done for functions. (2) Using the same aggregate function twice in a query mixes up values from both columns. For example: `SELECT MYAGGREG(a), MYAGGREG(b) FROM foo;` Leads to: Myaggreg.step(a1); Myaggreg.step(b1); Myaggreg.step(a2); Myaggreg.step(b2); ... ; Myaggreg.finalize(), Myaggreg.finalize() The SQLite API expects the caller to differentiate between these chains of invocation via `sqlite3_aggregate_context()`, but current sqlite3-ruby does not account for that. #44 has been a work around for this in the special case that the first aggregation is finalized before the second started (separate queries). #161 does the analog for the function Proxy. (3) Documentation implies that library users shall explicitly set the arity of the function. Consequently the library user is likely to expect that this is passed down to SQLite, so he may define multiple functions with the same name but different arity and SQLite to use the most appropriate one (documented sqlite feature). Unfortunately, sqlite3-ruby does not pass the arity to SQLite, which is surprising given that different arity-values are accounted for in the C code. The problem is that sqlite3-ruby does not call the AggregateHandlers "arity" method but instead queries the arity of the "step" method from Ruby (sqlite3_obj_method_arity). Confusingly, this is not the "step" provided by the library user, but the "step" of the Proxy Wrapper classes introduced by Database.create_aggregate or Database.create_aggregate_handler. Both of these are just `step(*args)`, so the arity reported to SQLite is always -1. Things not addressed: - #164 impossible to call FunctionProxy.set_error (and FunctionProxy.count). because @driver does not exist (anymore) - Database.create_aggregate and the tests contain a 'text_rep' field, that is never passed down to SQLite. It is not advertised in the documentation, though.
To manage the active references to the aggregate handler and its instances, I need some list/array/vector. I have used Ruby's array for that, but wrapping the rb_sqlite3_aggregator_wrapper with a new, hidden Ruby class turned out more complex than I initially expected. Especially the undefined finalization order of the GC posed problems on shutdown. Consequently I decided that it would be better to handle the datastructures completely in C there I have full control over the destruction order.
As I already wrote in an earlier commit message, the memory access violation occurs after the library user installs a second aggregate handler. In that case, Database.@aggregate is overwritten and therefore no references to the AggregateHandler remain. The simplest fix would be to change @Aggregator (or @agregator) into a Ruby array of all AggregateHandlers ever installed. I decided against that because I think its cleaner when SQLite controls the lifetime of the handlers. To that end the sqlite developers extended the sqlite3_create_function API with the xDestroy callback. The extended "v2" Version is available since 3.7.3 (2010-10-08). I installed a fallback to the old version, should the new function be unavailable. While it is possible to remove elements from an @Aggregator ruby array while in xDestroy callback, this was clumsy and difficult to do correctly under undefined finalization order. I came to the conclusion that its easier to manage the lifetime completely in C and only provide the appropriate rb_gc_mark function to SQLite3::Database. The remaining issue is to make the VALUEs for the AggregateHandlers (and later their instances) visible to the mark function. To that end, I introduced the aggregator_wrapper struct that holds the VALUE and the linked list pointers. I replaced @Aggregator with the _sqlite3Ruby.aggregators linked list head. I used valgrind to check that memory accesses are sane now. Before this patch, I saw loads from previously free'd memory that disappeared. There are still lots of warnings about use initialized values, but this is unrelated and most likely an internal issue with MRI 2.0.0. As part of this patch, I moved the aggregate function related code into its own files "aggregator.[ch]".
I missed this in the initial set of unit tests. #44 handles this case already. Keep in mind though that is just the effect of the order in which SQLite calls the step() and finalize() functions and could theoretically change.
This is clearly an array of VALUE, not of VALUE*. As it should hold sizeof(VALUE) == sizeof(VALUE*) this does not really matter though.
This commit addresses the issue that using the same aggregate function twice in a select statement mixes up the execution contexts in the AggregateHandler. While resetting the aggregate Handler in the finalize() method emulates the context lifetime in the other cases (group by, independent select), resetting is a fragile approach. Instead sqlite3_aggregate_context() is the function in the SQLite API that should be used to separate contexts. sqlite3_create_context() is a bit weird though: it is memory allocator that returns the same memory when called from a context that already called sqlite3_create_context once. Equally strange is that there is no destroy() callback -- sqlite will clear up the memory when it does not need it any more and won't tell us. Because a second allocator is difficult to combine with Ruby's GC and because I do not have the lifetime control, I use it only to store a pointer to a heap allocated aggregator_instance_t structure. This structure holds the VALUE for the Ruby instance that belongs to the aggregate function's execution context. The structure forms a linked list rooted in the aggregate_wrapper_t for the function it as an instance of, so the instance can be GC marked. SQLite's API left me with no choice but to assume that every execution context ends with a single call to finalize(). At that point I unlink and destroy the aggregate_instance_t. I never observed SQLite doing anything else. There are checks in place to catch if SQLite does call finalize() a second time. If it does not ever call finalize(), we leak memory. If someone wants to use sqlite3_result_error, he or she should check if the invocation of sqlite3_result_error is also the right place to destroy the aggregate_instance_t. This commit comes with one rather big change: now sqlite3-ruby allocates the instances of the aggregate handler in the C code. This forced me to change the interface, because the existing interface passed only the instance object, not the factory, to the C code. To that end, I introduced a new private method Database.define_aggregator2 that takes the factory as it sole parameter. Its like Database.create_aggregate_handler but without the FunctionProxy "ctx" object that serves no useful purpose. I wrote a new Ruby shim for the old Database.define_aggregator and adopted Database.create_aggregate and Database.create_aggregate_handler to the new interface between Ruby and C. Generally I was careful to keep the interface as it is. There is a tiny API change that I could not help avoid: As Database.define_aggregator only receives one object, but it needs one per execution context, the only option i saw was to use the provided one as template and give every new instance a clone. This means, however, that the template must provide a suitable clone. As every ordinary ruby object already provides one, this should not pose problems in practice, though. And of course, people who were getting wrong results from their aggregate functions suddenly get the right one, which may also come as a surprise to them.
Arities outside [-1,127] invoke undefined behavior.
A tiny optimization for the common case that an aggregate function takes only one value.
We call +new+, +step+ and +finalize+ in callback context from sqlite3_step. If one of these function raises, we have to deal with the following issues: - all three functions (may) leak memory from xcalloc in this case - the checks I inserted in 325ac74 result in a rb_fatal() if +new+ raises. - SQLite surely does not expect its callbacks to far-jump out of sqlite3_step. I don't if that causes any inconsistencies in SQLite's internal data structures. We address these issues by protecting these three funcalls with rb_protect, which suppresses the exceptions. Statement.step is the point where suppressed exceptions should be re-raised. As I could not find an easy way to transport the status value from rb_protect to Statement.step (sqlite3_set_error_code would be possible but fragile), I instead re-raise the exception from rb_errcode(). In principal, this may raise any exception in $!, but I don't see how this would affect normal client code. I also set sqlite3_result_error in finalize to indicate an exception, although the user should see the raised exception instead. It would certainly be more consequent to call sqlite3_result_error as soon as we catch an exception. I guess this would stop SQLite from doing any more useless work and abort the query. But, as I already wrote, I am unsure about the consequences for memory lifetime in that case, so it seems safer to me to defer any error until finalize.
Silence some warnings.
Call StringValueCStr before xcalloc but last of all calls into ruby. This way we do not leak memory should name.to_s raise or return a String with interior null.
This has been undocumented behavior before: while create_aggregate and create_aggregate_handler effectively used an arity of -1, the library user could actually define functions with a different arity by calling define_aggregator directly. I was not aware of this behavior and therefore broke it by forcing the arity to -1. Now I added the required shim and a test.
All data is now managed in plain ruby objects referenced from ruby arrays. Instance variables are prefixed with "-" to make them inaccessible from scripts. I ran into the finalization issue in `deallocate()`: When `deallocate()` calls `sqlite3_close()` the object passed to `sqlite3_create_function` must still be alive. Otherwise the `xDestroy()` handler will attempt to destroy something that is not around any more. Unfortunately, when `deallocate()` is invoked, the GC might already have released the aggregator wrappers, because the SQLite3::Database ruby object is already dead at that point. The only way to circumvent this, that I came up with, would have been to reference the aggregator wrappers from a global root, e.g. by referencing a ruby array with `rb_gc_register_address()`. Given that the aim was to cut code size, I considered it more appropriate to do without the xDestroy callback all together. That is, I call the old sqlite3_create_function API again.
rb_class_new generatos class with undefined allocator in ruby 1.9. Calling Class.new is also a one-liner and works in ruby 1.9 and 2.x.
|
I changed the code to use regular Ruby objects and instance variables. I prefixed the instance variables with "-" istead of "@", to make them inaccessible from Ruby. All references to the objects are now managed in Ruby arrays. That way, I could remove both the linked list and the mark operations. I also had to rebase against master. I ran into the finalization issue in static void deallocate(void * ctx)
{
sqlite3RubyPtr c = (sqlite3RubyPtr)ctx;
sqlite3 * db = c->db;
if(db) sqlite3_close(db);
xfree(c);
}When Given that the aim was to cut code size, I considered it more appropriate to do without the I don't know whether this is future-proof with regards to your moving GC. While I can keep objects alive by referencing them from an array, that does not forbid moving them. I don't currently I stumbled over the sunny/shady distinction. To my understanding the objects created in C code are shady because they have been at some point accessible to C. If all shady objects are immovable, this code should be fine. If not, I think the original code would break under a compacting GC the same way as my current code does. If the moving GC issue is not a blocker for you, I am done. |
I admit this is an invasive with a lot of modifications to the C code, but I did not see a simpler way to make aggregate functions work correctly. The following is a high-level overview. If you want to know more details, please read the commit messages.
What is currently wrong with user defined aggregate functions?
Defining a second aggregate function and using both results in memory violation.
Currently sqlite3-ruby just holds a reference
@agregatoron the aggregator defined last. If the user defines two functions with different names, SQLite will hold the callback pointers to both of them but Ruby's GC will clean up the aggregator defined first. If the user subsequently uses the first function in a query, invalid memory access results. I observed crashes and weird behavior in practice.The most straight-forward fix would to insert all aggregators defined in a ruby array and release them only at Database close. This is how it is currently done for (ordinary) functions.
Using the same aggregate function twice in a query mixes up values from both columns.
Assume we have defined an aggregate function "ACCUMULATE" that does the same as SUM:
You would expect
[3, 30]but you actually get[33, nil]. The reason is that sqlite3-ruby does not properly account for individual execution contexts of aggregate functions. This has already raised issues earlier:GROUP BYqueries in the most natural execution order). It will reset the Aggregate Handler (replace its instance with a new one) afterfinalizeThe assumption that SQLite will first bring one execution context to termination in
finalizeand then continue the next execution context is not generally valid. It may be violated by more complexGROUP BYqueries. It is definitely violated in the example above, as SQLite callsstep(1); step(10); step(2); step(20); finalize; finalize.The way of handling separate execution context that is intended by the SQLite API is
sqlite3_aggregate_context()which is a memory allocation function that returns the same memory when called twice from the same context. Admittedly a weird interface. The primary contribution of this pull request is to change sqlite3-ruby's aggregate instance handling to depend onsqlite3_aggregate_context().Arity value is not respected
Documentation implies that library users shall explicitly set the arity of the function. Consequently the library user is likely to expect that this is passed down to SQLite, so he may define multiple
functions with the same name but different arity and SQLite to use the most appropriate one (documented sqlite feature). Unfortunately, sqlite3-ruby does not pass the arity to SQLite, which is surprising given that different arity-values are accounted for in the C code. The problem is that sqlite3-ruby does not call the AggregateHandlers "arity" method but instead queries the arity of the "step" method from Ruby (sqlite3_obj_method_arity). Confusingly, this is not the "step" provided by the library user, but the "step" of the Proxy Wrapper classes introduced by
Database.create_aggregateorDatabase.create_aggregate_handler. Both of these are juststep(*args), so the arity reported to SQLite is always -1.Update: with
Database.define_aggregatorthe library user was able to define functions with arity other than -1, although this was undocumented behavior.Exceptions in callbacks are not appropriately handled
stepandfinalizeexecute in callback context fromsqlite3_step.If one of these function raises, execution far-jumps immediately back to ruby code. SQLite certainly does not expect the callbacks it invokes to do such a thing. I don know if this results an any inconsistencies internal to SQLite. What i know is that theparamsarray allocated byxcallocinstepwon't be released in this case.What does this pull request contain?
sqlite3_aggregate_context()define_aggregator2that is similar tocreate_aggregate_handlerfor that purpose.finalize. Release aggregate handlers viasqlite3_create_function_v2's xDestroy callback if available.arityvalue down to SQLite as the documentation impliesStatement.step.Database.define_aggregatora documented feature.Does this change via API?
I was careful to keep the API as it was and rewrote the
create_aggregateandcreate_aggregate_handlershims to usedefine_aggregator2. I also implemented a ruby shim fordefine_aggregator. The only observable difference that I have not listed as a bug above that I am aware of is that theaggregatorpassed todefine_aggregatormust implementclone, as all regular Ruby objects do. This is because I need to clone it to create its individual instances for the execution contexts.I also have a fallback in place for the old
sqlite3_create_functionshould the new one be unavailable.How is this tested?
I wrote a lot of Unit tests. I also checked memory with valgrind. Valgrind was able to detect the memory violation described above. After my patches only the reads from uninitialized memory warnings remain that ruby seems to cause no matter what it runs. I tested on OSX 10.10 with Apples Ruby 2.0.0 and on ArchLinux with Ruby 2.3.3.
What is not addressed by this patch?