… introduced in PostgreSQL 9.2beta3 PostgreSQL decided that PQsetRowProcessor was a bad API, and replaced it with PQsetSingleRowMode in 9.2beta3. I agree with the choice, as the new API is simpler and fairly easy to use. Currently, this requires a patch to ruby-pg to recognize PGRES_SINGLE_TUPLE as an OK result status. That should hopefully be committed soon. This also requires the master branch of Sequel, as it depends on some recent refactoring in the Sequel postgres adapter.
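A rough sketch of driving single-row mode from Ruby, using method names from the modern pg gem API (the connection and query are placeholders; at the time of this commit, res.check needed the ruby-pg patch mentioned above before PGRES_SINGLE_TUPLE would pass):

```ruby
require 'pg'

conn = PG.connect(dbname: 'mydb')  # placeholder connection
conn.send_query('SELECT * FROM huge_table')
conn.set_single_row_mode

# Each PGresult now carries at most one row; the final result is an
# empty PGRES_TUPLES_OK marking the end of the query.
while res = conn.get_result
  res.check                 # needs PGRES_SINGLE_TUPLE treated as OK
  res.each { |row| p row }
  res.clear
end
```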
This adds a C-based PostgreSQL array parser. The original implementation is from the pg_array_parser library, but I've heavily modified it. This C-based parser is 5-500 times faster than the pure ruby parser that Sequel uses by default: about 5 times faster for an empty array, and about 500 times faster for an array containing a single 10MB string. Because the pg_array extension can be loaded before or after sequel_pg, handle the case where it is loaded after by switching the Creator class to use the C-based parser instead of the pure ruby parser.
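A minimal usage sketch (the connection URL is a placeholder); whichever order the extensions load in, array values come back through the C parser:

```ruby
require 'sequel'
# sequel_pg is picked up automatically by the postgres adapter when
# the gem is installed.

DB = Sequel.connect('postgres://localhost/mydb')  # placeholder database
DB.extension :pg_array  # load order relative to sequel_pg doesn't matter

# Round-trip an integer array; parsing the result goes through the
# C-based parser.
DB.get(Sequel.pg_array([1, 2, 3], :integer))  # => [1, 2, 3]
```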
PostgreSQL 9.2 adds a new libpq function called PQsetRowProcessor that allows the application to set a function that is called with every row loaded over the socket. This is different from the standard PostgreSQL API, which collects the entire result in memory before returning control to the application. The new API makes it possible to easily process large result sets that would ordinarily not fit in memory.

Integrating this API into sequel_pg wasn't simple. Because you need to set the row processing function before the query executes (and unset it afterward), the dataset needs to pass additional information to the database object indicating that streaming should be used. It captures the block given to fetch_rows and passes both itself and the block to the database.

Because part of the libpq API is tied to row+column indexing into the result set (PGresult), that part can't be reused by the row processing code. So I had to add spg__rp_value, which is more or less a duplicate of spg__col_value, but built on the row processing API. spg__rp_value is probably not going to be as fast, as the row processing API doesn't use NUL-terminated strings, so in many cases spg__rp_value has to create a ruby string where spg__col_value does not. The column info parsing can be reused between the regular and row processing code, so split that into a separate function called spg_set_column_info.

Because sequel_pg needs to work on older libpq versions, add a have_func call to extconf.rb to determine if the row processing API is available. If it is, Sequel::Postgres.supports_streaming? will be true.

The libpq row processing API supports passing information to the function via a void pointer. To make memory management easy, a C struct is initialized on the C stack and a pointer to it is passed to the row processing function. Since the struct lives on the stack, the row processing function must only be called while that memory is valid: control is yielded to the block with an ensure block that unsets the row processing function when the block completes (or raises an error), and the row processing function is reset to the standard one before the C function returns.

This code needs the master branch of Sequel, since it overrides the new Database#_execute method in the postgres adapter. The reason for that refactoring was to enable streaming to work easily with prepared statements.

Currently, streaming is disabled if you attempt to use the map, select_map, to_hash, and to_hash_groups optimizations. In all of these cases, you are returning something containing the entire result set, so streaming shouldn't be important. It is possible to implement streaming support for these optimizations, but it requires a lot of custom code that I don't want to write. As it is, I had to add special support so that streaming worked when the optimize_model_load setting was used.

Streaming is not enabled by default. You have to specifically require sequel_pg/streaming, then extend the database instance with Sequel::Postgres::Streaming, then call Dataset#stream on the dataset you want to stream. Alternatively, you can extend the database's datasets with Sequel::Postgres::Streaming::AllQueries, in which case streaming will be used by default for all queries that are streamable (see the sketches after this entry).
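The extconf.rb check is standard mkmf usage; a minimal sketch (the dir_config name, have_library call, and makefile name are the usual shape for a libpq-linked extension, included here as assumptions):

```ruby
require 'mkmf'

dir_config('pg')   # allow --with-pg-dir etc. to locate libpq
have_library('pq') # link against libpq
# have_func defines HAVE_PQSETROWPROCESSOR when the installed libpq
# exports the row processing API, letting the C code compile streaming
# support conditionally.
have_func('PQsetRowProcessor', 'libpq-fe.h')
create_makefile('sequel_pg')
```

And a sketch of the opt-in usage described above (connection URL and table are placeholders):

```ruby
require 'sequel'
require 'sequel_pg/streaming'

DB = Sequel.connect('postgres://localhost/mydb')
DB.extend Sequel::Postgres::Streaming

if Sequel::Postgres.supports_streaming?
  # Stream one dataset explicitly; rows are yielded as they arrive
  # instead of after the whole result is buffered in memory.
  DB[:huge_table].stream.each { |row| p row }

  # Or make streaming the default for all streamable queries:
  DB.extend_datasets Sequel::Postgres::Streaming::AllQueries
end
```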
…1.9.2 Also, switch the build process to piggyback on ruby-pg's static building.
This makes to_hash_groups around 3 times faster with symbol key and value arguments, and about 50% faster with array key and value arguments, compared to the previous code. Combined with the other optimizations, sequel_pg can speed up to_hash_groups by about 7.5x over the default Sequel code.
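For context, the call forms being optimized (table and columns are placeholders):

```ruby
# Symbol key: group full row hashes by artist_id.
DB[:albums].to_hash_groups(:artist_id)
# => {1 => [{:id => 1, :artist_id => 1, ...}, ...], ...}

# Symbol key and value: group name values by artist_id.
DB[:albums].to_hash_groups(:artist_id, :name)
# => {1 => ["RF", "MO"], 2 => ["EP"], ...}

# Array (composite) key and array value.
DB[:albums].to_hash_groups([:artist_id, :year], [:id, :name])
```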
…stamps This is a fairly slow code path, but since infinite timestamps are rare, that shouldn't matter in practice.
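Assuming this refers to Sequel's convert_infinite_timestamps setting in the postgres adapter (an assumption on my part), the behavior being handled looks like:

```ruby
DB.convert_infinite_timestamps = :string
DB.get(Sequel.cast('infinity', :timestamp))   # => "infinity"

DB.convert_infinite_timestamps = :nil
DB.get(Sequel.cast('-infinity', :timestamp))  # => nil

DB.convert_infinite_timestamps = :float
DB.get(Sequel.cast('infinity', :timestamp))   # => Infinity
```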
…8-1.9.2 stdlib date library via Rational Unfortunately, with the 1.8-1.9.2 stdlib date class, you can't add a float to a DateTime instance and get microsecond accuracy, even if the float is very small. For a current DateTime instance, the accuracy appears to be +/- 33 usecs. To work around this issue, check whether the 1.8-1.9.2 stdlib date implementation is being used by looking for the @ajd instance variable. If so, use a slower but more accurate Rational-based implementation. While here, register spg_SQLTime and spg_Postgres with the garbage collector to avoid segfaults if the constants are manually undefined. Also, rename a misleading macro to reflect that it uses microseconds, not milliseconds.
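A pure-Ruby sketch of the issue and the Rational workaround (86_400_000_000 is microseconds per day):

```ruby
require 'date'

dt   = DateTime.now
usec = 123_456

# Float math: on the 1.8-1.9.2 stdlib date class this loses precision
# (roughly +/- 33 usecs for a current date), because the tiny float
# gets folded into a large astronomical Julian day value.
imprecise = dt + usec / 86_400_000_000.0

# Rational math: exact, at the cost of slower arithmetic.
precise = dt + Rational(usec, 86_400_000_000)
```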
…off by default This setting is off by default as it isn't completely compatible with Sequel's standard model loading. But it can speed up model loading by 30-50% for simple tables. Basically, it works just like the standard way of returning a hash, except that instead of yielding the hash, a model object is allocated, the hash is assigned as the object's values, and the model object is yielded.
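A sketch of enabling it, assuming the setting is exposed as a dataset accessor named after the commit's optimize_model_load flag (Album is a placeholder model):

```ruby
class Album < Sequel::Model; end

# Hypothetical accessor: opt this dataset into optimized model loading.
ds = Album.dataset
ds.optimize_model_load = true

# Each row is now loaded by allocating an Album instance and assigning
# the values hash directly, instead of yielding a plain hash first.
ds.all
```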
…r_map, and #select_hash Previously, sequel_pg only modified the internals of #fetch_rows, which is the most important method, as all other retrieval goes through it. However, the #map, #to_hash, and related methods are inefficient in Sequel, as they build a temporary hash for each row, only to throw it away. This commit optimizes these methods to avoid creating the temporary hash, by having fetch_rows yield a number of different types (e.g. single values, arrays of values, or only a single hash). In my testing, these optimized methods are about 2x faster than the previous case where only fetch_rows was optimized, with only a very small performance hit for general usage of fetch_rows.
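The optimized call forms, for illustration (table and columns are placeholders):

```ruby
DB[:albums].map(:name)               # => array of single values
DB[:albums].map([:id, :name])        # => array of [id, name] pairs
DB[:albums].select_map(:name)        # SELECTs only the mapped column
DB[:albums].to_hash(:id)             # => {id => row_hash, ...}
DB[:albums].to_hash(:id, :name)      # => {id => name, ...}
DB[:albums].select_hash(:id, :name)  # SELECTs only the two columns
```

In each case, sequel_pg now builds the final structure directly from the PGresult instead of materializing a hash per row.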
This commit deals with the changes made in Sequel commit da78be4bb92c723f1f9d5d9dfee3850b1cd56699, and you must be running Sequel at that commit or later for this new code to work if you select any columns with types that this extension does not handle natively. I'm going to push a change to the gemspec requiring Sequel 3.24.0 after that version is released.
Previously, it was built against PostgreSQL 8.4.4 and had issues with the new blob serialization (hex bytea output) used by default in 9.0.1.
Switch to filling the conversion proc array before iterating over the results, instead of checking, for every column value, whether a conversion proc is available. For speed reasons, don't look up conversion procs for char, varchar, and text columns, since those should be returned as strings unchanged.
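A conceptual Ruby equivalent of the C change, using the ruby-pg result API (the conversion_procs hash, the method name, and the OID list are assumptions for illustration):

```ruby
# conversion_procs: hash mapping type OID => conversion proc.
def each_converted_row(result, conversion_procs)
  textish = [18, 25, 1042, 1043]  # "char", text, bpchar, varchar OIDs
  # Fill the per-column proc array once per result...
  procs = Array.new(result.nfields) do |i|
    oid = result.ftype(i)
    textish.include?(oid) ? nil : conversion_procs[oid]
  end
  # ...then apply it without re-checking types inside the inner loop.
  result.ntuples.times do |r|
    yield(Array.new(result.nfields) do |c|
      v = result.getvalue(r, c)
      procs[c] && v ? procs[c].call(v) : v
    end)
  end
end
```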