In no particular order:

* MySQL 4.1 introduces a number of enhancements that we should
   incorporate into RMySQL. E.g.,
    (a) a boolean type (finally!) -- we'll need to get
        rid of the coercion to integers in mysqlWriteTable().
    (b) prepared statements, which we could implement in RMySQL
        with a modest effort by mimicking the implementation in
        ROracle.
    (c) multiple SQL statements per call (what we call "scripts").
        For this we'll need to define in DBI a method like
        "nextResult" to move over a collection of result sets
        created by a "script".

* Embedding MySQL. Add code to handle the embedded MySQL (this
   should be pretty easy). This does bring up a configuration
   issue. We may be able to easily define a macro based on which
   library we're linking against (libmysqlclient or libmysqld).
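
A rough sketch of what such a macro could look like. The name
RMYSQL_EMBEDDED and both helper functions are hypothetical (they are
not existing RMySQL or MySQL symbols); the assumption is that the
configure script would pass -DRMYSQL_EMBEDDED when linking against
libmysqld and nothing otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RMYSQL_EMBEDDED is a hypothetical macro, assumed to be defined by the
 * configure script only when the package is linked against libmysqld. */
#ifdef RMYSQL_EMBEDDED
#define RMYSQL_LIBRARY_NAME "libmysqld"
#else
#define RMYSQL_LIBRARY_NAME "libmysqlclient"
#endif

/* Hypothetical helper: name of the library this build was configured for. */
const char *rmysql_library_name(void)
{
    return RMYSQL_LIBRARY_NAME;
}

/* Hypothetical helper: returns 1 if this build targets the given library,
 * 0 otherwise. Embedded-only code paths could be guarded with it. */
int rmysql_links_against(const char *lib)
{
    assert(lib != NULL);
    return strcmp(lib, RMYSQL_LIBRARY_NAME) == 0;
}
```

Compiled without the define, rmysql_links_against("libmysqlclient")
returns 1, so the embedded-only branches would be skipped.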
   
* Transactions have been available since 4.0 (and optionally before
   then). We need to implement them in RMySQL.

* dbApply. Move the dbApply C code into the fetch C code. Extend
   dbApply to work with multiple fields and report which field(s)
   has/have changed group.
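
One possible shape for the multi-field group check, as a sketch only.
The function name changed_group_fields, the bitmask result, and the
representation of a row as an array of non-NULL strings are all
assumptions made for illustration; this is not the existing dbApply
code, and SQL NULL values would need extra handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of RMySQL: compare the grouping fields of
 * the previous and the current row. Rows are assumed to be arrays of
 * non-NULL, NUL-terminated strings. Bit i of the result is set when the
 * field group_cols[i] differs between the two rows; a result of 0 means
 * the current row is still in the same group. */
unsigned int changed_group_fields(const char *const *prev_row,
                                  const char *const *cur_row,
                                  const int *group_cols, size_t n_group)
{
    unsigned int changed = 0u;
    size_t i;

    assert(prev_row != NULL && cur_row != NULL && group_cols != NULL);
    assert(n_group <= 8 * sizeof changed);   /* one bit per grouping field */

    for (i = 0; i < n_group; i++) {
        int col = group_cols[i];
        if (strcmp(prev_row[col], cur_row[col]) != 0)
            changed |= 1u << i;
    }
    return changed;
}
```

The fetch loop would call this once per row and invoke the group-end
callback whenever the result is non-zero, passing the mask along so
the R/S side can tell which field(s) changed.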

* Allow users to specify whether to transfer all the data to the
   MySQL client or not (say, we could add a cache=T/F argument to
   fetch). Currently when we execute a query we leave all
   the data in the MySQL server and fetch row by row. This has
   performance implications (dbExec returns quickly but we incur many
   tcp round trips), and we run the risk of the server dropping some
   of the output records. We could instead have the server send the
   entire result set to the MySQL client, and have the MySQL library
   (not R/Splus) cache the result set for us (this can cause a big
   increase in the amount of memory that R/Splus is using, although
   R itself is not managing the MySQL result set cache proper).
   Some advantages of this approach are that small and medium-size
   result sets could be sped up quite a bit, and the R/S code that
   builds the data.frame where the output goes could allocate it
   in one go rather than growing it dynamically.
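
To make the allocation argument concrete, here is a small sketch that
counts how many allocations an output buffer needs when the row count
is unknown and the buffer doubles each time it fills. The function name
and the doubling policy are assumptions for illustration, not how
RMySQL actually grows its output. With a client-side cache the count is
known up front (in the MySQL C API, mysql_num_rows() is usable once
mysql_store_result() has buffered the result), so a single allocation
would be enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: number of allocations (the initial one plus every
 * reallocation) needed to hold n_rows when the total is not known in
 * advance and the capacity doubles from initial_capacity as it fills. */
size_t allocations_when_growing(size_t n_rows, size_t initial_capacity)
{
    size_t capacity = initial_capacity;
    size_t allocations = 1;   /* the initial allocation */

    assert(initial_capacity > 0);

    while (capacity < n_rows) {
        capacity *= 2;        /* grow, copying the rows fetched so far */
        allocations++;
    }
    return allocations;
}
```

For 1000 rows and an initial capacity of 100 this model needs five
allocations (100, 200, 400, 800, 1600), against one when the size is
known before fetching starts.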

* Also, we should allow the user to specify whether or not to lock
   the tables we're querying.

* Need to add a test for libz to the configure script.

* Need to look at asynchronous fetches (this brings the issue of
   threads and thread safety).

* Move the dbApply code to its own file dbApply.c so that we can
   conditionally compile RMySQL on Splus (currently we abort because
   I haven't abstracted out the callback mechanism).

* Should the default for fetch(rs, n = 0) be n = -1? The issue
   is safety (chunking) vs. convenience (for small data sets).
   A possibility would be to set it to -1, but define an absolute threshold
   "max.records" that would stop the fetching at that value
   and issue a warning message.
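
The threshold idea could be sketched as below. The function
rows_to_fetch and its parameters are hypothetical and only model the
decision of how many rows to return; the sketch covers n = -1 and
positive n, and leaves out how n = 0 should map to a default chunk
size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch, not RMySQL code: decide how many rows one fetch
 * call should return.
 *   n            requested rows; -1 means "everything that is left"
 *   available    rows still pending in the result set
 *   max_records  absolute cap (the "max.records" threshold)
 *   truncated    set to 1 when the cap cut an n = -1 fetch short */
long rows_to_fetch(long n, long available, long max_records, int *truncated)
{
    long want;

    assert(n == -1 || n > 0);
    assert(available >= 0 && max_records > 0 && truncated != NULL);
    *truncated = 0;

    if (n == -1) {
        want = available;
        if (want > max_records) {
            /* Safety net: stop at the cap and tell the user. */
            want = max_records;
            *truncated = 1;
            fprintf(stderr, "warning: fetch stopped at max.records = %ld; "
                            "more rows are pending\n", max_records);
        }
    } else {
        /* An explicit request is honoured as far as rows are available. */
        want = n < available ? n : available;
    }
    return want;
}
```

In this sketch an explicit n is never capped, on the assumption that a
user who asks for a specific count has already accepted its size.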

* Benchmark batch size and exec time...
