Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: Ensure that types are preserved fetching from database #8448

Merged
merged 5 commits into from Nov 9, 2018

Conversation

sminnee
Copy link
Member

@sminnee sminnee commented Oct 4, 2018

This ensures that numeric fields appear in PHP as int/float values
rather than strings, which allows the development of more type-safe PHP
code.

This doesn’t work on the legacy mysql driver and this will now throw
a notice-level error. It requires mysqlnd.

Fixes #7039, and other bugs such as #6709

Also requires silverstripe/silverstripe-postgresql#91

Taken from open-sausages@f4e6aae

@sminnee
Copy link
Member Author

sminnee commented Oct 4, 2018

OK @dhensby @tractorcow I made quite a bit of progress with this once I realised that the underlying issue is that, when using PDO, you're expected to also use their transaction methods.

Unfortunately their transaction methods don't supported nested transactions. So, the current code "fakes" the nested transactions by making any internal transaction a no-op. Right now this is breaking the tests that do things with transactions.

So, there's a questions: how badly do we need nested transactions that allow you to roll-back only the inner transaction? I would probably advocate for a nested transaction model where if an inner transaction had an issue, the top-level transaction would also be rolled back. In real world code where transaction rollback is primarily to prevent data corruption in the case of failures, this should do.

We can then amend our tests to suit this expectation.

Related: does DBAL support nested transactions?

@sminnee
Copy link
Member Author

sminnee commented Oct 8, 2018

OK I've had a look at https://www.doctrine-project.org/projects/doctrine-dbal/en/2.8/reference/transactions.html#transaction-nesting about DBAL, and they do "fake" transaction nesting similar to what I have implemented here for PDODatabase. I'm going to tidy up this implementation so that it matches their semantics and then get our test suite working with that.

@sminnee sminnee force-pushed the int-types branch 2 times, most recently from 402d110 to e051c34 Compare October 9, 2018 04:07
@sminnee
Copy link
Member Author

sminnee commented Oct 10, 2018

Merging silverstripe/silverstripe-postgresql#91 before this and re-executing the tests should get everything to be green.

@tractorcow
Copy link
Contributor

@dhensby and I have both tried turning on native types a few times, and we both got close before we ran into unresolvable roadblocks. I believe there were different issues for each connector type;

I'm of the mind that we may not be able to do this at the connector level. Instead I started to think about doing write casting on the DBField level.

At the moment we do this:


        // Set $this->record to $record, but ignore NULLs
        $this->record = array();
        foreach ($record as $k => $v) {
            // Ensure that ID is stored as a number and not a string
            // To do: this kind of clean-up should be done on all numeric fields, in some relatively
            // performant manner
            if ($v !== null) {
                if ($k == 'ID' && is_numeric($v)) {
                    $this->record[$k] = (int)$v;
                } else {
                    $this->record[$k] = $v;
                }
            }
        }

Perhaps instead we implement a reverse cast (something like, but maybe not exactly, $this->dbOject($k)->setValue($v);) and the specific subfields internally force the correct type before assigning these to the dataobject record.

Fixing this once at the model level (db agnostic) means we don't need to patch each DB connector, which by the way you might find breaks lots of things. :) (e.g. prepared statement caching, segfaults, all the kind of fun things you get when working with db layer).

@tractorcow
Copy link
Contributor

Ah yes, I found my own proof of concept of such an issue a while back.

https://gist.github.com/tractorcow/390c9075020b6e2fdfa807c835403afa

Maybe you can fix it. :)

@sminnee
Copy link
Member Author

sminnee commented Oct 11, 2018

Yeah the big breakthrough was realising that doing transactions without the PDO's built-in transaction methods was a big idea.

I've solved the problem now — it depends on a postgresql PR to get the build passing, but the code is written.

I just need feedback on whether it's solved in the right way.

@sminnee
Copy link
Member Author

sminnee commented Oct 11, 2018

I managed to get MySQL working without requiring post-processing.

For PostgreSQL, I needed to do some post-processing

My instinct is that MySQL is the most commonly used connector and so a little more awkwardness for PGSQL is okay. I haven't tested SQLite yet but I don't see it as insurmountable. We don't currently have a passing test set for SQLite regardless.

@sminnee
Copy link
Member Author

sminnee commented Oct 11, 2018

Aah, I stand corrected: MySQL uses 1/0 in a tinyint to represent a boolean.

I could ensure that booleans are consistently represented as 1/0 rather than true/false. Upon reflection, I think this would be my preference.

Or I could provide some postprocessing of MySQL data, casting any tinyint(1) data to boolean. This would add postprocessing to MySQL where currently none is necessary, though. I'd rather leave MySQL without processing if I can avoid it.

Happy to hear other views? @dhensby @tractorcow

@sminnee
Copy link
Member Author

sminnee commented Oct 16, 2018

OK I've gone with formally declaring booleans to be 1 and 0, following MySQL's lead (which doesn't have a native boolean type)

@sminnee
Copy link
Member Author

sminnee commented Oct 16, 2018

@robbieaverill @kinglozzer might be good to broaden the review pool for this? See also silverstripe/silverstripe-postgresql#91

@kinglozzer
Copy link
Member

I’ll try to find time to do a proper code review later, immediate thoughts/queries:

My preference would be for booleans to represented as actual booleans, but probably at a higher (likely DBField) level. i.e. an SQLSelect might return a 0 or 1, but $myDataObject->BooleanField would return an actual boolean. That can always come after this PR of course :)

//This page should NOT exist, we had 'read only' permissions
$this->assertFalse(is_object($fail) && $fail->exists());
} else {
$this->markTestSkipped('Current database is doesn\'t support transactions');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably do an early return instead to save all the indentation

@sminnee
Copy link
Member Author

sminnee commented Oct 18, 2018

I’m not familiar with the finer points of transactions and what our API currently supports, but is this acceptable for a mid-4.x-release-line change? https://github.com/silverstripe/silverstripe-framework/pull/8448/files#diff-517cba5a1e63143651649368ef337664R333

The nature of this changes is that it moves away from genuine support for nested transactions to simulated nested transactions, which means that child transactions can't be rolled back without rolling back the entire transaction, which is essentially the same behaviour as savepoints.

This is because PDO is dislikes running transaction-control statements as regular queries; it really wants you to use its own transaction control commands instead. Specifically, they cause bugs when running unbuffered queries, and unbuffered queries are necessary to get correct type data.

The two broad paths that I considered were:

  • Deprecate 'true' nested queries
  • Deprecate PDO

I'm not joking about the 2nd one. Although on the surface of it using a built-in abstraction library is "better", in practise, PDO has been a deeply disappointing system at many turns, providing underwhelming lowest-common-denominator functionality and confusing bugs and inconsistencies.

However, I've opted for the first solution in this PR, mainly because it matches what we would need to do in order to move to DBAL, which Dan has previously raised as an option for the future. So, PDO lives on. ;-)

My preference would be for booleans to represented as actual booleans, but probably at a higher (likely DBField) level. i.e. an SQLSelect might return a 0 or 1, but $myDataObject->BooleanField would return an actual boolean. That can always come after this PR of course :)

Yeah, we can't easily do this as the database level because MySQL doesn't return "type = boolean" data. Creating a DataObject-level casting of values to booleans would require the instantiation of a whole lot of DBBoolean objects and I'm worried about the performance impact, so I've stayed away from that.


/**
* @inherit
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can omit these, the @inheritDoc tag on its own is functionally equivalent to not having a doc block at all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does removing this cause more strict linters to bark, though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes the slevomat standard to, yeah, but that'd be an issue to raise with them - this is fine in terms of the (proposed) PSR-5 spec and how PHPDocs get rendered =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, sounds like that standard prevents you from doing this anyway: slevomat/coding-standard#399

src/ORM/Connect/PDOConnector.php Show resolved Hide resolved
* Driver
* @var string
*/
private $driver = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use protected over private

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily disagree, although I should point out a previous discussion where you suggested that protected variables are part of our semver-governed "Public API". I'm worried about doing both.

I think that we should either treat protected properties that aren't explicitly documented as public API as "internal", or we use privates more often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess it's a topic for another conversation, but at the moment our codebase is almost all protected over private so I guess we should keep it consistent?


/**
* @inherit
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove


/**
* @inherit
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

src/ORM/Connect/MySQLDatabase.php Outdated Show resolved Hide resolved
src/ORM/Connect/MySQLDatabase.php Outdated Show resolved Hide resolved
src/ORM/Connect/MySQLDatabase.php Show resolved Hide resolved
.travis.yml Show resolved Hide resolved
@sminnee
Copy link
Member Author

sminnee commented Oct 18, 2018

Somehow preserve the existing transaction behaviour - in a “legacy transaction manager” perhaps?

Yeah, we could potentially provide support for savepoint-based transactions on

  • Non-PDO connectors
  • PDO connectors with a "legacy" flag enabled, which would break all the type-correctness stuff

I could do this with a SavepointManager interface as a companion to TransactionManager, or something.

@sminnee
Copy link
Member Author

sminnee commented Oct 18, 2018

@kinglozzer this c34c73f restores savepoint functionality, but if you want to use it on a PDO database, then you need to set this yaml config:

SilverStripe\ORM\Connect\PDOConnector:
  legacy_types: true

In effect it reverses the effect of this PR.

@kinglozzer
Copy link
Member

Awesome, thanks @sminnee

Copy link
Member

@kinglozzer kinglozzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments, but I’m happy with the approach. Can you add some upgrade docs about this change?

@@ -22,16 +22,52 @@ class PDOQuery extends Query
* Hook the result-set given into a Query class, suitable for use by SilverStripe.
* @param PDOStatement $statement The internal PDOStatement containing the results
*/
public function __construct(PDOStatement $statement)
public function __construct(PDOStatement $statement, PDOConnector $conn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is DB::get_conn()->getDriver() an acceptable alternative to changing this method signature? I don’t feel too strongly about it given this is fairly internal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It plugs back to global state; in my view this approach is better. The alternative would be to create PgPDOQuery extends PDOQuery and put this functionality there...

foreach ($columnMeta as $i => $meta) {
// Coerce floats from string to float
// PDO PostgreSQL fails to do this
if (isset($meta['native_type']) && preg_match('/^float/', $meta['native_type'])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this will be called on an unknown number of rows for each query, could/should we replace this with strpos($meta['native_type'], 'float') === 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that preg_match is the fastest match. But we could also potentially get a list of all the options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand corrected:

php test-preg.php 
preg... 3.5830228328705
strpos... 1.2603948116302
in_array... 1.4446079730988
<?php

define('LOTS', 10000000);

profile('preg', function() {
  for ($i=0; $i<LOTS; $i++) {
    preg_match('/^float/', 'float8');
    preg_match('/^float/', 'float16');
    preg_match('/^float/', 'varchar');
    preg_match('/^float/', 'int');
  }
});

profile('strpos', function() {
  for ($i=0; $i<LOTS; $i++) {
    strpos('float8', 'float');
    strpos('float16', 'float');
    strpos('varchar', 'float');
    strpos('int', 'float');
  }
});

profile('in_array', function() {
    $array = ['float8', 'float16', 'float4'];
  for ($i=0; $i<LOTS; $i++) {
    in_array('float8', $array);
    in_array('float16', $array);
    in_array('varchar', $array);
    in_array('int', $array);
  }
});

function profile($label, $callback) {
    echo $label . "... ";
    $t1 = microtime(true);
    $callback();
    $t2 = microtime(true);
    echo ($t2-$t1) . "\n";
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly in_array is faster once you add the condition === 0 to strpos and over various versions of PHP: https://3v4l.org/dVGKg

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still pretty close; i'm going with === 0 as it reduces the risk that i'm missing an entry in my array ;)

preg... 4.3494780063629
strpos... 1.7755749225616
in_array... 1.6038949489594

/**
* Fetch a record form the statementw ith its type data corrected
* Necessary to fix float data retrieved from PGSQL
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing param/return types

@sminnee
Copy link
Member Author

sminnee commented Nov 5, 2018

OK @kinglozzer I've rebased and addressed a couple of points of your feedback, but left the changes to PDOQuery::__construct()'s method signature in place, as I thought that avoiding access to global state was more important than preserving this method signature. Notably, doing DB::get_conn() would block the creation of more than on dbConn object which is currently possible.

@sminnee
Copy link
Member Author

sminnee commented Nov 5, 2018

It would be cool if we could merge silverstripe/silverstripe-postgresql#91 so that I can retrigger these tests and get a green build before we merge.

@kinglozzer
Copy link
Member

Thanks @sminnee! I’ve merged silverstripe/silverstripe-postgresql#91 and will restart the build, if you can write up some brief docs then I think we’re good to merge

@sminnee
Copy link
Member Author

sminnee commented Nov 5, 2018

Cool, so regarding docs, does this make sense?

Anything else?

@sminnee
Copy link
Member Author

sminnee commented Nov 5, 2018

Looks like I need to fiddle with the deprecation notification version in testReadOnlyTransaction, too.

@sminnee
Copy link
Member Author

sminnee commented Nov 5, 2018

@dhensby @tractorcow and the rest of the @silverstripe/core-team, FYI I've implemented what was previously stated as impractical, this is your last chance to review before it likely gets merged.

@kinglozzer
Copy link
Member

I think that just about covers it @sminnee, maybe a quick note in the changelog linking to the “PHP Types” section would be good too. Will leave this a few days anyway to give others the chance to review 👍

Sam Minnee added 2 commits November 9, 2018 10:31
This ensures that numeric fields appear in PHP as int/float values
rather than strings, which allows the development of more type-safe PHP
code.

This doesn’t work on the legacy mysql driver and this will now throw
a notice-level error. It requires mysqlnd.
sminnee pushed a commit to silverstripe/silverstripe-postgresql that referenced this pull request Nov 8, 2018
This test has been added for all database types in framework
in silverstripe/silverstripe-framework#8448
@sminnee
Copy link
Member Author

sminnee commented Nov 8, 2018

OK @kinglozzer and @silverstripe/core-team - docs added, tests passing, ready to merge for 4.4.x-dev from my perspective!

@kinglozzer kinglozzer merged commit b3c4c61 into silverstripe:4 Nov 9, 2018
@kinglozzer
Copy link
Member

🎉 nice work @sminnee!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants