Ensure MSSQL engine treats all float-like values as float, not decimal #1088

rebkwok · 2023-02-28T13:40:17Z

Adds a custom Float type for the mssql engine, which ensures that all bound parameters are cast to float.

I haven't added the spec test that fails in #1065, because it doesn't seem like a useful test to be included in the documentation, but I have verified that it passes with this change.

evansd

I agree that this shouldn't have a spec test, but I think it might be worth an integration test (along the lines of the one we added to test the date type issue earlier) to enforce this behaviour.

evansd · 2023-02-28T14:49:10Z

databuilder/query_engines/mssql_dialect.py

+        # This ensures that any float-like numbers are always cast to
+        # floats, otherwise MSSQL will sometimes make a best-guess and return a
+        # Decimal type, which behaves differently to other engine.
+        return cast(bindvalue, type_=self)


I didn't initially think of doing this via CAST, although this ought to work fine.

For what it's worth, MSSQL has a way of expressing float literals using scientific notation:
https://learn.microsoft.com/en-us/sql/t-sql/data-types/constants-transact-sql?view=sql-server-2017#float-and-real-constants

So I think we could do this by adding a process_bind_param() method which formats the Python float as a string and then sticks E0 on the end (if the string doesn't contain E already).

I think this would result in slightly cleaner SQL, but there's probably not much practical difference.

@evansd I've tried this, and I couldn't get it to work. If I format it as a string (in process_bind_param()), it complains about types:

sqlalchemy.exc.OperationalError: (pymssql._pymssql.OperationalError) (8117, b'Operand data type nvarchar is invalid for sum operator.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n') [SQL: SELECT sum(%(sum_1)s) + %(sum_2)s AS anon_1] [parameters: {'sum_1': '0.5E0', 'sum_2': '0.25E0'}]

If I return it as the literal 0.5E0 (couldn't actually figure out how to do that, but I made it return the constants as a test), it still ends up with the truncated value (I guess because it means the same as 0.5, so it hasn't actually been converted to anything different).

Ah, yeah I see. I guess there's no way round that then given the limitations of pymssql. Thanks for investigating.

rebkwok · 2023-02-28T15:25:11Z

I agree that this shouldn't have a spec test, but I think it might be worth an integration test

@dave I tried to add an integration test similar to the date one you pointed at, but I don't think I'm writing it correctly. I thought this was doing the equivalent of the spec test, but it's passing without the new Float type:

def test_float_literals_have_correct_type(mssql_engine):
    case_statement = sqlalchemy.case(
        (sqlalchemy.literal(2) == 1, 0.1), else_=0.75
    )
    query = sqlalchemy.select(case_statement.label("output"))
    with mssql_engine.sqlalchemy_engine().connect() as conn:
        results = list(conn.execute(query))
    assert results[0].output == 0.75

evansd · 2023-02-28T17:28:19Z

@rebkwok After a spectacular amount of wild-goose-chasing I have reduced the failing test case down to this:

def test_float_literals_have_correct_type(mssql_engine):
    sum_literal = sqlalchemy.func.sum(0.5)
    query = sqlalchemy.select(sum_literal + 0.25)
    with mssql_engine.sqlalchemy_engine().connect() as conn:
        results = list(conn.execute(query))
    assert results[0][0] == 0.75

It was SUM that was doing the work all along here, not CASE.

evansd · 2023-03-01T10:12:25Z

tests/integration/query_engines/test_mssql_dialect.py

+def test_float_literals_have_correct_type(mssql_engine):
+    sum_literal = sqlalchemy.func.sum(0.5)
+    query = sqlalchemy.select(sum_literal + 0.25)
+    with mssql_engine.sqlalchemy_engine().connect() as conn:
+        results = list(conn.execute(query))
+    assert results[0][0] == 0.75


Just based on the confusion we had recently when doing some test archaeology, I think we should add some more comments here to explain what this otherwise very weird looking test is doing. Something like:

Suggested change

def test_float_literals_have_correct_type(mssql_engine):

sum_literal = sqlalchemy.func.sum(0.5)

query = sqlalchemy.select(sum_literal + 0.25)

with mssql_engine.sqlalchemy_engine().connect() as conn:

results = list(conn.execute(query))

assert results[0][0] == 0.75

def test_float_literals_have_correct_type(mssql_engine):

# When using the `pymssql` driver without special float handling the "0.5" below

# gets typed as a decimal and then the result of SUM gets typed as fixed precision

# decimal.

sum_literal = sqlalchemy.func.sum(0.5)

# When added to a decimal of greater precision, the result gets rounded and ends up

# being 0.8

query = sqlalchemy.select(sum_literal + 0.25)

with mssql_engine.sqlalchemy_engine().connect() as conn:

results = list(conn.execute(query))

# By explicitly casting floats in our custom dialect we can get the correct result

assert results[0][0] == 0.75

Ah, I merged too soon. I'll add this

rebkwok changed the title ~~feat: Custom MSSQL Float type~~ Ensure MSSQL engine treats all float-like values as float, not decimal Feb 28, 2023

evansd approved these changes Feb 28, 2023

View reviewed changes

feat: Custom MSSQL Float type

caf7f50

Add integration test for mssql float literals

243495d

rebkwok force-pushed the mssql-float-type branch from 6c7c5e8 to 243495d Compare March 1, 2023 09:35

rebkwok merged commit f9207c0 into main Mar 1, 2023

rebkwok deleted the mssql-float-type branch March 1, 2023 10:11

evansd approved these changes Mar 1, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure MSSQL engine treats all float-like values as float, not decimal #1088

Ensure MSSQL engine treats all float-like values as float, not decimal #1088

rebkwok commented Feb 28, 2023

evansd left a comment

evansd Feb 28, 2023

rebkwok Mar 1, 2023

evansd Mar 1, 2023

rebkwok commented Feb 28, 2023

evansd commented Feb 28, 2023

evansd Mar 1, 2023

rebkwok Mar 1, 2023

Ensure MSSQL engine treats all float-like values as float, not decimal #1088

Ensure MSSQL engine treats all float-like values as float, not decimal #1088

Conversation

rebkwok commented Feb 28, 2023

evansd left a comment

Choose a reason for hiding this comment

evansd Feb 28, 2023

Choose a reason for hiding this comment

rebkwok Mar 1, 2023

Choose a reason for hiding this comment

evansd Mar 1, 2023

Choose a reason for hiding this comment

rebkwok commented Feb 28, 2023

evansd commented Feb 28, 2023

evansd Mar 1, 2023

Choose a reason for hiding this comment

rebkwok Mar 1, 2023

Choose a reason for hiding this comment