Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering by a Specific Date in Snowflake will bring inaccurate results #39769

Closed
ignacio-mb opened this issue Mar 7, 2024 · 9 comments · Fixed by #40530 or #41864
Closed

Filtering by a Specific Date in Snowflake will bring inaccurate results #39769

ignacio-mb opened this issue Mar 7, 2024 · 9 comments · Fixed by #40530 or #41864
Assignees
Labels
.Backend .Correctness Database/Snowflake .Escalation Misc/Timezones Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness Querying/Processor Type:Bug Product defects
Milestone

Comments

@ignacio-mb
Copy link
Contributor

Describe the bug

Filtering by "Specific date" filter in Snowflake will bring wrong results

To Reproduce

  1. Create a table in Snowflake with the SQL below the steps
  2. Create a GUI * from table and put it in a dashboard
  3. Add a Relative dates filter and connect it to the card
  4. Select "Specific dates" as a filter, "On" and filter by "November 12th, 2023"
  5. See that the card will show records from November 13th as well
    Screenshot 2024-03-07 at 11 39 29 AM
SQL for creating table and adding data with date time column

CREATE TABLE sample_table (
id INTEGER,
user_name VARCHAR,
created_at TIMESTAMP
);

-- Insert sample data for each hour of each day from November 10 to November 15
INSERT INTO sample_table (id, user_name, created_at)
SELECT
ROW_NUMBER() OVER (ORDER BY day, hour) AS id,
'User_' || ROW_NUMBER() OVER (ORDER BY day, hour) AS user_name,
DATEADD('MINUTE', (hour + day * 24) * 60, '2023-11-10 00:00:00') AS created_at
FROM
(
SELECT 0 AS day UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) AS days
CROSS JOIN
(
SELECT 0 AS hour UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 10 UNION ALL
SELECT 11 UNION ALL
SELECT 12 UNION ALL
SELECT 13 UNION ALL
SELECT 14 UNION ALL
SELECT 15 UNION ALL
SELECT 16 UNION ALL
SELECT 17 UNION ALL
SELECT 18 UNION ALL
SELECT 19 UNION ALL
SELECT 20 UNION ALL
SELECT 21 UNION ALL
SELECT 22 UNION ALL
SELECT 23
) AS hours;

Expected behavior

Should only bring the results from that day

Logs

query in snowflake

SELECT "PUBLIC"."SAMPLE_TABLE_2"."ID" AS "ID", "PUBLIC"."SAMPLE_TABLE_2"."USER_NAME" AS "USER_NAME", "PUBLIC"."SAMPLE_TABLE_2"."CREATED_AT" AS "CREATED_AT" 
FROM "sample"."PUBLIC"."SAMPLE_TABLE_2" 
WHERE ("PUBLIC"."SAMPLE_TABLE_2"."CREATED_AT" >= ?) AND ("PUBLIC"."SAMPLE_TABLE_2"."CREATED_AT" < ?) LIMIT 2000

-- {"pulseId":null,"serverId":"adbf7a69-c037-43f2-9b43-bb98001f852f","client":"Metabase","queryHash":"233e5c2a605bb015ded3bc42a539d1c9607a48eae959eed9d316e10ce199d526","queryType":"query","cardId":114,"dashboardId":35,"context":"dashboard","userId":1,"databaseId":34}

Information about your Metabase installation

- Metabase 48.8

Severity

P2

Additional context

No response

@ignacio-mb ignacio-mb added Type:Bug Product defects Misc/Timezones Querying/Parameters & Variables Filter widgets, field filters, variables etc. Database/Snowflake Querying/GUI Query builder catch-all, including simple mode .Needs Triage labels Mar 7, 2024
@ignacio-mb ignacio-mb added Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness .Escalation and removed .Needs Triage labels Mar 11, 2024
@ignacio-mb
Copy link
Contributor Author

ignacio-mb commented Mar 18, 2024

This appears to happen also in SQL Server. However, I was not able to reproduce. For reference, here is the conversation with the customer.

@harrywynn

This comment was marked as spam.

@camsaul camsaul self-assigned this Mar 19, 2024
camsaul added a commit that referenced this issue Mar 21, 2024
@ranquild ranquild added Querying/Processor .Backend and removed Querying/Parameters & Variables Filter widgets, field filters, variables etc. Querying/GUI Query builder catch-all, including simple mode labels Mar 25, 2024
@camsaul camsaul added this to the 0.49.2 milestone Mar 27, 2024
@Tony-metabase
Copy link
Contributor

@camsaul I don't think this was fixed:

image

I still get the same behaviour in both dashboard and the question ... I am running 1.49.6 and if i convert the question to SQL then it works as expected

image

@Tony-metabase Tony-metabase reopened this Apr 23, 2024
@perivamsi
Copy link
Contributor

Here's a repro on Stats

The same question converted to SQL does not have this issue

@camsaul camsaul removed this from the 0.49.2 milestone Apr 25, 2024
@camsaul
Copy link
Member

camsaul commented Apr 25, 2024

So in the repo on stats, the SQL we're generating is

SELECT
  "PUBLIC"."orders"."id" AS "id",
  "PUBLIC"."orders"."user_id" AS "user_id",
  "PUBLIC"."orders"."product_id" AS "product_id",
  "PUBLIC"."orders"."subtotal" AS "subtotal",
  "PUBLIC"."orders"."tax" AS "tax",
  "PUBLIC"."orders"."total" AS "total",
  "PUBLIC"."orders"."discount" AS "discount",
  "PUBLIC"."orders"."created_at" AS "created_at",
  "PUBLIC"."orders"."quantity" AS "quantity"
FROM
  "v3_sample-dataset"."PUBLIC"."orders"
WHERE
  (
    "PUBLIC"."orders"."created_at" >= '2019-02-11 00:00 -08:00':: timestamp_tz
  )
  AND (
    "PUBLIC"."orders"."created_at" < '2019-02-12 00:00 -08:00':: timestamp_tz
  )
LIMIT
  2000

created_at in this case is a timestamp with time zone so this query appears to be correct with session timezone set to US/Pacific as we do in stats... not sure why it's giving the wrong answers

@camsaul
Copy link
Member

camsaul commented Apr 25, 2024

Ok apparently it's actually running

SELECT
  "PUBLIC"."orders"."id" AS "id",
  "PUBLIC"."orders"."created_at" AS "created_at"
FROM
  "2024_04_25_e15f5997_60ab_4c7d_8fde_164a001da04c_test-data"."PUBLIC"."orders"
WHERE
  ("PUBLIC"."orders"."created_at" >= ?)
  AND ("PUBLIC"."orders"."created_at" < ?)
ORDER BY
  "PUBLIC"."orders"."id" ASC
LIMIT
  1048575

with

{:params (#t "2019-02-11T00:00-08:00" #t "2019-02-12T00:00-08:00")}

and this is definitely not working correctly

@camsaul
Copy link
Member

camsaul commented Apr 25, 2024

So we're passing these parameters as java.sql.Timestamps (because Snowflake JDBC doesn't/didn't support the new java.time classes?) and apparently the Snowflake JDBC driver is ignoring the associated Calendar

TRACE sql-jdbc.execute :: Set param 1 -> #t "2019-02-11T00:00-08:00"
TRACE execute.legacy-impl :: (.setTimestamp 1 ^java.sql.Timestamp #inst "2019-02-11T00:00:00.000000000-00:00" <GMT-08:00 Calendar>)
TRACE sql-jdbc.execute :: Set param 2 -> #t "2019-02-12T00:00-08:00"
TRACE execute.legacy-impl :: (.setTimestamp 2 ^java.sql.Timestamp #inst "2019-02-12T00:00:00.000000000-00:00" <GMT-08:00 Calendar>)

@camsaul
Copy link
Member

camsaul commented Apr 25, 2024

🤦

#11718

So it seems like I already fixed this four years ago (or at least thought I did) but we inadvertently broke the fix at some point in the past because these methods are never actually used, these methods are for :metabase.driver.snowflake/use-legacy-classes-for-read-and-set which is not actually a thing.

https://github.com/metabase/metabase/blob/b93e99d/modules/drivers/snowflake/src/metabase/driver/snowflake.clj#L618-L632

Open question why the #11036 tests are still passing

@camsaul
Copy link
Member

camsaul commented Apr 25, 2024

Apparently I'm a dumb dumb and fixed it for parameters/ZonedDateTime but not for OffsetDateTime or OffsetTime 🤦

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.Backend .Correctness Database/Snowflake .Escalation Misc/Timezones Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness Querying/Processor Type:Bug Product defects
Projects
None yet
7 participants