Errors due to Models with Unfolded JSON Columns and a GROUP BY Clause #34930

FilmonK · 2023-10-20T22:11:46Z

Describe the bug

Issues using the Group By clause in conjunction with aggregate functions for JSON columns that have been unfolded.

To Reproduce

Using a postgres table with JSON unfolded.

Save it as a question and Summarize with a Group By clause

This works as expected

The SQL that is generated

SELECT
  (
    "public"."job_data_json"."job_info" # >> array [ 'state' ] :: text [ ]
  ) :: text AS "job_info → state",
  COUNT(*) AS "count"
FROM
  "public"."job_data_json"
GROUP BY
  "job_info → state"
ORDER BY
  "job_info → state" ASC

You are also able to write a native query against the table to get the same results as above.

SELECT
    job_info->>'state' AS state,
    COUNT(*) AS state_count
FROM job_data_json
GROUP BY(job_info->>'state')
ORDER BY 1

Convert that same question into a model and create the same aggregation as above.

The SQL that is generated

SELECT
  ("source"."job_info" # >> array [ 'state' ] :: text [ ]) :: text AS "job_info → state",
  COUNT(*) AS "count"
FROM
  (
    SELECT
      "public"."job_data_json"."id" AS "id",
      (
        "public"."job_data_json"."job_info" # >> array [ 'city' ] :: text [ ]
      ) :: text AS "job_info → city",
      (
        "public"."job_data_json"."job_info" # >> array [ 'salary' ] :: text [ ]
      ) :: double precision AS "job_info → salary",
      (
        "public"."job_data_json"."job_info" # >> array [ 'state' ] :: text [ ]
      ) :: text AS "job_info → state",
      "public"."job_data_json"."job_info" AS "job_info"
    FROM
      "public"."job_data_json"
  ) AS "source"
GROUP BY
  "job_info → state"
ORDER BY
  "job_info → state" ASC

Expected behavior

The expectation is for the aggregation in the model to behavior similar to the question

Logs

No response

Information about your Metabase installation

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "21+35-LTS",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "21",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "21+35-LTS",
    "os.name": "Mac OS X",
    "os.version": "13.5.1",
    "user.language": "en",
    "user.timezone": "America/Chicago"
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "mongo",
      "presto-jdbc",
      "athena",
      "h2"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "15.3"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.5.4"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2023-10-11",
      "tag": "v0.47.4",
      "branch": "?",
      "hash": "c96dc65"
    },
    "settings": {
      "report-timezone": "UTC"
    }
  }
}

Severity

P2 - It's a block for those bringing in JSON with the intention to build models based on that data

Additional context

No response

The text was updated successfully, but these errors were encountered:

kulyk · 2023-11-08T12:08:08Z

Can still reproduce on master. Steps:

Connect a PSQL database with a JSON column
New > Model > Notebook editor > Raw Data > PSQL > Table with JSON column
Add Count aggregation
Breakout by an unfolded JSON column
Save the model
Reload the page
The question errors

ERROR: column source.jobInfo does not exist
  Position: 125

bshepherdson · 2024-02-14T22:19:01Z

I can reproduce this. The trouble seems to be that the nfc_path for the column is set for the field proper, but not present on the result_metadata for a model. This is the field that indicates JSON unfolding; see metabase.driver.postgres calling lib.field/json-field?.

In the legacy QP the `:fields` of the outer query has the ID, `[:field 100 {}]` so the JSON unfolding is written out again. With this change, the `source-alias` is used for columns coming from previous stages. Fixes #34930. Fixes #35636.

…9446) In the legacy QP the `:fields` of the outer query has the ID, `[:field 100 {}]` so the JSON unfolding is written out again. With this change, the `source-alias` is used for columns coming from previous stages. Fixes #34930. Fixes #35636.

…9446) (#39536) In the legacy QP the `:fields` of the outer query has the ID, `[:field 100 {}]` so the JSON unfolding is written out again. With this change, the `source-alias` is used for columns coming from previous stages. Fixes #34930. Fixes #35636. Co-authored-by: Braden Shepherdson <braden@metabase.com>

…9446) In the legacy QP the `:fields` of the outer query has the ID, `[:field 100 {}]` so the JSON unfolding is written out again. With this change, the `source-alias` is used for columns coming from previous stages. Fixes #34930. Fixes #35636.

…stages (#39446)" (#39642) In the legacy QP the `:fields` of the outer query has the ID, `[:field 100 {}]` so the JSON unfolding is written out again. With this change, the `source-alias` is used for columns coming from previous stages. Fixes #34930. Fixes #35636.

FilmonK added Type:Bug Product defects .Needs Triage Priority:P2 Average run of the mill bug Querying/Models aka Datasets labels Oct 20, 2023

FilmonK mentioned this issue Oct 20, 2023

Include JSON data within Sample Database #34931

Open

ranquild added .Wanted: MLv2 Issues that will be fixed (or easier to fix, or possible to fix) when we have MLv2 .Backend labels Oct 31, 2023

kulyk self-assigned this Nov 8, 2023

kulyk removed their assignment Nov 8, 2023

darksciencebase added .Team/QueryProcessor :hammer_and_wrench: and removed .Needs Triage labels Nov 21, 2023

ranquild added Querying/Processor and removed Querying/Models aka Datasets labels Jan 15, 2024

bshepherdson added the Difficulty:Medium label Feb 14, 2024

ignacio-mb added Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness .Escalation and removed Priority:P2 Average run of the mill bug labels Feb 29, 2024

bshepherdson self-assigned this Feb 29, 2024

This was referenced Feb 29, 2024

Distribution on cached models that were originally JSONB does not work #35636

Closed

Query processor on cached models with JSON makes odd choices #35637

Open

bshepherdson mentioned this issue Mar 1, 2024

[QP] Use the alias for unfolded JSON columns from previous stages #39446

Merged

bshepherdson closed this as completed in #39446 Mar 4, 2024

bshepherdson mentioned this issue Mar 5, 2024

backport "[QP] Use the alias for unfolded JSON columns from previous stages (#39446)" to 48 #39642

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Errors due to Models with Unfolded JSON Columns and a GROUP BY Clause #34930

Errors due to Models with Unfolded JSON Columns and a GROUP BY Clause #34930

FilmonK commented Oct 20, 2023

kulyk commented Nov 8, 2023

bshepherdson commented Feb 14, 2024

Errors due to Models with Unfolded JSON Columns and a GROUP BY Clause #34930

Errors due to Models with Unfolded JSON Columns and a GROUP BY Clause #34930

Comments

FilmonK commented Oct 20, 2023

Describe the bug

To Reproduce

Expected behavior

Logs

Information about your Metabase installation

Severity

Additional context

kulyk commented Nov 8, 2023

bshepherdson commented Feb 14, 2024