Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors due to Models with Unfolded JSON Columns and a GROUP BY Clause #34930

Closed
FilmonK opened this issue Oct 20, 2023 · 2 comments · Fixed by #39446
Closed

Errors due to Models with Unfolded JSON Columns and a GROUP BY Clause #34930

FilmonK opened this issue Oct 20, 2023 · 2 comments · Fixed by #39446
Assignees
Labels
.Backend Difficulty:Medium .Escalation Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness Querying/Processor .Team/QueryProcessor :hammer_and_wrench: Type:Bug Product defects .Wanted: MLv2 Issues that will be fixed (or easier to fix, or possible to fix) when we have MLv2

Comments

@FilmonK
Copy link

FilmonK commented Oct 20, 2023

Describe the bug

Issues using the Group By clause in conjunction with aggregate functions for JSON columns that have been unfolded.

To Reproduce

Using a postgres table with JSON unfolded.

Screenshot 2023-10-20 at 4 26 26 PM

Save it as a question and Summarize with a Group By clause

Screenshot 2023-10-20 at 4 30 52 PM

This works as expected

Screenshot 2023-10-20 at 4 50 57 PM

The SQL that is generated

SELECT
  (
    "public"."job_data_json"."job_info" # >> array [ 'state' ] :: text [ ]
  ) :: text AS "job_info → state",
  COUNT(*) AS "count"
FROM
  "public"."job_data_json"
GROUP BY
  "job_info → state"
ORDER BY
  "job_info → state" ASC

You are also able to write a native query against the table to get the same results as above.

SELECT
    job_info->>'state' AS state,
    COUNT(*) AS state_count
FROM job_data_json
GROUP BY(job_info->>'state')
ORDER BY 1


Convert that same question into a model and create the same aggregation as above.
Screenshot 2023-10-20 at 4 36 48 PM



The SQL that is generated

SELECT
  ("source"."job_info" # >> array [ 'state' ] :: text [ ]) :: text AS "job_info → state",
  COUNT(*) AS "count"
FROM
  (
    SELECT
      "public"."job_data_json"."id" AS "id",
      (
        "public"."job_data_json"."job_info" # >> array [ 'city' ] :: text [ ]
      ) :: text AS "job_info → city",
      (
        "public"."job_data_json"."job_info" # >> array [ 'salary' ] :: text [ ]
      ) :: double precision AS "job_info → salary",
      (
        "public"."job_data_json"."job_info" # >> array [ 'state' ] :: text [ ]
      ) :: text AS "job_info → state",
      "public"."job_data_json"."job_info" AS "job_info"
    FROM
      "public"."job_data_json"
  ) AS "source"
GROUP BY
  "job_info → state"
ORDER BY
  "job_info → state" ASC


Expected behavior

The expectation is for the aggregation in the model to behavior similar to the question

Logs

No response

Information about your Metabase installation

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "21+35-LTS",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "21",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "21+35-LTS",
    "os.name": "Mac OS X",
    "os.version": "13.5.1",
    "user.language": "en",
    "user.timezone": "America/Chicago"
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "mongo",
      "presto-jdbc",
      "athena",
      "h2"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "15.3"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.5.4"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2023-10-11",
      "tag": "v0.47.4",
      "branch": "?",
      "hash": "c96dc65"
    },
    "settings": {
      "report-timezone": "UTC"
    }
  }
}

Severity

P2 - It's a block for those bringing in JSON with the intention to build models based on that data

Additional context

No response

@FilmonK FilmonK added Type:Bug Product defects .Needs Triage Priority:P2 Average run of the mill bug Querying/Models aka Datasets labels Oct 20, 2023
@ranquild ranquild added .Wanted: MLv2 Issues that will be fixed (or easier to fix, or possible to fix) when we have MLv2 .Backend labels Oct 31, 2023
@kulyk kulyk self-assigned this Nov 8, 2023
@kulyk
Copy link
Member

kulyk commented Nov 8, 2023

Can still reproduce on master. Steps:

  1. Connect a PSQL database with a JSON column
  2. New > Model > Notebook editor > Raw Data > PSQL > Table with JSON column
  3. Add Count aggregation
  4. Breakout by an unfolded JSON column
  5. Save the model
  6. Reload the page
  7. The question errors
ERROR: column source.jobInfo does not exist
  Position: 125

@kulyk kulyk removed their assignment Nov 8, 2023
@darksciencebase darksciencebase added .Team/QueryProcessor :hammer_and_wrench: and removed .Needs Triage labels Nov 21, 2023
@ranquild ranquild added Querying/Processor and removed Querying/Models aka Datasets labels Jan 15, 2024
@bshepherdson
Copy link
Contributor

I can reproduce this. The trouble seems to be that the nfc_path for the column is set for the field proper, but not present on the result_metadata for a model. This is the field that indicates JSON unfolding; see metabase.driver.postgres calling lib.field/json-field?.

@ignacio-mb ignacio-mb added Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness .Escalation and removed Priority:P2 Average run of the mill bug labels Feb 29, 2024
@bshepherdson bshepherdson self-assigned this Feb 29, 2024
bshepherdson added a commit that referenced this issue Mar 1, 2024
In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
bshepherdson added a commit that referenced this issue Mar 1, 2024
In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
bshepherdson added a commit that referenced this issue Mar 4, 2024
In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
bshepherdson added a commit that referenced this issue Mar 4, 2024
…9446)

In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
github-actions bot pushed a commit that referenced this issue Mar 4, 2024
…9446)

In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
bshepherdson added a commit that referenced this issue Mar 4, 2024
…9446)

In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
metabase-bot bot added a commit that referenced this issue Mar 4, 2024
…9446) (#39536)

In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.

Co-authored-by: Braden Shepherdson <braden@metabase.com>
bshepherdson added a commit that referenced this issue Mar 5, 2024
…9446)

In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
bshepherdson added a commit that referenced this issue Mar 5, 2024
…stages (#39446)" (#39642)

In the legacy QP the `:fields` of the outer query has the ID,
`[:field 100 {}]` so the JSON unfolding is written out again.

With this change, the `source-alias` is used for columns coming from
previous stages.

Fixes #34930. Fixes #35636.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.Backend Difficulty:Medium .Escalation Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness Querying/Processor .Team/QueryProcessor :hammer_and_wrench: Type:Bug Product defects .Wanted: MLv2 Issues that will be fixed (or easier to fix, or possible to fix) when we have MLv2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants