Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump not applying transformation with no error emitted #176

Closed
wnatrifork opened this issue Aug 24, 2024 · 13 comments · Fixed by #177 or #178
Closed

Dump not applying transformation with no error emitted #176

wnatrifork opened this issue Aug 24, 2024 · 13 comments · Fixed by #177 or #178
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@wnatrifork
Copy link

It seems possible to encounter a case when despite a transformation is defined it is not applied and no error is emitted. Shouldn't user be warned if a transformer is not being applied?

greenmask_config.yml file:

dump:
  pg_dump_options:
    dbname: host=localhost user=wna dbname=bireporting
    load-via-partition-root: true
storage:
  type: directory
  directory:
    path: /Users/wna/Developer/
  transformation:
    - schema: public
      name: backlog_entry_value
      transformers:
        - name: RandomFloat
          params:
            column: "value"
            min: 0.0
            max: 10.0
            precision: 2
./greenmask dump --config=greenmask_config.yml
./greenmask list-dumps --config=greenmask_config.yml
+---------------+---------------------------+-------------+----------+-----------------+----------+-------------+-------------------+
|      ID       |           DATE            |  DATABASE   |   SIZE   | COMPRESSED SIZE | DURATION | TRANSFORMED |      STATUS       |
+---------------+---------------------------+-------------+----------+-----------------+----------+-------------+-------------------+
| 1724488905081 | 2024-08-24T10:41:45+02:00 | bireporting | 26.5 MiB | 8.5 MiB         | 00:00:00 | false       | done              |
+---------------+---------------------------+-------------+----------+-----------------+----------+-------------+-------------------+

A part of metadata.json file. Notice transformers is null.

{
    "startedAt": "2024-08-24T10:41:45.081384+02:00",
    "completedAt": "2024-08-24T10:41:45.816925+02:00",
    "originalSize": 27828141,
    "compressedSize": 8912949,
    "transformers": null,
    "database_schema":
    [
        {
            "schema": "public",
            "name": "backlog_entry_value",
            "oid": 16409,
            "columns":
            [
                {
                    "idx": 2,
                    "name": "value",
                    "type_name": "numeric(10,2)",
                    "canonical_type_name": "numeric",
                    "type_oid": 1700,
                    "num": 3,
                    "not_null": false,
                    "length": 655366,
                    "type_length": -1,
                    "is_generated": false,
                    "overridden_type_name": "",
                    "overridden_type_oid": 0,
                    "overridden_type_size": 0
                },
    [...]

I was not able to find the root cause of the issue in reasonable time.

@wwoytenko wwoytenko self-assigned this Aug 24, 2024
@wwoytenko wwoytenko added the help wanted Extra attention is needed label Aug 24, 2024
@wwoytenko
Copy link
Contributor

Hi! Thank you for reporting. You've set the transformation in the wrong object (in storage) instead of dump. It should be this way

dump:
  pg_dump_options:
    dbname: host=localhost user=wna dbname=bireporting
    load-via-partition-root: true
  transformation:
    - schema: public
      name: backlog_entry_value
      transformers:
        - name: RandomFloat
          params:
            column: "value"
            min: 0.0
            max: 10.0
            precision: 2
storage:
  type: directory
  directory:
    path: /Users/wna/Developer/

Try it

@wwoytenko
Copy link
Contributor

However, I agree that it would be helpful to report this issue. I'll consider how to fix it.

@wnatrifork
Copy link
Author

Thank you! Such a silly mistake.

Does this mean numeric(precision,scale) type is not supported?

2024-08-24T11:20:11+02:00 ERR ValidationWarning={"hash":"","meta":{"AllowedTypes":["float4","float8"],"ColumnName":"value","ParameterName":"column","SchemaName":"public","TableName":"backlog_entry_value","TransformerName":"RandomFloat","TypeName":"numeric(10,2)"},"msg":"unsupported column type","severity":"error"}
2024-08-24T11:20:11+02:00 FTL cannot make a backup error="context error: fatal validation error"

@wwoytenko
Copy link
Contributor

Yes, it does but only the v0.2b1 version or the current development version in the main. For v0.2b1 version and current main you can use RandomNumeric transformer that has wider configuration options such as decimal.

For the latest version 1.14 you can try to override the type of the table in the config using the next

transformation:
    - schema: public
      name: backlog_entry_value
      columns_type_override:
        value: "numeric" 
      transformers:
        - name: RandomFloat
          params:
            column: "value"
            min: 0.0
            max: 10.0
            precision: 2

The details are here

@wwoytenko
Copy link
Contributor

@wnatrifork FYI ^

@wwoytenko
Copy link
Contributor

I've added a fix in MR #177. So in a new release if the unknown option was provided it will be reported.

@wnatrifork
Copy link
Author

Thank you. I using a compiled version of the tool from main branch:

wna@Wojciechs-MBP: greenmask % ./greenmask --version
greenmask version a2ecc2cdae5af10e9443ccdab61987f31bd3ade6 2024-08-20T11:49:00Z

I have used random numeric transformer as advised:

  transformation:
    - schema: public
      name: backlog_entry_value
      transformers:
        - name: RandomNumeric
          params:
            column: "value"
            min: 0.0
            max: 10.0
            decimal: 2

and received following error when trying to restore:

2024-08-24T12:15:57+02:00 WRN unable to restore table error="error post streaming handling: error from postgres connection: numeric field overflow A field with precision 10, scale 2 must round to an absolute value less than 10^8. (code 22003)"

I converted the dump from directory format to human readable SQL and found:

--
-- Data for Name: "backlog_entry_value"; Type: TABLE DATA; Schema: "public"; Owner: "wna"
--

COPY "public"."backlog_entry_value" ("unique_id", "created", "value", "author_name", "author_email", "backlog_entry_id") FROM stdin
dc673f92-404d-4873-ba36-779da5947748	2020-05-25 08:22:27.179153	-55142080548242225025.74	<redacted_username>	<redacted_email>	33cd6668-6dd7-4059-95a1-e5ae03027c04

value is not kept in <min, max> bounds. This seems to be a bug.

@wwoytenko
Copy link
Contributor

@wnatrifork The MR #177 is merged. I hope this will fix this implicit issue

@wwoytenko wwoytenko reopened this Aug 24, 2024
@wwoytenko
Copy link
Contributor

Thank you for reporting. I will check shortly

@wwoytenko
Copy link
Contributor

@wnatrifork
That's definitely a bug. It will be fixed today/tomorrow.

@wwoytenko wwoytenko added the bug Something isn't working label Aug 24, 2024
@wwoytenko
Copy link
Contributor

@wnatrifork MR #178 should close your issue

@wwoytenko
Copy link
Contributor

@wnatrifork try now and report please

@wnatrifork
Copy link
Author

I confirm RandomNumeric transformer keeps value within defined bounds. Thank you

wwoytenko added a commit that referenced this issue Oct 13, 2024
Now if user provides the unknown key the error will be thrown

Closes #176
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
2 participants