Anonymizing error if there is a JSONB column in a table #12

Closed
koptelovav opened this issue Mar 6, 2020 · 2 comments

Comments

@koptelovav
Contributor

I have a strange error:

pganonymizer.exceptions.BadDataFormat: invalid input syntax for type json
DETAIL:  Token "'" is invalid.
CONTEXT:  JSON data, line 1: {'...
COPY source, line 29, column ui_settings: "{'firstTime': True}"

YAML file:

tables:
  - accounts:
      fields:
        - name:
            provider:
              name: fake.name
        - email:
            provider:
              name: fake.email
        - phone:
            provider:
              name: fake.phone_number
        - title:
            provider:
              name: choice
              values:
                - "Mr"
                - "Mrs"
                - "Dr"
                - "Prof"
                - "Ms"

truncate:
  - django_session

ui_settings column values:
{"firstTime": true, "licenseBannerHasBeenShown": true}
{"firstTime": true}
{}
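The string in the COPY error, {'firstTime': True}, looks like the Python repr of a dict rather than JSON text. A minimal sketch of that difference, assuming the driver has already decoded each json/jsonb value into a Python object (psycopg2 does this by default) and that the value is later turned into text with str(). The helper is_valid_json is hypothetical and only used for illustration:

```python
import json

# The three ui_settings values quoted above, as stored in the column.
stored = [
    '{"firstTime": true, "licenseBannerHasBeenShown": true}',
    '{"firstTime": true}',
    '{}',
]

# Assumption: the driver hands each value back as a Python dict.
decoded = [json.loads(text) for text in stored]

# str() on a dict yields Python repr syntax (single quotes, True),
# which matches the value shown in the COPY error.
as_repr = str(decoded[1])
print(as_repr)


def is_valid_json(text):
    """Hypothetical helper: report whether text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(is_valid_json(as_repr))                 # repr text is rejected
print(is_valid_json(json.dumps(decoded[1])))  # json.dumps output is accepted
```

Under that assumption the YAML file itself would not be at fault: ui_settings is not listed in it, and the column would break only because its values are written back as text.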

What am I doing wrong?

@hkage
Contributor

hkage commented Mar 6, 2020

Thank you for your feedback.

I haven't actually tested anonymization on JSON-based fields. I need to investigate further with a test setup first.

The error itself is thrown when the anonymizer writes the content of a table into a binary CSV stream and then copies the data into a temporary table, using psycopg2's cursor.copy_from method. Maybe the JSON syntax breaks the streamed data.
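One possible direction, sketched under assumptions: serialize dict and list values with json.dumps before they are written to the stream, so the temporary table receives valid JSON text. The names to_copy_text and rows_to_stream are hypothetical and are not pganonymizer's actual code. The tab-separated layout with \N for NULL follows PostgreSQL's COPY text format, which is an assumption about what the stream should look like:

```python
import io
import json


def to_copy_text(value):
    """Hypothetical helper: render one value for PostgreSQL's COPY text format."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    if isinstance(value, (dict, list)):
        # Serialize JSON values explicitly instead of relying on str().
        value = json.dumps(value)
    text = str(value)
    # Escape the characters that are special in COPY text format.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_stream(rows):
    """Hypothetical helper: write rows into an in-memory, tab-separated stream."""
    buffer = io.StringIO()
    for row in rows:
        buffer.write("\t".join(to_copy_text(v) for v in row) + "\n")
    buffer.seek(0)
    return buffer


# Made-up sample rows: (id, name, ui_settings)
rows = [(29, "Jane Doe", {"firstTime": True}), (30, None, {})]
stream = rows_to_stream(rows)
print(stream.getvalue())
```

A stream built this way is a file-like object, so it could be handed to psycopg2's cursor.copy_from; that step is left out here because it needs a live database.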

I will take a look at this.

@hkage
Contributor

hkage commented Jan 25, 2021

Hi, I am sorry I haven't had the time to fix this yet, mainly because we have not been able to use pganonymizer for our production databases so far. Our PostgreSQL versions currently don't support JSON columns natively, but I will try to take a closer look at this issue.

Otherwise, if you have the time and an idea for how to fix it, I would appreciate a contribution at any time.
