[core] Added JSON converter in KlioBigQueryEventOutput #165

Merged
econchick merged 1 commit into spotify:master from the fix_bq_event_output branch on Feb 11, 2021

Contributor

@gfalcone gfalcone commented Feb 10, 2021

Description:
When I try to write my pipeline's output to BigQuery, my klio job fails because the BQ schema in the YAML file is not parsed into a dictionary; it stays a string:

```
(klio) DMFR0900:data-thumbnails-classification p.genissel$ klio job run --direct-runner --template execution_date="2021-01-10 00:00:00"
INFO:root:Found worker image: gcr.io/dailymotion-rawlogs/data-thumbnails-classification-worker:4c5784c5-dirty
Traceback (most recent call last):
  File "/usr/local/bin/klioexec", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/klio_core/utils.py", line 240, in wrapper
    func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/klio_exec/cli.py", line 80, in run_pipeline
    if _compare_runtime_to_buildtime_config(klio_config) is False:
  File "/usr/local/lib/python3.6/site-packages/klio_exec/cli.py", line 62, in _compare_runtime_to_buildtime_config
    buildtime_config = config.KlioConfig(_get_config(buildtime_config_path))
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/_utils.py", line 172, in init_from_dict
    self.__config_post_init__(config_dict)
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/core.py", line 51, in __config_post_init__
    self.job_config, job_name=self.job_name, version=self.version,
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/_utils.py", line 172, in init_from_dict
    self.__config_post_init__(config_dict)
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/core.py", line 142, in __config_post_init__
    self._parse_io(config_dict)
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/core.py", line 171, in _parse_io
    io.KlioIODirection.OUTPUT,
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/core.py", line 228, in _create_config_objects
    objs.append(subclass.from_dict(config, io_type, io_direction))
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/_io.py", line 168, in from_dict
    return super().from_dict(copy, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/_io.py", line 98, in from_dict
    return cls(*args, **copy, **kwargs)
  File "<attrs generated init klio_core.config._io.KlioBigQueryEventOutput>", line 14, in __init__
  File "/usr/local/lib/python3.6/site-packages/klio_core/config/_io.py", line 456, in check
    has_fields = value.get("fields")
AttributeError: 'str' object has no attribute 'get'
```
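
The last two frames point at the cause: the schema check calls `.get("fields")` on the value, which only works for a dictionary. A minimal sketch of that failure follows. `check_schema` is a hypothetical stand-in for the check in `_io.py`, not klio's actual code, and the schema string is illustrative:

```python
import json


def check_schema(value):
    # Hypothetical stand-in for the check in the traceback:
    # it assumes `value` is a dict and calls .get() on it.
    has_fields = value.get("fields")
    if not has_fields:
        raise ValueError("schema must define 'fields'")
    return value


# A schema written as a quoted JSON string in YAML arrives as a plain str.
schema_str = '{"fields": [{"name": "entity_id", "type": "STRING", "mode": "NULLABLE"}]}'

try:
    check_schema(schema_str)
    error = None
except AttributeError as exc:
    # str has no .get(), so the check fails before it can inspect the fields.
    error = str(exc)

print(error)

# Parsing the string first gives the check the dict it expects.
parsed = check_schema(json.loads(schema_str))
print(parsed["fields"][0]["name"])
```

Parsing the string with `json.loads` before the check runs is enough to make the same input pass.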

This PR adds a converter to the schema's `attr.attrib` definition so that the schema is loaded into a Python dictionary when it's given as a string :)
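
A rough sketch of the converter idea, assuming a hypothetical `convert_schema` helper and a plain-Python stand-in class (`BigQueryOutputSketch`) instead of the real `KlioBigQueryEventOutput`; the actual change in klio may differ in its details. With attrs, the equivalent wiring would be passing the helper as `converter=` to `attr.attrib`, since attrs applies converters before validators run:

```python
import json


def convert_schema(value):
    """Parse a JSON string into a dict; pass any other value through unchanged."""
    if isinstance(value, str):
        return json.loads(value)
    return value


def check_schema(value):
    # Hypothetical validator: requires a non-empty "fields" entry.
    if not value.get("fields"):
        raise ValueError("schema must define 'fields'")


class BigQueryOutputSketch:
    # Stand-in for an attrs class. With attrs this would look roughly like:
    #   schema = attr.attrib(converter=convert_schema, validator=...)
    def __init__(self, schema):
        # Convert first, then validate, mirroring the order attrs uses.
        self.schema = convert_schema(schema)
        check_schema(self.schema)


fields = [{"name": "entity_id", "type": "STRING", "mode": "NULLABLE"}]

# Both a JSON string and an already-parsed dict end up as the same dict.
from_str = BigQueryOutputSketch(json.dumps({"fields": fields}))
from_dict = BigQueryOutputSketch({"fields": fields})
print(from_str.schema == from_dict.schema)
```

Passing non-string values through unchanged keeps configs that already define the schema as a YAML mapping working as before.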

Testing :

  • I successfully ran my job after using this version

Checklist for PR author(s)

  • [ X ] Format the pull request title like [cli] Fixes bugs in 'klio job fake-cmd'.
  • [ X ] Changes are covered by unit tests (no major decrease in code coverage %) and/or integration tests.
  • [ X ] Document any relevant additions/changes in the appropriate spot in docs/src.
  • [ X ] For any change that affects users, update the package's changelog in docs/src/reference/<package>/changelog.rst.

@CLAassistant

CLAassistant commented Feb 10, 2021

CLA assistant check
All committers have signed the CLA.

@gfalcone gfalcone force-pushed the fix_bq_event_output branch from 551faa1 to 7fcc63b Compare February 11, 2021 09:25
@econchick
Contributor

Thank you @gfalcone ! Mind giving an example of the klio-job.yaml that causes this?

@econchick
Contributor

Ah, never mind @gfalcone - I was able to reproduce this with something like:

```yaml
# <--snip-->
job_config:
  outputs:
    - type: bq
      project: some-project
      dataset: some-dataset
      table: some-table
      schema: '{"fields": [{"name": "entity_id", "type": "nullable", "mode": "nullable"}, {"value": "entity_id", "type": "nullable", "mode": "nullable"}]}'
# <--snip-->
```

Also, (for any lurkers), the integration tests do not run when the PR comes from a fork. I was able to run the integration tests locally with this change 👍🏻

Thanks @gfalcone ! I'll make a patch release (0.2.2) soon, should be available on PyPI in a few hours.

@econchick econchick merged commit 9eab9e7 into spotify:master Feb 11, 2021
@gfalcone
Contributor Author

Hi @econchick, great, thank you for taking care of this :)
