Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove addl properties #25

Merged
merged 6 commits into from
Jun 7, 2019
Merged

Remove addl properties #25

merged 6 commits into from
Jun 7, 2019

Conversation

fbertsch
Copy link
Contributor

@fbertsch fbertsch commented Jun 3, 2019

Fixes #22

@fbertsch fbertsch requested a review from acmiyaguchi June 5, 2019 17:51
Copy link
Contributor

@acmiyaguchi acmiyaguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a concrete example of what was happening in #22? I'm a bit unclear on what the behavior of filling in the probe section is before and after this patch.

mozilla_schema_generator/schema.py Outdated Show resolved Hide resolved
tests/test_schema.py Show resolved Hide resolved
mozilla_schema_generator/generic_ping.py Outdated Show resolved Hide resolved
@fbertsch
Copy link
Contributor Author

fbertsch commented Jun 5, 2019

@acmiyaguchi generally the problem was that previously, if were filling this in:

{
    "metrics": {
        "type": "object",
        "additionalProperties": { "type": "integer" }
    }
}

then we would end up with (for e.g. 1 probe):

{
    "metrics": {
        "type": "object",
        "properties": {
            "probe-1": { "type": "integer" },
        }
        "additionalProperties": { "type": "integer" }
    }
}

And the transpiler would do the correct thing, and ignore the additionalProperties field, giving precedence to the struct type.

However, if there were no fill-in probes, we would end up with something like:

{
    "metrics": {
        "type": "object",
        "additionalProperties": { "type": "integer" }
    }
}

Which the transpiler would see as the pseudo-map type, resulting in a schema element of STRUCT<key_value ARRAY<STRUCT<key STRING, value INT64>>> (or similar). However, when we eventually did have probes to fill-in, the type changes and is not schema evolution compatible.

With this change, the result of no probes to fill-in would be:

{
    "metrics": {
        "type": "object",
        "additionalProperties": False
    }
}

...which, upon inspection, still returns this from the transpiler:

[
  {
    "mode": "NULLABLE",
    "name": "metrics",
    "type": "STRING"
  }
]

Hmm, it seems to me the transpiler should output an empty row here, do you agree? We'd need to check that that is acceptable for BQ.

Previously, some sections would have an 'additionalProperties'
section that remained after fill-in; this was because they
didn't have any probes that filled the sections in.

Instead of keeping that additionalProperties, we now remove
it. This way the resulting schema is what we hope - an empty
struct.
* update test with schema key deletion
* add explanation for propogate
@acmiyaguchi
Copy link
Contributor

The type is a string because an empty object is not a well defined structure. There needs to be at least one field, even if it's just a dummy/placeholder value. I think instead, you should consider using the propogate=True method to drop the object entirely from the schema. We should also consider adding this as an option to the schema transpiler, since it might be desirable to omit rather than cast in this situation.

@fbertsch
Copy link
Contributor Author

fbertsch commented Jun 5, 2019

@acmiyaguchi that seems reasonable as well. I think an empty struct type would be most preferable (since it would match the expected schema of the ping) but deleting it would be second-best.

@acmiyaguchi
Copy link
Contributor

Looking at the BigQuery docs again, it looks like it is possible to have a empty struct.

I also think that we might want to set additionalProperties to true instead of false, otherwise validation will definitely fail.

@fbertsch
Copy link
Contributor Author

fbertsch commented Jun 5, 2019

I won't be using these for validation, currently we'll be using the generic glean schema for validation and the strict one for the BQ tables. We can update this later if that proves too lenient.

@fbertsch
Copy link
Contributor Author

fbertsch commented Jun 6, 2019

Given that BQ does not support empty structs, we'll have to remove these from the schema instead.

Previously we set additionalProperties to false. This leads
to structs with no elements, which are not allowed in BQ
tables.

This commit deletes those, and propagates that deletion to
parents. As such, sometimes entire metrics sections are
left out of ping schemas.

* Add documentation for propagate
@fbertsch
Copy link
Contributor Author

fbertsch commented Jun 6, 2019

@acmiyaguchi this now deletes additionalProperties and propagates that deletion. RFAL

tests/test_schema.py Outdated Show resolved Hide resolved
Copy link
Contributor

@acmiyaguchi acmiyaguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@fbertsch fbertsch merged commit 570e3b3 into master Jun 7, 2019
@fbertsch fbertsch deleted the remove_addl_properties branch June 7, 2019 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generated schemas should drop additionalProperties from filled-in sections
2 participants