DM-30321: Validate the schema in ci_hsc_gen3 #9

Merged
merged 2 commits into from May 29, 2021

Conversation

hsinfang
Collaborator

No description provided.

@hsinfang hsinfang force-pushed the tickets/DM-30321 branch 2 times, most recently from 37a158b to b2dec42 Compare May 25, 2021 18:04
Contributor

@yalsayyad yalsayyad left a comment


Yesterday Fred added 5 columns to the hsc Object table; you'll need to copy them into hsc_gen2.yml now.

yml/hsc.yaml Outdated
description: patch ID
mysql:datatype: TEXT
mysql:datatype: BIGINT
Contributor


PatchIds, tractIds, and detectorIds are small, human-manageable integers (unlike sourceIds).
A 4-byte int (INTEGER) is probably sufficient for all 3, unless you know something I don't.
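
As a rough illustration of the point above, the following sketch checks whether a set of ID values fits in a signed 4-byte INTEGER. The function name and the sample values are hypothetical and are not taken from the ci_hsc_gen3 outputs.

```python
# Range of a signed 4-byte integer (SQL INTEGER).
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_in_int32(values):
    """Return True if every value fits in a signed 4-byte INTEGER column."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)


# Hypothetical sample values, chosen only to show both outcomes.
patch_ids = [0, 17, 80]
source_ids = [2**40 + 5]

print(fits_in_int32(patch_ids))   # True
print(fits_in_int32(source_ids))  # False
```

Small IDs such as patch, tract, and detector numbers pass this check, while large packed identifiers such as source IDs would need BIGINT.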

Collaborator Author


I also think INTEGER should be sufficient for all 3, but I see these columns have the type of BIGINT in the output parquet files.

Since we aren't checking the data types in CI, and for now we'd just force the types before the database ingest, we could set this to what we want (INTEGER) rather than what the parquet files have (BIGINT). Do you have a preference?

Collaborator Author


I'm changing those to INTEGER despite their types in the output parquet files.

Keep a Gen2 copy for ci_hsc_gen2 for the time being
Currently, these columns are stored as int64 in the parquet outputs.
But in sdm_schemas we use what we want them to be, and downstream
code can type cast.
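
One way downstream code might guard such a cast is to compare each column's values against the integer type declared in the schema before narrowing. This is only a sketch under stated assumptions: the `DECLARED_TYPES` mapping, the column names, and `check_against_schema` are hypothetical and are not part of sdm_schemas or ci_hsc_gen3.

```python
# Hypothetical declared types, in the spirit of the mysql:datatype entries.
DECLARED_TYPES = {"patch": "INTEGER", "tract": "INTEGER", "sourceId": "BIGINT"}

# Signed bit widths for the two SQL integer types discussed here.
BIT_WIDTHS = {"INTEGER": 32, "BIGINT": 64}


def check_against_schema(columns, declared_types=DECLARED_TYPES):
    """Raise OverflowError if any value exceeds its declared integer type."""
    for name, values in columns.items():
        declared = declared_types[name]
        bits = BIT_WIDTHS[declared]
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        for value in values:
            if not low <= value <= high:
                raise OverflowError(f"{name}={value} does not fit {declared}")
    return True
```

Calling `check_against_schema({"patch": [0, 5], "sourceId": [2**40]})` returns `True`, whereas a `patch` value of `2**31` raises `OverflowError`, so an int64-to-INTEGER cast would fail loudly rather than truncate.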
@hsinfang hsinfang merged commit 9566c5a into master May 29, 2021
@hsinfang hsinfang deleted the tickets/DM-30321 branch May 29, 2021 22:40