New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-40582: Add support for serialization/deserialization of Arrow schemas to Parquet. #887
Conversation
a868326
to
539eb78
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #887 +/- ##
==========================================
+ Coverage 87.67% 87.70% +0.02%
==========================================
Files 272 272
Lines 36107 36188 +81
Branches 7552 7572 +20
==========================================
+ Hits 31656 31737 +81
Misses 3270 3270
Partials 1181 1181
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this correctly, we can write only ArrowSchema
but read any of the convertible schema types? That's fine with me, especially to minimize scope on this ticket, as long as what happens if someone tries to write one of the other schema types is a sufficiently graceful failure.
if description := astropy_table[name].description: | ||
field_metadata["doc"] = description | ||
if units := astropy_table[name].unit: | ||
field_metadata["units"] = str(units) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the point where we decide whether we want this to be "units" or "unit". I may have put "units" in the drp_tasks PR that spawned this, but if astropy uses "unit" maybe we should, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting question. Right now, our afw Fields use doc
/units
, which is what I thought you were going for here. Astropy tables use description
/unit
, which I'm using as exact convertibles. And right here we're defining the precedent for what should go into an arrow schema. So I think that either we should go with the afw convention or the astropy convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote for astropy conventions, then.
I can add support for serializing the other ones on this ticket; that makes sense. Though I don't know exactly why you'd want to with anything but the arrow or perhaps astropy. |
Don't worry about it. I see that those didn't get associated with |
Okay, I'll leave this as just being able to serialize an Arrow schema (as described on the ticket) and in the future we can do others if it seems useful/necessary. |
Checklist
doc/changes