New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-35174: Add descriptions to diaObject columns #68
Conversation
64ceb0c
to
626e065
Compare
@yalsayyad I'm puzzled. It looks like this PR does change the order of the column clauses in the YAML file, but I thought @iagaponenko warned us against that? |
yml/dp02_dc2.yaml
Outdated
description: | ||
- name: diaObjectId | ||
"@id": "#DiaObject.diaObjectId" | ||
description: Decl-coordinate of the position of the object at time radecTai |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per RFC-863, can we start referring to these as "Right Ascension" and "Declination" in the descriptions, and not propagating the "decl" naming if we can help it?
Ah, I misunderstood what @iagaponenko was saying. He's saying that he already did ingest steps 1-3 and doesn't want to redo them. I'll put them back in original order the dp02_dc2.yaml. |
Probably I won't redo the Parquet-to-CSV translation and the ingest ... unless a problem will be found in the Parquet files. |
95f46ae
to
65c4216
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks!
65c4216
to
a2b88fd
Compare
and bring diaObjectId, ra, decl, radecTai, and nDiaSources to top of column list.
a2b88fd
to
125411f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the descriptions should all have punctuation at the end? A few do, but most do not.
I struggled with whether we should be more explicit about the "summary statistics of diaSource things" values, stating that they come from a bunch of diaSources. That's fairly obvious if you understand diaSource vs. diaObject, but maybe less so if you're new?
- name: diaObjectId | ||
"@id": "#DiaObject.diaObjectId" | ||
datatype: long | ||
mysql:datatype: BIGINT | ||
description: | ||
description: Unique id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth expanding on this to say "Unique identifier for this object" or something? Maybe it's obvious from context, so the description is mostly redundant with the name?
- name: gPSFluxErrMean | ||
"@id": "#DiaObject.gPSFluxErrMean" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: Mean of the diaSource PSF flux errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all these things that are "mean of diaSource X": should we expand that to e.g. "Mean of the individual diaSource X" or something? Someone not entirely familiar with diaSource vs. diaObject might need a little more explicitness, maybe?
- name: gPSFluxLinearIntercept | ||
"@id": "#DiaObject.gPSFluxLinearIntercept" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: y-intercept of a linear model fit to diaSource PSF flux vs time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably fine to not have the band mentioned in these kinds of repeated comments, since it's in the field name, but the band name does show up in the descriptions of e.g. gPSFluxSigma and gPSFluxChi2, so we're not fully consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, the intend was to only reference bands when quoting the name of another column, but I see that in e.g. gPSFluxChi2
, it references gPSFlux
which isn't a column. Will fix that.
I didn't add a blanket "for diaSources detected in z-band visits" for all the columns, though would if it would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't know if that blanket statement is really necessary, but I'm not sure.
- name: gPSFluxMAD | ||
"@id": "#DiaObject.gPSFluxMAD" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: Median absolute deviation of diaSource PSF flux. Does not include scale factor for comparison to sigma |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second sentence is a bit context free here: should we explicitly state that scale factor in the description (1.4-whatever assuming normal errors)?
Should we say "sigma" in that last sentence, or "standard deviation" (the former is the field name to compare to, the latter is the description of that field).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe " Multiply by 1.4826 to compare to a normal standard deviation"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's good, instead of the last sentence.
- name: gPSFluxMaxSlope | ||
"@id": "#DiaObject.gPSFluxMaxSlope" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: Maximum ratio of time ordered deltaFlux / deltaTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... for each diaSource"?
- name: gPSFluxMean | ||
"@id": "#DiaObject.gPSFluxMean" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: Weighted mean of diaSource PSF flux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weighted by diaSource fluxErr, or what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about Mean of individual diaSource PSF fluxes weighted by the inverse square of their fluxErr
- name: gTOTFluxMean | ||
"@id": "#DiaObject.gTOTFluxMean" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: Weighted mean of the PSF flux forced photometered at the diaSource position on the calibrated image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"calibrated science image"?
- name: gTOTFluxSigma | ||
"@id": "#DiaObject.gTOTFluxSigma" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: Standard deviation of the PSF flux forced photometered at the diaSource position on the calibrated image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Standard deviation of the individual diaSource TOTFluxMean forced photometered measurements"?
- name: pixelId | ||
"@id": "#DiaObject.pixelId" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: HtmIndex20 of ra, decl coordinate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of ra,dec coordinate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if I'm referring to the columns in this table named ra, decl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm torn on this one: Maybe leave it decl, but make sure we note to fix the description in future files?
- name: radecTai | ||
"@id": "#DiaObject.radecTai" | ||
datatype: double | ||
mysql:datatype: DOUBLE | ||
description: | ||
description: Time at which the object was at a position ra/decl. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ra/dec, or leave decl
here for now?
Checking on this: can we merge an updated version of this now, with the corrections in my comments above? Otherwise it would be easy for this to get lost. |
Checking on this again: can we get the updated version of this merged? I believe what was merged was before I left my comments above. I'd rather not have these updates get lost. |
@parejkoj I think @yalsayyad is the keeper of these changes, and I believe she is currently on vacation? |
Pinging this one again: can we get my proposed changes merged so they aren't lost? |
Repeat: this is is @yalsayyad's hands... Tagging her as reviewer, and removing myself (oh, I can do neither, apparently, since this PR was technically already "merged"...) |
@yalsayyad : I'll note that I am not attempting to integrate most of the documentation changes I proposed in my comments above into the changes I'm making in DM-37196. I still think my suggestions above should be made to imsim.yaml and hsc.yaml. |
@parejkoj just called my attention to this PR. Can I make sure I understand the situation here? John, you want to make sure that the comments you made above, post-merge, are not lost, but there's no current branch on which those comments have been implemented? (@JeremyMcCormick as well) |
I don't know whether my comments above were ever incorporated into anything that got merged or not. Some of them were done in later work (e.g. the decl->dec changes), but I don't think all? |
and bring the non-band specific columns:
diaObjectId, ra, decl, radecTai, and nDiaSources to top of column list.