tickets/SP-2740: add commissioning tutorial for lsstcam visits database#94

Merged: MelissaGraham merged 6 commits into main from tickets/SP-2740 on Nov 19, 2025

@MelissaGraham
Contributor

No description provided.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@rhiannonlynne
Member

There are probably better ways to make these comments, but I'll start here...

# for col in df.columns:
#     print(col)

Perhaps this instead?

list(df.columns)
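For reference, a minimal sketch showing that the one-liner returns the column names as a plain Python list, which a notebook cell displays in one output instead of one printed line per column. The toy frame below only uses column names that appear in this thread; the real visits database has many more.

```python
import pandas as pd

# Empty toy frame with a few column names mentioned in this thread (illustrative only).
df = pd.DataFrame(columns=["band", "exp_midpt_mjd", "obs_start_mjd", "seeingFwhmEff"])

cols = list(df.columns)      # the suggested one-liner
same = df.columns.tolist()   # equivalent pandas Index method
print(cols)
```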

@rhiannonlynne
Member

"The database generally follows the current LSST scheduler output schema, but additional columns were added in post-processing. "

Sort of...
I think maybe I would say:
"The database generally follows the current LSST scheduler output schema, but additional columns have been added where they are provided in ConsDb cdb_lsstcam.visit1 or cdb_lsstcam.visit1_quicklook tables which do not translate to the scheduler format."

(This was the best way I could describe it: we translated columns from consdb to opsim format where applicable, added columns where computationally possible and useful, and kept all of the consdb output that didn't translate.)
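The three-part approach described above (translate, derive, keep) can be sketched in pandas roughly as follows. This is a hypothetical illustration, not the actual post-processing code: the one-entry name mapping, the toy rows, and the derived night_mjd column are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical ConsDB-style visit rows (values are made up).
consdb = pd.DataFrame({
    "obs_start_mjd": [60782.95, 60783.20],
    "band": ["r", "i"],
    "img_type": ["science", "science"],
})

# 1. Translate columns that have a scheduler (opsim) equivalent.
#    This one-entry mapping is illustrative, not the real translation table.
consdb_to_opsim = {"obs_start_mjd": "observationStartMJD"}
visits = consdb.rename(columns=consdb_to_opsim)

# 2. Add a column that can be computed from existing ones; here, a hypothetical
#    night index that rolls over at noon UTC (MJD x.5).
visits["night_mjd"] = np.floor(visits["observationStartMJD"] - 0.5).astype(int)

# 3. Columns with no scheduler equivalent (here, img_type) are kept unchanged.
print(list(visits.columns))
```

DataFrame.rename leaves unmapped columns untouched, which is what makes "keep everything that doesn't translate" the default behavior in this sketch.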

@rhiannonlynne
Member

Why did you pick seeingFwhmGeom instead of seeingFwhmEff?
I ask because we only get one value from the consdb: psf_sigma.
We have found that this translates more like seeingFwhmEff than anything else...
seeingFwhmGeom is added to the table by translating seeingFwhmEff in the usual way, but I'm starting to think it may not be a very relevant value to keep.
The DM outputs don't really seem to distinguish between effective and geometric seeing.
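To make the point concrete, here is a minimal sketch of how a single psf_sigma could yield both columns, so that seeingFwhmGeom carries no independent information. Several pieces are assumptions for illustration, not the translation actually used for this database: that psf_sigma is a Gaussian sigma in pixels, the 0.2 arcsec/pixel LSSTCam scale, and treating the Gaussian FWHM directly as seeingFwhmEff. The 0.822 × FWHMeff + 0.052 relation is, as far as we know, the standard one used in the Rubin scheduler's seeing model.

```python
import math

# Assumptions (not stated in this thread): psf_sigma is the Gaussian sigma of the
# PSF in pixels, and the LSSTCam pixel scale is 0.2 arcsec/pixel.
PIXEL_SCALE_ARCSEC = 0.2
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))  # ~2.355 for a Gaussian


def psf_sigma_to_fwhm_eff(psf_sigma_pix: float) -> float:
    """Gaussian FWHM in arcsec, treated here as seeingFwhmEff."""
    return psf_sigma_pix * SIGMA_TO_FWHM * PIXEL_SCALE_ARCSEC


def fwhm_eff_to_fwhm_geom(fwhm_eff: float) -> float:
    """Linear effective-to-geometric FWHM relation (arcsec) from LSST seeing models."""
    return 0.822 * fwhm_eff + 0.052


eff = psf_sigma_to_fwhm_eff(2.0)  # a 2-pixel sigma
geom = fwhm_eff_to_fwhm_geom(eff)
print(f"seeingFwhmEff = {eff:.3f} arcsec, seeingFwhmGeom = {geom:.3f} arcsec")
```

Because geom is a fixed linear function of eff here, keeping both columns adds no measurement information, which is consistent with the reviewer's doubt about keeping seeingFwhmGeom.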

@rhiannonlynne
Member

One thing: you mention that it's possible to load all of the values because there are only ~20k visits, but I think this should be fine even with the total number of visits in the whole survey (~2M). It's not that many.
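One way to check this kind of claim directly is pandas' DataFrame.memory_usage with deep=True, which also counts the bytes held by string columns. The sketch below uses a small synthetic stand-in with made-up values and only three columns, so the numbers it prints are illustrative, not a measurement of the real database; it then extrapolates linearly to the ~2M-visit figure from the comment.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the visits table (made-up values, far fewer columns
# than the real database), so the sizes below are illustrative only.
rng = np.random.default_rng(42)
n_rows = 20_000
df = pd.DataFrame({
    "exp_midpt_mjd": rng.uniform(60780, 60950, n_rows),
    "seeingFwhmEff": rng.uniform(0.6, 1.5, n_rows),
    "band": rng.choice(list("ugrizy"), n_rows),
})

# deep=True includes the actual memory of object (string) columns.
bytes_total = int(df.memory_usage(deep=True).sum())
bytes_per_row = bytes_total / len(df)

# Linear extrapolation to a full-survey-sized table of ~2M visits.
projected_mb = bytes_per_row * 2_000_000 / 1e6
print(f"{bytes_total / 1e6:.1f} MB now, ~{projected_mb:.0f} MB at 2M rows")
```

Running the same two lines on the real dataframe gives a per-row cost that can be scaled the same way.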

@rhiannonlynne
Member

I find it a bit confusing to describe something as "not WFD LSST science validation",
mostly because the DDF visits are LSST Science Validation, just not in the 'wide' area.
So it's technically correct, but I feel it will lead people to think the DDFs are not part of the SV survey
(especially with the variable name 'not_SV').

@rhiannonlynne
Member

rhiannonlynne commented Nov 13, 2025

Did you have any problems with

Ndays = int(np.floor(np.max(df['exp_midpt_mjd']))
            - np.floor(np.min(df['exp_midpt_mjd'])))

FWIW, doing
np.floor(MJD-0.5)
will give you an integer which does map properly to the dayobs/night rollover.
The method above probably works (especially when considering all of SV), except that I think it would not if the first visit was quite late at night (or vice versa?).
The MJD rollover is during the night, most of the time. MJD passes through each "x.5" value at noon UTC, the same time as the 'dayobs' rollover. Somewhat interestingly, this is sort of equivalent to using JD instead of MJD (except for the 2,400,000.5 offset), since JD also rolls over at noon UTC.
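A minimal standard-library sketch of the floor(MJD - 0.5) idea, using made-up visit times from a single night. It shows plain floor(MJD) splitting the night at 00:00 UTC while floor(MJD - 0.5) keeps it together. Since JD = MJD + 2400000.5, floor(MJD - 0.5) equals floor(JD) - 2400001, so it is the integer-JD night shifted by a constant. The helper names night_id and night_to_dayobs are hypothetical, not from the notebook.

```python
import math
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # MJD 0.0 is 1858-11-17 00:00 UTC


def night_id(mjd: float) -> int:
    """Integer that rolls over at MJD x.5 (noon UTC), i.e. at the dayobs rollover."""
    return math.floor(mjd - 0.5)


def night_to_dayobs(night: int) -> str:
    """Calendar date (YYYY-MM-DD) whose 00:00 UTC has MJD equal to `night`."""
    return (MJD_EPOCH + timedelta(days=night)).isoformat()


# Illustrative visit times spanning one night, before and after 00:00 UTC.
mjds = [60782.95, 60783.20, 60783.40]

naive_days = {math.floor(m) for m in mjds}  # plain floor(MJD) splits the night
nights = {night_id(m) for m in mjds}        # floor(MJD - 0.5) keeps it together

print(len(naive_days), len(nights))  # 2 1
print([night_to_dayobs(n) for n in sorted(nights)])
```

Counting nights as len(nights) also avoids the off-by-one that a max-minus-min difference of floored MJDs can give.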

Another way to make the stacked histogram of dates, with x-axis tick labels that look more like dates, would be something like

import numpy as np
import matplotlib.pyplot as plt
from astropy.time import Time

# Assumes svisits (science visits with 'obs_start_mjd' and 'band' columns) and
# band_colors (band -> color) were defined earlier in the notebook.
t = Time("2025-04-17T12:00:00", scale='tai')
mjd_to_jd = t.mjd - t.jd  # MJD - JD = -2400000.5
# Integer JD rolls over at noon, matching the dayobs rollover.
svisits['jd'] = np.floor(svisits.obs_start_mjd - mjd_to_jd).astype(int)
jds = np.arange(svisits.jd.min(), svisits.jd.max() + 1, 1)
jdsbins = np.arange(svisits.jd.min(), svisits.jd.max() + 2, 1)
# Each integer JD falls at noon, so its ISO date is the dayobs.
days = [isot.split('T')[0] for isot in Time(jds, format='jd', scale='tai').isot]
bar_bottom = np.zeros(len(jds))
plt.figure(figsize=(10, 6))
for b in 'ugrizy':
    heights, _ = np.histogram(svisits.query("band == @b").jd, bins=jdsbins)
    plt.bar(jds, heights, bottom=bar_bottom, width=1, color=band_colors[b], alpha=0.8, label=b)
    bar_bottom += heights
plt.legend()
_ = plt.xticks(jds[::7], labels=days[::7], rotation=90)
plt.grid(alpha=0.2)
plt.ylabel("Number of visits", fontsize='large')
plt.title("LSSTCam Science Visits")

(Or maybe just take the 'days' part to use for your tick labels.)

@MelissaGraham
Contributor Author

MelissaGraham commented Nov 14, 2025

Thanks Lynne, here's a summary of what got changed in the last commit.

Switched to list(df.columns).

Left it at "The database generally follows the current LSST scheduler output schema, but additional columns were added in post-processing." That wording was copied from the "Summary 20250930" page. Plus, users of this tutorial won't know what "ConsDb cdb_lsstcam.visit1 or cdb_lsstcam.visit1_quicklook tables which do not translate to the scheduler format" means.

I picked seeingFwhmGeom over seeingFwhmEff based on their descriptions in the Rubin scheduler schema. The former is defined as the FWHM of a measured PSF, whereas the latter is the equivalent PSF for a point source. I figured the actual measured PSF was the more natural choice, not a calculated equivalent. But I've switched to using seeingFwhmEff throughout.

For loading the whole file as a dataframe, I took out the "there are only 21647 visits in the database" part, because the point is more about the size: only 81M.

Re. "I find it a bit confusing to describe something as "not WFD LSST science validation" ... mostly because the DDF visits are LSST Science Validation" --> very true, it was confusing. This came about because in the Summary 20250930 page, the number of visits is quoted as "and 13240 for the primary wide SV area" (i.e., without DDFs). But I've updated the text in the notebook throughout to specify "primary wide SV area (without DDFs)", and to also refer more specifically to the two components of the Science Validation surveys as SV WFD and SV DDF, so that it is clear DDF is part of SV.

I know that MJD is not the same as an actual night; I just took the easy route of using MJD... but I switched to your code for that plot, thank you!

I still need to address the comments from Slack, though.

@rhiannonlynne
Member

rhiannonlynne commented Nov 17, 2025

What you have here totally works to identify ToOs, but in the future we're trying to make observation_reason identify which subset of ToO a visit belongs to.
Would it be possible to future-proof this:

Ntoo = len(df.query('observation_reason == "too"'))

by making it

Ntoo = len(df.query("observation_reason.str.contains('too')"))

Also note that I try to put the string itself in single quotes and the whole query restriction in double quotes. This aligns better with the SQL equivalent, where you need to write this as

"where observation_reason like '%too%'"

because the strings you're matching against are string literals and can otherwise be inferred to be column names.
I'm not sure how much pandas actually cares, but I figure it's better to make it look like SQL so that there is less confusion overall.
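On the pandas side, a small sketch with a toy table: in DataFrame.query both quote styles produce string literals (column names that need quoting use backticks instead), so the single-inside-double convention is purely for readability. The sketch also shows why the substring match is more future-proof. The too_* subtype labels are hypothetical stand-ins for the finer-grained ToO reasons mentioned above.

```python
import pandas as pd

# Toy table; the 'too_*' labels are hypothetical examples of future ToO subtypes.
df = pd.DataFrame({"observation_reason": ["too", "too_gw", "too_neutrino", "science"]})

# In DataFrame.query, single and double quotes both denote string literals.
q_single = df.query("observation_reason == 'too'")
q_double = df.query('observation_reason == "too"')

# Exact match finds only the generic label.
n_exact = len(q_single)

# Substring match also finds the subtypes; engine="python" is passed
# explicitly as a conservative choice for string-method expressions.
n_contains = len(df.query("observation_reason.str.contains('too')", engine="python"))

# Equivalent boolean mask; na=False treats missing reasons as non-ToO.
n_mask = int(df["observation_reason"].str.contains("too", na=False).sum())

print(n_exact, n_contains, n_mask)
```

One design note: a substring match also catches any unrelated reason that happens to contain "too". If the labels end up following a consistent prefix convention (an assumption here), str.startswith('too') would be a stricter alternative.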

@MelissaGraham
Contributor Author

Absolutely, thanks @rhiannonlynne. I changed it to Ntoo = len(df.query("observation_reason.str.contains('too')")) (and did the same wherever a similar query was done).

@rhiannonlynne
Member

Looks great to me!

@MelissaGraham MelissaGraham merged commit 165b869 into main Nov 19, 2025
2 checks passed
@MelissaGraham MelissaGraham deleted the tickets/SP-2740 branch November 19, 2025 18:43

2 participants