DM-30624: Document gen2 to gen3 refcat ingestion #260

Merged
merged 1 commit into from Oct 23, 2021

leeskelvin
Contributor

No description provided.

import glob
import astropy.table

refcat_dirs = [gaiadr2, panstarrsps1]

Contributor

@parejkoj parejkoj Oct 18, 2021

These need to be strings, and are not the names we have for these on lsst-devl (see /datasets/refcats/htm/v1).

Contributor Author

They're meant to be objects that the user has already assigned elsewhere. I hoped the intro text to this code block explained that. As this code is intended to be used on other machines, where the refcats may be saved in a non-standard location, I thought this was the safest bet.

If it would help, I could add some dummy lines above in the script along the lines of:

gaiadr2 = "/path/to/my/gaia/dir"
panstarrsps1 = "/path/to/my/panstarrs/dir"

Contributor

@parejkoj parejkoj left a comment

Several comments. In particular, I think we want this as a separate section, as the note really breaks the text flow.

Have you run the script to test it?

doc/lsst.meas.algorithms/creating-a-reference-catalog.rst Outdated

Contributor

@parejkoj parejkoj left a comment

Please modify the first note in the document (just before section 1) to point to the new section.

Several more rephrasing suggestions.


.. _lsst.meas.algorithms-refcat-ingest:

4. Ingest the files into the butler
===================================

When ``convertReferenceCatalog`` has finished, it will print the two commands you need to run to register the new refcat dataset type and ingest your converted output into it.
When ``convertReferenceCatalog`` has finished, a new directory (named ``gaia-refcat/`` in the example above) will now exist containing all reference catalogs in the LSST format.

Contributor

"... containing the HTM-indexed files for the input catalog in the LSST format."


.. _lsst.meas.algorithms-refcat-existing:

5. What to do with existing reference catalogs?

Contributor

I know I suggested it, but maybe: "Ingesting pre-existing gen2 reference catalogs"

5. What to do with existing reference catalogs?
===============================================

Already existing reference catalogs (for example, the PS1 or Gaia DR2 catalogs that were used in gen2 butlers) can also be ingested into the butler, negating the need to convert them to the LSST format yourself.

Contributor

Maybe "... can be directly ingested into a gen3 repo as they are already in the LSST format."

===============================================

Already existing reference catalogs (for example, the PS1 or Gaia DR2 catalogs that were used in gen2 butlers) can also be ingested into the butler, negating the need to convert them to the LSST format yourself.
To ingest an already existing converted reference catalog, first create a suitable filename to htm7-index astropy lookup table, and then follow the steps above to `ingest the files into the butler <lsst.meas.algorithms-refcat-ingest>`.

Contributor

The link at the end of this sentence doesn't work in the generated output. I think you need a :ref: in front.

Already existing reference catalogs (for example, the PS1 or Gaia DR2 catalogs that were used in gen2 butlers) can also be ingested into the butler, negating the need to convert them to the LSST format yourself.
To ingest an already existing converted reference catalog, first create a suitable filename to htm7-index astropy lookup table, and then follow the steps above to `ingest the files into the butler <lsst.meas.algorithms-refcat-ingest>`.

This is an example script that constructs a conversion lookup table for all converted FITS files in the `refcat_dir` directory, outputting to the `out_dir` directory:

Contributor

"that constructs..." -> that creates an ``.ecsv`` file for the ``butler ingest-files`` command, from all of the HTM indexed files in a given directory. We use the existing Gaia DR2 catalog on lsst-devl in this example:

Contributor Author

This is an example script that creates an .ecsv lookup table for the butler ingest-files command, from all of the HTM indexed files in a given directory (refcat_dir here). We use the existing Gaia DR2 catalog on lsst-devl in this example:


table.write(out_file)

Once this script is complete, finalize the reference catalog ingestion by following the `file ingestion instructions above <lsst.meas.algorithms-refcat-ingest>`.

Contributor

"finalize"->"finish", maybe?

And the final link doesn't work in the output HTML. I think you need a :ref: before these.

https://developer.lsst.io/restructuredtext/style.html#linking

import glob
import astropy.table

refcat_dir = "/path/to/my/refcat/directory"

Contributor

In my comment above, I suggested using /datasets/refcats/htm/v1/gaia_dr2_20200414 here, to make it clear exactly what this path should look like for an existing refcat. The rest of this example refers to lsst-devl (e.g. in the GaiaSource path, and the number of cores to use in the convert), so I don't think it hurts to do the same here.

This is an example script that constructs a conversion lookup table for all converted FITS files in the `refcat_dir` directory, outputting to the `out_dir` directory:

.. code-block:: python

Contributor

Let's put a docstring at the top:

"""Generate an astropy-readable .ecsv file for `butler ingest-files`, to ingest an existing gen2 refcat.
"""

out_dir = "/path/to/my/output/directory"

out_file = f"{out_dir}/{os.path.basename(refcat_dir)}.ecsv"
print(f"Saving to: {out_file}")

Contributor

Oh, and I'd suggest putting this after the table.write as a "Wrote output to:" message, to signify that it's done.
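
Pulling the review suggestions above together, the script under discussion might look roughly like the sketch below. It is not the merged version. It assumes that each shard in the gen2 refcat directory is named `<htm7 index>.fits` and that the table needs a `filename` column and an `htm7` column. The function names `htm_rows` and `write_lookup_table` are hypothetical.

```python
"""Sketch: build a filename -> htm7 lookup table for `butler ingest-files`."""
import glob
import os


def htm_rows(refcat_dir):
    """Return (filename, htm7) pairs for the shard files, sorted by htm7 index.

    Assumption: shards are named ``<htm7 index>.fits``; any other file
    (for example a schema or config file) is skipped.
    """
    rows = []
    for path in glob.glob(os.path.join(refcat_dir, "*.fits")):
        stem = os.path.splitext(os.path.basename(path))[0]
        if stem.isdigit():
            rows.append((path, int(stem)))
    return sorted(rows, key=lambda row: row[1])


def write_lookup_table(refcat_dir, out_dir):
    """Write ``<out_dir>/<refcat name>.ecsv`` and return the output path."""
    # Third-party import kept local so htm_rows needs only the standard library.
    import astropy.table

    rows = htm_rows(refcat_dir)
    if not rows:
        raise FileNotFoundError(f"No '<htm7>.fits' files found in {refcat_dir}")
    # Column names are an assumption based on the review discussion.
    table = astropy.table.Table(rows=rows, names=("filename", "htm7"))
    # normpath drops any trailing slash so basename returns the refcat name.
    refcat_name = os.path.basename(os.path.normpath(refcat_dir))
    out_file = os.path.join(out_dir, f"{refcat_name}.ecsv")
    table.write(out_file, format="ascii.ecsv", overwrite=True)
    return out_file
```

On lsst-devl this could be called as `write_lookup_table("/datasets/refcats/htm/v1/gaia_dr2_20200414", ".")`, followed by a "Wrote output to:" message once the write has succeeded, as suggested above.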

@leeskelvin leeskelvin force-pushed the tickets/DM-30624 branch 2 times, most recently from b158aa6 to ae0c667 Compare October 21, 2021 20:42
@leeskelvin
Contributor Author

Thanks John. I think I've addressed everything above. I hope that this now reads well; let me know.

Contributor

@parejkoj parejkoj left a comment

Thanks, this looks good. The links work in the built version.

@parejkoj
Contributor

parejkoj commented Oct 22, 2021

Additional comment: we should probably make it clear in section 5 exactly what parts of the commands need to change. Maybe add something like this after the current final sentence:

In particular, you need to change the name of the registered dataset type to "gaia_dr2_20200414" (the refcat used in the python code block above), and the filename to the generated .ecsv file:

.. prompt:: bash

    butler register-dataset-type REPO gaia_dr2 SimpleCatalog htm7
    butler ingest-files -t direct REPO gaia_dr2 refcats gaia/filename_to_htm.ecsv

@leeskelvin
Contributor Author

leeskelvin commented Oct 22, 2021

Final section now reads:

Once this script is complete, finish reference catalog ingestion by following the :ref:`file ingestion instructions above <lsst.meas.algorithms-refcat-ingest>`.
In particular, you need to change the name of the registered dataset type to "gaia_dr2_20200414" (the reference catalog used in the Python code block above), and the filename to the generated .ecsv file ("gaia_dr2_20200414.ecsv" in the Python code block above):

.. prompt:: bash

    butler register-dataset-type REPO gaia_dr2_20200414 SimpleCatalog htm7
    butler ingest-files -t direct REPO gaia_dr2_20200414 refcats gaia_dr2_20200414.ecsv
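
To make explicit which parts of the two commands change between reference catalogs, here is a small illustrative helper that assembles them from the repo, the refcat name, and the `.ecsv` filename. The function name `butler_ingest_commands` and the parameter name `run` (for the `refcats` argument) are hypothetical. The command layout is copied from the lines quoted above, not taken from the butler documentation.

```python
import shlex


def butler_ingest_commands(repo, refcat_name, ecsv_file, run="refcats"):
    """Return the register and ingest command lines for one refcat.

    Only ``repo``, ``refcat_name``, and ``ecsv_file`` vary between
    catalogs; the remaining arguments are copied from the commands
    quoted in this review.
    """
    register = ["butler", "register-dataset-type", repo, refcat_name,
                "SimpleCatalog", "htm7"]
    ingest = ["butler", "ingest-files", "-t", "direct", repo, refcat_name,
              run, ecsv_file]
    # shlex.join quotes any argument that needs it (for example, paths with spaces).
    return [shlex.join(register), shlex.join(ingest)]


for command in butler_ingest_commands(
    "REPO", "gaia_dr2_20200414", "gaia_dr2_20200414.ecsv"
):
    print(command)
```

The dataset type name appears in both commands, so building them from one value avoids registering under one name and ingesting under another.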

@leeskelvin leeskelvin merged commit c98d3c8 into master Oct 23, 2021
@leeskelvin leeskelvin deleted the tickets/DM-30624 branch October 23, 2021 00:34