New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-19616: Make IngestIndexReferenceObjectsTask multiprocessing capable #168
Conversation
24e728e
to
7275c8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style issues were the only thing I found.
center = lsst.geom.SpherePoint(data[i][config.ra_name], data[i][config.dec_name], lsst.geom.degrees) | ||
radius = 1*lsst.geom.arcseconds # don't need to search far to get an exact match! | ||
refCat = refObjLoader.loadSkyCircle(center, radius, filterName).refCat | ||
# assert refCat['id'] == data[i][config.id_name], f"input: {data[i]}\n\noutput: {refCat['id']}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this assert
form sometimes needed, or can it be removed entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left over from an earlier version. Thanks for noticing.
@classmethod | ||
def makeSkyCatalog(cls, outPath, size=1000, idStart=1, seed=123): | ||
"""Make an on-sky catalog, and save it to a text file. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unneeded extra line?
tests/ingestIndexTestBase.py
Outdated
extra_col3 = np.array([get_word(num) for num in np.random.randint(11, size=size)]) | ||
|
||
dtype = np.dtype([('id', float), ('ra_icrs', float), ('dec_icrs', float), | ||
('ra_err', float), ('dec_err', float), ('a', float), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match indent style inside the list of tuples. I'm surprised this didn't complain on build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too am surprised that didn't trigger the linter.
tests/test_htmIndex.py
Outdated
@@ -362,30 +198,24 @@ def testIngest(self): | |||
] | |||
config2.file_reader.delimiter = '|' | |||
# this also tests changing the delimiter | |||
# import os; print(os.getpid()); import ipdb; ipdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary comment line.
counter = 0 | ||
# global shared counter to keep track of number of files processed. | ||
fileProgress = 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the style guide (https://developer.lsst.io/python/style.html?highlight=module%20variables#style-guide-py-sci-pi-naming), I think these global variables should be all caps.
Table.read has a fast csv reader that is ~4x faster than genfromtxt. Take just the numpy array portion of it so it behaves like the old code. Add a format option to config, to support different styles of text file.
This refcator will make it easier to pull out the parallelizable portions, specifically `ingestOneFile()`, while separating the assignment of indices from the rest of the calculations.
Split the parallelized work into a separate module, so that it doesn't deal with pickling a Butler, or DataRefs. Add progress indicator to ingest manager. The previous test code did not check that the ingested catalog actually contained the same things as the input catalog. I've improved the tests to do that, and also pulled out some shared code so that I could write unit tests of parts of IngestIndexedReferenceTask and a separate test of the multiprocessing portion of the code. That new test is run outside of pytest, so that it doesn't interact with the pytest multiprocessing.
Centroid fields do not make sense in loadSkyCircle, but they are needed for loadPixelBox. So only add them to the schema for the latter.
Configs are validated before they are frozen, during Task creation, so this line was pointless.
7275c8b
to
95761a5
Compare
No description provided.