Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-19616: Make IngestIndexReferenceObjectsTask multiprocessing capable #168

Merged
merged 6 commits into from Jun 20, 2019

Conversation

parejkoj
Copy link
Contributor

No description provided.

@parejkoj parejkoj force-pushed the tickets/DM-19616 branch 2 times, most recently from 24e728e to 7275c8b Compare June 5, 2019 01:05
Copy link
Contributor

@czwa czwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style issues were the only thing I found.

center = lsst.geom.SpherePoint(data[i][config.ra_name], data[i][config.dec_name], lsst.geom.degrees)
radius = 1*lsst.geom.arcseconds # don't need to search far to get an exact match!
refCat = refObjLoader.loadSkyCircle(center, radius, filterName).refCat
# assert refCat['id'] == data[i][config.id_name], f"input: {data[i]}\n\noutput: {refCat['id']}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assert form sometimes needed, or can it be removed entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left over from an earlier version. Thanks for noticing.

@classmethod
def makeSkyCatalog(cls, outPath, size=1000, idStart=1, seed=123):
"""Make an on-sky catalog, and save it to a text file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unneeded extra line?

extra_col3 = np.array([get_word(num) for num in np.random.randint(11, size=size)])

dtype = np.dtype([('id', float), ('ra_icrs', float), ('dec_icrs', float),
('ra_err', float), ('dec_err', float), ('a', float),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match indent style inside the list of tuples. I'm surprised this didn't complain on build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too am surprised that didn't trigger the linter.

@@ -362,30 +198,24 @@ def testIngest(self):
]
config2.file_reader.delimiter = '|'
# this also tests changing the delimiter
# import os; print(os.getpid()); import ipdb; ipdb.set_trace()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary comment line.

counter = 0
# global shared counter to keep track of number of files processed.
fileProgress = 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the style guide (https://developer.lsst.io/python/style.html?highlight=module%20variables#style-guide-py-sci-pi-naming), I think these global variables should be all caps.

Table.read has a fast csv reader that is ~4x faster than genfromtxt. Take
just the numpy array portion of it so it behaves like the old code.

Add a format option to config, to support different styles of text file.
This refcator will make it easier to pull out the parallelizable portions,
specifically `ingestOneFile()`, while separating the assignment of indices
from the rest of the calculations.
Split the parallelized work into a separate module, so that it doesn't deal
with pickling a Butler, or DataRefs.

Add progress indicator to ingest manager.

The previous test code did not check that the ingested catalog actually
contained the same things as the input catalog. I've improved the tests
to do that, and also pulled out some shared code so that I could write
unit tests of parts of IngestIndexedReferenceTask and a separate test
of the multiprocessing portion of the code. That new test is run outside of
pytest, so that it doesn't interact with the pytest multiprocessing.
Centroid fields do not make sense in loadSkyCircle, but they are
needed for loadPixelBox. So only add them to the schema for the latter.
Configs are validated before they are frozen, during Task creation, so
this line was pointless.
@parejkoj parejkoj merged commit a0476e3 into master Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants