DM-38444 Add Sasquatch datastore #824

Merged
natelust merged 1 commit into main from tickets/DM-38444 on May 3, 2023

Conversation

natelust
Contributor

@natelust natelust commented Apr 19, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov

codecov bot commented Apr 20, 2023

Codecov Report

Patch coverage: 2.30% and project coverage change: -0.86 ⚠️

Comparison is base (f655029) 87.76% compared to head (b30b1b3) 86.90%.

❗ Current head b30b1b3 differs from pull request most recent head b2c8c2c. Consider uploading reports for the commit b2c8c2c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #824      +/-   ##
==========================================
- Coverage   87.76%   86.90%   -0.86%     
==========================================
  Files         268      269       +1     
  Lines       35175    35497     +322     
  Branches     7407     7462      +55     
==========================================
- Hits        30871    30850      -21     
- Misses       3150     3494     +344     
+ Partials     1154     1153       -1     
Impacted Files                                            Coverage Δ
...n/lsst/daf/butler/datastores/sasquatchDatastore.py     0.00% <0.00%> (ø)
...hon/lsst/daf/butler/datastores/chainedDatastore.py     88.88% <55.55%> (-0.82%) ⬇️
python/lsst/daf/butler/core/exceptions.py                 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes


@natelust natelust force-pushed the tickets/DM-38444 branch 3 times, most recently from 7c40db8 to 92c8ee8 on April 20, 2023 13:55
Member

@timj timj left a comment

Thanks for all this. I've made lots of comments and will take another look.

There needs to be some text added to doc/lsst.daf.butler/datastores.rst explaining the datastore and, importantly, the configuration options.

@@ -0,0 +1 @@
Introduced a special datastore to upload metric measurements to a Sasquatch instance when a MetricMeasurementBundle is stored with a butler.put call.
Member

Suggested change
Introduced a special datastore to upload metric measurements to a Sasquatch instance when a MetricMeasurementBundle is stored with a butler.put call.
Introduced a special datastore to upload metric measurements to a Sasquatch instance when a `MetricMeasurementBundle` is stored with a `butler.put()` call.

Also, can the full name of the datastore be included so people know where to find it?

python/lsst/daf/butler/datastores/sasquatchDatastore.py (outdated)
    self._dispatcher.dispatchRef(inMemoryDataset, ref)
except SasquatchDispatchPartialFailure:
    raise DatasetPutError(
        "One or more records may have failed to upload, or only partially succeeded"
Member

In a chained datastore this message is going to be swallowed. Do you want a log entry to be written somewhere? Should you include the Sasquatch error text? Should you raise from e? (I'm looking forward to Python 3.11, where we can add this message to the other exception.)
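A minimal sketch of what the comment above asks for: log the failure, include the underlying error text, and chain with `raise ... from e`. The two exception classes and the `dispatch` helper below are stand-ins defined only for this illustration; they are not the definitions used in the PR.

```python
import logging

log = logging.getLogger(__name__)


class SasquatchDispatchPartialFailure(RuntimeError):
    """Stand-in for the dispatcher's exception (hypothetical definition)."""


class DatasetPutError(RuntimeError):
    """Stand-in for the error raised when a put fails (hypothetical definition)."""


def dispatch(payload):
    # Hypothetical dispatcher that always reports a partial failure.
    raise SasquatchDispatchPartialFailure("record 3 rejected by server")


def put(payload):
    try:
        dispatch(payload)
    except SasquatchDispatchPartialFailure as e:
        # Write a log entry so the failure stays visible even if a caller
        # catches and discards the exception.
        log.warning("Sasquatch upload failed: %s", e)
        # Include the original error text and chain the original exception.
        raise DatasetPutError(
            f"One or more records may have failed to upload, or only partially succeeded: {e}"
        ) from e
```

With `from e`, the original exception is kept on `__cause__` and appears in the traceback. On Python 3.11 and later, `BaseException.add_note()` is another way to attach extra text to the original exception before re-raising it.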

Contributor Author

I changed around how exceptions are being handled

Contributor Author

Actually this is going to need some discussion

super().__init__(config, bridgeManager)

# Name ourselves either using an explicit name or a name
# derived from the (unexpanded) root
Member

Suggested change
# derived from the (unexpanded) root
# derived from the (unexpanded) root.


# Name ourselves either using an explicit name or a name
# derived from the (unexpanded) root
self.name = self.config.get("name", "{}@{}".format(type(self).__name__, self.config["root"]))
Member

I don't see an example YAML configuration in this PR, so I'm not sure what root means here. Are you always explicitly naming with the name or should the URL be used?

Contributor Author

I cribbed this from another datastore prior to fully understanding how it all worked and fit together. Now that I do, I think I need to revisit that. I think the URL is reasonably unique.
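A small sketch of the naming fallback discussed above, assuming the configuration behaves like a mapping: an explicit `name` wins, otherwise the name is derived from the class name and the URL. The `url` key, the helper name `datastore_name`, and the example URL are hypothetical and are not taken from the PR.

```python
def datastore_name(cls_name: str, config: dict) -> str:
    """Return the explicit name if configured, else "<ClassName>@<url>"."""
    if "name" in config:
        return config["name"]
    # Fall back to a name derived from the (assumed) "url" config entry.
    return f"{cls_name}@{config['url']}"


explicit = datastore_name("SasquatchDatastore", {"name": "metrics"})
derived = datastore_name("SasquatchDatastore", {"url": "https://example.org/sasquatch"})
```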

class SasquatchDatastore(GenericBaseDatastore):
    """Basic Datastore for writing to an in Sasquatch instance.

    This Datastore is write only, meaning that it can dispatch data to a
Member

Suggested change
This Datastore is write only, meaning that it can dispatch data to a
This Datastore is currently write only, meaning that it can dispatch data to a

"""Basic Datastore for writing to an in Sasquatch instance.

This Datastore is write only, meaning that it can dispatch data to a
sasquatch instance, but at the present can not be used to retrieve values.
Member

Suggested change
sasquatch instance, but at the present can not be used to retrieve values.
Sasquatch instance, but at the present can not be used to retrieve values.

    return

def validateKey(self, lookupKey: LookupKey, entity: DatasetRef | DatasetType | StorageClass) -> None:
    # Docstring is inherited from base class
Member

Suggested change
# Docstring is inherited from base class
# Docstring is inherited from base class.

    return

def getLookupKeys(self) -> set[LookupKey]:
    # Docstring is inherited from base class
Member

Suggested change
# Docstring is inherited from base class
# Docstring is inherited from base class.

def export_records(self, refs: Iterable[DatasetIdRef]) -> Mapping[str, DatastoreRecordData]:
    # Docstring inherited from the base class.

    # In-memory Datastore records cannot be exported or imported
Member

Suggested change
# In-memory Datastore records cannot be exported or imported
# Sasquatch Datastore records cannot be exported or imported.

@timj
Member

timj commented Apr 20, 2023

An additional point I've just noticed: you need to make sure that self.bridge.insert(ref) is called at some point during your put. This populates the dataset_locations table to indicate that the ref is associated with the named datastore. In theory this can be used externally to find out which datastores know about a given dataset. If you do this you will also have to include a trash method and emptyTrash that moves the row out of dataset_locations even if Sasquatch still has the dataset (which is also what the datastore.forget method can do).
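The bookkeeping described above can be illustrated with a toy model. This is not the daf_butler API: `ToyBridge`, `ToyWriteOnlyDatastore`, and their method names are invented for this sketch, and the `dataset_locations` table is modelled as a plain Python set.

```python
class ToyBridge:
    """Toy stand-in for the registry bridge (hypothetical, not daf_butler)."""

    def __init__(self, datastore_name):
        self.datastore_name = datastore_name
        self.dataset_locations = set()  # rows of (datastore_name, dataset_id)
        self.trashed = set()

    def insert(self, dataset_id):
        # Record that this datastore knows about the dataset.
        self.dataset_locations.add((self.datastore_name, dataset_id))

    def move_to_trash(self, dataset_id):
        row = (self.datastore_name, dataset_id)
        if row in self.dataset_locations:
            self.dataset_locations.remove(row)
            self.trashed.add(row)

    def empty_trash(self):
        self.trashed.clear()


class ToyWriteOnlyDatastore:
    """Toy write-only datastore that records what it has accepted."""

    def __init__(self, name):
        self.bridge = ToyBridge(name)
        self.uploaded = []  # stands in for the remote service

    def put(self, payload, dataset_id):
        self.uploaded.append((dataset_id, payload))
        self.bridge.insert(dataset_id)

    def trash(self, dataset_id):
        # Only the local bookkeeping changes; the remote copy is untouched.
        self.bridge.move_to_trash(dataset_id)

    def emptyTrash(self):
        self.bridge.empty_trash()


store = ToyWriteOnlyDatastore("sasquatch")
store.put({"metric": 1.0}, dataset_id=42)
known_after_put = ("sasquatch", 42) in store.bridge.dataset_locations
store.trash(42)
store.emptyTrash()
known_after_trash = ("sasquatch", 42) in store.bridge.dataset_locations
```

In this toy model the row disappears from `dataset_locations` after `trash` and `emptyTrash`, while `store.uploaded` still holds the payload, mirroring the case where the remote service keeps the data.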

@natelust
Contributor Author

Do we want to insert a record into a table about who knows about this data? I would prefer not to at this moment because a) we can't fetch anything from Sasquatch anyway; b) we can't do anything about the fact that someone might delete the record on the Sasquatch side, and having a table that is wrong seems worse than not having a table entry; c) the worst that will happen if someone runs something again is that the same value will just be uploaded again.

@timj
Member

timj commented Apr 27, 2023

Regarding recording that the datastore has accepted it, I'm fine with us punting for now but we do need to address it at some point.

@natelust natelust force-pushed the tickets/DM-38444 branch 2 times, most recently from c830e06 to b30b1b3 on May 2, 2023 13:53
Member

@timj timj left a comment

Thanks for the clean ups.

@@ -37,3 +37,13 @@ class ValidationError(RuntimeError):
    """Some sort of validation error has occurred."""

    pass


class DatasetPutError(RuntimeError):
Member

I don't think this is used anywhere now so can be deleted.

Some minor changes were made in chained datastore to support the
write-only SasquatchDatastore going into analysis_tools. A note
about the datastore was added to the datastore document.
@natelust natelust merged commit f80a4f1 into main May 3, 2023
11 checks passed
@natelust natelust deleted the tickets/DM-38444 branch May 3, 2023 17:33