Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement / dataset metadata #1382

Conversation

WilliamHPNielsen
Copy link
Contributor

There is currently no way to query for the metadata associated with a particular dataset. One can only look up metadata by tag, which requires some auxiliary knowledge about the metadata.

Changes proposed in this pull request:

  • Add a metadata property to a DataSet object to see all associated metadata.

This little improvement is needed for #1352 in order to ensure correct copying of metadata.

@astafan8

@WilliamHPNielsen
Copy link
Contributor Author

WilliamHPNielsen commented Nov 13, 2018

Comment: you might wonder why snapshot is missing from RUNS_TABLE_COLUMNS. See the test. I think I made the correct choice, but perhaps we should upgrade the schema later.

@codecov
Copy link

codecov bot commented Nov 13, 2018

Codecov Report

Merging #1382 into master will increase coverage by 0.09%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1382      +/-   ##
==========================================
+ Coverage   73.29%   73.38%   +0.09%     
==========================================
  Files          79       79              
  Lines        9256     9282      +26     
==========================================
+ Hits         6784     6812      +28     
+ Misses       2472     2470       -2

Copy link
Contributor

@astafan8 astafan8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! (hope this much testing is enough ;) )

"""
Get all metadata associated with the specified run
"""
# TODO: promote snapshot to be present at creation time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this TODO mean upgrading the schema to have snapshot column in it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. I think it's just a blunder that it is not already there.

SELECT "{tag}"
FROM runs
WHERE run_id = ?
AND "{tag}" IS NOT NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was trying to find when this extra condition would give problems, but couldn't.

@@ -342,6 +346,10 @@ def run_timestamp_raw(self) -> float:
def description(self) -> RunDescriber:
return self._description

@property
def metadata(self) -> Dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should metadata be added to the persistent traits list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the persistent traits do not exist yet, only on another feature branch.

@@ -526,3 +526,23 @@ def test_get_description(some_paramspecs):
loaded_ds = DataSet(run_id=1)

assert loaded_ds.description == desc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure that i understood your comment about snapshot. so you've made it both: non metadata AND not RUNS_TABLE_COLUMNS, right? and both should be fixed with upgrading the schema when snapshot is a necessary column of the runs table, right? if so, i agree. We should also do smth with metadata later so that arbitrary columns do not get created for metadata tags.

SELECT "{tag}"
FROM runs
WHERE run_id = ?
AND "{tag}" IS NOT NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will DataSet.add_metadata({'mytag': None}) produce a new column mytag with a NULL value in it? if yes, then either such an add_metadata call should not be allowed, or "{tag}" IS NOT NULL part shall be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. Since a single run with metadata tag creates the column and thereby implicitly NULL-populates it for all other runs, I think the sensible solution is to prohibit None as a valid metadata value. And if we do so, that prohibition should be enforced loud and clear.

@WilliamHPNielsen WilliamHPNielsen merged commit a8baf7a into microsoft:master Nov 14, 2018
@WilliamHPNielsen WilliamHPNielsen deleted the improvement/dataset_metadata branch November 14, 2018 11:53
giulioungaretti pushed a commit that referenced this pull request Nov 14, 2018
Merge: 0204c8c 4658bfc
Author: William H.P. Nielsen <whpn@mailbox.org>

    Merge pull request #1382 from WilliamHPNielsen/improvement/dataset_metadata
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants