Metadata intake #87

Closed
lucasortizny wants to merge 47 commits into facebookresearch:main from lucasortizny:metadata-intake

Conversation

@lucasortizny
Contributor

@lucasortizny lucasortizny commented Jul 8, 2022

Here is my pull request for the metadata intake. Some notes to recap:

  • The implementation in server.py is in handle_setup_v01. As requested, handle_setup has been left intact.
  • The master table has two new columns: participant_id (unique, nullable, max size is 50 characters in length, auto-generated UUID4 if not specified) and extra_metadata (4096 characters in length).
  • The extra metadata will be serialized to JSON before it is stored in the database and deserialized when it is read back, so it stays available for future reference.
  • The Database schematic I originally created will need to be updated with the new information.

Please let me know if there are any questions! This should help cover #32
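The two new columns described above can be illustrated with a small standard-library sketch. This is not the code from the PR: the helper name prepare_master_fields and the example metadata keys are hypothetical, and only the length limits (50 and 4096 characters) and the UUID4 default come from the description.

```python
import json
import uuid

MAX_PARTICIPANT_ID_LEN = 50    # limit stated in the PR description
MAX_EXTRA_METADATA_LEN = 4096  # limit stated in the PR description


def prepare_master_fields(participant_id=None, extra_metadata=None):
    """Hypothetical helper: build the values for the two new columns."""
    if participant_id is None:
        # A UUID4 rendered as text is 36 characters, well under the limit.
        participant_id = str(uuid.uuid4())
    if len(participant_id) > MAX_PARTICIPANT_ID_LEN:
        raise ValueError("participant_id exceeds 50 characters")

    # Serialize the free-form metadata to a JSON string for storage.
    serialized = json.dumps(extra_metadata or {}, sort_keys=True)
    if len(serialized) > MAX_EXTRA_METADATA_LEN:
        raise ValueError("extra_metadata exceeds 4096 characters")
    return participant_id, serialized


pid, blob = prepare_master_fields(extra_metadata={"experiment_type": "detection"})
restored = json.loads(blob)  # deserialize when reading the row back
```

Checking the serialized length before writing is one way to avoid silent truncation on databases that enforce column sizes; whether the PR does this is not stated.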

@facebook-github-bot
Contributor

Hi @lucasortizny!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jul 8, 2022
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

Comment thread aepsych/config.py Outdated
),
**kwargs,
)
# Convert config into a dictionary (but as it is, excludes "common")
Contributor

Is that right? The way ConfigParser works, things that are in common are mirrored to all other sections. What this method is doing (I think) is omitting these mirrored copies (basically if something is in common, and also shows up in another section, just ignore the one in the other section since it's a mirror).
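The mirroring behavior described in this comment can be reproduced with the standard library alone. The sketch below assumes a parser whose default section is named "common"; the to_dict helper and the section and option names are illustrative, not the method under review.

```python
import configparser

# Assumption: the default section is called "common", as in the comment above.
parser = configparser.ConfigParser(default_section="common")
parser.read_string(
    """
[common]
lb = [0]
ub = [1]

[init_strat]
n_trials = 10
"""
)

# Options from the default section are visible from every other section.
mirrored_value = parser["init_strat"]["lb"]


def to_dict(parser):
    """Illustrative helper: convert to a dict, omitting mirrored copies."""
    defaults = dict(parser.defaults())
    result = {parser.default_section: defaults}
    for section in parser.sections():  # sections() skips the default section
        result[section] = {
            key: value
            for key, value in parser[section].items()
            # Drop entries that merely mirror the default section.
            if defaults.get(key) != value
        }
    return result


as_dict = to_dict(parser)
```

Here "common" is kept in the result and only the mirrored copies inside init_strat are dropped, which matches the reviewer's reading of the method.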

Contributor Author

Oh I absolutely did not know that. I knew default sections in ConfigParsers were special but now it makes sense why this line is here!

Contributor

Ok -- can you update the comment to be correct or remove?

Contributor Author

Yeah I'll update it, haven't pushed any commits yet

Comment thread aepsych/database/db.py Outdated
return None

- def record_setup(self, description, name, id=None, request=None) -> str:
+ def record_setup(self, description, name, extrametadata=None, id=None, request=None, participant_id=None) -> str:
Contributor

can you underscore extra_metadata?

Comment thread aepsych/server/server.py Outdated
### if the metadata does not exist, we are going to log nothing
else:
self._db_master_record = self.db.record_setup(
description="default description",
Contributor

shouldn't this be experiment_name and experiment_description instead of these hardcoded defaults?

Contributor Author

Which line(s) in particular? @mshvartsman is it the else condition where string values are given instead?

Contributor

@mshvartsman mshvartsman Jul 12, 2022

Yeah I'm not sure what I was thinking. If there's no metadata we can keep it to some default string.

Comment thread aepsych/server/server.py Outdated
id=experiment_id,
)
### make a temporary config object to derive parameters because server handles config after table
tempconfig = Config(**request["message"])
Contributor

Why not modify self.configure to take in a Config object instead of the raw message? Then you don't need to create it twice (the first thing self.configure does is create a Config anyway).

Contributor Author

@lucasortizny lucasortizny Jul 8, 2022

Hey @mshvartsman! I absolutely agree that it would be a great idea to just pass the Config object. My one concern is that self.configure is also called by the legacy handle_setup function with a string. I would have to write another version of this function to fit the new handle_setup_v01, so it is worth considering whether you'd like to see a second version of an existing function. Please let me know if you'd like me to proceed. (TLDR: I don't want to modify the original handle_setup in any way unless absolutely necessary.)

Contributor

I think modifying the original handle_setup is maybe okay here. To the extent that there's a contract here it's that legacy handle_setup's behavior doesn't change, which will still be true if you change where the string-to-config conversion happens. I don't think anything else is calling self.configure. @crasanders what do you think?

Contributor

Yeah, I think it's fine to change handle_setup to pass a Config into self.configure.

Contributor Author

Okie dokie, will do!
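The refactor agreed on in this thread might look roughly like the sketch below. Config and Server here are toy stand-ins, not the AEPsych classes, and the "config_str" key and "metadata" section name are assumptions made for illustration. The point is only that both handlers build the Config once and hand it to configure.

```python
import configparser


class Config(configparser.ConfigParser):
    """Hypothetical stand-in for the project's Config class."""

    def __init__(self, config_str=None):
        super().__init__(default_section="common")
        if config_str is not None:
            self.read_string(config_str)


class Server:
    """Toy server that models only the setup path."""

    def __init__(self):
        self.config = None
        self.metadata = None

    def configure(self, config):
        # Accepts an already-built Config, so no caller has to parse twice.
        self.config = config
        return config.sections()

    def handle_setup(self, request):
        # Legacy handler: observable behavior is unchanged, but the
        # string-to-Config conversion now happens here, not in configure().
        return self.configure(Config(**request["message"]))

    def handle_setup_v01(self, request):
        config = Config(**request["message"])
        # Metadata is read from the same object that configure() receives.
        if config.has_section("metadata"):
            self.metadata = dict(config["metadata"])
        return self.configure(config)


request = {"message": {"config_str": "[metadata]\nexperiment_name = demo\n"}}
server = Server()
sections = server.handle_setup_v01(request)
```

With this shape, the temporary config object from the diff is no longer needed, because the one Config instance serves both the metadata lookup and the configuration step.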

Comment thread aepsych/server/server.py

self._db_master_record = self.db.record_setup(
description="default description",
Contributor

I don't think these are particularly necessary capitalization / whitespace changes

Contributor

I don't have an opinion one way or the other about capitalization, but do make sure that it's consistent. It might be useful to define DEFAULT_NAME and DEFAULT_DESC constants.

Contributor

Good idea
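A minimal sketch of the constants suggestion, assuming the metadata arrives as a plain dict. The string "default description" appears in the diff; "default name", the setup_fields helper, and the dict keys are hypothetical.

```python
# Defined once so capitalization and wording stay consistent everywhere.
DEFAULT_NAME = "default name"         # assumed value, not taken from the diff
DEFAULT_DESC = "default description"  # matches the literal in the diff


def setup_fields(metadata=None):
    """Hypothetical helper: pick name/description, falling back to defaults."""
    metadata = metadata or {}
    return {
        "name": metadata.get("experiment_name", DEFAULT_NAME),
        "description": metadata.get("experiment_description", DEFAULT_DESC),
    }


without_metadata = setup_fields()
with_metadata = setup_fields({"experiment_name": "demo"})
```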

@mshvartsman
Contributor

Great start -- left some nits in the code. Can you put together a minimal example that constructs some messages with metadata, processes them through versioned_handler, and then gets that metadata back out? That would be good as a test, and also as an example for users.

Also, once #89 lands you should run ufmt on your changes.
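As a rough picture of what such a round-trip example could cover, here is a standard-library stand-in. It does not call versioned_handler or any AEPsych API: handle_setup_message, get_metadata, the message layout, and the table are all hypothetical, with an in-memory SQLite table playing the role of the master table.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master (participant_id TEXT UNIQUE, extra_metadata TEXT)"
)


def handle_setup_message(conn, message):
    """Hypothetical handler: store the metadata carried by a setup message."""
    metadata = dict(message.get("metadata", {}))
    # Use the supplied participant_id, or generate a UUID4 when it is absent.
    participant_id = metadata.pop("participant_id", None) or str(uuid.uuid4())
    conn.execute(
        "INSERT INTO master (participant_id, extra_metadata) VALUES (?, ?)",
        (participant_id, json.dumps(metadata)),
    )
    return participant_id


def get_metadata(conn, participant_id):
    """Read the stored JSON back and deserialize it."""
    row = conn.execute(
        "SELECT extra_metadata FROM master WHERE participant_id = ?",
        (participant_id,),
    ).fetchone()
    return json.loads(row[0])


messages = [
    {"metadata": {"participant_id": "p01", "experiment_type": "detection"}},
    {"metadata": {"lab": "A"}},
    {},  # a message with no metadata at all
]
ids = [handle_setup_message(conn, message) for message in messages]
```

A real test would send these messages through the server instead, but the assertions would have the same shape: what goes in as metadata should come back out unchanged.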

Comment thread aepsych/database/tables.py Outdated
experiment_name = Column(String(256))
experiment_description = Column(String(2048))
experiment_id = Column(String(10), unique=True)
participant_id = Column(String(50), unique=True, nullable=True)
Contributor

Do we want participant_id to be nullable?

Contributor Author

I originally wanted it that way, but I eventually wrote it so that a UUID is generated anyway, so this line no longer makes sense. I will remove the nullability!
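The effect of dropping nullability can be seen in a small SQLite sketch. This uses the sqlite3 module rather than the SQLAlchemy model from the diff, and the insert_participant and is_rejected helpers are hypothetical. Note that SQLite does not enforce the VARCHAR(50) length, so only the NOT NULL and UNIQUE constraints are exercised.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (participant_id VARCHAR(50) NOT NULL UNIQUE)")


def insert_participant(conn, participant_id=None):
    """Hypothetical helper: always writes a value, generating one if needed."""
    if participant_id is None:
        participant_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO master (participant_id) VALUES (?)", (participant_id,)
    )
    return participant_id


def is_rejected(conn, value):
    """Return True if the database refuses to store the raw value."""
    try:
        conn.execute("INSERT INTO master (participant_id) VALUES (?)", (value,))
    except sqlite3.IntegrityError:
        return True
    return False


generated = insert_participant(conn)          # no ID supplied, UUID4 generated
null_rejected = is_rejected(conn, None)       # violates NOT NULL
duplicate_rejected = is_rejected(conn, generated)  # violates UNIQUE
```

Because the application layer always supplies a value, the NOT NULL constraint never fires in normal use; it only guards against code paths that bypass the helper.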

@lucasortizny
Contributor Author

Attached is the current Database schematic updated to this PR's changes. I am going to continue this PR by writing unit tests for the new functionality in the DB.
[Image: Database Mapping July 14th]

@lucasortizny
Contributor Author

config was renamed to usedconfig because the metadata-intake functionality introduces a "config" parameter in the configure function (line 677), and I wanted to avoid variable naming conflicts.

@facebook-github-bot
Contributor

@crasanders has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

crasanders pushed a commit to crasanders/aepsych that referenced this pull request Aug 18, 2022
Summary:
Here is my pull request for the metadata intake. Some notes to recap:

- The implementation in server.py is in handle_setup_v01. As requested, handle_setup has been left intact.
- The master table has two new columns: participant_id (unique, nullable, max size is 50 characters in length, auto-generated UUID4 if not specified) and extra_metadata (4096 characters in length).
- The extra metadata information will be serialized/deserialized -> JSON and placed into the Database in that way for future reference.
- The Database schematic I originally created will need to be updated with the new information.

Please let me know if there are any questions! This should help cover facebookresearch#32

Pull Request resolved: facebookresearch#87

Differential Revision: D38840357

Pulled By: crasanders

fbshipit-source-id: 3549bdf887df4f09829cc9088d3a303b29932dfa
crasanders pushed a commit to crasanders/aepsych that referenced this pull request Aug 19, 2022
Pull Request resolved: facebookresearch#87

Differential Revision: D38840357

Pulled By: crasanders

fbshipit-source-id: e5e6ce31bca2e959349204afaec99e5d6e6270b1
crasanders pushed a commit to crasanders/aepsych that referenced this pull request Aug 24, 2022
Pull Request resolved: facebookresearch#87

Differential Revision: D38840357

Pulled By: crasanders

fbshipit-source-id: 6f07bd944e4184f5e951383f28ded5fff4850e42
crasanders pushed a commit to crasanders/aepsych that referenced this pull request Aug 26, 2022
Pull Request resolved: facebookresearch#87

Test Plan: New unit tests

Differential Revision: https://internalfb.com/D38840357

Pulled By: crasanders

fbshipit-source-id: 328b4cbc6ccd03a1e9869c70dbf18f875846327d