Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added default sourcefile behavior to garble.py and households.py #35

Merged
merged 18 commits into from
Oct 12, 2022

Conversation

jsrockhill
Copy link
Collaborator

@jsrockhill jsrockhill commented Oct 6, 2022

households.py and garble.py now both look for the newest file (based on filename timestamp) in the temp-data directory if no sourcefile argument is passed from the command line. For ticket #183028439

jsrockhill and others added 14 commits October 6, 2022 10:19
… output and results from linkage-agent-tools
…it to utils. Updated linkid_to_patid.py to validate metadata before translating linkids
…d updated linkid_to_patid.py to work with households
… output and results from linkage-agent-tools
…it to utils. Updated linkid_to_patid.py to validate metadata before translating linkids
…d updated linkid_to_patid.py to work with households
@jsrockhill jsrockhill changed the title Added default sourcefile population to garble.py and households.py Added default sourcefile behavior to garble.py and households.py Oct 6, 2022
README.md Outdated Show resolved Hide resolved
garble.py Outdated
if args.sourcefile:
source_file = Path(args.sourcefile)
else:
oldest_ts = datetime.fromtimestamp(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is supposed to be newest? The code looks right, just the variable name is backwards. Same thing in households.py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and I'm happy to change the name. I chose "oldest" because we're looking for the 'largest' timestamp/furthest from 0 but understand the confusion.

garble.py Outdated
for filename in filter(
lambda x: "pii" in x and len(x) == 23, os.listdir("temp-data")
):
timestamp = datetime.strptime(filename[4:-4], "%Y%m%dT%H%M%S")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's worth changing but the YYYYMMDD format means you could just take the maximum. The benefit of parsing I suppose is that you crash if there's anything unexpected in there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you suggesting I do max() on the timestamps as strings? I see that how/why this would work (and would even work with the current format because the %H%M%S format gives 24-hr time but would prefer at least parsing the time to make sure it's a time and we're not accidentally comparing an errant file that ended up in temp-data. A hypothetical "pii-data-report-v2.json" which would also be picked up by this code and would str-compare to be newer than all of the timestamps because of how python compares strings. That said I could replace a few lines of this logic with a max() of the parsed timestamp objects, because datetime has comparators built in to it. Does that make sense?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd prefer something that's more crash-resistant, I could filter on a regex to ensure the filename is exactly what we're looking for but to me that felt like a little bit overkill, but I'm happy to put it in if you think it's warranted.

jsrockhill and others added 3 commits October 11, 2022 15:16
Co-authored-by: Dylan Hall <dehall@mitre.org>
… handle timestamp comparison in line with comments from @dehall. Also made a few updates to README
@dehall dehall merged commit ee996ea into master Oct 12, 2022
@dehall dehall deleted the update-garble-default branch October 13, 2022 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants