Data generator 2.1 #70

@omri374 omri374 commented Jan 20, 2023

This is a draft PR for the Presidio Sentence Faker changes (PR #50). We'll keep it open until we finalize the design of the evaluator piece, then merge the two together so that they stay consistent.
cc @Robbie-Palmer @melmatlis @tranguyen221

Robbie-Palmer and others added 30 commits August 2, 2022 10:00
Map NATIONALITY entity to NRP not to LOCATION
Change dict literals to dict constructors to improve readability
Asa higher level abstraction over PresidioDataGenerator for utilising all templates and providers in this library
…fault AnalyzerEngine

Validate chosen language is available in provided AnalyzerEngine
Map NATIONALITY entity to NRP not to LOCATION
Change dict literals to dict constructors to improve readability
Asa higher level abstraction over PresidioDataGenerator for utilising all templates and providers in this library
…fault AnalyzerEngine

Validate chosen language is available in provided AnalyzerEngine
Update PresidioDataGenerator tests to make stronger assertions about contents of results
Update PresidioFakeRecordGenerator to use ReligionProvider
…-record-generator

# Conflicts:
#	presidio_evaluator/data_generator/__init__.py
#	presidio_evaluator/data_generator/faker_extensions/providers.py
#	presidio_evaluator/data_generator/presidio_data_generator.py
Co-authored-by: melmatlis <93650751+melmatlis@users.noreply.github.com>
…ator

# Conflicts:
#	presidio_evaluator/data_generator/presidio_data_generator.py
Rename PresidioFakeRecordGenerator to PresidioSentenceFaker to distinguish it from `RecordGenerator`
Robbie-Palmer and others added 8 commits January 17, 2023 16:13
Move functions for loading data from FakeNameGenerator.com in faker format into new datasets.py module
Move logic for choosing templates out of SentenceFaker into PresidioSentenceFaker
Remove generic read file function
Add missing HospitalProvider
Update Recognizer tests to use PresidioSentenceProvider
Make single module to hold all sentence semantic dependency logic for Faker, including SentenceFaker, RecordGenerator and RecordsFaker
…aker

Rename faker_to_presidio_entity_type to ENTITY_TYPE_MAPPING
Make presidio_templates_file_path and presidio_additional_entity_providers available from package
Update Data Generation README to outline choices
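
Several of the commits above touch the entity mapping (NATIONALITY to NRP, dict constructors, the rename to `ENTITY_TYPE_MAPPING`). A minimal sketch of what such a mapping might look like is shown below. Only the names `ENTITY_TYPE_MAPPING`, NATIONALITY, NRP and LOCATION come from the commit messages; the keys, the other entries, and the helper `to_presidio_entity` are illustrative assumptions rather than the repository's actual contents.

```python
# Hypothetical mapping from generator entity names to Presidio entity types.
# Written with the dict constructor rather than a dict literal, as one of the
# commits describes. Keys and most values are assumed for illustration.
ENTITY_TYPE_MAPPING = dict(
    person="PERSON",
    city="LOCATION",
    country="LOCATION",
    nationality="NRP",  # mapped to NRP rather than LOCATION, per the commit message
)


def to_presidio_entity(entity: str) -> str:
    """Return the mapped entity type, or the input unchanged if it is unmapped."""
    return ENTITY_TYPE_MAPPING.get(entity, entity)


if __name__ == "__main__":
    print(to_presidio_entity("nationality"))
```

Falling back to the input for unmapped names is a design choice of this sketch; a stricter variant could raise on unknown entities instead.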
@omri374 omri374 closed this Feb 6, 2023
@omri374 omri374 deleted the data-generator-2.1 branch February 6, 2023 09:27

2 participants