-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add update_column
and add_column
to SingleTableMetadata
#915
Add update_column
and add_column
to SingleTableMetadata
#915
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments but I think this is looking really good so far!
sdv/metadata/single_table.py
Outdated
matches = re.findall('(%.)|(%)', formated_date) | ||
if matches: | ||
matches = ', '.join(item for match in matches for item in match if item) | ||
matches = ''.join(char if not char.isalnum() else '.' for char in matches) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this regex doing? Can we just use the same logic as in the actual transformer? As in try to parse the datetime and if it comes back with an error than we catch that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such way to tell if a strftime
is valid (or at least I didn't find one). This regex
detects if there are %
left on the formated_date
, which is a way to tell whether or not the user passed an extra %
which is the character that is usually used to specify any datetime like formating string.
The last line maybe I can remove it since it's just to clear the string a bit. I don't want ot confuse the end user since it can be a bit confusing if we find a %
next to a 1
that comes from the datetime.now()
.
instance._validate_unexpected_kwargs('age', 'numerical') | ||
instance._validate_unexpected_kwargs('age', 'numerical', representation='int') | ||
instance._validate_unexpected_kwargs('start_date', 'datetime') | ||
instance._validate_unexpected_kwargs('start_date', 'datetime', datetime_format='%Y-%d') | ||
instance._validate_unexpected_kwargs('name', 'categorical') | ||
instance._validate_unexpected_kwargs('name', 'categorical', order_by='alphabetical') | ||
instance._validate_unexpected_kwargs('name', 'categorical', order=['a', 'b', 'c']) | ||
instance._validate_unexpected_kwargs('synthetic', 'boolean') | ||
instance._validate_unexpected_kwargs('phrase', 'text') | ||
instance._validate_unexpected_kwargs('phrase', 'text', regex_format='[A-z]') | ||
instance._validate_unexpected_kwargs('phone', 'phone_number') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we parameterize this test instead of calling all these different cases like this? We can make the passed kwargs, and the error message parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 yes let me re-do it
Codecov Report
@@ Coverage Diff @@
## V1.0.0.dev #915 +/- ##
==============================================
+ Coverage 69.48% 70.16% +0.68%
==============================================
Files 39 39
Lines 2949 3023 +74
==============================================
+ Hits 2049 2121 +72
- Misses 900 902 +2
Continue to review full report at Codecov.
|
sdv/metadata/single_table.py
Outdated
@@ -22,6 +32,68 @@ class SingleTableMetadata: | |||
KEYS = ['columns', 'primary_key', 'alternate_keys', 'constraints', 'SCHEMA_VERSION'] | |||
SCHEMA_VERSION = 'SINGLE_TABLE_V1' | |||
|
|||
@staticmethod | |||
def _validate_numerical(column_name, **kwargs): | |||
representations = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pull this out as a constant?
""" | ||
# Setup | ||
instance = SingleTableMetadata() | ||
representations = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could parameterize the unit test with the list of representations. That way if one fails, it's easy to tell which representation failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor comments but I think this is ready!
@@ -47,6 +47,12 @@ class TestSingleTableMetadata: | |||
re.escape("Invalid values '(anonymization, order_by)' for phone_number column 'phone'.")) | |||
] | |||
|
|||
NUMERICAL_REPRESENTATIONS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just access this from the class like SingleTableMetadata._NUMERICAL_REPRESENTATIONS
sdv/metadata/single_table.py
Outdated
@@ -29,18 +29,18 @@ class SingleTableMetadata: | |||
'b': 'boolean', | |||
'M': 'datetime', | |||
} | |||
|
|||
_NUMERICAL_REPRESENTATIONS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this a frozenset instead of a list? That way it is immutable
@@ -12,16 +14,86 @@ | |||
class SingleTableMetadata: | |||
"""Single Table Metadata class.""" | |||
|
|||
_EXPECTED_KWARGS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this a frozendict as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
* Implement update / add columns. *WIP * Add docstrings * Fix call functions * WIP: Unit tests in progress * Finish unit tests * Address comments * Fix multiple python version erroring * fix exception * Fix error msg * Address comments * Address comments about frozensets * Bump macos version.
Resolves #877