Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add update_column and add_column to SingleTableMetadata #915

Merged

Conversation

pvk-developer
Copy link
Member

Resolves #877

@pvk-developer pvk-developer marked this pull request as ready for review July 20, 2022 19:29
@pvk-developer pvk-developer requested a review from a team as a code owner July 20, 2022 19:29
@pvk-developer pvk-developer requested review from katxiao and amontanez24 and removed request for a team July 20, 2022 19:29
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments but I think this is looking really good so far!

Comment on lines 52 to 55
matches = re.findall('(%.)|(%)', formated_date)
if matches:
matches = ', '.join(item for match in matches for item in match if item)
matches = ''.join(char if not char.isalnum() else '.' for char in matches)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this regex doing? Can we just use the same logic as in the actual transformer? As in try to parse the datetime and if it comes back with an error than we catch that here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such way to tell if a strftime is valid (or at least I didn't find one). This regex detects if there are % left on the formated_date, which is a way to tell whether or not the user passed an extra % which is the character that is usually used to specify any datetime like formating string.
The last line maybe I can remove it since it's just to clear the string a bit. I don't want ot confuse the end user since it can be a bit confusing if we find a % next to a 1 that comes from the datetime.now().

Comment on lines 226 to 236
instance._validate_unexpected_kwargs('age', 'numerical')
instance._validate_unexpected_kwargs('age', 'numerical', representation='int')
instance._validate_unexpected_kwargs('start_date', 'datetime')
instance._validate_unexpected_kwargs('start_date', 'datetime', datetime_format='%Y-%d')
instance._validate_unexpected_kwargs('name', 'categorical')
instance._validate_unexpected_kwargs('name', 'categorical', order_by='alphabetical')
instance._validate_unexpected_kwargs('name', 'categorical', order=['a', 'b', 'c'])
instance._validate_unexpected_kwargs('synthetic', 'boolean')
instance._validate_unexpected_kwargs('phrase', 'text')
instance._validate_unexpected_kwargs('phrase', 'text', regex_format='[A-z]')
instance._validate_unexpected_kwargs('phone', 'phone_number')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we parameterize this test instead of calling all these different cases like this? We can make the passed kwargs, and the error message parameters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 yes let me re-do it

@codecov-commenter
Copy link

codecov-commenter commented Jul 21, 2022

Codecov Report

Merging #915 (214a504) into V1.0.0.dev (fd26ffd) will increase coverage by 0.68%.
The diff coverage is 97.46%.

@@              Coverage Diff               @@
##           V1.0.0.dev     #915      +/-   ##
==============================================
+ Coverage       69.48%   70.16%   +0.68%     
==============================================
  Files              39       39              
  Lines            2949     3023      +74     
==============================================
+ Hits             2049     2121      +72     
- Misses            900      902       +2     
Impacted Files Coverage Δ
sdv/metadata/single_table.py 98.70% <97.46%> (-1.30%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd26ffd...214a504. Read the comment docs.

@@ -22,6 +32,68 @@ class SingleTableMetadata:
KEYS = ['columns', 'primary_key', 'alternate_keys', 'constraints', 'SCHEMA_VERSION']
SCHEMA_VERSION = 'SINGLE_TABLE_V1'

@staticmethod
def _validate_numerical(column_name, **kwargs):
representations = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pull this out as a constant?

"""
# Setup
instance = SingleTableMetadata()
representations = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could parameterize the unit test with the list of representations. That way if one fails, it's easy to tell which representation failed.

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments but I think this is ready!

@@ -47,6 +47,12 @@ class TestSingleTableMetadata:
re.escape("Invalid values '(anonymization, order_by)' for phone_number column 'phone'."))
]

NUMERICAL_REPRESENTATIONS = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just access this from the class like SingleTableMetadata._NUMERICAL_REPRESENTATIONS

@@ -29,18 +29,18 @@ class SingleTableMetadata:
'b': 'boolean',
'M': 'datetime',
}

_NUMERICAL_REPRESENTATIONS = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this a frozenset instead of a list? That way it is immutable

@@ -12,16 +14,86 @@
class SingleTableMetadata:
"""Single Table Metadata class."""

_EXPECTED_KWARGS = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this a frozendict as well?

Copy link
Contributor

@katxiao katxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@pvk-developer pvk-developer merged commit 106621f into V1.0.0.dev Jul 22, 2022
@pvk-developer pvk-developer deleted the issue-877-add-update-column-and-add-column-to-stm branch July 22, 2022 06:57
pvk-developer added a commit that referenced this pull request Jul 25, 2022
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
pvk-developer added a commit that referenced this pull request Aug 9, 2022
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
pvk-developer added a commit that referenced this pull request Aug 10, 2022
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
amontanez24 pushed a commit that referenced this pull request Sep 13, 2022
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
amontanez24 pushed a commit that referenced this pull request Nov 22, 2022
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
amontanez24 pushed a commit that referenced this pull request Dec 21, 2022
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
amontanez24 pushed a commit that referenced this pull request Feb 17, 2023
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
amontanez24 pushed a commit that referenced this pull request Mar 23, 2023
* Implement update / add columns. *WIP

* Add docstrings

* Fix call functions

* WIP: Unit tests in progress

* Finish unit tests

* Address comments

* Fix multiple python version erroring

* fix exception

* Fix error msg

* Address comments

* Address comments about frozensets

* Bump macos version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants