Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add sql-datatype to the SDK discovery and catalog #1872

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

BuzzCutNorman
Copy link
Contributor

@BuzzCutNorman BuzzCutNorman commented Jul 21, 2023

This is an attempt to add the sql_datatype to the discovery process and will be reflected in the catalog.

Closes #1323


📚 Documentation preview 📚: https://meltano-sdk--1872.org.readthedocs.build/en/1872/

@codecov
Copy link

codecov bot commented Jul 21, 2023

Codecov Report

Attention: Patch coverage is 87.09677% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 86.93%. Comparing base (59c3e3d) to head (0f3841a).
Report is 341 commits behind head on main.

Files Patch % Lines
singer_sdk/connectors/sql.py 82.60% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1872      +/-   ##
==========================================
- Coverage   86.94%   86.93%   -0.02%     
==========================================
  Files          59       59              
  Lines        5080     5104      +24     
  Branches      822      831       +9     
==========================================
+ Hits         4417     4437      +20     
- Misses        466      469       +3     
- Partials      197      198       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BuzzCutNorman
Copy link
Contributor Author

I have run across instances where the discovery process runs into SQL data types that SQLAlchemy cannot handle. When this occurs, the following warnings are logged to the console.

PS C:\development\projects\my-test-stuff> meltano invoke tap-mssql --discover > discovery.json
2023-07-21T19:40:07.498116Z [info     ] Environment 'dev' is active
C:\development\MeltanoContribute\sdk\singer_sdk\connectors\sql.py:463: SAWarning: Did not recognize type 'hierarchyid' of column 'OrganizationNode'
  for column_def in inspected.get_columns(table_name, schema=schema_name):
C:\development\MeltanoContribute\sdk\singer_sdk\connectors\sql.py:463: SAWarning: Did not recognize type 'geography' of column 'SpatialLocation'
  for column_def in inspected.get_columns(table_name, schema=schema_name):
C:\development\MeltanoContribute\sdk\singer_sdk\connectors\sql.py:463: SAWarning: Did not recognize type 'hierarchyid' of column 'DocumentNode'
  for column_def in inspected.get_columns(table_name, schema=schema_name):

The sql-datatype is presented as a NullType()

        {
          "breadcrumb": [
            "properties",
            "OrganizationNode"
          ],
          "metadata": {
            "inclusion": "available",
            "sql-datatype": "NullType()"
          }
        },

cc @edgarrmondragon

@BuzzCutNorman
Copy link
Contributor Author

The warnings are being captured, the message expanded, and presented back to the console at the info level . The warnings now look like this.

PS C:\development\projects\dev-tap-mssql-discover-sqltype> meltano invoke tap-mssql --discover > discovery.jsonl
2023-08-08T17:09:38.074328Z [info     ] Environment 'dev' is active
2023-08-08 10:09:39,636 | INFO     | sqlconnector         | Discovery warning: 'HumanResources-Employee' Did not recognize type 'hierarchyid' of column 'OrganizationNode'
2023-08-08 10:09:39,730 | INFO     | sqlconnector         | Discovery warning: 'Person-Address' Did not recognize type 'geography' of column 'SpatialLocation'
2023-08-08 10:09:39,859 | INFO     | sqlconnector         | Discovery warning: 'Production-Document' Did not recognize type 'hierarchyid' of column 'DocumentNode'
2023-08-08 10:09:39,890 | INFO     | sqlconnector         | Discovery warning: 'Production-ProductDocument' Did not recognize type 'hierarchyid' of column 'DocumentNode'

@BuzzCutNorman
Copy link
Contributor Author

After looking at this more I update the message to this format:

PS C:\development\projects\dev-tap-mssql-discover-sqltype> meltano invoke tap-mssql --discover > discovery.jsonl
2023-08-08T17:50:34.642130Z [info     ] Environment 'dev' is active
2023-08-08 10:50:36,401 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'hierarchyid' of column 'OrganizationNode' in 'HumanResources-Employee'
2023-08-08 10:50:36,512 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'geography' of column 'SpatialLocation' in 'Person-Address'
2023-08-08 10:50:36,684 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'hierarchyid' of column 'DocumentNode' in 'Production-Document'
2023-08-08 10:50:36,747 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'hierarchyid' of column 'DocumentNode' in 'Production-ProductDocument'

Copy link
Collaborator

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BuzzCutNorman! Left a few suggestions and opened #1903 which will make sense once taps output this type of metadata 😁

Comment on lines +471 to +487
with warnings.catch_warnings(record=True) as inspection_warnings:
for column_def in inspected.get_columns(table_name, schema=schema_name):
column_name = column_def["name"]
is_nullable = column_def.get("nullable", False)
jsonschema_type: dict = self.to_jsonschema_type(
t.cast(sqlalchemy.types.TypeEngine, column_def["type"]),
)
table_schema.append(
th.Property(
name=column_name,
wrapped=th.CustomType(jsonschema_type),
required=not is_nullable,
),
)
if len(inspection_warnings) > 0:
for line in inspection_warnings:
expanded_msg: str = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this bits, wdyt about using the builtin logging.captureWarnings?

We'd probably want to call it whenever a plugin's CLI is invoked, so in the body of

@classmethod
def invoke(
cls,
*,
about: bool = False,
about_format: str | None = None,
**kwargs: t.Any, # noqa: ARG003
) -> None:
"""Invoke the plugin.
Args:
about: Display package metadata and settings.
about_format: Specify output style for `--about`.
kwargs: Plugin keyword arguments.
"""
if about:
cls.print_about(about_format)
sys.exit(0)

Comment on lines +494 to +497
for column_def in inspected.get_columns(table_name, schema=schema_name):
sql_datatypes[
str(column_def["name"])
] = self.discover_catalog_entry_sql_datatype(column_def["type"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you feel comfortable adding tests for this? A good place might be in tests/core/test_connector_sql.py since this is a connector method. I think it could even be parametrized with many examples of column types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants