Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43079: Add metadata module for encapsulating sqlalchemy binding #47

Merged
merged 37 commits into from Apr 1, 2024

Conversation

JeremyMcCormick
Copy link
Collaborator

@JeremyMcCormick JeremyMcCormick commented Mar 13, 2024

This is a broadly scoped set of changes for adding a metadata module to Felis, which replaces the existing sql module. The functionality is similar to the old module, but the new ones uses Pydantic for model validation and adds some additional functionality. A number of other dependent and related changes were made as part of this update. RFC-1003 triggers this ticket.

The changes include the following:

  • Replaced the create-all command with create in the CLI to perform a similar function of creating database objects from the schema. (The "all" part seemed redundant and unnecessary.)
  • Moved some important functions for variant type handling from sql into a new db._variants module.
  • Moved the InsertDump class for dumping and printing DDL statements into the metadata module from cli and made it able to dump DDL directly to a file.
  • DM-43108 - Added the ability to drop and create schemas in the target database using the CLI tool with the create command.
  • Added a flag to the create command for echoing SQL commands when a real (not mock) connection is being used.
  • Added CLI arguments to the create command for writing SQL statements directly to an output file.
  • Added an additional simple test YAML file used in test_metadata.
  • Modified and added tests for the metadata and cli modules; the tests for the sql module were deleted.

Minor changes that were included in this branch which are not strictly related to the referenced ticket:

  • Added configuration for the pydeps utility to pyproject.toml
  • Commented out a line with value setting in tests/data/test.yml that was causing errors. (The issue causing this error will be resolved on another ticket branch.)

TODO

  • Add test that loads a schema into a database and compares its built SQA metadata to that reflected from the database to make sure they match.
  • Add test that compares information from a schema object with a built SQA metadata object from MetaDataBuilder.
  • Check what the schema name needs to be when using the sqlite driver and modify CLI handling accordingly; currently it is deliberately left blank but might be settable to "main" instead.
  • Change construction of MetaData from a schema object to use a builder pattern to avoid polluting interface with build-related functions that are meant to be private

Copy link

codecov bot commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 79.14439% with 78 lines in your changes are missing coverage. Please review.

Project coverage is 91.14%. Comparing base (eada62a) to head (b955513).

Files Patch % Lines
python/felis/metadata.py 77.90% 36 Missing and 4 partials ⚠️
python/felis/cli.py 60.46% 11 Missing and 6 partials ⚠️
python/felis/db/_variants.py 60.60% 12 Missing and 1 partial ⚠️
tests/test_metadata.py 92.52% 2 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
- Coverage   93.28%   91.14%   -2.14%     
==========================================
  Files          20       21       +1     
  Lines        2114     2225     +111     
  Branches      428      468      +40     
==========================================
+ Hits         1972     2028      +56     
- Misses         82      131      +49     
- Partials       60       66       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

The current logic in Felis's sql module assumes that autoincrement
will be None if it is not set explicitly so for now match this logic
in the datamodel.
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As requested, I've had a quick look but only at the first few files. Can look again another time.

python/felis/cli.py Outdated Show resolved Hide resolved
python/felis/datamodel.py Show resolved Hide resolved
python/felis/datamodel.py Outdated Show resolved Hide resolved
python/felis/datamodel.py Outdated Show resolved Hide resolved
python/felis/datamodel.py Outdated Show resolved Hide resolved
python/felis/db/_variants.py Outdated Show resolved Hide resolved
Add a module with a new class that encapsulates the creation of
sqlalchemy MetaData from a Pydantic Schema object, along with a test
that dumps the corresponding SQL DDL statements to a file. The logic
for translating into SQA was primarily derived from the existing
visitor class in sql.py, which this is intended to replace eventually.
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-43079 branch 13 times, most recently from 9b79165 to 4fd0621 Compare March 20, 2024 23:39
Add a "create" command that uses the metadata module instead of the
sql module.
Add flags for specifying an alternate schema name and creating the
schema in the database if it does exist already. For MySQL, a new
database will be created. In PostgreSQL, a new schema will be created
within the current database.
Add several new flags to the CLI for dropping a schema if it exists and
creating one if it does not exist.

Add methods implementing this functionality to the metadata module.
This is a generic schema representing sales data with customers and
orders for testing purposes.
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-43079 branch 2 times, most recently from 46b5440 to f398dd4 Compare March 20, 2024 23:48
@JeremyMcCormick JeremyMcCormick marked this pull request as ready for review March 20, 2024 23:49
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-43079 branch 2 times, most recently from 931fc5d to 07d53f8 Compare March 21, 2024 00:53
The sql module is being deleted but the variant handling utilities
need to be kept.
Remove the sql module which is replaced by the metadata module.

Remove the "create-all" command from the CLI which is replaced by
"create".

Remove the InsertDump class from the cli module, as it was moved into
the metadata module.
Instead of having a class with internal functions for building a
MetaData object, use a builder pattern for a cleaner interface.
Instead of not setting the schema name, use "main" instead which works
for sqlite. In case a host name is not set on the database engine URL,
force a dry run using the supplied variant.
Test that the metadata from the builder and that reflected from the
database have equivalent index and constraint objects. Add some
additional constraints to the YAML test file.
In some cases, it is not desirable to apply the schema name to either
the metadata or the tables. Add two flags to the builder for
controlling this behavior. The flags are true by default and can be
set to false if the user does not want the schema to be applied.
Change the regular expression to match on numbers rather than any
character between the parentheses.
Fix mypy issue where TextIOBase does not have a visible name attribute.
Provide a utility method for getting the SQLAlchemy datatype from a
column object in the Pydantic model, including the variant type
handling.
This test reads in a test schema, creates a SQLAlchemy MetaData object
from it, and then compares the schema to the metadata to check that
they have the equivalent, expected information.
Remove exception checking for each constraint type as it is
unnecessary. In the unlikely case that a constraint was not built
properly, throw an exception.
This field is required so that a PostgreSQL datatype can be optionally
provided in the column information.
This setting confuses mypy and the syntax seems clearer when using the
enum object instead.
All of the other fields use snake case and aliases for camelcase
variables, so change primary key to match.
@JeremyMcCormick
Copy link
Collaborator Author

Reduction in test coverage is noted and will be dealt with in subsequent tickets.

@JeremyMcCormick JeremyMcCormick merged commit 0d31a49 into main Apr 1, 2024
9 of 11 checks passed
@JeremyMcCormick JeremyMcCormick deleted the tickets/DM-43079 branch April 1, 2024 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants