Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Installation of dbt-synapse in same env as dbt-sqlserver conflicts namespace and starts running SQL server models using synapse Adapter #18

Closed
alittlesliceoftom opened this issue Sep 29, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@alittlesliceoftom
Copy link

Due to the presence of two identical 'types' in the adapters.

On my side dbt defaults to using the synapse connection, I'm not sure why.

I suspect that this would be fixed with an updated type: SqlServerSynapse but I'd have to read and understand the code better to be sure.

Actual error:
('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near 'DISTRIBUTION'. (102) (SQLExecDirectW)")

@alittlesliceoftom
Copy link
Author

This could be fixed with isolated envs, but I think that you should be able to use different adapaters separated using the project.yml

@dataders
Copy link
Collaborator

great point. It's definitely a use case we'd like to support wherein we could have a dev target be Azure SQL and a production target be Synapse. I've been meaning to rename the target officially to synapse in our code with #6, but have been uncertain of what that means when we need to pull an update from the upstream dbt-sqlserver will it:

  1. respect our rename, or
  2. will we have to do a merge commit where we rename the macro names every time back to synapse?

I guess we won't know until we try!

@alittlesliceoftom
Copy link
Author

alittlesliceoftom commented Sep 30, 2020 via email

@dataders
Copy link
Collaborator

Yeah that's an option I think. Check out this dbt slack thread from a month ago where some options are discussed.

@alittlesliceoftom
Copy link
Author

I guess it's fairly edge case for many users and easily solvable using venvs, so a nice to have feature change really.

@dataders dataders added the enhancement New feature or request label Oct 20, 2020
@dataders
Copy link
Collaborator

Going to do some experiments this week on this.

@jtcohen6
Copy link
Contributor

One option: you can have dbt-synapse pull in dbt-sqlserver as a python dependency. That's the relationship dbt-redshift has with dbt-postgres: it installs it as a dependency, and builds its own classes on top of the posgres ones (e.g.).

That would enable you to avoid repeating boilerplate code around connections, for instance, between this repo and dbt-sqlserver. You'd only need to reimplement (= write code for) the things that are genuinely different on synapse.

macros

Right now, both dbt-sqlserver and dbt-synapse can use sqlserver__ implementations of common ("dispatched") macros because they both use an adapter type named sqlserver.

Currently, adapter.dispatch doesn't include a notion of adapter dependencies, but it could: In that world, dbt-redshift would first look for a redshift__ implementation, then postgres__, then default__. So, too, would dbt-synapse look first for synapse__, then sqlserver__, then default__.

@dataders
Copy link
Collaborator

@jtcohen6 this is definitely something I've been wanting to discuss. Take a look at dbt-msft/dbt-sqlserver#32. Looking at just the the .sql macro files you can see there's very little difference between the packages. We'd love the help making streamlining them.

Another possible alternative is to have a single adapter, dbt-tsql that supports SQL Server, Azure SQL, and Azure Synapse Analytics. My thinking here would be to use the EngineEdition property via T-SQL's SERVERPROPERTY() where:

Id Product
1-4,8 SQL Server
5,8,9 Azure SQL
6 Azure Synapse

The high level differences are:

  • SQL Server
    • has a 20 years of legacy, but
    • some functions like DROP ... IF EXISTS weren't introduced until 2016
  • Azure SQL
    • has the vast majority of still-relevant SQL Server functionality
    • (to the extent that dbt-sqlserver works perfectly with Azure SQL especially with recent PRs that add cloud features like Azure Active Directory and External Tables
  • Azure Synapse:
    • promotes a new pattern of creating tables, CTAS that allows specification of indices and distribution
    • is still re-writing T-SQL to be parallel by default so doesn't have all the T-SQL functions of Azure SQL (e.g. MERGE and Table-Valued Functions are now in preview as of October 2020)

from the SERVERPROPERTY() doc

Id EngineEdition
1 Personal or Desktop Engine (Not available in SQL Server 2005 (9.x) and later versions.)
2 Standard (This is returned for Standard, Web, and Business Intelligence.)
3 Enterprise (This is returned for Evaluation, Developer, and Enterprise editions.)
4 Express (This is returned for Express, Express with Tools, and Express with Advanced Services)
5 SQL Database
6 Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse)
8 Azure SQL Managed Instance
9 Azure SQL Edge (This is returned for all editions of Azure SQL Edge)

cc: @NandanHegde15 @alieus @mikaelene

@mikaelene
Copy link
Contributor

One option: you can have dbt-synapse pull in dbt-sqlserver as a python dependency. That's the relationship dbt-redshift has with dbt-postgres: it installs it as a dependency, and builds its own classes on top of the posgres ones (e.g.).

That would enable you to avoid repeating boilerplate code around connections, for instance, between this repo and dbt-sqlserver. You'd only need to reimplement (= write code for) the things that are genuinely different on synapse.

macros

Right now, both dbt-sqlserver and dbt-synapse can use sqlserver__ implementations of common ("dispatched") macros because they both use an adapter type named sqlserver.

Currently, adapter.dispatch doesn't include a notion of adapter dependencies, but it could: In that world, dbt-redshift would first look for a redshift__ implementation, then postgres__, then default__. So, too, would dbt-synapse look first for synapse__, then sqlserver__, then default__.

This sounds like a great way of solving it. It would make it more transparent what the differences are as well.

@dataders
Copy link
Collaborator

dataders commented Jan 9, 2021

getting very close to this with #32!

@dataders
Copy link
Collaborator

resolved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants