Special characters get encoded wrong with dbt templater #3729

Open
2 of 3 tasks
kim-a-eriksson opened this issue Aug 10, 2022 · 9 comments
Labels
documentation Add or improve documentation (including error messages)

Comments

@kim-a-eriksson

kim-a-eriksson commented Aug 10, 2022

Search before asking

  • I searched the issues and found no similar issues.

What Happened

When running sqlfluff fix, the special characters "ÅÄÖ" become "ÅÄÖ". This happens when using templater = dbt and NOT when using templater = jinja.

Expected Behaviour

Special characters should remain unchanged, i.e. the configured encoding should be respected.

Observed Behaviour

Since it works for templater = jinja, I guess there's just some incomplete implementation for dbt.

How to reproduce

SQL-code (couldn't attach sql files): testing_special_characters

{{
	config(materialized='table')
}}

SELECT
    FIRST_COLUMN,
    SECOND_COLUMN
FROM TABLE_TO_TEST
where TYPE_OF_TEST = 'TESTING ÅÄÖ' 

CLI-command

sqlfluff fix models/stage/testing_special_characters.sql

Dialect

snowflake

Version

sqlfluff==1.2.1
sqlfluff-templater-dbt==1.2.1
dbt-core==1.2.0
dbt-snowflake==1.2.0
Python 3.9.13

Configuration

[sqlfluff]
dialect = snowflake
templater = dbt
exclude_rules = L011, L031, L016, L060
encoding = utf-8

[sqlfluff:rules:L010]
capitalisation_policy = upper

[sqlfluff:rules:L014]
extended_capitalisation_policy = upper

[sqlfluff:rules:L030]
extended_capitalisation_policy = upper

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

@kim-a-eriksson kim-a-eriksson added the bug Something isn't working label Aug 10, 2022
@barrywhart
Member

I'm not sure SQLFluff can do much in this case, since dbt is reading the file when using the dbt templater. I'm curious, what operating system are you using? I heard about a similar problem recently from a Windows user, which they were able to fix somehow; I think it involved changing the character encoding in their Command Prompt.
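To illustrate the failure mode suggested here: if a file saved as UTF-8 is decoded with a different default encoding, the text is garbled without any error. The following is a minimal sketch, assuming the Windows default code page is cp1252 (an assumption; the thread does not state which code page was in use).

```python
# Sketch of the suspected failure mode: UTF-8 bytes decoded with a
# legacy Windows code page. cp1252 is an assumption, not from the thread.
original = "TESTING ÅÄÖ"

# What is stored on disk when the file is saved as UTF-8.
utf8_bytes = original.encode("utf-8")

# What a reader gets if it decodes those bytes as cp1252 instead.
garbled = utf8_bytes.decode("cp1252")

# The damage is reversible as long as no bytes were dropped or replaced.
repaired = garbled.encode("cp1252").decode("utf-8")

assert garbled != original
assert repaired == original
```

Each of the three non-ASCII letters is two bytes in UTF-8, so the garbled string is three characters longer than the original.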

@kim-a-eriksson
Author

Oh, I didn't know that all the encoding was handled by dbt when using the dbt templater. At least that explains why it worked for templater = jinja =)

I'm using Windows too. I tried what I interpreted as your suggestion: $OutputEncoding = [System.Text.Encoding]::utf8.

I verified that the change actually took effect in PowerShell, but there was still no difference in the results.

Do you happen to know where I could find this similar example?

@barrywhart
Member

I think it was a discussion in Slack. Maybe try searching the Slack history. It may have been a GitHub issue, so you could also try searching for issues (closed as well as open ones) that mention "Windows".

@barrywhart
Member

Is there a dbt setting that lets you control the encoding?

@barrywhart
Member

I found the other issue/discussion: #3585

@kim-a-eriksson
Author

Thanks a lot for your help @barrywhart! Setting it to utf-8 globally, as in the other issue, worked for me as well. Is this something that I should report as a bug to dbt? I'm guessing they could handle it by allowing the output encoding to be set explicitly, like sqlfluff does.
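For readers who want to check what "utf-8 globally" means on their own machine, the sketch below reports the encodings Python will use by default. It assumes the global setting is Python's UTF-8 mode (the PYTHONUTF8=1 environment variable or the -X utf8 option); whether that is the exact setting used in #3585 is an assumption, and the helper name describe_text_encoding is hypothetical.

```python
import locale
import sys


def describe_text_encoding():
    """Report the encodings Python falls back to when none is given."""
    return {
        # True when Python runs in UTF-8 mode (PYTHONUTF8=1 or -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Default used by open() when no encoding argument is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used when printing to the console, if known.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in describe_text_encoding().items():
        print(f"{key}: {value}")
```

If preferred_encoding reports a legacy code page rather than a UTF-8 variant, any library that opens files without an explicit encoding will decode them with that code page.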

@barrywhart
Member

It's probably worth raising it as a bug with them, especially if it causes issues other than with SQLFluff.

I also wonder if this is something SQLFluff can check for, or at least document somewhere. You're the second person to report this issue, so it'd be great if the info were not "hidden" in an old GitHub issue.

@barrywhart barrywhart added documentation Add or improve documentation (including error messages) and removed bug Something isn't working labels Aug 20, 2022
@barrywhart
Member

Changing from bug to documentation. Probably worth documenting since several Windows users have encountered this issue.

@barrywhart
Member

In the dbt templater, we may want to use in_str rather than reading the SQL file from disk, because SQLFluff knows and uses the encoding specified in .sqlfluff.
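The idea can be sketched as follows. This is not the actual SQLFluff or dbt code: both function names are hypothetical stand-ins, and cp1252 is assumed as the platform default. The point is only that text already decoded with the configured encoding stays intact, while re-reading the file with a platform default may not.

```python
import tempfile
from pathlib import Path


def sql_from_disk(path, fallback_encoding="cp1252"):
    """Hypothetical: re-read the file with a platform-default encoding."""
    return path.read_bytes().decode(fallback_encoding)


def sql_from_in_str(in_str):
    """Hypothetical: reuse text already decoded with the configured encoding."""
    return in_str


def demo():
    sql = "SELECT 1 WHERE TYPE_OF_TEST = 'TESTING ÅÄÖ'"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.sql"
        # The model file is stored as UTF-8, matching encoding = utf-8.
        path.write_bytes(sql.encode("utf-8"))
        # The caller decodes once, using the configured encoding.
        in_str = path.read_bytes().decode("utf-8")
        return sql_from_disk(path), sql_from_in_str(in_str)


if __name__ == "__main__":
    from_disk, from_in_str = demo()
    # ascii() keeps the output printable on any console encoding.
    print(ascii(from_disk))
    print(ascii(from_in_str))
```

Passing the decoded string along means the encoding decision is made in exactly one place, which is the motivation given in the comment above.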
