I just ran into an issue where the test run below failed when the settings object was deep-copied: the copy lost its dialect information, so parsing a string containing a Spark backtick failed.
Full traceback
> linker.unlinkables_chart(source_dataset="Testing")
tests/test_full_example_spark.py:106:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
splink/linker.py:2970: in unlinkables_chart
records = unlinkables_data(self)
splink/unlinkables.py:20: in unlinkables_data
self_link = linker._self_link()
splink/linker.py:2013: in _self_link
sqls = block_using_rules_sqls(self)
splink/blocking.py:361: in block_using_rules_sqls
sql = br.create_blocked_pairs_sql(linker, where_condition, probability)
splink/blocking.py:98: in create_blocked_pairs_sql
columns_to_select = linker._settings_obj._columns_to_select_for_blocking
splink/settings.py:222: in _columns_to_select_for_blocking
cols.append(uid_col.l_name_as_l)
splink/input_column.py:239: in l_name_as_l
alias = self.unquote().name_l
splink/input_column.py:202: in unquote
self_copy = deepcopy(self)
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/copy.py:172: in deepcopy
y = _reconstruct(x, memo, *rv)
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/copy.py:270: in _reconstruct
state = deepcopy(state, memo)
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/copy.py:146: in deepcopy
y = copier(x, memo)
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/copy.py:230: in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/copy.py:153: in deepcopy
y = copier(memo)
splink/settings.py:87: in __deepcopy__
cc = Settings(self.as_dict())
splink/settings.py:80: in __init__
self._get_additional_columns_to_retain()
splink/settings.py:129: in _get_additional_columns_to_retain
get_columns_used_from_sql(br.blocking_rule_sql, br.sql_dialect)
splink/parse_sql.py:10: in get_columns_used_from_sql
syntax_tree = sqlglot.parse_one(sql, read=dialect)
.venv/lib/python3.9/site-packages/sqlglot/__init__.py:125: in parse_one
result = dialect.parse(sql, **opts)
.venv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py:311: in parse
return self.parser(**opts).parse(self.tokenize(sql), sql)
.venv/lib/python3.9/site-packages/sqlglot/parser.py:979: in parse
return self._parse(
.venv/lib/python3.9/site-packages/sqlglot/parser.py:1048: in _parse
self.raise_error("Invalid expression / Unexpected token")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <sqlglot.parser.Parser object at 0x7f9eaf27a7c0>
message = 'Invalid expression / Unexpected token'
token = <Token token_type: TokenType.IDENTIFIER, text: `, line: 1, col: 13, start: 12, end: 12, comments: []>
    def raise_error(self, message: str, token: t.Optional[Token] = None) -> None:
        """
        Appends an error in the list of recorded errors or raises it, depending on the chosen
        error level setting.
        """
        token = token or self._curr or self._prev or Token.string("")
        start = token.start
        end = token.end + 1
        start_context = self.sql[max(start - self.error_message_context, 0) : start]
        highlight = self.sql[start:end]
        end_context = self.sql[end : end + self.error_message_context]
        error = ParseError.new(
            f"{message}. Line {token.line}, Col: {token.col}.\n"
            f" {start_context}\033[4m{highlight}\033[0m{end_context}",
            description=message,
            line=token.line,
            col=token.col,
            start_context=start_context,
            highlight=highlight,
            end_context=end_context,
        )
        if self.error_level == ErrorLevel.IMMEDIATE:
>           raise error
E           sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 13.
E             l.`unique_id` = r.`unique_id`
.venv/lib/python3.9/site-packages/sqlglot/parser.py:1089: ParseError
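The failure mode can be reproduced in miniature without Splink or sqlglot. This is only a sketch: `ToyRule` and `ToySettings` are hypothetical stand-ins, not Splink's classes, and the assumption is that the dialect gets attached to each rule after construction while `as_dict()` only serialises the SQL strings. A `__deepcopy__` that rebuilds the object from `as_dict()` then silently drops the dialect.

```python
from copy import deepcopy


class ToyRule:
    """Hypothetical stand-in for a blocking rule; not Splink's class."""

    def __init__(self, sql, dialect=None):
        self.sql = sql
        self.dialect = dialect


class ToySettings:
    """Hypothetical settings object whose deep copy is rebuilt from a dict."""

    def __init__(self, settings_dict):
        # Rules are built from plain SQL strings, so no dialect is attached here.
        self.rules = [ToyRule(sql) for sql in settings_dict["blocking_rules"]]

    def as_dict(self):
        # Assumption for this sketch: the dialect is not part of the serialised form.
        return {"blocking_rules": [rule.sql for rule in self.rules]}

    def __deepcopy__(self, memo):
        # Mirrors the pattern in the traceback: rebuild the object from as_dict().
        return ToySettings(self.as_dict())


settings = ToySettings({"blocking_rules": ["l.`unique_id` = r.`unique_id`"]})
for rule in settings.rules:
    rule.dialect = "spark"  # attached after construction, e.g. by a linker

copied = deepcopy(settings)
original_dialects = [rule.dialect for rule in settings.rules]
copied_dialects = [rule.dialect for rule in copied.rules]
print(original_dialects, copied_dialects)
```

Any code that later parses `copied.rules[0].sql` without a dialect would then hit the backtick, which matches the kind of `ParseError` shown above.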
When we redo blocking rules for Splink 4, we need a consistent approach that ensures they always carry a dialect, and that the dialect is always taken from the same place (which should be the dialect on the root settings object, rather than a blocking-rule-specific dialect).
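One way to make the dialect always come from the root settings object is for each rule to hold a reference to its parent and look the dialect up on demand. This is a rough sketch of that idea, not a proposal for the actual Splink 4 design: `SketchSettings` and `SketchBlockingRule` are hypothetical names, and the dictionary keys are assumptions made for illustration.

```python
class SketchBlockingRule:
    """Hypothetical rule that never stores its own dialect."""

    def __init__(self, sql, settings):
        self.sql = sql
        self._settings = settings  # back-reference to the root settings object

    @property
    def sql_dialect(self):
        # Single source of truth: always read the dialect from the root settings.
        return self._settings.sql_dialect


class SketchSettings:
    """Hypothetical root settings object that owns the dialect."""

    def __init__(self, settings_dict):
        self.sql_dialect = settings_dict["sql_dialect"]
        self.blocking_rules = [
            SketchBlockingRule(sql, self)
            for sql in settings_dict["blocking_rules"]
        ]


settings = SketchSettings(
    {
        "sql_dialect": "spark",
        "blocking_rules": [
            "l.`first_name` = r.`first_name`",
            "l.`surname` = r.`surname`",
        ],
    }
)
dialects_before = [rule.sql_dialect for rule in settings.blocking_rules]

# Changing the root dialect is immediately visible on every rule.
settings.sql_dialect = "duckdb"
dialects_after = [rule.sql_dialect for rule in settings.blocking_rules]
print(dialects_before, dialects_after)
```

Because the rules have no dialect attribute of their own, there is nothing rule-specific that can drift out of sync with the root or be lost when the settings are rebuilt.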
This example underscores a general principle around deep copying: it should be straightforward to serialise settings to JSON and read them back in to get a copy of the settings.
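The round-trip principle can be sketched as follows, assuming the dialect is included in the serialised form. `JsonCopyableSettings` and its methods are hypothetical and only illustrate the idea; they are not Splink's API.

```python
import json
from copy import deepcopy


class JsonCopyableSettings:
    """Hypothetical settings object that is copied via a JSON round trip."""

    def __init__(self, settings_dict):
        self.sql_dialect = settings_dict["sql_dialect"]
        self.blocking_rules = list(settings_dict["blocking_rules"])

    def as_dict(self):
        # Everything needed to rebuild the object, including the dialect.
        return {
            "sql_dialect": self.sql_dialect,
            "blocking_rules": list(self.blocking_rules),
        }

    def to_json(self):
        return json.dumps(self.as_dict())

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))

    def __deepcopy__(self, memo):
        # A copy is exactly "serialise, then read back in".
        return type(self).from_json(self.to_json())


original = JsonCopyableSettings(
    {"sql_dialect": "spark", "blocking_rules": ["l.`unique_id` = r.`unique_id`"]}
)
copied = deepcopy(original)

# The copy is independent: mutating it leaves the original untouched.
copied.blocking_rules.append("l.`dob` = r.`dob`")
print(copied.sql_dialect, len(original.blocking_rules), len(copied.blocking_rules))
```

If the round trip is the only copying mechanism, anything that is missing from `as_dict()` shows up as a bug in serialisation as well as in copying, which makes omissions like the lost dialect easier to catch in tests.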