ParseError: Required keyword: 'this' missing for <class 'sqlglot.expressions.EQ'> #2018
This seems to be an error originating from sqlglot.
Hi, the full list of the stack is here: altair==5.2.0

The full error trace is here:

ParseError Traceback (most recent call last)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/linker.py:3695, in Linker.estimate_probability_two_random_records_match(self, deterministic_matching_rules, recall)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/analyse_blocking.py:86, in cumulative_comparisons_generated_by_blocking_rules(linker, blocking_rules, output_chart, return_dataframe)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/blocking.py:542, in block_using_rules_sqls(linker)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/blocking.py:112, in BlockingRule.create_blocked_pairs_sql(self, linker, where_condition, probability)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/settings.py:222, in Settings._columns_to_select_for_blocking(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/input_column.py:256, in InputColumn.l_name_as_l(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/input_column.py:219, in InputColumn.unquote(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/copy.py:172, in deepcopy(x, memo, _nil)
File ~/miniforge3/envs/tomenv2/lib/python3.9/copy.py:270, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
File ~/miniforge3/envs/tomenv2/lib/python3.9/copy.py:146, in deepcopy(x, memo, _nil)
File ~/miniforge3/envs/tomenv2/lib/python3.9/copy.py:230, in _deepcopy_dict(x, memo, deepcopy)
File ~/miniforge3/envs/tomenv2/lib/python3.9/copy.py:153, in deepcopy(x, memo, _nil)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/settings.py:87, in Settings.deepcopy(self, memo)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/settings.py:80, in Settings.__init__(self, settings_dict)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/settings.py:129, in Settings._get_additional_columns_to_retain(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/parse_sql.py:12, in get_columns_used_from_sql(sql, dialect, retain_table_prefix)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/__init__.py:125, in parse_one(sql, read, dialect, into, **opts)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/dialects/dialect.py:311, in Dialect.parse(self, sql, **opts)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:986, in Parser.parse(self, raw_tokens, sql)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:1052, in Parser._parse(self, parse_method, raw_tokens, sql)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:1241, in Parser._parse_statement(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:3175, in Parser._parse_expression(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:3178, in Parser._parse_conjunction(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:4840, in Parser._parse_tokens(self, parse_method, expressions)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:3181, in Parser._parse_equality(self)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:4843, in Parser._parse_tokens(self, parse_method, expressions)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:1116, in Parser.expression(self, exp_class, comments, **kwargs)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:1136, in Parser.validate_expression(self, expression, args)
File ~/miniforge3/envs/tomenv2/lib/python3.9/site-packages/sqlglot/parser.py:1096, in Parser.raise_error(self, message, token)

ParseError: Required keyword: 'this' missing for <class 'sqlglot.expressions.EQ'>. Line 1, Col: 65.

tom
I've run this with the deterministic rules list and the same version of sqlglot, and I don't get the error. What happens if you run:
What do you get if you run:
?
Hi,

From this query: import sqlglot I get: (AND this:

From the second query: import sqlglot I get: '18.17.0'

Thank you!!
Hmm - something strange seems to be going on, because splink seems to be generating an error when it attempts to parse that rule. I have run your suggested script on my side under the same splink/sqlglot versions without error. Are you absolutely certain that in the environment where you're running splink, it's definitely sqlglot 18.17.0? For instance, immediately after you get the error, are you able to run the version check in the same session?
I am... I ran the import sqlglot; sqlglot.__version__ code in the same JupyterLab notebook.
Do you get the same error if you run the same code in duckdb?
Hi, I get the results that I expected: Probability two random records match is estimated to be 0.00389. The DuckDB version works really well!! Unfortunately, we have 94 million records to process, so that won't do it. That's why I have been trying to get the Spark version going. Thank you!! Sincerely, tom
Struggling a bit with what to suggest, sorry! You could try upgrading to the latest sqlglot, I guess, or maybe downgrading splink to an earlier version, say 3.9.5?
Hi, I tried both upgrading and downgrading both Splink and sqlglot. Same error.
Hi - I don't know if this helps, but I got exactly this error (same version of sqlglot) when I copied and pasted this example from the github.io page. That page has the deterministic rules written with HTML character entity refs in place of the comparison operators.
Thanks so much for this @mattjbishop! I think this is probably the issue - @theimanph, if I look at the source text of your comment, I see that the code block has exactly these character entity refs:

deterministic_rules = [
"l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1",
"l.surname = r.surname and levenshtein(r.dob, l.dob) <= 1",
"l.first_name = r.first_name and levenshtein(r.surname, l.surname) <= 2",
"l.email = r.email"
]

I think because the code was not contained in a code block, GitHub renders the character entities as the symbols they stand for, so the rules look correct on screen even though the underlying text still contains the entities, which sqlglot cannot parse.
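If HTML character entities are the cause, a pasted rule can be repaired before use. This is a minimal sketch using only the Python standard library; the variable names are illustrative, not part of splink's API:

```python
import html

# A rule copied from a rendered web page can carry HTML character
# entities, e.g. "&lt;=" where the page displayed "<=".
pasted_rule = "l.first_name = r.first_name and levenshtein(r.dob, l.dob) &lt;= 1"

# html.unescape turns entity references back into literal characters,
# producing SQL that a parser such as sqlglot can handle.
clean_rule = html.unescape(pasted_rule)
print(clean_rule)
```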
Hi, Thank you!! I am trying to run that version of the code and am now running into this error: ImportError: cannot import name 'block_on' from 'splink.spark.blocking_rule_library' (/home/c265616/miniforge3/envs/tomenv2/lib/python3.9/site-packages/splink/spark/blocking_rule_library.py). Any ideas? Thank you!! Sincerely, tom
Not sure - possibly you don't have the latest version of splink. Feel free to ask a question in the discussion forum. Closing this issue.
Hi Robin, Thank you!! Sincerely, tom |
What happens?
Hi,
I am trying to run the spark example: https://moj-analytical-services.github.io/splink/demos/examples/spark/deduplicate_1k_synthetic.html and the error I am getting is: ParseError: Required keyword: 'this' missing for <class 'sqlglot.expressions.EQ'>. Line 1, Col: 65.
l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1
Any ideas on what is going wrong and what I can do about it? Thank you!!
Sincerely,
tom
To Reproduce
%pip install pyspark==3.4.1
#%pip install pyspark
%pip install --upgrade --force-reinstall pyarrow
%pip install pyodbc
%pip install duckdb
%pip install splink
%pip install usaddress
%pip install nbformat
import pyodbc
import os
import pandas as pd
import re
import usaddress
import time
from pyspark.sql import SparkSession
import pyspark.sql.functions as pyfuncs
from pyspark.sql.types import *
from pyspark.sql import Window
from splink.spark.jar_location import similarity_jar_location
path = similarity_jar_location()
print('create spark sesh')
spark = (
    SparkSession
    .builder
    .appName("tomssplinktest")
    .config("spark.master", "spark://ddlas01.hosted.lac.com:7077")
    .config("spark.executor.memory", "45g")
    .config("spark.driver.memory", "10g")
    .config('spark.executor.cores', '1')
    .config('spark.cores.max', '8')
    .config('spark.executor.instances', '1')
    .config('spark.jars', path)
    .config('spark.sql.parquet.int96RebaseModeInWrite', "CORRECTED")
    .getOrCreate()
)
print("created spark sesh!")
# Disable warnings for pyspark - you don't need to include this
import warnings
spark.sparkContext.setLogLevel("ERROR")
warnings.simplefilter("ignore", UserWarning)
from splink.datasets import splink_datasets
pandas_df = splink_datasets.fake_1000
df = spark.createDataFrame(pandas_df)
import splink.spark.comparison_library as cl
import splink.spark.comparison_template_library as ctl
from splink.spark.blocking_rule_library import block_on
settings = {
"link_type": "dedupe_only",
"comparisons": [
ctl.name_comparison("first_name"),
ctl.name_comparison("surname"),
ctl.date_comparison("dob", cast_strings_to_date=True),
cl.exact_match("city", term_frequency_adjustments=True),
ctl.email_comparison("email", include_username_fuzzy_level=False),
],
"blocking_rules_to_generate_predictions": [
block_on("first_name"),
"l.surname = r.surname", # alternatively, you can write BRs in their SQL form
],
"retain_matching_columns": True,
"retain_intermediate_calculation_columns": True,
"em_convergence": 0.01
}
from splink.spark.linker import SparkLinker
linker = SparkLinker(df, settings, spark=spark)
deterministic_rules = [
"l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1",
"l.surname = r.surname and levenshtein(r.dob, l.dob) <= 1",
"l.first_name = r.first_name and levenshtein(r.surname, l.surname) <= 2",
"l.email = r.email"
]
linker.estimate_probability_two_random_records_match(deterministic_rules, recall=0.6)
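As a defensive step before the call above, rules copied from a rendered page can be checked for leftover entity references and unescaped. This is a sketch assuming the rules may carry HTML entities; contains_html_entities is a hypothetical helper, not a splink function:

```python
import html
import re

def contains_html_entities(sql: str) -> bool:
    # Detect entity references such as &lt; &gt; &amp; left over from copy-paste.
    return bool(re.search(r"&(?:[a-zA-Z]+|#[0-9]+);", sql))

deterministic_rules = [
    "l.first_name = r.first_name and levenshtein(r.dob, l.dob) &lt;= 1",
    "l.email = r.email",
]

# Unescape only the rules that actually contain entity references,
# leaving already-clean SQL untouched.
deterministic_rules = [
    html.unescape(r) if contains_html_entities(r) else r
    for r in deterministic_rules
]
print(deterministic_rules[0])
```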
OS:
linux
Splink version:
most recent pypi version
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?