When running spark example notebook "deduplicate_1k_synthetic": AttributeError: 'NoneType' object has no attribute 'sparkContext'

What happens?
When running the spark example notebook "deduplicate_1k_synthetic", constructing the SparkLinker fails with AttributeError: 'NoneType' object has no attribute 'sparkContext'.

AttributeError                            Traceback (most recent call last)
File <command-3287110354252329>:2
      1 from splink.spark.linker import SparkLinker
----> 2 linker = SparkLinker(df, settings)
      3 deterministic_rules = [
      4     "l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1",
      5     "l.surname = r.surname and levenshtein(r.dob, l.dob) <= 1",
      6     "l.first_name = r.first_name and levenshtein(r.surname, l.surname) <= 2",
      7     "l.email = r.email"
      8 ]
     10 linker.estimate_probability_two_random_records_match(deterministic_rules, recall=0.6)

File {redacted}/splink/spark/linker.py:192, in SparkLinker.__init__(self, input_table_or_tables, settings_dict, break_lineage_method, set_up_basic_logging, input_table_aliases, spark, validate_settings, catalog, database, repartition_after_blocking, num_partitions_on_repartition, register_udfs_automatically)
    190 self.in_databricks = "DATABRICKS_RUNTIME_VERSION" in os.environ
    191 if self.in_databricks:
--> 192     enable_splink(spark)
    194 self._set_default_break_lineage_method()
    196 if register_udfs_automatically:

File /Workspace/Repos/hbrashear@w2ogroup.com/hbrashear-splink-issue-param-spark/splink/databricks/enable_splink.py:15, in enable_splink(spark)
      4 def enable_splink(spark):
      5     """
      6     Enable Splink functions.
      7     Use this function at the start of your workflow to ensure Splink is registered on
    (...)
     13     None
     14     """
---> 15     sc = spark.sparkContext
     16     _jar_path = similarity_jar_location()
     17     JavaURI = sc._jvm.java.net.URI

AttributeError: 'NoneType' object has no attribute 'sparkContext'
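The traceback points at the root cause: on Databricks, SparkLinker.__init__ calls enable_splink(spark) with whatever was passed as the spark argument, and the failing cell did not pass one, so enable_splink appears to receive None. A minimal sketch of that failure mode (hypothetical snippet, not taken from the notebook):

# spark ends up as None when SparkLinker is built without an explicit spark= argument
spark = None
sc = spark.sparkContext   # raises AttributeError: 'NoneType' object has no attribute 'sparkContext'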
To Reproduce
Run the spark example notebook "deduplicate_1k_synthetic" on Databricks 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12).

OS:
Databricks 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12)

Splink version:
splink==3.9.8

Have you tried this on the latest master branch? Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
It's a really quick fix: in cell 5, change

linker = SparkLinker(df, settings)

to

linker = SparkLinker(df, settings, spark=spark)
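As a sketch, the corrected cell would then look like the following, assuming df (the input Spark DataFrame) and settings (the Splink settings dictionary) are defined in earlier notebook cells and spark is the SparkSession provided by the Databricks runtime:

from splink.spark.linker import SparkLinker

# Pass the active SparkSession explicitly so that, on Databricks,
# enable_splink receives a real session rather than None.
linker = SparkLinker(df, settings, spark=spark)

deterministic_rules = [
    "l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1",
    "l.surname = r.surname and levenshtein(r.dob, l.dob) <= 1",
    "l.first_name = r.first_name and levenshtein(r.surname, l.surname) <= 2",
    "l.email = r.email",
]

linker.estimate_probability_two_random_records_match(deterministic_rules, recall=0.6)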
Merged: pull request #1873 from w2o-hbrashear/master (commit 8e36034), "Fixes #1872: Update deduplicate_1k_synthetic.ipynb to fix spark error".