
AttributeError: 'NoneType' object has no attribute 'sparkContext' #1872

Closed · Fixed by #1873

w2o-hbrashear (Contributor) opened this issue on Jan 22, 2024 · 1 comment
What happens?

When running the spark example notebook "deduplicate_1k_synthetic", constructing the linker fails with:

AttributeError: 'NoneType' object has no attribute 'sparkContext'

AttributeError                            Traceback (most recent call last)
File <command-3287110354252329>:2
      1 from splink.spark.linker import SparkLinker
----> 2 linker = SparkLinker(df, settings)
      3 deterministic_rules = [
      4     "l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1",
      5     "l.surname = r.surname and levenshtein(r.dob, l.dob) <= 1",
      6     "l.first_name = r.first_name and levenshtein(r.surname, l.surname) <= 2",
      7     "l.email = r.email"
      8 ]
     10 linker.estimate_probability_two_random_records_match(deterministic_rules, recall=0.6)


File {redacted}/splink/spark/linker.py:192, in SparkLinker.__init__(self, input_table_or_tables, settings_dict, break_lineage_method, set_up_basic_logging, input_table_aliases, spark, validate_settings, catalog, database, repartition_after_blocking, num_partitions_on_repartition, register_udfs_automatically)
    190 self.in_databricks = "DATABRICKS_RUNTIME_VERSION" in os.environ
    191 if self.in_databricks:
--> 192     enable_splink(spark)
    194 self._set_default_break_lineage_method()
    196 if register_udfs_automatically:

File /Workspace/Repos/hbrashear@w2ogroup.com/hbrashear-splink-issue-param-spark/splink/databricks/enable_splink.py:15, in enable_splink(spark)
      4 def enable_splink(spark):
      5     """
      6     Enable Splink functions.
      7     Use this function at the start of your workflow to ensure Splink is registered on
   (...)
     13         None
     14     """
---> 15     sc = spark.sparkContext
     16     _jar_path = similarity_jar_location()
     17     JavaURI = sc._jvm.java.net.URI

AttributeError: 'NoneType' object has no attribute 'sparkContext'
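
For context, the traceback shows the failure path: SparkLinker's constructor takes an optional spark argument and, when it detects a Databricks runtime, immediately calls enable_splink(spark). A minimal sketch of that path, with the argument list abbreviated and the spark=None default assumed from the error (the actual defaults are not shown in the traceback):

import os

def enable_splink(spark):
    # splink/databricks/enable_splink.py:15 -- raises AttributeError
    # when spark is None, since None has no sparkContext attribute.
    sc = spark.sparkContext
    ...

class SparkLinker:
    # Signature abbreviated; spark=None is an assumption based on the error.
    def __init__(self, input_table_or_tables, settings_dict, spark=None, **kwargs):
        self.in_databricks = "DATABRICKS_RUNTIME_VERSION" in os.environ
        if self.in_databricks:
            enable_splink(spark)  # still None if the caller omitted it

So on Databricks the constructor needs an explicit session, which the notebook cell does not supply.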

To Reproduce

Run spark example notebook "deduplicate_1k_synthetic" on Databricks 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12)

OS:

Databricks 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12)

Splink version:

splink==3.9.8

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
w2o-hbrashear (Contributor, Author) commented:

It's a really quick fix: in cell 5, change

linker = SparkLinker(df, settings)

to

linker = SparkLinker(df, settings, spark=spark)
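
For reference, the corrected cell as a whole would read as follows (assembled from the traceback above; spark is the SparkSession that Databricks provides in every notebook):

from splink.spark.linker import SparkLinker

# Pass the Databricks-provided SparkSession explicitly, so that
# enable_splink receives a live session rather than None.
linker = SparkLinker(df, settings, spark=spark)

deterministic_rules = [
    "l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1",
    "l.surname = r.surname and levenshtein(r.dob, l.dob) <= 1",
    "l.first_name = r.first_name and levenshtein(r.surname, l.surname) <= 2",
    "l.email = r.email",
]

linker.estimate_probability_two_random_records_match(deterministic_rules, recall=0.6)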

RobinL added a commit that referenced this issue Jan 22, 2024
Fixes #1872 Update deduplicate_1k_synthetic.ipynb to fix spark error