Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix non-quoted db&catalog issues #1558

Merged
merged 1 commit into from
Oct 11, 2023

Conversation

ThomasHepworth
Copy link
Contributor

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

See #1556

Give a brief description for the solution you have provided

Previously, we weren't quoting catalog and database names which was resulting in unusual catalog and database combinations causing SQL errors.

I've simply added quotes to self.splink_data_store (the only location both catalog and db are used) and it appears to be resolving the issue.

To test, you can try creating a random database named 1111 and setting that as your target db from within the linker object:

# Create a database called '1111'
spark.sql("CREATE DATABASE IF NOT EXISTS `1111`")

# Run our actual link job
linker = SparkLinker(
    df_spark,
    settings_dict,
    database="1111"
)

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter

@ThomasHepworth
Copy link
Contributor Author

CI tests are all failing still... I've fixed this here.

@github-actions
Copy link
Contributor

Test: test_2_rounds_1k_duckdb

Percentage change: -30.0%

date time stats_mean stats_min commit_info_branch commit_info_id machine_info_cpu_brand_raw machine_info_cpu_hz_actual_friendly commit_hash
849 2022-07-12 18:40:05 1.89098 1.87463 splink3 c334bb9 Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz 2.7934 GHz c334bb9
1986 2023-08-23 11:35:18 1.32291 1.3118 (detached head) b87da07 Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz 2.7934 GHz b87da07

Test: test_2_rounds_1k_sqlite

Percentage change: -27.9%

date time stats_mean stats_min commit_info_branch commit_info_id machine_info_cpu_brand_raw machine_info_cpu_hz_actual_friendly commit_hash
851 2022-07-12 18:40:05 4.32179 4.25898 splink3 c334bb9 Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz 2.7934 GHz c334bb9
1988 2023-08-23 11:35:18 3.12197 3.07046 (detached head) b87da07 Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz 2.7934 GHz b87da07

Click here for vega lite time series charts

@ThomasHepworth ThomasHepworth mentioned this pull request Aug 23, 2023
2 tasks
@espenfl
Copy link

espenfl commented Aug 23, 2023

@ThomasHepworth That is great. Thanks a lot. I also started on a PR and was stuck trying to add a proper test for it. I think it would be nice to also add this as part of the PR.

@ThomasHepworth
Copy link
Contributor Author

@espenfl ah, sorry, I wasn't aware you were also working on a fix!

I'm happy to close this PR and let you polish up a fix and a proper test if you'd like to make a contribution?

Our contribution guide is available here. Please add myself and Robin as reviewers once you're done, if you're up for it.

@espenfl
Copy link

espenfl commented Aug 23, 2023

@ThomasHepworth No problem. That happens. No way for you to know this as I was a bit slow submitting a draft. I learned some new things and we will get it fixed. That is all that matters. Please let this PR be open. Greatly appreciated. I was actually trying to make a new catalog name which involved figuring out how to set this in the Spark config. Easier and better to just do it with a new database as you do it here as the names are combined anyway in the issue that we want to solve.

@espenfl
Copy link

espenfl commented Aug 30, 2023

@ThomasHepworth Can I contribute with something to get this one merged?

@ThomasHepworth
Copy link
Contributor Author

#1558 (comment)

👋 apologies for the delayed reply, I was on leave last week.

Please do. I think it may in fact be easier if you submit a pull request as then I'll be able to QA and merge it personally.

I'll close this PR once you've opened another.

@espenfl
Copy link

espenfl commented Sep 21, 2023

@ThomasHepworth Unfortunately I am unable to get this done in the next two weeks so maybe we can just leave this as is and get it merged when the dev team find it suitable? Thanks.

Copy link
Contributor

@ADBond ADBond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All seems to be working okay, nice one 👍

@ThomasHepworth ThomasHepworth merged commit a4957d0 into master Oct 11, 2023
5 of 11 checks passed
@ThomasHepworth ThomasHepworth deleted the 1556_catalog_db_missing_quotes branch October 11, 2023 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants