feat: Add support for Microsoft SQL Server (MSSQL) #77

marcusmunch · 2024-01-03T11:27:37Z

Intent

This PR adds support for Microsoft SQL Server (MSSQL).

Approach

I have been careful to add functionality through the existing S3 methods. At the same time, some of these have been "promoted" from default methods to methods for DBIConnection objects, as this is more specific while still covering the same use cases.
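
As a rough illustration of the dispatch pattern described above, here is a minimal sketch. The generic `describe_backend` and the mock classes are hypothetical and not part of SCDB; they only show how a method on `DBIConnection` acts as the shared fallback while a backend-specific method takes precedence.

```r
# Hypothetical generic, used only for illustration (not an SCDB function)
describe_backend <- function(conn, ...) UseMethod("describe_backend")

# Fallback for anything that inherits from DBIConnection
describe_backend.DBIConnection <- function(conn, ...) {
  "generic DBI handling"
}

# Backend-specific override; "MockMSSQL" is a stand-in class name
describe_backend.MockMSSQL <- function(conn, ...) {
  "MSSQL-specific handling"
}

# Mock objects that imitate the class hierarchy of real connections
mssql_conn <- structure(list(), class = c("MockMSSQL", "DBIConnection"))
other_conn <- structure(list(), class = c("MockOther", "DBIConnection"))

mssql_result <- describe_backend(mssql_conn)  # uses the MockMSSQL method
other_result <- describe_backend(other_conn)  # falls back to DBIConnection
```

Because no default method is defined in this sketch, calling the generic on an object that is not a connection fails immediately instead of silently running connection code.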

Testing has been enabled by treating ODBC connections a little more specifically, as these cannot be made with no arguments (and it is no secret that the test suite is due for an overhaul). To test ODBC backends, define them in the environment variable SCDB_ODBC_JSON, a text string parseable by jsonlite::fromJSON. This can be set in the user's Renviron file¹ and (in a good way) forgotten about 🙂
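
A minimal sketch of how such a variable could be read, assuming the JSON holds one object per backend, as in the example line from this PR description. The helper `connect_odbc` is hypothetical, and how the test suite actually consumes the variable may differ.

```r
library(jsonlite)

# Stand-in for a line in the user's Renviron file (values copied from the
# example in the PR description; adjust to your own server).
Sys.setenv(SCDB_ODBC_JSON = paste0(
  '{"MSSQL": {"driver": "SQL Server", "server": "localhost", ',
  '"database": "my_MSSQL_db", "trusted_connection": "true"}}'
))

# Parse the JSON string into a named list: one element per backend
odbc_configs <- jsonlite::fromJSON(Sys.getenv("SCDB_ODBC_JSON"))
mssql_args <- odbc_configs[["MSSQL"]]

# Hypothetical helper (not part of SCDB): open a connection from one entry.
# Defined but not called here, since it needs a reachable SQL Server.
connect_odbc <- function(args) {
  do.call(DBI::dbConnect, c(list(drv = odbc::odbc()), args))
}
```

With a server available, `connect_odbc(mssql_args)` would pass the parsed fields on to `DBI::dbConnect` as named arguments.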

MSSQL seems to have revealed a potential issue in how dbplyr handles temporary tables, as it fails to overwrite a temporary table even if dplyr::copy_to(overwrite=TRUE) is given.²

Known issues

db_joins.R currently does not pass tests, but as these functions are next for a more or less complete S3 makeover, I consider them outside the scope of the PR.

Checklist

The PR passes all local unit tests (except one, see "Known issues")
I have documented any new features introduced
If the PR adds a new feature, please add an entry in NEWS.md
A reviewer is assigned to this PR

Footnotes

¹ Example line: SCDB_ODBC_JSON='{"MSSQL": {"driver": "SQL Server", "server": "localhost", "database": "my_MSSQL_db", "trusted_connection": "true"}}'
² See e.g. tests/testthat/test-update_snapshot.R. Tables are temporary by default in dplyr::copy_to.

RasmusSkytte · 2024-01-04T11:13:02Z

There seem to be some missing imports in DESCRIPTION.

See these messages from R-CMD-check:

❯ checking for unstated dependencies in ‘tests’ ... WARNING
  '::' or ':::' imports not declared from:
    ‘jsonlite’ ‘odbc’
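
For packages that are only needed in tests, one common pattern is to list them under Suggests and check for them before use. The sketch below assumes that approach. The helper name `suggested_available` is hypothetical, and inside testthat tests `testthat::skip_if_not_installed()` serves the same purpose.

```r
# Hypothetical helper: report which suggested packages can be loaded
suggested_available <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Only attempt ODBC-based tests when both packages are installed
can_test_odbc <- all(suggested_available(c("jsonlite", "odbc")))
```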

RasmusSkytte · 2024-01-04T11:15:06Z

The test-coverage workflow throws an error for digest_to_checksum:

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-db_manipulating_functions.R:121:3'): digest_to_checksum() works ──
checksums[1] == checksums[2] is not FALSE

`actual`:   TRUE 
`expected`: FALSE

RasmusSkytte · 2024-01-09T10:22:57Z

Also, there are some flagged possible spelling mistakes:

< Potential spelling errors:
<   WORD     FOUND IN
< ODBC     NEWS.md:3
< funder   SCDB-package.Rd:31

RasmusSkytte · 2024-01-09T10:27:26Z

And with all this hard work, you can also add some more to the NEWS.md :)

## Features

* Added support for Microsoft SQL Server using ODBC

## Minor Improvements and Fixes

* Implementation of `*_joins` improved and no longer masks `dplyr::*_joins`.

RasmusSkytte · 2024-01-09T09:55:12Z

R/db_manipulating_functions.R

 filter_keys <- function(.data, filters, by = NULL, na_by = NULL) {
-
-  # Check arguments
-  assert_data_like(.data)
-  assert_data_like(filters, null.ok = TRUE)
-  checkmate::assert_character(by, null.ok = TRUE)
-  checkmate::assert_character(na_by, null.ok = TRUE)
-
  if (is.null(filters)) {
    return(.data)
-  } else {
-    if (is.null(by) && is.null(na_by)) {
-      # Determine key types
-      key_types <- filters |>
-        dplyr::ungroup() |>
-        dplyr::summarise(dplyr::across(.cols = tidyselect::everything(), .fns = ~ any(is.na(.), na.rm = TRUE))) |>
-        tidyr::pivot_longer(tidyselect::everything(), names_to = "column_name", values_to = "is_na")
-
-      by    <- key_types |> dplyr::filter(!.data$is_na) |> dplyr::pull("column_name")
-      na_by <- key_types |> dplyr::filter(.data$is_na)  |> dplyr::pull("column_name")
-
-      if (length(by) == 0)    by    <- NULL
-      if (length(na_by) == 0) na_by <- NULL
-    }
-    return(inner_join(.data, filters, by = by, na_by = na_by))
  }


Is there a reason all checkmate:: checks have been removed?
I can see why the assert_data_like checks are redundant now that it is an S3 method.

In fact, I would suggest adding a check that all columns are defined in by and na_by.
(since we say so in the description of the function)

checkmate::check_set_equal(c(by, na_by), colnames(filters))

marcusmunch · 2024-01-10

assert_character(by) most likely disappeared when I removed assert_character(na_by) due to na_by no longer being defined in the function signature.

I reinstated the checks but, since it is possible to skip straight to the dplyr::inner_join call, I have opted for checkmate::check_subset instead of check_set_equal (which would require the user to specify all columns in by or na_by if any were given).
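
To make the difference between the two checks concrete, here is a small sketch with hypothetical column names. Note that the `check_*` functions in checkmate return `TRUE` or a message string rather than raising an error; the `assert_*` variants are the ones that stop execution.

```r
library(checkmate)

# Hypothetical inputs: the filter has three columns, the user names only one
filter_cols <- c("key_1", "key_2", "key_3")
by <- "key_1"
na_by <- NULL

# Passes: every named column exists in the filter
subset_result <- checkmate::check_subset(c(by, na_by), filter_cols)

# Does not pass: the named columns are not the full set of filter columns,
# so a message string is returned instead of TRUE
set_equal_result <- checkmate::check_set_equal(c(by, na_by), filter_cols)
```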

RasmusSkytte · 2024-01-09T10:17:56Z

R/db_manipulating_functions.R

+      )) |>
+      tidyr::pivot_longer(tidyselect::everything(), names_to = "column_name", values_to = "is_na")
+
+    by    <- key_types |> dplyr::filter(.data$is_na > 0) |> dplyr::pull("column_name")
+    na_by <- key_types |> dplyr::filter(.data$is_na == 0)  |> dplyr::pull("column_name")
+
+    if (length(by) == 0)    by    <- NULL
+    if (length(na_by) == 0) na_by <- NULL
+  }
+  return(dplyr::inner_join(.data, filters, by = by, na_by = na_by))
+}
+
+#' @export
+filter_keys.data.frame <- function(.data, filters, by = NULL, ...) {
+  .dots <- list(...)
+
+  args <- list(
+    x = .data,
+    y = filters,
+    by = by
+  ) |>
+    append(.dots)
+
+  if ("na_by" %in% names(args)) {
+    args$na_matches <- "na"
+    args$na_by <- NULL
+  }
+
+  if (is.null(by)) args$by <- colnames(filters)
+
+  return(do.call(dplyr::inner_join, args = args))
 }


I think this can be simplified a lot.

In my view, SCDB's *_joins and, by extension, filter_keys are designed to mimic base R joins as much as possible. Since R by default joins NA with NA, we can simplify this function:

#' @export
filter_keys.data.frame <- function(.data, filters, by = NULL, na_by = NULL) {
  if (is.null(by) && is.null(na_by)) {
    by <- colnames(filters)
  }
  return(dplyr::inner_join(.data, filters, by = c(by, na_by), na_matches = "na"))
}

marcusmunch · 2024-01-10

na_matches defaults to "na" for dplyr:::inner_join.data.frame, so I have now added an even more simplified version of your suggestion 😇
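
A quick way to see this default in action on local data frames (the example data is made up for illustration):

```r
library(dplyr)

# Two small data frames that only share an NA key
x <- data.frame(key = c(1, NA), x_val = c("a", "b"))
y <- data.frame(key = c(NA, 2), y_val = c("c", "d"))

# Default behaviour (na_matches = "na"): the NA keys match each other
joined_default <- dplyr::inner_join(x, y, by = "key")

# Database-like behaviour: NA never matches, so no rows are returned
joined_never <- dplyr::inner_join(x, y, by = "key", na_matches = "never")
```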

Marcus Munch Grünewald added 16 commits December 15, 2023 13:51

chore: Change defaults to DBIConnection

639ad26

feat: Add get_schema for MSSQL

d365936

test: Allow testing with MSSQL if possible

0995eba

feat: add get_tables for MSSQL and ODBC

92f0f1b

fix: Convert db_latest to timestamp for comparison

c43bcdd

feat: Add table signature for MSSQL connections

b0d1e40

fix: Use as.character instead of paste

28d5b45

fix: Fix table assignment for MSSQL

27a03ca

fix: Prepare table for downstream handling

254b0e2

test: Remove #temp table for MSSQL

72d8d6d

fix: Drop any( in filter_keys

02f8d63

`any(is.na(.))` was causing issues with MSSQL. This is now dropped in favor of instead counting `NA` values and filtering `is_na` on whether or not it has a sum greater than 0.
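
A minimal sketch of the counting approach on a local data frame, assuming the `NA` count is computed with `sum(ifelse(...))`. The exact expression used in the commit is not shown here, and the example data and variable names are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical filter table: key_1 contains an NA, key_2 does not
filters <- data.frame(key_1 = c(1, 2, NA), key_2 = c("a", "b", "c"))

# Count NA values per column instead of asking any(is.na(.))
key_types <- filters |>
  dplyr::summarise(dplyr::across(
    .cols = tidyselect::everything(),
    .fns = ~ sum(ifelse(is.na(.), 1, 0), na.rm = TRUE)
  )) |>
  tidyr::pivot_longer(
    tidyselect::everything(),
    names_to = "column_name",
    values_to = "is_na"
  )

# Split the columns on whether the NA count is greater than zero
cols_with_na    <- key_types |> dplyr::filter(.data$is_na > 0)  |> dplyr::pull("column_name")
cols_without_na <- key_types |> dplyr::filter(.data$is_na == 0) |> dplyr::pull("column_name")
```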

fix: Ensure from_vars is always a vector of column names

f6c52de

fix: Prefer CONCAT_WS, except for SQLite

35bea68

test: Skip close_connection() test for ODBC

7be2cf1

test: Test consistency for id()

17e300d

chore: Update NEWS.md

eefa568

marcusmunch added the enhancement New feature or request label Jan 3, 2024

marcusmunch requested a review from RasmusSkytte January 3, 2024 11:27

marcusmunch self-assigned this Jan 3, 2024

Marcus Munch Grünewald added 2 commits January 3, 2024 12:40

fix: add global binding for CONCAT_WS

6298564

chore: linting

cab371e

Marcus Munch Grünewald added 2 commits January 8, 2024 13:53

fix: promote schema_exists.PqConnection to DBIConnection

968653d

fix: Convert *_joins to dplyr::*_join methods

9103eb3

marcusmunch changed the title ~~chore: Change defaults to DBIConnection~~ feat: Add support for Microsoft SQL Server (MSSQL) Jan 8, 2024

Marcus Munch Grünewald added 4 commits January 8, 2024 14:30

test: Use S3 methods

70b1b3e

fix(tests): Remove tests comparing to dplyr methods

178ada7

fix: Use coalesce in digest_to_checksum

fffc918

chore: Add jsonlite and odbc to Suggests

836ff53

Marcus Munch Grünewald added 2 commits January 8, 2024 15:47

chore: Linting

d9d54df

fix: Remove @param tags for na_by

ffaf1b1

marcusmunch requested review from RasmusSkytte and removed request for RasmusSkytte January 8, 2024 15:08

RasmusSkytte mentioned this pull request Jan 9, 2024

Add tests on SQL Server back ends ssi-dk/diseasystore#111

Merged

4 tasks

Marcus Munch Grünewald added 2 commits January 10, 2024 09:27

chore: Update WORDLIST

ebb839e

(also sorted one existing entry)

chore: Update NEWS.md

5012781

marcusmunch mentioned this pull request Jan 10, 2024

Add testing framework for Microsoft SQL Server #78

Closed

RasmusSkytte reviewed Jan 10, 2024

Marcus Munch Grünewald added 3 commits January 10, 2024 12:49

chore: Simplified filter_keys.data.frame

8d9bd6d

chore: Remove redundant early return

7a79468

fix: Re-add checks to filter_keys

96eedfb

marcusmunch requested a review from RasmusSkytte January 10, 2024 12:07

RasmusSkytte approved these changes Jan 10, 2024

marcusmunch merged commit 7ab7a6d into ssi-dk:main Jan 10, 2024
13 checks passed

marcusmunch deleted the backend_mssql branch January 10, 2024 12:23
