This is an R package to help you expose differences between two vocabulary versions.
A demo (under development) on the Synpuf data can be found here:
- Provides a Shiny app to allow you to visually inspect row level differences between vocabularies.
- Easily customizable; you can add additional SQL queries, the results of which will be displayed by the Shiny app.
- Provides an optional report (also customizable) consisting of numeric summaries (differences) between two vocabularies within CDMs.
- Provides functionality to download the results as csv or excel file.
The first example shows how to compare two vocabulary versions and launch the Shiny app to visualize the results of the "comparison" queries. By defining findPrevalences = TRUE you can filter on those codes that appear in the database.
For this example the data reside in a Microsoft PDW dbms on a server called X, using default port 17001.
The databases containing the CDMs are called "db1" and "db2". The database schema is "dbo".
library(Tantalus) cdmDatabaseSchema <- "db1.dbo" oldVocabularyDatabaseSchema <- cdmDatabaseSchema newVocabularyDatabaseSchema <- "db2.dbo" connectionDetails <- createConnectionDetails(dbms = "pdw", server = "X", user = "some user", password = "some pw", port = 17001) result = compareVocabData(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, oldVocabularyDatabaseSchema = oldVocabularyDatabaseSchema, newVocabularyDatabaseSchema = newVocabularyDatabaseSchema, findPrevalences = TRUE) launchComparisonExplorer(result)
Queries used by compareVocabData() are located in inst/sql/sql_server. Details of these queries can be found in the SQL files. By default, only "Test" and "Map" queries are executed. This can be modified by adjusting sqlFiles and sqlMapFiles in compareVocabData():
sqlFiles <- list.files(pathToSql, pattern = "Test.*.sql") sqlMapFiles <- list.files(pathToSql, pattern = "MapSource.*.sql")
The next example shows how to create a summary (diffSummary.html) of the differences between two vocabularies. The SQL files for the summary can by adjusting sqlFiles in createDiffSummary():
sqlFiles <- list.files(pathToSql, pattern = "Count.*.sql")
Using the same variables as above, we call createDiffSummary() which creates diffSummary.html via rmarkdown.
A JSON file containing the results of the numeric summaries is also created.
JSONPath <- "C:\\Temp" createDiffSummary(connectionDetails,oldVocabularyDatabaseSchema,newVocabularyDatabaseSchema,JSONPath)
The above calls will create diffSummary.html in JSONPath, unless otherwise specified.
The Tantalus package is an R package that makes use of Shiny, R Markdown, and JSON for visualization.
Running the package requires R with the packages SqlRender, DatabaseConnector, shiny, DT, stringdist, and jsonlite, installed.
- There are no dependencies.
To install the latest development version, install from GitHub:
Once installed, you can try follow the examples above to invoke the Shiny app to inspect row level differences and create a summary diff report:
library(Tantalus) # set appropriate variables output <- compareVocabData( ... ) # Compare vocabularies launchComparisonExplorer(output) # View the results of the comparison queries via Shiny createDiffSummary( ... ) # Create a high level summary of the differences between the two vocabs
- Package manual: Tantalus manual
- Developer questions/comments/feedback: OHDSI Forum
- We use the GitHub issue tracker for all bugs/issues/enhancements
Tantalus is licensed under Apache License 2.0
Tantalus is being developed in R Studio.
Beta. Still under development