This notebook provides a basic demonstration of the pmapUtilities R package, developed by Luke C. Mullany, PhD MS MHS. The package is available at https://github.com/lmullany/pmapUtilities.git and can be installed using `devtools::install_github("lmullany/pmapUtilities")`.

- **Platform Tool** : Jupyter/RStudio Crunchr Compute Containers
- **Programming Language**: R (>=3.6)
- **Author(s)** : Luke C. Mullany
- **Last Updated** : September 12, 2022

## Basic use of pmapUtilities R package

#### Clear workspace and load libraries

In [1]:
rm(list=ls())
library(pmapUtilities)

#### Get information on available databases
Use the `get_database_names()` function, passing your username (JHED ID). This returns a three-column tibble with columns `name`, `database_id`, and `create_date`. You can optionally pass an argument to the `pattern` parameter to filter the results to a specific subset of database names.

In [2]:
dbs = get_database_names(username="lmullan1")

Enter Password for lmullan1:  ·········


In [3]:
tail(dbs,3)

name,database_id,create_date
<chr>,<int>,<dttm>
Kidney_Menez_IRB00304696_Scratch,305,2022-09-08 16:37:58
Kidney_Menez_IRB00304696_Projection,306,2022-09-08 16:38:09
NCCU_OMOP,307,2022-09-08 16:42:35


Here we limit our search to databases whose name starts with `"PatientSafety"`.

In [4]:
dbs = get_database_names(username = "lmullan1", pattern="^PatientSafety")

Enter Password for lmullan1:  ·········


In [5]:
dbs

name,database_id,create_date
<chr>,<int>,<dttm>
PatientSafetyQualityWSP_Scratch,123,2021-03-02 07:38:28
PatientSafetyQualityVTE_Scratch,182,2021-07-27 09:57:07
PatientSafetyQualityMA_Projection,219,2022-01-04 16:25:42
PatientSafetyQualityMA_Scratch,220,2022-01-04 16:25:56
PatientSafetyQualityWSP_Projection,221,2022-03-16 08:42:08
PatientSafetyQuality_JHM_Keystone_Scratch,269,2022-05-06 10:54:07
PatientSafetyQuality_JHM_Keystone_Projection,270,2022-05-06 10:54:20
PatientSafetyQualityVTE_Projection,299,2022-08-19 08:55:42
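
The `"^PatientSafety"` value used above reads like a regular expression anchored at the start of the name. The following is a minimal local sketch of that kind of filtering, assuming (the source does not confirm this) that `pattern` is applied as a regular expression to the `name` column. It uses only base R and a few database names copied from the outputs above; it does not call `pmapUtilities`.

```r
# A few database names copied from the outputs above
db_names <- c(
  "Kidney_Menez_IRB00304696_Scratch",
  "NCCU_OMOP",
  "PatientSafetyQualityWSP_Scratch",
  "PatientSafetyQualityVTE_Projection"
)

# "^" anchors the match at the start of the name
matches <- db_names[grepl("^PatientSafety", db_names)]
print(matches)

# "$" anchors at the end, narrowing the result to Scratch databases
scratch_only <- db_names[grepl("^PatientSafety.*_Scratch$", db_names)]
print(scratch_only)
```

If the assumption holds, the same anchored pattern could be passed as `pattern` to `get_database_names()` to do the narrowing in one call.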


#### Generate a connection to the database
Here we use the `get_sql_connection()` function, passing a database name and a username (JHED ID). You can assign the returned object to `default_engine` (or any other name). The advantage of using `default_engine` is that all subsequent `pmapUtilities` functions that require a connection object look for `default_engine` in the global environment by default.

In [6]:
default_engine = get_sql_connection(dbname = "CAMP_PMCoE_Projection",username = "lmullan1")

Enter Password for lmullan1:  ·········
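
To illustrate what "look by default in the global environment" can mean in R, here is a small sketch of the general pattern. `resolve_engine()` is a hypothetical helper written for this illustration, not the package's implementation, and the demo uses a stand-in object named `demo_engine` so that it does not overwrite a real `default_engine` in your session.

```r
# Hypothetical helper (not part of pmapUtilities): return the supplied engine,
# or fall back to an object with the given name in the global environment.
resolve_engine <- function(engine = NULL, name = "default_engine") {
  if (!is.null(engine)) return(engine)
  if (!exists(name, envir = globalenv(), inherits = FALSE)) {
    stop("No engine supplied and `", name, "` not found in the global environment")
  }
  get(name, envir = globalenv(), inherits = FALSE)
}

# Stand-in for a connection object, stored under a demo name
assign("demo_engine", "stand-in connection", envir = globalenv())

from_global <- resolve_engine(name = "demo_engine")          # falls back to the global
explicit <- resolve_engine(engine = "explicit connection",   # explicit argument wins
                           name = "demo_engine")
print(from_global)
print(explicit)
```

Under this pattern, a connection passed explicitly (as with the `engine=` argument used later in this notebook) takes precedence over the global default.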


#### Show all the tables in this database

In [7]:
list_tables()

table
<chr>
patients
encounters
labs
meds
problemlist
procedures
symptoms
vitals_BP
vitals_height
vitals_pulse


#### Call the same function, but this time ask for dimensions (`show_dimensions = TRUE`)

In [8]:
list_tables(show_dimensions = TRUE, exact=T)

table,rows,cols
<chr>,<dbl>,<dbl>
patients,60676,5
encounters,753484,4
labs,3509868,12
meds,631022,10
problemlist,781379,4
procedures,5550672,6
symptoms,1967436,5
vitals_BP,1061684,7
vitals_height,305612,7
vitals_pulse,1227099,7


#### List columns for a particular table

In [10]:
list_columns("encounters")

#### Get the number of rows and columns for just one table (rather than all tables, see above)

In [11]:
get_table_dim("encounters", exact = T)

#### Return a lazy handle to the table
Note that this feature uses dplyr/dbplyr under the hood, which lets us translate dplyr verbs into SQL and execute them against the table without pulling all of its rows.

In [12]:
encounters = return_table("encounters")
encounters

# Source:   table<dbo.encounters> [?? x 4]
# Database: Microsoft SQL Server 13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
   osler_id                             encounter_id encou~1 encounter_date
   <chr>                                       <int> <chr>   <dttm>
 1 5303550b-8ed2-42fd-885a-d32b308b05f3        17938 Office~ 2015-10-10 00:00:00
 2 5303550b-8ed2-42fd-885a-d32b308b05f3       142706 Office~ 2016-01-24 00:00:00
 3 5303550b-8ed2-42fd-885a-d32b308b05f3       571465 Office~ 2017-03-19 00:00:00
 4 5303550b-8ed2-42fd-885a-d32b308b05f3       432470 Office~ 2016-10-22 00:00:00
 5 5303550b-8ed2-42fd-885a-d32b308b05f3       410795 Office~ 2016-10-01 00:

For example, we can run the equivalent of `SELECT encounter_type, COUNT(*) AS ct FROM encounters GROUP BY encounter_type ORDER BY COUNT(*) DESC` without explicitly writing the SQL query.

In [13]:
dplyr::count(encounters, encounter_type, sort=T)

# Source:     SQL [?? x 2]
# Database:   Microsoft SQL Server 13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
# Ordered by: desc(n)
   encounter_type          n
   <chr>               <int>
 1 Office Visit       431080
 2 Appointment         97718
 3 Hospital Encounter  86789
 4 Visit Encounter     62740
 5 Clinical Support    21626
 6 Procedure visit     18256
 7 Results Only         8895
 8 Orders Only          8635
 9 Anti-coag visit      7284
10 Provider Procedure   2987
# ... with more rows
# i Use `print(n = ...)` to see more rows
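
To see the SQL that a dplyr pipeline such as the count above is translated into, dbplyr can render the query without executing it. The sketch below assumes the dplyr and dbplyr packages are installed. `encounters_sim` is a hypothetical stand-in built with `dbplyr::lazy_frame()` and a simulated SQL Server connection, so no database is contacted; the exact SQL text depends on the installed dbplyr version.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for the remote table; only the column names matter here
encounters_sim <- lazy_frame(
  osler_id = "id-1",
  encounter_id = 1L,
  encounter_type = "Office Visit",
  con = simulate_mssql(),
  .name = "encounters"
)

# Same pipeline as in the cell above, built lazily
count_qry <- encounters_sim %>% count(encounter_type, sort = TRUE)

# Print the generated SQL instead of running it
show_query(count_qry)

# The SQL is also available as a plain string
count_sql <- as.character(sql_render(count_qry))
```

On a real lazy table, such as the one returned by `return_table("encounters")`, `dplyr::show_query()` can be applied in the same way, assuming that object is a standard dbplyr lazy table as the printed output suggests.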

We can of course pull the entire table locally, if we desire, using `dplyr::collect()`

In [14]:
encounters_local = encounters %>% dplyr::collect()
print(encounters_local %>% head(10))

# A tibble: 10 x 4
   osler_id                             encounter_id encou~1 encounter_date
   <chr>                                       <int> <chr>   <dttm>
 1 5303550b-8ed2-42fd-885a-d32b308b05f3        17938 Office~ 2015-10-10 00:00:00
 2 5303550b-8ed2-42fd-885a-d32b308b05f3       142706 Office~ 2016-01-24 00:00:00
 3 5303550b-8ed2-42fd-885a-d32b308b05f3       571465 Office~ 2017-03-19 00:00:00
 4 5303550b-8ed2-42fd-885a-d32b308b05f3       432470 Office~ 2016-10-22 00:00:00
 5 5303550b-8ed2-42fd-885a-d32b308b05f3       410795 Office~ 2016-10-01 00:00:00
 6 5303550b-8ed2-42fd-885a-d32b308b05f3       631974 Office~ 2017

#### Use `query_db()` to submit any SQL query directly to the database; by default it returns a lazy tbl

In [16]:
qry <- "SELECT encounter_type, ct = COUNT(*) FROM encounters GROUP BY encounter_type"
query_db(qry) %>% dplyr::arrange(desc(ct))

# Source:     SQL [?? x 2]
# Database:   Microsoft SQL Server 13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
# Ordered by: desc(ct)
   encounter_type         ct
   <chr>               <int>
 1 Office Visit       431080
 2 Appointment         97718
 3 Hospital Encounter  86789
 4 Visit Encounter     62740
 5 Clinical Support    21626
 6 Procedure visit     18256
 7 Results Only         8895
 8 Orders Only          8635
 9 Anti-coag visit      7284
10 Provider Procedure   2987
# ... with more rows
# i Use `print(n = ...)` to see more rows
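
When a hand-written query needs a value supplied at run time, building the string with `DBI::sqlInterpolate()` quotes the value rather than pasting it in raw. This is a sketch under two assumptions: that the DBI package is installed, and that `query_db()` accepts any character string, as in the cell above. `DBI::ANSI()` is a dummy connection used here only so the example runs without a database, and the `?etype` placeholder name is chosen for this illustration.

```r
library(DBI)

# Query template with a named placeholder for the encounter type
template <- paste(
  "SELECT encounter_type, ct = COUNT(*) FROM encounters",
  "WHERE encounter_type = ?etype GROUP BY encounter_type"
)

# sqlInterpolate() quotes and escapes the supplied value
safe_qry <- sqlInterpolate(ANSI(), template, etype = "Office Visit")

# Convert to a plain character string, the form used for `qry` above
qry_one_type <- as.character(safe_qry)
print(qry_one_type)
```

The resulting string could then be passed to `query_db()` in the same way as `qry` in the cell above.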

#### Use `gen_random_table()` to get a subset of the identifying variables for a table; the function creates a temporary table on the database and returns the name of that table

In [18]:
random_px = gen_random_table("patients",idvars="osler_id", engine=default_engine)
cat("Name of the newly created table is ", random_px,"\n")

Name of the newly created table is  #qw1whqw431npwtjs 


#### You can feed this table name, just like any other table name, to the `return_table()` function

In [20]:
return_table(random_px, engine=default_engine)

# Source:   table<dbo.#qw1whqw431npwtjs> [?? x 1]
# Database: Microsoft SQL Server 13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
   osler_id
   <chr>
 1 02d6224e-f7cb-4b5d-9868-952360cdcaa9
 2 02f55980-ab50-41aa-812d-0c642848fa45
 3 044dc300-c300-4b51-b345-48475c895187
 4 0edef1fb-02ea-473d-b1bb-6bedc77c26b0
 5 1431a676-d36b-4653-8c82-db96bb157f13
 6 1d9d520e-b137-4182-92e4-3ac663791617
 7 2407e141-1a56-483b-931a-3380a6e5d1c7
 8 27137853-1129-4e99-b42c-0d601567b779
 9 27b0ed87-12b2-4d28-9da3-211de651cd9f
10 365dbbb0-3e6b-4479-8b99-6cc9e6f95620
# ... with more rows
# i Use `print(n = ...)` to see more rows

#### This table of random ids can be used in simple joins. For example, below we get all the rows from the `vitals_weight` table for the osler ids in the `random_px` table.

In [21]:
return_table("vitals_weight") %>% 
    dplyr::inner_join(return_table(random_px), by="osler_id")

# Source:   SQL [?? x 7]
# Database: Microsoft SQL Server 13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
   osler_id       encou~1 encou~2 admission_date      discharge_date weight
   <chr>            <int> <chr>   <dttm>              <dttm>         <chr>
 1 e490d206-db28~  113734 Office~ 2017-01-01 00:00:00 NA             35.1
 2 e490d206-db28~  216667 Office~ 2016-11-30 00:00:00 NA             34.93
 3 e490d206-db28~  171900 Office~ 2016-10-24 00:00:00 NA             36.03
 4 e490d206-db28~  439056 Office~ 2017-07-02 00:00:00 NA             36.8
 5 e490d206-db28~  394328 Office~ 2017-05