This notebook provides a basic demonstration of the pmapUtilities R package, developed by Luke C. Mullany, PhD MS MHS. The package is available at https://github.com/lmullany/pmapUtilities.git and can be installed using `devtools::install_github("lmullany/pmapUtilities")`.

- **Platform Tool** : Jupyter/RStudio Crunchr Compute Containers
- **Programming Language**: R (>=3.6)
- **Author(s)** : Luke C. Mullany
- **License** : The notebook is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)
- **Last Updated** : March 22, 2021

## Basic use of pmapUtilities R package

#### Clear workspace and load libraries

In [15]:
rm(list=ls())
library(pmapUtilities)

#### Generate a connection to the database

In [17]:
de = get_sql_connection("CAMP_PMCoE_Projection",username = "lmullan1")

Enter Password for lmullan1:  ·········


Note: name/rename your connection as 'default_engine' to avoid
 specifying an engine in subsequent pmap.utilities:: functions


#### Show all the tables in this database

In [18]:
print(list_tables(engine=de))

                 table
 1:           patients
 2:         encounters
 3:               labs
 4:               meds
 5:        problemlist
 6:         procedures
 7:           symptoms
 8:          vitals_BP
 9:      vitals_height
10:       vitals_pulse
11: vitals_respiration
12: vitals_temperature
13:      vitals_weight
14:        sysdiagrams


#### Call the same function, but this time ask for dimensions (`show_dimensions = TRUE`)

In [19]:
print(list_tables(engine=de, show_dimensions = TRUE, exact = TRUE))

                 table    rows cols
 1:           patients   60676    5
 2:         encounters  753484    4
 3:               labs 3509868   12
 4:               meds  631022   10
 5:        problemlist  781379    4
 6:         procedures 5550672    6
 7:           symptoms 1967436    5
 8:          vitals_BP 1061684    7
 9:      vitals_height  305612    7
10:       vitals_pulse 1227099    7
11: vitals_respiration  957677    7
12: vitals_temperature  805345    7
13:      vitals_weight  406052    7
14:        sysdiagrams       0    5


#### List columns for a particular table

In [20]:
print(list_columns("encounters",engine=de))

[1] "osler_id"       "encounter_id"   "encounter_type" "encounter_date"


#### Get the number of rows and columns for just one table (rather than all tables, see above)

In [21]:
dims = get_table_dim("encounters", engine=de, exact = TRUE)
print(dims)

  rows   cols 
753484      4 
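
For comparison, similar schema information can be obtained with plain DBI calls. The sketch below rests on several assumptions: it uses an in-memory SQLite database (via the RSQLite package) and a toy `encounters` table in place of the notebook's SQL Server connection, and it does not claim to show how pmapUtilities implements `list_tables()`, `list_columns()`, or `get_table_dim()`.

```r
library(DBI)

# Stand-in connection (assumption): in-memory SQLite, not the SQL Server above
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Toy table; only the column names mirror the notebook's encounters table
toy <- data.frame(
  osler_id       = c("a", "a", "b", "c"),
  encounter_id   = 1:4,
  encounter_type = c("Office Visit", "Appointment", "Office Visit", "Results Only"),
  encounter_date = c("2015-10-10", "2016-01-24", "2017-03-19", "2016-10-22"),
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(con, "encounters", toy)

# Table names and column names
tables <- DBI::dbListTables(con)
cols   <- DBI::dbListFields(con, "encounters")

# Exact row count requires a full COUNT(*) on the table
n_rows <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM encounters")$n
dims   <- c(rows = n_rows, cols = length(cols))
print(dims)

DBI::dbDisconnect(con)
```

An exact count scans the table, which is presumably why the package exposes an `exact` argument; that reading of the argument is an inference from its name, not something the notebook states.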


#### Return a lazy handle to the table
Note that this feature uses dplyr/dbplyr under the hood: dplyr verbs are translated into SQL and executed in the database, so the rows of the table are not pulled into R until we ask for them.

In [22]:
encounters = return_table("encounters", engine=de)
encounters

# Source:   SQL [?? x 4]
# Database: Microsoft SQL Server 13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
   osler_id                      encounter_id encounter_type encounter_date     
   <chr>                                <int> <chr>          <dttm>             
 1 5303550b-8ed2-42fd-885a-d32b~        17938 Office Visit   2015-10-10 00:00:00
 2 5303550b-8ed2-42fd-885a-d32b~       142706 Office Visit   2016-01-24 00:00:00
 3 5303550b-8ed2-42fd-885a-d32b~       571465 Office Visit   2017-03-19 00:00:00
 4 5303550b-8ed2-42fd-885a-d32b~       432470 Office Visit   2016-10-22 00:00:00
 5 5303550b-8ed2-42fd-885a-d32b~       410795 Office Visit   2016-10-01 00:00:00
 6

For example, we can run the equivalent of `SELECT encounter_type, COUNT(*) AS ct FROM encounters GROUP BY encounter_type ORDER BY COUNT(*) DESC` without explicitly writing the SQL query.

In [23]:
encounters %>% dplyr::group_by(encounter_type) %>% dplyr::summarize(ct = n()) %>% dplyr::arrange(desc(ct))

# Source:     lazy query [?? x 2]
# Database:   Microsoft SQL Server
#   13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
# Ordered by: desc(ct)
   encounter_type         ct
   <chr>               <int>
 1 Office Visit       431080
 2 Appointment         97718
 3 Hospital Encounter  86789
 4 Visit Encounter     62740
 5 Clinical Support    21626
 6 Procedure visit     18256
 7 Results Only         8895
 8 Orders Only          8635
 9 Anti-coag visit      7284
10 Provider Procedure   2987
# ... with more rows

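To see the translation from dplyr verbs to SQL, `dplyr::show_query()` prints the SQL that dbplyr generates for a lazy pipeline. The sketch below is a stand-in under stated assumptions: it uses an in-memory SQLite database (RSQLite) and a toy `encounters` table rather than the notebook's connection, so the generated SQL dialect will differ from what SQL Server receives. It also requires the dbplyr package to be installed.

```r
library(DBI)
library(dplyr)

# Stand-in connection (assumption): in-memory SQLite instead of SQL Server
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Toy data; only the column names mirror the notebook's table
toy <- data.frame(
  encounter_id   = 1:6,
  encounter_type = c("Office Visit", "Office Visit", "Office Visit",
                     "Appointment", "Appointment", "Results Only"),
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(con, "encounters", toy)

# Lazy handle: no rows are pulled into R yet
encounters_toy <- dplyr::tbl(con, "encounters")

counts <- encounters_toy %>%
  group_by(encounter_type) %>%
  summarize(ct = n()) %>%
  arrange(desc(ct))

# Print the SQL that dbplyr generated for the pipeline
dplyr::show_query(counts)

# Execute the query and bring the (small) result into R
counts_local <- dplyr::collect(counts)

DBI::dbDisconnect(con)
```

Because the aggregation runs in the database, only one row per encounter type travels back to R.
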
We can also pull the entire table into local memory, if desired, using `dplyr::collect()`.

In [24]:
encounters_local = encounters %>% dplyr::collect()
print(encounters_local %>% head(10))

# A tibble: 10 x 4
   osler_id                      encounter_id encounter_type encounter_date     
   <chr>                                <int> <chr>          <dttm>             
 1 5303550b-8ed2-42fd-885a-d32b~        17938 Office Visit   2015-10-10 00:00:00
 2 5303550b-8ed2-42fd-885a-d32b~       142706 Office Visit   2016-01-24 00:00:00
 3 5303550b-8ed2-42fd-885a-d32b~       571465 Office Visit   2017-03-19 00:00:00
 4 5303550b-8ed2-42fd-885a-d32b~       432470 Office Visit   2016-10-22 00:00:00
 5 5303550b-8ed2-42fd-885a-d32b~       410795 Office Visit   2016-10-01 00:00:00
 6 5303550b-8ed2-42fd-885a-d32b~       631974 Office Visit   2017

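When only part of a table is needed, filtering and selecting columns before `dplyr::collect()` keeps the transfer small, because those steps are executed in the database. A minimal sketch, assuming an in-memory SQLite database (RSQLite, with dbplyr installed) and toy data in place of the real `encounters` table:

```r
library(DBI)
library(dplyr)

# Stand-in connection (assumption): in-memory SQLite instead of SQL Server
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

toy <- data.frame(
  osler_id       = c("a", "a", "b", "c"),
  encounter_id   = 1:4,
  encounter_type = c("Office Visit", "Appointment", "Office Visit", "Results Only"),
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(con, "encounters", toy)

# filter() and select() become WHERE and SELECT clauses;
# only the matching rows and columns are returned by collect()
office_local <- dplyr::tbl(con, "encounters") %>%
  filter(encounter_type == "Office Visit") %>%
  select(osler_id, encounter_id) %>%
  collect()

print(office_local)

DBI::dbDisconnect(con)
```
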
#### Use `query_db()` to submit any SQL query directly to the database; by default it returns a lazy tbl

In [25]:
qry <- "SELECT encounter_type, ct = COUNT(*) FROM encounters GROUP BY encounter_type"
query_db(qry, engine=de) %>% dplyr::arrange(desc(ct))

# Source:     SQL [?? x 2]
# Database:   Microsoft SQL Server
#   13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
# Ordered by: desc(ct)
   encounter_type         ct
   <chr>               <int>
 1 Office Visit       431080
 2 Appointment         97718
 3 Hospital Encounter  86789
 4 Visit Encounter     62740
 5 Clinical Support    21626
 6 Procedure visit     18256
 7 Results Only         8895
 8 Orders Only          8635
 9 Anti-coag visit      7284
10 Provider Procedure   2987
# ... with more rows

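A lazy tbl built from a raw SQL string can also be created directly with dbplyr, by passing `dplyr::sql()` to `dplyr::tbl()`. Whether `query_db()` works this way internally is an assumption; the sketch only illustrates the general pattern on an in-memory SQLite database (RSQLite) with toy data. Note that the T-SQL alias form `ct = COUNT(*)` used above is not valid in SQLite, so the sketch uses `COUNT(*) AS ct`.

```r
library(DBI)
library(dplyr)

# Stand-in connection (assumption): in-memory SQLite instead of SQL Server
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

toy <- data.frame(
  encounter_id   = 1:5,
  encounter_type = c("Office Visit", "Office Visit", "Appointment",
                     "Office Visit", "Appointment"),
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(con, "encounters", toy)

qry <- "SELECT encounter_type, COUNT(*) AS ct FROM encounters GROUP BY encounter_type"

# Wrap the raw SQL as a lazy tbl; further dplyr verbs are added around it as SQL
lazy_counts <- dplyr::tbl(con, dplyr::sql(qry)) %>%
  arrange(desc(ct))

# Nothing has been fetched until collect() is called
counts_from_sql <- dplyr::collect(lazy_counts)
print(counts_from_sql)

DBI::dbDisconnect(con)
```
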
#### Use `gen_random_table()` to get a random subset of a table

In [34]:
# generate the temp table; the function returns the name of the temp table
random_px = gen_random_table("patients",idvars="osler_id", engine=de)
cat("Name of the newly created table is ", random_px,"\n")

Name of the newly created table is  #31e5ixaf9t0d8yr9 


In [39]:
return_table(random_px, engine=de) %>%
    dplyr::count()

# Source:   lazy query [?? x 1]
# Database: Microsoft SQL Server 13.00.5830[@ESMPMDBPR4/CAMP_PMCoE_Projection]
      n
  <int>
1  1000
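
The general idea of sampling identifiers into a session-scoped temporary table can be sketched with plain DBI. This is not the implementation of `gen_random_table()`, which the notebook does not show. The sketch assumes an in-memory SQLite database (RSQLite, with dbplyr installed), a toy `patients` table, a hypothetical temporary table name `random_px_toy`, and SQLite's `ORDER BY RANDOM()` idiom, which differs from SQL Server syntax.

```r
library(DBI)
library(dplyr)

# Stand-in connection (assumption): in-memory SQLite instead of SQL Server
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Toy patients table with 10 made-up identifiers
DBI::dbWriteTable(
  con, "patients",
  data.frame(osler_id = sprintf("id%02d", 1:10), stringsAsFactors = FALSE)
)

# Sample 3 ids at random into a temporary table (hypothetical name)
DBI::dbExecute(con, "
  CREATE TEMPORARY TABLE random_px_toy AS
  SELECT osler_id FROM patients ORDER BY RANDOM() LIMIT 3
")

# Count the sampled rows through a lazy handle, as in the cell above
n_sampled <- dplyr::tbl(con, "random_px_toy") %>%
  count() %>%
  collect() %>%
  pull(n)

print(n_sampled)

DBI::dbDisconnect(con)
```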