# Capabilities Demonstration
This notebook demonstrates the Jupyter Helper functions. It covers the following:

1. How to connect to Spark
2. How to specify RUN_ID
3. How to initialize Jupyter Helper
4. Data Loading
    1. How to load business rules/reasons (final scoring)
    2. How to load business rules/reasons (scoring engine)
    3. How to load final scores
    4. How to load benchmarks
    5. How to load rollup
    6. How to load rollup participation
    7. How to load base scores
    8. How to load submission scores
5. How to load data from UDS

## Step 1 - Connect to Spark:

In [1]:
%reload_ext sparkmagic.magics
import pandas as pd
pd.set_option('display.max_colwidth', 100)
%store -r myuser
%store -r mypass
print('Hi ' + myuser + ', is your username printed correctly?. No, then run Utilities script.')

Hi J5D6, is your username printed correctly?. No, then run Utilities script.
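The `%store -r` lines above restore `myuser` and `mypass` into the local Python kernel. As a minimal sketch, a check like the following could fail early when either value is missing, rather than later during `%spark add`. The helper name `require_credentials` is hypothetical; it is not part of Jupyter Helper or sparkmagic.

```python
def require_credentials(namespace, names=("myuser", "mypass")):
    """Return the requested values, or raise if any is missing or empty.

    `namespace` is any mapping of variable names to values, for example
    globals() in a notebook cell. This helper is a hypothetical illustration.
    """
    missing = [name for name in names if not namespace.get(name)]
    if missing:
        raise RuntimeError(
            "Missing stored variables: " + ", ".join(missing)
            + ". Run the Utilities script, then re-run this cell."
        )
    return tuple(namespace[name] for name in names)
```

In a local (non-`%%spark`) cell this could be called as `myuser, mypass = require_credentials(globals())`.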


In [42]:
%spark cleanup

In [6]:
%spark add -s $myuser -l scala -k -u https://ambari.impl.qppar.internal:8443/qpp-ar-impl-hadoop/default/livy/v1 -a $myuser -p $mypass

Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
191,application_1555450382193_0031,spark,idle,Link,Link,✔


SparkSession available as 'spark'.


You must wait for the above step to finish. <span style="color:red">**Do you see the YARN application ID?** If not, keep waiting...</span>
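The `%spark add` cell above combines several flags on one line. The sketch below assembles the same argument string from named parts, which may make each flag's role easier to see. It uses only flags that appear in that cell (`-s`, `-l`, `-k`, `-u`, `-a`, `-p`). The function name `spark_add_args` and the example URL are hypothetical.

```python
def spark_add_args(session, url, user, password, language="scala", skip=True):
    """Assemble the text that follows `%spark` when adding a Livy session.

    -s session name, -l language, -k skip flag, -u Livy endpoint URL,
    -a user, -p password. Hypothetical helper, shown for illustration only.
    """
    parts = ["add", "-s", session, "-l", language]
    if skip:
        parts.append("-k")
    parts += ["-u", url, "-a", user, "-p", password]
    for part in parts:
        # Reject values that would be split into several arguments.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"argument must be non-empty without whitespace: {part!r}")
    return " ".join(parts)


# Example with placeholder values (not a real endpoint):
example = spark_add_args("demo", "https://example.invalid/livy/v1", "demo", "secret")
```

Here `example` holds `add -s demo -l scala -k -u https://example.invalid/livy/v1 -a demo -p secret`, which mirrors the shape of the cell above.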

## Step 2 - Initialize your Run & Environment

**Which run are you working with?**

In [8]:
%%spark 
val RUN_ID = "105"

RUN_ID: String = 105

**Initialize Jupyter Helper**

In [9]:
%%spark
import gov.cms.qpp.scoring.vo.fs._
import gov.cms.qpp.scoring.vo.rollup._
import gov.cms.qpp.scoring.vo.bs._
import gov.cms.qpp.scoring.vo.benchmarks._
import gov.cms.qpp.scoring.jupyter._
val jupyter = Jupyter.connect(spark, RUN_ID)

jupyter: gov.cms.qpp.scoring.jupyter.Jupyter = gov.cms.qpp.scoring.jupyter.Jupyter@354303f6

## Step 3 - Loading Data (from Hive)

### Loading reason codes associated with final scores

In [10]:
%%spark -o fsReasons
val fsReasons = jupyter.loadFsReasons

fsReasons: org.apache.spark.sql.Dataset[gov.cms.qpp.scoring.jupyter.FsReason] = [rule: string, ruleName: string ... 2 more fields]

In [None]:
fsReasons

### Loading reason codes associated with scoring engine

In [11]:
%%spark -o seReasons
val seReasons = jupyter.loadSeReasons

seReasons: org.apache.spark.sql.Dataset[gov.cms.qpp.scoring.jupyter.SeReason] = [id: string, statute: string ... 2 more fields]

In [None]:
seReasons

### Load the final score data from Spark

In [14]:
%%spark -o fs
val fs = jupyter.loadFinalScores

fs: org.apache.spark.sql.Dataset[gov.cms.qpp.scoring.vo.fs.FinalScoreVO] = [tin: string, npi: string ... 26 more fields]

In [15]:
%%spark 
print (s" The total number of final score entries in run ${RUN_ID}: ${fs.count()}")

The total number of final score entries in run 105: 1577038

In [None]:
%%spark 
fs.show()
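The count above is computed on the cluster, while `-o fs` copies rows into the local Python kernel. The `%spark` docstring lists `-n MAXROWS`, `-m SAMPLEMETHOD`, and `-r SAMPLEFRACTION`, which suggests the local copy may be a sample rather than all 1,577,038 rows. A minimal sketch for checking this, assuming you have the remote count and the length of the local object at hand (the helper `coverage_report` is hypothetical):

```python
def coverage_report(name, local_rows, remote_count):
    """Describe how much of a remote Dataset the local copy holds."""
    if local_rows < 0 or remote_count < 0:
        raise ValueError("row counts must be non-negative")
    if remote_count == 0:
        return f"{name}: remote Dataset is empty"
    share = min(local_rows / remote_count, 1.0)
    status = "complete" if local_rows >= remote_count else "sample only"
    return f"{name}: {local_rows} of {remote_count} rows locally ({share:.1%}, {status})"
```

In a local cell this could be called as `print(coverage_report("fs", len(fs), 1577038))` before drawing conclusions from the local copy.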

### Load Benchmarks

In [20]:
%%spark -o benchmarks
val benchmarks = jupyter.loadBenchmarks

benchmarks: org.apache.spark.sql.Dataset[gov.cms.qpp.scoring.vo.benchmarks.MeasureDecilesVO] = [measureId: string, submissionMethod: string ... 2 more fields]

In [21]:
%%spark 
print(s"Total benchmarks for ${RUN_ID}: ${benchmarks.count()}")

Total benchmarks for 105: 0

### Load the rollup 

In [22]:
%%spark -o rollup
val rollup = jupyter.loadRollup

rollup: org.apache.spark.sql.Dataset[gov.cms.qpp.scoring.vo.rollup.PiRollupScoreVO] = [apmEntityId: string, rollupScore: double ... 3 more fields]

In [23]:
%%spark
println(s"Total apm entities receiving rollup scores for ${RUN_ID}: ${rollup.count()}") 

Total apm entities receiving rollup scores for 105: 2375

In [None]:
rollup

### Load the rollup participation list

In [25]:
%%spark -o rollupParticipats
val rollupParticipats = jupyter.loadRollupParticipats

rollupParticipats: org.apache.spark.sql.Dataset[gov.cms.qpp.scoring.vo.rollup.ApmRollupParticipantVO] = [apmEntityId: string, tin: string ... 8 more fields]

In [26]:
%%spark
println(s"Number of entries available in rollup participation list for ${RUN_ID}: ${rollupParticipats.count()}") 

Number of entries available in rollup participation list for 105: 666958

In [None]:
rollupParticipats

### Load Base scores

In [32]:
%%spark -o baseScores 
val baseScores = jupyter.hive(s"select * from final_scoring.basescore_${RUN_ID}")

baseScores: org.apache.spark.sql.Dataset[gov.cms.qpp.scoring.vo.bs.SubmissionScoreVO] = [id: string, tin: string ... 25 more fields]

In [33]:
%%spark
println(s"Number of entries available in basescores for ${RUN_ID}: ${baseScores.count()}") 

Number of entries available in basescores for 105: 187982

In [None]:
baseScores

### Load Submission scores

In [None]:
%%spark -o submissionScores 
val submissionScores = jupyter.hive(s"select * from final_scoring.submissionscore_${RUN_ID}")

In [None]:
%%spark
println(s"Number of entries available in submission scores for ${RUN_ID}: ${submissionScores.count()}") 

In [None]:
submissionScores
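The base score and submission score queries both read from a table named `final_scoring.<table>_<RUN_ID>`. The Python sketch below illustrates that naming convention and adds a basic validation step, so that an unexpected run ID cannot change the shape of the query. The function `run_table_query` is hypothetical; the same check could be written in Scala inside the `%%spark` cell.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def run_table_query(table, run_id, schema="final_scoring"):
    """Build `select * from <schema>.<table>_<run_id>` after validating inputs."""
    for label, value in (("schema", schema), ("table", table)):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"unexpected {label} name: {value!r}")
    run_id = str(run_id)
    if not re.fullmatch(r"[0-9]+", run_id):
        raise ValueError(f"run id must be numeric: {run_id!r}")
    return f"select * from {schema}.{table}_{run_id}"


base_query = run_table_query("basescore", "105")
submission_query = run_table_query("submissionscore", 105)
```

With run 105, `base_query` is `select * from final_scoring.basescore_105`, matching the string passed to `jupyter.hive` above.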

## Step 4 - Loading Data (from UDS)

In [35]:
%%spark -o qps 
val qps = jupyter.uds("""
    SELECT DISTINCT npi, qp_status FROM active.provider
    WHERE run in (0, 4, 5) AND year = 2018 AND ( qp_status = 'Y' OR qp_status = 'Q' )
 """)

qps: org.apache.spark.sql.DataFrame = [npi: string, qp_status: string]

In [None]:
%%spark
qps.show()

In [37]:
%%spark
qps.printSchema()

root
 |-- npi: string (nullable = true)
 |-- qp_status: string (nullable = true)
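To see what the UDS query above selects, the sketch below runs the same SQL text against a small in-memory SQLite table. This is only an illustration: SQLite stands in for UDS, the column types are assumptions, and every row is made-up sample data.

```python
import sqlite3

QUERY = """
    SELECT DISTINCT npi, qp_status FROM active.provider
    WHERE run in (0, 4, 5) AND year = 2018 AND ( qp_status = 'Y' OR qp_status = 'Q' )
"""


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the name `active.provider` resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS active")
    conn.execute(
        "CREATE TABLE active.provider "
        "(npi TEXT, qp_status TEXT, run INTEGER, year INTEGER)"
    )
    rows = [
        ("1000000001", "Y", 0, 2018),
        ("1000000001", "Y", 4, 2018),  # same npi/status in another run: collapsed by DISTINCT
        ("1000000002", "Q", 5, 2018),
        ("1000000003", "N", 0, 2018),  # excluded: qp_status is neither 'Y' nor 'Q'
        ("1000000004", "Y", 1, 2018),  # excluded: run is not 0, 4, or 5
        ("1000000005", "Y", 0, 2017),  # excluded: year is not 2018
    ]
    conn.executemany("INSERT INTO active.provider VALUES (?, ?, ?, ?)", rows)
    result = sorted(conn.execute(QUERY).fetchall())
    conn.close()
    return result


demo_rows = run_demo()
```

Of the six sample rows, only two distinct `(npi, qp_status)` pairs survive the filters: `("1000000001", "Y")` and `("1000000002", "Q")`.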

## Done - Now Kill the Spark Session

In [39]:
%spark cleanup

In [41]:
%spark?

Docstring:
::

  %spark [-c CONTEXT] [-s SESSION] [-o OUTPUT] [-q [QUIET]]
             [-m SAMPLEMETHOD] [-n MAXROWS] [-r SAMPLEFRACTION] [-u URL]
             [-a USER] [-p PASSWORD] [-t AUTH] [-l LANGUAGE] [-k [SKIP]]
             [-i ID] [-e COERCE]
             [command [command ...]]

Magic to execute spark remotely.

This magic allows you to create a Livy Scala or Python session against a Livy endpoint. Every session can
be used to execute either Spark code or SparkSQL code by executing against the SQL context in the session.
When the SQL context is used, the result will be a Pandas dataframe of a sample of the results.

If invoked with no subcommand, the cell will be executed against the specified session.

Subcommands
-----------
info
    Display the available Livy sessions and other configurations for sessions.
add
    Add a Livy session given a session name (-s), language (-l), and endpoint credentials.
    The -k argument, if present, will skip adding this sessio