# Run SMV, Check Results, and Answer Questions

## I. Build the Project

As discussed in the sample project, the project must be built after initialization. Use `mvn package` or `sbt assembly`, depending on the tool you use to compile Scala. Note that if you are using [the latest SMV version](https://github.com/TresAmigosSD/SMV), this step may be skipped; check the relevant documentation and test to confirm.

## II. Run Modules / Stages / App
After the project has been built successfully, there are several ways to run an SMV module:

### 1. Run using command line

#### Commands

Open a terminal to run SMV modules or stages from the command line.

**Python Interface**  
Run `CustomerMaster` module: **`smv-pyrun -m com.mycompany.airlineapp.datamodel.customermaster.CustomerMaster`**   

**Scala Interface**  
Run `CustomerMaster` module: **`smv-run -m datamodel.CustomerMaster`**  
Run all modules in `feature` stage: **`smv-run -s feature`**   
Run the entire airline app: **`smv-run --run-app`**  

The output files are generated at the configured output location (for example, AirlineApp/data/output). By default, SMV runs the requested module together with all the modules it depends on (its predecessors), so users do not need to run the modules of a data flow one by one. If only `CustomerMaster` is changed and then rerun, its predecessors are not rerun.
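The run behaviour described above can be pictured with a small, self-contained Python sketch. It is not SMV code: the dependency graph, the `CustomerRaw` module name, and the `run_with_dependencies` helper are hypothetical, and the sketch assumes that a module is skipped whenever a valid persisted output already exists for it.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: module name -> modules it requires.
DEPENDENCIES: Dict[str, List[str]] = {
    "CustomerRaw": [],
    "CustomerMaster": ["CustomerRaw"],
    "CustomerProfileFeature": ["CustomerMaster"],
}


def run_with_dependencies(target: str, persisted: Set[str], executed: List[str]) -> None:
    """Run `target` after its predecessors, skipping anything already persisted."""
    if target in persisted:
        return  # a valid output already exists, so reuse it
    for dep in DEPENDENCIES[target]:
        run_with_dependencies(dep, persisted, executed)
    executed.append(target)  # stand-in for actually computing the module
    persisted.add(target)    # stand-in for writing the output file


# First run: nothing is persisted, so the whole chain executes in order.
persisted: Set[str] = set()
first_run: List[str] = []
run_with_dependencies("CustomerProfileFeature", persisted, first_run)

# Simulate a code change in CustomerMaster: its output and the outputs of
# everything downstream of it are assumed to be no longer valid.
persisted -= {"CustomerMaster", "CustomerProfileFeature"}
second_run: List[str] = []
run_with_dependencies("CustomerProfileFeature", persisted, second_run)

print(first_run)   # ['CustomerRaw', 'CustomerMaster', 'CustomerProfileFeature']
print(second_run)  # ['CustomerMaster', 'CustomerProfileFeature']
```

In the second run the unchanged predecessor is not recomputed, which mirrors the behaviour described for `CustomerMaster`.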

#### Auto-Versioning

One useful feature of SMV is ***auto-versioning***: each output dataset is tagged with a version after a run, for example:  
**`com.mycompany.airlineapp.datamodel.customermaster.CustomerMaster_d0cbd3e4.csv`**  
**`com.mycompany.airlineapp.feature.cmprofilefeat.CustomerProfileFeature_34aaa51d.csv`**

If the code or the input of a module changes, rerunning the module produces a new output with a different version tag, and the old output is kept alongside it. Developers can inspect the results and compare them with older versions when needed.
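To make the idea concrete, here is a minimal sketch of how a version tag of this shape could be derived. It is an illustration only, not SMV's actual algorithm: it assumes the tag is a 32-bit checksum of the module's source text combined with the tags of its inputs, and the `version_tag` and `output_name` helpers are hypothetical.

```python
import zlib
from typing import Iterable


def version_tag(source_code: str, upstream_tags: Iterable[str]) -> str:
    """Return an 8-hex-digit tag derived from a module's code and its inputs' tags."""
    payload = source_code + "|" + "|".join(sorted(upstream_tags))
    return format(zlib.crc32(payload.encode("utf-8")) & 0xFFFFFFFF, "08x")


def output_name(fqn: str, tag: str) -> str:
    """Build an output file name in the `<module name>_<tag>.csv` pattern."""
    return f"{fqn}_{tag}.csv"


# Hypothetical source snippets standing in for real module code.
raw_tag = version_tag("read customer csv", [])
master_v1 = version_tag("select CUST_ID, gender", [raw_tag])
master_v2 = version_tag("select CUST_ID, gender, CURR_LVL", [raw_tag])

fqn = "com.mycompany.airlineapp.datamodel.customermaster.CustomerMaster"
print(output_name(fqn, master_v1))
print(output_name(fqn, master_v2))
```

Because the tag depends on the code and on the upstream tags, editing a module yields a new file name, so the old and new outputs can sit side by side in the output directory.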

#### Purging Old Outputs

SMV also provides an option in `smv-run` to purge old outputs based on the auto-versioning: **`smv-run --purge-old-output`**. Consider using it when many datasets have accumulated in the output directory.
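Conceptually, purging keeps the files that carry the current version tags and removes the rest. The sketch below illustrates that idea with the Python standard library in a temporary directory; it is not how `smv-run` is implemented, and the `purge_old_outputs` helper and the `0a1b2c3d` tag are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def purge_old_outputs(output_dir: Path, current_files: Set[str]) -> List[str]:
    """Delete every file in `output_dir` not named in `current_files`; return the deleted names."""
    removed = []
    for path in sorted(output_dir.iterdir()):
        if path.is_file() and path.name not in current_files:
            path.unlink()
            removed.append(path.name)
    return removed


prefix = "com.mycompany.airlineapp.datamodel.customermaster.CustomerMaster"
current = f"{prefix}_d0cbd3e4.csv"  # tag taken from the example above
stale = f"{prefix}_0a1b2c3d.csv"    # hypothetical tag from an older run

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in (current, stale):
        (out / name).write_text("")  # empty placeholder files
    removed = purge_old_outputs(out, {current})
    remaining = sorted(p.name for p in out.iterdir())

print(removed)    # only the stale file name
print(remaining)  # only the current file name
```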

### 2. Run in a Jupyter notebook

Running a module in Jupyter persists its output in the same way as running it from the command line. As discussed before, the notebook also lets users check and quality-control the outputs immediately.

Run `CustomerMaster` module

In [1]:
module_name="com.mycompany.airlineapp.datamodel.customermaster.CustomerMaster"

In [2]:
cm_mst = pdf(module_name)

In [3]:
cm_mst.show()

+------------+----------+--------+--------+------+--------+----------+
|     CUST_ID|BIRTH_YYYY|BIRTH_MM|BIRTH_DD|gender|CURR_LVL| ENROLL_DT|
+------------+----------+--------+--------+------+--------+----------+
|000544814415|      1976|       9|      19|     F|       0|2014-05-20|
|000407811114|      1979|       4|      30|     F|       0|      null|
|000876964176|      null|    null|    null|    NA|       1|      null|
|000499804303|      1971|       2|      26|     M|       3|2001-03-05|
|000163353775|      1990|       4|      13|     F|       1|2014-06-04|
|000682324216|      1981|      12|      26|     M|       0|2009-08-21|
|000134299922|      1956|       4|      26|     M|       0|      null|
|000365709222|      1953|      12|      20|     M|       5|1982-09-15|
|000402848256|      1957|       8|       9|     F|       0|2014-07-30|
|000531414908|      1977|       9|      25|     M|       0|2001-03-01|
|000171572471|      1973|       3|       1|     M|       0|1996-11-05|
|00080

Run `CustomerProfileFeature` module

In [4]:
cm_pro_feat = pdf("com.mycompany.airlineapp.feature.cmprofilefeat.CustomerProfileFeature")

In [5]:
cm_pro_feat.show(10)

+------------+--------------+---------------------+-------------+---------------+
|     CUST_ID|cmstr_tier_now|cmcnt_days_enroll_now|cmint_age_now|cmstr_gender_cd|
+------------+--------------+---------------------+-------------+---------------+
|000544814415|             0|                   72|           38|              F|
|000407811114|             0|                 null|           35|              F|
|000876964176|             1|                 null|         null|             NA|
|000499804303|             3|                 4896|           43|              M|
|000163353775|             1|                   57|           24|              F|
|000682324216|             0|                 1805|           33|              M|
|000134299922|             0|                 null|           58|              M|
|000365709222|             5|                11642|           61|              M|
|000402848256|             0|                    1|           57|              F|
|000531414908|  

In [6]:
cm_pro_feat.smvEdd()

CUST_ID              Non-Null Count         22
CUST_ID              Min Length             12
CUST_ID              Max Length             12
CUST_ID              Approx Distinct Count  22
cmstr_tier_now       Non-Null Count         22
cmstr_tier_now       Min Length             1
cmstr_tier_now       Max Length             1
cmstr_tier_now       Approx Distinct Count  5
cmcnt_days_enroll_now Non-Null Count         16
cmcnt_days_enroll_now Average                3910.9375
cmcnt_days_enroll_now Standard Deviation     3705.7524556424437
cmcnt_days_enroll_now Min                    1.0
cmcnt_days_enroll_now Max                    11642.0
cmint_age_now        Non-Null Count         19
cmint_age_now        Average                40.68421052631579
cmint_age_now        Standard Deviation     11.392261660047758
cmint_age_now        Min                    23.0
cmint_age_now        Max                    61.0
cmstr_gender_cd      Non-Null Count         22
cmstr_gender_cd      Min Length          

### 3. Run in SMV Shell / PyShell

Another way to run a module is from the SMV shell. Launch a shell session with **`smv-pyshell`** or **`smv-shell`** in the terminal.

**Python** 
```python
>>> a = pdf("com.mycompany.airlineapp.feature.cmprofilefeat.CustomerProfileFeature")
>>> a.smvEdd()
>>> ...
```

**Scala**
```scala
scala> val a = df("com.mycompany.airlineapp.feature.CustomerProfileFeature")
scala> a.smvEdd()
scala> ...
```

## III. Address the business requirement to understand the customer profile

Now that some customer features have been created, the business questions can be addressed.

### 1. How many customers joined the FFP programme in the last 3 years?

In [7]:
cm_pro_feat.filter(col("cmcnt_days_enroll_now")/365 < 3).count()

5
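Note that customers with a null `cmcnt_days_enroll_now` are not counted, because a comparison against null does not evaluate to true in Spark and the filter drops those rows. The plain-Python sketch below reproduces the same logic on the nine complete rows visible in the `cm_pro_feat.show(10)` output earlier. Since it covers only those rows, it yields 3 rather than the 5 obtained on the full dataset. The `joined_within` helper is hypothetical.

```python
from typing import List, Optional

# cmcnt_days_enroll_now for the nine complete rows shown in the
# cm_pro_feat.show(10) output; None stands for null.
days_enrolled: List[Optional[int]] = [72, None, None, 4896, 57, 1805, None, 11642, 1]


def joined_within(days: List[Optional[int]], years: float) -> List[int]:
    """Keep enrolment durations shorter than `years`, skipping nulls like the Spark filter does."""
    return [d for d in days if d is not None and d / 365 < years]


recent = joined_within(days_enrolled, 3)
print(recent)       # [72, 57, 1]
print(len(recent))  # 3
```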

### 2. What is the average age and tenure of each tier?

In [8]:
cm_pro_feat.groupBy("cmstr_tier_now").agg(
    avg(col("cmint_age_now")).alias("tieravg_age_now"),
    avg(col("cmcnt_days_enroll_now")/365).alias("tieravg_years_enroll_now"),
).show()

+--------------+------------------+------------------------+
|cmstr_tier_now|   tieravg_age_now|tieravg_years_enroll_now|
+--------------+------------------+------------------------+
|             0|41.357142857142854|       9.333333333333334|
|             1|              29.0|                     0.0|
|             2|              32.0|                     8.0|
|             3|              43.0|                    13.0|
|             5|              61.0|                    31.0|
+--------------+------------------+------------------------+
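The aggregation relies on `avg` ignoring null values within each group. As a rough cross-check of that behaviour, the following plain-Python sketch groups the nine complete rows visible in the `cm_pro_feat.show(10)` output by tier and averages only the non-null ages. It uses a subset of the data, so its numbers differ from the table above, and the `avg_ignoring_nulls` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (cmstr_tier_now, cmint_age_now) for the nine complete rows shown earlier;
# None stands for null.
rows: List[Tuple[str, Optional[int]]] = [
    ("0", 38), ("0", 35), ("1", None), ("3", 43), ("1", 24),
    ("0", 33), ("0", 58), ("5", 61), ("0", 57),
]


def avg_ignoring_nulls(values: List[Optional[int]]) -> Optional[float]:
    """Average the non-null values; return None when every value is null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


ages_by_tier: Dict[str, List[Optional[int]]] = defaultdict(list)
for tier, age in rows:
    ages_by_tier[tier].append(age)

tier_avg_age = {tier: avg_ignoring_nulls(ages) for tier, ages in sorted(ages_by_tier.items())}
print(tier_avg_age)  # {'0': 44.2, '1': 24.0, '3': 43.0, '5': 61.0}
```

Tier 1 averages to 24.0 here because the customer with a null age contributes nothing to either the sum or the count.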

