# DEMO: Imputing Missing Values

### 1. Import Packages and Connect to the CAS Server

Visit the documentation for the SWAT [(SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

In [14]:
## Packages
import os  ## used by the commented-out Viya for Learners connection below
import swat
import pandas as pd
import numpy as np
from casConnect import connect_to_cas ## custom personal module

##
## Connect to CAS
##

## General connection syntax
# conn = swat.CAS(host, port, username, password)

## Viya for Learners 3.5 connection
# hostValue = os.environ.get('CASHOST')
# portValue = os.environ.get('CASPORT')
# passwordToken=os.environ.get('SAS_VIYA_TOKEN')
# conn = swat.CAS(hostname=hostValue, port=portValue, password=passwordToken)

## Personal connection
conn = connect_to_cas()

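### 2. Load the Demonstration Data into Memory

Create a small pandas DataFrame that contains missing values, then upload it to the CAS server as the **TEST** table in the **Casuser** caslib.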

In [28]:
## Create a DataFrame with missing values in every column (E is a character column)
df = pd.DataFrame([
        [np.nan, 2, 45, 0, 'A'],
        [3, 4, np.nan, 1, 'A'],
        [np.nan, np.nan, 50, np.nan, 'B'],
        [np.nan, 3, np.nan, 4, np.nan],
        [2, 2, np.nan, 0, 'A'],
        [3, 4, np.nan, 1, 'A'],
        [np.nan, np.nan, 75, np.nan, 'B'],
        [np.nan, 3, 60, 4, np.nan]
    ],
    columns=list("ABCDE"))

## Upload the DataFrame to CAS as the TEST table in the Casuser caslib
castbl = conn.upload_frame(df, casout={'name':'test', 'caslib':'casuser', 'replace':True})

NOTE: Cloud Analytic Services made the uploaded file available as table TEST in caslib CASUSER(Peter.Styliadis@sas.com).
NOTE: The table TEST has been created in caslib CASUSER(Peter.Styliadis@sas.com) from binary data uploaded to Cloud Analytic Services.


In [43]:
castbl.head(10)

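|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 |  | 2.0 | 45.0 | 0.0 | A |
| 1 | 3.0 | 4.0 |  | 1.0 | A |
| 2 |  |  | 50.0 |  | B |
| 3 |  | 3.0 |  | 4.0 |  |
| 4 | 2.0 | 2.0 |  | 0.0 | A |
| 5 | 3.0 | 4.0 |  | 1.0 | A |
| 6 |  |  | 75.0 |  | B |
| 7 |  | 3.0 | 60.0 | 4.0 |  |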


In [44]:
castbl.nmiss()

A    5
B    2
C    4
D    2
E    2
dtype: int64

### 4. Impute Missing Values

a. Use the [dataPreprocess.impute](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-datapreprocess-impute.htm) action to replace the missing values in column **A** using the default parameters of the action. Notice that CAS returns information on how the column was imputed. The default imputation technique is the mean, so the 5 missing values (NMiss) are replaced with the mean of the 3 nonmissing values (N), 2.666667, and the result variable is named **IMP_A**.

In [46]:
castbl.impute(input='A')

|   | Variable | ImputeTech | ResultVar | N | NMiss | ImputedValueContinuous |
|---|---|---|---|---|---|---|
| 0 | A | Mean | IMP_A | 3.0 | 5.0 | 2.666667 |
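You can cross-check these summary numbers locally. The pandas sketch below makes one assumption: the mean is taken over the nonmissing values only, which is consistent with N = 3 and NMiss = 5 above. It recomputes the imputed value and builds an **IMP_A** column the same way. It illustrates the technique and does not show how the CAS action is implemented internally.

```python
import numpy as np
import pandas as pd

# Column A of the demonstration data (the only column imputed in this step).
df = pd.DataFrame({"A": [np.nan, 3, np.nan, np.nan, 2, 3, np.nan, np.nan]})

# Analogues of the N, NMiss, and ImputedValueContinuous columns in the CAS result.
n = int(df["A"].notna().sum())      # count of nonmissing values
n_miss = int(df["A"].isna().sum())  # count of missing values
mean_a = df["A"].mean()             # pandas skips NaN when averaging

# Mean imputation: fill the gaps in a new column and leave A unchanged.
df["IMP_A"] = df["A"].fillna(mean_a)

print(n, n_miss, round(mean_a, 6))
print(df["IMP_A"].round(6).tolist())
```

The first five values of `IMP_A` line up with the preview of the **impute_a** table later in this demo.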


b. To store the imputed values in a CAS table, add the casOut parameter. Here a CAS table named **impute_a** is created in the **Casuser** caslib. The CAS server returns the imputed column information and the output CAS table information.

In [40]:
castbl.impute(input='A', 
              casout={'name':'impute_a', 
                      'caslib':'casuser',
                      'replace':True})

|   | Variable | ImputeTech | ResultVar | N | NMiss | ImputedValueContinuous |
|---|---|---|---|---|---|---|
| 0 | A | Mean | IMP_A | 3.0 | 5.0 | 2.666667 |

|   | casLib | Name | Rows | Columns | casTable |
|---|---|---|---|---|---|
| 0 | CASUSER(Peter.Styliadis@sas.com) | impute_a | 8 | 1 | CASTable('impute_a', caslib='CASUSER(Peter.Sty... |


In [41]:
conn.tableInfo()

|   | Name | Rows | Columns | IndexedColumns | Encoding | CreateTimeFormatted | ModTimeFormatted | AccessTimeFormatted | JavaCharSet | CreateTime | Repeated | View | MultiPart | SourceName | SourceCaslib | Compressed | Creator | Modifier | SourceModTimeFormatted | SourceModTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TEST | 8 | 5 | 0 | utf-8 | 2023-01-30T14:10:59+00:00 | 2023-01-30T14:10:59+00:00 | 2023-01-30T14:18:39+00:00 | UTF8 | 1990707000.0 | 0 | 0 | 0 |  |  | 0 | Peter.Styliadis@sas.com |  | 2023-01-30T14:10:59+00:00 | 1990707000.0 |
| 1 | IMPUTE_A | 8 | 1 | 0 | utf-8 | 2023-01-30T14:18:39+00:00 | 2023-01-30T14:18:39+00:00 | 2023-01-30T14:18:39+00:00 | UTF8 | 1990708000.0 | 0 | 0 | 0 |  |  | 0 | Peter.Styliadis@sas.com |  |  |  |
| 2 | HOME_EQUITY_CAS_SAS | 5960 | 20 | 0 | utf-8 | 2023-01-25T18:24:31+00:00 | 2023-01-25T18:24:31+00:00 | 2023-01-25T18:24:31+00:00 | UTF8 | 1990290000.0 | 0 | 0 | 0 |  |  | 0 | Peter.Styliadis@sas.com |  |  |  |
| 3 | HOME_EQUITY_CAS_PY | 5960 | 20 | 0 | utf-8 | 2023-01-25T18:27:48+00:00 | 2023-01-25T18:27:48+00:00 | 2023-01-25T18:43:53+00:00 | UTF8 | 1990290000.0 | 0 | 0 | 0 |  |  | 0 | Peter.Styliadis@sas.com |  |  |  |


In [42]:
impute_a = conn.CASTable('impute_a', caslib = 'casuser')
impute_a.head()

|   | IMP_A |
|---|---|
| 0 | 2.666667 |
| 1 | 3.0 |
| 2 | 2.666667 |
| 3 | 2.666667 |
| 4 | 2.0 |


c. Use the tableInfo action to view the CAS tables in the **Casuser** caslib. Notice the new CAS table **impute_a** contains a single column.

In [None]:
conn.tableInfo(caslib='casuser')

d. Reference the new CAS table **impute_a** in the variable **impTbl** and execute the head method to preview the table. Notice only the imputed column **IMP_A** was saved.

In [None]:
impTbl = conn.CASTable('impute_a', caslib='casuser')
impTbl.head()

e. View the number of missing values in the **IMP_A** column using the nmiss method. Notice all missing values were replaced.

In [None]:
impTbl.nmiss()
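In pandas, the same check is `isna().sum()`. The sketch below assumes the same demonstration data as the **TEST** table. Before imputation, the per-column counts should match the earlier nmiss output (A 5, B 2, C 4, D 2, E 2). After mean imputation, the new column should have no missing values.

```python
import numpy as np
import pandas as pd

# Local copy of the demonstration data (same values as the TEST table).
df = pd.DataFrame(
    [
        [np.nan, 2, 45, 0, "A"],
        [3, 4, np.nan, 1, "A"],
        [np.nan, np.nan, 50, np.nan, "B"],
        [np.nan, 3, np.nan, 4, np.nan],
        [2, 2, np.nan, 0, "A"],
        [3, 4, np.nan, 1, "A"],
        [np.nan, np.nan, 75, np.nan, "B"],
        [np.nan, 3, 60, 4, np.nan],
    ],
    columns=list("ABCDE"),
)

# Missing values per column before imputation (the nmiss equivalent).
before = df.isna().sum()

# Mean-impute column A and count what is still missing afterwards.
imp_a = df["A"].fillna(df["A"].mean())
after = int(imp_a.isna().sum())

print(before.to_dict())
print("IMP_A missing:", after)
```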

f. To copy all columns when using the impute action, set the copyAllVars parameter to *True*. Execute the impute action and preview the created CAS table. Notice the new table contains the imputed column **IMP_A** and all of the original columns.

In [None]:
## Impute column A and copy all original columns to the output table
castbl.impute(input='A',
              copyAllVars=True,
              casout={'name':'impute_a', 
                      'caslib':'casuser', 
                      'replace':True})

impTbl = conn.CASTable('impute_a', caslib='casuser')
impTbl.head()
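The copyAllVars parameter decides whether the output table holds only the imputed column or the imputed column plus all source columns. The local pandas sketch below shows that difference, assuming the same demonstration data. The names `only_imputed` and `all_vars` are illustrative only, and the column order CAS uses for its output table may differ from the pandas order.

```python
import numpy as np
import pandas as pd

# Local copy of the demonstration data (same values as the TEST table).
df = pd.DataFrame(
    [
        [np.nan, 2, 45, 0, "A"],
        [3, 4, np.nan, 1, "A"],
        [np.nan, np.nan, 50, np.nan, "B"],
        [np.nan, 3, np.nan, 4, np.nan],
        [2, 2, np.nan, 0, "A"],
        [3, 4, np.nan, 1, "A"],
        [np.nan, np.nan, 75, np.nan, "B"],
        [np.nan, 3, 60, 4, np.nan],
    ],
    columns=list("ABCDE"),
)

# Mean-impute column A into a Series named IMP_A.
imp_a = df["A"].fillna(df["A"].mean()).rename("IMP_A")

# Without copying variables: only the imputed column (8 rows x 1 column).
only_imputed = imp_a.to_frame()

# copyAllVars=True analogue: every original column plus IMP_A.
all_vars = pd.concat([df, imp_a], axis=1)

print(only_imputed.shape, all_vars.shape)
print(list(all_vars.columns))
```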

### 5. Doing More with the Impute Action

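a. Set the CAS table reference to the **loans_raw** CAS table in the **Casuser** caslib. This step assumes **loans_raw** is already loaded, because it is not created earlier in this demo. Add the where parameter to filter for rows where **Category** is *Credit Card*, the vars parameter to select specific columns, and the computedVars and computedVarsProgram parameters to create a new calculated column, **CCOpenDate**. View the **tbl** object and notice the parameters have been added. Then impute **EmpLength** with the median by setting methodInterval to *MEDIAN*, copy the selected columns with copyVars, and preview the new **cc_imputed** table.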

In [None]:
tbl = conn.CASTable('loans_raw', caslib = 'casuser')

## Add parameters to the CASTable object
tbl.where = 'Category = "Credit Card"'
tbl.vars = ['AccNumber', 'Salary', 'EmpLength', 'Amount', 'CCOpenDate']
tbl.computedVars = [{'name':'CCOpenDate', 'format':'DATE9.'}]
tbl.computedVarsProgram = 'CCOpenDate = mdy(Month, Day, Year)'

## View the CASTable object and notice the added parameters
display(tbl)

## Impute the EmpLength column and create a new CAS table
tbl.impute(input='EmpLength',
           copyVars = tbl.vars,
           methodInterval='MEDIAN',
           casout={'name':'cc_imputed',
                   'caslib':'casuser', 
                   'replace':True})

## Preview the new CAS table
impTbl = conn.CASTable('cc_imputed', caslib='casuser')

display(impTbl.head())
display(conn.tableInfo(caslib = 'casuser'))
display(impTbl.nmiss())
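Each CASTable parameter in this step has a rough pandas counterpart:

- `where` is a boolean row filter.
- `vars` is a column selection.
- `computedVarsProgram` derives a new column.
- `methodInterval='MEDIAN'` fills missing values with the median instead of the mean.

The sketch below runs on a small **hypothetical** stand-in for **loans_raw**. The column names come from the code above, but every value is invented for illustration, so it does not reproduce the CAS output. It also assumes the median is computed after the row filter, which is how a `where` clause on a CASTable behaves.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL stand-in for loans_raw: column names follow the demo code,
# but every value here is invented for illustration.
loans = pd.DataFrame({
    "AccNumber": [101, 102, 103, 104, 105],
    "Category": ["Credit Card", "Auto", "Credit Card", "Credit Card", "Mortgage"],
    "Salary": [52000, 61000, 48000, 75000, 90000],
    "EmpLength": [4.0, np.nan, np.nan, 10.0, 7.0],
    "Amount": [1500, 20000, 800, 3000, 250000],
    "Month": [1, 6, 3, 11, 2],
    "Day": [15, 1, 20, 5, 28],
    "Year": [2020, 2019, 2021, 2018, 2022],
})

# where='Category = "Credit Card"' -> keep only the matching rows
cc = loans.loc[loans["Category"] == "Credit Card"].copy()

# computedVarsProgram 'CCOpenDate = mdy(Month, Day, Year)' -> build a date from its parts
# (pd.to_datetime assembles dates from columns named year, month, day)
cc["CCOpenDate"] = pd.to_datetime(cc[["Year", "Month", "Day"]].rename(columns=str.lower))

# vars=[...] -> keep only the listed columns
cc = cc[["AccNumber", "Salary", "EmpLength", "Amount", "CCOpenDate"]].copy()

# methodInterval='MEDIAN' -> fill missing EmpLength with the median of the filtered rows
median_emp = cc["EmpLength"].median()
cc["IMP_EmpLength"] = cc["EmpLength"].fillna(median_emp)

# The DATE9. format displays dates as ddMONyyyy, for example 15JAN2020
cc_display = cc.assign(CCOpenDate=cc["CCOpenDate"].dt.strftime("%d%b%Y").str.upper())
print(cc_display)
```

The median is less sensitive to extreme values than the mean, which is the usual reason to choose it for skewed columns such as employment length.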

### 6. Terminate the CAS Session

a. It's best practice to always terminate the CAS session when you are done.

In [48]:
conn.terminate()