# DEMO: Imputing Missing Values

### 1. Import Packages and Connect to the CAS Server

Visit the documentation for the SWAT [(SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

In [None]:
## Import packages
import swat
import pandas as pd
import matplotlib.pyplot as plt

## Set options
pd.set_option('display.max_columns', None)

## Connect to CAS
conn = swat.CAS('server.demo.sas.com', 30571, 'student', 'Metadata0', name='py03d07')

## Function to load the loans_raw table into memory if necessary
def loadloans():
    conn.loadTable(path='loans_raw.sashdat', caslib='casuser',
                   casOut={'name':'loans_raw',
                           'caslib':'casuser',
                           'promote':True})

### 2. Explore Available CAS Tables and Data Source Files


a. Use the **tableInfo** action to view all available in-memory tables in the casuser caslib. If the **loans_raw** CAS table is not available, uncomment the statement and execute the loadloans function.

In [None]:
#loadloans()
conn.tableInfo(caslib='casuser')

b. Reference the **loans_raw** CAS table.

In [None]:
tbl = conn.CASTable('loans_raw', caslib='casuser')

### 3. Quick Exploration

a. Preview the **loans_raw** table using the head method.

In [None]:
tbl.head()

b. View the missing values in the **loans_raw** CAS table using the nmiss method. Notice the **EmpLength** column contains missing values.

In [None]:
tbl.nmiss()
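To make explicit what a per-column missing count reports, here is a minimal plain-Python sketch that runs locally without a CAS session. The sample rows and the helper `count_missing` are hypothetical stand-ins (they are not part of SWAT), and `None` plays the role of a missing value.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"AccNumber": 1, "Salary": 52000, "EmpLength": 4},
    {"AccNumber": 2, "Salary": 61000, "EmpLength": None},
    {"AccNumber": 3, "Salary": 48000, "EmpLength": 10},
    {"AccNumber": 4, "Salary": 75000, "EmpLength": None},
]

def count_missing(rows):
    """Return a dict mapping each column name to its count of missing values."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + (1 if value is None else 0)
    return counts

missing = count_missing(rows)
print(missing)
```

In this toy data only the EmpLength column has a nonzero count, which mirrors the pattern the step above asks you to look for.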

### 4. Impute Missing Values

a. Use the [dataPreprocess.impute](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-datapreprocess-impute.htm) action to impute missing values in the **EmpLength** column using the default parameters of the action. Notice CAS returns information on how the column was imputed. Here the default imputation technique is the mean, and the imputed values go into a new column named **IMP_EmpLength**.

In [None]:
tbl.impute(input='EmpLength')
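As a rough illustration of what mean imputation does, the following local sketch fills missing entries with the mean of the nonmissing values and writes the result to a separate list, leaving the input untouched. It assumes the mean is taken over nonmissing values only; the data and the helper `impute_mean` are hypothetical and are not how the CAS action is implemented.

```python
from statistics import mean

# Hypothetical EmpLength values; None marks a missing value.
emp_length = [4, None, 10, None, 7]

def impute_mean(values):
    """Return (imputed copy, fill value); the input list is not modified."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # mean of the nonmissing values only
    return [fill if v is None else v for v in values], fill

imp_emp_length, fill_value = impute_mean(emp_length)

print(fill_value)
print(imp_emp_length)
print(emp_length)  # the original values still contain None
```

The imputed values live in a new list, analogous to the separate **IMP_EmpLength** column, while the original values keep their missing entries.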

b. Check whether the missing values were replaced in the original table. Notice the original CAS table has not been modified. That is because, without an output table, the impute action only reports how the column is imputed. No CAS table was created.

In [None]:
tbl.nmiss()

c. To store the imputed values, add the casOut parameter to create a new CAS table. Here a CAS table named **loans_imputed** is created in the **Casuser** caslib. The CAS server returns the imputed column information and the output CAS table information.

In [None]:
tbl.impute(input='EmpLength', 
           casout={'name':'loans_imputed', 
                   'caslib':'casuser',
                   'replace':True})

d. Use the tableInfo action to view the new CAS table. Notice the new CAS table **loans_imputed** contains a single column.

In [None]:
conn.tableInfo(caslib='casuser')

e. Reference the new CAS table **loans_imputed** in the variable **impTbl** and execute the head method to preview the table. Notice only the imputed column **IMP_EmpLength** was saved.

In [None]:
impTbl = conn.CASTable('loans_imputed', caslib='casuser')
impTbl.head()

f. View the number of missing values in the **IMP_EmpLength** column using the nmiss method. Notice all missing values were replaced.

In [None]:
impTbl.nmiss()

g. To copy all columns when using the impute action, set the copyAllVars parameter to *True*. Execute the impute action and preview the created CAS table. Notice the new table contains the imputed column **IMP_EmpLength** along with all of the original columns.

In [None]:
tbl.impute(input='EmpLength',
           copyAllVars=True,
           casout={'name':'loans_imputed', 
                   'caslib':'casuser', 
                   'replace':True})

impTbl = conn.CASTable('loans_imputed', caslib='casuser')
impTbl.head()
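The difference between writing only the imputed column and copying every column can be sketched locally. In this minimal example the function `impute_to_new_table` and its `copy_all` flag are hypothetical names chosen to echo copyAllVars; the sketch only mimics the shape of the output and is not the CAS implementation.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"AccNumber": 1, "EmpLength": 4},
    {"AccNumber": 2, "EmpLength": None},
    {"AccNumber": 3, "EmpLength": 8},
]

def impute_to_new_table(rows, column, copy_all=False):
    """Build new rows holding IMP_<column>; optionally copy the original columns."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = sum(observed) / len(observed)  # mean of the nonmissing values
    out = []
    for r in rows:
        new_row = dict(r) if copy_all else {}
        new_row["IMP_" + column] = fill if r[column] is None else r[column]
        out.append(new_row)
    return out

only_imputed = impute_to_new_table(rows, "EmpLength")
with_originals = impute_to_new_table(rows, "EmpLength", copy_all=True)

print(list(only_imputed[0]))    # column names without copying
print(list(with_originals[0]))  # column names with copying
```

Either way the input rows are left unchanged; only the contents of the new table differ.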

### 5. Doing More with the Impute Action

a. Set the CAS table reference to the **loans_raw** CAS table. Add the where parameter to filter for rows with the value *Credit Card*, the vars parameter to select specific columns, and the computedVars and computedVarsProgram parameters to create a new calculated column. Then impute **EmpLength** using the median and preview the new CAS table.

In [None]:
tbl = conn.CASTable('loans_raw', caslib = 'casuser')

## Add parameters to the CASTable object
tbl.where = 'Category = "Credit Card"'
tbl.vars = ['AccNumber', 'Salary', 'EmpLength', 'Amount', 'CCOpenDate']
tbl.computedVars = [{'name':'CCOpenDate', 'format':'DATE9.'}]
tbl.computedVarsProgram = 'CCOpenDate = mdy(Month, Day, Year)'

## Impute the EmpLength column and create a new CAS table
tbl.impute(input='EmpLength',
           copyVars = tbl.vars,
           methodInterval='MEDIAN',
           casout={'name':'cc_imputed',
                   'caslib':'casuser', 
                   'replace':True})

## Preview the new CAStable
impTbl = conn.CASTable('cc_imputed', caslib='casuser')

display(impTbl.head())
display(conn.tableInfo(caslib = 'casuser'))
display(impTbl.nmiss())
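The combination used above (filter rows, compute a date column, then impute with the median) can be walked through locally on a few made-up rows. This is only a sketch under stated assumptions: the data is hypothetical, `datetime.date(year, month, day)` stands in for the `mdy(Month, Day, Year)` expression, and the median is assumed to be computed over the nonmissing values of the filtered rows.

```python
from datetime import date
from statistics import median

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"Category": "Credit Card", "EmpLength": 2, "Month": 3, "Day": 15, "Year": 2019},
    {"Category": "Mortgage", "EmpLength": 30, "Month": 1, "Day": 2, "Year": 2010},
    {"Category": "Credit Card", "EmpLength": None, "Month": 7, "Day": 4, "Year": 2020},
    {"Category": "Credit Card", "EmpLength": 5, "Month": 12, "Day": 31, "Year": 2018},
    {"Category": "Credit Card", "EmpLength": 9, "Month": 6, "Day": 1, "Year": 2021},
]

# Step 1: keep only the credit card rows (the role of the where parameter).
credit_card = [r for r in rows if r["Category"] == "Credit Card"]

# Step 2: median of the nonmissing EmpLength values in the filtered rows.
observed = [r["EmpLength"] for r in credit_card if r["EmpLength"] is not None]
fill = median(observed)

# Step 3: add the computed date column and the imputed column to each row.
imputed = [
    {
        **r,
        "CCOpenDate": date(r["Year"], r["Month"], r["Day"]),
        "IMP_EmpLength": fill if r["EmpLength"] is None else r["EmpLength"],
    }
    for r in credit_card
]

for r in imputed:
    print(r["CCOpenDate"].isoformat(), r["IMP_EmpLength"])
```

Because the filter is applied first, the mortgage row's value of 30 does not influence the fill value.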

### 6. Terminate the CAS Session

a. It's best practice to always terminate the CAS session when you are done.

In [None]:
conn.terminate()