# Update rows in a table
[Getting Started with Python Integration to SAS® Viya® - Part 18 - Update rows in a table](https://blogs.sas.com/content/sgf/2023/04/19/getting-started-with-python-integration-to-sas-viya-part-18-update-rows-in-a-table/)

In [1]:
## Packages
import swat
import sys
import os
import pandas as pd
import numpy as np

## My custom package to connect to the CAS Server. Will not work in your environment.
try:
    from casauth import CASAuth
    print('Imported personal custom CAS auth package')
except ImportError:
    print('casauth package not available')


print(f'Python version:{sys.version.split("|")[0]}')
print(f'swat version:{swat.__version__}')
print(f'pandas version:{pd.__version__}')
print(f'numpy version:{np.__version__}')

Imported personal custom CAS auth package
Python version:3.8.16 (default, Mar  2 2023, 03:18:16) [MSC v.1916 64 bit (AMD64)]
swat version:1.13.1
pandas version:1.5.3
numpy version:1.24.3


## Make a Connection to CAS (REQUIRED: MODIFY CONNECTION INFORMATION)

##### To connect to the CAS server you will need:
1. the host name,
2. the port number, and
3. your user name and password.

Visit the documentation [Getting Started with SAS® Viya® for Python](https://go.documentation.sas.com/doc/en/pgmsascdc/default/caspg3/titlepage.htm) for more information about connecting to CAS.

**Be aware that connecting to the CAS server can be implemented in various ways, so you might need to see your system administrator about how to make a connection. Please follow company policy regarding authentication.**
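The connection details listed above can be gathered and checked before any connection attempt is made. The following is a minimal sketch, not the author's method: `CASHOST` and `CASPORT` are the environment variable names used in the Viya for Learners example in this notebook, while `CASUSER`, `CASPASSWORD`, and the helper name `cas_connection_kwargs` are hypothetical and should be adapted to your site's authentication policy.

```python
import os


def cas_connection_kwargs(env=None):
    """Collect keyword arguments for swat.CAS from environment variables.

    CASUSER and CASPASSWORD are hypothetical variable names used only
    for illustration; use whatever your environment actually provides.
    """
    env = os.environ if env is None else env
    required = {
        "hostname": "CASHOST",
        "port": "CASPORT",
        "username": "CASUSER",
        "password": "CASPASSWORD",
    }
    # Fail early with a readable message if any setting is absent or empty
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing connection settings: " + ", ".join(missing))
    kwargs = {key: env[var] for key, var in required.items()}
    kwargs["port"] = int(kwargs["port"])  # environment values are strings
    return kwargs


# Usage (requires the swat package and a reachable CAS server):
# import swat
# conn = swat.CAS(**cas_connection_kwargs())
```

Checking the settings first means a missing value is reported by name instead of surfacing later as a less specific connection error.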

In [3]:
##
## Connect to CAS
##

################################
## General connection syntax  ##
################################
# conn = swat.CAS(host, port, username, password)

############################################
## SAS Viya for Learners 3.5 connection   ##
############################################
# hostValue = os.environ.get('CASHOST')
# portValue = os.environ.get('CASPORT')
# passwordToken=os.environ.get('SAS_VIYA_TOKEN')
# conn = swat.CAS(hostname=hostValue, port=portValue, password=passwordToken)


##############################
## My Personal connection   ##
##############################
try:
    path = os.getenv('CAS_CREDENTIALS')
    pem_file = os.getenv('CAS_CLIENT_SSL_CA_LIST')
    conn = CASAuth(path, ssl_ca_list = pem_file)
except Exception:
    print('No connection')

CAS Connection created


## Enter your connection information to CAS below

In [None]:
## conn = swat.CAS()

## Create demo CAS table

In [7]:
## Load the RAND_RETAILDEMO.sashdat file into memory on the CAS server
conn.loadTable(path = 'RAND_RETAILDEMO.sashdat', caslib = 'samples',
               casout = {
                      'name' : 'rand_retaildemo',
                      'caslib' : 'casuser',
                      'replace' : True
               })

## Reference the CAS table
retailTbl = conn.CASTable('rand_retaildemo', caslib = 'casuser')

## Create a copy of the table with a new column
(retailTbl
 .eval("age_dup = age", inplace = False)          ## create a duplicate of the age column
 .partition(casout = {'name':'rand_retaildemo',
                      'caslib':'casuser',
                      'replace':True})
)


## Create a list of columns to rename 
newColNames = [{'name':col,'rename':col.lower()} for col in retailTbl.columns.to_list()]

## List of columns to keep
keepColumns = ['custid','bucket','age','age_dup','loyalty_card','brand_name','channeltype','class']

## Rename and keep columns
retailTbl.alterTable(columns = newColNames, 
                     keep = keepColumns)

## Preview the new CAS table
display(retailTbl.shape, 
        retailTbl.tableDetails(),
        retailTbl.tableInfo(caslib = 'casuser'),
        retailTbl.head())

NOTE: Cloud Analytic Services made the file RAND_RETAILDEMO.sashdat available as table RAND_RETAILDEMO in caslib CASUSER(Peter.Styliadis@sas.com).


(930046, 8)

Unnamed: 0,Node,Blocks,Active,Rows,IndexSize,DataSize,VardataSize,CompressedSize,CompressionRatio,Mapped,MappedMemory,Unmapped,UnmappedMemory,Allocated,AllocatedMemory,DeletedRows,TableLocation
0,ALL,800,400,930046,0,372018400,0,0,0,400,372064736,400,372064736,0,0,0,CAS


Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,RAND_RETAILDEMO,930046,8,0,utf-8,2023-10-26T12:42:03+00:00,2023-10-26T12:42:04+00:00,2023-10-26T12:42:04+00:00,UTF8,2013943000.0,0,0,0,,,0,Peter.Styliadis@sas.com,,,


Unnamed: 0,custid,bucket,age,age_dup,loyalty_card,brand_name,channeltype,class
0,8750117.0,1.0,28.0,28.0,1.0,Pine,Internet,kids_bookcases
1,8750153.0,1.0,,,0.0,Pine,Internet,kids_outerwear
2,8750199.0,2.0,,,0.0,Pine,Internet,accessories
3,8750229.0,2.0,,,0.0,Pine,Internet,bath & body
4,8750333.0,1.0,,,0.0,Pine,Internet,oral care
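The rename list passed to `alterTable` in the cell above is an ordinary Python list of dictionaries, so its shape can be inspected without a CAS connection. The sketch below uses made-up column names in place of `retailTbl.columns.to_list()`; the helper name `lowercase_renames` is hypothetical.

```python
def lowercase_renames(columns):
    """Build one {'name', 'rename'} entry per column, lowercasing each name."""
    return [{"name": col, "rename": col.lower()} for col in columns]


# Hypothetical column names standing in for retailTbl.columns.to_list()
sample_columns = ["CustID", "Brand_Name", "age"]
renames = lowercase_renames(sample_columns)

for entry in renames:
    print(entry)
```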


## Simple column updates in place

In [8]:
retailTbl.update(set = [
    {'var':'brand_name', 'value':'upcase(brand_name)'},
    {'var':'channeltype', 'value':'lowcase(channeltype)'},
    {'var':'class', 'value':'propcase(class)'}
])

In [9]:
retailTbl.head()

Unnamed: 0,custid,bucket,age,age_dup,loyalty_card,brand_name,channeltype,class
0,8750117.0,1.0,28.0,28.0,1.0,PINE,internet,Kids_bookcases
1,8750153.0,1.0,,,0.0,PINE,internet,Kids_outerwear
2,8750199.0,2.0,,,0.0,PINE,internet,Accessories
3,8750229.0,2.0,,,0.0,PINE,internet,Bath & Body
4,8750333.0,1.0,,,0.0,PINE,internet,Oral Care
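When many columns need updating, the `set` list can be generated from a mapping rather than typed by hand. This is a small sketch under the assumption that each entry only needs the `var` and `value` keys shown in the cell above; `build_update_set` is a hypothetical helper, not part of SWAT.

```python
def build_update_set(expressions):
    """Turn {column: expression} into the list of dicts passed as set=."""
    return [
        {"var": column, "value": expression}
        for column, expression in expressions.items()
    ]


update_set = build_update_set({
    "brand_name": "upcase(brand_name)",
    "channeltype": "lowcase(channeltype)",
    "class": "propcase(class)",
})

# With a live connection this could be passed as:
# retailTbl.update(set=update_set)
```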


## Update a column based on a condition

In [10]:
retailTbl.distinct(inputs = ['age', 'age_dup'])

Unnamed: 0,Column,NDistinct,NMiss,Trunc
0,age,124.0,673447.0,0.0
1,age_dup,124.0,673447.0,0.0


Get the mean of the age column

In [11]:
meanAge = retailTbl.age.mean().round(3)
meanAge

43.577

In [12]:
(retailTbl
 .query("age is null")
 .update(set = [
     {'var':'age', 'value':f'{meanAge}'}])
)
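The filtered update above amounts to mean imputation: only the rows where `age` is missing receive the rounded mean. The logic can be checked locally on a small list. This is an illustrative sketch with made-up values, using `None` for a missing value and assuming the mean is computed over the non-missing values only; it does not touch the CAS table.

```python
from statistics import mean


def impute_missing(values, digits=3):
    """Replace None entries with the rounded mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = round(mean(observed), digits)
    return [fill if v is None else v for v in values], fill


ages = [28.0, None, 51.0, None, 40.0]  # made-up sample, not the retail data
imputed, fill = impute_missing(ages)

print(fill)
print(imputed)
```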

### Confirm no missing values exist in age

In [9]:
retailTbl.distinct(inputs = ['age', 'age_dup'])

Unnamed: 0,Column,NDistinct,NMiss,Trunc
0,age,124.0,0.0,0.0
1,age_dup,124.0,673447.0,0.0


Notice that all 673,447 previously missing values now hold the mean age (43.577).

In [10]:
(retailTbl
 .age
 .value_counts()
)

43.577     673447
19.000       6996
23.000       6944
24.000       6941
21.000       6882
            ...  
97.000         26
98.000         25
94.000         21
105.000        20
140.000        18
Length: 124, dtype: int64

## Update rows using conditional logic

In [11]:
(retailTbl
 .update(set = [
     {'var':'age_dup', 'value':f'ifn(age_dup = . , {meanAge}, age_dup)'}])
)
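The `ifn` expression above selects between two numeric values for every row, so no `query` filter is needed. A rough local stand-in, assuming only the three-argument form used in this notebook and using `None` in place of the SAS missing value (`.`), might look like this. The Python function `ifn` here is a hypothetical emulation, not a SWAT API.

```python
def ifn(condition, value_if_true, value_if_false):
    """Three-argument stand-in for SAS IFN: choose one of two numeric values."""
    return value_if_true if condition else value_if_false


mean_age = 43.577  # the mean computed earlier in this notebook
age_dup = [28.0, None, 61.0]  # made-up rows; None plays the role of "."

updated = [ifn(v is None, mean_age, v) for v in age_dup]
print(updated)
```

Compared with the `query` approach, every row is written, but rows that already have a value are simply assigned their existing value.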

### Confirm no missing values exist in age_dup

In [12]:
retailTbl.distinct(inputs = ['age', 'age_dup'])

Unnamed: 0,Column,NDistinct,NMiss,Trunc
0,age,124.0,0.0,0.0
1,age_dup,124.0,0.0,0.0


In [13]:
(retailTbl
 .age_dup
 .value_counts()
)

43.577     673447
19.000       6996
23.000       6944
24.000       6941
21.000       6882
            ...  
97.000         26
98.000         25
94.000         21
105.000        20
140.000        18
Length: 124, dtype: int64

## Save the CAS table as a data source file

In [14]:
retailTbl.save(name = 'retail_clean.parquet', caslib = 'casuser')

NOTE: Cloud Analytic Services saved the file retail_clean.parquet in caslib CASUSER(Peter.Styliadis@sas.com).


## Delete the source file

In [15]:
conn.deleteSource(source = 'retail_clean.parquet', caslib = 'casuser')

NOTE: Cloud Analytic Services removed the source data retail_clean.parquet from caslib CASUSER(Peter.Styliadis@sas.com).


## Terminate the CAS connection

In [16]:
conn.terminate()