# Update rows in a table
[Getting Started with Python Integration to SAS® Viya® - Part 18 - Update rows in a table](https://blogs.sas.com/content/sgf/2023/04/19/getting-started-with-python-integration-to-sas-viya-part-18-update-rows-in-a-table/)

In [1]:
## Packages
import swat
import os
import pandas as pd
pd.set_option('display.max_columns', 50)
import numpy as np


## custom personal module to connect to my CAS environment
try:
    from casConnect import connect_to_cas
except ImportError:
    print('casConnect module not available')

## Make a Connection to CAS (REQUIRED: MODIFY CONNECTION INFORMATION)

##### To connect to the CAS server you will need:
1. the host name,
2. the port number, and
3. your user name and password.

Visit the documentation [Getting Started with SAS® Viya® for Python](https://go.documentation.sas.com/doc/en/pgmsascdc/default/caspg3/titlepage.htm) for more information about connecting to CAS.

**Be aware that connecting to the CAS server can be implemented in various ways, so you might need to see your system administrator about how to make a connection. Please follow company policy regarding authentication.**
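
As a hedged illustration of collecting the values listed above, the sketch below reads them from environment variables and returns keyword arguments that could be passed to `swat.CAS`. The variable names `CASHOST`, `CASPORT`, and `SAS_VIYA_TOKEN` follow the commented SAS Viya for Learners example in the next cell; the helper `cas_connection_kwargs` is hypothetical, and your site may require a different authentication method.

```python
import os


def cas_connection_kwargs(env=os.environ):
    """Collect CAS connection settings from environment variables.

    Hypothetical helper: the variable names are assumptions taken from
    the commented SAS Viya for Learners example in this notebook.
    """
    required = {
        'hostname': 'CASHOST',
        'port': 'CASPORT',
        'password': 'SAS_VIYA_TOKEN',
    }
    # Fail early with a readable message if anything is unset or empty
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    kwargs = {key: env[name] for key, name in required.items()}
    kwargs['port'] = int(kwargs['port'])  # environment values are strings
    return kwargs


# Usage (requires the swat package and a reachable CAS server):
# conn = swat.CAS(**cas_connection_kwargs())
```

Keeping credentials in the environment rather than in the notebook avoids committing them alongside the code.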

In [2]:
##
## Connect to CAS
##

## General connection syntax
# conn = swat.CAS(host, port, username, password)

## SAS Viya for Learners 3.5 connection
# hostValue = os.environ.get('CASHOST')
# portValue = os.environ.get('CASPORT')
# passwordToken=os.environ.get('SAS_VIYA_TOKEN')
# conn = swat.CAS(hostname=hostValue, port=portValue, password=passwordToken)

## Personal connection
try:
    conn = connect_to_cas()
    print('CAS connection successful')
    print(conn)
except Exception:
    print('No connection')

CAS connection successful
CAS('ssemonthly.demo.sas.com', 443, protocol='https', name='py-session-1', session='1207df8d-3d86-d34a-aac6-9f222115242a')


## Create demo CAS table

In [3]:
## Load the RAND_RETAILDEMO.sashdat file into memory on the CAS server
conn.loadTable(path = 'RAND_RETAILDEMO.sashdat', caslib = 'samples',
               casout = {
                      'name' : 'rand_retaildemo',
                      'caslib' : 'casuser',
                      'replace' : True
               })

## Reference the CAS table
retailTbl = conn.CASTable('rand_retaildemo', caslib = 'casuser')

## Replace the table with a version that includes a duplicate of the age column
(retailTbl
 .eval("age_dup = age", inplace = False)          ## create a duplicate of the age column
 .partition(casout = {'name':'rand_retaildemo',
                      'caslib':'casuser',
                      'replace':True})
)


## Create a list of columns to rename 
newColNames = [{'name':col,'rename':col.lower()} for col in retailTbl.columns.to_list()]

## List of columns to keep
keepColumns = ['custid','bucket','age','age_dup','loyalty_card','brand_name','channeltype','class']

## Rename and keep columns
retailTbl.alterTable(columns = newColNames, 
                     keep = keepColumns)

## Preview the new CAS table
display(retailTbl.shape, 
        retailTbl.tableDetails(),
        retailTbl.tableInfo(caslib = 'casuser'),
        retailTbl.head())

NOTE: Cloud Analytic Services made the file RAND_RETAILDEMO.sashdat available as table RAND_RETAILDEMO in caslib CASUSER(Peter.Styliadis@sas.com).


(930046, 8)

Unnamed: 0,Node,Blocks,Active,Rows,IndexSize,DataSize,VardataSize,CompressedSize,CompressionRatio,Mapped,MappedMemory,Unmapped,UnmappedMemory,Allocated,AllocatedMemory,DeletedRows,TableLocation
0,ALL,802,401,930046,0,372018400,0,0,0,401,372064848,401,372064848,0,0,0,CAS


Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,ModTime,AccessTime,Global,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,RAND_RETAILDEMO,930046,8,0,utf-8,2023-05-09T18:20:43+00:00,2023-05-09T18:20:44+00:00,2023-05-09T18:20:48+00:00,UTF8,1999276000.0,1999276000.0,1999276000.0,0,0,0,0,,,0,Peter.Styliadis@sas.com,,,


Unnamed: 0,custid,bucket,age,age_dup,loyalty_card,brand_name,channeltype,class
0,4940875.0,1.0,40.0,40.0,1.0,Pine,Internet,kids_hats
1,4940875.0,1.0,40.0,40.0,1.0,Pine,Internet,kids_outerwear
2,4940875.0,2.0,40.0,40.0,1.0,Pine,Internet,bath & body
3,4940875.0,2.0,40.0,40.0,1.0,Pine,Internet,vitamins
4,4940985.0,2.0,,,0.0,Pine,Internet,computers
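
The rename-and-keep step in the cell above passes `alterTable` one dictionary per column plus a list of columns to keep. The sketch below rebuilds that list in plain Python and checks that every kept column exists after renaming. The helpers `build_rename_list` and `missing_keep_columns` are hypothetical, and the sample column names are illustrative rather than the table's exact original casing.

```python
def build_rename_list(columns):
    """Build the rename specification: one dict per column, lowercased."""
    return [{'name': col, 'rename': col.lower()} for col in columns]


def missing_keep_columns(columns, keep):
    """Return the names in `keep` that will not exist after renaming."""
    renamed = {col.lower() for col in columns}
    return [col for col in keep if col not in renamed]


# Illustrative column names (assumed, not read from the CAS table)
sample_columns = ['CustID', 'bucket', 'age', 'AGE_DUP', 'Sales']
print(build_rename_list(sample_columns)[:2])
print(missing_keep_columns(sample_columns, ['custid', 'age', 'class']))
```

Checking the keep list before calling the action can catch a misspelled column name before the server reports an error.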


## Simple column updates in place

In [4]:
retailTbl.update(set = [
    {'var':'brand_name', 'value':'upcase(brand_name)'},
    {'var':'channeltype', 'value':'lowcase(channeltype)'},
    {'var':'class', 'value':'propcase(class)'}
])

In [5]:
retailTbl.head()

Unnamed: 0,custid,bucket,age,age_dup,loyalty_card,brand_name,channeltype,class
0,4940875.0,1.0,40.0,40.0,1.0,PINE,internet,Kids_hats
1,4940875.0,1.0,40.0,40.0,1.0,PINE,internet,Kids_outerwear
2,4940875.0,2.0,40.0,40.0,1.0,PINE,internet,Bath & Body
3,4940875.0,2.0,40.0,40.0,1.0,PINE,internet,Vitamins
4,4940985.0,2.0,,,0.0,PINE,internet,Computers
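
Each entry in the `set` parameter pairs a column (`var`) with an expression string (`value`) that the CAS server evaluates for every row, so the Python side only assembles text. As a minimal sketch, the hypothetical helper `build_update_set` below generates the same list from a mapping of column names to the function names used above.

```python
def build_update_set(transforms):
    """Turn {column: function_name} into the list used by update(set=...).

    Hypothetical helper: it only builds strings; the expressions
    themselves are evaluated by the CAS server, not by Python.
    """
    return [
        {'var': col, 'value': f'{func}({col})'}
        for col, func in transforms.items()
    ]


update_set = build_update_set({
    'brand_name': 'upcase',
    'channeltype': 'lowcase',
    'class': 'propcase',
})
print(update_set)

# Usage (requires a live CAS table reference):
# retailTbl.update(set=update_set)
```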


## Update a column based on a condition

In [6]:
retailTbl.distinct(inputs = ['age', 'age_dup'])

Unnamed: 0,Column,NDistinct,NMiss,Trunc
0,age,124.0,673447.0,0.0
1,age_dup,124.0,673447.0,0.0


Get the mean of the age column, rounded to three decimal places.

In [7]:
meanAge = retailTbl.age.mean().round(3)
meanAge

43.577

In [8]:
(retailTbl
 .query("age is null")
 .update(set = [
     {'var':'age', 'value':f'{meanAge}'}])
)

### Confirm no missing values exist in age

In [9]:
retailTbl.distinct(inputs = ['age', 'age_dup'])

Unnamed: 0,Column,NDistinct,NMiss,Trunc
0,age,124.0,0.0,0.0
1,age_dup,124.0,673447.0,0.0


Notice that all 673,447 previously missing values are now the mean age (43.577).

In [10]:
(retailTbl
 .age
 .value_counts()
)

43.577     673447
19.000       6996
23.000       6944
24.000       6941
21.000       6882
            ...  
97.000         26
98.000         25
94.000         21
105.000        20
140.000        18
Length: 124, dtype: int64
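
The filter-then-update pattern above amounts to mean imputation: compute the mean of the non-missing values, then write it only to the rows where the value is missing. The plain-Python analogy below illustrates that logic on a small list, using `None` in place of a SAS missing value. It is not SWAT code, and `impute_mean` is a hypothetical name.

```python
from statistics import mean


def impute_mean(values, ndigits=3):
    """Replace None entries with the rounded mean of the present values."""
    present = [v for v in values if v is not None]
    fill = round(mean(present), ndigits)
    # Only the missing rows change; existing values are left untouched
    return [fill if v is None else v for v in values], fill


ages = [40.0, None, 20.0, None]
imputed, fill_value = impute_mean(ages)
print(imputed, fill_value)
```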

## Update rows using conditional logic

In [11]:
(retailTbl
 .update(set = [
     {'var':'age_dup', 'value':f'ifn(age_dup = . , {meanAge}, age_dup)'}])
)

### Confirm no missing values exist in age_dup

In [12]:
retailTbl.distinct(inputs = ['age', 'age_dup'])

Unnamed: 0,Column,NDistinct,NMiss,Trunc
0,age,124.0,0.0,0.0
1,age_dup,124.0,0.0,0.0


In [13]:
(retailTbl
 .age_dup
 .value_counts()
)

43.577     673447
19.000       6996
23.000       6944
24.000       6941
21.000       6882
            ...  
97.000         26
98.000         25
94.000         21
105.000        20
140.000        18
Length: 124, dtype: int64
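
Unlike the filter-then-update approach used for `age`, the `ifn` version touches every row and lets the expression decide whether the value changes. The sketch below mimics that row-wise logic in plain Python and rebuilds the expression string sent to the server. The Python `ifn` here is a simplified stand-in for the three-argument form used above, `None` stands in for a SAS missing value, and both helper names are hypothetical.

```python
def ifn(condition, value_if_true, value_if_false):
    """Simplified stand-in for the three-argument ifn call used above."""
    return value_if_true if condition else value_if_false


def fill_missing(values, fill):
    """Apply the conditional to every row, as the update does."""
    return [ifn(v is None, fill, v) for v in values]


def ifn_fill_expression(column, fill):
    """Build the expression string passed as 'value' in update(set=...)."""
    return f'ifn({column} = . , {fill}, {column})'


print(fill_missing([40.0, None, 19.0], 43.577))
print(ifn_fill_expression('age_dup', 43.577))
```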

## Save the CAS table as a data source file

In [14]:
retailTbl.save(name = 'retail_clean.parquet', caslib = 'casuser')

NOTE: Cloud Analytic Services saved the file retail_clean.parquet in caslib CASUSER(Peter.Styliadis@sas.com).


## Delete the source file

In [15]:
conn.deleteSource(source = 'retail_clean.parquet', caslib = 'casuser')

NOTE: Cloud Analytic Services removed the source data retail_clean.parquet from caslib CASUSER(Peter.Styliadis@sas.com).


## Terminate the CAS connection

In [16]:
conn.terminate()