# Group and Aggregate CAS Tables
[Getting Started with Python Integration to SAS® Viya® - Part 10 - Group and Aggregate CAS Tables](https://blogs.sas.com/content/sgf/2022/10/06/getting-started-with-python-integration-to-sas-viya-part-10-group-and-aggregate-cas-tables/) blog post

## Import Packages
See the documentation for the [SWAT (SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

In [1]:
import swat
import pandas as pd

## custom personal module to connect to my CAS server environment
from casConnect import connect_to_cas 

## Make a Connection to CAS (REQUIRED: MODIFY CONNECTION INFORMATION)

##### To connect to the CAS server you will need:
1. the host name,
2. the port number,
3. your user name and password.

Visit the documentation [Getting Started with SAS® Viya® for Python](https://go.documentation.sas.com/doc/en/pgmsascdc/default/caspg3/titlepage.htm) for more information about connecting to CAS.

**Be aware that connecting to the CAS server can be implemented in various ways, so you might need to see your system administrator about how to make a connection. Please follow company policy regarding authentication.**
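One way to keep credentials out of the notebook is to read the connection settings from environment variables. The following is a minimal sketch, not the method used by the `casConnect` module in this notebook. `CASHOST` and `CASPORT` are the variable names used in the connection cell; `CASUSER`, `CASPASSWORD`, and the helper name `cas_connection_params` are hypothetical and introduced here only for illustration.

```python
import os


def cas_connection_params(env=os.environ):
    """Collect CAS connection settings from a mapping of environment variables.

    CASUSER and CASPASSWORD are hypothetical variable names; adjust them to
    whatever your site actually uses.
    """
    # Host and port are always required, so fail early if either is missing
    missing = [key for key in ('CASHOST', 'CASPORT') if not env.get(key)]
    if missing:
        raise ValueError('Missing environment variables: ' + ', '.join(missing))

    return {
        'hostname': env['CASHOST'],
        'port': int(env['CASPORT']),         # environment values are strings
        'username': env.get('CASUSER'),      # None if not set
        'password': env.get('CASPASSWORD'),  # None if not set
    }


# Usage (requires the swat package and a reachable CAS server):
# import swat
# conn = swat.CAS(**cas_connection_params())
```

The helper only builds the keyword arguments, so it can be checked without a server. Whether a user name and password are the right way to authenticate depends on your deployment.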

In [2]:
##
## Connect to CAS
##

## General connection syntax
# conn = swat.CAS(host, port, username, password)

## SAS Viya for Learners 3.5 connection
# import os
# hostValue = os.environ.get('CASHOST')
# portValue = os.environ.get('CASPORT')
# passwordToken = os.environ.get('SAS_VIYA_TOKEN')
# conn = swat.CAS(hostname=hostValue, port=portValue, password=passwordToken)

## Personal connection
try:
    conn = connect_to_cas()
    print('CAS connection successful')
    print(conn)
except Exception as err:
    ## Report the failure reason instead of silently swallowing it
    print(f'No connection: {err}')

CAS connection successful
CAS('ssemonthly.demo.sas.com', 443, protocol='https', name='py-session-1', session='ba763425-88c6-1e43-9a33-8b200370fed5')


## Load and explore data

In [3]:
conn.loadTable(path='WATER_CLUSTER.sashdat', caslib='samples',
               casOut=dict(caslib='casuser'))
 
tbl = conn.CASTable('water_cluster', caslib='casuser')
 
tbl.head()

NOTE: Cloud Analytic Services made the file WATER_CLUSTER.sashdat available as table WATER_CLUSTER in caslib CASUSER(Peter.Styliadis@sas.com).


| | Year | Month | Day | Date | Serial | Property | Address | City | Zip | Lat | Property_type | Meter_Location | Clli | DMA | Weekday | Weekend | Daily_W_C_M3 | Week | US Holiday | CLUSTER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014.0 | 1.0 | 31.0 | 2014-01-31 | 955.0 | 773.0 | 1800 POST OAK BLVD | HOUSTON | 77056.0 | -95.461478 | 0.0 | internal | HSTNTXNA | 1.0 | 6.0 | 0.0 | 4.376 | 4.0 | | 4.0 |
| 1 | 2015.0 | 12.0 | 26.0 | 2015-12-26 | 1076.0 | 879.0 | 1811 E CROSSTIMBERS ST | HOUSTON | 77093.0 | -95.352264 | 0.0 | external | HSTNTXOX | 2.0 | 5.0 | 0.0 | 1.515 | 51.0 | | 4.0 |
| 2 | 2014.0 | 1.0 | 19.0 | 2014-01-19 | 955.0 | 773.0 | 1800 POST OAK BLVD | HOUSTON | 77056.0 | -95.461478 | 0.0 | internal | HSTNTXNA | 1.0 | 1.0 | 1.0 | 1.694 | 3.0 | | 4.0 |
| 3 | 2014.0 | 5.0 | 9.0 | 2014-05-09 | 871.0 | 706.0 | 17575 ALDINE WESTFIELD RD | HOUSTON | 77073.0 | -95.364653 | 0.0 | external | HSTNTXWE | 1.0 | 6.0 | 0.0 | 0.728 | 18.0 | | 4.0 |
| 4 | 2014.0 | 1.0 | 30.0 | 2014-01-30 | 955.0 | 773.0 | 1800 POST OAK BLVD | HOUSTON | 77056.0 | -95.461478 | 0.0 | internal | HSTNTXNA | 1.0 | 5.0 | 0.0 | 3.973 | 4.0 | | 4.0 |


## Using the SWAT groupby method

In [4]:
type(tbl.groupby('Serial')) 

swat.cas.table.CASTableGroupBy

In [5]:
df_serial = (tbl                    ## CAS table reference
             .groupby('Serial')     ## Group the CAS table
             .Daily_W_C_M3          ## Specify the CAS table column to aggregate
             .sum()                 ## Specify the aggregation
)
 
display(df_serial)

Serial
140.0        659.394
141.0      19503.757
198.0     263599.819
541.0       1873.732
542.0         53.159
             ...    
2345.0       294.313
2366.0       327.090
2367.0     12321.214
2440.0       728.555
2451.0       639.289
Name: Daily_W_C_M3, Length: 64, dtype: float64

In [6]:
(tbl                      ## CAS table reference          
 .groupby('Weekend')      ## Group the CAS table
 .Daily_W_C_M3            ## Specify the CAS table column to aggregate
 .mean()                  ## Specify the aggregation
 .rename({0:'Weekday',    ## On the client, rename the index labels of the Series returned from the CAS server
          1:'Weekend'})
)

Weekend
Weekday    8.254251
Weekend    9.438843
Name: Daily_W_C_M3, dtype: float64
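The SWAT chain above follows the same group, select, aggregate pattern as pandas, and the final `rename` runs on an ordinary pandas Series on the client. As a rough local illustration, the sketch below applies the same chain to a small pandas DataFrame. The numbers are made up for the example and are not taken from WATER_CLUSTER, and a pandas DataFrame stands in for the CAS table, so no CAS server is involved.

```python
import pandas as pd

# Made-up illustration data (not values from the WATER_CLUSTER table)
df = pd.DataFrame({
    'Weekend': [0, 0, 1, 1],
    'Daily_W_C_M3': [2.0, 4.0, 3.0, 9.0],
})

weekend_means = (df
    .groupby('Weekend')                    # group rows by the Weekend flag
    .Daily_W_C_M3                          # select the column to aggregate
    .mean()                                # aggregate each group
    .rename({0: 'Weekday', 1: 'Weekend'})  # relabel the index values 0 and 1
)

print(weekend_means)
```

With a CAS table the aggregation is computed on the server and only the small summarized Series is returned to the client, which is why the relabeling step is cheap.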

## Using the CASTable groupby parameter

In [7]:
tbl.params

{'name': 'water_cluster', 'caslib': 'casuser'}

In [8]:
tbl.groupby = 'Weekend'

In [9]:
tbl.params

{'name': 'water_cluster', 'caslib': 'casuser', 'groupby': 'Weekend'}

In [10]:
(tbl                      ## CAS table reference with the groupby parameter
 .Daily_W_C_M3            ## Specify the CAS table column to aggregate
 .mean()                  ## Specify the aggregation
 .rename({0:'Weekday',    ## On the client, rename the index labels of the Series returned from the CAS server
          1:'Weekend'})
)

Weekend
Weekday    8.254251
Weekend    9.438843
Name: Daily_W_C_M3, dtype: float64

## Terminate the CAS Connection

In [11]:
conn.terminate()