# Descriptive Statistics
Companion notebook for the [Getting Started with Python Integration to SAS® Viya® - Part 6 - Descriptive Statistics](https://blogs.sas.com/content/sgf/2022/01/07/getting-started-with-python-integration-sas-viya-part-6-descriptive-statistics/) blog post.

## Import Packages
See the documentation for the [SWAT (SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

```python
import swat
import pandas as pd

## Custom personal module to connect to my CAS server environment (not part of SWAT)
from casConnect import connect_to_cas
```

## Make a Connection to CAS

##### To connect to the CAS server you will need:
1. the host name,
2. the port number, and
3. your user name and password.

See [Getting Started with SAS® Viya® for Python](https://go.documentation.sas.com/doc/en/pgmsascdc/default/caspg3/titlepage.htm) for more information about connecting to CAS.

**Connections to the CAS server can be configured in various ways, so you might need to ask your system administrator how to connect. Follow your company's policy regarding authentication.**
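The commented-out SAS Viya for Learners connection in the next cell reads its settings from environment variables instead of hard-coding them. As a minimal sketch of that pattern, assuming the same `CASHOST`, `CASPORT`, and `SAS_VIYA_TOKEN` variable names, a hypothetical helper `cas_connection_kwargs` can collect them into keyword arguments for `swat.CAS`. Your site may use different variable names or a different authentication method.

```python
import os
from typing import Mapping, Optional


def cas_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: build swat.CAS keyword arguments from environment variables.

    Assumes the CASHOST / CASPORT / SAS_VIYA_TOKEN names used in the
    SAS Viya for Learners 3.5 example; adjust them for your environment.
    """
    env = os.environ if env is None else env
    host = env.get("CASHOST")
    port = env.get("CASPORT")
    if not host or not port:
        raise KeyError("CASHOST and CASPORT must both be set")
    kwargs = {"hostname": host, "port": int(port)}
    token = env.get("SAS_VIYA_TOKEN")
    if token:
        # Passed as the password, as in the Viya for Learners example
        kwargs["password"] = token
    return kwargs


# Usage (requires the swat package and a reachable CAS server):
# import swat
# conn = swat.CAS(**cas_connection_kwargs())
```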

```python
##
## Connect to CAS
##

## General connection syntax
# conn = swat.CAS(host, port, username, password)

## SAS Viya for Learners 3.5 connection
# import os
# hostValue = os.environ.get('CASHOST')
# portValue = os.environ.get('CASPORT')
# passwordToken = os.environ.get('SAS_VIYA_TOKEN')
# conn = swat.CAS(hostname=hostValue, port=portValue, password=passwordToken)

## Personal connection using my custom module
conn = connect_to_cas()

## Load demo data into the CAS server
conn.read_csv('https://support.sas.com/documentation/onlinedoc/viya/exampledatasets/cars.csv',
              casout={'name': 'cars', 'caslib': 'casuser'})

type(conn)
```

```
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter.Styliadis@sas.com).
NOTE: The table CARS has been created in caslib CASUSER(Peter.Styliadis@sas.com) from binary data uploaded to Cloud Analytic Services.

swat.cas.connection.CAS
```

```python
tbl = conn.CASTable('cars', caslib='casuser')
print(tbl)
```

```
CASTable('cars', caslib='casuser')
```


## Preview the CAS Table

```python
tbl.head()
```

| | Make | Model | Type | Origin | DriveTrain | MSRP | Invoice | EngineSize | Cylinders | Horsepower | MPG_City | MPG_Highway | Weight | Wheelbase | Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acura | MDX | SUV | Asia | All | 36945.0 | 33337.0 | 3.5 | 6.0 | 265.0 | 17.0 | 23.0 | 4451.0 | 106.0 | 189.0 |
| 1 | Acura | RSX Type S 2dr | Sedan | Asia | Front | 23820.0 | 21761.0 | 2.0 | 4.0 | 200.0 | 24.0 | 31.0 | 2778.0 | 101.0 | 172.0 |
| 2 | Acura | TSX 4dr | Sedan | Asia | Front | 26990.0 | 24647.0 | 2.4 | 4.0 | 200.0 | 22.0 | 29.0 | 3230.0 | 105.0 | 183.0 |
| 3 | Acura | TL 4dr | Sedan | Asia | Front | 33195.0 | 30299.0 | 3.2 | 6.0 | 270.0 | 20.0 | 28.0 | 3575.0 | 108.0 | 186.0 |
| 4 | Acura | 3.5 RL 4dr | Sedan | Asia | Front | 43755.0 | 39014.0 | 3.5 | 6.0 | 225.0 | 18.0 | 24.0 | 3880.0 | 115.0 | 197.0 |


## The Describe Method

```python
tbl.describe()
```

| | MSRP | Invoice | EngineSize | Cylinders | Horsepower | MPG_City | MPG_Highway | Weight | Wheelbase | Length |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 428.0 | 428.0 | 428.0 | 426.0 | 428.0 | 428.0 | 428.0 | 428.0 | 428.0 | 428.0 |
| mean | 32774.85514 | 30014.700935 | 3.196729 | 5.807512 | 215.885514 | 20.060748 | 26.843458 | 3577.953271 | 108.154206 | 186.36215 |
| std | 19431.716674 | 17642.11775 | 1.108595 | 1.558443 | 71.836032 | 5.238218 | 5.741201 | 758.983215 | 8.311813 | 14.357991 |
| min | 10280.0 | 9875.0 | 1.3 | 3.0 | 73.0 | 10.0 | 12.0 | 1850.0 | 89.0 | 143.0 |
| 25% | 20329.5 | 18851.0 | 2.35 | 4.0 | 165.0 | 17.0 | 24.0 | 3103.0 | 103.0 | 178.0 |
| 50% | 27635.0 | 25294.5 | 3.0 | 6.0 | 210.0 | 19.0 | 26.0 | 3474.5 | 107.0 | 187.0 |
| 75% | 39215.0 | 35732.5 | 3.9 | 6.0 | 255.0 | 21.5 | 29.0 | 3978.5 | 112.0 | 194.0 |
| max | 192465.0 | 173560.0 | 8.3 | 12.0 | 500.0 | 60.0 | 66.0 | 7190.0 | 144.0 | 238.0 |
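SWAT models `CASTable.describe` on the pandas `DataFrame.describe` method, so the row labels (count, mean, std, min, 25%, 50%, 75%, max) carry the same meaning. For a local point of comparison, this sketch runs pandas' own `describe` on a few columns of the five preview rows returned by `tbl.head()`; it is a tiny sample, not the full CARS table. The CAS percentiles are computed on the server and may not use the same interpolation rule as pandas.

```python
import pandas as pd

# A few columns from the five rows displayed by tbl.head()
preview = pd.DataFrame({
    "MSRP": [36945.0, 23820.0, 26990.0, 33195.0, 43755.0],
    "Horsepower": [265.0, 200.0, 200.0, 270.0, 225.0],
    "MPG_City": [17.0, 24.0, 22.0, 20.0, 18.0],
})

# Same row layout as the CAS output: count, mean, std, min, 25%, 50%, 75%, max
local_desc = preview.describe()
print(local_desc)
```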


## Summary CAS Action

```python
tbl.summary()
```

| | Column | Min | Max | N | NMiss | Mean | Sum | Std | StdErr | Var | USS | CSS | CV | TValue | ProbT | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MSRP | 10280.0 | 192465.0 | 428.0 | 0.0 | 32774.85514 | 14027638.0 | 19431.716674 | 939.267478 | 377591600.0 | 620985400000.0 | 161231600000.0 | 59.28849 | 34.894059 | 4.160412e-127 | 2.798099 | 13.879206 |
| 1 | Invoice | 9875.0 | 173560.0 | 428.0 | 0.0 | 30014.700935 | 12846292.0 | 17642.11775 | 852.763949 | 311244300.0 | 518478900000.0 | 132901300000.0 | 58.778256 | 35.196963 | 2.684398e-128 | 2.83474 | 13.946164 |
| 2 | EngineSize | 1.3 | 8.3 | 428.0 | 0.0 | 3.196729 | 1368.2 | 1.108595 | 0.053586 | 1.228982 | 4898.54 | 524.7754 | 34.679034 | 59.656105 | 3.133745e-209 | 0.708152 | 0.541944 |
| 3 | Cylinders | 3.0 | 12.0 | 426.0 | 2.0 | 5.807512 | 2474.0 | 1.558443 | 0.075507 | 2.428743 | 15400.0 | 1032.216 | 26.834946 | 76.913766 | 1.515569e-251 | 0.592785 | 0.440378 |
| 4 | Horsepower | 73.0 | 500.0 | 428.0 | 0.0 | 215.885514 | 92399.0 | 71.836032 | 3.472326 | 5160.415 | 22151100.0 | 2203497.0 | 33.275059 | 62.173176 | 4.185344e-216 | 0.930331 | 1.552159 |
| 5 | MPG_City | 10.0 | 60.0 | 428.0 | 0.0 | 20.060748 | 8586.0 | 5.238218 | 0.253199 | 27.43892 | 183958.0 | 11716.42 | 26.111777 | 79.229235 | 1.866284e-257 | 2.782072 | 15.791147 |
| 6 | MPG_Highway | 12.0 | 66.0 | 428.0 | 0.0 | 26.843458 | 11489.0 | 5.741201 | 0.277511 | 32.96139 | 322479.0 | 14074.51 | 21.387709 | 96.729204 | 1.665621e-292 | 1.252395 | 6.045611 |
| 7 | Weight | 1850.0 | 7190.0 | 428.0 | 0.0 | 3577.953271 | 1531364.0 | 758.983215 | 36.686838 | 576055.5 | 5725125000.0 | 245975700.0 | 21.212776 | 97.52689 | 5.8125469999999994e-294 | 0.891824 | 1.688789 |
| 8 | Wheelbase | 89.0 | 144.0 | 428.0 | 0.0 | 108.154206 | 46290.0 | 8.311813 | 0.401767 | 69.08624 | 5035958.0 | 29499.82 | 7.68515 | 269.196577 | 0.0 | 0.962287 | 2.133649 |
| 9 | Length | 143.0 | 238.0 | 428.0 | 0.0 | 186.36215 | 79763.0 | 14.357991 | 0.69402 | 206.1519 | 14952830.0 | 88026.87 | 7.704349 | 268.525733 | 0.0 | 0.181977 | 0.614725 |
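Several summary columns are simple functions of one another: `Var = CSS / (N - 1)`, `Std = sqrt(Var)`, `StdErr = Std / sqrt(N)`, `CV = 100 * Std / Mean`, and `TValue = Mean / StdErr`. For example, the MSRP row gives 19431.716674 / sqrt(428) ≈ 939.27, its `StdErr`. This sketch recomputes those columns in pandas for one column of values, using the `MPG_City` values from the five preview rows. The function name `summary_columns` is hypothetical, and it omits `ProbT`, `Skewness`, and `Kurtosis`.

```python
import math

import pandas as pd


def summary_columns(values: pd.Series) -> dict:
    """Hypothetical helper: recompute a subset of the summary action's columns."""
    x = values.dropna()                    # N counts nonmissing values only
    n = len(x)
    mean = x.mean()
    uss = float((x ** 2).sum())            # uncorrected sum of squares
    css = float(((x - mean) ** 2).sum())   # corrected sum of squares
    var = css / (n - 1)
    std = math.sqrt(var)
    stderr = std / math.sqrt(n)
    return {
        "Min": float(x.min()), "Max": float(x.max()),
        "N": n, "NMiss": int(values.isna().sum()),
        "Mean": float(mean), "Sum": float(x.sum()),
        "Std": std, "StdErr": stderr, "Var": var,
        "USS": uss, "CSS": css,
        "CV": 100 * std / mean,
        "TValue": mean / stderr,
    }


# MPG_City values from the five preview rows
stats = summary_columns(pd.Series([17.0, 24.0, 22.0, 20.0, 18.0]))
print(stats)
```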


## Selecting Columns and Summary Statistics with the Summary Action

```python
tbl.summary(inputs=['MPG_City', 'MPG_Highway'])
```

| | Column | Min | Max | N | NMiss | Mean | Sum | Std | StdErr | Var | USS | CSS | CV | TValue | ProbT | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MPG_City | 10.0 | 60.0 | 428.0 | 0.0 | 20.060748 | 8586.0 | 5.238218 | 0.253199 | 27.438924 | 183958.0 | 11716.420561 | 26.111777 | 79.229235 | 1.866284e-257 | 2.782072 | 15.791147 |
| 1 | MPG_Highway | 12.0 | 66.0 | 428.0 | 0.0 | 26.843458 | 11489.0 | 5.741201 | 0.277511 | 32.961386 | 322479.0 | 14074.511682 | 21.387709 | 96.729204 | 1.665621e-292 | 1.252395 | 6.045611 |


```python
tbl.summary(inputs=['MPG_City', 'MPG_Highway'],
            subSet=['mean', 'min', 'max'])
```

| | Column | Min | Max | Mean |
|---|---|---|---|---|
| 0 | MPG_City | 10.0 | 60.0 | 20.060748 |
| 1 | MPG_Highway | 12.0 | 66.0 | 26.843458 |
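For readers more familiar with pandas, a rough local analogue of choosing `inputs` and `subSet` is to select columns and pass a list of statistics to `DataFrame.agg`, then transpose so each input column becomes a row, as in the CAS output. This sketch uses the MPG values from the five preview rows; it is a comparison only, not how SWAT executes the summary action.

```python
import pandas as pd

# MPG values from the five rows shown by tbl.head()
preview = pd.DataFrame({
    "MPG_City": [17.0, 24.0, 22.0, 20.0, 18.0],
    "MPG_Highway": [23.0, 31.0, 29.0, 28.0, 24.0],
})

# Select columns (like inputs=) and statistics (like subSet=), one row per column
local_summary = preview[["MPG_City", "MPG_Highway"]].agg(["min", "max", "mean"]).T
local_summary.columns = ["Min", "Max", "Mean"]
print(local_summary)
```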


## Creating a Calculated Column

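```python
# computedVarsProgram holds a SAS DATA step statement that CAS evaluates to create MPG_Avg on the fly
tbl.computedVarsProgram = 'MPG_Avg = mean(MPG_City, MPG_Highway);'
tbl.summary(inputs=['MPG_City', 'MPG_Highway', 'MPG_Avg'],
            subSet=['mean', 'min', 'max'])
```

| | Column | Min | Max | Mean |
|---|---|---|---|---|
| 0 | MPG_City | 10.0 | 60.0 | 20.060748 |
| 1 | MPG_Highway | 12.0 | 66.0 | 26.843458 |
| 2 | MPG_Avg | 11.0 | 63.0 | 23.452103 |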
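The SAS `MEAN` function averages its nonmissing arguments row by row, so `MPG_Avg` is the per-car average of city and highway mileage. Because neither MPG column has missing values, the mean of `MPG_Avg` equals the average of the two column means: (20.060748 + 26.843458) / 2 = 23.452103. A pandas sketch of the same row-wise calculation on the five preview rows uses `DataFrame.mean(axis=1)`, which also skips missing values by default; it is an illustration, not what CAS runs.

```python
import pandas as pd

preview = pd.DataFrame({
    "MPG_City": [17.0, 24.0, 22.0, 20.0, 18.0],
    "MPG_Highway": [23.0, 31.0, 29.0, 28.0, 24.0],
})

# Row-wise mean that ignores NaN, similar in spirit to MPG_Avg = mean(MPG_City, MPG_Highway);
preview["MPG_Avg"] = preview[["MPG_City", "MPG_Highway"]].mean(axis=1)
print(preview["MPG_Avg"].agg(["min", "max", "mean"]))
```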


## Terminate the CAS Connection

```python
conn.terminate()
```
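If a cell raises an error between connecting and calling `conn.terminate()`, the CAS session can be left open. One way to guarantee cleanup is a small context manager built with `contextlib`. This is a sketch under the assumption that `connect` is any zero-argument function returning an object with a `terminate()` method, such as the author's `connect_to_cas`; the name `cas_session` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def cas_session(connect):
    """Hypothetical helper: open a connection and always terminate it."""
    conn = connect()
    try:
        yield conn
    finally:
        # Runs on normal exit and when the with-block raises
        conn.terminate()


# Usage with the notebook's objects (requires a CAS server):
# with cas_session(connect_to_cas) as conn:
#     tbl = conn.CASTable('cars', caslib='casuser')
#     print(tbl.summary())
```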