# Python Dataframe to CAS Dataset Conversion

### Purpose
This Python notebook shows how to use the rdatasets package in SAS Viya for Learners 4. The process has three parts: <br>
<br>(1) Examine all the datasets available in rdatasets
<br>(2) Load a specific dataset of interest
<br>(3) Convert the pandas DataFrame to a table in CAS

Why do all that work? Once the data is in CAS, you can use it in other SAS Viya tools, such as SAS Visual Analytics or SAS Model Studio. And since you're a coder, it's really not that much work!

In [1]:
# Import the required packages.
import rdatasets
import pandas as pd

### Part 1: Examine all the datasets available in rdatasets
There are over 2,000 of them, so hold on!

In [2]:
rdatasets.summary()

| | Package | Item | Title | Rows | Cols | n_binary | n_character | n_factor | n_logical | n_numeric | CSV | Doc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AER | Affairs | Fair's Extramarital Affairs Data | 601 | 9 | 2 | 0 | 2 | 0 | 7 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| 1 | AER | ArgentinaCPI | Consumer Price Index in Argentina | 80 | 2 | 0 | 0 | 0 | 0 | 2 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| 2 | AER | BankWages | Bank Wages | 474 | 4 | 2 | 0 | 3 | 0 | 1 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| 3 | AER | BenderlyZwick | Benderly and Zwick Data: Inflation, Growth and... | 31 | 5 | 0 | 0 | 0 | 0 | 5 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| 4 | AER | BondYield | Bond Yield Data | 60 | 2 | 0 | 0 | 0 | 0 | 2 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2259 | wooldridge | wage1 | wage1 | 526 | 24 | 16 | 0 | 0 | 0 | 24 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| 2260 | wooldridge | wage2 | wage2 | 935 | 17 | 4 | 0 | 0 | 0 | 17 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| 2261 | wooldridge | wagepan | wagepan | 4360 | 44 | 37 | 0 | 0 | 0 | 44 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
| 2262 | wooldridge | wageprc | wageprc | 286 | 20 | 0 | 0 | 0 | 0 | 20 | https://vincentarelbundock.github.io/Rdatasets... | https://vincentarelbundock.github.io/Rdatasets... |
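With over 2,000 entries, scrolling the summary is impractical. Below is a minimal sketch of narrowing it with ordinary pandas boolean indexing, assuming `rdatasets.summary()` returns a pandas DataFrame with the `Package`, `Item`, `Rows`, and `Cols` columns shown in its output. To keep the example self-contained, it builds a small stand-in frame from a few of the rows printed above instead of calling `rdatasets.summary()`. The helper name `find_datasets` is hypothetical.

```python
import pandas as pd

# Stand-in for rdatasets.summary(): a few rows copied from the output above.
summary = pd.DataFrame(
    {
        "Package": ["AER", "AER", "AER", "wooldridge", "wooldridge"],
        "Item": ["Affairs", "ArgentinaCPI", "BankWages", "wage1", "wagepan"],
        "Rows": [601, 80, 474, 526, 4360],
        "Cols": [9, 2, 4, 24, 44],
    }
)


def find_datasets(df, package=None, min_rows=0, max_cols=None):
    """Hypothetical helper: filter a summary frame by package, size, and width."""
    # Start with the row-count condition, then AND in the optional filters.
    mask = df["Rows"] >= min_rows
    if package is not None:
        mask &= df["Package"] == package
    if max_cols is not None:
        mask &= df["Cols"] <= max_cols
    return df.loc[mask, ["Package", "Item", "Rows", "Cols"]]


# Example: AER datasets with at least 100 rows.
print(find_datasets(summary, package="AER", min_rows=100))
```

In the notebook itself you would pass the real `rdatasets.summary()` result in place of the stand-in frame.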


### Part 2: Load a specific dataset of interest

Confession: I don't spend a lot of time running models on extramarital affairs data. But it's the first dataset in the list, so let's start with that, just for fun.

In [3]:
# Import the data() function, which loads a single dataset as a pandas DataFrame
from rdatasets import data

In [4]:
# Load the "Affairs" dataset from the "AER" package
affairs_data = data(package='AER', item='Affairs')

In [5]:
# Let's check out a sample of the data
print(affairs_data.head())  # Print the first few rows of the dataset

   rownames  affairs  gender   age  yearsmarried children  religiousness  \
0         4        0    male  37.0         10.00       no              3   
1         5        0  female  27.0          4.00       no              4   
2        11        0  female  32.0         15.00      yes              1   
3        16        0    male  57.0         15.00      yes              5   
4        23        0    male  22.0          0.75       no              2   

   education  occupation  rating  
0         18           7       4  
1         14           6       4  
2         12           1       4  
3         18           6       5  
4         17           6       3  
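Before pushing the frame to CAS, it can help to check how pandas typed each column and to decide whether to keep `rownames`, which appears to hold the row labels carried over from the original R data (note the values 4, 5, 11, 16, 23) rather than a real variable. Below is a minimal sketch of that optional cleanup, assuming you want to drop the column. The helper name `prepare_for_cas` is hypothetical, and the stand-in frame reuses a subset of the rows and columns printed above so the example runs without loading anything.

```python
import pandas as pd

# Stand-in for affairs_data: a subset of the columns and rows printed above.
affairs_sample = pd.DataFrame(
    {
        "rownames": [4, 5, 11, 16, 23],
        "affairs": [0, 0, 0, 0, 0],
        "gender": ["male", "female", "female", "male", "male"],
        "age": [37.0, 27.0, 32.0, 57.0, 22.0],
        "children": ["no", "no", "yes", "yes", "no"],
    }
)


def prepare_for_cas(df, drop=("rownames",)):
    """Hypothetical helper: drop R row labels and report column dtypes."""
    # errors="ignore" lets the helper run on frames that lack the column.
    cleaned = df.drop(columns=list(drop), errors="ignore")
    # Show how pandas typed each column before it is uploaded.
    print(cleaned.dtypes)
    return cleaned


prepared = prepare_for_cas(affairs_sample)
```

In the notebook you would call `prepare_for_cas(affairs_data)` and upload the result. Keeping `rownames` is harmless too; it simply becomes one more column in the CAS table.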


### Part 3: Convert the pandas DataFrame to a table in CAS

In [6]:
# Import os (to read environment variables) and swat, the SAS Python interface to CAS
import os
import swat

In [7]:
# Connect to the CAS server on port 5570, using the controller host and access token stored in environment variables
conn = swat.CAS(os.environ['CAS_CONTROLLER'], 5570, password=os.environ['ACCESS_TOKEN'])
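The connection cell assumes the `CAS_CONTROLLER` and `ACCESS_TOKEN` environment variables are already set, as they presumably are in a SAS Viya for Learners session; if one is missing, `os.environ[...]` raises a bare `KeyError`. Below is a minimal sketch of reading them with a clearer error message. The function name `cas_connection_args` is hypothetical, the example host and token values are placeholders, and port 5570 is simply the value used in the cell above.

```python
import os


def cas_connection_args(env=None, port=5570):
    """Hypothetical helper: collect CAS connection settings from the environment."""
    # Default to the real environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    missing = [name for name in ("CAS_CONTROLLER", "ACCESS_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["CAS_CONTROLLER"], port, env["ACCESS_TOKEN"]


# Usage with an explicit mapping (placeholder values only):
host, port, token = cas_connection_args(
    {"CAS_CONTROLLER": "cas.example.com", "ACCESS_TOKEN": "abc123"}
)
print(host, port)
```

In the notebook, `host, port, token = cas_connection_args()` followed by `conn = swat.CAS(host, port, password=token)` would make the same call as the cell above, but with a readable error if the session is missing either variable.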

In [8]:
# Upload the Affairs DataFrame to CAS as the table affairs_data (replace=True overwrites an existing table with that name)
cas_table = conn.upload_frame(affairs_data, casout=dict(name='affairs_data', replace=True))

NOTE: Cloud Analytic Services made the uploaded file available as table AFFAIRS_DATA in caslib CASUSER(lincoln.groves@sas.com).
NOTE: The table AFFAIRS_DATA has been created in caslib CASUSER(lincoln.groves@sas.com) from binary data uploaded to Cloud Analytic Services.


In [9]:
# Save the in-memory table as a .sashdat file in the caslib so it persists after this session
cas_table.save(name="affairs_data.sashdat", replace=True)

NOTE: Cloud Analytic Services saved the file affairs_data.sashdat in caslib CASUSER(lincoln.groves@sas.com).


In [10]:
# List the tables in the CASUSER caslib... because why not?
conn.tableInfo(caslib='casuser')

| | Name | Rows | Columns | IndexedColumns | Encoding | CreateTimeFormatted | ModTimeFormatted | AccessTimeFormatted | JavaCharSet | CreateTime | View | MultiPart | SourceName | SourceCaslib | Compressed | Creator | Modifier | SourceModTimeFormatted | SourceModTime | TableRedistUpPolicy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFFAIRS_DATA | 601 | 10 | 0 | utf-8 | 2024-06-12T14:46:37+00:00 | 2024-06-12T14:46:37+00:00 | 2024-06-12T14:46:37+00:00 | UTF8 | 2033823000.0 | 0 | 0 | | | 0 | lincoln.groves@sas.com | | 2024-06-12T14:46:37+00:00 | 2033823000.0 | Not Specified |
| 1 | _PREPPED | 5960 | 19 | 0 | utf-8 | 2024-06-04T14:57:23+00:00 | 2024-06-04T14:58:23+00:00 | 2024-06-05T18:40:34+00:00 | UTF8 | 2033132000.0 | 0 | 0 | | | 0 | lincoln.groves@sas.com | | | | Not Specified |
