
Large Files and Remote SAS via IOM on Jupyter #151

Closed · sylus opened this issue Jul 31, 2018 · 7 comments

Comments
@sylus commented Jul 31, 2018

Hi there @tomweber-sas! :)

I hope you are well! My SAS container in Jupyter with SASpy is working great, but I've noticed an issue with larger datasets and am curious what mitigations are available.

Basically, if I run the following on a 5 KB file, it works:

import saspy
import pandas as pd

df = pd.read_sas('/home/jovyan/work/airline.sas7bdat', format='sas7bdat')
df.describe()

(screenshot: df.describe() output)

Then if I run it on an 8 MB file, it also works:

import saspy
import pandas as pd

df = pd.read_sas('/home/jovyan/work/trauma.sas7bdat', format='sas7bdat')
df.describe()

(screenshot: df.describe() output)

Oddly, though, if I then run it on a much larger file (250 MB), I get no output at all.

Potential Mitigations

a) Find a proper way to pass large files to the SAS session
b) Run SASPy + SAS together in the same container
c) Give both containers a shared volume mount in k8s

@sylus sylus changed the title Large Files and Remote SAS via IOM Large Files and Remote SAS via IOM on Jupyter Jul 31, 2018
@tomweber-sas (Contributor) commented Jul 31, 2018

Hey William! So, the code you're using in these examples uses neither saspy nor SAS. It's using pandas to read a SAS data set into python as a pandas DataFrame. If I remember right, python (Jupyter) is outside your container, on the client side? I can't say why pandas isn't producing output when you read a larger table, but is there enough memory for the python process where it's running?
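
One quick way to test whether memory is the bottleneck (a hedged sketch, not from the thread; the chunk size is illustrative): pandas' read_sas accepts a chunksize argument, in which case it returns an iterator of DataFrames instead of loading the whole file at once.

import pandas as pd

# with chunksize set, read_sas returns an iterator rather than one big DataFrame
reader = pd.read_sas('/home/jovyan/work/trauma.sas7bdat',
                     format='sas7bdat', chunksize=100_000)

rows = 0
for chunk in reader:    # each chunk is a DataFrame of up to 100,000 rows
    rows += len(chunk)  # do per-chunk work here instead of holding everything
print(rows)

If the chunked loop completes but the single read_sas call produces nothing, the python process is likely running out of memory.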

@sylus (Author) commented Jul 31, 2018

Ah yeah, thanks for the clarification. Right now we are trying to figure out how to copy files over to the SAS container for the workload, since it is a separate container. I'm hoping I'm at least using the right approach to start?

I will take a look at the python process, as that makes sense ^_^

@tomweber-sas (Contributor) commented Jul 31, 2018

Ah, so you're looking to read the SAS data set into python and then use df2sd() (dataframe2sasdata()) to transfer it to SAS? That is a way to do it. With the IOM access method, that will generate a data step in SAS and stream the rows from the data frame over to SAS, writing them out as the SAS data set.

Both of your cases (b and c) would also work, and it would be more performant to have the SAS session access the data set directly. But I know you were trying to avoid putting saspy inside the container. Mounting a volume for the container would allow you to keep python on the outside and still give the SAS session direct access to the data. I guess it just depends upon your constraints.

Tom

@sylus (Author) commented Jul 31, 2018

Hey @tomweber-sas, would you mind providing more info on how that works? I can't seem to find docs showing an example. Sorry to be a pain ^_^

  1. I launch a python kernel and run the following steps:

import saspy
import pandas as pd

df = pd.read_sas('/home/jovyan/work/trauma.sas7bdat', format='sas7bdat')
hr = pd.df2sd(df)

  2. What do I now need to do in the SAS kernel to call the file?

@tomweber-sas (Contributor) commented Jul 31, 2018

You're no pain, William. df2sd() is a method of the saspy SASsession object, not of pandas. Here's a sequence (hand-typed, so correct any typos :) ):

import saspy
import pandas as pd

# use pandas module to read a local SAS data set into a data frame.
df = pd.read_sas('/home/jovyan/work/trauma.sas7bdat', format='sas7bdat')

# connect to SAS server; sas is a SASsession object with all of its methods
sas = saspy.SASsession()

# use saspy to transfer the data frame to a SAS data set on the SAS server we're connected to
hr = sas.df2sd(df, 'tablename', 'libref-or-WORK-is-the-default')

# hr is a SASdata object and all of its methods are available ...
hr.head()
hr.describe()
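
As a quick round-trip check (a hedged sketch, not from the thread), sd2df() is the inverse of df2sd() and pulls a SAS data set back down into a pandas DataFrame:

# pull the SAS data set back into a pandas DataFrame to verify the transfer
df2 = sas.sd2df('tablename', 'work')
print(df2.shape)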

The doc for saspy is here: https://sassoftware.github.io/saspy/api.html
It's not a cookbook, but if you look through the methods available, you'll see some patterns. There's also a sort of walkthrough that should help too, here: https://github.com/sassoftware/saspy/blob/master/saspy_example_github.ipynb

Let me know if this makes sense or if you have any problems!
Tom

@tomweber-sas (Contributor) commented Jul 31, 2018

Oh, and you mentioned the sas_kernel. The code above is python, so it runs in a python kernel. If you want to access that data via the sas_kernel, you would still need to move the data over with saspy directly, and write it to a permanent library, not WORK. Then you could connect with a sas_kernel and access it directly; well, that is if there's a common storage location all of the docker images can get at. A connection from the sas_kernel would be to a different docker instance, right? So the SAS data set wouldn't be in that container, right?
If there's a common location that the docker images can all see, then this would work. You would need to assign a libref pointing to that storage location to write/read that SAS data set. Here's the extra line in saspy:

# connect to SAS server; sas is a SASsession object with all of its methods
sas = saspy.SASsession()

# assign a permanent libref to write the data to; the path goes in the
# path= keyword (saslib's second positional argument is the engine)
sas.saslib('perm', path='/perm/storage/path')

# use saspy to transfer the data frame to a SAS data set on the SAS server we're connected to
hr = sas.df2sd(df, 'tablename', 'perm')
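
On the sas_kernel side, a hedged sketch (the libref and table name are the illustrative ones above): a sas_kernel cell would run the SAS statements directly, and the same statements can be tested from python with saspy's submit(), which returns the SAS log and listing output.

# the SAS code a sas_kernel cell would run, submitted here through saspy;
# submit() returns a dict with the 'LOG' and 'LST' (listing) output
result = sas.submit("""
libname perm '/perm/storage/path';
proc means data=perm.tablename; run;
""")
print(result['LST'])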

Make sense?
Tom

@sylus (Author) commented Jul 31, 2018

Ah that makes so much sense!

Thanks for walking me through this and I will try this tonight.

I was also made aware of a potential third option: store all of our data in the Azure Blob Storage service (which we want to do anyway) and have both containers mount it, per: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux

So I have a range of options :) Thanks so much for taking the time; as always you are super helpful and it is greatly appreciated :)
