Upload Files to SAS Server #187

Closed
chrishales709 opened this issue Nov 17, 2018 · 54 comments

Comments

@chrishales709

I'm looking for a way to upload files to the SAS server. I'm also looking for a way to get information on SAS server files (ex: create date, modified date).

The use case I have for this is code deployment. I develop SAS programs locally using the Atom editor. Once I'm done developing and testing, I merge my code into the production branch of the project's git repository. Right now I have to manually copy the production branch files to the SAS server. I would like to develop a python program using saspy to compare production branch files to the files on the SAS server, and then replace outdated files with the newer versions of the file.

First, I was wondering if you could add a method for copying a file to the SAS server. Second, I saw the work done on the dirlist method. I was wondering if you could also return the create date and modified date along with the file name.
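
A minimal sketch of the comparison step described above, independent of saspy. The function name `outdated_files` and the timestamps are hypothetical; it assumes the local and server-side modification times have already been collected into two dictionaries keyed by file name.

```python
from datetime import datetime

def outdated_files(local_mtimes, remote_mtimes):
    """Return names whose local copy is newer than, or missing from, the server."""
    stale = []
    for name, local_time in sorted(local_mtimes.items()):
        remote_time = remote_mtimes.get(name)
        if remote_time is None or local_time > remote_time:
            stale.append(name)
    return stale

# Hypothetical timestamps, for illustration only.
local = {
    "report.sas": datetime(2018, 11, 16, 9, 0),
    "load.sas": datetime(2018, 11, 1, 9, 0),
    "new.sas": datetime(2018, 11, 17, 9, 0),
}
remote = {
    "report.sas": datetime(2018, 11, 10, 9, 0),
    "load.sas": datetime(2018, 11, 1, 9, 0),
}
print(outdated_files(local, remote))
```

Only the files returned here would need to be copied to the server.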

@tomweber-sas
Contributor

I'm on vacation this week. I'll look into this when I get back after the holiday.
Thanks!
Tom

@tomweber-sas
Contributor

@chrishales709 I believe both of these are doable. I've been looking at the functions for getting file information and, of course, they vary per OS. As a first thought, I'm thinking of adding another method that returns the various pieces of info for a given file path. See the different items you can get per OS here:
http://support.sas.com/documentation/cdl//en/lefunctionsref/69762/HTML/default/viewer.htm#p0cpuq4ew0dxipn1vtravlludjm7.htm
I'm thinking of returning a dictionary from this method with the items/values. I think keeping this more atomic and using python to drive it makes more sense than trying to pile it all into one method.
So, you can use dirlist() and then, via python code, iterate (or pick only the file you care about) and call the fileinfo() method to get a dict with whatever attributes you get for that file. If you know the file and want info, just call that; no need to get dirlist and info all piled together.

As for uploading a file, that can be done. Of course you need authority to create files wherever that SAS server is running, no magic here. But it shouldn't be hard to do. I will need to think through various use cases for this though to be sure this is useful for multiple cases. Binary transfer? Character w/ or w/out transcoding? ... And then a download too?
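
To make the binary vs. character question concrete, here is a small sketch in plain Python (no saspy involved). Both function names are hypothetical and the encodings are just examples; the point is that an image copy leaves the bytes alone, while a transcoding transfer can change the byte count.

```python
def transfer_binary(data: bytes) -> bytes:
    """Image copy: the bytes arrive unchanged."""
    return data

def transfer_text(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Character transfer with transcoding: decode, then re-encode."""
    return data.decode(source_encoding).encode(target_encoding)

# "caf\u00e9\n" is 5 bytes in latin-1; the accented letter needs 2 bytes in utf-8.
original = "caf\u00e9\n".encode("latin-1")
same = transfer_binary(original)
converted = transfer_text(original, "latin-1", "utf-8")
print(len(original), len(same), len(converted))
```

A transcoding transfer is only safe for real text files; running it over an executable or an image would corrupt the file.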

What are your thoughts?
Tom

@chrishales709
Author

@tomweber-sas ,
That all sounds like a good plan. Adding in a download method would also be great. I don't have anything to add on the transfer method besides it might be good to have options of how to transfer files (binary vs text).
Thanks,
Chris

@mailbagrahul

In my use case, users download the file, make changes on different tabs (assume it contains more than 100 tabs), and upload the file back to the SAS server for various purposes (this cycle repeats for many edits).

@jpf5046

jpf5046 commented Dec 4, 2018

This is great. Thanks Chris and Tom. I combine local and third party data for reporting. After getting the third party data via API, I upload the api df to SAS Server to finish the saspy script. It would be nice to seamlessly add the api df within a sas.submit statement.

@tomweber-sas
Contributor

@chrishales709 I have an implementation for getting file information coded up. Here's an example showing this for my saspy directory (current dir - '.'). I get the list of files from the dirlist() method then iterate over them getting the file info for each file (excluding any directories). The file information is returned in a dataframe that you can interrogate at will. I returned it like that cuz I just did the implementation for the member list of tables for a libref for issue 182 and this was very similar. Let me know if a dataframe isn't what you want, and I'll see if I can convert it to something else. I'm not much of a dataframe programmer :)

>>> d1 = sas.dirlist('.')
>>>
>>> d1
['__init__.py', 'sasbase.py.bak', 'sasproccommons.py', 'version.py.bak', 'sasdecorator.py', 'sasbase.py', 'sasstat.py', 'sascfg_personal.py', 'sasdata.py', 'sasiohttp.py', 'titanic.csv', 'SASLogLexer.py', 'sasml.py.bak', 'sasqc.py', 'sasViyaML.py', 'sasutil.py', 'sasml.py', 'sasets.py', 'libname_gen.sas', 'sasresults.py', 'sasiostdio.py', '__pycache__/', 'sasdata.py.bak', 'sastabulate.py', 'doc/', 'autocfg.py', 'sasioiom.py', 'version.py',
>>>
>>> for file in d1:
...    if file[len(file)-1] != sas.hostsep:
...       sas.file_info('./'+file)
...
            infoname                                  infoval
0           Filename  /opt/tom/github/saspy/saspy/__init__.py
1         Owner Name                                   sastpw
2         Group Name                                      r&d
3  Access Permission                               -rw-r--r--
4      Last Modified                       30Nov2018:10:53:38
5  File Size (bytes)                                     1469
            infoname                                     infoval
0           Filename  /opt/tom/github/saspy/saspy/sasbase.py.bak
1         Owner Name                                      sastpw
2         Group Name                                         r&d
3  Access Permission                                  -rw-r--r--
4      Last Modified                          05Dec2018:08:22:37
5  File Size (bytes)                                       55341
            infoname                                        infoval
0           Filename  /opt/tom/github/saspy/saspy/sasproccommons.py
1         Owner Name                                         sastpw
2         Group Name                                            r&d
3  Access Permission                                     -rw-r--r--
4      Last Modified                             04Dec2018:16:52:25
5  File Size (bytes)                                          32090
            infoname                                     infoval
0           Filename  /opt/tom/github/saspy/saspy/version.py.bak
1         Owner Name                                      sastpw
2         Group Name                                         r&d
3  Access Permission                                  -rw-r--r--
4      Last Modified                          07Nov2018:09:32:50
5  File Size (bytes)                                          22

[...]  removing a bunch to shorten this

            infoname                                 infoval
0           Filename  /opt/tom/github/saspy/saspy/version.py
1         Owner Name                                  sastpw
2         Group Name                                     r&d
3  Access Permission                              -rw-r--r--
4      Last Modified                      04Dec2018:16:52:25
5  File Size (bytes)                                      22
            infoname                                   infoval
0           Filename  /opt/tom/github/saspy/saspy/sas_magic.py
1         Owner Name                                    sastpw
2         Group Name                                       r&d
3  Access Permission                                -rw-r--r--
4      Last Modified                        30Nov2018:10:53:38
5  File Size (bytes)                                      6713
            infoname                                infoval
0           Filename  /opt/tom/github/saspy/saspy/sascfg.py
1         Owner Name                                 sastpw
2         Group Name                                    r&d
3  Access Permission                             -rw-r--r--
4      Last Modified                     30Nov2018:10:53:38
5  File Size (bytes)                                  10630
>>>
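
The loop above relies on dirlist() marking directories with a trailing host separator. A small sketch of that filter on its own, using `str.endswith` (which also copes with an empty string); `only_files` is a hypothetical helper name and the listing is a shortened copy of the one above.

```python
def only_files(entries, sep="/"):
    """Keep entries that do not end with the host path separator."""
    return [name for name in entries if not name.endswith(sep)]

listing = ["__init__.py", "__pycache__/", "doc/", "sasbase.py"]
print(only_files(listing))
```

With a live session, `sep` would be `sas.hostsep`, as in the loop above.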

And, here's just grabbing one:

>>> dinfo = sas.file_info('./sasbase.py')
>>> dinfo
            infoname                                 infoval
0           Filename  /opt/tom/github/saspy/saspy/sasbase.py
1         Owner Name                                  sastpw
2         Group Name                                     r&d
3  Access Permission                              -rw-r--r--
4      Last Modified                      05Dec2018:08:31:08
5  File Size (bytes)                                   55330
>>>

Thoughts?
Tom

@tomweber-sas
Contributor

I think I'll change this to return a dictionary like I was thinking in the first place. Trying to navigate the df to get values isn't very clean. I can have it return either if you want, by adding a results=['df' | 'dict'] parameter. I think a dict just makes more sense for this one.
Let me know what you think,
Tom

@tomweber-sas
Contributor

Ok, got a dict being returned. Here's what it's like:

>>> finfo = sas.file_info_dict('./sasbase.py')
>>> finfo
{'Owner Name': 'sastpw', 'File Size (bytes)': '57448', 'Last Modified': '05Dec2018:11:47:25', 'Group Name': 'r&d', 'Filename': '/opt/tom/github/saspy/saspy/sasbase.py', 'Access Permission': '-rw-r--r--'}
>>>
>>> finfo['Last Modified']
'05Dec2018:11:47:25'
>>>
>>> for key in finfo.keys():
...   print(key+' = '+finfo[key])
...
Owner Name = sastpw
File Size (bytes) = 57448
Last Modified = 05Dec2018:11:47:25
Group Name = r&d
Filename = /opt/tom/github/saspy/saspy/sasbase.py
Access Permission = -rw-r--r--
>>>
>>> finfo.keys()
dict_keys(['Owner Name', 'File Size (bytes)', 'Last Modified', 'Group Name', 'Filename', 'Access Permission'])
>>> finfo.values()
dict_values(['sastpw', '57448', '05Dec2018:11:47:25', 'r&d', '/opt/tom/github/saspy/saspy/sasbase.py', '-rw-r--r--'])
>>>
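
Since every value in the dict is a string, comparing dates or sizes needs a conversion first. A minimal sketch, assuming the Unix key names and the date layout shown above (other hosts may return different items) and an English locale for the month abbreviation; `typed_file_info` is a hypothetical helper.

```python
from datetime import datetime

def typed_file_info(info):
    """Return (modified, size) converted from a file-info dict of strings."""
    # Assumes a layout like '05Dec2018:11:47:25', as in the output above.
    modified = datetime.strptime(info["Last Modified"], "%d%b%Y:%H:%M:%S")
    size = int(info["File Size (bytes)"])
    return modified, size

finfo = {"Last Modified": "05Dec2018:11:47:25", "File Size (bytes)": "57448"}
modified, size = typed_file_info(finfo)
print(modified.isoformat(), size)
```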

@tomweber-sas
Contributor

@chrishales709 I pushed this code to master so you can try it out. I ended up implementing it to return the dictionary. If you want a dataframe, just specify results='pandas' like this:

>>> filepath = '.'
>>> d1 = sas.dirlist(filepath)
>>>
>>> for file in d1:
...    if file[len(file)-1] != sas.hostsep:
...       sas.file_info(filepath+sas.hostsep+file)
...       sas.file_info(filepath+sas.hostsep+file, results='pandas')
...
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/__init__.py', 'Group Name': 'r&d', 'File Size (bytes)': '1469', 'Last Modified': '30Nov2018:10:53:38', 'Owner Name': 'sastpw'}
            infoname                                  infoval
0           Filename  /opt/tom/github/saspy/saspy/__init__.py
1         Owner Name                                   sastpw
2         Group Name                                      r&d
3  Access Permission                               -rw-r--r--
4      Last Modified                       30Nov2018:10:53:38
5  File Size (bytes)                                     1469
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/sasbase.py.bak', 'Group Name': 'r&d', 'File Size (bytes)': '59850', 'Last Modified': '05Dec2018:14:01:26', 'Owner Name': 'sastpw'}
            infoname                                     infoval
0           Filename  /opt/tom/github/saspy/saspy/sasbase.py.bak
1         Owner Name                                      sastpw
2         Group Name                                         r&d
3  Access Permission                                  -rw-r--r--
4      Last Modified                          05Dec2018:14:01:26
5  File Size (bytes)                                       59850
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/sasproccommons.py', 'Group Name': 'r&d', 'File Size (bytes)': '32655', 'Last Modified': '05Dec2018:12:13:21', 'Owner Name': 'sastpw'}
            infoname                                        infoval
0           Filename  /opt/tom/github/saspy/saspy/sasproccommons.py
1         Owner Name                                         sastpw
2         Group Name                                            r&d
3  Access Permission                                     -rw-r--r--
4      Last Modified                             05Dec2018:12:13:21
5  File Size (bytes)                                          32655
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/sascfg.py.bak', 'Group Name': 'r&d', 'File Size (bytes)': '10886', 'Last Modified': '05Dec2018:10:25:36', 'Owner Name': 'sastpw'}

I did the same with list_tables() method from #182 where I return a list of tuples (memname, memtype) by default now, but you can get it as a dataframe w/ results='pandas' on the list_tables() invocation.
I like saspy to work even if you don't have Pandas, so this way it does, and you can get the dataframe if you want.

Let me know how it works for you. Next thing on the list will be up/download of files. That may take a bit longer :)

Thanks!
Tom

@tomweber-sas
Contributor

@jpf5046 Can you explain the comment

seamlessly add the api df within a sas.submit statement.

a little more? Maybe an example of what you're looking to do?
I'm not sure I understand that comment in the context of the methods I've put together for these couple issues.
These methods are at master now, so you can play with them and see what you think.
Thanks!
Tom

@chrishales709
Author

@tomweber-sas I tested the file_info method, and it looks great. I did notice, however, that the 'Filename' value in both the dictionary version and dataframe version did not include the full path. For example,
f = sas.file_info('/path/to/my/file/file.sas')
got me this:
{'Filename': '/path/to/', 'Owner Name'...}
It actually didn't give me back the full path. Is there a max length for this value?

@tomweber-sas
Contributor

Hey Chris, that's curious. I don't see that for either case. Can you send the saslog from after running that?

print(sas.saslog())

For the default case (dict) you should see the info in the log like:

 3451          data _null_;
3452             length infoname infoval $60;
3453             drop rc fid infonum i close;
3454             put 'INFOSTART';
3455             fid=fopen('_spfinfo');
3456             if fid then
3457                do;
3458                   infonum=foptnum(fid);
3459                   do i=1 to infonum;
3460                      infoname=foptname(fid, i);
3461                      infoval=finfo(fid, infoname);
3462                      put 'INFONAME=' infoname;
3463                      put 'INFOVAL=' infoval;
3464                   end;
3465                end;
3466             put 'INFOEND';
3467             close=fclose(fid);
3468             rc = filename('_spfinfo');
3469          run;
INFOSTART
INFONAME=Filename
INFOVAL=/opt/tom/github/saspy/saspy/sascfg.py
INFONAME=Owner Name
INFOVAL=sastpw
INFONAME=Group Name
INFOVAL=r&d
INFONAME=Access Permission
INFOVAL=-rw-r--r--
INFONAME=Last Modified
INFOVAL=06Dec2018:13:20:15
INFONAME=File Size (bytes)
INFOVAL=10885
INFOEND
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

Here's one file info I get. I don't see any kind of truncation. Maybe we'll see something in your log.

{'Filename': '/opt/tom/github/saspy/saspy/sascfg.py', 'File Size (bytes)': '10885', 'Group Name': 'r&d', 'Owner Name': 'sastpw', 'Last Modified': '06Dec2018:13:20:15', 'Access Permission': '-rw-r--r--'}
            infoname                                infoval
0           Filename  /opt/tom/github/saspy/saspy/sascfg.py
1         Owner Name                                 sastpw
2         Group Name                                    r&d
3  Access Permission                             -rw-r--r--
4      Last Modified                     06Dec2018:13:20:15
5  File Size (bytes)                                  10885
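
For anyone reading the log above, here is a rough sketch of how the INFONAME=/INFOVAL= pairs between INFOSTART and INFOEND could be turned into a dict. This is an illustration only, not saspy's actual parsing code; `parse_info_log` is a hypothetical name, and it assumes each value fits on one log line.

```python
def parse_info_log(log):
    """Collect INFONAME=/INFOVAL= pairs between INFOSTART and INFOEND."""
    info = {}
    name = None
    inside = False
    for line in log.splitlines():
        line = line.strip()
        if line == "INFOSTART":
            inside = True
        elif line == "INFOEND":
            break
        elif inside and line.startswith("INFONAME="):
            name = line[len("INFONAME="):]
        elif inside and name is not None and line.startswith("INFOVAL="):
            info[name] = line[len("INFOVAL="):]
            name = None
    return info

# Shortened copy of the log above; echoed source lines start with a line
# number, so they never match the markers.
sample = """\
3466             put 'INFOEND';
INFOSTART
INFONAME=Filename
INFOVAL=/opt/tom/github/saspy/saspy/sascfg.py
INFONAME=File Size (bytes)
INFOVAL=10885
INFOEND
NOTE: DATA statement used (Total process time):
"""
print(parse_info_log(sample))
```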

@tomweber-sas
Contributor

BTW, I see your path was linux, but I also tried this on windows and I'm not seeing truncation either. I do see that the default for displaying the dataframe truncates the column, but that's only a display thing, the whole value is actually there. Could it be something like that, where it's just not displaying it?

image

@jpf5046

jpf5046 commented Dec 7, 2018

@jpf5046 Can you explain the comment

seamlessly add the api df within a sas.submit statement.

a little more? Maybe an example of what you're looking to do?
I'm not sure I understand that comment in the context of the methods I've put together for these couple issues.
These methods are at master now, so you can play with them and see what you think.
Thanks!
Tom

Here's a better example. I have a file on my desktop that python reads with df = pd.read_csv('desktop/file.txt'), and I then want to use that df together with a SAS dataset on the SAS server. Right now, python can load the file locally, but I cannot merge the dataframe in a sas.submit statement. I have to upload 'desktop/file.txt' to the server with SAS EG before I can use it in SAS code.

Is there a way to take my local df and upload it to the SAS server, so I can run code that might look like this, and do this all within Jupyter:

c = sas.submit("""

proc sql;
create table new_table as
 select
*
from work.df;
quit; 
""")

...where work.df is the file from my desktop?

@tomweber-sas
Contributor

Oh, yes, that's been in saspy since day 1. It's the dataframe2sasdata() method; df2sd() for short.
You would simply do the following

import pandas as pd
import saspy

# assume your SASsession object is named 'sas'
sas = saspy.SASsession()

# read in your dataframe
df = pd.read_csv('desktop/file.txt')

# upload data frame to work.new_table on SAS server
new_table = sas.df2sd(df, 'new_table')

# and now have a SASdata object in python that refers to it
new_table.head()

Here's a run doing this:

tom64-3> python3.5
Python 3.5.5 (default, Feb  6 2018, 10:56:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import saspy
>>> sas = saspy.SASsession()
SAS Connection established. Subprocess id is 9991

>>> import pandas as pd
>>> df = pd.read_csv('./titanic.csv')
>>> df.head()
   Unnamed: 0  PassengerId  Survived  Pclass  \
0           1            1         0       3
1           2            2         1       1
2           3            3         1       3
3           4            4         1       1
4           5            5         0       3

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S
>>> new_table = sas.df2sd(df, 'new_table')
>>> new_table.head()
   Unnamed: 0  PassengerId  Survived  Pclass  \
0           1            1         0       3
1           2            2         1       1
2           3            3         1       3
3           4            4         1       1
4           5            5         0       3

                                                Name     Sex  Age  SibSp  \
0                            Braund, Mr. Owen Harris    male   22      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female   38      1
2                             Heikkinen, Miss. Laina  female   26      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female   35      1
4                           Allen, Mr. William Henry    male   35      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S
>>> new_table
Libref  = WORK
Table   = new_table
Dsopts  = {}
Results = Pandas

>>>

@jpf5046

jpf5046 commented Dec 7, 2018

@tomweber-sas I was using the syntax incorrectly. Thank you for providing the example! I'm all set.

@tomweber-sas
Contributor

@jpf5046 Great. Just open another issue if you have any other questions!
Tom

@chrishales709
Author

@tomweber-sas

I think I found what may be causing the error. My SAS setup has really long path names (ex: 150+ characters). First, it looks like the SAS code is limiting the length of infoval on lines 1286 and 1313. The length of infoname and infoval are set to 60, so it would cut off any long path names. I tried setting the length to 500 on these lines, and that appears to have fixed the issue on the SAS side. The SAS log now shows the full value of infoval. However, infoval covered multiple lines in the log, so it causes an issue for line 1341 where the value is parsed out of the log. My infoval looked like this in the log:

INFOVAL=

/imagine/this/is/a/very/long/sas/path/that/co

vers/multiple/lines/filename.sas

As a result, I'm actually getting '' when I run f['Filename']
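
A rough sketch of one way to cope with the wrapping: glue continuation lines back onto the preceding INFOVAL= line before parsing. This is only an illustration, not what saspy does; `unwrap_values` is a hypothetical helper, it assumes it is given only the lines between INFOSTART and INFOEND, and it would lose a space that happened to sit exactly at a wrap point.

```python
MARKERS = ("INFONAME=", "INFOEND")

def unwrap_values(lines):
    """Append wrapped continuation lines to the preceding INFOVAL= line."""
    merged = []
    in_value = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith("INFOVAL="):
            merged.append(line)
            in_value = True
        elif line.startswith(MARKERS):
            merged.append(line)
            in_value = False
        elif in_value:
            merged[-1] += line            # continuation of a wrapped value
        else:
            merged.append(line)
    return merged

wrapped = [
    "INFONAME=Filename",
    "INFOVAL=",
    "",
    "/imagine/this/is/a/very/long/sas/path/that/co",
    "",
    "vers/multiple/lines/filename.sas",
    "INFONAME=Owner Name",
    "INFOVAL=sastpw",
    "INFOEND",
]
for line in unwrap_values(wrapped):
    print(line)
```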

@tomweber-sas
Contributor

Oh, of course, that's cut-n-pasted right out of the SAS example doc for this. I didn't even see it looking at it :(
I've addressed this at master. Can you pull master and try it again and see if both cases work now? I bumped it up to 256, which is also the max linesize.

@chrishales709
Author

Sorry for the late response. I've been out of town. I tested the fix for both dictionaries and data frames, and everything looks good. Thanks!

Are you still working on the upload/download piece?

@tomweber-sas
Contributor

No problem. Thanks for verifying!
Well, I haven't actually started on those yet, unfortunately. Been pulled in other directions around here. If I'm lucky, next week will be a slow week, w/ the holiday and all, and I'll be able to spend some time on those.
Always happy to have external contributors too though, :) :)
Thanks,
Tom

@tomweber-sas
Contributor

Hey, I've got an upload implementation working, both STDIO and IOM. Just did it, so it certainly needs more testing and such. But, it works for the cases I've tried. It's a binary transfer, or an image copy, if you will.
I've run it with a simple text file, an HTML document file, and an executable program (a truly binary file).
Everything diffs equal and obviously are the same length. More details to work through before it's production, but I've pushed it to a branch called upload-download.

File permissions still need to be addressed. Right now, it's all defaults.

If you have a chance, try it out from there. I'll continue on it and see about the equivalent download next. It's not the fastest thing in the world, but for having to do it all w/ python and SAS code, it isn't too bad.

import saspy
sasstd = saspy.SASsession(cfgname='sdssas')
sasiom = saspy.SASsession(cfgname='iomj')

sasstd.upload('/u/sastpw/tomin', '/u/sastpw/tomout_std')
sasstd.upload('/u/sastpw/sashtml.htm', '/u/sastpw/tomhtml_std')
sasstd.upload('/u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy', '/u/sastpw/tommaspy_std')

sasiom.upload('/u/sastpw/tomin', '/u/sastpw/tomout_iom')
sasiom.upload('/u/sastpw/sashtml.htm', '/u/sastpw/tomhtml_iom')
sasiom.upload('/u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy', '/u/sastpw/tommaspy_iom')

Here's listing the files after:

 38 Dec 19 12:15 /u/sastpw/tomin
 38 Dec 20 11:45 /u/sastpw/tomout_std

 34866 Dec 18 16:13 /u/sastpw/sashtml.htm
 34866 Dec 20 11:45 /u/sastpw/tomhtml_std

 647196 Dec 12 13:00 /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy
 647196 Dec 20 11:45 /u/sastpw/tommaspy_std

 38 Dec 19 12:15 /u/sastpw/tomin
 38 Dec 20 11:45 /u/sastpw/tomout_iom

 34866 Dec 18 16:13 /u/sastpw/sashtml.htm
 34866 Dec 20 11:45 /u/sastpw/tomhtml_iom

 647196 Dec 12 13:00 /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy
 647196 Dec 20 11:45 /u/sastpw/tommaspy_iom

Tom

@tomweber-sas
Contributor

Ok, I added in the permission= option. I'm afraid it's just the exact string the Filename statement wants. But that string is the same on Unix and Windows; the portable syntax is documented in the Filename statement section of each host guide:
https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=hostwin&docsetTarget=chfnoptfmain.htm&locale=en#p1m24anc2sxjp1n1futk0ekxn3to
https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=hostunx&docsetTarget=n1cwdt7h01vaken0zl8veh8x3ybc.htm&locale=en

Here's the one for the executable file, showing the resulting permissions:

print(sasstd.upload('/u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy', '/u/sastpw/tommaspy_std',
permission='A::u::rwx,A::g::rwx,A::o::r-x'))

tom64-3> ll /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy   /u/sastpw/tommaspy_std
-rwxrwxr-x 1 userid groupid 647196 Dec 12 13:00 /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy
-rwxrwxr-x 1 userid groupid 647196 Dec 20 12:19 /u/sastpw/tommaspy_std
tom64-3>
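
If you'd rather think in octal modes, here is a small sketch that builds a string in the same shape as the one passed above. The layout is inferred from that single example ('A::u::rwx,A::g::rwx,A::o::r-x'), so check it against the host docs linked above; `sas_permission` is a hypothetical helper, not part of saspy.

```python
def sas_permission(mode):
    """Build a permission= string from a Unix-style mode such as 0o775."""
    parts = []
    for who, shift in (("u", 6), ("g", 3), ("o", 0)):
        bits = (mode >> shift) & 0o7
        flags = "".join(
            letter if bits & mask else "-"
            for letter, mask in (("r", 4), ("w", 2), ("x", 1))
        )
        parts.append("A::{}::{}".format(who, flags))
    return ",".join(parts)

print(sas_permission(0o775))
```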

Let me know what you find!
Thanks,
Tom

@mailbagrahul

mailbagrahul commented Dec 20, 2018

Hey, I tried uploading a file but it is running forever. Any thoughts?

image

@tomweber-sas
Contributor

How big is the file? I haven't tried anything significantly large. I just pushed a fix for 0 length files, which would hang (run indefinitely). Try something small first to see if it works?

@mailbagrahul

I tried with 2Mb file. Let me try with 1kb file.

@mailbagrahul

mailbagrahul commented Dec 21, 2018

image

Is this due to permission?

@tomweber-sas
Contributor

Well, I'm not sure exactly. I also tried to write to something that wasn't valid after I saw your first problem.
That was maybe a different error:

>>> print(sasstd.upload('/u/sastpw/tomin', '/fff'))

75
76            filename saspydir '/fff' recfm=F encoding=binary lrecl=1 permission='';
77            data _null_;
78            file saspydir;
79            infile datalines;
80            input;
81            lin = length(_infile_);
82            outdata = inputc(_infile_, '$hex.', lin);
83            lout = lin/2;
84            put outdata $varying80. lout;
85            datalines4;
ERROR: Insufficient authorization to access /fff.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.01 seconds

87   ;;;;
88
89   run;
90

That's a different error. But, obviously, you have to have permission to create the file you're trying to create. There's no magic about doing this. If you can't submit the equivalent code to create a file from the SAS server, you won't be able to do it via saspy, as I'm just submitting SAS code.
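
For context on the submitted code in that log: the data step reads hex text from datalines and converts it back to bytes with the $hex. informat. Here is a sketch of what the Python side of such a scheme could look like. It is an illustration, not the actual saspy code; the function names are hypothetical, and the default of 80 bytes per line is my reading of `$varying80.` in the log, so treat it as an assumption.

```python
def hex_lines(data, bytes_per_line=80):
    """Split data into chunks and hex-encode each chunk as one text line."""
    return [
        data[i:i + bytes_per_line].hex().upper()
        for i in range(0, len(data), bytes_per_line)
    ]

def from_hex_lines(lines):
    """Reverse of hex_lines: what the receiving side has to reconstruct."""
    return b"".join(bytes.fromhex(line) for line in lines)

payload = bytes(range(256)) * 2          # 512 bytes, including non-text values
lines = hex_lines(payload)
print(len(lines), len(lines[0]), len(lines[-1]))
print(from_hex_lines(lines) == payload)
```

Each byte becomes two hex digits, so the text sent is twice the size of the file, which is part of why this isn't the fastest way to move data.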

What's that path? Is it a valid file, or is it an existing directory, such that it's not a valid file to create? I can't say off the top of my head why you would get that specific error. I haven't seen that error in anything I've tried so far.

Tom

@tomweber-sas
Contributor

Aha. Yes, I get that error when I specify a directory. I guess I could add support for accessing the target and seeing if it's a directory, then get the file name from the source and use that. But, for now, just specify the file name and see if it's working like you think.

>>> print(sasiom.upload('/u/sastpw/tomin', '/u/sastpw/tomdir'))
4                                                                                                                        The SAS System                                                                                          09:36 Friday, December 21, 2018

29
30                  filename saspydir '/u/sastpw/tomdir' recfm=F encoding=binary lrecl=1 permission='';
31                  data _null_;
32                  file saspydir;
33                  infile datalines;
34                  input;
35                  if _infile_ = '' then delete;
36                  lin = length(_infile_);
37                  outdata = inputc(_infile_, '$hex.', lin);
38                  lout = lin/2;
39                  put outdata $varying80. lout;
40                  datalines4;

ERROR: Invalid file, /u/sastpw/tomdir.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
5                                                                                                                        The SAS System                                                                                          09:36 Friday, December 21, 2018
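
The directory handling mentioned above (take the file name from the source when the target is a directory) could look roughly like this. It's a sketch only: `resolve_destination` is a hypothetical helper, and how the caller finds out that the remote path is a directory is left open here.

```python
import os

def resolve_destination(local_path, remote_path, remote_is_dir, sep):
    """Append the source file name when the remote path is a directory."""
    if not remote_is_dir:
        return remote_path
    # sep is the remote host's path separator, which may differ from the local one.
    return remote_path.rstrip(sep) + sep + os.path.basename(local_path)

print(resolve_destination("/u/sastpw/tomin", "/u/sastpw/tomdir", True, "/"))
print(resolve_destination("/u/sastpw/tomin", "/u/sastpw/tomout", False, "/"))
```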

@chrishales709
Author

I was able to upload a small file from Windows to a Linux server without any issue. I was also able to upload a slightly larger csv file (23 KB). This works for my use case. The only feedback I would have would be to replace the log print out with a message (ex: 'Finished uploading example.sas (xx sec)' or 'Unable to upload example.sas').

@tomweber-sas
Contributor

Great, thanks. I'll see about changing up what's returned. This needs to be able to be interrogated programmatically after to see if it succeeded or not. I'm thinking of a dict w/ a status and the log segment, not unlike what's returned in batch mode, or from submit(). That way you can test it and you have the log to see what happened if it failed.
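
As a sketch of how such a result could be interrogated, and turned into the kind of one-line message suggested above: this assumes a dict with a boolean 'Success' key and a 'LOG' string, and both helper names are hypothetical.

```python
def error_lines(log):
    """Return the ERROR: lines found in a SAS log segment."""
    return [line.strip() for line in log.splitlines()
            if line.lstrip().startswith("ERROR:")]

def summarize(result, name):
    """Turn a {'Success': bool, 'LOG': str} result into a one-line message."""
    if result["Success"]:
        return "Finished uploading {}".format(name)
    errors = error_lines(result["LOG"]) or ["no ERROR line found in the log"]
    return "Unable to upload {}: {}".format(name, errors[0])

# Hand-built results for illustration; a real one would come from upload().
failed = {
    "Success": False,
    "LOG": "78   file saspydir;\nERROR: Insufficient authorization to access /fff.\n",
}
print(summarize(failed, "example.sas"))
print(summarize({"Success": True, "LOG": ""}, "example.sas"))
```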

Also, I have an initial implementation of download pushed to the upload-download branch now too. Same deal w/ return there for now. And, I have some optimizations to do and error handling, like in upload. But, it's working and if you want to try it, that'd be great.

Thanks,
Tom

@jasonphillips
Contributor

Out of curiosity, why couldn't the STDIO implementation use a socket connection on a local port to stream the file to SAS, the way it currently handles downloads but in reverse?
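
To illustrate the idea, here is a minimal sketch of streaming raw bytes over a socket in chunks, with no hex encoding. It uses an in-process `socket.socketpair()` as a stand-in for a real connection; the SAS side is not shown, and all function names are hypothetical.

```python
import socket
import threading

def send_stream(sock, data, chunk_size=4096):
    """Send data in chunks, then signal end-of-stream."""
    for i in range(0, len(data), chunk_size):
        sock.sendall(data[i:i + chunk_size])
    sock.shutdown(socket.SHUT_WR)

def receive_stream(sock, chunk_size=4096):
    """Read until the sender closes its side of the connection."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

def roundtrip(data):
    """Push data through a connected socket pair and return what arrives."""
    sender, receiver = socket.socketpair()
    received = []
    reader = threading.Thread(
        target=lambda: received.append(receive_stream(receiver)))
    reader.start()
    try:
        send_stream(sender, data)
        reader.join()
    finally:
        sender.close()
        receiver.close()
    return received[0]

payload = bytes(range(256)) * 1000       # 256,000 raw bytes
print(roundtrip(payload) == payload)
```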

@tomweber-sas
Contributor

Hey @jasonphillips, so the point would be to get better performance by not having to convert to a hex string and informat that back to real binary in SAS? Well, that's a good idea. I don't expect there's a reason that can't work. I guess I was just following the pattern of df2sd() and sd2df(), where df2sd could work w/ the STDIO channels and didn't require special support that might not be available (ability to open sockets between the two machines).
I'm finishing up (hopefully) some more functionality on these: returning a dict with Success and LOG, error handling for files that don't exist, and supporting a directory instead of a filename for the destination (using the filename from the source).
After that, I can try out this idea and see if there's any issue, and see how much better it performs. Probably similar to sd2df() vs. sd2df_CSV(). The bigger the file the more it matters, I would suspect.

Hey, while I have you, did you see the new saspy_examples repo? I copied your tabulate notebook there, but wanted to see if you wanted to push it there yourself (PR) so it had your id as the contributor there. I was going to delete it from the saspy repo then.

Thanks for this idea, it should help out performance!
Tom

@tomweber-sas
Contributor

Here's what I'm thinking for what's returned from upload and download. Oh, and you can see the dest is a directory, so the source file name is used for the dest file name (in the log):

>>> res = sas.upload(  '/u/sastpw/compare/maspy35_up',  r'C:\Users\sastpw\Documents\updown')
>>> res.keys()
dict_keys(['Success', 'LOG'])
>>>
# so you can do the following
>>>
>>> res = sas.upload(  '/u/sastpw/compare/maspy35_up',  r'C:\Users\sastpw\Documents\updown')
>>> print(res['Success'])
True
>>> if not res['Success']:
...     print(res['LOG'])
...
>>>
# and print the log at will, of course
>>> print(res['LOG'])
12                                                                                                                       The SAS System                                                                                         12:06 Thursday, January 10, 2019

11874
11875               filename saspydir 'C:\Users\sastpw\Documents\updown\maspy35_up' recfm=F encoding=binary lrecl=1 permission='';
11876               data _null_;
11877               file saspydir;
11878               infile datalines;
11879               input;
11880               if _infile_ = '' then delete;
11881               lin = length(_infile_);
11882               outdata = inputc(_infile_, '$hex.', lin);
11883               lout = lin/2;
11884               put outdata $varying80. lout;
11885               datalines4;

NOTE: The file SASPYDIR is:
      Filename=C:\Users\sastpw\Documents\updown\maspy35_up,
      RECFM=F,LRECL=1,File Size (bytes)=0,
      Last Modified=10Jan2019:12:30:05,
      Create Time=10Jan2019:12:06:14
13                                                                                                                       The SAS System                                                                                         12:06 Thursday, January 10, 2019

NOTE: DATA statement used (Total process time):
      real time           3.33 seconds
      cpu time            2.60 seconds

NOTE: 234496 records were written to the file SASPYDIR.

23613      ;;;;
23614
14                                                                                                                       The SAS System                                                                                         12:06 Thursday, January 10, 2019

23615
23616      run;
23617      filename saspydir;
NOTE: Fileref SASPYDIR has been deassigned.
23618
23619
>>>

Thoughts?
Tom
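
One way a caller might consume the proposed return value is sketched below. The helper name `ensure_success` is hypothetical and not part of saspy; it only relies on the `Success` and `LOG` keys shown above.

```python
def ensure_success(res: dict, action: str = "transfer") -> str:
    """Return the SAS log if the call succeeded, otherwise raise with the log attached."""
    if not res.get("Success", False):
        raise RuntimeError(f"SAS {action} failed; log follows:\n{res.get('LOG', '')}")
    return res.get("LOG", "")
```

Assuming an existing session object named `sas`, usage would look like `log = ensure_success(sas.upload(local_path, remote_path), "upload")`, which keeps the happy path to one line while still surfacing the log on failure.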

@mailbagrahul
Copy link

I used upload-download branch and ran .upload() function but it is not successful.

image

@tomweber-sas
Copy link
Contributor

I haven't pushed those last things yet. I'm still working on them. The output I showed was just from my development repo. Once I finish it up, I'll push it out for you to try. Note that what you got back is just the log, not the dictionary I'll be returning.
I was just looking for feedback on whether that looks good to you, or whether you think you would need something different.
I'll let you know when these latest features are pushed. For now, specify valid files and upload and download should work.

Thanks,
Tom

@mailbagrahul
Copy link

Ah. Your example looks good and the format is also good.

I was about to ask whether it's possible to have a similar Success key for every function in saspy. In my use case, when I execute any saspy function from a GUI, I would like to show a message to the user (specifically when it fails). Any thoughts?

@tomweber-sas
Copy link
Contributor

ok, I just pushed these features. Go ahead and try it out and let me know how it works.

As for changing the API of all methods in saspy, I can't do that. But there are many methods where you can tell if they worked or failed. For some, I can't tell either way myself, so I couldn't report it. There are methods for getting SAS automatic macro variables, which are basically return codes and statuses for the SAS code that was submitted, so those would be useful in a number of situations.

If you have any specific cases, I can look at them to see what can be done. Happy to do that.
But, really, I can't change something that pervasive and break people's existing code.

Also, Batch mode might help out in this case. It returns a dict of LOG and LST, like submit(), so you may be able to use that to accomplish what you need. For instance, for a given method, if the LST is empty, that may mean it failed, or you can check the log for a known message that proves it worked or didn't.

Tom
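
As a rough illustration of checking a returned log for errors, the sketch below scans log text for lines that start with SAS's `ERROR:` or numbered `ERROR nn-nnn:` prefixes. The helper name `log_errors` is hypothetical, and treating any such line as a failure is an assumption that may be too strict or too loose for a given program.

```python
import re

# SAS log error lines begin with "ERROR:" or a numbered form such as "ERROR 22-322:".
_ERROR_LINE = re.compile(r"^ERROR(?: \d+-\d+)?:")


def log_errors(log: str) -> list:
    """Return the lines of a SAS log that look like error messages."""
    return [line.rstrip() for line in log.splitlines() if _ERROR_LINE.match(line)]
```

With a dict of LOG and LST as described above, a GUI could call `log_errors(res['LOG'])` and show the user a message whenever the returned list is non-empty.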

@mailbagrahul
Copy link

mailbagrahul commented Jan 10, 2019

I tested with different file sizes. Here are my findings.
1 KB - 5 seconds
20 KB - 15 seconds
32 KB - 40 seconds
1 MB - roughly 5 minutes

@tomweber-sas
Copy link
Contributor

upload or download or both the same?

@mailbagrahul
Copy link

upload. I have to test for download.

@mailbagrahul
Copy link

Download seems to be a lot better. 300 KB took just 3 seconds.

@tomweber-sas
Copy link
Contributor

Then I'll take @jasonphillips's great suggestion and re-implement upload using sockets, which should make it run about the same as the download. For now, if you could use small files and see if there are any holes in the implementation, that'd be great. Handling invalid files, permissions, ... all the edge cases. I'll work on the other implementation next.

Oh, wait, I bet you're using IOM, not STDIO over SSH. I'll have to see about that, it's not like STDIO. But, I may be able to get the 'reverse' to work, so I'll look into both of those cases. Having to encode the binary into hex chars and reconvert to binary is a horrible way to have to do it, but that was a first pass that got us this far.

Thanks,
Tom

@tomweber-sas
Copy link
Contributor

Ok, I've re-implemented upload in STDIO via sockets. @jasonphillips, are you able to try this out? Do you have Linux? I think everyone else is on Windows and can't try it. BTW, the original implementation is still in there to compare against. You have to go to the access method to call it though, so:

res_sock = sas.upload         ('local', 'remote')
res_slow = sas._io.upload_slow('local', 'remote')

I'm looking at the IOM access method now, and it will require more changes than what STDIO took. I'll have to change the Java code as well as the Python. It'll take some time to work through. But hopefully I can get it working similarly.

Tom
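
For readers who want to picture the socket-based transfer, here is a minimal local sketch of streaming a file through a connected socket in fixed-size chunks. It is not the saspy code: the function names and the 4096-byte chunk size are assumptions, and a `socket.socketpair()` stands in for the real connection between saspy and SAS.

```python
import socket
import threading

CHUNK = 4096  # hypothetical chunk size, chosen for illustration only


def stream_file(path, conn, chunk=CHUNK):
    """Send the raw bytes of a file over a connected socket; return the byte count."""
    sent = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(chunk)
            if not block:
                break
            conn.sendall(block)
            sent += len(block)
    return sent


def receive_all(conn, chunk=CHUNK):
    """Read from a socket until the peer closes it and return everything received."""
    parts = []
    while True:
        block = conn.recv(chunk)
        if not block:
            break
        parts.append(block)
    return b"".join(parts)


def loopback_copy(path):
    """Push a file through a local socket pair and return the bytes that arrive."""
    sender, receiver = socket.socketpair()
    received = []
    reader = threading.Thread(target=lambda: received.append(receive_all(receiver)))
    reader.start()
    try:
        stream_file(path, sender)
    finally:
        sender.close()   # closing the sending side signals end-of-stream
        reader.join()
        receiver.close()
    return received[0]
```

No encoding step is involved, so the bytes on the wire are the bytes of the file.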

@jasonphillips
Copy link
Contributor

Great, I gave it a try, generating some dummy files of exact sizes, and saw the following speeds (reporting "real time" from log):

sas.upload()

#  50k   -  .02 seconds
#  500k  -  .10 seconds
#  1MB   -  .25 seconds
#  5MB   - 1.06 seconds
# 25MB - 5.13 seconds

sas._io.upload_slow()

#  50k   -   .18 seconds
#  500k  -  1.68 seconds
#  1MB   -  3.58 seconds
#  5MB   - 15.53 seconds
# 25MB -  1.26 minutes

Both look like linear scales, but indeed the socket method is about 10-15x faster.

I did seem to be having some issues with the socket not being freed up immediately afterward, although haven't investigated thoroughly yet. Just after an upload, any calls that use a socket (uploading another file, or using sd2df() methods) returned the generic socket error; it cleared after waiting for another 30 seconds or so. That doesn't happen for me with many repeated uses of the other socket methods, so perhaps something isn't being freed up in this case.
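
The speedup can be worked out directly from the timings reported above; this short sketch just does the division (the 25MB slow run is converted from 1.26 minutes to seconds).

```python
# (socket upload seconds, upload_slow seconds), taken from the figures above
timings = {
    "50k": (0.02, 0.18),
    "500k": (0.10, 1.68),
    "1MB": (0.25, 3.58),
    "5MB": (1.06, 15.53),
    "25MB": (5.13, 1.26 * 60),
}

speedups = {size: slow / fast for size, (fast, slow) in timings.items()}

for size, ratio in speedups.items():
    print(f"{size:>5}: {ratio:4.1f}x faster")
```

The ratios come out at roughly 9x for the smallest file and 14x to 17x for the rest, consistent with the estimate above.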

@tomweber-sas
Copy link
Contributor

Hey Jason, thanks for verifying that. I am using ephemeral ports for these, so it should use a different port each time and not need to wait for a timeout. Unless, if you are using a tunneling port over SSH, then I have to use that port instead of an ephemeral, and that could be the cause of the delay. I'll dig into this further too to see if I see anything suspicious. I'm going to try to get the IOM access method working first though.
Thanks again,
Tom
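
As a small illustration of the ephemeral-port idea mentioned above: binding to port 0 asks the operating system to pick a free port, so each listener gets its own port. The helper name is hypothetical and this is only a sketch, not the saspy code.

```python
import socket


def open_ephemeral_listener(host="127.0.0.1"):
    """Open a listening TCP socket on an OS-assigned (ephemeral) port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))          # port 0 lets the OS choose a free port
    server.listen(1)
    return server, server.getsockname()[1]


first, port_a = open_ephemeral_listener()
second, port_b = open_ephemeral_listener()
print(port_a, port_b)               # two different port numbers
first.close()
second.close()
```

A fixed tunnel port, by contrast, has to be reused for every transfer, so any lingering socket state on that port affects the next call.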

@jasonphillips
Copy link
Contributor

I am using a tunneling port in my case, so that might explain it; odd that the other methods using the port don't lock it up even with many quick calls in a row, but the file transfer holds it for a bit until another request using sockets can complete.

@tomweber-sas
Copy link
Contributor

Thanks Jason. I just pushed an implementation of binary stream transfer on upload for IOM. It should behave comparably to the download for IOM now, like the up/down for STDIO. I'll look into this STDIO issue next, now that I have the IOM case working.

@chrishales709 @mailbagrahul can you guys try out the new upload for your IOM cases and see if it's working and faster for you? Just like with STDIO, I left the original implementation in there so you can compare. See the comments above for how to run it (sas._io.upload_slow()).

@jasonphillips I have one idea about this delay, given it doesn't happen for the other cases. In all cases except this upload, I'm transferring data from SAS to saspy, and saspy is the socket 'server' (creates and accepts the connection). In this upload case, I'm transferring data the other way, and the socket connection is still the same direction. So, I will try reversing the socket connection to see if that might fix this. It could be that the linger is set when SAS is receiving, not transmitting, since at close, it isn't the one that shut down the socket. I may be able to try this out today and see.

Tom

@mailbagrahul
Copy link

Ok. I tried both cases and I see sas.upload() is much faster (2 MB - 0.25 seconds) than sas._io.upload_slow() (2 MB - 4 minutes).

And download() seems to be taking a long time (more than 3 minutes) to download a 250 KB file.

@tomweber-sas
Copy link
Contributor

@mailbagrahul thanks for trying it out. Something must be wrong w/ your download. It should be very similar to upload. I can download/upload 2M in 1-2 seconds; granted I don't have a significant network delay in these cases. Can you provide any more details on what you're seeing? You are using IOM, right?

Thanks!
Tom

Here's a run with a 2M executable from jupyter:

import time
start = time.localtime()
res = sas.download(r'C:\Users\sastpw\Documents\updown\cprxp_dn', '/u/sastpw/compare/cprxp')
finish = time.localtime()
print(res['Success'])
print(res['LOG'])

True
51                                                                                                                       The SAS System                                                                                         07:55 Thursday, January 17, 201
9

646        
647                 data _null_;
648                 infile '/u/sastpw/compare/cprxp'         recfm=F encoding=binary lrecl=4096;
649                 file _tomods1 recfm=N;
650                 input;
651                 put _infile_;
652                 run;

NOTE: The infile '/u/sastpw/compare/cprxp' is:
      Filename=/u/sastpw/compare/cprxp,
      Owner Name=sastpw,Group Name=r&d,
      Access Permission=-r-xr-xr-x,
      Last Modified=10Jan2019:16:43:19,
      File Size (bytes)=2075894

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file _TOMODS1 is:
      Filename=/sastmp/SAS_workEB8500000A40_tom64-3/SAS_work388A00000A40_tom64-3/_tomods1,
      Owner Name=sastpw,Group Name=r&d,
      Access Permission=-rw-r--r--,
      Last Modified=17Jan2019:08:00:38

NOTE: 507 records were read from the infile '/u/sastpw/compare/cprxp'.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              364.37k
      OS Memory           21664.00k
      Timestamp           01/17/2019 08:00:38 AM
      Step Count                        275  Switch Count  0
      Page Faults                       1
      Page Reclaims                     16
      Page Swaps                        0
      Voluntary Context Switches        49
      Involuntary Context Switches      31
      Block Input Operations            440
      Block Output Operations           4056
      

653        
654        
52                                                                                                                       The SAS System                                                                                         07:55 Thursday, January 17, 201
9

655        

print(start)
print(finish)


time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=0, tm_sec=37, tm_wday=3, tm_yday=17, tm_isdst=0)
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=0, tm_sec=38, tm_wday=3, tm_yday=17, tm_isdst=0)

_up
start = time.localtime()
res = sas.upload(r'C:\Users\sastpw\Documents\updown\cprxp_dn', '/u/sastpw/compare/cprxp_up')
finish = time.localtime()
print(res['Success'])
print(res['LOG'])
True
55                                                                                                                       The SAS System                                                                                         07:55 Thursday, January 17, 201
9

693        
694        
695                    data _null_;
696                    infile _tomods1 recfm=F encoding=binary lrecl=4096;
697                    file  '/u/sastpw/compare/cprxp_up'                  recfm=N permission='';
698                    input;
699                    put _infile_;
700                    run;

NOTE: The infile _TOMODS1 is:
      Filename=/sastmp/SAS_workEB8500000A40_tom64-3/SAS_work388A00000A40_tom64-3/_tomods1,
      Owner Name=sastpw,Group Name=r&d,
      Access Permission=-rw-r--r--,
      Last Modified=17Jan2019:08:01:20,
      File Size (bytes)=2075894

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file '/u/sastpw/compare/cprxp_up' is:
      Filename=/u/sastpw/compare/cprxp_up,
      Owner Name=sastpw,Group Name=r&d,
      Access Permission=-rw-r--r--,
      Last Modified=17Jan2019:08:01:20

NOTE: 507 records were read from the infile _TOMODS1.
NOTE: DATA statement used (Total process time):
      real time           0.24 seconds
      user cpu time       0.00 seconds
      system cpu time     0.02 seconds
      memory              360.68k
      OS Memory           21408.00k
      Timestamp           01/17/2019 08:01:20 AM
      Step Count                        277  Switch Count  0
      Page Faults                       1
      Page Reclaims                     18
      Page Swaps                        0
      Voluntary Context Switches        43
      Involuntary Context Switches      31
      Block Input Operations            440
      Block Output Operations           4056
      

701        
702        

print(start)
print(finish)
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=1, tm_sec=19, tm_wday=3, tm_yday=17, tm_isdst=0)
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=1, tm_sec=21, tm_wday=3, tm_yday=17, tm_isdst=0)
tom64-2> ll cprxp cprxp_up
-r-xr-xr-x 1 sastpw r&d 2075894 Jan 10 16:43 cprxp
-rw-r--r-- 1 sastpw r&d 2075894 Jan 17 08:01 cprxp_up
tom64-2> diff cprxp cprxp_up
tom64-2>
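
Since `time.localtime()` only resolves to whole seconds, a finer-grained measurement may be useful when comparing runs. The sketch below uses `time.perf_counter()` for timing and a SHA-256 digest as a Python stand-in for the `diff` check; both helper names are hypothetical, and the digest comparison only applies to files readable from the Python side (for example after a download).

```python
import hashlib
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) using a high-resolution clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()
```

Assuming a session object named `sas`, `res, seconds = timed(sas.upload, local_path, remote_path)` would give the wall-clock time of the whole call, which includes network transfer that the SAS "real time" figure in the log does not necessarily cover.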

@tomweber-sas
Copy link
Contributor

Ok, for STDIO I was able to reproduce that 30 second delay when tunneling, and only when tunneling. STDIO not over SSH and STDIO over SSH w/out a tunnel have no delay, as they just use ephemeral ports, which aren't reused; so no problem. In the SSH w/ tunnel case, the delay was only after an upload, not after a download or the other SAS data to data frame methods (which use the socket). This is all as @jasonphillips described.

This did turn out to be a case of the socket being closed in the opposite direction (sequence) compared to all of the other cases (like I suspected). I reworked the upload implementation for this case to have SAS create the socket and accept a connection from saspy (the opposite of the other cases), so that the shutdown sequence was in the right direction. The side that did the connect (not the accept) closed down, and then the creator of the socket (accept) shut down, which eliminated the linger delay.

There was a problem with this, however, in that there was a delay before the SAS side accepted the connection from saspy. The connect succeeded immediately, because it was really connecting to ssh, but then writing data to SAS would end up failing when the buffer got full, because the connection hadn't been accepted on the SAS side. When I put a delay in between submitting the SAS code and doing the connect, it worked, but that's an arbitrary delay and I don't care for that implementation.

So, I currently just connect and start writing, but catch that exception which happens (if it happens, it could connect first try - it's all timing), and start over. So far I haven't seen this fail, and it succeeds as fast as it can; no arbitrary amount of time to wait.
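
The connect, write, and start-over pattern described above might look roughly like the following. This is a sketch under assumptions, not the saspy code: the function name, the attempt limit, and the short pause between attempts (added here only to avoid a busy loop) are all hypothetical.

```python
import socket
import time


def send_with_retry(host, port, payload, attempts=50, pause=0.05):
    """Connect and send the whole payload, starting over if the connection fails.

    Returns the number of attempts that were needed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=5) as conn:
                conn.sendall(payload)        # resend from the beginning each time
            return attempt
        except OSError as err:               # refused, reset, broken pipe, timeout
            last_error = err
            time.sleep(pause)                # assumption: brief pause, not a fixed wait
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Because each retry resends everything from the start, the receiving side has to discard whatever arrived on a failed attempt.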

The one thing I had to add for this to work is a reverse tunnel port. I can't use the tunneling port for the reverse server socket. So, I added 'rtunnel' : portnum to the configuration definition and key off of that to do this reverse server upload. If rtunnel isn't there, it's the usual client socket case like all of the others.

Here's a sample:

sshrtun  = {'saspath' : '/sas/tools/com/sdssas',
            'ssh'     : '/usr/bin/ssh',
            'host'    : 'tom64-2', 
            'tunnel'  : 32701,
            'rtunnel' : 32702,
            'options' : ["-t", "dev/mva-v940m3", "-box", "laxnd"]
            }

So, what's at the upload-download branch, as of now, has upload() for both IOM and STDIO, and STDIO over SSH (tunnel and not) all working and as fast as they can be. I see no delays anymore except for the case where you have ssh and tunnel but not rtunnel, which I can't help. That still works, but you have to wait 30 seconds before the next socket method.

Feel free to try this out and let me know what you see. I've tried it as many ways as I can and it looks good so far.

Thanks!
Tom

@tomweber-sas
Copy link
Contributor

Hey everyone (@chrishales709 @mailbagrahul @jasonphillips), have you had a chance to try out the latest versions? I've been off on other things myself, so I haven't really messed with this since my last post (which it says was 18 days ago). Everything was working for me, but I'd like to verify that you're seeing the same for your cases before merging this into master. If so, I will merge this in. If there are any issues you observe, I'd like to address them before merging.
Thanks!
Tom

@tomweber-sas
Copy link
Contributor

Hey everyone, I've gone ahead and merged this in, and I've created a new pip and release, V2.4.2, which contains this and other things that were at master. I'm thinking of closing this issue, and if you run into anything with the code that's now out there, we can start a fresh issue for that.

Thanks,
Tom

@tomweber-sas
Copy link
Contributor

Closing this issue as all of this functionality is in the current release. I did implement some performance enhancements for the IOM access method for this which were pushed to version 2.4.3. If you find any issues, just open another issue with the specifics.

Thanks!
Tom


5 participants