Upload Files to SAS Server #187
Comments
I'm on vacation this week. I'll look into this when I get back after the holiday. |
@chrishales709 I believe both of these are doable. Just looking at the functions to get file information and, of course, they vary per OS. As a first thought, I'm thinking of adding another method to get the various pieces of info, given a file path. See the different items you can get per OS here: As for uploading a file, that can be done. Of course you need authority to create files wherever that SAS server is running, no magic here. But it shouldn't be hard to do. I will need to think through various use cases for this though to be sure this is useful for multiple cases. Binary transfer? Character w/ or w/out transcoding? ... And then a download too? What are your thoughts? |
@tomweber-sas , |
In my use case, users download the file, make changes on different tabs (assume it contains more than 100 tabs), and upload the file back to the SAS server for various purposes (it cycles through many edits). |
This is great. Thanks Chris and Tom. I combine local and third party data for reporting. After getting the third party data via API, I upload the api df to SAS Server to finish the saspy script. It would be nice to seamlessly add the api df within a |
@chrishales709 I have an implementation for getting file information coded up. Here's an example showing this for my saspy directory (current dir - '.'). I get the list of files from the dirlist() method then iterate over them getting the file info for each file (excluding any directories). The file information is returned in a dataframe that you can interrogate at will. I returned it like that cuz I just did the implementation for the member list of tables for a libref for issue 182 and this was very similar. Let me know if a dataframe isn't what you want, and I'll see if I can convert it to something else. I'm not much of a dataframe programmer :)
And, here's just grabbing one:
Thoughts? |
I think I'll change this to return a dictionary like I was thinking in the first place. Trying to navigate the df to get values isn't very clean. I can have it return either if you want; I'd add a results=['df' | 'dict'] parameter. I think a dict just makes more sense for this one. |
Ok, got a dict being returned. Here's what it's like:
|
@chrishales709 I pushed this code to master so you can try it out. I ended up implementing it to return the dictionary. If you want a dataframe, just specify results='pandas' like this:
I did the same with list_tables() method from #182 where I return a list of tuples (memname, memtype) by default now, but you can get it as a dataframe w/ results='pandas' on the list_tables() invocation. Let me know how it works for you. Next thing on the list will be up/download of files. That may take a bit longer :) Thanks! |
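A minimal sketch of the pattern described above (list a directory, skip sub-directories, collect file info per file). It assumes `sas` is an already-connected saspy session, and that `dirlist()` marks directory entries with a trailing path separator; that last detail is an assumption here, not something stated in this thread. The helper names are illustrative only.

```python
def describe_remote_file(sas, path, as_frame=False):
    """Return file_info() output for `path`: a dict by default, a DataFrame on request."""
    if as_frame:
        return sas.file_info(path, results='pandas')
    return sas.file_info(path)


def regular_files(sas, directory, sep='/'):
    """Yield (full_path, info_dict) for each non-directory entry of `directory`.

    Assumption: dirlist() returns bare names and appends `sep` to directories.
    """
    base = directory.rstrip(sep)
    for name in sas.dirlist(directory):
        if name.endswith(sep):      # skip sub-directories
            continue
        full_path = base + sep + name
        yield full_path, describe_remote_file(sas, full_path)
```

Which keys appear in the returned dictionary depends on the host the SAS server runs on, so code that reads a specific key should check that it is present first.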
@jpf5046 Can you explain the comment
a little more? Maybe an example of what you're looking to do? |
@tomweber-sas I tested the file_info method, and it looks great. I did notice, however, that the 'Filename' value in both the dictionary version and dataframe version did not include the full path. For example, |
Hey Chris, that's curious. I don't see that for either case. Can you send the saslog from after running that?
For the default case (dict) you should see the info in the log like:
Here's one file info I get. I don't see any kind of truncation. Maybe we'll see something in your log.
|
BTW, I see your path was linux, but I also tried this on windows and I'm not seeing truncation either. I do see that the default for displaying the dataframe truncates the column, but that's only a display thing, the whole value is actually there. Could it be something like that, where it's just not displaying it? |
Here's a better example, I have a file on my desktop that python reads, Is there a way to take my local
...where work.df is the file from my desktop? |
Oh, yes, that's been in saspy since day 1. It's the dataframe2sasdata() method; df2sd() for short.
Here's a run doing this:
|
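For readers without the run above in front of them, here is a rough sketch of the df2sd() flow: push a local DataFrame to the server as work.df, then submit SAS code that reads it. It assumes `sas` is a connected saspy session and `df` is a pandas DataFrame; the function name and the default table name are illustrative, not part of saspy.

```python
def run_on_local_frame(sas, df, sas_code, table='df', libref='work'):
    """Copy `df` to the SAS server as libref.table, then run `sas_code` against it.

    Returns whatever submit() returns (a dict holding the log and the listing).
    """
    # df2sd() is the short name for dataframe2sasdata()
    sas.df2sd(df, table=table, libref=libref)
    return sas.submit(sas_code)
```

A call might look like `run_on_local_frame(sas, pd.read_csv('desktop_file.csv'), 'proc means data=work.df; run;')`, where the CSV name is a placeholder.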
@tomweber-sas I was using the syntax incorrectly. Thank you for providing the example! I'm all set. |
@jpf5046 Great. Just open another issue if you have any other questions! |
I think I found what may be causing the error. My SAS setup has really long path names (ex: 150+ characters). First, it looks like the SAS code is limiting the length of infoval on lines 1286 and 1313. The length of infoname and infoval are set to 60, so it would cut off any long path names. I tried setting the length to 500 on these lines, and that appears to have fixed the issue on the SAS side. The SAS log now shows the full value of infoval. However, infoval covered multiple lines in the log, so it causes an issue for line 1341 where the value is parsed out of the log. My infoval looked like this in the log:
As a result, I'm actually getting |
Oh, of course, that's cut-n-pasted right out of the SAS example doc for this. I didn't even see it looking at it :( |
Sorry for the late response. I've been out of town. I tested the fix for both dictionaries and data frames, and everything looks good. Thanks! Are you still working on the upload/download piece? |
No problem. Thanks for verifying! |
Hey, I've got an upload implementation working, both STDIO and IOM. Just did it, so it certainly needs more testing and such. But, it works for the cases I've tried. It's a binary transfer, or an image copy, if you will. File permissions is something that still needs to be addressed. Right now, it's all defaults. If you have a chance, try it out from there. I'll continue on it and see about the equivalent download next. It's not the fastest thing in the world, but for having to do it all w/ python and SAS code, it isn't too bad.
Tom |
Ok, I added in the permission= option. I'm afraid it's just the exact string the Filename statement wants. But, that's the same on Unix and Windows; the portable syntax is documented in the Filename statement section of each host guide: Here's the one for the executable file, showing the resulting permissions:
Let me know what you find! |
How big is the file? I haven't tried anything significantly large. I just pushed a fix for 0 length files, which would hang (run indefinitely). Try something small first to see if it works? |
I tried with 2Mb file. Let me try with 1kb file. |
Well, I'm not sure exactly. I also tried to write to something that wasn't valid after I saw your first problem.
That's a different error. But, obviously, you have to have permission to create the file you're trying to create. There's no magic about doing this. If you can't submit the equivalent code to create a file from the SAS server, you won't be able to do it via saspy, as I'm just submitting SAS code. What's that path? is it a valid file, or is it an existing directory, such that it's not a valid file to create? I can't say off the top of my head why you would get that specific error. I haven't seen that error in anything I've tried so far. Tom |
Aha. Yes, I get that error when I specify a directory. I guess I could add support for accessing the target and seeing if it's a directory, then get the file name from the source and use that. But, for now, just specify the file name and see if it's working like you think.
|
I was able to upload a small file from Windows to a Linux server without any issue. I was also able to upload a slightly larger csv file (23 KB). This works for my use case. The only feedback I would have would be to replace the log print out with a message (ex: 'Finished uploading example.sas (xx sec)' or 'Unable to upload example.sas'). |
Great, thanks. I'll see about changing up what's returned. This needs to be able to be interrogated programmatically afterward to see if it succeeded or not. I'm thinking of a dict w/ a status and the log segment, not unlike what's returned in batch mode, or from submit(). That way you can test it and you have the log to see what happened if it failed. Also, I have an initial implementation of download pushed to the upload-download branch now too. Same deal w/ the return there for now. And, I have some optimizations to do and error handling, like in upload. But, it's working and if you want to try it, that'd be great. Thanks, |
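One way the two ideas above could fit together, as a sketch: upload() hands back a dict with a status and the log, and a thin wrapper turns that into the one-line message suggested earlier. The key name 'Success' is an assumption for illustration (the thread only says "a status and the log segment"), and `upload_with_message` is a hypothetical helper, not a saspy method.

```python
import os
import time


def upload_with_message(sas, local_path, remote_path):
    """Upload a file and return (message, result) instead of printing the raw log."""
    started = time.perf_counter()
    result = sas.upload(local_path, remote_path)
    elapsed = time.perf_counter() - started

    name = os.path.basename(local_path)
    if result.get('Success'):           # assumed key name for the status flag
        message = f"Finished uploading {name} ({elapsed:.1f} sec)"
    else:
        message = f"Unable to upload {name}"
    return message, result
```

Returning the full result alongside the message keeps the log available for anyone who needs to see why a transfer failed.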
Out of curiosity, why couldn't the STDIO implementation use a socket connection on a local port to stream the file to SAS, the way it currently handles downloads but in reverse? |
Hey @jasonphillips , So, to get better performance by not having to convert to hex string and informat that back to real binary in SAS? Well, that's a good idea. I don't expect there's a reason that can't work. I guess I was just following the pattern of df2sd() and sd2df(), where df2sd could work w/ the STDIO channels and didn't require special support that might not be available (ability to open sockets between the two machines). Hey, while I have you, did you see the new saspy_examples repo? I copied your tabulate notebook there, but wanted to see if you wanted to push it there yourself (PR) so it had your id as the contributor there. I was going to delete it from the saspy repo then. Thanks for this idea, it should help out performance! |
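To make the cost of the hex approach concrete, here is a small standalone sketch of encoding binary data as hex text records and decoding it again. It is not saspy's code; the record size and function names are arbitrary choices for illustration. The point is that every byte becomes two characters on the wire, before any parsing cost on the receiving side.

```python
import binascii


def to_hex_records(data, bytes_per_record=64):
    """Split binary data into text records, each holding hex for one chunk."""
    return [
        binascii.hexlify(data[i:i + bytes_per_record]).decode('ascii')
        for i in range(0, len(data), bytes_per_record)
    ]


def from_hex_records(records):
    """Rebuild the original bytes from hex text records."""
    return b''.join(binascii.unhexlify(record) for record in records)
```

A socket transfer avoids both steps by sending the raw bytes as they are.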
Here's what I'm thinking for what's returned from upload and download. Oh, and you can see the dest is a directory, so the source file name is used for the dest file name (in the log):
Thoughts? |
I haven't pushed those last things yet. I'm still working on them. The output I showed was still just from my development repo. Once I finish it up I'll push it out for you to try. Note that's just the log that was returned, not the Dictionary I'll be returning. Thanks, |
Ah. Your example looks good and the format is also good. I was about to ask whether it's possible to have a similar success key for every function in saspy. In my use case, when I execute any saspy function from a GUI, I would like to show a message to the user (specifically when it fails). Any thoughts? |
Ok, I just pushed these features. Go ahead and try it out and let me know how it works. As for changing the API of all methods in saspy, I can't do that. But, there are many methods where you can tell if they worked or failed. Some I couldn't tell either way anyway, so I couldn't say. There are methods for getting SAS automacro variables, which are basically return codes and statuses for SAS code that was submitted, so those would be useful for a number of situations. If you have any specific cases, I can look at them to see what can be done. Happy to do that. Also, the Batch mode might help out in this case. It returns a dict of LOG and LST, like submit(), so you may be able to use that to accomplish what you need. For instance, for a given method, if the LST is empty, that may mean it failed, or you can check the log for a known error that proves it worked or didn't. Tom |
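A sketch of the log-checking idea for a GUI that needs a pass/fail answer. It relies only on submit() returning a dict with 'LOG' and 'LST'; treating any log line that starts with "ERROR" as a failure is a simplifying assumption, and `run_and_check` is a hypothetical helper rather than part of saspy.

```python
def run_and_check(sas, code):
    """Submit SAS code and summarize the outcome for display to a user."""
    result = sas.submit(code)
    log = result.get('LOG', '')
    # Assumption: SAS error messages begin their log line with "ERROR"
    errors = [line for line in log.splitlines() if line.startswith('ERROR')]
    return {
        'ok': not errors,
        'errors': errors,
        'log': log,
        'lst': result.get('LST', ''),
    }
```

Whether an empty LST also counts as a failure depends on the code being run, so that check is left to the caller.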
I tested with different file sizes. Here are my findings. |
upload or download or both the same? |
upload. I have to test for download. |
Download seems to be lot better. 300kb took just 3 seconds. |
Then I'll take @jasonphillips' great suggestion and re-implement upload using sockets, which should make it run about the same as the download. For now, if you could use small files and see if there are any holes in the implementation, that'd be great. Handling invalid files, permissions, ... all the edge cases. I'll work on the other implementation next. Oh, wait, I bet you're using IOM, not STDIO over SSH. I'll have to see about that, it's not like STDIO. But, I may be able to get the 'reverse' to work, so I'll look into both of those cases. Having to encode the binary into hex chars and reconvert to binary is a horrible way to have to do it, but that was a first pass that got us this far. Thanks, |
Ok, I've re-implemented upload in STDIO via sockets. @jasonphillips , are you able to try this out? You have linux? I think everyone else is on Windows and can't try it. BTW, the original implementation is still in there to compare against. You have to go to the access method to call it though, so:
I'm looking at the IOM access method now, and it will require more changes than what STDIO took. I'll have to change the Java code as well as the Python. It'll take some time to work through. But, hopefully I can get it working similarly. Tom |
Great, I gave it a try, generating some dummy files of exact sizes, and saw the following speeds (reporting "real time" from log):
Both look like linear scales, but indeed the socket method is about 10-15x faster. I did seem to be having some issues with the socket not being freed up immediately afterward, although haven't investigated thoroughly yet. Just after an upload, any calls that use a socket (uploading another file, or using |
Hey Jason, thanks for verifying that. I am using ephemeral ports for these, so it should use a different port each time and not need to wait for a timeout. Unless, if you are using a tunneling port over SSH, then I have to use that port instead of an ephemeral, and that could be the cause of the delay. I'll dig into this further too to see if I see anything suspicious. I'm going to try to get the IOM access method working first though. |
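The ephemeral-port pattern mentioned above can be illustrated with a loopback-only sketch. This is not saspy's implementation: both ends run in one Python process here, whereas in saspy one end is SAS. Binding to port 0 asks the OS for a free port, so consecutive transfers do not contend for the same port.

```python
import socket
import threading


def receive_all(server_sock, sink):
    """Accept one connection and append everything it sends to `sink`."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:           # empty read: the sender closed its side
                break
            sink.extend(chunk)


def stream_bytes(payload):
    """Send `payload` over a loopback socket bound to an ephemeral port."""
    received = bytearray()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        reader = threading.Thread(target=receive_all, args=(server, received))
        reader.start()
        with socket.create_connection(('127.0.0.1', port)) as client:
            client.sendall(payload)
        reader.join()
    return port, bytes(received)
```

With a fixed tunneling port, the same port number has to be reused for every transfer, which is where a lingering socket from the previous transfer can get in the way.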
I am using a tunneling port in my case, so that might explain it; odd that the other methods using the port don't lock it up even with many quick calls in a row, but the file transfer holds it for a bit until another request using sockets can complete. |
Thanks Jason. I just pushed an implementation of binary stream transfer on upload for IOM. It should behave comparably to the download for IOM now, like the up/down for STDIO. I'll look into this STDIO issue next, now that I have the IOM case working. @chrishales709 @mailbagrahul can you guys try out the new upload for your IOM cases and see if it's working and faster for you? Just like the STDIO, I left the original implementation in there so you can compare. See above comments for running (sas._io.upload_slow()) @jasonphillips I have one idea about this delay, given it doesn't happen for the other cases. In all cases except this upload, I'm transferring data from SAS to saspy, and saspy is the socket 'server' (creates and accepts the connection). In this upload case, I'm transferring data the other way, and the socket connection is still the same direction. So, I will try reversing the socket connection to see if that might fix this. It could be that the linger is set when SAS is receiving, not transmitting, since at close, it isn't the one that shut down the socket. I may be able to try this out today and see. Tom |
Ok. I tried both cases, and sas.upload() is much faster (2 MB in 0.25 seconds) than sas._io.upload_slow() (2 MB in 4 minutes). But download() seems to take a long time (3+ minutes) to download a 250 KB file. |
@mailbagrahul thanks for trying it out. Something must be wrong w/ your download. It should be very similar to upload. I can download/upload 2M in 1-2 seconds; granted I don't have a significant network delay in these cases. Can you provide any more details on what you're seeing? You are using IOM, right? Thanks! Here's a run with a 2M executable from jupyter:
|
Ok, for STDIO I was able to reproduce that 30 second delay, when tunneling, and only when tunneling. STDIO not over SSH, and STDIO over SSH w/out a tunnel, have no delay, as it's just ephemeral ports which aren't reused; so no problem. In the SSH w/ tunnel case, the delay was only after an upload, not because of download or other SAS data to data frame methods (which use the socket). This is all as @jasonphillips described. This did turn out to be a case of the socket being closed in the opposite direction (sequence) compared to all of the other cases (like I suspected). I reworked the upload implementation for this case to have SAS create the socket and accept a connection from saspy (the opposite of the other cases), so that the shutdown sequence was in the right direction. The side that did the connect (not the accept) closed down, and then the creator of the socket (accept) shut down, which eliminated the linger delay. There was a problem with this however, in that there was a delay with the SAS side accepting the connection from saspy. The connect succeeded immediately, because it was really connecting to ssh, but then writing data to SAS would end up failing when the buffer got full because the connect hadn't been accepted on the SAS side. When I put a delay in, between submitting the SAS code and doing the connect, it worked, but that's an arbitrary delay and I don't care for that implementation. So, I currently just connect and start writing, but catch the exception that happens (if it happens; it could connect on the first try, it's all timing), and start over. So far I haven't seen this fail, and it succeeds as fast as it can; no arbitrary amount of time to wait. The one thing I had to add for this to work is a reverse tunnel port. I can't use the tunneling port for the reverse server socket. So, I added 'rtunnel' : portnum to the configuration definition and key off of that to do this reverse server upload.
If rtunnel isn't there, it's the usual client socket case like all of the others. Here's a sample:
So, what's at the upload-download branch, as of now, has upload() for both IOM and STDIO, and STDIO over SSH (tunnel and not) all working and as fast as they can be. I see no delays anymore except for the case where you have ssh and tunnel but not rtunnel, which I can't help. That still works, but you have to wait 30 seconds before the next socket method. Feel free to try this out and let me know what you see. I've tried it as many ways as I can and it looks good so far. Thanks! |
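The "connect, start writing, and start over if the write fails" approach described for the reverse-socket upload could look roughly like the sketch below. It is a generic illustration, not the code in the branch: `open_connection` is a hypothetical zero-argument callable that returns a connected socket, and the `max_attempts` cap is an added assumption so that the loop cannot run forever.

```python
def send_with_restart(open_connection, payload, max_attempts=5):
    """Connect and write; on a socket error, drop the connection and start over.

    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            with open_connection() as conn:
                conn.sendall(payload)   # may fail if the peer has not accepted yet
            return attempt
        except OSError as exc:          # covers broken pipe and connection reset
            last_error = exc
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

With a real socket, `open_connection` could be something like `lambda: socket.create_connection((host, port))`, where `host` and `port` are placeholders. Retrying immediately, instead of sleeping for a fixed time before connecting, is what lets the transfer finish as soon as the other side is ready.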
Hey everyone (@chrishales709 @mailbagrahul @jasonphillips), have you had a chance to try out the latest versions? I've been off on other things myself, so I haven't really messed with this since my last post (which it says was 18 days ago). Everything was working for me, but knowing that you're getting the same for your cases is what I'd like to verify before merging this into master. If so, I will merge this in. If there are any issues you observe, I'd like to address them before merging this in. |
Hey everyone, I've gone ahead and merged this in and I've created a new pip and release, V2.4.2, which contains this and other things that were at master. I'm thinking of closing this issue, and if you run into anything with the code that's now out there, we can start a fresh issue for that. Thanks, |
Closing this issue as all of this functionality is in the current release. I did implement some performance enhancements for the IOM access method for this which were pushed to version 2.4.3. If you find any issues, just open another issue with the specifics. Thanks! |
I'm looking for a way to upload files to the SAS server. I'm also looking for a way to get information on SAS server files (ex: create date, modified date).
The use case I have for this is code deployment. I develop SAS programs locally using the Atom editor. Once I'm done developing and testing, I merge my code into the production branch of the project's git repository. Right now I have to manually copy the production branch files to the SAS server. I would like to develop a python program using saspy to compare production branch files to the files on the SAS server, and then replace outdated files with the newer versions of the file.
First, I was wondering if you could add a method for copying a file to the SAS server. Second, I saw the work done on the dirlist method. I was wondering if you could also return the create date and modified date along with the file name.
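The comparison step of this deployment idea can be sketched independently of saspy. The helpers below are hypothetical: one collects local modification times, the other decides which files to push. How the server-side times are obtained and parsed is left out on purpose, since the timestamp format depends on the SAS host.

```python
import os


def local_modification_times(directory):
    """Map each regular file name in `directory` to its modification time (epoch seconds)."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.stat().st_mtime for entry in entries if entry.is_file()}


def outdated_files(local_mtimes, remote_mtimes):
    """Return names whose local copy is newer than the server copy, or missing there."""
    return sorted(
        name
        for name, mtime in local_mtimes.items()
        if name not in remote_mtimes or mtime > remote_mtimes[name]
    )
```

Each name returned by `outdated_files()` would then be a candidate for copying to the server, assuming both sets of timestamps have been converted to the same units and time zone.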