SAS IOM Interface Using Windows COM #228
Added tests to validate SAS IO related functionality. Currently, the only tests written are to check for IO object compliance. That is, the tests make sure that each IO object has the required methods defined so that all objects expose a coherent interface. This was particularly tricky when defining the COM module as some private functions need to be exposed publicly (`_asubmit`). Additional tests would be useful.
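A minimal sketch of what such a compliance check could look like. The method list is an assumption made for illustration (only `_asubmit` is named above as a required "private" method), and the two stand-in classes are hypothetical; this is not the test code from the PR.

```python
# Hypothetical list of methods every IO object is expected to expose.
# `_asubmit` is named in the description above; the rest are assumed.
REQUIRED_METHODS = ("submit", "saslog", "read_csv", "write_csv", "_asubmit")


def missing_methods(io_obj, required=REQUIRED_METHODS):
    """Return the names in `required` that `io_obj` lacks or that are not callable."""
    return [name for name in required
            if not callable(getattr(io_obj, name, None))]


class CompleteIO:
    """Stand-in IO object that defines every required method."""
    def submit(self, code): ...
    def saslog(self): ...
    def read_csv(self, path, table): ...
    def write_csv(self, path, table): ...
    def _asubmit(self, code, results="text"): ...


class PartialIO:
    """Stand-in IO object that is missing the 'private' helper."""
    def submit(self, code): ...
    def saslog(self): ...
    def read_csv(self, path, table): ...
    def write_csv(self, path, table): ...
```

A test can then assert that `missing_methods(obj)` is empty for each IO class, which reports every gap at once rather than stopping at the first missing attribute.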
1. Resolve issue where any invalid SAS syntax put the LanguageService into an error state. For instance, the following would return an ERROR in the log, but also cause strange behavior upon issuance of additional commands: `data foo; set bar run;`. The missing semicolon after `set bar` would cause LanguageService token scanning to totally explode. With a reset command issued, the scanner no longer enters this state.
2. Missing `nosub` arguments in `read_csv` and `write_csv` were causing some tests to fail. Add support for this argument as well.
3. Resolve issue where writing a dataframe with datetime values was setting the improper date format (DATETIME20. instead of E8601DT26.6). After some testing, it appears that ADODB can support writing this format, and the original issue was likely due to the tired programmer.
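For reference, the E8601DT26.6 format in item 3 is 26 characters wide with six fractional digits, which matches Python's ISO 8601 output with microseconds. The sketch below only illustrates that text shape; the helper name is hypothetical, and how the PR actually hands values to ADODB is not shown in this thread.

```python
from datetime import datetime


def to_e8601dt(value: datetime) -> str:
    """Render a naive datetime as 'YYYY-MM-DDTHH:MM:SS.ffffff' (26 characters)."""
    # timespec="microseconds" keeps the fractional part even when it is zero;
    # the default isoformat() would drop it and produce only 19 characters.
    return value.isoformat(timespec="microseconds")


stamp = to_e8601dt(datetime(2019, 3, 1, 12, 34, 56))
```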
Hey @hhubbell that's pretty cool! I haven't had a chance to dig through it all or try it, but at a glance, it sounds cool.

For the read/write_csv methods, those are server side, so that's no issue. They are just proc import/export, so the filesystem path has to be accessible to the server. I recently added upload/download methods so that you can upload a client side file and then access it on the server, or you can do write_csv() followed by download() to write it out w/ SAS and then pull it to your client; same for the other direction. The up/down is a binary transfer, so any kind of file will work. You can transfer SAS Data Sets if you want.

A couple things right off the top of my head are support for this, both w/ issues and with enhancements. Also, dependencies; saspy isn't dependent on anything that's not in the standard library, and it can't be dependent on windows or linux specific things, as it can be installed and run on the various platforms with only 1 of any of the access methods. It's not even dependent on pandas, as you don't have to have that to still be able to use 90% of the functionality. So, this would be an optional access method to use (they all are), and saspy should install and run w/out any of its dependencies, unless, of course, you are trying to use it, then you obviously need them installed for it to work. I don't see any issue with this. They would be runtime dependencies if you try to run it this way, but not be required to install on any platform if using other access methods.

The other thing just skimming over it is the host/port and user/pw are the same as the current IOM access method, since they are IOM, just a different client library, that only works on windows. So, I think those config parameters should be the same as the java IOM client access method; iomhost/iomport, omruser/omrpw.

So, I'll try to look into this further and see what else I see. Seems like a cool thing. Thanks,
Hi Tom, Thanks for your feedback.
Great. That was my understanding as well, but I wasn't sure if it worked differently with the Java IOM client.
I plan to provide support as needed going forward. I enjoy working on this project because I like Python and I think the library is useful for my professional work, but this work is a side-project for me. I'll do my best to contribute in a timely manner if there's a critical issue, but it will always come third to my paying job and my sanity :) I think this is a great library and I hope to continue working on it.
Right, the IOMCOM access method does require an additional Python library to interact with the COM API. The library is In extras_require = {':platform_system == "Windows"': ['pypiwin32']} In if platform.system() == 'Windows':
from win32com.client import dynamic
Noted. I will update. |
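A hedged sketch of how the platform-gated import quoted above can be kept a pure runtime dependency, so that importing the package never fails on machines without the COM library. The helper name and the `COM_AVAILABLE` flag are hypothetical, not names from the PR.

```python
import platform


def load_com_dispatcher():
    """Return win32com's `dynamic` module on Windows, or None when unavailable."""
    if platform.system() != "Windows":
        return None
    try:
        # Only attempted on Windows; the package is optional even there.
        from win32com.client import dynamic
    except ImportError:
        return None
    return dynamic


dynamic = load_com_dispatcher()
COM_AVAILABLE = dynamic is not None
```

The COM access method can then raise a clear error at connection time when `COM_AVAILABLE` is false, while the other access methods stay unaffected.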
I don't mind helping with support either, I have been using the COM api in python for years, just never had the time to contribute like @hhubbell has (well done!). I think that it may be the right time to consider moving the different connection types/io types into their own plugins to SASPy. It would cleanup the dependencies problem, at the very least. |
@FriedEgg thanks for offering to help out with this, greatly appreciated. Thanks! |
Well I got the following error doing the install. And, the more specific question I have is why it appeared to try to install pypiwin32, presumably from the extras_require??? It references saspy as why it was trying to install it (if I read this right). I thought it wasn't supposed to install that unless I added it to the install line like And, since it couldn't install that, for other reasons (I don't really care the specific reason), saspy install failed. That's what I don't want, as I didn't ask for the extra requirement. If I was only going to use the existing IOM module, I don't need it trying to install win32 and now I can't install saspy. That's one of the issues I have with adding extra optional requirements in a way that isn't optional. Does the extras_require in this version of your setup.py just need to be coded differently than it is now, and then it won't try to install that unless I ask for it? That's what I thought you were describing. If it can work that way, then that's cool. Thanks!
Here's the full log:
|
Ah, yep, now that I look at the syntax of it, I changed the line to be
and it installs w/out the win32 install and succeeds, and when I install with [win32] added, it tries to install it (and fails, but that's not the problem). So cool, that does work as you described. Now I can test out the access method once I configure it! |
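For readers following along: in setuptools, an `extras_require` key that starts with `:` has an empty extra name and only an environment marker, so it is treated as a conditional dependency and installed automatically whenever the marker matches. A key with a name is opt-in. The sketch below illustrates the difference; the `win32` extra name is an assumption taken from the `[win32]` mentioned above, since the changed line itself is not shown in this thread, and the helper functions are hypothetical.

```python
def split_extra_key(key):
    """Split an extras_require key of the form 'name:marker' into (name, marker)."""
    name, _, marker = key.partition(":")
    return name.strip(), marker.strip()


def is_conditional_dependency(key):
    """True when the key has no extra name, i.e. it is not an opt-in extra."""
    name, _ = split_extra_key(key)
    return name == ""


# Form quoted earlier in the thread: installed on Windows without being asked for.
conditional = {':platform_system == "Windows"': ["pypiwin32"]}

# Assumed opt-in form: only installed via `pip install saspy[win32]`.
opt_in = {"win32": ["pypiwin32"]}
```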
Well, I'll be. I got it to work and it connected and was successfully working with many methods! Good job! That was run on a local system, so the cells (like read/write_csv) have local paths, so they didn't run. I'll try them out another time. I have an abend in df2sd() and I don't get the log from saslog(). Of course upload/download aren't found, and the html that comes back needs a couple tweaks I put in way back when to keep the formatting in Jupyter from getting out of whack. But, as a first pass, it's really pretty functional and works! That's cool. Here's an html (download and remove the .txt. to render it; this fool thing won't let you attach .html files). Again, this is a first pass and I haven't tried out lots of stuff. It will certainly need some tweaks and enhancements, but it looks pretty good to start with! Tom |
Just want to say thanks to @hhubbell for all of this work. I'm glad to see it happening, as then anyone with EG (or the SAS Integration Technologies client, which is a free download from SAS support) will be able to leverage saspy without hunting for Java jars. Great stuff! |
I've made the following updates:
@tomweber-sas, can you point me to the commit you are referencing when you say:
I can't figure out what change you're suggesting.
Can you provide an example of the dataframe that causes this? I'm not able to reproduce the error. |
Hey Harry, that's great. I'll pull this and run it through some paces. The exception in df2sd is in that html I attached above; SGF_SuperDemo1_HarryCOM.html.txt (download and delete '.txt' off it so it's '.html' and you can view it). Here's the short of it though; just the cars data set back and forth:
It's just round tripping the cars data set. I don't know if it could be a version issue with the win32 module I have? See if you can do the same w/out this or if you get it too. I'll dig up the tweaks to the html and send you that. Thanks! |
I'm confused at how those two posts from me got out of order and the first one (second now) says: ??? |
It looks like the The HTML formatting changes were also applied. |
Cool, thanks Harry. Yes, it smelled like a missing values issue. The age old SAS I/O conversion issue; been dealing with that for 30+ years now myself :) |
Got a new error with this. Did this change with the version of Pandas? I see it in some doc but not others.

AttributeError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasbase.py in df2sd(self, df, table, libref, results, keep_outer_quotes)
C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasbase.py in dataframe2sasdata(self, df, table, libref, results, keep_outer_quotes)
C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasiocom.py in dataframe2sasdata(self, df, table, libref, keep_outer_quotes)
C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasiocom.py in (x)
AttributeError: module 'pandas' has no attribute 'isna'
Which version of pandas do you have installed on your system? The pandas I have installed supports https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.isna.html EDIT: Ah, according to their git history it was at least 2 years ago. |
FYI: C:\saspy-hh_win32_com>pip show pandas C:\saspy-hh_win32_com> |
`isna` is an alias for `isnull`, added to the pandas library in October, 2017.
Ok, looks like we can use
|
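The thread does not show which call was settled on. One common way to stay compatible with both old and new pandas releases is to look the newer name up at runtime and fall back to the older one, sketched here with stand-in objects so it runs without pandas installed. `resolve_callable` and the two namespaces are hypothetical.

```python
from types import SimpleNamespace


def resolve_callable(module, preferred, fallback):
    """Return module.<preferred> if it exists, otherwise module.<fallback>."""
    func = getattr(module, preferred, None)
    return func if func is not None else getattr(module, fallback)


# Stand-ins for an older pandas (only `isnull`) and a newer one (both names).
old_pandas = SimpleNamespace(isnull=lambda x: x is None)
new_pandas = SimpleNamespace(isnull=lambda x: x is None,
                             isna=lambda x: x is None)

isna_old = resolve_callable(old_pandas, "isna", "isnull")
isna_new = resolve_callable(new_pandas, "isna", "isnull")
```

With the real library this would read `isna = resolve_callable(pd, "isna", "isnull")`; calling `pd.isnull` directly is the simpler alternative, since that name exists in both old and new versions.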
cool, and yeah, I know. :) I'm in the same boat trying to make things work w/ old and new "other people's code". |
df2sd now works for me too! How about that saslog()? I'll try out the others like the csv and up/download tomorrow. This is looking really good. Also, we will need a section in the config doc for this, especially with details about how to get the class_id and what 'provider' means or what options there are. |
By class ID, do you mean the values you get from:
|
@tomweber-sas The issue with >>> ll = sas.submit("%put hello world!;")
>>> ll.keys()
['LOG', 'LST']
>>> ll['LOG'] # Got the log automatically on `submit`
'[Entire log message here]'
>>> sas.saslog()  # Nothing left to flush from the server, since we haven't submitted anything else.
We can work around this behavior by having the @cjdinger
Harry, for saslog() you can do what I do in the STDIO interface; I just append each part to continuous log hung off the session. So I just return that for the saslog() call. |
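The approach described here, appending each chunk of log to a running log kept on the session and returning that from saslog(), can be sketched as follows. The class names and the fake log text are hypothetical; this is not the STDIO or COM implementation.

```python
class LogAccumulator:
    """Keeps every log chunk seen so far, in order."""

    def __init__(self):
        self._parts = []

    def record(self, chunk):
        # Ignore empty chunks so flushing an idle server adds nothing.
        if chunk:
            self._parts.append(chunk)
        return chunk

    def full_log(self):
        return "".join(self._parts)


class FakeSession:
    """Hypothetical stand-in for an access method that flushes the log on submit."""

    def __init__(self):
        self._log = LogAccumulator()

    def submit(self, code):
        chunk = "NOTE: ran {}\n".format(code)  # pretend this came from the server
        self._log.record(chunk)
        return {"LOG": chunk, "LST": ""}

    def saslog(self):
        # Return the whole session log, not just what is left to flush.
        return self._log.full_log()
```

Because `submit` still returns only its own chunk, callers that read `ll['LOG']` see the same result as before, while `saslog()` no longer comes back empty.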
Two little things I've found to fix.
And I get the saslog now :) Which I needed to see the non-quoted filename error. Looking good! |
1. Fix filename syntax error in `read_csv`. Wrap in quotes and escape quotes in path.
2. Escape quotes in `write_csv` file path.
3. Support writing to directory in `download`.
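A minimal sketch of the quoting described in items 1 and 2. SAS string literals escape an embedded quote by doubling it, so a path is safe inside single quotes once every `'` becomes `''`. The helper names are hypothetical, and whether the commit uses single or double quotes is not shown here.

```python
def sas_quote(path):
    """Wrap `path` in single quotes, doubling any embedded single quotes."""
    return "'" + path.replace("'", "''") + "'"


def filename_statement(fileref, path):
    """Build a SAS FILENAME statement with a safely quoted path."""
    return "filename {} {};".format(fileref, sas_quote(path))


stmt = filename_statement("csvin", "C:\\data\\o'brien.csv")
```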
Hi Tom, I've resolved the outstanding issues and updated the documentation. |
Hey Harry, that's all looking good. I see you had version 3.1.0 in the doc. I think that's good. I'll go ahead and snap a 3.0.0 version off before merging it in and bump the version to 3.1.0 for this. Can you add just a couple things in the doc.
Thanks, |
Hi Tom, Added. I mention in the documentation that remote connections are supported just because I simply don't know if a local connection would work. There may be some additional changes required to support local connections, or it simply may not be possible. I don't know enough about how a local install differs from a remote install to say for sure - for instance, does a local install expose an open IOM connection port? I only have a remote instance to test against. Thanks!
To enable, you should be able to download the SAS Integration Technologies client -- free from our website on support.sas.com, Demos and Downloads. EG and AMO users will already have it, as will Base SAS on Windows. But "greenfield" users can set themselves up for free -- just need a SAS to connect to. In terms of connecting to local SAS without a port, that's possible with a local COM connection. See my C# example here for guidance. |
@cjdinger, what is the enum value for obServer.Protocol = SASObjectManager.Protocols.ProtocolCom; It looks like this might be the only update needed to support local connections. |
Looks like if this works, we can support both remote and local, like the current IOM access method. Local (requires): Then the code would key off 'provider' instead of 'class_id' because that's required for both cases (in sasbase.py SASconfig)? and the code in the access method would be: Remote (includes this): Local (includes this): All the rest would be the same, just those lines different it seems? If that's all it takes, that would be good to be able to support both cases.
Harry, I changed the following in your access method and it worked:
with that, and changing sasbase SASconfig to get 'provider' instead of 'class_id' as the key to use the sasiocom AM, and the doc changes for local vs. remote (like the existing IOM does), this should be good to go! Also, omruser/pw are also not required for local, so really only 'provider' to specify that and trigger it. Tom
Looks like Let me know if you can get it running with a local install - I don't have access to one so I'm not able to test. |
Perfect timing! |
Cool, though I would make host be the trigger for remote and its absence be local, so it's consistent w/ the current IOM. And for that code, set user/pw to None in the local case, so they are correct on the connection call.
I also like to use '127.0.0.1' instead of 'localhost', only because I've seen some odd cases where the localhost alias isn't set up right. It is for me and I've never had that, but others on occasion. But 127.0.0.1 is always correct. This is really close! It's looking good!
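A sketch of the selection rule suggested in the last two comments: a configured host means remote, no host means local on 127.0.0.1 with user and password set to None. The config key names follow the iomhost/iomport and omruser/omrpw names mentioned earlier in the thread; the function and the returned dictionary shape are hypothetical, not the merged code.

```python
LOCAL_HOST = "127.0.0.1"  # preferred over 'localhost', per the comment above


def resolve_connection(cfg):
    """Decide between a remote and a local connection from a config dict."""
    host = cfg.get("iomhost")
    if host:
        # Remote: pass credentials through as configured.
        return {
            "mode": "remote",
            "host": host,
            "port": cfg.get("iomport"),
            "user": cfg.get("omruser"),
            "password": cfg.get("omrpw"),
        }
    # Local: no host configured, so credentials are deliberately dropped.
    return {
        "mode": "local",
        "host": LOCAL_HOST,
        "port": cfg.get("iomport"),
        "user": None,
        "password": None,
    }
```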
Fixed |
Well, that all works on my system for both local and remote. This looks good to merge in to me! Any last changes? |
Ready! Thanks Tom! |
This is V3.1.0 and is the current version out on pypi, and should soon be building for conda! |
Hi Tom,
Users at our org are provisioned an EG installation on their client machine to interact with the SAS server. The install seems a bit unique in that no Java code is installed anywhere on the client - in fact, I don't think any of the clients have a Java runtime installed at all. This has been a major roadblock to using this library, because we are all on Windows machines (no `STDIO` or `SSH` IO methods). We have some enthusiastic analysts that are interested in using Python in conjunction with pre-existing datasets/macros/etc.

In my spare time over the past few months I've put together an IO module that uses the SAS COM libraries, which get installed with EG, to create a bridge between the client and server. I'd like to contribute this module back upstream in hopes that it may help some other users that do not want to use Java as the bridge. Perhaps a band-aid for #206?
Overview
Creating the bridge is actually fairly straightforward. The implementation details can be found in Chris Hemedinger's blog post Using Windows PowerShell to connect to a SAS Workspace server. There's a little bit of digging required in the MSDN and SAS LanguageService docs for some stuff, but otherwise nothing too fancy I think.
Known Issues
There is one known issue related to client/server file IO that I hope is not a deal breaker. I think this is something that we can fix in the future, but I'd like a second pair of eyes on it. I don't think it's major, as I experience the same issue in SAS EG anyway. This leads me to believe that the behavior is related to a configuration issue, or that it is expected behavior.
Some IO methods such as `read_csv` and `write_csv` that take a file path as a parameter may read or write from the server's file path instead of the client's. Here's an example:

If that file exists on the SAS server, that file will be read instead. If it does not exist, an error is returned. Like I mentioned, I can reproduce this using SAS EG, so it may be a config issue at my org or expected behavior. If I reference a file on a shared drive, it works fine (both EG and this IO module read/write without issues).
Tests
This module passes 109/113 tests. Five tests are skipped; three are errors that exist regardless of IO method used (they are attempting to skip the tests anyway), and two tests are skipped outright. I introduced a few checks that validated the proper methods were defined. This was a pain point during development as some "underscore" methods are used publicly. They aren't really "unit tests," but they just check to see if the API is consistent.
The four tests that fail are due to the known issue described above. Both `test_read_csv` and `test_write_csv` fail due to the file path, as well as `regScoreAssess` and `regScoreAssess2`, which attempt to write to a temporary file. These show up with unrelated errors during testing as a result.

Please let me know if you have any questions or concerns.
Thanks!