Interrupt SASSession and clean up properly #592

Closed
nbuck1234 opened this issue Mar 8, 2024 Discussed in #591 · 21 comments

Comments

@nbuck1234

Discussed in #591

Originally posted by nbuck1234 March 7, 2024
I am using the winiom configuration from a remote machine to our SAS server. I happen to be using this with Dagster, if you're curious.

My scenario: once the job is kicked off, I want to be able to cancel it from the remote machine rather than tracking it down on the SAS server.

My code looks like this :

        sas= saspy.SASsession(cfgname= 'winiomwin')
        try:
            #sas.set_batch(True)
            context.log.info("The SAS Process ID is : "  + str(sas.SASpid))
            j=sas.submit('%include "' + program_path + '";')         
            with open(logfile_path, 'w') as f:
                f.write(j["LOG"])
            sas._endsas()
            log= j["LOG"]
        except DagsterExecutionInterruptedError as e:
            sas._endsas()
            
            raise e

When I kill the running job I get the following in my stdout, which is what I expect:

SAS Connection established. Subprocess id is 21244

SAS didn't shutdown w/in 5 seconds; killing it to be sure
SAS Connection terminated. Subprocess id was 21244

In the specific test case above, I got the SAS process ID from the session object:

The SAS Process ID is : 31084

When I logged into the SAS server to see if it was really ended, I see that it is in fact still running, and I must manually end it from there.

What's the difference between the subprocess id from stdout and the SAS process id property of the SASsession?
How can I be sure that the process is cleaned up properly in this scenario?

I hope this question is useful to the community!

@tomweber-sas
Contributor

Thanks for opening this as an issue. From the discussion, just to have my response too:

Hey, yes, I can help with this. Actually, this would be an enhancement request, which would be good to be an Issue. Do you mind changing it to an Issue, or opening it as one? There's a choice to 'reference in new issue' but I think if I click that I'll be the one opening the issue.

The IOM access method uses Java since there's no Python IOM client, so I use the Java IOM client and have an interface between Python and Java to do that. So, the subprocess in this case is the Java process. IOM is an asynchronous interface, so it seems like I may need to be doing something extra that I'm not, to terminate the SAS side, when you terminate the Python side abnormally. I have an idea but need to investigate.

Thanks,
Tom

@tomweber-sas
Contributor

I believe I was able to reproduce this. It only seems to happen if there's a running code submission in SAS that hasn't returned yet. If saspy is terminated when the workspace server is idle, the SAS process terminates when the Java process (IOM client) terminates. Also, I see that if the Java process terminates while there is a running job, once that job finished, the SAS server sees the Client had terminated and the workspace server terminates then. So, it is being cleaned up for this case, but not until whatever code was submitted finishes.
I believe that I may be able to preempt that job in SAS so that in this case SAS will terminate when Java is terminating, same as when there's no running job. That's what I will look into next. Would you mind seeing if you see the same situation, where your workspace server does terminate once the code it was running finishes, just to be sure you're seeing the same behavior that I am?

Also, what version of saspy are you running? A current one? (I don't think it matters, just checking.)

Thanks!
Tom

@nbuck1234
Author

nbuck1234 commented Mar 8, 2024 via email

@tomweber-sas
Contributor

Great thanks! Yes, I believe I will be able to do this. Trying to code it up now. Fingers crossed :)

@tomweber-sas
Contributor

Yep, so much for positivity. So, the submit call (how you submit code to the server) is synchronous, not asynchronous. Still, I have handlers that try to close the connection when the Java side takes an exception, but that's not actually making any difference. This isn't going to be what I was thinking. This is gonna take longer than I thought, but I'm going to see what I can do!
Tom

@tomweber-sas
Contributor

Hey @nbuck1234 I've been working on this and not having much luck catching the termination and trying to cancel the submission to the server. But, one of my attempts, having a separate thread, to try to catch this since the other is blocked in a synchronous wait has allowed me to implement canceling the submission proactively. This is really the implementation I've been missing in both IOM and HTTP access methods. In STDIO I implement the attention handler in SAS (interactive cancel or terminate based upon user response). In the API interfaces (IOM Workspace server and Compute API in HTTP), there is no way to do attention handling like in interactive SAS 9. These API protocols have a 'CANCEL' API call that will terminate whatever code had been submitted.

This would be what you would really want in the first place. Currently if you try to interrupt, you see this (for IOM):
SAS attention handling is not yet supported over IOM. Please enter (T) to terminate SAS or (C) to continue.
But with this new implementation of the separate thread in Java waiting on a separate socket to communicate with it, I've been able to get this CANCEL interface to work. I still have more to do on it; just a prototype at the moment. I'm seeing something strange on the windows side too, but I need more testing on this to be sure it works in all cases. If I can get this working production quality, it'll get you what you need and provide something that's been missing the whole time!
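The general pattern described here, a second thread that can issue a cancel while the first is blocked in a synchronous call, can be sketched in Python. This is only an illustration of the idea, not the saspy implementation (which lives on the Java side and uses a separate socket); `FakeServer` and `submit_with_cancel` are hypothetical names invented for this sketch.

```python
import threading


class FakeServer:
    """Hypothetical stand-in for a server running submitted code (not a saspy class)."""

    def __init__(self):
        self._cancel = threading.Event()

    def submit(self, seconds):
        # Synchronous call: blocks until the "job" finishes or a cancel arrives.
        canceled = self._cancel.wait(timeout=seconds)
        self._cancel.clear()
        return "canceled" if canceled else "finished"

    def cancel(self):
        # Safe to call from another thread while submit() is blocked.
        self._cancel.set()


def submit_with_cancel(server, seconds, cancel_after):
    # A separate thread issues the cancel, since the caller is stuck in submit().
    canceller = threading.Timer(cancel_after, server.cancel)
    canceller.start()
    try:
        return server.submit(seconds)
    finally:
        # Stop the timer if the job finished first, then wait for the thread to exit.
        canceller.cancel()
        canceller.join()


server = FakeServer()
result_canceled = submit_with_cancel(server, seconds=5.0, cancel_after=0.1)
result_finished = submit_with_cancel(server, seconds=0.1, cancel_after=5.0)
print(result_canceled, result_finished)
```

In the first call the cancel arrives while the job is still running; in the second the job completes before the cancel is due, and the server is left ready for the next submission either way.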

As an example, this is what I'm seeing with a simple test case. You can see the proc print ran and we got that output, and the data step that was sleeping 100 seconds got canceled, as I didn't hit ctl-c until after the proc ran and the DS was still running. And, the server was ready to execute the next things it was sent :)

>>> sas.submitLST('proc print data=sashelp.class;run;data a; x=1; call sleep(100,1); run;', 'text', method='listandlog')

^CException caught!
Please enter (T) to Terminate SAS or (C) to Cancel submitted code or (W) continue to Wait.c
Canceled submitted statements


                                                           The SAS System                    Tuesday, March 12, 2024 03:35:00 PM   1

                                          Obs    Name       Sex    Age    Height    Weight

                                            1    Alfred      M      14     69.0      112.5
                                            2    Alice       F      13     56.5       84.0
                                            3    Barbara     F      13     65.3       98.0
                                            4    Carol       F      14     62.8      102.5
                                            5    Henry       M      14     63.5      102.5
                                            6    James       M      12     57.3       83.0
                                            7    Jane        F      12     59.8       84.5
                                            8    Janet       F      15     62.5      112.5
                                            9    Jeffrey     M      13     62.5       84.0
                                           10    John        M      12     59.0       99.5
                                           11    Joyce       F      11     51.3       50.5
                                           12    Judy        F      14     64.3       90.0
                                           13    Louise      F      12     56.3       77.0
                                           14    Mary        F      15     66.5      112.0
                                           15    Philip      M      16     72.0      150.0
                                           16    Robert      M      12     64.8      128.0
                                           17    Ronald      M      15     67.0      133.0
                                           18    Thomas      M      11     57.5       85.0
                                           19    William     M      15     66.5      112.0


7                                                          The SAS System                        Tuesday, March 12, 2024 03:35:00 PM

34
35         proc print data=sashelp.class;run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds


35       !                                   data a; x=1; call sleep(100,1); run;

NOTE: The DATA step has been abnormally terminated.
WARNING: The data set WORK.A may be incomplete.  When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           3.01 seconds
      cpu time            0.00 seconds




8                                                          The SAS System                        Tuesday, March 12, 2024 03:35:00 PM

36

When I get this solid, I can push it to a branch for you to try out, if you can do that. This ought to provide you with what you're looking for!

Thanks,
Tom

@nbuck1234
Author

nbuck1234 commented Mar 13, 2024 via email

@tomweber-sas
Contributor

ok, I've pushed it to a branch named cancel, go figure :) You can simply

pip uninstall -y saspy
pip install git+https://git@github.com/sassoftware/saspy.git@cancel

to get it and try it out. It's behaving pretty well on both Linux and Windows. I need to do more testing and validation, of course, but it's looking pretty promising.

Let me know what you see.

Thanks,
Tom

@nbuck1234
Author

nbuck1234 commented Mar 13, 2024 via email

@tomweber-sas
Contributor

How are you running scripts in 'batch' from jupyter? I didn't try this when running scripts (file interpreter), just interactively. Running batch scripts, you're correct, there wouldn't be a way to interact with this functionality. I don't see either of the pics it looks like you tried to attach. So I can't see what that shows.

The cancel doesn't shut down the SAS process. It cancels the current code that was submitted. The process stays up to continue to be used. I haven't found a way that allows me to cancel the running code when you abnormally kill the Python process. But, even when you do that, the SAS process terminates on its own once it finishes whatever code you were running in it after Python and Jupyter have been killed. It's just that I can't catch that and cancel the running code so it shuts down right away. If you shut down Python normally, even while there is code running, then it will stop and shut down right away.

Ok, I tried this running as a batch script. If you just kill the python process this won't get caught and there's no chance of trying to clean up. If you set prompt=False (in config def or on SASsession) and you interrupt the process instead of killing it (kill -SIGINT pid or ctl-c, but not kill [-9] pid or restart kernel in Jupyter) then my code will catch that and do the 'cancel' implicitly for you which will get you what I expect you want.

Here's my example case of this.

program t2.py

#!/usr/bin/env python3.5

def main():
   import saspy, sys

   cfg = sys.argv[1] if len(sys.argv) > 1 else 'iomj'
   sas = saspy.SASsession(cfgname=cfg, prompt=False)
   print(sas)

   sas.submitLST('proc print data=sashelp.class;run;   \
                  data a; x=1; call sleep(100,1); run; \
                  proc print data=sashelp.class;run;',
                  method='logandlist', results='text')

   print('still here after cancel')

   print(sas)

if __name__ == "__main__":
    main()

batch submitting it:
tom64-5> python3.5 ../misc/t2.py

finding and interrupting (not KILLing) it lets it cancel what's running and finish, coming down clean.

tom64-5> ps -ef
sastpw    6710 30867 38 14:29 pts/0    00:00:01 python3.5 ../misc/t2.py
tom64-5> kill -SIGINT 6710

and the result from the job submission:

tom64-5> python3.5 ../misc/t2.py
SAS Connection established. Subprocess id is 6716

Access Method         = IOM
SAS Config name       = iomj
SAS Config file       = /opt/tom/github/saspy/saspy/sascfg_personal.py
WORK Path             = /sastmp/SAS_workE45A000B4705_tom64-7/SAS_work1C99000B4705_tom64-7/
SAS Version           = 9.04.01M8P01182023
SASPy Version         = 5.6.0
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = utf-8
Python Encoding value = utf_8
SAS process Pid value = 739077


Exception caught!
Canceled submitted statements


7                                                          The SAS System                      Wednesday, March 13, 2024 02:29:00 PM

34
35         proc print data=sashelp.class;run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds


35       !                                                        data a; x=1; call sleep(100,1); run;

NOTE: The DATA step has been abnormally terminated.
WARNING: The data set WORK.A may be incomplete.  When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           4.24 seconds
      cpu time            0.00 seconds

35       !                                                                                                               proc print
35       ! data=sashelp.class;run;



8                                                          The SAS System                      Wednesday, March 13, 2024 02:29:00 PM

36

                                                           The SAS System                  Wednesday, March 13, 2024 02:29:00 PM   1

                                          Obs    Name       Sex    Age    Height    Weight

                                            1    Alfred      M      14     69.0      112.5
                                            2    Alice       F      13     56.5       84.0
                                            3    Barbara     F      13     65.3       98.0
                                            4    Carol       F      14     62.8      102.5
                                            5    Henry       M      14     63.5      102.5
                                            6    James       M      12     57.3       83.0
                                            7    Jane        F      12     59.8       84.5
                                            8    Janet       F      15     62.5      112.5
                                            9    Jeffrey     M      13     62.5       84.0
                                           10    John        M      12     59.0       99.5
                                           11    Joyce       F      11     51.3       50.5
                                           12    Judy        F      14     64.3       90.0
                                           13    Louise      F      12     56.3       77.0
                                           14    Mary        F      15     66.5      112.0
                                           15    Philip      M      16     72.0      150.0
                                           16    Robert      M      12     64.8      128.0
                                           17    Ronald      M      15     67.0      133.0
                                           18    Thomas      M      11     57.5       85.0
                                           19    William     M      15     66.5      112.0

still here after cancel
Access Method         = IOM
SAS Config name       = iomj
SAS Config file       = /opt/tom/github/saspy/saspy/sascfg_personal.py
WORK Path             = /sastmp/SAS_workE45A000B4705_tom64-7/SAS_work1C99000B4705_tom64-7/
SAS Version           = 9.04.01M8P01182023
SASPy Version         = 5.6.0
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = utf-8
Python Encoding value = utf_8
SAS process Pid value = 739077


SAS Connection terminated. Subprocess id was 6716
tom64-5>

And, you can get the same from hitting ctl-c if you're running the script in the foreground:

tom64-5> python3.5 ../misc/t2.py
SAS Connection established. Subprocess id is 6823

Access Method         = IOM
SAS Config name       = iomj
SAS Config file       = /opt/tom/github/saspy/saspy/sascfg_personal.py
WORK Path             = /sastmp/SAS_workE45A000B4705_tom64-7/SAS_work68D2000B4705_tom64-7/
SAS Version           = 9.04.01M8P01182023
SASPy Version         = 5.6.0
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = utf-8
Python Encoding value = utf_8
SAS process Pid value = 739077


^CException caught!
Canceled submitted statements


7                                                          The SAS System                      Wednesday, March 13, 2024 02:38:00 PM

34
35         proc print data=sashelp.class;run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds


NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

35       !                                                        data a; x=1; call sleep(100,1); run;                   proc print
35       ! data=sashelp.class;run;


36
37
38

8                                                          The SAS System                      Wednesday, March 13, 2024 02:38:00 PM

39

                                                           The SAS System                  Wednesday, March 13, 2024 02:38:00 PM   1

                                          Obs    Name       Sex    Age    Height    Weight

                                            1    Alfred      M      14     69.0      112.5
                                            2    Alice       F      13     56.5       84.0
                                            3    Barbara     F      13     65.3       98.0
                                            4    Carol       F      14     62.8      102.5
                                            5    Henry       M      14     63.5      102.5
                                            6    James       M      12     57.3       83.0
                                            7    Jane        F      12     59.8       84.5
                                            8    Janet       F      15     62.5      112.5
                                            9    Jeffrey     M      13     62.5       84.0
                                           10    John        M      12     59.0       99.5
                                           11    Joyce       F      11     51.3       50.5
                                           12    Judy        F      14     64.3       90.0
                                           13    Louise      F      12     56.3       77.0
                                           14    Mary        F      15     66.5      112.0
                                           15    Philip      M      16     72.0      150.0
                                           16    Robert      M      12     64.8      128.0
                                           17    Ronald      M      15     67.0      133.0
                                           18    Thomas      M      11     57.5       85.0
                                           19    William     M      15     66.5      112.0

still here after cancel
Access Method         = IOM
SAS Config name       = iomj
SAS Config file       = /opt/tom/github/saspy/saspy/sascfg_personal.py
WORK Path             = /sastmp/SAS_workE45A000B4705_tom64-7/SAS_work68D2000B4705_tom64-7/
SAS Version           = 9.04.01M8P01182023
SASPy Version         = 5.6.0
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = utf-8
Python Encoding value = utf_8
SAS process Pid value = 739077


SAS Connection terminated. Subprocess id was 6823
tom64-5>

Set prompt=False for your configuration and try this out with your use cases. You won't get prompted and it will do the cancel on its own.

Tom

@tomweber-sas
Contributor

Looking back at your original example code, you're trying to catch an interrupt and issue sas.endsas(). I did add the CANCEL code to endsas() as well, so that it would cancel anything that might be running and then shutdown the SAS session. I tried this case, interactively, not the same as how you had it - I don't know how you're 'killing' the process. But, I was able to run a test case, hit ctl-c and get my prompt, hit ctl-c again and get back to the python prompt, while the submitted code was still running. I then submitted sas.endsas() and the SAS server was immediately terminated; meaning the cancel worked and then the server was shut down cleanly.

So, I'm kinda expecting that first case you showed, which, given the output you showed, seemed as though you did catch that and issue sas.endsas(), which should now actually cancel the running code and shut down the server. Again, set prompt=False in your config since you're gonna be running batch and prompting won't work. Then I think this should do what you want. Again, that's based on what you showed actually executing that sas.endsas() in the catch block.

Tom
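As a rough sketch of the pattern discussed above (submit, then always call `endsas()` so running code is canceled and the server shut down), something like the following may work. It assumes a session created with `saspy.SASsession(cfgname='winiomwin', prompt=False)` on a saspy release that includes this change; `run_program` and `StubSession` are hypothetical names, and the stub exists only so the example runs without a SAS server.

```python
import os
import tempfile


def run_program(sas, program_path, logfile_path):
    """Submit a SAS program and always end the session, even when interrupted.

    `sas` is assumed to be a saspy.SASsession created with prompt=False.
    """
    try:
        result = sas.submit('%include "' + program_path + '";')
        with open(logfile_path, 'w') as f:
            f.write(result['LOG'])
        return result['LOG']
    finally:
        # Runs on normal completion and when an interrupt propagates out of submit().
        sas.endsas()


class StubSession:
    """Hypothetical stand-in for a SASsession; not part of saspy."""

    def __init__(self, interrupt=False):
        self.interrupt = interrupt
        self.ended = False

    def submit(self, code):
        if self.interrupt:
            raise KeyboardInterrupt
        return {'LOG': 'NOTE: ran ' + code, 'LST': ''}

    def endsas(self):
        self.ended = True


log_path = os.path.join(tempfile.mkdtemp(), 'job.log')

ok = StubSession()
log_text = run_program(ok, 'job.sas', log_path)

interrupted = StubSession(interrupt=True)
try:
    run_program(interrupted, 'job.sas', log_path)
except KeyboardInterrupt:
    caught = True
else:
    caught = False
```

Using `finally` rather than catching one specific exception type means the session is ended whichever interrupt exception the orchestrator raises, and the exception still propagates to the caller.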

@nbuck1234
Author

nbuck1234 commented Mar 14, 2024 via email

@tomweber-sas
Contributor

Awesome, thanks! I do want to get this working as best it can!

@nbuck1234
Author

nbuck1234 commented Mar 14, 2024 via email

@tomweber-sas
Contributor

That's great! How are you 'killing' the python processes? As I mentioned, if you 'kill -9 pid' then my code can't issue the cancel or try to then shutdown, as the process just terminates out from under me. But, if you interrupt the process, that would give my code a chance to do this, which is working, so that's great. You must be terminating it in a 'normal' as opposed to 'abnormal' way so the interrupting is happening prior to process shutdown. So that's great!
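The difference between interrupting and killing can be illustrated with plain Python, independent of saspy. An interrupt (ctl-c or `kill -SIGINT pid`) surfaces in Python as a `KeyboardInterrupt`, so cleanup code gets a chance to run; `kill -9` ends the process without running any Python code at all. This minimal sketch assumes Python 3.8+ (for `signal.raise_signal`) and that it runs in the main thread; `long_running_job` is a hypothetical placeholder.

```python
import signal
import time

# Make sure SIGINT raises KeyboardInterrupt, even if the process was
# started with SIGINT ignored (main thread only).
signal.signal(signal.SIGINT, signal.default_int_handler)

cleanup_ran = False


def long_running_job():
    # Simulate an interrupt arriving mid-job, as ctl-c or kill -SIGINT would deliver.
    signal.raise_signal(signal.SIGINT)
    time.sleep(0.5)  # the job would keep running here if it were not interrupted
    return "finished"


try:
    outcome = long_running_job()
except KeyboardInterrupt:
    # This is the window in which a cancel or shutdown can be issued.
    outcome = "interrupted"
    cleanup_ran = True

print(outcome, cleanup_ran)
```

No equivalent handler can be written for `kill -9`, because SIGKILL cannot be caught by the process that receives it.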

@nbuck1234
Author

nbuck1234 commented Mar 14, 2024 via email

@tomweber-sas
Contributor

Cool! I'll need to spend some more time validating this, but I ought to be able to merge it in and build a new release with it after I finish testing it. For the abnormal termination, there's nothing I can do, but the workspace server still goes away as soon as it sees the client terminated. It just doesn't see that while it's executing code that was submitted. As soon as that code finishes, it notices the client is gone and it terminates. So, it's not necessarily the case that there's always some long running job at the point you terminate the client. And, now for any normal case, it'll stop and shut down right away.

Thanks!
Tom

@tomweber-sas
Contributor

Found a couple problems that I need to address. Just better coordination and error handling between the two java threads than I have in the prototype. I've worked that out and pushed those changes. So back to testing and validating again. Just FYI. I'll let you know when I merge this in and build a new release; once I feel it's production quality.

Tom

@tomweber-sas
Contributor

ok, I've pushed all of this and built a new release, v5.7.0, which contains this enhancement. It's here on GitHub, on PyPI and on conda-forge now. Go ahead and grab it and, if you don't mind, verify it's working as expected. Assuming so, we'll close this one!

Thanks!
Tom

@tomweber-sas
Contributor

@nbuck1234 I typed all this in but didn't submit it. So, adding it back just so it's here for explanation. I ended up making one last change to this to limit it to the submit methods (submit(), submitLST(), submitLOG()), and, of course, endsas(), which was your concern.

My concern was that all of the saspy methods generate and submit SAS code. Many have to generate some code, submit it and get the output from that, generate more, and submit that, ...

So for these multi-sequence methods, allowing a user to just kill my code in the middle of what I'm trying to do isn't helpful to anyone, and I really don't want to have to debug those problems that aren't 'real' saspy problems. But killing a single user submit with user code is straightforward and a helpful feature. So, that's what I changed since last you tried this. Again, it shouldn't make any difference for what you're trying to do.

This is just to document it here since it's what I ended up doing.

Tom

@tomweber-sas
Contributor

closing this. Let me know if you need anything else!
Tom
