-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Confusing response to 'stop' command #631
Comments
Hi Harold! First of all, did you realize you were running an old version of Hercules? According to the Hercules log you provided: 08:25:27 HHC00100I Thread id 00007f2d05d7a400, prio 5, name 'impl_thread' started 08:25:27 HHC00100I Thread id 00007f2d05c75640, prio 4, name 'logger_thread' started 08:25:27 HHC01413I Hercules version 4.3.9999.0-SDL-g34f4353b (4.3.9999.0) 08:25:27 HHC01414I (C) Copyright 1999-2020 by Roger Bowler, Jan Jaeger, and others 08:25:27 HHC01417I ** The SoftDevLabs version of Hercules ** 08:25:27 HHC01415I Build date: Jul 8 2020 at 10:45:38 08:25:27 HHC01417I Built with: GCC 9.3.0 08:25:27 HHC01417I Build type: GNU/Linux x86_64 host architecture build The current release version is 4.6.0, and the current development version is 4.7.0. Secondly, there is nothing actually wrong anywhere. You are just confused. Again, according to your provided Hercules log: 08:25:27 HHC01603I t+ 08:25:27 HHC02229I Instruction tracing on 08:25:27 HHC01603I ipl ldid/sadump.txt 08:25:27 HHC02264I Script 1: file s370.rc processing ended 08:25:27 HHC02324I PSW=0008000000002000 INST=0DB0 BASR 11,0 branch_and_save_register 08:25:27 HHC02269I GR00=00000000 GR01=00000000 GR02=00000000 GR03=00000000 08:25:27 HHC02269I GR04=00000000 GR05=00000000 GR06=00000000 GR07=00000000 08:25:27 HHC02269I GR08=00000000 GR09=00000000 GR10=00000000 GR11=00000000 08:25:27 HHC02269I GR12=00000000 GR13=00000000 GR14=00000000 GR15=00000000 08:25:27 HHC02324I PSW=0008000000002002 INST=8200B006 LPSW 6(11) load_program_status_word 08:25:27 HHC02326I R:00002008:K:06=000A0000 00000000 00080000 00000010 ................ 08:25:27 HHC02269I GR00=00000000 GR01=00000000 GR02=00000000 GR03=00000000 08:25:27 HHC02269I GR04=00000000 GR05=00000000 GR06=00000000 GR07=00000000 08:25:27 HHC02269I GR08=00000000 GR09=00000000 GR10=00000000 GR11=00002002 08:25:27 HHC02269I GR12=00000000 GR13=00000000 GR14=00000000 GR15=00000000 08:25:27 HHC00800I Processor CP00: loaded wait state PSW 000A0000 00000000 08:25:27 HHC00809I Processor CP00: disabled wait state 000A0000 00000000 *** You IPLed! ***Because you IPLed, of course instructions are going to be executed. Duh. That's what an IPL does. As the final step of an IPL, after the initial CCW chain has completed (i.e. after the initial "load" function is complete), an IPL then it loads whatever PSW it finds in low core, and the system enters the running state under the control of that initial PSW. So there is no problem. Hercules is operating as designed. Hercules is operating like a real mainframe would. I'm going to close this issue as Invalid/PEBKAC and WON'T FIX. Sorry! p.s. Your specified 08:25:27 HHC01451E Invalid value 16K specified for MAINSIZE 08:25:27 HHC00007I Previous message from function 'mainsize_cmd' at hsccmd.c(3300) 08:25:27 HHC01441E Config file[20] s370.conf: error processing statement: MAINSIZE 16K 08:25:27 HHC00007I Previous message from function 'process_config' at script.c(431) The minimum for S/370 is 64K: |
And did you realize you ignored the very information you claim is required for problem resolution? You assumed the sequence of the statements used in describing the problem, identified first, was not the order in which the commands were actually issue? If you had taken the trouble to look at the log you would have found the order to be ipl, stop, store, and savecore. But you did not. So, I have upgraded SDL Hercules using Bill Lewis hercules-helper script. Increased the amount of memory so that error is eliminated and re-ran the problem for your convenience: HHC01413I Hercules version 4.7.0.11105-SDL-DEV-g4f518864 HHC01414I (C) Copyright 1999-2024 by Roger Bowler, Jan Jaeger, and others HHC01417I Built for you with Hercules-Helper (version: v0.9.14-75-g084fd84) HHC01415I Build date: Feb 28 2024 at 03:56:46 HHC01417I Built with: GCC 11.4.0 HHC01417I Build type: GNU/Linux x86_64 host architecture build HHC01417I Modes: S/370 ESA/390 z/Arch HHC01417I Max CPU Engines: 128 HHC01417I Using shared libraries HHC01417I Using setresuid() for setting privileges HHC01417I Using POSIX threads Threading Model HHC01417I Using Error-Checking Mutex Locking Model HHC01417I With Shared Devices support HHC01417I With Dynamic loading support HHC01417I With External GUI support HHC01417I With IPV6 support HHC01417I With HTTP Server support HHC01417I With sqrtl support HHC01417I With Signal handling HHC01417I With Watchdog monitoring HHC01417I With CCKD BZIP2 support HHC17500I REXX(Regina) Mode : Command HHC17500I REXX(Regina) MsgLevel : Off HHC17500I REXX(Regina) MsgPrefix : Off HHC17500I REXX(Regina) ErrPrefix : Off HHC17500I REXX(Regina) Resolver : On HHC17500I REXX(Regina) SysPath ( 9) : On HHC17500I REXX(Regina) RexxPath ( 0) : HHC17500I REXX(Regina) Extensions ( 8) : .REXX:.rexx:.REX:.rex:.CMD:.cmd:.RX:.rx HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0) HHC00100I Thread id 00007f6fffefe640, prio 7, name 'timer_thread' started HHC00100I Thread id 00007f6ffffff640, prio 2, name 'Processor CP00' started HHC00811I Processor CP00: architecture mode z/Arch HHC00811I Processor CP00: architecture mode S/370 HHC02204I ARCHLVL set to S/370 HHC02204I LPARNUM set to BASIC HHC17003I MAIN storage is 64K (mainsize); storage is not locked HHC02204I NUMCPU set to 1 HHC02260I Script 1: begin processing file s370.rc HHC01603I t+ HHC02229I Instruction tracing on HHC01603I ipl ldid/sadump.txt HHC02264I Script 1: file s370.rc processing ended HHC02324I PSW=0008000000002000 INST=0DB0 BASR 11,0 branch_and_save_register HHC02269I GR00=00000000 GR01=00000000 GR02=00000000 GR03=00000000 HHC02269I GR04=00000000 GR05=00000000 GR06=00000000 GR07=00000000 HHC02269I GR08=00000000 GR09=00000000 GR10=00000000 GR11=00000000 HHC02269I GR12=00000000 GR13=00000000 GR14=00000000 GR15=00000000 HHC02324I PSW=0008000000002002 INST=8200B006 LPSW 6(11) load_program_status_word HHC02326I R:00002008:K:06=000A0000 00000000 00080000 00000010 ................ HHC02269I GR00=00000000 GR01=00000000 GR02=00000000 GR03=00000000 HHC02269I GR04=00000000 GR05=00000000 GR06=00000000 GR07=00000000 HHC02269I GR08=00000000 GR09=00000000 GR10=00000000 GR11=00002002 HHC02269I GR12=00000000 GR13=00000000 GR14=00000000 GR15=00000000 HHC00800I Processor CP00: loaded wait state PSW 000A0000 00000000 HHC00809I Processor CP00: disabled wait state 000A0000 00000000 HHC01603I stop HHC00816W Processor CP00: processor is not started HHC00007I Previous message from function 'stop_cmd_cpu' at hscpufun.c(1101) HHC01603I store HHC00817I Processor CP00: store status completed HHC01603I savecore HHC02202E Missing argument(s). Type 'help savecore' for assistance. HHC00007I Previous message from function 'savecore_cmd' at hsccmd.c(7757) HHC01603I savecore memory.bin HHC02248I Saving locations 0-2FFF to file memory.bin HHC02219E Error in function open(): File exists HHC00007I Previous message from function 'savecore_cmd' at hsccmd.c(7873) HHC01603I savecore memory.bin HHC02248I Saving locations 0-2FFF to file memory.bin HHC02249I Operation complete HHC00101I Thread id 00007f6ffffff640, prio 2, name 'Processor CP00' ended HHC01427I Main storage released HHC01427I Expanded storage released HHC01422I Configuration released HHC01424I All termination routines complete HHC01425I Hercules shutdown complete HHC01412I Hercules terminated HHC00101I Thread id 00007f6fffefe640, prio 7, name 'timer_thread' ended HHC00101I Thread id 00007f700467eb80, prio 5, name 'panel_display' ended The S/370 Principles of Operation manual, GA22-7000-4 clearly states on page 30:
This is clearly not true for S/370 mode on Hercules. The PoO also says on page 31:
Also not true for S/370 mode on Hercules. |
I most certainly did! I even highlighted those statements in the log! Just as I did AGAIN in the new log you supplied! We're obviously misunderstanding something between us somewhere. Perhaps terminology? I don't know. But wherever the misunderstanding is, I apologize for it. Let's try to get past that and get to the root of your reported problem. I do sincerely want to help you. Your sequence of commands was:
Yes? And your complaint (reported "problem"/bug) is that the "cpu was not started", but yet "clearly ran" (i.e. executed instructions anyway):
Which is clearly and obviously wrong! An IPL will always cause instructions to begin executed! Unless the IPL PSW is bad (invalid) of course (or is itself a disabled wait PSW to purposely prevent any instruction execution). So I am at a complete loss as to what your complaint (perceived problem/bug) actually is. I'm just not getting it. I'm not seeing the problem. Can you point it out to me? Can you explicitly state exactly what did not happen that should have happened (or what happened that shouldn't have)? Can you explain what you feel the expected sequence of events should have been? Because as I said, I'm just not seeing it!
Yeah, so?
It most certainly is!! Why do you say that?!
Again, yeah, so?
You have me confused as hell, Harold! You are clearly not making any sense to me. From what I can see, Hercules is operating just fine in 370 mode. The "problem" as you have described it is obviously not occurring. Maybe you can describe it in a different way? With different words? Because I'm just not seeing it! If so, I readily admit that that particular message can seem a bit confusing when in fact no stop command had ever been issued previous to that. But that is just the message that Hercules happens to issue when a CPU is already in the internal stopped state. In this case, the CPU was internally stopped as a result of the disabled wait state PSW having been loaded. When the PSW is in a disabled wait state and no interruptions can possibly occur nor any instructions possibly executed, for all intents and purposes the CPU is essentially "stopped". That is why that particular message is being issued. I would have thought that should be obvious. But perhaps it isn't? Perhaps that is what is confusing you? Is that your complaint? Is that it? If that's not what you beef (complaint/concern) is, then you're going to have to explain it in more detail. You're going to have to use smaller words and take me by the hand, walk me directly to the root of the problem, and then smack me firmly on the head with a clue-by-four. Because I'm afraid that's apparently the only way I'm going to ever grok your issue. |
That is precisely my concern. Did I not highlight that in the first set of lines from the original log? All of the lines YOU highlighted except the ipl command had nothing to do with my report. I would likely have highlighted them if they were. And the most significant I highlighted you ignored. Well for quite some time. Mind, it is not in any way impacting what I am doing. But the message appears related to CPU state, but may not be. So it could just be a message problem. It could also be a CPU state management problem when one ipl's from a host LDID structure. Or, it could be some combination or something else entirely. Just letting others know. |
Yes you did, BUT... You then went on to say:
Which made me believe your complaint wasn't with the unexpected "processor is not started" response to your
More than likely from the sounds of it.
I'm doubting that, but I'll check to be sure. What's a "LDID structure" by the way? I've heard of LDIPL (List Directed IPL), but not "LDID". Is that one of your own terms? Before re-classifying this issue as a potential DOC BUG (I'll remove the won't fix and pebkac labels), do you have any suggestion regarding what you feel would be a more appropriate message? What should the message text be in such a situation? Thanks, Harold. And I apologize for the misunderstanding, but am glad we got through it. |
Proposed new response message:
I also added "processor stopped" to the "disabled wait state" message too:
Hopefully these two messages together will prevent any future confusion. I have the change ready to commit, so please let me know whether the above messages are satisfactory, or whether you would prefer different wording for either. If I don't hear back from you by tomorrow, I'll go ahead and make the commit and close this issue. Thanks. |
List-directed Load. Well it was created sort of to describe using a file directory with a certain structure for loading memory during IPL. For those unaware, this technique was in the early days of Linux for IPL'ing it. Hercules picked it up so it could run Linux in this way. I stumbled onto it in the early days of SATK development. I do not know if it had an "official" name. I certainly do not know what that name might be. I picked "list-directed" because there is a control file in the directory that lists the binary images being loaded and where into absolute memory. The use of "list-directed" IPL in the recent PoO's is entirely different. So, yes, one of "my terms" but having some rationale behind it from the past. I do not know if Jay Maynard is on the list, but please do not remove this form of IPL from Aethra. That would make it much more difficult for SATK to be used with Aethra which I plan to do when it is released. The message... If the message relates to the setting of the bits in the PSW it should not use any words that could be easily confused with the CPU state, for example, stop, operating, running, or the ilk. I know it is a bit inelegant, but, if the message is about the PSW bits, it should say that "due to PSW disabled wait state, the CPU is not executing instructions". CPU state is, per the PoO, same section, NOT influenced by the PSW bits. More than the message... That is just a piece of my concern, and these other aspects may be irrelevant. Look at this from the log...
In actuality, what should have been going on, of which we are not aware, is following the successful IPL, the CPU entered the operating state. It executes instructions until the disabled wait is introduced. We will assume HHC00816W is fixed, but then there is no confirmation that the CPU actually changed to the stop state when the command was issued. I think that too is confusing for me. If an acknowledgement that the command did its duty, that command should follow the new HHC00816W message indicating the change. And if in fact the CPU is already in the stop state, that should probably also be reported. I have further concerns with regard to the store command. It executed as seen in the log. Is that because under the covers the CPU was stopped by the stop command. Or is it treating the disabled wait state as a legitimate stopping of the CPU, but it isn't. While I did not have cause to use the start command, there may be similar concerns surrounding it. Where this is headed is simply more comprehensive and clearer reporting surrounding CPU state. While this is not a big deal, particularly for a single CPU, a multiprocessing environment needs to react properly. I am not saying it isn't under the covers, perhaps so. But without good reporting to the operator who knows? BTW, is there a command to actually query the CPU state? Why am I being a "pain" about this when it did not effect what I am working on? The bigger picture. I am developing for SATK a program that will produce a printed dump with formatting of things like PSW's and store status save areas and other things. Where available, it is dependent upon the store status function for preserving CPU state before it is written to a file. While I realize there are not that many people who use SATK, when forced to look at messages that can be confusing and do not seem to match how things are supposed to work, it ends up making use of the tool I am producing more difficult. Just as it is for me right now. And I am also confident some people following these issues are totally unaware of CPU state. But, if anyone is, it would be users of SATK. You know. Like you. Harold |
I didn't know that! I do see several mentions of the term throughout Hercules and they all seem to be in reference to your definition of it. For example, in the help information for the HHC01603I help ipl HHC01603I HHC01602I Command Description HHC01602I ---------------- ------------------------------------------------------- HHC01602I ipl *IPL from device or file HHC01603I HHC01603I Format: "IPL {xxxx | cccc} [CLEAR] [LOADPARM xxxxxxxx] [PARM xxx [xxx ...]]" HHC01603I HHC01603I Performs the Initial Program Load manual control function. If the first HHC01603I operand 'xxxx' is a valid device number then a CCW-type IPL is initiated HHC01603I from the specified device and SCLP disk I/O is disabled. Otherwise a HHC01603I list-directed IPL is performed from the .ins file named 'cccc', and SCLP HHC01603I disk I/O is enabled for the directory path where the .ins file is located. HHC01603I HHC01603I The optional 'CLEAR' keyword will initiate a Load Clear manual control HHC01603I function prior to starting an IPL. HHC01603I HHC01603I The optional 'LOADPARM' keyword followed by a 1-8 character string can be HHC01603I used to override the default value defined by the 'LOADPARM' command. HHC01603I HHC01603I An optional 'PARM' keyword followed by string data can also be used to HHC01603I pass data to the IPL command processor. If specified the string data is HHC01603I loaded into the low-order 32 bits of General Purpose registers R0 - R15 HHC01603I (4 characters per register for a maximum of 64 bytes total; any excess HHC01603I is ignored). The PARM option behaves similarly to the VM IPL command. HHC01603I HHC01603I PLEASE NOTE that because the 'PARM' option supports multiple arguments, HHC01603I if specified, it MUST be the LAST option on the command line since ALL HHC01603I remaining command line arguments following the 'PARM' keyword will be HHC01603I treated as being part of a single blank-separated parameter string. HHC01603I help sclproot HHC01603I HHC01602I Command Description HHC01602I ---------------- ------------------------------------------------------- HHC01602I sclproot *Set SCLP base directory HHC01603I HHC01603I Format: "sclproot [path|NONE]" HHC01603I Enables SCLP disk I/O for the specified directory path, or disables HHC01603I SCLP disk I/O if NONE is specified. A subsequent list-directed IPL HHC01603I resets the path to the location of the .ins file, and a CCW-type IPL HHC01603I disables SCLP disk I/O. If no operand is specified, sclproot displays HHC01603I the current setting.
True. True. But while Hercules may not be strictly adhering to the PoO's definition of "CPU state" in this specific situation, I personally don't feel it's worth the effort and complication involved in trying to "fix" it. IMHO it's "close enough". Besides, Hercules is not strictly required to behave like other mainframes (only to adhere to the architecture), and I think there's enough wiggle room to allow for Hercules's current behavior in this case.
No. It was stopped as a result of the disabled wait state being entered.
Yes, but I don't know what you mean by: ", but it isn't?". As explained, when a disabled wait PSW is loaded, Hercules places the CPU in the stopped state. That is just the way Hercules has always behaved. since DAY ONE (i.e. ever since the original "Initial repository setup" commit back in Feburary of 2001, i.e. it's been that way for the past 23 year, since the very, very beginning of Hercules):
Well I'm hoping my new proposed message text will help make clearer to the operator as to what's going on and what the current CPU state is.
Not specifically just that, no. But yes, there is a command that will show it. It's called the HHC01603I ipending HHC00850I Processor CP00: CPUint=80000000 (State:80000001)&(Mask:8000000A) HHC00851I Processor CP00: interrupt pending HHC00852I Processor CP00: I/O interrupt not pending HHC00853I Processor CP00: clock comparator not pending HHC00854I Processor CP00: CPU timer not pending HHC00855I Processor CP00: interval timer not pending HHC00856I Processor CP00: ECPS vtimer not pending HHC00857I Processor CP00: external call not pending HHC00858I Processor CP00: emergency signal not pending HHC00859I Processor CP00: machine check interrupt not pending HHC00860I Processor CP00: service signal not pending HHC00861I Processor CP00: mainlock held: no HHC00862I Processor CP00: intlock held: no HHC00863I Processor CP00: waiting for intlock: no HHC00864I Processor CP00: lock not held HHC00866I Processor CP00: state STOPPED HHC00867I Processor CP00: instcount 0 HHC00868I Processor CP00: siocount 0 HHC00869I Processor CP00: psw 0000000000000000 0000000000000000 HHC00870I config mask 0000000000000001 started mask 0000000000000000 waiting mask 0000000000000000 HHC00871I syncbc mask 0000000000000000 HHC00872I signaling facility not busy HHC00873I TOD lock not held HHC00874I mainlock not held; owner ffff HHC00875I intlock not held; owner ffff HHC00876I ioq lock not held HHC00885I Channel Report queue: HHC00886I CRW 0x03840000: RSC:3=SUBCH, ERC:4=ALERT, RSID:0=0x0000 Flags:AR HHC00886I CRW 0x03840001: RSC:3=SUBCH, ERC:4=ALERT, RSID:1=0x0001 Flags:AR HHC00886I CRW 0x03840002: RSC:3=SUBCH, ERC:4=ALERT, RSID:2=0x0002 Flags:AR HHC00881I I/O interrupt queue: (NULL) And it displays this information for each defined CPU too, so if you have say, 16 CPUs defined, then you'll see all of the above information repeated 16 times, once for each CPU (engine). The fastest and easiest way to determine a CPU's started or stopped state is to simply look at the status line at the bottom of the screen/panel: The 'M' indicates "Manual mode", which means the CPU has been stopped.
YES, please DO explain that for us please!
Ah. I see. Well I apologize for that. But as I said, that's just the way Hercules has always behaved, since day one. Nevertheless, I will tweak the messages involved to try and be a bit clearer as to what's going on and what the current CPU state is. That said, as soon as I post this reply, I am going to go ahead and commit my proposed changes, which you said nothing about and thus can only presume you have no objection to. I'm glad you're finding time to work on SATK. I know that your other duties you mentioned to me in a private email a while back were sucking up more of your time than you would have liked as I recall. I hope things have improved in that regard. Take care, Harold. It's good "seeing" you again. |
Fixed(?) by commit 2e763e9. Closing. |
I ran a small test (two instructions) in S/370 EC mode. The test was driven by a ipl list-directed-ipl-file.txt
Part of the test involved storing status. Before doing so, I issued a stop command:
In theory the two instructions should not have executed, but they did as can be seen in the log file. For me this is good news.
The store status appears to have worked and the saving of the memory content.
I have not yet examined the memory content. The test involves a program that dumps the saved memory and does some formatting. I am really working on the dump program.
Thought I should report the issue so it is a known phenomenon.
The below tar file contains everything to reproduce the problem as well as assembly listing and a full Hercules console log. A few small script files reproduce the results.
This issue does not appear at this point to impact anything I am doing. So, unless new developments occur it has a low priority for me.
The text was updated successfully, but these errors were encountered: