32-bit breakage #5372

Open
trws opened this issue Aug 4, 2023 · 2 comments
Comments

@trws
Member

trws commented Aug 4, 2023

While adding a jammy builder, I found that the ubuntu 386 builder hasn't been catching some of our issues on 386. I fixed a few 32-bit issues that had a clear source in #5370, but the full check still kicked out quite a lot of errors and failures, and some of the tests failed to even report. This is a tracking issue for drawing these down. The make recheck output is below to give at least a trace on some of the issues:

make recheck
  FAIL: t3202-instance-restart-testexec.t 1 - run a testexec job in persistent instance (long run)
  FAIL: t3202-instance-restart-testexec.t 2 - restart instance, reattach to running job, cancel it (long run)
  FAIL: t3202-instance-restart-testexec.t 3 - restart instance, job completed (long run)
  FAIL: t3202-instance-restart-testexec.t 4 - run a testexec job in persistent instance (exit run)
  FAIL: t3202-instance-restart-testexec.t 5 - restart instance, reattach to running job, its finished (exit run)
  t3202-instance-restart-testexec.t:  FAIL: N=5   PASS=0   FAIL=5 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t3202-instance-restart-testexec.t - exited with status 1
  FAIL: t3200-instance-restart.t 1 - run a job in persistent instance
  FAIL: t3200-instance-restart.t 2 - restart instance and run another job
  FAIL: t3200-instance-restart.t 3 - restart instance and run another job
  FAIL: t3200-instance-restart.t 5 - inactive job list contains all jobs run before
  FAIL: t3200-instance-restart.t 6 - job IDs were issued in ascending order
  FAIL: t3200-instance-restart.t 13 - run a job in persistent instance (content-files)
  FAIL: t3200-instance-restart.t 15 - inactive job list contains job from before restart
    t3200-instance-restart.t:  FAIL: N=20  PASS=8   FAIL=7 SKIP=5 XPASS=0 XFAIL=0
  ERROR: t3200-instance-restart.t - exited with status 1
  FAIL: t3100-flux-in-flux.t 1 - flux can run flux instance as a job
  FAIL: t3100-flux-in-flux.t 2 - flux subinstance sets uri job memo
  FAIL: t3100-flux-in-flux.t 3 - flux --parent works in subinstance
  FAIL: t3100-flux-in-flux.t 4 - flux --parent --parent works in subinstance
  FAIL: t3100-flux-in-flux.t 6 - instance-level attribute = 0 in test instance
  FAIL: t3100-flux-in-flux.t 7 - instance-level attribute = 1 in first subinstance
  FAIL: t3100-flux-in-flux.t 8 - instance-level attribute = 2 in second subinstance
  FAIL: t3100-flux-in-flux.t 9 - flux sets jobid attribute
        t3100-flux-in-flux.t:  FAIL: N=9   PASS=1   FAIL=8 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t3100-flux-in-flux.t - exited with status 139 (terminated by signal 11?)
  FAIL: t0014-runlevel.t 20 - capture the environment for instance run as a job
            t0014-runlevel.t:  FAIL: N=21  PASS=20  FAIL=1 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t0014-runlevel.t - exited with status 1
  t4000-issues-test-driver.t:  SKIP: N=1   PASS=0   FAIL=0 SKIP=1 XPASS=0 XFAIL=0
  ERROR: t4000-issues-test-driver.t - exited with status 139 (terminated by signal 11?)
  FAIL: t1105-proxy.t 17 - flux-proxy works with jobid argument
  FAIL: t1105-proxy.t 18 - flux-proxy works with /jobid argument
  FAIL: t1105-proxy.t 19 - flux-proxy sets FLUX_PROXY_REMOTE for remote URIs
  FAIL: t1105-proxy.t 21 - flux-proxy preserves options in FLUX_PROXY_REMOTE
  FAIL: t1105-proxy.t 22 - parent-uri under remote flux-proxy is rewritten
  FAIL: t1105-proxy.t 23 - cancel test job
  FAIL: t1105-proxy.t 24 - flux-proxy attempts to restore terminal on error
  FAIL: t1105-proxy.t 25 - flux-proxy sends SIGHUP to children without --nohup
               t1105-proxy.t:  FAIL: N=25  PASS=17  FAIL=8 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t1105-proxy.t - exited with status 139 (terminated by signal 11?)
  FAIL: t2111-job-ingest-config.t 6 - job-ingest: job still runs after failed config reload
  FAIL: t2111-job-ingest-config.t 8 - job-ingest: job still runs after failed config reload
  FAIL: t2111-job-ingest-config.t 10 - job-ingest: job still runs after failed config reload
   t2111-job-ingest-config.t:  FAIL: N=10  PASS=7   FAIL=3 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t2111-job-ingest-config.t - missing test plan
  tap-driver.sh: internal error getting exit status
  tap-driver.sh: fatal: I/O or internal error
  make[1]: *** [Makefile:4660: t2111-job-ingest-config.log] Error 1
  make[1]: *** Waiting for unfinished jobs....
  FAIL: t2110-job-ingest-validator.t 12 - job-ingest: test jobspec validator with any version
  FAIL: t2110-job-ingest-validator.t 13 - job-ingest: all valid jobspecs accepted
  FAIL: t2110-job-ingest-validator.t 15 - job-ingest: test python jsonschema validator
  FAIL: t2110-job-ingest-validator.t 17 - job-ingest: valid jobspecs accepted by schema validator
  FAIL: t2110-job-ingest-validator.t 19 - job-ingest: stop the queue so no more jobs run
  FAIL: t2110-job-ingest-validator.t 20 - job-ingest: load feasibilty validator plugin
  FAIL: t2110-job-ingest-validator.t 21 - job-ingest: feasibility check succceeds with ENOSYS
  FAIL: t2110-job-ingest-validator.t 22 - job-ingest: infeasible jobs are now rejected
  FAIL: t2110-job-ingest-validator.t 23 - job-ingest: feasibility validator works with jobs running
  FAIL: t2110-job-ingest-validator.t 24 - job-ingest: load multiple validators
  FAIL: t2110-job-ingest-validator.t 26 - job-ingest: validator unexpected exit is handled
  FAIL: t2110-job-ingest-validator.t 27 - job-ingest: require-instance validator plugin works
  t2110-job-ingest-validator.t:  FAIL: N=27  PASS=15  FAIL=12 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t2110-job-ingest-validator.t - exited with status 139 (terminated by signal 11?)
  FAIL: t5000-valgrind.t 1 - valgrind reports no new errors on 2 broker run
            t5000-valgrind.t:  FAIL: N=1   PASS=0   FAIL=1 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t5000-valgrind.t - exited with status 1
  FAIL: t1200-stats-basic.t 4 - timing packets received immediate
  FAIL: t1200-stats-basic.t 5 - timing packets received basic
         t1200-stats-basic.t:  FAIL: N=9   PASS=7   FAIL=2 SKIP=0 XPASS=0 XFAIL=0
  ERROR: t1200-stats-basic.t - exited with status 1
  make: *** [Makefile:4650: recheck] Error 2
@grondo
Contributor

grondo commented Aug 7, 2023

@trws, I think this fixes the remaining 32-bit issue. A really dumb one, but git blame pointed the finger right back at me 🤦

diff --git a/src/modules/job-manager/start.c b/src/modules/job-manager/start.c
index 9a54212b0..c1082c70e 100644
--- a/src/modules/job-manager/start.c
+++ b/src/modules/job-manager/start.c
@@ -316,7 +316,7 @@ int start_send_request (struct start *start, struct job *job)
         if (!(msg = flux_request_encode (start->topic, NULL)))
             return -1;
         if (flux_msg_pack (msg,
-                           "{s:I s:i s:O s:b}",
+                           "{s:I s:I s:O s:b}",
                            "id", job->id,
                            "userid", (json_int_t) job->userid,
                            "jobspec", job->jobspec_redacted,

Once I make this change all tests pass locally.
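
For readers wondering why one character matters here: flux_msg_pack takes jansson-style pack format strings, in which `i` consumes a C `int` from the variable argument list and `I` consumes a `json_int_t` (normally a 64-bit `long long`). The call already cast `job->userid` to `json_int_t`, so `s:i` told the packer to read fewer bytes than were passed. A plausible reading, not confirmed in this thread, is that on i386 the leftover 4 bytes shift every later argument, so the `O` specifier picks up something other than the jobspec pointer, while on x86-64 the calling convention tends to hide the mismatch. The sketch below is a hypothetical stand-in (`sum_packed` and `matched_example` are made-up names, not flux or jansson API) that only shows the matched, well-defined form of such a call; the mismatched form is undefined behavior and is deliberately left out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a jansson-style packer: it sums the integer
 * arguments described by fmt.  As in jansson's pack format, 'i' consumes
 * an int and 'I' consumes a long long (what json_int_t usually is).
 * Any other character in fmt is ignored. */
long long sum_packed (const char *fmt, ...)
{
    va_list ap;
    long long total = 0;

    assert (fmt != NULL);
    va_start (ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p == 'i')
            total += va_arg (ap, int);        /* reads one int */
        else if (*p == 'I')
            total += va_arg (ap, long long);  /* reads one long long */
    }
    va_end (ap);
    return total;
}

/* Matched call: every specifier agrees with the type actually passed.
 * The userid is widened explicitly, mirroring the (json_int_t) cast in
 * the diff, and is therefore described with 'I' rather than 'i'. */
long long matched_example (uint32_t userid)
{
    int flag = 1;
    return sum_packed ("Ii", (long long) userid, flag);
}
```

Calling `matched_example (1000)` consumes one `long long` and one `int` and returns their sum. Describing the widened value with `i` instead of `I`, without changing the argument, would reproduce the class of bug fixed in the diff above.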

I'm mystified why the original 32-bit tester stopped working, but thank you for fixing this!

@trws
Member Author

trws commented Aug 7, 2023

Oh that's awesome, thanks @grondo!
