
fixes-459 #460

Merged: 6 commits merged into ratt-ru:master on Mar 23, 2022
Conversation

bennahugo (Collaborator)

This fixes #459.

As near as I can tell, the issue is in the use of threads inside numba. TBB fails to import in the new numba, even if I follow their instructions and install it from pip (@JSKenyon).

I still can't track down this super annoying error. That part of the code is a bit cryptic; maybe @o-smirnov can be of some assistance here:

```
INFO      19:19:39 - data_handler       [x01] [0.3/2.2 2.6/23.0 0.9Gb] reading BITFLAG
INFO      19:19:39 - main               [0.2/2.9 2.1/25.8 0.9Gb] WARNING: unrecognized worker process name 'Process-6'. Please inform the developers.
```

At least it runs through again now.

Running with:

```
gocubical --sol-jones g,dd --data-ms msdir/1491291289.1ghz.1.1ghz.4hrs.ms --data-column CORRECTED_DATA --data-time-chunk 8 --data-freq-chunk 0 --model-list "MODEL_DATA+-output/deep2.DicoModel@msdir/tag.reg:output/deep2.DicoModel@msdir/tag.reg" --model-ddes auto --weight-column WEIGHT --flags-apply FLAG --flags-auto-init legacy --madmax-enable 0 --madmax-global-threshold 0,0 --madmax-threshold 0,0,10 --sol-stall-quorum 0.95 --sol-term-iters 50,90,50,90 --sol-min-bl 110.0 --sol-max-bl 0 --dist-max-chunks 4 --out-name output/deep2cal --out-overwrite 1 --out-mode sr --out-column DE_DATA --out-subtract-dirs 1:  --g-time-int 8 --g-freq-int 0 --g-clip-low 0 --g-clip-high 0 --g-type complex-diag --g-update-type phase-diag --g-max-prior-error 0.35 --g-max-post-error 0.35 --g-max-iter 100 --dd-dd-term 1 --dd-time-int 8 --dd-freq-int 32 --dd-clip-low 0 --dd-clip-high 0 --dd-type complex-diag --dd-fix-dirs 0 --dd-max-prior-error 0.35 --dd-max-post-error 0.35 --dd-max-iter 200 --degridding-OverS 11 --degridding-Support 7 --degridding-Nw 100 --degridding-wmax 0 --degridding-Padding 1.7 --degridding-NDegridBand 15 --degridding-MaxFacetSize 0.15 --degridding-MinNFacetPerAxis 1 --dist-nthread 4 --dist-nworker 4 --dist-ncpu 4
```

@JSKenyon (Collaborator) left a comment:

Just one log message to fix, otherwise looks good to me.

```python
            return nthread
        except:
            numba.config.THREADING_LAYER = "default"
            print("Cannot use TDD threading (check your installation). Dropping the number of solver threads to 1", file=log(0, "red"))
```
Comment on the snippet above:

Should be TBB (thread building blocks).

@bennahugo (Collaborator, Author)

Yup. As discussed: https://numba.pydata.org/numba-doc/latest/user/threading-layer.html says TBB can be enabled by installing the Python package from pip. That is not my experience, though; it seems to be picking up the system version. This is therefore only a workaround.

This used to work with earlier versions of numba, though, so I'm not sure what has changed to make things execute "unsafely". Perhaps they switched over their default threading model.

@JSKenyon (Collaborator)

Just putting in my two cents here. Note these issues on Numba: numba/numba#6108 and numba/numba#7148. It seems to have something to do with discovery of the .so files. It is possible to work around this by setting LD_LIBRARY_PATH to wherever pip put the file; in my case this was something like path/to/venv/lib.
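A minimal sketch of that workaround (the path `~/venvddf/lib` is a placeholder for wherever pip actually placed `libtbb.so` in your virtualenv):

```shell
# Prepend the venv's lib directory so the dynamic linker resolves
# pip's libtbb.so before the (older) system copy. Path is hypothetical.
export LD_LIBRARY_PATH="$HOME/venvddf/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```

Note this must be exported before Python starts; changing it inside a running process has no effect on libraries already loaded.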

@bennahugo (Collaborator, Author)

OK, I tried exporting LD_LIBRARY_PATH (technically it should not be needed when the virtualenv is expressly activated).

`numba -s` reports a successful import:

```
__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.
```

I do have:

```
Requirement already satisfied: tbb in ./venvddf/lib/python3.6/site-packages (2021.5.1)
```

However, when I try to execute the basic probe function:

```python
def set_numba_threading(nthread):
    try:
        numba.config.THREADING_LAYER = "safe"
        @numba.njit(parallel=True)
        def foo(a, b):
            return a + b
        foo(np.arange(5), np.arange(5))
        return nthread
    except:
        numba.config.THREADING_LAYER = "default"
        print("Cannot use TBB threading (check your installation). Dropping the number of solver threads to 1", file=log(0, "red"))
        return 1
```

I get

```
/home/hugo/workspace/venvddf/lib/python3.6/site-packages/numba/np/ufunc/parallel.py:365: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
  warnings.warn(problem)
INFO      10:59:42 - main               [0.2 2.0 1.0Gb] Cannot use TBB threading (check your installation). Dropping the number of solver threads to 1
```

So it is still picking up the older system library version. Unfortunately I can't uninstall that version; it would break several system packages.
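A quick diagnostic for this situation (not from the PR itself) is to ask the dynamic linker which TBB copies it knows about; the first hit is the one numba will load:

```shell
# List every libtbb in the linker cache; falls back to a message
# on systems without ldconfig or without any libtbb installed.
{ ldconfig -p 2>/dev/null || true; } | grep -i libtbb || echo "no libtbb in the linker cache"
```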

@bennahugo (Collaborator, Author)

OK, hang on... found another place where OMP is being used: the degridder.

It may run into trouble when forks and threads are mixed. I suggest we switch to workqueue if TBB fails to load, and on top of that set the environment variables accordingly: if workers.py sets nthread > 1, then the degridder needs to run with OMP_NUM_THREADS == 1.
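A sketch of that suggestion. The helper name `configure_worker_threading` is hypothetical, not from the PR; `NUMBA_THREADING_LAYER` and `OMP_NUM_THREADS` are the real environment variables consulted by numba and OpenMP respectively, and both must be set before any thread pools spin up:

```python
import os

def configure_worker_threading(nthread):
    """Hypothetical helper: fall back to numba's fork-safe workqueue
    layer, and pin the OpenMP-based degridder to one thread whenever
    the solver itself is multi-threaded."""
    # Read by numba at import/compile time to select a threading layer.
    os.environ["NUMBA_THREADING_LAYER"] = "workqueue"
    if nthread > 1:
        # Keep the degridder's OpenMP pool to a single thread so that
        # forked workers don't contend over an inherited thread pool.
        os.environ["OMP_NUM_THREADS"] = "1"
    return nthread

if __name__ == "__main__":
    configure_worker_threading(4)
    print(os.environ["NUMBA_THREADING_LAYER"], os.environ["OMP_NUM_THREADS"])
```

The key design point is ordering: these variables only take effect if exported before numba and numpy initialise their pools, which is why the PR places the logic ahead of the fork in workers.py.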

@bennahugo (Collaborator, Author)

I'm not sure how this did not cause issues before. My best guess is that numpy now invokes OMP before we fork, after which it becomes unsafe to use.

@bennahugo (Collaborator, Author)

bennahugo commented Mar 22, 2022

Nope, workqueue doesn't completely solve the issue; things still go to pot if threads > 1 is used on workqueue, and it is not a memory issue. I'm testing this on com08.

Edit: at least on 18.04, it looks like one would need to compile packages at the system level (possibly just to include headers for TBB?). I have no idea how to get things working in a venv. I've traced the problem down to the numba code with these changes.

@bennahugo (Collaborator, Author)

Alright, I'm happy this now works as advertised. I can't believe nobody picked up the issue with the previously used small-angle approximation for getting the RA/Dec of the facet centre for beam application.

Predicted flux far off axis E, evaluated almost at the source:

```
[278:293,1833:1850] min -0.01585, max 0.05009, mean 0.002003, std 0.01133, sum 0.5107, np 255
```

Original convolved model flux (subject to a slightly different beam evaluation due to the regular facets used in DDF, so a small difference is to be expected):

```
[270:297,1831:1857] min -1.119e-09, max 0.04783, mean 0.00147, std 0.005706, sum 1.032, np 702
```

Apparent peak convolved flux of the source:

```
[270:297,1832:1856] min -0.0004496, max 0.02002, mean 0.0006666, std 0.002545, sum 0.4319, np 648
```

@viralp please take note: you have previously tried using the beam within cubical without getting decent subtraction when peeling sources from intrinsic models.

Example use (the pointing centre may be set to the phase centre if you did not mosaic via 'DataPhaseDir'):

```
gocubical --sol-jones g,dd --data-ms msdir/1491291289.1ghz.1.1ghz.4hrs.ms --data-column CORRECTED_DATA --data-time-chunk 8 --data-freq-chunk 0 --model-list "output/deep2.DicoModel@msdir/tag2.reg" --model-ddes auto --weight-column WEIGHT --flags-apply FLAG --flags-auto-init legacy --madmax-enable 0 --madmax-global-threshold 0,0 --madmax-threshold 0,0,10 --sol-stall-quorum 0.95 --sol-term-iters 50 --sol-min-bl 110.0 --sol-max-bl 0 --dist-max-chunks 4 --out-name output/deep2cal --out-overwrite 1 --out-mode sr --out-column DE_DATA --out-subtract-dirs 0  --g-time-int 8 --g-freq-int 0 --g-clip-low 0 --g-clip-high 0 --g-type complex-diag --g-update-type phase-diag --g-max-prior-error 0.35 --g-max-post-error 0.35 --g-max-iter 100 --degridding-OverS 11 --degridding-Support 7 --degridding-Nw 100 --degridding-wmax 0 --degridding-Padding 1.7 --degridding-NDegridBand 15 --degridding-MaxFacetSize 0.15 --degridding-MinNFacetPerAxis 1 --dist-nthread 1 --dist-nworker 16 --dist-ncpu 4 --degridding-NProcess 8 --degridding-BeamModel FITS --degridding-FITSFile 'input/meerkat_pb_jones_cube_95channels_$(corr)_$(reim).fits' --out-model-column MODEL_OUT --sel-field 2 --degridding-PointingCenterAt j2000,4h13m26.40,-80d00m00s
```

This work is done in preparation for SKA-MID. I will next port heterogeneous beams to this package.

@bennahugo (Collaborator, Author)

@JSKenyon please review

@JSKenyon (Collaborator) left a comment:

Looks fine to me, bar my single comment.

```python
                return a + b
            foo(np.arange(5), np.arange(5))
            return nthread
        except:
```
Comment on the `except:` above:

Unqualified excepts are generally frowned upon. Does this not raise a consistent exception?

bennahugo (Collaborator, Author) replied:

It raises a massively complex numba exception which I'm not sure how to properly catch; let me see if I can reproduce it. I'm just worried this exception interface will change (numba seems to be constantly evolving).
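One way to narrow the bare `except` without pinning numba's unstable exception types is to catch `Exception` (leaving `KeyboardInterrupt` and friends alone) and log the concrete class, so the precise numba error can be identified and caught explicitly later. This is a sketch, not the merged code; `run_probe` stands in for the njit-compiled test function and `log` for CubiCal's logger:

```python
def set_numba_threading_checked(run_probe, nthread, log=print):
    """Sketch: same fallback as set_numba_threading, but the handler
    reports the actual exception class instead of swallowing it."""
    try:
        run_probe()
        return nthread
    except Exception as exc:  # numba's exact exception type varies
        log("Cannot use TBB threading (%s: %s). "
            "Dropping the number of solver threads to 1"
            % (type(exc).__name__, exc))
        return 1

# Example: a probe that fails the way a broken threading layer might.
def broken_probe():
    raise RuntimeError("TBB version too old")

assert set_numba_threading_checked(lambda: None, 4) == 4
assert set_numba_threading_checked(broken_probe, 4) == 1
```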

@bennahugo bennahugo merged commit 06ea755 into ratt-ru:master Mar 23, 2022
@bennahugo bennahugo deleted the issue-459 branch March 23, 2022 13:15
Linked issue closed by this pull request: Racecondition in threading (#459)