demix steps fail after array exception warning #258

Closed
maria-mmtz opened this issue Oct 11, 2019 · 17 comments

@maria-mmtz

Hi,
I have been trying to reduce my data using prefactor; however, my target is very close to the A-team sources. During the pipeline run for the target, I received this interesting warning before it failed for each subband:

```
WARNING node.852592d2a0bb.executable_args.L733787_SB243_uv.MS: /opt/lofarsoft/bin/NDPPP stderr:
std exception detected: ArrayBase::operator()(b,e,i) - incorrectly specified
begin: [0, 0]
end: [74, 74]
incr: [1, 1]

array shape: [62, 62]
required: b >= 0; b <= e; e < shape; i >= 0
```
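The message encodes a simple bounds check: the requested slice end (74) must stay below the array shape (62), which it does not. A minimal sketch of that check (the function name is hypothetical; the inequality is taken directly from the `required:` line above):

```python
def check_slice(begin, end, shape, incr):
    # Mimic the bounds check behind ArrayBase::operator()(b,e,i):
    # required: b >= 0; b <= e; e < shape; i >= 0
    for b, e, s, i in zip(begin, end, shape, incr):
        if not (0 <= b <= e < s and i >= 0):
            raise ValueError(
                f"incorrectly specified: begin={begin} end={end} shape={shape}")
    return True

# The failing case from the log: a 75-sample slice requested on a 62-sample axis.
try:
    check_slice((0, 0), (74, 74), (62, 62), (1, 1))
except ValueError as err:
    print(err)
```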

I am not sure how to rerun DPPP outside the pipeline to check how the demix steps are working, but I am attaching my latest logfile and parset (renamed to .txt so I can upload it) in case it is of any help.

Pre-Facet-Target.parset.txt

pipeline-Pre-Facet-Target-2019-10-10T12:14:58.log
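(For reference, NDPPP can be run standalone on a single subband with a small parset, e.g. `NDPPP demix.parset`. The sketch below is only an illustration: the parameter names follow the DPPP demixer step, while the filenames, sky model, and source list are placeholders, not values taken from this thread.)

```
msin                  = L733787_SB243_uv.MS
msout                 = L733787_SB243_uv.demix.MS
steps                 = [demix]
demix.skymodel        = Ateam.sourcedb      # placeholder sky model
demix.subtractsources = [CasA, CygA]        # placeholder A-team sources
demix.timestep        = 10                  # output averaging
demix.freqstep        = 16
```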

@adrabent
Collaborator

adrabent commented Nov 7, 2019

Hi maria-mmtz,

Unfortunately I am not able to reproduce your error with a raw LBA data set. Have these data already been pre-processed? What are the integration time and frequency resolution, as well as the total number of input frequency channels and time steps compared to your demixing step parameters?

@maria-mmtz
Author

Hi @adrabent,
Thank you for looking into this.

> unfortunately I am not able to reproduce your error with a raw LBA data set. Have these data already been pre-processed?

My data are HBA and I downloaded them from the LTA, so I suppose they have already been processed to a certain degree?

> What is the integration time and your frequency resolution as well as the total amount of input frequency channels and time steps compared to your demixing step parameters?

I am not sure how to accurately answer these questions... The integration time is 1 s, with 64 channels per subband and 243 subbands in total; the averaging step is 1.0 in time and 4.0 in frequency. In the demixing, the averaging steps are 10 in time and 16 in frequency. Does that help? If not, what else should I look into?

@adrabent
Collaborator

adrabent commented Nov 7, 2019

Hmm... I was just wondering if demix might need a regular grid, i.e. if you have, let's say, 600 timesteps and you use a demix_timestep of 77, then it could crash.
But I also tried this, and demix still works fine (I also checked pre-processed HBA data). Is there a possibility to point me to one of your measurement sets? Is the data available on CEP3?
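The regular-grid hypothesis is easy to check by hand: with a chunk size that does not divide the number of timesteps, the last averaging chunk is partial. A small sketch (the function name is hypothetical):

```python
def demix_chunks(n_times, demix_timestep):
    # Number of full averaging chunks and the size of the partial remainder.
    full, remainder = divmod(n_times, demix_timestep)
    return full, remainder

print(demix_chunks(600, 77))  # (7, 61): seven full chunks plus a 61-sample remainder
```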

@maria-mmtz
Author

@adrabent yes, I have uploaded two MSs, one of the target (L733787_SB145_uv.MS) and one of the calibrator (L733793_SB145_uv.MS), to /data/scratch/moutzouri

adrabent pushed a commit that referenced this issue Nov 7, 2019
@adrabent
Collaborator

adrabent commented Nov 7, 2019

I made some modifications to the target pipeline.
Please report if the issue still persists.

@maria-mmtz
Author

Hi, I still get an error; however, it seems to be a different one, and I can't quite figure out what happened. I'm attaching the log file:
pipeline-Pre-Facet-Target-new-2019-11-08T14:48:43.log

@tikk3r

tikk3r commented Nov 12, 2019

I seem to recall the new error had something to do with copying data. Are you running out of disk space perhaps?

@maria-mmtz
Author

Hi, I ran it again after freeing up some space. The target data folder is about 5 TB and I have 12 TB free. It fails again, and the output is the same.

@darafferty
Contributor

I just ran into this error myself (though not in prefactor), and it was due to running out of memory. You might watch the memory usage (e.g. with `top`) while it's running to see if this could be the problem.
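As an alternative to watching `top` by hand, the peak resident memory of a finished child process can be read back with the standard `resource` module (Linux/macOS only; the wrapper below is a sketch for illustration, not part of prefactor):

```python
import resource
import subprocess
import sys

def run_and_report_peak(cmd):
    # Run a command and return the peak RSS of finished child processes.
    # Units: kilobytes on Linux, bytes on macOS.
    subprocess.run(cmd, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

peak = run_and_report_peak([sys.executable, "-c", "x = bytearray(10**7)"])
print(f"peak child RSS: {peak}")
```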

@maria-mmtz
Author

Hello, it seems that after the pipeline stopped, NDPPP was still eating up all the memory, so I manually forced it to stop. I ran it again and it looks like it worked a little better, but it is still unsuccessful.
I can't upload the log file as it is quite big; is there any other way to share it?

@maria-mmtz
Author

Hi, I compressed the log file from the previous run
pipeline-Pre-Facet-Target-new-2019-11-13T14:35:31.zip

@tikk3r

tikk3r commented Nov 15, 2019

That log shows

```
std exception detected: Table file /opt/Data/working/Pre-Facet-Target-new/L733787_SB015_uv.ndppp_prep_target/FIELD/table.dat does not exist
```

Does it (still) exist? If it does, it might have become corrupted, for example if a previous run crashed or got killed mid-write, and you may need to get a fresh copy of it.
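One quick sanity check, assuming the usual casacore on-disk layout in which every table and subtable directory carries a `table.dat`, is to walk the MS and list directories where that file is missing (a sketch, not a prefactor tool):

```python
import os

def find_missing_table_dat(ms_path):
    # Walk a MeasurementSet directory tree and report every directory
    # that lacks a table.dat, a typical sign of an interrupted write.
    missing = []
    for dirpath, _dirnames, filenames in os.walk(ms_path):
        if "table.dat" not in filenames:
            missing.append(dirpath)
    return missing
```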

@adrabent
Collaborator

@maria-mmtz
I would agree with @tikk3r. The easiest fix is to remove the working_directory and start a fresh run from scratch. If these types of errors still occur, please check that your input files are not corrupted.

@maria-mmtz
Author

Hi, I did that and now I get this message that I didn't get before:

```
std exception detected: Specified source DB name does not exist
```

How can I fix that?
pipeline-Pre-Facet-Target-new-2019-11-18T12:06:46.log.zip

@adrabent
Collaborator

It tries to predict the A-team sources and to write them into the MODEL_DATA column. For that it looks for this file:

```
/opt/Data/working/Pre-Facet-Target-new/Ateam_LBA_CC.make_sourcedb_ateam
```

This file is created in the step make_sourcedb_ateam. Since you did not run from scratch, I can't see what this step did in your logfile, because it was skipped. Please rerun your calibration from scratch or provide the logfile from the previous run.

@maria-mmtz
Author

Hi,

I think I've tackled this issue and the memory problem (it looks like it was using more than 200 GB before it crashed again; is that normal?). I'm now receiving this message:

```
zero-size array to reduction operation minimum which has no identity
```

Any ideas?
pipeline-Pre-Facet-Target-new-2019-11-26T10:33:25.zip
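That message is NumPy's standard error for reducing an empty array with `min`, so it usually means some upstream selection step produced no data at all. It can be reproduced in two lines:

```python
import numpy as np

# Taking the minimum of an empty array raises exactly this ValueError.
try:
    np.min(np.array([]))
except ValueError as err:
    print(err)
```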

@adrabent
Collaborator

@maria-mmtz .. since the original issue was solved, I am closing this issue. If you encounter any other or new issues, please open a new thread.
