demix steps fail after array exception warning #258

Closed
maria-mmtz opened this issue Oct 11, 2019 · 17 comments

@maria-mmtz

Hi,
I have been trying to reduce my data using prefactor; however, my target is very close to the A-team sources. During the pipeline run for the target, I received this interesting warning before it failed for each subband:

```
WARNING node.852592d2a0bb.executable_args.L733787_SB243_uv.MS: /opt/lofarsoft/bin/NDPPP stderr:
std exception detected: ArrayBase::operator()(b,e,i) - incorrectly specified
begin: [0, 0]
end: [74, 74]
incr: [1, 1]

array shape: [62, 62]
required: b >= 0; b <= e; e < shape; i >= 0
```
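The message encodes a simple bounds check: the requested slice end (74) must stay below the array shape (62), which it does not. A minimal sketch of that check (the function name is hypothetical; the inequality is taken directly from the `required:` line above):

```python
def check_slice(begin, end, shape, incr):
    # Mimic the bounds check behind ArrayBase::operator()(b,e,i):
    # required: b >= 0; b <= e; e < shape; i >= 0
    for b, e, s, i in zip(begin, end, shape, incr):
        if not (0 <= b <= e < s and i >= 0):
            raise ValueError(
                f"incorrectly specified: begin={begin} end={end} shape={shape}")
    return True

# The failing case from the log: a 75-sample slice requested on a 62-sample axis.
try:
    check_slice((0, 0), (74, 74), (62, 62), (1, 1))
except ValueError as err:
    print(err)
```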

I am not sure how to rerun DPPP outside the pipeline to check how the demix steps are working, but I am attaching my latest logfile and parset (renamed to .txt so I can upload it) in case it is of any help.

Pre-Facet-Target.parset.txt

pipeline-Pre-Facet-Target-2019-10-10T12:14:58.log
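(For reference, NDPPP can be run standalone on a single subband with a small parset, e.g. `NDPPP demix.parset`. The sketch below is only an illustration: the parameter names follow the DPPP demixer step, while the filenames, sky model, and source list are placeholders, not values taken from this thread.)

```
msin                  = L733787_SB243_uv.MS
msout                 = L733787_SB243_uv.demix.MS
steps                 = [demix]
demix.skymodel        = Ateam.sourcedb      # placeholder sky model
demix.subtractsources = [CasA, CygA]        # placeholder A-team sources
demix.timestep        = 10                  # output averaging
demix.freqstep        = 16
```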

@adrabent
Collaborator

adrabent commented Nov 7, 2019

Hi maria-mmtz,

Unfortunately I am not able to reproduce your error with a raw LBA data set. Have these data already been pre-processed? What are the integration time and frequency resolution, as well as the total number of input frequency channels and time steps compared to your demixing step parameters?

@maria-mmtz
Author

Hi @adrabent,
Thank you for looking into this.

> unfortunately I am not able to reproduce your error with a raw LBA data set. Have these data already been pre-processed?

My data are HBA and I downloaded them from the LTA, so I suppose they have already been processed to a certain degree?

> What is the integration time and your frequency resolution as well as the total amount of input frequency channels and time steps compared to your demixing step parameters?

I am not sure how to accurately answer these questions... The integration time is 1 s, with 64 channels per subband and 243 subbands in total; the averaging step is 1.0 in time and 4.0 in frequency. In the demixing, the averaging steps are 10 in time and 16 in frequency. Does that help? If not, what else should I look into?

@adrabent
Collaborator

adrabent commented Nov 7, 2019

Hmm... I was just wondering if demix might need a regular grid, i.e. if you have, let's say, 600 timesteps and you use a demix_timestep of 77, then it could crash.
But I also tried this, and demix still works fine (I also checked pre-processed HBA data). Is there a possibility to point me to one of your measurement sets? Is the data available on CEP3?
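The regular-grid hypothesis is easy to check by hand: with a chunk size that does not divide the number of timesteps, the last averaging chunk is partial. A small sketch (the function name is hypothetical):

```python
def demix_chunks(n_times, demix_timestep):
    # Number of full averaging chunks and the size of the partial remainder.
    full, remainder = divmod(n_times, demix_timestep)
    return full, remainder

print(demix_chunks(600, 77))  # (7, 61): seven full chunks plus a 61-sample remainder
```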

@maria-mmtz
Author

@adrabent yes, I have uploaded two MSs, one of the target (L733787_SB145_uv.MS) and one of the calibrator (L733793_SB145_uv.MS), to /data/scratch/moutzouri

adrabent pushed a commit that referenced this issue Nov 7, 2019
@adrabent
Collaborator

adrabent commented Nov 7, 2019

I made some modifications to the target pipeline.
Please report if the issue still persists.

@maria-mmtz
Author

Hi, I still get an error; however, it seems to be a different one, and I can't quite figure out what happened. I'm attaching the log file:
pipeline-Pre-Facet-Target-new-2019-11-08T14:48:43.log

@tikk3r

tikk3r commented Nov 12, 2019

I seem to recall the new error had something to do with copying data. Are you running out of disk space perhaps?

@maria-mmtz
Author

Hi, I ran it again after freeing up some space. The target data folder is about 5 TB and I have 12 TB free. It fails again, and the output is the same.

@darafferty
Contributor

I just ran into this error myself (though not in prefactor), and it was due to running out of memory. You might watch the memory usage (e.g. with `top`) while it's running to see if this could be the problem.
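As an alternative to watching `top` by hand, the peak resident memory of a finished child process can be read back with the standard `resource` module (Linux/macOS only; the wrapper below is a sketch for illustration, not part of prefactor):

```python
import resource
import subprocess
import sys

def run_and_report_peak(cmd):
    # Run a command and return the peak RSS of finished child processes.
    # Units: kilobytes on Linux, bytes on macOS.
    subprocess.run(cmd, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

peak = run_and_report_peak([sys.executable, "-c", "x = bytearray(10**7)"])
print(f"peak child RSS: {peak}")
```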

@maria-mmtz
Author

Hello, it seems that after the pipeline stopped, NDPPP was still eating up all the memory, so I manually forced it to stop. I ran it again and it looks like it worked a little better, but it is still unsuccessful.
I can't upload the log file as it is quite big; is there any other way to share it?

@maria-mmtz
Author

Hi, I compressed the log file from the previous run
pipeline-Pre-Facet-Target-new-2019-11-13T14:35:31.zip

@tikk3r

tikk3r commented Nov 15, 2019

That log shows

```
std exception detected: Table file /opt/Data/working/Pre-Facet-Target-new/L733787_SB015_uv.ndppp_prep_target/FIELD/table.dat does not exist
```

Does it (still) exist? If it does, it might have become corrupted, for example if a previous run crashed or got killed mid-write, and you may need to get a fresh copy of it.
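One quick sanity check, assuming the usual casacore on-disk layout in which every table and subtable directory carries a `table.dat`, is to walk the MS and list directories where that file is missing (a sketch, not a prefactor tool):

```python
import os

def find_missing_table_dat(ms_path):
    # Walk a MeasurementSet directory tree and report every directory
    # that lacks a table.dat, a typical sign of an interrupted write.
    missing = []
    for dirpath, _dirnames, filenames in os.walk(ms_path):
        if "table.dat" not in filenames:
            missing.append(dirpath)
    return missing
```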

@adrabent
Collaborator

@maria-mmtz
I would agree with @tikk3r. The easiest fix is to remove the working_directory and start a fresh run from scratch. If these types of errors still occur, please check that your input files are not corrupted.

@maria-mmtz
Author

Hi, I did that and now I get this message that I didn't get before:

```
std exception detected: Specified source DB name does not exist
```

How can I fix that?
pipeline-Pre-Facet-Target-new-2019-11-18T12:06:46.log.zip

@adrabent
Collaborator

It tries to predict the A-team sources and to write them into the MODEL_DATA column. For that it looks for this file:

```
/opt/Data/working/Pre-Facet-Target-new/Ateam_LBA_CC.make_sourcedb_ateam
```

This file is created in the step make_sourcedb_ateam. Since you did not run from scratch, I can't see what this step did in your logfile, because it was skipped. Please rerun your calibration from scratch or provide the logfile from the previous run.

@maria-mmtz
Author

Hi,

I think I've tackled this issue and the memory problem (it looks like it was using more than 200 GB before it crashed again; is that normal?). I'm now receiving this message:

```
zero-size array to reduction operation minimum which has no identity
```

Any ideas?
pipeline-Pre-Facet-Target-new-2019-11-26T10:33:25.zip
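That message is NumPy's standard error for reducing an empty array with `min`, so it usually means some upstream selection step produced no data at all. It can be reproduced in two lines:

```python
import numpy as np

# Taking the minimum of an empty array raises exactly this ValueError.
try:
    np.min(np.array([]))
except ValueError as err:
    print(err)
```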

@adrabent
Collaborator

@maria-mmtz .. since the original issue was solved, I am closing this issue. If you encounter any other or new issues, please open a new thread.
