Segmentation fault (core dumped) #124

Closed
physixfan opened this issue Mar 1, 2017 · 24 comments

@physixfan

physixfan commented Mar 1, 2017

Hi

I am really frustrated by a Segmentation fault (core dumped) error. It shows up when I try to read the .bp file produced by some large runs. The file is actually not very large, just ~4GB, and I can successfully read some larger files, so I don't think it is a memory limitation. This is the code that I use (in Python):

[xfan@gr-fe1 target_shape_46]$ python
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import adios as ad
>>> binary_file = ad.file("record.bp")
Segmentation fault (core dumped)
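
Because the crash kills the interpreter, it can help to try each file in a child process and inspect how the child exited. The following is only a minimal sketch: `probe` and `probe_bp_file` are hypothetical helper names, and the only ADIOS call used is `adios.file(path)` from the session above. It relies on the POSIX convention that a child killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def probe(code, python=sys.executable):
    """Run a Python snippet in a child process and classify how it exited."""
    returncode = subprocess.call([python, "-c", code])
    if returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the child was killed by that signal.
    if returncode == -signal.SIGSEGV:
        return "segfault"
    return "error (exit status %d)" % returncode


def probe_bp_file(path):
    """Hypothetical helper: open a .bp file with adios.file() in a child process."""
    code = "import adios; adios.file(%r)" % path
    return probe(code)
```

Calling `probe_bp_file("record.bp")` on each file separates the files that crash the reader from those that open cleanly, without losing the parent session.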
@pnorbert
Contributor

pnorbert commented Mar 1, 2017

Hi, can you bpls the content of the file? How about adios 1.11?

@jychoi-hpc
Member

jychoi-hpc commented Mar 1, 2017 via email

@physixfan
Author

bpls:
[xfan@gr-fe1 target_shape_46]$ bpls -latv record.bp
Segmentation fault (core dumped)

The bp file is produced by a code which uses adios 1.3.1, and my python code is using version 1.11.0. Do you think the version difference could be affecting this as well? It's difficult to recompile my advisor's code with adios 1.11.0, while adios 1.3.1 does not support python...

@pnorbert
Contributor

pnorbert commented Mar 1, 2017 via email

@physixfan
Author

physixfan commented Mar 1, 2017

Well, not all the record.bp files fail, only some of them, and I still can't figure out when. I am working on a LANL server called Grizzly; it is a new machine, so maybe that's the reason. I don't know how to upload the file to ORNL, so I am trying to upload it to Dropbox. It's just 4GB, so I don't think it will take long.

@pnorbert
Contributor

pnorbert commented Mar 1, 2017 via email

@physixfan
Author

I would say it's big. I usually use 4096 processors to run the code.

@pnorbert
Contributor

pnorbert commented Mar 1, 2017 via email

@physixfan
Author

https://www.dropbox.com/s/yfbzhden7hlj98a/record.bp?dl=0
This is the Dropbox link to the record.bp file. Thanks for your help!

@pnorbert
Contributor

pnorbert commented Mar 1, 2017 via email

@physixfan
Author

Thanks. Since reading the original file causes a segmentation fault, I copied this run and restarted it on 32 processors. The relevant output is:

 rank=           0  ADIOS group_size=               206164
 rank=           0  ADIOS total size=               211574
  ADIOS file write...
r0 offset=(  0,  0,  0)
r0 size=( 33, 65,  3)
r0 i=  0: 32,  0: 64,  0:  2
  ADIOS file close...
 rank=           1  ADIOS group_size=               199736
 rank=           1  ADIOS total size=               205146
r1 offset=( 33,  0,  0)
r1 size=( 32, 65,  3)
r1 i=  1: 32,  0: 64,  0:  2
 rank=           2  ADIOS group_size=               199736
 rank=           2  ADIOS total size=               205146
r2 offset=( 65,  0,  0)
r2 size=( 32, 65,  3)
r2 i=  1: 32,  0: 64,  0:  2

(and some repetitive outputs)
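
One quick sanity check on output like this is whether each rank's block starts where the previous rank's block ends. The sketch below parses the `offset` and `size` lines and checks this along the first dimension only. It assumes that consecutive ranks are neighbours in that dimension, which holds for the three ranks shown but may not hold across a full decomposition. The function names are hypothetical and not part of pixie2d or ADIOS.

```python
import re

# Matches lines such as "r1 offset=( 33,  0,  0)" or "r1 size=( 32, 65,  3)".
LINE = re.compile(r"r(\d+) (offset|size)=\(\s*(\d+),\s*(\d+),\s*(\d+)\)")

# The offset/size lines for ranks 0-2 copied from the output above.
SAMPLE = """
r0 offset=(  0,  0,  0)
r0 size=( 33, 65,  3)
r1 offset=( 33,  0,  0)
r1 size=( 32, 65,  3)
r2 offset=( 65,  0,  0)
r2 size=( 32, 65,  3)
"""


def parse_blocks(log_text):
    """Return {rank: {"offset": (x, y, z), "size": (x, y, z)}}."""
    blocks = {}
    for rank, kind, x, y, z in LINE.findall(log_text):
        blocks.setdefault(int(rank), {})[kind] = (int(x), int(y), int(z))
    return blocks


def x_gaps(blocks):
    """List (rank, expected, actual) where rank r+1 does not abut rank r in x.

    Assumption: ranks r and r+1 are neighbours along the first dimension.
    Ranks missing from the log, or missing either line, are skipped.
    """
    problems = []
    for rank in sorted(blocks):
        here, nxt = blocks[rank], blocks.get(rank + 1)
        if nxt is None or len(here) < 2 or "offset" not in nxt:
            continue
        expected = here["offset"][0] + here["size"][0]
        if nxt["offset"][0] != expected:
            problems.append((rank + 1, expected, nxt["offset"][0]))
    return problems


print(x_gaps(parse_blocks(SAMPLE)))
```

For the three ranks above the blocks abut exactly (0 + 33 = 33 and 33 + 32 = 65), so the list of problems is empty.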

And this is the relevant code in pixie2d:

      call adios_group_size(handle,groupsize,totalsize,err)

      if (err /= 0) then
        write (*,*) 'Problem in writeRecordFile'
        write (*,*) 'rank=',my_rank,'  ERROR in "adios_group_size"'
        stop
      endif

      if (adios_debug) then
        write (*,*) 'rank=',my_rank,' ADIOS group_size=',groupsize
        write (*,*) 'rank=',my_rank,' ADIOS total size=',totalsize
      endif

@pnorbert
Contributor

pnorbert commented Mar 2, 2017 via email

@pnorbert
Contributor

pnorbert commented Mar 2, 2017 via email

@physixfan
Author

I did not save the bp file from the 32-process run, but I made a new run with the same input file on 4096 processors for 30 minutes. There's not much data there yet, and the file can be read correctly. Here's the link: https://www.dropbox.com/s/z21y5s9worh6t56/record2.bp?dl=0

@pnorbert
Contributor

pnorbert commented Mar 2, 2017 via email

@physixfan
Author

physixfan commented Mar 2, 2017

The latter:

r* offset=( 49,  0,  0)
r* size=(  4,  5,  3)
r* i=  1:  4,  0:  4,  0:  2
 rank=           0  ADIOS group_size=                 2644
 rank=           0  ADIOS total size=                 8054
  ADIOS file write...
r0 offset=(  0,  0,  0)
r0 size=(  5,  5,  3)
r0 i=  0:  4,  0:  4,  0:  2
  ADIOS file close...
 rank=           1  ADIOS group_size=                 1976
 rank=           1  ADIOS total size=                 7386
r1 offset=(  5,  0,  0)
r1 size=(  4,  5,  3)
r1 i=  1:  4,  0:  4,  0:  2
rank= 1 ihip=5 ilom=0 jhip=5 jlom=0 khip=2 klom=0
 rank=           1  ADIOS open: handle=             59689568
 rank=           3  ADIOS group_size=                 1976
 rank=           3  ADIOS total size=                 7386
r3 offset=( 13,  0,  0)
r3 size=(  4,  5,  3)
r3 i=  1:  4,  0:  4,  0:  2
 rank=           2  ADIOS group_size=                 1976
 rank=           2  ADIOS total size=                 7386
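
In this output, `total size` minus `group_size` comes to the same number of bytes on every rank that prints both values. The sketch below does that subtraction from the debug lines. The helper name `size_overhead` is hypothetical, and reading the difference as per-process overhead added by ADIOS is an assumption, not something confirmed in this thread.

```python
import re

# Matches lines such as " rank=           1  ADIOS group_size=                 1976".
PAIR = re.compile(r"rank=\s*(\d+)\s+ADIOS (group_size|total size)=\s*(\d+)")

# The group_size/total size lines for ranks 0, 1 and 3 copied from the output above.
SAMPLE = """
 rank=           0  ADIOS group_size=                 2644
 rank=           0  ADIOS total size=                 8054
 rank=           1  ADIOS group_size=                 1976
 rank=           1  ADIOS total size=                 7386
 rank=           3  ADIOS group_size=                 1976
 rank=           3  ADIOS total size=                 7386
"""


def size_overhead(log_text):
    """Return {rank: total_size - group_size} for ranks that report both values."""
    sizes = {}
    for rank, kind, value in PAIR.findall(log_text):
        sizes.setdefault(int(rank), {})[kind] = int(value)
    return {
        rank: s["total size"] - s["group_size"]
        for rank, s in sizes.items()
        if len(s) == 2
    }


print(size_overhead(SAMPLE))
```

The same subtraction on the 32-process output earlier in the thread (211574 - 206164 and 205146 - 199736) gives the same difference.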

@physixfan
Author

So what do these outputs imply? How can I fix the problem?

@pnorbert
Contributor

pnorbert commented Mar 3, 2017 via email

@pnorbert
Contributor

pnorbert commented Mar 3, 2017 via email

@physixfan
Author

As far as I have observed, the segmentation fault depends on the number of time steps. But since you mentioned that the bp file is broken even for the small run, I am no longer sure... I'll contact Luis about this issue and see whether he can update the version of ADIOS.

@physixfan
Author

Hi!

Luis has recompiled pixie2d with ADIOS version 1.10. However, I still see the segmentation fault. The following link is a record.bp file produced by the new code. Please check whether you can find what causes these errors. Thanks!

https://www.dropbox.com/s/qpk669c1lkh4f6j/record_Mar6.bp?dl=0

@physixfan
Author

Hi, do you have time to take a look? Thanks!

@pnorbert
Contributor

pnorbert commented Mar 17, 2017 via email

@physixfan
Author

Thank you very much! Luis has solved this issue.


3 participants