Segmentation fault when doing a "for reduction" with openmp #1143

dominikbell · 2022-06-28T09:00:05Z

dominikbell
Jun 28, 2022

When pyccelizing a function that does a for reduction of two arrays with openmp, a segmentation fault occurs. The same code with only one array in the for reduction does not encounter this error.

As an example take a file pyccel_test.py with two functions mult_reduct1 and mult_reduct2:

def mult_reduct1(arr : 'float[:,:]', arr1 : 'float[:,:]'):
    """
    Parameters:
        2 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1)
    for i in range(nx):
        j = i * i

        arr1[i,0] = arr[i,0]**2
        arr1[i,1] = arr[i,1]**2

    #$ omp end parallel

def mult_reduct2(arr : 'float[:,:]', arr1 : 'float[:,:]', arr2 : 'float[:,:]'):
    """
    Parameters:
        3 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1, arr2)
    for i in range(nx):
        j = i * i

        arr1[i,0] = arr[i,0]**2
        arr1[i,1] = arr[i,1]**2

        arr2[i,0] = arr[i,0]**2
        arr2[i,1] = arr[i,1]**2


    #$ omp end parallel

which is pyccelized with the command pyccel --openmp pyccel_test.py. The pyccelization itself gives no errors.

When calling these functions in another python file with e.g.

import pyccel_test as pt

nx = 1000
ny = 1000

array = np.random.rand(nx,ny)
array1 = np.zeros(np.shape(array), dtype=float)
array2 = np.zeros(np.shape(array), dtype=float)

pt.mult_reduct1(array, array1)
print('\n results \n')
print(np.sum(array1))

pt.mult_reduct2(array, array1, array2)
print('\n results \n')
print(np.sum(array1))
print(np.sum(array2))

the output reads


 results 

674.2675057646197
Segmentation fault (core dumped)

I am using the newest pyccel version 1.5.2

Answered by EmilyBourne

Jun 28, 2022

I think it is the memory problem described in the stack overflow link.
If I translate:

def mult_reduct1(arr : 'float[:,:]', arr1 : 'float[:,:]'):
    """
    Parameters:
        2 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1)
    for i in range(nx):
        j = i * i

        arr1[0,0] += arr[i,0]**2
        arr1[1,1] += arr[i,1]**2

    #$ omp end parallel


def mult_reduct2(arr : 'float[:,:]', arr1 : 'float[:,:]', arr2 : 'float[:,:]'):
    """
    Parameters:
        3 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ om…

View full answer

EmilyBourne · 2022-06-28T09:38:51Z

EmilyBourne
Jun 28, 2022
Collaborator

Please can you share the generated code (see the __pyccel__ folder)

0 replies

dominikbell · 2022-06-28T11:00:26Z

dominikbell
Jun 28, 2022
Author

The fortran code from the .f90 file reads as follows:

module pyccel_test


  use, intrinsic :: ISO_C_Binding, only : i64 => C_INT64_T , f64 => &
        C_DOUBLE
  implicit none

  contains

  !........................................
  !________________________________!
  !                                !
  !    Parameters:                 !
  !        2 arrays of the same size!
  !                                !
  !________________________________!

  subroutine mult_reduct1(arr, arr1) 

    implicit none

    real(f64), intent(in) :: arr(0:,0:)
    real(f64), intent(inout) :: arr1(0:,0:)
    integer(i64) :: nx
    integer(i64) :: ny
    integer(i64) :: i
    integer(i64) :: j

    nx = size(arr, 1_i64, i64)
    ny = size(arr, 2_i64, i64)
    !$omp parallel private(i, j)
    !$omp do reduction(+: arr1)
    do i = 0_i64, nx - 1_i64, 1_i64
      j = i * i
      arr1(0_i64, i) = arr(0_i64, i) ** 2_i64
      arr1(1_i64, i) = arr(1_i64, i) ** 2_i64
    end do
    !$omp end do
    !$omp end parallel

  end subroutine mult_reduct1
  !........................................

  !........................................
  !________________________________!
  !                                !
  !    Parameters:                 !
  !        3 arrays of the same size!
  !                                !
  !________________________________!

  subroutine mult_reduct2(arr, arr1, arr2) 

    implicit none

    real(f64), intent(in) :: arr(0:,0:)
    real(f64), intent(inout) :: arr1(0:,0:)
    real(f64), intent(inout) :: arr2(0:,0:)
    integer(i64) :: nx
    integer(i64) :: ny
    integer(i64) :: i
    integer(i64) :: j

    nx = size(arr, 1_i64, i64)
    ny = size(arr, 2_i64, i64)
    !$omp parallel private(i, j)
    !$omp do reduction(+: arr1, arr2)
    do i = 0_i64, nx - 1_i64, 1_i64
      j = i * i
      arr1(0_i64, i) = arr(0_i64, i) ** 2_i64
      arr1(1_i64, i) = arr(1_i64, i) ** 2_i64
      arr2(0_i64, i) = arr(0_i64, i) ** 2_i64
      arr2(1_i64, i) = arr(1_i64, i) ** 2_i64
    end do
    !$omp end do
    !$omp end parallel

  end subroutine mult_reduct2
  !........................................


end module pyccel_test

0 replies

EmilyBourne · 2022-06-28T11:04:50Z

EmilyBourne
Jun 28, 2022
Collaborator

I don't see anything obviously wrong in the translation. But your example doesn't implement a reduction. Given that you ask for one could this be causing the problem?

0 replies

rmahjoubi · 2022-06-28T11:14:57Z

rmahjoubi
Jun 28, 2022
Collaborator

Can you please show us where you call the functions, I assumed you are not using the provided example because np.random.rand is yet to be implemented. I tested the pyccelized functions with array = np.array([[1,2,3],[1,2,3],[1,2,3]], dtype=float) and they appear to work fine for me and give the same results as in python, I can show you the python code

0 replies

EmilyBourne · 2022-06-28T11:17:41Z

EmilyBourne
Jun 28, 2022
Collaborator

Can you please show us where you call the functions, I assumed you are not using the provided example because np.random.rand is yet to be implemented. I tested the pyccelized functions with array = np.array([[1,2,3],[1,2,3],[1,2,3]], dtype=float) and they appear to work fine for me and give the same results as in python, I can show you the python code

I don't think the file with np.random.rand is pyccelised so it doesn't matter if we don't implement it for arrays yet.

If it is working for you then it may be a compiler issue. What compiler are you both using?

0 replies

EmilyBourne · 2022-06-28T11:19:42Z

EmilyBourne
Jun 28, 2022
Collaborator

I would expect the implemented code to behave differently with different compilers as putting a reduction clause on a loop that doesn't reduce is non-standard behaviour

0 replies

dominikbell · 2022-06-28T11:22:50Z

dominikbell
Jun 28, 2022
Author

As far as

I don't see anything obviously wrong in the translation. But your example doesn't implement a reduction. Given that you ask for one could this be causing the problem?

In this example there is not reduction, you are right, but the application in which I encountered this problem is much more complex (in which a reduction is necessary). Even if I change the code above into something where reduction is needed, e.g.

def mult_reduct1(arr : 'float[:,:]', arr1 : 'float[:,:]'):
    """
    Parameters:
        2 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1)
    for i in range(nx):
        j = i * i

        arr1[0,0] += arr[i,0]**2
        arr1[1,1] += arr[i,1]**2

    #$ omp end parallel


def mult_reduct2(arr : 'float[:,:]', arr1 : 'float[:,:]', arr2 : 'float[:,:]'):
    """
    Parameters:
        3 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1, arr2)
    for i in range(nx):
        j = i * i

        arr1[0,0] += arr[i,0]**2
        arr1[1,1] += arr[i,1]**2

        arr2[0,0] += arr[i,0]**2
        arr2[1,1] += arr[i,1]**2

    #$ omp end parallel

the segmentation fault persists.

0 replies

dominikbell · 2022-06-28T11:24:39Z

dominikbell
Jun 28, 2022
Author

Can you please show us where you call the functions, I assumed you are not using the provided example because np.random.rand is yet to be implemented. I tested the pyccelized functions with array = np.array([[1,2,3],[1,2,3],[1,2,3]], dtype=float) and they appear to work fine for me and give the same results as in python, I can show you the python code

I don't think the file with np.random.rand is pyccelised so it doesn't matter if we don't implement it for arrays yet.

If it is working for you then it may be a compiler issue. What compiler are you both using?

Yes, the file from which I call the file with the for reduction loop is a standard, non-pyccelized python file.

I am running a Ubuntu VM on a MacOS host. How can I check which compiler is used? (I would guess its the gcc compiler installed on the Mac host).

0 replies

EmilyBourne · 2022-06-28T11:28:16Z

EmilyBourne
Jun 28, 2022
Collaborator

I am running a Ubuntu VM on a MacOS host. How can I check which compiler is used? (I would guess its the gcc compiler installed on the Mac host).

It should be the default gcc compiler in your environment. gcc --version

0 replies

dominikbell · 2022-06-28T11:34:33Z

dominikbell
Jun 28, 2022
Author

When I run gcc --version in the Ubuntu VM I get

gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When I run it in my host Mac machine I get

Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: arm64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

In order to compile Psydac kernel with pyccel, I had to change the flags to

PSYDAC_BACKEND_GPYCCEL['flags'] = '-O3 -march=native -mtune=native -ffast-math -ffree-line-length-none'

so take out the -mavx flag, that's why I think the host compiler might be the relevant one.

0 replies

EmilyBourne · 2022-06-28T11:36:23Z

EmilyBourne
Jun 28, 2022
Collaborator

Could this be related to this: https://stackoverflow.com/questions/45885660/fortran-openmp-with-multiple-simultaneous-reductions-results-in-seg-fault ?

I wonder if it has more success if you specify the actual reduced variable:

def mult_reduct2(arr : 'float[:,:]', arr1 : 'float[:,:]', arr2 : 'float[:,:]'):
    """
    Parameters:
        3 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1[0,0], arr1[1,1], arr2[0,0], arr2[1,1])
    for i in range(nx):
        j = i * i

        arr1[0,0] += arr[i,0]**2
        arr1[1,1] += arr[i,1]**2

        arr2[0,0] += arr[i,0]**2
        arr2[1,1] += arr[i,1]**2

    #$ omp end parallel

But I'm not sure if pyccel handles this

0 replies

EmilyBourne · 2022-06-28T11:38:40Z

EmilyBourne
Jun 28, 2022
Collaborator

The good news is, I can reproduce this on my computer (Ubuntu gcc9.4) so I can play around and see if I find anything

0 replies

dominikbell · 2022-06-28T11:40:46Z

dominikbell
Jun 28, 2022
Author

Could this be related to this: https://stackoverflow.com/questions/45885660/fortran-openmp-with-multiple-simultaneous-reductions-results-in-seg-fault ?

I wonder if it has more success if you specify the actual reduced variable:
def mult_reduct2(arr : 'float[:,:]', arr1 : 'float[:,:]', arr2 : 'float[:,:]'):
    """
    Parameters:
        3 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1[0,0], arr1[1,1], arr2[0,0], arr2[1,1])
    for i in range(nx):
        j = i * i

        arr1[0,0] += arr[i,0]**2
        arr1[1,1] += arr[i,1]**2

        arr2[0,0] += arr[i,0]**2
        arr2[1,1] += arr[i,1]**2

    #$ omp end parallel
But I'm not sure if pyccel handles this

This doesn't work; when trying something like

#$ omp for reduction(+: arr1[0,0], arr1[1,1])

the following error pops up when compiling:

Expected ',' or ')' => 'on(+: arr1*[0,0], arr'

Also, this would have been a highly unpractical solution :)

0 replies

EmilyBourne · 2022-06-28T11:53:52Z

EmilyBourne
Jun 28, 2022
Collaborator

I think it is the memory problem described in the stack overflow link.
If I translate:

def mult_reduct1(arr : 'float[:,:]', arr1 : 'float[:,:]'):
    """
    Parameters:
        2 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1)
    for i in range(nx):
        j = i * i

        arr1[0,0] += arr[i,0]**2
        arr1[1,1] += arr[i,1]**2

    #$ omp end parallel


def mult_reduct2(arr : 'float[:,:]', arr1 : 'float[:,:]', arr2 : 'float[:,:]'):
    """
    Parameters:
        3 arrays of the same size
    """
    nx = arr.shape[1]
    ny = arr.shape[0]

    #$ omp parallel private (i, j)
    #$ omp for reduction(+: arr1, arr2)
    for i in range(nx):
        j = i * i

        arr1[0,0] += arr[i,0]**2
        arr1[1,1] += arr[i,1]**2

        arr2[0,0] += arr[i,0]**2
        arr2[1,1] += arr[i,1]**2

    #$ omp end parallel

if __name__ == '__main__':
    import numpy as np

    nx = 1000
    ny = 1000

    array = np.empty((nx,ny))
    for i in range(nx):
        for j in range(ny):
            array[i,j] = np.random.rand()
    array1 = np.zeros_like(array)
    array2 = np.zeros_like(array)

    mult_reduct1(array, array1)
    print('\n results \n')
    print(np.sum(array1))

    mult_reduct2(array, array1, array2)
    print('\n results \n')
    print(np.sum(array1))
    print(np.sum(array2))

and run the executable I get:


 results 

675.554951655578
Segmentation fault (core dumped)

but if I change:

    nx = 1000
    ny = 1000

to

    nx = 100
    ny = 100

then I get:


 results 

68.344419098268

 results 

136.688838196536
68.344419098268

1 reply

EmilyBourne Jun 30, 2022
Collaborator

Copying the link here to have all information together:
https://stackoverflow.com/questions/45885660/fortran-openmp-with-multiple-simultaneous-reductions-results-in-seg-fault

EmilyBourne · 2022-06-28T11:55:09Z

EmilyBourne
Jun 28, 2022
Collaborator

I'm also on a VM, so @rmahjoubi presumably just has more memory available than we do

0 replies

dominikbell · 2022-06-28T12:23:44Z

dominikbell
Jun 28, 2022
Author

My VM has 4GB of RAM, so if I set

export OMP_STACKSIZE=999m

then each process (I run with 4 openmp processes) has 1GB of memory. Does a 1000x1000 array already exceed that?

When I run a python via command line

>>> import numpy as np
>>> x = np.random.rand(1000,1000)
>>> x.nbytes
  8000000

the size of a 1000x1000 array is just 8MB; so even if two arrays are stored, it comes nowhere near 1GB.

0 replies

EmilyBourne · 2022-06-28T12:28:39Z

EmilyBourne
Jun 28, 2022
Collaborator

Python can be quite greedy. That's why I ran with a program rather than via python. Can you confirm the behaviour? (i.e. does it start working for you if you reduce the size?)

It is possible that python isn't grabbing all your environment variables. Does setting export OMP_STACKSIZE=999m work for the program version?

0 replies

dominikbell · 2022-06-28T12:35:28Z

dominikbell
Jun 28, 2022
Author

Python can be quite greedy. That's why I ran with a program rather than via python. Can you confirm the behaviour? (i.e. does it start working for you if you reduce the size?)

It is possible that python isn't grabbing all your environment variables. Does setting export OMP_STACKSIZE=999m work for the program version?

I can confirm that when I reduce to a 100x100 array, the program runs without any issues.

It seems to me that python is recognizing the environment variable; when I set export OMP_STACKSIZE=1m the code runs into a segmentation fault even for 100x100 arrays.

I'm just amazed by the fact that 4GB of memory might not be enough for handling these arrays ...

0 replies

EmilyBourne · 2022-06-28T12:57:03Z

EmilyBourne
Jun 28, 2022
Collaborator

I am very confused by this. Even for nx=ny=500 valgrind/massif only detects 18MB used:

0 replies

EmilyBourne · 2022-06-28T12:57:35Z

EmilyBourne
Jun 28, 2022
Collaborator

I'm not sure if massif handles openmp properly though

0 replies

dominikbell · 2022-06-30T08:34:48Z

dominikbell
Jun 30, 2022
Author

So I tried the ulimit -s command, increasing this to 16000 enables me to run the code for the 1000x1000 array.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segmentation fault when doing a "for reduction" with openmp #1143

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 21 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

Segmentation fault when doing a "for reduction" with openmp #1143

dominikbell Jun 28, 2022

Replies: 21 comments · 1 reply

EmilyBourne Jun 28, 2022 Collaborator

dominikbell Jun 28, 2022 Author

EmilyBourne Jun 28, 2022 Collaborator

rmahjoubi Jun 28, 2022 Collaborator

EmilyBourne Jun 28, 2022 Collaborator

EmilyBourne Jun 28, 2022 Collaborator

dominikbell Jun 28, 2022 Author

dominikbell Jun 28, 2022 Author

EmilyBourne Jun 28, 2022 Collaborator

dominikbell Jun 28, 2022 Author

EmilyBourne Jun 28, 2022 Collaborator

EmilyBourne Jun 28, 2022 Collaborator

dominikbell Jun 28, 2022 Author

EmilyBourne Jun 28, 2022 Collaborator

EmilyBourne Jun 30, 2022 Collaborator

EmilyBourne Jun 28, 2022 Collaborator

dominikbell Jun 28, 2022 Author

EmilyBourne Jun 28, 2022 Collaborator

dominikbell Jun 28, 2022 Author

EmilyBourne Jun 28, 2022 Collaborator

EmilyBourne Jun 28, 2022 Collaborator

dominikbell Jun 30, 2022 Author

dominikbell
Jun 28, 2022

Replies: 21 comments 1 reply

EmilyBourne
Jun 28, 2022
Collaborator

dominikbell
Jun 28, 2022
Author

EmilyBourne
Jun 28, 2022
Collaborator

rmahjoubi
Jun 28, 2022
Collaborator

EmilyBourne
Jun 28, 2022
Collaborator

EmilyBourne
Jun 28, 2022
Collaborator

dominikbell
Jun 28, 2022
Author

dominikbell
Jun 28, 2022
Author

EmilyBourne
Jun 28, 2022
Collaborator

dominikbell
Jun 28, 2022
Author

EmilyBourne
Jun 28, 2022
Collaborator

EmilyBourne
Jun 28, 2022
Collaborator

dominikbell
Jun 28, 2022
Author

EmilyBourne
Jun 28, 2022
Collaborator

EmilyBourne Jun 30, 2022
Collaborator

EmilyBourne
Jun 28, 2022
Collaborator

dominikbell
Jun 28, 2022
Author

EmilyBourne
Jun 28, 2022
Collaborator

dominikbell
Jun 28, 2022
Author

EmilyBourne
Jun 28, 2022
Collaborator

EmilyBourne
Jun 28, 2022
Collaborator

dominikbell
Jun 30, 2022
Author