Segmentation violation when using MPI_BGQ transport layer #138
When developing a benchmarking test (https://github.com/fouriaux/adios_experiments/tree/master/adios_buffer_size) I observed that a segmentation fault appears in ADIOS when using the MPI_BGQ transport layer. This segmentation fault does not appear when using the MPI or MPI_AGGREGATE transport layers.
Attached is the stack trace of the segfault from a TotalView run.
[image: adios_core_dump_srun]
<https://user-images.githubusercontent.com/28300081/28078743-234d7a1e-6666-11e7-8652-3e6d57eb360b.png>
Comments
Can you please tell us the MPI size and the parameters you have run with? Thank you.
Hello, I have tested with MPI sizes between 4 and 8192 ranks, with writes of 4 KB per rank, 1000 times per rank.
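(For scale, assuming the 4 KB writes are 4096 bytes each, the largest reported run amounts to:
    per rank:  4096 bytes x 1000 writes                        ~ 4 MB
    globally:  4096 x 1000 x 8192 ranks = 33,554,432,000 bytes ~ 33.5 GB
a global byte count that already exceeds INT_MAX, about 2.1 x 10^9, which turns out to matter for the fix in the next comment.)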
If I do not turn off verbosity completely (note that the default is 2, for errors+warnings), I get this error:
ERROR: error allocating memory to build var index. Index aborted
This error happens on Mira/Cetus at ANL with 16 MPI tasks per node. If I run it with 8 MPI tasks per node, the test runs fine.
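(For context: Mira and Cetus are Blue Gene/Q systems with 16 GB of memory per node, so halving the tasks per node doubles the per-rank memory budget:
    16 GB/node / 16 ranks = 1 GB per rank   -> index allocation fails
    16 GB/node /  8 ranks = 2 GB per rank   -> test runs fine)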
Also, I fixed the example to correctly use 64-bit offsets and global dimensions in the global array:
--- a/adios_buffer_size/adios_buffer_size.cpp
+++ b/adios_buffer_size/adios_buffer_size.cpp
@@ -23,7 +23,7 @@ static const char* method;  // adios met
 static int rank;             // mpi rank id of a process
 static int nb_ranks;         // nb ranks in MPI_COMM_WORLD
-static int total_size;       // total_size to be written
+static uint64_t total_size;  // total_size to be written
 static int* buffer;          // allocated buffer to write of file per rank
 static int batch_size;       // size of one write in
 static int nb_batchs;        // number of batchs to write per rank
@@ -42,7 +42,7 @@ void open (const char* filename) {
 void write (int* buffer){
     adios_write (adios_handle, "global_size", (void*) &total_size);
     adios_write (adios_handle, "batch_size", (void*) &batch_size);
-    int offset = batch_size * rank;
+    uint64_t offset = batch_size * rank;
     for (int i = 0 ; i < nb_batchs; i++) {
         adios_write (adios_handle, "offset", (void*) &offset);
         adios_write (adios_handle, "data", buffer);
@@ -59,11 +59,11 @@ void initAdios (const char* method, int max_buffer_size) {
     adios_init_noxml ( comm);
     // adios_set_max_buffer_size ( max_buffer_size);
     adios_declare_group ( &adios_group_id, "report", "", adios_stat_no);
-    adios_select_method ( adios_group_id, method, "verbose=0", "");
-    adios_define_var ( adios_group_id, "global_size", "", adios_integer, "", "", "");
+    adios_select_method ( adios_group_id, method, "verbose=2", "");
+    adios_define_var ( adios_group_id, "global_size", "", adios_long, "", "", "");
     adios_define_var ( adios_group_id, "batch_size", "", adios_integer, "", "", "");
     for (int i = 0; i < nb_batchs; i++) {
-        offset_ids.push_back ( adios_define_var ( adios_group_id, "offset", "", adios_integer, "", "", ""));
+        offset_ids.push_back ( adios_define_var ( adios_group_id, "offset", "", adios_long, "", "", ""));
         data_ids.push_back ( adios_define_var ( adios_group_id, "data", "", adios_integer, "batch_size", "global_size", "offset"));
     }
 }
@@ -95,7 +95,7 @@ int main (int argc, char** argv) {
     MPI_Comm_size(comm, &nb_ranks);
     initAdios(method, max_buffer_size);

-    total_size = batch_size * nb_ranks * nb_batchs;
+    total_size = (uint64_t)batch_size * (uint64_t)nb_ranks * (uint64_t)nb_batchs;
     initBuffer (buffer);
     open (out_file);
I don't know why it runs out of memory, and why only with the MPI_BGQ
method.
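A minimal standalone sketch of the overflow that the 64-bit change above fixes; the parameter values are assumptions taken from the runs reported in this thread:

    #include <cstdint>
    #include <climits>
    #include <cstdio>

    int main () {
        // Assumed test parameters, from the runs reported in this thread.
        int batch_size = 4096;   // "4k per rank" write size, in bytes
        int nb_batchs  = 1000;   // "1000 times each rank"
        int nb_ranks   = 8192;   // largest reported run

        // Promote before multiplying, as in the patch above.
        uint64_t total = (uint64_t) batch_size * (uint64_t) nb_ranks * (uint64_t) nb_batchs;

        printf ("total_size = %llu bytes\n", (unsigned long long) total); // 33554432000
        printf ("INT_MAX    = %d\n", INT_MAX);                            // 2147483647

        // The old line, total_size = batch_size * nb_ranks * nb_batchs,
        // evaluates the product in int; since 33,554,432,000 > INT_MAX it
        // overflows (undefined behavior for signed int), producing a garbage
        // global dimension.
        if (total > (uint64_t) INT_MAX)
            printf ("a 32-bit int total_size would overflow here\n");
        return 0;
    }

One remaining detail: in the patched line "uint64_t offset = batch_size * rank;" the right-hand side is still evaluated in int arithmetic. With these parameters the product (at most 4096 * 8191, about 3.4e7) happens to fit, but writing "(uint64_t) batch_size * rank" would be safer if per-batch offsets grow toward the ~33 GB global size.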
Awesome, I will test it in the morning, and if I have time I will try to dig into the ADIOS source code.
Have a nice day and thanks for the correction.
Best regards,
Jeremy.
{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/ornladios/ADIOS","title":"ornladios/ADIOS","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/ornladios/ADIOS"}},"updates":{"snippets":[{"icon":"PERSON","message":"@pnorbert in #138: If I do not turn off verbosity completely (note that default is 2 for\nerrors+warnings), I get this errors:\n\nERROR: error allocating memory to build var index. Index aborted\n\nThis error happens on Mira/Cetus at ANL with 16 mpi tasks per node. If I\nrun it with 8 mpi tasks per node, the test runs fine.\n\nAlso, I fixed the example to correctly use 64bit offsets and global\ndimensions in the global array:\n\n--- a/adios_buffer_size/adios_buffer_size.cpp\n+++ b/adios_buffer_size/adios_buffer_size.cpp\n@@ -23,7 +23,7 @@ static const char*\nmethod; // adios met\n\n static int rank; // mpi\nrank id of a process\n static int nb_ranks; // nb\nranks in MPI_COMM_WORLD\n-static int total_size; //\ntotal_size to be written\n+static uint64_t total_size; //\ntotal_size to be written\n static int* buffer; //\nallocated buffer to write of file per rank\n static int batch_size; //\nsize of one write in\n static int nb_batchs; //\nnumber of batchs to write per rank\n@@ -42,7 +42,7 @@ void open (const char* filename) {\n void write (int* buffer){\n adios_write (adios_handle, \"global_size\", (void*) \u0026total_size);\n adios_write (adios_handle, \"batch_size\", (void*) \u0026batch_size);\n- int offset = batch_size * rank;\n+ uint64_t offset = batch_size * rank;\n for (int i = 0 ; i \u003c nb_batchs; i++) {\n adios_write (adios_handle, \"offset\", (void*) \u0026offset);\n adios_write (adios_handle, \"data\", buffer);\n@@ -59,11 +59,11 @@ void initAdios (const char* method, int\nmax_buffer_size) {\n adios_init_noxml ( comm);\n // adios_set_max_buffer_size ( max_buffer_size);\n adios_declare_group ( \u0026adios_group_id,\"report\", \"\", adios_stat_no);\n- adios_select_method ( adios_group_id, method, \"verbose=0\", \"\");\n- adios_define_var ( adios_group_id, \"global_size\", \"\",\nadios_integer, \"\", \"\", \"\");\n+ adios_select_method ( adios_group_id, method, \"verbose=2\", \"\");\n+ adios_define_var ( adios_group_id, \"global_size\", \"\",\nadios_long, \"\", \"\", \"\");\n adios_define_var ( adios_group_id, \"batch_size\", \"\",\nadios_integer, \"\", \"\", \"\");\n for (int i = 0; i \u003c nb_batchs; i++) {\n- offset_ids.push_back ( adios_define_var ( adios_group_id, \"offset\",\n\"\", adios_integer, \"\", \"\", \"\"));\n+ offset_ids.push_back ( adios_define_var ( adios_group_id, \"offset\",\n\"\", adios_long, \"\", \"\", \"\"));\n data_ids.push_back ( adios_define_var ( adios_group_id, \"data\",\n\"\", adios_integer, \"batch_size\", \"global_size\", \"offset\"));\n }\n }\n@@ -95,7 +95,7 @@ int main (int argc, char** argv) {\n MPI_Comm_size(comm, \u0026nb_ranks);\n initAdios(method, max_buffer_size);\n\n- total_size = batch_size * nb_ranks * nb_batchs;\n+ total_size = (uint64_t)batch_size * (uint64_t)nb_ranks *\n(uint64_t)nb_batchs;\n initBuffer (buffer);\n open (out_file);\n\n\nI don't know why it runs out of memory, and why only with the MPI_BGQ\nmethod.\n\n\nOn Wed, Jul 12, 2017 at 3:56 AM, Jeremy FOURIAUX 
\u003cnotifications@github.com\u003e\nwrote:\n\n\u003e Hello, I have tested with mpi size between 4 -\u003e 8192 mpi ranks. and with\n\u003e size of write of 4k per rank, 1000 times each rank.\n\u003e\n\u003e —\n\u003e You are receiving this because you commented.\n\u003e Reply to this email directly, view it on GitHub\n\u003e \u003chttps://github.com/ornladios/ADIOS/issues/138#issuecomment-314685893\u003e,\n\u003e or mute the thread\n\u003e \u003chttps://github.com/notifications/unsubscribe-auth/ADGMLXmtbbQgxerRd5nI7L-z9PhyR2onks5sNHw8gaJpZM4OUfRo\u003e\n\u003e .\n\u003e\n"}],"action":{"name":"View Issue","url":"#138 (comment)"}}}
|
Hi Jeremy, do you still have this issue?