Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak Remote class and test multi-threaded file remote access #3834

Merged
merged 4 commits into from
Oct 14, 2023

Conversation

eisenhauer
Copy link
Member

No description provided.

Copy link
Contributor

@anagainaru anagainaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read about Singleton yesterday to make sure I understand what you are doing here and this code is correct, but I have a few comments that I want to check with you.

  1. The only reason you need the mutex is because you need the first variable to be true only for the thread that creates the singleton. The recommendation is to create the singleton in a lambda function that also does all the operations that need to be executed only once. The lambda is guaranteed to be called by a single thread. So, my first question is, could you do all the RegisterFormats and CMregister_handler calls in the Singleton? By looking at the code I would say you can. If yes we don't need the mutex.
  2. There is a small memory leak caused by new. Unless there is a reason to return a pointer I would return the singleton by reference.

I wrote a bunch of code to try to understand how the singleton works so if you agree with the two points I can send a draft commit with what I'm thinking and you can modify it or trash it. This being said your code does the job, I just think it might be cleaner with the changes based of what I've read.

@anagainaru
Copy link
Contributor

These are the changes I'm proposing:

Author: anagainaru <ana.gainaru@gmail.com>
Date:   Sat Oct 7 13:17:28 2023 -0400

    Clean the singleton code

diff --git a/source/adios2/toolkit/remote/Remote.cpp b/source/adios2/toolkit/remote/Remote.cpp
index 0fe319942..2adb8bb80 100644
--- a/source/adios2/toolkit/remote/Remote.cpp
+++ b/source/adios2/toolkit/remote/Remote.cpp
@@ -50,15 +50,12 @@ void ReadResponseHandler(CManager cm, CMConnection conn, void *vevent, void *cli
     return;
 };

-void Remote::InitCMData()
+CManagerSingleton& CManagerSingleton::Instance(RemoteCommon::Remote_evpath_state &ev_state)
 {
-    std::lock_guard<std::mutex> lockGuard(m_CMInitMutex);
-    bool first = true;
-    auto CM = CManagerSingleton::Instance(first);
-    ev_state.cm = CM->m_cm;
-    RegisterFormats(ev_state);
-    if (first)
-    {
+    static CManagerSingleton self = [&ev_state] {
+        CManagerSingleton instance;
+        ev_state.cm = instance.m_cm;
+        RegisterFormats(ev_state);
         CMfork_comm_thread(ev_state.cm);
         CMregister_handler(ev_state.OpenResponseFormat, (CMHandlerFunc)OpenResponseHandler,
                            &ev_state);
@@ -68,7 +65,14 @@ void Remote::InitCMData()
                            (CMHandlerFunc)OpenSimpleResponseHandler, &ev_state);
         CMregister_handler(ev_state.ReadResponseFormat, (CMHandlerFunc)ReadResponseHandler,
                            &ev_state);
-    }
+        return instance;
+    } ();
+    return self;
+}
+
+void Remote::InitCMData()
+{
+    (void) CManagerSingleton::Instance(ev_state);
 }

 void Remote::Open(const std::string hostname, const int32_t port, const std::string filename,
diff --git a/source/adios2/toolkit/remote/Remote.h b/source/adios2/toolkit/remote/Remote.h
index 0e39d8964..aaa1b89e9 100644
--- a/source/adios2/toolkit/remote/Remote.h
+++ b/source/adios2/toolkit/remote/Remote.h
@@ -63,34 +63,20 @@ private:
     bool m_Active = false;
 };

+#ifdef ADIOS2_HAVE_SST
 class CManagerSingleton
 {
 public:
-#ifdef ADIOS2_HAVE_SST
+    static CManagerSingleton& Instance(RemoteCommon::Remote_evpath_state &ev_state);
+
+private:
     CManager m_cm = NULL;
-#endif
-    static CManagerSingleton *Instance(bool &first)
-    {
-        static std::mutex init_mutex;
-        const std::lock_guard<std::mutex> lock(init_mutex);
-        static CManagerSingleton *ptr = new CManagerSingleton();
-        static bool internal_first = true;
-        first = internal_first;
-        internal_first = false;
-        return ptr;
-    }
-
-protected:
-#ifdef ADIOS2_HAVE_SST
+
     CManagerSingleton() { m_cm = CManager_create(); }

     ~CManagerSingleton() { CManager_close(m_cm); }
-#else
-    CManagerSingleton() {}
-
-    ~CManagerSingleton() {}
-#endif
 };
+#endif

 } // end namespace adios2

@eisenhauer
Copy link
Member Author

Hmm. I tend to agree with your comments. I had been following my original thinking, where my focus was on making sure we shared one CM per process, so I put it into the singleton. That was fine with the Get() level stuff, but the transports being threaded wasn't something I had thought about in advance and I ran into data races. Returning the first flag was my reaction, but probably not best. Let me re-evaluate Monday...

@anagainaru
Copy link
Contributor

anagainaru commented Oct 8, 2023

I went down a C++ rabbit hole and I have a better idea. There is std::call_once where we can do all of CManager create and hander register calls, and we call in it std::atexit where we call CManager_close. We can discuss tomorrow.

@eisenhauer
Copy link
Member Author

I went down a C++ rabbit hole and I have a better idea. There is std::call_once where we can do all of CManager create and hander register calls, and we call in it std::atexit where we call CManager_close. We can discuss tomorrow.

Well, I've rewritten and done this without std::atexit. Normal C++ shutdown calls the destructor for the singleton, so there's no need for atexit. BUT, two builds, ascent-nvhpc and el8-icc-ompi, are failing. I could ignore the former as an anomoly, but I think I have to dig into the latter. Both may be failing because the server is dying, but since that's run as a background ctest fixture, there's no indication if it dies.

Anyhow, you might look at the code as it stands and see if you like it better. I may not get to the failures today.

Copy link
Contributor

@anagainaru anagainaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is why some builds fail but see below for a possible explanation.

ev_state = instance.internalEvState;
return instance;
}();
ev_state = self.internalEvState;
Copy link
Contributor

@anagainaru anagainaru Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a thread might reach this line before the thread calling the lambda finished in which case there will be a thread with an invalid internal ev state. If this is the thread that ends up calling std::call_once we will have a problem. Is there a reason why you want the CMregister_handler calls outside the singleton?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state should be complete once we have registered the formats. We don't need the handlers to have been registered until we get messages arriving. (I had them outside of the Singleton because they weren't visible inside it. I could have added the externs before it, but I don't think it should be necessary.). My understanding (which may be flawed), is that the singleton creation is thread safe, so anyone that gets it will get it after its constructor has been completed. So, the formats should have been registered and the internalEvState fully populated. So, the ev_state in every Remote instance should be appropriately populated. One of them will register the handlers, and from what I read of the call_once docs, there will be no race conditions. I.E. if a thread invokes the call_once, it will only proceed past that once someone has completed the call, even if it's some other thread doing it. I don't think there are holes here, but I can't explain the failures. Will be trying to get it into a docker image to see if I can get more info.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it wasn't the internal state thing. Instead icc seems not agree that this pattern produces a singleton. Specifically, the constructor gets called (creating the CManager), then the destructor gets called (closing the CManager), then we start registering handlers but get a segfault because the CManager has been closed and deallocated. I'm guessing that the other failure, from nvhpc is from the same problem. The gcc-based compilers seem happy, but not the rest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, the internal state is guaranteed to be created when threads reach that line. Let me play with godbolt a bit to look at the assembly created by icc. I am very surprised only gcc likes this.

@eisenhauer eisenhauer enabled auto-merge (squash) October 14, 2023 01:26
Copy link
Contributor

@anagainaru anagainaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why the lambda doesn't behave like a mutex for icc/nvhpc but I'm giving up. Let's merge it like this and if I feel like spending some time on understanding this we can revisit in the future.

@eisenhauer eisenhauer merged commit 107164d into ornladios:master Oct 14, 2023
33 checks passed
@eisenhauer eisenhauer deleted the TestFileRemote branch October 14, 2023 11:18
dmitry-ganyushin added a commit to dmitry-ganyushin/ADIOS2 that referenced this pull request Nov 1, 2023
* master: (183 commits)
  Adding tests for writing null blocks with and without compression
  Replace LookupWriterRec's linear search on RecList with an unordered_map. For 250k variables, time goes from 21sec to ~1sec in WSL. The order of entries in RecList was not necessary for the serializer to work correctly.
  Merge pull request ornladios#3805 from pnorbert/fix-bpls-string-scalar
  Merge pull request ornladios#3804 from pnorbert/fix-aws-version
  Merge pull request ornladios#3759 from pnorbert/bp5dbg-metadata
  new attempt to commit query support of local array. (ornladios#3868)
  MPI::MPI_Fortran should be INTERFACE not PUBLIC
  Fix hip example compilation error (ornladios#3865)
  Server Improvements (ornladios#3862)
  ascent,ci: remove unshallow flag
  Remove Slack as a contact mechanism (ornladios#3866)
  bug fix:  syntax error in json  output (ornladios#3857)
  Update the bpWriterReadHip example's cmake to run on crusher
  Examples: Use BPFile instead of BP3/4/5 for future-proof
  inlineMWE example: Close files at the end
  Examples: Add BeginStep/EndStep wherever it was missing
  BP5Serializer: handle local variables that use operators (ornladios#3859)
  Blosc2 USE ON: Fix Module Fallback (ornladios#3774)
  SST,MPI,DP: soft handle peer error
  SST,MPI,DP: improve uniq identifier
  Fix destdir install test (ornladios#3850)
  cmake: include ctest before detectoptions
  ci: enable tau check
  Add/Improve the ReadMe.md files in examples directory
  Disable BUILD_TESTING and ADIOS2_BUILD_EXAMPLES by default
  Remove testing based on ADIOS2-examples
  Fix formatting issue in DetectOptions.cmake
  Add examples from ADIOS2-Examples
  Improve existing examples
  MPI_DP: do not call MPI_Init (ornladios#3847)
  cmake: update minimum cmake to 3.12 (ornladios#3849)
  MPI: add timeout for conf test for MPI_DP (ornladios#3848)
  Tweak Remote class and test multi-threaded file remote access (ornladios#3834)
  Add prototype testing of remote functionality (ornladios#3830)
  Try always using the MPI version
  Try always using the MPI version
  Import tests from bp to staging common, implement memory selection in SST
  ci: fix codeql ignore path (ornladios#3772)
  install: export adios2 device variables (ornladios#3819)
  added support to query BP5 files (ornladios#3809)
  ffs 2023-09-19 (67e411c0)
  Fix abs/rel step in BP5 DoCount
  fix dummy Win build
  Pass Array Order of reader to remote server for proper Get() operation
  Fix the ADIOS_USE_{_} variable names to use ADIOS2
  Remove BP5 BetweenStepPairs variable that hides Engine.h counterpart
  Delete experimental examples
  yaml-cpp: support 0.8.0 version
  Mod not to overload to make some compilers happy
  Mod to keep MinBlocksInfo private to BP5Writer, not reuse Engine.h API.
  gha,ci: update checkout to v4
  Limit testing changes to BP5
  tests: fix windows 64bit type issues
  Add location of hdf5.dll to PATH for windows tests
  Fix more compiler warnings from ompi build
  clang format
  Fix size_t -> int conversion warnings
  Allow building with Visual Studio and HDF5 shared libs
  Fixup local var reading by block with test, master branch
  use documented initial cache variable to find hdf5 on windows
  ci: Add HDF5 to a windows build
  Remove unused SelectionType values
  Update ADIOS2 HDF5 VOL with basic set of capability flags
  defines
  reorder
  Remove file before writing, catch exceptions
  format
  Reader-side Profiling
  removed  a comment
  ci: disable MGARD static build
  cmake: fix ffs dependency
  removed comments and cleaned up more code
  bp5: remove ADIOS2_USE_BP5 option
  operators: fix module library
  ci: Create static minimal build
  cmake: correct prefer_shared_blosc behavior
  removing warning from auto
  clang-format
  match type of timestep for h5 engine to size_t (same as adios VariableBase class) when storing to hdf5, check the limit of size_t and match to the right type (uint, ulong, default ulonglong)
  BP5Deserialize: modify changes to keep abi compt
  Rename test. fix windows compile errors.
  add codeql workflow
  Fix ChunkV maintaining CurOffset when downsizing current chunk in Allocate() and creating the next chunk.
  ci: fix docker images names
  removed kwsys gitattributes
  KWSys 2023-05-25 (c9f0da47)
  fixup! ci: add mgard dependency to spack builds
  ci: remove unused script part
  cmake: remove redundant pugixml dependency
  ci: add mgard dependency to spack builds
  cmake: Remove enet config warning
  cmake: resolve cmake python deprecation
  enet 2023-08-15 (7587eb15)
  EVPath 2023-08-15 (657c7fa4)
  dill 2023-08-15 (235dadf0)
  atl 2023-08-15 (7286dd11)
  remove data.py since BP5 data file has no format
  format
  bp5dbg parse records and check sizes during it for mmd.0 and md.0. No decoding FFS records.
  cmake: correct info.h installation path
  ...
vicentebolea added a commit to vicentebolea/ADIOS2 that referenced this pull request Nov 14, 2023
pnorbert added a commit to pnorbert/ADIOS2 that referenced this pull request Nov 20, 2023
* master: (126 commits)
  ReadMe.md: Mention 2.9.2 release
  Cleanup server output a bit (ornladios#3914)
  ci: set openmpi and openmp params
  Example using Kokkos buffers with SST
  Changes to MallocV to take into consideration the memory space of a variable
  Change install directory of Gray scott files again
  ci,crusher: increase supported num branches
  ci: add shellcheck coverage to source and testing
  Change install directory of Gray scott files
  Only rank 0 should print the initialization message in perfstub
  Defining and computing derived variables (ornladios#3816)
  Add Remote "-status" command to see if a server is running and where (ornladios#3911)
  examples,hip: use find_package(hip) once in proj
  Add Steps Tutorial
  Add Operators Tutorial
  Add Attributes Tutorial
  Add Variables Tutorial
  Add Hello World Tutorial
  Add Tutorials' Download and Build section
  Add Tutorials' Overview section
  Improve bpStepsWriteRead* examples
  Rename bpSZ to bpOperatorSZWriter
  Convert bpAttributeWriter to bpAttributeWriteRead
  Improve bpWriter/bpReader examples
  Close file after reading for hello-world.py
  Fix names of functions in engine
  Fix formatting warnings
  Add dataspaces.rst in the list of engines
  Add query.rst
  cmake: find threads package first
  docs: update new_release.md
  Bump version to v2.9.2
  ci: update number of task for mpich build
  clang-format: Correct format to old style
  Merge pull request ornladios#3878 from anagainaru/test-null-blocks
  Merge pull request ornladios#3588 from vicentebolea/fix-mpi-dp
  Adding tests for writing null blocks with and without compression
  bp5: make RecMap an static anon namespaced var
  Replace LookupWriterRec's linear search on RecList with an unordered_map. For 250k variables, time goes from 21sec to ~1sec in WSL. The order of entries in RecList was not necessary for the serializer to work correctly.
  Replace LookupWriterRec's linear search on RecList with an unordered_map. For 250k variables, time goes from 21sec to ~1sec in WSL. The order of entries in RecList was not necessary for the serializer to work correctly. (ornladios#3877)
  Fix data length calculation for hash (ornladios#3875)
  Merge pull request ornladios#3823 from eisenhauer/SstMemSel
  Merge pull request ornladios#3805 from pnorbert/fix-bpls-string-scalar
  Merge pull request ornladios#3804 from pnorbert/fix-aws-version
  Merge pull request ornladios#3759 from pnorbert/bp5dbg-metadata
  new attempt to commit query support of local array. (ornladios#3868)
  MPI::MPI_Fortran should be INTERFACE not PUBLIC
  Fix hip example compilation error (ornladios#3865)
  Server Improvements (ornladios#3862)
  ascent,ci: remove unshallow flag
  Remove Slack as a contact mechanism (ornladios#3866)
  bug fix:  syntax error in json  output (ornladios#3857)
  Update the bpWriterReadHip example's cmake to run on crusher
  Examples: Use BPFile instead of BP3/4/5 for future-proof
  inlineMWE example: Close files at the end
  Examples: Add BeginStep/EndStep wherever it was missing
  BP5Serializer: handle local variables that use operators (ornladios#3859)
  gha,ci: update checkout to v4
  Blosc2 USE ON: Fix Module Fallback
  cmake: correct prefer_shared_blosc behavior
  cmake: correct info.h installation path
  ci: disable MGARD static build
  operators: fix module library
  ci: add downloads readthedocs
  cmake: Add Blosc2 2.10.1 compatibility.
  Blosc2 USE ON: Fix Module Fallback (ornladios#3774)
  Fix destdir install test (ornladios#3850)
  cmake: update minimum cmake to 3.12 (ornladios#3849)
  MPI: add timeout for conf test for MPI_DP (ornladios#3848)
  MPI_DP: do not call MPI_Init (ornladios#3847)
  install: export adios2 device variables (ornladios#3819)
  Merge pull request ornladios#3799 from vicentebolea/support-new-yaml-cpp
  Merge pull request ornladios#3737 from vicentebolea/fix-evpath-plugins-path
  SST,MPI,DP: soft handle peer error
  SST,MPI,DP: improve uniq identifier
  Fix destdir install test (ornladios#3850)
  cmake: include ctest before detectoptions
  ci: enable tau check
  Add/Improve the ReadMe.md files in examples directory
  Disable BUILD_TESTING and ADIOS2_BUILD_EXAMPLES by default
  Remove testing based on ADIOS2-examples
  Fix formatting issue in DetectOptions.cmake
  Add examples from ADIOS2-Examples
  Improve existing examples
  MPI_DP: do not call MPI_Init (ornladios#3847)
  cmake: update minimum cmake to 3.12 (ornladios#3849)
  MPI: add timeout for conf test for MPI_DP (ornladios#3848)
  Tweak Remote class and test multi-threaded file remote access (ornladios#3834)
  Add prototype testing of remote functionality (ornladios#3830)
  Try always using the MPI version
  Try always using the MPI version
  Import tests from bp to staging common, implement memory selection in SST
  ci: fix codeql ignore path (ornladios#3772)
  install: export adios2 device variables (ornladios#3819)
  added support to query BP5 files (ornladios#3809)
  Partial FFS Upstream, only changes to type_id
  ffs 2023-09-19 (67e411c0)
  Fix abs/rel step in BP5 DoCount
  fix dummy Win build
  Pass Array Order of reader to remote server for proper Get() operation
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants