Have a more sensible default for TFileMerger::fMaxOpenedFiles
#11276
Comments
As a compromise, I propose to reduce the max value to 256 (already a factor of 4 slower compared to the typical 1024). Would that be okay for your situation?
256 wouldn't have helped in the mentioned GGUS ticket. There was an input set of 133 files, of which more than 24 were on a single pool with a limit of 24 concurrent transfers. Another approach could be to detect this situation in hadd and then suggest the use of
I see. So this is actually a case where the limit is not known to the OS through getrlimit. How can we detect locally the limit of concurrent transfers for a given server?
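For context, the client-side limit mentioned here is the one a process can query locally; a server-side transfer cap (as in a dCache pool) is invisible to this call. A minimal C++ sketch of such a local query (GetOpenFileSoftLimit is an illustrative helper, not ROOT API):

```cpp
#include <sys/resource.h>

// Illustration only: query the client-side soft limit on open file
// descriptors via getrlimit. This is the kind of limit TFileMerger can
// see; a storage pool's concurrent-transfer cap is enforced server-side
// and cannot be detected this way.
long GetOpenFileSoftLimit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1; // query failed
    return static_cast<long>(rl.rlim_cur);
}
```

A merger that trusts only this number can still exceed a remote server's limit, which is exactly the mismatch discussed in this thread.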
We cannot access the original ticket. What is the actual hardware that was being accessed? Was it mounted as a local-appearing file system, or was it being accessed via a remote protocol (i.e. the file names were prefixed with root://...)? As a first approximation, I don't see how we could detect your use case. If you were able to set
Hi @pcanal,

The input is a list of 133 files in the format root://x509up_u@xrootd.grid.surfsara.nl//pnfs/grid.sara.nl/data/lhcb/LHCb_USER/lhcb/user/v/username/2021_08/520789/520789382/x24mu__wmomsc_a.root. The limit is in the dCache storage system (xrootd.grid.surfsara.nl), not on the client side. This limit is there for a reason: it prevents the storage pools from being overloaded with transfers and crashing.

When hadd tries to open all files at once, it tries to read more files concurrently than the limit per dCache storage pool allows. The first files are served, but the rest of the transfers are queued. This means that they remain open but zero bytes are served until some of the other transfers finish. But hadd never finishes those, because it insists on reading from all files at the same time. So it gets stuck in a deadlock.

If hadd could detect this situation (I'm getting data for some files but zero bytes for other files), it would make sense to stop reading all files concurrently, and instead continue reading from the files that it can, close those files, and then receive data for the other files. If hadd could do that, such a deadlock would be prevented, while performance would still be the maximum available.

Cheers,
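The strategy proposed above amounts to bounding the number of concurrently open inputs so that no transfer sits queued server-side. A hypothetical C++ sketch of that batching logic (MakeBatches is illustrative and not part of TFileMerger; it assumes the per-pool transfer cap is known or guessed):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: instead of opening all inputs at once, split them into batches
// no larger than the server's concurrent-transfer limit. Each batch would
// be fully read and closed before the next one is opened, so no transfer
// is ever left queued while the merger waits on it.
std::vector<std::vector<std::string>>
MakeBatches(const std::vector<std::string>& inputs, std::size_t maxConcurrent) {
    std::vector<std::vector<std::string>> batches;
    for (std::size_t i = 0; i < inputs.size(); i += maxConcurrent) {
        const std::size_t end = std::min(inputs.size(), i + maxConcurrent);
        batches.emplace_back(inputs.begin() + i, inputs.begin() + end);
    }
    return batches;
}
```

With the numbers from this thread (133 inputs, a pool limit of 24), this would yield 6 batches and avoid the described deadlock, at the cost of extra merge passes.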
Do you know if the xrootd routines just "hang" in that case, or return with a request to retry later? If they just hang, there is not much I can see us doing to detect the case, unless there is an xrootd routine that detects/supports this case that we could replace the current call with (and we would need some help to update the xrootd plugin in ROOT to support and test this).
I'm afraid I don't know the answer to your question. But I understand the difficulty. There might be a more pragmatic approach: if reading remote files fails,
Explain what you would like to see improved
By default, hadd/TFileMerger will open the maximum number of allowed open files (see root/io/io/src/TFileMerger.cxx, lines 61 to 93 at commit 3dd47b9).
This seems too large as a default, and even led to problems with sites (https://ggus.eu/?mode=ticket_info&ticket_id=153653)
Optional: share how it could be improved
Have a more sensible default? 32? 64?