Compatibility issue: File written with ROOT 6.32/02 cannot be read back with ROOT 6.10/06 #15964

wlampl opened this issue Jul 2, 2024 · 14 comments

wlampl commented Jul 2, 2024

Check duplicate issues.

  • Checked for duplicates

Description

While trying to update to LCG_106_ATLAS_3 (ROOT 6.32/02), we encountered a test failure. An intermediate file produced with this release could not be read back with an older release (6.10/06, 6.08.06): we encounter a segfault when the file is closed.

Background: ATLAS Trigger simulation of run 2 uses the release that was used for data-taking during run 2.

Reproducer

I copied the intermediate file + reproducer script to /afs/cern.ch/work/w/wlampl/public/ATEAM-1001
The script is quite simple:

from ROOT import TFile
f=TFile.Open("tmp.RDO")
f.ls()
t=f.Get("CollectionTree")
n=t.GetEntries()
for i in range(n):
    s=t.GetEntry(i)
    print(s)
f.Close()

For ROOT versions back to about 6.16.00 it works as expected. Running with 6.08.06 and 6.10.06 (in a centos7 container), I encounter a segfault at the end. A log can be found in /afs/cern.ch/work/w/wlampl/public/ATEAM-1001/log.22.0.0
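Because the crash happens when the file is closed rather than while entries are read, a check that only looks at the printed output can miss it. The following is a minimal sketch, not part of the original report, of running a read-back command in a child process and reporting how it ended. The helper name `run_and_report` and the script name `read_back.py` are hypothetical.

```python
import signal
import subprocess


def run_and_report(cmd):
    """Run cmd in a child process and return (ok, description).

    On POSIX, subprocess reports a child killed by signal N as returncode -N.
    A crash that the child's own signal handler converts into a normal exit
    shows up as a positive, non-zero exit code instead; both count as failure.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    rc = proc.returncode
    if rc == 0:
        return True, "exited cleanly"
    if rc < 0:
        return False, "killed by " + signal.Signals(-rc).name
    return False, "exit code %d" % rc


# Hypothetical usage, assuming the reproducer above is saved as read_back.py
# and "python" resolves to the interpreter of the old release:
#   ok, how = run_and_report(["python", "read_back.py"])
#   print(ok, how)
```

Treating any non-zero status as a failure keeps the check independent of whether the crash surfaces as a signal or as an error exit code.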

ROOT version

Writing: 6.32/02
Reading: 6.10/06 or 6.08.06

Installation method

SFT/LCG

Operating system

CentOS7

Additional context

No response

@wlampl wlampl added the bug label Jul 2, 2024
@elmsheus elmsheus added the experiment Affects an experiment / reported by its software & computing experts label Jul 2, 2024
@Nowakus
Contributor

Nowakus commented Jul 2, 2024

Let me add a reproducer where you only need to open the file and try to exit:

% setupATLAS -c centos7 --pwd /afs/cern.ch/work/w/wlampl/public/ATEAM-1001
% asetup Athena,21.0,latest
% root -b tmp.RDO

| Welcome to ROOT 6.08/06 http://root.cern.ch |
Attaching file tmp.RDO as _file0...
Warning in TClass::Init: no dictionary for class ROOT::TIOFeatures is available
(TFile *) 0x29cf190
root [1] .q

*** Break *** segmentation violation
This is the entire stack trace of all threads:

#0 0x00007f6cdd6c560c in waitpid () from /lib64/libc.so.6
#1 0x00007f6cdd642f62 in do_system () from /lib64/libc.so.6
#2 0x00007f6cdecce102 in TUnixSystem::StackTrace() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#3 0x00007f6cdecd061c in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#4 <signal handler called>
#5 0x0000000001209080 in ?? ()
#6 0x00007f6cdec52005 in TList::FindObject(TObject const*) const () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#7 0x00007f6cdec5237c in TList::Clear(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#8 0x00007f6cdec50a01 in THashTable::Clear(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#9 0x00007f6cdec504dd in THashList::Clear(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#10 0x00007f6cdec9d1a7 in TListOfDataMembers::Unload() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#11 0x00007f6cdec7f2d0 in TClass::SetUnloaded() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#12 0x00007f6cdec4a574 in ROOT::RemoveClass(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#13 0x00007f6cdec9926e in ROOT::TGenericClassInfo::~TGenericClassInfo() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so
#14 0x00007f6cdd639ce9 in __run_exit_handlers () from /lib64/libc.so.6

@jcatmore

jcatmore commented Jul 2, 2024

Hi @martamaja10 ,

thanks for looking at this. We see you've assigned @dpiparo, but we understand that he's away for a couple of weeks, and ideally we'd like this to be addressed sooner if possible. Is there someone else in the team who could look at this before then?

The problem is, this issue prevents us from using LCG106 and so it holds up several developments.

Thanks!

James

@martamaja10
Contributor

Hi @jcatmore,

sure, I'll find another person in the team to take a look at this ASAP.

Cheers,
Marta

@pcanal
Member

pcanal commented Jul 2, 2024

Most likely, backporting commit 08b34d7 will fix the problem.

@pcanal
Member

pcanal commented Jul 2, 2024

See #15968 and #15969

@jblomer
Contributor

jblomer commented Jul 2, 2024

This issue is most likely due to a change that inadvertently broke forward compatibility: #14793

You should have seen this already with 6.30 though. Is there an explanation why 6.30 did not trigger the error?

There are two ways to proceed (if the issue is what we think it is):

  • Backport the fix to 6.10 and 6.08 (as Philippe suggested/submitted)
  • Set the compatibility flag file->SetBit(TFile::k630forwardCompatibility) (see #15006) when you produce the file with 6.32.

The second option would be useful to run at least once to confirm that we identified the right cause.
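For the second option, here is a minimal PyROOT sketch of setting the flag on the writing side. It assumes a ROOT release that provides `TFile::k630forwardCompatibility`, and it assumes the bit should be set right after the file is opened, before anything is written to it. The helper name `open_for_legacy_readers` is hypothetical and not part of ROOT.

```python
def open_for_legacy_readers(path):
    """Create a ROOT file with the 6.30 forward-compatibility bit set.

    Sketch only: assumes a ROOT release that defines
    TFile::k630forwardCompatibility (the flag named in this thread, see #15006).
    """
    import ROOT  # imported lazily so the module loads without ROOT installed

    f = ROOT.TFile.Open(path, "RECREATE")
    if not f or f.IsZombie():
        raise IOError("could not create " + path)
    # Assumption: set the bit before any trees or objects are written.
    f.SetBit(ROOT.TFile.k630forwardCompatibility)
    return f


# Hypothetical usage on the writing side:
#   f = open_for_legacy_readers("tmp.RDO")
#   ... create and fill the trees ...
#   f.Write()
#   f.Close()
```

Reading the resulting file back with one of the affected old releases would then confirm or rule out the suspected cause.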

@Nowakus
Contributor

Nowakus commented Jul 2, 2024

Is there any drawback in doing SetBit(TFile::k630forwardCompatibility) for every file we produce now?

@pcanal
Member

pcanal commented Jul 2, 2024

Is there any drawback in doing SetBit(TFile::k630forwardCompatibility) for every file we produce now?

The main drawback is forgetting to eventually remove it :). The technical drawback is slightly worse and unstable compression (see, for example, #12438).

@jcatmore

jcatmore commented Jul 2, 2024

You should have seen this already with 6.30 though. Is there an explanation why 6.30 did not trigger the error?

Just to comment on 6.30: we only ran a compilation test with this release, so the issue is most likely present there as well, as you expect.

@dpiparo
Member

dpiparo commented Jul 13, 2024

Hi, I just wanted to check whether the issue was investigated further on the ATLAS side.

@jchapman-hep

We have added a call to SetBit(TFile::k630forwardCompatibility) when writing files that will need to be read by old release branches as part of our standard workflows for earlier LHC runs. This allowed the jobs using older releases to run successfully. This is necessary as the ability to simulate our Trigger is tied to the releases that were being used for data-taking at that time.
Of course, we would rather not have to do this.

@dpiparo
Member

dpiparo commented Jul 26, 2024

I am sorry ROOT did not work out of the box in this case. We are really working hard to provide not only backward but also forward compatibility. In this particular situation, it was not possible.

@dpiparo dpiparo closed this as not planned Jul 26, 2024
@jchapman-hep

Hi @dpiparo,

We understand why a fix on your side was not possible in this case, but can you confirm that the workaround of reading files in older releases (6.10/06, 6.08.06) will be part of your tests going forward please?
ATLAS will need this feature to be supported for new ROOT versions until such time as we decide to change our support policy for legacy data. (This currently requires Trigger Simulation to be run in the data-taking release from the year in question.)

@pcanal pcanal reopened this Aug 13, 2024
@pcanal pcanal assigned pcanal and unassigned jblomer and dpiparo Aug 13, 2024
@pcanal
Member

pcanal commented Aug 13, 2024

On a side note, we back-ported the ability to read the files without the forward compatibility bit to the patch branch for v6.10 and v6.08.
