Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in CreateGeometry.h in DEV #455

Closed
genevb opened this issue Dec 5, 2022 · 49 comments · Fixed by #462
Closed

Error in CreateGeometry.h in DEV #455

genevb opened this issue Dec 5, 2022 · 49 comments · Fixed by #462
Assignees
Labels
bug Something isn't working

Comments

@genevb
Copy link
Contributor

genevb commented Dec 5, 2022

The following reconstruction chain in DEV (ROOT 5, not ROOT 6):

root4star -b -q -l bfc.C\(10,\"P2021a,StiCA,EbyET0,BEmcChkStat,-hitfilt,SCScalerCal,-mtd,-btof,-etofA,-picoWrite\",\"/star/data03/daq/2021/180/22180043/st_physics_22180043_raw_2000004.daq\"\)

....results in the following error after approximately 1-2 minutes of execution, and after approximately 2120 lines of output.

LoadTable: .L /afs/rhic.bnl.gov/star/packages/DEV/StarDb/AgMLGeometry/Geometry.y2021a.C
Error: cannot open file "StBFChain/StBFChain.h" /afs/rhic.bnl.gov/star/packages/DEV/StarDb/AgMLGeometry/CreateGeometry.h:2:
*** Interpreter error recovered ***
root4star: .sl73_gcc485/OBJ/StRoot/St_db_Maker/St_db_Maker.cxx:932: virtual TDataSet* St_db_Maker::LoadTable(TDataSet*): Assertion `!ee' failed.
Abort

@genevb genevb added the bug Something isn't working label Dec 5, 2022
@genevb
Copy link
Contributor Author

genevb commented Dec 5, 2022

I should add that this error does not occur in SL22c, so it is likely to do with something merged in the last couple months.

@klendathu2k
Copy link
Contributor

klendathu2k commented Dec 5, 2022 via email

@klendathu2k
Copy link
Contributor

Culprit is PR #448. (Confirmed by wrapping the #include statement with #ifdef CLING). Suspect that ROOT5 / CINT does not have the include search path for StarDb. Easy technical fix.. once I find the right place to make it.

@klendathu2k
Copy link
Contributor

Okay... I shouldn't debug without a cup of coffee handy. StarDb should not be the issue. Include path needs to see StRoot/StBFChain, and as far as I can tell... it should be setup correctly. Investigating...

@plexoos
Copy link
Member

plexoos commented Dec 5, 2022

I can't see this test job in the usual location in /star/u/starreco/devtest/. What am I missing?

Run 22180043 does not show up anywhere on this page https://www.star.bnl.gov/devcgi/weekDEVjobStatus.pl either
But there is a bunch of jobs with Assertion failed status. Is it related? When did this happen?

@genevb
Copy link
Contributor Author

genevb commented Dec 5, 2022

I can't see this test job in the usual location in /star/u/starreco/devtest/. What am I missing?

Run 22180043 does not show up anywhere on this page https://www.star.bnl.gov/devcgi/weekDEVjobStatus.pl either But there is a bunch of jobs with Assertion failed status. Is it related? When did this happen?

This is from calibration work, not a nightly test. A user reported this to me last week, I gave the user suggestions for things to try first, and with the user's continued failures as of this morning, I tried it myself and then reported this issue on their behalf given that I could reproduce the error.

@genevb
Copy link
Contributor Author

genevb commented Dec 5, 2022

Hm....that said, there were also similar failures in the nightly test jobs:

> tail /star/rcf/test/dev/daq_sl302.stica/Fri/year_2021/production_OO_200GeV_2021/st_physics_22134030_raw_4500015.log
StChain:ERROR - St_db_Maker::Time : Unrecognised File name Geometry.y2005c.C IGNORED
StChain:ERROR - St_db_Maker::Time : Unrecognised File name Geometry.y2005d.C IGNORED
StChain:ERROR - St_db_Maker::Time : Unrecognised File name Geometry.y2005e.C IGNORED
StChain:ERROR - St_db_Maker::Time : Unrecognised File name Geometry.y2006a.C IGNORED
StChain:ERROR - St_db_Maker::Time : Unrecognised File name Geometry.y2006b.C IGNORED
StChain:ERROR - St_db_Maker::Time : Unrecognised File name Geometry.y2007a.C IGNORED
Error: cannot open file "StBFChain/StBFChain.h" /afs/rhic.bnl.gov/star/packages/DEV/StarDb/AgMLGeometry/CreateGeometry.h:2:
*** Interpreter error recovered ***
root4star: .sl73_gcc485/obj/StRoot/St_db_Maker/St_db_Maker.cxx:932: virtual TDataSet* St_db_Maker::LoadTable(TDataSet*): Assertion `!ee' failed.
LoadTable: .L /afs/rhic.bnl.gov/star/packages/DEV/StarDb/AgMLGeometry/Geometry.y2021a.C

In fact, the vast majority of nightly test jobs for data from years 2012-2022 have been failing, but I've not been getting reports of the failures from Amol's watcher scripts :-( That's something for me to look into.

@klendathu2k
Copy link
Contributor

Simple (simplest?) macro which demonstrates the problem:

void issue445(const char* simple= "ry2021a agml sdt20210125 nodefault") {                                                                                                                                  
  gROOT->LoadMacro("bfc.C");                                                                                                                                                                               
  bfc(0,simple);                                                                                                                                                                                           
  chain->GetDataBase("VmcGeometry");                                                                                                                                                                       
};      

@genevb
Copy link
Contributor Author

genevb commented Dec 5, 2022

It seems all nightly test jobs for data from 2013 and later are failing. All test jobs for 2011 and earlier are succeeding. The 2012 jobs are a mix of failures for embedding jobs, and successes for real data.

@klendathu2k
Copy link
Contributor

Somewhat expected, Gene. The issue is with loading the StBFChain.h header file in the CreateGeometry.h header... loaded whenever the AgML geometry is invoked. We switched to AgML around the 2012 / 2013 timeframe.

@genevb
Copy link
Contributor Author

genevb commented Dec 5, 2022

I can also confirm that the nightly test job failures began on November 24th, following the merging of #448 .

@klendathu2k
Copy link
Contributor

There is a problem loading the StBFChain/StBFChain.h header file which does not really make sense. The geometry is loaded through St_db_Maker, which ultimately calls

gInterpreter->ProcessLine(".L StarDb/AgMLGeometry/CreateGeometry.h")

In DEV, the rootlogon macro is located under $ROOTSYS/etc/rootlogon.C. It does not contain the recent change that Dmitri made to the StRoot/macros/rootlogon.C to add include paths to the gInterpreter, rather than the gSystem, object. But ... that does not appear to resolve this issue.

void issue445a(const char* simple= "ry2021a agml sdt20210125 nodefault") {                                                                                                                                 
                                                                                                                                                                                                           
  gROOT->LoadMacro("bfc.C");                                                                                                                                                                               
  bfc(0,simple);                                                                                                                                                                                                                                                                                                                                                                                                     
  std::cout << gSystem->GetIncludePath() << std::endl;                                                                                                                                                     
  gInterpreter->AddIncludePath( gSystem->GetIncludePath() );                                                                                                                                               
  std::cout << gInterpreter->GetIncludePath() << std::endl;                                                                                                                                                                                                                                                                                                                                            
  gInterpreter->ProcessLine(".L StarDb/AgMLGeometry/CreateGeometry.h"); // FAIL                                                                                                                            
                                                                                                                                                                                                           
};              

@genevb
Copy link
Contributor Author

genevb commented Dec 5, 2022

Just another observation to add:
In case their was some sort of corruption in the DEV library on AFS, I tried obtaining the code from github and compiling the library from scratch in an empty directory... I still get the same error.

@plexoos
Copy link
Member

plexoos commented Dec 5, 2022

Have you tried to update your rootlogon.C as in https://github.com/star-bnl/star-sw/pull/441/files

@klendathu2k
Copy link
Contributor

Yes. Same result.

@plexoos
Copy link
Member

plexoos commented Dec 5, 2022

How do you point ROOT to your custom rootlogon.C? Care to provide instructions?

@klendathu2k
Copy link
Contributor

There is nothing custom. The standard STAR development environment is being used.

In DEV, the rootlogon macro is located under $ROOTSYS/etc/rootlogon.C.

$ ls $ROOTSYS/etc/rootlogon.C
/afs/rhic.bnl.gov/star/ROOT/5.34.38/.sl73_gcc485/rootdeb/etc/rootlogon.C

$ROOTSYS/etc/rootlogon.C looks for a rootlogon.C in the current working directory and then executes it.

@plexoos
Copy link
Member

plexoos commented Dec 5, 2022

There is nothing standard about $ROOTSYS/etc/rootlogon.C... It is neither a part of ROOT distribution nor it is under our version control.

$ROOTSYS/etc/rootlogon.C looks for a rootlogon.C in the current working directory and then executes it.

And? Is that what you tried? Did you copy $ROOTSYS/etc/rootlogon.C to the CWD and modified according to #441?

@klendathu2k
Copy link
Contributor

$ROOTSYS/etc/rootlogon.C executes all of its commands, then looks for a local rootlogon to execute. In my tests I did the following...

$ cat rootlogon.C
{
gInterpreter->AddIncludePath(" -I.");
gInterpreter->AddIncludePath(" -I./.$STAR_HOST_SYS/include -I./StRoot -I$STAR/.$STAR_HOST_SYS/include -I$STAR/StRoot -I/usr/include/mysql");
gInterpreter->AddIncludePath(" -I$ROOTSYS/include");
}

@plexoos
Copy link
Member

plexoos commented Dec 5, 2022

There is a bug in your rootlogon.C line 3: gInterpreter->AddIncludePath() accepts only one path at a time. And again I refer to #441 to see how it is done

There are actually more bugs... You don't need to prefix the path with "-I"

@klendathu2k
Copy link
Contributor

Okay. That works. Thanks! I guess that CINT never really needed the include paths to be setup properly, so we never discovered the issue.

The $ROOTSYS/etc/rootlogon.C needs to be updated along the lines of #441.

Someone with write access to the logon script (probably @genevb) needs to do the update.

@plexoos
Copy link
Member

plexoos commented Dec 5, 2022

Great! Thanks for the confirmation Jason.

Someone with write access to the logon script (probably @genevb) needs to do the update.

Gene, can you please do that?

@genevb
Copy link
Contributor Author

genevb commented Dec 5, 2022

The ROOT installation and any files with it have been maintained by Jerome in the past, but it looks like writing privileges were granted to the current S&C Infrastructure Team for further maintenance:

> cd /afs/rhic.bnl.gov/star/ROOT/36/5.34.38/root/etc/
/afs/rhic.bnl.gov/star/ROOT/36/5.34.38/root/etc
> fs la
Access list for . is
Normal rights:
system:administrators rlidwka
system:anyuser rlk
jeromel rlidwka
smirnovd rlidwk
gnigmat rlidwk
starlib rlidwka

So the changes for ROOT6 require a change to our ROOT5-only files? Yuck. But I think you guys know what you're doing better than I do, and this part has been identified as an Infrastructure Team area anyhow, so better for me not to put my fingers there.

-Gene

@veprbl
Copy link
Member

veprbl commented Dec 5, 2022

$ROOTSYS/etc/rootlogon.C executes all of its commands, then looks for a local rootlogon to execute. In my tests I did the following...

AFAIK the $ROOTSYS/etc/rootlogon.C is enabled using STAR-specific patches, that is not a standard ROOT behavior.

@klendathu2k
Copy link
Contributor

$ROOTSYS/etc/rootlogon.C executes all of its commands, then looks for a local rootlogon to execute. In my tests I did the following...

AFAIK the $ROOTSYS/etc/rootlogon.C is enabled using STAR-specific patches, that is not a standard ROOT behavior.

That is correct. The rootlogon at that location is customized, As is the system.rootrc which points to it.

When I used the word "standard" above, I was referring to STAR practice. We have maintained a rootlogon.C in $ROOTSYS/etc for many years (10+) in order to provide a common setup of root for all STAR users. I make no claim that keeping it in this location is best practice, just stating how we have historically managed a common development and production environment.

I don't understand everything that the rootlogon provides for us. Some things are cosmetic (setting defaults for canvases, histograms, etc...) Some of it is deprecated (QT initialization). And some are essential, such as preloading certain libraries when running root4star.

@plexoos
Copy link
Member

plexoos commented Dec 6, 2022

better for me not to put my fingers there

There is another solution without modifying the rootlogon file in /afs/rhic.bnl.gov/star/ROOT/36/5.34.38/root/etc. You can copy the StRoot/macros/.rootrc file to $HOME/.rootrc Will this work for you?

klendathu2k added a commit to klendathu2k/star-sw-1 that referenced this issue Dec 8, 2022
@klendathu2k
Copy link
Contributor

This would, of course, work for me. But it is a change in our practice of providing a common rootlogon.C which is transparently applied for all STAR users. I'll have a PR tomorrow that I think is a reasonable compromise.

@klendathu2k
Copy link
Contributor

Just realized there might be an issue with my solution, and confirmed with Jerome. We modify in place the $ROOTSYS/etc/rootlogon.C, and do not keep it on github so that it is not publicly accessible.

Can we just fix this in place so that DEV can be repaired / released?

@plexoos
Copy link
Member

plexoos commented Dec 9, 2022

As I commented in #458 I am fine with modifying $ROOTSYS/etc/rootlogon.C as I don't see a good quick fix for this

Although I have write access to some /afs volumes, it appears only Jerome can update the file due to the file permissions:

$ pwd
/afs/rhic.bnl.gov/star/ROOT/36/5.34.38/root/etc
$ ls -l rootlogon.C
-rwxr-xr-x 1 jeromel rhstar 6773 Feb 20  2020 rootlogon.C

@plexoos
Copy link
Member

plexoos commented Dec 9, 2022

there might be an issue with my solution

BTW, what is your solution? Actually, I also could think of something else... We can avoid all this trouble with ROOT "interpreter" if we simply compile the geometry macros in a library :)

@klendathu2k
Copy link
Contributor

klendathu2k commented Dec 9, 2022 via email

klendathu2k added a commit to klendathu2k/star-sw-1 that referenced this issue Dec 9, 2022
The StBFChain dependence supported two features in this macro.
1) It allowed us to name the geometry cache file based on the
   output filename for the chain.
2) It allowed us to detect when Sti (CA or V) was present in
   the chain options.

Item 1 is nice to have but not required.

Given the "assert(0)" in the block of code which detects Sti...
we don't actually use item 2.  (I suspect it was deprecated
at some point).

Therefore we remove the dependency on StBFChain.
plexoos pushed a commit that referenced this issue Dec 10, 2022
…462)

This PR Should resolve #455.

The StBFChain dependence supported two features in this macro. 

1) It allowed us to name the geometry cache file based on the output
filename for the chain.
2) It allowed us to detect when Sti (CA or V) was present in the chain
options.

Item 1 is nice to have but not required.

Given the "assert(0)" in the block of code which detects Sti... we don't
actually use item 2. (I suspect it was deprecated at some point).

Therefore we remove the dependency on StBFChain.
@plexoos
Copy link
Member

plexoos commented Dec 12, 2022

Looks like the fix in #462 helped to address the problem with some test jobs on the farm but there are still jobs with segfaults: https://www.star.bnl.gov/devcgi/weekDEVjobStatus.pl

@genevb
Copy link
Contributor Author

genevb commented Dec 13, 2022 via email

@genevb
Copy link
Contributor Author

genevb commented Dec 13, 2022 via email

@plexoos
Copy link
Member

plexoos commented Dec 13, 2022

Gene, can you change your scripts to build the libraries in /afs/rhic.bnl.gov/star/packages/DEV from a branch other than main? I suggest we create a stable branch (or dev branch if you want) and set it behind main to a commit when all nightly tests worked fine.

#444 was merged 25 days ago and I would be really surprised if the problem with nightly tests has not been notices since then. Another candidate is #448 merged 20 days ago.

Introducing a stable branch will also help with the fast offline processing during the run so, we don't worry about new changes merged into main.

@genevb
Copy link
Contributor Author

genevb commented Dec 13, 2022 via email

@plexoos
Copy link
Member

plexoos commented Dec 14, 2022

Very easy to do. I have the branch name as a variable in the script.

Perfect. So, just pick a name for the branch and let me know if you need help with the setup. If you suspect that the problem appeared after merging PR444 then we can start the new branch at 25274b6 which precedes it.

I also confirm that I was able to run a couple of jobs currently reporting failures manually on the farm without any issue.

The one you mentioned earlier as problematic had no issues last night:

Some crashed on Monday, but not Sunday:
/star/rcf/test/dev/daq_sl302.ittf/Sun/year_2019/production_14p5GeV_2019/st_physics_20112004_raw_2000001.log
/star/rcf/test/dev/daq_sl302.ittf/Mon/year_2019/production_14p5GeV_2019/st_physics_20112004_raw_2000001.log

today's log looks ok:

/star/rcf/test/dev/daq_sl302.ittf/Tue/year_2019/production_14p5GeV_2019/st_physics_20112004_raw_2000001.log

Further, the following two lines seem to appear in all logs including those with success status

StStrangeMuDstMaker:ERROR - TClass::BuildRealData : Cannot find any ShowMembers function for StV0I!
StStrangeMuDstMaker:ERROR - TClass::BuildRealData : Cannot find any ShowMembers function for StXiI!

I went back as far as Sep 22, 2022

@genevb
Copy link
Contributor Author

genevb commented Dec 14, 2022 via email

@plexoos
Copy link
Member

plexoos commented Dec 14, 2022

I can do that for tonight's build of DEV, but if there's any ongoing work using DEV that needs commits since 2022-11-18 (e.g. the sTGC geometry?), that will be impacted tomorrow.

Great. So you will create a branch at 25274b6 and push it. Your scripts will check it out and build libraries. If you can't adopt the test scripts to use the libraries from a path other than DEV then it means the DEV has to be overwritten. Given that the problem is not reproducible, we need this test to assess the situation. But, again, if you can run tests from a different location, any ongoing work using DEV will not be impacted.

@genevb
Copy link
Contributor Author

genevb commented Dec 17, 2022 via email

@plexoos
Copy link
Member

plexoos commented Dec 17, 2022

Thank you Gene for checking. So, it means that likely one or more commits after 25274b6 cause the instability.

As far as I remember, we did not merge anything that has not passed the CI tests. But we do know that the errors could not be easily reproduced and also CI tests only 64-bit builds. Recently, I also noticed some instabilities in the CI tests. For example, it is a complete mystery to me why #467 would cause failures in the test jobs.

plexoos pushed a commit to plexoos/star-sw that referenced this issue Dec 19, 2022
…tar-bnl#462)

This PR Should resolve star-bnl#455.

The StBFChain dependence supported two features in this macro. 

1) It allowed us to name the geometry cache file based on the output
filename for the chain.
2) It allowed us to detect when Sti (CA or V) was present in the chain
options.

Item 1 is nice to have but not required.

Given the "assert(0)" in the block of code which detects Sti... we don't
actually use item 2. (I suspect it was deprecated at some point).

Therefore we remove the dependency on StBFChain.
@plexoos
Copy link
Member

plexoos commented Dec 19, 2022

I investigated this a bit over the weekend and I can see more or less reproducible crashes when the changes from 5a07179 are included. When merging #462 we didn't see this and I believe this may happen in combination with something else. For now I'd suggest to revert 5a07179 and see if it stabilizes the code. Any other suggests are welcome.

@plexoos
Copy link
Member

plexoos commented Dec 20, 2022

Gene, have you switched back to the main branch? If I interpret the Monday status at https://www.star.bnl.gov/devcgi/weekDEVjobStatus.pl

Are you OK with merging #468 tonight?

@genevb
Copy link
Contributor Author

genevb commented Dec 20, 2022 via email

@genevb
Copy link
Contributor Author

genevb commented Dec 20, 2022

OK...DEV is re-compiled with the patch for tonight. We can look at the results in the morning.

@genevb
Copy link
Contributor Author

genevb commented Dec 20, 2022 via email

@plexoos
Copy link
Member

plexoos commented Dec 20, 2022

All pull requests synced with the main head last night passed without a single failure, so I think the fix worked.
It is just that we are back to the original problem with rootlogon/rootrc on the farm which are missing the search paths for headers included in interpreted macros.

Since we have no control over the system wide rootlogon/rootrc, I'll submit a patch hiding the include statements from ROOT5 shortly.

plexoos added a commit that referenced this issue Dec 20, 2022
Proposed fix for Issue #455

It is not clear to me from the tests I've done on issue #455 why
StBFChain/StBFChain.h is not located in the include path by root5 /
cint. Simplest fix is to hide the include directive from CINT.

Co-authored-by: Dmitri Smirnov <dmixsmi@gmail.com>
@plexoos
Copy link
Member

plexoos commented Dec 21, 2022

How do the nightly tests look today?

@genevb
Copy link
Contributor Author

genevb commented Dec 21, 2022

The nightly tests look good to me. 👍🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants