New BUILD_ROOT not being respected #182
So, the first time I started my client, I had a typo in the `BUILD_ROOT` variable. I fixed it, killed the currently running client, and restarted a new client with the correct value of `BUILD_ROOT` exported. The first time the client runs my job, it's fine. When I invalidate it to try again (I'm currently debugging my setup, so I'm just invalidating a job on a PR), it reverts back to the old, mistyped value of `BUILD_ROOT`.

Any suggestions? Any other information I can provide? Thanks!
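For concreteness, a minimal sketch of the restart sequence described above; the client invocation is a placeholder, not the actual command used:

```bash
# kill the old client, fix the typo, export the corrected value,
# then start a fresh client from the same shell
export BUILD_ROOT=/femputer/pbauman/civet_build_testing_root  # earlier typo: "femptuer"
./client.py   # placeholder for the real CIVET client invocation
```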
I should mention, this behavior is repeatable. That is, I can kill the running client and start a new one. The first time it runs the job, it has the correct `BUILD_ROOT`. The next time, it uses the "old" one. Clearly it's being cached somewhere, but I'm not sure where.
I take it you are just running
I'm just setting it in the environment. The command I use for the client is:
Hmm, I can't seem to reproduce it. Any custom scripts in
I see it as an error in the first step of the recipe, in particular when it's calling `cd` (see https://femputer.eng.buffalo.edu:8000/job/3/):

```
/tmp/tmpG7CBZ2: line 123: cd: /femptuer/pbauman/civet_build_testing_root: No such file or directory
ERROR: exiting with code 1
```

I grepped through the scripts and I don't see any of them setting `BUILD_ROOT`:

```
[01:06:58][pbauman@femputer:/femputer/pbauman/civet_recipes/scripts][master] $ grep BUILD_ROOT *.sh
cleanup_build_testing_dir.sh:#REQUIRED: BUILD_ROOT
cleanup_build_testing_dir.sh:SUBDIR=$BUILD_ROOT/$FEMPUTER_BUILD_DIRNAME
cleanup_build_testing_dir.sh:cd "$BUILD_ROOT"
functions.sh:# Bad things can happen if BUILD_ROOT is not set
functions.sh:if [ -z "$BUILD_ROOT" ]; then
functions.sh:  echo "You need to set BUILD_ROOT"
functions.sh:  local b="$BUILD_ROOT/"
functions.sh:  local cwd=${p/#$b/BUILD_ROOT/}
functions.sh:  printf "Build Root: $BUILD_ROOT\n"
functions.sh:  export REPO_DIR=$BUILD_ROOT/$APPLICATION_NAME
functions.sh:  cd "$BUILD_ROOT"
make_build_testing_dir.sh:#REQUIRED: BUILD_ROOT
make_build_testing_dir.sh:SUBDIR=$BUILD_ROOT/$FEMPUTER_BUILD_DIRNAME
run_cmd.sh:REPO_DIR=$BUILD_ROOT/$APPLICATION_NAME
```

I only have one recipe at the moment and it doesn't set `BUILD_ROOT` either. Could this be cached somehow on the GitHub side? (I'm sure that's a stupid suggestion.)
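One way to check which value a running client actually inherited, offered as a debugging suggestion (assumes a Linux `/proc` filesystem; the process name is a guess):

```bash
# find the client's PID, then dump the environment it was started with;
# /proc/<pid>/environ shows the env at exec time, unaffected by later exports
pid=$(pgrep -f client.py)   # process name is an assumption; may match multiple PIDs
tr '\0' '\n' < /proc/"$pid"/environ | grep BUILD_ROOT
```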
Annnnnnd now the server isn't responding. Will report back in a few minutes.
OK, I guess you probably want
Correct.
Does the server maybe cache any of the variables anywhere that would be read by the client?
No. I only have one `.cfg` file in recipes (https://github.com/ubaceslab/civet_recipes/tree/master/recipes). I've tried the following:
There were no changes to the recipe or scripts between steps 7, 8, 9, 10, and 11; I was just invalidating.
This is from the new terminal started in step 4.
Yeah, I don't even. I'm getting random behavior with each invalidate. Any clues about what error code -15 could be? The nuclear option is rebooting the server, wiping out the current CIVET install, and starting from scratch, but it would be nice to try to understand what's causing this behavior.
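For what it's worth, a negative exit code usually means the process was killed by a signal, and -15 corresponds to SIGTERM; the -15 form specifically matches how Python's subprocess module reports a signal-killed child (that the CIVET client reports status this way is an assumption). A shell illustration of the same convention:

```bash
# a process killed by SIGTERM (signal 15) has no normal exit status;
# the shell reports it as 128 + 15 = 143, Python's subprocess as -15
sleep 60 &        # stand-in for a recipe step
kill -TERM $!     # terminate it, as a client shutdown or job cancel might
wait $!
echo $?           # prints 143
```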
Nothing seems wrong with your environment.
I put a print statement at the top of functions.sh and it looks like the mistyped value (`/femptuer/pbauman/civet_build_testing_root`) is already there when the step starts:

```
Date: Mon Dec 12 16:16:56 EST 2016
Machine: femputer.eng.buffalo.edu
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.8 (Final)
Release: 6.8
Codename: Final
Cannot find MOOSE Package version.
Build Root: /femptuer/pbauman/civet_build_testing_root
Trigger: Pull request
Step: Load GRINS Modules (0)
```
I'm using Lmod. Does Lmod do some weird caching business? Is each step in the recipe starting a new environment? |
There must be something funky happening between Lmod and the shell. I'm printing out
I have seen weird Lmod caching behavior with environment variables, but it is usually solved by quitting the current terminal and starting again. I notice that you are loading modules in the scripts. I don't think that will work the way you expect: each step is run in its own process, so module loads shouldn't carry over between steps (which is why I find it weird that
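A quick demonstration of the per-step isolation described above (the variable name is made up):

```bash
# each step runs in its own process, so exports die with the step
bash -c 'export STEP_VAR=set_in_step_one'          # "step 1"
bash -c 'echo "step 2 sees: ${STEP_VAR:-unset}"'   # prints "step 2 sees: unset"
```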
Yes, this was really perplexing.
Thanks for the info. I'd started inferring this through trial and error. :)
I'd planned on controlling the environment with the scripts + module system. Any pointers to what I should look at in the
After rebooting the node and restarting the server and client, everything seems to be working now: I've invalidated twice, and I'm not getting -15 exit messages or invalid `BUILD_ROOT` values. I guess this is good, but it would be nice to understand what really happened. Is there any chance that screen could be interfering? I was running the CIVET server and client within a screen session.
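One way screen could matter here, offered as a guess rather than a diagnosis: new windows in an existing screen session inherit the environment the session was started with, not the current shell's, so a client restarted in a fresh window of an old session can silently pick up a stale value:

```bash
export BUILD_ROOT=/old/typo     # stale value when the session begins
screen -dmS civet               # detached session captures that environment
export BUILD_ROOT=/new/correct  # fixing it later, outside the session
# a new window in the old session still sees the stale value:
screen -S civet -X screen bash -c 'echo "$BUILD_ROOT" > /tmp/civet_env'
sleep 1
cat /tmp/civet_env              # prints /old/typo
screen -S civet -X quit         # clean up
```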
I don't see how screen could have affected it, but then I can't really see how Lmod screwed things up so badly either. I bet it has something to do with Lmod. It seems to do some magic under the hood that I haven't gotten around to figuring out. Are you going to be using multiple different module configurations? If not, then you could use something like what is in

Regardless, I will update the wiki on how to set up the
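A sketch of what per-step module loading could look like, given that steps don't share an environment (the Lmod init path and module names are placeholders, not the actual setup):

```bash
# at the top of each step script: initialize Lmod in this process,
# start from a clean module state, and load exactly what the step needs
source /usr/share/lmod/lmod/init/bash   # init path varies by install
module purge
module load gcc openmpi                 # hypothetical module names
```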
OK, hopefully I won't make any more typos. :P I'll report in if I ever figure it out (unlikely).
Yes? I guess I'm not quite sure what you mean. I plan to have several recipes that will build with different compiler options on the same PR, e.g. dbg vs. opt libMesh. I'll have different modules for the dbg build vs. the opt build. But I can easily just load the different modules in each of those recipes.
OK, thanks, I'll have a look.
Awesome, thanks!
So you have different modules for debug vs. opt? We don't do that much here; we just have different compiler targets or configuration targets. Like
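For example, a hypothetical setup along those lines (variable and build command are placeholders, not what INL actually uses): a single module environment, with the build flavor chosen per recipe via a variable:

```bash
# one module set for all recipes; each recipe picks its flavor via a
# variable consumed by the build step
export METHOD=dbg       # or "opt" in the optimized recipe
make METHOD="$METHOD"   # placeholder build command
```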
I updated the wiki for the INL client. Please let me know if there is something that isn't clear.
Sorry I was slow here (travel + end of semester). This is all clear to me now, and my server and client are running smoothly so far. Thanks very much for the very quick and helpful comments and updates!
Great! Let us know if you run into any problems or need something added. |