Skip to content
This repository has been archived by the owner on Aug 11, 2022. It is now read-only.

Random failures when building, no npm-debug.log present #6789

Closed
nullstyle opened this issue Nov 25, 2014 · 21 comments
Closed

Random failures when building, no npm-debug.log present #6789

nullstyle opened this issue Nov 25, 2014 · 21 comments
Labels

Comments

@nullstyle
Copy link

When building from jenkins I often get non-deterministic failures; Click rebuild and it will succeed the next time around. No npm-debug.log ends up getting created, but a sample stdout of a failed run looks like this:

npm WARN package.json fbgiveaway@0.0.1 No repository field.
npm WARN engine cors@2.4.1: wanted: {"node":">=0.10.0","npm":"^1.4"} (current: {"node":"0.10.32","npm":"2.1.9"})
npm WARN engine koa@0.8.1: wanted: {"node":">= 0.11.9"} (current: {"node":"0.10.32","npm":"2.1.9"})

> ws@0.4.32 install /var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/ws
> (node-gyp rebuild 2> builderror.log) || (exit 0)

make: Entering directory `/var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/ws/build'
  CXX(target) Release/obj.target/bufferutil/src/bufferutil.o
  SOLINK_MODULE(target) Release/obj.target/bufferutil.node
  SOLINK_MODULE(target) Release/obj.target/bufferutil.node: Finished
  COPY Release/bufferutil.node
  CXX(target) Release/obj.target/validation/src/validation.o
  SOLINK_MODULE(target) Release/obj.target/validation.node
  SOLINK_MODULE(target) Release/obj.target/validation.node: Finished
  COPY Release/validation.node
make: Leaving directory `/var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/ws/build'
npm WARN engine hawk@0.10.2: wanted: {"node":"0.8.x"} (current: {"node":"0.10.32","npm":"2.1.9"})
npm WARN engine cryptiles@0.1.3: wanted: {"node":"0.8.x"} (current: {"node":"0.10.32","npm":"2.1.9"})
npm WARN engine sntp@0.1.4: wanted: {"node":"0.8.x"} (current: {"node":"0.10.32","npm":"2.1.9"})
npm WARN engine boom@0.3.8: wanted: {"node":"0.8.x"} (current: {"node":"0.10.32","npm":"2.1.9"})
npm WARN engine hoek@0.7.6: wanted: {"node":"0.8.x"} (current: {"node":"0.10.32","npm":"2.1.9"})

> v8flags@1.0.1 install /var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/stellar-payments/node_modules/gulp/node_modules/v8flags
> node fetch.js

flags for v8 3.14.5.9 cached.

> contextify@0.1.9 install /var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/juice/node_modules/jsdom/node_modules/contextify
> node-gyp rebuild

make: Entering directory `/var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/juice/node_modules/jsdom/node_modules/contextify/build'
  CXX(target) Release/obj.target/contextify/src/contextify.o
  SOLINK_MODULE(target) Release/obj.target/contextify.node
  SOLINK_MODULE(target) Release/obj.target/contextify.node: Finished
  COPY Release/contextify.node
make: Leaving directory `/var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/juice/node_modules/jsdom/node_modules/contextify/build'

> ws@0.4.31 install /var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/stellar-lib/node_modules/ws
> (node-gyp rebuild 2> builderror.log) || (exit 0)

make: Entering directory `/var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/stellar-lib/node_modules/ws/build'
  CXX(target) Release/obj.target/bufferutil/src/bufferutil.o
  SOLINK_MODULE(target) Release/obj.target/bufferutil.node
  SOLINK_MODULE(target) Release/obj.target/bufferutil.node: Finished
  COPY Release/bufferutil.node
  CXX(target) Release/obj.target/validation/src/validation.o
  SOLINK_MODULE(target) Release/obj.target/validation.node
  SOLINK_MODULE(target) Release/obj.target/validation.node: Finished
  COPY Release/validation.node
make: Leaving directory `/var/lib/jenkins/jobs/stellar-fbgive/workspace/fbgive/node_modules/stellar-lib/node_modules/ws/build'
npm ERR! cb() never called!

As per https://twitter.com/seldo/status/534888065186414592, WTFFFFFFF

:)

@smikes
Copy link
Contributor

smikes commented Nov 25, 2014

Thanks. That looks nasty.

My first guess is some sort of race condition between building the two dependencies (contextify and ws) that require npm build.

Can you add -d or -dd to your npm command on jenkins? That puts more logging to stderr, useful since the npm-debug.log is not generated (or not captured)

@nullstyle
Copy link
Author

Will do!

@nullstyle
Copy link
Author

Got another random failure. full output is here: https://gist.githubusercontent.com/nullstyle/3c66cb86454e4946cc2f/raw/output.txt

@smikes
Copy link
Contributor

smikes commented Nov 25, 2014

Thanks -- will review this afternoon

On Tue, Nov 25, 2014 at 1:14 PM, Scott Fleckenstein
notifications@github.com wrote:

Got another random failure. full output is here: https://gist.githubusercontent.com/nullstyle/3c66cb86454e4946cc2f/raw/output.txt

Reply to this email directly or view it on GitHub:
#6789 (comment)

@smikes
Copy link
Contributor

smikes commented Nov 25, 2014

Nothing definitive yet.

I do notice that ws gets installed twice -

npm verb linkBins ws@0.4.32
npm verb linkBins ws@0.4.31

Is this necessary? Does the shrinkwrap say to do that? (This shouldn't be causing the problem, but hey, grasping at straws here...)

@othiym23
Copy link
Contributor

The annoying thing about cb() never called is that it's evidence of absence rather than an explicit error. Somewhere in the continuation-passing guts of npm, something failed to call a callback, and so the build is probably incomplete. What does running npm ls on the project after the build bails out show?

The answer is probably in that second build log, but it may also be something that's impossible to reconstruct by a normal human being. It seems like maybe it's something to do with an interaction between the shrinkwrap-driven install and native module builds. I don't have time to look at this right now, but if you two make any progress in getting to the root cause of this, let me know.

@smikes
Copy link
Contributor

smikes commented Nov 26, 2014

So here's a concrete request:

can you add npm list both before and after your failed npm install ? But of course we still want the build to fail after the install fails, so you'd have to trap and then re-throw that failure.

Is your Jenkins install local or hosted? If local, can you set it up so that the directory sticks around long enough for you to play around in it? I'm curious if any of these commands succeed (after the npm install fails once)

npm -d ls
npm -d build
npm -d install

On the one hand, no hurry; but on the other hand, sooner is better --- both because this is a bad build situation for you and it probably indicates a race condition / unreported failure somewhere inside npm, and I'd like to find and fix that.

@theverything
Copy link

I'm having a very similar problem when running npm install.

It happens intermittently and only with packages that use node-gyp.

If I had to guess I think it might have something to do with node-gyp fetching node from https://nodejs.org/dist/. But it is just a guess.

npm info postinstall libxmljs@0.12.0
npm verb unlock done using /root/.npm/_locks/libxmljs-90ce49ff9a08d63c.lock for /app/node_modules/libxmljs
npm verb stack Error: 0-byte tarball
npm verb stack Please run `npm cache clean`
npm verb stack     at /usr/local/lib/node_modules/npm/lib/utils/tar.js:125:14
npm verb stack     at Object.oncomplete (evalmachine.<anonymous>:93:15)
npm verb cwd /app
npm ERR! Linux 3.16.4-tinycore64
npm ERR! argv "node" "/usr/local/bin/npm" "--loglevel=verbose" "install"
npm ERR! node v0.11.14
npm ERR! npm  v2.1.9

npm ERR! 0-byte tarball
npm ERR! Please run `npm cache clean`
npm ERR!
npm ERR! If you need help, you may report this error at:
npm ERR!     <http://github.com/npm/npm/issues>
npm verb exit [ 1, true ]

@smikes
Copy link
Contributor

smikes commented Nov 27, 2014

If I had to guess I think it might have something to do with node-gyp fetching node from https://nodejs.org/dist/. But it is just a guess.

That's a testable theory. What happens if you put

node-gyp install

before the npm install ?

@mlehner616
Copy link

Tried node-gyp install and that seems to be fine but that's not necessarily a testable theory as I'll explain. I've narrowed it down a little bit and have some clues. Hopefully this information is helpful. I will try to get the more detailed logs for you tomorrow.

clue 1

Instead of running npm install I explicitly pulled all the packages from the package.json file and ran
npm install pg
npm install libxmljs
...
All ~30 modules built 100% of the time with exit code 0. Build order didn't seem to have a direct impact. I don't really have a way to confirm this but it seems node-gyp install running outside of npm install could produce different results based on these symptoms.

clue 2

When running just npm install, the build exits with exit code 1 and no debug log immediately after any random node-gyp run (not necessarily the same module and not always the first one).
I'm not sure if node-gyp is the culprit or if it's the next module that's kicking it out but the point at which the build fails is somewhat predictable to the area immediately adjacent to node-gyp.

Jenky work-around

becuase I can't get this app to build at all without this

npm install -g npm-install-missing
npm install || echo "Install exited with non-zero exit code, cleaning up the mess" && echo && npm-install-missing

@smikes
Copy link
Contributor

smikes commented Dec 2, 2014

What I'm wondering about is whether there's a race when multiple node-gyp processes are running at once.

@mlehner616
Copy link

Hmm, that would certainly line up with this. It definitely seems that the
more node-gyp modules being installed the more likely it will happen. The
real fun part is that the fail rate can differ significantly between local
and a server. And sometimes a random build will succeed completely but
definitely if i start removing node-gyp modules, the fail-rate decreases.
If i only have one node-gyp run, i think the issue pretty much goes away,
but i will test this more thoroughly a bit later today.
On Tue, Dec 2, 2014 at 4:17 AM Sam Mikes notifications@github.com wrote:

What I'm wondering about is whether there's a race when multiple node-gyp
processes are running at once.


Reply to this email directly or view it on GitHub
#6789 (comment).

@mlehner616
Copy link

To further complicate things, if I remove all modules but the ones that compile binaries (including some non node-gyp ones) the build completes successfully. I ran about 150 builds with a bash script like this. If I have the full package.json and run builds with the same bash script almost every build fails. A gist of each package.json is below

node v0.11.14
npm v2.1.9

2 package.json files (1 fails consistently ~90% fail rate, 1 passes consistently ~99% success rate):package.json

@nullstyle
Copy link
Author

Hey y'all,

Sorry for the non-responsiveness, was away on a trip. I'll catch back up with this issue today.

@mlehner616
Copy link

Anyone get a chance to take a dive into this issue?

@theverything
Copy link

After updating to npm 2.1.12 the issue went away. I believe the issue was do to 756f3d4

@smikes
Copy link
Contributor

smikes commented Dec 13, 2014

That would be cool if true. @nullstyle, @mlehner616 , can you try updating to 2.1.12 and see if the problem still persists?

you should be able to add

npm i -g npm@latest

in your CI scripts.

@mlehner616
Copy link

@theverything and I were working together on this and determined this
probably fixed the issue. So far, the builds have leveled out.
On Fri, Dec 12, 2014 at 7:19 PM Sam Mikes notifications@github.com wrote:

That would be cool if true. @nullstyle https://github.com/nullstyle,
@mlehner616 https://github.com/mlehner616 , can you try updating to
2.1.12 and see if the problem still persists?

you should be able to add

npm i -g npm@latest

in your CI scripts.


Reply to this email directly or view it on GitHub
#6789 (comment).

@smikes
Copy link
Contributor

smikes commented Dec 19, 2014

@mlehner616 Great to hear that it seems to be fixed.

@nullstyle Since there are a number of people subscribed here, I'd like to close this one - when you get back to being able to check it out, if there are still problems, you can create a new issue. How's that sound? (If we don't hear back within a week, we'll close this one.)

@nullstyle
Copy link
Author

@smikes that sounds fine... we havent seen a failure recently (the repo that it was happening in often hasn't been pushed to much lately).

Thanks!

@othiym23
Copy link
Contributor

Cool! Closing as resolved (for some). If you do run into issues again, @nullstyle, file a new issue and we can pick up the thread again.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants