Now: Jenkins is in a weird mood #761

Closed
refack opened this issue Jun 15, 2017 · 25 comments

@refack
Contributor

refack commented Jun 15, 2017

Saying something about reserving the machines
image

@refack refack changed the title Now: Jenkins is a wierd mood Now: Jenkins is in a weird mood Jun 15, 2017
@refack
Contributor Author

refack commented Jun 15, 2017

Ohh the master node is offline:
image

@gibfahn
Member

gibfahn commented Jun 15, 2017

I can bring it back online, but we fundamentally need to clean up that disk (or allocate more space).

@gibfahn
Member

gibfahn commented Jun 15, 2017

Oh wait, I can't bring it back online:

image

@refack
Contributor Author

refack commented Jun 15, 2017

sad emoji

@gibfahn
Member

gibfahn commented Jun 15, 2017

@refack Have you tried the node-build IRC channel? You need someone with the nodejs_build_infra ssh key (who can actually get into the machines).

@refack
Contributor Author

refack commented Jun 15, 2017

> @refack Have you tried the node-build IRC channel? You need someone with the nodejs_build_infra ssh key (who can actually get into the machines).

Yep, and #node-dev (posted here last since I thought maybe someone would get an e-mail)

@rvagg
Member

rvagg commented Jun 15, 2017

on it

@rvagg
Member

rvagg commented Jun 15, 2017

working again ... but we're using up so much disk, I've even gone back from 7 days to 5 days retention and we're still at 95% disk. We're going to have to switch to block storage for this

@gibfahn
Member

gibfahn commented Jun 15, 2017

> I've even gone back from 7 days to 5 days retention and we're still at 95% disk.

Do we know what's using this up? Is it just loads of copies of Node repos? Jenkins artifacts? Something else?

@refack
Contributor Author

refack commented Jun 15, 2017

Maybe it's time to revisit #739: gate the big jobs (or maybe just PR jobs) on a first passing Linux one. It also gives very quick feedback.
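
The gating idea above can be sketched roughly as follows. This is a minimal illustration, not how the CI is actually configured: `run_gated` and the job callables are hypothetical names, and each job is modelled as a function that returns `True` on success.

```python
from typing import Callable, Dict


def run_gated(gate: Callable[[], bool],
              heavy_jobs: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run a cheap gate job first; start the heavy jobs only if it passes.

    `gate` and the values of `heavy_jobs` are hypothetical stand-ins for
    real CI jobs. Each returns True on success.
    """
    results: Dict[str, str] = {}
    if not gate():
        # Fail fast: no expensive job is scheduled, so no machines are tied up.
        results["gate"] = "failed"
        for name in heavy_jobs:
            results[name] = "skipped"
        return results

    results["gate"] = "passed"
    for name, job in heavy_jobs.items():
        results[name] = "passed" if job() else "failed"
    return results
```

With a failing gate, every heavy job is reported as skipped and none of them runs, which is where the quick feedback comes from.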

@joaocgreis
Member

Found a huge job log file: cctest produced 40 GB of output (stack traces in the beginning). We are back at 50+ GB free, which is about normal.
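
One way to hunt for an oversized file like this is to walk the directory tree and list the largest files. The sketch below assumes nothing about the Jenkins layout; `largest_files` is a hypothetical helper and `root` is whatever directory is suspected of filling the disk.

```python
import heapq
import os
from typing import List, Tuple


def largest_files(root: str, top_n: int = 10) -> List[Tuple[int, str]]:
    """Return the `top_n` largest files under `root` as (size_in_bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    # nlargest sorts by size first, largest first.
    return heapq.nlargest(top_n, sizes)
```

Calling `largest_files("/path/to/jobs", top_n=5)` (the path is a placeholder) would put a 40 GB log at the top of the list.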

@rvagg rvagg closed this as completed Jun 16, 2017
@rvagg
Member

rvagg commented Jun 16, 2017

we should move to block storage on DO for this now that we can

@refack
Contributor Author

refack commented Jun 16, 2017

@joaocgreis
Member

@refack it was node-test-binary-windows 9196 (started to test nodejs/node#13482). It probably ran until it crashed Jenkins, because it is no longer accessible. Output was very similar to your comment there. If this is a pattern it'll be a problem, so please let us know if you see it again. Right now, there is no other log file that big on the server.

@jbergstroem
Member

@rvagg agree that we should move to block storage. It's just so incredibly annoying knowing that Jenkins doesn't compress the build logs, and these basically account for the majority of the space consumption. A gzip post job would do. Perhaps we even do it manually? There is a Jenkins plugin that is supposed to do this for us but it obviously doesn't work.
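
A manual gzip pass of the kind suggested above might look like the sketch below. It rests on assumptions that are not confirmed in this thread: that build logs are plain files named `log` somewhere under a builds directory, and that it is acceptable to replace them with `log.gz`. Whether the Jenkins version in use can still display a gzipped log should be verified before deleting originals. `compress_old_logs` is a hypothetical helper.

```python
import gzip
import os
import shutil
import time
from typing import List


def compress_old_logs(builds_root: str,
                      min_age_days: float = 1.0,
                      log_name: str = "log") -> List[str]:
    """Gzip every file called `log_name` under `builds_root` that is older
    than `min_age_days`, remove the original, and return the new paths.

    The file name and directory layout are assumptions, not confirmed here.
    """
    cutoff = time.time() - min_age_days * 86400
    compressed = []
    for dirpath, _dirnames, filenames in os.walk(builds_root):
        if log_name not in filenames:
            continue
        src = os.path.join(dirpath, log_name)
        if os.path.getmtime(src) > cutoff:
            # Recently written: the build may still be running, leave it alone.
            continue
        dst = src + ".gz"
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        os.remove(src)
        compressed.append(dst)
    return compressed
```

The age check is a deliberately cautious choice so that logs of in-progress builds are never touched.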

@joaocgreis
Member

@refack
Contributor Author

refack commented Jul 9, 2017

The golden snippet:

[ RUN      ] EnvironmentTest.AtExitWithArgument

==== C stack trace ===============================

	lh_node_usage_stats_bio [0x00007FF6CE172151+200673]
	lh_node_usage_stats_bio [0x00007FF6CE16B8DD+173933]
	lh_node_usage_stats_bio [0x00007FF6CE16B6BE+173390]
	lh_node_usage_stats_bio [0x00007FF6CE16B8B3+173891]
	lh_node_usage_stats_bio [0x00007FF6CE16B7B6+173638]
	RtlProcessFlsData [0x00007FFF3790D9F7+295]
	LdrShutdownThread [0x00007FFF3790BA49+73]
	RtlExitUserThread [0x00007FFF379085DE+62]
	FreeLibraryAndExitThread [0x00007FFF34DA2C7C+76]
	FreeLibraryAndExitThread [0x00007FFF35369CDA+10]
	lh_node_usage_stats_bio [0x00007FF6CE160B5D+129517]
	lh_node_usage_stats_bio [0x00007FF6CE160CA5+129845]
	lh_node_usage_stats_bio [0x00007FF6CE160AE0+129392]
	BaseThreadInitThunk [0x00007FFF353613D2+34]
	RtlUserThreadStart [0x00007FFF379054E4+52]

Stack repeats forever.
P.S. @joaocgreis maybe delete the log, it's ~1GB
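
Since the stack repeats without end, one possible safeguard is to cap how much output a job may write before the log is cut off. The sketch below is only an illustration of that idea; `cap_output` is a hypothetical helper and is not something Jenkins or the test runner provides.

```python
from typing import Iterable, Iterator


def cap_output(lines: Iterable[str], max_bytes: int) -> Iterator[str]:
    """Pass lines through until `max_bytes` would be exceeded, then emit a
    single truncation notice and stop reading the source."""
    written = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if written + size > max_bytes:
            yield "[output truncated after %d bytes]\n" % written
            return
        written += size
        yield line
```

Because it is a generator, it also stops consuming an endless source, which is the situation described here.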

@refack
Contributor Author

refack commented Jul 16, 2017

Ping ping ping.
Happening again.

@refack
Contributor Author

refack commented Jul 16, 2017

Well it's not the same: The master node is ok - https://ci.nodejs.org/computer/(master)/
Only one job is stuck
image
...
image

@refack
Contributor Author

refack commented Jul 16, 2017

It probably boils down to https://ci.nodejs.org/job/node-test-binary-arm/ being slow, and the backlog limited to 15 jobs

@Trott
Member

Trott commented Jul 16, 2017

@refack Yeah, it's an alarming thing to see that in the interface, but the issue fixes itself if you wait long enough.

I won't start any more C+L jobs unless the backlog gets into single digits. That should hopefully prevent that from happening too much more today at least.
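
The "wait until the backlog is in single digits" rule could be checked with a small script. The sketch below assumes the Jenkins queue is readable as JSON at `/queue/api/json` with an `items` array, and that no authentication is needed; both should be checked against the actual instance. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def queue_length(queue_json: str) -> int:
    """Count queued items in a Jenkins queue JSON document
    (assumed shape: {"items": [...]})."""
    return len(json.loads(queue_json).get("items", []))


def ok_to_start(backlog: int, limit: int = 10) -> bool:
    """True when the backlog is in single digits (below `limit`)."""
    return backlog < limit


def fetch_queue_json(base_url: str) -> str:
    """Fetch the queue document; the endpoint path is an assumption."""
    with urlopen(base_url.rstrip("/") + "/queue/api/json") as resp:
        return resp.read().decode("utf-8")
```

For example, `ok_to_start(queue_length(fetch_queue_json("https://ci.nodejs.org")))` would say whether it is a reasonable moment to start another job.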

@refack
Contributor Author

refack commented Jul 17, 2017

Can we learn something from this? Increase the backlog limit? Make Jenkins shut up about reserving stuff and just say "Pending"?

@jbergstroem
Member

We can increase the backlog but my experience is that we start failing/bleeding elsewhere :/

@refack
Contributor Author

refack commented Jul 18, 2017

And now something new: subjobs are finishing but not propagating their status:
image

@refack
Contributor Author

refack commented Jul 18, 2017

And this is new... It does this "Loading" thing, but "submit" does not create new jobs.
2017-07-17-jenkins

@gibfahn gibfahn mentioned this issue Jul 18, 2017