
atomic.AddUint32 in UUID generation causing global lock in HTTP handler #6658

Closed
daviesalex opened this issue May 18, 2016 · 7 comments

daviesalex (Contributor) commented May 18, 2016

Running a startup on HEAD with a large dataset shows clearly (in perf) that the overwhelming majority of time is spent in atomic.AddUint32. This only shows up because of the performance improvements in HEAD; it does not appear in 0.13 stable (which spends most of its time in runtime.mapiternext and runtime.indexbytebody). The code was introduced in "Replace code.google.com/p/go-uuid with TimeUUID from gocql" (62434fb) at the end of March.

Since I have issues losing data in HEAD, I've not been able to see whether this is also a significant user of CPU once influxd has started, but it seems plausible that it will be, particularly under high load. The code is, I think, in ./uuid/uuid.go and effectively introduces a global lock around anything that wants a UUID:

(per git blame, lines 54-77 of uuid/uuid.go are almost entirely from 62434fb8, Matt Robenolt, 2015-03-30)

// FromTime generates a new time based UUID (version 1) as described in
// RFC 4122. This UUID contains the MAC address of the node that generated
// the UUID, the given timestamp and a sequence number.
func FromTime(aTime time.Time) UUID {
	var u UUID

	utcTime := aTime.In(time.UTC)
	t := uint64(utcTime.Unix()-timeBase)*10000000 + uint64(utcTime.Nanosecond()/100)
	u[0], u[1], u[2], u[3] = byte(t>>24), byte(t>>16), byte(t>>8), byte(t)
	u[4], u[5] = byte(t>>40), byte(t>>32)
	u[6], u[7] = byte(t>>56)&0x0F, byte(t>>48)

	clock := atomic.AddUint32(&clockSeq, 1)
	u[8] = byte(clock >> 8)
	u[9] = byte(clock)

	copy(u[10:], hardwareAddr)

	u[6] |= 0x10 // set version to 1 (time based uuid)
	u[8] &= 0x3F // clear variant
	u[8] |= 0x80 // set to IETF variant

	return u
}

This global lock is clearly there for a good reason (to ensure we don't get duplicate UUIDs when two calls land on the same timestamp), but it should be fairly simple to improve while either still guaranteeing uniqueness or making duplicate IDs extremely unlikely.

There are various ways to improve this that occur to me. Since (AFAIK) nobody cares about nanosecond-accurate timestamps here, one option is to have a pool of goroutines producing these values, splitting the nanoseconds within each millisecond between them. This would still give timestamps accurate to the millisecond (or a fraction of a millisecond, whatever we wanted).
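
Here is a rough sketch of the kind of thing I mean (a sketch only: the shard count and the names seqShard/nextSeq are made up, and it shards the counter by timestamp bits rather than using dedicated producer goroutines, but the intent is the same):

package uuid

import (
	"sync/atomic"
	"time"
)

const seqShards = 16 // power of two so the mask below works

// Each shard keeps its counter on its own cache line, so concurrent
// AddUint32 calls that land on different shards don't fight over one line.
type seqShard struct {
	ctr uint32
	_   [60]byte // pad the 4-byte counter out to ~64 bytes
}

var shards [seqShards]seqShard

// nextSeq picks a shard from the low bits of the timestamp and bumps only
// that shard's counter. Folding the shard index into the result means two
// shards can never return the same value for the same counter reading.
func nextSeq(t time.Time) uint32 {
	i := uint32(t.Nanosecond()) & (seqShards - 1)
	n := atomic.AddUint32(&shards[i].ctr, 1)
	return n*seqShards + i
}

FromTime would then use clock := nextSeq(utcTime) in place of the single clockSeq counter. Only 14 bits of the clock survive the variant masking anyway, and the low four bits still encode the shard, so shards cannot collide with each other.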

The second option would be to remove the node MAC (which isn't adding a lot) and replace it with a random number (per nanosecond), which (from a quick read of https://www.ietf.org/rfc/rfc4122.txt) is allowed. This would of course make it technically possible for a duplicate value to appear, but fantastically unlikely (technically, restarting the process today with a different clock could also cause this; I'm not actually sure which is more likely!).
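
As a rough sketch of what option 2 might look like (assuming UUID is the 16-byte array from uuid/uuid.go; randomNode is a hypothetical helper, and crypto/rand is just the simplest thing to write down here - a cheaper per-goroutine PRNG, or the per-nanosecond caching mentioned above, would probably be needed so we don't trade one bottleneck for another):

package uuid

import "crypto/rand"

// randomNode fills the node field (the last six bytes) with random data in
// place of hardwareAddr, setting the multicast bit as RFC 4122 section 4.5
// asks for node IDs that are not MAC addresses.
func randomNode(u *UUID) {
	if _, err := rand.Read(u[10:]); err != nil {
		panic(err) // crypto/rand failing here is not something we can recover from
	}
	u[10] |= 0x01 // multicast bit: marks this as a random node ID, not a MAC
}

FromTime would then call randomNode(&u) instead of copy(u[10:], hardwareAddr).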

Thoughts? This is low-hanging fruit for performance optimization, I think...

Tagging @jwilder FYI, since I found this while testing his PR, and @mattrobenolt, who I think (?) wrote this code.

Raw perf data (this remains pretty much static during the whole startup). Our dataset is a few TB, all on SSD, on a box with plenty of CPU cores.

Samples: 4M of event 'cycles', Event count (approx.): 430757131359
  28.53%  influxd                       [.] sync/atomic.AddUint32
  11.64%  influxd                       [.] strings.Index
   5.80%  influxd                       [.] scanblock
   4.72%  [kernel]                      [k] _raw_spin_lock
   4.60%  influxd                       [.] runtime.MSpan_Sweep
   2.33%  influxd                       [.] github.com/influxdata/influxdb/tsdb.(*DatabaseIndex).Series
   1.82%  influxd                       [.] runtime.memmove
   1.76%  influxd                       [.] runtime.aeshashbody
   1.54%  influxd                       [.] runtime.memeqbody
   1.52%  [kernel]                      [k] down_read_trylock
   1.39%  influxd                       [.] runtime.mallocgc
   1.34%  influxd                       [.] runtime.mapaccess1_faststr
   1.22%  influxd                       [.] runtime.deferreturn
   1.18%  influxd                       [.] runtime.newdefer
   1.14%  influxd                       [.] runtime.writebarrierptr
   1.10%  influxd                       [.] runtime.cas64
   1.02%  influxd                       [.] runtime.xchg
   0.95%  influxd                       [.] runtime.readvarint
   0.89%  influxd                       [.] github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).WalkKeys
   0.82%  [kernel]                      [k] page_fault
   0.80%  influxd                       [.] github.com/influxdata/influxdb/tsdb/engine/tsm1.(*indirectIndex).KeyAt
   0.78%  [kernel]                      [k] up_read
   0.76%  influxd                       [.] runtime.findfunc
   0.74%  influxd                       [.] runtime.gentraceback
   0.73%  influxd                       [.] runtime.releasem
   0.69%  influxd                       [.] getfull
   0.66%  influxd                       [.] runtime.freedefer
   0.55%  influxd                       [.] runtime.acquirem
   0.53%  influxd                       [.] runtime.atomicload64
   0.53%  influxd                       [.] runtime.memclr
daviesalex changed the title from "atomic.AddUint32 causing global lock" to "atomic.AddUint32 in UUID generation causing global lock (massively slowing down startup in HEAD)" May 18, 2016
daviesalex changed the title from "atomic.AddUint32 in UUID generation causing global lock (massively slowing down startup in HEAD)" to "atomic.AddUint32 in UUID generation causing global lock (significantly slowing down startup in HEAD)" May 18, 2016
e-dard (Contributor) commented May 18, 2016

@daviesalex this seems a bit strange to me, because FromTime is only called from within the HTTP handler in httpd. I know you said you were running HEAD, but could you confirm the commit you ran from?

e-dard assigned e-dard and jwilder and unassigned e-dard May 18, 2016
jwilder (Contributor) commented May 18, 2016

atomic.AddUint32 showing up in startup might be due to lock contention when setting up the index. See: #6250 (comment)

I'm not sure how FromTime could be called at startup. AFAIK, it's only used in the HTTP handler as @e-dard mentioned.

daviesalex (Contributor, Author) commented May 18, 2016

@e-dard branch jw-compact-fix, latest 0b25541

I do see your point though, it's very odd - it seems to only be called (through TimeUUID()) in ./services/httpd/handler.go. It's possible that we have a lot of HTTP requests coming in, although I'm pretty sure we saw this during startup.

I'm trying to reproduce on a test box now... not having much joy. This test box does not have the same production HTTP load coming in, although it does have a similar amount of data. It's doing what I'd expect (spending time in scanblock and runtime.MSpan_Sweep). It's still spending 5% of its time in this atomic, though.

One thing I notice is that even though the DB has not finished starting, the HTTP port is open and accepting connections. Perhaps this is simply a problem with high HTTP load? (It does effectively add a lock to all incoming HTTP requests, right?)
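
A standalone microbenchmark (nothing influxd-specific, and the names here are made up) shows the effect I think we're seeing: a single shared atomic counter behaves much like a contended lock once many cores hit it, while per-goroutine counters do not.

package contention

import (
	"sync/atomic"
	"testing"
)

var shared uint32

// BenchmarkSharedCounter: every goroutine increments the same word, so the
// cache line holding it ping-pongs between cores on each AddUint32.
func BenchmarkSharedCounter(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			atomic.AddUint32(&shared, 1)
		}
	})
}

// BenchmarkLocalCounter: each goroutine increments its own word; the atomic
// itself is cheap once there is no cross-core traffic.
func BenchmarkLocalCounter(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		var local uint32
		for pb.Next() {
			atomic.AddUint32(&local, 1)
		}
	})
}

Dropping this into an empty directory as contention_test.go and running go test -bench Counter -cpu 1,8,80 should show the shared counter getting slower per op as cores are added while the local one stays roughly flat.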

Trying to reproduce more...

jwilder (Contributor) commented May 18, 2016

@daviesalex On that branch, does it prevent #6652?

daviesalex (Contributor, Author) commented May 18, 2016

@jwilder yes. No panic, but still missing data.

daviesalex (Contributor, Author) commented May 18, 2016

Right, so here is a theory about this issue. The node I tested on is under some load from HTTP (not much, certainly compared to our UDP load, but thousands of inserts a minute). The HTTP handler starts listening immediately (so it runs during startup). The percentage was particularly high during startup because the rest of Influx does a relatively poor job of parallelizing the start, so the share of CPU time available to an HTTP handler that spins up a goroutine for each request is high.

I'm pretty sure the issue we saw is unrelated to start time... it simply happens whenever we have significant incoming HTTP load. This basically serializes HTTP requests that could otherwise be handled in parallel (we have 80 logical cores, so this is more obvious for us than perhaps for others).

I think this is a legitimate thing to improve, but I don't know enough about the consequences of my suggested changes (particularly option 2, which is trivial to code). For us, we don't really use the HTTP handler much and losing <1 core to this isn't a huge deal, but it's silly. For other users, this may be more of a problem.

I renamed the issue to reflect what I (now) think the issue is, but I could still be wrong!

Thoughts?

daviesalex changed the title from "atomic.AddUint32 in UUID generation causing global lock (significantly slowing down startup in HEAD)" to "atomic.AddUint32 in UUID generation causing global lock in HTTP handler" May 18, 2016
jwilder (Contributor) commented Jan 26, 2017

Closing this since it was related to 0.13/1.0. If it's still an issue in 1.2, let us know and we can re-open.
