New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBase Storage #274

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
7 participants
@elliottneilclark
Contributor

elliottneilclark commented Jul 15, 2013

Here's the start of an HBase storage module. It's still in pretty early form but I wanted to get this up so that reviews could start early.

Tables

  • zipkin.traces
  • zipkin.duration
  • zipkin.idxService
  • zipkin.idxServiceSpanName
  • zipkin.idxServiceAnnotation
  • zipkin.topAnnotations
  • zipkin.dependencies
  • zipkin.mappings
  • zipkin.idGen

Internals

I created a non-blocking futures based version of HBase's HTable. This HBaseTable uses a thread pool to execute the calls to HBase.

Currently it's using HBase 0.94.9 and hadoop 1

Row Keys Construction

Whenever possible I am using a fixed length row key. That necessitates a set id for any variable length strings that can be on the row key. This mapping is created using the mapping table and the id gen table. The idGen table holds the last idGenerated.

  • There can be up to Long.MaxValue services
  • Each service can have Long.MaxValue spanNames and annotations

The mapping table holds the mapping of Array[Byte] to id and the reverse.

Still ToDo

  • counters, counters, more counters
  • Allow this to build against current hbase trunk
@mosesn

View changes

Show outdated Hide outdated ...hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseStorage.scala
@mosesn

View changes

Show outdated Hide outdated ...hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseStorage.scala
@mosesn

View changes

Show outdated Hide outdated ...hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseStorage.scala
@mosesn

This comment has been minimized.

Show comment
Hide comment
@mosesn

mosesn Jul 15, 2013

Contributor

this is pretty neat! have you found that hbase is fast enough to keep a fairly snappy web ui?

Contributor

mosesn commented Jul 15, 2013

this is pretty neat! have you found that hbase is fast enough to keep a fairly snappy web ui?

@mosesn

View changes

Show outdated Hide outdated project/Project.scala
@mosesn

View changes

Show outdated Hide outdated project/Project.scala
@mosesn

View changes

Show outdated Hide outdated zipkin-hbase/src/main/scala/com/twitter/hbase/HBaseTable.scala
@mosesn

View changes

Show outdated Hide outdated zipkin-hbase/src/main/scala/com/twitter/hbase/HBaseTable.scala
@mosesn

View changes

Show outdated Hide outdated zipkin-hbase/src/main/scala/com/twitter/hbase/HBaseTable.scala
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 15, 2013

Contributor

this is pretty neat! have you found that hbase is fast enough to keep a fairly snappy web ui?

Yep HBase is happily running as a backend server for lots of different web ui's. I haven't finished this part yet, but it should be plenty fast enough and scalable.

Contributor

elliottneilclark commented Jul 15, 2013

this is pretty neat! have you found that hbase is fast enough to keep a fairly snappy web ui?

Yep HBase is happily running as a backend server for lots of different web ui's. I haven't finished this part yet, but it should be plenty fast enough and scalable.

@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 15, 2013

Contributor

I changed the hbase version to a released version so it should be good to go on travis-ci now.

Contributor

elliottneilclark commented Jul 15, 2013

I changed the hbase version to a released version so it should be good to go on travis-ci now.

@dvryaboy

This comment has been minimized.

Show comment
Hide comment
@dvryaboy

dvryaboy Jul 16, 2013

@ghelmling you and Chris might be interested :)

dvryaboy commented Jul 16, 2013

@ghelmling you and Chris might be interested :)

@ghelmling

This comment has been minimized.

Show comment
Hide comment
@ghelmling

ghelmling Jul 16, 2013

Pretty cool, thanks for the pointer!

On Mon, Jul 15, 2013 at 6:12 PM, dvryaboy notifications@github.com wrote:

@ghelmling https://github.com/ghelmling you and Chris might be
interested :)


Reply to this email directly or view it on GitHubhttps://github.com/openzipkin/zipkin/pull/274#issuecomment-21015278
.

ghelmling commented Jul 16, 2013

Pretty cool, thanks for the pointer!

On Mon, Jul 15, 2013 at 6:12 PM, dvryaboy notifications@github.com wrote:

@ghelmling https://github.com/ghelmling you and Chris might be
interested :)


Reply to this email directly or view it on GitHubhttps://github.com/openzipkin/zipkin/pull/274#issuecomment-21015278
.

@ghelmling

This comment has been minimized.

Show comment
Hide comment
@ghelmling

ghelmling Jul 16, 2013

I think the Travis CI failure is due to build environment (MiniDFSCluster expects a umask of 022). Funny enough, I just answered another question on this earlier today:
http://stackoverflow.com/questions/17625938/hbase-minidfscluster-java-fails-in-certain-environments/17665439#17665439

For another project (hRaven), we had to modify the .travis.yml file to ensure umask was set to "022" prior to the test run by adding the line:

script: umask 0022 && mvn test

ghelmling commented Jul 16, 2013

I think the Travis CI failure is due to build environment (MiniDFSCluster expects a umask of 022). Funny enough, I just answered another question on this earlier today:
http://stackoverflow.com/questions/17625938/hbase-minidfscluster-java-fails-in-certain-environments/17665439#17665439

For another project (hRaven), we had to modify the .travis.yml file to ensure umask was set to "022" prior to the test run by adding the line:

script: umask 0022 && mvn test
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 16, 2013

Contributor

For another project (hRaven), we had to modify the .travis.yml file to ensure umask was set to "022" prior to the test run

Awesome thanks. That just saved me a whole lot of debugging time. I'll add that in the next push.

Contributor

elliottneilclark commented Jul 16, 2013

For another project (hRaven), we had to modify the .travis.yml file to ensure umask was set to "022" prior to the test run

Awesome thanks. That just saved me a whole lot of debugging time. I'll add that in the next push.

@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 16, 2013

Contributor

So I pushed out more commits. Mostly this is just so that everyone can see the idea behind how the indexes will look.

I also added some scaladoc comments. There are still a lot of places that need more (They're coming I promise).

Thanks everyone for all of the reviews already.

Contributor

elliottneilclark commented Jul 16, 2013

So I pushed out more commits. Mostly this is just so that everyone can see the idea behind how the indexes will look.

I also added some scaladoc comments. There are still a lot of places that need more (They're coming I promise).

Thanks everyone for all of the reviews already.

@bmdhacks

View changes

Show outdated Hide outdated zipkin-common/src/main/scala/com/twitter/zipkin/common/Span.scala
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 19, 2013

Contributor

The latest push runs the web ui and is pretty well complete.

I think there's still a timestamp bug and indexing annotation values.

Contributor

elliottneilclark commented Jul 19, 2013

The latest push runs the web ui and is pretty well complete.

I think there's still a timestamp bug and indexing annotation values.

@bmdhacks

This comment has been minimized.

Show comment
Hide comment
@bmdhacks

bmdhacks Jul 19, 2013

Contributor

It'd be nice to get an aggregates store in there too. I've been going over the code, I'll give you some more review feedback next week. This looks really good. We might even try using this at twitter.

Contributor

bmdhacks commented Jul 19, 2013

It'd be nice to get an aggregates store in there too. I've been going over the code, I'll give you some more review feedback next week. This looks really good. We might even try using this at twitter.

@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 19, 2013

Contributor

It'd be nice to get an aggregates store in there too.

I'll give that a shot once I have some more tests and a lot more polish on the code.

I'll give you some more review feedback next week.

Awesome, thanks for your help.

Contributor

elliottneilclark commented Jul 19, 2013

It'd be nice to get an aggregates store in there too.

I'll give that a shot once I have some more tests and a lot more polish on the code.

I'll give you some more review feedback next week.

Awesome, thanks for your help.

@bmdhacks

This comment has been minimized.

Show comment
Hide comment
@bmdhacks

bmdhacks Jul 19, 2013

Contributor

Yeah, I'm actually fixing up the aggregate store tests anyways, so feel free to wait a bit.
Also, I need to to sit down with Chris Trezzo to learn about hbase schema design to make sure it's good. Really stoked on this contribution though and I'm chomping to get some more time to help out with it.

Contributor

bmdhacks commented Jul 19, 2013

Yeah, I'm actually fixing up the aggregate store tests anyways, so feel free to wait a bit.
Also, I need to to sit down with Chris Trezzo to learn about hbase schema design to make sure it's good. Really stoked on this contribution though and I'm chomping to get some more time to help out with it.

@franklinhu

View changes

Show outdated Hide outdated zipkin-hbase/src/main/scala/com/twitter/hbase/HBaseTable.scala
@franklinhu

View changes

Show outdated Hide outdated zipkin-hbase/src/main/scala/com/twitter/hbase/HBaseTable.scala
@franklinhu

View changes

Show outdated Hide outdated ...hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseStorage.scala
@franklinhu

View changes

Show outdated Hide outdated ...hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseStorage.scala
@franklinhu

View changes

Show outdated Hide outdated ...n/scala/com/twitter/zipkin/storage/hbase/mapping/AnnotationMapping.scala
@franklinhu

This comment has been minimized.

Show comment
Hide comment
@franklinhu

franklinhu Jul 21, 2013

Contributor

Gave some cursory comments, but will sit down and do a more in-depth review in the near future.
Also, slight style nit: spaces after colons for type declarations

myThing: Array[Byte]

rather than

myThing:Array[Byte]
Contributor

franklinhu commented Jul 21, 2013

Gave some cursory comments, but will sit down and do a more in-depth review in the near future.
Also, slight style nit: spaces after colons for type declarations

myThing: Array[Byte]

rather than

myThing:Array[Byte]
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 22, 2013

Contributor

Pushed a new commit that squashed the old ones. It contains:

  • rebase to master.
  • Moved to HTablePool so that the usage inside of HBaseTable is a little clearer on why we're doing that.
  • Changed annotation index to use annotation values
  • Cleaned up most of the source code nits that @franklinhu was talking about. Thought the test code still could have some. I'll address that soon.
Contributor

elliottneilclark commented Jul 22, 2013

Pushed a new commit that squashed the old ones. It contains:

  • rebase to master.
  • Moved to HTablePool so that the usage inside of HBaseTable is a little clearer on why we're doing that.
  • Changed annotation index to use annotation values
  • Cleaned up most of the source code nits that @franklinhu was talking about. Thought the test code still could have some. I'll address that soon.
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 23, 2013

Contributor

Pushed aggregates.

Contributor

elliottneilclark commented Jul 23, 2013

Pushed aggregates.

@franklinhu

View changes

Show outdated Hide outdated ...se/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseAggregates.scala
@franklinhu

View changes

Show outdated Hide outdated ...n-hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseIndex.scala
@franklinhu

View changes

Show outdated Hide outdated ...n-hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseIndex.scala
@franklinhu

View changes

Show outdated Hide outdated ...n-hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseIndex.scala
@franklinhu

View changes

Show outdated Hide outdated ...n-hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseIndex.scala
@franklinhu

View changes

Show outdated Hide outdated ...hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseStorage.scala
@franklinhu

View changes

Show outdated Hide outdated ...hbase/src/main/scala/com/twitter/zipkin/storage/hbase/HBaseStorage.scala
@franklinhu

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@franklinhu

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@franklinhu

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@franklinhu

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@franklinhu

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@franklinhu

View changes

Show outdated Hide outdated ...main/scala/com/twitter/zipkin/storage/hbase/mapping/ServiceMapping.scala
@franklinhu

View changes

Show outdated Hide outdated ...ain/scala/com/twitter/zipkin/storage/hbase/mapping/SpanNameMapping.scala
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 23, 2013

Contributor

I pushed a new version. The mapping code still has some blocking; keeping some blocking just makes the code much cleaner and allows a fully tail recursive version of the retry loop. In addition it allows me to completely stay off of all thread executors on cache hits.

Contributor

elliottneilclark commented Jul 23, 2013

I pushed a new version. The mapping code still has some blocking; keeping some blocking just makes the code much cleaner and allows a fully tail recursive version of the retry loop. In addition it allows me to completely stay off of all thread executors on cache hits.

@mosesn

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@mosesn

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@mosesn

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@mosesn

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Jul 23, 2013

Contributor

Hmmm not sure what's going on I'll try and work out how to fix the tests. They seem like a test only issue on travis.

Contributor

elliottneilclark commented Jul 23, 2013

Hmmm not sure what's going on I'll try and work out how to fix the tests. They seem like a test only issue on travis.

@bmdhacks

View changes

Show outdated Hide outdated ...ase/src/main/scala/com/twitter/zipkin/storage/hbase/mapping/Mapper.scala
@bmdhacks

This comment has been minimized.

Show comment
Hide comment
@bmdhacks

bmdhacks Aug 1, 2013

Contributor

Oh hey, I merged my dependencies work and messed up the interface. Do you have time to rebase?

Contributor

bmdhacks commented Aug 1, 2013

Oh hey, I merged my dependencies work and messed up the interface. Do you have time to rebase?

@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Aug 1, 2013

Contributor

Sure. Here's a rebased version. It has:

  • New dependencies
  • New version of HBase
  • Better threading
  • Un constrained thread creation is a bad idea with the # of qps that hbase can generate :-)
Contributor

elliottneilclark commented Aug 1, 2013

Sure. Here's a rebased version. It has:

  • New dependencies
  • New version of HBase
  • Better threading
  • Un constrained thread creation is a bad idea with the # of qps that hbase can generate :-)
// Get the normal annotations
val annoFutures = span.annotations.filter { a =>
// skip core annotations since that query can be done by service name/span name anyway 5
!Constants.CoreAnnotations.contains(a.value)

This comment has been minimized.

@bmdhacks

bmdhacks Aug 1, 2013

Contributor

TODO: we added CoreAddress annotations in Isaac's PR so whatever gets merged first we probably want to fix the other one.

@bmdhacks

bmdhacks Aug 1, 2013

Contributor

TODO: we added CoreAddress annotations in Isaac's PR so whatever gets merged first we probably want to fix the other one.

@bmdhacks

View changes

Show outdated Hide outdated ...c/main/scala/com/twitter/zipkin/storage/hbase/utils/ByteBufferUtil.scala
@bmdhacks

View changes

Show outdated Hide outdated .../src/main/scala/com/twitter/zipkin/storage/hbase/utils/IdGenerator.scala
@bmdhacks

View changes

Show outdated Hide outdated ...-hbase/src/main/scala/com/twitter/zipkin/storage/hbase/utils/Retry.scala
@bmdhacks

View changes

Show outdated Hide outdated ...c/main/scala/com/twitter/zipkin/storage/hbase/utils/ThreadProvider.scala
@bmdhacks

This comment has been minimized.

Show comment
Hide comment
@bmdhacks

bmdhacks Aug 2, 2013

Contributor

Ok, done reviewing. Lemme know about my nits, but I'm cool with merging it as-is.

Contributor

bmdhacks commented Aug 2, 2013

Ok, done reviewing. Lemme know about my nits, but I'm cool with merging it as-is.

@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Aug 2, 2013

Contributor

Ok, I think that addresses all of your thoughts.

I added more thread pools so they can spin up faster when load is applied.

Contributor

elliottneilclark commented Aug 2, 2013

Ok, I think that addresses all of your thoughts.

I added more thread pools so they can spin up faster when load is applied.

@bmdhacks

This comment has been minimized.

Show comment
Hide comment
@bmdhacks

bmdhacks Aug 2, 2013

Contributor

Is this good to merge then from your perspective?

Contributor

bmdhacks commented Aug 2, 2013

Is this good to merge then from your perspective?

@bmdhacks

This comment has been minimized.

Show comment
Hide comment
@bmdhacks

bmdhacks Aug 2, 2013

Contributor

Some of the unit tests fail.

Contributor

bmdhacks commented Aug 2, 2013

Some of the unit tests fail.

@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Aug 2, 2013

Contributor

Whoops yeah some debugging stuff got left in there. Lets run this past ci again.

Contributor

elliottneilclark commented Aug 2, 2013

Whoops yeah some debugging stuff got left in there. Lets run this past ci again.

Add HBase backend
* Added zipkin-hbase subproject
* Created HTableFactoryWithExecutor to create HTables with a thread pool.
* Create HBaseTable, an async facade on HTable.
* Created HBaseStorage
* HBase storage puts serialized thrift into the zipkin.storage table
* Created HBaseIndex
* Created HBaseAggregates
* add the ability to globally assign an id to a string in HBase.
* Add hbase command to query and collector
* Changed indexSpanDuration to return Future[Unit].
* Allow compression on table creation
* Speed up tests by limiting tables created.
* Code clean up.
* Cap HBase threads created.
@elliottneilclark

This comment has been minimized.

Show comment
Hide comment
@elliottneilclark

elliottneilclark Aug 2, 2013

Contributor

Yep ready to go in.

Contributor

elliottneilclark commented Aug 2, 2013

Yep ready to go in.

@bmdhacks bmdhacks closed this in 3345014 Aug 5, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment