Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration tests #5

Merged
merged 1 commit into from Sep 19, 2016

Conversation

Projects
None yet
2 participants
@draganm
Copy link
Contributor

commented Sep 8, 2016

Ginko/Gomega integration tests.
Setup

$ go get github.com/onsi/ginkgo/ginkgo
$ go get github.com/onsi/gomega

Running:

$ ginkgo integration

One test is pending - creating topic returns wrong Content-Type ATM

@draganm draganm force-pushed the draganm:integration_tests branch 2 times, most recently from 1c0095d to 5f6b047 Sep 8, 2016

})

AfterEach(func() {
response.Body.Close()

This comment has been minimized.

Copy link
@bictorman

bictorman Sep 8, 2016

Contributor

usually I use this little function everywhere to errcheck deferred close https://github.com/ninibe/netlog/blob/master/util.go#L10

I should have put it in a package, but I don't want to create a package just for that one, so it's a bit repeated in a few places, you could do the same here.

This comment has been minimized.

Copy link
@draganm

draganm Sep 8, 2016

Author Contributor

tbh. This close failing won't damage anything hence didn't bother with checking the error ...

This comment has been minimized.

Copy link
@bictorman

bictorman Sep 8, 2016

Contributor

sure, here the error doesn’t matter, but the errcheck in travis is not smart enough to figure that, and the build also fails here. In this particular case you can as well do _ = response.Body.Close()

@bictorman

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2016

Looking really good!

And looks like you also found a data race, should I handle it? or do you want to fix it yourself?

FYI I started to create a Go client in this branch which includes some refactoring (message package) that will conflict with this, but should be easy to fix, and this will get merged first anyway.

@draganm

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2016

Re. race ... was actually wondering during your talk too - is the index really necessary? I mean, Segments are more or less of constant size and sorted by time. Since messages have different sizes, binary search is not feasible, but linear scan is. There is a whole rant at Kafka's design page (https://kafka.apache.org/08/design.html) about performance of linear disk reads. Given constant size (c) of a segment then finding the message with a given timestamp would be O(c) which is O(1).
In theory every segment's file name could be the timestamp of the first message in it, so selecting in which file to search is relatively easy - in worst case O(log n).
Would enable us to delete the whole index managing code, meaning that the data race would be gone too :).

@bictorman

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2016

The index has been there since the very beginning, the only reason is that I saw kafka doing it and since it seemed easy to replicate I didn’t question it. Then I started to wonder too, but I left it there assuming they knew what they where doing and over time I started to see the point.

Aside from the faster offset lookup it has some other upsides, like allowing BigLog to be completely format agnostic, and additionally it got the timestamps in it which I think is nice to keep separated from the actual data.

But where “I believe” the index is going to make the real difference is streaming (wip)

While performance is stated as a non goal, scanning one by one is impossible to achieve any meaningful throughput the moment there’s a little bit of network latency.

Streaming works similar to how kafka actually works, the clients says: give me as much as you can up to N offsets and up to X bytes of data. And the servers sends them all at once.

The streamer just needs to look at the index to calculate what it needs to send, and pass a LimitedReader directly over the data file to the ResponseWriter. This is more efficient than loading into memory all requested data before starting to reply, maybe even to just respond with an error.

Notice also that for performance messages are also batched and compressed before being written.

That being said, I’d be curious to see an alternative index-less version and compare the two, but to be a fair comparison all features need to be implemented first.

It’s be also interesting to see how a regular index file instead of a pre-allocated memory-mapped would work.


. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"
"github.com/satori/go.uuid"

This comment has been minimized.

Copy link
@bictorman

bictorman Sep 9, 2016

Contributor

could we reuse the UUID lib that we have already in vendor?
github.com/comail/go-uuid/uuid

it's a mirror of the old UUID lib in google code

@bictorman bictorman referenced this pull request Sep 14, 2016

Closed

segment headers #7

@bictorman

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2016

FYI the data race is now fixed in master.
The scanner test also needs a tiny adjustment due to #4

@draganm draganm force-pushed the draganm:integration_tests branch 2 times, most recently from 4bb18be to adeb591 Sep 19, 2016

basic integration tests
integration tests vendor files

CI build fix

@draganm draganm force-pushed the draganm:integration_tests branch from adeb591 to adedb12 Sep 19, 2016

@draganm

This comment has been minimized.

Copy link
Contributor Author

commented Sep 19, 2016

Thanks, I've fixed the rest of the tests CI should be passing now

@draganm

This comment has been minimized.

Copy link
Contributor Author

commented Sep 19, 2016

Re index. Something like tar format would IMHO remove the need for index ... so every segment file would have format
HEADER->PAYLOAD->HEADER->PAYLOAD ...
HEADER contains meta information about the payload, such as creation time and the length of the payload, making it easy to skip/seek to the next header.

@bictorman

This comment has been minimized.

Copy link
Contributor

commented Sep 19, 2016

Great, welcome to the list of contributors :)

index: We kind of have that format already (except the timestamp), from the netlog side. If you want to experiment, it shouldn't be too difficult to implement an alternative streamer without the index and run some benchmarks.

If you decide to do so, I'd recommend you to base it on this branch where I'm slowly implementing a Go client, since there the message format has already been spin off into its own package and you won't have circular dependency problems.

I still think the index is going to perform much better (specially in traditional disks), take into account that you need to inform the client about what you are actually sending in the HTTP headers before you start writing the response. You could use HTTP trailers instead, but I fear that would make developing clients more difficult (many people hasn't even heard of HTTP trailers).

Maybe I'm wrong though. Or maybe with/without index could be a setting.

@bictorman bictorman merged commit 38e96d0 into ninibe:master Sep 19, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.