Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update trunk #6

Merged
merged 175 commits into from
Sep 3, 2019
Merged

update trunk #6

merged 175 commits into from
Sep 3, 2019

Conversation

khaireddine120
Copy link
Owner

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

khaireddine120 and others added 30 commits June 25, 2019 18:22
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
…ker.protocol and inter.broker.listener.name depending on kafka version (#7000)

Reviewers: Brian Bushree <bbushree@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
The purpose here is to leverage static membership information during round robin consumer assignment, because persistent member id could help make the assignment remain the same during rebalance.
The comparison logic is changed to:

1. If member A and member B both have group.instance.id, then compare their group.instance.id
2. If member A has group.instance.id, while member B doesn't, then A < B
3. If both member A and B don't have group.instance.id, compare their member.id

In round robin assignor, we use ephemeral member.id to sort the members in order for assignment. This semantic is not stable and could trigger unnecessary shuffle of tasks. By leveraging group.instance.id the static member assignment shall be persist when satisfying following conditions:

1. number of members remain the same across generation
2. static members' identities persist across generation

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
…6957)

Include group.instance.id in the describe group result for better visibility.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Reviewers: David Arthur <mumrah@gmail.com>
cleanup of some redundant checkstyle
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Followers should cache the log offset metadata for the start offset of each transaction in order to be able to compute the last stable offset without an offset index lookup. This is needed for follower fetching in KIP-392.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
Due to the accidental duplication of `case e: ExecutionException`,
the verification code was not actually running.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, David Arthur <mumrah@gmail.com>
Scala 2.13 support was added to build via #5454. This PR adjusts the code so that
it compiles with 2.11, 2.12 and 2.13.

Changes:
* Add `scala-collection-compat` dependency.
* Import `scala.collection.Seq` in a number of places for consistent behavior between
Scala 2.11, 2.12 and 2.13.
* Remove wildcard imports that were causing the Java classes to have priority over the
Scala ones, related Scala issue: scala/scala#6589.
* Replace parallel collection usage with `Future`. The former is no longer included by
default in the standard library.
* Replace val _: Unit workaround with one that is more concise and works with Scala 2.13
* Replace `filterKeys` with `filter` when we expect a `Map`. `filterKeys` returns a view
that doesn't implement the `Map` trait in Scala 2.13.
* Replace `mapValues` with `map` or add a `toMap` as an additional transformation
when we expect a `Map`. `mapValues` returns a view that doesn't implement the
`Map` trait in Scala 2.13.
* Replace `breakOut` with `iterator` and `to`, `breakOut` was removed in Scala
2.13.
* Replace to() with toMap, toIndexedSeq and toSet
* Replace `mutable.Buffer.--` with `filterNot`.
* ControlException is an abstract class in Scala 2.13.
* Variable arguments can only receive arrays or immutable.Seq in Scala 2.13.
* Use `Factory` instead of `CanBuildFrom` in DecodeJson. `CanBuildFrom` behaves
a bit differently in Scala 2.13 and it's been deprecated. `Factory` has the behavior
we need and it's available via the compat library.
* Fix failing tests due to behavior change in Scala 2.13,
"Map.values.map is not strict in Scala 2.13" (scala/bug#11589).
* Use Java collections instead of Scala ones in StreamResetter (a Java class).
* Adjust CheckpointFile.write to take an `Iterable` instead of `Seq` to avoid
unnecessary collection copies.
* Fix DelayedElectLeader to use a Map instead of Set and avoid `to` call that
doesn't work in Scala 2.13.
* Use unordered map for mapping in SimpleAclAuthorizer, mapping of ordered
maps require an `Ordering` in Scala 2.13 for safety reasons.
* Adapt `ConsumerGroupCommand` to compile with Scala 2.13.
* CoreUtils.min takes an `Iterable` instead of `TraversableOnce`, the latter does
not exist in Scala 2.13.
* Replace `Unit` with `()` in a couple places. Scala 2.13 is stricter when it expects
a value instead of a type.
* Fix bug in CustomQuotaCallbackTest where we did not necessarily set `partitionRatio`
correctly, `forall` can terminate early.
* Add a couple of spotbugs exclusions that are needed by code generated by Scala 2.13
* Remove unused variables, simplify some code and remove procedure syntax in a few
places.
* Remove unused `CoreUtils.JSONEscapeString`.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
…ures

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Gwen Shapira

Closes #6966 from cmccabe/KAFKA-8560
… compatibility test (#7023)

Connect tests were using String version for KafkaService instead of the expected KafkaVersion object. This broke due to recent changes to KafkaVersion. It turns out that the tests with String version were running compatibility tests against `dev` brokers rather than the older broker versions they were expecting to run against. When version was fixed, tests using 0.9.0.1 brokers started failing since new clients are not compatible with 0.9.0.1 brokers. So this PR fixes version parameter and removes the two tests against 0.9.0.1 brokers.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
…7010)

Leaders should make changes to the assignment and the ISR at the same time as part of processing the LeaderAndIsr requests. The leader should also preserve the order of assignment mainly for consistency with the Controller's code and data representation.

Reviewers: Vikas Singh, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
KIP-91 was included in Kafka 2.1.0, so we should mention
`delivery.timeout.ms` in the hint as it's the config that
users would want to change in most cases.

Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
#7028)

`EmbeddedConnectCluster` has the ability to mask system exits to avoid killing the jvm. It appears that the default was intended to be `true`, but is actually `false`. The `maskExitProcedures` method on `EmbeddedConnectCluster.Builder` documents the parameter as:

```
* @param mask if false, exit and halt procedures remain unchanged; true is the default.
```
Because this is not enabled by default as intended, we are seeing some build failures which exit abruptly:
```
17:29:11 Execution failed for task ':connect:runtime:integrationTest'.
17:29:11 > Process 'Gradle Test Executor 25' finished with non-zero exit value 1
```
The culprit often appears to be `ExampleConnectIntegrationTest`, which indeed does not override the default value of `maskExitProcedures`.

Reviewers: Ewen Cheslack-Postava <me@ewencp.org>
…rrides (#7018)

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Corrected language error which was confusing.

Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
…7006)

The patch clarifies the exception message for unknown key versions when loading from the group metadata topic. The patch also makes a trivial change in `KafkaAdminClient` to use `Map.computeIfAbsent`. 

Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
The Pipe.java file should exist within the myapps package directory.

Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
…6937)

Reviewers: Ismael Juma <ismael@juma.me.uk>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
…Bench

Add a "useConfiguredPartitioner" boolean to specify testing with the configured partitioner, rather than overriding the partitioner in the test.

Add a "skipFlush" boolean to specify skipping the flush operation when producing.  This is helpful when testing some scenarios where linger.ms is greater than 0.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
This PR fixes a wrong input stream name in PipeDemo's javadoc.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Jason Gustafson <jason@confluent.io>
Follow on to #6731, this PR adds broker-side support for [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica) (fetch from followers). 

Changes:
* All brokers will handle FetchRequest regardless of leadership
* Leaders can compute a preferred replica to return to the client
* New ReplicaSelector interface for determining the preferred replica
* Incremental fetches will include partitions with no records if the preferred replica has been computed
* Adds new JMX to expose the current preferred read replica of a partition in the consumer

Two new conditions were added for completing a delayed fetch. They both relate to communicating the high watermark to followers without waiting for a timeout:
* For regular fetches, if the high watermark changes within a single fetch request 
* For incremental fetch sessions, if the follower's high watermark is lower than the leader

A new JMX attribute `preferred-read-replica` was added to the `kafka.consumer:type=consumer-fetch-manager-metrics,client-id=some-consumer,topic=my-topic,partition=0` object. This was added to support the new system test which verifies that the fetch from follower behavior works end-to-end. This attribute could also be useful in the future when debugging problems with the consumer.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
We recently saw a few failing tests recently due to the static reliance on port 5000. For example:
```
org.apache.kafka.common.KafkaException: Socket server failed to bind to localhost:5000: Address already in use.
	at kafka.network.Acceptor.openServerSocket(SocketServer.scala:605)
	at kafka.network.Acceptor.<init>(SocketServer.scala:481)
	at kafka.network.SocketServer.createAcceptor(SocketServer.scala:253)
	at kafka.network.SocketServer.$anonfun$createControlPlaneAcceptorAndProcessor$1(SocketServer.scala:234)
	at kafka.network.SocketServer.$anonfun$createControlPlaneAcceptorAndProcessor$1$adapted(SocketServer.scala:232)
	at scala.Option.foreach(Option.scala:438)
	at kafka.network.SocketServer.createControlPlaneAcceptorAndProcessor(SocketServer.scala:232)
	at kafka.network.SocketServer.startup(SocketServer.scala:119)
	at kafka.network.SocketServerTest.withTestableServer(SocketServerTest.scala:1139)
	at kafka.network.SocketServerTest.testControlPlaneRequest(SocketServerTest.scala:198)
```
This patch fixes the failing tests to dynamically select the port.

Reviewers: Ismael Juma <ismael@juma.me.uk>
ZkUtils was removed so we don't need this anymore.

Also:
* Fix ZkSecurityMigrator and ReplicaManagerTest not to
reference ZkClient classes.
* Remove references to zkclient in various `log4j.properties`
and `import-control.xml`.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
Users often use the RocksDBConfigSetter to modify parameters such as cache or block size, which must be set through the BlockBasedTableConfig object. Rather than creating a new object in the config setter, however, users should most likely retrieve a reference to the existing one so as to not lose the other defaults (eg the BloomFilter)

There have been notes from the community that it is not obvious this should be done, nor is it immediately clear how to do so. This PR updates the RocksDBConfigSetter docs to hopefully improve things.

I also piggybacked a few minor cleanups in the docs

Reviewers: Kamal Chandraprakash, Jim Galasyn <jim.galasyn@confluent.io>,  Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
brianbushree and others added 29 commits August 19, 2019 17:12
Author: Brian Bushree <bbushree@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #7138 from brianbushree/update-ducktape
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@confluent.io>, Guozhang Wang <guozhang@confluent.io>
… errors (#7176)

This patch fixes a bug in the handling of MESSAGE_TOO_LARGE errors. The large batch is split, the smaller batches are re-added to the accumulator, and the batch is deallocated, but it was not removed from the list of in-flight batches. When the batch was eventually expired from the in-flight batches, the producer would try to deallocate it a second time, causing an error. This patch changes the behavior to correctly remove the batch from the list of in-flight requests.

Reviewers: Luke Stephenson, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
In a KTable context, we should not pass null into a user-supplied serde.

Testing: I verified that the change to the test results in test failures without the patch.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>,
Author: asutosh936 <asutosh.pandya@hotmail.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Vahid Hashemian <vahid.hashemian@gmail.com>, Jason Gustafson <jason@confluent.io>

Closes #7141 from asutosh936/KAFKA-8698
This is the implementation for [KIP-503](https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion)

When deleting a large number of topics, the Controller can get quite bogged down. One problem with this is the lack of visibility into the progress of the Controller. We can look into the ZK path for topics marked for deletion, but in a production environment this is inconvenient. This PR adds a JMX metric `kafka.controller:type=KafkaController,name=TopicsToDeleteCount` to make it easier to see how many topics are being deleted.

Reviewers: Stanislav Kozlovski <stanislav@confluent.io>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Move the error code resetting logic from the onPartitionsRevoked callback into the streamthread directly after we've decided to rejoin the group, since onPartitionsRevoked are not guaranteed to be triggered.

Ran system tests on the originally failed StreamsUpgradeTest 10 times and passed.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Jun Rao <junrao@gmail.com>
)

Changed Connect's `WorkerSourceTask` to capture non-retriable exceptions from the `producer.send(...)` (e.g., authentication or authorization errors) and to fail the connector task when such an error is encountered. Modified the existing unit tests to verify this functionality.

Note that most producer errors are retriable, and Connect will (by default) set up each producer with 1 max in-flight message and infinite retries. This change only affects non-retriable errors.
I've spent quite a bit of time on trying to discover the root cause, with no luck so far. I have been able to reproduce it locally by running the following 100 times:
```
./gradlew connect:runtime:clean connect:runtime:test --tests org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest
```
The `testReconfigConnector` test failed 28% of the time and the others failed 0%. This issue and KAFKA-8661 suggest that `testDeleteConnector` and `testStartTwoConnectors` are also flaky, though I've not seen those tests fail locally.

Because this flakiness is causing issues for the rest of the project, I'm going to temporarily ignore several of the flaky ITs while I continue to investigate:
* `RebalanceSourceConnectorsIntegrationTest.testReconfigConnector`
* `RebalanceSourceConnectorsIntegrationTest.testDeleteConnector`
* `RebalanceSourceConnectorsIntegrationTest.testStartTwoConnectors`

**This should be backported to the `2.3` branch, which is when these integration tests were first added.**

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Ismael Juma

Closes #7237 from rhauch/kafka-8391-temporary
#7242)

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
…ng producers (#7207)

Prior to this change an NPE is raised when calling AssignedTasks.close
under the following conditions:

1. EOS is enabled
2. The task was in a suspended state

The cause for the NPE is that when a clean close is requested for a
StreamTask the StreamTask tries to commit. However, in the suspended
state there is no producer so ultimately an NPE is thrown for the
contained RecordCollector in flush.

The fix put forth in this commit is to have AssignedTasks call
closeSuspended when it knows the underlying StreamTask is suspended.

Note also that this test is quite involved. I could have just tested
that AssignedTasks calls closeSuspended when appropriate, but that is
testing, IMO, a detail of the implementation and doesn't actually verify
we reproduced the original problem as it was described. I feel much more
confident that we are reproducing the behavior - and we can test exactly
the conditions that lead to it - when testing across AssignedTasks and
StreamTask. I believe this is an additional support for the argument of
eventually consolidating the state split across classes.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
* Adds custom provider class to security config 
* Implementation of KIP-492
Reviewers: Sriharsha Chintalapani <sriharsha@apache.org> , Jeff Huang
#7223)

Make offsets immutable to users of RecordCollector.offsets. Fix up an
existing case where offsets could be modified in this way. Add a simple
test to verify offsets cannot be changed externally.

Reviewers: Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
RocksDB metrics are added to the Kafka metrics. For each segmented state store only
one set of metrics is exposed rather than one set of metrics for each segment.

The metrics are not computed yet.

Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
When sending bad records, the Trogdor task will fail if the final record produced is bad. Instead we should catch the exception to allow the task to finish since sending bad records is a valid use case.

Reviewers: Tu V. Tran <tuvtran97@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
…ll (#7152)

Make sure to show the message key, even when the message value is null.

This changes the output of one of the tools. Is the output of the tool considered a public API? Does this need a discussion or a KIP?

Testing: Ran the tool on a compacted topic. Previously, the tool did not show any message keys for tombstone messages (messages where the value is null). Now, the tool shows message keys.

Reviewers: Mickael Maison <mimaison@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
Splits the existing StickyAssignor logic into an AbstractStickyAssignor class, which is extended by the existing (eager) StickyAssignor and by the new CooperativeStickyAssignor which supports incremental cooperative rebalancing.

There is no actual change to the logic -- most methods from StickyAssignor were moved to AbstractStickyAssignor to be shared with CooperativeStickyAssignor, and the abstract MemberData memberData(Subscription) method converts the Subscription to the embedded list of owned partitions for each assignor.

The "generation" logic is left in, however this is always Optional.empty() for the CooperativeStickyAssignor as onPartitionsLost should always be called when a generation is missed.

Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch fixes the quota system test whose JMX tool relies on the existence
of these metrics.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Nikhil Bhatia <nikhil@confluent.io>, Tu V. Tran <tuvtran97@gmail.com>, Ismael Juma <ismael@juma.me.uk>
…erTestHarness (#7255)

- Call `assertNoNonDaemonThreads` in test method instead of tear down method
to avoid situation where parent's class tear down is not invoked.
- Pass the thread prefix in tests that call `assertNoNonDaemonThreads` so that it
works correctly.
- Rename `verifyNonDaemonThreadsStatus` to `assertNoNonDaemonThreads` to
make it clear that it may throw.

Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Currently on commit streams will attempt to delete offsets from repartition topics. However, if a topology does not have any repartition topics, then the recordsToDelete map will be empty.

This PR adds a check that the recordsToDelete is not empty before executing the AdminClient#deleteRecords() method.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
…7253)

Reviewers: cpettitt-confluent <53191309+cpettitt-confluent@users.noreply.github.com>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
As part of commit 4d1ee26 streams
version 2.3.0 test jar was added, but there was a simple typo in the
path that specified the version.

`ducker-ak up` was failing because of that. Fixed that.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
The jmh LRUCacheBenchmark will exhibit an int overflow when run on a fast machine:

```
java.lang.ArrayIndexOutOfBoundsException: Index -3648 out of bounds for length 10000
	at org.apache.kafka.jmh.cache.LRUCacheBenchmark.testCachePerformance(LRUCacheBenchmark.java:70)
	at org.apache.kafka.jmh.cache.generated.LRUCacheBenchmark_testCachePerformance_jmhTest.testCachePerformance_thrpt_jmhStub(LRUCacheBenchmark_testCachePerformance_jmhTest.java:119)
	at org.apache.kafka.jmh.cache.generated.LRUCacheBenchmark_testCachePerformance_jmhTest.testCachePerformance_Throughput(LRUCacheBenchmark_testCachePerformance_jmhTest.java:83)
```

Reviewers: Jason Gustafson <jason@confluent.io>
The tag key for store level metrics specified in StreamsMetricsImpl
is unified with the tag keys on thread and task level.

Reviewers: Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
In case of version probing we would skip the logic for setting cluster / assigned tasks; since these values are initialized as null they are vulnerable to NPE when code changes.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>
…running multiple iterations (#7260)

Fix a bug where ClientCompatibilityFeaturesTest fails when running multiple iterations.

Also, fix a typo in tests/docker/Dockerfile.

Reviewers: Ismael Juma <ismael@juma.me.uk>
New Java Authorizer API and a new out-of-the-box authorizer (AclAuthorizer) that implements the new interface.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Author: teebee <tb@teebee.de>

Reviewers: Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #7140 from teebee/teebee/ssl-principal-mapping-rules-handling
@khaireddine120 khaireddine120 merged commit 535e402 into khaireddine120:trunk Sep 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet