
Quantiles2 #1 (Open)

wants to merge 96 commits into base: 0.9.2
Conversation

@fundead (Owner) commented Dec 7, 2016

No description provided.

pjain1 and others added 30 commits September 23, 2016 16:53
* LRU cache: guarantee to keep size under the limit (see the sketch after this commit's notes)

* address comments

* fix failed tests in jdk7
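The patch itself isn't reproduced in this summary. As a hedged illustration of the invariant the commit above describes, the standard Java idiom for a size-capped LRU cache is a LinkedHashMap in access order whose removeEldestEntry hook evicts inside put(); the class name below is made up for the sketch and is not the Druid class.

```
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only (not the patched Druid class): a size-bounded LRU map.
// removeEldestEntry() runs inside put()/putAll(), so the map never holds
// more than maxEntries entries once the call returns.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V>
{
  private final int maxEntries;

  public BoundedLruCache(int maxEntries)
  {
    super(16, 0.75f, true); // accessOrder = true -> iteration order is LRU
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
  {
    return size() > maxEntries; // evict the least-recently-used entry
  }
}
```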
* Add support for timezone in segment granularity

* CR feedback. Handle null timezone during equals check.

* Include timezone in docs.
Add timezone for ArbitraryGranularitySpec.
close kafka consumer in case supervisor start fails
…it can be reused across multiple combine(..) calls (apache#3471)
…nested query (apache#3549)

* print exception details from QueryInterruptedException

* in QueryResource.java, set the thread name to include dataSource names rather than the whole query string, e.g. from QueryDataSource
* support finding segments in AWS S3 storage.

* add more UTs

* address comments and add a document for the feature.

* update docs indentation

* update docs indentation

* address comments.
1. add a UT for JSON ser/deser of the config object.
2. more informative error message in a UT.

* address comments.
1. use @Min to validate the configuration object
2. change updateDescriptor to a String as it does not take an argument otherwise

* fix a UT failure - delete a UT for testing default max length.
…#3499)

This is useful for the insert-segment-to-db tool, which would otherwise
potentially insert a lot of overshadowed segments as "used", causing
load and drop churn in the cluster.
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing: use IntIterator in IndexedInts instead of Iterator<Integer> (see the sketch after this list). Extract some common logic for the V9 and Legacy mergers; minor improvements to resource handling in StringDimensionMergerV9

* Don't mask index in MergeIntIterator.makeQueueElement()

* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator

* Do implement skip(n) in IntIterators extending AbstractIntIterator, because the original implementation is not reliable

* Use @Test(expected = Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
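A minimal sketch of the boxing point from the first bullet above, using plain fastutil collections rather than Druid's IndexedInts (which isn't reproduced here): Iterator<Integer>.next() goes through an Integer object for every element, while IntIterator.nextInt() stays primitive the whole way.

```
import it.unimi.dsi.fastutil.ints.IntArrayList;
import it.unimi.dsi.fastutil.ints.IntIterator;
import java.util.Iterator;

public class BoxingDemo
{
  public static void main(String[] args)
  {
    // Stand-in for a column of dictionary-encoded row values.
    IntArrayList rows = IntArrayList.wrap(new int[]{3, 1, 4, 1, 5});

    // Boxed path: every next() produces an Integer, then unboxes it.
    long boxedSum = 0;
    Iterator<Integer> boxed = rows.iterator();
    while (boxed.hasNext()) {
      boxedSum += boxed.next();
    }

    // Primitive path: nextInt() never allocates or unboxes.
    long primitiveSum = 0;
    IntIterator primitive = rows.iterator();
    while (primitive.hasNext()) {
      primitiveSum += primitive.nextInt();
    }

    System.out.println(boxedSum + " == " + primitiveSum);
  }
}
```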
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
…he#3539)

* shutdown kafka consumer on close

* handle close() race condition
…upting (apache#3534)

* allow the run thread to complete gracefully instead of interrupting it when stopGracefully() is called

* add comments
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before apache#3314. In the future we may want to consider
making a thread-safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
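A hedged sketch of the buffer-duplication idea described above, using a made-up holder class rather than the real HyperLogLogCollector: ByteBuffer.duplicate() gives the caller an independent position and limit, so concurrent position changes on the returned view cannot corrupt reads against the shared storage, even though the underlying bytes are still shared.

```
import java.nio.ByteBuffer;

// Illustrative stand-in, not Druid's HyperLogLogCollector.
class SketchHolder
{
  private final ByteBuffer storage;

  SketchHolder(ByteBuffer storage)
  {
    this.storage = storage;
  }

  // duplicate() shares the bytes but not the position/limit marks, so a
  // concurrent fold() that saves and restores position on its own view
  // can no longer race with a reader of this returned buffer.
  ByteBuffer get()
  {
    return storage.duplicate();
  }
}
```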
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()

* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()

* Remove unused import
gianm and others added 29 commits November 4, 2016 12:54
…ad of returning callable (apache#3651)

* Rename ExtractionNamespaceCacheFactory.getCachePopulator() to populateCache() and make it populate the cache itself instead of returning a Callable that populates the cache, because this "callback style" is not actually needed.

ExtractionNamespaceCacheFactory isn't a "factory", so it should be renamed too, but renaming it right in this commit would break git history tracking for those files, because the ExtractionNamespaceCacheFactory implementations already have too many changed lines. Going to rename ExtractionNamespaceCacheFactory to something like "CachePopulator" in one of the subsequent PRs.

This commit is part of a bigger refactoring of the lookup cache subsystem.

* Remove unused line and imports
Excludes tests from AvoidStaticImport, since static imports are used often there and
I didn't want to make this changeset too large. Production-code use was minimal,
and I switched those to non-static imports.
* FileSmoosher: requested changes from metamx/java-util#55

* Addressed the changes requested in code review.
If any of the bitmaps are empty, the result will be false.
…#3674)

* Use Long timestamp as key instead of DateTime.

The DateTime representation breaks map lookups when you store with one
DateTime object and read back with a different one.

For example, the code below fails when you use DateTime as the key:
```
// Prints "null" (unless the default zone happens to be America/Los_Angeles):
// DateTime.equals() and hashCode() include the chronology (and thus the time
// zone), so two DateTimes with the same millis but different zones are
// different map keys.
DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles")));
HashMap<DateTime, String> map = new HashMap<>();
map.put(odt, "abc");
DateTime dt = new DateTime(odt.getMillis()); // created in the default time zone
System.out.println(map.get(dt));
```

* Respect timezone when creating the file.

* Update docs with timezone caveat in granularity spec

* Remove unused imports
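For contrast with the failing snippet above, a hedged sketch of what the commit title proposes (odt and dt as in that snippet): keying the map on the primitive millis makes the lookup independent of time zone and chronology.

```
HashMap<Long, String> map = new HashMap<>();
map.put(odt.getMillis(), "abc");
System.out.println(map.get(dt.getMillis())); // prints "abc" in any zone
```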
* Migrating bytebuffercollections from Metamarkets.

* resolving code conflicts and removing <p> from bytebuffer-collections.
…ll druid metrics (apache#3679)

* Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics

* Remove unused imports

* Use empty string instead of "testing-version" as a version placeholder
…eFactory.populateCache() (part of apache#3667) (apache#3668)

* Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache()

* Fix tests
* Constant flattening in math expressions (see the sketch after this commit's notes)

* Addressed comments and fixed some bugs

* Addressed comments
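The PR's expression classes aren't shown in this summary. As a minimal, hypothetical sketch of what constant flattening means, here is a toy expression tree whose folder collapses all-literal subtrees into a single precomputed literal, bottom-up; none of these names are Druid's actual Expr API.

```
// Toy AST, illustrative only.
interface Expr
{
  double eval();
}

final class Literal implements Expr
{
  final double value;

  Literal(double value) { this.value = value; }

  public double eval() { return value; }
}

final class Add implements Expr
{
  final Expr left;
  final Expr right;

  Add(Expr left, Expr right) { this.left = left; this.right = right; }

  public double eval() { return left.eval() + right.eval(); }
}

final class ConstantFolder
{
  // Bottom-up fold: if both children reduce to literals, replace the whole
  // subtree with one precomputed literal, e.g. (1 + 2) + x becomes 3 + x.
  static Expr fold(Expr e)
  {
    if (e instanceof Add) {
      Expr l = fold(((Add) e).left);
      Expr r = fold(((Add) e).right);
      if (l instanceof Literal && r instanceof Literal) {
        return new Literal(l.eval() + r.eval());
      }
      return new Add(l, r);
    }
    return e;
  }
}
```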
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
Also excludes the correct artifacts from apache#2741
…n tests. (apache#3698)

This also involved some other test changes:

- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
  engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
  apache/datasketches-java#116 for datasketches tests.
…tion (apache#3678)

* option to reset offsets automatically in case of OffsetOutOfRangeException, if the next offset is less than the earliest offset available for that partition (a sketch of this behavior follows below)

* review comments

* refactoring

* refactor

* review comments
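A hedged sketch of the behavior this commit describes, written against the plain kafka-clients consumer API rather than the actual Druid indexing task: consumer, partition, and nextOffset are assumed to come from the surrounding code, and beginningOffsets assumes a reasonably recent kafka-clients version. On OffsetOutOfRangeException, seek to the earliest offset still available and retry, but only when the desired offset has genuinely fallen behind retention.

```
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;
import org.apache.kafka.common.TopicPartition;

class ResetOnOutOfRange
{
  ConsumerRecords<byte[], byte[]> pollWithReset(
      KafkaConsumer<byte[], byte[]> consumer,
      TopicPartition partition,
      long nextOffset
  )
  {
    try {
      return consumer.poll(100);
    }
    catch (OffsetOutOfRangeException e) {
      // Thrown when auto.offset.reset=none and our offset aged out of retention.
      long earliest = consumer.beginningOffsets(Collections.singletonList(partition))
                              .get(partition);
      if (nextOffset < earliest) {
        consumer.seek(partition, earliest); // reset and resume from the earliest data
        return consumer.poll(100);
      }
      throw e; // some other out-of-range condition: surface it
    }
  }
}
```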