
Quantiles2 #1 (Open)

wants to merge 96 commits into base: 0.9.2
Conversation

@fundead (Owner) commented Dec 7, 2016

No description provided.

pjain1 and others added 30 commits September 23, 2016 16:53
* LRU cache: guarantee to keep size under the limit (see the sketch after this commit's notes)

* address comments

* fix failed tests in jdk7
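The patch itself isn't reproduced in this summary. As a hedged illustration of the invariant the commit above describes, the standard Java idiom for a size-capped LRU cache is a LinkedHashMap in access order whose removeEldestEntry hook evicts inside put(); the class name below is made up for the sketch and is not the Druid class.

```
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only (not the patched Druid class): a size-bounded LRU map.
// removeEldestEntry() runs inside put()/putAll(), so the map never holds
// more than maxEntries entries once the call returns.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V>
{
  private final int maxEntries;

  public BoundedLruCache(int maxEntries)
  {
    super(16, 0.75f, true); // accessOrder = true -> iteration order is LRU
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
  {
    return size() > maxEntries; // evict the least-recently-used entry
  }
}
```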
* Add support for timezone in segment granularity

* CR feedback. Handle null timezone during equals check.

* Include timezone in docs.
Add timezone for ArbitraryGranularitySpec.
close kafka consumer in case supervisor start fails
…it can be reused across multiple combine(..) calls (apache#3471)
…nested query (apache#3549)

* print exception details from QueryInterruptedException

* in QueryResource.java, set the thread name to include dataSource names rather than the whole query string, e.g. from QueryDataSource
* support finding segments in AWS S3 storage.

* add more UTs

* address comments and add a document for the feature.

* update docs indentation

* update docs indentation

* address comments.
1. add a UT for JSON ser/deser of the config object.
2. more informative error message in a UT.

* address comments.
1. use @Min to validate the configuration object
2. change updateDescriptor to a String as it does not take an argument otherwise

* fix a UT failure - delete a UT for testing default max length.
…#3499)

This is useful for the insert-segment-to-db tool, which would otherwise
potentially insert a lot of overshadowed segments as "used", causing
load and drop churn in the cluster.
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing: use IntIterator in IndexedInts instead of Iterator<Integer> (see the sketch after this list). Extract some common logic for the V9 and Legacy mergers; minor improvements to resource handling in StringDimensionMergerV9

* Don't mask index in MergeIntIterator.makeQueueElement()

* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator

* Do implement skip(n) in IntIterators extending AbstractIntIterator, because the original implementation is not reliable

* Use @Test(expected = Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
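A minimal sketch of the boxing point from the first bullet above, using plain fastutil collections rather than Druid's IndexedInts (which isn't reproduced here): Iterator<Integer>.next() goes through an Integer object for every element, while IntIterator.nextInt() stays primitive the whole way.

```
import it.unimi.dsi.fastutil.ints.IntArrayList;
import it.unimi.dsi.fastutil.ints.IntIterator;
import java.util.Iterator;

public class BoxingDemo
{
  public static void main(String[] args)
  {
    // Stand-in for a column of dictionary-encoded row values.
    IntArrayList rows = IntArrayList.wrap(new int[]{3, 1, 4, 1, 5});

    // Boxed path: every next() produces an Integer, then unboxes it.
    long boxedSum = 0;
    Iterator<Integer> boxed = rows.iterator();
    while (boxed.hasNext()) {
      boxedSum += boxed.next();
    }

    // Primitive path: nextInt() never allocates or unboxes.
    long primitiveSum = 0;
    IntIterator primitive = rows.iterator();
    while (primitive.hasNext()) {
      primitiveSum += primitive.nextInt();
    }

    System.out.println(boxedSum + " == " + primitiveSum);
  }
}
```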
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
…he#3539)

* shutdown kafka consumer on close

* handle close() race condition
…upting (apache#3534)

* allow the run thread to complete gracefully instead of interrupting it when stopGracefully() is called

* add comments
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before apache#3314. In the future we may want to consider
making a thread-safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
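A hedged sketch of the buffer-duplication idea described above, using a made-up holder class rather than the real HyperLogLogCollector: ByteBuffer.duplicate() gives the caller an independent position and limit, so concurrent position changes on the returned view cannot corrupt reads against the shared storage, even though the underlying bytes are still shared.

```
import java.nio.ByteBuffer;

// Illustrative stand-in, not Druid's HyperLogLogCollector.
class SketchHolder
{
  private final ByteBuffer storage;

  SketchHolder(ByteBuffer storage)
  {
    this.storage = storage;
  }

  // duplicate() shares the bytes but not the position/limit marks, so a
  // concurrent fold() that saves and restores position on its own view
  // can no longer race with a reader of this returned buffer.
  ByteBuffer get()
  {
    return storage.duplicate();
  }
}
```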
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()

* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()

* Remove unused import
gianm and others added 29 commits November 4, 2016 12:54
…ad of returning callable (apache#3651)

* Rename ExtractionNamespaceCacheFactory.getCachePopulator() to populateCache() and make it populate the cache itself instead of returning a Callable that populates the cache, because this "callback style" is not actually needed.

ExtractionNamespaceCacheFactory isn't a "factory", so it should be renamed too, but renaming it right in this commit would break git history tracking for those files, because the ExtractionNamespaceCacheFactory implementations already have too many changed lines. Going to rename ExtractionNamespaceCacheFactory to something like "CachePopulator" in one of the subsequent PRs.

This commit is part of a bigger refactoring of the lookup cache subsystem.

* Remove unused line and imports
Excludes tests from AvoidStaticImport, since static imports are used often there and
I didn't want to make this changeset too large. Production-code use was minimal,
and I switched those to non-static imports.
* FileSmoosher: requested changes from metamx/java-util#55

* Addressed the changes requested in code review.
If any of the bitmaps are empty, the result will be false.
…#3674)

* Use Long timestamp as key instead of DateTime.

The DateTime representation breaks map lookups when you store with one
DateTime object and read back with a different one.

For example, the code below fails when you use DateTime as the key:
```
// Prints "null" (unless the default zone happens to be America/Los_Angeles):
// DateTime.equals() and hashCode() include the chronology (and thus the time
// zone), so two DateTimes with the same millis but different zones are
// different map keys.
DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles")));
HashMap<DateTime, String> map = new HashMap<>();
map.put(odt, "abc");
DateTime dt = new DateTime(odt.getMillis()); // created in the default time zone
System.out.println(map.get(dt));
```

* Respect timezone when creating the file.

* Update docs with timezone caveat in granularity spec

* Remove unused imports
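For contrast with the failing snippet above, a hedged sketch of what the commit title proposes (odt and dt as in that snippet): keying the map on the primitive millis makes the lookup independent of time zone and chronology.

```
HashMap<Long, String> map = new HashMap<>();
map.put(odt.getMillis(), "abc");
System.out.println(map.get(dt.getMillis())); // prints "abc" in any zone
```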
* Migrating bytebuffercollections from Metamarkets.

* resolving code conflicts and removing <p> from bytebuffer-collections.
…ll druid metrics (apache#3679)

* Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics

* Remove unused imports

* Use empty string instead of "testing-version" as a version placeholder
…eFactory.populateCache() (part of apache#3667) (apache#3668)

* Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache()

* Fix tests
* Constant flattening in math expressions (see the sketch after this commit's notes)

* Addressed comments and fixed some bugs

* Addressed comments
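The PR's expression classes aren't shown in this summary. As a minimal, hypothetical sketch of what constant flattening means, here is a toy expression tree whose folder collapses all-literal subtrees into a single precomputed literal, bottom-up; none of these names are Druid's actual Expr API.

```
// Toy AST, illustrative only.
interface Expr
{
  double eval();
}

final class Literal implements Expr
{
  final double value;

  Literal(double value) { this.value = value; }

  public double eval() { return value; }
}

final class Add implements Expr
{
  final Expr left;
  final Expr right;

  Add(Expr left, Expr right) { this.left = left; this.right = right; }

  public double eval() { return left.eval() + right.eval(); }
}

final class ConstantFolder
{
  // Bottom-up fold: if both children reduce to literals, replace the whole
  // subtree with one precomputed literal, e.g. (1 + 2) + x becomes 3 + x.
  static Expr fold(Expr e)
  {
    if (e instanceof Add) {
      Expr l = fold(((Add) e).left);
      Expr r = fold(((Add) e).right);
      if (l instanceof Literal && r instanceof Literal) {
        return new Literal(l.eval() + r.eval());
      }
      return new Add(l, r);
    }
    return e;
  }
}
```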
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
Also excludes the correct artifacts from apache#2741
…n tests. (apache#3698)

This also involved some other test changes:

- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
  engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
  apache/datasketches-java#116 for datasketches tests.
…tion (apache#3678)

* option to reset offsets automatically in case of OffsetOutOfRangeException, if the next offset is less than the earliest offset available for that partition (a sketch of this behavior follows below)

* review comments

* refactoring

* refactor

* review comments
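A hedged sketch of the behavior this commit describes, written against the plain kafka-clients consumer API rather than the actual Druid indexing task: consumer, partition, and nextOffset are assumed to come from the surrounding code, and beginningOffsets assumes a reasonably recent kafka-clients version. On OffsetOutOfRangeException, seek to the earliest offset still available and retry, but only when the desired offset has genuinely fallen behind retention.

```
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;
import org.apache.kafka.common.TopicPartition;

class ResetOnOutOfRange
{
  ConsumerRecords<byte[], byte[]> pollWithReset(
      KafkaConsumer<byte[], byte[]> consumer,
      TopicPartition partition,
      long nextOffset
  )
  {
    try {
      return consumer.poll(100);
    }
    catch (OffsetOutOfRangeException e) {
      // Thrown when auto.offset.reset=none and our offset aged out of retention.
      long earliest = consumer.beginningOffsets(Collections.singletonList(partition))
                              .get(partition);
      if (nextOffset < earliest) {
        consumer.seek(partition, earliest); // reset and resume from the earliest data
        return consumer.poll(100);
      }
      throw e; // some other out-of-range condition: surface it
    }
  }
}
```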