ThetaSketch Ingestion #38

Closed
alon-blum opened this issue Jun 5, 2023 · 2 comments

Comments

@alon-blum

alon-blum commented Jun 5, 2023

Is ThetaSketch ingestion supported?

If the ThetaSketch objects have already been created in the Spark DataFrame using the DataSketches library and I try to ingest them, I get the following error.
Are there any dependencies I need to add?

Job aborted due to stage failure: Task 0 in stage 14.0 failed 4 times, most recent failure: Lost task 0.3 in stage 14.0 (TID 127905) (ip-10-232-14-117.ec2.internal executor 127): java.lang.IllegalArgumentException: Failed to deserialize from metricsSpec=[
        {
          "type": "thetaSketch",
          "name": "user_id_sketch",
          "fieldName": "uid",
          "size": 4096,
          "shouldFinalize": true,
          "isInputThetaSketch": true,
          "errorBoundsStdDev": null
        },
        {
          "type": "longMin",
          "name": "min_timestamp",
          "fieldName": "unix_timestamp_min",
          "expression": null
        },
        {
          "type": "longMax",
          "name": "max_timestamp",
          "fieldName": "unix_timestamp_max",
          "expression": null
        }
      ]
	at com.rovio.ingest.model.SegmentSpec.getAggregators(SegmentSpec.java:240)
	at com.rovio.ingest.model.SegmentSpec.getDataSchema(SegmentSpec.java:158)
	at com.rovio.ingest.TaskDataWriter.<init>(TaskDataWriter.java:97)
	at com.rovio.ingest.DruidDataSourceWriter$TaskWriterFactory.createWriter(DruidDataSourceWriter.java:106)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:430)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:381)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:138)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.fasterxml.jackson.databind.exc.InvalidTypeIdException: Please make sure to load all the necessary extensions and jars with type 'thetaSketch'. Could not resolve type id 'thetaSketch' as a subtype of `org.apache.druid.query.aggregation.AggregatorFactory` known type ids = [cardinality, count, doubleAny, doubleFirst, doubleLast, doubleMax, doubleMean, doubleMin, doubleSum, expression, filtered, floatAny, floatFirst, floatLast, floatMax, floatMin, floatSum, grouping, histogram, hyperUnique, javascript, longAny, longFirst, longLast, longMax, longMin, longSum, stringAny, stringFirst, stringFirstFold, stringLast, stringLastFold]
 at [Source: (String)"[
        {
          "type": "thetaSketch",
          "name": "user_id_sketch",
          "fieldName": "uid",
          "size": 4096,
          "shouldFinalize": true,
          "isInputThetaSketch": true,
          "errorBoundsStdDev": null
        },
        {
          "type": "longMin",
          "name": "min_timestamp",
          "fieldName": "unix_timestamp_min",
          "expression": null
        },
        {
          "type": "longMax",
          "name": "max_timestamp",
          "fi"[truncated 78 chars]; line: 3, column: 19] (through reference chain: java.util.ArrayList[0])
	at com.fasterxml.jackson.databind.exc.InvalidTypeIdException.from(InvalidTypeIdException.java:43)
	at org.apache.druid.jackson.DefaultObjectMapper$DefaultDeserializationProblemHandler.handleUnknownTypeId(DefaultObjectMapper.java:124)
	at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownTypeId(DeserializationContext.java:1545)
	at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._handleUnknownTypeId(TypeDeserializerBase.java:298)
	at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:165)
	at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:125)
	at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:110)
	at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:357)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3612)
	at com.rovio.ingest.model.SegmentSpec.getAggregators(SegmentSpec.java:238)
	... 13 more
@vivek-balakrishnan-rovio
Collaborator

Hi @alon-blum,
Currently, ThetaSketch ingestion is not supported.
We are working on adding datasketches extensions to this library.
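
For anyone hitting the same `InvalidTypeIdException`: the message lists the aggregator type ids the writer could resolve at that point, and `thetaSketch` is not among them. A minimal sketch, assuming you only want to pre-check a metricsSpec against that list before submitting a Spark job; the helper name `unsupported_aggregators` is hypothetical and not part of rovio-ingest, and the set below is copied from the error message in this issue, so it will differ in other versions:

```python
import json

# Type ids copied verbatim from the "known type ids" list in the exception
# above. This reflects the version that raised the error, not later releases.
KNOWN_TYPE_IDS = {
    "cardinality", "count", "doubleAny", "doubleFirst", "doubleLast",
    "doubleMax", "doubleMean", "doubleMin", "doubleSum", "expression",
    "filtered", "floatAny", "floatFirst", "floatLast", "floatMax",
    "floatMin", "floatSum", "grouping", "histogram", "hyperUnique",
    "javascript", "longAny", "longFirst", "longLast", "longMax",
    "longMin", "longSum", "stringAny", "stringFirst", "stringFirstFold",
    "stringLast", "stringLastFold",
}


def unsupported_aggregators(metrics_spec_json, known=KNOWN_TYPE_IDS):
    """Return (name, type) pairs whose type is not in the known set.

    Hypothetical helper for illustration only.
    """
    spec = json.loads(metrics_spec_json)
    return [
        (metric.get("name"), metric.get("type"))
        for metric in spec
        if metric.get("type") not in known
    ]


# The metricsSpec from the issue report.
metrics_spec = """
[
  {"type": "thetaSketch", "name": "user_id_sketch", "fieldName": "uid",
   "size": 4096, "shouldFinalize": true, "isInputThetaSketch": true,
   "errorBoundsStdDev": null},
  {"type": "longMin", "name": "min_timestamp",
   "fieldName": "unix_timestamp_min", "expression": null},
  {"type": "longMax", "name": "max_timestamp",
   "fieldName": "unix_timestamp_max", "expression": null}
]
"""

print(unsupported_aggregators(metrics_spec))
```

Any entry this returns would fail deserialization in the same way as the `user_id_sketch` metric in the report.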

@vivek-balakrishnan-rovio
Collaborator

The new release supports datasketches: https://github.com/rovio/rovio-ingest/releases/tag/v1.0.5
