The ignoreInvalidRow option does not work when loading a measure column #1158

Closed
kyungtaak opened this Issue Dec 26, 2018 · 7 comments


kyungtaak commented Dec 26, 2018

Describe the bug
Even if the 'ignoreInvalidRows' property in the tuning configuration is set to true when loading a data source, a NumberFormatException or a type-conversion exception appears in the ingestion log whenever a column specified as a measure is not numeric.

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'Create Datasource' in the datasource management view.
  2. Click on the 'File' type.
  3. Select a sample file.
  4. Specify the 'm1' column as a measure.
  5. Confirm that the 'ignoreInvalidRows' property is true.
  6. Save the datasource.
  7. See the error.

Expected behavior
Errors that occur while processing row values during ingestion should be governed by the ignoreInvalidRows property.
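
For illustration, the expected contract might be sketched as follows; parseMeasure and ingest are hypothetical stand-ins, not Druid's actual internals:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the expected ignoreInvalidRows contract;
// parseMeasure/ingest are hypothetical names, not Druid's API.
public class IgnoreInvalidRowsSketch {

  static double parseMeasure(String value) {
    return Double.parseDouble(value); // throws NumberFormatException on "aa"
  }

  static List<Double> ingest(List<String> values, boolean ignoreInvalidRows) {
    List<Double> out = new ArrayList<>();
    for (String v : values) {
      try {
        out.add(parseMeasure(v));
      } catch (NumberFormatException e) {
        if (!ignoreInvalidRows) {
          throw e; // current behavior in this bug: the whole task fails
        }
        // ignoreInvalidRows == true: drop the bad row, keep ingesting
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> m2 = Arrays.asList("30", "30", "aa", "30");
    System.out.println(ingest(m2, true)); // [30.0, 30.0, 30.0] - "aa" skipped
  }
}
```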

Screenshots
(N/A)

Desktop (please complete the following information):
(N/A)

Additional context
(N/A)

kyungtaak added this to the 3.2.0 milestone Dec 26, 2018


kyungtaak commented Dec 27, 2018

I will use the 'enforceType' property in dataSchema (#1157) :)

kyungtaak self-assigned this Dec 30, 2018


kyungtaak commented Jan 7, 2019

Related to #735.


kyungtaak commented Jan 7, 2019

@navis Running ingestion with the CSV data and ingestion spec below, a load error occurs when enforceType is true and ignoreInvalidRows is true. I would expect the load to complete normally with only the "2017-04-20,d1,sd3,30,aa" row excluded, since its m2 value "aa" cannot be cast to a double. Please take a look.

  • CSV data
time,d,sd,m1,m2
2017-04-20,d1,sd1,10,30
2017-04-20,d1,sd2,20,30
2017-04-20,d1,sd3,30,aa
2017-04-20,d2,sd1,10,30
2017-04-20,d2,sd2,20,30
2017-04-20,d2,sd3,30,30
2017-04-20,d3,sd1,10,30
2017-04-20,d3,sd2,20,30
2017-04-20,d3,sd3,30,30
  • IngestionSpec
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "localfileingestion_invalid_enforce_trueblmoc",
      "parser": {
        "type": "csv.stream",
        "timestampSpec": {
          "column": "time",
          "format": "yyyy-MM-dd",
          "missingValue": "2019-01-07T10:49:03.326Z",
          "invalidValue": "2019-01-07T10:49:03.326Z",
          "replaceWrongColumn": true
        },
        "dimensionsSpec": {
          "dimensions": [
            "d",
            "sd"
          ],
          "dimensionExclusions": [],
          "spatialDimensions": []
        },
        "columns": [
          "time",
          "d",
          "sd",
          "m1",
          "m2"
        ],
        "delimiter": ",",
        "skipHeaderRecord": true
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "sum",
          "name": "m1",
          "fieldName": "m1",
          "inputType": "double"
        },
        {
          "type": "sum",
          "name": "m2",
          "fieldName": "m2",
          "inputType": "double"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "MONTH",
        "queryGranularity": "DAY",
        "intervals": [
          "1970-01-01/2050-01-01"
        ]
      },
      "enforceType": true
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "filter": "localfileingestion_invalid_enforce_trueblmoc_1546858143257.csv",
        "baseDir": "/var/folders/b_/c009kjz97lq05rjg_43srz240000gn/T/"
      }
    },
    "tuningConfig": {
      "type": "index",
      "maxRowsInMemory": 75000,
      "ignoreInvalidRows": true,
      "buildV9Directly": true
    }
  },
  "context": {
    "druid.task.runner.dedicated.host": "localhost:8091"
  }
}

navis commented Jan 8, 2019

It's thrown while deciding the interval/partition with the streaming parser. Fixed to apply 'ignoreInvalidRows' to that path as well.
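
The actual patch isn't quoted here, but the shape of such a fix can be sketched: the row iterator that feeds the interval/partition scan swallows per-row parse failures when the flag is set. All names and types below are illustrative, not Druid's real classes:

```java
import com.google.common.collect.AbstractIterator;
import java.util.Iterator;

// Illustrative sketch only: a row iterator for the interval/partition scan
// that honors ignoreInvalidRows instead of letting the exception escape.
class SkippingRowIterator extends AbstractIterator<double[]> {

  private final Iterator<String[]> raw;
  private final boolean ignoreInvalidRows;

  SkippingRowIterator(Iterator<String[]> raw, boolean ignoreInvalidRows) {
    this.raw = raw;
    this.ignoreInvalidRows = ignoreInvalidRows;
  }

  @Override
  protected double[] computeNext() {
    while (raw.hasNext()) {
      String[] fields = raw.next();
      try {
        double[] parsed = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
          parsed[i] = Double.parseDouble(fields[i]); // fails on "aa"
        }
        return parsed; // valid row
      } catch (NumberFormatException e) {
        if (!ignoreInvalidRows) {
          throw e; // without the guard, this is what aborted the scan
        }
        // invalid row: fall through and try the next one
      }
    }
    return endOfData();
  }
}
```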


kyungtaak commented Jan 9, 2019

@navis After applying and testing the fix, the following error still occurs even though the ignoreInvalidRows option is set to true.

2019-01-09T12:56:38,236 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_localfileingestion_invalid_enforce_trueheqxs_2019-01-09T12:56:32.281Z, type=index, dataSource=localfileingestion_invalid_enforce_trueheqxs}]
com.metamx.common.parsers.ParseException: Unable to parse double from value[aa]
	at io.druid.data.Rows.parseDouble(Rows.java:65) ~[druid-api-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.ValueType$3.cast(ValueType.java:106) ~[druid-api-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.ValueDesc.cast(ValueDesc.java:510) ~[druid-api-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.input.InputRowParsers$1.evaluate(InputRowParsers.java:73) ~[druid-processing-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.input.InputRowParsers$1.evaluate(InputRowParsers.java:68) ~[druid-processing-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.input.InputRowParsers$2.convert(InputRowParsers.java:146) ~[druid-processing-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.input.InputRowParsers$2.access$000(InputRowParsers.java:93) ~[druid-processing-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.input.InputRowParsers$2$1.apply(InputRowParsers.java:117) ~[druid-processing-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.input.InputRowParsers$2$1.apply(InputRowParsers.java:113) ~[druid-processing-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at com.google.common.collect.Iterators$8.transform(Iterators.java:794) ~[guava-16.0.1.jar:?]
	at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) ~[guava-16.0.1.jar:?]
	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:646) ~[guava-16.0.1.jar:?]
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.1.jar:?]
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.1.jar:?]
	at io.druid.common.guava.GuavaUtils$8.hasNext(GuavaUtils.java:478) ~[druid-common-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.common.guava.GuavaUtils$DelegatedProgressing.hasNext(GuavaUtils.java:512) ~[druid-common-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.segment.realtime.firehose.LocalFirehoseFactory$2.hasMore(LocalFirehoseFactory.java:196) ~[druid-server-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:397) ~[druid-indexing-service-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:222) ~[druid-indexing-service-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:458) [druid-indexing-service-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:430) [druid-indexing-service-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_65]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_65]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_65]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
Caused by: java.lang.NumberFormatException: For input string: "aa"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043) ~[?:1.8.0_65]
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) ~[?:1.8.0_65]
	at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_65]
	at java.lang.Double.valueOf(Double.java:502) ~[?:1.8.0_65]
	at io.druid.data.Rows.tryParseDouble(Rows.java:185) ~[druid-api-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	at io.druid.data.Rows.parseDouble(Rows.java:62) ~[druid-api-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
	... 24 more
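
One reading of this trace: the enforceType cast runs lazily inside a Guava-transformed iterator, so the exception surfaces while the firehose is merely advancing (LocalFirehoseFactory$2.hasMore), outside any per-row guard downstream. A minimal reproduction of that escape pattern, with illustrative values:

```java
import com.google.common.collect.Iterators;
import java.util.Arrays;
import java.util.Iterator;

// Minimal reproduction of the escape pattern in the trace (values are
// illustrative): the cast is applied lazily inside a transformed iterator,
// so the exception surfaces as the consumer advances, before any
// downstream row-skipping logic can intervene.
public class LazyCastEscape {
  public static void main(String[] args) {
    Iterator<String> raw = Arrays.asList("10", "aa", "30").iterator();
    Iterator<Double> casted = Iterators.transform(raw, Double::parseDouble);
    System.out.println(casted.next()); // 10.0
    System.out.println(casted.next()); // NumberFormatException: For input string: "aa"
  }
}
```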

kyungtaak added a commit that referenced this issue Jan 9, 2019


navis commented Jan 10, 2019

I missed a parameter. I've pushed it up again.
