Can't configure datasource column as date/time if data set starts with null / empty values #2294

Closed
siepkes opened this issue Jul 1, 2019 · 3 comments

Comments

@siepkes
Contributor

commented Jul 1, 2019

Describe the bug
When a datasource (I tested this with a CSV file) has a column that starts with null values, the column can't be configured as date/time. The "Time display format" field always displays the error "Please enter in the same format as the data value", and the "Next" button stays greyed out.

To Reproduce
Steps to reproduce the behavior:

  1. Load a CSV file as datasource with a date/time value column which starts with empty values.
  2. Configure the column as date/time with the correct format.
  3. Metatron shows an error and you can't press the "Next" button.

Expected behavior
I would expect Metatron to ignore the null/empty rows and simply use the first actual value to verify the format.

Screenshots
Screenshot from 2019-07-01 09-12-53

Desktop (please complete the following information):

  • OS: CentOS 7 Linux with Gnome 3.28.2
  • Browser: Chrome
  • Version: 75.0.3770.80
  • Metatron version: 3.3.0-RC1

Additional context
Not applicable

@kyungtaak

Contributor

commented Jul 4, 2019

@siepkes I'm sorry for the late reply. :(
The current logic will pass validation if more than half of the entries match the timestamp format. At this time, we have not considered null or empty items. I'll fix this soon.

Note that you have to specify one column for the timestamp role because the engine is a time series database. If you assign the timestamp role to the column mentioned here, queries may still work, but data ingestion will not be efficient. Please refer to this link: https://metatron-app.github.io/metatron-doc-discovery/en/discovery/part01/druid_features.html#ingestion

@siepkes

Contributor Author

commented Jul 4, 2019

> I'm sorry for the late reply. :(

No sweat! Thanks for replying!

> The current logic will pass validation if more than half of the entries match the timestamp format. At this time, we have not considered null or empty items. I'll fix this soon.

I understand that this is conceptually a somewhat hard problem. What if only 15% of the column has a value and the rest is null? What if, of that 15%, half fits the date format and the other half does not?

As an idea: maybe instead of a hard error it should display a warning like "Only 50% of the data in this column matches this date format." and simply ignore the data it can't parse?
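A rough sketch of that warning idea, purely for illustration. This is not Metatron code: the function name `summarize_format_match` is hypothetical, the format string uses Python's `strptime` syntax rather than whatever Metatron uses, and computing the percentage over non-empty values only is my assumption.

```python
from datetime import datetime


def summarize_format_match(values, fmt):
    """Return (match_ratio, warning) for a column instead of a hard error.

    Hypothetical helper: null/empty cells are ignored, and the ratio is
    computed over the remaining values only (an assumption).
    """
    # Keep only cells that actually contain something.
    non_empty = [str(v).strip() for v in values
                 if v is not None and str(v).strip()]
    if not non_empty:
        return 0.0, "This column has no values to check against the date format."

    parsed = 0
    for text in non_empty:
        try:
            datetime.strptime(text, fmt)
            parsed += 1
        except ValueError:
            # Unparseable values are skipped rather than treated as fatal.
            pass

    ratio = parsed / len(non_empty)
    if ratio < 1.0:
        return ratio, (f"Only {ratio:.0%} of the data in this column "
                       "matches this date format.")
    return ratio, None
```

The caller could then show the warning next to the "Time display format" field and still enable "Next".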

> Note that you have to specify one column for the timestamp role because the engine is a time series database. If you assign the timestamp role to the column mentioned here, queries may still work, but data ingestion will not be efficient. Please refer to this link: https://metatron-app.github.io/metatron-doc-discovery/en/discovery/part01/druid_features.html#ingestion

Sorry my example screenshot is a bad one in that regard. The column in which I demonstrate the issue is not the column that is to be used as the primary timeseries. I hadn't selected the primary timeseries when I made the screenshot to visualize the issue. I should have done that first to avoid confusion.

@minhyun2

Contributor

commented Jul 5, 2019

@siepkes Thank you for your kindness.
If there are null or empty items, we will exclude them from the total count and validate that more than half of the remaining entries match the timestamp format.
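
A minimal sketch of the rule described above, for illustration only. It is not the actual Metatron implementation: the function names are hypothetical, the format string follows Python's `strptime` syntax, and rejecting a column that contains only null or empty values is an assumption the thread does not settle.

```python
from datetime import datetime


def is_blank(value):
    """True for None and for empty or whitespace-only strings."""
    return value is None or str(value).strip() == ""


def matches_format(value, fmt):
    """True if the value parses with the given strptime format."""
    try:
        datetime.strptime(str(value).strip(), fmt)
        return True
    except ValueError:
        return False


def passes_timestamp_validation(values, fmt):
    """Pass when more than half of the non-blank values match fmt."""
    # Null/empty items are excluded from the validation target.
    candidates = [v for v in values if not is_blank(v)]
    if not candidates:
        # Assumption: nothing to validate means the column is rejected.
        return False
    matched = sum(1 for v in candidates if matches_format(v, fmt))
    # Strictly more than half, so an exact 50% split fails.
    return matched * 2 > len(candidates)
```

With this rule, a column such as `[None, "", "2019-07-01 09:12:53"]` passes for the format `"%Y-%m-%d %H:%M:%S"`, because the single non-blank value matches and the leading empty rows no longer count against it.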

minhyun2 added a commit that referenced this issue Jul 8, 2019

@minhyun2 minhyun2 added this to the 3.3.0 milestone Jul 9, 2019

@minhyun2 minhyun2 added the @datasource label Jul 9, 2019

@minhyun2 minhyun2 referenced this issue Jul 15, 2019
0 of 7 tasks complete

kyungtaak added a commit that referenced this issue Jul 17, 2019

#2294 changed validation rule for timestamp column
Validation target must not include null or empty value.

@ufoscw ufoscw closed this Jul 17, 2019

ufoscw added a commit that referenced this issue Jul 17, 2019

#2294 changed validation rule for timestamp column
Validation target must not include null or empty value.

ufoscw added a commit that referenced this issue Jul 29, 2019

#2294 changed validation rule for timestamp column
Validation target must not include null or empty value.