Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data: support duplicate Field names in a Frame #83

Merged
merged 6 commits into from Mar 11, 2020

Conversation

kylebrandt
Copy link
Contributor

@kylebrandt kylebrandt commented Mar 9, 2020

Add fieldIdx prefix to arrow field Name to keep the names unique.

Front-end will need to do the same removal of the prefix.

for #59

@kylebrandt kylebrandt requested a review from ryantxu March 9, 2020 18:24
@kylebrandt kylebrandt added arrow https://godoc.org/github.com/apache/arrow/go/arrow dataframe labels Mar 9, 2020
@kylebrandt kylebrandt added this to the 6.7 milestone Mar 9, 2020
@ryantxu
Copy link
Member

ryantxu commented Mar 9, 2020

Is the proposal here to always have the prefix number? Maybe we can check and append (if necessary) before making a frame?

@kylebrandt
Copy link
Contributor Author

Idea is always as part of the to/from arrow process.

Has to be always, or can't tell if it is our prefix or name that looks like our prefix.

@ryantxu
Copy link
Member

ryantxu commented Mar 9, 2020

Can we add a constraint that the names must be unique? And add a helper to append indicies when necessary? (Then we don't need to modify on the read side)

@kylebrandt
Copy link
Contributor Author

Can we add a constraint that the names must be unique?

Can do it via documentation, and maybe helper functions. However, this awkward is standard wide time series response. You have a metric name that would be repeated, but different Labels on the frame for that repeated name.

And add a helper to append indicies when necessary? (Then we don't need to modify on the read side)

I don't understand what you are saying here.

@kylebrandt
Copy link
Contributor Author

Also this means that Tables that don't have names, and just positions can't be represented. Because Name: "" - empty string, would still be a duplicate.

@ryantxu
Copy link
Member

ryantxu commented Mar 9, 2020

If I understand this PR, it will throw an error for any dataframe that does not have a field name that matches "${index}: ${string}" -- that seems pretty rough.

Perhaps the long/wide syncing could assume that fields named:
value, value (1), value (2) are all meant to be called "value"

with changing the name to "${index}: ${string}" it seems we may as well name it a UUID and get the name from elsewhere

@kylebrandt
Copy link
Contributor Author

I like the UUID idea and just using metadata for name in arrow. Can work on that tomorrow.

@ryantxu
Copy link
Member

ryantxu commented Mar 9, 2020

i am reluctant to have the names be totally arbitrary -- if we need a mode where they are, that seems fine, but name==UUID seems weird.

This would be the first choice that makes it so an arbitrary arrow file would not "just work"

@ryantxu
Copy link
Member

ryantxu commented Mar 9, 2020

If I understand the feature request in this PR is that you want to be able to represent multiple fields with the same name in arrow Table, not just the fields?

@kylebrandt
Copy link
Contributor Author

kylebrandt commented Mar 9, 2020

I don't understand the "not just the fields" part...

But yes, the idea is that a Dataframe supports multiple Fields with the same Name, so the name doesn't have to be unique (or set) within a dataframe.

@kylebrandt
Copy link
Contributor Author

Consider:

MySQL > select 1 AS a, 2 AS a;
+---+---+
| a | a |
+---+---+
| 1 | 2 |
+---+---+

Copy link
Member

@ryantxu ryantxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- what else needs updated in core before while we bump the version?

if len(sp) == 2 {
return sp[1]
}
return name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@marefr
Copy link
Member

marefr commented Mar 10, 2020

I don't really have any preferences, but if arrow have this restriction I don't see why we shouldn't have this resctriction.
MySQL > select 1 AS a, 2 AS a;
Even id you could do this it doesn't make sense to me. A table in database cannot have 2 columns with the same name so unless you do something crazy suit yourself is my point 😄

@kylebrandt
Copy link
Contributor Author

I don't really have any preferences, but if arrow have this restriction I don't see why we shouldn't have this resctriction.
MySQL > select 1 AS a, 2 AS a;
Even id you could do this it doesn't make sense to me. A table in database cannot have 2 columns with the same name so unless you do something crazy suit yourself is my point

There are less abnormal examples. For example a CSV file with no headers (name is absent), or maybe everything in the frame is the same "name", so the frame gets a name, but you want to leave all the column names default ("").

@kylebrandt
Copy link
Contributor Author

LGTM -- what else needs updated in core before while we bump the version?

I can't think so. Once you have a Frontend PR ready that goes with this. I will merge it, tag it, and then update your PR to use this.

@marefr
Copy link
Member

marefr commented Mar 10, 2020

I don't really have any preferences, but if arrow have this restriction I don't see why we shouldn't have this resctriction.
MySQL > select 1 AS a, 2 AS a;
Even id you could do this it doesn't make sense to me. A table in database cannot have 2 columns with the same name so unless you do something crazy suit yourself is my point

There are less abnormal examples. For example a CSV file with no headers (name is absent), or maybe everything in the frame is the same "name", so the frame gets a name, but you want to leave all the column names default ("").

Sure. The question is how Grafana should be able to handle it if the idea is that Grafana should remove the identifier. Can Grafana handle two fields named the same?

@ryantxu
Copy link
Member

ryantxu commented Mar 10, 2020

Can Grafana handle two fields named the same?

Yes -- there is a warning that if you have two fields with the same name some things -- like using template variables for display (ie: ${frame.name[2]}) won't work

Copy link
Contributor

@bergquist bergquist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

❤️ golen test files

@marefr
Copy link
Member

marefr commented Mar 11, 2020

Can Grafana handle two fields named the same?

Yes -- there is a warning that if you have two fields with the same name some things -- like using template variables for display (ie: ${frame.name[2]}) won't work

Great. Just want to make sure I understand everything. So what would the field type of a "merged" field be in Grafana - none?

@kylebrandt
Copy link
Contributor Author

Front end change exists it seems: grafana/grafana#22705

@kylebrandt kylebrandt changed the title dataframe: field name prefix data: support duplicate field names in arrow by adding name prefix Mar 11, 2020
@kylebrandt kylebrandt changed the title data: support duplicate field names in arrow by adding name prefix data: support duplicate field names in arrow Frame encoding by adding name prefix Mar 11, 2020
@kylebrandt kylebrandt changed the title data: support duplicate field names in arrow Frame encoding by adding name prefix data: support duplicate Field names in a Frame Mar 11, 2020
@kylebrandt kylebrandt merged commit 90efbdc into master Mar 11, 2020
@kylebrandt kylebrandt deleted the df_field_name_prefix branch March 11, 2020 13:31
kylebrandt added a commit that referenced this pull request Apr 15, 2020
kylebrandt added a commit that referenced this pull request Apr 15, 2020
data.Frame's Field's Name prefixing hack is now no longer needed, Arrow now allows for duplicate Field names, at least in our use cases.

* Revert "data: support duplicate Field names in a Frame (#83)"
This reverts commit 90efbdc

* Update to newer version of go Arrow lib that allows duplicate field names and add a test that panicked before the upgrade. Also add duplicate field to golden file.

fixes #137
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow https://godoc.org/github.com/apache/arrow/go/arrow dataframe
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants