Columnar data format efficiency: create extra columns for needed expressions #6

Open
cybertyche opened this issue Dec 10, 2018 · 2 comments

Comments

@cybertyche
Contributor

For a data structure with fields a, b, and c, if the downstream query operators never refer to field a directly but only to a.d.z, a["bacon"], or some other fixed expression over a, it may make sense to have a column representing a.d.z or a["bacon"] instead of a. This change would require altering the data type structure of the generated columnar batch, and it would change how generated operators over those columns reference fields.
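
To make the idea concrete, here is a minimal sketch in Python rather than Trill's own C#. It is not Trill's API: the row shape, the `PROJECTED` table, and the helpers `to_columnar_batch` and `select_z_greater_than` are all hypothetical names made up for illustration. It assumes the set of expressions the downstream operators need is known before the batch is built.

```python
from typing import Any, Callable, Dict, List

# Hypothetical rows: field `a` is nested, `b` and `c` are plain values.
rows: List[dict] = [
    {"a": {"d": {"z": 1}, "bacon": "crispy"}, "b": 10, "c": "x"},
    {"a": {"d": {"z": 2}, "bacon": "chewy"}, "b": 20, "c": "y"},
]

# Expressions the downstream operators actually use (assumed known up front).
# Note that there is no entry for `a` itself.
PROJECTED: Dict[str, Callable[[dict], Any]] = {
    "a.d.z": lambda r: r["a"]["d"]["z"],
    'a["bacon"]': lambda r: r["a"]["bacon"],
    "b": lambda r: r["b"],
    "c": lambda r: r["c"],
}


def to_columnar_batch(
    rows: List[dict], projections: Dict[str, Callable[[dict], Any]]
) -> Dict[str, List[Any]]:
    """Build one column per projected expression instead of one per field."""
    return {name: [expr(r) for r in rows] for name, expr in projections.items()}


def select_z_greater_than(batch: Dict[str, List[Any]], threshold: int) -> List[int]:
    """An operator that reads the precomputed a.d.z column directly.

    Returns the indices of the rows whose a.d.z exceeds the threshold.
    """
    return [i for i, z in enumerate(batch["a.d.z"]) if z > threshold]


batch = to_columnar_batch(rows, PROJECTED)
matching = select_z_greater_than(batch, 1)
```

The sketch shows both effects described above: the batch's layout no longer contains a column for `a`, and the operator refers to the `a.d.z` column by name instead of navigating into `a` for every row.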

@veikkoeeva

There are multiple discussions around this topic, I think. I have linked the other places at https://github.com/dotnet/corefx/issues/26845 and dotnet/machinelearning#69. It appears the industry is converging around Apache Arrow (https://arrow.apache.org/) as the columnar format, and an initial C# implementation landed just recently (https://github.com/apache/arrow/tree/master/csharp). It might make sense to coordinate a bit around this to make a good case for .NET at large (as a side note, heterogeneous computing, Arrow, machine learning parameter tuning and other things were tangentially discussed at https://github.com/dotnet/orleans). :)

For readers coming from other links, Trill has a few other related issues:
#7
#6

@cybertyche
Contributor Author

cybertyche commented Dec 30, 2018

That's a fantastic idea. If there is already data arriving natively in Arrow format then making Trill operate directly and efficiently on it would be fantastic.
