Proposal: easier column extraction and data cleaning #146

Open
piever opened this issue Sep 19, 2017 · 0 comments
piever commented Sep 19, 2017

I think something like the @df macro in StatPlots would benefit many different packages, since it lets the output of a query be fed directly to a function as normal arrays (especially as this can be done without even explicitly collecting the query, see here). I was wondering whether something similar could live in Query as well. I'm thinking of a macro along these lines:

@replace_complete_cols df f(_..a, _..b, _..c .+ 1)

which would replace each _.. expression with the corresponding column, converted to a regular Array (rows where any of the columns in use has missing data would be excluded). Two more tools would help implement this functionality and would go well together with it:

  • a stand-alone @dropna macro that would keep only the rows with no missing values
  • as mentioned here, the possibility of having a tuple in a @select statement, which could then be collected as a tuple of Arrays. Without that, selecting an arbitrary number of columns is a bit cumbersome: I haven't found a way of selecting multiple columns with a NamedTuple iterator, because there doesn't seem to be a type-stable way of generating a NamedTuple without manually typing each element, whereas a list comprehension works just fine for tuples.
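
To make the intended semantics concrete, here is a rough sketch in plain Julia of what the macro call above might expand to. It is only an illustration under stated assumptions, not a proposed implementation: complete_cols and f are hypothetical names, the table is a NamedTuple of vectors rather than a Query source, and missing data is represented with Base's missing, which may differ from the representation Query.jl uses.

```julia
# Hypothetical helper, not part of Query.jl: return the requested columns as
# plain vectors, keeping only the rows where all of them are non-missing.
function complete_cols(table::NamedTuple, names::Symbol...)
    # Tuple of the requested columns, in the order they were asked for.
    cols = map(n -> getproperty(table, n), names)
    # A row is kept only if every requested column has a value there.
    keep = [all(c -> !ismissing(c[i]), cols) for i in eachindex(first(cols))]
    # skipmissing + collect narrows the element type (e.g. to Vector{Int}).
    return map(c -> collect(skipmissing(c[keep])), cols)
end

# Stand-in for the user's function; any function of two arrays would do.
f(x, y) = sum(x .* y)

df = (a = [1, missing, 3], b = [1.0, 2.0, missing], c = [10, 20, 30])

# Roughly what `@replace_complete_cols df f(_..a, _..c .+ 1)` would mean here:
a, c = complete_cols(df, :a, :c)
result = f(a, c .+ 1)
```

Note that column b is not used in the call, so its missing value in the third row does not cause that row to be dropped; only the second row, where a is missing, is excluded.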

Do you think this kind of macro belongs in Query.jl, or should it live somewhere else?
Also, what syntax do you think would be best? What I put here is pretty much a placeholder.

@davidanthoff davidanthoff added this to the Backlog milestone Aug 26, 2018