based_on not working in @linq macro #48

Closed
ElOceanografo opened this issue Mar 30, 2016 · 5 comments

@ElOceanografo

I've been playing around with this package, and it's already made some of my real-world work a lot easier. I came across this error just now--based_on does not seem to work when chained together with other operations using @linq.

With R and dplyr, I can do the following:

> iris %>%  group_by(Species) %>%  summarise(x=mean(Sepal.Width))

but when I try the (I think?) equivalent in Julia, I get an error:

julia> @linq iris |> groupby(:Species) |> based_on(x = mean(SepalWidth))
ERROR: UndefVarError: based_on not defined

Is this something that should work? (Or am I doing something incorrectly?)

@nalimilan
Member

Looks like only the @based_on form (with the @) works:

using DataFramesMeta, RDatasets  # @linq/@based_on and dataset()
iris = dataset("datasets", "iris")
@linq iris |> groupby(:Species) |> @based_on(x = mean(:SepalWidth))

But this syntax is probably better, right?

@linq iris |> by(:Species, x=mean(:SepalWidth))

Or maybe I'm missing the difference between by and based_on (@tshort again, do we need both?).

@ElOceanografo
Author

If you want to do transformations or calculations in between grouping and summarizing, a stand-alone based_on is nice. For example, say you've got a data frame with the positions (x) of a bunch of different objects at different times (t), and you want to calculate each object's average speed:

speeds = @linq trajectory |>
    groupby(:object) |>
    transform(timespan = maximum(:t) - minimum(:t), displacement = last(:x) - first(:x)) |>
    based_on(avg_speed = :displacement ./ :timespan)

This could of course all be done on one line, or by defining a helper function, but if you're working in the data-frame-piping style I think a separate based_on or summarise makes things a little clearer...
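
For anyone following along, here is a minimal sketch of the same group-then-summarize steps in plain Julia, with no data frame package involved. The sample `observations` and the names `Row`, `groups`, and `avg_speed` are made up for illustration; it assumes Julia 1.x named tuples and is not how DataFramesMeta implements based_on.

```julia
# Hypothetical sample data: one named tuple per (object, t, x) observation.
observations = [
    (object = "a", t = 0.0, x = 0.0),
    (object = "a", t = 2.0, x = 4.0),
    (object = "b", t = 1.0, x = 10.0),
    (object = "b", t = 5.0, x = 2.0),
]
Row = eltype(observations)

# Split: collect the rows that belong to each object.
groups = Dict{String, Vector{Row}}()
for row in observations
    push!(get!(groups, row.object, Row[]), row)
end

# Apply and combine: reduce each group to a single summary value.
avg_speed = Dict{String, Float64}()
for (object, rows) in groups
    sorted = sort(rows, by = r -> r.t)          # order each group by time
    timespan = last(sorted).t - first(sorted).t
    displacement = last(sorted).x - first(sorted).x
    avg_speed[object] = displacement / timespan
end
```

The intermediate `timespan` and `displacement` values play the role of the transform step, and the final dictionary holds one row per group, which is what a summarise-style verb would return.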

@ElOceanografo
Author

👍

@nalimilan
Member

OK, I see. Though I don't understand the name: what is "based on" what in this operation? Isn't summarise a better term?

@ElOceanografo
Author

I believe the idea is "based on this data frame, calculate these things." I'd tend to agree that summarize is a better term. In dplyr, all the data frame operations are verbs, which is nice for readability.

(Prepare for the coming flame war over whether it should be summarise or summarize ;) )
