Add support for the pipe operator #21

Merged: 2 commits, Jun 12, 2017

@davidanthoff (Member)

This enables the following syntax:

DataFrame(x=randn(200)) |> data_values() |>
    mark_bar() |>
    encoding_x_quant(:x; bin=Dict(:maxbins=>20), axis=Dict(:title=>"values")) |>
    encoding_y_quant(:*, aggregate="count", axis=Dict(:title=>"number of draws"))

By itself this is probably not a huge improvement over the existing + (although ggvis also uses pipes instead of +, so there is some precedent).

But I hope to enable a dplyr-like interface for Query.jl, and at that point it might make for a nice overall UI experience: one could pipe all the way from, say, a CSV load, through some query manipulations, into a plot.
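To make the relationship between `+` and `|>` concrete, here is a minimal Julia sketch of one way a builder API can support both operators. It is not the code in this PR: `Spec`, `deepmerge`, `with_mark` and `with_encoding` are hypothetical names, and making a spec fragment callable (so that `spec |> frag` evaluates `frag(spec)`, defined as `spec + frag`) is an assumption about how such support could be wired up.

```julia
# Hypothetical sketch, not VegaLite.jl's actual implementation.

# Recursively merge nested Dicts; on a conflicting leaf, `b` wins.
function deepmerge(a::AbstractDict, b::AbstractDict)
    out = Dict{String,Any}(a)
    for (k, v) in b
        if haskey(out, k) && out[k] isa AbstractDict && v isa AbstractDict
            out[k] = deepmerge(out[k], v)
        else
            out[k] = v
        end
    end
    return out
end

# A spec (or a fragment of one) is a nested Dict mirroring the JSON.
struct Spec
    params::Dict{String,Any}
end

# `+` combines an accumulated spec with a fragment.
Base.:+(a::Spec, b::Spec) = Spec(deepmerge(a.params, b.params))

# Making a fragment callable is enough for `|>`:
# `spec |> frag` calls `frag(spec)`, which we define as `spec + frag`.
(frag::Spec)(spec::Spec) = spec + frag

# Hypothetical builders; Symbols and Strings both become JSON strings.
with_mark(m) = Spec(Dict{String,Any}("mark" => string(m)))

function with_encoding(channel, field; typ = "quantitative")
    inner = Dict{String,Any}("field" => string(field), "type" => typ)
    Spec(Dict{String,Any}("encoding" => Dict{String,Any}(string(channel) => inner)))
end

base = Spec(Dict{String,Any}("data" => Dict{String,Any}("url" => "data.json")))

p_plus = base + with_mark(:bar) + with_encoding(:x, :a) + with_encoding(:y, "b")
p_pipe = base |> with_mark(:bar) |> with_encoding(:x, :a) |> with_encoding(:y, "b")

p_plus.params == p_pipe.params   # true: both routes build the same nested Dict
```

Because `x |> f` in Julia is just `f(x)`, any builder whose result can be called on the accumulated spec composes with pipes. The recursive merge is also where the ambiguity fredo-dedup raises below shows up: a fragment such as a tick or axis setting has to know its full path in the JSON tree to land in the right place.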

@coveralls commented Jun 5, 2017

Coverage decreased (-0.7%) to 34.343% when pulling 865e1f9 on davidanthoff:pipe into cd110cb on fredo-dedup:master.

@fredo-dedup (Collaborator)

In the devl branch I am currently moving away from building VegaLite specs by adding pieces (with + or |>). The VegaLite JSON schema is rather deep, and identical property names appear in different places (tick, axis, etc.), so it becomes ambiguous where, for example, a ... + tick(..) should be added.
I might keep that option for the root level only; I haven't decided.

Currently in devl, I am using nested function calls with a plot(..) at the root level. If a property is a scalar or a string, you pass it as a keyword argument; if it is a structure, you use a function (with a leading underscore to avoid name collisions with Base). For example:

durl = "https://raw.githubusercontent.com/vega/new-editor/master/data/movies.json"

plot(_data(url=durl),
     mark="circle",
     _encoding(_x(_bin(maxbins=10), field="IMDB_Rating", typ="quantitative"),
               _y(_bin(maxbins=10), field="Rotten_Tomatoes_Rating", typ="quantitative"),
               _color(field="Rotten_Tomatoes_Rating", typ="quantitative"),
               _size(aggregate="count", typ="quantitative")),
     width=300, height=300)

One advantage of keeping a syntax that maps directly to the JSON spec is that I can auto-generate all functions and their docs from the VegaLite project's JSON schema file. This keeps the package almost instantly in sync with upstream at very little cost: changing the pkg.build to point to the new version, then checking that the auto-generation still passes and that the docs are not mangled.
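As a rough illustration of generating functions from the schema, the Julia sketch below uses `@eval` to define one keyword-argument function per schema definition, following the leading-underscore naming of the devl branch. `SCHEMA_SLICE` is a hard-coded, hypothetical stand-in for a tiny part of the schema; a real generator would parse the VegaLite JSON schema file, and this is not the devl branch's actual code.

```julia
# Hypothetical stand-in for a tiny slice of the Vega-Lite JSON schema:
# definition name => allowed property names. A real generator would parse
# these out of the schema file instead of hard-coding them.
const SCHEMA_SLICE = Dict(
    "bin"  => ["maxbins", "step"],
    "axis" => ["title", "grid"],
)

for (name, props) in SCHEMA_SLICE
    fname = Symbol("_", name)              # e.g. :_bin, :_axis
    allowed = Set(props)
    errprefix = "unknown property for $fname: "
    # Define one function per definition; keywords become JSON properties.
    @eval function $fname(; kw...)
        d = Dict{String,Any}()
        for (k, v) in kw
            key = string(k)
            key in $allowed || throw(ArgumentError($errprefix * key))
            # Symbols and Strings end up as the same JSON string.
            d[key] = v isa Symbol ? string(v) : v
        end
        return d
    end
end
```

Unknown keywords raise an `ArgumentError`, so a schema update that renames a property surfaces as an error instead of silently producing invalid JSON.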

Do you think that keeping the additive syntax (with + or |>) is a key feature?

@davidanthoff (Member, Author)

Hm, interesting... I kind of like composition via an operator at the root level, especially given the ggplot precedent. It makes it easy to build up a figure step by step.

I think the idea that one can pipe something into a plot would work with the new design in any case, though. The code might look like:

load("data.csv") |> 
    plot(
     mark="circle",
     _encoding(_x(_bin(maxbins=10), field="IMDB_Rating", typ="quantitative"),
               _y(_bin(maxbins=10), field="Rotten_Tomatoes_Rating", typ="quantitative"),
               _color(field="Rotten_Tomatoes_Rating", typ="quantitative"),
               _size(aggregate="count", typ="quantitative")),
     width=300, height=300)

I completely understand the desire to match the upstream schema automatically, but I have to say that the API here looks quite a bit less attractive than what is on master right now... The underscores in particular seem odd: my sense is that they are emerging as a convention for "private" fields, so seeing them in a public API is surprising. It also seems that a whole bunch of things that were Symbols before are now Strings? That also seems less convenient.

To be honest, my first reaction to the original design on master was "finally a ggplot-like API that still looks really Julia-native", whereas this looks much more like something an automatic wrapper would generate, not like something that really fits the Julia style or would be easy and pleasant to use...

Maybe one could keep the automatically generated code as the bottom layer, and then provide a user-facing API on top of it that looks more like the old API? I'm not sure that would really buy us much, but it might be worth a thought?
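The two-layer idea could look roughly like this Julia sketch: a schema-shaped bottom layer (`_x`, `_encoding`) of the kind a generator might emit, and a hand-written `encoding_x_quant` on top that keeps the master-branch calling style. All three functions are hypothetical simplifications, not the real master or devl implementations.

```julia
# Bottom layer (hypothetical): one function per schema definition, with
# keywords mapping 1:1 to JSON properties, as a generator might emit.
function _x(; typ = nothing, kw...)
    # Symbol values are stored as Strings so both spellings give the same JSON.
    d = Dict{String,Any}(string(k) => (v isa Symbol ? string(v) : v) for (k, v) in kw)
    typ === nothing || (d["type"] = string(typ))
    return d
end

_encoding(; kw...) = Dict{String,Any}("encoding" => Dict{String,Any}(string(k) => v for (k, v) in kw))

# Top layer (hypothetical): a hand-written function in the style of the
# master-branch API, implemented purely on top of the generated layer.
encoding_x_quant(field; kw...) = _encoding(x = _x(; field = field, typ = :quantitative, kw...))

spec = encoding_x_quant(:IMDB_Rating; bin = Dict("maxbins" => 10))
```

Only the bottom layer would need regenerating when the upstream schema changes; the top layer stays stable as long as the properties it relies on keep their names.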

@coveralls

Coverage increased (+24.7%) to 59.732% when pulling 0d7fbe4 on davidanthoff:pipe into cd110cb on fredo-dedup:master.

@fredo-dedup (Collaborator)

Those are good points and I'll try to come up with something that preserves the spirit of the old syntax.

Everywhere strings are valid, symbols can be provided too. They are all translated to JSON the same way.

@davidanthoff (Member, Author)

@fredo-dedup I have now finished integrating my whole IterableTables.jl and Query.jl framework with FileIO.jl and a general pipe syntax. If we could merge this PR, it would enable some really nice flows:

load("data.csv") |>
    @query(i, begin
        @group i by i.children into g
        @select {children = g.children, age = mean(g..age)}
    end) |>
    data_values() |>
    mark_bar() |>
    encoding_x_quant(:x; bin=Dict(:maxbins=>20), axis=Dict(:title=>"values")) |>
    encoding_y_quant(:*, aggregate="count", axis=Dict(:title=>"number of draws")) |>
    save("output.pdf")

There is support for quite a wide range of file formats, on both the load and the save side.

Note in particular that I was able to reuse the save function from FileIO.jl for plots, so you can use it to save in multiple formats.

It would be great if I could showcase all of this in my JuliaCon talk on Query.jl. Do you think there is a chance that you could merge this PR and then tag a version based on the current master sometime early this week? And then maybe not tag anything based on devl until after JuliaCon? That would a) give me a tagged version early enough to prepare my talk and b) ensure that the stable, current master is the tagged version during JuliaCon. Btw., are you coming to JuliaCon?

@davidanthoff changed the title from "WIP Add support for the pipe operator" to "Add support for the pipe operator" on Jun 12, 2017
@fredo-dedup merged commit d8d9c9f into queryverse:master on Jun 12, 2017
@davidanthoff deleted the pipe branch on June 12, 2017, 20:00
@davidanthoff (Member, Author)

Cool, thanks, much appreciated!!

acheryauka pushed a commit to acheryauka/VegaLite.jl that referenced this pull request Jan 4, 2022