Plots dependency #42

Closed
tbreloff opened this issue Oct 17, 2015 · 21 comments

@tbreloff
Collaborator

cc: @johnmyleswhite @petercolberg

Continuing the comments from METADATA... I agree that I also prefer to keep dependencies small. A core part of the design of Plots is to be lightweight with minimal dependencies so that it could be a requirement for packages like this (Colors.jl is the only additional dependency). You don't need to install Gtk or Cairo, etc to be able to load the package.

That said... I'd like to hear other ideas for how to structure the organization of statistical plots. Should there be a StatPlots.jl package with plotting recipes for common needs? Of course then you may run into the reverse problem... all I wanted was to trace plot a linear regression, and now I have dependencies on lots of statistics packages.

What's the cleanest separation?

@Evizero

Evizero commented Oct 17, 2015

To weigh in on this: One aspect I like about R is that almost everything provides a custom plot method.

I am sure you know this, but let me give a concrete example just to get my point across. In this case, fitting and plotting a linear model in R:

fit <- lm(dist~speed, data=cars)
plot(fit)

or

plot(cars$speed, cars$dist)
abline(fit)

I would love a clean way to do the same kind of thing in Julia as well. For me the use case is currently SVMs, and I would like the package to provide non-trivial custom plots similar to the following (a Google image result for an SVC plot produced by kernlab):

[Image: svm plot]

To me Plots.jl currently seems like the best way to provide this kind of functionality for a broad range of backends.

That being said, one could always use Requires.jl to overload the Plots.jl plot function only once Plots is loaded (which is probably what I will do). The only other solution I can think of is to use Requires.jl and custom-tailor overloaded plots for each backend separately, which may or may not yield better individual results.

@StefanKarpinski

Conditional modules are relevant – there are also various hacks to emulate this poorly (e.g. checking for the presence of Plots and only evaluating some code if it is already present).
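A minimal sketch of the "check for presence" hack, for illustration only. The names `module_loaded`, `FitResult`, and `plotfit` are hypothetical, and the sketch assumes the user ran `using Plots` in `Main` before this code was evaluated:

```julia
# Hypothetical helper: is a module with this name already loaded into Main?
module_loaded(name::Symbol) = isdefined(Main, name) && getfield(Main, name) isa Module

# Hypothetical stand-in for a type owned by a statistics package.
struct FitResult
    coef::Vector{Float64}
end

# Only define the real plotting method if Plots is already present;
# otherwise define a stub that explains what is missing.
if module_loaded(:Plots)
    plotfit(fit::FitResult) = Main.Plots.plot(fit.coef)
else
    plotfit(fit::FitResult) = error("plotfit needs Plots to be loaded before this code runs")
end
```

The weakness is visible in the sketch: the check runs once, so loading Plots afterwards changes nothing, which is part of why this only emulates conditional modules poorly.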

@tbreloff
Collaborator Author

I'm on the same page with @Evizero. Plotting should be core to many things... it would be a shame to make people jump through hoops to support complex custom visualizations for their packages.

If conditional modules existed, then I definitely agree that's the way to go. I use lots of the aforementioned hacks in Plots to avoid loading those huge graphics packages. @StefanKarpinski: what's the most Julian hack currently? Call a loadplots() method which evals a bunch of definitions? Wrap the load in a try block? Use Pkg.installed (which also needs a try block)?
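For illustration, the `loadplots()` idea could look roughly like the sketch below. `StatsDemo`, `residualplot`, and the `backend` argument are hypothetical names, and the backend is assumed to expose a `scatter(x, y)` function; it is passed in as a module so that nothing plot-related is defined until the user asks for it.

```julia
module StatsDemo   # hypothetical package module

residuals(y, yhat) = y .- yhat

# Opt-in loader: evaluates the plotting definitions into this module on demand.
# `backend` is any module that provides `scatter(x, y)` (an assumption of this sketch).
function loadplots(backend::Module)
    @eval begin
        residualplot(y, yhat) = $(backend).scatter(yhat, residuals(y, yhat))
    end
    return nothing
end

end # module
```

A user would call `StatsDemo.loadplots(Plots)` once after `using Plots`. Definitions created this way are evaluated at runtime, so they are not part of the package's precompiled code.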

@joshday
Owner

joshday commented Oct 17, 2015

I'm also with @Evizero on the importance of plot methods, and, like @tbreloff, I'd like to know the most correct way to do this.

@tbreloff
Collaborator Author

@joshday How do you feel about me starting a StatPlots.jl package and moving your Plots-specific stuff there, as well as my corrplot, and whatever else falls into the "statistical plotting recipes" category?

I assume it would require (and reexport) Plots, and maybe StatsBase/LearnBase, and have optional dependencies on OnlineStats, OnlineAI, MultivariateStats, etc.

Do you like the name? Any other suggestions?

@Evizero

Evizero commented Oct 21, 2015

To me it would make more sense if the specific plots that involve concepts from OnlineStats live in OnlineStats. When using Requires.jl there doesn't need to be a static require on any plotting package anyway if I am not mistaken.

Having a central Plot package for everything statistics sounds like a nightmare conceptually. Then again, it might make sense if the scope of that package would only include the most low-level Stats libraries instead of everything Stats related.

I wonder, though: is there any drawback to defining the plots in OnlineStats using Requires.jl? It sounds like the cleanest solution to me.

@joshday
Owner

joshday commented Oct 21, 2015

StatPlots.jl could be really useful. It would be great if things like corrplot, residual plots, etc., could all live in there. using StatPlots would essentially translate to

using StatsBase
using Plots

# ...lots of plot methods defined here

I think plot methods for a package should remain in its own repo. Maybe OnlineStats should use Requires with StatPlots? I really don't know.

@tbreloff
Collaborator Author

I think I'm in agreement with you... just trying to make people happy that have a knee-jerk reaction to anything plotting related going into statistics packages.

I would argue it's just as important to keep statistics out of plotting packages if you care about being lightweight.

I think a standalone package for statistics/ML plotting should indeed contain the generic versions, such as corrplot(mat) or residualplot(y, yhat), etc. If OnlineStats wants to provide corrplot(CovarianceMatrix) or similar, it can always just call out to the generic version.
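A rough sketch of that split, with hypothetical names throughout (`LinearFit`, `predict`, and this particular `residualplot` signature are not taken from any package). The generic method returns the data a backend would draw, so the example runs without a plotting package installed:

```julia
# Generic recipe that a standalone plotting package might own.
# Returns the points to draw: fitted values on x, residuals on y.
residualplot(y::AbstractVector, yhat::AbstractVector) = (x = yhat, y = y .- yhat)

# Hypothetical model type owned by a statistics package.
struct LinearFit
    slope::Float64
    intercept::Float64
end

predict(m::LinearFit, x::AbstractVector) = m.intercept .+ m.slope .* x

# The statistics package adds one thin method that calls out to the generic version.
residualplot(m::LinearFit, x::AbstractVector, y::AbstractVector) =
    residualplot(y, predict(m, x))
```

The package-specific method stays a one-liner, so the statistics package carries almost no plotting logic of its own; the cost is the dependency on whichever package owns the generic function.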

With that logic, though, OnlineStats would then depend on StatPlots.jl instead of Plots.jl directly... not sure how people feel about that...

@tbreloff
Collaborator Author

@Evizero The drawback to Requires is that it's not good for precompilation. Only people that don't want to plot anything will benefit (not sure how often that happens... I always want to plot everything).

On that note, I've been thinking through whether I can precompile the installed backends into Plots, so that there's minimal initialization needed, but still minimal package requirements.

@johnmyleswhite

I think it's better to leave you all to reach your own consensus, but I feel like I should point out that at work I never want to plot anything. I build automated systems that humans do not observe directly and there is never a single instance of EDA happening in these systems.

The most important thing for my use case is that packages have the absolute minimum of dependencies because every new dependency brings with it a lot of extra work to integrate with the build system that deploys binaries to a production server.

@Evizero

Evizero commented Oct 21, 2015

@tbreloff I didn't consider that. That is indeed an issue.

@johnmyleswhite does have a good point. I guess I am really biased towards science and thus experiments. This is more of an issue than I thought.

How about we go the route of partner-packages? In the sense of OnlineStats.jl has a partner-package OnlineStatsPlots.jl, and KSVM.jl has KSVMPlots.jl and so on?

@tbreloff
Collaborator Author

Thanks for chiming in John. Do you consider this to be too heavyweight:

julia 0.3
Colors
Reexport
Compat

Colors also depends on ColorTypes and FixedPointNumbers, but overall the additional code isn't huge, and there shouldn't be any complexity to the build process, but I can understand not wanting it if you don't need it.

What do you think of the following paradigm:

julia> module A
           a() = "imported A"
       end
A

julia> module B
           a() = "A is missing"
           try
               import A
               a() = A.a()
           end

           c() = "C is missing"
           try
               import C
               c() = C.c()
           end
       end
B

julia> B.a()
"imported A"

julia> B.c()
"C is missing"

There are stub definitions for the optional dependencies which get overwritten if the import is successful. This should eliminate required dependencies for non-essential functionality, and I think it will still be ok with precompilation.

@tbreloff
Collaborator Author

If I understand Requires, then it's doing something more complicated: registering the module with the Julia core so that on a subsequent import it will run a code block. I also think Requires will force the embedded code into the __init__ block during precompilation, so that anything inside @require cannot be precompiled.

cc: @one-more-minute can you confirm whether this is true? Do you know if wrapping the import in a try block as above will cause problems?

@MikeInnes

Requires.jl should work just fine with precompilation, and if it doesn't that's a bug. (I might need to tag a new release though). You're right that the code within the @require block itself won't be precompiled, but if all you're doing is extending function X with type Y to fall back to some other precompiled code, that's not a big deal – at runtime you're just evaluating a small expression, which is pretty fast.

I think the patches for 0.4 introduced a couple of small caveats – like working only for top-level package modules and having to call @init if you define a custom __init__ function in that module – but otherwise it should suit this use case, I think.

@tbreloff
Collaborator Author

That's good to hear Mike, thanks. Do you know if the code will be precompiled if wrapping in a try-block as I did above? For some cases, the code block inside the require block could be very substantial.

@MikeInnes

No problem – sure, your code example above should be completely precompiled. However, the performance difference might be less than you think – a significant amount of the time spent loading Julia code is in parsing, and @require expressions will be "pre-parsed" if not fully precompiled. (I haven't rigorously benchmarked that, though.)

Even if that's not the case, I'm betting you can do something like this:

plot_dt(dt) = Window(DecisionTrees.to_html(dt)) # or whatever

@require DecisionTrees plot(dt::DecisionTree) = plot_dt(dt)

Pretty much the only thing you can't do before the module is defined is overload functions for its types. But you can certainly define code that uses those types (so that the function doing the heavy lifting, plot_dt, is perfectly precompilable), then have a couple of lines of @require definitions which provide the neater API.

@joshday
Owner

joshday commented Nov 3, 2015

Unless I hear strong objection, I'll be creating OnlineStatsPlots.jl to solve this.

@tbreloff
Collaborator Author

tbreloff commented Nov 3, 2015

Eh. I think I like the idea of using Requires better. It would suck to have lots of separate plotting packages, each of which only applies to one package.

@joshday
Owner

joshday commented Nov 4, 2015

I didn't take a close enough look at Requires earlier. Yeah, I like the idea of using Requires too.

@Evizero

Evizero commented Nov 5, 2015

I think both approaches have their appeal. I suggest you pick what feels right to you. Since it will be a few weeks before I have to make this decision myself, I will probably just follow your lead then.

@joshday
Owner

joshday commented Nov 5, 2015

After trying it out, I really like the Requires approach.

@joshday joshday closed this as completed Nov 5, 2015
@joshday joshday mentioned this issue Nov 5, 2015