Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add graph command for graphing module dependencies #3635

Closed
wants to merge 1 commit into from

Conversation

jmackie
Copy link
Contributor

@jmackie jmackie commented May 12, 2019

This PR proposes adding a graph command to the compiler, which takes module files as input (similar to purs compile) and returns a JSON description of the module dependency graph. Essentially this just surfaces the output of Language.PureScript.ModuleDependencies.sortModules with some serialization logic.

I think this will be useful for tooling as it let's us be smarter about what actually needs to be compiled. For example, if I'm writing a bundling tool that takes an entrypoint module I can avoid passing the world to purs compile (e.g. "src/**/*.purs" ".psc-package/*/*/*/src/**/*.purs") by first invoking this command.

More concretely, I'd like to improve PureScript support in webpack and parcel, and having this available would really help.

It's very much a first draft, and there are probably a bunch of error cases I need to consider, but comments/feedback would be much appreciated :)

@hdgarrood
Copy link
Contributor

I'd actually rather keep this logic internal to the compiler if we possibly can. The idea is that you do pass everything to purs compile every time, and it's the compiler's responsibility to only do the work which actually needs doing. Is the reason that you'd rather avoid this that it can be slow? Because if so there are other avenues which I'd rather take to improve this state of affairs. One example is the CST work which should greatly improve startup time for projects with lots of modules (which has now been merged). Another example is #2477, which would allow us to avoid doing extra work to parse and sort modules from dependencies every time.

@jmackie
Copy link
Contributor Author

jmackie commented May 12, 2019

Sure, but purs compile has no notion of an entrypoint right? So it's always gonna compile everything I pass to it at least once, which isn't great for large projects with lots of dependencies. And in the jenga-tower of frontend tooling I find myself having to rm -rf output more often then I'd like...

As another example: if I want to tell webpack/parcel what files need to be watched, I currently have to default to "src/**/*.purs" ".psc-package/*/*/*/src/**/*.purs", which can be super resource hungry (I've hit linux open file limits because of this at work).

If you're building tooling on top of purs this is a pretty reasonable requirement though, no? Currently we can only throw a bunch of globs at the compiler, but it would be nice to have some insight into the build plan.

@hdgarrood
Copy link
Contributor

So it's always gonna compile everything I pass to it at least once, which isn't great for large projects with lots of dependencies [...] I currently have to default to "src/**/*.purs" ".psc-package/*/*/*/src/**/*.purs"

Yes; it should be possible to address both of these issues with something like #2477, so that any given dependency only needs to be built once (across any number of projects), and so that once a dependency has been built, we know it isn't going to change, so we don't need to watch those files.

And in the jenga-tower of frontend tooling I find myself having to rm -rf output more often then I'd like...

I agree. Whenever this is a result of the compiler misbehaving that's something I'd like to fix too. See #3145, for instance.

If you're building tooling on top of purs this is a pretty reasonable requirement though, no? Currently we can only throw a bunch of globs at the compiler, but it would be nice to have some insight into the build plan.

Lots of compilers delegate the responsibility of constructing build plans to third party tools. This can be a big pain, so we've opted for a different approach in PureScript: that build plans should remain internal to the compiler. I think this approach has worked really well so far, and I still believe that the issues you're coming up against can be addressed while sticking to this approach. (I'm open to being persuaded otherwise but I'm not yet convinced.)

@jmackie
Copy link
Contributor Author

jmackie commented May 12, 2019 via email

@jmackie jmackie closed this May 12, 2019
@hdgarrood
Copy link
Contributor

Welcome! You might want to check in with @joneshf as he has been looking into using Shake to implement this.

@jmackie
Copy link
Contributor Author

jmackie commented May 12, 2019 via email

@joneshf
Copy link
Member

joneshf commented May 12, 2019

I think this is a great thing to have! What would have to change to get this PR accepted?

@joneshf
Copy link
Member

joneshf commented May 12, 2019

Also, I'll throw a proposal for shake into #3145 so we can re-invigorate that conversation.

@hdgarrood
Copy link
Contributor

hdgarrood commented May 12, 2019

What would have to change to get this PR accepted?

For me to get on board I’d want to hear a more persuasive case for it. As I say I think build plans should continue to be the responsibility of the compiler. I think having the compiler delegate the responsibility of calculating what needs rebuilding to external tools has the potential to greatly increase the complexity of build processes; if a number of people create tools which use that information in slightly different ways then we could very easily find ourselves grappling with lots of bugs which are difficult to reproduce and where the location of the problem is difficult to identify (eg “is this a bug in the compiler or in external tooling?”). I don’t think we need to concede this ground to be able to make the compiler smarter and faster in the sense of not doing more work than it needs to in incremental builds.

@natefaubion
Copy link
Contributor

I don't think the module graph is an internal implementation detail. It's a static artifact of the modules you pass to it, which anyone can derive unambiguously by parsing them. Sure, if it contained information like what needs to be rebuilt, that would be different, but the hierarchy itself is just language semantics. I don't think there's anything wrong exposing it, but I would like a concrete use case where a tool really needs it before we commit to supporting it.

@jmackie
Copy link
Contributor Author

jmackie commented May 13, 2019

My initial motivation was adding PureScript support to parcel. I could go with the usual glob paths and tell parcel that my Entrypoint.purs depends on "src/**/*.purs" ".psc-package/*/*/*/src/**/*.purs" (or bower_components or whatever) and have it watch all that stuff and purs build all that stuff, or try and be a bit smarter with the help of this command.

@joneshf
Copy link
Member

joneshf commented May 21, 2019

For more motivation, any sort of large scale analysis would re-implement this graph algorithm. In a large codebase, it's very common and tedious to do this analysis by hand.

Some examples of what you can do with dependency graphs in the JS ecosystem:

Other ecosystems have similar tooling.

Are these examples enough to warrant accepting this PR?

@hdgarrood
Copy link
Contributor

Ok, yeah, that's a good point. I was on the fence about the idea of compiling with an entry point; I think I'd prefer to address the underlying issue there, which if I understand correctly is that we want builds to be faster, by coming up with a way of only needing any given dependency at a given version to be built once and then stored somewhere where it's accessible across different projects. Being able to make tooling like the examples you gave would indeed be nice.

@natefaubion
Copy link
Contributor

I personally would support reopening it. I would probably like a more general analysis of modules though, but it's not necessary for this PR. I would like to see:

  • All exports (and where they might be reexported from)
  • All imports (and if there are open imports, which are the ones that are actually used)
  • Foreign imports

@f-f
Copy link
Member

f-f commented May 27, 2019

As @hdgarrood noted in purescript/spago#165 (comment), that issue would be another use case for having this command (and would avoid having a dependency on the compiler or part of it)

@natefaubion
Copy link
Contributor

This is still marked as a draft. We haven't looked at this in a while. Is there anything else to do with this?

@jmackie
Copy link
Contributor Author

jmackie commented Jan 6, 2020

If there's agreement that this is something we want then I'll polish it up into a proper PR?

@natefaubion
Copy link
Contributor

Yes, I think this would be a useful command to have.

@f-f
Copy link
Member

f-f commented Jan 6, 2020

I would really really like to see this command in.

The reason is that having this command in the compiler would allow us to remove this bunch of code to parse module headers we have in Spago - because you know, it's not good for several reasons: different versions of the compiler parse different things, we should not parse files again if the compiler is going to do it, etc.

So having the module graph info exposed by the compiler would allow this and other usecases to have a clean implementation in downstream tools.

@f-f
Copy link
Member

f-f commented Jan 18, 2020

@jmackie is there anything I could help with to move this forward?

@jmackie
Copy link
Contributor Author

jmackie commented Jan 21, 2020

@f-f lack of time is the blocker 🙈

The code is pretty straightforward but I haven't built it since the original commit. I also haven't looked at this codebase for a while. But if it still builds maybe it's good to go? I'll try and take a look during lunch today.

@f-f
Copy link
Member

f-f commented Jan 27, 2020

@natefaubion I'm available to take this PR to a mergeable state, if anything needs to change. Could I ask you to have a review at it and see if we need changes?

@kritzcreek
Copy link
Member

@f-f I'm also up for reviewing it. I took a cursory glance and it seems the main thing missing is a few tests. I'll probably also have a couple of nits on code style, but I can fix those myself unless you'd like to.

@f-f
Copy link
Member

f-f commented Jan 30, 2020

@kritzcreek I opened #3781 that supersedes this one

@jmackie
Copy link
Contributor Author

jmackie commented Jan 30, 2020

Thanks for taking this on @f-f 🙏 sorry I couldn't be more helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants