Split out happy-codegen-common #221

Merged
merged 3 commits into from
Jan 20, 2022

Conversation

Collaborator

@Ericson2314 Ericson2314 commented Jan 14, 2022

There was some extra information in Grammar which had nothing to
do with the grammar, but was simply there because Grammar was also
playing the role of capturing all information from the abstract syntax.

I don't think that's good. As we really try to make libraries out
of this stuff we should be stricter and stricter about separating
concerns. Grammar should really just be that, the grammar, and the code
in happy-tabular should not be privy to information that is just for
the backend.

Splitting out a code generation CommonOptions type is a first step to rectifying this. I
hope we can do a few more refactors like this to really make each data
type shine on its own.


I don't want to sound too harsh though, since we purposely made the split
low impact with more cleanups -- such as this -- left for later. It is
of course easier to see what's good and what's bad once the code is
split up!

Also, this change calls into question my previous BookendedAbsSyn.
We're now acknowledging the abstract syntax mixes "middle" and back end
concerns, and those are not properly separated until the next step.
Given that, there isn't much use in making BookendedAbsSyn when we
could just stick the header and footer in CommonOptions.
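
To make the proposed split concrete, here is a minimal sketch of what separating a grammar-only record from a codegen-only record could look like. It is not the actual happy source: `AbsSyn`, its `abs*` fields, `productions`, and `splitAbsSyn` are hypothetical names, and which fields end up on which side (including `header` and `footer`) is an assumption made for illustration.

```haskell
-- Hypothetical sketch, not the real happy data types.

-- What a frontend might produce: everything mixed together.
data AbsSyn = AbsSyn
  { absHeader    :: Maybe String
  , absRules     :: [(String, [String])]  -- (left-hand side, right-hand side symbols)
  , absTokenType :: String
  , absExpect    :: Maybe Int
  , absFooter    :: Maybe String
  }

-- Grammar-only information: all that table construction should see.
newtype Grammar = Grammar
  { productions :: [(String, [String])]
  } deriving (Show, Eq)

-- Information that only code generation consumes (assumed field set).
data CommonOptions = CommonOptions
  { token_type :: String
  , expect     :: Maybe Int
  , header     :: Maybe String
  , footer     :: Maybe String
  } deriving (Show, Eq)

-- Separate the two concerns in a single step.
splitAbsSyn :: AbsSyn -> (Grammar, CommonOptions)
splitAbsSyn a =
  ( Grammar { productions = absRules a }
  , CommonOptions
      { token_type = absTokenType a
      , expect     = absExpect a
      , header     = absHeader a
      , footer     = absFooter a
      }
  )
```

With this shape, code that only builds tables can take a `Grammar` and never needs to import `CommonOptions`.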

@int-index
Collaborator

I’m fine with splitting out a data type, but I think that making it a separate package is overkill. Nothing stops us from making a separate package for every data type and every function, and then we could be really explicit about what depends on what, but there’s a cost to this: uploading more packages to Hackage, maintaining correct version bounds, cluttering the dependency graph for end-users, etc.

I actually think the separation between happy-grammar and happy-tabular is also gratuitous. We don’t have use cases where one would be used without the other.

There’s a trade-off between fine-grained and coarse-grained dependencies, and I’d like our decisions to be informed by (at least imaginary) use cases:

  • happy-cli can be a separate package because I can imagine using happy as a library (e.g. GHC could add built-in support for .y files)
  • happy-frontend can be a separate package because with a TH-based approach we wouldn’t be parsing AbsSyn
  • happy-backend-lalr and happy-backend-glr can be separate packages because in any given application, only one backend is likely to be used
  • happy-tabular must be separated out to provide a common interface for happy-frontend and happy-backend-* to communicate

happy-grammar can be easily merged into happy-tabular. Yes, that would mean that happy-frontend depends on more definitions than the code in it requires, but: would anyone ever use happy-grammar without happy-tabular? I don’t think so.

@int-index
Collaborator

int-index commented Jan 14, 2022

To elaborate a bit more on this, packages are a means of code distribution, and should be motivated by distribution needs.

As long as we ship happy as a single binary, it’s fine to have everything in a single package. I want to ship a TH-based version, so I’d like to factor out anything related to .y-files and CLI into their own packages, so that TH users wouldn’t depend on those components.

The backends go into their own packages because we could easily ship happy-lalr, happy-glr, and happy-rad as separate binaries, and I think most users would just pick and use one of those.

Now, what this patch is doing is related to separation of concerns rather than distribution. And I’m more than happy to separate concerns: dedicated functions, data types, and modules, are all very good (that is also the stated motivation: “make each data type shine on its own”). But there’s no need for separate .cabal packages.

@Ericson2314
Collaborator Author

Ericson2314 commented Jan 14, 2022

@int-index I agree we don't want to go overkill on the packages, and indeed it's already unwieldy. But I do think there is some utility in ripping things into too many pieces on purpose, just so you have a clean slate and more flexibility to decide how to put them back together.

I agree with your breakdown of concerns and survey of potential distributions. I agree too that happy-grammar and happy-tabular should likely be recombined. But I also think that, while this type certainly doesn't deserve its own package, it doesn't belong in any of the other existing packages either.

As far as concerns go, this stuff is really backend concerns that need an interface, so we shoehorn them into the frontend. The core middle parts of the compiler (which I think are destined to become the most general-purpose librar(y|ies)) don't care about them at all.

As far as distributions go, I think use cases along these lines are useful and plausible:

  1. Work with plain grammars, not ornamented with fused elimination rules in the YACC tradition.
  2. Something that just reports diagnostics about your grammar, like --info, and doesn't actually compile it into anything. Hell, we could have a full-on language server!
  3. Interpret a grammar. Dispense with the phase restrictions and just interpret a (possibly ornamented) grammar right into a Haskell function. Slow, but might have advantages. Also can do some ridiculously context-dependent shenanigans.

All of these would avoid the frontend, avoid the CLI, avoid the backends, and avoid this Directives type. In other words they would just use happy-grammar and happy-tabular.
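
As a rough illustration of the second use case, a diagnostics-only tool could be written against nothing but a grammar type. The sketch below is hypothetical: `Grammar`, `productions`, and `undefinedSymbols` are made-up names and do not reflect the real happy-grammar API.

```haskell
import Data.List (nub)

-- Hypothetical, simplified grammar: a list of (lhs, rhs symbols) productions.
newtype Grammar = Grammar { productions :: [(String, [String])] }

-- Report symbols used on a right-hand side that are neither declared
-- terminals nor defined by any production. No codegen options are needed.
undefinedSymbols :: [String] -> Grammar -> [String]
undefinedSymbols terminals (Grammar ps) =
  nub [ s
      | (_, rhs) <- ps
      , s <- rhs
      , s `notElem` defined
      , s `notElem` terminals
      ]
  where
    defined = map fst ps
```

Nothing in this function touches token types, monads, or headers, which is the point: such a tool would not need the directives type at all.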

I think a likely happy ending for happy-directives is for it to evolve into some sort of happy-backend-common. (And I am also thinking s/backend/codegen because codegen is intrinsic to the interface whereas backend is merely how it relates to other parts of existing use-cases.) This would contain Directives (which should be something like CommonCodegenOptions), and also your new syntax-building combinators. This makes it a meatier library more worthy of existing!

How does that sound?

@int-index
Collaborator

> How does that sound?

Yes, sounds reasonable overall. But if you put the directives in happy-codegen-common, then happy-frontend will depend on it (since it needs to produce the grammar+directives that it parses from .y), and that would be quite strange.

@Ericson2314
Collaborator Author

I think that's not so bad, because the current frontend is specific to the use-cases that do codegen.
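
The dependency shape being discussed can be sketched with three toy stages. Everything here is a stand-in chosen for illustration (the types, the function names, and the tiny `.y`-like input format are all assumptions, not happy's real interfaces): the frontend returns both a grammar and the codegen options, the middle stage sees only the grammar, and code generation consumes the options.

```haskell
-- Hypothetical stand-ins for the types owned by each package.
newtype Grammar       = Grammar [(String, [String])]
newtype CommonOptions = CommonOptions { tokenType :: String }
newtype Tables        = Tables Int  -- toy "tables": just a production count

-- Frontend: must know about CommonOptions, because it parses them.
-- Toy input format: "%tokentype T" lines and "Lhs : sym sym ..." lines.
frontend :: String -> (Grammar, CommonOptions)
frontend src = (Grammar rules, CommonOptions tok)
  where
    ls    = map words (lines src)
    rules = [ (lhs, rhs) | (lhs : ":" : rhs) <- ls ]
    tok   = case [ t | ["%tokentype", t] <- ls ] of
              (t : _) -> t
              []      -> "Token"  -- assumed default, for the sketch only

-- Middle: depends on the grammar alone.
buildTables :: Grammar -> Tables
buildTables (Grammar rs) = Tables (length rs)

-- Codegen: the only consumer of CommonOptions.
generate :: CommonOptions -> Tables -> String
generate opts (Tables n) =
  "-- token type: " ++ tokenType opts ++ ", " ++ show n ++ " productions"
```

In this arrangement `buildTables` never mentions `CommonOptions`, while `frontend` does, which mirrors the point that the current frontend is already specific to the code-generating use cases.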

@Ericson2314 Ericson2314 force-pushed the remove-off-topic-from-grammar-type branch from edd8d35 to 71edaaf Compare January 16, 2022 05:19
This temporarily undoes the patch
@Ericson2314 Ericson2314 force-pushed the remove-off-topic-from-grammar-type branch from 71edaaf to b6910c3 Compare January 16, 2022 06:34
@Ericson2314 Ericson2314 changed the title WIP: Split out happy-directives Split out happy-directives Jan 16, 2022
> token_type :: String,
> imported_identity :: Bool,
> monad :: (Bool,String,String,String,String),
> expect :: Maybe Int,
> attributes :: [(String,String)],
Collaborator

Shouldn’t attribute and attributetypes also move to CommonOptions? They’re not used by happy-tabular

Collaborator Author

Yes they should be moved out, but I figured I would deal with all attribute things in the next PR.

Also I vaguely recall only one backend supported attribute grammars?

Collaborator

Yeah, I think GLR doesn’t.

Collaborator Author

OK Yeah so the attributes stuff should be moved out to a different data type, and with that change we should get some better errors if one tries to do attribute stuff with the GLR backend.

Collaborator

I don’t know if GLR and attributes are fundamentally incompatible or if it’s a limitation of the current backend. In the latter case, we should probably still pass the attribute stuff to the backend and throw “NotImplemented” or something of this sort.
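
One way to read the two options above is sketched below: attribute settings live in their own data type, are still handed to every backend, and a backend that cannot handle them reports that explicitly. All names here (`Backend`, `AttributeOptions`, `CodegenError`, `checkAttributes`) are hypothetical and only illustrate the idea; they are not part of happy.

```haskell
-- Hypothetical sketch of attribute settings as a separate data type.
data Backend = LALR | GLR
  deriving (Show, Eq)

data AttributeOptions = AttributeOptions
  { attributes    :: [(String, String)]  -- (attribute name, type)
  , attributetype :: String
  } deriving (Show, Eq)

newtype CodegenError = NotImplemented String
  deriving (Show, Eq)

-- Every backend receives the attribute options; one that does not
-- support them fails with a clear error instead of ignoring them.
checkAttributes :: Backend -> Maybe AttributeOptions -> Either CodegenError ()
checkAttributes GLR (Just _) =
  Left (NotImplemented "attribute grammars with the GLR backend")
checkAttributes _ _ = Right ()
```

Using `Maybe AttributeOptions` keeps grammars without attributes valid for every backend, and leaves room for a backend to add support later without changing the interface.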

Collaborator

happy’s documentation is unhelpfully vague on this point:

> Currently, attribute grammars cannot be generated for GLR parsers (It's not exactly clear how these features should interact...)

Collaborator Author

Also I think if we had a separate datatype, the lack of attribute grammars during bootstrapping could perhaps become a little cleaner.

Collaborator Author

@int-index between bootstrapping and the uncertainty around GLR, are you fine leaving it as-is for now?

Collaborator

Yes, fine with me. I was concerned that happy-directives is too fine-grained, but happy-codegen-common is reasonable.

@Ericson2314 Ericson2314 changed the title Split out happy-directives Split out happy-codegen-common Jan 20, 2022
@Ericson2314 Ericson2314 merged commit 9347634 into master Jan 20, 2022
@Ericson2314 Ericson2314 deleted the remove-off-topic-from-grammar-type branch January 20, 2022 07:13