Skip to content
This repository was archived by the owner on Apr 1, 2025. It is now read-only.

Conversation

@robrix
Copy link
Contributor

@robrix robrix commented Oct 2, 2019

This PR abstracts actions on terms and diffs over the term/diff types instead of the syntax type.

robrix added 30 commits October 1, 2019 11:31
It’s unused.
Copy link
Contributor Author

@robrix robrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for review. Note that this PR mostly enables us to add support for precise ASTs to various features, rather than actually adding it itself.

time "parse.cmark_parse" languageTag $
let term = cmarkParser blobSource
in length term `seq` pure term
SomeParser parser -> SomeTerm <$> runParser blob parser
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing was using this.

-> (Term syntax (FeatureVector, ann1) -> Term syntax (FeatureVector, ann2) -> Bool)
-> [Term syntax (FeatureVector, ann1)]
-> [Term syntax (FeatureVector, ann2)]
-> EditScript (Term syntax (FeatureVector, ann1)) (Term syntax (FeatureVector, ann2))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use distinct type parameters for the annotations on either side of the diffing process to guarantee we don’t get them crossed.

diffTermPair = these Diff.deleting Diff.inserting diffTerms
class DiffTerms term diff | diff -> term, term -> diff where
-- | Diff a 'These' of terms.
diffTermPair :: These (term ann1) (term ann2) -> diff ann1 ann2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstracting diffing over the term & diff types allows us to express the operation over à la carte & precise terms uniformly (when precise terms support diffing).

SomeTerm :: ApplyAll typeclasses syntax => Term syntax ann -> SomeTerm typeclasses ann

withSomeTerm :: (forall syntax . ApplyAll typeclasses syntax => Term syntax ann -> a) -> SomeTerm typeclasses ann -> a
withSomeTerm with (SomeTerm term) = with term
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Semantic.Api.Terms was using this, so I moved it in there (and eventually deleted it).

toMap as = Map.singleton (T.pack (blobPath b)) (toJSON <$> as)

termToC :: (Foldable f, Functor f) => Term f (Maybe Declaration) -> [TOCSummary]
termToC = fmap (recordSummary "unchanged") . termTableOfContentsBy declaration
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

import Source.Loc
import Tags.Taggable

import qualified Language.Python as Py
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we’re able to extract the various rendering classes into a new package, they live in here, and thus we need to import the various supported languages (directly or indirectly) for their term types. For the à la carte terms, this is managed via the instances for Term; for precise terms, it uses instances for the newtypes we define for their terms in each Language.* module.

Once the interfaces are in a separate package, each semantic-* language package will depend on that and provide any instances in Language.* instead.

showTerm = serialize Show . quieterm

instance ShowTerm Py.Term where
showTerm = serialize Show . Py.getTerm
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t currently use this instance due to an ongoing fight with compile times, but this is the basic reason for any of this change: as before, we want to abstract over the different types of term for each language, but with the inclusion of precise ASTs, the term types are less uniform: we can no longer expect to have Term as the top level type constructor, indexed by some unique syntax functor.

We still have the property that the terms are parameterized by the annotation type, however, so all of these classes are defined at * -> *. (And once we’re 100% on Unmarshal-based syntax, we won’t need to specialize them all to Loc annotations, but rather to any UnmarshalAnn instance; thus, stuff like quieterm will be unnecessary since we’ll just unmarshal the annotations as () instead.)

sexprTerm :: (Carrier sig m, Member (Reader Config) sig) => term Loc -> m Builder

instance (ConstructorName syntax, Foldable syntax, Functor syntax) => SExprTerm (Term syntax) where
sexprTerm = serialize (SExpression ByConstructorName)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an implementation of s-expression serialization for precise ASTs as of #171, but I haven’t done the leg-work to hook it up yet.

-> Term syntax Loc
-> [Tag]
runTagging blob symbolsToSummarize
runTagging lang source symbolsToSummarize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t want to have to pass the entire Blob around in ALaCarteTerm, so I just pulled out the bits we needed.

, Member (Error SomeException) sig
, Member Parse sig
)
=> (forall term . TermActions term => term Loc -> m a)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function presents something of a problem, and I’m not happy with any of the solutions I’ve landed on thus far.

The problem: given m languages, and n features, fill in the matrix defined by their cartesian product incrementally; or, how do I add support for a language one feature at a time?

Diffing, parsing, analysis, and now symbols all have their own variations on this function, used to allow us to e.g. support parse --symbols for precise Python terms now, before we’ve added support for parse --dot, diff, or anything else. This is to some degree necessary: we don’t support analysis for Markdown or JSON, for example, because they have no computational content. Even so, it’s not without flaws:

  • We have several different functions mapping languages to parsers, which increases our surface area for bugs; there’s no one canonical mapping from languages to parsers. If we someday decide JavaScript should be parsed as TypeScript instead of TSX, we’ll have to be very careful to locate and update every place where we’re doing something doParse-esque.

  • Each such function has to be typechecked independently, and that’s not quick—checking TermActions for all of these languages means solving all of its constraints for each Term instance, and therefore solving all of their constraints, transitively, for every element of each syntax sum.

  • Repeating that effort for each superclass of TermActions separately would require multiple doParse analogues, which we’d like to avoid due to the aforementioned duplication & margin for error. I have anecdotal evidence suggesting that it’s also slower than grouping them all up under TermActions thus.

  • While we can easily parameterize doParse by the constraint we want to satisfy using AllowAmbiguousTypes, ConstraintKinds, & TypeApplications, doing so moves the burden of solving the constraints to each call-site, slowing down compilation significantly. It also requires us to explicitly mention all of the term types exhaustively in its context, which is tiresome to enumerate.

  • Grouping multiple constraints into TermActions for improved compile times means that we can’t add support for each member constraint independently of the others; thus, we can’t add --show support for Python terms until we’ve also implemented --dot, --json, etc.

  • To the best of my knowledge, there’s no way for us to turn the static satisfiability of a given constraint into a runtime Maybe (i.e. to return Just for any languages supporting the feature, Nothing for the others, using the same function for each different feature).

In an effort to rectify this situation, I have experimented with an approach to allow us to list the parsers for ShowTerm separately of the parsers for JSONTreeTerm, gaining the compile-time benefits of statically listing all of the supported languages for a given constraint, and the flexibility of tailoring these lists to each feature, without quite so much duplication or compilation time. It’s still pretty duplicative of the map between languages and parsers, tho, and while it doesn’t seem to kill compile times straight away, more experience is needed to know whether it’s viable. I will be working on this in a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a wonderful and exhaustive analysis! Many thanks for it, and many thanks for caring about compile times. I don’t have any ideas on what the best way to do this is yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately, I’ve made a ton of headway here 👍

@robrix robrix marked this pull request as ready for review October 2, 2019 13:43
@robrix robrix requested a review from a team October 2, 2019 14:16
@robrix robrix mentioned this pull request Oct 2, 2019
Copy link
Contributor

@patrickt patrickt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome. Really excited to see the precise AST stuff gearing up.

jsonDiff :: (DiffEffects sig m) => RenderJSON m syntax -> BlobPair -> m (Rendering.JSON.JSON "diffs" SomeJSON)
jsonDiff f blobPair = doDiff blobPair (const id) f `catchError` jsonError blobPair
jsonDiff :: DiffEffects sig m => BlobPair -> m (Rendering.JSON.JSON "diffs" SomeJSON)
jsonDiff blobPair = doDiff (const id) (pure . jsonTreeDiff blobPair) blobPair `catchError` jsonError blobPair
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small suggestion: doDiff (const id) is common enough that a name for it wouldn’t hurt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m working on this in a follow-up PR, I’ll see what I can do there.

Comment on lines +99 to +103
= let graph = renderTreeGraph diff
toEdge (Edge (a, b)) = DiffTreeEdge (diffVertexId a) (diffVertexId b)
in DiffTreeFileGraph path lang (V.fromList (vertexList graph)) (V.fromList (fmap toEdge (edgeList graph))) mempty where
path = T.pack $ pathForBlobPair blobPair
lang = bridging # languageForBlobPair blobPair
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the simultaneous use of a let and a where aesthetically off-putting; can you considerate it into one or the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy-pasta, I’d rather not change it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Tho I agree.)

Comment on lines +120 to +121
class ShowDiff diff where
showDiff :: (Carrier sig m, Member (Reader Config) sig) => diff Loc Loc -> m Builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a different name than ShowDiff? EncodeDiff? When I think Show I think of something returning a string, not a bytestring-builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intended to use Show, so I don’t want to name it anything that obscures that fact.

@@ -1,4 +1,4 @@
{-# LANGUAGE GADTs, TypeOperators, DerivingStrategies #-}
{-# LANGUAGE MonoLocalBinds, RankNTypes #-}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why MonoLocalBinds here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single-instance typeclasses give you warnings without it.

PHP -> SomeTerm <$> parse phpParser blob

class ShowTerm term where
showTerm :: (Carrier sig m, Member (Reader Config) sig) => term Loc -> m Builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought on Show here? SerializeTerm might be better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it’s specifically supposed to use Show.

, Member (Error SomeException) sig
, Member Parse sig
)
=> (forall term . TermActions term => term Loc -> m a)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a wonderful and exhaustive analysis! Many thanks for it, and many thanks for caring about compile times. I don’t have any ideas on what the best way to do this is yet.

@robrix robrix merged commit 394936e into master Oct 2, 2019
@robrix robrix deleted the abstract-actions-over-terms branch October 2, 2019 17:39
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants