-
Notifications
You must be signed in to change notification settings - Fork 459
Abstract actions over terms #300
Conversation
It’s unused.
This allows us to ensure that they’re handled soundly.
robrix
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready for review. Note that this PR mostly enables us to add support for precise ASTs to various features, rather than actually adding it itself.
| time "parse.cmark_parse" languageTag $ | ||
| let term = cmarkParser blobSource | ||
| in length term `seq` pure term | ||
| SomeParser parser -> SomeTerm <$> runParser blob parser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing was using this.
| -> (Term syntax (FeatureVector, ann1) -> Term syntax (FeatureVector, ann2) -> Bool) | ||
| -> [Term syntax (FeatureVector, ann1)] | ||
| -> [Term syntax (FeatureVector, ann2)] | ||
| -> EditScript (Term syntax (FeatureVector, ann1)) (Term syntax (FeatureVector, ann2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use distinct type parameters for the annotations on either side of the diffing process to guarantee we don’t get them crossed.
| diffTermPair = these Diff.deleting Diff.inserting diffTerms | ||
| class DiffTerms term diff | diff -> term, term -> diff where | ||
| -- | Diff a 'These' of terms. | ||
| diffTermPair :: These (term ann1) (term ann2) -> diff ann1 ann2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstracting diffing over the term & diff types allows us to express the operation over à la carte & precise terms uniformly (when precise terms support diffing).
| SomeTerm :: ApplyAll typeclasses syntax => Term syntax ann -> SomeTerm typeclasses ann | ||
|
|
||
| withSomeTerm :: (forall syntax . ApplyAll typeclasses syntax => Term syntax ann -> a) -> SomeTerm typeclasses ann -> a | ||
| withSomeTerm with (SomeTerm term) = with term |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only Semantic.Api.Terms was using this, so I moved it in there (and eventually deleted it).
| toMap as = Map.singleton (T.pack (blobPath b)) (toJSON <$> as) | ||
|
|
||
| termToC :: (Foldable f, Functor f) => Term f (Maybe Declaration) -> [TOCSummary] | ||
| termToC = fmap (recordSummary "unchanged") . termTableOfContentsBy declaration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused.
| import Source.Loc | ||
| import Tags.Taggable | ||
|
|
||
| import qualified Language.Python as Py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we’re able to extract the various rendering classes into a new package, they live in here, and thus we need to import the various supported languages (directly or indirectly) for their term types. For the à la carte terms, this is managed via the instances for Term; for precise terms, it uses instances for the newtypes we define for their terms in each Language.* module.
Once the interfaces are in a separate package, each semantic-* language package will depend on that and provide any instances in Language.* instead.
| showTerm = serialize Show . quieterm | ||
|
|
||
| instance ShowTerm Py.Term where | ||
| showTerm = serialize Show . Py.getTerm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don’t currently use this instance due to an ongoing fight with compile times, but this is the basic reason for any of this change: as before, we want to abstract over the different types of term for each language, but with the inclusion of precise ASTs, the term types are less uniform: we can no longer expect to have Term as the top level type constructor, indexed by some unique syntax functor.
We still have the property that the terms are parameterized by the annotation type, however, so all of these classes are defined at * -> *. (And once we’re 100% on Unmarshal-based syntax, we won’t need to specialize them all to Loc annotations, but rather to any UnmarshalAnn instance; thus, stuff like quieterm will be unnecessary since we’ll just unmarshal the annotations as () instead.)
| sexprTerm :: (Carrier sig m, Member (Reader Config) sig) => term Loc -> m Builder | ||
|
|
||
| instance (ConstructorName syntax, Foldable syntax, Functor syntax) => SExprTerm (Term syntax) where | ||
| sexprTerm = serialize (SExpression ByConstructorName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an implementation of s-expression serialization for precise ASTs as of #171, but I haven’t done the leg-work to hook it up yet.
| -> Term syntax Loc | ||
| -> [Tag] | ||
| runTagging blob symbolsToSummarize | ||
| runTagging lang source symbolsToSummarize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t want to have to pass the entire Blob around in ALaCarteTerm, so I just pulled out the bits we needed.
| , Member (Error SomeException) sig | ||
| , Member Parse sig | ||
| ) | ||
| => (forall term . TermActions term => term Loc -> m a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function presents something of a problem, and I’m not happy with any of the solutions I’ve landed on thus far.
The problem: given m languages, and n features, fill in the matrix defined by their cartesian product incrementally; or, how do I add support for a language one feature at a time?
Diffing, parsing, analysis, and now symbols all have their own variations on this function, used to allow us to e.g. support parse --symbols for precise Python terms now, before we’ve added support for parse --dot, diff, or anything else. This is to some degree necessary: we don’t support analysis for Markdown or JSON, for example, because they have no computational content. Even so, it’s not without flaws:
-
We have several different functions mapping languages to parsers, which increases our surface area for bugs; there’s no one canonical mapping from languages to parsers. If we someday decide JavaScript should be parsed as TypeScript instead of TSX, we’ll have to be very careful to locate and update every place where we’re doing something
doParse-esque. -
Each such function has to be typechecked independently, and that’s not quick—checking
TermActionsfor all of these languages means solving all of its constraints for eachTerminstance, and therefore solving all of their constraints, transitively, for every element of each syntax sum. -
Repeating that effort for each superclass of
TermActionsseparately would require multipledoParseanalogues, which we’d like to avoid due to the aforementioned duplication & margin for error. I have anecdotal evidence suggesting that it’s also slower than grouping them all up underTermActionsthus. -
While we can easily parameterize
doParseby the constraint we want to satisfy usingAllowAmbiguousTypes,ConstraintKinds, &TypeApplications, doing so moves the burden of solving the constraints to each call-site, slowing down compilation significantly. It also requires us to explicitly mention all of the term types exhaustively in its context, which is tiresome to enumerate. -
Grouping multiple constraints into
TermActionsfor improved compile times means that we can’t add support for each member constraint independently of the others; thus, we can’t add--showsupport for Python terms until we’ve also implemented--dot,--json, etc. -
To the best of my knowledge, there’s no way for us to turn the static satisfiability of a given constraint into a runtime
Maybe(i.e. to returnJustfor any languages supporting the feature,Nothingfor the others, using the same function for each different feature).
In an effort to rectify this situation, I have experimented with an approach to allow us to list the parsers for ShowTerm separately of the parsers for JSONTreeTerm, gaining the compile-time benefits of statically listing all of the supported languages for a given constraint, and the flexibility of tailoring these lists to each feature, without quite so much duplication or compilation time. It’s still pretty duplicative of the map between languages and parsers, tho, and while it doesn’t seem to kill compile times straight away, more experience is needed to know whether it’s viable. I will be working on this in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a wonderful and exhaustive analysis! Many thanks for it, and many thanks for caring about compile times. I don’t have any ideas on what the best way to do this is yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately, I’ve made a ton of headway here 👍
patrickt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome. Really excited to see the precise AST stuff gearing up.
| jsonDiff :: (DiffEffects sig m) => RenderJSON m syntax -> BlobPair -> m (Rendering.JSON.JSON "diffs" SomeJSON) | ||
| jsonDiff f blobPair = doDiff blobPair (const id) f `catchError` jsonError blobPair | ||
| jsonDiff :: DiffEffects sig m => BlobPair -> m (Rendering.JSON.JSON "diffs" SomeJSON) | ||
| jsonDiff blobPair = doDiff (const id) (pure . jsonTreeDiff blobPair) blobPair `catchError` jsonError blobPair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small suggestion: doDiff (const id) is common enough that a name for it wouldn’t hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m working on this in a follow-up PR, I’ll see what I can do there.
| = let graph = renderTreeGraph diff | ||
| toEdge (Edge (a, b)) = DiffTreeEdge (diffVertexId a) (diffVertexId b) | ||
| in DiffTreeFileGraph path lang (V.fromList (vertexList graph)) (V.fromList (fmap toEdge (edgeList graph))) mempty where | ||
| path = T.pack $ pathForBlobPair blobPair | ||
| lang = bridging # languageForBlobPair blobPair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the simultaneous use of a let and a where aesthetically off-putting; can you considerate it into one or the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copy-pasta, I’d rather not change it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Tho I agree.)
| class ShowDiff diff where | ||
| showDiff :: (Carrier sig m, Member (Reader Config) sig) => diff Loc Loc -> m Builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a different name than ShowDiff? EncodeDiff? When I think Show I think of something returning a string, not a bytestring-builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is intended to use Show, so I don’t want to name it anything that obscures that fact.
| @@ -1,4 +1,4 @@ | |||
| {-# LANGUAGE GADTs, TypeOperators, DerivingStrategies #-} | |||
| {-# LANGUAGE MonoLocalBinds, RankNTypes #-} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why MonoLocalBinds here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single-instance typeclasses give you warnings without it.
| PHP -> SomeTerm <$> parse phpParser blob | ||
|
|
||
| class ShowTerm term where | ||
| showTerm :: (Carrier sig m, Member (Reader Config) sig) => term Loc -> m Builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thought on Show here? SerializeTerm might be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, it’s specifically supposed to use Show.
| , Member (Error SomeException) sig | ||
| , Member Parse sig | ||
| ) | ||
| => (forall term . TermActions term => term Loc -> m a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a wonderful and exhaustive analysis! Many thanks for it, and many thanks for caring about compile times. I don’t have any ideas on what the best way to do this is yet.
This PR abstracts actions on terms and diffs over the term/diff types instead of the syntax type.