Abstract actions over terms #300

robrix · 2019-10-02T03:48:16Z

This PR abstracts actions on terms and diffs over the term/diff types instead of the syntax type.


robrix

Ready for review. Note that this PR mostly enables us to add support for precise ASTs to various features, rather than actually adding it itself.

robrix · 2019-10-02T03:48:39Z

src/Control/Carrier/Parse/Measured.hs

    time "parse.cmark_parse" languageTag $
      let term = cmarkParser blobSource
      in length term `seq` pure term
-  SomeParser parser -> SomeTerm <$> runParser blob parser


Nothing was using this.

robrix · 2019-10-02T03:49:33Z

src/Diffing/Algorithm/RWS.hs

+    -> (Term syntax (FeatureVector, ann1) -> Term syntax (FeatureVector, ann2) -> Bool)
+    -> [Term syntax (FeatureVector, ann1)]
+    -> [Term syntax (FeatureVector, ann2)]
+    -> EditScript (Term syntax (FeatureVector, ann1)) (Term syntax (FeatureVector, ann2))


We use distinct type parameters for the annotations on either side of the diffing process to guarantee we don’t get them crossed.
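A minimal sketch of the guarantee. It uses hypothetical, simplified stand-ins rather than Semantic's real Term and EditScript types. pairUp is a naive positional pairing, not the RWS algorithm. Inside pairUp, ann1 and ann2 are distinct rigid type variables, so the type checker rejects a right-hand term in a Delete or a left-hand term in an Insert.

```haskell
-- Hypothetical, simplified stand-ins; not Semantic's actual Term or EditScript.
data Term ann = Leaf ann String
  deriving (Eq, Show)

data Edit l r = Delete l | Insert r | Compare l r
  deriving (Eq, Show)

-- A naive positional pairing (NOT the RWS algorithm). Because ann1 and ann2
-- are separate rigid type variables here, swapping l and r in any Edit
-- constructor below would be a type error.
pairUp
  :: (Term ann1 -> Term ann2 -> Bool)
  -> [Term ann1]
  -> [Term ann2]
  -> [Edit (Term ann1) (Term ann2)]
pairUp comparable (l:ls) (r:rs)
  | comparable l r = Compare l r : pairUp comparable ls rs
  | otherwise      = Delete l : Insert r : pairUp comparable ls rs
pairUp _ ls [] = map Delete ls
pairUp _ [] rs = map Insert rs
```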

robrix · 2019-10-02T03:50:24Z

src/Diffing/Interpreter.hs

-diffTermPair = these Diff.deleting Diff.inserting diffTerms
+class DiffTerms term diff | diff -> term, term -> diff where
+  -- | Diff a 'These' of terms.
+  diffTermPair :: These (term ann1) (term ann2) -> diff ann1 ann2


Abstracting diffing over the term & diff types allows us to express the operation over à la carte & precise terms uniformly (when precise terms support diffing).
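A minimal sketch of what the fundep-based class buys us, under three assumptions:

- Term and Diff are hypothetical stand-ins.
- These is defined locally rather than imported from the these package.
- The instance merely wraps both sides in Replaced, where a real instance would diff them recursively.

Because of the dependencies term -> diff and diff -> term, GHC can infer the diff type from the term type at each call site.

```haskell
{-# LANGUAGE FunctionalDependencies #-}

-- Local stand-in for Data.These (from the `these` package), to stay self-contained.
data These a b = This a | That b | These a b

-- Hypothetical term and diff types; not Semantic's real definitions.
newtype Term ann = Term (ann, String)
  deriving (Eq, Show)

data Diff ann1 ann2
  = Deleted (Term ann1)
  | Inserted (Term ann2)
  | Replaced (Term ann1) (Term ann2)
  deriving (Eq, Show)

-- The functional dependencies let GHC determine the diff type from the term
-- type (and vice versa), so callers never have to annotate both.
class DiffTerms term diff | diff -> term, term -> diff where
  diffTermPair :: These (term ann1) (term ann2) -> diff ann1 ann2

instance DiffTerms Term Diff where
  diffTermPair (This a)    = Deleted a
  diffTermPair (That b)    = Inserted b
  -- A real instance would recursively diff a against b here.
  diffTermPair (These a b) = Replaced a b
```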

robrix · 2019-10-02T03:50:55Z

src/Parsing/Parser.hs

-  SomeTerm :: ApplyAll typeclasses syntax => Term syntax ann -> SomeTerm typeclasses ann
-
-withSomeTerm :: (forall syntax . ApplyAll typeclasses syntax => Term syntax ann -> a) -> SomeTerm typeclasses ann -> a
-withSomeTerm with (SomeTerm term) = with term


Only Semantic.Api.Terms was using this, so I moved it in there (and eventually deleted it).
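For reference, here is a minimal sketch of the existential pattern that SomeTerm/withSomeTerm implement. It is simplified to a single constraint c, where the original used ApplyAll typeclasses. Term and size are hypothetical stand-ins.

```haskell
{-# LANGUAGE ConstraintKinds, GADTs, RankNTypes #-}

-- Hypothetical à la carte-style term: an annotation plus one layer of syntax.
data Term syntax ann = In ann (syntax (Term syntax ann))

-- A simplified SomeTerm: hides the syntax functor, keeping evidence for 'c syntax'.
data SomeTerm c ann where
  SomeTerm :: c syntax => Term syntax ann -> SomeTerm c ann

-- Eliminating the existential hands the hidden syntax functor (and the
-- evidence for 'c syntax') to a polymorphic continuation.
withSomeTerm :: (forall syntax . c syntax => Term syntax ann -> a) -> SomeTerm c ann -> a
withSomeTerm with (SomeTerm term) = with term

-- Count the nodes of any term whose syntax functor is Foldable.
size :: Foldable syntax => Term syntax ann -> Int
size (In _ children) = 1 + foldr (\child total -> size child + total) 0 children
```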

robrix · 2019-10-02T03:51:02Z

src/Rendering/TOC.hs

-    toMap as = Map.singleton (T.pack (blobPath b)) (toJSON <$> as)
-
-    termToC :: (Foldable f, Functor f) => Term f (Maybe Declaration) -> [TOCSummary]
-    termToC = fmap (recordSummary "unchanged") . termTableOfContentsBy declaration


robrix · 2019-10-02T12:51:41Z

src/Semantic/Api/Terms.hs

 import           Source.Loc
-import           Tags.Taggable
+
+import qualified Language.Python as Py


Until we’re able to extract the various rendering classes into a new package, they live in here, and thus we need to import the various supported languages (directly or indirectly) for their term types. For the à la carte terms, this is managed via the instances for Term; for precise terms, it uses instances for the newtypes we define for their terms in each Language.* module.

Once the interfaces are in a separate package, each semantic-* language package will depend on that and provide any instances in Language.* instead.
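A minimal sketch of how one class can cover both kinds of term. The types are hypothetical: NodeCount, PyTerm, PyModule, and this Term are illustrative, not Semantic's. The à la carte instance is written for Term syntax, partially applied so that it has kind * -> *. The precise instance is written for a per-language newtype, in the spirit of the Py.Term/getTerm pair used by the ShowTerm instance in this PR.

```haskell
-- Hypothetical à la carte term: an annotation plus one layer of syntax.
data Term syntax ann = In ann (syntax (Term syntax ann))

-- Hypothetical precise AST for one language, wrapped in a newtype so the
-- language module can give it instances.
data PyModule ann = PyModule ann [PyStatement ann]
data PyStatement ann = PyPass ann | PyExpr ann String

newtype PyTerm ann = PyTerm { getTerm :: PyModule ann }

-- A feature class at kind * -> *: it abstracts over the term type
-- constructor and leaves the annotation type free.
class NodeCount term where
  nodeCount :: term ann -> Int

-- À la carte terms: one instance covers every Foldable syntax functor.
instance Foldable syntax => NodeCount (Term syntax) where
  nodeCount (In _ children) = 1 + foldr (\child total -> nodeCount child + total) 0 children

-- Precise terms: the instance is written against the newtype instead
-- (one node for the module plus one per top-level statement).
instance NodeCount PyTerm where
  nodeCount = (1 +) . length . statements . getTerm
    where statements (PyModule _ stmts) = stmts
```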

robrix · 2019-10-02T12:55:34Z

src/Semantic/Api/Terms.hs

+  showTerm = serialize Show . quieterm
+
+instance ShowTerm Py.Term where
+  showTerm = serialize Show . Py.getTerm


We don’t currently use this instance due to an ongoing fight with compile times, but this is the basic reason for any of this change: as before, we want to abstract over the different types of term for each language, but with the inclusion of precise ASTs, the term types are less uniform: we can no longer expect to have Term as the top level type constructor, indexed by some unique syntax functor.

We still have the property that the terms are parameterized by the annotation type, however, so all of these classes are defined at kind * -> *. (And once we’re 100% on Unmarshal-based syntax, we won’t need to specialize them all to Loc annotations; we can instead constrain them to any UnmarshalAnn instance. Thus, stuff like quieterm will be unnecessary, since we’ll just unmarshal the annotations as () instead.)
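A minimal sketch of what quieterm-style annotation erasure amounts to for an à la carte term. It assumes a simplified Term type and a hypothetical quiet function, not Semantic's actual quieterm. Given a Functor instance over the annotation, replacing every annotation with () is just fmap (const ()).

```haskell
-- Hypothetical à la carte term: an annotation plus one layer of syntax.
data Term syntax ann = In ann (syntax (Term syntax ann))

-- Map over every annotation in the tree, leaving the syntax untouched.
instance Functor syntax => Functor (Term syntax) where
  fmap f (In ann children) = In (f ann) (fmap (fmap f) children)

-- Hypothetical analogue of quieterm: forget all annotations.
quiet :: Functor syntax => Term syntax ann -> Term syntax ()
quiet = fmap (const ())

-- Collect annotations in pre-order, to observe the effect of quiet.
annotations :: Foldable syntax => Term syntax ann -> [ann]
annotations (In ann children) = ann : foldr (\child rest -> annotations child ++ rest) [] children
```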

robrix · 2019-10-02T12:56:52Z

src/Semantic/Api/Terms.hs

+  sexprTerm :: (Carrier sig m, Member (Reader Config) sig) => term Loc -> m Builder
+
+instance (ConstructorName syntax, Foldable syntax, Functor syntax) => SExprTerm (Term syntax) where
+  sexprTerm = serialize (SExpression ByConstructorName)


We have an implementation of s-expression serialization for precise ASTs as of #171, but I haven’t done the leg-work to hook it up yet.

robrix · 2019-10-02T12:57:45Z

src/Tags/Tagging.hs

           -> Term syntax Loc
           -> [Tag]
-runTagging blob symbolsToSummarize
+runTagging lang source symbolsToSummarize


I didn’t want to have to pass the entire Blob around in ALaCarteTerm, so I just pulled out the bits we needed.

robrix · 2019-10-02T13:42:54Z

src/Semantic/Api/Terms.hs

+     , Member (Error SomeException) sig
+     , Member Parse sig
+     )
+  => (forall term . TermActions term => term Loc -> m a)


This function presents something of a problem, and I’m not happy with any of the solutions I’ve landed on thus far.

The problem: given m languages and n features, fill in the matrix defined by their Cartesian product incrementally; or, how do I add support for a language one feature at a time?

Diffing, parsing, analysis, and now symbols all have their own variations on this function, used to allow us to e.g. support parse --symbols for precise Python terms now, before we’ve added support for parse --dot, diff, or anything else. This is to some degree necessary: we don’t support analysis for Markdown or JSON, for example, because they have no computational content. Even so, it’s not without flaws:

We have several different functions mapping languages to parsers, which increases our surface area for bugs; there’s no one canonical mapping from languages to parsers. If we someday decide JavaScript should be parsed as TypeScript instead of TSX, we’ll have to be very careful to locate and update every place where we’re doing something doParse-esque.

Each such function has to be typechecked independently, and that’s not quick—checking TermActions for all of these languages means solving all of its constraints for each Term instance, and therefore solving all of their constraints, transitively, for every element of each syntax sum.

Repeating that effort for each superclass of TermActions separately would require multiple doParse analogues, which we’d like to avoid due to the aforementioned duplication & margin for error. I have anecdotal evidence suggesting that it’s also slower than grouping them all up under TermActions thus.

While we can easily parameterize doParse by the constraint we want to satisfy using AllowAmbiguousTypes, ConstraintKinds, & TypeApplications, doing so moves the burden of solving the constraints to each call-site, slowing down compilation significantly. It also requires us to explicitly mention all of the term types exhaustively in its context, which is tiresome to enumerate.

Grouping multiple constraints into TermActions for improved compile times means that we can’t add support for each member constraint independently of the others; thus, we can’t add --show support for Python terms until we’ve also implemented --dot, --json, etc.

To the best of my knowledge, there’s no way for us to turn the static satisfiability of a given constraint into a runtime Maybe (i.e. to return Just for any languages supporting the feature, Nothing for the others, using the same function for each different feature).
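A minimal sketch of the constraint-parameterized doParse alternative discussed in this comment. All names are hypothetical stand-ins, not Semantic's code: Language, PyTerm, JSONTerm, NodeCount, and this doParse. The caller picks the constraint with a type application. doParse's context still names every term type, though, and each call site has to solve the constraint for all of them.

```haskell
{-# LANGUAGE AllowAmbiguousTypes, ConstraintKinds, FlexibleContexts, KindSignatures, RankNTypes, TypeApplications #-}
import Data.Kind (Constraint, Type)

data Language = Python | JSON
  deriving (Eq, Show)

-- Hypothetical per-language term types.
newtype PyTerm ann = PyTerm [ann]
newtype JSONTerm ann = JSONTerm ann

-- 'c' is chosen by the caller via a type application. The context must list
-- every term type doParse can produce, so each caller solves c for all of them.
doParse
  :: forall (c :: (Type -> Type) -> Constraint) r . (c PyTerm, c JSONTerm)
  => Language -> String -> (forall term . c term => term () -> r) -> r
doParse Python src with = with (PyTerm (map (const ()) (lines src)))
doParse JSON   _   with = with (JSONTerm ())

-- A hypothetical feature class used to exercise doParse.
class NodeCount (term :: Type -> Type) where
  nodeCount :: term ann -> Int

instance NodeCount PyTerm where
  nodeCount (PyTerm xs) = length xs

instance NodeCount JSONTerm where
  nodeCount _ = 1
```

A call such as doParse @NodeCount Python src nodeCount works, but it only compiles because NodeCount has instances for every term type in doParse's context.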

In an effort to rectify this situation, I have experimented with an approach to allow us to list the parsers for ShowTerm separately from the parsers for JSONTreeTerm, gaining the compile-time benefits of statically listing all of the supported languages for a given constraint, and the flexibility of tailoring these lists to each feature, without quite so much duplication or compilation time. It’s still pretty duplicative of the map between languages and parsers, tho, and while it doesn’t seem to kill compile times straight away, more experience is needed to know whether it’s viable. I will be working on this in a follow-up PR.
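A minimal sketch of the general per-feature table idea, again with hypothetical names. ShowTerm, showParsers, and withParser are illustrative. SomeParser is a local redefinition, not the type removed in c0ecbb1. None of this is the follow-up PR's code. Each feature gets its own list of parsers whose term types are statically known to satisfy that feature's constraint. lookup then turns unsupported languages into Nothing at runtime.

```haskell
{-# LANGUAGE ConstraintKinds, GADTs, KindSignatures, RankNTypes #-}
import Data.Kind (Constraint, Type)

data Language = Python | JSON
  deriving (Eq, Show)

-- Hypothetical precise term type for one language.
newtype PyTerm ann = PyTerm [ann]

-- A parser packaged with evidence that its term type satisfies 'c'.
data SomeParser (c :: (Type -> Type) -> Constraint) where
  SomeParser :: c term => (String -> term ()) -> SomeParser c

class ShowTerm (term :: Type -> Type) where
  showTerm :: term () -> String

instance ShowTerm PyTerm where
  showTerm (PyTerm xs) = "PyTerm with " ++ show (length xs) ++ " statements"

-- Only Python supports ShowTerm so far; a term type without a ShowTerm
-- instance could not be placed in this table at all.
showParsers :: [(Language, SomeParser ShowTerm)]
showParsers = [(Python, SomeParser (\src -> PyTerm (map (const ()) (lines src))))]

-- Languages missing from a feature's table surface as Nothing at runtime.
withParser
  :: [(Language, SomeParser c)]
  -> Language -> String
  -> (forall term . c term => term () -> r)
  -> Maybe r
withParser table lang src with = case lookup lang table of
  Just (SomeParser parse) -> Just (with (parse src))
  Nothing                 -> Nothing
```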

This is a wonderful and exhaustive analysis! Many thanks for it, and many thanks for caring about compile times. I don’t have any ideas on what the best way to do this is yet.

Fortunately, I’ve made a ton of headway here 👍

patrickt

This is awesome. Really excited to see the precise AST stuff gearing up.

patrickt · 2019-10-02T15:06:36Z

src/Semantic/Api/Diffs.hs

-jsonDiff :: (DiffEffects sig m) => RenderJSON m syntax -> BlobPair -> m (Rendering.JSON.JSON "diffs" SomeJSON)
-jsonDiff f blobPair = doDiff blobPair (const id) f `catchError` jsonError blobPair
+jsonDiff :: DiffEffects sig m => BlobPair -> m (Rendering.JSON.JSON "diffs" SomeJSON)
+jsonDiff blobPair = doDiff (const id) (pure . jsonTreeDiff blobPair) blobPair `catchError` jsonError blobPair


Small suggestion: doDiff (const id) is common enough that a name for it wouldn’t hurt.

I’m working on this in a follow-up PR, I’ll see what I can do there.

patrickt · 2019-10-02T15:08:11Z

src/Semantic/Api/Diffs.hs

+    = let graph = renderTreeGraph diff
+          toEdge (Edge (a, b)) = DiffTreeEdge (diffVertexId a) (diffVertexId b)
+      in DiffTreeFileGraph path lang (V.fromList (vertexList graph)) (V.fromList (fmap toEdge (edgeList graph))) mempty where
+        path = T.pack $ pathForBlobPair blobPair
+        lang = bridging # languageForBlobPair blobPair


I find the simultaneous use of a let and a where aesthetically off-putting; can you consolidate it into one or the other?

This is copy-pasta, I’d rather not change it in this PR.

(Tho I agree.)

patrickt · 2019-10-02T15:08:55Z

src/Semantic/Api/Diffs.hs

+class ShowDiff diff where
+  showDiff :: (Carrier sig m, Member (Reader Config) sig) => diff Loc Loc -> m Builder


Maybe a different name than ShowDiff? EncodeDiff? When I think Show, I think of something returning a string, not a bytestring-builder.

It is intended to use Show, so I don’t want to name it anything that obscures that fact.

patrickt · 2019-10-02T15:09:36Z

src/Semantic/Api/Symbols.hs

@@ -1,4 +1,4 @@
-{-# LANGUAGE GADTs, TypeOperators, DerivingStrategies #-}
+{-# LANGUAGE MonoLocalBinds, RankNTypes #-}


Out of curiosity, why MonoLocalBinds here?

Single-instance typeclasses give you warnings without it.

patrickt · 2019-10-02T15:10:19Z

src/Semantic/Api/Terms.hs

-  PHP        -> SomeTerm <$> parse phpParser blob
+
+class ShowTerm term where
+  showTerm :: (Carrier sig m, Member (Reader Config) sig) => term Loc -> m Builder


Same thought on Show here? SerializeTerm might be better?

Again, it’s specifically supposed to use Show.

patrickt · 2019-10-02T15:16:14Z

src/Semantic/Api/Terms.hs

+     , Member (Error SomeException) sig
+     , Member Parse sig
+     )
+  => (forall term . TermActions term => term Loc -> m a)


This is a wonderful and exhaustive analysis! Many thanks for it, and many thanks for caring about compile times. I don’t have any ideas on what the best way to do this is yet.

robrix added 30 commits October 1, 2019 11:31

🔥 SomeParser.

c0ecbb1

It’s unused.

Don’t pass the blob into tagging.

e9fc612

Don’t pass the blob into contextualizing.

de5de5f

Don’t pass the blob into runTagging.

937df79

Generalize renderPreciseToSymbols to any term with a ToTags instance.

1bc2511

Define a helper to provide a ToTags instance for à la carte terms.

09f95d6

Render à la carte terms to symbols via ToTags.

a43e947

Define renderToSymbols using renderPreciseToSymbols.

44b0614

Combine the code paths.

7259059

Parameterize ALaCarteTerm by the symbols to summarize.

f092a30

Use ToTags for the legacy tagging API as well.

f9c20bc

renderToSymbols is pure.

371e0c0

Generalize renderToSymbols for legacy tagging.

11091cf

Copy in our own version of doParse & SomeTerm.

785d523

doParse handles the PerLanguageModes.

152e94a

Don’t specialize parseSymbols for the PerLanguageModes.

0f8da48

🔥 ParseEffects.

598dce7

Factor out the term construction.

05e0086

🔥 the HasTextElement & Taggable obligations from TermConstraints.

5de3d14

Simplify the Traversable obligation down to just Foldable/Functor.

a97e42a

Move SomeTerm/withSomeTerm into Semantic.Api.Terms.

5c6df16

🔥 the Declarations1 obligation.

cca474b

🔥 redundant parens.

7aaaa64

Define TermConstraints as a class.

02f46c6

Reformat the signature for doParse.

13e3788

Export ToSExpression.

5e1b21c

Reformat the doParse signature.

7101733

Align all the blobs.

a060a15

Eliminate the term directly in doParse.

6f5e2ff

Align the blobs.

19ac4d2

robrix added 18 commits October 1, 2019 22:02

Render diffs to JSON graphs using an abstract interface.

8fc85e8

Spacing.

7aa36de

Use distinct type parameters for the annotations on either side.

377d824

This allows us to ensure that they’re handled soundly.

Diff via an abstract interface.

0c5342a

The diff & term types are mutually supporting.

68ea7e1

🔥 withSomeTermPair.

5fe4b1d

🔥 redundant parens.

e5685f9

Summarize diffs using an abstract interface.

f08ac25

🔥 renderToCTerm.

fea2338

Summarize diffs legacy-wise using an abstract interface.

08247c7

Diff using an abstract interface.

bc62f32

🔥 a redundant language pragma.

c847990

Merge branch 'master' into abstract-actions-over-terms

ab72710

🔥 redundant language extensions.

9b1c99f

🔥 more redundant language extensions.

8af4158

🔥 yet more redundant language extensions.

7647102

Redefine TermActions as a constraint synonym since we’re specialized to it.

1fdecb0

Redefine DiffActions as a constraint synonym.

9f68d5b

robrix commented Oct 2, 2019


robrix marked this pull request as ready for review October 2, 2019 13:43

robrix added 3 commits October 2, 2019 09:59

🔥 a redundant import.

7312997

🔥 a redundant binding.

eb922e7

Fix the tests.

714aac1

robrix requested a review from a team October 2, 2019 14:16

robrix mentioned this pull request Oct 2, 2019

Switch to proto-lens #296

Merged

patrickt approved these changes Oct 2, 2019


Patrick Thomson added 2 commits October 2, 2019 11:20

Merge branch 'master' into abstract-actions-over-terms

7d93f39

Merge branch 'master' into abstract-actions-over-terms

ca2f8a6

robrix merged commit 394936e into master Oct 2, 2019

robrix deleted the abstract-actions-over-terms branch October 2, 2019 17:39



3 participants