Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(typescript_indexer): emit "named" edges for imports #3934

Open
wants to merge 7 commits into
base: master
from

Conversation

@ayazhafiz
Copy link
Contributor

commented Aug 1, 2019

Due to limitations in Kythe UIs, an anchor cannot be a definition and a
reference at the same time. For an import like

import {foo} from './bar';

it would be nice to have local uses of foo "ref" foo in the import
statement and for foo in the import statement to "ref" the definition
of foo in './bar'. Since this is not possible, instead all local uses
of foo "ref" the definition of foo in './bar'.

The current way this is done is by having all Symbol VNames
corresponding to the external definition of foo also correspond to the
local definition of foo. This is somewhat unfavorable because it
breaks TypeScript semantics.

This PR introduces an alternate approach to achieving the same UI xref
functionality by exploiting Kythe "named" edges.

"named" edges connect a semantic node (i.e. a node that
represents a language entity and not a source code anchor) and an
external name for that node. They have a nice property of folding
cross-references for a UI; that is, given a graph like
(1) A -ref-> D <-named- B <-defines/binding- C
a cross-reference between A and D is generated in a UI as if the graph
were
(2) A -ref-> B <-defines/binding- C

For the TypeScript indexer, "D" in graph (1) is a local import VName.
Local usages can reference the local import, and such references will
later be folded to provide a UI xref from the local usage to the remote
definition.

Now that they are semantic nodes, local import symbols emit a "name"
node because they are identifiers for a remote definition.

This commit introduces a change in the functionality of the indexer's
entry emissions but not in its end result UI xrefs, so the change is
best classified as a refactoring.

@googlebot googlebot added the cla: yes label Aug 1, 2019

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

cc @evmar, @zzmp
cc @kythe for usage of Kythe schema

@shahms

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

We should not generally bend the graph to make the UIs "work". Generates edges have particular semantics, which it does not seem like this use actually follows.

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

Generates edges have particular semantics, which it does not seem like this use actually follows.

I agree, this definitely a hack. We are looking some way to describe an implicit alias for a symbol. Are there other edges you feel would be more appropriate? Maybe "named"?

@shahms

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

@zrlk I believe Python has similar semantics here (in that imports create a local definition which should also be treated as a reference to the imported module).

ayazhafiz added some commits Aug 1, 2019

refactor(typescript_indexer): emit "generates" edges for imports
Due to limitations in Kythe UIs, an anchor cannot be a definition and a
reference at the same time. For an import like

```typescript
import {foo} from './bar';
```

it would be nice to have local uses of `foo` "ref" `foo` in the import
statement and for `foo` in the import statement to "ref" the definition
of `foo` in './bar'. Since this is not possible, instead all local uses
of `foo` "ref" the definition of `foo` in './bar'.

The current way this is done is by having all Symbol VNames
corresponding to the external definition of `foo` also correspond to the
local definition of `foo`. This is somewhat unfavorable because it
breaks TypeScript semantics.

This PR introduces an alternate approach to acheiving the same UI xref
functionality by exploiting Kythe "generates" edges.

["generates"](https://kythe.io/docs/schema/#generates) edges connect two
semantic nodes (i.e. nodes that represent a language entity and not a
source code anchor) and are typically used for extralingual processes.
They have a nice property of folding cross-references for a UI; that is,
given a graph like
  (1) `A -ref-> D <-generates- B <-defines/binding- C`
a cross-reference between A and D is generated in a UI as if the graph
were
  (2) `A -ref-> B <-defines/binding- C`

For the TypeScript indexer, "D" in graph (1) is a local import VName.
Local usages can reference the local import, and such references will
later be folded to provide a UI xref from the local usage to the remote
definition.

Now that they are semantic nodes, local import symbols emit a
["name"](https://kythe.io/docs/schema/#name) node because they are
identifiers for a remote definition.

This commit introduces a change in the functionality of the indexer's
entry emissions but _not_ in its end result UI xrefs, so the change is
best classified as a refactoring.

@ayazhafiz ayazhafiz force-pushed the ayazhafiz:refactor/import branch from 6f27914 to 0764f38 Aug 1, 2019

ayazhafiz added some commits Aug 1, 2019

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

Moved towards using "named" edges which provide the same functionality and are abide by graph semantics.

@shahms shahms requested a review from zrlk Aug 1, 2019

@ayazhafiz ayazhafiz changed the title refactor(typescript_indexer): emit "generates" edges for imports refactor(typescript_indexer): emit "named" edges for imports Aug 1, 2019

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

The Python indexer gets creative with which part of the syntax it associates with the definition--it actually links the import (see https://github.com/google/pytype/blob/8f90fdc6a666c5af886d332ac3a5347d21a9a750/pytype/tools/xref/testdata/import.py#L13). It doesn't add the named edges or name nodes, the use of which here seems somewhat novel. In particular, the schema says that a name node should represent an "external identifier", like a JVM-internal class name or a mangled C++ symbol.

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

The Python indexer gets creative with which part of the syntax it associates with the definition--it actually links the import

What happens in the case of multiple imports? Do they all reference import?

I think the Kythe schema doesn't currently have anything that properly describes the relationship expressed here.

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

Looks like the Python indexer will drop multiple definitions on top of the import in something like from typing import Optional, Text.

The closest existing relationship that we have is aliases (https://kythe.io/docs/schema/#aliases). @schroederc, do you know whether we use aliases edges for clustering?

@schroederc

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

Looks like the Python indexer will drop multiple definitions on top of the import in something like from typing import Optional, Text.

The closest existing relationship that we have is aliases (https://kythe.io/docs/schema/#aliases). @schroederc, do you know whether we use aliases edges for clustering?

We used to cluster aliasing, but we turned it off because it was unexpected for C++ users.

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

@ayazhafiz, just to confirm, should references to the local alias be aggregated with references to the definition being aliased? Or should the local alias be an explicit stopping point that users need to click through (like how Python works now)?

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

@ayazhafiz, just to confirm, should references to the local alias be aggregated with references to the definition being aliased?

It would be nice to have this, yes. That's the change proposed here.

Or should the local alias be an explicit stopping point that users need to click through (like how Python works now)?

If the former option is not recommended, we can do this. But this is not preferred because it breaks TypeScript semantics.

To clarify further - ideally the local alias itself would be the stopping point, which could then be clicked through to the external definition (like Python, but referencing on the import name rather than import). This is not possible, so we prefer referencing the external definition rather than the local alias.

ayazhafiz added some commits Aug 1, 2019

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

I think that, to satisfy the preferred requirements, we'll need to add a new edge kind to the schema. It would have the same clustering behavior as generates going from the definition of the thing being imported to the local name for that thing, such that the definition is the cluster head. Maybe delegates? Then the local name is an ordinary variable and named/name isn't used at all.

@creachadair

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

Due to limitations in Kythe UIs, an anchor cannot be a definition and a
reference at the same time. For an import like

Maybe I missed something, but there seem to be two separate claims here, and I think at least one of them may be false:

An anchor can be both a definition and a reference. In terms of the data model, that's perfectly OK. Historically Google's internal CodeSearch tool insisted on making a single-click action prefer one or the other, which is maybe what you're referring to in this case? But either way, that is not a limitation of the UI, that's a design choice that particular tool made.

I agree with the others who've spoken out against hacking around that decision. The UI has enough information to give the user a choice, and should do so if the user cares.

@creachadair

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

Leaving aside the definition/declaration issue, it sounds like the real semantic question here is about resolution: When someone writes import foo as bar and then references bar.Baz in the code, the indexer has at least two options for what to connect to the anchor:

  1. It may refer to the binding introduced by the import statement,
  2. It may refer to the imported package.

I'd argue the indexer should record both associations if it knows them: A user who requests cross-references for the bar at from foo import bar should be able to locate all the places where that name is actually used in the module, since that's the alias that governs its use. But likewise, a user who requests cross-references for the bar at bar.Baz should be able to locate the original definition of bar as well as the local binding at import foo as bar.

How that gets surfaced in the UI is an important but—I would argue—ancillary question.

@evmar

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

Another reason to prefer the more "local" target when you have chain of

a: export foo
b: export {foo} from a
c: export {foo} from b
d: import {foo} from c; foo.whatever

If you click on the 'foo' in d and it jumps you to a, you sometimes lose how you even got to a, so recording each hop separately has value.

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

If we don't want to consider the local created by an import as ~equivalent to the thing being imported, another possible way to satisfy the requirements may be to simply emit two ref edges at each site where the local is used. One refs the local's definition; the other refs the value bound to the local by the import. Then it's up to the UI to figure out what to do. (This is just a rehash of @creachadair's comment above.)

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 2, 2019

An anchor can be both a definition and a reference. In terms of the data model, that's perfectly OK.

Agreed. I think we should strive to emit this information in every such case.

Historically Google's internal CodeSearch tool insisted on making a single-click action prefer one or the other, which is maybe what you're referring to in this case? But either way, that is not a limitation of the UI, that's a design choice that particular tool made.

Ah, that is what I am referring to; sorry for not being more clear. I mean it as a limitation in a user's interaction with the Kythe graph - that is, a user can't see that an anchor is both a definition and reference as expressed in the Kythe graph because the UI choice is to pick one.

You are right that this should be a feature requested from the UI. And also, I think because the main usage Kythe as an indexing service is to provide cross-references in a UI, we should strive to have a good solution that works around this design choice until UI representations of an anchor as a reference and definition are a feature. I could be wrong though -- this is just my intuition.

When someone writes import foo as bar and then references bar.Baz in the code, the indexer has at least options for what to connect to the anchor:

  1. It may refer to the binding introduced by the import statement,
  2. It may refer to the imported package.

I'd argue the indexer should record both associations if it knows them: A user who requests cross-references for the bar at from foo import bar should be able to locate all the places where that name is actually used in the module, since that's the alias that governs its use. But likewise, a user who requests cross-references for the bar at bar.Baz should be able to locate the original definition of bar as well as the local binding at import foo as bar.

I agree; to me, giving information about everything we know about the import usage is the best solution presented so far. I do think that the most correct association is still that bar at bar.Baz references only the local import and that bar at import foo as bar references the original definition of foo. This preserves the semantics of the language, because in fact bar is an alias of foo and is therefore a unique entity.

This is a difficult argument on all sides because we are trying to mix hacks for user convenience with the most appropriate (Kythe) graph and language semantics, so it's likely my interpretation is still not the best (if there even is a least-ambiguous one here).

a: export foo
b: export {foo} from a
c: export {foo} from b
d: import {foo} from c; foo.whatever

If you click on the 'foo' in d and it jumps you to a, you sometimes lose how you even got to a, so recording each hop separately has value.

Agreed. @craigdbarber's solution works here too; it is possible to emit a reference to each hop. I would argue that doing so would be incorrect in semantic terms, because it is not the case that foo in (d) is defined by all foos in (a, b, c, d) - each is an alias. But it does provide the more-accurate hopping information.

@creachadair

This comment has been minimized.

Copy link
Contributor

commented Aug 3, 2019

I wonder if the UI could do a better job if we marked the names introduced by imports specially—say, by giving them a subkind (variable/imported for the sake of argument). That would allow the client to distinguish "liminal" bindings from "primary" ones. It would also permit denormalizing one hop if necessary for performance reasons: If @—ref—P and @—defines/binding—P and P.node/kind variable/imported and P—ref/imports—Q, say, then folding across the ref/imports relation for */imported nodes could be done without collapsing the identity.

I haven't fully thought through the implications of this, but it occurs to me that might be one way to tease apart the layers.

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2019

Summarizing and annotating on the ideas expressed so far:

@creachadair's variable/imported subkind:

variable/imported ... would allow the client to distinguish "liminal" bindings from "primary" ones. It would also permit denormalizing one hop if necessary for performance reasons

If I'm understanding this correctly, this seems very close to @zrlk's delegates edge idea:

clustering behavior as generates going from the definition of the thing being imported to the local name for that thing, such that the definition is the cluster head. Maybe delegates?

For these two ideas it would be useful to also create a variable/importExport subkind (or similarly, a delegates edge) on an export from statement

export {foo} from './fooSource'; // TS/JS-only?

This would create a graph like

IMG_1237

(left hand variable/import subkinds, right hand delegates edges)

that would have references to the import/export hops @evmar mentioned.

A UI could use these variable/import subkinds or delegates edges to fold imports across nodes:

Alternatively, each transitive usage of AExport could reference both its most-immediate hop and the "primary" AExport definition:

This requires some anchors to be both definitions and references, which is valid in the graph but not currently supported in a UI (that is, a UI chooses an anchor to be either a definition or reference, but not both).

If anchors can be treated as both definitions and references in the UI, it is also possible to just provide hops between "local" imports/usages:

IMG_1236

This hopping mechanism can still be done in a UI that doesn't support anchors that are both definitions and references but separating the anchor defining a node and the anchor referencing from that node:

IMG_1239

This is what the Python indexer does.

In short, there are currently three ways to approach this problem:

  1. Extend a UI to support both definitions and references on one anchor
  2. Create a new node subkind or edge that allows a UI to fold hops where the UI decides is appropriate
  3. Separate the anchors defining a definition of and reference from a node as a workaround for a UI not having definitions and references one an anchor.
@creachadair

This comment has been minimized.

Copy link
Contributor

commented Aug 3, 2019

Your synopsis seems accurate to me. Only your point (3) I think may not work: CodeSearch does not actually care about the identity of anchors, only their spans within the file—since that's how it decorates the file. That may have changed since I left, but if not, having a separate anchor at the same location will not fix the separation issue in the UI. A more sophisticated UI could make use of it, though.

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2019

In (3) the anchors are emitted at different locations:

# @import defines/binding LocalBar
# @bar ref ExternalBar
from foo import bar
@creachadair

This comment has been minimized.

Copy link
Contributor

commented Aug 3, 2019

In (3) the anchors are emitted at different locations:

# @import defines/binding LocalBar
# @bar ref ExternalBar
from foo import bar

I see. But then what happens if there is more than one import?

from foo import bar, baz          # no aliases
from foo import bar as baz, quux  # aliases or mixed

I realize Google's Python style guide says you shoudn't do this, but even in Google3 it shows up in third-party code all the time. And at least from what TS code I've seen, it's pretty common to import multiple names this way too.

@shahms

This comment has been minimized.

Copy link
Contributor

commented Aug 5, 2019

For the record, this would also happen for C++ using declarations, in that a more "correct" modelling of: using ::namespace::Type; would be to treat it as equivalent to the type alias using Type = ::namespace::Type;, but with overlapping spans on Type for the reference and alias definition. We don't currently do this (we only emit the reference), but there is at least one feature request around this from what I recall.

I still think (1) is the right solution and, AFAICT, delegates would essentially be a more general form of the aliases edge, arguably obviating the latter (and potentially suggesting that we expand aliases rather than introducing a new edge).

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 5, 2019

I see. But then what happens if there is more than one import?

The local import definitions all cluster on the import anchor. See zrlk's comment.

delegates would essentially be a more general form of the aliases edge, arguably obviating the latter (and potentially suggesting that we expand aliases rather than introducing a new edge).

I agree. Though I'm not sure if there are any other situations of value aliasing except in imports/exports. Perhaps pointer aliasing, but this is a compiler feature, not a programming language feature.

@shahms

This comment has been minimized.

Copy link
Contributor

commented Aug 5, 2019

It's only "value" aliasing because Python doesn't have anything else ;-) There was a C++ proposal to allow using Name = AnotherName; to be used regardless of the kind of AnotherName but because it was reasonable it hasn't been adopted.

@hildrum

This comment has been minimized.

Copy link

commented Aug 5, 2019

Regarding your comment, Kythe may need to give the UI better information so the UI can do the right thing, but Kythe shouldn't try to work around UI choices.

There's no real constraint that the UI can't support definitions and references on one anchor. It's just that the UI needs to know what happens on a click. Right now the two options are goto the definition or open the xref panel.

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 5, 2019

Kythe may need to give the UI better information so the UI can do the right thing

What kind of additional information would be useful for Kythe to provide in this case? Is it possible to always open the xrefs panel when an anchor has definitions and references?

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 5, 2019

Expanding aliases to do what delegates is suggested to do above would cause a regression for C++ users, for whom we narrowed aliases in the first place.

I still like fixing the UI, adding both refs in (to the def of the local name and to the def of the thing being imported), and using the variable/import subkind. delegates is still an option, but it might rope in too many local names in this instance; the UI would have to figure out how to scope them to the local file if the user wanted (to avoid dumping out from a reference to an import of module foo all of the references to imports of foo everywhere).

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 7, 2019

We talked this out today and came up with a new suggestion. Use an aliases edge between the imported variable/import node and the thing it targets. To avoid the problem we ran into with C++[1], we're going to change the way aliases behaves during denormalization. Roughly, if A aliases B, references to A will roll up to B, but references to B (or A' where A' aliases B) won't roll down to A.

For imports like from foo import bar, bar should both defines/binding the variable/import and ref bar in foo[2]. For imports like from foo import bar as baz, baz defines/binding the variable/import node, while bar should still ref bar in foo.

What do you think?

[1] the historic problem had to do with a typedef like using foo = double, where someone wanted to identify places where foo was used, but was helpfully provided with all of those sites plus everything else that was a double.

[2] Pending UI changes, putting the defines/binding spans on the import for non-renamed imports should work.

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 7, 2019

For imports like from foo import bar, bar should both defines/binding the variable/import and ref bar in foo[2]. For imports like from foo import bar as baz, baz defines/binding the variable/import node, while bar should still ref bar in foo.

This is reasonable to me; it is the "optimal" solution preserving graph semantics. Where do aliases edges fit in here, if at all?

Pending UI changes, putting the defines/binding spans on the import for non-renamed imports should work.

This is okay to me too, but let's see if there are any objections, as this is a hack we may not want to spread across multiple indexers. Is there any estimate for the timeline on the UI changes (@hildrum)?

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 7, 2019

The aliases edges are necessary for usages of the local baz to appear in the imported bar's cross-references (similarly for non-renaming imports).

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 7, 2019

So in from foo import bar as baz Baz aliases Bar, and in from foo import bar LocalBar aliases FooBar? (node names just for example)

@creachadair

This comment has been minimized.

Copy link
Contributor

commented Aug 8, 2019

We talked this out today and came up with a new suggestion. Use an aliases edge between the imported variable/import node and the thing it targets.

With the change you described, I think this could work pretty well. I suppose it could be a little weird in Python where aliases might hop, e.g.

# foo/__init__.py
from _submodule import thingy as Whatzit

# app.py
from foo import Whatzit as Hoozimabob

So now @Hoozimabob—defines/binding—[HM:var/import]—aliases—[WH:var/import]—aliases—[TH:class] or words to that effect. But I expect that may mostly be Python, where only values (not bindings) have types.

For imports like from foo import bar, bar should both defines/binding the variable/import and ref bar in foo[2]. For imports like from foo import bar as baz, baz defines/binding the variable/import node, while bar should still ref bar in foo.

And presumably in this case @baz—defines/binding—[BAZ:var/import]—aliases—[foo:bar]?

I don't see any obvious explosive difficulties.

@ayazhafiz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 8, 2019

AFAICT, there is no variable/import subkind currently. We'll need to add this.

We'll need to change the description of aliases to allow values to be aliased as well.

@zrlk

This comment has been minimized.

Copy link
Contributor

commented Aug 13, 2019

The schema changes are up to variable and aliases (#3979); initial serving support will come with #3980.

ayazhafiz added a commit to ayazhafiz/kythe that referenced this pull request Aug 15, 2019

feat(typescript_indexer): emit references to hops in imports
See longer discussion at kythe#3934.

Due to limitations in Kythe UIs, an anchor cannot be a definition and a
reference at the same time. For an import like

```typescript
import {foo} from './bar';
```

it would be nice to have local uses of `foo` "ref" `foo` in the import
statement and for `foo` in the import statement to "ref" the definition
of `foo` in './bar'. Pending UI changes, instead all local uses of
`foo` "ref" the definition of `foo` in './bar'.

This PR introduces an alternate approach to acheiving more a
functionality more representative of TypeScript semantics by separating
local definitions and remote references in the same way the Python
indexer currently does (see kythe#3934).

For a code like

```typescript
import {foo} from './bar'
```

The anchor at "import" now "defines/binding" the local definition of foo,
"LocalFoo". "LocalFoo"
- is a "variable" if it's a value, a "talias" if it's a type.
- has a subkind "import".
- "aliases" the remote definition of foo, "RemoteFoo".

The anchor at "foo" now "ref" the remote definition of foo, "RemoteFoo".

For a code like

```typescript
import {foo as baz} from './bar';
```

The functionality is more as expected. The anchor at "baz"
"defines/binding" the local definition of `baz`, and the anchor at "foo"
"ref" the remote definition of `foo`.

Finally, nodes are now emitted for both value and type imports. The
biggest difference is that local import declarations that are values
have a node kind "variable", and those that are types have a node kind
"talias". Import declarations that are both values and types have two
nodes emitted, one for each kind of import.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.