Reference external MyST sites with labels and standard reference syntax (xref) #1111

Closed
Tracked by #1106
choldgraf opened this issue Apr 16, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@choldgraf
Member

choldgraf commented Apr 16, 2024

Currently, we allow for cross-referencing many kinds of external content. However, we do not support cross-references to other MyST sites.

It would be useful if you could cross-reference another MyST site's content via labels and a unique identifier for that site, similar to how the Intersphinx referencing syntax works.

For example, in MyST site 1, define a link like:

(myLabel)=
### My section

Some text

And in MyST site 2, have configuration like:

references:
  myst1: mysite.org/mystdocs

And in content, something like:

Here's a reference to [](myst1#myLabel)
@rowanc1
Member

rowanc1 commented Apr 16, 2024

Related:

We could use the references section to define new "schemes"

references:
  compass: compass.executablebooks.org
  2i2c: 2i2c.org/docs/infrastructure
  gh_myst:
    url: https://github.com/executablebooks/mystmd
    kind: github

[](compass:#team) and @2i2c:#deploying, [](gh_myst:#1111), etc. all parse to URIs, and the protocol (e.g. compass:) is what we check against the references list. This is the same as the current behaviour, but drops myst: as the protocol, which is easier to read. There are a few default protocols included at the moment (wiki, rrid), which fit with this same scheme. Regardless of whether a protocol (e.g. inv: or myst:) is prepended, this should be differentiated from internal links in an explicit way.
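
As a rough illustration of the "check the protocol against the references list" idea, here is a minimal Python sketch. The function name `classify_link`, the category labels, and the `STANDARD_SCHEMES` set are all hypothetical and not taken from mystmd; the point is only that a prefix matching a key in `references` is treated differently from an internal `#label` link.

```python
# Hypothetical sketch: none of these names come from mystmd itself.
STANDARD_SCHEMES = {"http", "https", "ftp", "mailto"}

# Mirrors the example config above.
references = {
    "compass": "compass.executablebooks.org",
    "2i2c": "2i2c.org/docs/infrastructure",
    "gh_myst": {
        "url": "https://github.com/executablebooks/mystmd",
        "kind": "github",
    },
}


def classify_link(target, references):
    """Classify a link target by the protocol-like prefix before the first ':'."""
    if target.startswith("#"):
        return ("internal", None)
    prefix, sep, _rest = target.partition(":")
    # A colon that only appears after '#' or '/' is not a protocol separator.
    if not sep or "#" in prefix or "/" in prefix:
        return ("internal", None)
    if prefix in references:
        return ("external-reference", prefix)
    if prefix.lower() in STANDARD_SCHEMES:
        return ("standard", prefix.lower())
    return ("unknown", prefix)


print(classify_link("compass:#team", references))
print(classify_link("#team", references))
```

The prefix is split by hand rather than with a URL parser because a key such as `2i2c` starts with a digit, which RFC 3986 does not permit in a scheme, so a strict parser may not treat it as one.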

@agoose77
Collaborator

I'm a 👍 on xref as a scheme - it marries well with our MEP on link handling. I think the autolink syntax looks most natural, and crucially doesn't require us to invent new syntax.

e.g.

<xref:gh_myst:#1111>

For MyST-MD, I feel that we want to unify intersphinx and non-intersphinx xrefs (i.e. MyST sites) under a references table, using explicit type: myst fields to indicate which kind of reference the key represents.

When it comes to different kinds of xrefs, e.g. GitHub, I am not against a single scheme per type of xref, e.g. sphinx: and myst: schemes.

<gh:executablebooks/myst#1111>

or

<gh:executablebooks/myst#1111>

These kinds of schemes would be inherently more understandable (i.e. we know sphinx xrefs are Sphinx), because I think a GitHub issue link (especially how it's resolved) is somewhat different to the deviations between MyST and intersphinx.

But, this is all very hot off the press, so I'll keep thinking on it.

@rowanc1
Member

rowanc1 commented Apr 30, 2024

In talking through with @fwkoch today, these were our notes on where we are headed shortly. The plan should need only minimal changes to our config/reference syntax. It covers a new file that we write out with the myst-reference declarations, how to use those in markdown (going with a protocol per definition, with myst, inv, and xref reserved), and how that exports to the AST. The theme shouldn't actually need changes.


Exposing cross-references: write out a myst-friendly xref object, which is just JSON. This should be very similar to the objects.inv. In addition to the objects.inv that we write to the site folder, write a myst.xref.json. This should have a version, the myst client version, and a list of the cross-references (the file is JSON; the structure is shown here as YAML):

version: 1
myst: 1.2.1
references:
  - identifier: fig:abc
    kind: figure
    html_id: fig-abc   # This is optional, and only included if different
    url: page
    data: some/other/page.json
    implicit: true

This gives us the myst identifier (we don't need the label) and the html_id, which might differ from the identifier and is only included when it does. The url is relative to the location of myst.xref.json and is a URL location (i.e. / for folder separators). The data field is the URL location of the content as data; we don't aim to bring the content in here, and instead direct the client to where it is stored.
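
To make the "relative to the location of myst.xref.json" rule concrete, here is a small Python sketch of how a client might read the file and absolutize each entry. The function name `load_xref`, the example site URL, and the payload values are assumptions for illustration only; the field names follow the listing above.

```python
import json
from urllib.parse import urljoin

# Illustrative payload using the fields described above (values are made up).
XREF_JSON = """
{
  "version": 1,
  "myst": "1.2.1",
  "references": [
    {"identifier": "fig:abc", "kind": "figure", "html_id": "fig-abc",
     "url": "page", "data": "some/other/page.json", "implicit": true}
  ]
}
"""


def load_xref(text, xref_location):
    """Parse a myst.xref.json payload and resolve its relative locations.

    `xref_location` is the absolute URL the file was fetched from.
    """
    inventory = json.loads(text)
    entries = []
    for ref in inventory["references"]:
        entries.append({
            **ref,
            # html_id is optional; when omitted it equals the identifier.
            "html_id": ref.get("html_id", ref["identifier"]),
            # Both locations are relative to where myst.xref.json lives.
            "url": urljoin(xref_location, ref["url"]),
            "data": urljoin(xref_location, ref["data"]),
        })
    return entries


entries = load_xref(XREF_JSON, "https://mysite.org/mystdocs/myst.xref.json")
print(entries[0]["url"], entries[0]["data"])
```

Only the location of the data is resolved here; the content itself is left for the client to fetch, in line with the note above.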

To reference these, your myst.yml should look like:

project:
  references:
    # All of these should just be able to be specified with just the URL as a string
    jb:
      kind: intersphinx
      url: https://jupyterbook.org
    mystmd:
      kind: myst
      url: https://mystmd.org
    gh-mt:
      kind: github-issue
      url: https://github.com/executablebooks/myst-theme

These define URL protocols, with the following names reserved: ["http", "https", "ftp", "mailto", "myst", "inv", "xref"].
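
A minimal sketch of how this config could be validated, assuming the list above is meant as the set of protocol names that a reference key must not shadow. The function name `normalize_references` is hypothetical, and leaving `kind` unset for the string shorthand is an assumption, since the notes do not say how the kind would be determined in that case.

```python
# Names taken from the list above; treating them as forbidden keys is an assumption.
RESERVED_PROTOCOLS = {"http", "https", "ftp", "mailto", "myst", "inv", "xref"}


def normalize_references(references):
    """Expand string shorthands and reject keys that shadow a reserved protocol."""
    normalized = {}
    for key, value in references.items():
        if key.lower() in RESERVED_PROTOCOLS:
            raise ValueError(
                f"reference key {key!r} collides with a reserved protocol"
            )
        if isinstance(value, str):
            # Shorthand form: only the URL is given, so kind stays unset here.
            value = {"url": value}
        normalized[key] = {"kind": value.get("kind"), "url": value["url"]}
    return normalized


print(normalize_references({
    "jb": "https://jupyterbook.org",
    "mystmd": {"kind": "myst", "url": "https://mystmd.org"},
}))
```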

This allows the following URLs:

  • [](jb:#fig:abc) is equivalent to [](jb:#fig-abc) (html_id vs identifier doesn't matter)
  • [](jb:page#introduction) is a link to an implicit reference, which needs the page
  • [](py:#zipapp-specifying-the-interpreter)
  • [See Figure {{number}} in mystmd](mystmd:#fig-1)
  • [](gh-mt:#100)
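
The first two bullets can be sketched as a lookup over the entries of a remote inventory. This is a hypothetical Python illustration, not mystmd code: `parse_link` and `find_reference` are made-up names, the sample entries are invented, and the rule that an implicit entry only matches when a page is given is my reading of "needs the page".

```python
def parse_link(link):
    """Split 'key:page#target' into (key, page or None, target)."""
    key, _, rest = link.partition(":")
    page, _, target = rest.partition("#")
    return key, (page or None), target


def find_reference(entries, target, page=None):
    """Match a target against identifier or html_id; implicit entries need a page."""
    for entry in entries:
        ids = {entry["identifier"], entry.get("html_id", entry["identifier"])}
        if target not in ids:
            continue
        same_page = page is not None and entry["url"].strip("/") == page.strip("/")
        if entry.get("implicit", False):
            # Implicit ids (e.g. headings) can repeat across pages.
            if same_page:
                return entry
        elif page is None or same_page:
            return entry
    return None


# Invented sample entries in the shape of the myst.xref.json references list.
entries = [
    {"identifier": "fig:abc", "html_id": "fig-abc", "kind": "figure", "url": "figures"},
    {"identifier": "introduction", "kind": "heading", "url": "page", "implicit": True},
    {"identifier": "introduction", "kind": "heading", "url": "other", "implicit": True},
]

key, page, target = parse_link("jb:page#introduction")
print(find_reference(entries, target, page))
```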

This will create an AST for a link similar to:

{
  "type": "crossReference",
  "kind": "heading",
  "identifier": "setex-headings",
  "label": "setex-headings",
  "children": [
    {
      "type": "text",
      "value": "Setext Headings"
    }
  ],
  "html_id": "setex-headings",
  "remote": true,
  "url": "/commonmark",
  "dataUrl": "https://remote.url/commonmark.json"
}

At this point, we shouldn't need any changes from the theme as long as the CORS is set correctly on the JSON response.

@agoose77
Collaborator

agoose77 commented May 3, 2024

@rowanc1, @fwkoch, and I had a short meeting to think on the user experience side of things (now that the hard work has been done by the aforementioned two).

Briefly, there are two syntax suggestions:

  • <xref:jb/page1#heading>
  • <jb/page1#heading>

i.e., does the xref source act as a scheme or a hostname?

There were the following "topics" of conversation identified:

  • User experience of authoring xrefs with a scheme prefix vs no prefix
  • Compatibility of existing inv scheme for Sphinx inventories with MyST analogue
  • Compatibility with CommonMark document ingest
  • Overlap of custom schemes with reserved IANA schemes
  • What does a scheme mean?

We had some good discussion, and settled on using an explicit xref scheme for the following reasons:

  • xref links should not be so common in a single paragraph that they harm readability
  • schemes are reserved and meaningful, so we should be hesitant to make them arbitrary
  • Users may choose schemes that later MyST versions wish to implement, which would break MyST sites.

The xref scheme should follow the rules of a URI [1] (see here for some URI examples, and here for the spec). If we don't support the authority component (i.e., if we don't require our URIs to have the form: xref://) then everything after xref: (mostly) is the path whose syntax we can (mostly) dictate.

There is already an example of where we've thought about this; the Intersphinx reference of this demo. myst-parser here is using inv like a URN of the form foo:bar:baz:hello:world:.... We do not have to use a URN, and I think it makes less sense for MyST xrefs:

  • MyST xrefs feel more like a location than a name, so a URL feels more appropriate.

We can choose the meaning of the path. For example, with #heading1 on page2, we could write

  • <xref:jb/page2#heading1>
  • <xref:jb:page2#heading1>
  • <xref:page2@jb#heading1>
  • <xref:jb.page2#heading1>

etc.

Note

The / character is a formal separator character, although for our purposes that doesn't matter.

We settled on a hybrid "URL"-style locator (although at the time the above discussion of URL-URN-URI was missing), which probably reflects the way we think about xrefs:

<xref:KEY/PAGE#TARGET>
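
For illustration, the locator above can be split with Python's standard `urllib.parse.urlsplit`, since it is a URI without an authority component. The function name `parse_xref` and the returned dictionary shape are hypothetical; treating everything after the first `/` as the page is also an assumption.

```python
from urllib.parse import urlsplit


def parse_xref(uri):
    """Split '<xref:KEY/PAGE#TARGET>' into its key, page, and target parts."""
    parts = urlsplit(uri.strip("<>"))
    if parts.scheme != "xref":
        raise ValueError(f"not an xref URI: {uri!r}")
    if parts.netloc:
        # 'xref://host/...' would carry an authority, which this form omits.
        raise ValueError("xref URIs do not take an authority (//host) component")
    key, _, page = parts.path.partition("/")
    if not key:
        raise ValueError("xref URI is missing its KEY")
    return {"key": key, "page": page or None, "target": parts.fragment or None}


print(parse_xref("<xref:jb/page2#heading1>"))
```

Both PAGE and TARGET are optional in this sketch, so `<xref:key#id>` and `<xref:key>` parse as well.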

With this syntax, we might need to think about how domains are represented. Maybe we need a special scheme xref+sphinx, or a way to map sphinx names foo:bar onto URL components.

The difference to a "URL" is that we don't require an authority component (the //host in a URL), so the path is directly embedded.

@rowanc1 also pointed out that in the future, we may have MyST-authored content living at a DOI address, meaning that we could define a DOI as an xref source in order to pull out structured assets!

Footnotes

  1. A really big source of confusion is URI vs URL vs URN. My explanation is: a URI is a generic way to identify a resource. A URL is a colloquial term for a "network"-based resource (so URI and URL are mostly interchangeable for such resources). URNs refer to names rather than locations. Again, like URLs, URNs are just a special name for a style of URI; we can write something that looks like a URN without calling it a URN.

@choldgraf
Member Author

choldgraf commented May 5, 2024

In case it's useful, this is how Sphinx seems to recommend doing cross-refs with intersphinx nowadays:

https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html#role-external

It's with a dedicated role that follows a form like external+key:type and then the identifiers go in the role body.

A small nitpick: does it need to be the full "xref", or is it reasonable to use something shorter like "xr" or a symbol like "@"? Just trying to think of how to minimize visual clutter and keystrokes, but it's not a deal breaker.

@rowanc1
Member

rowanc1 commented May 7, 2024

Thanks for the writeup, @agoose77! I foresee a follow-up MEP. @fwkoch and I have implemented it! We're merging in the first PR shortly; the following still needs to be improved:

  • Replace default xref text for links of the format <xref:key#id> (📐 Replace xref text when using angle brackets #1183)
  • Support these xrefs with {embed}
  • Get project title instead of index page title from <xref:key> (surface project information)
  • Deprecate identifier on crossReference, etc. nodes in favour of target
  • Recursive cross-references in other sites (need to set context or something on the xref node, and use that to fetch) (🔗 Add remote cross-project link to AST #1187)
  • Rethink cache policy for cross-references: re-fetch after 1 day, based on file created time. Also, cache individual page JSON responses. Finally, add this same re-fetch behaviour to cached links after 30 days.
  • Figure out CORS for GitHub Pages
  • Move types to myst-common
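
The cache-policy item in the list above could look roughly like the following Python sketch. It is only an illustration: `is_stale` is a hypothetical helper, and it uses the file's modification time as a portable stand-in for "file created time", which is an assumption rather than what the implementation does.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds


def is_stale(path, max_age_seconds=ONE_DAY, now=None):
    """Return True when a cached file is missing or older than max_age_seconds."""
    now = time.time() if now is None else now
    try:
        # Assumption: modification time approximates when the cache was written.
        written = os.path.getmtime(path)
    except OSError:
        return True  # nothing cached yet, so it must be fetched
    return (now - written) > max_age_seconds


# Hypothetical cache paths: cross-references after 1 day, cached links after 30 days.
print(is_stale("_build/cache/myst.xref.json"))
print(is_stale("_build/cache/links.json", 30 * ONE_DAY))
```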

@rowanc1 rowanc1 changed the title Reference external MyST sites with labels and standard reference syntax Reference external MyST sites with labels and standard reference syntax (xref) May 7, 2024
@agoose77
Collaborator

agoose77 commented May 7, 2024

@rowanc1 I think this initial implementation is complete as far as the issue brief. I'll open a new issue to track some of the remaining TODOs.

@agoose77 agoose77 closed this as completed May 7, 2024
@agoose77
Collaborator

agoose77 commented May 7, 2024

(Closed by #1180)
