# markdown.obsidian.links
> Functions for parsing internal links in [Obsidian.md](https://obsidian.md/) style markdown.

Obsidian uses both Markdown style links and Wikilinks as [internal links](https://help.obsidian.md/How+to/Internal+link). Markdown style links are of the form `[text_shown](link)`, whereas Wikilinks are of the form `[[link_to_markdown#possible_anchor_to_header|text_shown]]`. Either kind of link is prefixed with an exclamation mark `!` when it is embedded.

Within Obsidian, it is often more convenient to use Wikilinks for Vault-internal links for several reasons:

- Obsidian automatically searches for matching links and aliases to offer as auto-completions while a Wikilink is being constructed.
![Obsidian_link_autocomplete_example.gif](/images/markdown_obsidian_links_Obsidian_link_autocomplete_example.gif)
- Wikilinks allow the space character ` `, whereas Markdown style links require each space character ` ` to be replaced with `%20`.

Nevertheless, Markdown style links offer the following features that Wikilinks lack:

- Markdown style links can contain external links (whether the links point to other Obsidian vaults or to a URL)
- Markdown style links can render LaTeX text.
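
As a rough illustration of the two link forms described above, the following sketch formats the same destination both as a Wikilink and as a Markdown style link. The helper names `as_wikilink` and `as_markdown_link` are hypothetical and are not part of `trouver`; the only behavior assumed is the `%20` replacement mentioned above.

```python
# Hypothetical helpers for illustration only; they are not part of trouver.
def as_wikilink(note: str, anchor: str = '', text: str = '') -> str:
    """Format a Wikilink; spaces are kept as they are."""
    anchor_part = f'#{anchor}' if anchor else ''
    text_part = f'|{text}' if text else ''
    return f'[[{note}{anchor_part}{text_part}]]'


def as_markdown_link(note: str, anchor: str = '', text: str = '') -> str:
    """Format a Markdown style link; spaces in the destination become %20."""
    anchor_part = f'#{anchor}' if anchor else ''
    destination = f'{note}{anchor_part}'.replace(' ', '%20')
    return f'[{text}]({destination})'


wiki = as_wikilink('some note', 'A header', 'shown text')
md = as_markdown_link('some note', 'A header', 'shown text')
print(wiki)  # [[some note#A header|shown text]]
print(md)    # [shown text](some%20note#A%20header)
```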


In [None]:
#| default_exp markdown.obsidian.links

In [None]:
#| export
from __future__ import annotations
from deprecated import deprecated
from enum import Enum
from pathlib import Path
import re
from trouver.helper import (
    find_regex_in_text, text_from_file
)
from trouver.markdown.obsidian.vault import (
    all_paths_to_notes_in_vault, VaultNote, NoteDoesNotExistError
)
from typing import Union


In [None]:
#| export
# TODO Make it so that these patterns don't capture latex code
INTERNAL_LINK_PATTERN = r'!?\[\[.*?\]\]' 
WIKILINK_PATTERN = r'!?\[\[[^\]]+\]\]'
EMBEDDED_WIKILINK_PATTERN = r'!\[\[[^\]]+\]\]'
# WIKILINK_CAPTURE = r'!?\[\[([^#\]\|]+)(#[^\]\|]+)?(\|[^\]]+)?\]\]'
# Note that MARKDOWNLINK_PATTERN captures whitespace characters in its link, even though Obsidian
# does not. This is implemented to find any misformatted links in the Obsidian Markdown files.
MARKDOWNLINK_PATTERN = r'!?\[[^\]]+\]\([^)]+\)'  
EMBEDDED_MARKDOWNLINK_PATERN = r'!\[[^\]]+\]\([^)]+\)'
EMBEDDED_PATTERN = f'{EMBEDDED_WIKILINK_PATTERN}|{EMBEDDED_MARKDOWNLINK_PATERN}'
# MARKDOWNLINK_CAPTURE = r'!?\[([^\]]+)\]\(([^)#])+(#[^)]+)?\)'
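
To get a feel for what these patterns match, here is a small sketch that applies three of them to a sample string with `re.findall`. The patterns are copied from the cell above so that the snippet runs on its own; the sample string is made up for illustration.

```python
import re

# Copies of the patterns defined above, repeated so the snippet is self-contained.
WIKILINK_PATTERN = r'!?\[\[[^\]]+\]\]'
EMBEDDED_WIKILINK_PATTERN = r'!\[\[[^\]]+\]\]'
MARKDOWNLINK_PATTERN = r'!?\[[^\]]+\]\([^)]+\)'

# A made-up sample containing one plain Wikilink, one embedded Wikilink,
# and one Markdown style link.
sample = 'See [[note#Header|text]], ![[embedded_note]] and [shown](some%20note).'

wikilinks = re.findall(WIKILINK_PATTERN, sample)
embedded = re.findall(EMBEDDED_WIKILINK_PATTERN, sample)
markdown_links = re.findall(MARKDOWNLINK_PATTERN, sample)

print(wikilinks)       # ['[[note#Header|text]]', '![[embedded_note]]']
print(embedded)        # ['![[embedded_note]]']
print(markdown_links)  # ['[shown](some%20note)']
```

Note that `WIKILINK_PATTERN` matches embedded and non-embedded Wikilinks alike because of the optional leading `!?`.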

In [None]:
from os import PathLike
from fastcore.test import *

## Finding links in text via indices

In [None]:
#| export
def find_links_in_markdown_text(
        text: str
        ) -> list[tuple]: # Each tuple is of the form `(a,b)` where `text[a:b]` is an obsidian internal link.
    """Returns ranges in the markdown text string
    where internal links occur.

    # TODO: rename this function, say to link_ranges_in_text, 
    # because it is confusing when there is a links_from_text function below.

    **See Also**

    - `links_from_text`
    """
    regex = f'{WIKILINK_PATTERN}|{MARKDOWNLINK_PATTERN}'
    return find_regex_in_text(text, pattern=regex)


`find_links_in_markdown_text` returns a list of index ranges `(start, end)` such that `text[start:end]` is a link in the string.

In [None]:
# TODO: add markdown links to example

In [None]:
tutorial_text = r'''
This is an Obsidian note. It has some [[this_is_the_note_to_which_the_link_points|links]]!
Links are pretty neat. They can [[this_text_is_not_actually_shown|connect notes]] for you.
The following will create a link to the note `some_note`; the displayed text is `some_note`: [[some_note]]
You can also embed the contents of one note into another note. ![[note_being_embedded]].
The contents of `note_being_embedded` will be displayed when you view the note in Obsidian's view mode.
You can make anchors in links. For example [[note#This is a header title]] is a link to the note named
`note` and more specifically to the header with title `This is a header title`.

The above links are all Wikilinks. Obsidian also supports Markdownlinks, e.g. [This is the text shown](This is the link.)

If the note of a link does not exist in an Obsidian vault, then Obsidian will create the note.
Even if the note does not have a header with title specified by the anchor of a link, Obsidian
will still open the note; it will not go to any particular header, however.
'''

ranges = find_links_in_markdown_text(tutorial_text)
match_strs = [tutorial_text[start:end] for start, end in ranges]
test_eq(match_strs, [
    '[[this_is_the_note_to_which_the_link_points|links]]', 
    '[[this_text_is_not_actually_shown|connect notes]]',
    '[[some_note]]',
    '![[note_being_embedded]]',
    '[[note#This is a header title]]',
    '[This is the text shown](This is the link.)'])
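
The same idea can be tried without `trouver` installed. The sketch below uses a hypothetical stand-in, `link_ranges`, which assumes that `find_regex_in_text` behaves like collecting the `(start, end)` span of every `re.finditer` match; that assumption is based only on the return type documented above. The sample also includes Markdown style links, one of them embedded.

```python
import re

# Copies of the patterns defined earlier, repeated so the snippet is self-contained.
WIKILINK_PATTERN = r'!?\[\[[^\]]+\]\]'
MARKDOWNLINK_PATTERN = r'!?\[[^\]]+\]\([^)]+\)'


def link_ranges(text: str) -> list[tuple[int, int]]:
    """Hypothetical stand-in for `find_links_in_markdown_text`.

    Assumes that each range is the span of a regex match.
    """
    regex = f'{WIKILINK_PATTERN}|{MARKDOWNLINK_PATTERN}'
    return [match.span() for match in re.finditer(regex, text)]


text = 'A [[note|wikilink]], then ![image](pic.png) and [site](https://example.com).'
ranges = link_ranges(text)
links = [text[start:end] for start, end in ranges]
print(links)
```

Slicing the original string with each range recovers the link text, which is how the examples in this section display their results.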

### Longer Example

The following example is of a note whose contents are based on an excerpt from Vakil's *The Rising Sea - Foundations of Algebraic Geometry*.

In [None]:
text = r'''
---
cssclass: clean-embeds
aliases: [foag_relative_cotangent_sheaf, foag_relative_tangent_sheaf]
tags: [_meta/definition, _meta/notation, _meta/literature_note, _reference/vakil_rising_sea, relative_cotangent_sheaf, relative_tangent_sheaf, diagonal_map/scheme, conormal_sheaf, dual/coherent_sheaf, morphism/schemes]
---
# Global definition[^1]
Let $\pi: X \rightarrow Y$ be a morphism of schemes. Define the **relative cotangent sheaf** $\Omega_{X/Y}$ or $\Omega_{\pi}$ as the [[foag_conormal_sheaf_of_a_locally_closed_embedding#For a locally closed embedding 2 4|conormal sheaf]] $\mathscr{N}^\vee_{X/X \times_Y X}$ of the diagonal[^2].

# Relative tangent sheaf
Define the **relative tangent sheaf** $\mathscr{T}_{X/Y}$ as the [[foag_dual_sheaf|dual]] $\mathcal{Hom}(\Omega_{X/Y}, \mathscr{O}_X)$[^3]

[^3]: ![[foag_notation_Hom_sheaf_hom]]

# Other
We now define $\mathrm{d}: \mathscr{O}_{\mathrm{X}} \rightarrow \Omega_{\mathrm{X} / \mathrm{Y}}$. Let $\operatorname{pr}_{1}: X \times_{\mathrm{Y}} X \rightarrow X$ and $\operatorname{pr}_{2}: X \times_{\mathrm{Y}} X \rightarrow X$ be the two projections. Then define $\mathrm{d}: \mathscr{O}_{\mathrm{X}} \rightarrow \Omega_{\mathrm{X} / \mathrm{Y}}$ on the open set $\mathrm{U}$ as follows:
$$
\mathrm{d} f=\operatorname{pr}_{2}^{*} f-\operatorname{pr}_{1}^{*} f
$$
(Warning: this is not a morphism of quasicoherent sheaves on $X$, although it is $\mathscr{O}_{\mathrm{Y}}$-linear in the only possible meaning of that phrase.) We will soon see that $\mathrm{d}$ is indeed a derivation of the sheaf $\mathscr{O}_{\mathrm{X}}$ (in the only possible meaning of the phrase), and at the same time see that our new notion of differentials agrees with our old definition on affine open sets, and hence globalizes the definition. Note that for any open subset $U \subset X$, $\mathrm{d}$ induces a map
$$
\Gamma\left(\mathrm{U}, \mathscr{O}_{\mathrm{X}}\right) \rightarrow \Gamma\left(\mathrm{U}, \Omega_{\mathrm{X} / \mathrm{Y}}\right)
$$
which we also call d, and interpret as "taking the derivative".

# See Also
- [[foag_notation_T_X_Y_relative_tangent_sheaf]]
- [[foag_notation_Omega_X_Y_relative_cotangent_sheaf]]
- [[foag_21.2.Q]]
- [[foag_sheaf_of_relative_i_forms]]

# Meta
## References
![[_reference_foag]]

## Citations and Footnotes
[^1]: Vakil, 21.2.20, Page 572
[^2]: Note that the diagonal morphism is a locally closed embedding.
'''

ranges = find_links_in_markdown_text(text)
for match_range in ranges:
    print(text[match_range[0]:match_range[1]])

[[foag_conormal_sheaf_of_a_locally_closed_embedding#For a locally closed embedding 2 4|conormal sheaf]]
[[foag_dual_sheaf|dual]]
![[foag_notation_Hom_sheaf_hom]]
[[foag_notation_T_X_Y_relative_tangent_sheaf]]
[[foag_notation_Omega_X_Y_relative_cotangent_sheaf]]
[[foag_21.2.Q]]
[[foag_sheaf_of_relative_i_forms]]
![[_reference_foag]]


## `ObsidianLink` class

#### Exceptions

In [None]:
#| export
class LinkFormatError(Exception):
    """Error that is raised when a string cannot be parsed as an
    `ObsidianLink` object.
    
    **Attribute**

    - `text` - `str`
    """
    def __init__(self, text):
        self.text = text
        super().__init__(f'Obsidian Markdown link is not formatted properly: {text}')

#### `ObsidianLink` class

In [None]:
#| export
class LinkType(Enum):
    """An Enumeration indicating whether an `ObsidianLink` object is a
    Wikilink or a Markdown-style link.

    Enumerates `LinkType.WIKILINK` and `LinkType.MARKDOWN`.
    """
    # See https://www.markdownguide.org/basic-syntax/
    WIKILINK = 0
    MARKDOWN = 1  
    # For Markdown links, use %20 to encode spaces in the link, e.g.
    # [asdf](localization_of_a_module#Localization%20of%20a%20module%201)
    # Links to the header `"Localization of a module 1"` in the file
    # localization_of_a_module



In [None]:
#| export
# TODO: implement equality, copy
class ObsidianLink:
    """Object representing an Obsidian link.
    
    **Attributes**

    - `is_embedded` - `bool`
        - Whether or not the link is embedded.
    - `file_name` - `str`, or `-1`
        - The destination of the link. It is either 
        
          1. The Obsidian-vault-recognized name of the file that the link
          points to. It can be a path relative to the Obsidian vault path 
          without the file extension (.md), 
          2. an external link, such as a URL, or
          3. -1, in which case the object represents a generic link pointing
          to any file (this is for generating regex).
          
          Note that if `file_name` is the empty string, then the link is a
          link to the same file

    - `anchor` - `str`, `0`, or `-1`
        - The title of the header of the anchor in the destination that the
        link points to or the ID to the markdown block link (preceded by a
        caret `^`). If 0, then the `ObsidianLink` object represents a link
        without an anchor. If -1, then the object represents a generic link
        with or without an anchor (this is for generating regex).
    - `custom_text` - `str`, `0`, or `-1`
        - The custom text of the link. If 0, then the `ObsidianLink` object
        represents an internal link without custom text. If -1, then the
        object represents a generic internal link of any custom text (this
        is for generating regex).
    - `link_type` - `LinkType`
        - If `LinkType.WIKILINK`, then the str should be of the format
        `'[[<Obsidian-vault-recognized-name>(#anchor)?(|custom_text)?]]'` 
        (The question marks here indicate optional components). Otherwise,
        the str should be a more standard Markdown link. Defaults to
        `LinkType.WIKILINK`.
    
    **Parameters**

    - is_embedded - bool
    - file_name - str or int
        - Use `-1` for a generic link pointing to any file.
    - anchor - str or int
    - custom_text - str or int
    - link_type - `LinkType`
    """
    
    def __init__(
            self, is_embedded: bool, file_name: Union[str, int],
            anchor: Union[str, int], custom_text: Union[str, int],
            link_type: LinkType = LinkType.WIKILINK):
        self.is_embedded = is_embedded
        self.file_name = file_name
        self.anchor = anchor
        self.custom_text = custom_text
        self.link_type = link_type


    @staticmethod
    def from_text(text: str) -> ObsidianLink:
        """Returns an ObsidianLink object from text.
                
        **Raises**

        - LinkFormatError
            - If the text is not properly formatted as an Obsidian internal link.
        """
        is_embedded = text.startswith("!")
        regex_object = re.compile(r"!?\[\[([^#\|]*?)(#(.*?))?(\|(.*?))?\]\]")
        matches = regex_object.match(text)
        if matches:
            file_name = matches.group(1)
            anchor = matches.group(3)
            custom_text = matches.group(5)
            link_type = LinkType.WIKILINK
        else:
            regex_object = re.compile(r'!?\[([^\]]*)\]\(([^)#]+)(#([^)]+))?\)')
            matches = regex_object.match(text)
            if not matches:
                raise LinkFormatError(text)
            file_name = matches.group(2).replace('%20', ' ')
            anchor = matches.group(4)
            if anchor:
                anchor = anchor.replace('%20', ' ')
            custom_text = matches.group(1)
            link_type = LinkType.MARKDOWN
        if anchor is None:
            anchor = 0
        if custom_text is None:
            custom_text = 0
        return ObsidianLink(is_embedded, file_name, anchor, custom_text, link_type)


    def to_regex(self) -> str:
        """Returns the regex that this `ObsidianLink` object represents.

        Assumes that `self.file_name`, `self.anchor`, and `self.custom_text` are
        regex-formatted strings, e.g. if self.custom_text is `denotes?`, then the
        outputted regex-pattern matches links whose custom text is either `denote`
        or `denotes`.

        If neither `self.file_name`, `self.anchor` nor `self.custom_text` is `-1`,
        then the regex will in fact be a concrete string.

        **Returns**
        - str
            - Representing a regex.
        """
        embedding = '!' if self.is_embedded else ''

        if type(self.file_name) == str:
            filing = self.file_name
        else:  # self.file_name == -1
            filing = r'([^#\|]*)?'
        
        if type(self.anchor) == str:
            anchoring = f'#{self.anchor}'
        elif self.anchor == 0:
            anchoring = ''
        else:  # self.anchor == -1
            anchoring = '(#(.*?))?'
          
        if type(self.custom_text) == str and self.link_type == LinkType.WIKILINK:
            customing = fr'\|{self.custom_text}'
        elif type(self.custom_text) == str and self.link_type == LinkType.MARKDOWN:
            customing = self.custom_text
        elif self.custom_text == 0:
            customing = ''
        else:  # self.custom_text == -1
            if self.link_type == LinkType.MARKDOWN:
                customing = fr'(.*?)?'
            else:
                customing = fr'(\|(.*?))?'

        if self.link_type == LinkType.WIKILINK:
            return fr'{embedding}\[\[{filing}{anchoring}{customing}\]\]'
        else:
            # Markdown links format whitespace with '%20'
            filing = filing.replace(' ' , '%20')  
            anchoring = anchoring.replace(' ', '%20')
            return fr'{embedding}\[{customing}\]\({filing}{anchoring}\)'
    
    def __str__(self) -> str:
        # TODO: Choose what to do about | vs. \|.
        return self.to_string()

    def to_string(self) -> str:
        """Returns the string for the link if it is concrete.
        
        **Returns**
        - str
        
        **Raises**
        - ValueError
            - If `self.file_name`, `self.anchor` or `self.custom_text`
            is -1, i.e. the link is abstract and does not represent a
            single concrete string.
        """
        if self.is_abstract():
            raise ValueError(
                f'The ObsidianLink object is abstract.'
            )
        assert (self.anchor != -1 and self.custom_text != -1
                and self.file_name != -1)
        embedding = '!' if self.is_embedded else ''

        if type(self.anchor) == str:
            anchoring = f'#{self.anchor}'
        else:  # self.anchor == 0
            anchoring = ''
          
        if type(self.custom_text) == str:
            if self.link_type == LinkType.WIKILINK:
                customing = fr'|{self.custom_text}'
            else:
                customing = self.custom_text
        else:  # self.custom_text == 0:
            customing = ''
        
        if self.link_type == LinkType.WIKILINK:
            return f'{embedding}[[{self.file_name}{anchoring}{customing}]]'
        else:
            # Markdown links format whitespace with '%20'
            file_name = self.file_name.replace(' ' , '%20')  
            anchoring = anchoring.replace(' ', '%20')
            return fr'{embedding}[{customing}]({file_name}{anchoring})'
    
    def convert_link_type(self, link_type: LinkType) -> ObsidianLink:
        """Returns an equivalent Link object which has the specified
        ``LinkType``.
        
        **Parameters**
        - link_type - ``LinkType``
        """
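        # A minimal implementation, assuming that an equivalent link keeps
        # every field unchanged and differs only in its link type.
        return ObsidianLink(
            self.is_embedded, self.file_name, self.anchor,
            self.custom_text, link_type)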
    
    def displayed_text(self) -> str:
        """Returns the displayed str of this link.
        
        `self.file_name`, `self.custom_text` and `self.anchor` are
        assumed to be not `-1`.
        """
        if self.custom_text:
            return self.custom_text
        else:
            if not self.anchor:
                return self.file_name
            else:
                return f'{self.file_name} > {self.anchor}'

    def is_abstract(self) -> bool:
        """
        Returns `True` if self is abstract, i.e. file_name, anchor,
        or custom_text is `-1`.
        """
        return (self.file_name == -1 or self.anchor == -1
                or self.custom_text == -1)
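
The parsing logic of `from_text` and the generic pattern built by `to_regex` can be explored in isolation. The following sketch copies the Wikilink regex from `from_text` and assembles the pattern that `to_regex` would produce for a non-embedded Wikilink whose `file_name`, `anchor`, and `custom_text` are all `-1`. The helper `parse_wikilink` is hypothetical and returns a plain `dict` rather than an `ObsidianLink`, so that the snippet runs without `trouver`.

```python
import re

# The Wikilink regex used in `ObsidianLink.from_text`, copied here so that
# the snippet is self-contained.
WIKILINK_CAPTURE = re.compile(r"!?\[\[([^#\|]*?)(#(.*?))?(\|(.*?))?\]\]")


def parse_wikilink(text: str) -> dict:
    """Hypothetical helper: split a Wikilink into its components."""
    match = WIKILINK_CAPTURE.match(text)
    if match is None:
        raise ValueError(f'Not a Wikilink: {text}')
    return {
        'is_embedded': text.startswith('!'),
        'file_name': match.group(1),
        # Mirror `from_text`: 0 marks a missing anchor or custom text.
        'anchor': match.group(3) if match.group(3) is not None else 0,
        'custom_text': match.group(5) if match.group(5) is not None else 0,
    }


parsed = parse_wikilink('![[some_note#A header|shown text]]')
plain = parse_wikilink('[[some_note]]')
print(parsed)
print(plain)

# The pattern that `to_regex` assembles for a non-embedded Wikilink when
# `file_name`, `anchor`, and `custom_text` are all -1.
generic = r'\[\[([^#\|]*)?(#(.*?))?(\|(.*?))?\]\]'
matches_any = re.fullmatch(generic, '[[some_note#A header|shown text]]') is not None
matches_embedded = re.fullmatch(generic, '![[some_note]]') is not None
print(matches_any, matches_embedded)
```

Because `is_embedded` is a plain `bool`, the generic pattern only matches links with the same embedding status; here the embedded link is not matched.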
    