Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs with symbol links to /(_:_:) operators #717

Merged
merged 7 commits into from
Dec 13, 2023

Conversation

d-ronnqvist
Copy link
Contributor

@d-ronnqvist d-ronnqvist commented Sep 26, 2023

Bug/issue #, if applicable: rdar://112555102 #714

Summary

This fixes two bugs where symbols with "/" in the name couldn't be linked to and where their data files in the output archive could collide with other pages's data files.

To avoid needing to escape slashes this fix relies some rules about

  • the allowed characters in module names in Swift, C, Objective-C
  • the allowed characters in Swift operators (C and Objective-C can't define custom operators).

These rules means that ``/Something`` can unambiguously be parsed as an absolute link to the "Something" module (excluding the slash from the path component) whereas ``/(_:_:)`` can unambiguously be parsed as a relative symbol link to the /(_:_:) operator (including the slash in the path component).

It also means that ``+/-(_:_:)`` can unambiguously be parsed as a single path component because operators need at least argument.

Dependencies

None

Testing

  • Define a public type with /(_:_:) operators, for example
public struct MyNumber: SignedNumeric, Comparable, Equatable, Hashable {
    public static func / (lhs: MyNumber, rhs: MyNumber) -> MyNumber { fatalError("This only needs to compile for a unit test") }
    public static func /= (lhs: inout MyNumber, rhs: MyNumber) -> MyNumber { fatalError("This only needs to compile for a unit test") }
    public static func *= (lhs: inout MyNumber, rhs: MyNumber) { fatalError("This only needs to compile for a unit test") }
    public var magnitude: Int.Magnitude
    public static func - (lhs: MyNumber, rhs: MyNumber) -> MyNumber { fatalError("This only needs to compile for a unit test") }
    public init(integerLiteral value: Int) { fatalError("This only needs to compile for a unit test") }
    public static func < (lhs: MyNumber, rhs: MyNumber) -> Bool { fatalError("This only needs to compile for a unit test") }
    public init?<T>(exactly source: T) where T : BinaryInteger { fatalError("This only needs to compile for a unit test") }
    public static func * (lhs: MyNumber, rhs: MyNumber) -> MyNumber { fatalError("This only needs to compile for a unit test") }
    public static func + (lhs: MyNumber, rhs: MyNumber) -> MyNumber { fatalError("This only needs to compile for a unit test") }
}
  • Link to ``MyNumber//(_:_:)`` and ``MyNumber//=(_:_:)`` from anywhere in the project
    and to ``/(_:_:)`` and ``/=(_:_:)`` from MyNumber.

  • Build documentation for the project. Both links above should resolve successfully.

  • Inspect the "data" directory in the .doccarchive. Both division operators should have file names that start with "_". You can verify this by not seeing files named "(_:_:).json" and "=(_:_:).json" (as those files names would be if the leading slash was dropped from the file name)

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist linked an issue Sep 26, 2023 that may be closed by this pull request
2 tasks
@d-ronnqvist

This comment was marked as outdated.

@d-ronnqvist

This comment was marked as resolved.

@d-ronnqvist d-ronnqvist marked this pull request as draft October 14, 2023 00:33
@d-ronnqvist d-ronnqvist changed the title Fix two bugs with symbol links to /(_:_:) and -(_:_:) operators Fix bug with symbol links to /(_:_:) operators Oct 14, 2023
# Conflicts:
#	Tests/SwiftDocCTests/Infrastructure/DocumentationContext/DocumentationContextTests.swift
#	Tests/SwiftDocCTests/Infrastructure/PathHierarchyTests.swift
@d-ronnqvist
Copy link
Contributor Author

There's no measurable performance impact from these changes

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric                                   │ Change          │ main                 │ current              │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Duration for 'bundle-registration'       │ no change¹      │ 2.4 sec              │ 2.401 sec            │
│ Duration for 'convert-total-time'        │ no change²      │ 7.414 sec            │ 7.494 sec            │
│ Duration for 'documentation-processing'  │ no change³      │ 4.879 sec            │ 4.957 sec            │
│ Duration for 'finalize-navigation-index' │ no change⁴      │ 0.058 sec            │ 0.059 sec            │
│ Peak memory footprint                    │ no change⁵      │ 538.2 MB             │ 537 MB               │
│ Data subdirectory size                   │ no change       │ 168.5 MB             │ 168.5 MB             │
│ Index subdirectory size                  │ no change       │ 1.5 MB               │ 1.5 MB               │
│ Total DocC archive size                  │ no change       │ 223.7 MB             │ 223.7 MB             │
│ Topic Anchor Checksum                    │ no change       │ 9ca210cb9901eb38d157 │ 9ca210cb9901eb38d157 │
│ Topic Graph Checksum                     │ no change       │ 311f4d32005c9bd7dd72 │ 311f4d32005c9bd7dd72 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

 1: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -0.170          degrees of freedom : 28       95% confidence critical value : +2.042

 2: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -0.939          degrees of freedom : 28       95% confidence critical value : +2.042

 3: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -0.894          degrees of freedom : 28       95% confidence critical value : +2.042

 4: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -0.320          degrees of freedom : 28       95% confidence critical value : +2.042

 5: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : +0.946          degrees of freedom : 28       95% confidence critical value : +2.042

@d-ronnqvist d-ronnqvist marked this pull request as ready for review November 16, 2023 16:54
@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist changed the title Fix bug with symbol links to /(_:_:) operators Fix bugs with symbol links to /(_:_:) operators Dec 11, 2023
Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I love the use of Substring slices my guess is this will be faster than what we had before, in addition to fixing bugs.

}

enum PathParser {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of this Swift idiom, which I've seen a few times now: enum EmptyEnum { } and then later extension EmptyEnum { ...functions defined here... }

Why not just create a normal class or struct and directly add functions to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I mainly did it to avoid the extra indentation. By using extension OuterType.InnerType { ... } the functions defined on InnerType are only indented once instead of twice. This mostly doesn't matter unless the API is added to a deeply nested type.

However, since I was moving API from PathHierarchy into PathHierarchy.PathParser while also modifying it, I wanted to preserve the unchanged lines in the diff which wouldn't have worked if the whole code block was intended more than it was before.


mutating func scanPathComponent() -> Substring {
// If the next component is an operator, parse the full operator before splitting on "/" ("/" may appear in the operator name)
if remaining.unicodeScalars.prefix(3).allSatisfy(\.isValidOperatorHead) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the value 3 come from? Can you calculate this constant at compile time somehow? (I'm guessing it's the maximum length of the string literal cases in isValidOperatorHead?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's meant to be the shortest possible1 operator name2; one operator head character followed by () for no arguments.

We could shorten it to just two characters because we only need to find if the next component isn't a module with a "/" prefix.

Footnotes

  1. Having no arguments doesn't really make sense for a real operator since there's no value to operate on. Even ! has one argument.

  2. It's really too permissive for real operator names since the operator head supports more characters than the rest of the operator name does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to this effect (someday... not urgent) - or define a let constant with a descriptive name, so the number is less magical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll probably revisit this piece of code for rdar://119616317

let isAbsolute: Bool
if path.first == PathComponentScanner.separator {
guard let maybeModuleName = components.first?.dropFirst(), !maybeModuleName.isEmpty else {
return ([], true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this line be reached? for a path that begins with // ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reached if the path just a single /. If that path was // there would still be a non-empty string after dropping the leading slash.

return ([], true)
}
isAbsolute = maybeModuleName.isValidModuleName()
assert(NodeURLGenerator.Path.documentationFolderName.isValidModuleName())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these assert calls achieve? Just a sanity check that the logic in isValidModuleName is not broken somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more to check that we don't change NodeURLGenerator.Path.documentationFolderName and add some unexpected characters.

guard let maybeModuleName = components.first?.dropFirst(), !maybeModuleName.isEmpty else {
return ([], true)
}
isAbsolute = maybeModuleName.isValidModuleName()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this rely on isValidModuleName distinguishing between operators and modules just based on the type of characters in the path component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the path splitting will return ["/ModuleName"] if the path was /ModuleName and ["/(_:_:)"] if the path was /(_:_:). To distinguish between an absolute (starting with a leading slash) reference the module and a relative to the division operator we check if the rest of the path component is a valid module name.

// If the next component is an operator, parse the full operator before splitting on "/" ("/" may appear in the operator name)
if remaining.unicodeScalars.prefix(3).allSatisfy(\.isValidOperatorHead) {
guard let operatorEndIndex = remaining.firstIndex(of: Self.operatorEnd) else {
defer { remaining.removeAll() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very clever how you call different defer closures depending on the contents of the string and the logic here and how you update the remaining value after determining the return value (if I'm reading this correctly). Nice!

switch value {
case
// A-Z
0x0041...0x005A,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way we can save these unicode values/characters into a config file elsewhere? That would simplify this source code file a lot.

Or would the overhead of opening that config file be too much?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion a separate config file would be more complicated and likely less readable.

The biggest complication would be that we'd have a resource that we need to ship alongside the docc executable. This is a fairly minor thing but it does mean that we need to update build-script-helper.py to copy this config file into the install directory and DocC needs to handle the cases when this config file is missing or not formatted correctly.

There's a tradeoff between readability and ease-of-parsing in a separate file. The easiest to parse would probably be to just write a stream of all the padded bytes for each character but that wouldn't be human readable at all. We could print them as characters which makes it easier to see what characters are supported but it also makes it look like entire Unicode code blocks are supported because most characters in that block are supported. For example, 98 characters in the "Greek" Unicode block are supported C99 identifiers (such as "Ϛ") but 12 characters aren't supported (such as "ϛ").

I feel that to make something that's more readable than in-source character ranges with comments we need to write a small one-off parser and tests for that parser

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right a runtime config file would be a bad idea for all the reasons you have here. Maybe just moving these large switch statements into a separate Swift file called something like PathHierarchy+ValidC99ExtendedIdentifier or whatever would remove all the verbose code from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of moving it to a separate file is that it would need to be internal making it available everywhere in DocC whereas now it's only available in this file. That's a very small thing but I like to keep very specialized single use API private in the same file and only move it out if there's other places that need that functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one day I'll find the time to break up SwiftDocC into multiple smaller modules and then each piece would have better control over what API it exposes to other pieces 🤞

@daniel-grumberg
Copy link
Contributor

the allowed characters in Swift operators (C and Objective-C can't define custom operators).

This is not entirely true anymore as we lump C++/Objective-C++ together with Objective-C variants and those languages support operator overloading including the division operator.

@d-ronnqvist
Copy link
Contributor Author

the allowed characters in Swift operators (C and Objective-C can't define custom operators).

This is not entirely true anymore as we lump C++/Objective-C++ together with Objective-C variants and those languages support operator overloading including the division operator.

Do you know what the allowed characters are for operators in C++/Objective-C++ operators? Are there characters that C++/Objective-C++ supports in operators that Swift doesn't?

@@ -10,7 +10,7 @@

import Foundation

private let nonAllowedPathCharacters = CharacterSet.urlPathAllowed.inverted
private let nonAllowedPathCharacters = CharacterSet.urlPathAllowed.inverted.union(["/"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ supports overloading the following operators:
+ - * / % ^ & | ~ ! = < > += -= *= /= %= ^= &= |= << >> >>= <<= == != <= >= <=> && || ++ -- , ->* -> ( ) [ ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to comment on another line? This comment doesn't seem to be about this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, assuming that the names for those operators include at least (), all of them parse as expected with the current code in this PR.


I'm making that assumption because operator definitions in C++ has parameters, for example:

void operator ++ () { ... }
MyNumber operator + (const MyNumber& obj) { ... }
MyNumber operator - () { ... }
MyNumber operator - (const MyNumber& obj) { ... }

so I would expect the parameters to be included in the symbol name just like they'd be for parameters to functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Daniel and I talked about this some more offline and agreed to make further changes for C++ operators in a follow up PR. I filed rdar://119616317 to track doing this.

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist merged commit efcc793 into swiftlang:main Dec 13, 2023
2 checks passed
d-ronnqvist added a commit to d-ronnqvist/swift-docc that referenced this pull request Dec 14, 2023
* Support escaping forward slash in symbol links

rdar://112555102

* Support operators with "/" without escaping

rdar://112555102

* Remove need to pass original link string to create error info

* Replace "/" in symbol names with "_" in resolved topic references
d-ronnqvist added a commit that referenced this pull request Dec 14, 2023
* Support escaping forward slash in symbol links

rdar://112555102

* Support operators with "/" without escaping

rdar://112555102

* Remove need to pass original link string to create error info

* Replace "/" in symbol names with "_" in resolved topic references
@d-ronnqvist d-ronnqvist deleted the forward-slash-in-symbol-link branch February 19, 2024 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't reference forward-slash operator functions
3 participants