-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bugs with symbol links to /(_:_:)
operators
#717
Fix bugs with symbol links to /(_:_:)
operators
#717
Conversation
@swift-ci please test |
This comment was marked as outdated.
This comment was marked as outdated.
rdar://112555102
22ea004
to
209112f
Compare
This comment was marked as resolved.
This comment was marked as resolved.
/(_:_:)
and -(_:_:)
operators/(_:_:)
operators
# Conflicts: # Tests/SwiftDocCTests/Infrastructure/DocumentationContext/DocumentationContextTests.swift # Tests/SwiftDocCTests/Infrastructure/PathHierarchyTests.swift
rdar://112555102
There's no measurable performance impact from these changes
|
@swift-ci please test |
d6140ff
to
a80f56f
Compare
@swift-ci please test |
/(_:_:)
operators/(_:_:)
operators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. I love the use of Substring slices my guess is this will be faster than what we had before, in addition to fixing bugs.
} | ||
|
||
enum PathParser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of this Swift idiom, which I've seen a few times now: enum EmptyEnum { }
and then later extension EmptyEnum { ...functions defined here... }
Why not just create a normal class or struct and directly add functions to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I mainly did it to avoid the extra indentation. By using extension OuterType.InnerType { ... }
the functions defined on InnerType
are only indented once instead of twice. This mostly doesn't matter unless the API is added to a deeply nested type.
However, since I was moving API from PathHierarchy
into PathHierarchy.PathParser
while also modifying it, I wanted to preserve the unchanged lines in the diff which wouldn't have worked if the whole code block was intended more than it was before.
|
||
mutating func scanPathComponent() -> Substring { | ||
// If the next component is an operator, parse the full operator before splitting on "/" ("/" may appear in the operator name) | ||
if remaining.unicodeScalars.prefix(3).allSatisfy(\.isValidOperatorHead) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the value 3 come from? Can you calculate this constant at compile time somehow? (I'm guessing it's the maximum length of the string literal cases in isValidOperatorHead
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment to this effect (someday... not urgent) - or define a let constant with a descriptive name, so the number is less magical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll probably revisit this piece of code for rdar://119616317
let isAbsolute: Bool | ||
if path.first == PathComponentScanner.separator { | ||
guard let maybeModuleName = components.first?.dropFirst(), !maybeModuleName.isEmpty else { | ||
return ([], true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this line be reached? for a path that begins with //
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is reached if the path just a single /
. If that path was //
there would still be a non-empty string after dropping the leading slash.
return ([], true) | ||
} | ||
isAbsolute = maybeModuleName.isValidModuleName() | ||
assert(NodeURLGenerator.Path.documentationFolderName.isValidModuleName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do these assert calls achieve? Just a sanity check that the logic in isValidModuleName
is not broken somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more to check that we don't change NodeURLGenerator.Path.documentationFolderName
and add some unexpected characters.
guard let maybeModuleName = components.first?.dropFirst(), !maybeModuleName.isEmpty else { | ||
return ([], true) | ||
} | ||
isAbsolute = maybeModuleName.isValidModuleName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this rely on isValidModuleName
distinguishing between operators and modules just based on the type of characters in the path component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the path splitting will return ["/ModuleName"]
if the path was /ModuleName
and ["/(_:_:)"]
if the path was /(_:_:)
. To distinguish between an absolute (starting with a leading slash) reference the module and a relative to the division operator we check if the rest of the path component is a valid module name.
// If the next component is an operator, parse the full operator before splitting on "/" ("/" may appear in the operator name) | ||
if remaining.unicodeScalars.prefix(3).allSatisfy(\.isValidOperatorHead) { | ||
guard let operatorEndIndex = remaining.firstIndex(of: Self.operatorEnd) else { | ||
defer { remaining.removeAll() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very clever how you call different defer
closures depending on the contents of the string and the logic here and how you update the remaining
value after determining the return value (if I'm reading this correctly). Nice!
switch value { | ||
case | ||
// A-Z | ||
0x0041...0x005A, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way we can save these unicode values/characters into a config file elsewhere? That would simplify this source code file a lot.
Or would the overhead of opening that config file be too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion a separate config file would be more complicated and likely less readable.
The biggest complication would be that we'd have a resource that we need to ship alongside the docc
executable. This is a fairly minor thing but it does mean that we need to update build-script-helper.py
to copy this config file into the install directory and DocC needs to handle the cases when this config file is missing or not formatted correctly.
There's a tradeoff between readability and ease-of-parsing in a separate file. The easiest to parse would probably be to just write a stream of all the padded bytes for each character but that wouldn't be human readable at all. We could print them as characters which makes it easier to see what characters are supported but it also makes it look like entire Unicode code blocks are supported because most characters in that block are supported. For example, 98 characters in the "Greek" Unicode block are supported C99 identifiers (such as "Ϛ") but 12 characters aren't supported (such as "ϛ").
I feel that to make something that's more readable than in-source character ranges with comments we need to write a small one-off parser and tests for that parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right a runtime config file would be a bad idea for all the reasons you have here. Maybe just moving these large switch statements into a separate Swift file called something like PathHierarchy+ValidC99ExtendedIdentifier or whatever would remove all the verbose code from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of moving it to a separate file is that it would need to be internal
making it available everywhere in DocC whereas now it's only available in this file. That's a very small thing but I like to keep very specialized single use API private in the same file and only move it out if there's other places that need that functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one day I'll find the time to break up SwiftDocC
into multiple smaller modules and then each piece would have better control over what API it exposes to other pieces 🤞
This is not entirely true anymore as we lump C++/Objective-C++ together with Objective-C variants and those languages support operator overloading including the division operator. |
Do you know what the allowed characters are for operators in C++/Objective-C++ operators? Are there characters that C++/Objective-C++ supports in operators that Swift doesn't? |
@@ -10,7 +10,7 @@ | |||
|
|||
import Foundation | |||
|
|||
private let nonAllowedPathCharacters = CharacterSet.urlPathAllowed.inverted | |||
private let nonAllowedPathCharacters = CharacterSet.urlPathAllowed.inverted.union(["/"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ supports overloading the following operators:
+ - * / % ^ & | ~ ! = < > += -= *= /= %= ^= &= |= << >> >>= <<= == != <= >= <=> && || ++ -- , ->* -> ( ) [ ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to comment on another line? This comment doesn't seem to be about this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, assuming that the names for those operators include at least ()
, all of them parse as expected with the current code in this PR.
I'm making that assumption because operator definitions in C++ has parameters, for example:
void operator ++ () { ... }
MyNumber operator + (const MyNumber& obj) { ... }
MyNumber operator - () { ... }
MyNumber operator - (const MyNumber& obj) { ... }
so I would expect the parameters to be included in the symbol name just like they'd be for parameters to functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Daniel and I talked about this some more offline and agreed to make further changes for C++ operators in a follow up PR. I filed rdar://119616317 to track doing this.
@swift-ci please test |
* Support escaping forward slash in symbol links rdar://112555102 * Support operators with "/" without escaping rdar://112555102 * Remove need to pass original link string to create error info * Replace "/" in symbol names with "_" in resolved topic references
* Support escaping forward slash in symbol links rdar://112555102 * Support operators with "/" without escaping rdar://112555102 * Remove need to pass original link string to create error info * Replace "/" in symbol names with "_" in resolved topic references
Bug/issue #, if applicable: rdar://112555102 #714
Summary
This fixes two bugs where symbols with "/" in the name couldn't be linked to and where their data files in the output archive could collide with other pages's data files.
To avoid needing to escape slashes this fix relies some rules about
These rules means that
``/Something``
can unambiguously be parsed as an absolute link to the "Something" module (excluding the slash from the path component) whereas``/(_:_:)``
can unambiguously be parsed as a relative symbol link to the/(_:_:)
operator (including the slash in the path component).It also means that
``+/-(_:_:)``
can unambiguously be parsed as a single path component because operators need at least argument.Dependencies
None
Testing
/(_:_:)
operators, for exampleLink to
``MyNumber//(_:_:)``
and``MyNumber//=(_:_:)``
from anywhere in the projectand to
``/(_:_:)``
and``/=(_:_:)``
from MyNumber.Build documentation for the project. Both links above should resolve successfully.
Inspect the "data" directory in the .doccarchive. Both division operators should have file names that start with "_". You can verify this by not seeing files named "(_:_:).json" and "=(_:_:).json" (as those files names would be if the leading slash was dropped from the file name)
Checklist
Make sure you check off the following items. If they cannot be completed, provide a reason.
./bin/test
script and it succeeded