Non-ASCII characters in Objective-C produce incorrect output #114

1ec5 · 2015-12-08T22:45:26Z

If I run `sourcekitten doc --objc MGLMapView.h -x objective-c -isysroot $(xcrun --show-sdk-path) -I $(pwd)` with the following (stripped-down) contents of `MGLMapView.h`, I get the output I expect:

```objc
#import <Foundation/Foundation.h>

/**
 ’’’
 */
@interface MGLMapView : NSObject

#pragma mark Initializing a Map View

/**
 Initializes and returns a newly allocated map view with the specified frame and the default style.
 */
- (instancetype)initWithFrame:(NSRect)frame;

#pragma mark Manipulating the Visible Portion of the Map

/**
 The coordinate bounds visible in the receiver’s viewport.
 */
@property (nonatomic) int visibleCoordinateBounds;

#pragma mark Converting Map Coordinates

/**
 Converts a point in the specified view’s coordinate system to a map coordinate.
 */
- (int)convertPoint:(NSPoint)point toCoordinateFromView:(nullable NSObject *)view;

/**
 Converts a map coordinate to a point in the specified view.

 @param view The view in whose coordinate system you want to locate the specified map coordinate. If this parameter is `nil`, the returned point is specified in the window’s coordinate system. If `view` is not `nil`, it must belong to the same window as the map view.
 */
- (CGPoint)convertCoordinate:(int)coordinate toPointToView:(nullable NSObject *)view;

#pragma mark Styling the Map

/**
 URL of the style currently displayed in the receiver.

 The URL may be a full HTTP or HTTPS URL or a Mapbox URL indicating the style’s map ID (`mapbox://styles/my_user_name/abcd1234`).
 */
@property (nonatomic, null_resettable) NSURL *styleURL;

/**
 Returns a Boolean value indicating whether the style class with the given identifier is currently active.
 */
- (BOOL)hasStyleClass:(NSString *)styleClass;

#pragma mark Annotating the Map

/**
 The complete list of annotations associated with the receiver. (read-only)
 */
@property (nonatomic, readonly, nullable) NSArray *annotations;

#pragma mark Annotating the Map 2

/**
 The complete list of annotations associated with the receiver. (read-only)
 */
@property (nonatomic, readonly, nullable) NSArray *annotations2;

@end
```

However, if I add one more Unicode character (such as ’ or ≠) to the `MGLMapView` class’s documentation comment, `#pragma mark` sections go missing and the reported line numbers get knocked out of place:

```diff
8c8
<           "key.doc.comment" : "’’’",
---
>           "key.doc.comment" : "’’’’",
117c117
<               "key.doc.line" : 36,
---
>               "key.doc.line" : 35,
119c119
<               "key.parsed_scope.start" : 36,
---
>               "key.parsed_scope.start" : 35,
121c121
<               "key.parsed_scope.end" : 36,
---
>               "key.parsed_scope.end" : 35,
153,162d152
<               "key.kind" : "sourcekitten.source.lang.objc.mark",
<               "key.doc.file" : "\/Users\/mxn\/Desktop\/MGLMapView.h",
<               "key.doc.line" : 49,
<               "key.name" : "Annotating the Map",
<               "key.parsed_scope.start" : 49,
<               "key.doc.column" : 1,
<               "key.parsed_scope.end" : 49,
<               "key.filepath" : "\/Users\/mxn\/Desktop\/MGLMapView.h"
<             },
<             {
179c169
<               "key.doc.line" : 56,
---
>               "key.doc.line" : 55,
181c171
<               "key.parsed_scope.start" : 56,
---
>               "key.parsed_scope.start" : 55,
183c173
<               "key.parsed_scope.end" : 56,
---
>               "key.parsed_scope.end" : 55,
```

As far as I can tell, the issue is that `String.pragmaMarks(_:excludeRanges:limitRange:)` in `String+SourceKitten.swift` works with `NSString` indices (UTF-16 code units), whereas `clang_getSpellingLocation()` appears to report byte offsets.
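A minimal Swift sketch of the mismatch (not SourceKitten's code): it finds the same `#pragma mark` in a short string twice, once counting UTF-8 bytes, which is assumed here to be how libclang's file offsets are counted, and once through `NSString`, which counts UTF-16 code units. The helpers `utf8Offset(of:in:)` and `utf16Offset(of:in:)` are hypothetical. Each ’ (U+2019) takes three bytes in UTF-8 but a single UTF-16 code unit, so each one widens the gap between the two offsets by two.

```swift
import Foundation

// Hypothetical helper: offset of `needle`, counted in UTF-8 bytes
// (the unit assumed for libclang's file offsets in this sketch).
func utf8Offset(of needle: String, in haystack: String) -> Int? {
    guard let range = haystack.range(of: needle) else { return nil }
    return haystack.utf8.distance(from: haystack.utf8.startIndex, to: range.lowerBound)
}

// Hypothetical helper: offset of `needle` as NSString reports it,
// counted in UTF-16 code units.
func utf16Offset(of needle: String, in haystack: String) -> Int? {
    let location = NSString(string: haystack).range(of: needle).location
    return location == NSNotFound ? nil : location
}

// A doc comment containing four U+2019 characters, followed by a mark.
let source = "/**\n ’’’’\n */\n#pragma mark Initializing a Map View\n"

let byteOffset = utf8Offset(of: "#pragma mark", in: source)!
let nsOffset = utf16Offset(of: "#pragma mark", in: source)!
print(byteOffset, nsOffset)  // prints: 22 14
```

With four ’ characters the byte offset overshoots the `NSString` location by 8, so any lookup that feeds one unit into an API expecting the other lands on the wrong text.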

/cc @friedbunny


1ec5 · 2015-12-08T23:29:58Z

I’m pretty sure realm/jazzy#370 has the same root cause: that Clang is reporting byte offsets into the file.

jpsim · 2015-12-09T00:17:31Z

Hi @1ec5, yes, we recently fixed a similar issue in SwiftLint: realm/SwiftLint#247

We should review all code in SourceKitten that deals with byte offsets and make it use `String.byteRangeToNSRange(...)`.
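SourceKitten's actual `String.byteRangeToNSRange(...)` is not shown in this thread, so the following is only a hypothetical sketch of what such a conversion involves, assuming the byte offsets are UTF-8 offsets into the same string: advance through the UTF-8 view, map each position onto the UTF-16 view, and measure the result in UTF-16 code units. The name `nsRange(fromByteOffset:length:in:)` is invented for illustration.

```swift
import Foundation

// Hypothetical helper (not SourceKitten's implementation): converts a range
// given in UTF-8 byte offsets into an NSRange measured in UTF-16 code units.
// Returns nil if the range is out of bounds or splits a multi-byte character.
func nsRange(fromByteOffset offset: Int, length: Int, in string: String) -> NSRange? {
    guard offset >= 0, length >= 0 else { return nil }
    let utf8 = string.utf8
    let utf16 = string.utf16
    guard
        let byteStart = utf8.index(utf8.startIndex, offsetBy: offset, limitedBy: utf8.endIndex),
        let byteEnd = utf8.index(byteStart, offsetBy: length, limitedBy: utf8.endIndex),
        // samePosition(in:) returns nil for positions that are not on a
        // Unicode scalar boundary, e.g. inside a 3-byte UTF-8 sequence.
        let start = byteStart.samePosition(in: utf16),
        let end = byteEnd.samePosition(in: utf16)
    else { return nil }
    let location = utf16.distance(from: utf16.startIndex, to: start)
    return NSRange(location: location, length: utf16.distance(from: start, to: end))
}

let source = "/**\n ’’’’\n */\n#pragma mark Annotating the Map\n"
// Byte range of "#pragma mark": starts 22 bytes in and is 12 bytes long.
if let range = nsRange(fromByteOffset: 22, length: 12, in: source) {
    let text = NSString(string: source).substring(with: range)
    print(range.location, range.length, text)  // prints: 14 12 #pragma mark
}
```

Returning `nil` instead of clamping is a choice made for the sketch: an offset that splits a multi-byte character most likely means the wrong unit was used upstream.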


1ec5 mentioned this issue Dec 8, 2015

Generate iOS API documentation using jazzy mapbox/mapbox-gl-native#3203

Merged

1ec5 changed the title ~~Marked sections go missing when too many non-ASCII characters appear in Objective-C comment blocks~~ Non-ASCII characters in Objective-C produce incorrect output Dec 8, 2015

1ec5 added a commit to 1ec5/SourceKitten that referenced this issue Dec 9, 2015

Fixed offset issues looking for Objective-C marks

8d2ce96

Fixes jpsim#114.

1ec5 mentioned this issue Dec 9, 2015

Objective-C marks and non-ASCII characters #115

Merged

jpsim closed this as completed in #115 Dec 9, 2015

WFT mentioned this issue Jul 18, 2016

Syntax command returns byte offset, not character offset #228

Closed
