Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rule Request: Prefer non-optional UTF8 String <-> Data conversion #5263

Closed
2 tasks done
samrayner opened this issue Oct 4, 2023 · 9 comments · Fixed by #5264
Closed
2 tasks done

Rule Request: Prefer non-optional UTF8 String <-> Data conversion #5263

samrayner opened this issue Oct 4, 2023 · 9 comments · Fixed by #5264

Comments

@samrayner
Copy link
Contributor

samrayner commented Oct 4, 2023

New Issue Checklist

New rule request

Please describe the rule idea, format
this issue's title as Rule Request: [Rule Name] and describe:

  1. Why should this rule be added? Share links to existing discussion about what
    the community thinks about this.

String.data(using:) and String.init(data:encoding:) return optional values due to supporting a large number of encodings, some of which can fail conversion.

However, conversion of UTF8 encoded Strings cannot fail. E.g. Data("foo".utf8) and String(decoding: Data(), as: UTF8.self) do not return Optionals.

https://www.swiftbysundell.com/tips/converting-between-string-and-data-without-optionals/

  1. Provide several examples of what would and wouldn't trigger violations.

Would trigger violation:

"foo".data(using: .utf8) // Data?
String(data: data, encoding: .utf8) // String?

Would not trigger violation:

Data("foo".utf8) // Data
String(decoding: data, as: UTF8.self) // String
  1. Should the rule be configurable, if so what parameters should be configurable?

I don't think any configuration is necessary.

  1. Should the rule be opt-in or enabled by default? Why?
    See README.md for guidelines on when to mark a rule as opt-in.

I suggest this can be on by default. I think non-optional is always preferable to optional where it is completely safe. There is no sensible handling that one can do for an optional returned using the violating methods as that code will never run.

@ben-p-commits
Copy link
Contributor

ben-p-commits commented Oct 4, 2023

I think there's an issue with the .utf16 examples. The proposed Data("".utf16) example isn't supported by Foundation's Data struct.

// works, an initializer that supports Sequence<UInt8> exists
let data = Data("foo".utf8)
            
// compiler error. no initializer for Data exists that supports Sequence<UInt16> out of the box
let data = Data("foo".utf16)

@samrayner samrayner changed the title Rule Request: Prefer non-optional String <-> Data conversion Rule Request: Prefer non-optional UTF8 String <-> Data conversion Oct 4, 2023
@samrayner
Copy link
Contributor Author

Whoops! Thanks @ben-p-commits ! Reduced the scope of the request to just UTF8.

@ben-p-commits
Copy link
Contributor

@samrayner - wrapped up the day with this: #5264

@ortekka
Copy link

ortekka commented May 16, 2024

I think this rule is misguided, dangerous, and based on some confusion around conversion between strings and byte sequences. While it is true that any String can be utf-8-encoded, it is not true that any byte sequence is a valid utf-8 sequence. Consider for instance the 3-byte sequence [0xe2, 0x28, 0xa1]:

	let data = Data([0xe2, 0x28, 0xa1])

This is not a valid utf-8 sequence. If you try to convert it to a String using String(data:encoding:), you will obtain nil, which correctly signals that the input data is malformed. On the other hand, if you try to convert it to a String using:

	let string = String(decoding: data, as: UTF8.self)

you will obtain the string \u{fffd}(\u{fffd}, where the U+FFFD REPLACEMENT CHARACTER is used as a placeholder for malformed portions of the input sequence. Note that if you convert this string back to Data, you get the 7-byte sequence [0xef, 0xbf, 0xbd, 0x28, 0xef, 0xbf, 0xbd] that bears no resemblance to the initial 3-byte sequence.

In other words, using String(decoding: data, as: UTF8.self) on malformed input data amounts to sweeping errors under the rug, leaving downstream code to deal with the consequences. If data comes from an outside source you have no control over (e.g., from the network), this new rule may end up encouraging developers to remove code that sanitizes its input. This could even lead to security issues. I'm with Randall Munroe on this one: always sanitize your inputs.

@samrayner
Copy link
Contributor Author

Thanks for the correction @ortekka ! Based on what you've said it sounds like String(decoding: data, as: UTF8.self) is buggy or at least badly documented and we should instead have a rule doing the reverse of my request!

Do you see any issues with using Data("foo".utf8) over "foo".data(using: .utf8)?

@ortekka
Copy link

ortekka commented May 20, 2024

@samrayner No, I don't see any issues with those two examples, because in that case, Data is being created from a String literal using utf-8, which is one of several Character Encoding Schemes for Unicode, and as such, it covers the whole Unicode repertoire, which is a long-winded way to say, any String in Swift can be represented as utf-8 Data.

My issue and warning has to do with the assumption that you can take an arbitrary byte sequence (as a Data instance) and assume that you can always treat it as a well-formed utf-8 sequence and turn it into a String. This, unfortunately, is not the case, and it's a good thing that String(data:encoding:) returns nil when this happens.

@raggi
Copy link

raggi commented May 23, 2024

Trip report: I don't build swift code super often, but had happened to have updated to a version with this in which then broke my build, and I uninstalled swiftlint to work around it. Would love to see this removed from the homebrew versions soon (as it's bad advice, as @ortekka has explained).

@samrayner
Copy link
Contributor Author

samrayner commented May 23, 2024

I've done my best to introduce a new rule that reverses the misguided half of my suggestion, leaving the other half intact. Please take a look if you get a chance @ortekka #5601.

@andrrrr
Copy link

andrrrr commented May 29, 2024

I think this rule is too restrictive, consider this code:
guard let jsonRequest = try? JSONEncoder().encode(object),
let str = String(data: jsonRequest, encoding: .utf8) else { return }

This triggers swiftlint warning, however due to swift guard mechanism, if any of the optional unwrap fails, code will not be executed further.
Either guard, or if let (optional unwrap) should not trigger this warning, as developer should then decide what to do further.

github-merge-queue bot pushed a commit to mozilla/application-services that referenced this issue May 30, 2024
Hopefully addresses the swift-lint errors.

Fix via realm/SwiftLint#5263
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants