Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Standardise the representation of byte sequence as a sequence of unsigned 8 bit integers #2099

Merged
merged 2 commits into from
Apr 18, 2024

Conversation

mzaks
Copy link
Contributor

@mzaks mzaks commented Mar 31, 2024

See the byte-as-uint8.md file

@ConnorGray ConnorGray changed the title Standardise the representation of byte sequence as a sequence of unsigned 8 bit integers [Proposal] Standardise the representation of byte sequence as a sequence of unsigned 8 bit integers Apr 5, 2024
Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for going through the proposal process for this! We're going to discuss this internally within the team this coming week and we'll circle back here. Just wanted to give you a quick update here to let you know we haven't forgotten or missed this.

In general, I'm +1 for this — we're just trying to understand some of the existing history for the rationale why Int8 was chosen for String to begin with to help guide our decision on how we want to move forward.

@JoeLoser JoeLoser added needs-discussion Need discussion in order to move forward stdlib-proposal Standard Library Proposals labels Apr 6, 2024
# Motivation

Logically a byte is an integer value between `0` and `255`. Lots of algorithms make use of arithmetics ground by this assumption.
A signed 8 bit integer on the contrary represents values between `-128` and `127`. This introduces very settle bugs, when an algorithm written for unsigned 8 bit integer is used on a signed 8 bit integer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion s/settle/subtle

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mzaks after this small typo is fixed, happy to land this 👍

@JoeLoser
Copy link
Collaborator

Thanks so much for going through the proposal process! We as a team really appreciate the thought and motivation presented, and are happy to accept this proposal! 🎉 🎉 🎉

Internally, we went back and forth whether to add an alias Byte = UInt8 and don't feel strongly either way when the technical implementation for this proposal happens. Also, to be explicit about our existing commitments and priorities — we'd love if the community is interested in changing the Int8 to UInt8 in the open-source parts of the standard library, and we're happy to do the rest for the closed source modules when we import said PR. Otherwise, this isn't on our immediate radar given pre-existing internal commitments.

Regarding the alias part, since an alias is a weak type, it's only for readability sake/to denote we're conceptually working with "bytes" since you can't overload functions/APIs on weak types, and we don't yet have a strong Bytes type (and there's no need to couple this proposal with #2096 for example).

Congrats again on the first standard library proposal to be accepted! Now that we have a process internally for discussing these, we expect to be able to triage and discuss these in a much more timely fashion going forward. Thanks for your patience with this one.

@JoeLoser JoeLoser removed the needs-discussion Need discussion in order to move forward label Apr 17, 2024
@gabrieldemarmiesse
Copy link
Contributor

I made #2317 to track the progress of the implementation

…gned 8 bit integers

Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
rephrase as suggested

Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
@mzaks
Copy link
Contributor Author

mzaks commented Apr 18, 2024

Hm, why is the first commit not verified? @JoeLoser do you have an idea?

@JoeLoser JoeLoser merged commit 4688996 into modularml:main Apr 18, 2024
2 checks passed
@JoeLoser
Copy link
Collaborator

Hm, why is the first commit not verified? @JoeLoser do you have an idea?

Looks like one was signed with your GPG key and one was not. The GPG signing isn't required, so I went ahead and merged this. We only require a sign-off using git commit -s to sign your commits (which is different than verifying your commits). In any case, the final squashed and merged commit as seen here shows it as GPG-verified, so that's good if that's what you were wanting. 😄

@JoeLoser
Copy link
Collaborator

FYI I realized this was targeting the main branch and not nightly, but snuck in since it wasn't rebased where we have CI checking it targets the nightly branch. I'm happy to open a PR for this to target the nightly branch as well later today unless you beat me to it.

The main branch will automatically get synced each time we make a new MAX release and it'll pick up all the things in the nightly branch effectively that made it into the stable MAX release.

JoeLoser pushed a commit to JoeLoser/mojo that referenced this pull request Apr 20, 2024
…s a sequence of unsigned 8 bit integers (modularml#2099)

Add a proposal to standardize the representation of byte sequence as a sequence of unsigned 8 bit integers.

Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>

(cherry picked from commit 4688996)
JoeLoser added a commit that referenced this pull request Apr 20, 2024
…s a sequence of unsigned 8 bit integers (#2099) (#2347)

Add a proposal to standardize the representation of byte sequence as a
sequence of unsigned 8 bit integers.

Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>

(cherry picked from commit 4688996)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib-proposal Standard Library Proposals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants