Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Add larger than and larger than or equal operator support to String #2346

Open
1 task done
VMois opened this issue Apr 20, 2024 · 8 comments
Open
1 task done
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers mojo-repo Tag all issues with this label

Comments

@VMois
Copy link

VMois commented Apr 20, 2024

Review Mojo's priorities

What is your request?

Add larger than (lt) and larger than or equal (le) operator support to String.

What is your motivation for this change?

Currently, it is not possible to compare Strings easily. It is very useful when sorting Strings.

Any other details?

No response

@VMois VMois added enhancement New feature or request mojo Issues that are related to mojo labels Apr 20, 2024
@JoeLoser JoeLoser added good first issue Good for newcomers mojo-stdlib Tag for issues related to standard library labels Apr 21, 2024
@zhoujingya
Copy link

zhoujingya commented Apr 22, 2024

maybe I can work on this one if upstream think this work is worth?

@VMois
Copy link
Author

VMois commented Apr 22, 2024

@zhoujingya, I have already made a snippet (needed it for a project to compare strings). Do you want me to share it with you, or would you prefer to do it yourself? My snippet is very simple. You might want to see if it is possible to use SIMDs when comparing.

@VMois
Copy link
Author

VMois commented Apr 22, 2024

maybe I can wok on this one if upstream think this work is worth?

I think @JoeLoser already marked it as a good first issue which means you are free to take it.

@zhoujingya
Copy link

zhoujingya commented Apr 22, 2024

@VMois wow, very appreciate for your snippets.thanks

@VMois VMois removed their assignment Apr 22, 2024
@VMois
Copy link
Author

VMois commented Apr 22, 2024

Here is what I used as a quick fix. I would double-check that :) And definitely look at SIMDs.

@always_inline
fn __lt__(self, other: String) -> Bool:
    var p1 = self._buffer
    var p2 = other._buffer
    var min_len = len(self)
    if len(other) < min_len:
        min_len = len(other)
    for i in range(min_len):
        var ai = p1[i].cast[DType.uint8]()
        var bi = p2[i].cast[DType.uint8]()
        if ai > bi:
            return False
        if ai < bi:
            return True
    return len(self) <= len(other)

@always_inline
fn __le__(self, other: String) -> Bool:
    return self < other or self == other

@zhoujingya
Copy link

@VMois ,so maybe SIMDs is another issue?

@VMois
Copy link
Author

VMois commented Apr 22, 2024

@VMois ,so maybe SIMDs is another issue?

Up to you. You can do a simple version or look at SIMD. I am just a random guy :)

@siitron
Copy link

siitron commented Apr 22, 2024

Why not just use memcmp? String's __eq__ already does and I believe that's what Python does too.

I was thinking something like:

struct String(Sized, Stringable, IntableRaising, KeyElement, Boolable):
    """Represents a mutable string."""

    alias _buffer_type = List[Int8]
    var _buffer: Self._buffer_type
    """The underlying storage for the string."""

    ...

    @always_inline
    fn __lt__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using LT comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is strictly less than the RHS String and False otherwise.
        """
        var len1 = len(self)
        var len2 = len(rhs)
        var cmp = memcmp(self._as_ptr(), rhs._as_ptr(), min(len1, len2))

        return cmp < 0 or cmp == 0 and len1 < len2
    
    @always_inline
    fn __le__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using LE comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is less than or equal to the RHS String and False otherwise.
        """
        return not (rhs < self)
    
    @always_inline
    fn __gt__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using GT comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is strictly greater than the RHS String and False otherwise.
        """
        return rhs < self
    
    @always_inline
    fn __ge__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using GE comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is greater than or equal to the RHS String and False otherwise.
        """
        return not (self < rhs)
    
    ...

I haven't testing this myself, it's just an idea :)
Also, this might need to change when Mojo properly handles unicode.
If we want to add this to String then something similar can be done with StringLiteral too

@linear linear bot removed mojo Issues that are related to mojo mojo-stdlib Tag for issues related to standard library labels Apr 29, 2024
@ematejska ematejska added the mojo-repo Tag all issues with this label label Apr 29, 2024
JoeLoser pushed a commit to JoeLoser/mojo that referenced this issue May 11, 2024
[External] [stdlib] String comparisons implemented

For issue modularml#2346 (as an alternative to modularml#2378). All four comparisons
(`__lt__`, `__le__`, `__gt__`, & `__ge__`) uses a single `__lt__`
comparison (instead of checking less/greater than + potentially another
"equals to"-check, for `__le__` & `__ge__`). Sorry if this is considered
a duplicate PR, I only meant to give an alternative suggestion. This is
my first ever PR on GitHub.

StringLiterals also get comparisons.

ORIGINAL_AUTHOR=Simon Hellsten
<56205346+siitron@users.noreply.github.com>
PUBLIC_PR_LINK=modularml#2409

---------

Co-authored-by: Simon Hellsten <56205346+siitron@users.noreply.github.com>
Closes modularml#2409
MODULAR_ORIG_COMMIT_REV_ID: b2ed4756c2741fd27387fa295515f4a7222e0ca5
JoeLoser pushed a commit that referenced this issue May 11, 2024
[External] [stdlib] String comparisons implemented

For issue #2346 (as an alternative to #2378). All four comparisons
(`__lt__`, `__le__`, `__gt__`, & `__ge__`) uses a single `__lt__`
comparison (instead of checking less/greater than + potentially another
"equals to"-check, for `__le__` & `__ge__`). Sorry if this is considered
a duplicate PR, I only meant to give an alternative suggestion. This is
my first ever PR on GitHub.

StringLiterals also get comparisons.

ORIGINAL_AUTHOR=Simon Hellsten
<56205346+siitron@users.noreply.github.com>
PUBLIC_PR_LINK=#2409

---------

Co-authored-by: Simon Hellsten <56205346+siitron@users.noreply.github.com>
Closes #2409
MODULAR_ORIG_COMMIT_REV_ID: b2ed4756c2741fd27387fa295515f4a7222e0ca5
lsh pushed a commit to lsh/mojo that referenced this issue May 17, 2024
[External] [stdlib] String comparisons implemented

For issue modularml#2346 (as an alternative to modularml#2378). All four comparisons
(`__lt__`, `__le__`, `__gt__`, & `__ge__`) uses a single `__lt__`
comparison (instead of checking less/greater than + potentially another
"equals to"-check, for `__le__` & `__ge__`). Sorry if this is considered
a duplicate PR, I only meant to give an alternative suggestion. This is
my first ever PR on GitHub.

StringLiterals also get comparisons.

ORIGINAL_AUTHOR=Simon Hellsten
<56205346+siitron@users.noreply.github.com>
PUBLIC_PR_LINK=modularml#2409

---------

Co-authored-by: Simon Hellsten <56205346+siitron@users.noreply.github.com>
Closes modularml#2409
MODULAR_ORIG_COMMIT_REV_ID: b2ed4756c2741fd27387fa295515f4a7222e0ca5

Signed-off-by: Lukas Hermann <lukashermann28@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers mojo-repo Tag all issues with this label
Projects
None yet
Development

No branches or pull requests

5 participants