Implement LocatedSpan::get_line(). #66

kaj · 2020-10-16T18:10:59Z

Add a function to get the full input line containing the (start point of the) LocatedSpan.

As suggested in #53.


progval

Thanks!

Could you add some tests for this, checking edge cases? (first/last line, beginning/end of line, etc.)

progval · 2020-10-16T19:45:33Z

src/lib.rs

+        let self_bytes = self.fragment.as_bytes();
+        let self_ptr = self_bytes.as_ptr();
+        let offset = self.get_column() - 1;
+        let the_line = unsafe {
+            assert!(
+                offset <= isize::max_value() as usize,
+                "offset is too big"
+            );
+            let line_start_ptr = self_ptr.offset(-(offset as isize));
+            slice::from_raw_parts(line_start_ptr, offset + self_bytes.len())
+        };


I feel like this code could/should be shared with get_columns_and_bytes_before. Could you add a private (unsafe) function for this?

kaj · 2020-10-16T19:54:02Z

Yes, I'll write some tests and then I'll see if the unsafe code can be unified.

progval · 2020-10-16T20:02:42Z

Actually, I'm not sure, but isn't the_line equal to get_columns_and_bytes_before().1?

kaj · 2020-10-16T20:54:04Z

> Actually, I'm not sure, but isn't the_line equal to get_columns_and_bytes_before().1?

Not quite, as the_line may continue after the part that is get_columns_and_bytes_before().1. If you look at the (newly added) line_for_non_ascii_chars test, I think get_columns_and_bytes_before().1 would be "Förra raden var ".

progval

Does get_line never return None?

src/tests.rs

kaj · 2020-10-16T21:01:42Z

> Does get_line never return None?

Actually, as I was writing tests, I found myself asking the same question. I don't think it does; the Option is accidental. I'll get rid of it.

progval · 2020-10-16T21:03:39Z

> Not quite, as the_line may continue after the part that is get_columns_and_bytes_before().1. If you look at the (newly added) line_for_non_ascii_chars test, I think get_columns_and_bytes_before().1 would be "Förra raden var ".

Oh yeah, indeed

kaj · 2020-10-16T21:11:44Z

It's getting late in my time zone, so I'll go to sleep now. I'll look at refactoring and any further questions later in the weekend.

progval · 2020-10-16T21:17:43Z

looks good, just need to deduplicate that code now

kaj · 2020-10-17T22:31:20Z

I've been afk most of the day, it's time for bed in my tz, and I'm not entirely sober, but I've attempted a refactoring of the two similar unsafe blocks to one. Can look more at it in ten hours or so ...

progval

Don't worry, we're in no hurry.

I just realized that this code doesn't work if LocatedSpans are sliced with a right bound.
For example, this fails:

#[test]
fn line_of_word_in_middle() {
    let data = "One line of text\
         \nFollowed by a second\
         \nand a third\n";
    assert_eq!(
        StrSpan::new(data)
            .slice(data.find('\n').unwrap()..data.find('\n').unwrap()+5)
            .get_line(),
        "Followed by a second".as_bytes(),
    );
}

because self.get_unoffsetted_slice() only returns the bytes before the LocatedSpan and the bytes in the LocatedSpan itself, but none of the bytes after, even if they are on the same line.

Unfortunately, I don't see a way out of this without including the size of the original &[u8] in every LocatedSpan; but it would increase the size of LocatedSpan<_, ()> from 20 to 28 bytes (probably 24 to 32 including the padding).
But this seems costly just to provide a convenience function that can already be implemented by users themselves with safe code.

What do you think?
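To make the trade-off concrete, here is a minimal sketch of the safe-code route mentioned above. `line_at` is a hypothetical helper, not part of nom_locate; it assumes the caller still has the original input as a `&str` and knows the byte offset of the span start (which must fall on a character boundary). The second assertion mimics what happens when only the bytes up to the end of the span are visible.

```rust
/// Hypothetical helper (not part of nom_locate): returns the line of `input`
/// that contains byte `offset`, without the trailing newline.
fn line_at(input: &str, offset: usize) -> &str {
    // Start just after the last newline before `offset`, or at the input start.
    let start = input[..offset].rfind('\n').map_or(0, |i| i + 1);
    // End at the next newline after the line start, or at the end of the input.
    let end = input[start..].find('\n').map_or(input.len(), |i| start + i);
    &input[start..end]
}

fn main() {
    let data = "One line of text\nFollowed by a second\nand a third\n";
    let span_start = data.find('\n').unwrap() + 1; // first byte of the second line
    let span_end = span_start + 5; // the span covers "Follo"

    // With the whole input available, the full line can be recovered.
    assert_eq!(line_at(data, span_start), "Followed by a second");

    // With only the bytes up to the end of the span, the line is cut short.
    assert_eq!(line_at(&data[..span_end], span_start), "Follo");
}
```

The function itself is the same in both calls; only the amount of input it is allowed to see differs, which is the limitation described above.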

progval · 2020-10-18T08:20:07Z

src/lib.rs

@@ -257,17 +257,24 @@ impl<T: AsBytes, X> LocatedSpan<T, X> {
        &self.fragment
    }

-    fn get_columns_and_bytes_before(&self) -> (usize, &[u8]) {
+    fn get_unoffsetted_slice(&self) -> &[u8] {


Hmm, the name doesn't make it clear what this does, but I can't think of a better one. :/
Also, a comment explaining what it does would be good.

kaj

Yes, I considered get_original_size, as that is pretty much the intent. But I didn't use it, since it has no way of reconstructing the original length. So unoffsetted is what it actually does: it undoes the offset.

As for the larger question about a possibly missing trailer of get_line(): yes, I was a bit worried about that myself, but decided that it's not a big problem in my main use case of reporting parse errors. In the cases I can think of that involve marking an interval (and not just a position) of a line, I think I would have two LocatedSpans to combine, where each of them would (probably) be "the rest of input from a starting point".

But I agree that this should be explained somehow, both in a comment at get_unoffsetted_slice and in the docstring of get_line. I'll try to write something.

kaj · 2020-10-18T11:07:49Z

On the other hand, the "original" address and length won't change while the parser creates all those subslices that are LocatedSpans. So maybe it would make sense for a LocatedSpan to be a slice and a reference to the original slice (since it already borrows the actual data from the original slice, I think lifetimes should not be a problem). The line and offset could be calculated on demand, instead of the other way around. That way, a LocatedSpan should fit in 24 bytes (plus 8*sizeof(extra)/8). Which should be eight bytes (line: u32 + alignment) less than the current size.

But that is a quite big change ...
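As a rough illustration of that idea (a sketch only, not nom_locate's actual struct), the type below keeps the original input next to the fragment and derives offset, line number, and the full line on demand. `OriginSpan` and its methods are hypothetical names, it works on `&str` rather than a generic input, and it stores two plain `&str` values, so it makes no attempt to match the size estimate above.

```rust
/// Hypothetical alternative layout (not nom_locate's actual struct):
/// keep the original input next to the fragment and derive positions on demand.
#[derive(Clone, Copy, Debug)]
struct OriginSpan<'a> {
    original: &'a str,
    fragment: &'a str,
}

impl<'a> OriginSpan<'a> {
    fn new(original: &'a str) -> Self {
        OriginSpan { original, fragment: original }
    }

    /// Sub-span, with the range given relative to the current fragment.
    fn slice(&self, range: std::ops::Range<usize>) -> Self {
        OriginSpan { original: self.original, fragment: &self.fragment[range] }
    }

    /// Byte offset of the fragment within the original input.
    fn offset(&self) -> usize {
        self.fragment.as_ptr() as usize - self.original.as_ptr() as usize
    }

    /// 1-based line number, found by counting newlines before the fragment.
    fn line(&self) -> u32 {
        let prefix = &self.original.as_bytes()[..self.offset()];
        1 + prefix.iter().filter(|&&b| b == b'\n').count() as u32
    }

    /// Full line containing the start of the fragment, even when the
    /// fragment itself ends before the line does.
    fn get_line(&self) -> &'a str {
        let offset = self.offset();
        let start = self.original[..offset].rfind('\n').map_or(0, |i| i + 1);
        let end = self.original[start..]
            .find('\n')
            .map_or(self.original.len(), |i| start + i);
        &self.original[start..end]
    }
}

fn main() {
    let data = "One line of text\nFollowed by a second\nand a third\n";
    let start = data.find('\n').unwrap() + 1; // first byte of the second line
    let span = OriginSpan::new(data).slice(start..start + 5);

    assert_eq!(span.fragment, "Follo");
    assert_eq!(span.offset(), 17);
    assert_eq!(span.line(), 2);
    assert_eq!(span.get_line(), "Followed by a second");
}
```

Note that `line()` scans everything before the fragment each time it is called, so the line number is no longer a stored constant-time lookup in this layout.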

progval · 2020-10-18T11:56:07Z

Well that's what I was thinking of, minus removing the line (which I don't think we should, because it makes get_line run in linear time instead)

progval · 2020-10-18T11:58:58Z

What about adding a function (or method of LocatedSpan) that takes the original string as argument and returns the line?

That way we don't make the struct larger and we still provide that feature (and as a bonus, it won't rely on unsafe code)

kaj · 2020-10-18T12:05:56Z

> What about adding a function (or method of LocatedSpan) that takes the original string as argument and returns the line?

That's what I do in rsass before starting to use LocatedSpan at all (in kaj/rsass#62). For me, the whole point of using LocatedSpan is that I can get the line number, column, and the line itself from inside a parser function, where I don't have the original string.

progval · 2020-10-18T12:12:45Z

Hmm yeah, good point. You could put a ref to the original string in the extra, but it would bloat LocatedSpan with duplicate data in your case.

kaj · 2020-10-18T14:22:44Z

For now, I think the docstring on get_line() is enough to handle the case where the LocatedSpan ends before the line.

Replacing the self.offset with a reference to the original data may be a good idea, but I think the pros and cons of that should be considered independent of this PR.

progval · 2020-10-18T14:41:01Z

Yeah, you're right. Could you just rename get_line to get_line_beginning, so we can introduce get_line later without it being a breaking change?

kaj · 2020-10-18T17:13:39Z

> Yeah, you're right. Could you just rename get_line to get_line_beginning, so we can introduce get_line later without it being a breaking change?

Ok, done. Should I also squash the commits on this branch to one?

This test documents how `get_line_beginning()` differs from a hypothetical `get_line()` method.

progval · 2020-10-18T17:31:40Z

I can squash it on my end :)

Implement LocatedSpan::get_line().

a1e33fa

Add a function to get the full input line containing the (start point of the) LocatedSpan. As suggested in fflorent#53.

progval requested changes Oct 16, 2020

Add some tests.

c4df145

progval reviewed Oct 16, 2020

Remove bogus comment.

65cf3e1

No need for get_line() to return Option.

942d0c9

kaj added 2 commits October 17, 2020 09:55

The test that uses format! requires std.

060fd1f

Some rustfmt.

1810ec3

progval mentioned this pull request Oct 17, 2020

impl StableDeref for LocatedSpan #65

Merged

Refactor two similar unsafe blocks to one.

84ca913

kaj requested a review from progval October 18, 2020 08:18

progval reviewed Oct 18, 2020

Add some disclaimer comments / docs.

6fb916a

Rename get_line to get_line_beginning.

60ea6cb

Add line_begining_may_ot_be_entire_len test.

de713cb

This test documents how `get_line_beginning()` differs from a hypothetical `get_line()` method.

progval merged commit 76f9e90 into fflorent:master Oct 18, 2020

kaj deleted the get_line branch October 18, 2020 17:38
