Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LocatedSpan::get_unoffsetted_slice can lead to UB #88

Open
stephaneyfx opened this issue Jan 29, 2023 · 2 comments · May be fixed by #90
Open

LocatedSpan::get_unoffsetted_slice can lead to UB #88

stephaneyfx opened this issue Jan 29, 2023 · 2 comments · May be fixed by #90

Comments

@stephaneyfx
Copy link

stephaneyfx commented Jan 29, 2023

This function is called from public and safe functions like get_line_beginning. It assumes that the current fragment is part of a larger fragment and attempts to read before the beginning of the current fragment. This assumption may be incorrect as demonstrated by the following program that exhibits UB without unsafe and outputs garbage (which can change on every run).

use nom::{AsBytes, InputTake, Offset, Slice};
use nom_locate::LocatedSpan;
use std::{
    cell::Cell,
    ops::{RangeFrom, RangeTo},
    rc::Rc,
};

#[derive(Debug)]
struct EvilInput<'a>(Rc<Cell<&'a [u8]>>);

impl<'a> AsBytes for EvilInput<'a> {
    fn as_bytes(&self) -> &[u8] {
        self.0.get()
    }
}

impl Offset for EvilInput<'_> {
    fn offset(&self, second: &Self) -> usize {
        self.as_bytes().offset(second.as_bytes())
    }
}

impl Slice<RangeFrom<usize>> for EvilInput<'_> {
    fn slice(&self, range: RangeFrom<usize>) -> Self {
        Self(Rc::new(Cell::new(self.0.get().slice(range))))
    }
}

impl Slice<RangeTo<usize>> for EvilInput<'_> {
    fn slice(&self, range: RangeTo<usize>) -> Self {
        Self(Rc::new(Cell::new(self.0.get().slice(range))))
    }
}

fn main() {
    let new_input = [32u8];
    let original_input = [33u8; 3];
    let evil_input = EvilInput(Rc::new(Cell::new(&original_input)));
    let span = LocatedSpan::new(evil_input).take_split(2).0;
    span.fragment().0.set(&new_input);
    let beginning = span.get_line_beginning();
    dbg!(beginning);
    dbg!(new_input.as_ptr() as usize - beginning.as_ptr() as usize);
}

Example output:

[src/main.rs:43] beginning = [
    201,
    127,
    32,
]
[src/main.rs:44] new_input.as_ptr() as usize - beginning.as_ptr() as usize = 2

Related to #45.

@progval progval added the bug label Jan 29, 2023
@progval
Copy link
Collaborator

progval commented Jan 29, 2023

Hmm, that's a nasty side-effect of #76. nom_locate used to only support a limited set of fragment types that are well-behaved. But since that generalization, one can implement AsBytes on the fragment type to behave inconsistently.

I don't see how to fix this without going back on the generalization added by #76. (though at least we wouldn't need to revert to the macros)

@progval
Copy link
Collaborator

progval commented Jan 29, 2023

Well, I guess something like this would work:

diff --git a/src/lib.rs b/src/lib.rs
index 66a19f4..cd097f4 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -118,6 +118,12 @@ use nom::{
 #[cfg(feature = "stable-deref-trait")]
 use stable_deref_trait::StableDeref;
 
+pub unsafe trait WellBehavedFragment {}
+
+unsafe impl<'a> WellBehavedFragment for &'a [u8] {}
+unsafe impl<'a> WellBehavedFragment for &'a str {}
+unsafe impl WellBehavedFragment for String {}
+
 /// A LocatedSpan is a set of meta information about the location of a token, including extra
 /// information.
 ///
@@ -323,7 +329,7 @@ impl<T, X> LocatedSpan<T, X> {
     }
 }
 
-impl<T: AsBytes, X> LocatedSpan<T, X> {
+impl<T: AsBytes + WellBehavedFragment, X> LocatedSpan<T, X> {
     // Attempt to get the "original" data slice back, by extending
     // self.fragment backwards by self.offset.
     // Note that any bytes truncated from after self.fragment will not
diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs
index 74dd0db..48b7662 100644
--- a/tests/integration_tests.rs
+++ b/tests/integration_tests.rs
@@ -1,5 +1,5 @@
 use nom::{error::ErrorKind, error_position, AsBytes, FindSubstring, IResult, InputLength, Slice};
-use nom_locate::LocatedSpan;
+use nom_locate::{LocatedSpan, WellBehavedFragment};
 use std::cmp;
 use std::fmt::Debug;
 use std::ops::{Range, RangeFull};
@@ -59,7 +59,7 @@ struct Position {
 fn test_str_fragments<'a, F, T>(parser: F, input: T, positions: Vec<Position>)
 where
     F: Fn(LocatedSpan<T>) -> IResult<LocatedSpan<T>, Vec<LocatedSpan<T>>>,
-    T: InputLength + Slice<Range<usize>> + Slice<RangeFull> + Debug + PartialEq + AsBytes,
+    T: InputLength + Slice<Range<usize>> + Slice<RangeFull> + Debug + PartialEq + AsBytes + WellBehavedFragment,
 {
     let res = parser(LocatedSpan::new(input.slice(..)))
         .map_err(|err| {

as it requires library users to explicitly use unsafe before they use any function that calls get_unoffsetted_slice on a custom type

@progval progval added this to the v5.0.0 milestone Jul 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants