Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Altrep Integers class. #274

Merged
merged 5 commits into from
Sep 9, 2021
Merged

Conversation

andy-thomason
Copy link
Contributor

This is a sample for an Integers class that uses Altrep
methods to handle very large vectors.

There are no iterators at present as we would need to implement
Rint first.

The USP of this is the ability to make vectors from iterators without
allocating memory.

We support elt(), get_region(), is_sorted() and no_na() correctly.

We do not have access to altrep min() and max() since these were removed
from Rinternals.h

Suggestions on a postcard...

@clauswilke
Copy link
Member

Thanks for your efforts! Two comments:

@andy-thomason
Copy link
Contributor Author

This should handle both the ALTREP and regular vectors.

The methods we call like get_region and elt should also work
for regular vectors.

When constructing vectors, I'm chosing to use regular vectors
for small ones and ALTVEC for larger ones. This is because there
is some overhead in creating the class and vector and we
need to generate the class every time.

The regular conversion rules should accept anything with INTSXP
as input. The macro in mod.rs converts from Robj to Integers and
vice versa.

@clauswilke
Copy link
Member

The methods we call like get_region and elt should also work
for regular vectors.

I see, thanks!

@clauswilke
Copy link
Member

The Rint prototype looks good to me. We'll probably want to add TryFrom traits eventually, but it's a good starting point.

Copy link
Contributor

@yutannihilation yutannihilation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this pull request is a really nice starting point for discussion.

extendr-api/src/wrapper/integers.rs Outdated Show resolved Hide resolved
/// assert_eq!(vec.iter().cloned().sum::<i32>(), 3);
/// }
/// ```
pub fn iter(&self) -> impl Iterator<Item = &i32> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for being nitpicky: Shouldn't the item type be Rint? In the general case, we don't know that we can return an i32 for sure (could be NA).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this would return SliceIter<T>. Or, is the idea that doing NA-validation beforehand at once, which is similar to this comment above? If so, i32 might make sense.

do all its checks once for a number of elements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented Rint where possible.

I'm in two minds about get_region as we would like to support SIMD arithmetic but can do this too if required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SliceIter should probably go. Its only useful feature is that the lifetime
is effectively static.


impl std::iter::Sum for Rint {
/// Sum an integer iterator over Rint.
/// Yields NA on overflow of NAs present.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yields NA on overflow of NAs present.

If this sentence means to say no NA-validation is needed because of this, I don't think it's true. (e.g. c(-1, NA))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/of/or/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. So, anyway we need NA-validation here, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I thought this would return i32::MIX - 1, but I found this is handled correctly thanks to <Rint>::checked_sub(). Sorry for my misunderstanding!

rextendr::rust_source(code = r"(
use extendr_api::scalar::Rint;

#[extendr(use_try_from = true)]
fn do_sum(x: Integers) -> Rint {
  x.iter().sum()
}
)", patch.crates_io = list(`extendr-api` = list(git = "https://github.com/extendr/extendr"))
)
#> ℹ build directory: '/tmp/Rtmp7RG5fJ/file77e25279e1bf0'
#> ✓ Writing '/tmp/Rtmp7RG5fJ/file77e25279e1bf0/target/extendr_wrappers.R'.

do_sum(c(NA_integer_, -1L))
#> [1] NA

Created on 2021-09-12 by the reprex package (v2.0.1)

// is problematic when converting a negative value to unsigned
// integer types (e.g. `-1i32 as u8` becomes 255).
if let Some(v) = robj.as_integer() {
if let Ok(v) = Self::try_from(v) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip this try_from in this case. Rint is 32-bit, so it should never overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

extendr-api/src/wrapper/integers.rs Show resolved Hide resolved
/// Get a single element from the vector.
/// Note that this is very inefficient in a tight loop.
pub fn elt(&self, index: usize) -> Rint {
unsafe { INTEGER_ELT(self.get(), index as R_xlen_t).into() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of non-ALTREP, cpp11 accesses to the underlying memory directly without INTEGER_ELT. Can we do the same thing? Doesn't this matter much on the performance?

https://github.com/r-lib/cpp11/blob/1b4beb25f56c8dca6e0448d5dc55fb64fddab5d1/inst/include/cpp11/integers.hpp#L29-L32

unsafe { INTEGER_IS_SORTED(self.get()).into() }
}

/// Return TRUE if the vector has NAs, FALSE if not, or NA_BOOL if unknown.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Return TRUE if the vector has NAs, FALSE if not, or NA_BOOL if unknown.
/// Return TRUE if the vector has no NAs, FALSE if not, or NA_BOOL if unknown.

@andy-thomason
Copy link
Contributor Author

andy-thomason commented Sep 6, 2021 via email

@yutannihilation
Copy link
Contributor

Thanks, I just feel I don't have any idea worth arguing about there for now :) Let's move forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants