We could do much better than this in terms of performance with a C implementation. vec_group_rle() does a lot more work than is necessary here because it uses a dictionary to keep track of what it has already seen.
Also consider vec_rle(), which would differ from vec_group_rle() in that the latter uses a dictionary to track the first time we saw an individual value. vec_rle() would be much simpler: it would just track changes in x, like vec_runs(), and return a two-column data frame with val and len, very much like rle().
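To make the "just track changes in x" idea concrete, here is a minimal sketch in plain C for an integer array. It is not vctrs code: rle_int() and its output arrays are hypothetical names chosen for illustration, and a real implementation would work on R vectors and return a data frame.

```c
#include <stddef.h>

/* Hypothetical helper, not part of vctrs.
 * Run-length encodes adjacent runs of `x` (length `n`).
 * Run values go to `val` and run lengths to `len`; both must have
 * room for `n` entries. Returns the number of runs found. */
size_t rle_int(const int *x, size_t n, int *val, size_t *len) {
    if (n == 0) {
        return 0;
    }

    size_t run = 0;  /* index of the run currently being extended */
    val[0] = x[0];
    len[0] = 1;

    for (size_t i = 1; i < n; ++i) {
        if (x[i] == x[i - 1]) {
            /* Same value as the previous element: extend the run */
            ++len[run];
        } else {
            /* Value changed: start a new run, no lookup of past values */
            ++run;
            val[run] = x[i];
            len[run] = 1;
        }
    }

    return run + 1;
}
```

For the input 1, 1, 2, 2, 2, 1 this yields three runs with val = 1, 2, 1 and len = 2, 3, 1. Each element is only compared with its predecessor, so no dictionary is needed.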
Also, why doesn't vec_group_rle() return a data frame rather than a rcrd? It seems like that would've been simpler for such a low level function.
For vec_runs() and vec_rle() to work efficiently, we need to extract the equality comparison caching utilities from the dictionary code (i.e. d->equal and d->vec_p). That would let us compare values of x for equality very cheaply.
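A rough sketch of what a standalone comparison utility could look like once it is separated from the dictionary. The equal_fn type, its signature, dbl_equal() and count_runs() are all assumptions made for illustration; they only loosely mirror the roles of d->equal and d->vec_p and are not the actual vctrs internals.

```c
#include <stddef.h>

/* Hypothetical comparator type (an assumption, not the vctrs signature):
 * returns nonzero when element i of x equals element j of y. */
typedef int (*equal_fn)(const void *x, size_t i, const void *y, size_t j);

/* Example comparator for double data. The cast to a typed pointer
 * happens here, so the caller never needs to know the element type. */
int dbl_equal(const void *x, size_t i, const void *y, size_t j) {
    return ((const double *) x)[i] == ((const double *) y)[j];
}

/* Counts adjacent runs using only a data pointer and a comparator,
 * which play the roles of d->vec_p and d->equal respectively. */
size_t count_runs(const void *vec_p, size_t n, equal_fn equal) {
    if (n == 0) {
        return 0;
    }

    size_t n_runs = 1;
    for (size_t i = 1; i < n; ++i) {
        /* Compare each element with its predecessor only */
        if (!equal(vec_p, i - 1, vec_p, i)) {
            ++n_runs;
        }
    }

    return n_runs;
}
```

The point of the design is that the data pointer and the comparator are resolved once per vector, so the loop itself does nothing but call the comparator on neighbouring positions.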
Inspired by the adjacency grouping idea in tidyverse/dplyr#5184
I've added this to the vec-prefixes google sheet
Created on 2020-05-05 by the reprex package (v0.3.0)