Add nest_string()/unnest_string() functions #69

aaronwolen · 2015-03-25T13:05:22Z

This PR slightly modifies unnest() so that the transform step from the example is no longer necessary; the example can be written as:

df <- data.frame(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  stringsAsFactors = FALSE
)
unnest(df, y)

I often do this kind of unnesting and just wanted to save myself some typing by letting unnest() handle the string splitting.

This also adds a nest() function so it's possible to round trip:

df %>% unnest(y) %>% nest(y)
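For readers unfamiliar with the operation, the round trip described above can be sketched in base R. The helper names split_rows() and collapse_rows() are hypothetical and are not the PR's implementation; the sketch assumes a plain comma separator and a single key column.

```r
# Hypothetical base R helpers; an illustration only, not the code in this PR.
split_rows <- function(data, col, sep = ",") {
  pieces <- strsplit(data[[col]], sep, fixed = TRUE)
  # Repeat each original row once per piece it produced
  out <- data[rep(seq_len(nrow(data)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

collapse_rows <- function(data, col, by, sep = ",") {
  keys <- unique(data[[by]])
  # Keep the first row of each key, then paste that key's pieces back together
  out <- data[!duplicated(data[[by]]), , drop = FALSE]
  out[[col]] <- vapply(
    keys,
    function(k) paste(data[[col]][data[[by]] == k], collapse = sep),
    character(1),
    USE.NAMES = FALSE
  )
  rownames(out) <- NULL
  out
}

df <- data.frame(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  stringsAsFactors = FALSE
)
long <- split_rows(df, "y")                # six rows, one per piece
back <- collapse_rows(long, "y", by = "x") # three rows again
```

The collapse step only recovers the original strings because x uniquely identifies each original row; with duplicated keys the round trip would merge rows.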

hadley · 2015-03-25T13:08:30Z

Hmmmm, that's only one type of nesting - the other is when you have something like:

dplyr::data_frame(x = c(1, 2), y = list(1:3, 9:10))

So maybe a more specific name like unnest_string() ?
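To make the distinction concrete, here is a base R sketch of the list-column case. The helper unnest_list_col() is a hypothetical name used only for illustration, and the data frame is built with base R rather than dplyr so the example has no dependencies.

```r
# Base R stand-in for the list-column example above (dplyr is not required).
df <- data.frame(x = c(1, 2))
df$y <- list(1:3, 9:10)

# Hypothetical helper: expand a list column into one row per element.
unnest_list_col <- function(data, col) {
  n <- lengths(data[[col]])
  keep <- setdiff(names(data), col)
  # Repeat the non-list columns once per element of the list entry
  out <- data[rep(seq_len(nrow(data)), n), keep, drop = FALSE]
  out[[col]] <- unlist(data[[col]])
  rownames(out) <- NULL
  out
}

flat <- unnest_list_col(df, "y")  # five rows: three for x = 1, two for x = 2
```

Here no string splitting is involved at all: the pieces already exist as list elements, which is why the two kinds of nesting may deserve different function names.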

aaronwolen · 2015-03-25T13:45:09Z

Ah, good call. unnest_string() makes sense and fits with the precedent set by extract_numeric().

But which type of nesting do you think is more common? If it's string nesting then maybe unnest_list() makes more sense?

aaronwolen · 2015-03-25T13:50:25Z

Thinking about it for the past 15 seconds, unnest_list() sounds more like a base::unsplit() replacement than a data.frame function. unnest_string() is probably the way to go.

Are you envisioning nest_string()/unnest_string() and nest()/unnest() for list columns? Or is that nest overload?

hadley · 2015-03-30T15:07:24Z

To me, unnest() seems like the atomic operation (in the sense you combine it with other atoms to do something useful), so I'd like to keep it the primary verb.

hadley · 2015-03-30T15:09:01Z

R/nest-string.R

+#'   stringsAsFactors = FALSE
+#' )
+#' unnest_string(df, y)
+unnest_string <- function(data, col = NULL, sep = "[^[:alnum:]]+", ...) {


Doesn't this always need a col?

hadley · 2015-03-30T15:10:09Z

Could you PTAL at the build failure?

aaronwolen · 2015-03-30T17:32:21Z

Looks like the build failure was introduced in #59. I added a fix here (9c6fcc4) but can submit a separate PR if you prefer.

aaronwolen · 2015-03-30T18:15:29Z

Thanks for the comments. I see your point about nest_string() and removed it to wrap up this PR.

Any interest in extending unnest() (and unnest_string()) to accept multiple columns as suggested in #44?

hadley · 2015-04-16T11:10:32Z

DESCRIPTION

@@ -17,6 +17,8 @@ Imports:
    dplyr (>= 0.2),
    stringi,
    lazyeval
+Suggets:


Was this deliberate?

aaronwolen · 2015-04-16

I meant to add data.table to the suggested packages to address this build failure, but listing it under a redundant/misspelled key was just a silly mistake (insert embarrassed emoji here).

hadley · 2015-04-16

Ah, I just discovered that too and fixed it (in a slightly different way). Could you please merge/rebase?

hadley · 2015-04-16T11:11:29Z

Would you mind including a couple of unit tests too please?

aaronwolen · 2015-04-16T11:52:52Z

Sure, I'll rebase and add a few tests.

coveralls · 2015-04-16T12:55:15Z

Coverage increased (+5.75%) to 63.83% when pulling a907a56 on aaronwolen:nest into a2bab46 on hadley:master.

hadley · 2016-05-22T06:48:22Z

I wonder if this should be unnest_separate() since it's similar to separate?

aaronwolen · 2016-05-22T13:55:41Z

Yeah, I think that makes sense. Should I update the PR?

hadley · 2016-05-22T15:26:04Z

Yes, that would be great! I think maybe it should have col and convert arguments to be as similar as possible to separate(). And maybe it should be called separate_rows()? Or something like that.

It would also be useful to give this function, extract(), and separate() a common @family, maybe @family string splitting?
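As an illustration of what a convert argument could do here, the following base R sketch splits a column into rows and optionally re-types the pieces. It assumes that convert amounts to a call to utils::type.convert() with as.is = TRUE; the helper name split_rows_convert() is hypothetical and this is not the code that was merged.

```r
# Hypothetical helper; a sketch of the convert idea, not the PR's code.
split_rows_convert <- function(data, col, sep = ",", convert = FALSE) {
  pieces <- strsplit(data[[col]], sep, fixed = TRUE)
  out <- data[rep(seq_len(nrow(data)), lengths(pieces)), , drop = FALSE]
  values <- unlist(pieces)
  if (convert) {
    # as.is = TRUE keeps non-numeric strings as character instead of factor
    values <- type.convert(values, as.is = TRUE)
  }
  out[[col]] <- values
  rownames(out) <- NULL
  out
}

df <- data.frame(id = c("a", "b"), n = c("1,2", "3"), stringsAsFactors = FALSE)
class(split_rows_convert(df, "n")$n)                  # "character"
class(split_rows_convert(df, "n", convert = TRUE)$n)  # "integer"
```

Without conversion the split pieces stay character even when they look numeric, which is the same situation separate() addresses with its own convert argument.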

aaronwolen · 2016-05-23T17:53:36Z

From a readability perspective I definitely prefer separate_rows() to unnest_separate().

However, I think an argument can be made that this function is conceptually more similar to unnest(), which produces new rows, than it is to separate(), which produces a new column. If you buy that argument, unnest_*something*() might be more in line with user expectations, and if you don't... well, then separate_rows() it is!
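The rows-versus-columns distinction can be shown on the same input with a small base R sketch. The column names y1 and y2 are invented for the example, and the column-wise version assumes every row splits into the same number of pieces.

```r
df <- data.frame(x = 1:2, y = c("a,b", "c,d"), stringsAsFactors = FALSE)
pieces <- strsplit(df$y, ",", fixed = TRUE)

# Column-wise (separate()-like): one row per input row, one column per piece.
# rbind() only lines up cleanly because both rows have exactly two pieces.
wide <- data.frame(x = df$x, do.call(rbind, pieces), stringsAsFactors = FALSE)
names(wide) <- c("x", "y1", "y2")

# Row-wise (unnest()-like): one row per piece, same columns as the input.
long <- data.frame(
  x = rep(df$x, lengths(pieces)),
  y = unlist(pieces),
  stringsAsFactors = FALSE
)

dim(wide)  # 2 rows, 3 columns
dim(long)  # 4 rows, 2 columns
```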

hadley · 2016-05-23T18:31:16Z

It's half-way between both, but I think people are more likely to look for it after discovering that separate() doesn't do what they want. I think if you understand unnest() you might not need unnest_string().


aaronwolen · 2016-05-24T14:05:09Z

R/separate-rows.R

+#' @export
+separate_rows_.grouped_df <- function(data, cols, sep = "[^[:alnum:].]+",
+                                  convert = FALSE) {
+


Should separate_rows() prevent modification of grouped variables, @hadley?

hadley · 2016-05-24

It shouldn't - you can copy the approach from separate_:

#' @export
separate_.grouped_df <- function(data, col, into, sep = "[^[:alnum:]]+",
                                 remove = TRUE, convert = FALSE,
                                 extra = "warn", fill = "warn", ...) {
  regroup(NextMethod(), data, if (remove) col)
}
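The two default sep patterns quoted in this thread differ by a single character: separate_ uses "[^[:alnum:]]+" while the separate_rows_ hunk uses "[^[:alnum:].]+". The sketch below shows the effect on decimal values. It uses base strsplit() as a stand-in, which is an assumption; the commit list indicates the PR itself uses stringi.

```r
x <- "1.5,2.25;3"

# Every non-alphanumeric run is a separator, so "." splits the decimals.
strsplit(x, "[^[:alnum:]]+")[[1]]
# "1" "5" "2" "25" "3"

# Adding "." to the negated class treats it as part of a value.
strsplit(x, "[^[:alnum:].]+")[[1]]
# "1.5" "2.25" "3"
```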

aaronwolen · 2016-05-24T14:49:47Z

Cool, this should be ready for review.

hadley · 2016-05-24T15:37:49Z

Thanks!

aaronwolen changed the title ~~Simplify unnest's interface and add nest function~~ Add nest_string()/unnest_string() functions Mar 25, 2015

aaronwolen force-pushed the nest branch from c05cd39 to b4b4e4c Compare March 27, 2015 13:24

hadley reviewed Mar 30, 2015

hadley reviewed Apr 16, 2015

aaronwolen force-pushed the nest branch from e5048cb to a907a56 Compare April 16, 2015 12:39

aaronwolen force-pushed the nest branch 2 times, most recently from 3c3e044 to 5938ed1 Compare May 26, 2015 16:18

hadley mentioned this pull request Jun 15, 2015

tidying key-value pair columns #86

Closed

aaronwolen added 6 commits May 23, 2016 14:25

Add unnest_string

57a7335

Update docs

087b2e5

Add some tests for unnest

feb0464

Update news

9cc7e75

Use stringi to implement unnest_string

0ba7f0c

Allow unnest_string to work with multiple columns

249f1b0

aaronwolen added 4 commits May 23, 2016 14:31

Prevent unnest_string() from splitting decimals

7d8c8ca

Fix unnest example

819efd6

Rename unnest_string to separate_rows

9b2db19

Add convert argument to separate_rows()

2727386

aaronwolen force-pushed the nest branch from eb8ea1d to 5a0640e Compare May 24, 2016 12:54

aaronwolen added 3 commits May 24, 2016 08:56

Group separate_rows() tests with separate()'s

6c5f4c1

Update NEWS

f330569

Additional tests for separate_rows()

38db7d4

* Preserves grouping
* Avoid modifying grouped variable
* Convert works as expected

aaronwolen force-pushed the nest branch from 5a0640e to 38db7d4 Compare May 24, 2016 13:50

aaronwolen reviewed May 24, 2016

Allow row separation on grouped variable

07a4e06

hadley merged commit 32d9dd2 into tidyverse:master May 24, 2016
