Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strsplittrail #27

Merged
merged 16 commits into from
May 14, 2016
Merged

Strsplittrail #27

merged 16 commits into from
May 14, 2016

Conversation

brodieG
Copy link
Contributor

@brodieG brodieG commented Mar 5, 2016

Proposed "fix" for #26 (Fixes #26). A few noteworthy comments:

  • I "fixed" this issue only in col_strsplit. It could potentially be "fixed" in crayon:::re_table which would be cleaner, but would affect multiple other functions
  • I also enabled the empty parameter for non_matching; since it wasn't being used (I think), I updated all calls to that function to specify empty=TRUE to avoid changing behavior

Let me know if you want me to change / revert any of this.

Seemed to default to TRUE no matter what; also switched
'map_to_ansi' to call 'non_matching' with 'empty=TRUE'
since that seemed to be what it was doing before fix
This lines up with what 'substr' does.  Note
we fixed this in 'col_substr' instead of in
're_table' which would have been the other
option, but that could possibly have cause
other stuff to break
@brodieG
Copy link
Contributor Author

brodieG commented Mar 8, 2016

Fyi, I'll submit the resolved conflicts later today.

@brodieG
Copy link
Contributor Author

brodieG commented Mar 8, 2016

@gaborcsardi, fyi, the branch is now up to date.

@brodieG
Copy link
Contributor Author

brodieG commented Mar 12, 2016

@gaborcsardi, let me know if you intend to eventually merge this one or not. Fine if you don't, but it would be helpful for me to know one way or other since it affects how I write my package that imports crayon.

chunks <- lapply(
chunks,
function(y)
if(nrow(y) > 1L && !tail(y, 1L)[, "length"]) head(y, -1L) else y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the nrow(y) > 1L condition? I have

❯ strsplit("", "a")[[1]]
character(0)

so we want to drop a single "" as well, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to look at it again later. Clearly this is wrong, but I think I was trying to ensure that we don't drop this:

> strsplit("a", "a")[[1]]
[1] ""

Hopefully I can look at it this evening.

empty strings, empty splits, etc.
@brodieG
Copy link
Contributor Author

brodieG commented Mar 16, 2016

Okay, so unfortunately addressing the corner case you highlight is not completely straightforward. This latest push is NOT ready for merging as there remains some corner cases and I want to test this more thoroughly.

I put it here for your inspection in because the (currently partial) solution I put in is not particularly elegant. As far as I can tell, I can't easily rely on the existing infrastructure to handle the corner cases, as passing 0-row chunks causes col_substr and co. to break. Instead, I have to handle the corner cases in col_strsplit directly. I could probably fix the internal functions to be more elegant, but that will likely mean modifying a fair bit more code.

Let me know if have any concerns with me just handling the corner cases as the col_strsplit.

EDIT: actually, I'm not sure we can modify the internal funs as the issue is that we're basically doing something like col_substr(x, integer(0L), integer(0L)) which is undefined even for substr, so I think we have to handle this as the col_strsplit level. Haven't fully thought through, but I'm out of time. I'll try to address all this at some point this week.

@gaborcsardi
Copy link
Member

Thanks! I'll try to look at it this evening.

Throw an error if 'split' is length greater than 1 instead
of letting 'gregexpr' gag on it.
Explicitly coerce to character since 'gregexpr' doesn't
necessarily do it.
Substitute "" for zero length splits to mimic strsplit
behavior.
@brodieG
Copy link
Contributor Author

brodieG commented Mar 18, 2016

Okay, this should be good to go now, assuming you're okay with how I've addressed the issues. I wish strsplit was a bit more consistent in how it handles corner cases.


expect_equal(col_strsplit("", "-"), list(""))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these tests no longer behave like they used to so I changed them.

@brodieG
Copy link
Contributor Author

brodieG commented Mar 24, 2016

I addressed the last remaining issue I opened. Note that one issue is that substr("abc", "hello", "hello") works (returns NA), whereas I just sidestepped that by disallowing string position arguments in col_substr.

Also, appveyor is failing, but I think the issue is not related to the code changes I made (for some reason devtools didn't load properly).

Finally, haven't heard much one way or another on this pull request. Feedback would be appreciated. Thanks.

@gaborcsardi
Copy link
Member

I am sorry this is taking long. I'll look at it today.

@brodieG
Copy link
Contributor Author

brodieG commented Apr 14, 2016

Let me know if you want me to simplify this pull request by breaking it up into smaller ones.

@gaborcsardi
Copy link
Member

Thanks much! I am sorry that it took so long!

@gaborcsardi gaborcsardi merged commit c12e5f3 into r-lib:master May 14, 2016
@brodieG
Copy link
Contributor Author

brodieG commented May 14, 2016

Great, thanks for accepting the pull.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Align col_strsplit to strsplit, warts and all?
2 participants