Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split at every character if using empty string as delimiter #1335

Closed
wants to merge 1 commit into from

Conversation

malthe
Copy link
Contributor

@malthe malthe commented Oct 18, 2016

No description provided.

end

result.push(cur = recover String end)
let found = match chars
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will incur a newly added match cost of "unwrapping the type union" at every iteration of this loop. Has this change been benchmarked for how it affects the base case (non-empty delim)?

I think a more performant solution would be to decide once outside the loop whether the delim is empty or non-empty, then use a different implementation of the loop based on that decision, rather than forcing the same calculation of the same decision at every iteration of the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume that the cost of the match is negligible compared to the rest of what goes on in this method.

Isn't the match cost in this case simply a branch instruction?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply a branch instruction is not necessarily cheap. We are writing an application where that branch might get called hundreds of thousands of times a second. I haven't tested but under heavy load, jemc's variant sounds like it will perform much better.

Copy link
Contributor Author

@malthe malthe Oct 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fix that.

Would we need an RFC to change the signature of the split method to work on String val instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malthe - I think it would need an RFC, probably.

@SeanTAllen
Copy link
Member

Sylvan and I talked during sync, if this has no impact on the normal case, we are good with this but if it causes a performance degradation for the normal use case, we are not in favor of this.

@malthe
Copy link
Contributor Author

malthe commented Oct 19, 2016

I have submitted an RFC and implementation for a version of this that also changes String.split to work on immutable strings only. That version has no impact on the normal case (the two cases are split up in the implementation).

@SeanTAllen
Copy link
Member

There's a release underway. Please rebase against master and verify that your CHANGELOG entry appears in the "unreleased" section post rebasing.

@malthe malthe force-pushed the string-split-empty branch 3 times, most recently from 0512954 to 98f4ef3 Compare October 21, 2016 13:08
@malthe
Copy link
Contributor Author

malthe commented Oct 21, 2016

@SeanTAllen – do some of the tests fail because of this pull request or because of some general test instability?

@SeanTAllen
Copy link
Member

We are having issues with some of the network tests. Its probably unrelated to your change.

@jemc
Copy link
Member

jemc commented Oct 21, 2016

This time the one failure appears to be Just Travis' Fault™:

An error occurred while generating the build script.

@SeanTAllen
Copy link
Member

That's been happening a decent amount today. Probably a result of DDoS on DYN.

@SeanTAllen
Copy link
Member

CHANGELOG changed due to 0.7.0 release, this probably needs to be rebased against master.

@malthe
Copy link
Contributor Author

malthe commented Oct 26, 2016

This should be ready now.

@SeanTAllen
Copy link
Member

@malthe I don't see any changes to address the performance concern. this introduces a performance regression in the common case to deal with an edge case. I'm not in favor of this in that case.

Given that I'm not sure that I think splitting at every character is the right thing to do with an empty string AND it it going to impact on performance for "the usual cases", I am not in favor of this change at this time.

@malthe
Copy link
Contributor Author

malthe commented Oct 26, 2016

I'll close this one and promote ponylang/rfcs#46 instead (which currently includes this change).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants