
Support custom regex #9

Closed
yocontra opened this issue Dec 12, 2013 · 11 comments · Fixed by #13

Comments

@yocontra

It would be cool if this supported passing in a regex instead of a string that becomes a regex.

@sindresorhus

👍

@sandinmyjoints

👍

@nicolashery

👍

@malgorithms

I imagine the difficulty here is matching across chunk boundaries. @eugeneware, is that right?

If Eugene is constructing a regular expression out of the string you pass, he only needs to keep strlen(search term) characters from the previous chunk around to catch matches that span the boundary.
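For the plain-string case, the "keep a tail from the previous chunk" idea can be sketched as follows. This is a minimal illustration, not replacestream's actual code: `makeStringReplacer` is a hypothetical helper, it works on string chunks rather than a Node stream, and it assumes a non-empty search string. Holding back search.length - 1 characters is enough, because any match that is not yet complete must start within that tail.

```javascript
// Hypothetical helper (not part of replacestream): replace every occurrence
// of a fixed, non-empty search string across a sequence of string chunks.
function makeStringReplacer(search, replacement) {
  let tail = ""; // text held back from the previous chunk
  return {
    write(chunk) {
      const text = tail + chunk;
      let out = "";
      let pos = 0;
      let idx;
      // Replace every complete occurrence in the text seen so far.
      while ((idx = text.indexOf(search, pos)) !== -1) {
        out += text.slice(pos, idx) + replacement;
        pos = idx + search.length;
      }
      // Hold back the last search.length - 1 chars: a match starting there
      // might only complete once the next chunk arrives.
      const keepFrom = Math.max(pos, text.length - (search.length - 1));
      out += text.slice(pos, keepFrom);
      tail = text.slice(keepFrom);
      return out;
    },
    end() {
      const rest = tail; // leftover text can no longer complete a match
      tail = "";
      return rest;
    },
  };
}

// Usage: "cat" is split across chunk boundaries twice.
const r = makeStringReplacer("cat", "dog");
const out = ["the c", "at sat", " on a ca", "t"].map((c) => r.write(c)).join("") + r.end();
console.log(out); // "the dog sat on a dog"
```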

I would like this too; I'm just saying I wouldn't know how to implement it without a non-native regular expression engine.

I suppose it could support regular expressions but also require you to pass a maximum match length at the same time.

// remove HTML comments up to 1024 chars (max_match_len is the proposed option)
// longer comments would not get replaced; the lazy *? stops at the first -->
.pipe(replaceStream(/<!--[\s\S]*?-->/g, '', { max_match_len: 1024 }))

@yocontra
Author

@malgorithms A max_match_len requirement on custom regexes sounds fine to me; that would solve most cases.

@eugeneware
Owner

Correct, you could end up buffering the entire stream in memory if you don't get a hit, which would kind of suck.

But implementing a maximum match length might be an option, though not deterministic of course.


Eugene Ware
Chief Executive Officer

Phone: +61 3 9955 7041
Email: eugene@noblesamurai.com
Twitter: @eugeneware http://twitter.com/EugeneWare

Noble Samurai Pty Ltd
Level 1, 234 Whitehorse Rd
Nunawading, Victoria, 3131, Australia

noblesamurai.com http://www.noblesamurai.com/ | eugeneware.com |
facebook.com/Eugene.S.Ware http://www.facebook.com/Eugene.S.Ware

@malgorithms

@eugeneware - what do you mean "not deterministic" in this context? That it would be unpredictable based on boundaries?

For a simple example: say you wanted to replace /a+/ with just b. If you happened to swallow a few a's and then hit a chunk boundary, you might mistakenly replace them with a b even though more a's were coming that should have been part of the match, because everything so far was shorter than max_match_len?
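The /a+/ problem is easy to reproduce if each chunk is matched on its own. The sketch below uses a hypothetical `naiveReplace` helper (it is not how replacestream works) to show that the same text gives different output depending on where the chunk boundary falls.

```javascript
// Hypothetical naive approach: apply the regex to each chunk independently.
function naiveReplace(chunks, regex, replacement) {
  return chunks.map((chunk) => chunk.replace(regex, replacement)).join("");
}

const whole = naiveReplace(["xaaaay"], /a+/g, "b");     // one chunk
const split = naiveReplace(["xaa", "aay"], /a+/g, "b"); // same text, split mid-run
console.log(whole); // "xby"
console.log(split); // "xbby": the run of a's became two matches
```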

I think if you kept a moving window of data of size (max_match_len * 2) and only performed replacements on matches that began in the first half of the window, there wouldn't be any ambiguities. I think. Does that sound right to you? It would swallow up to max_match_len repeated a's and replace them with a b, as expected by the call.
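The moving-window proposal can be sketched as below. This is an illustration of the idea under stated assumptions, not replacestream's implementation or the code that eventually landed: `makeWindowReplacer` is a hypothetical name, it works on string chunks rather than a Node stream, and it assumes every real match is at most maxMatchLen characters, the pattern never matches the empty string, and it uses no anchors or lookaround.

```javascript
// Hypothetical sketch of the moving-window idea (not replacestream's API).
// Assumes: matches are at most maxMatchLen chars, no empty matches,
// and no anchors or lookaround in the pattern.
function makeWindowReplacer(regex, replacement, maxMatchLen) {
  const flags = regex.flags.replace(/[gy]/g, "");
  const firstMatch = new RegExp(regex.source, flags);       // leftmost match only
  const everyMatch = new RegExp(regex.source, flags + "g"); // for the final flush
  let buffer = "";

  // Emit output only while the buffer holds a full window of 2 * maxMatchLen.
  function drain() {
    let out = "";
    while (buffer.length >= 2 * maxMatchLen) {
      const windowText = buffer.slice(0, 2 * maxMatchLen);
      const m = firstMatch.exec(windowText);
      if (m && m[0].length === 0) throw new Error("empty matches are not supported");
      if (m && m.index < maxMatchLen) {
        // Match starts in the first half, so by assumption it is complete.
        out += buffer.slice(0, m.index) + replacement;
        buffer = buffer.slice(m.index + m[0].length);
      } else {
        // No match starts in the first half: those chars are safe to emit.
        out += buffer.slice(0, maxMatchLen);
        buffer = buffer.slice(maxMatchLen);
      }
    }
    return out;
  }

  return {
    write(chunk) {
      buffer += chunk;
      return drain();
    },
    end() {
      // Stream is over, so whatever remains can be matched directly.
      const out = buffer.replace(everyMatch, () => replacement);
      buffer = "";
      return out;
    },
  };
}

// Usage: same result no matter how the input is chunked.
const r = makeWindowReplacer(/a+/g, "b", 4);
const out = r.write("xaaay--aa-") + r.write("-aaaaz") + r.end();
console.log(out); // "xby--b--bz"
```

Committing only matches that start in the first half means a match of up to maxMatchLen characters, plus the one following character a greedy pattern like /a+/ needs to inspect, always lies inside the window.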

@eugeneware
Owner

The moving window approach would definitely work. Should be easy enough to implement. Happy to take a PR if you'd like to take a shot at this too :-)


@malgorithms

Fair enough! I am unlikely to do this as I no longer need it, but if I get some hobby time...

@mehtaphysical
Collaborator

I can take this on. I just used this with a regex in a project. Here are the changes I made: https://github.com/mehtaphysical/replacestream/tree/add-regex

I'll make a pull request so we can start a discussion about it. But first I want to write a test.

@eugeneware
Owner

Thanks @mehtaphysical that would be great!


7 participants