Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented low-level replace_all function #436

Merged
merged 9 commits into from
Jul 4, 2021

Conversation

aman-godara
Copy link
Member

@aman-godara aman-godara commented Jun 15, 2021

#366

Current implement provides with replace_overlapping optional dummy argument which is logical.
This argument helps provide functionality of both OPTION: 1 and OPTION: 2 of overlapping problem depending upon whether it is set to .true. or .false..

[EDIT]: replace_overlapping has been removed from replace_all now.

Tasks:

  • Implemented replace_all function
  • Added some test cases for replace_all to show the behaviour of the function
  • Added more test cases for replace_all
  • Document the function
  • Pure or Elemental -> Pure.

@awvwgk awvwgk added the reviewers needed This patch requires extra eyes label Jun 15, 2021
@aman-godara
Copy link
Member Author

Although I have added replace_overlapping feature to the function but I would like to know the opinion of the community on what behaviour do they find more intuitive from the function in case of overlapping substrings pattern.

@aman-godara
Copy link
Member Author

This comment is regarding commit e325c2b.
replace_all(string, pattern, replacement, replace_overlapping)

Please note that for input: replace_all("abcc", "abc", "ab")
commit e325c2b returns "abc" even if replace_overlapping is set to .true. because we are returning a new string as the output. Function is using the input string just to read from it and not to write in it.


If we would have designed replace_all function differently (i.e. by keeping replace_all as subroutine and making few other modifications in the algorithm), function could have been made to give "ab" as the output. Answer "ab" occurs as such:
1st replacement: "abcc" -> "abc" (replaced "abc" with "ab")
2nd replacement: "abc" -> "ab" (replaced "abc" with "ab")
No further replacements.

Both these replacements are happening in the one single call of replace_all. Here after 1st replacement, "ab" fitted very well with "c" acting as the pattern for the 2nd replacement.

Moreover, this behaviour cannot be achieved by running replace_all (as implement in commit e325c2b even with replace_overlapping as .true.) several times, as:
running replace_all 2 times on "acaaca" to replace pattern: "aca" with replacement: "ac" returns "acc" whereas correct output should be "acca" as such:
1st replacement: "acaaca" -> "acaca"
2nd replacement: "acaca" -> "acca"
No further replacements.

@arjenmarkus too has highlighted a similar behaviour through this sentence in his comment:

If the replacing string is qwqwq you would get an infinitely long string.

Whole aim of the comment was to highlight the problems with subroutine kind of replace_all function and to bring forward what implementation e325c2b cannot do.

@aman-godara aman-godara marked this pull request as ready for review June 22, 2021 16:40
src/stdlib_strings.f90 Outdated Show resolved Hide resolved
src/stdlib_strings.f90 Outdated Show resolved Hide resolved
@awvwgk awvwgk self-requested a review June 24, 2021 09:26
src/stdlib_strings.f90 Outdated Show resolved Hide resolved
Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@aman-godara
Copy link
Member Author

On page 124 of https://j3-fortran.org/doc/year/18/18-007r1.pdf, It is mentioned that output of character slicing is of length max(l - f + 1, 0) which is exactly what is desired in here by the function replace_all. Thus, I am making use of intrinsic character slicing instead of slice in line 633 and 648. This will slightly improve the performance of the function.

image

@awvwgk
Copy link
Member

awvwgk commented Jul 3, 2021

@fortran-lang/push-access Can anyone help out with reviewing this patch to allow @aman-godara to move forward with his GSoC project?

@milancurcic milancurcic self-requested a review July 3, 2021 15:54
Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @aman-godara, I think this is a very nice addition.

I can't find the explanation in this thread or in the related issue: Why did you choose pure over elemental?

@milancurcic
Copy link
Member

I can't find the explanation in this thread or in the related issue: Why did you choose pure over elemental?

All good, I see why, please disregard 👍

@awvwgk awvwgk merged commit 4ed69eb into fortran-lang:master Jul 4, 2021
@aman-godara
Copy link
Member Author

I can't find the explanation in this thread or in the related issue: Why did you choose pure over elemental?

Let me still mention it for future reference,
if the length of replacement string is not equal to the length of pattern string, then different outputs might have varying length which become problematic to be stored in a single output array.

for example:
replace_all (string, pattern, replacement)

replace_all(["abcabcT", "bcdabcT"], ["abc", "bcd"], ["X", "Y"])
output array in this case will be: ["XXT", "YabcT"] which cannot be stored in a single output array because of varying lengths of different outputs "XXT" and "YabcT.

@awvwgk awvwgk removed the reviewers needed This patch requires extra eyes label Jul 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants