Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented intelligent slice functionality #414

Merged
merged 19 commits into from
Jun 12, 2021

Conversation

aman-godara
Copy link
Member

@aman-godara aman-godara commented May 23, 2021

Status: code is open for review
resolves #413

Tasks:

  • Implement slice interface in stdlib_strings.f90: add two procedures under slice interface
  • Write test cases for slice in stdlib_string_functions.f90
  • Discuss about include_last feature after getting the basic slice function merged
  • Document slice function and correct documentation of to_title and to_sentence

Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid keywords as (dummy) variable names, how about end -> last?

@aman-godara
Copy link
Member Author

I would prefer to avoid keywords as (dummy) variable names, how about end -> last?

Using start with last might be non-intuitive for some. How about first and last or from and to?

@awvwgk
Copy link
Member

awvwgk commented May 23, 2021

I like first and last.

@ivan-pi
Copy link
Member

ivan-pi commented May 24, 2021

Is their any reason to keep this in stdlib_ascii module? See #399.

@aman-godara
Copy link
Member Author

Is their any reason to keep this in stdlib_ascii module? See #399.

So we will have a slice interface which will have 2 procedures in it like this:

interface slice
    module procedure :: slice_character_sequence
    module procedure :: slice_string
end interface slice

and implementation of slice_string will call slice_character_sequence?

@ivan-pi
Copy link
Member

ivan-pi commented May 24, 2021

I'm guessing you have the right idea in mind. You can drop the _sequence postfix, IMO.

use stdlib_string, only: slice
use stdlib_string_type, only: string => string_type
character(len=16) :: cstr
type(string) :: sstr
cstr = "Hello, World"
sstr = string(slice(cstr,last=6)) ! sstr contains "Hello", call to slice_character
print *, slice(sstr,first=3)      ! prints "llo", call to slice_string

@aman-godara
Copy link
Member Author

aman-godara commented May 24, 2021

Yeah, I got your point.

Your above comments start a new discussion here:

sstr = string(slice(cstr,last=6)) ! sstr contains "Hello", call to slice_character
print *, slice(sstr,first=3) ! prints "llo", call to slice_string

Currently the function slice is first index and last index inclusive but some other languages like Java as well as Python keep first index inclusive and last index exclusive. So output of slice(cstr, last=6) will be 'Hello,' for Fortran (output included index 6 as well)
whereas for Python index 6 will not be included:

str = 'Hello, World'
print(str[:6])
# prints 'Hello,' (index 0 inclusive but index 6 exclusive)

This might be linked with the fact that Java & Python uses 0-based indexing.

@ivan-pi
Copy link
Member

ivan-pi commented May 24, 2021 via email

@aman-godara aman-godara marked this pull request as ready for review May 26, 2021 08:29
@awvwgk awvwgk added the reviewers needed This patch requires extra eyes label May 26, 2021
Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This is a very useful addition to stdlib.

@aman-godara
Copy link
Member Author

slice function has some overlap with char_string_range.

@ivan-pi
Copy link
Member

ivan-pi commented May 26, 2021

slice function has some overlap with char_string_range.

Indeed, they perform roughly the same operation. But char_string_range is only accessed through it's public interface char, which has an entirely different purpose, that is transform a string_type into the intrinsic character type.

On the other hand slice is more of an alternative to the slice syntax (1:n:s) in form of a function (since the language is missing the the capability to overload the access operator).

Copy link
Member

@ivan-pi ivan-pi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Only thing I was confused about was what happens in case of input like:

result = slice(str,8,2,stride=-2)

Are the characters now taken in reverse reverse? I would need to perform some tests with the function, to make sure all "invalid" input cases still produce sensible results.

doc/specs/stdlib_strings.md Outdated Show resolved Hide resolved
doc/specs/stdlib_string_type.md Show resolved Hide resolved
doc/specs/stdlib_strings.md Outdated Show resolved Hide resolved
@aman-godara
Copy link
Member Author

aman-godara commented May 27, 2021

result = slice(str,8,2,stride=-2)

Are the characters now taken in reverse reverse?

Behaviour of the function is similar to this fortran loop:

j = 1
do i = start, end, stride
    << insert the character present at index i of input_string to index j of output_string >>
    j = j + 1
end do

let's say input_string = "abcdefgh"
start = 8
end = 2
stride = -2

I am showing few iterations as the function progresses here:

function is at index = 8
output_string = "h"

function is at index = 6
output_string = "hf"

function is at index = 4
output_string = "hfd"

function is at index = 2
output_string = "hfdb"

so we get a reversed output "hfdb"

Had the stride been positive 2 (stride = 2, start = 8, end = 2): the above mentioned loop would have executed for 0 number of times i.e. function will never reach any index.

@aman-godara
Copy link
Member Author

aman-godara commented May 27, 2021

If

input_string = 'abcdefgh'
output_string = slice(input_string, 12, 16)

output_string : 'h'
This output is non-intuitive to me. Should the function better return '' in this case?

@ivan-pi
Copy link
Member

ivan-pi commented May 27, 2021

I've written a new comment in #413 (comment)

The implementation of extract by @everythingfunctional can be found here.

In the case above where the last is larger than the length of the string, the call slice(input_string,12,16) should be transliterated to input_string(12:min(len(input_string),finish)), or more specifically input_string(12:8) which should automatically return a zero-length string.

@aman-godara aman-godara requested a review from awvwgk May 27, 2021 12:27
@ivan-pi
Copy link
Member

ivan-pi commented May 27, 2021

In the zen of python it is stated that,

There should be one-- and preferably only one --obvious way to do it.

What is the value in allowing a negative stride, if we could also achieve this using the nested calls, e.g.

result = reverse(slice(str,first,last,stride))

I'm happy to hear what are some possible counter-arguments. One is of course shorter call syntax. But it comes at the price of the complicated logic inside slice.

@aman-godara
Copy link
Member Author

What is the value in allowing a negative stride, if we could also achieve this using the nested calls, e.g.

I can try implementing slice which takes positive strides only but I think it will still look more or less equally complicated.

@ivan-pi
Copy link
Member

ivan-pi commented May 27, 2021

I'd stay with the current behavior for now. If we can get the corner cases right, I have nothing against the current behavior. It is just a matter of preference in the end.

@aman-godara
Copy link
Member Author

aman-godara commented May 27, 2021

    pure function clip(x, xmin, xmax) result(res)
        integer, intent(in) :: x
        integer, intent(in) :: xmin
        integer, intent(in) :: xmax
        integer :: res

        res = max(min(x, xmax), xmin)
    end function clip

    function slice(string, first, last, stride) result(sliced_string)
        character(len=*), intent(in) :: string
        integer, intent(in), optional :: first, last, stride
        integer :: first_index, last_index, stride_vector, strides_taken, length_string, i, j
        character(len=:), allocatable :: sliced_string

        length_string = len(string)
        if (length_string > 0) then
            first_index = 1
            last_index = length_string
            stride_vector = 1

            if (present(stride)) then
                stride_vector = max(1, abs(stride))
            end if

            if (present(first)) then
                first_index =  first
            end if
            if (present(last)) then
                last_index = last
            end if

            if((last_index < first_index) .or. &
                (first_index < 1 .and. last_index < 1) .or. &
                (first_index > length_string .and. last_index > length_string)) then
                
                    sliced_string = ""
            else
                first_index = clip(first_index, 1, length_string)
                last_index = clip(last_index, 1, length_string)

                strides_taken = (last_index - first_index) / stride_vector
                allocate(character(len=strides_taken + 1) :: sliced_string)

                j = 1
                do i = first_index, last_index, stride_vector
                    sliced_string(j:j) = string(i:i)
                    j = j + 1
                end do
            end if
        else
            sliced_string = ""
        end if
    end function slice

Actually most of the complexity came when I started handling invalid cases. With invalid cases here I was referring to the case when user has given wrong indexes as inputs in first and last arguments or user has given a stride which is incoherent with other given arguments.

@awvwgk
Copy link
Member

awvwgk commented May 27, 2021

I tried to break the implementation and haven't found any deviating behavior from the array slice so far (see 81f028a for a hacky slice implementation using the transfer intrinsic).

@aman-godara
Copy link
Member Author

aman-godara commented May 28, 2021

What would this return?

array = [1, 2, 3, 4, 5]
print *, array(-2 : 10 : 2)

in my case it returns 2 4 .
Is it right or should it return output equivalent to array(1 : 5 : 2) which would be 1 3 5 .

commit fa88905 version and commit 42a905d version of strings' slice function are designed keeping in mind that 1 3 5 would be returned by array slicing but I realised that my compiler returns 2 4 .

@ivan-pi
Copy link
Member

ivan-pi commented May 29, 2021

What would this return?

You can think of the slice syntax -2:10:2 as the integer array [-2,0,2,4,6,8,10]. The 2nd and 4th elements are then printed. However, I don't think we want to follow the behavior of the compiler in this case because the example violates good programming practice. gfortran raises the warning

test_slice_index.f90:5:15:

    5 | print *, array(-2 : 10 : 2)
      |               1
Warning: Lower array reference at (1) is out of bounds (-2 < 1) in dimension 1

If you add the -fcheck=all flag the program will produce: Fortran runtime error: Index '-2' of dimension 1 of array 'array' outside of expected range (1:5).

In other words, the Fortran intrinsic slicing does not do any smart resetting. If you access out of bounds you will get undefined behavior. Moreover, negative indexes are allowed because Fortran arrays can use custom bounds:

integer :: array(-2:10)
integer :: i
array = [(i, i = -2, 10)]
print *, array(-2 : 10 : 2)
print *, array([-2,0,2,4,6,8,10])
end

@awvwgk
Copy link
Member

awvwgk commented Jun 10, 2021

We need a couple of more reviewers here to move this patch forward.

doc/specs/stdlib_string_type.md Show resolved Hide resolved
src/stdlib_strings.f90 Show resolved Hide resolved
src/tests/string/test_string_functions.f90 Outdated Show resolved Hide resolved
Comment on lines +111 to +154
subroutine test_slice_gen
character(len=*), parameter :: test = &
& "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
integer :: i, j, k
integer, parameter :: offset = 3

do i = 1 - offset, len(test) + offset
call check_slicer(test, first=i)
end do

do i = 1 - offset, len(test) + offset
call check_slicer(test, last=i)
end do

do i = -len(test) - offset, len(test) + offset
call check_slicer(test, stride=i)
end do

do i = 1 - offset, len(test) + offset
do j = 1 - offset, len(test) + offset
call check_slicer(test, first=i, last=j)
end do
end do

do i = 1 - offset, len(test) + offset
do j = -len(test) - offset, len(test) + offset
call check_slicer(test, first=i, stride=j)
end do
end do

do i = 1 - offset, len(test) + offset
do j = -len(test) - offset, len(test) + offset
call check_slicer(test, last=i, stride=j)
end do
end do

do i = 1 - offset, len(test) + offset
do j = 1 - offset, len(test) + offset
do k = -len(test) - offset, len(test) + offset
call check_slicer(test, first=i, last=j, stride=k)
end do
end do
end do
end subroutine test_slice_gen

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it even harder to see what the expected behavior is supposed to be. The goal of a good test suite is to serve as an example of how to use the code, and a definition of it's expected behavior. You should go back to the specific tests you had and just give them more meaningful descriptions. The message from a failing test should be a hint as to what aspect of the code is not working correctly. Something as vague as "it failed" doesn't accomplish that. Ideally, I should be able to read the test suite alone and understand the expected behavior.

@awvwgk awvwgk dismissed ivan-pi’s stale review June 12, 2021 16:36

Requested changes have been addressed

@awvwgk
Copy link
Member

awvwgk commented Jun 12, 2021

Thanks everybody for the discussion on this patch. To move #433 forward I will go ahead and merge this PR.

@awvwgk awvwgk merged commit d2ac7ae into fortran-lang:master Jun 12, 2021
@awvwgk awvwgk removed the reviewers needed This patch requires extra eyes label Jun 12, 2021
@aman-godara aman-godara deleted the develop_slice branch June 15, 2021 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

intelligent slice functionality for strings
4 participants