Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting2 #408

Merged
merged 51 commits into from
Jun 3, 2021
Merged

Sorting2 #408

merged 51 commits into from
Jun 3, 2021

Conversation

wclodius2
Copy link
Contributor

Merged inconsistent changes in doc/specs/stdlib_sorting.mb and src/stdlib_sorting.fypp.

wclodius2 and others added 20 commits April 10, 2021 21:23
Added the draft markdown documentation file, stdlib_sorting.md.

[ticket: X]
Removed trailing whitespace from the markdown document, stdlib_sorting.md.

[ticket: X]
Added a reference to stdlib_sorting.md to index.md

[ticket: X]
Added a licnsing section. Changed the description of `ORD_SORT`, `ORD_SORTING`,
and `UNORD_SORT`. Added more examples for `ORD_SORTING`. Simplified the
discussion of `INT_SIZE`. Used `stdlib_kinds` instead of `iso_fortran_env`.
Changed `processor` to `compiler`. Fixed spelling of array.

[ticket: X]
Added sorting for `character(*)` and `string_type` rank 1 arrays. Improved the
overall documentation. Added newer benchmarks for the addiitons and the Intel
One API compiler system.

[ticket: X]
Added and commited the main sorting source codes: stdlib_sorting.fypp,
stdlib_sorting_sort.fypp, stdlib_sorting_ord_sort.f90, and
stdlib_sorting_sort_index.fypp. Also commited the revised CMakeLists.txt and
Makefile.manual files needed co compile the library including the new codes.

[ticket: X]
Commited the stdlib_sorting test code tests/sorting/test_sorting.f90, and related
CMakeLists.txt, and Makefile.manual files needed to compile the test code:
tests/CMakeLists.txt, tests/Makefile.manual, tests/sorting/CMakeLists.txt, and
tests/sorting/Makefile.manual.

[ticket: X]
Commited the revised stdlib_sorting markdown document stdlib_sorting.md. It was
revised largely to reflect changes in test_sorting.f90 intended to shorten the
processor time devoted to string_type sorting.

[ticket: X]
In both stdlib_sorting_ord_sort.fypp and stdlib_sorting_sort_index.fypp there
was an off by one array assignment, that was repeated several times throough
copy/paste/edit.
                buf(0:array_len-mid) = array(mid:array_len-1)
should have been
                buf(0:array_len-mid-1) = array(mid:array_len-1)
Fixed in both files.

[ticket: X]
The code stdlib_sorting_sort.f90 was failing on some systems becasue it
improperlly assumed short circuiting
                do while( i >= 0 .and. array(i) > key )
should have been
                do while( i >= 0 )
		    if (array(i) <= key ) exit

[ticket: X]
Changed the markdown document docs/specs/stdlib_sorting.md to correct problems
noted by gareth_nx. Changed comments in the FYPP file
src/stlib_sorting_sort_index.fypp to note that the input array is sorted on
output.

[ticket: X]
Forgot to save an edit prior to the previous commit of
docs/specs/stdlib_sorting.md. Saved and commited that edit.

[ticket: X]
Review-dog caught several misspellings in the comments of the source codes.

[ticket: X]
Modified doc/specs/index.md so it mentioned stdlib_sorting.md.

[ticket: X]
Added changes in the description of `SORT_INDEX`.

[ticket: X]
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Made various corrections in the document stdlib_sorting.md and the code
stdlib_sorting.fypp.

[ticket: X]
I accepted some of Jeremie's proposed changes while implementing some
in my local copy so they became inconsistent.
Copy link
Contributor

@gareth-nx gareth-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- noting I already reviewed almost the same code at in the related pull request. I hope we get some more reviewers and this can be merged.

@awvwgk awvwgk added the reviewers needed This patch requires extra eyes label May 15, 2021
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for these implementations. Here are some general comments. Based on your answers , I will continue the review.

src/stdlib_sorting.fypp Outdated Show resolved Hide resolved
src/stdlib_sorting.fypp Outdated Show resolved Hide resolved
src/stdlib_sorting_ord_sort.fypp Outdated Show resolved Hide resolved
src/stdlib_sorting_ord_sort.fypp Outdated Show resolved Hide resolved
src/stdlib_sorting_ord_sort.fypp Outdated Show resolved Hide resolved
src/stdlib_sorting_ord_sort.fypp Show resolved Hide resolved
src/stdlib_sorting_ord_sort.fypp Outdated Show resolved Hide resolved
src/stdlib_sorting_ord_sort.fypp Outdated Show resolved Hide resolved
@w6ws
Copy link

w6ws commented May 17, 2021

A few API suggestions:

1.) In subroutine sort_index - the index, work, and iwork arrays should probably be intent(out), not intent(inout). Same with the work array in ord_sort.

2.) All three subroutines should have an optional return status that indicates success or failure. Right now there are some hard stops in the code - specifically when allocate is called. It would be good to have the ability to avoid these.

3.) Shouldn't all three have a 'reverse' argument so the caller can specify either ascending or decending sort order?

FWIW, in some code I've worked on, we use private types with public parameters for such things. E.g.,:

module sorting_mod
implicit none
private
...
type :: sortFlag_t ! private type
logical :: descending
end type

type(sortFlag_t), parameter, public :: &
SORTFLAG_ASCENDING = sortFlag_t (.false.), &
SORTFLAG_DESCENDING = sortFlag_t (.true.)

While in the case of sort_index there are only two options, so not strictly needed, the technique is general to other areas of a library where many more options for an argument might be offered. It ensures that the compiler can enforce correct option selections at compile time. And it provides clear indication of what is being specified - rather than some "magic number". It also removes some need for run-time validation checks.

@wclodius2
Copy link
Contributor Author

I am on travel and it is awkward for me to find the time to do the changes, but I will see what I can do.

Made various changes:
* in stdlib_sorting.fypp changed
  "public :: int_size ... integer, parameter :: in_size..." to
  "integer, parameter, public :: int_size ..."

* Eliminated the specific procedures public in stdlib_sorting.fypp and their
  discussion in stdlib_sorting.md.

* replaced specific reference to INT_KINDS and REAL_KINDS in
  stdlib_sorting_sort.fypp, stdlib_sorting_ord_sort.fypp, and
  stdlib_sorting_sort_index.fypp with reference to common.fypp.

* in stdlib_sorting_ord_sort.fypp made ERROR STOP return the procedure name
  in which the allocation error occured.

[ticket: X]
@wclodius2
Copy link
Contributor Author

I have committed and pushed changes suggested by Jeremie.

@wclodius2
Copy link
Contributor Author

@w6ws sorry for my late response, but I have been busy on my trip.

Seems like a pain to require the manual steps of reversing inputs and outputs to get a descending order. Especially since, perhaps stability aside, it is only a matter of which comparison operator to use within the sorting algorithms. Again, user-friendliness.

While logically the only difference is in the comparison operators in practice it wouldn't be implemented that way as it would result in a large duplication of code. It would instead be implemented using a reverse procedure.

@jvdp1
Copy link
Member

jvdp1 commented May 27, 2021

  1. The errors can only be "caught" for a memory allocation problem. With lazy allocation on Linux. and virtual memory on all systems, it is not clear what can cause an allocation error, or that such an error is recoverable. I am right now inclined to remove the stat argument to the allocate statements and simply let a memory error halt processing and let the processor report the problem if it occurs. This will allow my procedures to be pure.

In general I like library routines to return control to their caller when they detect errors - then let the caller decide what to do. For example in the case of sort_ord and sort_index, besides allocation errors, checks could also be made to ensure the scratch and output arrays are large enough. Just makes things more user friendly. But good point on enabling the procedures to be pure.

The topic of handling errors come often in stdlib. Would it be an idea to add an intent(out) integer (e.g., called info, as in BLAS/LAPACK), that would return a value different than 0, with allocation erros, and on checks on scratch and output arrays? With such an approach, the stat argument in the allocate statement could be kept, while error stop could be removed, and the procedure be pure.

I had the impression that @w6ws wanted the argument to be optional, in which case I would want the ERROR STOP to be present and we would not gain purity.

I agree with you: if the argument is optional, then an error stop is likely to be needed. However, I would not make such an argument optional (similarly to the info argument in many procedures of LAPACK, or other error arguments in other libraries): up to the user to use/check it or not.

Did you have any opinion on the @w6ws proposal for a reverse optional argument? I am concerned at the impact on the complexity of the API, and also on its impaction my time. I want to get this PR done, though the error handling is the bigger potential project.

I would suggest to keep this implementation of reverse for another issue/PR. I also like the approach proposed by @w6ws in this comment. This would allow different options in the future, like sorting in increasing or decreasing order, and in a stable or unstable approach,... However, this requires more discussions on the API and the approach before implementation, because it could influence future procedures.

@awvwgk @milancurcic any opinions on this PR?

@w6ws
Copy link

w6ws commented May 27, 2021

First, thank you for accepting my suggestion on intent(out) stuff!

Second, on the reverse capability - I'd think that if some of the sorts only support sorting in one direction, you'll have a lot of users scratching their heads wondering why. (Like me.)

Years ago I added a sorting routine to the ESMF library - https://github.com/esmf-org/esmf/blob/develop/src/Infrastructure/Util/src/ESMF_UtilSort.cppF90. This factored out a bunch of little sorting routines that had been sprinkled throughout the library. It is based on the LAPACK SLASRT sort. I templatized it for various data types using the C preprocessor (same technique is also used in many other parts of the ESMF library.) Taking a look at it, the IF ELSE ENDIF blocks for direction are done at an outermost level. The loops and interiors of the blocks are identical - except for comparison direction. A characteristic of the SLASRT-based code is that it doesn't use much temporary storage space. So passing in, or creating local, work arrays are not needed.

I've also, for personal use, implemented simplified versions of the HPF SORT_UP, SORT_DOWN, GRADE_UP, and GRADE_DOWN functions. In these cases, the direction of the sort is embedded in the entry point name. So no conditional code is needed for sort order. FWIW, I implemented SORT_UP and SORT_DOWN again using SLASRT-based code. Basically copied the input array to the functions result area - then did the sort on the result area. I based GRADE_UP and GRADE_DOWN on the Numerical Recipies INDEXX routine. INDEXX also uses minimal scratch space. So again, large scratch arrays weren't needed.

Third, on error processing - yes I would suggest making an error return argument optional. Referring back to the ESMF library again, the coding style we used was to make an optional Return Code (rc=) argument as the last argument of every library routine. We also strongly encouraged users to check the RC. If you look at ESMF_UtilSort, it checks for a couple of things and if it finds an error it will (optionally) set the RC to something other than ESMF_SUCCESS, and then return. So it returns bad results and (optionally) an error RC, rather than a hard stop. The ESMF philosophy, for better or worse, has been to encourage programmers to always check the RC for success. I should note that ESMF normally operates in a MPI environment. Some users routinely run climate and weather models with hundreds or thousands of MPI processes. So if one thread abends, it can cause a calamity. We really wanted the users to handle their own errors.

@jvdp1
Copy link
Member

jvdp1 commented May 27, 2021

While logically the only difference is in the comparison operators in practice it wouldn't be implemented that way as it would result in a large duplication of code. It would instead be implemented using a reverse procedure.

I though a bit about this issue. If the main implementation is to replace all comparison operators (e.g., >) by its opposite (e.g., <), then I think it could be easily done by adding an additional fypp loop. In this case, no code should be duplicated. @wclodius2 I can give a try if it is a possible solution for you.

@wclodius2
Copy link
Contributor Author

While logically the only difference is in the comparison operators in practice it wouldn't be implemented that way as it would result in a large duplication of code. It would instead be implemented using a reverse procedure.

I though a bit about this issue. If the main implementation is to replace all comparison operators (e.g., >) by its opposite (e.g., <), then I think it could be easily done by adding an additional fypp loop. In this case, no code should be duplicated. @wclodius2 I can give a try if it is a possible solution for you.

At least for stdlib_sorting_ord_sort more than one form of comparison operator is used, i.e., > and >=. Other than that you are the expert on FYPPand if you think it is straightforward go for it.

@jvdp1
Copy link
Member

jvdp1 commented May 28, 2021

While logically the only difference is in the comparison operators in practice it wouldn't be implemented that way as it would result in a large duplication of code. It would instead be implemented using a reverse procedure.

I though a bit about this issue. If the main implementation is to replace all comparison operators (e.g., >) by its opposite (e.g., <), then I think it could be easily done by adding an additional fypp loop. In this case, no code should be duplicated. @wclodius2 I can give a try if it is a possible solution for you.

At least for stdlib_sorting_ord_sort more than one form of comparison operator is used, i.e., > and >=. Other than that you are the expert on FYPPand if you think it is straightforward go for it.

@wclodius2 I opened a PR against your branch sorting2.
I added:

  • an additional option in sort and ord_sort for reverse sorting
  • check (error stop) to be sure that it is failing when tests are not succesful

Note: I didn't update the specs, in case the proposed approach is not correct/appropriate.

Also, it could be cleaner to create a new branch (e.g., "sorting3") from "sorting2", to keep the lattter a bit clean.

wclodius2 and others added 2 commits May 28, 2021 21:59
Add option for reverse sort in `sort` and `ord_sort`
@jvdp1
Copy link
Member

jvdp1 commented May 29, 2021

@wclodius2 I opened this PR with some updates in the specs regarding the reverse optional arg.

jvdp1 and others added 6 commits May 30, 2021 20:28
Update of the specs for stdlib_sorting
Changed `[sorting](.stdlib_sorting.md)` to `[sorting](./stdlib_sorting.md)`

[ticket: X]
Uupdate of the comments in source code
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional comments on some checks. If you agree, I can have a look to implement them.

src/stdlib_sorting_ord_sort.fypp Show resolved Hide resolved
src/stdlib_sorting_ord_sort.fypp Outdated Show resolved Hide resolved
src/stdlib_sorting_sort_index.fypp Show resolved Hide resolved
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With all the changes made, it is ready to be merged IMO

Thank you @wclodius2 for this module

@jvdp1
Copy link
Member

jvdp1 commented Jun 2, 2021

Could someone with a write access review this too? @milancurcic? @awvwgk? @ivan-pi?

Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a full review, I followed the discussion in this thread a bit and started looking over the specs, but won't be able to give an in-depth review in the next days. Since we already have two approving reviews here, I'll approve based on @gareth-nx's review to make it count to towards the merge requirements. Sorry for being a bit missing in action here.

@wclodius2
Copy link
Contributor Author

This branch now has two approvals but still needs someone to merge the PR.

@awvwgk awvwgk merged commit 888b5d3 into fortran-lang:master Jun 3, 2021
@awvwgk awvwgk removed the reviewers needed This patch requires extra eyes label Jun 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants