Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of stdlib_experimental_stats function mean #119

Closed
wants to merge 4 commits into from

Conversation

jvdp1
Copy link
Member

@jvdp1 jvdp1 commented Jan 21, 2020

Here is a proposal of a function mean in stdlib_experimental_stats (see #113 for discussions and comments on the API). The API of mean follows the one of Fortran functions such as sum, minval, maxval (except for mask that is not implemented (yet) in mean).

The function mean supports arrays with up to rank 15 for Fortran2003 and higher; otherwise, it supports up to rank 7 (for CI on Ubuntu 7).

All files were generated with fypp and the fypp files are still in the directory src (fypp comments could be probably cleaner). The .f90 files are probably easier to use for checking and commenting.

Note: I modified the Makefile and CMake files but it need to be checked carefully.

A Markdown spec file is in src.

To do in a following PR (if accepted): if this PR is accepted and merged: addition of mask as optional argument for mean, to fully follow Fortran conventions. It should be easy to implement when the current proposition is validated.

Squashed commit of the following:

commit 3266163
Merge: e96c12d 4274f0d
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 20:47:32 2020 +0100

    modification of CMake and Makefile

    Merge branch 'stat_cmake' into stat_dev

commit 4274f0d
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 20:44:24 2020 +0100

    stat_cmake: update Makefile

commit 17e3d16
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 20:35:19 2020 +0100

    second try cmake

commit 397eb18
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 20:18:37 2020 +0100

    Modifications of CMake for tests on Ubuntu 7

commit e96c12d
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 19:40:05 2020 +0100

    small change in md

commit 7eec9ae
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 19:37:06 2020 +0100

    stat_dev: renamed stat to stats

commit 8199b6d
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 19:26:20 2020 +0100

    stat_dev: changed spec

commit b1c481d
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 19:15:25 2020 +0100

    stat_dev: modifs following comments

commit e64657c
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 14:23:59 2020 +0100

    stat_dev: addition of .md file for mean

commit ad504e8
Merge: 5a1adcb bab50e3
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 13:16:21 2020 +0100

    Merge remote-tracking branch 'jvdp1/stat_dev_1' into stat_dev

commit bab50e3
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 13:13:00 2020 +0100

    stat_dev_1: changed all to iterations

commit 8d4c11f
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 10:31:26 2020 +0100

    stat_dev_1:moved all calls to mean functions to loops

commit 922e523
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Tue Jan 21 09:13:10 2020 +0100

    stat_dev_1: update test_mean

commit 5a1adcb
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Mon Jan 20 23:55:15 2020 +0100

    stat_dev: inverting loops for efficiency

commit 86970ae
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Mon Jan 20 22:54:55 2020 +0100

    stat_dev: use specific interface

commit 6574a67
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Mon Jan 20 22:36:33 2020 +0100

    stat_dev: addition of calls to error_stop

commit e98090b
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Mon Jan 20 21:32:27 2020 +0100

    stat_dev: extension to rank 15

commit e0e3092
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Mon Jan 20 12:46:44 2020 +0100

    stat_dev: simplified merge

commit 22ff6e4
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Mon Jan 20 10:38:41 2020 +0100

    stat_dev: progress rank 3

commit 7612613
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Mon Jan 20 10:34:06 2020 +0100

    stat_dev: add rank 3

commit 60ab523
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 22:14:51 2020 +0100

    stat_dev: addition of integer cases

commit 6fb6ca5
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 21:03:46 2020 +0100

    stat_dev: avoid allocatable functions

commit a1c6353
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 20:11:49 2020 +0100

    modification to have the same behaviour as Fortran sum

commit 72500e1
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 15:23:34 2020 +0100

    stat_dev: add error_stop

commit 1272574
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 15:02:52 2020 +0100

    stat_dev: update Makefile

commit 426d43f
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 12:21:09 2020 +0100

    stat_dev: addition of test and creation of modules and submodules with fypp

    how to use pure functions inside submodules

commit 965f37b
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 11:35:11 2020 +0100

    moved to submodules

    how to use pure functions in submodules

commit d9af336
Author: Vandenplas, Jeremie <jeremie.vandenplas@gmail.com>
Date:   Sun Jan 19 11:22:52 2020 +0100

    stat_dev: init
Copy link
Member

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me as a start.

Couple questions:

  1. Couldn't the f03_stdlib_experimental_stats and f90__stdlib_experimental_stats be combined? Which compiler does not support the modern syntax?

  2. Inside f03_stdlib_experimental_stats, couldn't the contents be generated using fypp also? It seems repetitive.

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 21, 2020

  1. Couldn't the f03_stdlib_experimental_stats and f90__stdlib_experimental_stats be combined? Which compiler does not support the modern syntax?

The compiler in the CI Ubuntu 7 does not support arrays with rank >7.

  1. Inside f03_stdlib_experimental_stats, couldn't the contents be generated using fypp also? It seems repetitive.

@certik No sure to understand the question. Both files f90_stdlib_experimental_stats.f90 and f03_stdlib_experimental_stats.f90 are generated from stdlib_experimental_stats.fypp with fypp. Same thing for fxx_stdlib_experimental_stats_mean.f90. The flag VERSION90 is used to generate f90_ or f03_ files.

@certik
Copy link
Member

certik commented Jan 21, 2020

Ah ok, so all the repetitive files are generated from a simple template. That answers my question.

What is Ubuntu 7? Do you mean GFortran 7?

I am fine with the PR as is for now. Later on I think we should not check in generated files, as discussed in #35.

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 21, 2020

Ah ok, so all the repetitive files are generated from a simple template. That answers my question.

Yes, indeed.

What is Ubuntu 7? Do you mean GFortran 7?

GFortran from the Github actions under the name 'ubuntu-latest, 7) (see here for some details, before I modified the CMake files)

Here is the reported message:

/home/runner/work/stdlib/stdlib/src/stdlib_experimental_stat.f90:134:44:

 real(sp), intent(in) :: x(:,:,:,:,:,:,:,:)
                                        1

Error: Array specification at (1) has more than 7 dimensions
/home/runner/work/stdlib/stdlib/src/stdlib_experimental_stat.f90:138:44:

 real(sp), intent(in) :: x(:,:,:,:,:,:,:,:,:)
                                        1

Error: Array specification at (1) has more than 7 dimensions
/home/runner/work/stdlib/stdlib/src/stdlib_experimental_stat.f90:142:44:

 real(sp), intent(in) :: x(:,:,:,:,:,:,:,:,:,:)
                                        1

Error: Array specification at (1) has more than 7 dimensions

I am fine with the PR as is for now. Later on I think we should not check in generated files, as discussed in #35.

I agree we should not check in generated files. However, no decision has been made in #35. So I prefered to keep both fypp files and the generated files. When a decision will be made in #35, generated files will be removed.
I guess that later CMake should trigger the generation of the f90 files (but I don't know how to do that).

Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. The spec doc needs work, see my comments and question there.

Sorry to be so pedantic about this. Considering that this PR comes in before we have a spec template, we need to get the formatting right, otherwise it will only call for another PR. We can nail these things down here.

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 22, 2020

Sorry to be so pedantic about this. Considering that this PR comes in before we have a spec template, we need to get the formatting right, otherwise it will only call for another PR. We can nail these things down here.

@milancurcic No problem. I will integrate them when the question about dim will be solved.

Comment on lines 154 to 158
#:def isuffix(rank)
#:if rank > 0
i#{for i in range(2,rank)}#${",i"+"_" * (i-1)}$#{endfor}#
#:endif
#:enddef
Copy link
Member

@ivan-pi ivan-pi Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How complicated would it be to use more standard variable names such as i,j,k,l,m,n...? Maybe it is worth discussing whether we care about the style of auto-generated code.

I cannot find the place where you encoded the 132 line character limit. The generated code seems to obey this rule:

do i______________ = 1, size(x, 15)
     do i_____________ = 1, size(x, 14)
      do i____________ = 1, size(x, 13)
       do i___________ = 1, size(x, 12)
        do i__________ = 1, size(x, 11)
         do i_________ = 1, size(x, 10)
          do i________ = 1, size(x, 9)
           do i_______ = 1, size(x, 8)
            do i______ = 1, size(x, 7)
             do i_____ = 1, size(x, 6)
              do i____ = 1, size(x, 5)
               do i___ = 1, size(x, 4)
                do i__ = 1, size(x, 3)
                 do i_ = 1, size(x, 2)
                  do i = 1, size(x, 1)
                   res = res + x(i,i_,i__,i___,i____,i_____,i______,i_______,i________,i_________,i__________,i___________,i_______&
                       &_____,i_____________,i______________)
                  end do
                 end do
                end do
               end do
              end do
             end do
            end do
           end do
          end do
         end do
        end do
       end do
      end do
     end do
    end do

Impressive work!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review.

How complicated would it be to use more standard variable names such as i,j,k,l,m,n...? Maybe it is worth discussing whether we care about the style of auto-generated code.

I am not sure that I would call i,j,k,l,... standard variables. Anyway, I started this implementation by using i,j,k,l,.... But afterwards, I changed my idea because it means that it will reserve 15 characters (in these cases) for loop iterations. Also, it was easier for me to use additional "_" in fypp because it involves a simple loop.

Maybe @aradi has some suggestions regarding this issue.

I cannot find the place where you encoded the 132 line character limit. The generated code seems to obey this rule:

It can be changed in fypp by using the argument -l. Indeed the default is 132. Should we limit it to 120? Note that in the future I would expect that these files are generated by CMake, and would not be directly available on Github.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that I would call i,j,k,l,... standard variables. Anyway, I started this implementation by using i,j,k,l,.... But afterwards, I changed my idea because it means that it will reserve 15 characters (in these cases) for loop iterations. Also, it was easier for me to use additional "_" in fypp because it involves a simple loop.

I understand your reasoning. It is also easier to understand for the maintainers of the template file. I was only wondering whether we care about "style" (including variable name choice) in the auto-generated files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot find the place where you encoded the 132 line character limit. The generated code seems to obey this rule:

It can be changed in fypp by using the argument -l. Indeed the default is 132. Should we limit it to 120? Note that in the future I would expect that these files are generated by CMake, and would not be directly available on Github.

In the style guide of stdlib, it is mentioned that:

Line length should be limited to 80 characters and must not exceed 132.

So we could limit fypp to 80 characters, or use the default lenght which is 132.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivan-pi Following @aradi 's suggestion I kept the default lenght (132) for the auto-generated files.
I also replaced all the underscores by numbers. This is now much easier to read. Thank you for mentioning this issue!

@aradi
Copy link
Member

aradi commented Jan 23, 2020

As for the line length, yes fypp defaults to 132. You can override it with the -l option, but I am not sure, whether it is worth the effort. Most of the time, programmers will have to deal with the fypp-source file not with the generated one, so I think, one should impose the 80 character limit in on those.

As for the variable names: One shoud probably use number suffixes (e.g. i1, i2, etc.) instead of repeated underscores, as latter are very difficult to debug in my opinion. Something along the line:

#:def varsuffix(rank)
${str(rank)}$
#:enddef

#:def varlist(varname, listsize)
${",".join([varname + varsuffix(i) for i in range(1, listsize + 1)])}$
#:enddef

[...]

integer :: ${varlist("i", 10)}$

[...]

#:for fj in range(rank, 0, -1)
  do i${varsuffix(fj}$ = 1, size(x, dim=${fj}$)
#:endfor

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 23, 2020

As for the variable names: One shoud probably use number suffixes (e.g. i1, i2, etc.) instead of repeated underscores, as latter are very difficult to debug in my opinion. Something along the line:

Thank you @aradi for this suggestion. Using numbers instead of "_" should be easier to debug indeed. I will implement your suggestion in this fypp file.

@aradi
Copy link
Member

aradi commented Jan 23, 2020

I guess I am overseeing something trivial, but are the nested do loops really necessary? For example in mean_${rank}$_all_${k1}$_${k1}$(x) they seem to be just making a simple summation, where we could use simply sum() instead. We could then probably spare a few fypp-instructions. (According to my experience with pre-processing, each spared pre-processor instruction improves readability, even if it was a fypp-directive 😉 )

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 23, 2020

I guess I am overseeing something trivial, but are the nested do loops really necessary? For example in mean_${rank}$_all_${k1}$_${k1}$(x) they seem to be just making a simple summation, where we could use simply sum() instead. We could then probably spare a few fypp-instructions. (According to my experience with pre-processing, each spared pre-processor instruction improves readability, even if it was a fypp-directive )

I agree. There are indeed simple sums. I implemented loops to be in agreement with the other scenarios, and also because I observed that such loops might outperform sum. In addition, when computing the sum of an integer array, it is first converted to a double-precision array. I think that doing something like sum(real(x, dp)) may request a huge temporary array in some cases, while only a temporary scalar is needed when loops are used. Am I wrong?

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 23, 2020

Thank you @aradi for the suggestion of using number suffixes (e.g. i1, i2, etc.) instead of repeated underscores. It is indeed easier to debug. This suggestion is now implemented in the last commit.

@certik @milancurcic @ivan-pi @aradi

I decided to keep all sums in loops in mean_${rank}$_all_${k1}$_${k1}$(x) (instead of replacing them by sum(x). My reasoning is that their structure remained similar to the one of mean_${rank}$_all_${k1}$_dp$(x) for which I want to avoid something like sum(real(x, dp)) (which may imply a temporary array). If I am wrong, please correct me.

If you agree with the PR in this state, please approve it explicitely. Then I guess it may be merged.

@certik
Copy link
Member

certik commented Jan 23, 2020

I have issues with committing autogenerated 30K lines long files into a git repository. The diff at some of your commits won't even show at GitHub because it's so long.

Why not to use this opportunity and do the right thing --- not commit autogenerated files, and rather build them as needed at the CI? I can help.

The other issue I have is that I think we should use sum instead of the long loops.

My last issue is that macOS builds fail. It's probably unrelated to this PR (#122), but we need to fix it.

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 23, 2020

I have issues with committing autogenerated 30K lines long files into a git repository. The diff at some of your commits won't even show at GitHub because it's so long.

Why not to use this opportunity and do the right thing --- not commit autogenerated files, and rather build them as needed at the CI? I can help.

I started to implement that in another branch. I can merge it in this branch (or I open another PR).

The other issue I have is that I think we should use sum instead of the long loops.

As discussed earlier in this thread, I avoided to use sum to avoid temporary arrays, especially 1) when the mean of integer arrays are computed (e.g., to avoid sum(real(x, dp))) and 2) when the mean along a dimension need to be computed (e.g., sum(x, 8)). There will be no issue for small arrays, but for large arrays, it might create memory problems. Also, my experience is that loops might be more (memory-)efficient for such scenarios. But I may be wrong on that and overlook some things.

@certik
Copy link
Member

certik commented Jan 23, 2020

If the generated file is not committed to the repository, then I am fine with loops, as we can easily change that later.

@certik
Copy link
Member

certik commented Jan 23, 2020

Can you merge with master to get #123 in? The tests should pass now.

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 23, 2020

Can you merge with master to get #123 in? The tests should pass now.

I will do that. Or should I first merge the master in my branch stat_mean_dev_fypp (with modified CMake files and Makefile.manual). The 2 large .f90 are now removed and auto-generated by CMake. I also modify the CI (but not for Windows).

@certik
Copy link
Member

certik commented Jan 23, 2020

In that case I would suggest you create a fresh PR with clean commits on top of the master (no generated files are checked in).

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 23, 2020

In that case I would suggest you create a fresh PR with clean commits on top of the master (no generated files are checked in).

Good. I will do that.

@jvdp1
Copy link
Member Author

jvdp1 commented Jan 23, 2020

I will close this PR in favor of #124, as suggested by @certik.
In additition to this PR, #124 supports the auto-generation of .f90 files by fypp.

@jvdp1 jvdp1 closed this Jan 23, 2020
@jvdp1 jvdp1 deleted the stat_mean_dev branch January 27, 2020 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants