Add initial scope for stdlib #43

certik · 2019-12-23T19:07:42Z

In this PR, let's try to summarize the agreed-upon scope from #1. I suggest we keep this very general, because we will have to discuss the details of what should be in case by case, but it would be very helpful to have general guidelines on the scope, to guide people when proposing ideas for stdlib.

certik · 2019-12-23T19:13:27Z

Perhaps we can write it like this:

Utilities (Containers, Algorithms, Strings, Files, OS/Environment integration, Unit testing & assertions stuff, Logging, Searching and sorting, ...)
Mathematics (Linear algebra, Sparse matrices, Special functions, FFT, Random numbers, Statistics, ODE solvers, Numerical integration, Optimization, ...)

and list things that we seem to agree should be part of it, and leave the ... in there to show that more ideas, along the lines of the listed items, are allowed. Instead of Utilities and Mathematics, is there a way to structure the future contents in a few more categories?

Matlab has nice top level categories and subcategories: https://www.mathworks.com/help/matlab/mathematics.html. In addition to what is in there, we need to have the Utilities (is there a better name?) section listed above. Besides the Matlab list and Utilities, is there anything else we want to include?

jvdp1 · 2019-12-23T20:28:20Z

Not sure what "algorithms" includes. So maybe something like that:

Utilities (Containers, Strings, Files, OS/Environment integration, Unit testing & assertions stuff, Logging, ...)
Algorithms (Searching and sorting, merging, ...)
Mathematics (Linear algebra, Sparse matrices, Special functions, FFT, Random numbers, Statistics, ODE solvers, Numerical integration, Optimization, ...)

certik · 2019-12-23T20:42:00Z

@jvdp1 thanks! I updated the PR based on your feedback.

certik · 2019-12-23T22:22:43Z

@milancurcic any ideas here? This will not be set in stone (we will iterate on this in the future), but I feel we do need at least some general guidelines today of what is in scope, and some general guidelines of what is not in scope.

Here are some ideas of what is not in scope:

physics specific algorithms: electronic structure, fluid dynamics, ...
specific mathematical methods: finite difference support, finite element support (except sparse matrices, which are in scope, because they are not tied to a specific mathematical method, but apply to all kinds of fields, such as finite differences, finite volumes, graphs, etc.)
highly optimized specific algorithms that require tens of thousands of lines to implement, such as OpenBLAS; if the algorithm and infrastructure require a project of their own, then stdlib can perhaps depend on it, but it should not be part of stdlib itself.

Some ideas of what is in scope:

Things that are applicable to more than one field, and that can be implemented in a single module (preferably) on the order of ~1000 lines (could be a bit more) per feature, as opposed to 100,000 lines per feature, and that can be (preferably) done in Fortran, as opposed to OpenBLAS, which must be done in assembly.
In some sense, stdlib would contain a "reference implementation" of the algorithms, similar to reference LAPACK. We will try to optimize as much as we can of course, in Fortran. Then compiler vendors can provide highly optimized versions in assembly (OpenBLAS or MKL). We can even provide some highly optimized versions ourselves, as an option, but the pure Fortran reference implementation would be the main and default implementation.

Let's discuss this, and finish this PR relatively soon (even if it is not 100% perfect), so that people know what the goal of stdlib is.

milancurcic · 2019-12-23T23:52:19Z

The categories listed are quite broad and I agree with the overall direction.

Here are some specific items that I would like to have; most fit within the broad categories listed. First, a reasonable list of items for stdlib. These would be immediately applicable, and libraries already exist:

1. Generic linked list and dictionary;
2. String functions and perhaps even a String type;
3. High-level interface to I/O (Proposal for high level I/O #14);
4. Interface to processes (POSIX).

Now, for a less reasonable wishlist. These are the things that I am particularly interested in, but that may or may not be fitting for stdlib:

5. A parallel array (abstraction over coarrays);
6. Reading and writing common image formats: ppm, tiff, jpeg, png (#45);
7. Interface to SQL (sqlite would cover most needs, @arjenmarkus has an interface);
8. A web client and server, with a stack of common protocols (UDP, TCP, HTTP) (once you have items 1, 2, and 4, this becomes much easier to implement).

These latter items would take a longer journey, first through their own standalone libraries, and later could be evaluated for stdlib.

certik · 2019-12-24T00:09:12Z

What would be the advantage / use case for item 8? Python does have a basic web server in the standard library. But I am curious what application Fortran users would like to use a web server for.

certik · 2019-12-24T00:12:36Z

It seems that in the first phase, we should only include things for which we already have prior implementations, so that it's "just" about agreeing on the API and making the implementation complete with regard to all combinations of real and integer kinds and other corner cases.

For things for which there is no prior implementation, or where the implementation is not straightforward, we should probably first have them in separate libraries, and only later consider inclusion into stdlib, as you said.

milancurcic · 2019-12-24T01:31:49Z

A Fortran web client would allow Fortran programs to read data over the network. This is only becoming more useful as more and more data is stored in the cloud, typically in flat object stores. For example applications, weather (100% Fortran) and ocean (99% Fortran) prediction systems rely on external data that is downloaded periodically. In typical workflows, this is done in some other language or some external tool from shell, like wget or curl.

The kind of gluing of tools and languages that is ubiquitous in weather prediction systems (and other similar systems) exists only because Fortran is adequate for heavy and parallel compute, but not for much else in the workflow -- downloading and storing data, logging, databases etc. If Fortran were adequate for the other tasks, the whole system would be implemented in the same language.

This specific example can then be extended to any web service out there that is serving data via HTTP or similar protocol. You make a request, get a JSON dict from it. All of a sudden, Fortran programs have direct access to a zillion existing web services. Great!

A web server would provide something similar, but in reverse. If you'd like a Fortran app to serve data (whether it's logging data from an instrument or a parallel CFD solver) to web clients (any tool or programming language -- they all speak the same language), now you can. A single-user use case would be a long-running HPC application that logs progress through the web server. Now you can watch it from your browser, rather than going through your terminal, ssh, and tailing the log file.

A web server + client is a convenient way to interoperate programs written in any language, as long as you properly match their HTTP APIs. Over a network or locally.
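As a rough illustration of that client/server pairing, here is a minimal sketch written in Python, chosen only to show the shape of the interaction and not as a proposed stdlib design. The `PROGRESS` record, the `/progress` path, and the names `ProgressHandler`, `fetch_json`, and `demo` are all hypothetical; the sketch assumes a solver that exposes its state as one JSON dict over HTTP on the local machine.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical progress record that a long-running solver might expose.
PROGRESS = {"step": 120, "total_steps": 1000, "residual": 1.5e-4}


class ProgressHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the current progress as JSON."""

    def do_GET(self):
        body = json.dumps(PROGRESS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet: no per-request log lines


def fetch_json(url):
    """Client side: make a request and decode the JSON dict in the reply."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Start the server on a free local port, query it once, shut it down."""
    server = HTTPServer(("127.0.0.1", 0), ProgressHandler)  # port 0: OS picks one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_json(f"http://127.0.0.1:{port}/progress")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client only needs the URL and the JSON layout, so the program on either end could be written in any language.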

I think this is one of the things where it's not easy to imagine that this would be useful in Fortran, only because it hasn't been easy to do in Fortran, so nobody did it.

I agree that this is a task for a specialized library for the time being.

certik · 2019-12-24T06:21:05Z

@milancurcic I see. Yes, I think you are right. Being able to create a JSON-based HTTP API would be very useful.

certik · 2019-12-24T18:12:35Z

Let's keep iterating on this. How about:

In the first phase, we are trying to stay in pure Fortran and most of these items already have a prior Fortran implementation by various people, and our job is to agree with a wide community on the API. This is our initial scope:

Utilities (strings, files, OS/environment integration and interface to processes, unit testing & assertions, logging, high level interface to IO, ...)
Algorithms and containers (searching and sorting, merging, hash tables / dictionaries, ...)
Mathematics (linear algebra, sparse matrices, special functions, fast Fourier transform, random numbers, statistics, ordinary differential equations, numerical integration, optimization, ...)
Reading and writing images (PPM, ...)

The following items are potential features (they should start as separate projects and we can discuss later if they should be included):

A parallel array (abstraction over coarrays);
Reading and writing more common image formats beyond PPM: tiff, jpeg, png
Interface to SQL (sqlite would cover most needs)
A web client and server, with a stack of common protocols (UDP, TCP, HTTP)

Here are example items that are not in scope:

physics specific algorithms: electronic structure, fluid dynamics, ...
specific mathematical methods: finite difference support, finite element support (except sparse matrices, which are in scope, because they are not tied to a specific mathematical method, but apply to all kinds of fields, such as finite differences, finite volumes, graphs, etc.)
highly optimized specific algorithms that require tens of thousands of lines to implement, such as OpenBLAS; if the algorithm and infrastructure require a project of their own, then stdlib can perhaps depend on it, but it should not be part of stdlib itself.

jvdp1 · 2019-12-24T18:23:57Z

> Reading and writing images (PPM, ...)

Should it not be in utilities? Otherwise, I would create a specific item for I/O operations:
"I/O operations: files, high-level interface for IO, reading and writing PPM images"

> physics specific algorithms: electronic structure, fluid dynamics, ...

I would write:
"field-specific (e.g., physics) algorithms: ..."
Not everybody programming in Fortran works in physics ;)

For the rest, I think it is a good first general presentation.

certik · 2019-12-24T19:04:58Z

Good point, let's put images into Utilities for now. It will be PPM only, so I think Utilities is a good fit for it now.

Yes, field-specific is better. Do you have some examples of non-physics fields that we can list there?

milancurcic · 2019-12-24T19:34:03Z

> specific mathematical methods: finite difference support

I'd argue that basic finite difference (analog to numpy.diff()) would be quite generally useful and in scope. Think of just calculating a first derivative of a time series or a gradient. I agree that this shouldn't cover all the fancy 17th order finite difference schemes, but a simple 1st order diff() would go a long way.
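To make the idea concrete, here is a minimal sketch of what a first-order `diff()` computes, written in Python for brevity and not as a proposed stdlib API. The function names `diff` and `first_derivative` and the `dt` spacing argument are assumptions made for illustration; only the analogy to `numpy.diff()` comes from the comment above.

```python
def diff(x):
    """Return the first-order forward differences x[i+1] - x[i].

    This mirrors numpy.diff() for a 1-D input with n=1: the result has
    one element fewer than the input.
    """
    return [b - a for a, b in zip(x, x[1:])]


def first_derivative(x, dt):
    """Approximate dx/dt for uniformly sampled data (hypothetical helper)."""
    if dt == 0:
        raise ValueError("dt must be nonzero")
    return [d / dt for d in diff(x)]


# A short time series sampled every 2.0 time units.
series = [0.0, 2.0, 6.0, 12.0]
print(diff(series))                   # [2.0, 4.0, 6.0]
print(first_derivative(series, 2.0))  # [1.0, 2.0, 3.0]
```

The whole operation fits in a few lines, which is consistent with the "simple 1st order" scope argued for here; higher-order schemes would need stencils and boundary handling that this sketch deliberately leaves out.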

certik · 2019-12-24T19:44:27Z

> I'd argue that basic finite difference (analog to numpy.diff()) would be quite generally useful and in scope. Think of just calculating a first derivative of a time series or a gradient. I agree that this shouldn't cover all the fancy 17th order finite difference schemes, but a simple 1st order diff() would go a long way.

Yeah, I was thinking that too.

I just want to have some examples that are clearly out of scope, so that we have some guidelines to prevent growing stdlib into a huge bloated library doing everything.

jvdp1 · 2019-12-24T22:25:28Z

> Yes, field-specific is better. Do you have some examples of non-physics fields that we can list there?

@certik A non-physics example: quantitative genetics

arjenmarkus · 2019-12-27T20:19:14Z

Wrt Milan's point 7: the SQLite interface in my Flibs project (http://flibs.sf.org - yes, I should probably move it to Github ;)) could definitely use a "facelift", as the interface implementation predates the ISO C binding that now makes life so much easier.

zbeekman · 2019-12-29T22:40:22Z

This can be a living document. I think all these ideas are good, and we shouldn't get too hung up on the details quite yet. Providing broad scopes, and some specific examples is certainly worthwhile, and I've liked everything I've seen here. 👏

certik · 2020-01-02T18:52:07Z

Looks like we mostly agree. So I am going to merge it, as it is better to have at least some scope in the README than nothing. And we can iterate on this as we go.

Add initial scope for stdlib

f97bb2a

certik mentioned this pull request Dec 23, 2019

What should be part of stdlib? #1

Open

Improve the scope list based on feedback

ddb77d5

certik merged commit 1903066 into fortran-lang:master Jan 2, 2020

certik deleted the scope branch January 2, 2020 18:52
