Add initial scope for stdlib #43

Merged
merged 2 commits into from
Jan 2, 2020

Conversation

certik
Member

@certik certik commented Dec 23, 2019

In this PR, let's try to summarize the agreed-upon scope from #1. I suggest we keep this very general, because we will have to discuss the details of what should be in on a case-by-case basis. Still, it would be very helpful to have general guidelines on the scope, to guide people when proposing ideas for stdlib.

@certik
Member Author

certik commented Dec 23, 2019

Perhaps we can write it like this:

  • Utilities (Containers, Algorithms, Strings, Files, OS/Environment integration, Unit testing & assertions stuff, Logging, Searching and sorting, ...)
  • Mathematics (Linear algebra, Sparse matrices, Special functions, FFT, Random numbers, Statistics, ODE solvers, Numerical integration, Optimization, ...)

and list things that we seem to agree should be part of it, and leave the ... in there to show that more ideas, along the lines of the listed items, are allowed. Instead of Utilities and Mathematics, is there a way to structure the future contents in a few more categories?

Matlab has nice top level categories and subcategories: https://www.mathworks.com/help/matlab/mathematics.html. In addition to what is in there, we need to have the Utilities (is there a better name?) section listed above. Besides the Matlab list and Utilities, is there anything else we want to include?

@jvdp1
Member

jvdp1 commented Dec 23, 2019

Not sure what "algorithms" includes. So maybe something like this:

  • Utilities (Containers, Strings, Files, OS/Environment integration, Unit testing & assertions stuff, Logging, ...)
  • Algorithms (Searching and sorting, merging, ...)
  • Mathematics (Linear algebra, Sparse matrices, Special functions, FFT, Random numbers, Statistics, ODE solvers, Numerical integration, Optimization, ...)

@certik
Member Author

certik commented Dec 23, 2019

@jvdp1 thanks! I updated the PR based on your feedback.

@certik
Member Author

certik commented Dec 23, 2019

@milancurcic any ideas here? This will not be set in stone (we will iterate on this in the future), but I feel we do need at least some general guidelines today of what is in scope, and some general guidelines of what is not in scope.

Here are some ideas of what is not in scope:

  • physics specific algorithms: electronic structure, fluid dynamics, ...
  • specific mathematical methods: finite difference support, finite element support (except sparse matrices, which are in scope, because they are not tied to a specific mathematical method, but apply to all kinds of fields, such as finite differences, finite volumes, graphs, etc.)
  • highly optimized specific algorithms that require tens of thousands of lines to implement, such as OpenBLAS. If the algorithm and infrastructure require a project of their own, then stdlib can perhaps depend on it, but it should not be part of stdlib itself.

Some ideas of what is in scope:

  • Things that are applicable to more than one field, and that can be implemented (preferably) in a single module on the order of ~1000 lines per feature (could be a bit more), as opposed to 100,000 lines per feature, and that can be (preferably) done in Fortran, as opposed to OpenBLAS, which must be done in assembly.

  • In some sense, stdlib would contain a "reference implementation" of the algorithms, similar to reference LAPACK. We will of course try to optimize as much as we can in Fortran. Then compiler vendors can provide highly optimized versions in assembly (OpenBLAS or MKL). We can even provide some highly optimized versions ourselves, as an option, but the pure Fortran reference implementation would be the main and default implementation.

Let's discuss this, and finish this PR relatively soon (even if it is not 100% perfect), so that people know what the goal of stdlib is.

@milancurcic
Member

milancurcic commented Dec 23, 2019

The categories listed are quite broad and I agree with the overall direction.

Here are some specific items that I would like to have, most fit within the broad categories listed. First, a reasonable list of items for stdlib -- would be immediately applicable, and libraries already exist:

  1. Generic linked list and dictionary;
  2. String functions and perhaps even String type;
  3. High-level interface to I/O (Proposal for high level I/O #14)
  4. Interface to processes (POSIX)

Now, for a less reasonable wishlist, these are the things that I am particularly interested in, but that may or may not be fitting for stdlib:

  5. A parallel array (abstraction over coarrays);
  6. Reading and writing common image formats, ppm, tiff, jpeg, png (#45);
  7. Interface to SQL (sqlite would cover most needs, @arjenmarkus has an interface);
  8. A web client and server, with a stack of common protocols (UDP, TCP, HTTP) (once you have items 1, 2, and 4, this becomes much easier to implement).

These latter items would take a longer journey, first through their own standalone libraries, and later could be evaluated for stdlib.

@certik
Member Author

certik commented Dec 24, 2019

What would be the advantage / use case for item 8? Python does have a basic webserver in the standard library. But I am curious what applications Fortran users would like to use a web server for.

@certik
Member Author

certik commented Dec 24, 2019

It seems that in the first phase, we should only include things for which we already have prior implementations, where it's "just" about agreeing on the API and making the implementation complete with regard to all combinations of real and integer kinds and other corner cases.

For things for which there is no prior implementation, or where the implementation is not straightforward, we should probably first have them in separate libraries, and only later consider inclusion into stdlib, as you said.

@milancurcic
Member

A Fortran web client would allow Fortran programs to read data over the network. This is only becoming more useful as more and more data is stored in the cloud, typically in flat object stores. As example applications, weather (100% Fortran) and ocean (99% Fortran) prediction systems rely on external data that is downloaded periodically. In typical workflows, this is done in some other language or with some external tool from the shell, like wget or curl.

The kind of gluing of tools and languages that is ubiquitous in weather prediction systems (and other similar systems) exists for no other reason than that Fortran is adequate for heavy and parallel compute, but not for much else in the workflow: downloading and storing data, logging, databases, etc. If Fortran were adequate for the other tasks, the whole system would be implemented in the same language.

This specific example can then be extended to any web service out there that is serving data via HTTP or similar protocol. You make a request, get a JSON dict from it. All of a sudden, Fortran programs have direct access to a zillion existing web services. Great!

A web server would provide something similar, but in reverse. If you'd like a Fortran app to serve data (whether it's logging data from an instrument or a parallel CFD solver) to web clients (any tool or programming language, since they all speak the same language), now you can. A single-user use case would be a long-running HPC application that logs progress through the web server. Now you can watch it from your browser, rather than going through your terminal, ssh, and tailing the log file.

A web server + client is a convenient way to interoperate programs written in any language, as long as you properly match their HTTP APIs. Over a network or locally.
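
To make the client/server round trip concrete, here is a minimal sketch in Python using only its standard library (the kind of "other language" glue mentioned above). Everything in it is an assumption made for illustration: the `ProgressHandler` class, the `/status` path, and the fields in the status dictionary are hypothetical and do not correspond to any existing Fortran library or stdlib API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class ProgressHandler(BaseHTTPRequestHandler):
    """Serves a hypothetical solver status as JSON on every GET request."""

    # Hypothetical state that a long-running application would keep updating.
    status = {"step": 120, "total_steps": 1000, "residual": 1.5e-4}

    def do_GET(self):
        body = json.dumps(self.status).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging to keep the example quiet


def fetch_json(url):
    """Client side: make a request and decode the JSON body into a dict."""
    opener = build_opener(ProxyHandler({}))  # ignore any proxy settings
    with opener.open(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Start the server on a free local port, query it once, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), ProgressHandler)  # port 0: OS picks
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_json(f"http://127.0.0.1:{port}/status")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The same pattern works across languages: any client that can issue an HTTP GET and parse JSON can read the status, regardless of what the server is written in.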

I think this is one of the things where it's not easy to imagine that this would be useful in Fortran, only because it hasn't been easy to do in Fortran, so nobody did it.

I agree that this is a task for a specialized library for the time being.

@certik
Member Author

certik commented Dec 24, 2019

@milancurcic I see. Yes, I think you are right. Being able to create a JSON-based HTTP API would be very useful.

@certik
Member Author

certik commented Dec 24, 2019

Let's keep iterating on this. How about:


In the first phase, we are trying to stay in pure Fortran and most of these items already have a prior Fortran implementation by various people, and our job is to agree with a wide community on the API. This is our initial scope:

  • Utilities (strings, files, OS/environment integration and interface to processes, unit testing & assertions, logging, high level interface to IO, ...)
  • Algorithms and containers (searching and sorting, merging, hash tables / dictionaries, ...)
  • Mathematics (linear algebra, sparse matrices, special functions, fast Fourier transform, random numbers, statistics, ordinary differential equations, numerical integration, optimization, ...)
  • Reading and writing images (PPM, ...)

The following items are potential features (they should start as separate projects and we can discuss later if they should be included):

  • A parallel array (abstraction over coarrays);
  • Reading and writing more common image formats beyond PPM: tiff, jpeg, png
  • Interface to SQL (sqlite would cover most needs)
  • A web client and server, with a stack of common protocols (UDP, TCP, HTTP)

Here are example items that are not in scope:

  • physics specific algorithms: electronic structure, fluid dynamics, ...
  • specific mathematical methods: finite difference support, finite element support (except sparse matrices, which are in scope, because they are not tied to a specific mathematical method, but apply to all kinds of fields, such as finite differences, finite volumes, graphs, etc.)
  • highly optimized specific algorithms that require tens of thousands of lines to implement, such as OpenBLAS. If the algorithm and infrastructure require a project of their own, then stdlib can perhaps depend on it, but it should not be part of stdlib itself.

@jvdp1
Member

jvdp1 commented Dec 24, 2019

  • Reading and writing images (PPM, ...)

Should it not be in utilities? Otherwise, I would create a specific item for I/O operations:
"I/O operations: files, high-level interface for IO, reading and writing PPM images"

physics specific algorithms: electronic structure, fluid dynamics, ...
I would write:
"field-specific (e.g., physics) algorithms: ...",
Not everybody programming in Fortran works in physics ;)

For the rest, I think it is a good first general presentation.

@certik
Member Author

certik commented Dec 24, 2019

Good point, let's put images into Utilities for now. It will be PPM only, so I think Utilities is a good fit for it now.

Yes, field-specific is better. Do you have some examples of non-physics fields that we can list there?

@milancurcic
Member

specific mathematical methods: finite difference support

I'd argue that basic finite difference (analog to numpy.diff()) would be quite generally useful and in scope. Think of just calculating a first derivative of a time series or a gradient. I agree that this shouldn't cover all the fancy 17th order finite difference schemes, but a simple 1st order diff() would go a long way.
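
For reference, a simple first-order diff of the kind described here could look like the following pure-Python sketch. It mirrors what numpy.diff() does for a 1-D sequence, but the function names `diff` and `first_derivative` and the uniform spacing `dt` are assumptions made for illustration, not a proposed stdlib API.

```python
def diff(values):
    """Return the consecutive differences values[i+1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def first_derivative(values, dt):
    """First-order forward-difference estimate of the derivative.

    Assumes the samples in `values` are uniformly spaced by `dt`.
    """
    if dt == 0:
        raise ValueError("dt must be nonzero")
    return [d / dt for d in diff(values)]


if __name__ == "__main__":
    squares = [0, 1, 4, 9, 16]
    print(diff(squares))                   # [1, 3, 5, 7]
    print(first_derivative(squares, 0.5))  # [2.0, 6.0, 10.0, 14.0]
```

The result has one element fewer than the input, which is the same convention numpy.diff() uses.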

@certik
Member Author

certik commented Dec 24, 2019

I'd argue that basic finite difference (analog to numpy.diff()) would be quite generally useful and in scope. Think of just calculating a first derivative of a time series or a gradient. I agree that this shouldn't cover all the fancy 17th order finite difference schemes, but a simple 1st order diff() would go a long way.

Yeah, I was thinking that too.

I just want to have some examples that are clearly out of scope, so that we have some guidelines to prevent growing stdlib into a huge bloated library doing everything.

@jvdp1
Member

jvdp1 commented Dec 24, 2019

Yes, field-specific is better. Do you have some examples of non-physics fields that we can list there?

@certik A non-physics example: quantitative genetics

@arjenmarkus
Member

Wrt Milan's point 7: the SQLite interface in my Flibs project (http://flibs.sf.org - yes, I should probably move it to Github ;)) could definitely use a "facelift", as the interface implementation predates the ISO C binding that now makes life so much easier.

@zbeekman
Member

This can be a living document. I think all these ideas are good, and we shouldn't get too hung up on the details quite yet. Providing broad scopes, and some specific examples is certainly worthwhile, and I've liked everything I've seen here. 👏

@certik
Member Author

certik commented Jan 2, 2020

Looks like we mostly agree. So I am going to merge it, as it is better to have at least some scope in the README than nothing. And we can iterate on this as we go.

@certik certik merged commit 1903066 into fortran-lang:master Jan 2, 2020
@certik certik deleted the scope branch January 2, 2020 18:52

5 participants