[Meta] Are we scientists yet? #77

Open
mratsim opened this issue Nov 17, 2017 · 32 comments

@mratsim
Collaborator

mratsim commented Nov 17, 2017

This is a meta-issue to keep track of discussion around Nim scientific libraries.

Primitive libraries

Decimal128: https://github.com/JohnAD/decimal128
Fixed-point: https://gitlab.com/lbartoletti/fpn

Multidimensional arrays, Linear-algebra

Multidimensional arrays are the basic building block of scientific computing; they go beyond 2D or 3D vectors and matrices. Notable non-Nim implementations include Fortran, Julia, Matlab and Numpy.

Status: in-progress
Libraries:

Support

Arraymancer supports dense multidimensional arrays of any type on CPU (integers, floats, complex), and on Cuda and OpenCL (float only). It uses BLAS, cuBLAS and CLBlast under the hood.

Flambeau provides libtorch bindings and reproduces PyTorch functionality.

Manu is a pure Nim matrix library with no external dependencies.

Neo supports dense and sparse float vectors and matrices, on CPU and Cuda (Nvidia GPUs) and also uses BLAS and LAPACK under the hood.

Status: stalled
Libraries:

NimTorch supports most PyTorch features regarding multidimensional arrays, on CPU, Cuda, OpenCL and AMD ROCm, provided you compiled PyTorch's ATen backend with the relevant features.

Plotting

Data analysis requires plotting. Notable non-Nim implementations include Python matplotlib and seaborn, Plot.ly (Python, R, Javascript), R ggplot2, Matlab and Facebook Visdom (a simple interface to Plot.ly).

Note that there are a couple of approaches to plotting: either a charting library, or a high-level grammar library (similar to SQL) that hides the low-level details of a chart.

Status: in-progress
Libraries:

Proof-of-concepts:

Unmaintained:

  • arraymancer-vision has a very simple interface to Facebook's Visdom.

ggplotnim is a pure Nim implementation of the grammar of graphics.
gnuplot.nim is a wrapper of gnuplot.
Nim-Plotly uses the plot.ly charting library as a backend. Both MetaPlot and Monocle use the Vega visualization grammar.

Image processing library

Computer vision is a thriving area of research. Vision scientists need algorithms that work on images represented as multidimensional arrays (unlike, say, Photoshop), preferably multithreaded and GPU-accelerated.

Notable non-Nim libraries include OpenCV, Matlab, Python scikit-image, scipy.ndimage and mahotas.

Status: in-progress

Libraries:

Unmaintained:

Nim-opencv provides rough low-level bindings of OpenCV functions.

Dataframe and columnar/tabular data processing

Dataframes are essential to process structured data (say Name, Age, number of products bought, last time of visit). They allow very efficient data manipulation, including easily creating new columns, joining dataframes, converting between types.

Notable non-Nim packages include Python Pandas and R datatable. When data does not fit in RAM, dataframe packages are interfaced with SQL or HDF5 datastores or even Spark for very large scale processing.

Status: in-progress
Libraries:

  • NimData provides dataframe facilities to Nim

Random library

Lots of scientific algorithms rely on stochastic processes or random distributions.
At the very least, a pseudo-random generator that samples from a normal/Gaussian distribution is needed.
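
As a rough illustration of that minimum requirement, here is a sketch of the Box-Muller transform, which turns uniform pseudo-random numbers into normally distributed ones. It is written in Python with only the standard library, purely for illustration; the names `box_muller` and `normal_samples` are made up for this example and do not correspond to any Nim library listed here.

```python
import math
import random


def box_muller(rng):
    """Turn two uniform samples into two independent standard normal samples."""
    # 1.0 - random() lies in (0, 1], which keeps log() away from zero.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(n, mean=0.0, stddev=1.0, seed=None):
    """Return n samples from a normal distribution (hypothetical helper)."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    samples = []
    while len(samples) < n:
        z0, z1 = box_muller(rng)
        # Shift and scale the standard normal samples.
        samples.append(mean + stddev * z0)
        samples.append(mean + stddev * z1)
    return samples[:n]
```

Calling `normal_samples(1000, mean=2.0, stddev=0.5, seed=1)` would give a reproducible list of 1000 floats. A production library would more likely offer a faster method (for example a ziggurat sampler), so treat this only as the simplest workable baseline.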

Notable non-Nim libraries include Scipy.

Status: in-progress
Libraries:

Statistics library

Notable language: R

Status: standard lib statistics module

Machine learning

Machine learning is how to teach a computer to learn/generalize patterns from data.

Notable non-Nim libraries include: Python's Scikit-Learn and R's Caret.
State-of-the-art C++ library to wrap: XGBoost

Status: in-progress

Deep learning & neural networks

Deep learning is machine learning with neural networks and is arguably eating the world (or at least Reddit, Hacker News and sponsors). In comparison to most traditional machine learning tools, neural networks can also learn very well from non-structured data (images, sounds, text ...).

Notable non-Nim libraries include: Facebook Torch, Google Tensorflow, Apache and Amazon Mxnet

Status: in-progress
Libraries:

Proof-of-concept:

  • Neurotic was a proof of concept to build simple neural networks on Neo/linalg

Non-linear optimization

Status: in-progress
Libraries:

  • MPFIT (Non-Linear Least squares fitting)
  • NLOPT, wrapper for the nlopt library

Linear programming

Status: in-progress
Libraries:

  • nim-isl, wrapper for the ISL parametric integer linear programming library

Computational Physics

Status: in-progress
Libraries:

Geometry

Computational geometry also requires tuned algorithms for: geometry primitives, polygons and polyhedra, triangulations, distances, shape analysis ...

Notable non-Nim library: CGAL

Status: no library

Scientific serialization format

There are many formats specific to science or even to particular science domains.

Libraries:

  • nim-hdf5, wrapper for the HDF5 data format

Geospatial library

Scientists often need to deal with geospatial coordinates (latitude, longitude), maps and distances.
This includes efficient data structures like k-d trees or R-trees to compute distances between points, and distance formulas like Haversine to compute distances on a sphere.
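
For reference, the Haversine formula mentioned above can be sketched in a few lines. This is a Python illustration using only the standard library, not code from any of the libraries listed in this issue; the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions of the sketch.

```python
import math

# Mean Earth radius in kilometres; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin().
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

For example, `haversine_km(48.8566, 2.3522, 51.5074, -0.1278)` gives the approximate Paris to London distance. The formula treats the Earth as a perfect sphere, so applications that need high accuracy would use an ellipsoidal model instead.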

Notable non-Nim libraries include Python's scipy.spatial, Geopy, Shapely

Status: in-progress
R-tree forum thread.

Proof-of-concepts:

  • GDAL wrapper (Geospatial Data Abstraction Library)

Scientific language bindings

Python:

Unmaintained

@mratsim
Collaborator Author

mratsim commented Nov 17, 2017

Placeholder.

To avoid polluting this meta-thread with specific discussion on certain topics (say what I want in the random library), this will link to the discussion topics:

Multidimensional arrays, Linear-algebra

#14, #17, #25, #50, #59

Plotting

#17, #51, #70

Geospatial

#13, #69

Image processing

#69

Dataframes, columnar/tabular data processing

#20, #47, #33

Random

#40

Statistics

#16

Machine learning

#48

Deep learning

No issue open

Computational Geometry

#53

@andreaferretti

For sampling from other distributions, there is Alea. I have to clean it up - some examples fail with the latest concept changes in devel - but I hope to make these work again soon

@dom96
Contributor

dom96 commented Nov 17, 2017

This almost makes me want to buy arewescientistsyet.org ala http://www.arewewebyet.org/. Perhaps you'd be interested in creating something like this? :)

@mratsim mratsim changed the title Are we scientists yet? [Meta] Are we scientists yet? Nov 17, 2017
@sdwfrost

I would also add in differential equation solvers as well as Markov chain Monte Carlo samplers...

@Vindaar

Vindaar commented Jan 30, 2018

Over the last 2 months I've been working on high level bindings to the HDF5 library:

https://github.com/Vindaar/nimhdf5

It's still very much work in progress (also due to my limited knowledge of Nim and the more low level parts of HDF5).
As a raw wrapper it should be fully functional, with the downside of the (imo not very intuitive) C API. But the high level bindings are improving slowly. There's an example (examples/h5_create_dataset_hl.nim) showing the available features.

@narimiran
Member

@EelcoHoogendoorn

I feel that by far the most important category is missing from this list: first-class two-way Python bindings.

The ability of Python to easily (relatively, for the time) interface with the then-dominant languages was pivotal in its adoption in scientific computing.

I'd use a ton of Nim from Python right away if there was a clean, boilerplate-free method of sending ndarrays back and forth between the two. Last time I checked there was not, and as much as I like Nim I don't see it replacing my entire Python ecosystem any day soon.

In particular, I would much rather use Nim than Cython or Numba or any such half-baked language. Boost-python has the bindings figured out pretty well, but then again I can rarely justify having to deal with C++.

But a system of bindings with the convenience of boost-python but without the C++ would massively expand the usability of Nim for my scientific programming (and I think it's not just me).

Also, starting out a project in Nim would be a much better proposition if I had the reassurance that I could always pop up a matplotlib debug figure without any hassle.

@andreaferretti

@EelcoHoogendoorn there are a few projects.

  • nim-pymod is not maintained and a little cumbersome in that it requires its own scripts to build, but it allows sending ndarrays back and forth
  • nimpy looks more actively maintained but I am not sure whether it supports Numpy types
  • python3 seems to be another one, but I am not sure of its status

None of these projects is fully mature at this point, but this is definitely something doable

@EelcoHoogendoorn

EelcoHoogendoorn commented May 2, 2018 via email

@brentp

brentp commented May 17, 2018

I think most active nim users are aware of this by now, but there's a functioning plotting library here: https://github.com/brentp/nim-plotly

Since it serializes to JSON and uses plotly.js to plot (though it works with the C backend), it can only handle a limited number of points, but when using WebGL it can plot ~200K points in my browser and still be tolerably responsive.

@EelcoHoogendoorn

Hi brentp;

That's looking pretty cool indeed! Note that I am not trying to take a jab at plotting in Nim specifically, but trying to make a point about the relative size of the Python and Nim ecosystems generally; plotting is just an example.

I think it'd be foolish to expect Nim to be able to compete with Python anytime soon on that front; making sure we have first-class two-way interop between the two sounds like it might happen a decade sooner at least.

@Vindaar

Vindaar commented Jun 20, 2018

And finally we can do non-linear least square fitting in Nim :)

https://github.com/Vindaar/nim-mpfit

@Vindaar

Vindaar commented Jul 2, 2018

Finally spent some time to make the interface for my NLopt wrapper nicer and create a PR for nimble for it.
So if non-linear least square fitting isn't for you, maybe general nonlinear optimization is. ;)

https://github.com/Vindaar/nimnlopt

@abudden

abudden commented Aug 13, 2018

For some precision engineering/scientific applications, the ability to use arbitrary precision floating point arithmetic would be useful. Does an MPFR wrapper a la Julia's built-in support for BigFloat belong on this list?

@Araq
Member

Araq commented Aug 13, 2018

@abudden Certainly.

@retsyo

retsyo commented Aug 31, 2018

It seems that there is still no computer algebra system module like https://www.sympy.org/. I also made a post: https://forum.nim-lang.org/t/4165

@brentp

brentp commented Sep 11, 2018

A decent stats package would be a huge boon for my work, even if it started with just t-tests and ANOVA.

@sinkingsugar

https://github.com/fragcolor-xyz/nimtorch

Full pytorch for nim, for you.

@ihendley

Do we want a category for natural language processing? Examples of Python libraries are nltk, gensim, spacy, and scikit-learn.

@ihendley

Also, how about mathematical optimization - like scipy.optimize for example, and how about signal processing - like scipy.signal?

@Araq
Member

Araq commented Mar 22, 2019

@ihendley I think so, yes.

@mantielero

mantielero commented Jun 29, 2019

Simulation

What about simulation? Something like simulink, modelica or Modia (in Julia).

Something similar to Modia in particular would be nice, given Nim's metaprogramming capabilities.

One area where I believe Nim could shine is in exporting FMU models (following the FMI standard). I don't see Python doing that. And even for Julia it is a struggle, because they need to export the runtime for compiled code, which is big and not straightforward (here you can see how the libraries take above 100 MB for a simple example when compiled ahead of time).

Relevant links

FMI Code Generator
FMU SDK
Sundials (SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers), in order to embed the solver in the FMU. Bindings for this would be useful even on their own.
SimulatorToFMU

@mratsim
Collaborator Author

mratsim commented Jun 30, 2019

It's been a while since I updated the original post but it's done :)

@brentp

brentp commented Oct 25, 2019

having a (nearly?) fully functional jupyter kernel would be quite useful for my work and, I suspect for many people.

@Vindaar

Vindaar commented Oct 25, 2019

Quoting @brentp: "having a (nearly?) fully functional jupyter kernel would be quite useful for my work and, I suspect for many people."

@brentp: There is (or was) jupyternim: https://github.com/stisa/jupyternim
I'm not sure if it's abandoned and/or still compiles (last activity Oct 2018); I have never used it. Its downside is that it was written without hot code reloading in mind of course. However, I think it'd provide a nice basis for an updated implementation, which uses HCR for the relevant parts and the socket communication of jupyternim.

I once started playing around with HCR, but wasn't very successful even implementing a trivial repl, https://github.com/vindaar/brokenrepl. Posting it here if anyone wants to give it a try.

@brentp

brentp commented Oct 25, 2019

Yes, I saw that and inim from @stisa. Now that there are ggplots and dataframes, the notebook would be a boon.

@stisa

stisa commented Oct 25, 2019

(my) jupyternim and inim are the same code, there was a naming conflict with https://github.com/AndreiRegiani/INim so I renamed it. I agree it's due an update, but I have been pretty busy this year.
Last time I saw, HCR was limited to JS target, looking at https://nim-lang.org/docs/hcr.html there was a lot of progress so I may have a look into adopting it when I get some free time, if nobody starts working on it first.

@jblindsay

I've just published a pure Nim k-d tree implementation here.

@Vindaar

Vindaar commented Apr 24, 2020

@mratsim, @brentp, @HugoGranstrom and I chatted recently about trying to unify the science-related code a little more. While we didn't decide anything specific yet, we talked about creating an organization to hold related repositories in the future:

https://github.com/SciNim

I only invited a few people who, off the top of my head, use Nim for science-related stuff. If you want to join, feel free to message me or just join the gitter channel here:

https://gitter.im/SciNim/community

and say hi.

@mantielero

mantielero commented Apr 24, 2020

Over Easter I played around with creating a website based on Hugo for this purpose. I am happy to provide it to you.

I have uploaded it here:
https://mantielero.github.io/nim4science/

Feel free to use it.

@lbartoletti

I've just released a pure Nim fixed point number library here

I started working on a geometry library (mainly focused on GIS and CAD), but it is not yet presentable :)

@planetis-m

My linear algebra package: https://github.com/planetis-m/manu is still in development and I am happy to accept contributions.
