Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interface files to simplify .mli files in owl/dense #471

Merged
merged 11 commits into from
Dec 10, 2019

Conversation

tachukao
Copy link
Member

@tachukao tachukao commented Dec 8, 2019

  • Reduced .mli redundancy in owl/dense

Currently, there's a lot of redundancy in the .mli files in src/owl/dense. Adding a new ndarray function involves updating multiple .mli files. I've added two interface files (src/owl/owl_dense_ndarray_intf.ml and src/owl/owl_dense_matrix_intf.ml) to simplify this process.

In doing so, I've also discovered that we are exposing multiple functions that are not defined for complex ndarrays such as signum. This is likely a mistake due to copying and pasting the .mli files. The added interface files make sure that we separate signatures that are specific to real ndarrays from those that are specific to complex ndarrays.

I suspect something similar can be done for .mli files in src/owl/linalg and also other parts of the repo.

  • applied ocamlformat to files in src/owl/dense

TODO: finish adding docstrings to the interface files

@ryanrhymes ryanrhymes self-requested a review December 8, 2019 14:21
@ryanrhymes
Copy link
Member

It is better to separate ocamlformat as a separate PR. First it is easier for code reviewing, to focus on core code and structural changes rather than formatting changes. Second, it is better to gradually adopt ocamlformat.

Dense is core part and has many lines of code, I personally think the format changes are a bit too drastic at the moment, also impacts the documentation a lot. For structural changes, I will take a deeper look later.

@tachukao
Copy link
Member Author

tachukao commented Dec 8, 2019

I've reversed the effect of ocamlformat in the latest commits

@jzstark
Copy link
Collaborator

jzstark commented Dec 8, 2019

Agree. I think we should separate the change of format to another PR for ease of revision. Personally I don't find updating to multiple .mli files a big inconvenience, since that doesn't happen very frequently for now. I'm also not sure about if the interfaces of float ndarray should be the same with that of complex ndarray.

@tachukao
Copy link
Member Author

tachukao commented Dec 8, 2019

@jzstark agreed, I've reversed the effect of ocamlformat. As for the interface, I've made it such that you have a Base interface which is shared among both complex and real Ndarrays, while interface Real is specific to real Ndarrays and Complex is specific to complex Ndarrays.

@ryanrhymes ryanrhymes requested a review from mor1 December 8, 2019 15:46
Copy link
Member

@ryanrhymes ryanrhymes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes overall. The new strucutre generalises the interfaces of Dense module very well and reduces the codebase significantly. Similar method was actually already adopted in CGraph and some other modules in Owl, I think this can/should be applied to Linalg as well to simplify Owl's core.

This certainly poses some challenges in Owl's document building process, but in general I think pros are over cons. Especially with the new tutorial book and gradual evolution of Owl's API doc, the impacts can be mitigated.

I am currently pushing hard on architectural retrospection on Owl's core components. Owl repo itself has now over 180k LOC (incl. c, c++, ocaml, etc.). Other sublibs atop of Owl are also growing fast as well. We need to keep the core concise (improve maintainability) without losing flexibility and functionality. I think this PR is a good attempt.

I included more reviewers so we are open to different opinions, we shall also discuss more on this kind of architectural improvement. Actually architectural discussion is on the TODO list in our meeting next year in Groningen.

@ryanrhymes ryanrhymes added enhancement R&D Core research and development labels Dec 8, 2019
@ryanrhymes ryanrhymes added this to In Progress in owl development via automation Dec 8, 2019
@jzstark
Copy link
Collaborator

jzstark commented Dec 8, 2019

Thank Calvin. I think categorising existing ndarray signatures Base/Real/Complex/Distribution is quite neat. Besides the small concern that perhaps the base_ndarray should also change with it, I cannot think of a case where this change would lead to practical problems. The only thing is that there is this design issue about whether it is a proper time to do the generalisation/abstraction. I think it would be wise to hear Liang's opinion on the choice design.

@ryanrhymes
Copy link
Member

ryanrhymes commented Dec 8, 2019

The only thing is that there is this design issue about whether it is a proper time to do the generalisation/abstraction ...

Chatted with Jianxin a couple of minutes ago, he has some insightful comments. Both Base and Core have Dense module, and ideally they should be consistent in both functionality and design. Therefore, it may actually make (even more) sense to move *_intf.ml files to Base lib since they are pure OCaml anyway and both Core and Base can use them.

The issue is due to the limited recources, the Dense module in Base only implements a limited set of APIs comparing to its corresponding one in Core. Using these *_inft.ml file probably means we need to create a lot of dummy stubs and implement these functions in future. However, if we keep the Dense in Base as it is now, Dense in Base and Core will have inconsistent design. I am myself
not 100% sure whether it is a big issue or not.

@tachukao
Copy link
Member Author

tachukao commented Dec 8, 2019

The issue is due to the limited recources, the Dense module in Base only implements a limited set of APIs comparing to its corresponding one in Core. Using these *_inft.ml file probably means we need to create a lot of dummy stubs and implement these functions in future

Yea I thought about implementing the *_intf.ml files in src/base/dense, but gave up on the idea in the end precisely because of this. Adding dummy stubs is a bit tedious, but I guess it's also a good way of documenting todo...but yea it's going to be quite a lot of work...

@mseri
Copy link
Member

mseri commented Dec 8, 2019

This is a welcome change! I have some comments and will not be able to write them down and do a careful review before tomorrow or Tuesday morning. It would be great if you can wait until then before the merge

@ryanrhymes
Copy link
Member

This is a welcome change! I have some comments and will not be able to write them down and do a careful review before tomorrow or Tuesday morning. It would be great if you can wait until then before the merge

A sure thing, this is not super urgent, merge can wait.

@ryanrhymes
Copy link
Member

The issue is due to the limited recources, the Dense module in Base only implements a limited set of APIs comparing to its corresponding one in Core. Using these *_inft.ml file probably means we need to create a lot of dummy stubs and implement these functions in future

Yea I thought about implementing the *_intf.ml files in src/base/dense, but gave up on the idea in the end precisely because of this. Adding dummy stubs is a bit tedious, but I guess it's also a good way of documenting todo...but yea it's going to be quite a lot of work...

No problem, I am happy with current PR. We can create another PR for migrating this to Base. Yes there will be some tedious work to make dummy stubs in Base to make it consistent with the new design. But I am myself happy to take that task :D

@tachukao
Copy link
Member Author

In the latest commit, I did the same thing for owl/linalg as discussed above

@mseri
Copy link
Member

mseri commented Dec 10, 2019

I have had a look at the code. Thanks again for the cleanup.

My only remaining comment is that one of the most important features of Owl is that it reduces by a great margin the failures to compile time, having the full interface in Base with lots of not implemented stubs will, imho, make it much worse.

I think a middle ground would be the best: having a base interface as large as possible with the implemented functions, and a core interface (exposed only in owl-core) with what is missing. Then we can slowly work to reduce the core interface and extend the base one, without breaking existing code and making sure that whatever ends up in base, ends up there with an implementation.

What do you think about it?

@jzstark
Copy link
Collaborator

jzstark commented Dec 10, 2019

Thanks for the update Calvin! I think in the end we should move the signatures to Base and get unified between core and base, and Marcello's incremental solution seems quite nice. E.g in Linear algebra module, a lot of routines are to be implemented, and there is also some mismatch between base ndarray and core ndarray, such as the to_array function, so updating the signature one by one looks like a good choice, though I'm not quite clear et how such a transitional solution should be implemented and cope with existing base signatures.

Btw, sorry that I forgot to mention it, but here is the style suggestion I asked about making space between one-line functions in ocamlformat, and it turns out that adding module-item-spacing=sparse in the configuration file would work nicely.

@mseri
Copy link
Member

mseri commented Dec 10, 2019

though I'm not quite clear et how such a transitional solution should be implemented and cope with existing base signatures.

Core signature will include base siugnatures, so the transition will simply require take us moving the definitions of the signature from one interface to the other and add the necessary definitions in base ml files.

@mseri
Copy link
Member

mseri commented Dec 10, 2019

and there is also some mismatch between base ndarray and core ndarray, such as the to_array function

Indeed, these are annoying and will be fixed once the signatures are unified

@mseri
Copy link
Member

mseri commented Dec 10, 2019

Btw, sorry that I forgot to mention it, but here is the style suggestion I asked about making space between one-line functions in ocamlformat, and it turns out that adding module-item-spacing=sparse in the configuration file would work nicely.

🎉

@tachukao
Copy link
Member Author

@mseri I like that solution. Should we merge this PR for now and I can get cracking on migrating the interface to base? Or do you think it's best to do it all in this PR?

@jzstark RE ocamlformat. That's great! perhaps we should leave the formatting to a second PR as well?

@mseri
Copy link
Member

mseri commented Dec 10, 2019

@mseri I like that solution. Should we merge this PR for now and I can get cracking on migrating the interface to base? Or do you think it's best to do it all in this PR?

This seems a good idea to me. Doing it all in this PR would be overkill

@ryanrhymes
Copy link
Member

module-item-spacing=sparse is already in the ocamlformat, added a couple of days ago. :)

@ryanrhymes ryanrhymes merged commit 0fdf627 into owlbarn:master Dec 10, 2019
@ryanrhymes
Copy link
Member

fyi, the PR reduces about 2.1K LOC, great job!

@ryanrhymes ryanrhymes moved this from In Progress to Done (2019) in owl development Dec 19, 2019
mseri added a commit to mseri/opam-repository that referenced this pull request Feb 25, 2020
CHANGES:

*  Fix bug in _squeeze_broadcast (owlbarn/owl#503)
*  Added the Dawson function (Ndarray + Matrix + Algodiff op) (owlbarn/owl#502)
*  Fix bug in reverse mode gradients of aiso operations and pow (owlbarn/owl#501)
*  Added poisson_rvs to Owl_distribution (owlbarn/owl#499)
*  Draw poisson RVs in Ndarray and Mat modules (owlbarn/owl#498)
*  Broadcast bug for higher order derivatives (owlbarn/owl#495)
*  add sem to dense ndarray and matrix (owlbarn/owl#497)
*  Avoid input duplication with Graph.model and multi-input nn (owlbarn/owl#494)
*  Added Graph.get_subnetwork for constructing subnetworks (owlbarn/owl#491)
*  Make Graph.inputs give unique names to inputs (owlbarn/owl#493)
*  modify nlp interfaces
*  Re-add removed DiffSharp acknowledgment (owlbarn/owl#486)
*  add pretty printer for hypothesis type
*  update lambda neuron (owlbarn/owl#485)
*  fix example due to owlbarn/owl#476
*  Extend base linalg functions to complex numbers (owlbarn/owl#479)
*  [breaking] use a separate module for algodiff instead of ndarray directly (owlbarn/owl#476)
*  temp workaround and unittest (owlbarn/owl#478)
*  [breaking] Interface files for base/dense and base/linalg (owlbarn/owl#472)
*  Port code to dune2 (owlbarn/owl#474)
*  [breaking]  interface files to simplify .mli files in owl/dense (owlbarn/owl#471)
*  Save and load Npy files (owlbarn/owl#470)
*  Owl: relax bounds on base and stdio (owlbarn/owl#469)
*  Merged tests for different AD operations into one big test + autoformat tests with ocamlformat (owlbarn/owl#468)

### 0.7.2 (2019-12-06)

* fourth order finite diff approx to grad
* changes to improve stability of sylv/discrete_lyap
* fix bug in concatenate function
* add mli for owl_base_linalg_generic
* Owl-base linalg routines: LU decomposition  (owlbarn/owl#465)
* bug fixes
* Update owl.opam

### 0.7.1 (2019-11-27)

* Add unit basis
* Fix issue owlbarn/owl#337 and owlbarn/owl#457 (owlbarn/owl#458)
* owl-base: drop seemingly unnecessary dependency on integers (owlbarn/owl#456)

### 0.7.0 (2019-11-14)

* Add unsafe network save (owlbarn/owl#429)
* Sketch Count-Min and Heavy-Hitters
* Various bugfixes
* Owl_io.marshal_to_file: use to_channel
* Do not create .owl folder when loading owl library
* Re-design of exceptions and replace asserts with verify
* Add OWL_DISABLE_LAPACKE_LINKING_FLAG
* Reorganise Algodiff module
* Add parameter support to Zoo
* Two new features in algodiff: eye and linsolve (triangular option) + improved stability of qr and chol
* Implemented solve triangular
* Added linsolve and lq reverse-mode differentiation
* Fix build on archlinux (pkg-config cblas)
* Add median and sort along in ndarray
* Improve stability of lyapunov gradient tests

### 0.6.0 (2019-07-17)

* Add unsafe network save (owlbarn/owl#429)
* Sketch Count-Min and Heavy-Hitters
* Various ugfixes
* Owl_io.marshal_to_file: use to_channel
* Do not create .owl folder when loading owl library
* Re-design of exceptions and replace asserts with verify
* Add OWL_DISABLE_LAPACKE_LINKING_FLAG
* Reorganise Algodiff module
* Add parameter support to Zoo
* Two new features in algodiff: eye and linsolve (triangular option) + improved stability of qr and chol
* Implemented solve triangular
* Added linsolve and lq reverse-mode differentiation
* Fix build on archlinux (pkg-config cblas)
* Add median and sort along in ndarray
* Improve stability of lyapunov gradient tests

### 0.5.0 (2019-03-05)

* Improve building and installation.
* Fix bugs and improve performance.
* Add more functions to Algodiff.
* Split plot module out as sub library.
* Split Tfgraph module out as sub library.

### 0.4.2 (2018-11-10)

* Optimise computation graph module.
* Add some core math functions.
* Fix bugs and improve performance.

### 0.4.1 (2018-11-01)

* Improve the APIs of Dataframe module.
* Add more functions in Utils module.

### 0.4.0 (2018-08-08)

* Fix some bugs and improve performance.
* Introduce computation graph into the functor stack.
* Optimise repeat and tile function in the core.
* Adjust the OpenCL library according to computation graph.
* Improve the API of Dataframe module.
* Add more implementation of convolution operations.
* Add dilated convolution functions.
* Add transposed convolution functions.
* Add more neurons into the Neural module.
* Add more unit tests for core functions.
* Move from `jbuilder` to `dune`
* Assuage many warnings

### 0.3.8 (2018-05-22)

* Add initial support for dataframe functionality.
* Add IO module for Owl's specific file operations.
* Add more helper functions in Array module in Base.
* Add core functions such as one_hot, slide, and etc.
* Fix normalisation neuron in neural network module.
* Fix building, installation, and publishing on OPAM.
* Fix broadcasting issue in Algodiff module.
* Support negative axises in some ndarray functions.
* Add more statistical distribution functions.
* Add another higher level wrapper for CBLAS module.

### 0.3.7 (2018-04-25)

* Fix some bugs and improve performance.
* Fix some docker files for automatic image building.
* Move more pure OCaml implementation to base library.
* Add a new math module to support complex numbers.
* Improve the configuration and building system.
* Improve the automatic documentation building system.
* Change template code into C header files.
* Add initial support for OpenMP with evaluation.
* Tidy up packaging using TOPKG.

### 0.3.6 (2018-03-22)

* Fix some bugs and improve performance.
* Add more unit tests for Ndarray module.
* Add version control in Zoo gist system.
* Add tensor contract operations in Ndarray.
* Add more documentation of various functions.
* Add support of complex numbers of convolution and pooling functions.
* Enhance Owl's own Array submodule in Utils module.
* Fix pretty printer for rank 0 ndarrays.
* Add functions to iterate slices in an ndarray.
* Adjust the structure of OpenCL module.

### 0.3.5 (2018-03-12)

* Add functions for numerical integration.
* Add functions for interoperation.
* Add several root-finding algorithms.
* Introduce Owl's exception module.
* Add more functions in Linalg module.
* Add native convolution function in core.
* Remove Eigen dependency of dense data structure.
* Fix some bugs and improve performance.

### 0.3.4 (2018-02-26)

* Update README, ACKNOWLEDGEMENT, and etc.
* Initial implementation of OpenCL Context module.
* Fix some bugs and improve the performance.
* Add Adam learning rate algorithm in Optimise module.
* Add a number of statistical functions into Stats.
* Enhance View functor and add more functions.
* Include and documentation of NLP modules.
* Add dockerfile for various linux distributions.

### 0.3.3 (2018-02-12)

* Fix some bugs and improve the performance.
* Integrate with Owl's documentation system.
* Add native C implementation of pooling operations.
* Add more operators in Operator module.
* Add more functions in Linalg module.
* Optimise the Base library.
* Add more unit tests.

### 0.3.2 (2018-02-08)

* Fix some bugs and improve the performance.
* Functorise many unit tests and add more tests.
* Rewrite the documentation migrate to Sphinx system.
* Migrate many pure OCaml code into Base library.
* Implement the initial version of Base library.

### 0.3.1 (2018-01-25)

* Design View module as an experimental module for Ndarray.
* Include Mersenne Twister (SFMT) to generate random numbers.
* Implement random number generator of various distributions.
* Implement native functions for maths and stats module.
* Include FFTPACK to provide native support for FFT functions.
* Minimise dependency, remove dependencies on Gsl and etc.
* Implement slicing and indexing as native C functions.
* Use new extended indexing operators for slicing functions.
* Refine ndarray fold function and introduce scan function.
* Reorganise the module structure in the source tree.
* Fix some bugs and enhance the performance of core functions.
* Add another 200+ unit tests.

### 0.3.0 (2017-12-05)

* Migrate to jbuilder building system.
* Unify Dense Ndarray and Matrix types.
* Split Toplevel out as a separate library.
* Redesign Zoo system for recursive importing.
* Simplify the module signature for Ndarray.
* Introduce functions in Ndarray module to support in-place modification.
* Introduce reduction functions to reduce an ndarray to a scalar value.
* Add Lazy functor to support lazy evaluation, dataflow, and incremental computing.
* Implement a new and more powerful pretty printer to support both ndarray and matrix.
* Fix bugs in the core module, improve the performance.

### 0.2.8 (2017-09-02)

* New Linalg module is implemented.
* New neural network module supports both single and double precision.
* New Optimise and Regression module is built atop of Algodiff.
* Experimental Zoo system is introduced as a separate library.
* Enhance core functions and fix some bugs.

### 0.2.7 (2017-07-11)

* Enhance basic math operations for complex numbers.
* Redesign Plot module and add more plotting functions.
* Add more hypothesis test functions in Owl.Stats module.
* Support both numerical and algorithmic differentiation in Algodiff.
* Support both matrices and n-dimensional arrays in Algodiff module.
* Support interoperation of different number types in Ext and new operators.
* Support more flexible slicing, tile, repeat, and concatenate functions.
* Support n-dimensional array of any types in Dense.Ndarray.Any module.
* Support simple feedforward and convolutional neural networks.
* Support experimental distributed and parallel computing.

### 0.2.0 (2017-01-20)

* Support both dense and sparse matrices.
* Support both dense and sparse n-dimensional arrays.
* Support both real and complex numbers.
* Support both single and double precisions.
* Add more vectorised operation: sin, cos, ceil, and etc.
* Add basic unit test framework for Owl.
* Add a couple of Topic modelling algorithms.

### 0.1.0 (2016-11-09)

* Initial architecture of Owl library.
* Basic support for double precision real dense matrices.
* Basic linear functions for dense matrices.
* Basic support for plotting functions.
* SI, MKS, CGS, and CGSM metric system.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement R&D Core research and development
Projects
No open projects
owl development
  
Done (2019)
Development

Successfully merging this pull request may close these issues.

None yet

4 participants