Free (GPL) Common Lisp data analysis library with emphasis on modularity and conceptual clarity.

README

cl-ana is a free (GPL) library of Common Lisp code for doing data
analysis via either straightforward programming or dependency-oriented
programming.  It aims to be a general-purpose framework for analyzing
small and large scale datasets, including binned data analysis and
visualization.  Much effort has been made to ensure modularity so that
individual components may be used/re-used for a new purpose.

cl-ana is available via quicklisp (http://www.quicklisp.org/beta/);
for other dependencies see below.
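
As a minimal sketch of loading the library: the form below assumes
Quicklisp is already installed and loaded in the running Lisp image,
and that the non-Lisp dependencies (HDF5, GSL, gnuplot) are already
present on the system.  The system name is taken from the cl-ana.asd
file in the repository.

```lisp
;; Assumes Quicklisp is set up in this image and that the HDF5, GSL,
;; and gnuplot system packages are installed; Quicklisp fetches only
;; the Lisp dependencies.
(ql:quickload :cl-ana)
```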

Example code for using some of the functionality is contained in
various test.lisp files throughout the project; the full documentation
is located on the wiki page: http://github.com/ghollisjr/cl-ana/wiki

Whenever possible, features are implemented via generic functions so
that users can extend cl-ana to their own data types and use cases.

The functionality of this framework is divided into two layers.  The
lower layer provides basic libraries for the following:

* Tabulated data: Supports data tables read from and written to HDF5
  files (buffered read-write), ntuples (like those used by CERN's
  PAW), comma separated value (CSV) files, and plists for
  all-in-memory operation.
  Adding a new table type is as easy as extending the table class and
  defining 4 functions for the table type.  (The libraries cl-csv and
  GSLL provide the backbone for the CSV and ntuple tables; the HDF5
  table access is completely new.)

* Histograms: Supports categorical, contiguous, and sparse histograms
  of arbitrary dimensions.  Provides functional access to histograms
  via mapping (which allows reducing) and filtering.

* Nonlinear least squares fitting: Allows plain-old lisp functions to
  be fitted to data using the GNU Scientific Library (GSL); infers the
  number of fit parameters the function takes from the initial
  parameter guess.  Can fit against alists of data & histograms and is
  easily extended to allow fitting against other types by defining a
  single function for the new type.

* Plotting: Uses gnuplot to plot histograms, data samples, plain-old
  lisp functions, and strings interpreted as formulae.

* Generic math: Common Lisp doesn't provide user-extendable math
  functions; cl-ana provides its own versions of the basic math
  functions CL gives you but with the ability to extend them for
  whatever types you want.  Also provides use-gmath which easily adds
  generic-math's symbols to a package even if you already use the
  common-lisp package.  Already provided are extensions to the generic
  math functions for error propagation, quantities (values with
  units), and treating CL sequences as tensors with all the usual math
  functions being applied element-by-element in a MATLAB/GNU Octave
  fashion.
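
To illustrate the idea behind the generic math bullet above, here is a
small sketch in standard Common Lisp.  It is not cl-ana's actual
generic-math API: the names add and measurement are hypothetical and
chosen only for this example.  It shows an extensible addition
function that works on numbers, on sequences element by element, and
on values carrying an uncertainty (assumed uncorrelated, so the
uncertainties add in quadrature).

```lisp
;; Hypothetical names throughout: this is NOT cl-ana's generic-math
;; API, only an illustration of user-extensible arithmetic.
(defgeneric add (x y)
  (:documentation "Extensible two-argument addition."))

;; Plain numbers fall through to the standard CL operator.
(defmethod add ((x number) (y number))
  (+ x y))

;; Lists and vectors are added element by element; calling ADD on the
;; elements makes nested sequences work as well.
(defmethod add ((x list) (y list))
  (mapcar #'add x y))

(defmethod add ((x vector) (y vector))
  (map 'vector #'add x y))

;; A value with an uncertainty, added under the assumption that the
;; two uncertainties are uncorrelated.
(defstruct (measurement (:constructor make-measurement (value sigma)))
  value
  sigma)

(defmethod add ((x measurement) (y measurement))
  (make-measurement
   (add (measurement-value x) (measurement-value y))
   (sqrt (+ (expt (measurement-sigma x) 2)
            (expt (measurement-sigma y) 2)))))
```

With these definitions, (add '(1 2) '(3 4)) returns (4 6), and adding
a new type requires only one more defmethod; none of the existing
methods need to change.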

The higher layer provides dependency-oriented programming, my own
term for defining a program as a set of targets to be computed rather
than as an explicit sequence of computations.  It is a hybrid of
imperative and declarative programming.  Because the program is
stored as a target table, the table can be transformed to allow for
optimizations.  Provided optimizations include table pass merge and
collapse, which minimize the number of passes over source datasets.
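
The notion of a target table can be sketched with a toy example in
standard Common Lisp.  This is not cl-ana's implementation or API: the
names *targets*, *results*, deftarget and target are hypothetical, and
the sketch performs no table transformations or optimizations.  It
only shows targets being registered by name and computed on demand,
with each result cached so that shared dependencies are computed once.

```lisp
;; Toy target table; all names are hypothetical, not cl-ana's API.
(defvar *targets* (make-hash-table :test #'equal)
  "Maps a target id to a function of no arguments computing its value.")

(defvar *results* (make-hash-table :test #'equal)
  "Caches the value of every target computed so far.")

(defmacro deftarget (id &body body)
  "Register BODY as the computation for target ID.
Any cached value for ID itself is dropped; cached values of targets
that depend on ID are NOT invalidated in this simple sketch."
  `(progn
     (remhash ',id *results*)
     (setf (gethash ',id *targets*) (lambda () ,@body))
     ',id))

(defun target (id)
  "Return the value of target ID, computing it and its dependencies on demand."
  (multiple-value-bind (value found) (gethash id *results*)
    (if found
        value
        (let ((thunk (or (gethash id *targets*)
                         (error "Unknown target: ~s" id))))
          (setf (gethash id *results*) (funcall thunk))))))

;; The program is a set of targets; no order of execution is given.
(deftarget data (list 1 2 3 4))
(deftarget total (reduce #'+ (target 'data)))
(deftarget mean (/ (target 'total) (length (target 'data))))
```

Requesting (target 'mean) computes data and total first, because the
body of mean asks for them, and returns 5/2.  Since the targets live
in an ordinary table rather than in the control flow of the program,
a real system is free to inspect and rewrite that table before
anything is computed.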

Also included are various utilities which have use in a variety of
places.

The main principles of the project are:

1. Conceptual clarity and documentation.  These are often neglected in
   software development, to the point where reading code can cause one
   to drink.  Conceptual clarity refers to the way in which code is
   written and the way in which algorithms are implemented: A slightly
   slower but easier to understand implementation is favored above a
   clusterfuck of bit shifts.  Documentation should always be provided
   for any feature along with example usages--ESPECIALLY with example
   usages, as these are sometimes more helpful than the actual
   documentation.

2. Modularity/Bottom-up design.  Whenever two components have a common
   feature/function/dependency, this commonality should be placed in a
   separate sublibrary.  To limit sublibrary number explosion, this
   should be done in conjunction with point 1 preserving conceptual
   clarity.  For example list utilities should be a sublibrary for
   general purpose list functions.  Further: If a feature can be
   provided by either a set of utility functions or a type hierarchy,
   strong preference should be given to the utility functions
   approach; i.e. one should have to argue long and hard before
   stratifying things into classes.

3. Lispyness.  Whenever possible, already established motifs from Lisp
   programming practices should be used.  This goes for naming
   conventions, access macros, and the general desire to provide at
   least functional access to things.

Each sublibrary should go in its own directory and come with its own
.asd file so that one can choose any subset of functionality to use
from the library.
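
As a hedged sketch of what choosing a subset could look like, a user
project might depend on individual sublibraries instead of the whole
of cl-ana.  The project name my-analysis and its source file are
hypothetical, and the subsystem names are an assumption based on the
cl-ana prefix convention; check them against the .asd files in the
corresponding directories before use.

```lisp
;; Hypothetical user project.  The subsystem names below are assumed
;; to follow the "cl-ana." prefix convention and are not verified here.
(asdf:defsystem #:my-analysis
  :description "Example project using only part of cl-ana."
  :depends-on (#:cl-ana.histogram
               #:cl-ana.plotting)
  :components ((:file "analysis")))
```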

As you will see in reading the code, I've tried to keep everything
well documented.  I place a high emphasis on documentation since I
know how easy it is to fall out of practice.  The last thing I want is
for the usual cargo-cult around old code to emerge.

Disclaimer: much of the code I've written has been part of my own
personal development as a Lisp programmer; this is my first
non-trivial project with Lisp, and coming from a C++ background I've
had to learn quite a few things along the way.  This means that there
may be some dark corners of the code which need help from more
experienced coders/myself at a later time.  In addition, I haven't
used any general testing framework.  (To be honest I haven't needed
one either as I've done the development in a highly bottom-up way,
testing everything as I write it.)  In short this is a work in
progress.

The code tries to be self-documenting, but I'm working on a
tutorial/user's guide on the github wiki page to explain how to use
the software to best effect.

The dependencies for this project are:

* HDF5 (http://www.hdfgroup.org/HDF5/)
* GSL (http://www.gnu.org/software/gsl/)
* CFFI (http://common-lisp.net/project/cffi/)
* GSLL (http://common-lisp.net/project/gsll/)
* Alexandria (http://common-lisp.net/project/alexandria/)
* iterate (http://common-lisp.net/project/iterate/)
* antik (http://www.common-lisp.net/project/antik/)
* closer-mop (http://common-lisp.net/project/closer/closer-mop.html)
* cl-csv (https://github.com/AccelerationNet/cl-csv)
* gnuplot (http://www.gnuplot.info/)
* cl-fad (http://weitz.de/cl-fad/)
* external-program (http://github.com/sellout/external-program)

All of the Lisp dependencies can be installed via quicklisp
(http://www.quicklisp.org/).

I copied the API for using gnuplot from gnuplot_i
(http://ndevilla.free.fr/gnuplot/).  gnuplot_i was written by
N. Devillard <ndevilla@free.fr>, released to the public domain, and is
a no-nonsense gnuplot session manager written in C.

I use SBCL (http://www.sbcl.org/) almost exclusively; however, I also
intentionally try to ensure that all the code only assumes what the CL
standard provides.  Anytime implementation-specific functionality is
needed I try to use third party libraries for this.