# 004 incorporating kalign3 as a library

This notebook uses the low-level phylogenetic library [biomcmc-lib](https://github.com/quadram-institute-bioscience/biomcmc-lib) (commit [5975331](https://github.com/quadram-institute-bioscience/biomcmc-lib/commit/5975331ef88d1c4ec9aef9599fb6348905d289c7)).

Recently I (re-)discovered the [kalign](https://github.com/TimoLassmann/kalign) software for multiple sequence alignment by Timo Lassmann, and was pleasantly surprised by the simplicity of its code and by its free software license (GPL-3.0-or-later).

Therefore I decided to incorporate its code into a library that I can access through a `char_vector` from the `biomcmc-lib` library. I am currently working on a derived library/software called "cumaru" (still private), with a modified version of the `kalign` algorithms.

## Autotools

First, a digression on the configuration aspects of `cumaru` (or any other software importing these libraries).

The lines below are needed in `configure.ac`, together with the m4 files they include, so that `kalign3` can use the AVX extensions.
```bash
# M4 macros for checking of CPU features from kalign3
m4_include([m4/ax_gcc_x86_avx_xgetbv.m4])
m4_include([m4/ax_gcc_x86_cpuid.m4])
m4_include([m4/ax_check_compile_flag.m4])
m4_include([m4/ax_ext.m4])
m4_include([m4/ax_openmp.m4])
AX_EXT
```

To import `biomcmc-lib`, you have two options:
1. Clone the (upstream) software from GitHub with the `--recursive` option (the default for end users of the software). This downloads `biomcmc-lib` into `${srcdir}/submodules/biomcmc-lib`.
2. Download `biomcmc-lib` independently into `${srcdir}/biomcmc-lib`, or somewhere else and then link it there. This is how I do it, since I use the same directory for several projects that rely on it.

If you opted for (1) above (i.e. you just cloned the repository recursively), then the configuration snippet below creates the symbolic link described in option (2).

```bash
AC_CHECK_FILE([${srcdir}/biomcmc-lib],[],[ln -s submodules/biomcmc-lib ${srcdir}/biomcmc-lib])
AC_CHECK_FILE([${srcdir}/biomcmc-lib/configure.ac],[], [AC_MSG_ERROR(["biomcmc-lib submodule missing, please git clone --recursive or link by hand to location of source code"])])
dnl Call biomcmc-lib ./configure script recursively.
AC_CONFIG_SUBDIRS([biomcmc-lib])
AC_SUBST([BIOMCMCLIB], [biomcmc-lib])
```

The file `kalign/Makefile.am` then contains the following, which creates a static local library. This library will encapsulate `biomcmc-lib` and will be available to the main software as `libkalign.la`:
```bash
AM_CPPFLAGS = $(GTKDEPS_CFLAGS)  -I$(srcdir)/../@BIOMCMCLIB@/lib  @OPENMP_CPPFLAGS@ @ZLIB_LIBS@
AM_CFLAGS = @SIMD_FLAGS@ @AM_CFLAGS@  @OPENMP_CFLAGS@
LDADD = $(GTKDEPS_LIBS) @ZLIB_LIBS@ ../biomcmc-lib/lib/libbiomcmc.la  $(AM_LDFLAGS)

common_headers = kalign.h \
tldevel.h rng.h global.h \
alignment_parameters.h \
bisectingKmeans.h \
sequence_distance.h \
alignment.h bpm.h

common_src = run_kalign.c \
tldevel.c rng.c \
alignment_parameters.c \
bisectingKmeans.c \
sequence_distance.c \
alignment.c bpm.c

noinst_LTLIBRARIES = libkalign.la   ## noinst_LT: linked statically (not installed globally)
libkalign_la_SOURCES = config.h $(common_headers) $(common_src) 
```

And the `src/Makefile.am` (with the final software) can be something like:
```bash
AM_CPPFLAGS = $(GTKDEPS_CFLAGS) -I$(srcdir)/../kalign -I$(srcdir)/../@BIOMCMCLIB@/lib @OPENMP_CPPFLAGS@  @ZLIB_LIBS@ 
AM_CFLAGS = @AM_CFLAGS@ @OPENMP_CFLAGS@ @CHECK_CFLAGS@
LDADD = $(GTKDEPS_LIBS) @CHECK_LIBS@  ../kalign/libkalign.la ../biomcmc-lib/lib/libbiomcmc.la @ZLIB_LIBS@  $(AM_LDFLAGS)

bin_PROGRAMS = cumaru 
cumaru_SOURCES = main.c kseq.h
cumaru_LDADD = $(LDADD)
```
Notice that we need to include the path to the local `biomcmc-lib` as well, since it is also statically linked.

The code below is a minimal program that performs multiple sequence alignment from a FASTA file (notice that the FASTA reading comes from `kseq.h`). The code does not run in this notebook, BTW: the compiler cannot find `kalign.h`, as the error output after the code shows.

```c
//%cflags:-lm
//%cflags: -I/usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/biomcmc-lib/lib
//%cflags: -I/usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/build/biomcmc-lib/lib
//%cflags: /usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/build/biomcmc-lib/lib/.libs/libbiomcmc.a
#include <kalign.h>
#include "kseq.h"
KSEQ_INIT(gzFile, gzread)

int
main (int argc, char **argv)
{
  int i;
  clock_t time0, time1;
  char_vector seqname = new_char_vector (1);
  char_vector dna = new_char_vector (1);
  char_vector align = NULL;

  time0 = clock ();
  arg_parameters params = get_parameters_from_argv (argc, argv);

  /* read all sequences (and their names) from the possibly gzipped fasta file */
  gzFile fp = gzopen((char*) params.fasta->filename[0], "r");
  kseq_t *seq = kseq_init(fp);
  while ((i = kseq_read(seq)) >= 0) {
    char_vector_add_string (seqname, seq->name.s);
    char_vector_add_string (dna, seq->seq.s);
  }
  kseq_destroy(seq);
  gzclose(fp);
  time1 = clock ();
  fprintf (stderr, "read : %lf\n",  (double)(time1-time0)/(double)(CLOCKS_PER_SEC));
  fflush(stderr);

  /* align the sequences and print the alignment in fasta format */
  align = kalign3_from_char_vector (dna);
  for (i= 0; i < align->nstrings; i++) printf (">%s\n%s\n", seqname->string[i], align->string[i]);

  del_char_vector (dna);
  del_char_vector (align);
  del_char_vector (seqname);
  del_arg_parameters (params);
  return EXIT_SUCCESS;
}
```

```
/tmp/tmpscuivx3u.c:5:10: fatal error: kalign.h: No such file or directory
 #include <kalign.h>
          ^~~~~~~~~~
compilation terminated.
[C kernel] GCC exited with code 1, the executable will not be executed
```