New statistical functions and testing framework
New functions:

* percentile_cont
* percentile_disc
* stat_mode

Fixes:

* median now returns correct value for even number of rows

Tests for:

* corr
* median
* percentile_cont
* percentile_disc
* stat_mode
Peter Kuma committed Feb 2, 2015
1 parent 0f9be00 commit 1240be9
Showing 34 changed files with 15,990 additions and 218 deletions.
13 changes: 11 additions & 2 deletions Makefile.am
@@ -1,5 +1,5 @@
ACLOCAL_AMFLAGS = -I m4
SUBDIRS = src
SUBDIRS = src test
EXTRA_DIST = load.sql unload.sql
BUILT_SOURCES = load.sql unload.sql
CLEANFILES = load.sql unload.sql
@@ -12,4 +12,13 @@ load.sql:
unload.sql:
sh unload.sql.sh $(enable_functions) > unload.sql

.PHONY: load.sql unload.sql
load:
mysql $(MYSQL_OPTIONS) < load.sql

unload:
mysql $(MYSQL_OPTIONS) < unload.sql

test test_prepare test_clean: all
cd test && $(MAKE) $(AM_MAKEFLAGS) $@

.PHONY: load.sql unload.sql load unload test test_prepare test_clean
18 changes: 16 additions & 2 deletions Makefile.in
@@ -144,6 +144,10 @@ CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPP = @CPP@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
@@ -207,6 +211,7 @@ abs_top_builddir = @abs_top_builddir@
abs_top_srcdir = @abs_top_srcdir@
ac_ct_AR = @ac_ct_AR@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
@@ -255,7 +260,7 @@ top_build_prefix = @top_build_prefix@
top_builddir = @top_builddir@
top_srcdir = @top_srcdir@
ACLOCAL_AMFLAGS = -I m4
SUBDIRS = src
SUBDIRS = src test
EXTRA_DIST = load.sql unload.sql
BUILT_SOURCES = load.sql unload.sql
CLEANFILES = load.sql unload.sql
@@ -780,7 +785,16 @@ load.sql:
unload.sql:
sh unload.sql.sh $(enable_functions) > unload.sql

.PHONY: load.sql unload.sql
load:
mysql $(MYSQL_OPTIONS) < load.sql

unload:
mysql $(MYSQL_OPTIONS) < unload.sql

test test_prepare test_clean: all
cd test && $(MAKE) $(AM_MAKEFLAGS) $@

.PHONY: load.sql unload.sql load unload test test_prepare test_clean

# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
77 changes: 76 additions & 1 deletion README.md
@@ -33,7 +33,7 @@ option:
./configure --enable-functions="<list-of-functions>"
```

where `<list-of-functions>` is a space-separated list of function names.
where `<list-of-functions>` is a list of function names separated by spaces.

### Uninstall

@@ -109,6 +109,25 @@ int lessavg(double m);
mysql> SELECT lessavg(double m) from t1;
```


Calculate continuous percentile. Returns the value at a relative position
specified by the fraction, interpolating between input values if needed.
```
double percentile_cont(double x, double fraction);
mysql> SELECT percentile_cont(x, 0.5) from t1;
```
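
As a rough illustration of the interpolation described above, the following Python sketch follows the usual SQL `PERCENTILE_CONT` rule (linear interpolation between the two nearest sorted values). That the UDF uses exactly this rule is an assumption; the helper name `percentile_cont_sketch` is hypothetical and not part of udf_infusion.

```python
import math


def percentile_cont_sketch(values, fraction):
    """Hypothetical reference for a continuous percentile.

    Assumes the relative position maps linearly onto the sorted inputs,
    with 0.0 at the smallest value and 1.0 at the largest.
    """
    if not values:
        return None  # assumption: no input rows yields NULL
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Zero-based fractional position within the sorted inputs.
    pos = fraction * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = int(math.ceil(pos))
    # Interpolate between the two neighbouring values.
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
```

With the inputs 1, 2, 3, 4 and a fraction of 0.5, the position falls halfway between 2 and 3, so the sketch returns 2.5.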


Calculate discrete percentile. Returns the first input value whose relative
position is greater than or equal to the specified fraction.
```
double percentile_disc(double x, double fraction);
mysql> SELECT percentile_disc(x, 0.5) from t1;
```
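
The rule above can be sketched in Python as follows. The sketch assumes that the relative position of the i-th sorted value (counting from 1) among n inputs is i/n, as in the usual SQL `PERCENTILE_DISC` definition; whether the UDF defines the position the same way is an assumption. The helper name `percentile_disc_sketch` is hypothetical.

```python
def percentile_disc_sketch(values, fraction):
    """Hypothetical reference for a discrete percentile.

    Returns the first sorted input whose relative position i/n
    (i counted from 1) is greater than or equal to the fraction.
    """
    if not values:
        return None  # assumption: no input rows yields NULL
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    for i, value in enumerate(ordered, 1):
        if float(i) / n >= fraction:
            return value
    return ordered[-1]  # guard against floating-point rounding at fraction 1.0
```

For the inputs 1, 2, 3, 4 and a fraction of 0.5, this returns 2, which is an actual input value, whereas interpolating between 2 and 3 would give 2.5.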


Calculates the 3rd statistical moment of a data set: skewness
See: http://geography.uoregon.edu/geogr/topics/moments.htm
```
@@ -117,6 +136,15 @@ double skewness(double m);
mysql> SELECT skewness(double m) from t1;
```


Find statistical mode, i.e. the most frequent input value.
```
double stat_mode(double x);
mysql> SELECT stat_mode(x) from t1;
```
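
A minimal Python sketch of the mode, for comparison when checking results by hand. The text above does not say which value is returned when several values are equally frequent; the sketch assumes the smallest one, and that choice is an assumption. The helper name `stat_mode_sketch` is hypothetical.

```python
from collections import Counter


def stat_mode_sketch(values):
    """Hypothetical reference for the statistical mode."""
    if not values:
        return None  # assumption: no input rows yields NULL
    counts = Counter(values)
    highest = max(counts.values())
    # Assumption: ties are broken by returning the smallest value.
    return min(v for v, c in counts.items() if c == highest)
```

For the inputs 1, 2, 2, 3 the sketch returns 2, the only value that occurs twice.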


Calculates the 4th statistical moment of a data set: kurtosis
See: http://geography.uoregon.edu/geogr/topics/moments.htm
```
@@ -356,6 +384,53 @@ mysql> SELECT SETINT(4283942, 4, 8, 10);
1 row in set (0.00 sec)
```

Testing
=======

udf_infusion contains a set of unit tests to verify the correctness
of the provided UDF functions. Running them after installation is optional.

Prerequisites:

* Python 2.7
* [numpy](http://www.numpy.org/)
* [scipy](http://scipy.org/)

**Note**: The testing framework requires all UDF functions to be enabled
during installation.

First, it is recommended that you set the connection details (including the
password) in `~/.my.cnf`, e.g.:

```
[client]
user=<user>
password=<password>
```

Alternatively, you can set options to be passed to the MySQL client
in the `MYSQL_OPTIONS` environment variable.

To prepare the testing environment (requires administrator rights in MySQL):

```
make test_prepare
```

This creates the database `udf_infusion_test` and populates it with generated
sample data, which may take a while.

Run tests with:

```
make test
```

After completion, the temporary database can be dropped with `test_clean`:

```
make test_clean
```

License
======
78 changes: 78 additions & 0 deletions config.h
@@ -0,0 +1,78 @@
/* config.h. Generated from config.h.in by configure. */
/* config.h.in. Generated from configure.ac by autoheader. */

/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1

/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1

/* Define to 1 if you have the <limits.h> header file. */
#define HAVE_LIMITS_H 1

/* Define to 1 if you have the <memory.h> header file. */
#define HAVE_MEMORY_H 1

/* Define to 1 if MySQL libraries are available */
#define HAVE_MYSQL 1

/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1

/* Define to 1 if you have the <stdlib.h> header file. */
#define HAVE_STDLIB_H 1

/* Define to 1 if you have the <strings.h> header file. */
#define HAVE_STRINGS_H 1

/* Define to 1 if you have the <string.h> header file. */
#define HAVE_STRING_H 1

/* Define to 1 if you have the <syslimits.h> header file. */
/* #undef HAVE_SYSLIMITS_H */

/* Define to 1 if you have the <sys/stat.h> header file. */
#define HAVE_SYS_STAT_H 1

/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1

/* Define to 1 if you have the <unistd.h> header file. */
#define HAVE_UNISTD_H 1

/* Define to the sub-directory in which libtool stores uninstalled libraries.
*/
#define LT_OBJDIR ".libs/"

/* Name of package */
#define PACKAGE "lib_udf_infusion"

/* Define to the address where bug reports for this package should be sent. */
#define PACKAGE_BUGREPORT "robert@xarg.org"

/* Define to the full name of this package. */
#define PACKAGE_NAME "lib_udf_infusion"

/* Define to the full name and version of this package. */
#define PACKAGE_STRING "lib_udf_infusion 1.0"

/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "lib_udf_infusion"

/* Define to the home page for this package. */
#define PACKAGE_URL ""

/* Define to the version of this package. */
#define PACKAGE_VERSION "1.0"

/* Define to 1 if you have the ANSI C header files. */
#define STDC_HEADERS 1

/* Version number of package */
#define VERSION "1.0"

/* Define to empty if `const' does not conform to ANSI C. */
/* #undef const */

/* Define to `unsigned int' if <sys/types.h> does not define. */
/* #undef size_t */
