Quickref

Tab_calc program Alexey A. Mints

Quickref

Usage: tab_calc [-c command] [-f filename] 
       [-m value] [-n column_number] [-o mode]
       [-t steps] [-x column_number]
       [-M value]
       [-hsQVv]
Options:
  -c command          command [avg|sum|max|min|med|med_q|smooth|hist|hist2|fit|root|...]
  -f filename         input file name
  -h                  print this help
  -i ini-filename     name of ini-file
  -m value            minimum range value [data minimum]
  -n column_number    number of columns in the file [autodetect]
  -o mode             histogram mode (0 - absolute values, 
                                      1 - normalized to maximum value,
                                      2 - normalized to total)
  -s                  print result only for one column
  -t steps            number of bins for a histogram/smoothing range [10]
  -thr threshold      threshold for root-finder
  -v                  print version & author information
  -x column_number    number of column to proceed
  -M value            maximum range value [data maximum]
  -Q                  quiet mode: hide information messages
  -V                  verbose mode: print all information messages

User manual

Introduction

This is a simple tool to perform various statistical and mathematical operations on single ASCII data-files. The entire file is read into an array, which is then analyzed. Number of columns is detected automatically. Some support for Vizier-format datafiles is also provided.

Data size is limited to 50 columns and 300.000 rows. Comment lines are allowed, starting with ';' symbol (this can be changed in ini-file COMMENT option)

Installation

One can download pre-compiled binaries or the source code.

Compiling

Before doing this, check Makefile for the following:

modify INSTALL_PATH
check that fortran compiler, that You are using is stated in F90=

Then type make and enjoy! If You want VERSION and USAGE files to be included into binary, type make inline, but this work only in Linux system, because Fortran_wrap_print.sh works only there.

The code is compatible with the following compilers (maybe more, but not tested):

Windows

g95 (GCC 4.0.2 and later) for MinGW

LINUX

g95
gfortran

Installation of addons

"Addons" contain additional command-line parameters, commands and program variables. Adding them needs re-compilation, so You will need sources. To install addon, just copy .param .command and .vars files into appropriate directory (commands, params, variables) and re-run make

Install directory

This directory can be specified in Makefile as INSTALL_PATH variable. Here must stay the default ini-file, as well as VERSION, MANUAL, USAGE files. For pre-compiled files it is not defined, so the code will look for the ini-file in the current folder. (On Linux) One can also add to .bashrc:

alias tab_calc='tab_calc --ini /path/to/ini/file'

Avaliable commands

Some of operations are predefined, although the system can be easily extended by adding more operation file into 'commands' folder (see Developer manual for details)

Command	Description	Supported number of columns	Grouping support	Binning mode
avg	Get average values	1-any	yes	no
count	Count lines/values	1-any	yes	no
dis	Value dispersion	1-any	yes	no
dis_med	Dispersion w.r.t. Median value	1-any	no	no
distr	Value distribution	2	no	yes
distr2d	2D distribution	3	no	yes
distre	Value distribution with dispersions	2	no	yes
fit	Perform simple linear fit for the data	2	no	no
fit_ext	Linear fit with extended output	2	no	no
fit_filter	Linear fit with outliers rejection	2	no	no
fit_log	Logarithmic fit	2	no	no
fit_log_filter	Logarithmic fit with outliers rejection (absolute value of maximum error given)	2	no	no
fit_log_procent_filter	Logarithmic fit with outliers rejection (percentage of maximum error given)	2	no	no
fit_power	Logarithmic fit (maximum likelihood method)	2	no	no
gmean	Get geometric mean from values in column	1-any	yes	no
grid	Make equal-spaced grid from table function	2	no	yes
help	Print help and quit	not needed	no	no
hist	Make dimple histogram	1	no	yes
hist2	Make local density distribution	1	no	yes
histl	Make histogram with logarithmic bins	1	no	yes
hmean	Get harmonic mean from values in column	1-any	yes	no
int	Integrate table function	2	no	no
int_avg	Make average for a given table function	2-any	no	no
many	Various statistical information on column(s)	1-any	no	no
max	Find maximum values in columns	1-any	yes	no
med	Find medians for columns	1-any	no	no
med_q	Find medians and quartiles for columns	1-any	no	no
min	Find minimum values in columns	1-any	yes	no
none	Do nothing (pipe data to output)	1-any	no	no
norm	Normalize data (set maximum value to 1)	1-any	no	no
norm_avg	Normalize data (set average value to 1)	1-any	no	no
norm_max	Same as norm	1-any	no	no
norm_min	Normalize data (set minimum value to 1)	1-any	no	no
path	Write install path end quit	not needed	no	no
polar	Convert X,Y (and, optionaly, VX and VY) from cartesian to polar coordinates	2 or 4	no	no
rel_dis	Relative dispersion	1-any	yes	no
root	Find root of f(x)=t equation	2	no	no
shift_center	Shift columns, so that the average value is zero	1-any	no	no
smooth	Smooth table function with a given window	2	no	no
sort	Sort one column	1	no	no
sort_by	Sort table by one column	1	no	no
sort_line	Sort data in lines	not needed	no	no
sqrt	Take square root from the sum of the squares of columns	1-any	no	no
stat	Gets both average and dispersion of columns	1-any	no	no
sum	Sum data in columns	1-any	yes	no
sys	Print system information and quit	not needed	no	no
tertia	Find first and second tertias	1-any	no	no
xhist	Experimental histogram	1	no	yes

Some more comments:

distr - makes distribution of values. Two columns must be stated with -x option. Average values for the 2nd column values grouped in bins of the first one are calculated.
distre - same as distr command, but errors are estimated as well
distr2 - same as distr command, but makes a two-dimensional distribution (thus, three columns be stated with -x option.
fit - performs linear fit to the data
fit_ext- performs linear fit to the data, and output various information, depending on the mode (specified by -o option):

fit parameters (a & b)
x, y, fitted y, fit error (this is still under development)

hist - prints histogram information on key column. Number of bins is defined by -t option (10 by default). Three modes are avaliable (specified by -o option):

Absolute values are computed (default)
Relative values are computed (maximum value is 1)
Relative values are computed (total is 1) Lower and upper limits can be set by -m and -M options [Data minimum and data maximum are taken by default]

hist2 - additional version of histogram making: neighbour density is calculated for each particle (neighbour number can be set by -t option).
int - integrates (1-st order) function. X and Y columns must be set
int_avg - integrates (1-st order) function, then devides the result by the length of integration interval. X and Y columns must be set.
many - several sub-commands can be defined via -S option. Then for every column results of these sub-commands are printed. Implemented sub-commands are avg, min, max, dis, med, gmean, hmean.
med - gives median column value. NOTE: if -s option is not set, then median values for each column is evaluated.
rel_dis - finds relatve dispersion (dis/avg)
root - finds values from xcol_add(1), where xcol_add(2) equals to 0 (or another value, set by -thr option). Linear interpolation between points assumed at the moment.
sort - performs sorting of a given column (-x)
smooth - performs simple n-point smoothing on key column. n can be set by -t option
sqrt - returns sqare root of columns from -x option (from 1 to 3 columns)

Command line options

Option	Explanation [Default value]
-c command	specify command
-f filename	input file name [$stdin]
-g column_numbers	comma-separated numbers of group-by columns [-]
-h	print help
-i column_numbers	defines list of integer columns [-]
--ini ini-filename	name of ini-file
-m value	minimum range value [data minimum]
-n column_number	number of columns in the file [autodetect]
-o mode	histogram or fit mode
-path	show path to ini-file, help and version files
-s	print result only for one column
-t steps	number of bins for a histogram/smoothing range [10]
-thr threshold	threshold for root-finder [0.0]
-v	print version & author information and exit
-x column_numbers	comma-separated numbers of 'x' columns to proceed [all]
-xi column_numbers	comma-separated numbers of columns to ignore []
-y column_numbers	comma-separated numbers of 'y' columns to proceed []
-M value	maximum range value [data maximum]
-Q	quiet mode: hide information messages
-S column_numbers	sub-commands, used in 'many' command []
--sys	prints system information, such as maximum input file sizes etc.
-V	verbose mode: print additional information messages
-comment char	specify comment char [from ini-file]

Ini-file

The ini file contains some commonly-used options. As in any ini-file options are set as follows:

OPTION=value

Warning! this feature is case-sensitive

As well, comments are possible, starting with '#' symbol. If some command-line option is omitted, it's value is taken from ini-file. If it is not present even in ini-file, some default value is taken, but this values can vary from version to version.

By default, ini-file is located in the install directory and named tab_calc.ini, but this can be overriden by -i option

Here's the list of supported ini-options:

Ini-option	command-line option	default value
COMMAND	-c	help
XCOL	-x	1
DATA_FILE	-f	''
COMMENT	-comment	;
REAL_FORMAT		'*'
MIN_VALUE	-m	autodetect
MAX_VALUE	-M	autodetect
THRESHOLD	-thr	0.0
STEP_NUMBER	-t	10

Histogram

Histogram command (-c hist) allows plotting 1D distributions. Important values are:

lower end of histogram
upper end of histogram
number of steps
mode

by default data minimum and maximum are taken as a range (this can be also set with -m and -M options, as well as through .ini-file). Min Step1 Step2 Stepi-1 Max | Bin1 | Bin2 | Bin3 | BinI | +------+------+------+------+

Three modes are avaliable (specified by -o option):

Absolute values are computed (default)
Relative values are computed (maximum value is 1)
Relative values are computed (total is 1)

Group by

If the group by columns are specified (by -g option), then the results are grouped by unique combinations of values in theese columns. In this case only one column can be specified in -x option.

Commands supported at the moment: dis, max, min, rel_dis

Tips & Examples

For some operations one or more key columns can be set with -x key.

Setting -s key leads to printing information only on the key column(s). Whole data line will be printed otherwise. For example,

 tab_calc -c max -x 1 -s

will print maximum value in the first column, and

 tab_calc -c max -x 1

will print whole table line, containing maximum value in the first column.

Number of columns is now detected automatically, although you still can set it explicitly by -n option (with unpredictable results)

 tab_calc -c hist -t 20 -f test.dat -x 2

This command will print the distribution of values in 2nd column of file "test.dat", in 20 bins.

Developer manual

Program structure

Program can be devided in three main parts:

1. Reading input parameters

Intrinsic subroutine GetArg is used for obtaining command-line options. Intrinsic function iargc is used for obtaining the number of command-line options. If theese two functions are not supported by Your compiler - shame on You 8)

First step is to check, wether -i option is set. If so, then get ini-file name. If not, then the default name tab_calc.ini is taken. Default ini-file must be located in INSTALL_PATH.

Then ini-file is opened, and all values are read from it. If some of the standard values is not present, the default one is taken.

Next step is a cycle over all parameters, using 'case' to define appropriate reactions. If You want to define a new parameter(s), You can do it with addind new file into params folder. File must be like this:

    case('-m')
      range_min=Char2Real(sKeyValue)

this file will be included into tab_calc.f90 during compilation. Not, that sKeyValue is the corresponding option value (like -m sKeyValue). If You don't use it, add

   i = i - 1

to Your file... This is done, because most of options in this program need values. If You need more, than one value for the option, there're two solutions:

Get this value(s) with GetArg subroutine(s), but don't forget to increase i in this case.
Support comma-separated values. String array is a good way to do this, like in -x option.

2. Reading data from file

First step is to define input unit. If filename (file name, defined with -f option or in ini-file is empty, then unit_id = 5, i.e. $stdin. Otherwise, first unused unit is detected, opened and assigned to unit_id variable.

If file is not $stdin, it should be opened, as ASCII file with status="UNKNOWN". Then first non-commentary line is read. After transformation (replace TABs with SPACEs, remove double SPACEs, adjust left), number_of_spaces + 1 is counted - that gives the number of columns.

The following lines are supposed to have the same number of columns, as the first one. If this is not the case, "bad" lines are simply skipped (if "verbose" mode is on, warning is shown)

3. Processing data and printing the results

Data processing is controlled by "command" parameter via simple case construction. As results can be significantly different from each other, printing the results is performed in each case separately. At the moment formatted output is not supported.

Important variables and constants

variable	type	description
data_table	real (2-dim array)	contains all data read from the file
rownum	integer	number of rows in data_table
column	integer	number of columns in data_table
xcol_add	integer (array)	mask for columns to process
xcol_num	integer	number of columns to process
sInstallPath	character	INSTALL_PATH
temp_values	real (array)	temporary values (for any use in data processing)
verbose	logical	verbose flag
cComment	character*(1)	comment char (defined in array_works.f90)
MAX_COLUMN	integer constant	maximum number of columns in file (defined in array_works.f90)
MAX_ROW	integer constant	maximum number of rows in file (defined in array_works.f90)
MAX_STEPS	integer constant	maximum number of steps in histogram/smoothing range
XCOL_NULL	integer constant	Value for xcol_add(), treated as "not set"

Writing addons

The purpose of addons is to provide some new functionality to the program, without modifying the source code itself. This allows You to use them, even if main modules of the program will be replaced with newer versions. I'll do my best to provide full compatibility of addons for different versions.

There are 3 ways to extend the program. You can add:

new variables, are stored in files in variables directory with .vars extension. They should contain definitions of variables, just as in standart programs. All predefined constants can be used as well.
new command-line parameters, are stored in files in params directory with .param extension. They should be organized just like standart command-line parameters, as it is shown in Chapter 1 of "Developer manual".

Special modules

ini_file.f90

This is a special module for dealing with .ini-files, that contain some global program settings. Provides an easy way to accsess values of different types by thier names.

string_Utils.f90

TODO list

Better sorting (2-ways, global sort)
extremum search
total min/max search
better error checking
non-numeric column grouping
better formatted output support

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
bin		bin
commands		commands
doc		doc
params		params
presentation		presentation
src		src
test		test
tests		tests
variables		variables
.hgignore		.hgignore
Makefile		Makefile
README.md		README.md
VERSION		VERSION
build		build
bundle.sh		bundle.sh
commands.list		commands.list
get_commands.py		get_commands.py
make_vers.py		make_vers.py
make_vers.rb		make_vers.rb

minzastro/tab_calc

Folders and files

Latest commit

History

Repository files navigation

Quickref

User manual

Introduction

Installation

Compiling

Installation of addons

Install directory

Avaliable commands

Command line options

Ini-file

Histogram

Group by

Tips & Examples

Developer manual

Program structure

1. Reading input parameters

2. Reading data from file

3. Processing data and printing the results

Important variables and constants

Writing addons

Special modules

ini_file.f90

string_Utils.f90

TODO list

About

Topics

Resources

Stars

Watchers

Forks

Languages