gtools-1.5.1

@mcaceresb mcaceresb released this 24 Mar 20:28
· 191 commits to master since this release

Release update. New commands, major features, and various bug fixes. Remember to run gtools, upgrade to keep up to date between major updates.

New Commands

  • gstats winsor is a fast, by-able winsor2 alternative for Winsorizing and trimming data (accepts weights).

  • greshape long and greshape wide are a fast alternative to reshape.

  • greshape spread and greshape gather are analogous to the spread and gather commands from R's tidyr.

  • gstats sum and gstats tab (aliases gstats summarize and gstats tabstat) are fast, by-able alternatives to sum, detail and tabstat.
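
To make the difference between Winsorizing and trimming concrete, here is a small Python sketch of the two operations. It is not gtools code: the function names, the nearest-rank percentile rule, and the 20/80 cut points are all assumptions chosen for illustration, and gstats winsor may compute percentiles differently. Weights and by groups are ignored here.

```python
import math

def percentile(sorted_vals, p):
    """Nearest-rank percentile (an assumed rule, not necessarily Stata's)."""
    n = len(sorted_vals)
    rank = max(1, math.ceil(p * n / 100))
    return sorted_vals[rank - 1]

def winsorize(values, low, high):
    """Clamp values outside the [low, high] percentiles to the cut points."""
    s = sorted(values)
    cut_lo, cut_hi = percentile(s, low), percentile(s, high)
    return [min(max(v, cut_lo), cut_hi) for v in values]

def trim(values, low, high):
    """Set values outside the [low, high] percentiles to None (missing)."""
    s = sorted(values)
    cut_lo, cut_hi = percentile(s, low), percentile(s, high)
    return [v if cut_lo <= v <= cut_hi else None for v in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
winsorized = winsorize(data, 20, 80)  # extremes pulled in to the cut points
trimmed = trim(data, 20, 80)          # extremes dropped (set to missing)
```

Winsorizing keeps every observation but replaces the extremes with the cut values, while trimming marks them as missing.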

Enhancements and Features

  • gstats sum and gstats tab accept the option matasave, which stores the output and by levels in GstatsOutput (custom naming via matasave(name)), an object of class GtoolsResults.

  • gcollapse and gegen now allow the stats:

    • select# and select-#, for the #th smallest or largest value, respectively.
    • rawselect# and rawselect-#, the same as select# and select-# but ignoring weights.
    • cv, coefficient of variation, sd/mean
    • variance
    • range, max - min
  • greshape features

    • Preferred syntax is by() and keys() instead of i() and j(); the docs and most of the printouts reflect this.
    • greshape tries to save variable labels, notes, and characteristics when reshaping.
    • greshape, uselabels allows the user to save the source variable labels as levels instead of their names.
    • greshape supports @ syntax.
    • greshape wide additionally supports varlist syntax (but the same stub cannot have both @ and a varlist).
    • greshape long does not support varlist syntax, but the user can pass regexes as stubs with the option match(regex). See the documentation for details.
  • glevelsof and gtop features

    • glevelsof and gtop both take option matasave (or matasave(name)) to save the variable levels in a mata object (default name is GtoolsByLevels).
    • With option matasave[(name)], r(levels) is not returned; the levels are instead stored in printed, a member of the mata return object (e.g. GtoolsByLevels.printed). The user can save only the raw levels by also adding the silent option.
    • With option matasave[(name)], both gtop, numfmt() and glevelsof, numfmt() do the number formatting in mata, so numfmt() must pass a mata print format instead of a C print format (they are very similar, however).
    • With option matasave[(name)], gtop does not return r(toplevels) either. The frequency table is stored in toplevels as part of the mata return object (e.g. GtoolsByLevels.toplevels).
    • gtop, ntop(.) prints all the levels from largest to smallest; gtop, ntop(-.) prints from smallest to largest; gtop, alpha prints the largest/smallest ntop() levels sorted in variable order (e.g. alphabetically or numerically, depending on the variable type).
    • gtop also stores r(ntop), r(nrows), and r(alpha) as return scalars; if ntop(.) or ntop(-.) are passed, r(ntop) will just be r(J).
    • Both gtop and glevelsof should handle embedded characters better. Printing is still a problem but they get copied to the return values properly.
  • gstats is a general-purpose wrapper for misc functions.

  • lgtools.mlib added with some pre-compiled mata functions.

  • Any function that allows results to be saved in mata also lets the resulting mata object call .desc() to get more info on the object.

  • Faster hash sort with integer bijection (two-pass radix sorts for smaller integers; undocumented option _ctolerance() allows the user to force the regular counting sort).

  • Faster index copy when every observation is read (simply assign the index pointer to st_info->index).
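
The new gcollapse and gegen stats listed above (select#, select-#, cv, variance, range) can be sketched in a few lines of Python. This only illustrates the definitions given in these notes and is not the plugin's implementation. The function names are hypothetical, weights are ignored (so select here behaves like rawselect), and the sample (n - 1) variance is an assumption.

```python
import statistics

def select(values, k):
    """k > 0: the k-th smallest value; k < 0: the |k|-th largest value."""
    if k == 0 or abs(k) > len(values):
        raise ValueError("k out of range")
    s = sorted(values)
    return s[k - 1] if k > 0 else s[k]

def cv(values):
    """Coefficient of variation: sd / mean (sample sd, an assumption)."""
    return statistics.stdev(values) / statistics.mean(values)

def value_range(values):
    """Range: max - min."""
    return max(values) - min(values)

x = [4, 8, 15, 16, 23, 42]
second_smallest = select(x, 2)    # analogous to select2
second_largest = select(x, -2)    # analogous to select-2
spread = value_range(x)
var = statistics.variance(x)      # sample variance
```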

Bug Fixes

  • Stata 14.0 no longer tries to load SPI version 3 (loads version 2).

  • SpookyHash code compiled directly as part of the plugin. Might fix #35 (deleted all ancillary files and code related to spookyhash.dll).

  • gtop, glevelsof, and gcontract parse wildcards before adding any temporary variables, ensuring the latter don't get included in internal function calls.

  • Removed locale as a dependency; comma printing done manually. This fixes a bug where, on certain systems, locale would get reset and cause some internal Stata numbers to interpret decimals via comma, that is, 95.0 would become 95,0 and cause problems down the line.

  • Minor bug fix in gtop; inverted levels were not correctly sorted with weights. The levels themselves were OK, however.

  • gcollapse no longer crashes when rawstat does not match any entries.
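
As an illustration of the locale fix above, printing thousands separators "manually" can be as simple as grouping digits by hand, which never touches the process locale and so cannot change how decimals are parsed elsewhere. The Python sketch below uses a hypothetical function name and handles integers only; the plugin's own code is not shown in these notes and may differ.

```python
def format_with_commas(n):
    """Insert thousands separators into an integer without using the locale."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    # Peel off three digits at a time from the right.
    while len(digits) > 3:
        groups.append(digits[-3:])
        digits = digits[:-3]
    groups.append(digits)
    return sign + ",".join(reversed(groups))

formatted = format_with_commas(1234567)
```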