Skip to content

Stata wrapper for various cryptographic hashes from OpenSSL

License

Notifications You must be signed in to change notification settings

mcaceresb/stata-shasum

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

shasum

Fast cryptographic hashing in Stata.

This package provides a C wrapper for the hash functions (checksums) in the OpenSSL library, namely MD5, SHA1, SHA224, SHA256, SHA384, and SHA512. It also includes a short wrapper for Python's hashlib functions (python is only available from Stata 16+, but this is specially convenient if you have difficulty compiling the plugin).

version 0.2.2 11Oct2022

Installation

Pre-compiled binaries are provided, but it is possible you will have to compile the plugin yourself (in particular in OSX). If the function fails, see the compiling section below.

local github "https://raw.githubusercontent.com"
net install shasum, from(`github'/mcaceresb/stata-shasum/master/build)
* adoupdate, update
* ado uninstall shasum

Usage

sysuse auto

* Hash string variable.
shasum make, sha1(make_sha1)
shasum make make, sha1(makemake_sha1)

* Hash a mix of string and numeric data. Note that we do not read
* numbers as strings to any particular level of precision in order to
* hash them. Rather, we hash the double-precision 64-bit representation
* of each number.
shasum make price, sha1(makeprice_sha1)

* Use Python's hashlib
pyhash make make,  hashes(sha1) gen(makemake_pysha1)
pyhash make price, hashes(sha1) gen(makeprice_pysha1)
assert makemake_pysha1  == makemake_sha1 
assert makeprice_pysha1 == makeprice_sha1

* Pad strings to be the same length. By default, we concatenate strings.
* So if you want to hash "a" and "hello", you can padd "a" with null
* characters to "a\0\0\0\0" so it is the same length as "hello".
shasum make, sha1(make_sha1_pad) pad

* You can also hash a file
findfile auto.dta
shasum, file(`r(fn)', sha1)
return list
local sha1 = `"`r(sha1)'"'

* or a list of files
clear
set obs 1
findfile auto.dta
gen y = `"`r(fn)'"'
shasum y, sha1(shay) filelist
assert `"`=shay[1]'"' == `"`sha1'"'
l

help shasum

For files, you can pass the path in parts. If variable x contains "folder/" and variable y contains "file.ext", then you can do:

shasum x y, sha1(shay) filelist path(/path/to/folder/)

Note that shasum won't add path delimiters, so they must end in "/" or the file won't be found.

Compiling

To compile, you will need

  • The GNU compiler collection (gcc)
  • git
  • OpenSSL
  • Cygwin (on Windows) with binutils, make, cmake, gcc-core, gcc-g++, mingw64-x86_64-gcc-core, mingw64-x86_64-gcc-g++ and OpenSSL (Cygwin is pretty massive by default; I would install only those packages).

From OpenSSL, you need, in particular, libssl.a and libcrypto.a as well as the headers md5.h and sha.h If you have all this in your path, run

git clone https://github.com/mcaceresb/stata-shasum
cd stata-shasum
make SSL="-lssl -lcrypto"

This only works if you have OpenSSL installed. If you don't (or if you get any errors) then you can compile a static version of OpenSSL. This is fairly straightforward on Linux and OSX by following the instructions here. For non-silicon OSX, for example,

git clone git://git.openssl.org/openssl.git
cd openssl
git checkout OpenSSL_1_1_1-stable
./Configure darwin64-x86_64-cc
export KERNEL_BITS=64
./config
make

Then you can run

cd ../stata-shasum
make LIBPATH=../openssl INCLUDE=-I../openssl/include

In either case, once you have compiled the plugin you need to install (or re-install) the package. From Stata (replacing `c(pwd)' with the path to where you cloned the repo):

cap noi ado uninstall shasum
cap noi net uninstall shasum
net install shasum, from(`c(pwd)'/stata-shasum/build) replace

Notes

  • I compile version 1.1.1 of OpenSSL; this plugin may not be compatible with the latest version 3.0; see issue #2 for more.

  • If you have an older stata version (14.0 or earlier) then compile the Stata plugin using make SPI=2.0.

License

stata-shasum is MIT-licensed