Codefinder for Stata

(v1.00, 14 Jun 2024)

This repository contains the code required to install and run codefinder, a package that uses multiprocessing, associative arrays and optimised Mata functions to speed up many-to-many string matching in Stata. This can be used to identify the presence of lists of codes (e.g. ICD, SNOMED-CT, Read, Emis, etc) in variables containing data in string format. At present, codefinder has only been tested on Windows (10 and 11).

Installation

The package can be installed from GitHub using net install:

net install codefinder, from("https://raw.githubusercontent.com/jonathanbatty/stata-codefinder/main/installation/") replace
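To check that the installation succeeded, Stata's built-in package tools can be used. This is a minimal sketch, assuming the package is registered under the name codefinder, as in the net install command above:

```stata
* List the files that were installed for the codefinder package.
ado describe codefinder

* Open the package help file to confirm it is on the ado-path.
help codefinder
```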

Syntax

Codefinder should be used with no data open in Stata. The syntax for codefinder is as follows:

codefinder varstosearch, dataset() codefiles() id() [options]

[options] = n_cores() summary

See the help file using help codefinder for full details of each option.

The basic usage is as follows:

codefinder dx*, dataset(".\data\patient_data.dta") codefiles("hypertension.txt diabetes.txt") id(id_var) n_cores(16)

Here, the variables dx* (e.g. dx1, dx2, dx3, ..., dx_n) in patient_data.dta are searched for the diagnosis codes (strings) listed in hypertension.txt and diabetes.txt (one code per line in each file). Each row of data should be identified by a unique identifier, id_var. In this case, codefinder runs the string matching procedure on 16 CPU cores. It returns a dataset in memory containing id_var and, for each text file, a variable indicating whether one or more of that file's codes is present in each original observation (i.e. in any of dx* in this case).
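The workflow described above can be sketched end to end as follows. This is an illustrative outline rather than a tested recipe. The code list contents, the choice of 4 cores and the final merge step are assumptions, and the merge assumes the returned dataset holds one row per id_var. The names of the indicator variables that codefinder creates are not stated here, so the sketch inspects them with describe instead of naming them.

```stata
* Sketch only: file names, codes and core count are illustrative assumptions.

* 1. Write a code list with one code per line
*    (I10 and I11 are ICD-10 codes for hypertensive diseases).
file open fh using "hypertension.txt", write text replace
file write fh "I10" _n
file write fh "I11" _n
file close fh

* 2. Clear memory, since codefinder should be run with no data open.
clear

* 3. Search dx* in the dataset on disk for the listed codes.
codefinder dx*, dataset(".\data\patient_data.dta") codefiles("hypertension.txt") id(id_var) n_cores(4)

* 4. Inspect the returned variables, then attach them to the original data.
*    (Assumes id_var uniquely identifies rows in both datasets.)
describe
merge 1:1 id_var using ".\data\patient_data.dta"
```

Whether matching is exact or prefix-based, and how relative paths in codefiles() are resolved, is not covered here; help codefinder is the place to check.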

Examples

Examples of running codefinder on a simulated, synthetic dataset are provided in ./examples/.

Feedback

Please open an issue to report errors, suggest feature enhancements, and/or make any other requests.

Change Log

v1.01 (16/06/24)

Minor bug fixes: installation now works with a single command.

v1.00 (14/06/24)

Initial release.

Roadmap

Test on Windows / Mac machines.
Improvements in error reporting functionality: workers to flag errors to main Stata instance, which should handle these appropriately.
Further incremental improvements to speed and stability.

Acknowledgements

JB received funding from the Wellcome Trust 4ward North Clinical Research Training Fellowship (227498/Z/23/Z; R127002).

Suggested Citation

Batty, J. A. (2024). Stata package ``codefinder'': efficient many-to-many string searching in Stata using multiprocessing (Version 1.0) [Computer software]. https://github.com/jonathanbatty/stata-codefinder
