Packagesearch: module to scan Stata .do files and identify SSC packages used by the code

Installation

To install, type the following command into Stata.

net install packagesearch, from("https://labordynamicsinstitute.github.io/Statapackagesearch/")

Syntax: (also available in the help file)


      help packagesearch                                              (SJX-X: dmXXXX)
      -------------------------------------------------------------------------------
      
      Title
      
          packagesearch -- Module to search Stata code for the SSC packages used by
              the code
      
      Description
      
          packagesearch provides a tool that scans, parses, and matches all Stata
          .do files in a directory (and its subdirectories) against a list of all
          packages currently hosted at SSC. It outputs a list of candidate SSC
          packages that were (likely) used when code is run.
      
      Syntax
      
              packagesearch , codedir(directorytoscan) [details domain(domain) filesave
                      excelsave nodropfalsepos installfounds]
      
      
      Options
      
          codedir(directorytoscan) is required. It specifies the directory that
              contains the .do files to be scanned for SSC packages.
          details will preserve the list of keywords that triggered the package match. 
              By default, only the count of such keywords is output.

          domain(domain) optionally specifies a domain from which to take
              statistics to help identify likely packages (by default, ssc hot is
              used). The only domain currently available is econ.
      
          filesave outputs a list of all files that were parsed during the scanning
              process.
      
          excelsave saves the results of the scan into an Excel spreadsheet titled
              candidatepackages.xlsx. This file is saved in the specified
              directorytoscan and will include a list of parsed programs if
              filesave is also indicated as an option.
      
          nodropfalsepos disables the removal of known false positives. By default,
              the command removes packages that were frequently found to be false
              positives during beta testing. Presently, these are: white, missing,
              index, dash, title, cluster, pre, bys.
      
          installfounds installs all SSC packages found during the scanning process
              into the current working directory.
      

Description:

The code first builds a list of package names: either all packages hosted at SSC, collected with the whatshot command, or a list of SSC packages commonly used in economics research (if option domain(econ) is specified).
Next, it identifies all .do files in the specified codedir directory and its subdirectories, then parses each .do file into individual words using the txttool command. Finally, it matches those words against the package list and outputs a list of candidate packages that were (likely) used when the Stata code was run.
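
As a rough illustration of the three steps described above, the scan, split, and match idea can be sketched in a few lines of shell. This is not the package's Stata implementation: the function name candidate_packages and the plain-text package list (one name per line) are hypothetical stand-ins for the SSC list and for the parsing done by txttool.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; packagesearch itself is written in Stata.

# candidate_packages CODEDIR PKGLIST
#   CODEDIR: directory searched recursively for .do files
#   PKGLIST: text file with one package name per line (assumed stand-in
#            for the list of SSC packages)
# Prints "count name" for every package name that occurs as a word in the code.
candidate_packages() {
  local codedir=$1 pkglist=$2
  # 1. collect the contents of all .do files under CODEDIR
  find "$codedir" -type f -name '*.do' -exec cat {} + |
    # 2. split the code into one word per line
    tr -cs 'A-Za-z0-9_' '\n' |
    # 3. keep only words that exactly equal a package name
    grep -Fx -f "$pkglist" |
    # 4. count how often each matched name occurs
    sort | uniq -c |
    awk '{print $1, $2}'
}
```

A call such as `candidate_packages ./code pkgs.txt` would list each matched name with its word count. Because the match is made word by word, such a sketch cannot tell a package name from an ordinary token like cluster or bys, which is consistent with the false positives listed under nodropfalsepos.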

Testing

The GitHub repository includes a few files for testing the package. To run the tests, first clone the repository:

GITURL=https://github.com/labordynamicsinstitute/Statapackagesearch/
git clone $GITURL
cd Statapackagesearch

Then, if you have Stata installed, run:

./test/run.sh

If you do not have Stata installed but have access to a Stata license (e.g., on GitHub Codespaces with the proper setup), run the tests in Docker instead:

echo "$STATA_LIC_BASE64" | base64 -d > stata.lic
docker run -it --rm \
   -v "$(pwd)/stata.lic":/usr/local/stata/stata.lic \
   -v "$(pwd)":/project \
   -w /project \
   --entrypoint /bin/bash dataeditors/stata17:2023-05-16 \
   ./test/run.sh

Questions?

Contact:
Lydia Reiner (lr397@cornell.edu)
Lars Vilhuber (lars.vilhuber@cornell.edu)
