These scripts use the Windows findstr command-line tool, plus some Python processing, to search for terms across multiple documentation repositories. (The repositories are assumed to use the metadata conventions of docs.microsoft.com.)
To use this tool:
- Make sure you have Python 3 installed. You can download it from https://www.python.org/downloads.
- Run `pip install -r requirements.txt` to install the required libraries. (To use a virtual environment instead of your global environment, run `python -m venv .env` and then `.env\scripts\activate` before running `pip install`.)
- Modify `folders.txt` to list the local folders you want to search, along with a base URL for each publish target. Each line contains a docset name, the local path (without "*.md", which is appended automatically), and a base URL, separated by any amount of whitespace. In the output, each file's URL is generated by replacing the folder in the file path with the base URL, removing ".md", and changing `\` to `/`, as suits Microsoft documentation platforms.
- Modify `terms.txt` to list the terms you want to search for; matching is case-insensitive. Each line holds one term, which can include spaces and regular expressions (as supported by findstr). (If you need a case-sensitive search, remove `/I` from the findstr command line in `take-inventory.py`.)
- Run `python take-inventory.py`. Output is written to `results_<date>_<random_int>.csv` and `results_<date>_<random_int>-with-metadata.csv`. The latter also includes various metadata values extracted from the matching files (see `extract-metadata.py`, which is invoked at the end of `take-inventory.py`).
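The `folders.txt` step above describes how each output URL is derived from a local file path. The following is a minimal sketch of that parsing and mapping, not the code in `take-inventory.py`: the function names `parse_folders` and `file_to_url` are hypothetical, and the sketch assumes exactly three whitespace-separated fields per line (so local paths cannot contain spaces) and that trailing `\` or `/` separators may or may not be present.

```python
from __future__ import annotations


def parse_folders(text: str) -> list[tuple[str, str, str]]:
    """Parse folders.txt content into (docset, local_path, base_url) tuples.

    Hypothetical helper: assumes exactly three whitespace-separated fields
    per line, so local paths must not contain spaces.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()  # any amount of whitespace separates the fields
        if len(parts) != 3:
            raise ValueError(f"Expected 3 fields, got {len(parts)}: {line!r}")
        entries.append((parts[0], parts[1], parts[2]))
    return entries


def file_to_url(file_path: str, folder: str, base_url: str) -> str:
    """Map a local .md file path to its published URL (hypothetical helper)."""
    folder = folder.rstrip("\\")      # tolerate a trailing backslash
    base_url = base_url.rstrip("/")   # tolerate a trailing slash
    if not file_path.startswith(folder):
        raise ValueError(f"{file_path!r} is not under {folder!r}")
    relative = file_path[len(folder):]  # e.g. "\vm\overview.md"
    if relative.endswith(".md"):
        relative = relative[: -len(".md")]  # drop the extension
    return base_url + relative.replace("\\", "/")  # backslashes -> slashes
```

For example, `file_to_url(r"C:\docs\azure\vm\overview.md", r"C:\docs\azure", "https://docs.microsoft.com/azure")` returns `"https://docs.microsoft.com/azure/vm/overview"`. The sketch replaces only the leading folder prefix rather than every occurrence of the folder string, so a path segment that happens to repeat the folder name is left alone.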
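The `terms.txt` step above mentions the `/I` switch on the findstr command line. As an illustration, here is a minimal sketch of reading terms and building one findstr invocation per term and folder. Apart from `/I`, the switches are assumptions about how such a search could be run, not a copy of what `take-inventory.py` does: `/S` searches subfolders, `/R` treats the search string as a regular expression, and `/C:` keeps a term with spaces together as a single search string. The names `load_terms` and `build_findstr_command` are hypothetical.

```python
from __future__ import annotations


def load_terms(text: str) -> list[str]:
    """Return one search term per non-blank line of terms.txt content."""
    # Inner spaces are kept: a term such as "sign in" is searched as a phrase.
    return [line for line in text.splitlines() if line.strip()]


def build_findstr_command(term: str, folder: str,
                          case_sensitive: bool = False) -> list[str]:
    """Build a findstr argument list that searches *.md files under folder.

    Hypothetical helper; the /S, /R, and /C: choices are assumptions.
    """
    cmd = ["findstr", "/S", "/R"]  # /S: include subfolders, /R: regex
    if not case_sensitive:
        cmd.append("/I")  # /I: case-insensitive matching (the default here)
    cmd.append(f"/C:{term}")  # whole term, spaces included, as one search string
    cmd.append(folder.rstrip("\\") + "\\*.md")  # "*.md" appended automatically
    return cmd
```

On Windows, the resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. findstr exits with code 1 when nothing matches, so a caller should not treat that exit code as a failure (for example, avoid `check=True`).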
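Articles published on docs.microsoft.com carry their metadata in a YAML front-matter block between `---` lines at the top of each Markdown file (keys such as `title`, `ms.author`, and `ms.date`). To show what extracting such values involves, here is a minimal standard-library sketch. It is not the logic of `extract-metadata.py`, which may well use a full YAML parser; the function name `extract_front_matter` is hypothetical, and the sketch reads only flat `key: value` lines, skipping nested values and list items.

```python
from __future__ import annotations


def extract_front_matter(markdown: str) -> dict[str, str]:
    """Return flat key/value pairs from a leading '---' front-matter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front-matter block
        if line[:1] in (" ", "\t", "-"):
            continue  # skip nested values and list items (not handled here)
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Split on the first colon only, so values like "Foo: Bar" survive;
            # surrounding single or double quotes are removed.
            metadata[key.strip()] = value.strip().strip("\"'")
    return metadata
```

Because parsing stops at the closing `---`, colons in the article body (for example, `Note: ...`) are never mistaken for metadata.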
After a run, the `text_results` folder contains intermediate output files from the findstr command line, named in the form `<docset>-<search-term>.txt`. You can delete these once you have the .csv files.
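The intermediate text files hold raw findstr output that ends up as rows in the .csv results. As an illustration of that step, here is a minimal sketch that assumes findstr ran with `/S` and without `/N`, so each output line has the form `<file path>:<matching line>`. The function names and the CSV column names (`docset`, `term`, `file`, `line`) are hypothetical and are not the actual layout produced by `take-inventory.py`.

```python
from __future__ import annotations

import csv
import io


def parse_findstr_line(line: str) -> tuple[str, str]:
    """Split '<path>.md:<matching text>' into (path, text).

    Splitting on the first '.md:' avoids the colon in a drive letter
    such as 'C:'.
    """
    path, sep, text = line.partition(".md:")
    if not sep:
        raise ValueError(f"Unrecognized findstr output line: {line!r}")
    return path + ".md", text


def rows_from_results(docset: str, term: str, output: str) -> list[list[str]]:
    """Turn raw findstr output for one docset and term into CSV rows."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            path, text = parse_findstr_line(line)
            rows.append([docset, term, path, text.strip()])
    return rows


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows with a header; the csv module quotes fields with commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["docset", "term", "file", "line"])  # hypothetical columns
    writer.writerows(rows)
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings matters here, because matched lines of documentation often contain commas and quotes.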
The first time you run a search in a particular folder, the findstr command typically takes a minute or two, depending on the number of files. Subsequent runs are much faster thanks to Windows file system caching, so you can quickly adjust your search terms and run the tool again without a long wait.