Draft
34 commits
f446ff7
feat(scripts): Add dependency version scanner tool
chalmerlowe Apr 29, 2026
256b048
perf(search): Apply bot suggestions for regex optimization and imports
chalmerlowe Apr 29, 2026
1010399
refactor(benchmark): Use tempfile for unique names and safe cleanup
chalmerlowe Apr 29, 2026
68f61ee
refactor(benchmark): Remove redundant directory check
chalmerlowe Apr 29, 2026
cc960b4
test(integration): Check exit code of subprocess in integration test
chalmerlowe Apr 29, 2026
a4ad9ce
test(unit): Remove redundant and brittle test_regex_patterns
chalmerlowe Apr 29, 2026
2743957
test(unit): Move import yaml to top of file
chalmerlowe Apr 29, 2026
47450bb
refactor(benchmark): Remove redundant directory check in main
chalmerlowe Apr 29, 2026
c777e44
test(unit): Remove duplicate import yaml from function
chalmerlowe Apr 29, 2026
8aab801
feat(version_scanner): handle invalid format strings in config and ad…
chalmerlowe Apr 30, 2026
f63053c
feat(version_scanner): handle PermissionError when reading config fil…
chalmerlowe Apr 30, 2026
2af97b3
feat(version_scanner): extract read_package_file and handle file errors
chalmerlowe Apr 30, 2026
cb29438
refactor(version_scanner): simplify target resolution and remove dupl…
chalmerlowe Apr 30, 2026
ea0e8be
feat(version_scanner): add format_match_for_csv helper and tests
chalmerlowe Apr 30, 2026
a8824af
feat(version_scanner): integrate GitHub link generation into CSV report
chalmerlowe Apr 30, 2026
baafb74
feat(version_scanner): default output to results directory
chalmerlowe Apr 30, 2026
a1cc08e
feat(version_scanner): ignore version_scanner directory during scan
chalmerlowe Apr 30, 2026
3ceea9b
feat(version_scanner): broaden version regex and add case insensitivity
chalmerlowe Apr 30, 2026
d756c07
feat(version_scanner): strip newlines from matched strings
chalmerlowe Apr 30, 2026
075d04b
feat(version_scanner): add word boundaries and truncate long context …
chalmerlowe Apr 30, 2026
85e9ff5
feat(version_scanner): add console summary table
chalmerlowe Apr 30, 2026
5c8f673
feat(version_scanner): add .scannerignore file support
chalmerlowe Apr 30, 2026
efb3331
feat(version_scanner): move ignore defaults to .scannerignore file
chalmerlowe Apr 30, 2026
bf39072
docs(version_scanner): add README.md
chalmerlowe Apr 30, 2026
9d9ce22
docs(version_scanner): update README options and CLI help strings
chalmerlowe Apr 30, 2026
14e4dcc
feat(version_scanner): set default for --github-repo
chalmerlowe Apr 30, 2026
7fc03ca
feat(version_scanner): default config path to script directory
chalmerlowe Apr 30, 2026
f64eac4
feat(version_scanner): support case-insensitive file ignores and add …
chalmerlowe Apr 30, 2026
fc47dd6
feat(version_scanner): update small package list for demos
chalmerlowe Apr 30, 2026
95f6f19
Merge remote-tracking branch 'origin/main' into feat/add-version-scanner
chalmerlowe Apr 30, 2026
761def6
Merge branch 'origin/main' into feat/add-version-scanner
chalmerlowe Apr 30, 2026
9289c8c
feat(version_scanner): add combined_version_string rule and use word …
chalmerlowe Apr 30, 2026
d771258
feat(scanner): add ability to detect ignore pragma
chalmerlowe May 1, 2026
bafae70
feat(scanner): move .scannerignore to script directory and update loo…
chalmerlowe May 1, 2026
2 changes: 2 additions & 0 deletions scripts/version_scanner/.gitignore
@@ -0,0 +1,2 @@
.conductor/
scanner_report.csv
12 changes: 12 additions & 0 deletions scripts/version_scanner/.scannerignore
@@ -0,0 +1,12 @@
# Directories and files to ignore by the version scanner
.git
__pycache__
.tox
.nox
venv
.venv
.conductor
version_scanner
docs
samples
changelog.md
31 changes: 31 additions & 0 deletions scripts/version_scanner/README.md
@@ -0,0 +1,31 @@
# Automated Dependency Version Scanner

This tool scans the repository for hardcoded references to specific dependency versions (like Python 3.7) that need to be upgraded or removed.

## Usage

Run the script from the repository root:

```bash
python3 scripts/version_scanner/version_scanner.py -d <dependency> -v <version> [options]
```

### Options

* `-d`, `--dependency`: Name of the dependency (e.g., `python`, `protobuf`)
* `-v`, `--version`: Specific version to search for (e.g., `3.7`, `4.25.8`)
* `-p`, `--path`: Root directory to scan (defaults to the current directory)
* `--package`: Restrict the scan to a specific subdirectory (useful for monorepos)
* `--package-file`: Path to a file listing the package directories to scan, one per line
* `--config`: Path to the regex configuration file (defaults to `scripts/version_scanner/regex_config.yaml`)
* `-o`, `--output`: Path to the output CSV file (defaults to `<dependency>-<version>-<timestamp>.csv`)
* `--github-repo`: GitHub repository URL base used for links in the CSV report (defaults to `https://github.com/googleapis/google-cloud-python`)
* `--branch`: GitHub branch used for links (defaults to `main`)
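`--github-repo` and `--branch` are combined into per-match links in the CSV report. The sketch below shows one way such a link can be built using GitHub's standard `<repo>/blob/<branch>/<path>#L<line>` URL form. The helper name `github_link` and the example path are hypothetical; the scanner's own link format may differ.

```python
def github_link(repo: str, branch: str, rel_path: str, line: int) -> str:
    """Build a GitHub 'blob' URL pointing at one line of a file.

    Hypothetical helper: illustrates the URL shape, not the scanner's code.
    """
    # Normalize Windows separators and drop a leading slash so the URL is valid.
    path = rel_path.replace("\\", "/").lstrip("/")
    return f"{repo.rstrip('/')}/blob/{branch}/{path}#L{line}"


print(github_link(
    "https://github.com/googleapis/google-cloud-python",
    "main",
    "packages/google-cloud-core/setup.py",  # illustrative path
    42,
))
# https://github.com/googleapis/google-cloud-python/blob/main/packages/google-cloud-core/setup.py#L42
```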

## Configuration

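The scanner uses a YAML configuration file (`regex_config.yaml`) to define rules and regex patterns. Each rule has a `name`, a `description`, an optional `applies_to` list that restricts the rule to specific dependencies (rules without it are not tied to a particular dependency), a list of `examples`, and a list of regex templates under `rules`. Templates use the placeholders `{major}`, `{minor}`, `{minor_minus_one}` and `{minor_plus_one}`, which are filled in from the requested version (for `3.7`, `{minor_minus_one}` becomes `6`).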
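A minimal sketch of the placeholder expansion, assuming the templates are filled with Python's `str.format` (the commit history mentions handling invalid format strings, which suggests this) and that versions look like `MAJOR.MINOR[.PATCH]`. The function names `placeholder_values` and `expand_template` are hypothetical. Under `str.format`, a regex with literal braces such as `\d{2}` must be written `\d{{2}}` in the config.

```python
from typing import Dict


def placeholder_values(version: str) -> Dict[str, str]:
    """Derive template placeholder values from a MAJOR.MINOR[.PATCH] version."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return {
        "major": str(major),
        "minor": str(minor),
        "minor_minus_one": str(minor - 1),
        "minor_plus_one": str(minor + 1),
    }


def expand_template(template: str, version: str) -> str:
    """Fill the placeholders in one rule template."""
    # strip() removes the trailing newline a YAML '|' block scalar keeps.
    return template.strip().format(**placeholder_values(version))


print(expand_template(r"sys\.version_info\s*>\s*\(3,\s*{minor_minus_one}\)", "3.7"))
# sys\.version_info\s*>\s*\(3,\s*6\)
```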

## Ignoring Directories

The scanner reads a `.scannerignore` file from its own directory (`scripts/version_scanner/.scannerignore`) that lists the directories and files to skip, one per line. File-name ignores are matched case-insensitively.
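The following is a minimal sketch of how an ignore list like this can prune a directory walk. It assumes lines starting with `#` are comments and that every entry is compared case-insensitively against bare directory or file names; both are assumptions, and `load_ignore_names` and `iter_files` are hypothetical names, not the scanner's functions.

```python
import os
from typing import Iterator, Set


def load_ignore_names(ignore_file: str) -> Set[str]:
    """Read one name per line, skipping blank lines and '#' comments (assumed)."""
    names: Set[str] = set()
    try:
        with open(ignore_file, encoding="utf-8") as f:
            for line in f:
                entry = line.strip()
                if entry and not entry.startswith("#"):
                    names.add(entry.lower())  # lower-case for case-insensitive matching
    except FileNotFoundError:
        pass  # No ignore file: nothing is skipped.
    return names


def iter_files(root: str, ignored: Set[str]) -> Iterator[str]:
    """Yield file paths under root, skipping ignored directory and file names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d.lower() not in ignored]
        for name in filenames:
            if name.lower() not in ignored:
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place matters for speed: large trees such as `.git` or `.venv` are never visited at all, rather than walked and then filtered.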
166 changes: 166 additions & 0 deletions scripts/version_scanner/benchmark.py
@@ -0,0 +1,166 @@
import argparse
import os
import random
import subprocess
import sys
import tempfile
import time
from typing import List, Dict

def get_package_subset(packages_dir: str, count: int) -> List[str]:
    """
    Get a randomized subset of package names from the specified directory.

    Args:
        packages_dir: Path to the directory containing packages.
        count: Number of packages to return.

    Returns:
        A list of package directory names.
    """
    all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))]

    if count >= len(all_packages):
        return all_packages

    return random.sample(all_packages, count)

def run_benchmark(
    scanner_path: str,
    root_path: str,
    package_file: str,
    dependency: str,
    version: str
) -> float:
    """
    Run the scanner once and return the duration in seconds, or -1.0 on failure.
    """
    # sys.executable runs the scanner under the same interpreter as this script.
    cmd = [
        sys.executable, scanner_path,
        "-d", dependency,
        "-v", version,
        "-p", root_path,
        "--package-file", package_file
    ]

    start_time = time.perf_counter()

    try:
        subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    except subprocess.CalledProcessError as e:
        # Include the scanner's stderr so failures are diagnosable.
        print(f"Error running benchmark: {e}\n{e.stderr}", file=sys.stderr)
        return -1.0

    return time.perf_counter() - start_time

def run_benchmarks(
    scanner_path: str,
    root_path: str,
    packages_dir: str,
    counts: List[int],
    dependency: str,
    version: str
) -> Dict[int, float]:
    """Runs benchmarks for specified counts and returns a dict of results."""
    results = {}
    # Package-file entries are written relative to root_path
    # (e.g. packages/google-cloud-core), matching small_package_list.txt.
    rel_packages_dir = os.path.relpath(packages_dir, root_path)

    for count in counts:
        subset = get_package_subset(packages_dir, count)
        print(f" Testing {len(subset)} packages (e.g., {subset[:3]}...)")

        # Create a temporary package file listing one directory per line
        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
            for pkg in subset:
                f.write(f"{os.path.join(rel_packages_dir, pkg)}\n")
            pkg_file = f.name

        try:
            results[count] = run_benchmark(scanner_path, root_path, pkg_file, dependency, version)
        finally:
            # Remove the temporary file even if the run raises
            if os.path.exists(pkg_file):
                os.remove(pkg_file)

    return results

def main():
    parser = argparse.ArgumentParser(description="Benchmark the version scanner.")

    parser.add_argument(
        "-s", "--scanner-path",
        default="version_scanner.py",
        help="Path to version_scanner.py"
    )

    parser.add_argument(
        "-r", "--root-path",
        required=True,
        help="Path to the monorepo root directory"
    )

    parser.add_argument(
        "-p", "--packages-dir",
        help="Path to packages directory (defaults to <root-path>/packages)"
    )

    parser.add_argument(
        "-d", "--dependency",
        default="python",
        help="Dependency to search for"
    )

    parser.add_argument(
        "-v", "--version",
        default="3.7",
        help="Version to search for"
    )

    parser.add_argument(
        "-c", "--counts",
        default="1,10,50",
        help="Comma-separated list of package counts to test"
    )

    args = parser.parse_args()

    packages_dir = args.packages_dir or os.path.join(args.root_path, "packages")

    if not os.path.isdir(packages_dir):
        print(f"Error: Packages directory not found: {packages_dir}", file=sys.stderr)
        sys.exit(1)

    counts = [int(c) for c in args.counts.split(',')]

    all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))]

    total_packages = len(all_packages)

    print(f"Found {total_packages} packages in {packages_dir}")

    # Drop counts larger than the number of available packages
    counts = [c for c in counts if c <= total_packages]
    # Always include a full run over every package
    if total_packages not in counts:
        counts.append(total_packages)

    print(f"Running benchmarks for counts: {counts}")

    results = run_benchmarks(
        scanner_path=args.scanner_path,
        root_path=args.root_path,
        packages_dir=packages_dir,
        counts=counts,
        dependency=args.dependency,
        version=args.version
    )

    print("\nBenchmark Results:")
    print(f"{'Packages':<10} | {'Time (seconds)':<15}")
    print("-" * 30)
    for count, duration in results.items():
        # run_benchmark returns -1.0 when the scanner exits with an error
        time_str = "FAILED" if duration < 0 else f"{duration:.4f}"
        print(f"{count:<10} | {time_str:<15}")


if __name__ == "__main__":
    main()
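The benchmark above times a single scanner run per package count, so one slow run (cold disk cache, a background process) can skew a result. One possible refinement, sketched here as an assumption rather than part of this PR, is to repeat each run and report the median. `time_command` is a hypothetical helper that uses only the standard library.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], repeats: int = 3) -> float:
    """Run cmd `repeats` times and return the median wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if any run fails.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


if __name__ == "__main__":
    # Times a no-op interpreter start-up; substitute the scanner command in practice.
    print(f"{time_command([sys.executable, '-c', 'pass']):.4f}s")
```

The median is less sensitive than the mean to a single outlier run, at the cost of multiplying total benchmark time by `repeats`.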
102 changes: 102 additions & 0 deletions scripts/version_scanner/regex_config.yaml
@@ -0,0 +1,102 @@
description: Search rules for identifying dependency versions
rules:
  - name: explicit_version_string
    description: Finds explicit version strings in code or configs.
    examples:
      - "'3.7'"
      - '"3.7.1"'
      - "'3.7.12'"
      - "Python 3.7"
    rules:
      - |
        \b{major}\.{minor}(\.\d+)?\b

  - name: python_requires
    description: Finds various forms of python_requires declarations.
    applies_to: [python]
    examples:
      - "python_requires = '==3.7'"
      - "python_requires = '>=3.7'"
      - "python_requires = '<=3.7'"
      - "python_requires = '>3.6'"
      - "python_requires = '<3.8'"
    rules:
      - |
        python_requires\s*=\s*['"]==3\.{minor}['"]
      - |
        python_requires\s*=\s*['"]>=3\.{minor}['"]
      - |
        python_requires\s*=\s*['"]<=3\.{minor}['"]
      - |
        python_requires\s*=\s*['"]>3\.{minor_minus_one}['"]
      - |
        python_requires\s*=\s*['"]<3\.{minor_plus_one}['"]

  - name: sys_version_info
    description: Finds sys.version_info checks in code.
    applies_to: [python]
    examples:
      - "sys.version_info == (3, 7)"
      - "sys.version_info >= (3, 7)"
      - "sys.version_info <= (3, 7)"
      - "sys.version_info > (3, 6)"
      - "sys.version_info < (3, 8)"
      - "sys.version_info.minor == 7"
      - "sys.version_info.minor >= 7"
      - "sys.version_info.minor <= 7"
      - "sys.version_info.minor > 6"
      - "sys.version_info.minor < 8"
    rules:
      - |
        sys\.version_info\s*==\s*\(3,\s*{minor}\)
      - |
        sys\.version_info\s*>=\s*\(3,\s*{minor}\)
      - |
        sys\.version_info\s*<=\s*\(3,\s*{minor}\)
      - |
        sys\.version_info\s*>\s*\(3,\s*{minor_minus_one}\)
      - |
        sys\.version_info\s*<\s*\(3,\s*{minor_plus_one}\)
      - |
        sys\.version_info\.minor\s*==\s*{minor}
      - |
        sys\.version_info\.minor\s*>=\s*{minor}
      - |
        sys\.version_info\.minor\s*<=\s*{minor}
      - |
        sys\.version_info\.minor\s*>\s*{minor_minus_one}
      - |
        sys\.version_info\.minor\s*<\s*{minor_plus_one}

  - name: python_env_short
    description: Finds short python environment names often used in tox or nox.
    applies_to: [python]
    examples:
      - "py37"
      - "py37-cover"
    rules:
      - |
        \bpy3{minor}\b

  - name: explicit_python_command
    description: Finds explicit python commands with version.
    applies_to: [python]
    examples:
      - "python3.7"
      - "python3.7 -m pip"
      - "Python3.7"
    rules:
      - |
        python3\.{minor}

  - name: combined_version_string
    description: Finds combined version strings often used in class or variable names.
    applies_to: [python]
    examples:
      - "Python37"
      - "Python37DeprecationWarning"
    rules:
      - |
        Python{major}{minor}
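The `\b` word boundaries in `explicit_version_string` and `python_env_short` decide which near-misses are rejected. The sketch below compiles those two templates with `3.7` already substituted, assuming the scanner uses `re.IGNORECASE` (a commit in this PR adds case insensitivity). `matches` is a hypothetical helper. Note that `.` counts as a boundary, so `1.3.7` still matches.

```python
import re

# Templates from regex_config.yaml with {major}=3 and {minor}=7 filled in.
EXPLICIT_VERSION = re.compile(r"\b3\.7(\.\d+)?\b", re.IGNORECASE)
PY_ENV_SHORT = re.compile(r"\bpy37\b", re.IGNORECASE)


def matches(pattern: "re.Pattern", text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


print(matches(EXPLICIT_VERSION, "Python 3.7"))  # True
print(matches(EXPLICIT_VERSION, "'3.7.12'"))    # True
print(matches(EXPLICIT_VERSION, "3.70"))        # False: no boundary after "7"
print(matches(EXPLICIT_VERSION, "13.7"))        # False: no boundary before "3"
print(matches(EXPLICIT_VERSION, "1.3.7"))       # True: "." counts as a boundary
print(matches(PY_ENV_SHORT, "PY37-cover"))      # True
print(matches(PY_ENV_SHORT, "py371"))           # False
```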


5 changes: 5 additions & 0 deletions scripts/version_scanner/small_package_list.txt
@@ -0,0 +1,5 @@
packages/google-cloud-access-context-manager
packages/google-cloud-bigtable
packages/google-cloud-biglake-hive
packages/google-cloud-documentai-toolbox
packages/google-cloud-core
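A file like `small_package_list.txt` is what `--package-file` expects: one package directory per line, apparently relative to the `-p` root (the benchmark writes entries of the same `packages/<name>` form). The following is a hypothetical sketch of reading and resolving such a list. `read_package_list` and `resolve_targets` are illustrative names, not the scanner's own `read_package_file`.

```python
import os
from typing import List


def read_package_list(path: str) -> List[str]:
    """Return the package directories listed one per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def resolve_targets(root: str, packages: List[str]) -> List[str]:
    """Join each listed package onto the scan root and keep only existing directories."""
    targets = [os.path.join(root, pkg) for pkg in packages]
    return [t for t in targets if os.path.isdir(t)]
```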
1 change: 1 addition & 0 deletions scripts/version_scanner/tests/data/.kokoro/build.sh
@@ -0,0 +1 @@
python3.7
@@ -0,0 +1 @@
python_requires = '>=3.7'
@@ -0,0 +1 @@
print("Hello")
@@ -0,0 +1,34 @@
import csv
import os
import subprocess
import sys

import pytest


def test_integration_scan(tmp_path):
    # Paths to the real tools; assumes pytest is run from scripts/version_scanner
    scanner_path = os.path.abspath("version_scanner.py")
    config_path = os.path.abspath("regex_config.yaml")

    # Static fixture directory
    data_dir = os.path.abspath("tests/data")

    # Run the scanner in tmp_path so the output file is created there.
    # sys.executable runs it under the same interpreter as pytest.
    cmd = [
        sys.executable, scanner_path,
        "-d", "python",
        "-v", "3.7",
        "-p", data_dir,
        "--config", config_path,
        "-o", "scanner_report.csv"
    ]

    result = subprocess.run(cmd, cwd=tmp_path, capture_output=True, text=True, check=True)

    report_file = tmp_path / "scanner_report.csv"
    assert report_file.exists(), f"Report file not found. Stderr: {result.stderr}"

    with open(report_file, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        rows = list(reader)

    # The fixtures contain "python3.7" and "python_requires = '>=3.7'",
    # so the report should contain at least one match.
    assert len(rows) > 0