Python script to remove direct access to users in all repos of an org, except an allowed list

jwiegley/github-cleanup

GitHub Repository Collaborator Cleanup

A production-ready Python script for cleaning up direct repository collaborators in a GitHub organization. This tool helps maintain security and access control by removing collaborators who are not on an approved list.

⚠️ Important: Direct Access Only

This tool ONLY modifies users with "direct access" - those who were individually added to repositories as collaborators.

Users who have access through the following are NOT affected:

  • Team membership (e.g., users in teams with repo access)
  • Organization membership (e.g., org-level default permissions)
  • Enterprise access

For example, if a user appears in the repository's "People" section under "Teams" rather than "Direct access", they will NOT be processed by this tool.
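
GitHub's REST API exposes this distinction through the `affiliation` query parameter of the list-collaborators endpoint. The sketch below shows how a request for direct collaborators could be built. The helper name `direct_collaborators_url` is hypothetical, and whether the script issues exactly this call is an assumption:

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def direct_collaborators_url(org: str, repo: str, per_page: int = 100) -> str:
    """Build the list-collaborators URL restricted to direct access.

    Hypothetical helper: affiliation=direct asks GitHub for collaborators
    who were granted access to the repository individually.
    """
    query = urlencode({"affiliation": "direct", "per_page": per_page})
    return f"{API_ROOT}/repos/{org}/{repo}/collaborators?{query}"


print(direct_collaborators_url("myorg", "repo1"))
```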

Features

  • 🔒 Safe by Default: Dry-run mode prevents accidental deletions
  • 🎯 Flexible Filtering: Target specific repos or filter by visibility (private/public/internal)
  • 👥 Direct Access Only: Only removes direct collaborators, never affects team/org access
  • ⏱️ Rate Limit Control: Configurable delay between API calls (--delay parameter)
  • 🔄 Automatic Retry: Exponential backoff for failed requests
  • 📊 Detailed Reporting: Comprehensive summary and optional JSON output
  • Robust Error Handling: Continues processing even if individual repos fail
  • 🔍 Verbose Logging: Track exactly what's happening at each step

Requirements

  • Python 3.8+
  • GitHub Personal Access Token with admin:org and repo scopes

Installation

  1. Clone the repository:
git clone <repository-url>
cd github-cleanup
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up your GitHub token:
export GITHUB_TOKEN='ghp_your_token_here'

Usage

Basic Usage

Create an allowed users file (one username per line):

cat > allowed_users.txt << EOF
alice
bob
charlie
# Comments are supported
team-bot
EOF

Run a dry-run to see what would be removed:

python github_cleanup.py \
    --org myorg \
    --allowed-file allowed_users.txt

Common Scenarios

Test on Specific Repositories

# Single repository
python github_cleanup.py \
    --org myorg \
    --repos test-repo \
    --allowed-file allowed_users.txt

# Multiple repositories
python github_cleanup.py \
    --org myorg \
    --repos "repo1,repo2,repo3" \
    --allowed-file allowed_users.txt

Filter by Repository Visibility

# Only private repositories
python github_cleanup.py \
    --org myorg \
    --visibility private \
    --allowed-file allowed_users.txt

# Only public repositories
python github_cleanup.py \
    --org myorg \
    --visibility public \
    --allowed-file allowed_users.txt
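
Repository objects returned by the GitHub REST API carry a `visibility` field, so the filter can be pictured as a simple client-side selection. This is only a sketch: `filter_by_visibility` is a hypothetical name, and the script may filter differently (for example, server-side):

```python
from typing import Any, Dict, Iterable, List


def filter_by_visibility(
    repos: Iterable[Dict[str, Any]], visibility: str = "all"
) -> List[Dict[str, Any]]:
    """Keep only repos whose 'visibility' matches; 'all' keeps everything."""
    if visibility == "all":
        return list(repos)
    return [repo for repo in repos if repo.get("visibility") == visibility]


# Sample data shaped like (a small part of) GitHub's repository objects.
repos = [
    {"name": "repo1", "visibility": "private"},
    {"name": "repo2", "visibility": "public"},
    {"name": "repo3", "visibility": "internal"},
]
print([r["name"] for r in filter_by_visibility(repos, "private")])
```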

Actually Apply Changes

After verifying with dry-run, use --apply to remove collaborators:

python github_cleanup.py \
    --org myorg \
    --allowed-file allowed_users.txt \
    --apply

Skip the confirmation prompt with --yes:

python github_cleanup.py \
    --org myorg \
    --allowed-file allowed_users.txt \
    --apply \
    --yes

Generate JSON Report

python github_cleanup.py \
    --org myorg \
    --allowed-file allowed_users.txt \
    --output cleanup_report.json

Command-Line Options

Required Arguments:
  --org ORG                     GitHub organization name
  --allowed-file FILE           Path to file with allowed usernames

Optional Arguments:
  --token TOKEN                 GitHub token (default: GITHUB_TOKEN env var)
  --repos REPOS                 Comma-separated list of repo names to process
  --visibility {all,private,public,internal}
                               Filter repos by visibility (default: all)
  --delay SECONDS              Delay between API calls in seconds (default: 0.5)
  --apply                      Actually remove collaborators (default: dry-run)
  --yes                        Skip confirmation prompt
  --output FILE                Save JSON report to file
  -v, --verbose                Enable verbose logging
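
For readers who want to wrap or extend the tool, the options above map naturally onto `argparse`. The following is a sketch that mirrors the documented flags and defaults; how `github_cleanup.py` actually declares them is an assumption:

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="github_cleanup.py")
    parser.add_argument("--org", required=True, help="GitHub organization name")
    parser.add_argument("--allowed-file", required=True,
                        help="Path to file with allowed usernames")
    parser.add_argument("--token", default=os.environ.get("GITHUB_TOKEN"),
                        help="GitHub token (default: GITHUB_TOKEN env var)")
    parser.add_argument("--repos", help="Comma-separated list of repo names")
    parser.add_argument("--visibility", default="all",
                        choices=["all", "private", "public", "internal"])
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay between API calls in seconds")
    parser.add_argument("--apply", action="store_true",
                        help="Actually remove collaborators (default: dry-run)")
    parser.add_argument("--yes", action="store_true",
                        help="Skip confirmation prompt")
    parser.add_argument("--output", help="Save JSON report to file")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--org", "myorg", "--allowed-file", "allowed_users.txt"]
)
# With no --apply flag the run stays in dry-run mode.
print(args.apply, args.delay, args.visibility)
```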

Allowed Users File Format

The allowed users file supports:

  • One username per line
  • Comments starting with #
  • Inline comments after usernames
  • Blank lines (ignored)
  • Case-insensitive matching

Example:

# Core team
alice
bob
charlie  # team lead

# Contractors
contractor1

# Bots
deploy-bot
ci-bot
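
The format rules above can be sketched as a small parser. This is an illustration of the documented behavior, not the script's own code, and `parse_allowed_users` is a hypothetical name:

```python
from typing import Set


def parse_allowed_users(text: str) -> Set[str]:
    """Parse allowed-users text into a set of lowercase usernames."""
    allowed = set()
    for line in text.splitlines():
        # Drop everything after '#': handles full-line and inline comments.
        name = line.split("#", 1)[0].strip()
        if name:  # skip blank and comment-only lines
            allowed.add(name.lower())
    return allowed


sample = """# Core team
alice
Bob
charlie  # team lead

# Bots
deploy-bot
"""
allowed = parse_allowed_users(sample)
print(sorted(allowed))
# Case-insensitive matching: lowercase the candidate before the lookup.
print("BOB".lower() in allowed)
```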

Output Examples

Console Output (Dry-Run)

2024-01-15 10:30:00 - INFO - Verifying GitHub authentication...
2024-01-15 10:30:01 - INFO - ✓ Authenticated as: admin-user
2024-01-15 10:30:01 - INFO - Loaded 5 allowed user(s) from allowed_users.txt
2024-01-15 10:30:02 - INFO - Fetching repositories for organization: myorg
2024-01-15 10:30:03 - INFO - Found 3 repositories matching criteria

============================================================
CONFIRMATION REQUIRED
============================================================
Organization: myorg
Repositories to process: 3
Allowed users: 5
Mode: DRY-RUN (no changes)

Repositories:
  - myorg/repo1
  - myorg/repo2
  - myorg/repo3
============================================================

Proceed with operation? [y/N]: y

============================================================
Processing repository: myorg/repo1
============================================================
2024-01-15 10:30:10 - INFO - Found 3 direct collaborator(s):
2024-01-15 10:30:10 - INFO -   ✓ alice - admin [direct] - ALLOWED (keeping)
2024-01-15 10:30:10 - INFO -   ✓ bob - write [direct] - ALLOWED (keeping)
2024-01-15 10:30:10 - INFO -   ✗ old-contractor - read [direct] - NOT IN ALLOWED LIST
2024-01-15 10:30:10 - INFO - [DRY-RUN] Would remove old-contractor from myorg/repo1

============================================================
CLEANUP SUMMARY
============================================================
Organization: myorg
Mode: DRY-RUN

Repositories:
  Total found: 3
  Successfully processed: 3
  Skipped (errors/permissions): 0

Collaborators:
  Total checked: 8
  Preserved (in allowed list): 6
  Would be removed: 2

Removed users by repository:
  repo1:
    - old-contractor
  repo3:
    - external-user

⚠ This was a DRY-RUN. Use --apply to actually remove collaborators.
============================================================

JSON Report

{
  "organization": "myorg",
  "total_repos": 3,
  "repos_processed": 3,
  "repos_skipped": 0,
  "total_collaborators_checked": 8,
  "total_collaborators_removed": 2,
  "total_team_access_preserved": 6,
  "dry_run": true,
  "results": [
    {
      "repo_name": "repo1",
      "success": true,
      "collaborators_checked": 3,
      "collaborators_removed": 1,
      "skipped_team_access": 2,
      "error_message": null,
      "removed_users": ["old-contractor"]
    }
  ]
}
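
A saved report can be post-processed with the standard `json` module. The sketch below uses only the field names shown in the example above; the helper `removed_by_repo` is hypothetical:

```python
import json
from typing import Any, Dict, List

# Abbreviated copy of the example report shown above.
report_text = """
{
  "organization": "myorg",
  "total_collaborators_removed": 2,
  "dry_run": true,
  "results": [
    {
      "repo_name": "repo1",
      "success": true,
      "collaborators_removed": 1,
      "error_message": null,
      "removed_users": ["old-contractor"]
    }
  ]
}
"""


def removed_by_repo(report: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each repo name to the users removed (or flagged) in it."""
    return {
        result["repo_name"]: result["removed_users"]
        for result in report["results"]
        if result["removed_users"]
    }


report = json.loads(report_text)
mode = "dry-run" if report["dry_run"] else "applied"
print(mode, removed_by_repo(report))
```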

Safety Features

Dry-Run Mode (Default)

The script runs in dry-run mode by default, showing what would be removed without making any changes. You must explicitly use --apply to remove collaborators.
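
One common way to implement such a default is to gate the destructive call behind a flag that is off unless the user opts in. A minimal sketch, assuming a hypothetical `delete` callable that performs the real API request:

```python
from typing import Callable, List, Tuple


def remove_collaborator(
    repo: str,
    user: str,
    apply: bool,
    delete: Callable[[str, str], None],
) -> str:
    """Remove `user` from `repo` only when `apply` is True."""
    if not apply:
        # Dry-run: report the intended action, touch nothing.
        return f"[DRY-RUN] Would remove {user} from {repo}"
    delete(repo, user)
    return f"Removed {user} from {repo}"


# Stand-in for the real API call so the example runs offline.
calls: List[Tuple[str, str]] = []


def fake_delete(repo: str, user: str) -> None:
    calls.append((repo, user))


print(remove_collaborator("myorg/repo1", "old-contractor", False, fake_delete))
print(calls)  # still empty after the dry-run
```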

Confirmation Prompt

Before processing repositories, the script displays:

  • Organization name
  • Number of repositories
  • Number of allowed users
  • Mode (dry-run or apply)
  • List of repositories to process

Use --yes to skip this prompt for automated workflows.
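
The prompt-and-bypass behavior can be sketched as follows. The function name `confirm` and its parameters are hypothetical; the `ask` parameter exists only so the example can run without a terminal:

```python
from typing import Callable


def confirm(
    assume_yes: bool = False,
    ask: Callable[[str], str] = input,
    prompt: str = "Proceed with operation? [y/N]: ",
) -> bool:
    """Return True if the operation should proceed.

    Anything other than 'y' or 'yes' counts as "no", matching the
    capital N default shown in the prompt.
    """
    if assume_yes:  # --yes given: never block an automated run
        return True
    return ask(prompt).strip().lower() in ("y", "yes")


# Simulated answers instead of reading from the terminal.
print(confirm(ask=lambda _prompt: "y"))
print(confirm(ask=lambda _prompt: ""))
print(confirm(assume_yes=True))
```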

Permission Checks

The script automatically skips repositories where the token doesn't have admin access, preventing errors and ensuring you only modify repos you control.
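
Repository objects returned by the GitHub REST API for an authenticated user include a `permissions` object with an `admin` flag, which is one way such a check can be made. A sketch under that assumption (`has_admin_access` is a hypothetical helper, and the script may check permissions differently):

```python
from typing import Any, Dict


def has_admin_access(repo: Dict[str, Any]) -> bool:
    """True if the repo object reports admin rights for the token's user."""
    permissions = repo.get("permissions") or {}
    return bool(permissions.get("admin", False))


# Sample data shaped like (a small part of) GitHub's repository objects.
repos = [
    {"name": "repo1", "permissions": {"admin": True, "push": True, "pull": True}},
    {"name": "repo2", "permissions": {"admin": False, "push": True, "pull": True}},
    {"name": "repo3"},  # no permissions reported: treat as not admin
]
processable = [r["name"] for r in repos if has_admin_access(r)]
print(processable)
```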

Team-Based Access Preservation

The script only removes direct collaborators. Team-based access is never removed, ensuring organizational structure is maintained.

Error Isolation

If processing one repository fails, the script continues with others and provides a complete report at the end.
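
This kind of isolation is typically achieved by catching errors per repository and recording them in the result. A minimal sketch, with the hypothetical names `process_all` and `process_one`, and result keys borrowed from the JSON report format:

```python
from typing import Any, Callable, Dict, Iterable, List


def process_all(
    repos: Iterable[str], process_one: Callable[[str], int]
) -> List[Dict[str, Any]]:
    """Process every repo; a failure in one never stops the others."""
    results = []
    for name in repos:
        try:
            removed = process_one(name)
            results.append({"repo_name": name, "success": True,
                            "collaborators_removed": removed,
                            "error_message": None})
        except Exception as exc:  # record the failure and keep going
            results.append({"repo_name": name, "success": False,
                            "collaborators_removed": 0,
                            "error_message": str(exc)})
    return results


def fake_process(name: str) -> int:
    """Stand-in worker that fails for one repository."""
    if name == "repo2":
        raise PermissionError("admin access required")
    return 1


results = process_all(["repo1", "repo2", "repo3"], fake_process)
for result in results:
    print(result["repo_name"], result["success"], result["error_message"])
```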

Rate Limiting

The script handles GitHub API rate limits automatically:

  • Configurable Delay: Use --delay to set time between API calls (default: 0.5 seconds)
  • Monitors Rate Limits: Tracks remaining requests in real-time
  • Automatic Backoff: Waits when rate limited and retries with exponential backoff
  • Verbose Tracking: Shows delay application in verbose mode (-v)

GitHub's REST API allows up to 5,000 requests per hour for requests authenticated with a personal access token.
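
Exponential backoff means doubling the wait after each failed attempt. The sketch below illustrates the idea only: `retry_with_backoff`, its defaults, and the catch-all `except` are assumptions, and a real client would likely also inspect GitHub's rate-limit response headers:

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # base, 2x base, 4x base, ...
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed; record delays instead of really sleeping.
attempts = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```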

Delay Examples

# Faster processing (minimal delay)
python github_cleanup.py --org myorg --allowed-file allowed_users.txt --delay 0.1

# Default (balanced)
python github_cleanup.py --org myorg --allowed-file allowed_users.txt --delay 0.5

# Conservative (slower but very safe)
python github_cleanup.py --org myorg --allowed-file allowed_users.txt --delay 1.0

Testing

Run the test suite:

# Run all tests
pytest test_github_cleanup.py -v

# Run with coverage report
pytest test_github_cleanup.py -v --cov=github_cleanup --cov-report=html

# Run specific test class
pytest test_github_cleanup.py::TestGitHubClient -v

Troubleshooting

Authentication Errors

Error: Authentication failed

Solution: Verify your token has the required scopes:

  • admin:org - Required to manage org repositories
  • repo - Required to manage collaborators

Permission Denied

Error: Insufficient permissions (admin access required)

Solution: Ensure your token has admin access to the repositories you're trying to modify.

Rate Limit Exceeded

Error: Rate limit exceeded

Solution: The script automatically waits and retries. If processing many repos, consider:

  • Running during off-peak hours
  • Processing repos in smaller batches with --repos

Repository Not Found

Error: Repository listed but shows as not found

Solution:

  • Verify the organization name is correct
  • Ensure the token has access to the repository
  • Check if the repository exists and isn't archived

Best Practices

  1. Always test first: Run without --apply to verify what will be removed
  2. Start small: Test on a single repository with --repos before processing all repos
  3. Keep allowed list updated: Regularly review and update your allowed users file
  4. Save reports: Use --output to maintain audit logs of cleanup operations
  5. Use specific filters: Target private repos first with --visibility private
  6. Review team access: Decide whether each user really needs direct collaborator access or should get access through a team instead

Security Considerations

  • Token Security: Never commit tokens to version control. Use environment variables or secrets management.
  • Audit Trail: Save JSON reports for compliance and auditing purposes.
  • Least Privilege: Grant only necessary permissions to the token.
  • Regular Reviews: Run cleanup operations regularly to maintain security posture.

Contributing

Contributions are welcome! Please ensure:

  • All tests pass: pytest test_github_cleanup.py -v
  • Code follows PEP 8 style guidelines
  • Type hints are included
  • Docstrings are comprehensive
  • New features include tests

License

[Your License Here]

Support

For issues and questions:

Changelog

Version 1.0.0 (2024-01-15)

  • Initial release
  • Support for organization-wide cleanup
  • Dry-run mode with safety features
  • Comprehensive error handling and reporting
  • JSON output support
  • Rate limiting and retry logic
