Merged
146 changes: 146 additions & 0 deletions .github/BUILD_RESOURCES_BRANCH.md
@@ -0,0 +1,146 @@
# Build Resources Branch Documentation

## Overview

The `build-resources` branch is an automated branch that stores committed versions of generated markdown files. This allows reviewers to see the current state of generated content without needing to run data processing scripts locally.

## What Files Are Stored

The following generated files are automatically committed to the `build-resources` branch:

1. **Curated Resources** (`content/curated_resources/*.md`)
   - Individual resource markdown files generated from Google Sheets
   - Generated by: `content/resources/resource.py`
   - **Updated:** Daily (automatically)

2. **Tenzing Contributors** (`content/contributors/tenzing.md`)
   - Contributor list with roles and contributions
   - Generated by: `scripts/forrt_contribs/tenzing.py`
   - **Updated:** Daily (automatically)

3. **Glossary Files** (`content/glossary/*/`)
   - Glossary terms in multiple languages (English, German, Arabic, etc.)
   - Generated by: `content/glossary/_create_glossaries.py`
   - **Updated:** Only on manual trigger (see below)

## How It Works

### Automatic Updates

1. The `data-processing.yml` workflow runs daily at midnight UTC (cron `0 0 * * *`) or when triggered manually
2. **Daily runs** generate:
   - Curated resources
   - Tenzing contributors list
   - ⚠️ **Glossary files are NOT regenerated** (their sources are less stable)
3. **Manual runs with glossary regeneration**:
   - Use `workflow_dispatch` with `regenerate_glossary: true` to update the glossary
   - Only use this when glossary sources are confirmed stable
4. After generating files, the workflow:
   - Stores generated files temporarily
   - Switches to the `build-resources` branch
   - Copies the newly generated files
   - Commits and pushes changes if there are updates
   - Switches back to the original branch
5. The generated files are also uploaded as artifacts for the build process
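
The file-handling part of step 4 can be sketched without git. This is only an illustration, assuming two throwaway directories: a staging directory with freshly generated files and a target directory standing in for the checked-out `build-resources` tree. The helper name `refresh_generated` is hypothetical and does not appear in the workflow.

```bash
# Hypothetical helper (not part of the workflow): replace the generated files
# in TARGET with those from STAGING, keeping any hand-maintained _index.md.
refresh_generated() {
  local staging="$1" target="$2"
  mkdir -p "$target"
  # Delete old generated files, but leave _index.md in place
  find "$target" -type f ! -name '_index.md' -delete
  # Copy the contents of the staging directory, preserving its structure
  cp -r "$staging"/. "$target"/
}

# Example usage with temporary directories
staging="$(mktemp -d)"
target="$(mktemp -d)"
echo "new resource" > "$staging/resource-a.md"
echo "section index" > "$target/_index.md"
echo "stale resource" > "$target/stale.md"
refresh_generated "$staging" "$target"
ls "$target"
```

Deleting before copying is what makes removed resources disappear from the branch; a plain copy would leave stale files behind.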

### Dual Approach Benefits

This implementation uses a **dual approach**:

- **Artifacts** (primary): Fast, efficient for builds, automatically cleaned up
- **Branch** (secondary): Reviewable, trackable, provides version history

### Workflow Integration

```yaml
# In data-processing.yml (Daily)
- Generate files (tenzing, resources)
- Upload as artifacts (for builds)
- Commit to build-resources branch (for review)
- Trigger deployment

# Manual trigger with glossary regeneration
- workflow_dispatch with regenerate_glossary: true
- Generate files (tenzing, resources, glossary)
- Upload as artifacts (for builds)
- Commit to build-resources branch (for review)
- Trigger deployment
```

## Triggering Glossary Regeneration

Glossary files are **not** regenerated on daily runs because the sources are less stable. To regenerate glossary files:

1. Go to the Actions tab in GitHub
2. Select "Data Processing" workflow
3. Click "Run workflow"
4. Check the box for "Regenerate glossary files"
5. Click "Run workflow"

⚠️ **Important**: Only trigger glossary regeneration when the glossary sources (Google Docs) are confirmed stable and ready for update.
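
The same manual run can probably be started from a terminal as well. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated for this repository, and that the workflow file is named `data-processing.yml`. The wrapper name `glossary_regen_cmd` and the `DRY_RUN` switch are illustrative assumptions, not part of the repository.

```bash
# Hypothetical wrapper: print (or run) the gh command that dispatches the
# workflow with the regenerate_glossary input set to true.
glossary_regen_cmd() {
  local workflow="${1:-data-processing.yml}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only show the command, so nothing is triggered by accident
    echo "gh workflow run $workflow -f regenerate_glossary=true"
  else
    gh workflow run "$workflow" -f regenerate_glossary=true
  fi
}

# Show the command without dispatching anything
DRY_RUN=1 glossary_regen_cmd
```

Set `DRY_RUN=0` to actually dispatch the run; the same caution about stable glossary sources applies.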

## Reviewing Generated Content

To review the latest generated content:

1. Go to the repository on GitHub
2. Switch to the `build-resources` branch
3. Navigate to:
   - `content/curated_resources/` for resources
   - `content/contributors/tenzing.md` for contributors
   - `content/glossary/english/` (or other languages) for glossary
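
For a local look without switching branches, something like the following should work. It is a sketch that assumes a remote named `origin` and uses only standard git commands; the function names are illustrative.

```bash
# List generated resource files on a ref without checking it out.
# The ref defaults to origin/build-resources (assumes a remote named origin).
list_generated_resources() {
  local ref="${1:-origin/build-resources}"
  git ls-tree -r --name-only "$ref" content/curated_resources/
}

# Print the generated contributors page as stored on that ref.
show_contributors() {
  local ref="${1:-origin/build-resources}"
  git show "$ref:content/contributors/tenzing.md"
}

# Typical use, after fetching the branch:
#   git fetch origin build-resources
#   list_generated_resources
#   show_contributors | head -n 20
```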

## Build Process

The build process remains unchanged:

1. **Primary method**: Download from artifacts (fast)
2. **Fallback method**: Generate files if artifacts are unavailable

The `build-resources` branch does NOT affect the build process; it exists purely for review purposes.

## Important Notes

- The `build-resources` branch is **automatically managed**; do not edit it manually
- Resource and contributor files in this branch are **regenerated** from source data daily; glossary files are regenerated only on manual runs
- Any manual changes will be **overwritten** on the next data processing run
- To modify content, update the source data (for example Google Sheets for resources or Google Docs for the glossary) or the generation scripts

## Version History

You can track changes to generated content by viewing the commit history of the `build-resources` branch:

```bash
git checkout build-resources
git log --oneline -- content/curated_resources/
git log --oneline -- content/contributors/tenzing.md
git log --oneline -- content/glossary/
```

## Troubleshooting

### Branch Not Found

If the `build-resources` branch doesn't exist yet, the next data processing run creates it automatically from the branch the workflow is running on.

### Changes Not Appearing

If you don't see expected changes:

1. Check the latest data-processing workflow run
2. Verify the generation scripts ran successfully
3. Check if there were actual changes in the source data

### Merge Conflicts

This branch should never have merge conflicts since:
- It's not meant to be merged back to master
- All changes are automated
- Manual edits are discouraged

## Related Workflows

- `data-processing.yml`: Generates and commits files to this branch
- `deploy.yaml`: Uses artifacts (not this branch) for builds
- `staging-aggregate.yaml`: Uses artifacts (not this branch) for staging builds
110 changes: 110 additions & 0 deletions .github/workflows/data-processing.yml
@@ -9,6 +9,12 @@ on:
  schedule:
    - cron: '0 0 * * *' # Daily at Midnight
  workflow_dispatch:
    inputs:
      regenerate_glossary:
        description: 'Regenerate glossary files (only use when glossary sources are stable)'
        required: false
        type: boolean
        default: false

jobs:
  process-data:
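
In `workflow_dispatch` runs, values in `github.event.inputs` reach the shell as strings, and on scheduled runs the input is empty, which is why the steps in this workflow compare against the literal `true`. A minimal sketch of that check, using a hypothetical helper name `glossary_enabled`:

```bash
# Hypothetical helper: succeed only when the dispatch input is exactly "true".
# An empty value (scheduled run) and "false" both mean "skip the glossary".
glossary_enabled() {
  [ "${1:-}" = "true" ]
}

for value in "true" "false" ""; do
  if glossary_enabled "$value"; then
    echo "input='$value' -> regenerate glossary"
  else
    echo "input='$value' -> skip glossary"
  fi
done
```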
@@ -165,6 +171,15 @@ jobs:
            fi
          done

      #========================================
      # Process and generate glossary files
      #========================================
      - name: Run Glossary Generation script
        if: github.event.inputs.regenerate_glossary == 'true'
        continue-on-error: true # Continue even if this step fails
        run: python3 content/glossary/_create_glossaries.py
        # Execute the glossary script that generates glossary markdown files

      #====================
      # Google Analytics Data
      #====================
@@ -275,11 +290,106 @@ jobs:
          path: |
            content/contributors/tenzing.md
            content/curated_resources/
            content/glossary/
            data/
            content/contributor-analysis/
            content/publications/citation_chart.webp
          retention-days: 1

      #========================================
      # Commit generated files to build-resources branch
      #========================================
      - name: Commit to build-resources branch
        if: github.event_name != 'pull_request'
        continue-on-error: true
        run: |
          echo "📝 Committing generated files to build-resources branch..."

          # Store current branch name
          ORIGINAL_BRANCH=$(git branch --show-current)
          echo "Original branch: $ORIGINAL_BRANCH"

          # Store generated files in temp location
          mkdir -p /tmp/generated-resources
          cp -r content/curated_resources /tmp/generated-resources/
          cp content/contributors/tenzing.md /tmp/generated-resources/

          # Only copy glossary if it was regenerated
          if [ "${{ github.event.inputs.regenerate_glossary }}" = "true" ]; then
            echo "✓ Glossary regeneration enabled, including glossary files"
            cp -r content/glossary /tmp/generated-resources/
          else
            echo "ℹ️ Glossary regeneration skipped (use workflow_dispatch with regenerate_glossary=true to update)"
          fi

          # Fetch build-resources branch (create if doesn't exist)
          git fetch origin build-resources || echo "build-resources branch doesn't exist yet"

          if git rev-parse --verify origin/build-resources >/dev/null 2>&1; then
            echo "✓ build-resources branch exists, checking it out"
            git checkout build-resources
            git pull origin build-resources
          else
            echo "✓ Creating new build-resources branch from current branch"
            git checkout -b build-resources
          fi

          # Remove old generated resource files (but keep _index.md)
          find content/curated_resources -type f ! -name '_index.md' -delete 2>/dev/null || true

          # Copy newly generated files
          cp -r /tmp/generated-resources/curated_resources/* content/curated_resources/
          cp /tmp/generated-resources/tenzing.md content/contributors/

          # Copy glossary files only if regenerated (preserving directory structure)
          if [ "${{ github.event.inputs.regenerate_glossary }}" = "true" ]; then
            echo "✓ Updating glossary files in build-resources"
            # Remove old glossary files (but keep _index.md files)
            find content/glossary -type f ! -name '_index.md' ! -name '_create_glossaries.py' -delete 2>/dev/null || true
            rsync -av --exclude='_index.md' --exclude='_create_glossaries.py' /tmp/generated-resources/glossary/ content/glossary/
          fi

          # Stage first, then inspect the index: `git diff --quiet` alone ignores
          # untracked files, so newly generated files would otherwise be missed
          git add content/curated_resources/ content/contributors/tenzing.md
          if [ "${{ github.event.inputs.regenerate_glossary }}" = "true" ]; then
            git add content/glossary/
          fi

          # Check if there are any staged changes to commit
          if git diff --cached --quiet; then
            echo "ℹ️ No changes to commit"
          else
            echo "✓ Changes detected, committing..."

            # Commit message depends on what was regenerated
            if [ "${{ github.event.inputs.regenerate_glossary }}" = "true" ]; then
              git commit -m "Update generated resources and glossary - $(date -u +'%Y-%m-%d %H:%M:%S UTC')" || echo "Nothing to commit"
            else
              git commit -m "Update generated resources - $(date -u +'%Y-%m-%d %H:%M:%S UTC')" || echo "Nothing to commit"
            fi

            # Push to build-resources branch with retry logic
            MAX_RETRIES=3
            RETRY_COUNT=0
            while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
              if git push origin build-resources --force-with-lease; then
                echo "✅ Successfully pushed to build-resources branch"
                break
              else
                RETRY_COUNT=$((RETRY_COUNT + 1))
                if [ $RETRY_COUNT -lt $MAX_RETRIES ]; then
                  echo "⚠️ Push failed, retrying ($RETRY_COUNT/$MAX_RETRIES)..."
                  sleep 2
                  git pull origin build-resources --rebase
                else
                  echo "❌ Push failed after $MAX_RETRIES attempts"
                  exit 1
                fi
              fi
            done
          fi

          # Switch back to original branch
          git checkout "$ORIGINAL_BRANCH"
        env:
          GITHUB_TOKEN: ${{ secrets.FORRT_PAT }}
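
The push-with-retry loop in this step follows a common pattern that could be factored into a small function. The sketch below is a generic version, not the workflow's code; the name `retry`, its argument order, and the fixed delay are assumptions made for illustration.

```bash
# Hypothetical helper: run a command up to MAX attempts, sleeping DELAY seconds
# between failures. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local max="$1" delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max" ]; then
      echo "attempt $attempt/$max failed, retrying in ${delay}s..." >&2
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "failed after $max attempts" >&2
  return 1
}

# Example (not executed here): retry 3 2 git push origin build-resources
```

Unlike the loop in the workflow, this version does not rebase between attempts; any recovery action would need to be part of the command being retried.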

      #====================
      # Trigger Deployment
      #====================