Skip to content

fix: fetch full history for submodules to fix incorrect last-modified dates#783

Closed
Aayushman-nvm wants to merge 7 commits intoprecice:masterfrom
Aayushman-nvm:issue404
Closed

fix: fetch full history for submodules to fix incorrect last-modified dates#783
Aayushman-nvm wants to merge 7 commits intoprecice:masterfrom
Aayushman-nvm:issue404

Conversation

@Aayushman-nvm
Copy link

Pages from submodules were showing incorrect last-modified dates because actions/checkout does a shallow clone of submodules even when fetch-depth: 0 is set. This caused all submodule pages to show the same wrong date.

Fix: added a step to unshallow all submodules after checkout so Jekyll can
correctly determine the last commit date for each file.

Closes: Issue #404

Screenshots:
Before -
image

After -
image

@Aayushman-nvm
Copy link
Author

Still working on it

@MakisH MakisH added GSoC Contributed in the context of the Google Summer of Code technical Technical issues on the website labels Feb 23, 2026
@Aayushman-nvm
Copy link
Author

@MakisH For this issue i might've to modify vendor files... hook.rb to be exact
path: /vendor/bundle/ruby/3.3.0/gems/jekyll-last-modified-at-1.3.2/lib/jekyll-last-modified-at/hook.rb

is that fine?

@MakisH
Copy link
Member

MakisH commented Feb 23, 2026

Thanks for the contribution! That's indeed an annoying bug...

@MakisH For this issue i might've to modify vendor files... hook.rb to be exact path: /vendor/bundle/ruby/3.3.0/gems/jekyll-last-modified-at-1.3.2/lib/jekyll-last-modified-at/hook.rb

is that fine?

Doesn't sound right: /vendor/ is auto-generated and not even in this repository. Modifying it could cause all kinds of conflicts.

@Aayushman-nvm
Copy link
Author

Aayushman-nvm commented Feb 23, 2026

@MakisH haha true but i think u might wanna check this screenshot out:

deployed site:
image

My local site:
image

I made a slight change... just a single line change and now the dates are displayed as 14th jan... if u could clarify if 14th jan is ri8 then i think i might've fixed the issue

@Aayushman-nvm
Copy link
Author

for a context... I wrote a script (inject_dates.sh) that goes into each submodule, gets the real last-commit date for every markdown file using git, and injects it as last_modified_at into the frontmatter before Jekyll builds.

then changed = to ||= in hook.rb so the plugin respects existing frontmatter values instead of always overwriting with its own (broken for submodules) git detection.

added two new steps in deploy.yml... one of them is not yet commited

@Aayushman-nvm
Copy link
Author

@MakisH here's what ive done so far:

inject_dates.sh, a new file im adding to the repo root. This is a shell script that before Jekyll builds, goes into each submodule, finds every markdown file, gets its real last git commit date, and writes it into the file's frontmatter as last_modified_at.

.github/workflows/deploy.yml, in this file ive added 3 new steps: unshallow submodules so full git history is available, run inject_dates.sh, and patch the plugin's hook.rb to respect frontmatter instead of overwriting it.

The reason I patched patch = to ||= in hook.rb is that the plugin runs git log from the top level git directory, which has no history for files inside submodules so it falls back to filesystem mtime (always today's date). By changing to ||=, the plugin still works as before for all regular pages, but respects the last_modified_at we inject via inject_dates.sh for submodule files where the plugin's git detection fails. Pages that don't have last_modified_at in their frontmatter are completely unaffected.

@Aayushman-nvm
Copy link
Author

In case you’re going to test and review this branch locally, you need to run the following scripts in sequence:

  1. Run the date injection script:
bash inject_dates.sh
  1. Patch the jekyll-last-modified-at hook:
 sed -i 's/item\.data\["last_modified_at"\] = Determinator/item.data["last_modified_at"] ||= Determinator/' \
  vendor/bundle/ruby/*/gems/jekyll-last-modified-at-*/lib/jekyll-last-modified-at/hook.rb
  1. Start the Jekyll server:
bundle exec jekyll serve -l

We don’t need to worry about running these commands manually once the code is merged, they’re already configured in the deploy.yml files.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes incorrect last-modified dates displayed on website pages sourced from Git submodules. The issue stems from GitHub Actions' actions/checkout performing shallow clones of submodules even when fetch-depth: 0 is specified, resulting in all submodule pages showing the same incorrect date. The fix involves three coordinated changes: unshallowing all submodules after checkout, extracting correct last-commit dates from full Git history and injecting them into YAML frontmatter, and patching the jekyll-last-modified-at gem to respect these frontmatter values instead of computing them from the (incomplete) shallow Git history.

Changes:

  • Added a new shell script (inject_dates.sh) that traverses all submodules, extracts the true last-commit date for each markdown file from Git history, and injects or updates the last_modified_at field in YAML frontmatter
  • Modified the deployment workflow to unshallow submodules, run the date injection script, and patch the jekyll-last-modified-at Ruby gem to respect frontmatter dates
  • Implemented a workaround for the actions/checkout shallow clone limitation by combining Git history extraction with runtime gem patching

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
inject_dates.sh New bash script that extracts last-commit dates from Git history and injects them into YAML frontmatter of markdown files in all submodules
.github/workflows/deploy.yml Adds workflow steps to unshallow submodules, run the date injection script, and patch the jekyll-last-modified-at gem to use frontmatter dates

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@fsimonis
Copy link
Member

Hi, I have some questions:

  1. Why do you unshallow each submodule instead of replacing the checkout action with a standard deep git clone? At this point we just need a simple clone and disable all the features that the action does.
  2. Why doesn't jekyll-last-modified-at detect the correct times now? The information is in git.
  3. What is the point of patching files and jekyll-dependencies for a feature that will be removed as part of the GSoC project?

@Aayushman-nvm
Copy link
Author

Hi @fsimonis :
Answering ur questions -

1:- The workflow already used actions/checkout consistently for all three checkouts. Replacing just the website checkout with a plain git clone while keeping the action for the other two would be inconsistent. As a contributor, I wanted minimal changes to the existing working CI rather than restructuring it. The unshallow step was the least invasive addition to make submodule history available. I'm happy to replace it with a plain deep clone if you prefer.

2:- Even though the history is in git, the plugin architecturally can't reach it. Looking at determinator.rb, it runs git log with --git-dir set to git.top_level_directory the parent repo's .git directory (This repo). From the parent repo's perspective, a submodule is just a commit pointer, not a folder with individual file histories. So when the plugin computes relative_path_from_git_dir for say imported/tutorials/quickstart/README.md, it looks for that path in the parent repo's git history... finds nothing... and falls back to mtime which is always today in CI. The information is in git but inside each submodule's own .git, which the plugin never looks into.

3:- My thought was the migration to Hugo will probably take some time and the Jekyll site will remain in production the entire time. Hugo will be developed in a separate branch and only replace Jekyll once it's a complete and tested replica, after which Bootstrap and UI work still follows. Until that point all pages will show wrong dates and users will see that... and for a documentation site that's not great. The project will start in end of May with main coding work beginning in May / June, so users would be seeing wrong dates for several months while the Jekyll site is still live.
I also looked into whether this approach carries over to Hugo and it doesn't. Hugo with enableGitInfo = true handles submodule dates natively without any of these workarounds, so this fix is intentionally Jekyll-only and throwaway, meant only to cover the gap between now and when the Hugo migration goes live in production... keeping what users see on the site in mind. That said, the call is completely yours.

Copy link
Member

@MakisH MakisH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the follow-up. Main point from my side would be that the respective hacks would need some comments, so that they make more sense in the future.

Given that we intend to modify this infrastructure a lot in a few months, I do not mind getting the hacks merged. The jekyll-last-modified-at project is anyway officially abandoned and the wrong times issue seems to be known (if not a different one).

It should be clear, though, that most/all of this code would soon be removed again. But since the work has already been invested and I do not see any harm, I would be fine getting a temporary fix merged.

By the way, did you consider directly contributing to actions/checkout#1124 ? That would have an impact to more projects.

Comment on lines +83 to +94
- name: Patch jekyll-last-modified-at to respect frontmatter
working-directory: website
run: |
set -euo pipefail
shopt -s nullglob
hook_files=(vendor/bundle/ruby/*/gems/jekyll-last-modified-at-*/lib/jekyll-last-modified-at/hook.rb)
if [ "${#hook_files[@]}" -ne 1 ]; then
echo "Expected exactly one jekyll-last-modified-at hook.rb file, found ${#hook_files[@]}:" >&2
printf ' %s\n' "${hook_files[@]}" >&2
exit 1
fi
sed -i 's/item\.data\["last_modified_at"\] = Determinator/item.data["last_modified_at"] ||= Determinator/' "${hook_files[0]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this patch something you found in some documentation/issue, or did some tool suggest it? Does a newer version or other GitHub Action fix the issue?

I see that the respective project has been archived, so contributing to upstream would not work. In that case, patching is fine, but it should be explained in the comment.

Some comments above the step would help understand what is going on and why this is needed.

Comment on lines +38 to +41
- name: Fetch full history for submodules
working-directory: website
run: git submodule foreach --recursive "git fetch --unshallow --all 2>/dev/null || git fetch --all"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this trick, as long as there is a comment above it explaining the intention.

@MakisH MakisH self-assigned this Mar 13, 2026
@MakisH
Copy link
Member

MakisH commented Mar 13, 2026

@Aayushman-nvm
Copy link
Author

@MakisH my bad I never got any notification of this review/comment, ill get working on this right away

@precice-bot
Copy link
Contributor

This pull request has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/introduction-website-modernization/2852/1

printf ' %s\n' "${hook_files[@]}" >&2
exit 1
fi
sed -i 's/item\.data\["last_modified_at"\] = Determinator/item.data["last_modified_at"] ||= Determinator/' "${hook_files[0]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aayushman-nvm Instead of patching this here, could you open a PR in the repo of the jekyll-last-modified-at gem and then use the Gemfile of this repo to point to this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsimonis the jekyll-last-modified-at plugin seems to have been abandoned, see #783 (review)

inject_dates.sh Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aayushman-nvm as you are essentially overwriting last_modified_at here, this leads me to an idea.

Could you instead use a minimal jekyll plugin to apply this to imported files? This should be pretty straight forward, as all files in imported/ need to be patched.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsimonis i think you should see #856 i have implemented the same solution you are recommending

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aayushman-nvm as you are essentially overwriting last_modified_at here, this leads me to an idea.

Could you instead use a minimal jekyll plugin to apply this to imported files? This should be pretty straight forward, as all files in imported/ need to be patched.

Yep for sure... I'm on it

@Aayushman-nvm
Copy link
Author

@fsimonis Ive addressed the reviews:

Replaced the previous solution (shell script + sed patch on the gem) with a native Jekyll generator plugin _plugins/imported_dates.rb

  • Scoped strictly to imported/
  • Batches git calls per submodule root instead of spawning one process per file, significantly faster on large submodules
  • Uses Open3.capture3 with argv array, no shell injection risk from filenames
  • Computes file paths relative to each submodule's git root, correctly handles files in subdirectories, not just submodule root-level files
  • Uses stable Jekyll API (site.pages + site.documents) throughout
  • Respects explicitly set last_modified_at frontmatter, never overwrites it
  • Skips files with no git history with a named warning instead of silently falling back to unreliable filesystem mtime
  • Removes the dependency on jekyll-last-modified-at gem entirely, no gem patching, no sed, no shell scripts

@Aayushman-nvm
Copy link
Author

As for testing:

Verified submodule has full git history:

git -C imported/tutorials log -1 --format="%ci" -- volume-coupled-flow/README.md
# 2026-02-03 12:37:38 +0100

Confirmed plugin produces matching date with no warnings:

bundle exec jekyll build 2>&1 | grep -i "ImportedLastModifiedAt"
# (no output, clean build)

Confirmed frontmatter override is respected added last_modified_at: 2026-03-17 00:00:00 to volume-coupled-flow/README.md, rebuilt, audit showed 2026-03-17 despite git having 2026-02-03. Removed after confirming.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

gem 'jekyll-sitemap'
gem 'jekyll-redirect-from'
gem "jekyll-last-modified-at"
gem 'jekyll-mermaid'
Comment on lines +66 to +70
cmd = ['git', '-C', git_root, 'log', '--format=%ct', '--name-only', '--diff-filter=AMRC', '--'] + rel_paths
# Use Open3 instead of backticks to avoid shell injection from filenames
stdout, _stderr, status = Open3.capture3(*cmd)
return result unless status.success?

current_ts = nil
stdout.each_line do |line|
line.chomp!
if line =~ /\A\d{10,}\z/
Comment on lines +38 to +41
# actions/checkout shallow-clones submodules regardless of fetch-depth: 0. This unshallows each submodule so _plugins/imported_dates.rb can resolve the correct last commit date per file during jekyll build.
- name: Fetch full history for submodules
working-directory: website
run: git submodule foreach --recursive "git fetch --unshallow --all 2>/dev/null || git fetch --all"
@MakisH
Copy link
Member

MakisH commented Mar 17, 2026

@Aayushman-nvm as discussed in #856, I appreciate the effort, but this solution still looks rather confusing to me.

Since you asked about Copilot in #856 (comment), I requested another Copilot review. I don't know what more it can do, but I have looked at both PRs and tried to understand them. Both seem to work, but I found the solution in #856 easier to understand.

I would say let's not invest more effort on this one. A few learning outcomes have been achieved anyway, despite merged or not.

@MakisH MakisH closed this Mar 17, 2026
@Aayushman-nvm
Copy link
Author

@MakisH i understand, its all good 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

GSoC Contributed in the context of the Google Summer of Code technical Technical issues on the website

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants