Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use git log know a documents last modified date #1573

Merged
merged 12 commits into from
Dec 2, 2020

Conversation

peterbe
Copy link
Contributor

@peterbe peterbe commented Oct 29, 2020

Part of #706

A lot going on here and I don't feel like it's complete yet. But I wanted to reach out for some early feedback.
But it works!
Screen Shot 2020-10-29 at 5 41 01 PM

The way it works is that you run yarn tool gather-git-history and it puts in a file called _githistory.json in /path/to/mdn/content/files/en-us/. Once that's in place when it builds documents it's able to extract the data for each document.
So, the building of documents will at least work.

The idea is that, in the Production Build workflow, we first run that command so that the content has these files. Then it can build.
But it will leave a file inside the mdn/content/files/*/_githistory.json which needs to be added to .gitignore in mdn/content. In the Production Build workflow, that doesn't matter.
Also, I think it would make sense that we modify the yarn start script in mdn/content so that it runs this command first.

But it's still a bit weird and I wish it wasn't so. Perhaps there's a better place to put it. Really open to ideas, but I felt I was getting stuck for better ideas so I opted for some code that we can change rather than no code and more thinking.

Lastly, I hacked the workflow to use infinite depth when it does the git clone. I think that's ok because it's we're only ever downloading one branch when we git clone. And who cares if it takes a big fat 30 seconds?

But I made the CLI command so that you can provide a file to read from and it dumps it all back. What we could do in GitHub Actions now is to use actions/cache to save this file. And it could be reused. But I'm still not sure if it's needed. Perhaps this prototype is over-engineered already.

build/git-history.js Outdated Show resolved Hide resolved
@peterbe
Copy link
Contributor Author

peterbe commented Nov 10, 2020

I have no idea what's causing these Buffer objects to be printed:

▶ yarn test:testing destructive
yarn run v1.22.4
$ cd testing && jest destructive
<Buffer 66 61 74 61 6c 3a 20 6e 6f 74 20 61 20 67 69 74 20 72 65 70 6f 73 69 74 6f 72 79 20 28 6f 72 20 61 6e 79 20 6f 66 20 74 68 65 20 70 61 72 65 6e 74 20 ... 19 more bytes>
<Buffer 66 61 74 61 6c 3a 20 6e 6f 74 20 61 20 67 69 74 20 72 65 70 6f 73 69 74 6f 72 79 20 28 6f 72 20 61 6e 79 20 6f 66 20 74 68 65 20 70 61 72 65 6e 74 20 ... 19 more bytes>
 PASS  tests/destructive.test.js
  fixing flaws
    ✓ can be run in dry-run mode (1635ms)
    ✓ can actually change the files (1633ms)

Test Suites: 1 passed, 1 total
Tests:       2 passed, 2 total
Snapshots:   0 total
Time:        3.932s, estimated 4s
Ran all test suites matching /destructive/i.
✨  Done in 4.87s.

It doesn't seem to be happening on master.

@peterbe peterbe marked this pull request as ready for review November 10, 2020 01:52
@peterbe
Copy link
Contributor Author

peterbe commented Nov 10, 2020

The only thing that isn't great here is that it seems to be printing some Buffer things when you run the destructive ideas.
If you have ideas to solve that @fiji-flo , mucho appreciated.

@@ -84,10 +84,19 @@ function execGit(args, opts = {}, root = null) {
args,
{
cwd: gitRoot,
// Default is 1MB
// That's rarely big enough for what we're using Yari for.
maxBuffer: 1024 * 1024 * 100, // 100MB

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth to pull the 100 to the front, since that eases reading.

@peterbe
Copy link
Contributor Author

peterbe commented Nov 11, 2020

@fiji-flo r? ping

Copy link
Contributor

@fiji-flo fiji-flo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good!

@peterbe
Copy link
Contributor Author

peterbe commented Dec 2, 2020

@fiji-flo Everything's working except the most important thing. The git log yields the wrong dates.
It seems to say that files/en-us/mdn/kitchensink/index.html last changed on Nov 20. That's what you get if you parse the git log ... command we use.
But as you can clearly see here:
Screen Shot 2020-12-02 at 10 53 36 AM
It was Nov 11. Not November 20.
And the commit sha is c22b4497.

If I run git log --name-only --no-decorate --format="COMMIT:%cI %h" --date-order --reverse -- files/en-us/mdn/kitchensink/index.html I get this output:

▶ git log --name-only --no-decorate --format="COMMIT:%cI %h" --date-order --reverse -- files/en-us/mdn/kitchensink/index.html
COMMIT:2020-09-17T15:33:46-04:00 3ebb1553

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-10-09T12:39:46-04:00 9f4723c2

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-10-27T21:07:25-04:00 c2db0049

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-10-28T08:48:25-04:00 4b6eb65b

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-11-03T13:57:41-05:00 9377a846

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-11-10T06:26:49-05:00 c3ada83a

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-11-10T14:46:11-05:00 f4c85bc2

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-11-10T17:33:05-05:00 6ae75146

files/en-us/mdn/kitchensink/index.html
COMMIT:2020-11-11T17:34:43+02:00 4b5a583a
COMMIT:2020-11-11T10:46:57-05:00 c22b4497

files/en-us/mdn/kitchensink/index.html

This time it gets it right!
(Note the -- files/en-us/mdn/kitchensink/index.html filtering I send to git log)

If I remove that filtering, files/en-us/mdn/kitchensink/index.html appears several more times.
E.g. mdn/content@deb5f1bc
Now let's look into that. That commit was part of mdn/content#122
To work on that pull request, what I did was that I made some changes to the file during the work on the pull request just to wake up the CI that depends on get-diff-action. But when I finalized the PR, I undid that change and when I merged the PR I used "Squash and merge" because, as you can see, in the final commit of the PR there's no change to files/en-us/mdn/kitchensink/index.html.

So I guess we need to figure out some way to run git log ... for all files without including any commits that never make it onto the main branch when squashed.

@peterbe
Copy link
Contributor Author

peterbe commented Dec 2, 2020

Using --first-parent solves it. But I need to try to understand it better.

@peterbe
Copy link
Contributor Author

peterbe commented Dec 2, 2020

@fiji-flo and I discussed a gnarly corner case that can happen.
If you make a PR, and commit file into that PR, but then "uncommit" the file so it's not part of the final squashed and merged PR commit, then it might actually count as a commit when you run git log ...
But if you use git log ... -- path/to/that/file it will NOT yield the same log.

The situation is extremely rare and probably not all that important. We can maybe figure it out some other day.

@peterbe peterbe merged commit 6c96d7f into master Dec 2, 2020
@peterbe peterbe deleted the 706-use-git-log-know-a-documents-last-modified-date branch December 2, 2020 17:53
@peterbe
Copy link
Contributor Author

peterbe commented Dec 2, 2020

For the record, here's the simplest way to demonstrate that above-mentioned problem:

git log main --stat | rg kitchensink -C 2
git log main --stat files/en-us/mdn/kitchensink/index.html

you'll potentially see a different set of commits. The latter one gives the right commit if you compare it with the GitHub file History UI.

schalkneethling pushed a commit that referenced this pull request Dec 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants