refactor the Document.findAll() function #3537
One way of testing is to trigger the
const { translationsOf } = require("./content/translations");
translationsOf({ slug: "whatever", locale: "whatever" });
If I run this over and over I get the following outputs (thanks to this):
And if I run the same exact thing on
Now, the question is, does it matter? The only times we run the whole entire Also, this slower |
I love those numbers, Peter. Here are some musings from reading it.
content/document.test.js
const Document = require("./document");

describe("test Document.findAll()", () => {
-describe("test Document.findAll()", () => {
+describe("Document.findAll()", () => {
describe("test Document.findAll()", () => {
  it("should always return files that exist", () => {
    const filePaths = [...Document.findAll().iter({ pathOnly: true })];
    expect(filePaths.every((value) => fs.existsSync(value))).toBeTruthy();
Isn't this just testing whether fdir works? We are never adding to what it finds, so this should be a given. But I'm cool with testing it nonetheless; 3 lines for making sure the lib is working as expected is still a good use of space in my book.
content/document.js
.filter((filePath) => {
  // Note! Due to a bug in fdir, any accidental errors throw here are
  // swallowed!
  // See https://github.com/thecodrr/fdir/issues/56
Looks like we got a reply over in that issue. If .withErrors() is not what we want, we might want to nest all of this in our own try/catch so that we can at least get some logs for it.
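A minimal sketch of the try/catch idea, assuming the predicate is an ordinary function handed to `.filter()`. The helper name `withLoggedErrors` is hypothetical and is not part of fdir or of this PR; it only shows how an exception could be logged before it has a chance to be swallowed.

```javascript
// Hypothetical helper (not part of fdir or this codebase): wraps a filter
// predicate so that any exception is logged before being re-thrown.
function withLoggedErrors(predicate, log = console.error) {
  return (filePath, ...rest) => {
    try {
      return predicate(filePath, ...rest);
    } catch (error) {
      // Log first, so the failure is visible even if the caller swallows it.
      log(`filter predicate failed for ${filePath}:`, error);
      throw error;
    }
  };
}

// Example predicate that fails on one specific input.
const onlyIndexFiles = withLoggedErrors((filePath) => {
  if (filePath.includes("broken")) {
    throw new Error("unexpected path");
  }
  return filePath.endsWith("index.html");
});

console.log(onlyIndexFiles("/files/en-us/foo/index.html"));
```

The wrapped predicate would then be passed wherever the plain one was, for example `.filter(withLoggedErrors((filePath) => { ... }))`.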
@@ -197,32 +198,70 @@ function unarchive(document, move) {
   return created;
 }

-const read = memoize((folder) => {
+const read = memoize((folderOrFilePath) => {
Wondering if this PR should not be limited to changing findAll()? From what I understand this changes read()'s API, don't we have callers that are passing folders in?
Ah I see, I was only looking at the if-branch. Still wondering though whether we can reap the findAll-speed improvements without changing fn
You're absolutely right!
But... may I ask to make an exception to that general rule here? Pleeeease.
The argument is that the two most important uses of this code are when we do prod builds, which is a big yarn build, and when we do PR CI, which builds the specified files (based on the git diff).
In both those cases it's actually smart to pass in a full absolute file path to read() because you don't need to loop through the roots to figure out where it came from.
But yes, what could happen is that the yarn build causes a call of read('/home/path/to/absolute/content/files/en-us/foo/index.html') first and then later some KS sidebar or some other code internally triggers a call to read('/en-us/foo') which is the same result but a different memoization key. But I would argue that the cache is not important enough to protect against that. The read() is so fast and because of the LRU (max 2,000 entries) you'll end up calling the read() with the same key many times anyway and not benefitting from the memoization.
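To make the cache-key point concrete, here is a small sketch. The `memoize` below is a stand-in written for this illustration (a plain `Map` keyed on the first argument, with no LRU eviction), not the project's actual implementation, and the `read` body is a placeholder that only counts how often the underlying function runs.

```javascript
// Stand-in memoizer for illustration only: caches by the exact argument value.
function memoize(fn) {
  const cache = new Map();
  return (key) => {
    if (!cache.has(key)) {
      cache.set(key, fn(key));
    }
    return cache.get(key);
  };
}

let uncachedReads = 0;

// Placeholder for read(): pretend both inputs resolve to the same document.
const read = memoize((folderOrFilePath) => {
  uncachedReads += 1;
  return { url: "/en-US/docs/foo" };
});

read("/home/path/to/absolute/content/files/en-us/foo/index.html"); // miss
read("/en-us/foo"); // miss: same document, different key
read("/en-us/foo"); // hit

console.log(uncachedReads); // 2
```

The same document is read twice because the two call styles produce two distinct keys, which is the trade-off being accepted here.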
So I violated that rule a bit by changing both findAll() and read(), and that's not great. (But hopefully my explanation and justification can make up for that.)
However, perhaps the correct API is that read() should always and only expect a filePath as the one and only argument. And if you only have a folder (e.g. fr/web/html/element/video) you should be using some other readByFolder() wrapper that can turn fr/web/html/element/video into /full/path/to/translated-content/files/fr/web/html/element/video/index.html before it goes into read().
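A rough sketch of what such a split could look like. Everything here is an assumption made for illustration: `resolveFilePath`, `makeReadByFolder`, the `roots` array and the injectable `exists` check are hypothetical names, and only the `index.html` file name comes from the example above.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical: turn a folder such as "fr/web/html/element/video" into the
// absolute index.html path under the first root that actually contains it.
function resolveFilePath(folder, roots, exists = fs.existsSync) {
  for (const root of roots) {
    const filePath = path.join(root, folder, "index.html");
    if (exists(filePath)) {
      return filePath;
    }
  }
  return null;
}

// Hypothetical wrapper: read() itself only ever sees absolute file paths,
// so its memoization key is always the same for a given document.
function makeReadByFolder(read, roots, exists = fs.existsSync) {
  return (folder) => {
    const filePath = resolveFilePath(folder, roots, exists);
    return filePath === null ? null : read(filePath);
  };
}
```

With this shape, the root lookup lives in one place and callers that already hold an absolute path can skip it entirely.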
Oh I didn't even consider the bits about the cache. Maybe we should then as part of this factor out the finding of the correct path so that the cache still gets hit? Or do you prefer getting this in first? It seems worthwhile to me to reduce the responsibilities of read a little.
Commented on the last outstanding bit, it's your call. Choose your own adventure ⚔️
@Gregoor I researched it. A lot.
Then I switch to this branch in this PR and measure again. Now, because the
In other words, the |
I still think we should split it up so that |
* refactor the Document.findAll() function
* more refactoring
* cleaning up
* adding a very simply unit test
* adding .withErrors() to avoid error swallowing
* feedbacked
Here's how I tested the performance gain:
Just before deleting/replacing the old findAll function, I renamed it to findAll0 just so I can make this benchmark.
Then I run this: https://gist.github.com/peterbe/318bab87834580c384b763da08919d25
First with and then without CONTENT_TRANSLATED_ROOT set.
WITH:
WITHOUT:
(note that the benchmark makes sure that the number of found files is equal every time)
So basically, the new function is 10x faster.
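For readers who don't open the gist, the general shape of such a comparison can be sketched as follows. This is not the gist's code: the `benchmark` and `median` helpers are hypothetical, and each implementation is assumed to be wrapped in a function that returns the number of files it found, so the counts can be compared on every run.

```javascript
const { performance } = require("perf_hooks");

// Median of a list of numbers (does not mutate the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical harness: times each implementation several times and insists
// that every run of every implementation reports the same file count.
function benchmark(implementations, runs = 5) {
  const results = {};
  let expectedCount = null;
  for (const [name, countFiles] of Object.entries(implementations)) {
    const timings = [];
    for (let i = 0; i < runs; i++) {
      const start = performance.now();
      const count = countFiles();
      timings.push(performance.now() - start);
      if (expectedCount === null) {
        expectedCount = count;
      } else if (count !== expectedCount) {
        throw new Error(`${name} found ${count} files, expected ${expectedCount}`);
      }
    }
    results[name] = { count: expectedCount, medianMs: median(timings) };
  }
  return results;
}
```

Each entry passed to `benchmark` would wrap one implementation, for example the old and the new function, and return how many files it found.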