Skip to content

Commit

Permalink
feat: Added 'cache' parameter; an opt-in solution to performance regr…
Browse files Browse the repository at this point in the history
…essions caused by #1217 (#1255)

Closing a memory leak in #1217 by clearing the packfile cache at the end of each git operation has resulted in extraordinarily bad performance for some users trying to upgrade to 1.7.5 or higher. If you've been affected, take a look at https://isomorphic-git.org/docs/en/cache to learn how to opt-in to manual cache management to claw back the performance.
  • Loading branch information
billiegoose committed Nov 8, 2020
1 parent f2e3805 commit 26f761e
Show file tree
Hide file tree
Showing 71 changed files with 413 additions and 49 deletions.
123 changes: 123 additions & 0 deletions docs/cache.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
---
id: cache
title: The `cache` parameter
sidebar_label: cache
---

TL;DR see [Example](#example).

## Background

Some git commands can greatly benefit from a cache.
Reading and parsing git packfiles (the files sent over the wire during `clone`, `fetch`, `pull` and `push`) can take a "long" time for large git repositories.
(Here "long" is usually measured in milliseconds.)

For example, here is one of the absolute worst performing things you can do:

```js
// PLEASE DON'T DO THIS!! This is for demonstration purposes only.
const test = async () => {
console.time('time elapsed')
for (const filepath of await git.listFiles({ fs, dir })) {
console.log(`${filepath}: ${await git.status({ fs, dir, filepath })}`)
}
console.timeEnd('time elapsed')
}

test().catch(err => console.log(err))
```

Running this code on the `isomorphic-git` repo on my 2018 Macbook Pro takes over 2 minutes!

It is slow because every time you call `git.status` it has to re-read and re-parse one or more packfiles in `.git/objecs/pack`.
Each individual status may take relatively little time (10ms to 100ms) but if you have thousands of files that quickly adds up.

Naively doing it in parallel will not help!

```js
// PLEASE DON'T DO THIS!! This is for demonstration purposes only.
const test = async () => {
console.time(`time elapsed`)
const filepaths = await git.listFiles({ fs, dir })
await Promise.all(
filepaths.map(async filepath => {
console.log(`${filepath}: ${await git.status({ fs, dir, filepath })}`)
})
)
console.timeEnd(`time elapsed`)
}

test().catch(err => console.log(err))
```

This performs even worse than the first code snippet because now instead of reading and parsing the packfiles thousands of times in a row, you are doing the same workload in parallel!
It quickly consumed all 32 GB of memory on my Macbook and I had to kill it after 4 minutes.

You can write an extremely performant version of the above though using [`walk`](./walk.html).
That's what [`statusMatrix`](./statusMatrix.html) is.

```js
const test = async () => {
console.time(`time elapsed`)
const matrix = await git.statusMatrix({ fs, dir })
for (const [filepath, head, workdir, stage] of matrix) {
console.log(`${filepath}: ${head} ${workdir} ${stage}`)
}
console.timeEnd(`time elapsed`)
}

test().catch(err => console.log(err))
```

This runs in 843ms on my machine.

## The `cache` parameter

As you can see, you can easily write yourself into a performance trap using `isomorphic-git` commands in isolation.

Unlike canonical `git` commands however, there is a way for `isomorphic-git` commands to cache intermediate results
and re-use them between commands.
It used to do this by default, but that results in a memory leak if you never clear the cache.

There is no single best caching strategy:
- For long-running processes, you may want to monitor memory usage and discard the cache when memory usage is above some threshold.
- For memory constrained devices, you may want to not use a cache at all.

Instead of compromising, I've placed a powerful tool in your hands:
1. You pass in an ordinary `cache` object.
2. isomorphic-git stores data on it by setting Symbol properties.
3. Manipulating the `cache` directly will void your warranty ⚠️.
4. To clear the cache, remove any references to it so it is garbage collected.

## Example

Here's what the first example looks like re-written to use a shared `cache` parameter:

```js
// PLEASE DON'T DO THIS!! This is for demonstration purposes only.
const test = async () => {
console.time('time elapsed')
let cache = {}
for (const filepath of await git.listFiles({ fs, dir, cache })) {
console.log(`${filepath}: ${await git.status({ fs, dir, filepath, cache })}`)
}
console.timeEnd('time elapsed')
}

test().catch(err => console.log(err))
```

This code runs in under 8 seconds on my machine.
(Compare with over 2 minutes without the `cache` argument.)
Still nowhere as good as `statusMatrix`, but not everything you might want to do with isomorphic-git can be described by a [`walk`](./walk.html).

The catch of course, is you have to decide when (if ever) to get rid of that cache.
It is just a JavaScript object, so all you need to do is eliminate any references to it and it will be garbage collected.

```js
// 1. Create a cache
let cache = {}
// 2. Do some stuff
// 3. Replace cache with new object so old cache is garbage collected
cache = {}
```
3 changes: 2 additions & 1 deletion src/api/add.js
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ import { join } from '../utils/join.js'
* @param {string} args.dir - The [working tree](dir-vs-gitdir.md) directory path
* @param {string} [args.gitdir=join(dir, '.git')] - [required] The [git directory](dir-vs-gitdir.md) path
* @param {string} args.filepath - The path to the file to add to the index
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<void>} Resolves successfully once the git index has been updated
*
Expand All @@ -31,6 +32,7 @@ export async function add({
dir,
gitdir = join(dir, '.git'),
filepath,
cache = {},
}) {
try {
assertParameter('fs', _fs)
Expand All @@ -39,7 +41,6 @@ export async function add({
assertParameter('filepath', filepath)

const fs = new FileSystem(_fs)
const cache = {}
await GitIndexManager.acquire({ fs, gitdir, cache }, async function(index) {
await addToIndex({ dir, gitdir, fs, filepath, index })
})
Expand Down
3 changes: 2 additions & 1 deletion src/api/addNote.js
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ import { normalizeCommitterObject } from '../utils/normalizeCommitterObject.js'
* @param {number} [args.committer.timestamp=Math.floor(Date.now()/1000)] - Set the committer timestamp field. This is the integer number of seconds since the Unix epoch (1970-01-01 00:00:00).
* @param {number} [args.committer.timezoneOffset] - Set the committer timezone offset field. This is the difference, in minutes, from the current timezone to UTC. Default is `(new Date()).getTimezoneOffset()`.
* @param {string} [args.signingKey] - Sign the note commit using this private PGP key.
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<string>} Resolves successfully with the SHA-1 object id of the commit object for the added note.
*/
Expand All @@ -48,6 +49,7 @@ export async function addNote({
author: _author,
committer: _committer,
signingKey,
cache = {},
}) {
try {
assertParameter('fs', _fs)
Expand All @@ -58,7 +60,6 @@ export async function addNote({
assertParameter('onSign', onSign)
}
const fs = new FileSystem(_fs)
const cache = {}

const author = await normalizeAuthorObject({ fs, gitdir, author: _author })
if (!author) throw new MissingNameError('author')
Expand Down
4 changes: 3 additions & 1 deletion src/api/annotatedTag.js
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ import { normalizeAuthorObject } from '../utils/normalizeAuthorObject.js'
* @param {string} [args.gpgsig] - The gpgsig attatched to the tag object. (Mutually exclusive with the `signingKey` option.)
* @param {string} [args.signingKey] - Sign the tag object using this private PGP key. (Mutually exclusive with the `gpgsig` option.)
* @param {boolean} [args.force = false] - Instead of throwing an error if a tag named `ref` already exists, overwrite the existing tag. Note that this option does not modify the original tag object itself.
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<void>} Resolves successfully when filesystem operations are complete
*
Expand Down Expand Up @@ -56,6 +57,7 @@ export async function annotatedTag({
object,
signingKey,
force = false,
cache = {},
}) {
try {
assertParameter('fs', _fs)
Expand All @@ -72,7 +74,7 @@ export async function annotatedTag({

return await _annotatedTag({
fs,
cache: {},
cache,
onSign,
gitdir,
ref,
Expand Down
4 changes: 3 additions & 1 deletion src/api/checkout.js
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ import { join } from '../utils/join.js'
* @param {boolean} [args.noUpdateHead] - If true, will update the working directory but won't update HEAD. Defaults to `false` when `ref` is provided, and `true` if `ref` is not provided.
* @param {boolean} [args.dryRun = false] - If true, simulates a checkout so you can test whether it would succeed.
* @param {boolean} [args.force = false] - If true, conflicts will be ignored and files will be overwritten regardless of local changes.
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<void>} Resolves successfully when filesystem operations are complete
*
Expand Down Expand Up @@ -69,6 +70,7 @@ export async function checkout({
noUpdateHead = _ref === undefined,
dryRun = false,
force = false,
cache = {},
}) {
try {
assertParameter('fs', fs)
Expand All @@ -78,7 +80,7 @@ export async function checkout({
const ref = _ref || 'HEAD'
return await _checkout({
fs: new FileSystem(fs),
cache: {},
cache,
onProgress,
dir,
gitdir,
Expand Down
4 changes: 3 additions & 1 deletion src/api/clone.js
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ import { join } from '../utils/join.js'
* @param {string[]} [args.exclude = []] - A list of branches or tags. Instructs the remote server not to send us any commits reachable from these refs.
* @param {boolean} [args.relative = false] - Changes the meaning of `depth` to be measured from the current shallow depth rather than from the branch tip.
* @param {Object<string, string>} [args.headers = {}] - Additional headers to include in HTTP requests, similar to git's `extraHeader` config
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<void>} Resolves successfully when clone completes
*
Expand Down Expand Up @@ -69,6 +70,7 @@ export async function clone({
noCheckout = false,
noTags = false,
headers = {},
cache = {},
}) {
try {
assertParameter('fs', fs)
Expand All @@ -81,7 +83,7 @@ export async function clone({

return await _clone({
fs: new FileSystem(fs),
cache: {},
cache,
http,
onProgress,
onMessage,
Expand Down
3 changes: 2 additions & 1 deletion src/api/commit.js
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ import { normalizeCommitterObject } from '../utils/normalizeCommitterObject.js'
* @param {string} [args.ref] - The fully expanded name of the branch to commit to. Default is the current branch pointed to by HEAD. (TODO: fix it so it can expand branch names without throwing if the branch doesn't exist yet.)
* @param {string[]} [args.parent] - The SHA-1 object ids of the commits to use as parents. If not specified, the commit pointed to by `ref` is used.
* @param {string} [args.tree] - The SHA-1 object id of the tree to use. If not specified, a new tree object is created from the current git index.
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<string>} Resolves successfully with the SHA-1 object id of the newly created commit.
*
Expand Down Expand Up @@ -64,6 +65,7 @@ export async function commit({
ref,
parent,
tree,
cache = {},
}) {
try {
assertParameter('fs', _fs)
Expand All @@ -72,7 +74,6 @@ export async function commit({
assertParameter('onSign', onSign)
}
const fs = new FileSystem(_fs)
const cache = {}

const author = await normalizeAuthorObject({ fs, gitdir, author: _author })
if (!author) throw new MissingNameError('author')
Expand Down
11 changes: 9 additions & 2 deletions src/api/expandOid.js
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import { join } from '../utils/join.js'
* @param {string} [args.dir] - The [working tree](dir-vs-gitdir.md) directory path
* @param {string} [args.gitdir=join(dir,'.git')] - [required] The [git directory](dir-vs-gitdir.md) path
* @param {string} args.oid - The shortened oid prefix to expand (like "0414d2a")
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<string>} Resolves successfully with the full oid (like "0414d2a286d7bbc7a4a326a61c1f9f888a8ab87f")
*
Expand All @@ -22,14 +23,20 @@ import { join } from '../utils/join.js'
* console.log(oid)
*
*/
export async function expandOid({ fs, dir, gitdir = join(dir, '.git'), oid }) {
export async function expandOid({
fs,
dir,
gitdir = join(dir, '.git'),
oid,
cache = {},
}) {
try {
assertParameter('fs', fs)
assertParameter('gitdir', gitdir)
assertParameter('oid', oid)
return await _expandOid({
fs: new FileSystem(fs),
cache: {},
cache,
gitdir,
oid,
})
Expand Down
4 changes: 3 additions & 1 deletion src/api/fastForward.js
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ import { join } from '../utils/join.js'
* @param {string} [args.corsProxy] - Optional [CORS proxy](https://www.npmjs.com/%40isomorphic-git/cors-proxy). Overrides value in repo config.
* @param {boolean} [args.singleBranch = false] - Instead of the default behavior of fetching all the branches, only fetch a single branch.
* @param {Object<string, string>} [args.headers] - Additional headers to include in HTTP requests, similar to git's `extraHeader` config
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<void>} Resolves successfully when pull operation completes
*
Expand Down Expand Up @@ -57,6 +58,7 @@ export async function fastForward({
corsProxy,
singleBranch,
headers = {},
cache = {},
}) {
try {
assertParameter('fs', fs)
Expand All @@ -72,7 +74,7 @@ export async function fastForward({

return await _pull({
fs: new FileSystem(fs),
cache: {},
cache,
http,
onProgress,
onMessage,
Expand Down
4 changes: 3 additions & 1 deletion src/api/fetch.js
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ import { join } from '../utils/join.js'
* @param {boolean} [args.pruneTags] - Prune local tags that don’t exist on the remote, and force-update those tags that differ
* @param {string} [args.corsProxy] - Optional [CORS proxy](https://www.npmjs.com/%40isomorphic-git/cors-proxy). Overrides value in repo config.
* @param {Object<string, string>} [args.headers] - Additional headers to include in HTTP requests, similar to git's `extraHeader` config
* @param {object} [args.cache] - a [cache](cache.md) object
*
* @returns {Promise<FetchResult>} Resolves successfully when fetch completes
* @see FetchResult
Expand Down Expand Up @@ -87,6 +88,7 @@ export async function fetch({
headers = {},
prune = false,
pruneTags = false,
cache = {},
}) {
try {
assertParameter('fs', fs)
Expand All @@ -95,7 +97,7 @@ export async function fetch({

return await _fetch({
fs: new FileSystem(fs),
cache: {},
cache,
http,
onProgress,
onMessage,
Expand Down
4 changes: 3 additions & 1 deletion src/api/findMergeBase.js
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,15 @@ import { join } from '../utils/join.js'
* @param {string} [args.dir] - The [working tree](dir-vs-gitdir.md) directory path
* @param {string} [args.gitdir=join(dir,'.git')] - [required] The [git directory](dir-vs-gitdir.md) path
* @param {string[]} args.oids - Which commits
* @param {object} [args.cache] - a [cache](cache.md) object
*
*/
export async function findMergeBase({
fs,
dir,
gitdir = join(dir, '.git'),
oids,
cache = {},
}) {
try {
assertParameter('fs', fs)
Expand All @@ -29,7 +31,7 @@ export async function findMergeBase({

return await _findMergeBase({
fs: new FileSystem(fs),
cache: {},
cache,
gitdir,
oids,
})
Expand Down
Loading

0 comments on commit 26f761e

Please sign in to comment.