Fetch all branches that match a pattern, ideas for API enhancements, and some bugs that were found #43
Somewhat intentionally, isomorphic-git doesn't grab all the git history when it does a clone; it only grabs the history of one branch at a time. In this case "fetch" just grabbed "master" since that's the default for this repo. This is my way of keeping people from shooting themselves in the foot by cloning megabytes more than they need to.

Maybe I need to rethink that and go with the more generous "if no branch is specified fetch ALL the branches" instead of just fetching the default? Or maybe do shallow clones of all the branches by default? Or simply document the current behavior with a big bold "notice" in the documentation and mention in the README that this behavior differs from canonical git? What do you think?

Edit: since the error message is unhelpful, I've labeled this a bug. isomorphic-git needs to be smart enough to realize what has happened and say something like "Error: Tried to checkout a branch that isn't available locally - do git.fetch({ref: 'gh-pages'}) to make the branch available locally".

As for this particular case, here's how you would do it using "fetch":
const git = require('isomorphic-git')
const fs = require('fs')
;(async () => {
const dir = 'isogit'
await git.init({
fs,
dir
})
await git.config({
fs,
dir,
path: 'remote.origin.url',
value: 'https://github.com/isomorphic-git/isomorphic-git.git'
})
await git.fetch({
fs,
dir,
remote: 'origin',
ref: 'gh-pages'
})
await git.checkout({
fs,
dir,
remote: 'origin',
ref: 'gh-pages',
})
})()
but that can be simplified to:
const git = require('isomorphic-git')
const fs = require('fs')
;(async () => {
await git.clone({
fs,
dir: 'isogit',
url: 'https://github.com/isomorphic-git/isomorphic-git.git',
ref: 'gh-pages'
})
})()
However, on my Windows machine I'm getting an error saying it cannot create |
I had fooled myself into thinking this had to do with orphaned branches because the master and develop branches were pointing to the same commit. I see now that it has to do with any branches that aren't the same. It makes sense to me that you only fetch branches on demand. That's often what you want. What I'm trying to do is grab content out of any branches that match a pattern (e.g., v*). So I need a way to get a list of remote branches so I know which ones to fetch. Currently, the Then I'd be able to iterate over the branches that match the filter and collect the files. The workaround is to dive into .git/refs/remotes/origin/ myself to find matching branches, but it's a little crowded in there, so an API method would be much nicer. |
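As a minimal sketch of the pattern matching described above, assuming the remote branch names are already available as an array of strings: the helper names `globToRegExp` and `filterBranches` and the sample branch list are hypothetical, not part of isomorphic-git, and only `*` is supported as a wildcard.

```javascript
// Convert a simple glob such as 'v*' into an anchored regular expression.
// Only '*' is treated as a wildcard; every other character is matched literally.
function globToRegExp (pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp('^' + escaped + '$')
}

// Keep only the branch names that match at least one of the given patterns.
function filterBranches (branchNames, patterns) {
  const matchers = patterns.map(globToRegExp)
  return branchNames.filter((name) => matchers.some((re) => re.test(name)))
}

// Hypothetical list of remote branch names (made up for illustration).
const remoteBranches = ['HEAD', 'master', 'develop', 'v1.x', 'v2.x', 'gh-pages']
const matching = filterBranches(remoteBranches, ['v*'])
console.log(matching)
```

The filtered names could then be passed one at a time as the `ref` of a fetch, as the scripts in this thread do.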
While we're on the topic of cloning/fetching/checking out, is there a way to suppress the progress messages written to stdout?
|
I'll take out the progress messages going to stdout. I'm looking for a good / great way to handle progress messages. Have you seen any APIs that do it really well? I like the simplicity of returning Promises for "git.clone" etc, but returning an EventEmitter would allow a lot more flexibility. |
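One way the two ideas could be combined, sketched under assumptions: the operation still returns a Promise, and the caller may pass in an EventEmitter to receive progress. `fakeClone`, its `steps` option, and the `'progress'` event name are all hypothetical stand-ins for illustration, not an isomorphic-git API.

```javascript
const { EventEmitter } = require('events')

// Hypothetical long-running operation (NOT an isomorphic-git API).
// It returns a Promise for the result and reports progress on an optional emitter.
async function fakeClone ({ steps, emitter }) {
  for (let done = 1; done <= steps; done++) {
    // Stand-in for a unit of real work (network read, object unpacking, ...)
    await new Promise((resolve) => setImmediate(resolve))
    if (emitter) emitter.emit('progress', { done, total: steps })
  }
  return 'finished'
}

const emitter = new EventEmitter()
const seen = []
emitter.on('progress', ({ done, total }) => seen.push(`${done}/${total}`))

// The caller still gets a plain Promise back.
const result = fakeClone({ steps: 3, emitter })
result.then((value) => console.log(value, seen))
```

Callers that don't care about progress simply omit the emitter, and nothing is written to stdout.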
Here's the script I came up with to grab information from the package.json file from each reference in this repository and report information about it (in this case, the version).
const fs = require('fs')
const git = require('isomorphic-git')
const path = require('path')
;(async () => {
const url = 'https://github.com/isomorphic-git/isomorphic-git.git'
const dir = 'isogit'
const originRefsDir = path.join(dir, '.git/refs/remotes/origin')
await git.fetch({ fs, dir, url })
// QUESTION is there a better way to get the HEAD/main branch?
const mainBranchInfo = fs.readFileSync(path.join(originRefsDir, 'HEAD'), 'utf8')
const mainBranchName = mainBranchInfo.slice(mainBranchInfo.lastIndexOf('/') + 1, mainBranchInfo.length).trim()
const branchNames = fs.readdirSync(originRefsDir).filter((ref) => ref.charAt(0) === 'v' && !ref.endsWith('^{}'))
const data = []
const errors = []
for (let i = 0, len = branchNames.length; i < len; i++) {
const branchName = branchNames[i]
if (branchName !== mainBranchName) await git.fetch({ fs, dir, url, ref: branchName })
try {
await git.checkout({ fs, dir, remote: 'origin', ref: branchName })
} catch (e) {
errors.push(e)
}
data.push(JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf8')).version)
}
data.forEach((it) => console.log(it))
})()
There are two things I'd like to know.
|
nodegit seems to do a decent job. I've been able to use those hooks in the past to create a progress bar. |
I would also study got. It's very well done. |
Not currently. I've been meaning to expose a
Tags are stored in |
+1 I was thinking that too.
But only after you fetch it, right? As you can see from my script, I can't tell before I fetch whether I'm fetching a tag or a branch. I'd like to exclude tags. |
OK, it's weird, but tags aren't associated with a remote AFAICT. So right now, you should be fine because you shouldn't find any tags in Oh. Huh. Well, that's a bug. Canonical git doesn't do that. Mine is accidentally dumping tags in there. It should be dumping them in |
👍 |
Are there any loose ends to tie up here? |
I'll refactor the code based on the latest master and see what's still sticking out. Stay tuned. |
We're looking pretty good! Here's how the updated code looks (which reports the package version for each branch):
const fs = require('fs-extra')
const git = require('isomorphic-git')
const path = require('path')
;(async () => {
const url = 'https://github.com/isomorphic-git/isomorphic-git.git'
const dir = 'isogit'
const depth = 1
const repo = { fs, dir }
const originBaseRef = 'refs/remotes/origin/'
if (process.argv[2] === '--clean') await fs.remove(dir)
await fs.pathExists(dir)
.then((exists) => exists ? undefined : git.fetch({ ...repo, url, depth }))
const defaultBranchName =
(await git.resolveRef({ ...repo, ref: originBaseRef + 'HEAD', depth: 2 }))
.replace(originBaseRef, '')
const branchNames = (await git.listBranches({ ...repo, remote: 'origin' }))
.filter((name) => name !== 'HEAD')
async function isBranchFetched(branchName) {
return git.readObject({ ...repo, oid: (await git.resolveRef({ ...repo, ref: originBaseRef + branchName })) })
.then(() => true)
.catch(() => false)
}
async function extractVersion(branchName) {
const sha = await git.resolveRef({ ...repo, ref: originBaseRef + branchName })
const { object: { tree } } = await git.readObject({ ...repo, oid: sha })
const { object: { entries } } = await git.readObject({ ...repo, oid: tree })
const packageEntry = entries.find((entry) => entry.path === 'package.json')
if (packageEntry) {
const { object: pkg } = await git.readObject({ ...repo, oid: packageEntry.oid })
return JSON.parse(pkg.toString('utf8')).version
}
}
const data = []
for (let i = 0, len = branchNames.length; i < len; i++) {
const branchName = branchNames[i]
if (branchName !== defaultBranchName && !(await isBranchFetched(branchName))) {
await git.fetch({ ...repo, url, ref: branchName, depth })
}
const version = await extractVersion(branchName)
if (version) {
data.push(branchName + ': ' + version)
} else {
console.log('package.json not found in branch: ' + branchName)
}
}
data.forEach((it) => console.log(it))
})()
First, let me just say that the fetch depth is a killer feature. nodegit is lacking that, and it's sorely needed. Here's my wishlist:
Even without these improvements, I can already see this library standing shoulder-to-shoulder with nodegit, which is damn exciting. |
Here's an equivalent of this script written using nodegit.
const fs = require('fs-extra')
const git = require('nodegit')
const path = require('path')
;(async () => {
const url = 'https://github.com/isomorphic-git/isomorphic-git.git'
const dir = 'isogit/.git'
const fetchOpts = { callbacks: { certificateCheck: () => 1 } }
if (process.argv[2] === '--clean') await fs.remove(path.dirname(dir))
const repo = await fs.pathExists(dir)
.then((exists) => exists ? git.Repository.open(dir) : git.Clone.clone(url, dir, { bare: 1, fetchOpts }))
const refs = await repo.getReferences(git.Reference.TYPE.OID)
const branchNames = refs.reduce((accum, ref) => {
const segments = ref.name().split('/')
if (segments[1] === 'remotes' && segments[2] === 'origin') accum.push(segments.slice(3).join('/'))
return accum
}, [])
async function extractVersion(branchName) {
const tree = await repo.getBranchCommit('origin/' + branchName).then((commit) => commit.getTree())
try {
const packageBlob = await tree.getEntry('package.json').then((entry) => entry.getBlob())
return JSON.parse(packageBlob.content().toString('utf8')).version
} catch (e) {}
}
const data = []
for (let i = 0, len = branchNames.length; i < len; i++) {
const branchName = branchNames[i]
const version = await extractVersion(branchName)
if (version) {
data.push(branchName + ': ' + version)
} else {
console.log('package.json not found in branch: ' + branchName)
}
}
data.forEach((it) => console.log(it))
})() |
Amazing, amazing feedback. I'm going to respond piecemeal bc I'm on the road today from my phone, so I apologise in advance.
This is trickier said than done. The way you're doing it is the most robust - there are a number of edge cases where the object you want to read might not be available and catching that error is a sound approach. Maybe the branch is only fetched to a certain depth, or maybe it was fetched but there was a force push to the remote meanwhile.

I'm thinking what would actually speed up this code is to fetch all the branches at once. Then you'd only have to make one or two HTTP requests, and only one or two packfiles instead of one per branch (which I assume is what happens when your code runs? check .git/objects/pack and let me know, I never actually tested with multiple packfiles in a single repo IIRC). So either I should add a way to fetch all the branches up front, or make that the default behavior, or allow specifying a list of branches to fetch instead of just one, or some combination of all those ideas. But fetching multiple branches at once should take the number of HTTP requests from O(n) in the number of branches down to O(1). |
Ooh! That's a great idea. That way listBranches and resolveRef work exactly the same way, and it elevates 'remote' to a common abstraction. You won't have to be aware of the filesystem implementation ( |
I was planning to return a BlobDescription of some sort, but that's the thing about blobs: they've got no metadata. All the metadata (filename, executable bit, etc) is in the tree. |
I agree it's a little inconvenient to use Array.find or a for loop through the list to get the entries you care about, but I don't want to trade away simplicity for convenience. What's nice about TreeDescription and CommitDescription is they are just data structures. They're not strictly JSON because Buffer isn't a valid JSON object, but they are "structured clone"-able so you should be able to copy them and send them with I might be persuaded differently later, but right now it seems much easier to say "tree.entries is an array of objects that have a type, a path, a mode, and an oid property" than to say "tree is an object with methods" and then have to document all the methods, and then deal with how you serialize the objects, and users who want to subclass them, etc. |
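To illustrate the "just data structures" point, here is a small sketch. The `tree` object is a hypothetical value shaped the way the comment describes (entries with a type, a path, a mode, and an oid); the oids are made-up placeholders. Node's built-in `v8.serialize` and `v8.deserialize` are used only as a convenient way to show a structured-clone-style round trip.

```javascript
const v8 = require('v8')

// Hypothetical tree description: entries is an array of plain objects
// with type, path, mode, and oid properties. The oids are placeholders.
const tree = {
  entries: [
    { mode: '100644', path: 'README.md', oid: 'a'.repeat(40), type: 'blob' },
    { mode: '100644', path: 'package.json', oid: 'b'.repeat(40), type: 'blob' },
    { mode: '040000', path: 'src', oid: 'c'.repeat(40), type: 'tree' }
  ]
}

// Because it is plain data, ordinary array methods are all that is needed.
const packageEntry = tree.entries.find((entry) => entry.path === 'package.json')
const subtrees = tree.entries
  .filter((entry) => entry.type === 'tree')
  .map((entry) => entry.path)

// A serialize/deserialize round trip yields an equal but independent copy,
// with no methods or class identity to preserve.
const copy = v8.deserialize(v8.serialize(tree))
console.log(packageEntry.oid, subtrees, copy !== tree)
```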
I get that. My point, though, is that when I pass |
My concern isn't the convenience factor. I just want to make sure that I'm not having it perform operations it doesn't need to perform. I'm only interested if |
That would be a welcome addition! 👏 What I would like to be able to do is fetch branches that match a pattern in the most efficient way possible. One way is to get a list of branch names in the repository, filter them, then tell fetch to retrieve them. Another (perhaps alternate) option would be just to have fetch take a collection of include patterns. That's what I'm essentially doing anyway. Something like: |
I'll probably end up providing both. Some way to do remote reference discovery that essentially exposes |
I just realized that TreeDescription#entries only returns a single level. Would it be possible to have it work recursively like Example:
|
Git stores each tree as a separate object, and retrieving objects with readObject should be just as efficient as anything I could implement "internally" in the library. So I don't think there's a performance gain to be had by adding a recursive option. If anything, the optimal perf will be when you can take advantage of use-case-specific knowledge and choose which directories to recurse into (src, doc) and which to skip (tests, dist).

And because the worst case performance of a recursive read would be really bad, people would want more features like "ignore rules" and "recursion depth limit". Does the "-r" option still return a flat list or would it return nested lists? If it returns nested lists but only with the "-r" option, how do I describe that in the TypeScript definition file that provides IDE autocompletion for the library? It might snowball endlessly...

But that sucks right? Because a simple "I want a list of all the files in a git commit" shouldn't be this arduous task that requires reinventing the wheel every time. And heck I'd use a library that let me list recursively using globbing syntax to match (and exclude!) files and options for recursion depth, and keyword search, and filtering results by file size, and more! (deep breath)

In the meantime, I think I'll add a section to the README and to the documentation for "cut-and-paste" useful code snippets for tasks that probably shouldn't be in the core of the library, yet are common enough that you shouldn't have to think about how to do them. They'd also double as useful Examples of how to use the core library, and possibly serve a third duty as answers to FAQs. |
On a pragmatic note, if you haven't figured out how to recursively list the entries of a tree, let me know. I assume it's trivial but unless I actually work it out I can't be certain. |
I've thought about this more and I actually agree with you. I have a lot more control being able to decide when to descend and when not to. I also don't have to worry about paths being created incorrectly on Windows since I receive them one level at a time. What might be nice, however, is a tree walker like nodegit. That would just help manage the recursion, but still give me a callback to decide whether to keep going, stop, or whatever. But, of course, I can implement such a thing in my own application if necessary. |
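A rough sketch of what such an application-level walker might look like, under stated assumptions: `walkTree`, `readTree`, and `shouldDescend` are hypothetical names, and `readTree` is an injected async function that resolves an oid to `{ entries }`. In a real script it could be a thin wrapper around the `git.readObject` calls used earlier in this thread; here an in-memory store with made-up oids stands in for the repository.

```javascript
// Walk a tree one level at a time and collect file paths.
// readTree(oid) is an injected async function (hypothetical here) that
// resolves to { entries: [{ type, path, oid }] }.
// shouldDescend(path, entry) lets the caller decide which directories to enter.
async function walkTree ({ readTree, oid, shouldDescend = () => true, prefix = '' }) {
  const files = []
  const { entries } = await readTree(oid)
  for (const entry of entries) {
    // Paths are joined with '/' explicitly, regardless of the host OS.
    const fullPath = prefix + entry.path
    if (entry.type === 'tree') {
      if (shouldDescend(fullPath, entry)) {
        const nested = await walkTree({
          readTree,
          oid: entry.oid,
          shouldDescend,
          prefix: fullPath + '/'
        })
        files.push(...nested)
      }
    } else {
      files.push(fullPath)
    }
  }
  return files
}

// In-memory stand-in for an object store, with made-up oids.
const fakeStore = {
  root: { entries: [
    { type: 'blob', path: 'package.json', oid: 'b1' },
    { type: 'tree', path: 'src', oid: 't1' },
    { type: 'tree', path: 'dist', oid: 't2' }
  ] },
  t1: { entries: [{ type: 'blob', path: 'index.js', oid: 'b2' }] },
  t2: { entries: [{ type: 'blob', path: 'bundle.js', oid: 'b3' }] }
}
const readTree = async (oid) => fakeStore[oid]

// Skip the dist directory, descend into everything else.
const listing = walkTree({ readTree, oid: 'root', shouldDescend: (path) => path !== 'dist' })
listing.then((files) => console.log(files))
```

Keeping the descent decision in a callback leaves the caller in control of depth and skipped directories, which is the control discussed above.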
The downside of the tree walker in nodegit is that it doesn't allow me to control the level of descent. So it's actually just a more cumbersome way of doing git ls-tree. |
Took a while, but the 0.8.0 release fetches all branches by default now when you do a clone (opt out of that behavior by using the |
I decided to rename this issue since it's evolved quite a lot. |
I can confirm this is working great.
This would be very nice for large repos. |
It seems that if a branch has a disconnected history from the main branch, isomorphic-git fails to check it out. You can see the problem on this repository:
The error reported is:
If you change the ref from gh-pages to develop, it works fine.