Git LFS support #1375
I don't think there will be support unless someone adds this feature. The main person behind the project doesn't work on it as much as at the beginning. I'm now an admin/maintainer, but I will not be able to add this feature. In fact, I will probably not be able to add any feature myself, at least not in the near future. Maybe that will change once I've worked on the project for a while.
I think the first thing to try/prove is whether Git LFS can be supported without a change to isomorphic-git. isomorphic-git gives you the low-level functions to read the objects in a git repository (abstracting away all the interaction with the loose, pack, index, and ref files). From that information, it should (in theory) be possible to discover Git LFS references and resolve them in application code. Personally, I don't yet understand Git LFS well enough to know what has to happen. But I can't imagine that there is anything in the loose, pack, index, or ref files that pertains to Git LFS, so it should be possible to handle those references without a change to isomorphic-git. If we determine that something has to change in isomorphic-git, and we know what that change is, then of course we can consider making those updates so that an application can resolve Git LFS references.
I just tried to clone a repository that has files tracked by git-lfs and it seemed to work just fine. What's the specific request here? Is this a question about being able to add and remove LFS files, or is it about being able to clone a repository with git-lfs enabled? What specifically isn't working right now? And what do you want to see working?
I understand this better now. When the repository is cloned, the blobs for LFS-tracked files are actually small pointer files (much like symlinks). Here's an example:
So it's clear this reference needs to be resolved. The question is, what needs to be sent to the server to get the LFS storage (either a single file or all of them)? Does anyone know where this exchange is documented?
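Because the pointer is plain text, recognizing one can be stricter than a prefix check. The Git LFS pointer specification requires a `version` line first, then `key value` lines sorted by key, each ending in a newline, including `oid sha256:<64 hex digits>` and `size <bytes>`. A minimal sketch, assuming only those keys matter; `parseLfsPointer` is a hypothetical helper, not an isomorphic-git function:

```javascript
'use strict'

const LFS_VERSION = 'https://git-lfs.github.com/spec/v1'

// Returns { oid, size } if `text` looks like a valid LFS pointer, otherwise null.
// Sketch only: checks the version line, key order, and the oid/size formats.
function parseLfsPointer (text) {
  if (!text.endsWith('\n')) return null // every line, including the last, ends in \n
  const lines = text.slice(0, -1).split('\n')
  const fields = {}
  let prevKey = null
  for (const [i, line] of lines.entries()) {
    const sep = line.indexOf(' ')
    if (sep <= 0) return null // not a "key value" line
    const key = line.slice(0, sep)
    const value = line.slice(sep + 1)
    if (i === 0) {
      // the first line must always be the version line
      if (key !== 'version' || value !== LFS_VERSION) return null
    } else {
      // keys after `version` must be in strictly ascending order
      if (prevKey !== null && key <= prevKey) return null
      prevKey = key
    }
    fields[key] = value
  }
  const oidMatch = /^sha256:([0-9a-f]{64})$/.exec(fields.oid || '')
  if (!oidMatch || !/^\d+$/.test(fields.size || '')) return null
  return { oid: oidMatch[1], size: Number(fields.size) }
}
```

This only says a blob is shaped like a pointer; whether the path is actually LFS-tracked is a separate question.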
Aha! Adding the following flags when running git allowed me to see what it is requesting:
Here's what I saw:
That at least gives me a thread to follow.
I have a very rudimentary prototype working that resolves LFS files when walking the git tree. I'll organize the code into a step-by-step example and share it here. From there, we can think about how this can become part of isomorphic-git. I'm thinking of something like adding LFS support transparently to the readBlob function, but it's too early to commit to anything at this point.
Here are the docs for the LFS service, for reference: https://github.com/git-lfs/git-lfs/tree/main/docs/api
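One detail from those docs worth keeping in mind is server discovery: by default, git-lfs derives the LFS endpoint by appending `.git/info/lfs` to the remote URL, without doubling a `.git` suffix that is already there, and the batch endpoint is `<endpoint>/objects/batch`. A minimal sketch for HTTP(S) remotes only; `lfsBatchUrl` is a hypothetical name, and `lfs.url` / `.lfsconfig` overrides and SSH remotes are deliberately ignored:

```javascript
'use strict'

// Derive the LFS batch endpoint from an HTTP(S) Git remote URL, following the default
// server-discovery rule (append `.git/info/lfs`, without doubling an existing `.git`).
// Sketch only: ignores `lfs.url` / `.lfsconfig` overrides and SSH remotes.
function lfsBatchUrl (remoteUrl) {
  const base = remoteUrl.replace(/\/+$/, '') // drop trailing slashes
  const repoUrl = base.endsWith('.git') ? base : `${base}.git`
  return `${repoUrl}/info/lfs/objects/batch`
}
```

The code further down builds `${url}/info/lfs/objects/batch` directly, which matches this rule only when the clone URL already ends in `.git`.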
Thank you, @jcubic, for your quick response. @mojavelinux As I see it, you already got the point. Sorry for the late reply. Actually, there are two sides relevant to LFS: a) Check out and retrieve the binary blobs from the endpoint based on the pointer files. There is a specification on how to implement LFS.
I'll be focusing on (a) at first, though I don't see anything preventing (b) from being implemented too. |
Here's the code to clone a repository, populate the LFS object cache from the LFS pointer files found in the tree, and replace each LFS pointer file in the worktree with the real LFS object. This code uses two commands from isomorphic-git.

'use strict'
// Usage: node <script> <url> <dir>
const fs = require('fs')
const { promises: fsp } = fs
if (!fsp.rm) fsp.rm = fsp.rmdir
const git = require('isomorphic-git')
const http = require('isomorphic-git/http/node')
const ospath = require('path')
const SYMLINK_MODE = 0o120000 // 40960: git tree entry mode for symbolic links
const LFS_POINTER_PREAMBLE = 'version https://git-lfs.github.com/spec/v1\n' // first line of every LFS pointer file
// Collect an async iterable of Uint8Array chunks (e.g. an isomorphic-git response body) into one Buffer
async function bodyToBuffer (body) {
  const chunks = []
  for await (const chunk of body) chunks.push(chunk)
  return Buffer.concat(chunks)
}
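// Parse an LFS pointer file (key/value lines) and compute where the object lives in the local LFS cache
function readLfsPointer ({ dir, gitdir = ospath.join(dir, '.git'), content }) {
  const info = content.toString().trim().split('\n').reduce((accum, line) => {
    const [k, v] = line.split(' ', 2)
    if (k === 'oid') {
      accum.oid = v.split(':', 2)[1] // strip the "sha256:" prefix
    } else if (k === 'size') {
      accum.size = Number(v) // the batch API expects the size as an integer, not a string
    }
    return accum
  }, {})
  const oid = info.oid
  // cached objects are stored as .git/lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid>
  const objectPath = ospath.join(gitdir, 'lfs', 'objects', oid.slice(0, 2), oid.slice(2, 4), oid)
  return { info, objectPath }
}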
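// Ask the LFS batch API where to download one object, then fetch it into the local LFS cache.
// Note: git-lfs derives the endpoint as <remote>.git/info/lfs, so this assumes `url` ends in .git.
async function downloadLfsObject ({ http: { request }, headers, url }, lfsInfo, lfsObjectPath) {
  const lfsInfoRequestData = { operation: 'download', transfers: ['basic'], objects: [lfsInfo] }
  const batchResponse = await request({
    url: `${url}/info/lfs/objects/batch`,
    method: 'POST',
    headers: {
      ...headers,
      Accept: 'application/vnd.git-lfs+json',
      'Content-Type': 'application/vnd.git-lfs+json',
    },
    body: [Buffer.from(JSON.stringify(lfsInfoRequestData))]
  })
  if (batchResponse.statusCode !== 200) throw new Error(`LFS batch request failed: HTTP ${batchResponse.statusCode}`)
  const lfsInfoResponseData = JSON.parse((await bodyToBuffer(batchResponse.body)).toString())
  const [lfsObject] = lfsInfoResponseData.objects
  if (lfsObject.error) throw new Error(`LFS object ${lfsObject.oid}: ${lfsObject.error.message}`)
  // the server may attach extra headers (e.g. an auth token) that the download request must send
  const { href, header: downloadHeaders = {} } = lfsObject.actions.download
  const objectResponse = await request({ url: href, method: 'GET', headers: { ...headers, ...downloadHeaders } })
  if (objectResponse.statusCode !== 200) throw new Error(`LFS download failed: HTTP ${objectResponse.statusCode}`)
  const content = await bodyToBuffer(objectResponse.body)
  await fsp.mkdir(ospath.dirname(lfsObjectPath), { recursive: true })
  await fsp.writeFile(lfsObjectPath, content)
  return content
}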
;(async (url, dir) => {
  const repo = { fs, dir }
  const headers = { 'user-agent': `git/isomorphic-git@${git.version()}` }
  // start from an empty target directory (force: don't fail if it doesn't exist yet)
  await fsp.rm(repo.dir, { recursive: true, force: true })
  await git.clone({ ...repo, headers: { ...headers }, http, url })
  await git.walk({
    ...repo,
    trees: [git.TREE({ ref: 'HEAD' })],
    map: async (filepath, [treeEntry]) => {
      const type = await treeEntry.type()
      if (type === 'tree') return true // keep descending into subdirectories
      if (type !== 'blob' || (await treeEntry.mode()) === SYMLINK_MODE) return
      const bytes = await treeEntry.content()
      // wrap the Uint8Array without assuming it spans its whole ArrayBuffer
      const content = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength)
      // cheap check: pointer files start with the spec version line (118 is 'v')
      if (content[0] !== 118 || content.subarray(0, 100).indexOf(LFS_POINTER_PREAMBLE) !== 0) return
      const { info: lfsInfo, objectPath: lfsObjectPath } = readLfsPointer({ ...repo, content })
      // reuse the cached object if present (several files may share one oid), otherwise download it
      const lfsContent = await fsp.readFile(lfsObjectPath)
        .catch(() => downloadLfsObject({ headers, http, url }, lfsInfo, lfsObjectPath))
      // replace the pointer file in the worktree with the real content
      await fsp.writeFile(ospath.join(repo.dir, filepath), lfsContent)
    },
  })
})(...process.argv.slice(2))

This code has several shortcomings. First, it leaves the worktree dirty. If I switch to the directory and run

Second, the code doesn't handle authentication. But that shouldn't be difficult to add (especially since it's using the same request function that isomorphic-git uses internally).

Finally, the code makes a separate request for each LFS object to get the download URL. But the LFS service supports returning the download URLs for multiple objects at once. So these requests could be consolidated, and the code would only make N+1 requests to the LFS service (one to collect all the download URLs and one for each download). (It also might be best to stream each object to its file in lfs/objects instead of buffering it in memory.)
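The consolidation described above could look roughly like this: collect the `{ oid, size }` pairs from every pointer during the walk, send them in one batch request, and look up each download action by oid. A sketch covering only the request body and the response handling, based on the batch API docs linked earlier in this thread; `buildBatchRequest` and `indexBatchResponse` are hypothetical names:

```javascript
'use strict'

// Build one batch API request body for a set of pointers, skipping duplicate oids.
function buildBatchRequest (pointers) {
  const seen = new Map()
  for (const { oid, size } of pointers) {
    if (!seen.has(oid)) seen.set(oid, { oid, size })
  }
  return { operation: 'download', transfers: ['basic'], objects: [...seen.values()] }
}

// Index the batch response by oid: oid -> { href, header } or { error }.
function indexBatchResponse (response) {
  const byOid = new Map()
  for (const obj of response.objects || []) {
    if (obj.error) {
      // per-object errors (e.g. 404 for a missing object) are reported inline
      byOid.set(obj.oid, { error: obj.error })
    } else if (obj.actions && obj.actions.download) {
      const { href, header = {} } = obj.actions.download
      byOid.set(obj.oid, { href, header })
    }
  }
  return byOid
}
```

Each entry's `header` would then be merged into the headers of that object's download request, since the server may rely on it for authorization.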
The next step is to figure out what isomorphic-git can do for us. It seems reasonable that the But let's assume the repository is cloned without a checkout. When should isomorphic-git populate the
@mojavelinux awesome job.
I would expect to populate Regarding the file type, I'm not sure what it is about and what the benefit would be. @mojavelinux Awesome work!
Indeed, that would match canonical git when lfs is installed. But there's also overhead in doing so that not every user may want when cloning (at least not unconditionally). So I think the behavior will need to be controlled via a switch.
This would be
If we are walking the git tree, we need to know whether we are looking at an LFS pointer file or a regular file, just like we do when we're looking at a symlink. isomorphic-git should be able to tell us this. (It's more than just checking whether the file looks like an LFS pointer file; we need to be 100% sure by consulting .gitattributes.) Otherwise, we cannot trigger the appropriate action.
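As a rough illustration of consulting `.gitattributes`, here is a sketch that decides whether a path is LFS-tracked by finding the last matching line that sets or unsets the `filter` attribute (typical LFS lines look like `*.psd filter=lfs diff=lfs merge=lfs -text`). It is deliberately simplified and assumes only the root `.gitattributes` matters; the glob translation handles `*`, `?` and `**` but not bracket expressions, escapes, quoted patterns, or macros. `patternToRegExp` and `isLfsTracked` are hypothetical names, not isomorphic-git functions:

```javascript
'use strict'

// Convert a (simplified) gitattributes pattern into a RegExp over repo-relative paths.
// Supports *, ?, and ** only; bracket expressions, escapes, and quoting are not handled.
function patternToRegExp (pattern) {
  const anchored = pattern.includes('/') // patterns with a slash are relative to the repo root
  const body = pattern.replace(/^\//, '')
  let re = ''
  for (let i = 0; i < body.length; i++) {
    const c = body[i]
    if (c === '*' && body[i + 1] === '*') {
      // "**/" matches zero or more directories; any other "**" matches everything
      if (body[i + 2] === '/') { re += '(?:.*/)?'; i += 2 } else { re += '.*'; i += 1 }
    } else if (c === '*') {
      re += '[^/]*' // a single * never crosses a directory boundary
    } else if (c === '?') {
      re += '[^/]'
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    }
  }
  // unanchored patterns (no slash) match the file name at any depth
  return new RegExp(anchored ? `^${re}$` : `(?:^|/)${re}$`)
}

// True if the last matching line that mentions `filter` sets filter=lfs for this path.
function isLfsTracked (gitattributesText, filepath) {
  let filter = null
  for (const raw of gitattributesText.split(/\r?\n/)) {
    const line = raw.trim()
    if (!line || line.startsWith('#') || line.startsWith('[attr]')) continue
    const [pattern, ...attrs] = line.split(/\s+/)
    if (!patternToRegExp(pattern).test(filepath)) continue
    for (const attr of attrs) {
      if (attr.startsWith('filter=')) filter = attr.slice('filter='.length)
      // -filter / !filter unset it; a bare `filter` is "set" but not lfs, so treat all as non-LFS
      else if (attr === '-filter' || attr === '!filter' || attr === 'filter') filter = null
    }
  }
  return filter === 'lfs'
}
```

In a `git.walk` callback, the text could come from the `.gitattributes` blob of the same tree; a blob would then be treated as a pointer only when `isLfsTracked` returns true and its content also looks like a pointer.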
From an end-user perspective, I believe there are two typical uses as far as reading LFS data goes:
As to adding objects, it’s more difficult to mimic the existing API but a low-level helper function like Agreed that the behavior of fetching all LFS files matching
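On the adding side, the client-side part is mostly fixed by the pointer spec: the oid is the SHA-256 of the file content, and the pointer lists the spec version, the oid, and the size in bytes. A minimal sketch of such a low-level helper; the name `createLfsPointer` is hypothetical, and uploading the object itself would still go through the batch API with `operation: 'upload'`:

```javascript
'use strict'

const crypto = require('crypto')

// Given the real file content, compute the LFS oid (SHA-256 of the content) and the
// pointer text that would be committed in its place. Keys after `version` are sorted.
function createLfsPointer (content) {
  const buf = Buffer.from(content)
  const oid = crypto.createHash('sha256').update(buf).digest('hex')
  const pointer = `version https://git-lfs.github.com/spec/v1\noid sha256:${oid}\nsize ${buf.length}\n`
  return { oid, size: buf.length, pointer }
}
```

The returned `pointer` is what would be stored as the blob for that path, while the original content would be copied to `.git/lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid>` before being uploaded.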
Hi there,
is there any plan to support Git LFS (Large File Storage) with isomorphic-git?
I found #218, which mentions LFS in an example, but I couldn't find anything in the docs about how to enable it.
Best regards
Ralf