Initial Git-LFS support #921

Merged
gitblit merged 1 commit into gitblit-org:develop from paulsputer:git-lfs-support
Oct 10, 2015

Conversation

@paulsputer
Collaborator

Noticed quite a few people in the forums mentioning Git-LFS support but haven't seen any PRs yet, so here is mine. I'm sure there's a fair bit of cleanup needed yet, but it would be great to get some feedback for improvements, and it would be fantastic if it could be part of the 1.7 release :)

Notes

[lfs]
url = "<server>/lfs/<repoName>"

Is it just me or is it strange it requires the quotes unlike the other urls in the config file?

  • By default the client uses <server>/<user>/<repoName>/info/lfs

Maybe you can point me in the right direction here as I’m not quite sure how to filter away from the gitServlet correctly. It would be great to use the default style refspec so a clone will correctly get the large files too. At the moment I'm simply using <server>/lfs/<repoName>

Collaborator

The infamous nested repository feature strikes back! I don't think this will match test/mytest.git, for example.

@gitblit
Collaborator

gitblit commented Sep 22, 2015

Hi Paul,

This looks pretty amazing. We'll definitely squeeze this in for 1.7.0. We'll also need some documentation on how to use this. I haven't used GitHub's LFS before. Do you know of an example repo?

@paulsputer
Collaborator Author

Hi James, thanks for the feedback,

I felt there was a lot of duplication in the GitLFS-specific filter that I created the first time around, so I've made some minor modifications to the accessRestrictionFilter to give access to the method and auth headers, which allows the GitLFS filter to be part of the standard GitFilter. This has also allowed the default GitLFS path to be used, which simplifies getting up and running with the client.

I'm not aware of any open-access LFS repositories, as GitHub appears to be testing with only a small subset of users. In fact, from the looks of things GitBlit will be one of the first open-source integrated providers! :) Even if there were open access, though, we still would be unable to use it for testing, as this PR covers only the server side, not the client (that's pending a JGit update for smudge and clean filters). I've identified some limitations on the new help page, but I may have missed some, so it would be great for another pair of eyes to go over that ;)

As a result I've created tests for the servlet by creating the filestore aspects using the separately tested Manager functions. I've also updated the repository extraction to handle nested repositories and added tests for the regex for that.

Let me know how it looks and if there are any changes you'd like to see, cheers
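
For readers following the nested-repository point, the extraction described above could look roughly like the sketch below. This is not the code from the PR: the class name, method name, and the exact pattern are assumptions made for illustration. The only thing taken from the thread is that the client's default path ends in /info/lfs and that names such as test/mytest.git must match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Gitblit's actual code: pulls the repository name
// out of a request path that uses the Git-LFS default "/info/lfs" suffix.
final class LfsPathSketch {

    // Group 1 is everything before "/info/lfs". It is allowed to contain
    // slashes, so nested repositories such as "test/mytest.git" match too.
    private static final Pattern LFS_PATH =
            Pattern.compile("^/?(.+?)/info/lfs(?:/.*)?$");

    /** Returns the repository name, or null if the path is not an LFS path. */
    static String repositoryOf(String path) {
        Matcher m = LFS_PATH.matcher(path);
        return m.matches() ? m.group(1) : null;
    }
}
```

With this pattern, /test/mytest.git/info/lfs/objects/batch yields test/mytest.git, while a path without the /info/lfs segment yields null and can fall through to the regular Git handling.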

Collaborator

This should probably be canClone() to be consistent with the rest of Gitblit.

Collaborator Author

Ah yes, I had this thought a few times. I was going to put canClone(), but then a user who has view permissions but not clone permissions would be unable to request the blob via the web interface (not sure how likely that is). A more realistic scenario, though, is that in a later PR I'm hoping to use the filestore to handle ticket attachments. In that case I thought it may be more feasible to have someone with view but not clone rights. What do you think?

Collaborator

Ah. That makes sense, leave it.

@gitblit
Collaborator

gitblit commented Oct 5, 2015

I made a couple of small suggestions.

One thing that is giving me pause is how and how often the metadata is stored.

I do like object serialization and I use it in a ton of other places in my work, but I'm not sure about its use here as the source of truth for blob metadata. Would a JSON array of FilestoreModel be better? At least it would be inspectable and it would be harvestable by other tools.

The other thing I notice is that the backing store appears to be out of sync with the in-memory representation until Gitblit is cleanly shut down. In a perfect world of uninterrupted power and clean shutdowns, that would be fine, but I'm thinking we need to persist the metadata when a blob is added & deleted. Do you agree?

Collaborator

Broken link. Need https://.

@paulsputer
Collaborator Author

Would a JSON array of FilestoreModel be better

That's a good idea! I have to admit that in the first instance I just wanted to get something that worked without too many changes, so serialization seemed the most straightforward approach, though I'm not sure how it would scale. Certainly JSON would be a better approach. Is there a particular example you would recommend I look at, maybe the ticket journal implementation?

I'm thinking we need to persist the metadata when a blob is added & deleted.

Fully agree! I wasn't sure what the best way is and thought persisting on each transaction might become an issue with serializing the whole map. I guess using JSON following the ticket journal approach would minimize this and could also provide an audit log if needed. In that case, would you say a JSON array or a journal would be better?

@gitblit
Collaborator

gitblit commented Oct 5, 2015

I hadn't considered the ticket journal. That would actually be a pretty good fit. The ticket backends initially used a SHA1 for the ticket id (instead of a long) and the sharded directory structure - same as you are using in the LFS directory & similar to Gerrit.

We could do something like FileTicketService - and you aren't too far away from that now. If you did something like BranchTicketService then the filestore would be clonable but ironically it would be putting LFS inside git which is what LFS is trying to avoid in the first place. But I suppose the blobs could still be externally stored while the metadata is stored in a repo.

Or we could go with JSON for now, synchronize the add/delete methods, and ensure the metadata is re-serialized on those operations. I imagine that updating/removing the blobs probably has a low incidence, at least for a Gitblit install. No doubt GitHub has to think harder about concurrency and persistence I/O bottlenecks with their 27 million(?) repos, but I think Gitblit and its userbase can tolerate a synchronized lock on the FilestoreManager.
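
As a rough illustration of the "synchronize add/delete and re-serialize on each change" option, a minimal sketch follows. The class, the Persister callback, and the map of object id to size are all hypothetical stand-ins; the real FilestoreManager is not shown in this thread and may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the filestore manager discussed above.
final class FilestoreSketch {

    /** Called with a snapshot of the metadata after every change. */
    interface Persister {
        void save(Map<String, Long> snapshot);
    }

    private final Map<String, Long> sizesByOid = new LinkedHashMap<>();
    private final Persister persister;

    FilestoreSketch(Persister persister) {
        this.persister = persister;
    }

    // synchronized: only one add/delete mutates and persists at a time,
    // so the stored metadata never lags behind the in-memory map.
    synchronized void addBlob(String oid, long size) {
        sizesByOid.put(oid, size);
        persister.save(new LinkedHashMap<>(sizesByOid));
    }

    // Persists only when something was actually removed.
    synchronized boolean deleteBlob(String oid) {
        boolean removed = sizesByOid.remove(oid) != null;
        if (removed) {
            persister.save(new LinkedHashMap<>(sizesByOid));
        }
        return removed;
    }
}
```

The trade-off is the one named above: every add or delete holds a single lock while the metadata is written, which is simple and safe but serializes writers.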

@paulsputer
Collaborator Author

If you did something like BranchTicketService then the filestore would be clonable but ironically it would be putting LFS inside git which is what LFS is trying to avoid in the first place.

I wonder if this would provide a simple means by which to federate the filestore later on...

In the meantime, going with the JSON array approach, I think I'm almost there. I'm just having second thoughts on the use of UserModel within the FilestoreModel. It looks as though I need to have access to the UserManager within the FilestoreManager just to retrieve a userModel from a name, which feels rather messy. What do you reckon: is it better to simply store the principal user name string in the FilestoreModel and then convert to a userModel in the UI if we need to refer to a different field of the user?

@paulsputer paulsputer force-pushed the git-lfs-support branch 2 times, most recently from aea3cc0 to 7269b81 Compare October 6, 2015 22:17
@paulsputer
Collaborator Author

@gitblit Ok I've implemented the metadata storage as an append-only JSON journal file so it automatically provides an audit log on each status-changing transaction. Also to simplify deserialization of the FilestoreModel I've switched to a string of the principal user name rather than the UserModel. Let me know if there's anything else that needs improving.
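
To make the append-only journal idea concrete, here is a small sketch of how such a file can double as an audit log while still yielding the current status on replay. Everything specific in it is an assumption: the field names (oid, status, user), the one-object-per-line layout, and the hand-rolled formatting and regex parsing. A real implementation would more likely use a JSON library, which would also take care of escaping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical journal; field names and line layout are assumptions.
final class JournalSketch {

    // Matches exactly the lines written by append(); anything else is skipped.
    private static final Pattern ENTRY = Pattern.compile(
            "\\{\"oid\":\"([0-9a-f]+)\",\"status\":\"([A-Za-z]+)\",\"user\":\"([^\"\\\\]*)\"\\}");

    /** Appends one status change; existing lines are never rewritten. */
    static void append(Path journal, String oid, String status, String user)
            throws IOException {
        // No escaping is done here; values are assumed to be plain tokens.
        String line = String.format(
                "{\"oid\":\"%s\",\"status\":\"%s\",\"user\":\"%s\"}%n", oid, status, user);
        Files.write(journal, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Replays the journal; the last entry for each oid is its current status. */
    static Map<String, String> currentStatus(Path journal) throws IOException {
        Map<String, String> status = new LinkedHashMap<>();
        for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                status.put(m.group(1), m.group(2));
            }
        }
        return status;
    }
}
```

Because only the user name string is stored, replaying the file needs no access to the user manager, which matches the simplification described above.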

Collaborator

As you saw in ConfigUserService I usually write to a temp file and then rename to the target file on successful completion of write. I'd like to have that logic here.
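
The write-to-temp-then-rename pattern requested here can be sketched with java.nio as follows. This is an illustration only, not the ConfigUserService code; the class and method names are made up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper showing the temp-file-then-rename pattern.
final class SafeWriteSketch {

    static void writeThenRename(Path target, String content) throws IOException {
        // Create the temp file next to the target so the rename stays on
        // the same filesystem.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // Only a fully written file ever replaces the target.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

If the write fails partway, the target keeps its previous contents and the partial temp file is removed. Where the platform supports it, StandardCopyOption.ATOMIC_MOVE is a stricter alternative for the move step.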

Collaborator

try (RandomAccessFile fs = new RandomAccessFile(metaFile, "rw"))
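
In context, the suggested try-with-resources form might be used along these lines. The surrounding method and class are assumptions added for illustration; only the try (RandomAccessFile ...) line comes from the review.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper around the suggested try-with-resources line.
final class AppendSketch {

    /** Appends a line to metaFile; the file is closed even if a write throws. */
    static void appendLine(File metaFile, String line) throws IOException {
        try (RandomAccessFile fs = new RandomAccessFile(metaFile, "rw")) {
            fs.seek(fs.length()); // move to the end before writing
            fs.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The point of the construct is that close() runs automatically when the block exits, so no explicit finally block is needed to clean up the stream.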

@gitblit
Collaborator

gitblit commented Oct 9, 2015

@paulsputer We're almost there. I made a few comments on making sure we clean up streams and safely persist your JSON array.

+ Metadata maintained in append-only JSON file providing complete audit history.
+ Filestore menu item
	+ Lists filestore items
	+ Current size and availability
	+ Link to GitBlit Filestore help page (top right)
+ Hooks into existing repository permissions
+ Uses default repository path for out-of-box operation with Git-LFS client
+ accessRestrictionFilter now has access to http method and auth header
+ Testing for servlet and manager

@paulsputer
Collaborator Author

Thanks @gitblit, I've added in writing to a temporary file then copying, is this ok? I'm not caching the whole history of the changes, just the current status. Not sure which is the best approach.

@gitblit
Collaborator

gitblit commented Oct 10, 2015

Looks good. 👍

gitblit added a commit that referenced this pull request Oct 10, 2015
@gitblit gitblit merged commit a3a18a0 into gitblit-org:develop Oct 10, 2015