Initial Git-LFS support #921
Conversation
HttpServletResponse httpResponse = (HttpServletResponse) response; | ||
|
||
String url = httpRequest.getRequestURI(); | ||
Pattern p = Pattern.compile("/lfs/([~a-zA-Z0-9.]+)/objects(/batch|/[a-fA-F0-9]{64})?"); |
gitblit
Sep 22, 2015
Owner
The infamous nested repository feature strikes back! I don't think this will match test/mytest.git
, for example.
The infamous nested repository feature strikes back! I don't think this will match test/mytest.git
, for example.
HttpServletResponse response) throws ServletException ,IOException { | ||
|
||
String uri = request.getRequestURI(); | ||
Pattern p = Pattern.compile("/lfs/([~a-zA-Z0-9.]+)/objects(/batch)?"); |
gitblit
Sep 22, 2015
Owner
Same issue here on nested repos.
Same issue here on nested repos.
HttpServletResponse response) throws ServletException ,IOException { | ||
|
||
String uri = request.getRequestURI(); | ||
Pattern p = Pattern.compile("/lfs/([~a-zA-Z0-9.]+)/objects/([a-fA-F0-9]{64})"); |
gitblit
Sep 22, 2015
Owner
Nested repos. Maybe the string should be a constant or a couple constants?
Nested repos. Maybe the string should be a constant or a couple constants?
HttpServletResponse response) throws ServletException ,IOException { | ||
|
||
String uri = request.getRequestURI(); | ||
Pattern p = Pattern.compile("/lfs/([~a-zA-Z0-9.]+)/objects/([a-fA-F0-9]{64})"); |
gitblit
Sep 22, 2015
Owner
And here.
And here.
Hi Paul, This looks pretty amazing. We'll definitely squeeze this in for 1.7.0. We'll also need some documentation on how to use this. I haven't used Github's LFS before. Do you know of an example repo? |
Hi James, thanks for the feedback, I felt there was a lot of duplication in the GitLFS specific filter that I created first time around so I've made some minor modifications to the accessRestrictionFilter to give access to the method and auth headers to allow the GitLFS filter to be part of the standard GitFilter. This has also allowed the default GitLFS path to be used to simply getting up and running with using the client. I'm not aware of there being any open access LFS repositories as GitHub appears to be testing only on a small subset of users, in fact, from the looks of things GitBlit will be one of the first open-source integrated providers! :) Even if there were open access though we still would be unable to use that for testing as this PR is only for the server side, not the client (that's pending a JGit update for smudge and clean filters) I've identified some limitations on the new help page, but I may have missed some so it would be great for another pair of eyes to go over that ;) As a result I've created tests for the servlet by creating the filestore aspects using the separately tested Manager functions. I've also updated the repository extraction to handle nested repositories and added tests for the regex for that. Let me know how it looks and if there's any changes you'd like to see, cheers |
private FilestoreModel.Status canGetObject(String oid, UserModel user, RepositoryModel repo) { | ||
|
||
//Access Control | ||
if (!user.canView(repo)) { |
gitblit
Oct 5, 2015
Owner
This should probably be canClone()
to be consistent with the rest of Gitblit.
This should probably be canClone()
to be consistent with the rest of Gitblit.
paulsputer
Oct 5, 2015
Author
Collaborator
Ah yes I had this thought a few times. I was going to put canClone()
but then in the case that a user has view permissions but not clone permissions they would be unable to request the blob via the web interface (not sure how likely that is). Probably a more realistic scenario though is for in a later PR I'm hoping to use the filestore to handle ticket attachments. In that case I thought it may be more feasible to have someone with view but not clone rights, what do you think?
Ah yes I had this thought a few times. I was going to put canClone()
but then in the case that a user has view permissions but not clone permissions they would be unable to request the blob via the web interface (not sure how likely that is). Probably a more realistic scenario though is for in a later PR I'm hoping to use the filestore to handle ticket attachments. In that case I thought it may be more feasible to have someone with view but not clone rights, what do you think?
gitblit
Oct 5, 2015
Owner
Ah. That makes sense, leave it.
Ah. That makes sense, leave it.
if (action.equals(gitLfs)) { | ||
//Repository must already exist for any filestore actions | ||
return null; | ||
}; |
gitblit
Oct 5, 2015
Owner
I think some compilers/IDEs hate trailing semi-colons on conditional blocks.
I think some compilers/IDEs hate trailing semi-colons on conditional blocks.
paulsputer
Oct 5, 2015
Author
Collaborator
oops how did that get in there. thanks!
oops how did that get in there. thanks!
@@ -66,7 +66,7 @@ | |||
ModelUtilsTest.class, JnaUtilsTest.class, LdapSyncServiceTest.class, FileTicketServiceTest.class, | |||
BranchTicketServiceTest.class, RedisTicketServiceTest.class, AuthenticationManagerTest.class, | |||
SshKeysDispatcherTest.class, UITicketTest.class, PathUtilsTest.class, SshKerberosAuthenticationTest.class, | |||
GravatarTest.class }) | |||
GravatarTest.class, FilestoreManagerTest.class }) |
gitblit
Oct 5, 2015
Owner
Missing reference to FilestoreServletTest.class
.
Missing reference to FilestoreServletTest.class
.
paulsputer
Oct 5, 2015
Author
Collaborator
Ah pretty critical, thanks will add in.
Ah pretty critical, thanks will add in.
I made a couple of small suggestions. One thing that is giving me pause is how and how often the metadata is stored. I do like object serialization and I use it in a ton of other places in my work, but I'm not sure about it's use here as the source of truth for blob metadata. Would a JSON array of The other thing I notice is it looks like the backing store is out-of-sync with the in-memory representation until Gitblit is cleanly shutdown. In a perfect world of uninterrupted power and clean shutdowns, that would be fine but I'm thinking we need to persist the metadata when a blob is added & deleted. Do you agree? |
<div class="alert alert-danger"> | ||
<h3><center>Using the Filestore</center></h3> | ||
<p> | ||
<strong>All clients intending to use the filestore must first install the <a href="git-lfs.github.com/">Git-LFS Client</a> and then run <code>git lfs init</code> to register the hooks globally.</strong><br/> |
gitblit
Oct 5, 2015
Owner
Broken link. Need https://
.
Broken link. Need https://
.
That's a good idea! I have to admit in the first instance I was just wanting to get something that worked without too many changes so serialization seemed the most straight forward approach, though I'm not sure how it would scale. Certainly JSON would be a better approach. Is there a particular example would recommend me look at, maybe the ticket journal implementation?
Fully agree! I wasn't sure what the best way is and thought persisting on each transaction might become an issue with serializing the whole map. I guess using JSON following the ticket journal approach would minimize this and could also provide an audit log if needed. In that case would you say JSON array or journal be better? |
I hadn't considered the ticket journal. That would actually be a pretty good fit. The ticket backends initially used a SHA1 for the ticket id (instead of a long) and the sharded directory structure - same as you are using in the LFS directory & similar to Gerrit. We could do something like Or we could go with JSON, for now, synchronize the add/delete methods, and ensure the metadata is re-serialized on those operations. I imagine that updating/removing the blobs probably has a low-incidence, at least for a Gitblit install. No doubt GitHub has to think harder about concurrency and persistence I/O bottlenecks with their 27 million(?) repos but I think Gitblit and it's userbase can tolerate a synchronized lock on the |
I wonder if this would provide a simple means by which to federate the filestore later on... In the mean time, going with the JSON array approach I think I'm almost there. Just having second thoughts on the use of |
aea3cc0
to
7269b81
@gitblit Ok I've implemented the metadata storage as an append-only JSON journal file so it automatically provides an audit log on each status-changing transaction. Also to simplify deserialization of the FilestoreModel I've switched to a string of the principal user name rather than the UserModel. Let me know if there's anything else that needs improving. |
|
||
boolean isNewFile = !metaFile.exists(); | ||
|
||
RandomAccessFile fs = new RandomAccessFile(metaFile, "rw"); |
gitblit
Oct 9, 2015
Owner
As you saw in ConfigUserService
I usually write to a temp file and then rename to the target file on successful completion of write. I'd like to have that logic here.
As you saw in ConfigUserService
I usually write to a temp file and then rename to the target file on successful completion of write. I'd like to have that logic here.
gitblit
Oct 9, 2015
Owner
try (RandomAccessFile fs = new RandomAccessFile(metaFile, "rw"))
try (RandomAccessFile fs = new RandomAccessFile(metaFile, "rw"))
Collection<FilestoreModel> items = null; | ||
|
||
Gson gson = gson(); | ||
try { |
gitblit
Oct 9, 2015
Owner
try (FileReader file = new FileReader(metadata)) {
try (FileReader file = new FileReader(metadata)) {
file.getParentFile().mkdirs(); | ||
file.createNewFile(); | ||
|
||
FileOutputStream streamOut = new FileOutputStream(file); |
gitblit
Oct 9, 2015
Owner
try (FileOutputStream streamOut = new FileOutputStream(file))
try (FileOutputStream streamOut = new FileOutputStream(file))
oid, model.getSize(), actualSize)); | ||
} else { | ||
|
||
FileInputStream fileForHash = new FileInputStream(file); |
gitblit
Oct 9, 2015
Owner
try (FileInputStream fileForHash = new FileInputStream(file))
try (FileInputStream fileForHash = new FileInputStream(file))
|
||
if (streamOut != null) { | ||
try { | ||
FileInputStream streamIn = new FileInputStream(getStoragePath(oid)); |
gitblit
Oct 9, 2015
Owner
try (FileInputStream streamIn = new FileInputStream(getStoragePath(oid)))
try (FileInputStream streamIn = new FileInputStream(getStoragePath(oid)))
@paulsputer We're almost there. I made a few comments on making sure we cleanup streams and we safely persist your JSON array. |
+ Metadata maintained in append-only JSON file providing complete audit history. + Filestore menu item + Lists filestore items + Current size and availability + Link to GitBlit Filestore help page (top right) + Hooks into existing repository permissions + Uses default repository path for out-of-box operation with Git-LFS client + accessRestrictionFilter now has access to http method and auth header + Testing for servlet and manager
Thanks @gitblit, I've added in writing to a temporary file then copying, is this ok? I'm not caching the whole history of the changes, just the current status. Not sure which is the best approach. |
Looks good. |
how can i delete files from filestore ? just delete a test repo but the files are still in the filestore list.. |
That's a logical expectation but it is not yet implemented and it may get messy when considering forks. |
Hi @wirmachenbunt glad to hear you're trying out the filestore. Unfortunately there's no inbuilt delete functionality yet for it, although there are states defined for it in the code ready for later implementation, though this will be quite a major task to ensure it is safe to delete. @gitblit I wouldn't even know where to begin with managing it safely with forks. As it stands the Git-LFS protocol doesn't define any delete process, and this is probably a good thing as major damage could be done as filestore items may be shared by multiple repositories, and still be referenced in repository histories leading to incomplete clones. If this is only for testing purposes and there are no other filestore users, you can delete the contents of /lfs |
yeah probably makes no sense deleting when this is production stuff btw. seems to work like a charm. i run gitblit on synology NAS and use gitextensions as a client, the new LFS part integrates very well. thank you |
@wirmachenbunt Great stuff, glad to hear! To remove them from the filestore list you'll have to shutdown gitblit then delete the filestore.json and filestore.json.tmp. Deleting these files during run time won't clear them as it holds them in a cache and commits to the files on both push and shutdown events, and loads the file again at startup. |
that worked, cheers! |
Noticed quite a few people in the forums mentioning Git-LFS support but haven't seen any PR's yet so here is mine. I'm sure there's a fair bit of clean up needed yet but It would be great to get some feedback for improvements and would be fantastic if it could be part of the 1.7 release :)
Notes
<server>/<user>/<repoName>/info/lfs