Store repositories in a single siva file #381

Open
jfontan opened this issue Feb 8, 2019 · 1 comment
Labels
proposal proposal for new additions or changes

Comments

jfontan commented Feb 8, 2019

Purpose

Store each whole repository in a single place instead of splitting it across several siva files. The reasons are explained in #380.

Changes

  • Add a rooted repo column for the whole repository to the database schema
  • Skip the init commit search if the repository already has a rooted repository selected in the DB
  • Select a rooted repository for the repository if it doesn't have one yet

Database

In core-retrieval, add a new column to Repository:

Init SHA1

We also want to keep the Init column in Reference, as those values will be used to delete the references from the extra rooted repos when updating.

Init selection

If the repository already has the Init column set, use it instead of searching for one. Otherwise, pick it following these rules:

  • Error when there are no references
  • If there's a default branch and it is valid, calculate the rooted repo from it
  • If there's no default branch, calculate the rooted repos from all branches and pick the most used, that is, the rooted repo with the most references
  • If there is a tie, pick the first lexicographically

Note: There could be more rules, such as picking the longest commit history tree or checking which rooted repos already exist in the database, but they would make the code more complex and this case shouldn't happen too often.
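
The selection rules above can be sketched as follows. This is only an illustration under stated assumptions: Ref, SelectInit and ErrNoReferences are hypothetical names, hashes are plain hex strings rather than plumbing.Hash, and a default branch is treated as "valid" when it appears among the references with a non-empty init. Falling back to the most-used rule when the default branch is not valid is also an assumption, since the rules do not cover that case.

```go
package main

import (
	"errors"
	"sort"
)

// Ref is a hypothetical, simplified reference: its name and the init
// (root) commit that its history leads back to.
type Ref struct {
	Name string
	Init string // hex SHA1 of the root commit
}

// ErrNoReferences is returned when there is nothing to select from.
var ErrNoReferences = errors.New("repository has no references")

// SelectInit picks the rooted repository for a repository.
// stored is the Init column value ("" when unset) and defaultBranch is
// the default branch name ("" when there is none).
func SelectInit(stored, defaultBranch string, refs []Ref) (string, error) {
	// An init already selected in the database always wins.
	if stored != "" {
		return stored, nil
	}
	if len(refs) == 0 {
		return "", ErrNoReferences
	}

	// Prefer the default branch when it is present and usable.
	if defaultBranch != "" {
		for _, r := range refs {
			if r.Name == defaultBranch && r.Init != "" {
				return r.Init, nil
			}
		}
	}

	// Otherwise count how many references each rooted repo has.
	counts := make(map[string]int)
	for _, r := range refs {
		counts[r.Init]++
	}

	// Sort the candidates so that ties resolve to the first one
	// lexicographically.
	roots := make([]string, 0, len(counts))
	for root := range counts {
		roots = append(roots, root)
	}
	sort.Strings(roots)

	best := roots[0]
	for _, root := range roots[1:] {
		if counts[root] > counts[best] {
			best = root
		}
	}
	return best, nil
}
```

Sorting the candidates before scanning with a strict "greater than" comparison keeps the tie-break deterministic without a separate comparison step.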

Changes in the code

gitReferencer (https://github.com/src-d/borges/blob/master/git.go#L56) should have a new constructor to accept the init commit in case it exists in the database:

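// NewGitReferencerWithInit returns a Referencer that uses i as the init
// commit of every reference instead of searching for one.
func NewGitReferencerWithInit(r *git.Repository, i plumbing.Hash) Referencer {
  return gitReferencer{
    Repository: r,
    init:       i,
  }
}

type gitReferencer struct {
  *git.Repository

  // init is the init commit taken from the database, if any.
  init plumbing.Hash
}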

If init is set, skip the search and set Init to that same value on every reference.
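
A minimal sketch of that behaviour, using stand-in types so it does not depend on go-git. Hash, Reference, referencer and the findInit hook are hypothetical names for illustration, and treating the zero hash as "not set" is an assumption of this sketch, not something the proposal specifies.

```go
package main

// Hash stands in for plumbing.Hash; the zero value means "not set".
type Hash [20]byte

// Reference is a simplified stand-in for a stored reference.
type Reference struct {
	Name string
	Init Hash
}

// referencer mirrors the proposed gitReferencer: init holds the value
// read from the database, or the zero Hash when there is none.
type referencer struct {
	names []string
	init  Hash
	// findInit is a hypothetical hook for the expensive init commit search.
	findInit func(name string) (Hash, error)
}

// References returns every reference with its Init filled in. The
// search only runs when no init was provided.
func (r referencer) References() ([]Reference, error) {
	refs := make([]Reference, 0, len(r.names))
	for _, name := range r.names {
		h := r.init
		if h == (Hash{}) {
			found, err := r.findInit(name)
			if err != nil {
				return nil, err
			}
			h = found
		}
		refs = append(refs, Reference{Name: name, Init: h})
	}
	return refs, nil
}
```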

Optimizations

These may not be done in the first implementation but could accelerate downloads a lot.

Fast path for first download

This is already done; it is listed here for completeness. If the siva file is new (no commits), rename the references and copy the repository as is into the siva file.

#378

Fast path for updates

This only works if we already know the init where the repository will be located.

A second optimization can use a layer on top of the repository that translates reference names while fetching and works directly on the siva file. This way the downloaded packfile is smaller and no push is needed, since the packfile is written as is. This layer should be go-borges.
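
To illustrate the kind of reference name translation such a layer would perform, here is a minimal sketch. The "<name>/<repository ID>" naming scheme and the functions toStored and fromStored are assumptions made for this example; they are not taken from go-borges or from the proposal.

```go
package main

import "strings"

// toStored maps a reference name as seen on the remote to the name it
// would have inside the siva file. The "<name>/<repository ID>" scheme
// is an assumption of this sketch.
func toStored(name, repoID string) string {
	return name + "/" + repoID
}

// fromStored reverses toStored. ok is false when the stored reference
// does not belong to the given repository.
func fromStored(stored, repoID string) (name string, ok bool) {
	suffix := "/" + repoID
	if !strings.HasSuffix(stored, suffix) {
		return "", false
	}
	return strings.TrimSuffix(stored, suffix), true
}
```

With a mapping like this applied on the fly, the fetch would see the repository's own reference names while the siva file keeps the per-repository names.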

jfontan added the proposal label Feb 8, 2019

jfontan commented May 21, 2019

Information on how the repositories are stored with the current system (one siva per rooted repo):
