Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored retrieving of remote projects and added support for FTP, Mercurial and Subversion #309

Closed
wants to merge 41 commits into from

Conversation

saserr
Copy link
Contributor

@saserr saserr commented Dec 19, 2011

This pull contains:

  • Refactoring of a general retriever trait and reimplementation of git, file, http(s), and ftp support.
  • Optimized retrieving of multiple clones of same git repository w.r.t network usage, time, and size. Optimization will do only one remote clone per repository and all other clones are done using this local clone.
  • Allow defining non-standard git URIs (ones that do not use git protocol and do not end with '.git') by prefixing them with 'git:' (see examples)
  • Implementation of retrievers for mercurial and subversion repositories.

Examples of git URIs:

git://server/path
http(s)://server/path/repo.git
git:http(s)://server/path/repo
...

Examples of mercurial URIs:

hg:ssh://user@server/path
hg:http(s)://server/path
hg:file:///path
...

Examples of subversion URIs:

svn://server/path
svn+ssh://server/path
svn:http(s)://server/path
svn:file:///path
...

@harrah
Copy link
Member

harrah commented Dec 21, 2011

Thanks! I probably won't have time to review this for a couple of days.

@saserr
Copy link
Contributor Author

saserr commented Dec 21, 2011

If you have any questions about the code (i tried to make it as readable as possible) or if you find something that needs fixing, don't hesitate to ask.

@harrah
Copy link
Member

harrah commented Jan 9, 2012

I reviewed this a week ago, but it seems my comment didn't show up, so sorry about the delay. The overall goals of separating out the individual retrievers, adding new ones, and optimizing git usage are good.

  1. Use proper classes instead of structural types in the implicits.
  2. Always annotate the return type of implicits.
  3. Implicit conversions shouldn't be used ad hoc like this. They are hard enough to discover as it is, so they should be added to a shared helper class or, probably best for this case, just use normal methods.

There is already a type for retrievers: BuildLoaders.Resolver:

type Resolver = ResolveInfo => Option[() => File]

This is a better type for a partial function than having two methods in a trait like Scala's PartialFunction.

@saserr
Copy link
Contributor Author

saserr commented Jan 9, 2012

Thanks for the reply. I will fix these suggestions and push the new commits in few days.

Additionally, if you could tell me the way to do it, I can also add configuration so people can define new retrievers in their build files.

@harrah
Copy link
Member

harrah commented Jan 9, 2012

It is already possible. See https://github.com/harrah/xsbt/wiki/Build-Loaders.

error("Nonzero exit code (" + result + "): " + command.mkString(" "))
}
// Note: Git MUST come before Remote and Local because it can also handle `http(s)`, `ftp` and `file` URIs.
private val retrievers = List(Retriever.Subversion, Retriever.Mercurial, Retriever.Git, Retriever.Remote, Retriever.Local)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I said configuration, I was thinking about this line which hardcodes default retrievers. Is it fine, w.r.t. outside configuration of Resolvers, to do it like this?

@harrah
Copy link
Member

harrah commented Jan 13, 2012

Oh, sorry. Yes, that should work. Users can't remove the defaults without configuring a full build loader, but they can override one by adding a resolve that handles that URI. Is that what you are asking?

@saserr
Copy link
Contributor Author

saserr commented Jan 13, 2012

I tried to follow code but i got lost :) so i asked to be sure to not create problems. thanks for the clarification.

also, i pushed new commits: now i am using BuildLoader.Resolver instead of my own trait and i replaced all implicits with normal methods as you suggested.

@harrah
Copy link
Member

harrah commented Jan 13, 2012

Great, thanks for the continued work!

  1. Don't use null except for interoperability with Java. That is, you obviously have to check the URI fragment for null because it is written that way and not under our control, but don't allow arguments to be null in code you control, such as the run method. Use Option.
  2. There is no need to extend Function1, as in object Local extends Resolver. Just do:
val Local: Resolver = (info: ResolveInfo) => {
  ... body of apply method ...
}
  1. Using keywords by quoting with backticks, such as for is awkward to use. This isn't a big deal if you prefer it, though.
  2. On line 24 of Resolvers, this will fail if the URI doesn't have a scheme of "file". it should be inside the if statement.
  3. The original RetrieveUnit code had the URI matching code separated from the retrieval/cloning code and this should be preserved. In general, keep methods small and make them a combination of other well-separated methods. For example, the Git resolver implementation should have the branch checkout code as a separate method.
  4. When doing Option(something) map f getOrElse x, just do pattern matching on something because the map/getOrElse don't gain you anything (unless f is already a function value), but cost two extra class files generated. (It is certainly ok to use map and getOrElse in general and this guideline is probably only something you'll have to follow for sbt and not other projects.)
  5. For the withoutMarkerSchema method, don't make URI parse the uri twice: assign new URI(uri.getSchemeSpecificPart) to a val.

@saserr
Copy link
Contributor Author

saserr commented Jan 18, 2012

Hopefully this is better now :-)

Changes:

  • Replaced objects with vals.
  • Moved back matching of URIs to RetriveUnit and cleaned it up.
  • Added RichURI "implicit" class to encapsulate all specialized code that deals with URIs.
  • Separated retrieving into meaningful small methods.
  • Reinstated cleanup code on failure that I accidentally deleted in last commit.

@harrah
Copy link
Member

harrah commented Jan 18, 2012

This looks close. If it weren't for #1, I think it could be merged.

  1. This doesn't look right:

    def hasFragment = this.uri.getFragment eq null
  2. Is there a reason for explicitly calling this as in this.uri ?

  3. RichURI can go in its own file in util/io/RichURI.scala and the implicit can go in a companion object. RichURI can be made final.

  4. Perhaps a better name for privateDir is uniqueSubdirectory

  5. vals usually aren't capitalized unless they are constants. Local and Git should be lowercase now that they aren't objects.

@saserr
Copy link
Contributor Author

saserr commented Jan 19, 2012

Good catch with #1. I fixed all your suggestions and made two additional changes:

  • moved uniqueSubdirectory (aka privateDir) back to Resolvers because it didn't fit well with rest of the utility class RichURI,
  • replaced setScheme with more Scala-like copy method.

@harrah
Copy link
Member

harrah commented Jan 19, 2012

Now that I've tried it out, it throws an exception when using a URI like hg:....

java.net.URISyntaxException: Expected scheme-specific part at index 3: hg:

You might want to use ScalaTest, specs2, or ScalaCheck for unit tests or sbt's scripted tests for integration testing: https://github.com/harrah/xsbt/blob/gh-pages/Developing.pdf?raw=true. Please ask if you have any questions.

@saserr
Copy link
Contributor Author

saserr commented Jan 26, 2012

I had the same mistake in RichURI.hasMarkerScheme as in hasFragment. I should have put ne instead of eq.

I tested it on my machine manually and now it works. I can create some unit tests for RichURI, but I am still not sure how can I test retrieving. If I use remote git, mercurial, and svn repositories it will make testing very slow and dependent on internet connection. So do you have any other ideas?

@vigdorchik
Copy link
Contributor

Right. Looking at the code it seems a new temporary directory is created every time RetrieveUnit.apply is called, so checking for exists() doesn't make much sense.

@saserr
Copy link
Contributor Author

saserr commented Feb 1, 2012

It doesn't have to create new directory for each new retrieval. If you try to retrieve same URI twice, second time you don't have to do anything and just return the directory that is already there. Again, same semantic as in previous code.

Is that what you asked or did I understood you wrong?

@vigdorchik
Copy link
Contributor

Yes, sorry for that.
So the check for existence is the same and can be moved out of each resolver?

@saserr
Copy link
Contributor Author

saserr commented Feb 1, 2012

But, figuring out where to resolve URI is not same. Local retriever will just point to given URI if it is writable. SVN, mercurial and git will normalize repository URI so that for example https://github.com/harrah/xsbt.git and git://github.com/harrah/xsbt.git are resolved to same local directory.

Of course, you can put all this into RetrieveUnit, but it doesn't gain anything because it will still be specific for each case. I just prefer to keep all retrieve-specific things inside corresponding retriever. If you or harrah prefer other way you can take my code and move that to RetrieveUnit.

Furthermore, this check has to be in the returned function because we have to check if local directory exists as a precaution before we try to create it. That means that RetrieveUnit would have to be responsible for creating the returned functions, which changes whole design of retrievers because they then become just Unit functions.

@harrah
Copy link
Member

harrah commented Feb 1, 2012

@vigdorchik OSTYPE can be probably dealt with in a subsequent pull request, right?

@saserr to be clear, the patch is a good one and important and will go in. I asked Eugene to look at it to give you faster feedback and for a fresh review as it is harder to look at it clearly after a few revisions.

@saserr
Copy link
Contributor Author

saserr commented Feb 1, 2012

No problem, of course. Glad to help at least little bit :)

Should I also rebase on top of 0.12 branch instead of 0.11 as there won't be any more 0.11.x releases?

@harrah
Copy link
Member

harrah commented Feb 1, 2012

Yes, put it on the 0.12 branch.

Instead of cloning from a remote git repository for each branch,
revision or tag separately, the git resolver locally clones only once
the remote git repository and then creates further local clones from
this local copy of the remote repository.

First, optimization, of course, is execution speed, because cloning
local repository is much faster than remote repository. Furthermore,
because git uses hard-linking when a clone of local repository is
created, the second optimization is in space consumption.

For example, if we have one project that uses
https://github.com/harrah/xsbt.git#v0.11.1 and second project that
uses https://github.com/harrah/xsbt.git#v0.11.2, in previous git
resolver implementation it would require two separate clones of the
remote git repository at https://github.com/harrah/xsbt.git. But, the
new git resolver requires only one clone of the remote git repository
and two local clones which take no space because of hard-linking.
Non-standard git URIs are ones that do not start with 'git:' nor end
with '.git'. An example of non-standard git URI is
'ssh://server/home/user/repo'.

The mechanism for specifying a non-standard git URI is done by
prefixing the whole URI with 'git:' to signify that it should be
handled with the git resolver. For example, non-standard git URIs like
'git:ssh://server/user/repo' and 'git:https://server/user/repo' can
now be used.
All mercurial URIs have to be prefixed with 'hg:' to separate them
from URIs for other resolvers. For example,
'hg:https://server/user/repo' can now be used.
All subversion URIs have to be prefixed with 'svn:' to separate them
from URIs for other resolvers. For example, 'svn:https://server/repo'
can now be used.
@saserr
Copy link
Contributor Author

saserr commented Feb 1, 2012

It messed up this pull request after rebase, because it didn't change it to 0.12 branch. I am closing it and opening new one for 0.12 branch.

@saserr saserr closed this Feb 1, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants