Allow matching repositories to dependencies #1369

Closed
bmuschko opened this Issue Feb 8, 2017 · 38 comments

@bmuschko
Contributor

bmuschko commented Feb 8, 2017

Original issue: https://issues.gradle.org/browse/GRADLE-1066

Highly voted issue: 20

Example use cases:

  • I need to ensure that the runtime dependencies I ship in my distribution come from a specific repository that passes some licence check. Plugins and compile-only dependencies don't have this restriction.
  • A repository was added to provide only a specific dependency, but I don't want Gradle to ask it for other dependencies. Maybe it is slower or less trustworthy than others.
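For the first use case, here is a minimal sketch using the repository content filtering that this thread reports shipping in Gradle 5.1 (`content { ... }` with `notForConfigurations`). The approved repository URL is a placeholder, and `runtimeClasspath` assumes the configuration names of the `java` plugin:

```groovy
repositories {
    // Hypothetical licence-checked repository: the only source left for runtimeClasspath.
    maven {
        url 'https://approved.example.com/maven'
    }
    // Other repositories may still serve compile-only and test dependencies,
    // but are never consulted when resolving runtimeClasspath.
    mavenCentral {
        content {
            notForConfigurations 'runtimeClasspath'
        }
    }
}
```

With this setup, a runtime dependency missing from the approved repository makes `runtimeClasspath` resolution fail instead of silently falling back to Maven Central. Plugins declared in the `plugins {}` block are resolved from separate plugin repositories, so this block does not restrict them.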
@omsn

omsn commented Feb 10, 2017

+1

We absolutely need this feature. Note that for us, "declare repository per configuration (e.g. runtime dependencies)" would already be sufficient. Maybe you can consider that as well.

@seviu

seviu commented Mar 10, 2017

+1
I figured that one way to make the situation a bit better is to think carefully about the order in which you declare your repositories. Where I work we have a private Nexus repo that is sometimes very slow to resolve dependencies. Moving it to the bottom made our life much easier.

@lavcraft

lavcraft commented Mar 13, 2017

+1

@nedtwigg

nedtwigg commented Apr 10, 2017

One idea:

repositories {
    maven {
        url 'specialCase'
        filter 'specialGroup:**'
    }
    maven {
        url 'repo1'
        filter '!specialGroup:**'
    }
}
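The filter idea above is close to what the thread reports was merged for Gradle 5.1. A sketch of the same two repositories using `content { ... }` with `includeGroup` and `excludeGroup`; the URLs are hypothetical stand-ins for the placeholders in the proposal:

```groovy
repositories {
    maven {
        url 'https://special.example.com/maven'   // hypothetical URL
        content {
            // Only ask this repository for modules in specialGroup.
            includeGroup 'specialGroup'
        }
    }
    maven {
        url 'https://repo1.example.com/maven'     // hypothetical URL
        content {
            // Never ask this repository for modules in specialGroup.
            excludeGroup 'specialGroup'
        }
    }
}
```

`includeGroup` and `excludeGroup` match the group name exactly; related groups would need their own entries or a regex-based variant.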
@kevinbayes

kevinbayes commented May 18, 2017

Has any work been done on this?

It's a really good feature for competing repositories that may have the same dependencies with slight differences.

@jaredsburrows

jaredsburrows commented May 31, 2017

This is a great idea for when you want to use jitpack.io for a single dependency and not for anything else.
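For the JitPack case, a sketch using the Gradle 5.1 content filter with a regex, assuming the dependency uses JitPack's `com.github.<user>` coordinates (projects published under a custom domain group would need a different pattern):

```groovy
repositories {
    mavenCentral()
    maven {
        url 'https://jitpack.io'
        content {
            // Only ask JitPack for com.github.* groups; everything else comes from Maven Central.
            includeGroupByRegex 'com\\.github\\..*'
        }
    }
}
```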

@markhughes

markhughes commented Sep 1, 2017

I think this would be really cool in terms of not having to poll a bunch of other repositories before getting to the right one. This way, we could just prioritise the repo for a certain dependency.

Looking forward to this implementation! 😄

@oehme oehme changed the title Provide support for declaring repositories on a per-artifact basis Reduce overhead of having many repositories Sep 19, 2017

@oehme

Member

oehme commented Sep 19, 2017

I've adjusted the title of this to refer to the actual problem. Assigning a repository to a dependency might not be the best solution. There are alternatives like remembering that a certain module was absent from a repository and not checking it again when we want to see if there is a new version.

Another alternative would be to do matching of repositories based on certain attributes.

@nedtwigg

nedtwigg commented Sep 19, 2017

For my use case, I care less about the performance problem and more about being able to assign a specific dependency to a specific repository for correctness reasons.

Example story:

  • a jar from mavencentral has a bug / needs a feature
  • I modify the jar to suit my needs
  • I publish it to our local 3rd party repo under a new version
  • at some point in the future, a new jar with the same version is published to mavencentral

I might be in the minority here, but I imagine it's a pretty common situation for a project to be "we get all our deps from mavenCentral, except this one vendor library which comes from the vendor's maven repo, and this one hacked-up library we check in to our 'libs' directory". I guess figuring out how to resolve the transitives is a sticky issue...

@jgriff

jgriff commented Sep 19, 2017

@nedtwigg The way I've handled your use case in the past is to deploy your "fixed/hacked" artifact under an expressly different GAV coordinate. It may just be that the version is overtly different (e.g. "1.0.1" --> "1.0.1.MYCOMPANY_PATCH").

Another way is to prepend the groupId or artifactId with your own. Back in the day, SpringSource did this with their OSGi Enterprise Bundle Repository. They took a ton of OSS libraries that were not OSGi-enabled, modified their MANIFEST.MF, repackaged them, and deployed them with the exact same groupId and version, but with the artifactId prepended with "com.springsource". For example, "org.apache.commons:commons-io" became "org.apache.commons:com.springsource.commons-io".

I like prepending to the group or artifact ID over changing the version, just to make it clear it is in fact a "different" artifact. It also better avoids collisions.

@oehme

Member

oehme commented Sep 19, 2017

Publishing it under a different group is a very good approach. You can use a substitution rule to tell Gradle to replace any appearance of the original with your modified one. Then you don't rely on repository order for correctness.
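A minimal sketch of such a substitution rule; the upstream coordinates `org.example:some-lib` and the patched `com.mycompany.patched:some-lib:1.0.1` are hypothetical:

```groovy
configurations.all {
    resolutionStrategy.dependencySubstitution {
        // Replace every request for the upstream module, including transitive ones,
        // with the patched copy published under its own group.
        substitute module('org.example:some-lib') with module('com.mycompany.patched:some-lib:1.0.1')
    }
}
```

Because the patched module has its own group, it cannot collide with a later upstream release of the same version, whatever the repository order.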

@omsn

omsn commented Sep 28, 2017

The problem with GAV changes is that they propagate to projects which consume your project (as transitive dependencies there), but those consumer projects might not have access to the same repositories you have. You care where the libs in your distribution build come from, but you don't care about or control where consumer projects get them from.

@omsn

omsn commented Sep 28, 2017

Another use case for this, which I think is fairly common at companies and which shows that this is not only about performance: per company policy, third-party libraries have to be pulled from a certain "approved" repository, which is basically the same as Maven Central but adds some extra checks, such as a virus scanner.

@oehme

Member

oehme commented Sep 28, 2017

I don't understand the two points above. Can you please elaborate how the original proposal of "repositories on dependencies" would have solved them?

If you modify a lib, but don't give consumers access to the repo containing that modified version, these consumers can't work. They can't just use the unmodified version, since that's not going to work correctly with your project. But if they really want to try anyway, they can use a dependency resolution rule to adjust that dependency back to the unmodified version.
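On the consumer side, such a rule could look like the sketch below. The coordinates are hypothetical, and it assumes the patched fork kept the upstream version number:

```groovy
configurations.all {
    resolutionStrategy.eachDependency { DependencyResolveDetails details ->
        // Hypothetical coordinates: point the patched fork back at the upstream module.
        if (details.requested.group == 'com.mycompany.patched' && details.requested.name == 'some-lib') {
            // Reuses the requested version, assuming upstream publishes the same one.
            details.useTarget group: 'org.example', name: 'some-lib', version: details.requested.version
            details.because 'the repository hosting the patched fork is not reachable from this build'
        }
    }
}
```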

If you require dependencies to be pulled from a specific repo for policy reasons, it's much safer to validate that no other repo is used than to rely on repository ordering or trusting users to specify the repository on each dependency/configuration.

@omsn

omsn commented Sep 28, 2017

If you require dependencies to be pulled from a specific repo for policy reasons, it's much safer to validate that no other repo is used than to rely on repository ordering or trusting users to specify the repository on each dependency/configuration.

How exactly can you validate that no other repo is used today? Usually, the repo which has to be used for policy reasons does not contain all dependencies of your project. For example, because the maintenance effort would otherwise be too high, the repo which has to be used for policy reasons only contains the dependencies you redistribute (runtime dependencies). However, your project might also have test dependencies, integration test dependencies, compile-only dependencies, plugins, etc. which are not available in that repository. Therefore the only solution I see today is to rely on repository ordering, which, as we both agree, is not a safe or reliable solution. Trusting developers to specify the repository on each dependency/configuration would be a reliable solution, because then your build fails if a dependency of a certain configuration isn't in the repo you specified it should come from.

If you modify a lib, but don't give consumers access to the repo containing that modified version, these consumers can't work.

What if the lib isn't modified, but just has to be pulled from a different (private) repo for policy reasons? I admit that this use case is a bit far-fetched and might not be as common as the "need to use repo X for policy reasons" use case. Dependency substitution on the consumer side would be a fix, but it complicates consumer projects quite a bit. With this ticket, it becomes easier, since you don't need to change the GAV.

@oehme

Member

oehme commented Sep 28, 2017

How exactly can you validate that no other repo is used today?

You can add a task to the build that goes through the repositories and fails if any non-trusted one is used. This could be part of a plugin that every project is required to use.
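A minimal sketch of such a task in a build script. The task name `verifyRepositories` and the allow-listed URLs are hypothetical, and only Maven repositories are checked:

```groovy
// Hypothetical allow-list: replace with your organization's approved repository URLs.
def trustedRepositoryUrls = [
    'https://repo.mycompany.example/releases',
    'https://repo.mycompany.example/proxy',
] as Set

tasks.register('verifyRepositories') {
    doLast {
        // Only Maven repositories are inspected; Ivy and flatDir repositories
        // would need similar checks.
        def untrusted = project.repositories.withType(MavenArtifactRepository).findAll { repo ->
            !trustedRepositoryUrls.contains(repo.url.toString())
        }
        if (!untrusted.isEmpty()) {
            throw new GradleException("Untrusted repositories declared: ${untrusted.collect { it.url }}")
        }
    }
}
```

A shared plugin could register the same check in every project. Note that it only inspects the project's own `repositories` block, not `buildscript` or plugin repositories.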

Usually, the repo which has to be used for policy reasons does not contain all dependencies of your project.

Plugins, compile only dependencies etc. can all compromise the code that you deliver. Compromised testing libraries could help an attacker hide such issues. Checking only the jars you redistribute doesn't mean the distribution is safe to use.

Trusting developers to specify the repository on each dependency/configuration would be a reliable solution

Isn't all policy about not trusting a single party, but having an independent party cross-check? It's much easier to check that each project is using a policy-checking plugin than checking whether each project implements each policy on a line-by-line level in their build scripts.

All that being said, I can imagine having some attribute-to-repository matching that allows you to restrict Gradle's search for dependencies. I would suggest opening a separate issue for this though, because the original intent of this one is about performance, not policy.

@omsn

omsn commented Sep 28, 2017

You can add a task to the build that goes through the repositories and fails if any non-trusted one is used. This could be part of a plugin that every project is required to use.

Doesn't solve anything for the case I described since what is considered "trusted" depends on the artifact/configuration.

Checking only the jars you redistribute doesn't mean the distribution is safe to use.

Let's not discuss whether such policies make sense or not. Fact is such policies exist and developers have to deal with them.

Isn't all policy about not trusting a single party, but having an independent party cross-check? It's much easier to check that each project is using a policy-checking plugin than checking whether each project implements each policy on a line-by-line level in their build scripts.

I don't really get this point. The goal is to fail the build if certain dependencies aren't pulled from a trusted repository. The fact that the build fails serves as proof that the build implements the policy. Whether the build fails because a plugin isn't present or because something isn't configured correctly in the build script itself doesn't really matter. The issue is that with Gradle today, you can only fail the build if a dependency cannot be found in any of the specified repositories. What is needed is to fail the build if certain dependencies cannot be found in certain specified repositories.

All that being said, I can imagine having some attribute-to-repository matching that allows you to restrict Gradle's search for dependencies. I would suggest opening a separate issue for this though, because the original intent of this one is about performance, not policy.

That would be great. I'm happy to file a separate issue for this, but reading the description and comments of this ticket, why do you think this is only about performance? The "Expected Behavior" of this ticket sounds like what I'm trying to describe here and doesn't mention performance as an issue.

@oehme

Member

oehme commented Sep 28, 2017

Goal is to fail the build if certain dependencies aren't pulled from a trusted repository. The fact that the build fails serve as proof that the build implements the policy.

So what does a green build tell you in this case? It could be implementing the policy (and no bad dependencies were used) or it might not be implementing the policy (and it might use bad dependencies). It's green either way.

I'm happy to file a separate issue for this, but reading the description and comments of this ticket, why do you think this is only about performance? The "Expected Behavior" of this ticket sounds like what I'm trying to describe here and doesn't mention performance as an issue.

That's because it went into a technical proposal too early. The context section mentions the actual problem - I add a repo just for a single dependency and everything else gets slower. By now it's gotten hard to tell who added a +1 for "make it faster" and who added a +1 for "make policy easier".

@markhughes

markhughes commented Sep 28, 2017

I think it's all about what the original Expected Behavior stated:

Individual dependencies can declare their repositories. At runtime, the dependencies are only resolved from the declared repositories. The build fails if a dependency cannot be found in its declared repositories.

Edit: this is a useless and really unproductive debate. But both ideas seem necessary? Split the ticket if you need to, and just stop debating.

@oehme oehme changed the title Reduce overhead of having many repositories Allow matching repositories to dependencies Sep 28, 2017

@oehme

Member

oehme commented Sep 28, 2017

I've forked out the performance aspect into its own issue and reworded this issue to not propose a technical solution, but talk about use cases instead.

@hearit-danny

hearit-danny commented Feb 22, 2018

Any update on this? I too have this first requirement stated at the beginning. Any ETA or version this is slated for?

@mkobit

Contributor

mkobit commented Mar 26, 2018

Another thing to take into consideration here is if Gradle eventually provides plugins with the ability to provide custom dependency types along with custom dependency resolution/repositories (somewhat mentioned in #1400).

I don't want to dive into implementation much, but stumbling through the Gradle APIs led me to https://docs.gradle.org/current/javadoc/org/gradle/api/attributes/package-summary.html . I don't fully understand what those are used for yet, but they look like they might fit the bill of "matching things together".

@melix

Member

melix commented Mar 27, 2018

@mkobit attributes are meant to select variants of the same module. They have no knowledge of where the component comes from, so are not the right solution for this.

Our initial thought was to have the ability to declare, for each repository, what it may or may not contain. But we haven't particularly made progress on this yet.

@ajwhite

ajwhite commented May 16, 2018

A good example of a problem this would solve, which recently affected many users: facebook/react-native#13094 (comment)

@ar

ar commented Jun 16, 2018

I believe this issue is quite sensitive. We run our own repository for nightly snapshots, and I'm amazed to see in our access logs very large Fortune companies aggressively trying to access hundreds of dependencies. We get to know who uses what and which versions (and so do Maven Central and other repos), which is quite a security leak. It also scares me that we could even serve some of those artifacts (if we were bad guys).
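On the consumer side, a build can keep such requests from leaking to a snapshot repository. A sketch assuming Gradle 5.1's `mavenContent` block; the URL and group name are hypothetical:

```groovy
repositories {
    mavenCentral()
    maven {
        url 'https://nightly.example.org/snapshots'   // hypothetical snapshot repository
        mavenContent {
            // Only ask this repository for -SNAPSHOT versions...
            snapshotsOnly()
            // ...of the one group it actually publishes (hypothetical group name).
            includeGroup 'org.example.nightly'
        }
    }
}
```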

@piotrturski

piotrturski commented Jun 26, 2018

This issue would solve problems like 'javax.mail:mail:1.3.1': in Central (and a few other repos) there is only a POM but no jar, and the jar is in another repo. So to make Gradle survive the missing jar in Central, I have to put that specific repo first.

So if I hit the same problem with a different repo, does that mean I won't be able to build my project with Gradle?
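With the content filtering the thread reports shipping in Gradle 5.1, this no longer depends on repository order. A sketch, where the second repository URL is hypothetical and assumed to host both the POM and the jar:

```groovy
repositories {
    mavenCentral {
        content {
            // Central only has the POM for this module, so never resolve it here.
            excludeModule 'javax.mail', 'mail'
        }
    }
    maven {
        url 'https://repo.example.com/maven'   // hypothetical repository that hosts the jar
        content {
            // Only use this repository for that one module.
            includeModule 'javax.mail', 'mail'
        }
    }
}
```

A second problematic module would just be another `excludeModule`/`includeModule` pair, regardless of which repository it lives in.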

@sboardwell

sboardwell commented Jul 9, 2018

What is currently stopping work from being carried out on this? We also have this problem, with public repositories even being spammed with requests for non-existent artifacts. Would it not suffice to add a whitelist/blacklist (includes/excludes) attribute to a repository block to either:

  • include resolution of all matching dependencies
  • exclude resolution of all matching dependencies

I would be willing to help out if someone would like to point me in the right direction.

@melix

Member

melix commented Jul 9, 2018

Thanks @sboardwell ,

At this point we're collecting use cases for this, to make sure filtering, or matching, is the right solution for each one. We see advantages in implementing this, but we'd like to make sure the use cases for it are real, so starting with a list of use cases and possible solutions to them is a good start. Then we can decide whether, how, and when to implement it.

@sboardwell

sboardwell commented Jul 9, 2018

Ok, thanks for the info @melix. My use case would be similar to other posters here:

  • internal artifacts pushed to an internal Nexus repository nexus.mycomp.com/my-releases
  • be able to tell gradle: please check nexus.mycomp.com/my-releases for any artifacts having the group mycomp

I also second @ar's comment. The information found in the incorrect requests being sent out can be very revealing, not to mention dangerous.

We are currently solving some of the problems with rewrite rules on the web server serving our Nexus, catching requests known to be incorrect and sending back 404s, but this is a really ugly solution to maintain.

Do you know when you'll be finished collecting information for use cases? Just asking since the original ticket was opened in 2010 😃

@melix

Member

melix commented Jul 9, 2018

We've been working on dependency management features intensively for the past months, and this issue has been mentioned several times. We were close to implementing it, but always found better ways to solve the use cases we originally thought would need it. This doesn't mean no use case needs it. Performance improvement is another one. It is unlikely we will do anything on this before October, as we have many more features to finish first.

Customers with such issues usually work around them by having an internal repository which also does proxying, but we recognize it's not always that simple.

@d10xa

d10xa commented Jul 9, 2018

@melix One of the use cases is speeding up builds with multiple slow repositories:

repositories {
    maven { url "http://slow-repository-1.com" }
    maven { url "http://slow-repository-2.com" }
    // ...
    maven { url "http://slow-repository-n.com" }
    maven { url "http://super-slow-repository.com" }
    maven { url "http://extremely-slow-repository.com" }

    jcenter()
}

dependencies {
    compile "foo:bar:1.0" at "http://slow-repository-42.com"
    // ...
}
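The `at` syntax above is a proposal, not valid Gradle. With the content filtering the thread reports shipping in Gradle 5.1, the same intent can be expressed on the repository instead. A sketch, assuming the `java` plugin is applied (for the `implementation` configuration):

```groovy
repositories {
    maven {
        url 'http://slow-repository-42.com'
        content {
            // This repository is only ever asked for foo:bar.
            includeModule 'foo', 'bar'
        }
    }
    jcenter()
}

dependencies {
    implementation 'foo:bar:1.0'
}
```

Gradle 7 and later also reject plain-HTTP repository URLs unless `allowInsecureProtocol = true` is set on the repository.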
@sboardwell

sboardwell commented Jul 9, 2018

Hi @melix, thanks again for the quick response.

We use an internal Nexus with cached/proxied remote repositories for jcenter, maven-central and co. The default negative cache in Nexus is 1440 minutes, meaning the remote repository will only be contacted once per day for something it doesn't have.

However, if I've understood:

https://docs.gradle.org/current/userguide/introduction_dependency_management.html#sec:dependency_resolution

Once each repository has been inspected for the module, Gradle will choose the 'best' one to use.

correctly (but maybe I've got this part wrong), every configured repository will be contacted on every build with an empty gradle dependency cache (~/.gradle/caches/modules-2) (regardless of the ordering of the repositories). So, even if we declare the internal repository first and the dependency is found, the other irrelevant repositories are checked regardless. Is this correct?

@ljacomet

Member

ljacomet commented Jul 9, 2018

@sboardwell Unless you use dynamic versions, once a version is found in a repository, it is not searched for in the other repos.

But here the comment from @melix is about declaring only your internal repository in the build script and letting it handle the proxying.

Another potential use case for an improvement:

  • I want my runtime dependencies to be resolved against a blessed repository only, for security / legal reasons.
  • I have a looser constraint for my test dependencies which are allowed to be sourced from a wider set of sources.
@sboardwell

sboardwell commented Jul 9, 2018

Thanks @ljacomet, I'll have a look at solving it with repository groups.

@akqa-tag-tech

akqa-tag-tech commented Sep 19, 2018

Any update?

@eygraber

eygraber commented Oct 10, 2018

My use case is some issues I've run into on multiple occasions, where jcenter hosts old or unofficial artifacts that I need to get from another repo (e.g. Firebase artifacts that should come from google but are instead resolved from jcenter, causing versioning issues; declaring google before jcenter fixes that issue but causes others).

Another one (that prompted me to search for this) is that companies host their own artifacts, but adding their repository slows down the build or, worse, the repository is somehow used to download all artifacts (even though it is specified last in repositories). My current issue is with Cloudflare's mobile SDK repository, which is:

A. Slow
B. Being used to try and resolve all my artifacts

@melix

Member

melix commented Oct 26, 2018

A PR is ready, if anyone is willing to build the branch and give us feedback.

@melix melix added this to the 5.1 RC1 milestone Nov 30, 2018

@melix

Member

melix commented Nov 30, 2018

PR has been merged, will ship into 5.1!
