Add "crates.io Policy Update" RFC #3463

Merged on Nov 7, 2023 (7 commits)


Turbo87
Member

@Turbo87 Turbo87 commented Jul 24, 2023

Rendered

/cc @rust-lang/crates-io

@Turbo87 Turbo87 added the T-crates-io Relevant to the crates.io team, which will review and decide on the RFC. label Jul 24, 2023
@Turbo87 Turbo87 force-pushed the crates-io-policy-update branch from ed88f70 to d89b3c8 Compare July 24, 2023 13:38
@Turbo87 Turbo87 force-pushed the crates-io-policy-update branch from d89b3c8 to 5ea4917 Compare July 24, 2023 13:40
@BurntSushi
Member

BurntSushi commented Jul 24, 2023

I think a lot of the rules in this RFC are probably fine, but some of them strike me as quite vague and easily contorted. I understand some level of vagueness is necessary as not every little detail can be specified up front. That should make it clear that I am not objecting to vagueness itself but the level of vagueness.

Ironically, I recognize the vagueness of my objection. I think that discussing each rule in the abstract is likely to go exactly nowhere. So instead, I think it would be useful to take the rules as written and try to apply them to the ecosystem today. Presumably there is some subset of crates on crates.io that would be taken down by an application of one or more of the rules in this RFC. I certainly don't think we need to enumerate the entire subset, but for transparency, I think it would be a good idea to enumerate some of them. I can get us started:

I'm sure there are more interesting examples, and I'd encourage others to raise them. But at least for this small set, I personally see a fairly straight-forward application of one or more rules, as written in this RFC, being used to justify some action against all of the above crates. Is that not intended? If not, then perhaps the rules need some work. If so, then we should definitely talk about the ramifications of this RFC in terms of the crates it is going to impact as a starting point.

@Turbo87
Member Author

Turbo87 commented Jul 25, 2023

@BurntSushi thank you, I think discussing in less abstract cases makes a lot of sense. the following would be my interpretation of the rules:

assertive

in its current state it would not be allowed since it breaks the "reserved for prolonged period of time" rule. if there was a repository link attached that shows active development or a README that explains why 4 years later there is still no content there then things might look different.

I see that Carl also has a bunch of async-* crates reserved, where the rule would similarly apply.

rg

as others have noted, there is also a rule in the proposal about impersonation. since ripgrep is a very popular tool and the rg package prevents a common impersonation/typo-squatting attack IMHO the name reservation rule does not apply, especially since the README clearly explains the purpose.

buttplug

I think https://docs.github.com/en/site-policy/acceptable-use-policies/github-sexually-obscene-content is a good guideline on what is meant by the "sexually obscene" clause. "We do not allow sexually themed or suggestive content that serves little or no purpose other than to solicit an erotic or shocking response" does not apply since there is genuine functionality here that is related to sexuality but, AFAICT, is not abusive in any way and does not include any kind of pornographic material.

tl;dr it's fine

vsdb
bitcoin

I'm not sure why these two were included. Is this related to the latest lib.rs discussions?

While I'm personally not a big fan of cryptocurrencies there is nothing in the rules that would prevent anyone from publishing these kinds of crates. There is something in there about "cryptocurrency mining" but that is in the context of "excessive bulk activity and coordinated inauthentic activity" and meant to e.g. disallow installing a cryptominer in the build.rs file of a crate.

bible

There is no genuine functionality in this crate (just the hello world example code), it has no repository attached to check for development activity, and it has no README explaining reasons why this might still be legitimate. Looking at the other crates of the owner it also becomes quite obvious that this person is squatting names for the fun of it. In other words: this crate would be removed under the proposed ruleset.

@Jules-Bertholet
Contributor

> since ripgrep is a very popular tool and the rg package prevents a common impersonation/typo-squatting attack IMHO the name reservation rule does not apply, especially since the README clearly explains the purpose.

If ripgrep was less popular, would this reasoning still apply? Is there a popularity threshold of some sort?

@BurntSushi
Member

BurntSushi commented Jul 25, 2023

@Turbo87 Thanks for the response. Separate from the specifics of each crate, has any work been done to figure out how best to communicate with existing crate authors about how this policy is going to impact them?

> I see that Carl also has a bunch of async-* crates reserved, where the rule would similarly apply.

cc @carllerche do you want to chime in on this?

> as others have noted, there is also a rule in the proposal about impersonation. since ripgrep is a very popular tool and the rg package prevents a common impersonation/typo-squatting attack IMHO the name reservation rule does not apply, especially since the README clearly explains the purpose.

It's not at all clear to me that every use of rg is about impersonation though. And even then, how do you choose which rule overrides the other?

What I'm getting at here is that the rule against squatting put forward here is very broad. @jhpratt suggested a carve out involving contacting the authors. What do you think about that? I understand writing down rules against squatting is difficult, but the problem is itself difficult and has a ton of prior discussion on the matter. Has that prior discussion been surveyed?

> I think https://docs.github.com/en/site-policy/acceptable-use-policies/github-sexually-obscene-content is a good guideline on what is meant by the "sexually obscene" clause. "We do not allow sexually themed or suggestive content that serves little or no purpose other than to solicit an erotic or shocking response" does not apply since there is genuine functionality here that is related to sexuality but, AFAICT, is not abusive in any way and does not include any kind of pornographic material.

Should that language make it into the RFC clarifying what is meant?

> I'm not sure why these two were included. Is this related to the latest lib.rs discussions?
>
> While I'm personally not a big fan of cryptocurrencies there is nothing in the rules that would prevent anyone from publishing these kinds of crates. There is something in there about "cryptocurrency mining" but that is in the context of "excessive bulk activity and coordinated inauthentic activity" and meant to e.g. disallow installing a cryptominer in the build.rs file of a crate.

No it's not about lib.rs. The cryptominer case I think is covered well by the "contains malicious code" rule. The rule I'm thinking about here is this (for which there is already some lively discussion):

> is false, inaccurate, or intentionally deceptive information and likely to
> adversely affect the public interest (including health, safety, election
> integrity, and civic participation)

It is by no means a stretch to say that cryptocurrency meets this standard. All you have to do is make an argument that the vast majority of cryptocurrencies are in some way a scam or an MLM scheme, and then you just have to make an argument about its impact on climate change. I'm not going to make that argument in detail here (I think its existence is sufficient), but I've seen plenty of other folks make this argument earnestly and with pretty compelling reasons. So I'm not just picking something out of the ether here and playing with hypotheticals.

If the cryptocurrency case isn't meant to fall under this rule, then perhaps the rule should be further clarified.

> There is no genuine functionality in this crate (just the hello world example code), it has no repository attached to check for development activity, and it has no README explaining reasons why this might still be legitimate. Looking at the other crates of the owner it also becomes quite obvious that this person is squatting names for the fun of it. In other words: this crate would be removed under the proposed ruleset.

Ah okay, whoops. Perhaps the_rock instead. My thinking here was essentially a more extreme version of the cryptocurrency case. This one is admittedly a bit more bombastic, but if you talked to the version of me that was 15 years younger, I would tell you with a straight face that it was promoting a lie that endangered the health of humanity.


Popping up a level, now that we've dived a little into specifics, my general feeling here is that the rules don't do a good enough job on their own of protecting ourselves and crates.io from our own biases. Some of these cases might look dumb and obviously not in violation of the rules, but we also need to think about how these rules will be used 10+ years from now. That's not to say we should start treating our future selves as adversaries necessarily, but I do think it's prudent to try and be a little more concrete with some of these rules.

Maybe there are some small wording changes that make the scope of some of these rules a bit more narrow explicitly.

@jonas-schievink
Contributor

> vsdb
> bitcoin
>
> I'm not sure why these two were included. Is this related to the latest lib.rs discussions?

Note that vsdb's download counts are clearly artificially inflated. Doesn't that violate the rule against misinformation?

@clarfonthey
Contributor

> Note that vsdb's download counts are clearly artificially inflated. Doesn't that violate the rule against misinformation?

I think that download counts for crates are (mostly) useless since one genuine build system with bad caching can inflate downloads. I think this would fall more in line with API abuse, where for example a given host could be banned for repeatedly requesting crates when they should be properly caching their downloads.

@Turbo87
Member Author

Turbo87 commented Jul 26, 2023

> If ripgrep was less popular, would this reasoning still apply? Is there a popularity threshold of some sort?

if your tool only has a couple of downloads per month then it probably wouldn't apply, but ripgrep is certainly popular enough to warrant an exception. in general, it would be best to have the binary name match the crate name though, so that the reason for this additional name reservation doesn't exist in the first place.

> Separate from the specifics of each crate, has any work been done to figure out how best to communicate with existing crate authors about how this policy is going to impact them?

if the crate is from a legitimate author (as in: the GitHub user wasn't created just for the purpose of name squatting) then we will most likely contact the author via email before taking action. we have already done similar things in the past couple of weeks/months with regard to broken crate files (broken tarballs, empty tarballs, missing manifests, etc.).

> @jhpratt suggested a carve out involving contacting the authors. What do you think about that?

I'm generally in favor, as long as it does not prevent us from immediately acting on actual name squatting "attacks". In the past couple of weeks we had several cases of people squatting hundreds of crate names. I don't think it would be practical if we had to contact these people first and give them a couple of weeks to respond before we could act on this. I'm open to wording suggestions on how to integrate this with the current proposal :)

> Should that language make it into the RFC clarifying what is meant?

yep, I've added the link to the bullet point in the list.

> It is by no means a stretch to say that cryptocurrency meets this standard. All you have to do is make an argument that the vast majority of cryptocurrencies are in some way a scam or an MLM scheme, and then you just have to make an argument about its impact on climate change. I'm not going to make that argument in detail here (I think its existence is sufficient), but I've seen plenty of other folks make this argument earnestly and with pretty compelling reasons. So I'm not just picking something out of the ether here and playing with hypotheticals.
>
> If the cryptocurrency case isn't meant to fall under this rule, then perhaps the rule should be further clarified.

I think whether crates.io should generally forbid any cryptocurrency code or not would derail this RFC quite a bit and I'm not going to argue for either side here. It is currently not the intention of the crates.io team to generally forbid such code from being published to crates.io, unless there are some verifiably misleading claims published with it.

I'm open to suggestions on how to clarify the rule if you think that it needs a clarification.

@jhpratt
Member

jhpratt commented Jul 26, 2023

Perhaps saying that the team may reach out to the crate's owner at their discretion? Obvious cases can still be handled immediately, while questionable ones can be given the opportunity for further explanation.

@BurntSushi
Member

> if your tool only has a couple of downloads per month then it probably wouldn't apply, but ripgrep is certainly popular enough to warrant an exception.

How should this balancing act be incorporated into the rules proposed in this RFC?

Also, I'd like to repeat my question: squatting is a topic that has been much discussed in the past. Has a survey of those discussions been done? My sense of things here is that this RFC is proposing a short but fairly strict policy on squatting, but will in practice be looser than the strict interpretation yet stricter than what is practiced today. Is it worth setting expectations more clearly than that?

> if the crate is from a legitimate author (as in: the GitHub user wasn't created just for the purpose of name squatting) then we will most likely contact the author via email before taking action. we have already done similar things in the past couple of weeks/months with regard to broken crate files (broken tarballs, empty tarballs, missing manifests, etc.).

Right, that sounds reasonable. I think I might have been unclear. I didn't mean, "how are you going to tell crate authors that they're in violation of a new policy," but rather, how are you going to collect feedback from stakeholders before this RFC passes? I realize "stakeholders" here is a pretty big bucket, and I've tried CC'ing a couple folks already in this thread, but I really think it's important to cast a wide net here. IMO, the title of this RFC does not reflect the level of change. I guess from my perspective, this is a very large change. I know you've said that most of these rules are a reflection of current practice (sans squatting), but 1) the squatting issue is a big one and 2) there is a bias inherent in your perspective. You might look at the rules and see them through the lens of a ton of context that comes from practice, but others (such as myself) look at the rules and see a very large deviation from the status quo. That is, in part, due to the vagueness of some of the rules.

And to be clear, I think a lot of the deviation is good deviation. I don't use deviation pejoratively. But because of the delta here, or at least, the perceived delta, I think it's important to get more visibility on this RFC. And in particular, it's important to reflect on what the changes in the rules imply on their own, and not just what the practice has been recently. A change in the rules can precipitate a change in practice over time.

I feel like I'm fumbling my words here and not making my point as clear as I would like to. My apologies.

> I'm generally in favor, as long as it does not prevent us from immediately acting on actual name squatting "attacks".

Yes, 100% agreed. It's totally fine IMO to treat obvious bad faith or trolling cases differently from those acting in good faith. (Usually the difference is extremely obvious, and if a mistake is made, it's hopefully usually easy to undo.)

> I think whether crates.io should generally forbid any cryptocurrency code or not would derail this RFC quite a bit and I'm not going to argue for either side here. It is currently not the intention of the crates.io team to generally forbid such code from being published to crates.io, unless there are some verifiably misleading claims published with it.
>
> I'm open to suggestions on how to clarify the rule if you think that it needs a clarification.

Yeah... I see the conundrum. The unfortunate bit here is that I feel like the crypto-currency case is a really excellent stress test on these rules IMO. The thing that makes it an excellent stress test is probably also the thing that would make it easy to derail this thread unfortunately. I'm not sure how to resolve that.

Maybe it makes sense to add the word "narrow" (or similar) to this rule then?

> is false, inaccurate, or intentionally deceptive information and likely to
> adversely affect the public interest (including health, safety, election
> integrity, and civic participation)

Based on your responses here, it seems like you want to treat this rule narrowly and apply it specifically to a particular crate.

You could also include examples of how the rules are intended to apply, although that gets a little dicey.

@Jules-Bertholet
Contributor

> I think whether crates.io should generally forbid any cryptocurrency code or not would derail this RFC quite a bit and I'm not going to argue for either side here. It is currently not the intention of the crates.io team to generally forbid such code from being published to crates.io, unless there are some verifiably misleading claims published with it.

I don't think punting this decision is the right move. Rust tries to maintain a reputation for reliability and backward compatibility. "Even if you try to do everything right and follow all the rules in good faith, we might delete your code and break your builds at any time, because our standards are so vague that even we aren't sure how to apply them yet" is not compatible with that.

@Turbo87
Member Author

Turbo87 commented Jul 26, 2023

> Rust tries to maintain a reputation for reliability and backward compatibility. "Even if you try to do everything right and follow all the rules in good faith, we might delete your code and break your builds at any time, because our standards are so vague that even we aren't sure how to apply them yet" is not compatible with that.

just to be clear, the whole point of this RFC is to make the rules more clear than they were before. yes, they still leave a bit of vagueness here and there, but everything else would be impractical. claiming that "our standards are so vague that even we aren't sure how to apply them yet" is definitely the wrong characterization of the situation though.

@Jules-Bertholet
Contributor

The proposed policies would potentially disallow crates, like the bitcoin example, that the current policy ("we won’t attempt to get into policing what exactly makes a legitimate package. We will do what the law requires us to do, and address flagrant violations of the Rust Code of Conduct") pretty clearly allows. That is more than just a clarification, it adds new uncertainty that did not exist before.

@Jules-Bertholet
Contributor

https://www.law.cornell.edu/uscode/text/18/16

@trevyn

trevyn commented Sep 24, 2023

> I'm generally in favor, as long as it does not prevent us from immediately acting on actual name squatting "attacks". In the past couple of weeks we had several cases of people squatting hundreds of crate names. I don't think it would be practical if we had to contact these people first and give them a couple of weeks to respond before we could act on this.

Just a reminder that non-automated squatting is explicitly currently allowed by policy (otherwise this update wouldn't be necessary).

It would be odd not to respect in good faith a behavior that is currently allowed.

@alexpyattaev

Based on comments by @tux3 I would recommend that, where possible, wording is changed to emphasize that the intent has to be malicious, rather than the code. For example, code that is minified for compactness (or to speed up build for example) should be treated differently from code that is minified deliberately to hide an exploit. It is the ill intent that should be punished, not the fact of uploading a specific bit of content.

@samlh

samlh commented Sep 28, 2023

> Based on comments by @tux3 I would recommend that, where possible, wording is changed to emphasize that the intent has to be malicious, rather than the code. For example, code that is minified for compactness (or to speed up build for example) should be treated differently from code that is minified deliberately to hide an exploit. It is the ill intent that should be punished, not the fact of uploading a specific bit of content.

I agree on context being important - however, I'd personally consider obfuscation of, say, build.rs files to be almost always a risk for hidden malicious intent.

I would mildly prefer we start with a stance of disallowing all obfuscation, with carve-outs for specific cases where it is easy to tell that it isn't malicious.

However, I think any discouragement of obfuscation is better than nothing, even if it is weaker than what the policy says now.

@gilescope

This comment was marked as off-topic.

@RalfJung

This comment was marked as off-topic.

@Turbo87
Member Author

Turbo87 commented Oct 5, 2023

@trevyn the current crates.io policy refers to the Rust project code of conduct which says:

> Likewise any spamming, trolling, flaming, baiting or other attention-stealing behavior is not welcome.

and squatting large numbers of crate names without meaningful content or a plan to use them for anything could be seen as "spamming". in other words: "non-automated squatting is explicitly currently allowed by policy" is only correct to a degree that does not amount to "spamming".

@alexpyattaev @samlh again, pleeeease comment on or open new threads on the diff. otherwise this main PR thread will become an even bigger mess than it already is... 🙏

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Oct 27, 2023
@rfcbot
Collaborator

rfcbot commented Oct 27, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Nov 6, 2023
@rfcbot
Collaborator

rfcbot commented Nov 6, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@Turbo87 Turbo87 merged commit 48d7d6a into rust-lang:master Nov 7, 2023
Turbo87 added a commit to Turbo87/crates.io that referenced this pull request Nov 7, 2023
Turbo87 added a commit to Turbo87/crates.io that referenced this pull request Nov 10, 2023
Turbo87 added a commit to rust-lang/crates.io that referenced this pull request Nov 11, 2023
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-crates-io Relevant to the crates.io team, which will review and decide on the RFC. to-announce