New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record who published a version of a crate #1478

Open
kornelski opened this Issue Aug 20, 2018 · 12 comments

Comments

Projects
None yet
4 participants
@kornelski
Copy link
Contributor

kornelski commented Aug 20, 2018

Currently neither the crates index nor the carates io api attribute published versions to users who published them.

When a crate has multiple owners it's not possible to establish who published what.

This attribution is needed for:

  • Auditing and monitoring. In case something unexpected gets published it's important to know whose credentials were stolen or misused.

  • Evaluation. On crates.rs I'm experimenting with several ideas - detecting abandoned crates (inactive authors) and finding trusted users based on relationships between crates, but the data is incomplete without knowing who exactly published each crate.

Ideally I'd like to see username GitHub User ID for every published version. It could be as part of the crates index. If bloating of the index should be avoided, it'd be ok as crates.io API call as well.


Implementation instructions

(added by @carols10cents)

Backend changes:

@jtgeibel

This comment has been minimized.

Copy link
Member

jtgeibel commented Sep 12, 2018

I agree that we should track this for auditing purposes. I think initially we will want to track this only in the database, as GitHub usernames can change and the primary key in our users table is opaque without hitting the API anyway.

@kornelski

This comment has been minimized.

Copy link
Contributor

kornelski commented Sep 13, 2018

Ah indeed, GitHub user ID would be more reliable.

@jtgeibel

This comment has been minimized.

Copy link
Member

jtgeibel commented Oct 30, 2018

In another thread, @carols10cents said:

Right now we aren't storing in the database which of a crate's owners published a particular version, which could be useful in situations such as a malicious crate version being published, in order to determine whose credentials had been compromised.

I would like to see this added soon as well. Any thoughts on tracking this per auth token instead of user?

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Oct 30, 2018

Any thoughts on tracking this per auth token instead of user?

Yeah that would be interesting to know as well, in the case of a token breach. I think we delete auth tokens from the database (as opposed to marking them as deleted) if someone deletes one of their tokens, so I think we should store user id + auth token value/id.

@carols10cents carols10cents changed the title Record username of user who published the version of the crate Record who published the version of the crate Oct 31, 2018

@carols10cents carols10cents changed the title Record who published the version of the crate Record who published a version of a crate Oct 31, 2018

@carols10cents carols10cents changed the title Record who published a version of a crate Record who published or yanked a version of a crate Oct 31, 2018

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Oct 31, 2018

Oooh, we should record who yanked a crate too.

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Oct 31, 2018

Urrgghh I was thinking this would just be new columns on the versions table, but what if two owners have a yank/unyank battle? We should record all of those actions. So now I'm thinking a new table called version_owner_history or version_owner_actions or similar where we insert a new (version_id, owner_id, owner_api_token, action_type, datetime) record for each action (publish, yank, unyank).

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Oct 31, 2018

@jtgeibel I'm trying to make this into a quest issue for a new contributor to take on... I'm starting to worry it might be a bit too complicated 😆 Could you check the implementation instructions I added to the issue description? Am I missing anything you'd expect to see in an implementation? Anything that doesn't sound like it would work out?

@jtgeibel

This comment has been minimized.

Copy link
Member

jtgeibel commented Oct 31, 2018

what if two owners have a yank/unyank battle?

I'd be more afraid that a compromised token would try to remove other owners. We don't keep any history there either ☹️

For publish/yank/unyank, perhaps we could add some user metadata to the commit message so it is clearly associated with a specific action in the index. I'm cautious of adding data directly in the index, but a commit message feels far less committal. (Although beyond supporting cargo I don't think we make any commitments about the format of the index.)

Here are some thoughts on the tradeoffs for various pieces of data:

  • Usernames - they can change over time
  • GitHub IDs - in the future we might want to support other auth providers
  • Our database user ID - not very useful externally unless we provide an API to get a user given an id number
@jtgeibel

This comment has been minimized.

Copy link
Member

jtgeibel commented Oct 31, 2018

@carols10cents, I just saw everything else you posted. I'll take a look at it tomorrow evening.

@jtgeibel

This comment has been minimized.

Copy link
Member

jtgeibel commented Oct 31, 2018

@carols10cents I think your instructions are fine and I don't see anything that is missing. Just a few minor comments below. I do agree that this is probably a bit much for a new contributor to take on, but still possible.

As a first step, we could descope this a bit and just add a published_by column to versions and track just that initially. I think that is the most important piece at the moment and should be approachable for a new contributor. I also think this is the only piece of information that we need to expose in the API and user interface. I see the full action log as a resource for assessing the damage done by a compromised account/token, where we care more about all actions from an account over a period of time, and not individual actions associated with a specific version.

A few comments:

  • owner_token VARCHAR NOT NULL - rather than storing a copy of the token, we could keep tokens around (with a revoked_on timestamp) and make this a foreign key. We don't have uniqueness constraints on this table (other than the value of the token itself) so a user is already free to duplicate token names and revoked tokens wouldn't restrict them from reusing the name. We also probably need to make this column nullable, since we currently allow (I think unintentionally) a user to do these actions with a cookie and in the future we may intentionally expose some of these actions to the frontend UI (although I doubt yank/unyank specifically).
  • action VARCHAR NOT NULL - we could make this an enum in the code and an integer in the database, like OwnerKind currently is.

I think the scheme you've outlined also maps equally well to tracking changes in crate ownership. The only differences I see would be that the table for that would track the crate_id instead of version_id, and the user/team affected by the addition/removal of ownership.

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Nov 1, 2018

As a first step, we could descope this a bit and just add a published_by column to versions and track just that initially.

You're totally right, I was letting scope creep get the best of me. I'm going to make this issue just for published_by and move everything else to new issues :)

@carols10cents carols10cents changed the title Record who published or yanked a version of a crate Record who published a version of a crate Nov 1, 2018

@carols10cents carols10cents referenced this issue Nov 1, 2018

Open

Audit trail for more actions #1548

0 of 9 tasks complete
@Andy-Bell

This comment has been minimized.

Copy link
Contributor

Andy-Bell commented Nov 2, 2018

I would be interested in picking this up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment