# General outline of proposed updates to publishing logic #1522
Just for posterity: max crate size will eventually need to be checked asynchronously, so this should probably be enforced primarily in cargo.
I've been thinking of how this task could be split up into smaller tasks to make it easier to review and to lower the risk of breaking something (or at least making smaller changes so that we can tell which change broke something). I tried putting the current and proposed lists as revisions of a gist to enable viewing them in diff format; it helps a little, I think: https://gist.github.com/carols10cents/4f32c43855fdfd77a8a5b48f53ab06b5/revisions#diff-b160a194db1110d5914710115e64d429

I also think there's opportunity to refactor the …

So I'd like to see these smaller changes to the code in the publish controller made in separate PRs, in approximately this order (multiple items in this list might be accomplished in one PR, depending on how much reordering is or is not necessary):

* Verify Headers
* Verify Request Body
* With database, outside of main transaction
* Start writing within the transaction
@jtgeibel can you clarify a bit more what you mean by this in the proposed section:
As far as I can tell, this isn't a check we're doing directly right now. Are you thinking this is a quick way we can reject invalid requests rather than waiting until `verify_tarball` gets the data? Or would this new check prevent problems we could potentially be open to right now?
Do we really need a separate setting for this? Isn't max content length an effective maximum on metadata, since you could theoretically have a tarball that's 0 bytes and metadata that takes up the rest of the space? Just thinking in terms of what we would set this configuration option to if we had it! I suppose that gets into how we want to resolve this comment, and how we want to communicate this limit to someone whose crate is getting rejected who has a …
Since the original issue was created, we've added the requirement that the publishing user have a verified email address. It's currently pretty early in the process, and it doesn't depend on the crate content at all, but it does need a database connection. If I'm following the logic you've laid out here, I think it should get inserted here? Do you agree @jtgeibel ?

> ### With database, outside of main transaction
>
> * Obtain database connection
> * Ensure user has a verified email address *(the proposed new step)*
> * Ensure name is not reserved

(Also, the reason I'm all of a sudden all over this issue is that I think this can be split into a bunch of smaller contributions we can get new people to swarm on, and it'll also help us enable `--dry-run`.)
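For concreteness, here's a hypothetical Diesel 1.x-style sketch of what that check might look like. The `emails` table, its `user_id` and `verified` columns, and the function name are all assumptions made for this example, not the actual crates.io schema:

```rust
use diesel::prelude::*;

// Hypothetical: returns whether the user has a verified email address.
// A missing email row is treated the same as an unverified one.
fn has_verified_email(conn: &PgConnection, user_id: i32) -> QueryResult<bool> {
    use crate::schema::emails;

    emails::table
        .filter(emails::user_id.eq(user_id))
        .select(emails::verified)
        .first::<bool>(conn)
        .optional() // no row at all -> Ok(None)
        .map(|verified| verified.unwrap_or(false))
}
```

The publish controller could then reject the request with a user-facing error before opening the main transaction.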
Thanks for taking a fresh look at this @carols10cents! I definitely agree with the approach of taking small, reviewable steps towards this general outline.
👍 This is what I have in mind as well. In general, I think we should review logic that is currently in models like …

I've gone through the code again and updated the "Currently" section above, splitting out the background job work and adding notes showing where other steps are currently located. I've also added several new steps (to both sections).
No, we don't do that currently. I was considering it mainly as a quick sanity check on the request that can be done very early in the request processing. We might want to land it in an atomic deploy, if we decide to add such a check.
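To illustrate the kind of early sanity check being discussed (not a commitment to an implementation): the publish request body begins with a little-endian `u32` giving the length of the JSON metadata, so a check like the following sketch could reject oversized metadata before buffering anything else. The constant and function name are invented for this example:

```rust
use std::io::{self, Read};

// Placeholder limit; see the discussion below about what to set it to.
const MAX_METADATA_LENGTH: u32 = 128 * 1024;

// Reads the 4-byte metadata length prefix and rejects the request early
// if the declared length exceeds the configured maximum.
fn read_metadata_length<R: Read>(body: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    body.read_exact(&mut buf)?;
    let len = u32::from_le_bytes(buf);
    if len > MAX_METADATA_LENGTH {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("metadata length {} exceeds the maximum allowed", len),
        ));
    }
    Ok(len)
}
```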
I think it makes sense to add a separate config item here, but I don't know what we should set it to. Maybe we should add some logging to get a feel for typical metadata sizes. The publish endpoint is fairly attractive from a DoS perspective, and the metadata could kick off a lot of database activity. I could potentially see a scenario where it would be nice to drop this limit quickly via an environment variable, although I haven't put much serious thought into what such an attack might look like or whether this would be an effective mitigation.
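As a sketch of the "drop the limit quickly" idea, the limit could be read from an environment variable with a hard-coded fallback. The variable name and default below are made up for illustration:

```rust
// Hypothetical: an operator could lower this limit without a deploy by
// changing the environment variable on the running app.
fn max_metadata_length() -> u32 {
    std::env::var("MAX_PUBLISH_METADATA_LENGTH")
        .ok()
        .and_then(|value| value.parse().ok())
        .unwrap_or(128 * 1024) // fallback default, chosen arbitrarily
}
```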
Both of these sound great to me!
Thanks!!! I've updated the diff view and I'm going to update the checklist next :)
I decided to split off the max metadata length check to a separate issue: #1896
> ### Verify Request Body
> ...
> * Upload crate and rendered README (if not `--dry-run`)
>
> ### With database, outside of main transaction
> ...
>
> ### Start writing within the transaction
> ...

Hm, shouldn't we wait until we validate everything to upload to S3? If the crate doesn't make it into the database or the index, then I don't *think* there's a way for cargo to download the `.crate` thinking it's legitimate, but I wonder about invalid stuff (say, by someone who doesn't own a crate) being uploaded and accessible from the direct static.crates.io URL. Am I missing something?
I agree. Those uploads are now done in the background jobs. I think I forgot to remove it in my last edit (and I agree that was probably the wrong place in my original plan anyway).
The README rendering and uploading is done in a background job, but I don't think uploading the `.crate` file is yet?
You're right. I've updated the issue above to clarify. (I thought I had already done so, but maybe I didn't hit save on that edit.) In the proposed section above, I've added the crate upload step to be the last step before enqueueing the background jobs.
since there hasn't been any activity here for 2.5 years, I guess we can close this issue. feel free to reopen if it becomes relevant again :)
I've recently gone through our publishing logic and would like to document my findings and propose some changes. This was originally raised in the context of a `cargo publish --dry-run` option in #1517, but I think some refactoring here would help with proposed enhancements such as background jobs (#1466) and direct client uploads to S3.

## Currently (edit: updated 2019-11-13)

Our publishing logic currently follows this sequence:
* `NewCrate::validate`
* `NewCrate::ensure_name_not_reserved`
* `NewCrate::save_new_crate`
* `NewCrate::create_or_update`
* `NewCrate::create_or_update`
* `NewVersion::validate_license` (via `NewVersion::new`)
* `NewVersion::save`
* `NewVersion::save`
* `models::dependency::add_dependencies`
* `Keyword::update_crate`
* `Category::update_crate`
* `Badge::update_crate`
* `uploaders::upload_crate`
* Background job: Render and upload README (defined in `render::render_and_upload_readme`)
* Background job: Update Index (defined in `git::add_crate`)
## Proposed

### Notes

...

### Verify Headers

...

### Verify Request Body

...

### With database, outside of main transaction

...

### Start writing within the transaction

...
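Putting the proposal together, here is a compilable but entirely illustrative skeleton of the ordering described above. Every type and helper in it is a stub invented for this sketch; none of these names are the actual crates.io API:

```rust
// All types and helpers below are placeholders for this sketch only.
type DbConn = (); // stand-in for the real database connection type
type Error = String;

struct Metadata { name: String }
struct Tarball(Vec<u8>);

// --- Verify headers: cheap rejections before reading the body ---
fn verify_content_length(len: u64, max: u64) -> Result<(), Error> {
    if len > max { return Err("payload too large".into()); }
    Ok(())
}

// --- Verify request body: no database access needed yet ---
fn parse_publish_body(_body: &[u8]) -> Result<(Metadata, Tarball), Error> {
    Ok((Metadata { name: "demo".into() }, Tarball(Vec::new()))) // stub
}
fn verify_tarball(_t: &Tarball) -> Result<(), Error> { Ok(()) } // stub

// --- With database, outside of main transaction ---
fn ensure_verified_email(_c: &DbConn) -> Result<(), Error> { Ok(()) } // stub
fn ensure_name_not_reserved(_c: &DbConn, _name: &str) -> Result<(), Error> { Ok(()) }

// --- Writes, run inside the main transaction in the real code ---
fn save_crate_and_version(_c: &DbConn, _m: &Metadata) -> Result<(), Error> { Ok(()) }
fn upload_crate(_t: &Tarball) -> Result<(), Error> { Ok(()) } // last, before jobs
fn enqueue_background_jobs(_c: &DbConn) -> Result<(), Error> { Ok(()) } // README + index

fn publish(conn: &DbConn, content_length: u64, body: &[u8]) -> Result<(), Error> {
    verify_content_length(content_length, 10 * 1024 * 1024)?;

    let (metadata, tarball) = parse_publish_body(body)?;
    verify_tarball(&tarball)?;

    ensure_verified_email(conn)?;
    ensure_name_not_reserved(conn, &metadata.name)?;

    // In the real controller, everything from here would run inside one
    // database transaction, with the crate upload as the final step
    // before the background jobs are enqueued.
    save_crate_and_version(conn, &metadata)?;
    upload_crate(&tarball)?;
    enqueue_background_jobs(conn)
}

fn main() {
    // Example call with an empty body; the stubs make this succeed.
    publish(&(), 0, &[]).unwrap();
}
```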