Add Categorization! #473

Merged
merged 34 commits into from Dec 29, 2016

5 participants
@carols10cents
Member

carols10cents commented Nov 17, 2016

Hiiii! This is the first part of adding categories to crates.io!

What this PR does

  • On server startup, take the list of categories in src/categories.txt and add or remove categories from the database as needed to make the database match the categories. This way, the list of available categories can be changed via pull request to this repo.
  • A /categories page not yet linked from anywhere that displays all the categories in alphabetical order
  • A /categories/whatever page that will list all the crates in a category
  • If a crate has categories, they will be listed on that crate's page in the sidebar under Keywords
  • If categories are specified in the metadata uploaded with a new crate request from cargo, they will be added to the crate. (Implemented in rust-lang/cargo#3301)
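
As a rough illustration of the first bullet (not the PR's actual code), syncing the database to the file boils down to a set difference in each direction. The helper name `plan_sync` and the "Games" category in the usage note are made up for this sketch.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: given the categories listed in the file and the
/// categories currently stored in the database, return
/// (categories to insert, categories to delete).
pub fn plan_sync(desired: &[&str], existing: &[&str]) -> (Vec<String>, Vec<String>) {
    let desired: BTreeSet<&str> = desired.iter().copied().collect();
    let existing: BTreeSet<&str> = existing.iter().copied().collect();

    // In the file but not in the database: needs an INSERT.
    let to_insert = desired
        .difference(&existing)
        .map(|s| s.to_string())
        .collect();
    // In the database but no longer in the file: needs a DELETE.
    let to_delete = existing
        .difference(&desired)
        .map(|s| s.to_string())
        .collect();

    (to_insert, to_delete)
}
```

For example, with "Libraries" and "Development Tools" in the file and "Libraries" and "Games" in the database, the plan would be to insert "Development Tools" and delete "Games".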

Testing this PR

$ cargo build
$ cargo run --bin migrate
$ cargo run --bin sync-categories
$ cargo run --bin server

Start the frontend as well. (Note: I had to use ember server --proxy http://127.0.0.1:8888 instead of yarn run start:local with the latest version of yarn in order to actually use my local backend; I'm investigating why this is the case.)

  • You should be able to go to /categories and see 2 categories, "Development Tools" and "Libraries"
  • You should be able to click on those categories and see 0 crates. Each has subcategories, which should also have 0 crates.
  • Publish a crate to your local crates.io (I had to comment out code uploading to s3 since I didn't want to do that)
  • Using rust-lang/cargo#3301, publish to your local index
  • You should see your crate on that category's page
  • You should see the category on the crate's page (might have to do a shift-reload because of ember caching)

Deploying this PR

It should be totally fine to deploy this PR to crates.io. Nothing links to /categories, and cargo does not publish category metadata yet. Neither cargo nor crates.io will complain if a crate is not in any categories.

After this PR is merged

  • I will start a PR to categories.txt where we can bikeshed what we actually want the initial set of categories to be
  • I will open a PR to add a link to /categories somewhere
  • I will send some popular crates PRs to add categories

Member

carols10cents commented Nov 17, 2016

Ummmmm I have no idea why the tests are failing with "Once instance has previously been poisoned" :(

Member

carols10cents commented Nov 18, 2016

Aaaand I forgot to add support for subcategories, just realized that. I'd love any thoughts in the meantime, but I'm going to be adding a few commits :)

@carols10cents carols10cents referenced this pull request in rust-lang/cargo Nov 18, 2016

Merged

Upload categories specified in the manifest #3301

@alexcrichton

Awesome! I think the test error is fixed on master, so a rebase should pick that up. I'm also fine with the rollout strategy here, sounds good to me.

Some other thoughts of mine:

  • I wonder if there's a warning message we could provide back to Cargo for nonexistent categories? Either for a typo'd category or for just picking something that doesn't exist.
  • Right now we can't update a crate's metadata without publishing a new version; the addition of this feature is unfortunately likely to exacerbate this problem. Not something that needs to be fixed here per se, but in theory it's not too hard to add a cargo subcommand to update crate metadata...
+ ALTER TABLE crates_categories \
+ ADD CONSTRAINT fk_crates_categories_category_id \
+ FOREIGN KEY (category_id) REFERENCES categories (id) \
+ ON DELETE CASCADE", " \

@alexcrichton

alexcrichton Nov 21, 2016

Member

This just means if we delete a category it'll auto-delete everything from crates_categories, right?

@carols10cents

carols10cents Nov 21, 2016

Member

Correct, is that ok?

@alexcrichton

alexcrichton Nov 21, 2016

Member

Oh yeah fine by me, just not used to fancy sql features :)

src/bin/sync-categories.rs
+ let in_clause = categories.iter()
+ .map(|c| format!("'{}'", c))
+ .collect::<Vec<_>>()
+ .join(",");

@alexcrichton

alexcrichton Nov 21, 2016

Member

Perhaps just for the sake of hygiene, but could we use rust-postgres's ability for escaping here? (rather than doing so ourselves).

I do realize though that categories.txt isn't user input, I figure it's just a good idea to stick with built-in escaping.

@carols10cents

carols10cents Nov 21, 2016

Member

rust-postgres doesn't support this for the bulk import of multiple values that I wanted to do here, see sfackler/rust-postgres#218. I can change the WHERE NOT IN to use rust-postgres' escaping, though.

Another option is to make one insert for every category... this doesn't really NEED to be fast... wdyt?
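
For illustration only: one way to avoid hand-quoting values is to generate positional placeholders and pass the values separately as query parameters. The sketch below only builds the SQL text and does not call rust-postgres; the function names and the exact DELETE statement are assumptions, not code from this PR.

```rust
/// Build "$1,$2,...,$n", the positional placeholders PostgreSQL expects.
pub fn placeholders(n: usize) -> String {
    (1..=n)
        .map(|i| format!("${}", i))
        .collect::<Vec<_>>()
        .join(",")
}

/// Hypothetical query builder; the table and column names follow the
/// snippets quoted in this thread. `n` must be at least 1, because
/// `NOT IN ()` is not valid SQL.
pub fn delete_missing_sql(n: usize) -> String {
    format!(
        "DELETE FROM categories WHERE category NOT IN ({})",
        placeholders(n)
    )
}
```

The category names themselves would then be supplied as the statement's parameters, so the driver rather than `format!` is responsible for escaping them.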

@alexcrichton

alexcrichton Nov 21, 2016

Member

Eh I ended up seeing more of these throughout the codebase anyway (which were all reasonable), so it's fine to ignore this. A minor "concern" anyway.

src/category.rs
+ pub fn find_by_category(conn: &GenericConnection, name: &str)
+ -> CargoResult<Option<Category>> {
+ let stmt = try!(conn.prepare("SELECT * FROM categories \
+ WHERE category = $1"));

@alexcrichton

alexcrichton Nov 21, 2016

Member

Could you also add an index to the database for this category field? (unless I already missed it)

Also, perhaps the index and this could be based on lower(category) and lower($1) so we don't have to worry about case issues?

@carols10cents

carols10cents Nov 21, 2016

Member

Postgres automatically adds an index for UNIQUE fields (see https://www.postgresql.org/docs/current/static/ddl-constraints.html 5.3.3. Unique Constraints, "Adding a unique constraint will automatically create a unique B-tree index on the column or group of columns listed in the constraint.").

I can make it lower(category) though! I was thinking exact should mean exact, but it would probably be a bit friendlier to not care about case.

@alexcrichton

alexcrichton Nov 21, 2016

Member

Postgres foils me again! If we add slugs then I think this becomes a non-issue, we can just ensure that all slugs are always lowercase.

src/category.rs
+ pub fn encodable(self) -> EncodableCategory {
+ let Category { id: _, crates_cnt, category, created_at } = self;
+ EncodableCategory {
+ id: category.clone(),

@alexcrichton

alexcrichton Nov 21, 2016

Member

Could this be the lowercase version to be "pretty"?

Although now that I think more about this, maybe we should have two fields in the categories table. One for a slug (url friendly, this field) and another for the name? That way we can support categories with punctuation and spaces and such, all while keeping a nice url.

@carols10cents

carols10cents Nov 21, 2016

Member

Sure! Should categories.txt still list only the title-cased, punctuated version, with the slug then derived by lowercasing it, replacing all spaces with hyphens, and removing other punctuation? Or should categories.txt list the slug as well?

If people specify the slug as their crate's category value, should that be valid?
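
A minimal sketch of the derivation rule floated in this comment (lowercase, spaces to hyphens, other punctuation dropped). The function name `slugify` is hypothetical, and treating every non-alphanumeric character other than a space as "other punctuation" is an assumption.

```rust
/// Hypothetical slug derivation: spaces become hyphens, other
/// non-alphanumeric characters are dropped, and the result is lowercased.
pub fn slugify(name: &str) -> String {
    name.chars()
        .filter_map(|c| {
            if c == ' ' {
                Some('-') // spaces become hyphens
            } else if c.is_alphanumeric() {
                Some(c) // letters and digits are kept
            } else {
                None // everything else counts as punctuation and is removed
            }
        })
        .collect::<String>()
        .to_lowercase()
}
```

Under this rule "Development Tools" maps to "development-tools" and "Date and Time" maps to "date-and-time".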

@alexcrichton

alexcrichton Nov 21, 2016

Member

Let's go with a format like:

slug the name of this category is everything to the end of the line
slug2 another category

(or something like that)

Also hm that is a good point about what you specify in the manifest. I'm tempted to say slugs, not names? That way we can tweak names as we see fit (maybe even localize them one day).

@carols10cents

carols10cents Nov 22, 2016

Member

Ok, if that's the case, then as a crate author I'd want to be able to see what the valid slugs are. I'm going to make a page like pypi has that's a plain text list of the exact valid category specifiers. We can reference that in the warning message when someone specifies an invalid slug too!

src/category.rs
+
+impl Category {
+ pub fn find_by_category(conn: &GenericConnection, name: &str)
+ -> CargoResult<Option<Category>> {

@alexcrichton

alexcrichton Nov 21, 2016

Member

If you want to change this to just return CargoResult<Category> and internalize the chain_error(|| NotFound) that'd also be fine.

@carols10cents

carols10cents Nov 21, 2016

Member

I can do that!

src/category.rs
+ let new_categories = try!(
+ Category::find_all_by_category(conn, categories)
+ );
+ let new_categories_ids: HashSet<_> = new_categories.iter().map(|cat| {

@alexcrichton

alexcrichton Nov 21, 2016

Member

It may be a good idea here to put a limit on the number of new categories that can be added. Something high like 50 or 100 but just don't want to blow out the database or anything like that.

@carols10cents

carols10cents Nov 21, 2016

Member

Hm, is that not taken care of by Decode in upload.rs before getting here? Originally, we weren't going to add a limit on categories, but then I saw keywords are limited to 5, and that seemed reasonable, so I did that as well. I'm happy to add a limit here too, though, preventing database overload is good! Just want to check that adding logic here isn't being too paranoid given upload.rs...

@alexcrichton

alexcrichton Nov 21, 2016

Member

Oh yup that'd do it, missed that before I got here, sounds good to me!

@alexcrichton

alexcrichton Nov 21, 2016

Member

(that's the better location for a validation imo)
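
A small sketch of the kind of up-front check being discussed, assuming the limit of 5 that the thread says was copied from keywords. The constant and function names are hypothetical; in the PR the check is described as happening while decoding the upload in upload.rs.

```rust
/// Assumed limit, mirroring the keyword limit of 5 mentioned in the thread.
pub const MAX_CATEGORIES: usize = 5;

/// Hypothetical validation helper: reject a request that lists too many
/// categories before any database work is done.
pub fn check_category_count(categories: &[String]) -> Result<(), String> {
    if categories.len() > MAX_CATEGORIES {
        Err(format!(
            "expected at most {} categories per crate, got {}",
            MAX_CATEGORIES,
            categories.len()
        ))
    } else {
        Ok(())
    }
}
```

Validating at the request boundary keeps the later database code free of size checks.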

fn new_req_body(krate: Crate, version: &str, deps: Vec<u::CrateDependency>,
- kws: Vec<String>) -> Vec<u8> {
+ kws: Vec<String>, cats: Vec<String>) -> Vec<u8> {

@alexcrichton

alexcrichton Nov 21, 2016

Member

🐈 🐱

src/tests/category.rs
+
+ // Attempting to add one valid category and one invalid category
+ Category::update_crate(tx(&req), &krate, &["cat1".to_string(),
+ "catnope".to_string()]).unwrap();

@alexcrichton

alexcrichton Nov 21, 2016

Member

I was initially thinking we should error out on bad category names, but this makes so much more sense. We shouldn't prevent everyone from publishing just because we deleted a category...
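
To make the behavior in that test concrete, here is a hedged sketch of splitting a crate's requested categories into the ones that exist and the ones to ignore. `split_known` is a hypothetical name, and this is not how `Category::update_crate` is actually written.

```rust
use std::collections::HashSet;

/// Hypothetical helper: partition the requested categories into
/// (known, unknown). Known ones get attached to the crate; unknown ones
/// are skipped instead of failing the publish.
pub fn split_known(requested: &[String], valid: &HashSet<String>) -> (Vec<String>, Vec<String>) {
    requested
        .iter()
        .cloned()
        .partition(|category| valid.contains(category))
}
```

With only "cat1" defined, a request for "cat1" and "catnope" would attach "cat1" and leave "catnope" in the ignored list, which is also what a warning back to cargo could be built from.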

Member

carols10cents commented Nov 21, 2016

> I wonder if there's a warning message we could provide back to Cargo for nonexistent categories? Either for a typo'd category or for just picking something that doesn't exist.

Yeah, I've been struggling with this one, because there doesn't seem to be a mechanism for returning warnings to cargo from crates.io right now, just errors, unless I'm missing something. I was thinking of making a separate request that only fetches the list of categories and then cargo creates the warning about an invalid category name? Or do you think it should get rolled into the /new crate request response?

> Right now we can't update a crate's metadata without publishing a new version; the addition of this feature is unfortunately likely to exacerbate this problem. Not something that needs to be fixed here per se, but in theory it's not too hard to add a cargo subcommand to update crate metadata...

I'm up for working on that if you'd like! We're tracking under my estimations at this point, so we should have hours for it. Definitely a new set of PRs though.

Member

alexcrichton commented Nov 21, 2016

Oh I was thinking that whatever endpoint you use to upload errors would also transmit back warnings. We'd have to update Cargo yeah to process those warnings. I haven't looked at it in a while, but I'd hope at least that it'd be backcompat to add more fields to the json response without breaking existing Cargos...

Member

carols10cents commented Nov 21, 2016

> Oh I was thinking that whatever endpoint you use to upload errors would also transmit back warnings. We'd have to update Cargo yeah to process those warnings. I haven't looked at it in a while, but I'd hope at least that it'd be backcompat to add more fields to the json response without breaking existing Cargos...

Yeah, I think that should work, I was just hesitant to add new protocols without checking. But since you're into it, I'll give it a try! :)

Member

alexcrichton commented Nov 21, 2016

Oh and a thought about sync-categories, perhaps that could just get executed whenever the server starts? That way we don't need to maintain a separate binary and don't need to modify deployments.

Member

carols10cents commented Nov 22, 2016

Made some progress on this today, addressed all your comments I think, and added subcategories.

Buuuuut I broke category display on the crates page :( Something isn't hooked together the way ember wants it to be, and I'm not sure what I changed yet.

And I want to make the plain-text list of slugs still.

And my cargo PR needs to be updated to do something with the warnings this should return on unknown category names now.

Just wanted to push up my progress!

src/categories.rs
+ let categories = include_str!("./categories.txt");
+
+ let slug_categories: Vec<_> = categories.lines().map(|c| {
+ let mut parts = c.split(' ');

@alexcrichton

alexcrichton Nov 28, 2016

Member

This can be splitn so only two matches at most are returned perhaps?

@carols10cents

carols10cents Nov 28, 2016

Member

Ah, cool!
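
A sketch of what the `splitn` suggestion looks like for the "slug, then name to the end of the line" format proposed earlier in the thread. The function name `parse_line` and the choice to return `None` for malformed lines are assumptions; the example slugs in the usage note are illustrative.

```rust
/// Hypothetical parser for one line of categories.txt in the
/// "slug name of the category" format. `splitn(2, ' ')` stops after the
/// first space, so the name keeps any spaces it contains.
pub fn parse_line(line: &str) -> Option<(&str, &str)> {
    let mut parts = line.splitn(2, ' ');
    let slug = parts.next()?;
    let name = parts.next()?.trim();
    if slug.is_empty() || name.is_empty() {
        return None; // blank line, or a slug with no display name
    }
    Some((slug, name))
}
```

For instance, "development-tools Development Tools" would parse into the slug "development-tools" and the name "Development Tools", while a line holding only a slug would be rejected.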

+ let sql = format!("\
+ SELECT COUNT(*) \
+ FROM {} \
+ WHERE category NOT LIKE '%::%'",

@alexcrichton

alexcrichton Nov 28, 2016

Member

Are we thinking two levels of nesting max? I wonder if it's perhaps better to have a parent_id field in the table for a checked pointer?

@alexcrichton

alexcrichton Nov 28, 2016

Member

(also helps things be more explicit elsewhere)

@carols10cents

carols10cents Nov 28, 2016

Member

Nope, I was thinking an arbitrary number of levels. I started down the path of having a parent_id field, but the SQL for updating and querying got really complicated... I can pull that back out and show you if you'd like to take a look?

@alexcrichton

alexcrichton Nov 28, 2016

Member

Ah yeah if you have it on hand I wouldn't mind taking a peek.

@carols10cents

carols10cents Nov 29, 2016

Member

> Ah yeah if you have it on hand I wouldn't mind taking a peek.

Ugh, it looks like I didn't check it in anywhere, unfortunately.

The gist of it is that, for updating the category names, we wouldn't be able to do one bulk upsert; we'd have to select each category's parent category first (in order to be able to set the parent_id), which means making sure parents are in the table before children, etc.

And to get all crates in a particular category and any of its subcategories, instead of this query we'd have to do a recursive CTE like:


WITH RECURSIVE recursetree(id, parent_ids) AS (
    SELECT id, NULL::int[] || parent_id
    FROM categories 
    WHERE parent_id IS NULL
  UNION ALL
    SELECT 
    c.id, 
    rt.parent_ids || c.parent_id
    FROM categories c
    JOIN recursetree rt ON rt.id = c.parent_id
  )

SELECT * 
FROM crates
INNER JOIN crates_categories 
ON crates.id = crates_categories.crate_id 
WHERE crates_categories.category_id IN (
  SELECT id
  FROM recursetree
  WHERE parent_ids @> ARRAY[(
      SELECT id 
      FROM categories 
      WHERE slug = 'development-tools'
  )]
  UNION
  SELECT id
  FROM categories
  WHERE slug = 'development-tools'
);

There might be ways to make this a little better, but it makes me feel kind of icky :-/
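
For contrast, a sketch of why the `::`-delimited approach keeps things simple: membership in a category's subtree becomes a prefix test, and "top level" just means the name contains no `::` (the same idea as the `NOT LIKE '%::%'` filter above). The function names and example slugs are hypothetical.

```rust
/// True if `slug` is `root` itself or one of its descendants, assuming
/// nesting is written as "parent::child".
pub fn in_subtree(slug: &str, root: &str) -> bool {
    slug == root || (slug.starts_with(root) && slug[root.len()..].starts_with("::"))
}

/// True for categories with no parent, i.e. no "::" in the slug.
pub fn is_top_level(slug: &str) -> bool {
    !slug.contains("::")
}
```

Requiring the `::` right after the prefix prevents a sibling such as "libraries-extra" from being counted under "libraries".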

Member

carols10cents commented Nov 29, 2016

Hmmm, interesting.... travis can't seem to fetch https://github.com/rust-lang/crates.io-index.... neat.

Member

carols10cents commented Nov 29, 2016

Ok, I gave up trying to get around Ember to render plain text like pypi and I just made a page to list all the valid category slugs within the usual crates.io template:

[Screenshot: slugs-list-page]

I'm going to add this URL to the warning on the cargo side when a crate has an invalid category slug.

Some more screenshots:

The top-level categories list page:

[Screenshot: categories-page]

Clicking into the Libraries category, showing the subcategories and all crates in that category and its subcategories:

[Screenshot: libraries-category-page]

Clicking into the Libraries::Date and Time subcategory:

[Screenshot: date-and-time-subcategories-page]

Categories on a crate's page:

[Screenshot: categories-on-crate-page]

Member

carols10cents commented Nov 29, 2016

Ok! I changed my mind about how to communicate about unrecognized category slugs, but that and the cargo side in rust-lang/cargo#3301 are done now. Both sides still happily tolerate the other side not being deployed yet.

So yeah, I think this is done, unless you think we should change to having parent_ids or have any other changes you'd like me to make.

Member

alexcrichton commented Dec 1, 2016

Ok I think it makes sense to me to use :: instead of parent_id from what you're saying, so sounds good to me. Want to update the Travis config to use stable Rust so we can see if this goes green?

Member

carols10cents commented Dec 5, 2016

IT'S PASSING IT'S PASSING!!!!!

Member

carols10cents commented Dec 7, 2016

I decided to get the PR for the available categories started: #488

est31 commented Dec 8, 2016

👎

I don't like categories. If you publish a crate that's outside of the list of already available categories, you will be forgotten. This leads to a totally pointless bikeshed discussion which will never end. Keywords are better, you can choose them yourself. Text search is better, it also finds the crates which don't have it in their keywords but in the description.

How do categories improve upon keywords? Yes, now there are not "HTTP-Server" and "HTTPServer" keywords, so it's more unified, but what overall advantage is there?

est31 commented Dec 8, 2016

Mhh, maybe there is some advantage in browsing. One idea for improvement: why not implement categories as groups of keywords? So e.g. a crate is in the Cryptography category if it has one of the (crypto, cryptography, encryption) keywords. This would remove the need to retrofit each crate to the categories system and make publishing new crates simpler.

Contributor

Nemo157 commented Dec 8, 2016

> why not implement categories as groups of keywords?

That was exactly what I was thinking while reading through this. It would definitely help with getting existing crates with good metadata into this system without forcing a new version to be published.

Member

carols10cents commented Dec 8, 2016

> I don't like categories. If you publish a crate that's outside the list of already available categories, you will be forgotten. This leads to a totally pointless bikeshed discussion which will never end. Keywords are better: you can choose them yourself. Text search is better: it also finds the crates which don't have it in their keywords but in their description.
>
> How do categories improve upon keywords? Yes, there are no longer separate "HTTP-Server" and "HTTPServer" keywords, so it's more unified, but what overall advantage is there?

Categories are not meant to replace keywords, they are meant to augment them. Crate authors and crate users will be free to use whichever they find most useful! :)

Member

carols10cents commented Dec 8, 2016

> Mhh, maybe there is some advantage in browsing. One idea for improvement: why not implement categories as groups of keywords? So e.g. a crate is in the Cryptography category if it has one of the (crypto, cryptography, encryption) keywords. This would remove the need to retrofit each crate to the categories system and make publishing new crates simpler.

> That was exactly what I was thinking while reading through this. It would definitely help with getting existing crates with good metadata into this system without forcing a new version to be published.

One of the problems with this approach is that there are keywords that should be split into two categories instead-- for instance, the cli keyword currently includes argonaut, "A simple argument parser" to help you build CLIs, and betsey, "An AppVeyor cli written in Rust", which is an application with a CLI for use with a particular tool, appveyor. IMO these should end up in Libraries::Command-line interface and Applications::System tools, respectively.
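
To make the `cli` problem concrete, here is a small sketch of the "categories as groups of keywords" idea. The mapping table and function names are hypothetical; only the keyword and category names come from the comments above. Because a keyword can only map to a fixed set of categories, every crate tagged `cli` (argonaut and betsey alike) receives both categories, and nothing in the keyword says which one fits.

```rust
use std::collections::{BTreeSet, HashMap};

/// Maps each keyword to the categories it would imply.
/// Illustrative table only; not data from crates.io.
fn keyword_groups() -> HashMap<&'static str, Vec<&'static str>> {
    let mut groups = HashMap::new();
    for kw in ["crypto", "cryptography", "encryption"].iter() {
        groups.insert(*kw, vec!["Cryptography"]);
    }
    // `cli` covers both libraries for building CLIs and applications
    // that happen to have a CLI, so it has to map to both categories.
    groups.insert(
        "cli",
        vec![
            "Libraries::Command-line interface",
            "Applications::System tools",
        ],
    );
    groups
}

/// Derives a crate's categories purely from its keywords.
/// Keywords with no entry in the table contribute nothing.
fn categories_for(keywords: &[&str]) -> BTreeSet<&'static str> {
    let groups = keyword_groups();
    let categories: BTreeSet<&'static str> = keywords
        .iter()
        .filter_map(|kw| groups.get(*kw))
        .flatten()
        .copied()
        .collect();
    categories
}
```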

Member

carols10cents commented Dec 16, 2016

@alexcrichton We're feeling good about this PR-- categories.txt is now categories.toml, and we've tested out the following locally:

  • Migrating to where crates.io currently is, to simulate deploying this PR to production
  • Running the migrations added in this pull request
  • Starting the server
  • Navigating through categories without any crates in them
  • Listing /category_slugs
  • Sorting top-level categories alphabetically (default) and by number of crates

Using this PR and my cargo PR, I tested:

  • Publishing a crate to a top-level category
  • Publishing a crate to a mid-level category
  • Publishing a crate to a leaf category
  • Attempting to publish to a nonexistent category
  • Viewing a crate page and seeing/navigating to its categories
  • Removing a category from a crate
  • Sorting crates within a category by downloads (default) and alphabetically

I also tested making these changes to categories.toml and restarting the server:

  • Removing a category that a crate was in; the crate was disassociated from that category
  • Adding a new category
  • Editing a name and description; it did not affect crate associations
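
The three categories.toml scenarios above amount to a set comparison between the file and the database. A rough sketch, assuming categories are matched by slug (the `SyncPlan` type and `plan_sync` function are hypothetical, and this is not the PR's actual sync code):

```rust
use std::collections::BTreeSet;

/// What a startup sync would need to do to make the database match the file.
#[derive(Debug, PartialEq)]
struct SyncPlan {
    to_insert: Vec<String>,
    to_delete: Vec<String>,
}

/// Compares the slugs declared in the file with the slugs in the database.
/// Slugs present on both sides appear in neither list.
fn plan_sync(in_file: &[&str], in_db: &[&str]) -> SyncPlan {
    let file: BTreeSet<&str> = in_file.iter().copied().collect();
    let db: BTreeSet<&str> = in_db.iter().copied().collect();
    SyncPlan {
        // In the file but not yet in the database: new categories.
        to_insert: file.difference(&db).map(|s| s.to_string()).collect(),
        // In the database but no longer in the file: removed categories.
        to_delete: db.difference(&file).map(|s| s.to_string()).collect(),
    }
}
```

Slugs present on both sides are neither inserted nor deleted, which is consistent with a name or description edit leaving crate associations untouched.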

I was also seeing a bunch of postgres deadlocks, so I changed the travis config to use 1 test thread and that seems to be helping.....

PR #488 has been updated to have descriptions, so once everyone feels good about those, I think we can merge these two PRs and rust-lang/cargo#3301!

Member

alexcrichton commented Dec 17, 2016

@carols10cents that all sounds great to me! I'm a little worried about the postgres deadlocks though. I can't quite seem to find a reference to that though, was that removed at some point?

Member

carols10cents commented Dec 17, 2016

> @carols10cents that all sounds great to me! I'm a little worried about the postgres deadlocks though. I can't quite seem to find a reference to that though, was that removed at some point?

Reference to what, exactly? This is the change I made to combat it, here are a few builds that failed because of deadlocks:

I haven't seen this happening at all locally. I've seen this happen with tests run on rails apps with postgres and I've never had much luck debugging them :-/ Decreasing the number of queries trying to happen all at the same time seems to help... I'm going to try to get travis to dump some logs or something.

Member

alexcrichton commented Dec 19, 2016

Oh sorry I didn't even realize we had a --test-threads argument, I was grepping for RUST_THREADS_* etc. I'll dig into those logs.

Member

alexcrichton commented Dec 19, 2016

@carols10cents ok so the chapter on postgres deadlocks is helpful here, but this is definitely something we don't want to paper over. If we do it sounds like we'll definitely hit it in deploying.

Looks like there's just two concurrent transactions that want locks in different orders here. My guess is that the locks are table/row level based on UPDATE and INSERT statements rather than explicit locks (which IIRC we don't use).

It looks like it's always good_categories that's failing, which I guess locks the crates table for an insertion followed by a lock of the categories table to insert a relation. I guess the deadlock happens when something else locks the categories table and then tries to lock the crates table?

I can't say I've ever encountered an error like this before and am currently not aware of conventional fixes (if any), but does that ring any bells? Perhaps we can reorder some statements somewhere to resolve this?
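
The "locks in different orders" diagnosis can be illustrated with an in-process analogy. This is only a sketch: two `Mutex` values stand in for whatever Postgres locks on the crates and categories side, and it does not reproduce the real transactions. When every worker acquires the locks in the same order, no worker can hold one lock while waiting on a worker that holds the other.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs two workers that each need both locks, always taken in the
/// same order, and returns the final counter values.
fn run_with_consistent_order() -> (u32, u32) {
    let crates = Arc::new(Mutex::new(0u32));
    let categories = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let crates = Arc::clone(&crates);
            let categories = Arc::clone(&categories);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // Every worker takes `crates` first, then `categories`.
                    // If one worker used the opposite order, each could end
                    // up holding one lock while waiting for the other.
                    let mut c = crates.lock().unwrap();
                    let mut k = categories.lock().unwrap();
                    *c += 1;
                    *k += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = (*crates.lock().unwrap(), *categories.lock().unwrap());
    result
}
```

The database equivalent of this idea is the "reorder some statements" suggestion: make every transaction touch the same tables or rows in the same sequence.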

Member

carols10cents commented Dec 22, 2016

I haven't had success reproducing this on my mac, and I downloaded a travis image and was unable to reproduce the deadlock there, either. I have turned on travis on the integer32llc organization and I HAVE reproduced the deadlock on a different branch that's not a PR, so I'm going to experiment over there in order to avoid clogging/waiting on the rust-lang travis queue :) 🤞

carols10cents and others added some commits Nov 17, 2016

Use a pretty slug for categories in URLs

Categories are now specified by slug in Cargo.toml. This will allow
crates.io to change the display text of a category but still have crates
in those categories.

Characters allowed in slugs are from RFC 3986, those that are valid in
path segments (pchar) https://tools.ietf.org/html/rfc3986#page-22

Change the header text on a category page

Looking at some of the categories I have locally, I think, for example,
"Command-line argument parsing Crates" is clearer than "All Crates for
category 'Command-line argument parsing'"

Add a page listing all valid category slugs

To direct people to when they have specified an invalid slug.

JSON containing all the slugs is available at
/api/v1/category_slugs, but visiting that in a browser doesn't work.

Make warnings about invalid crate names be JSON instead of text

And cargo will handle making nice English messages out of them.

Add descriptions to categories

Have to switch from a nice batch insert to running a query for each
category so that we can use apostrophes in the descriptions and have
the string escaped for SQL.

Add a heading for Crates on a category page

To better distinguish subcategories and crates. This makes "crates" in
the h1 redundant, especially when there *aren't* subcategories.

Sort crates within a category by downloads by default

There will be an RFC soon about whether this is the best ordering or
not.

Sum crate count in all subcategories in a better way

And make the top-level query that does this consistent with
subcategory queries.

Use a different crate name in a test to prevent deadlocks

This test does a lot of different manipulations of categories and crate
categories and it was using a crate named foo. The good_categories test
also used a crate named foo, and these two tests were causing a postgres
deadlock.

I was able to cause deadlocks more often by duplicating the update_crate
test and the good_categories test:

https://travis-ci.org/integer32llc/crates.io/builds/187302718

Making this change and running the duplicated tests resulted in 0
deadlocks:

https://travis-ci.org/integer32llc/crates.io/builds/187306433

This is unlikely to happen in production; requests get a database
connection that gets closed when the request finishes, and the publish
request only modifies the categories once, not as much as the
update_crate test is. It seems unlikely that two people would publish
the same crate at exactly the same time.
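
The slug rule from the "Use a pretty slug for categories in URLs" commit can be sketched as a character check against RFC 3986's `pchar` set. This is an illustration under assumptions: the function names are hypothetical, percent-encoded triplets are deliberately not handled, and rejecting the empty string is a choice made here, not something the commit message states.

```rust
/// Characters RFC 3986 allows unescaped in a path segment (`pchar`),
/// excluding percent-encoded triplets, which this sketch does not handle.
fn is_pchar(c: char) -> bool {
    match c {
        // unreserved
        'a'..='z' | 'A'..='Z' | '0'..='9' | '-' | '.' | '_' | '~' => true,
        // sub-delims
        '!' | '$' | '&' | '\'' | '(' | ')' | '*' | '+' | ',' | ';' | '=' => true,
        // additionally allowed in path segments
        ':' | '@' => true,
        _ => false,
    }
}

/// Returns true if every character of `slug` is a `pchar`.
/// Treating the empty string as invalid is an assumption of this sketch.
fn is_valid_slug(slug: &str) -> bool {
    !slug.is_empty() && slug.chars().all(is_pchar)
}
```
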

Member

carols10cents commented Dec 28, 2016

I THINK I HAVE VANQUISHED THE DEADLOCK!!!!

The categories::update_crate test does a lot of different manipulations of categories and crate categories and it was using a crate named foo. The good_categories test also used a crate named foo, and these two tests were causing a postgres deadlock.

I was able to cause deadlocks more often by duplicating the update_crate test and the good_categories test:

https://travis-ci.org/integer32llc/crates.io/builds/187302718

Making this change and running the duplicated tests resulted in 0 deadlocks:

https://travis-ci.org/integer32llc/crates.io/builds/187306433

This long-running editing of a crate's categories is unlikely to happen in production; requests get a database connection that gets closed when the request finishes, and the publish request only modifies the categories once, not as much as the update_crate test is. It seems unlikely that two people would publish the same crate at exactly the same time.
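
One way to keep test crate names from colliding without picking each name by hand is a process-wide counter. This is a hypothetical helper (`unique_crate_name` is not part of this PR, which as far as the thread says simply changed the crate name used in the affected test):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Shared counter; every call to `unique_crate_name` takes the next value.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Returns a crate name that no other caller in this process has received,
/// even when tests run on several threads at once.
fn unique_crate_name(prefix: &str) -> String {
    let id = NEXT_ID.fetch_add(1, Ordering::SeqCst);
    format!("{}_{}", prefix, id)
}
```

Two tests that both ask for a `foo` crate would then operate on different rows, which removes the shared state that the two deadlocking tests were contending over.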

Member

alexcrichton commented Dec 29, 2016

Ok that sounds good to me. Want to make sure crates have unique names and I'll merge?

Member

carols10cents commented Dec 29, 2016

@alexcrichton done! all test crates now have a unique name :)

Member

alexcrichton commented Dec 29, 2016

🎊

@alexcrichton alexcrichton merged commit 710f208 into rust-lang:master Dec 29, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

Member

alexcrichton commented Dec 29, 2016

@carols10cents hm it looks like cargo test locally is failing, maybe due to a recent push to master? Mind taking a peek at that?

Member

carols10cents commented Dec 29, 2016

> @carols10cents hm it looks like cargo test locally is failing, maybe due to a recent push to master? Mind taking a peek at that?

On it!

Member

carols10cents commented Dec 29, 2016

When can we have bors on this repo? ;)

Member

carols10cents commented Dec 29, 2016

@alexcrichton Hm, cargo test on master isn't failing for me locally, nor is it failing on travis. Did you happen to try out a previous version of this branch? I did change some of the migrations along the way, so maybe try dropping and recreating your cargo_registry_test database and see if that fixes it?

Member

alexcrichton commented Dec 29, 2016

Oh looks like I was missing the S3_BUCKET business, my bad!

bors added a commit to rust-lang/cargo that referenced this pull request Jan 17, 2017

Auto merge of #3301 - integer32llc:categories, r=alexcrichton
Upload categories specified in the manifest

This adds support for uploading categories to crates.io, if they are specified in the manifest.

This goes with rust-lang/crates.io#473. It should be fine to merge this PR either before or after that one; crates.io master doesn't care if the categories are in the metadata or not. With that PR, I was able to use this patch with cargo to add categories to a crate!

@shepmaster shepmaster deleted the integer32llc:categorization branch Apr 13, 2017

@nasa42 nasa42 referenced this pull request in rust-unofficial/awesome-rust May 4, 2017

Closed

Transfer #289

@wking wking referenced this pull request in rust-lang/cargo Jan 9, 2018

Open

SPDX dual-license inconsistency #2039
