-
Notifications
You must be signed in to change notification settings - Fork 210
Detect changes using crates.io index repository #90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I had the feeling that this is required for it to build.
This assures that no matter how often we poll, we will always see all changes since the last time we fetched. Fixes #89
let index = try!(Index::from_path_or_cloned(&self.options.crates_io_index_path)); | ||
let changes = try!(index.fetch_changes()); | ||
for krate in changes.iter().filter(|k| k.kind != ChangeKind::Yanked) { | ||
if self.db_cache.contains(&format!("{}-{}", krate.name, krate.version)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure if that is still needed, assuming that the crates.io index will never repeat itself. It might be that the previous implementation needed it to prevent duplicates or unnecessary work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes checking db_cache is no longer necessary and loading db_cache is actually really expensive. Actually that was the primary reason I always wanted to get new crates from crates.io-index repository.
It is safe to remove checking cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I removed it!
I also have a commit which removes the entire db_cache
, but that might just be too much :).
Awesome, and thank you very much again. There is only one more thing I'd like to mention since we are also updating all dependencies with this PR. docs.rs is currently using an old version of cargo and until docs.rs starts using iron 0.4 we are basically stuck with an old version of cargo. Before merging this PR I'd like to update cargo to a recent revision that still uses openssl 0.7. Cargo started depending on openssl 0.9 with this commit: rust-lang/cargo@15acaa9 Last revision with openssl 0.7 is f6500c6b. I'll update cargo to this revison and merge this PR. This revision is using git2 0.5 but I think there isn't any compatibility problems between git2 0.4 and 0.5. |
Great to hear! |
Docs.rs currently stuck with openssl 0.7. This is only cargo revision that is still using openssl 0.7 instead of 0.9. Ref: #90
@Byron this is working great but there is only one problem. When I first run |
Ahh its using head of |
Awesome! Because Docs.rs is so awesome (!!), I have now switched all my crates (at least those that I remember;)) to use docs.rs automatically. Those who are generated, i.e. the ones of google-apis-rs link to their specific version. Said link right now will only be available, in the worst case, after 10 minutes (I believe to have read that value somewhere here), which could already distract interested parties. |
Yes I was thinking the same yesterday. I already had a FIXME tag about this in daemon.rs. We can return length of |
Unfortunately each google crate is taking 6-10 minutes to build, so everything is fine they are in queue. This is the order of crates docs.rs used for build:
And rest of the queue:
I am not sure why we got different orders but getting the right order with git2 diff comparison was also an issue with my old code. |
Thanks for the clarification! But wouldn't that mean that on the landing page, the most recent google crate should at best be 6 minutes old? It seems odd that the most recent release is said to be done an hour ago. Maybe the release specifies the actual release date in crates.io, and not when the documentation was built on docs.rs? The order reported by However, another point of confusion is that docs.rs seems to have yet another order, maybe it's related to this sorting? It's not that I would think this is an actual issue, but it certainly added to my initial fear that it might not build all the docs. |
Yes, release date is actual release date in crates.io. id is just an auto incremented number. We are adding new crates to bottom of que with each INSERT in I am not sure why you have any fears but docs.rs was only skipping some crates because they weren't added properly into queue and thanks to you this problem is solved now. I only manually removed google-* crates from que in the past once or twice, because they were taking too long to build and you always released all of them at once. BTW bottom of the queue right now:
I think there is definitely a logical error in my INSERT and SELECT statements and 5 min timeouts between each |
Thanks again for letting me know! I should probably look more closely on how it works, as my use-case seems to push some limits to the point where I am blocking docs.rs for ours all by myself. After all, one release means ~140 crates, of which only 70 generate meaningful docs, with a total of 500kLOC. Something I could imagine is to make it easy to add additional workers and some scheduling logic that prevents crates known to take a long time to use all of them. |
@Byron you can see the list of crates in queue now: https://docs.rs/releases/queue Since crate order in queue and actual release time is a bit different, its hard to see new crates in the main page. |
This is so awesome! Thanks for sharing that. I never clicked the 'recent releases' link on the main page, but will surely do so more often now! 174 crates are queued at the time of writing, incredible. Something that would be awesome for my case is if instead of showing this (for example when clicking the documentation link of some google crate... ...one would show that it is queued, maybe alongside the queue number to give a hint how long it might take till it shows up, maybe like that: Doing this would solve my major problem, which is producing dead links right after uploading to crates.io. If you think it would be nice to have, I could possibly contribute this feature, even though so far I have not brought up a server on my own, but would work to make that easier as well. |
In addition to #90 this change will also run `build_queue` in it's own thread to catch panics (only panicked twice in last 6 months). This commit also introduces an attempt count to avoid endless tries to build yanked crates. This only happens if a crate gets yanked before docs.rs tries. docs.rs will only try to build a crate 5 times and ignore them after it but it will still leave it in queue for inspection.
Possible Improvements
Caveats
crates.io-index
is already checked out with the branchcrates-index-diff_last-seen
set to a recent commit, the first invocation ofget_new_crates()
will list (and possibly queue) all 40000 of them.git2
. If that should be the case, crates-io-cli could be used to move the leakage into its own process.Notes
#[ignore]
d, so I hope the CI will catch potential issues.