
.sccache_check is on the hot path and causes rate limiting errors #2070

Open
alexandrnikitin opened this issue Feb 1, 2024 · 3 comments

Hey, I'm seeing a lot of rate limiting errors on the storage check (S3 backend). The ".sccache_check" file used for that check is on the hot path. What do you think about making it configurable and exposing it as an environment variable? Each actor could then have its own file to check read/write access against. That would help mitigate the issue. WDYT?

Example of the error:

storage write check failed: RateLimited (temporary) at Writer::write => S3Error { code: "SlowDown", message: "Please reduce your request rate.", resource: "", request_id: "T7HVSVY51KZ5E5ET" }

Context:
    response: Parts { status: 503, version: HTTP/1.1, headers: {"x-amz-request-id": "T7HVSVY51KZ5E5ET", "x-amz-id-2": "lx6IUMFEAgCQC32yIPFmwIV89vl9QnqkxzyyvYBg/VQTRtFC+21/dIrocKyworjoc/su/dQyyFA=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Thu, 01 Feb 2024 00:32:06 GMT", "server": "AmazonS3", "connection": "close"} }
    service: s3
    path: .sccache_check

The code:

sccache/src/cache/cache.rs

Lines 481 to 544 in 69be532

    async fn check(&self) -> Result<CacheMode> {
        use opendal::ErrorKind;

        let path = ".sccache_check";

        // Read is required; return an error directly if we can't read.
        match self.read(path).await {
            Ok(_) => (),
            // Reading a file that doesn't exist and getting NotFound is ok.
            Err(err) if err.kind() == ErrorKind::NotFound => (),
            // Tricky part.
            //
            // We tolerate rate limiting here so that sccache keeps running.
            // In the worst case, we will miss all of the cache.
            //
            // In some very rare cases, the user could have configured the
            // storage incorrectly and be hitting another service's rate limit.
            // There is little we can do about that, so we print the error here
            // to make users aware of it.
            Err(err) if err.kind() == ErrorKind::RateLimited => {
                eprintln!("cache storage read check: {err:?}, but we decide to keep running")
            }
            Err(err) => bail!("cache storage failed to read: {:?}", err),
        };

        let can_write = match self.write(path, "Hello, World!").await {
            Ok(_) => true,
            Err(err) if err.kind() == ErrorKind::AlreadyExists => true,
            // Tolerate all other write errors because we can at least read.
            Err(err) => {
                eprintln!("storage write check failed: {err:?}");
                false
            }
        };

        let mode = if can_write {
            CacheMode::ReadWrite
        } else {
            CacheMode::ReadOnly
        };

        debug!("storage check result: {mode:?}");

        Ok(mode)
    }

    fn location(&self) -> String {
        let meta = self.info();
        format!(
            "{}, name: {}, prefix: {}",
            meta.scheme(),
            meta.name(),
            meta.root()
        )
    }

    async fn current_size(&self) -> Result<Option<u64>> {
        Ok(None)
    }

    async fn max_size(&self) -> Result<Option<u64>> {
        Ok(None)
    }
}
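
For illustration, here is a minimal sketch of the env-var idea from the description, assuming a hypothetical SCCACHE_CHECK_PATH variable (not an existing sccache option); check() would then call a helper like this instead of hardcoding ".sccache_check":

// Minimal sketch, not sccache code: SCCACHE_CHECK_PATH is a hypothetical
// environment variable that would override the key used for the storage check.
use std::env;

fn check_path() -> String {
    env::var("SCCACHE_CHECK_PATH").unwrap_or_else(|_| ".sccache_check".to_string())
}

Keeping the current constant as the fallback would preserve today's behavior whenever the variable is unset.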

glandium (Collaborator) commented Feb 1, 2024

The check only happens when the server starts. How is that the hot path?

alexandrnikitin (Author) commented:

I'm also surprised to see it from AWS. We have dozens of worker nodes and thousands of builds per day, which is not a crazy number, yet I frequently see that error in the logs.

I see that others have also reported the same or similar issues:
#1485
#1485 (comment)
And PRs to mitigate it: #1557

orf commented May 11, 2024

S3 has rate limits: many reads and writes to a single key can hit rate limits far before the underlying partition is rate limited. Even 20-30 PUTs on a single key within a very short period of time will exhaust it.

On versioned buckets this limit is lower, especially if many millions of versions exist for the key.
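
Combining that observation with the proposal in the issue description, one mitigation would be to give each worker its own check key so that concurrent sccache servers don't all hammer the same S3 object. The snippet below is only a sketch of that idea; the key format and the use of the process id are illustrative choices, not current sccache behavior:

// Sketch only: derive a per-worker check key so concurrent workers spread the
// startup check across distinct S3 objects. The ".sccache_check-" prefix plus
// process id is an arbitrary, illustrative scheme.
use std::process;

fn per_worker_check_path() -> String {
    format!(".sccache_check-{}", process::id())
}

A downside is that every worker leaves behind its own small check object, so a lifecycle rule or prefix-based cleanup would be needed to keep the bucket tidy.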
