Write Performance between 'MinIO Client SDK' and 'mc' #816
Initial numbers: I have one script that pipes bytes to mc, and one test for each Java client (Minio and AWS). To avoid network cost, I used a local Minio server instance, started with:
Using dd && mc, I get on average 120MB/s
Using the Minio client, throughput is flat at around 19MB/sec. The "good" news: AWS is not "much" faster. It is twice as fast as Minio, but still way below mc, at around 42MB/sec.
On the Java side, the test-bed is pretty simple:
In the case of Minio client, the hot areas are:
In the case of AWS (just to have a baseline for another Java client), the hot areas are:
In both cases, the clients spend significant time calculating hashes (SHA256).
I don't think native code compiled with Go is inherently faster (or slower) than Java, and I presume mc uses the Minio Go client, which should do about the same thing the Minio Java client does. So why is it so much faster than both Java clients? I also presume that SHA256 performance in Java and Go is pretty much the same (hopefully?).
I ran another test, this time for io.minio.Digest.sha256Hash. Even though it is protected, a little bit of reflection code let me call it. Hashing a 1MB array takes 9ms on average, so I got ~112MB/sec, which seems pretty low. Any idea how long it takes in Go? By the way, I'm using Oracle JDK 1.8.0 build 202.
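For anyone who wants to reproduce this kind of measurement without reflection, here is a minimal sketch that times the JDK's own `java.security.MessageDigest` SHA-256 on a 1MB buffer. It is an assumption that this is comparable to what io.minio.Digest.sha256Hash does internally; the class and method names below are made up for illustration, and a real benchmark would use a harness such as JMH.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha256Throughput {
    /** Converts "bytes hashed in millis" into MB/s (1 MB = 10^6 bytes). */
    static double throughputMBps(long bytes, double millis) {
        return (bytes / 1e6) / (millis / 1000.0);
    }

    /** Hashes data `rounds` times and returns the average time per hash in milliseconds. */
    static double averageHashMillis(byte[] data, int rounds) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-256 not available", e);
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            digest.update(data);
            digest.digest(); // completes the hash and resets the digest for the next round
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return elapsedNanos / 1e6 / rounds;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1_000_000];   // 1MB of zeros
        averageHashMillis(data, 50);         // warm-up so the JIT compiles the hot loop
        double millis = averageHashMillis(data, 200);
        System.out.printf("avg %.2f ms per 1MB, %.1f MB/s%n",
                millis, throughputMBps(data.length, millis));
    }
}
```

With the 9ms figure from above, `throughputMBps(1_000_000, 9)` gives roughly 111MB/sec, which is in line with the ~112MB/sec reported.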
Sorry, the AWS client did not have the default options: I had disabled chunked encoding. With chunked encoding turned on, I get 62MB/sec. This is how the client is created (path-style access is required to be able to access Minio/non-AWS S3 stores):
... and the secret to why it got faster? Skipping the SHA256 calculation. With chunked encoding, the AWS client skips the SHA256 calculation (basically skipping the x-amz-content-sha256 header), even if the connection is not HTTPS (for HTTPS, it skips the calculation even if chunked encoding is disabled).
They even have a warning in S3ClientOptions, so I guess it is expensive. So is the Go client taking a shortcut and not calculating all the hashes that the Java implementation does?
It looks like the AWS client also has a few more secrets. With eTag validation off (we trust the content, we are in a private network), I got to 82MB/sec. Still behind mc, but good enough.
OK, the Go client is cheating ;) I found where the MD5/SHA256 hashers are created. MD5 comes from the base libraries (crypto/md5), but sha256 comes from github.com/minio/sha256-simd, which is a Minio project: https://github.com/minio/sha256-simd
I also found this blog entry: https://blog.minio.io/highwayhash-fast-hashing-at-over-10-gb-s-per-core-in-golang-fee938b5218a
If SHA256 is that expensive (and it looks like it is), no wonder mc is faster.
The problem is that I ran some benchmarks in the past with the Minio client and AWS, and Minio was slower than AWS even back then. However, did it get even slower with the latest Minio Java SDK? 1.5-2 times slower? I do not have the right numbers right now; I could revert to an older version and compare. But now, with a tweaked AWS client, I can get 80MB/sec, which is good enough. I'm wondering if I can do the same thing with the Minio client?
I think I have posted enough for a day :) I'll leave you guys some time to go over all my posts.
@adrian-tarau Adding to the chain of comments
while in minio-java it is 5 MiB.
Can you share the stats after setting MIN_MULTIPART_SIZE to 128 MiB in minio-java?
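To see why the part size matters, it helps to count how many parts a 1 GiB upload is split into, since each part is sent as a separate request. The sketch below is plain arithmetic; the class and method names are hypothetical and not part of either SDK.

```java
class PartCount {
    static final long MIB = 1024L * 1024L;

    /** Number of parts needed for an object: ceil(objectSize / partSize). */
    static long partCount(long objectSize, long partSize) {
        if (objectSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and partSize positive");
        }
        return (objectSize + partSize - 1) / partSize;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * MIB;
        // 5 MiB parts: 1024 / 5 = 204.8, rounded up to 205 parts
        System.out.println("5 MiB parts:   " + partCount(oneGiB, 5L * MIB));
        // 128 MiB parts: 1024 / 128 = 8 parts
        System.out.println("128 MiB parts: " + partCount(oneGiB, 128L * MIB));
    }
}
```

So a 1 GiB object needs 205 requests at 5 MiB per part but only 8 at 128 MiB per part, at the cost of buffering larger parts.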
Probably not the reason why: the Amazon client has a 128K chunk size and a 256k read buffer. I'll take the source code, change things (buffers) here and there, and see how it goes.
Actually, that's the limit for switching to chunked upload; the real chunk size is 64k.
What seems to kill the performance (and that was a surprise to me too) is that sha256 in Java is not that fast. On my machine, I got a maximum of 115MB/sec. And Minio does a full sha256 on the whole file and then a sha256 on each chunk.
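The per-chunk hashing mentioned here can be sketched as follows. This is not the Minio or AWS implementation, only an illustration (with hypothetical names) of reading a stream in fixed-size chunks and computing one SHA-256 per chunk with the JDK's `MessageDigest`, which means every uploaded byte passes through the hash once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class ChunkHasher {
    /** Returns the hex SHA-256 of each chunkSize-byte chunk of the stream (last chunk may be shorter). */
    static List<String> sha256PerChunk(InputStream in, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            List<String> hashes = new ArrayList<>();
            byte[] buffer = new byte[chunkSize];
            while (true) {
                int filled = 0;
                // Fill the buffer completely unless the stream ends first.
                while (filled < chunkSize) {
                    int n = in.read(buffer, filled, chunkSize - filled);
                    if (n < 0) break;
                    filled += n;
                }
                if (filled == 0) break;               // nothing left to hash
                digest.update(buffer, 0, filled);
                hashes.add(toHex(digest.digest()));   // digest() also resets the hasher
                if (filled < chunkSize) break;        // short chunk means end of stream
            }
            return hashes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}
```

For example, `ChunkHasher.sha256PerChunk(new FileInputStream("/path/to/file"), 64 * 1024)` would return one hash per 64k chunk; the path is a placeholder.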
My bad, Minio is not doing sha256 once per file and once per chunk, only per chunk. I went down a few levels, to the OS.
This shows me the following:
It would be interesting to know what these limits are for the Minio Go client.
By the way, Minio client vs Amazon client for reads: ~690MB/sec vs ~780MB/sec (reading the same file, so everything is cached by the OS).
Comparing an SDK with a tool is incorrect. I did a little test by running minio with a single endpoint locally. Below is the result for writing 1 GiB of data.
Detailed output
[bala@localhost tmp]$ dd oflag=direct if=/dev/zero of=1gb bs=1M count=1024
724566016 bytes (725 MB, 691 MiB) copied, 2 s, 362 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.96655 s, 362 MB/s
[bala@localhost tmp]$ dd oflag=direct if=/dev/zero of=1gb bs=1M count=1024
732954624 bytes (733 MB, 699 MiB) copied, 2 s, 366 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.93666 s, 366 MB/s
[bala@localhost tmp]$ dd oflag=direct if=/dev/zero of=1gb bs=1M count=1024
756023296 bytes (756 MB, 721 MiB) copied, 2 s, 378 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.8554 s, 376 MB/s
[bala@localhost tmp]$ dd oflag=direct if=/dev/zero of=1gb bs=1M count=1024
734003200 bytes (734 MB, 700 MiB) copied, 2 s, 367 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.94635 s, 364 MB/s
[bala@localhost checkspeed]$ time ./checkspeed
2019/11/13 11:31:48 Uploaded successfully
real 0m8.378s
user 0m6.826s
sys 0m1.264s
[bala@localhost checkspeed]$ time ./checkspeed
2019/11/13 11:32:07 Uploaded successfully
real 0m11.286s
user 0m6.008s
sys 0m0.640s
[bala@localhost checkspeed]$ time ./checkspeed
2019/11/13 11:32:24 Uploaded successfully
real 0m14.595s
user 0m6.356s
sys 0m0.456s
[bala@localhost checkspeed]$ time ./checkspeed
2019/11/13 11:32:41 Uploaded successfully
real 0m12.219s
user 0m6.008s
sys 0m0.524s
[bala@localhost checkspeed]$ time java -cp minio-6.0.12-DEV-all.jar:. PutObject
uploaded successfully
real 0m16.610s
user 0m12.000s
sys 0m1.244s
[bala@localhost checkspeed]$ time java -cp minio-6.0.12-DEV-all.jar:. PutObject
uploaded successfully
real 0m16.246s
user 0m11.759s
sys 0m1.232s
[bala@localhost checkspeed]$ time java -cp minio-6.0.12-DEV-all.jar:. PutObject
uploaded successfully
real 0m16.273s
user 0m11.784s
sys 0m1.199s
[bala@localhost checkspeed]$ time java -cp minio-6.0.12-DEV-all.jar:. PutObject
uploaded successfully
real 0m15.527s
user 0m11.632s
sys 0m1.267s
Test sources
package main

import (
	"log"
	"os"

	minio "github.com/minio/minio-go/v6"
)

func main() {
	// Connect to the local server over plain HTTP (secure=false).
	s3Client, err := minio.New("localhost:9000", "minio", "minio123", false)
	if err != nil {
		log.Fatalln(err)
	}

	// Open the 1 GiB test file created with dd.
	object, err := os.Open("/home/bala/tmp/1gb")
	if err != nil {
		log.Fatalln(err)
	}
	defer object.Close()

	objectStat, err := object.Stat()
	if err != nil {
		log.Fatalln(err)
	}

	// Upload the file, passing its size so the client knows the total length up front.
	_, err = s3Client.PutObject("mybucket", "myobject", object, objectStat.Size(), minio.PutObjectOptions{ContentType: "application/octet-stream"})
	if err != nil {
		log.Fatalln(err)
	}
	log.Println("Uploaded successfully")
}

import io.minio.MinioClient;

public class PutObject {
    public static void main(String[] args) throws Exception {
        MinioClient minioClient = new MinioClient("http://localhost:9000", "minio", "minio123");
        // Upload the same 1 GiB file (size given in bytes).
        minioClient.putObject("mybucket", "myobject", "/home/bala/tmp/1gb", 1073741824L, null, null, null);
        System.out.println("uploaded successfully");
    }
}
Language versions
[bala@localhost checkspeed]$ go version
go version go1.13.4 linux/amd64
[bala@localhost checkspeed]$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
Do you mean comparing mc with the Java SDK? It should be the same, I would think: mc uses Minio Go, so comparing write throughput between mc and Minio Java should be the same as comparing Minio Go with Minio Java, right? I understand now that Minio Go uses an optimized sha256, which might/will be faster even in the absence of the CPU instructions designed to speed up sha256. Anyway, your throughput, 92 MB/s vs 66 MB/s, is great, and I'm wondering why I see such a big difference between pure disk performance (dd), mc and Java. Don't get me wrong, I'm not criticizing Minio or Minio Java; you guys did a great job building these amazing tools.
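The 92 MB/s and 66 MB/s figures follow from the `real` times in the detailed output: average the four runs of each client and divide the 1073741824 bytes by that average. A small sketch of the arithmetic (class and method names are illustrative only):

```java
class UploadThroughput {
    /** Arithmetic mean of the wall-clock times, in seconds. */
    static double average(double... seconds) {
        if (seconds.length == 0) throw new IllegalArgumentException("no samples");
        double sum = 0;
        for (double s : seconds) sum += s;
        return sum / seconds.length;
    }

    /** Throughput in MB/s, with 1 MB = 10^6 bytes. */
    static double mbPerSecond(long bytes, double seconds) {
        return bytes / 1e6 / seconds;
    }

    public static void main(String[] args) {
        long bytes = 1073741824L; // 1 GiB test file
        // "real" times taken from the detailed output above
        double go = mbPerSecond(bytes, average(8.378, 11.286, 14.595, 12.219));
        double java = mbPerSecond(bytes, average(16.610, 16.246, 16.273, 15.527));
        System.out.printf("minio-go:   %.1f MB/s%n", go);
        System.out.printf("minio-java: %.1f MB/s%n", java);
        System.out.printf("minio-java is %.0f%% slower%n", (1 - java / go) * 100);
    }
}
```

The averages come out at about 11.6 s and 16.2 s, which is where the rounded 92 MB/s and 66 MB/s come from, a gap of about 28%.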
No.
This is what my disk can give me:
Minio Java: 26.11 MB/s
I was wondering if you could run a test for Amazon SDK? This is the code:
The method testWrite comes from a base class. It only loops over one file, calls putObject and times it.
Hmmm... so how can I do parallel uploads with Minio Java? :) I do not expect amazing write throughput, since most applications care about read throughput from S3. However, a write throughput of 10MB/s-20MB/s becomes a problem. I'm trying to understand why I get 10MB/sec. Would the number of Minio nodes affect write throughput considerably? Should I expect a significant difference between 5 nodes, 10 nodes or 30 nodes? I looked in the Minio documentation but could not find this aspect clearly spelled out.
To be clear, I get 10MB/s from Java in a large cluster where mc flies. 10MB/s is way too low; 30MB/s-40MB/s is what I would find reasonable.
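One generic way to get parallelism on the application side is to upload several objects concurrently from a thread pool. The sketch below is not a Minio Java feature: `Uploader` is a hypothetical stand-in for whatever wraps the SDK's putObject call, and it assumes the wrapped client can safely be used from several threads (otherwise, create one client per task).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelUploads {
    /** Hypothetical stand-in for one SDK upload call, e.g. a wrapper around putObject. */
    interface Uploader {
        void upload(String objectName) throws Exception;
    }

    /** Uploads all objects using a fixed pool of threads; returns the number completed. */
    static int uploadAll(List<String> objectNames, Uploader uploader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (String name : objectNames) {
                Callable<Void> task = () -> {
                    uploader.upload(name);
                    return null;
                };
                pending.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Void> future : pending) {
                try {
                    future.get(); // waits for the upload and rethrows its failure, if any
                    completed++;
                } catch (ExecutionException e) {
                    throw new IllegalStateException("upload failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }
}
```

This only helps when there are many objects to write; it does not speed up a single large upload, where hashing and part size dominate.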
I did a little check with aws-sdk-java which performs
package examples;

import com.amazonaws.services.s3.internal.SkipMd5CheckStrategy;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.Protocol;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class PutObject {
    public static void main(String[] args) throws Exception {
        // Disable MD5 (eTag) validation for GET and PUT.
        System.setProperty(SkipMd5CheckStrategy.DISABLE_GET_OBJECT_MD5_VALIDATION_PROPERTY, "true");
        System.setProperty(SkipMd5CheckStrategy.DISABLE_PUT_OBJECT_MD5_VALIDATION_PROPERTY, "true");

        ClientConfiguration clientConfiguration = new ClientConfiguration()
            .withProtocol(Protocol.HTTP)
            .withTcpKeepAlive(true);

        // Path-style access is needed to talk to the local Minio endpoint.
        AmazonS3 client = AmazonS3ClientBuilder.standard()
            .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("minio", "minio123")))
            .withClientConfiguration(clientConfiguration)
            .withPathStyleAccessEnabled(true)
            .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("http://localhost:9000", null))
            .build();

        client.putObject("mybucket", "myobject", new File("/home/bala/tmp/1gb"));
        System.out.println("uploaded successfully");
    }
}
I modified the minio-java source by increasing the part size, and I am able to get the results below.
minio-java (with no code change) is 28% slower than minio-go as per my testing. If yours is much slower than that, there is some other problem in your testing. It's better to test in a controlled environment with bare minimal code. The solution to the 28% slowdown is to increase the part size reasonably.
That's 12s vs 16s, right? So the Amazon AWS client is faster for you too: about 25% faster, basically close to the Go SDK. I'll play with different Minio settings too and see how it goes.
This is already fixed with PutObjectOptions support in 7.0.0 - closing as fixed.
I was asked to open an issue after I posted a question on slack: https://minio.slack.com/archives/C3NDUB8UA/p1573484464253800
I'll provide additional information once I set up a performance test in my local environment.