Skip to content

Conversation

@bicycle1885
Copy link
Contributor

Hi, I've slightly improved the performance of fqcnt_jl1_klib.jl.

Here are the step-by-step improvements from 759cd6a to 15d9864 on my machine:

HEAD is now at 759cd6a use bit operators
Benchmark #1: fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):      4.280 s ±  0.083 s    [User: 3.923 s, System: 0.838 s]
  Range (min … max):    4.134 s …  4.398 s    10 runs
 
Previous HEAD position was 759cd6a use bit operators
HEAD is now at 7fd4ff8 make readbyte type stable
Benchmark #1: fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):      3.717 s ±  0.074 s    [User: 3.390 s, System: 0.793 s]
  Range (min … max):    3.620 s …  3.834 s    10 runs
 
Previous HEAD position was 7fd4ff8 make readbyte type stable
HEAD is now at f1b09c2 do not count  string length twice
Benchmark #1: fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):      2.834 s ±  0.046 s    [User: 2.513 s, System: 0.805 s]
  Range (min … max):    2.769 s …  2.936 s    10 runs
 
Previous HEAD position was f1b09c2 do not count  string length twice
HEAD is now at 2984dc8 use unsafe_string to make strings
Benchmark #1: fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):      2.380 s ±  0.026 s    [User: 2.101 s, System: 0.755 s]
  Range (min … max):    2.331 s …  2.411 s    10 runs
 
Previous HEAD position was 2984dc8 use unsafe_string to make strings
HEAD is now at 15d9864 use memchr to find delimiter
Benchmark #1: fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):      1.893 s ±  0.040 s    [User: 1.615 s, System: 0.756 s]
  Range (min … max):    1.842 s …  1.995 s    10 runs
julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 5 2400G with Radeon Vega Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, znver1)
Environment:
  JULIA_PROJECT = @.

@lh3
Copy link
Owner

lh3 commented May 21, 2020

Brilliant. Thanks!

@lh3 lh3 merged commit 514a638 into lh3:master May 21, 2020
@bicycle1885 bicycle1885 deleted the perf-julia branch May 21, 2020 04:15
lh3 added a commit that referenced this pull request May 21, 2020
@lh3
Copy link
Owner

lh3 commented May 21, 2020

Yes, it is a significant improvement on uncompressed fastq. Do you know why julia is not that fast on compressed fastq? Also, which libz.so is the script calling? Is it the system zlib (which is 1.2.7 on CentOS7)? I have updated the timing in README.md. I will update fqcnt/README.md and blog post tomorrow.

The more interesting benchmark is bedcov. It is simpler than the first. I don't understand why Julia can't get good performance. Thanks.

@bicycle1885
Copy link
Contributor Author

Recent Julia distributions are shipped with zlib.so and it is used if you write ccall((func, "libz"), ...) by default. Unfortunately, that zlib.so is very slow because the build script was not configured correctly (in fact, this is my bad). That problem was fixed recently (yes, very recently!) but the fix is not yet included even in the latest release (i.e., Julia 1.4.1). So, in order to fix the problem right now, you may need to use your own libz.so installed in your system, for example.

This is an example patch on my machine:

diff --git a/lib/Klib.jl b/lib/Klib.jl
index 1b7fc39..a8f8d21 100644
--- a/lib/Klib.jl
+++ b/lib/Klib.jl
@@ -120,6 +120,9 @@ end
 
 tostring(b::ByteBuffer, n::Int) = unsafe_string(b.a, n)
 
+const libz = "/usr/lib/x86_64-linux-gnu/libz.so"
+#const libz = "libz"
+
 #
 # GzFile
 #
@@ -127,17 +130,17 @@ mutable struct GzFile <: IO
        fp::Ptr{Cvoid}
 
        function GzFile(fn::String, mode = "r")
-               x = ccall((:gzopen, "libz"), Ptr{Cvoid}, (Cstring, Cstring), fn, mode)
+               x = ccall((:gzopen, libz), Ptr{Cvoid}, (Cstring, Cstring), fn, mode)
                y = x == C_NULL ? nothing : new(x)
                if y != nothing finalizer(Base.close, y) end
                return y
        end
 end
 
-Base.readbytes!(fp::GzFile, buf::Vector{UInt8}) = ccall((:gzread, "libz"), Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Cuint), fp.fp, buf, length(buf))
+Base.readbytes!(fp::GzFile, buf::Vector{UInt8}) = ccall((:gzread, libz), Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Cuint), fp.fp, buf, length(buf))
 
 function Base.close(fp::GzFile)
-       ret = fp.fp != C_NULL ? ccall((:gzclose, "libz"), Cint, (Ptr{Cvoid},), fp.fp) : -1
+       ret = fp.fp != C_NULL ? ccall((:gzclose, libz), Cint, (Ptr{Cvoid},), fp.fp) : -1
        fp.fp = C_NULL
        return ret
 end

And this is a comparison before the patch:

~/w/biofast (master|✚1…) $ hyperfine 'fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq.gz'
Benchmark #1: fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq.gz
  Time (mean ± σ):     13.930 s ±  0.058 s    [User: 13.710 s, System: 0.683 s]
  Range (min … max):   13.859 s … 14.023 s    10 runs

and after the patch:

~/w/biofast (master|✚1…) $ hyperfine 'fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq.gz'
Benchmark #1: fqcnt/fqcnt_jl1_klib.jl biofast-data-v1/M_abscessus_HiSeq.fq.gz
  Time (mean ± σ):      7.555 s ±  0.037 s    [User: 7.325 s, System: 0.707 s]
  Range (min … max):    7.504 s …  7.616 s    10 runs

I will check bedcov, too. But maybe I have not enough time to improve it immediately.

@KristofferC
Copy link

That problem was fixed recently (yes, very recently!) but the fix is not yet included even in the latest release

Do you know if the upcoming 1.5 has the improved version? I would guess no, since the fix to Yggdrasil was merged so recently. We should definitely try get that in, in my opinion.

@bicycle1885
Copy link
Contributor Author

I didn't know, but I expected ;) Considering @lh3 will update his blog post very soon and using the nightly build is not recommended in general, I think the quickest fix we can take for benchmarking is to hardcode the zlib path as I've shown above. Of course, Zlib_jll.jl may be used instead, but it adds a new dependency in the benchmark.

@lh3
Copy link
Owner

lh3 commented May 21, 2020

I have updated README and blog post using the system zlib. Julia is on par with faster languages now. Note that Klib.jl in this repo still uses "libz". When timing, I was using the system zlib on CentOS7.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants