shenwei356/simhash-eval

SimHash evaluation

I implement a SimHash hash function for nucleic acid k-mers, combining SimHash and FracMinHash.

Reference:

Distribution of substitutions that result in the same hash
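One way such a distribution could be tallied is to try every single-base substitution at every position of a k-mer and count those that leave the hash unchanged. The sketch below is an assumption about the general approach, not the logic of `mutate.go`; `sameHashSites` and `toyHash` are hypothetical names, and `toyHash` is a stand-in that ignores the first base so the effect is easy to see. Input is assumed to be uppercase A/C/G/T.

```go
package main

import "fmt"

// sameHashSites returns, for each position of kmer, how many of the three
// possible single-base substitutions leave hashFn(kmer) unchanged.
func sameHashSites(kmer string, hashFn func(string) uint64) []int {
	counts := make([]int, len(kmer))
	ref := hashFn(kmer)
	buf := []byte(kmer)
	for i := range buf {
		orig := buf[i]
		for _, b := range []byte("ACGT") {
			if b == orig {
				continue
			}
			buf[i] = b
			if hashFn(string(buf)) == ref {
				counts[i]++
			}
		}
		buf[i] = orig // restore before moving to the next position
	}
	return counts
}

// toyHash is a stand-in hash for the demo: it 2-bit encodes every base
// except the first, so substitutions at position 0 never change it.
func toyHash(s string) uint64 {
	var h uint64
	for i := 1; i < len(s); i++ {
		h = h<<2 | uint64((s[i]>>1)&3)
	}
	return h
}

func main() {
	fmt.Println(sameHashSites("ACGT", toyHash))
}
```

Summing these per-position counts over all k-mers of a sequence would give a site distribution of hash-preserving substitutions.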


Steps

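# Optional: use the full genome as input
# fs=A.muciniphila-ATCC_BAA-835.fasta.gz
# Optional: extract the first 10 kb as a small test input
# seqkit subseq -r 1:10000 A.muciniphila-ATCC_BAA-835.fasta.gz > t.fna

fs=t.fna

k=21

for m in 4 5 6; do
    for s in 4 5 6 7 8; do
        # Shared prefix for the output files of this (k, m, s) combination
        prefix="$fs.simhash-k$k-m$m-s$s"

        # The last argument (1 or 2) is recorded as n1/n2 in the output file name
        go run mutate.go "$fs" "$k" "$m" "$s" 1 | pigz -c > "$prefix-n1.tsv.gz"
        go run mutate.go "$fs" "$k" "$m" "$s" 2 | pigz -c > "$prefix-n2.tsv.gz"

        Rscript site-dist.R -k "$k" -m "$m" -s "$s" "$prefix-n1.tsv.gz" "$prefix-n2.tsv.gz"
    done
done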
