MRG: adjust protein ksize for record/manifest (#3019)
Protein k-mer sizes are stored internally as k*3, but a manifest reports k.

`Record` is used within the manifest, so it should reflect k.
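
As a rough illustration of the conversion described above, here is a minimal sketch. `MolType` and `manifest_ksize` are hypothetical names used only for this example; they stand in for the crate's `HashFunctions` enum and the adjustment made in `Record::from_sig`, and are not the project's actual API.

```rust
/// Hypothetical stand-in for the crate's `HashFunctions` enum.
/// Only the distinction between DNA and non-DNA matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MolType {
    Dna,
    Protein,
    Dayhoff,
    Hp,
}

/// Convert an internally stored k-mer size into the value a manifest reports.
/// DNA sizes pass through unchanged; every other molecule type is assumed to
/// be stored as k*3, so it is divided by 3.
pub fn manifest_ksize(internal_ksize: u32, moltype: MolType) -> u32 {
    if moltype == MolType::Dna {
        internal_ksize
    } else {
        internal_ksize / 3
    }
}
```

Under these assumptions, a protein sketch stored with an internal size of 30 would appear in the manifest with ksize 10, while a DNA sketch with k=21 stays at 21.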

This should not affect selection at the signature level. In `signature::Select`:
```
valid = if let Some(ksize) = selection.ksize() {
    let k = s.ksize() as u32;
    k == ksize || k == ksize * 3
```
So here we match either the exact ksize or ksize*3, regardless of molecule
type. This is because we don't have access to `is_protein` or any other
property unless we load the minhash.
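
To make the consequence of that comparison concrete, the following standalone sketch isolates the same predicate. `ksize_matches` is a hypothetical helper written for this example; it does not exist in the crate.

```rust
/// Same comparison as in the quoted `Select` code: accept a sketch whose
/// stored ksize equals the requested ksize or three times the requested ksize.
/// The molecule type is deliberately not consulted.
pub fn ksize_matches(sketch_ksize: u32, selected_ksize: u32) -> bool {
    sketch_ksize == selected_ksize || sketch_ksize == selected_ksize * 3
}
```

With this predicate, a protein sketch stored as 30 matches a selection of ksize 10, and a DNA sketch with k=21 matches a selection of 21. Because the molecule type is ignored, a DNA sketch with k=21 would also match a selection of ksize 7.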

We may want to be more explicit at some point, but this solves the
immediate problem.
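
One possible shape for a more explicit check is sketched below, purely as an assumption about what "more explicit" could mean. The function name and the `sketch_is_dna` flag are hypothetical; the commit does not say how, or whether, such a check would be implemented.

```rust
/// Hypothetical molecule-type-aware comparison. If the caller knows whether
/// the sketch is DNA, only one stored ksize is acceptable: the requested
/// value for DNA, or three times the requested value otherwise.
pub fn ksize_matches_explicit(
    sketch_ksize: u32,
    sketch_is_dna: bool,
    selected_ksize: u32,
) -> bool {
    let expected = if sketch_is_dna {
        selected_ksize
    } else {
        selected_ksize * 3
    };
    sketch_ksize == expected
}
```

Under this variant a DNA sketch with k=21 would no longer match a selection of ksize 7, at the cost of needing the molecule type at selection time.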
bluegenes committed Feb 21, 2024
1 parent 484c7ea commit fa4ae0b
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion src/core/src/manifest.rs
@@ -82,7 +82,8 @@ impl Record {
     pub fn from_sig(sig: &Signature, path: &str) -> Vec<Self> {
         sig.iter()
             .map(|sketch| {
-                let (ksize, md5, with_abundance, moltype, n_hashes, num, scaled) = match sketch {
+                let (mut ksize, md5, with_abundance, moltype, n_hashes, num, scaled) = match sketch
+                {
                     Sketch::MinHash(mh) => (
                         mh.ksize() as u32,
                         mh.md5sum(),
@@ -106,6 +107,10 @@ impl Record {

                 let md5short = md5[0..8].into();

+                if moltype != HashFunctions::Murmur64Dna {
+                    ksize /= 3;
+                }
+
                 Self {
                     internal_location: path.into(),
                     moltype: moltype.to_string(),
