The PhD repo
The exam took place on 2019-04-17.
Comparison of containment approaches using MinHash:
- CMash (containment minhash)
- mash screen
- smol (scaled minhash)
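The following is a minimal Python sketch of the Scaled MinHash idea behind the smol approach. It makes two assumptions. First, it uses a stand-in 64-bit hash (BLAKE2b truncated to 8 bytes) rather than the MurmurHash3 used by sourmash. Second, it does not canonicalize k-mers against their reverse complements. The names `scaled_minhash` and `containment` are hypothetical illustrations and are not the API of smol or sourmash.

The sketch keeps every k-mer hash below `2**64 // scaled`. It then estimates the containment of a query in a subject as the fraction of query hashes that also appear in the subject.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space


def hash_kmer(kmer: str) -> int:
    # Stand-in 64-bit hash (assumption): truncated BLAKE2b, not sourmash's MurmurHash3.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def scaled_minhash(seq: str, ksize: int, scaled: int) -> set[int]:
    # Keep every k-mer hash below max_hash = 2**64 // scaled,
    # i.e. roughly 1 in `scaled` distinct k-mers.
    max_hash = MAX_HASH // scaled
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        h = hash_kmer(seq[i:i + ksize])
        if h < max_hash:
            hashes.add(h)
    return hashes


def containment(query: set[int], subject: set[int]) -> float:
    # Estimated fraction of the query's k-mers present in the subject: |Q ∩ S| / |Q|.
    if not query:
        return 0.0
    return len(query & subject) / len(query)


subject = scaled_minhash("ACGTACGTGGCCTTAA", ksize=4, scaled=1)
query = scaled_minhash("ACGTACGT", ksize=4, scaled=1)
print(containment(query, subject))  # the query is a substring of the subject
```

With `scaled=1` every hash is kept, so the estimate is exact. Larger `scaled` values trade accuracy for smaller sketches, while every pair of sketches built with the same `scaled` stays directly comparable.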
Regenerating results (after running the setup steps):
conda activate thesis
cd experiments/smol_gather && snakemake --use-conda
Scaled MinHash sizes
Analysis of Scaled MinHash sizes (number of hashes) across domains in GenBank.
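A Scaled MinHash keeps each distinct k-mer hash with probability of about 1/scaled. Its size should therefore grow roughly linearly with the number of distinct k-mers, at about `n_distinct / scaled`.

The sketch below checks this expectation on a hypothetical random 200 kbp sequence, using the same stand-in BLAKE2b hash assumption as above. It illustrates the expected scaling only and says nothing about the actual GenBank results. The helper `sketch_size` is a hypothetical name.

```python
import hashlib
import random


def hash_kmer(kmer: str) -> int:
    # Stand-in 64-bit hash (assumption), not the hash used by sourmash.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sketch_size(seq: str, ksize: int, scaled: int) -> tuple[int, int]:
    # Return (number of distinct k-mers, number of hashes kept in the Scaled MinHash).
    max_hash = 2**64 // scaled
    kmers = {seq[i:i + ksize] for i in range(len(seq) - ksize + 1)}
    kept = sum(1 for kmer in kmers if hash_kmer(kmer) < max_hash)
    return len(kmers), kept


random.seed(42)  # toy random genome, not real data
genome = "".join(random.choice("ACGT") for _ in range(200_000))
n_distinct, n_hashes = sketch_size(genome, ksize=21, scaled=1000)
print(n_distinct, n_hashes, n_distinct / 1000)  # observed vs. expected sketch size
```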
Inverted index and shared hashes
Analyzing unique and shared hashes in an inverted index.
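The sketch below assumes the inverted index maps each hash to the set of datasets whose sketches contain it. Under that assumption, a hash is "unique" if exactly one dataset contains it and "shared" otherwise.

It is a minimal Python illustration with made-up toy sketches. The function names are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def build_inverted_index(sketches: dict[str, set[int]]) -> dict[int, set[str]]:
    # Map each hash to the names of every dataset whose sketch contains it.
    index: dict[int, set[str]] = defaultdict(set)
    for name, hashes in sketches.items():
        for h in hashes:
            index[h].add(name)
    return dict(index)


def unique_and_shared(index: dict[int, set[str]]) -> tuple[int, int]:
    # Unique hashes occur in exactly one dataset; shared hashes occur in two or more.
    unique = sum(1 for names in index.values() if len(names) == 1)
    return unique, len(index) - unique


# Toy sketches (hypothetical data)
sketches = {"genomeA": {1, 2, 3, 4}, "genomeB": {3, 4, 5}, "genomeC": {4, 6}}
index = build_inverted_index(sketches)
print(unique_and_shared(index))  # → (4, 2)
```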
All processing and analysis scripts were run using the conda environment specified in
To build and activate this environment run:
conda env create --force --file environment.yml
conda activate thesis