Skip to content

Feature/claude assisted tests#263

Merged
joshfactorial merged 15 commits into
developfrom
feature/claude-assisted-tests
Apr 18, 2026
Merged

Feature/claude assisted tests#263
joshfactorial merged 15 commits into
developfrom
feature/claude-assisted-tests

Conversation

@joshfactorial
Copy link
Copy Markdown
Collaborator

Tests written to cover gaps in test coverage with Claude help.

joshfactorial and others added 13 commits April 17, 2026 23:11
Rewrites/expands test coverage for split_inputs, bed_func, read, and
vcf_func modules — raising coverage from <50% to 88–100% across all four.

- test_split_inputs.py: rewrite from 1 → 25 tests covering chunk_record,
  write_fasta, disk_bytes_free, print_stderr, and main() in both contig
  and block modes (100% coverage)
- test_bed_func.py: new file with 37 tests covering intersect_regions,
  fill_out_mut_regions, recalibrate_mutation_regions, parse_single_bed,
  fill_out_bed_dict, and parse_beds (98% coverage)
- test_read.py: new file with 47 tests covering Read construction,
  comparisons, apply_mutations, calculate_flags, make_cigar,
  finalize_read_and_write, and all helper methods (98% coverage)
- test_vcf_func.py: new file with 28 tests covering retrieve_genotype,
  variant_genotype, and parse_input_vcf for all variant types and
  genotype formats (88% coverage; remaining 12% is the unreachable WP
  legacy genotype branches due to a known bug fixed separately)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
25 tests covering find_random_non_n, map_non_n_regions, and
generate_variants — including input variant range filtering, offset
ref_start handling, mutation count bounds, variant type validation,
qual score assignment, reproducibility, and high-vs-low mutation rate
comparison.

Also surfaces a pre-existing bug: when mutation_rate_regions has
multiple entries, probability_rates (len N) may not match the length
of local_mut_regions returned by intersect_regions, causing a numpy
ValueError. Documented by replacing the multi-region test with a
single-region variant that avoids the mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add coverage artifacts (.coverage, htmlcov/), pytest cache, NEAT log
files (*.log), compiled bytecode (__pycache__, *.pyc/pyo/pyd), common
virtual environment directories, and build/packaging output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
20 tests covering concat, merge_vcfs, merge_bam, and main:
- concat: single file, multiple files, empty list, content preservation,
  ordering
- merge_vcfs: comment/header line filtering, multi-file merge, empty
  input, data line ordering
- merge_bam: pysam merge+sort call verification, output path, temp file
  cleanup, >500-file chunking behaviour
- main: fq1-only, fq1+fq2, vcf, all-None entries, multi-chunk
  concatenation, BAM delegation to merge_bam

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
16 unit tests cover filter_thread_variants and filter_bed_regions:
range boundaries (inclusive lower, exclusive upper), partial overlap,
empty inputs, multi-region spanning, and return types.

6 integration tests exercise read_simulator_runner end-to-end against
a 400 bp single-contig reference with low coverage:
- FASTQ output created and contains valid records
- Paired-end mode produces two FASTQ files
- Missing output directory is created automatically
- Output files carry the supplied prefix
- Identical seeds produce identical output
- VCF output mode creates a vcf.gz file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
24 tests covering:
- reg2bin: same-bin, large-span fallback, adjacent coordinates
- Constructor: fq1/fq2/vcf/bam file handle creation, no-files ValueError,
  None paths when output type not requested
- VCF header: fileformat line, contig entries, #CHROM column header
- write_fastq_record: content written to gzip, unknown-file ValueError,
  multiple records
- write_vcf_record: data appended, unknown-file ValueError
- flush_and_close_files: handle closed, idempotent double-close
- write_bam_record: forward strand, reverse strand, odd-length padding

Also includes minor whitespace fix in test_generate_reads.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_error_models.py (21 tests):
- TraditionalQualityModel: default construction, quality score range,
  error rate dict, uniform mode init and score bounds, get_quality_scores
  for uniform/non-uniform/exact-length/shorter-than-model cases,
  ndarray return, reproducibility
- SequencingErrorModel: default construction, variant prob sum, custom
  error rate, zero-error returns empty list, high-error produces
  ErrorContainers with valid locations and alt bases, padding returned
- ErrorContainer: field storage for SNV, Deletion, Insertion types

test_single_runner.py (37 tests, 3 classes):
- TestInitializeAllModels: returns 4-tuple of correct types, rng attached,
  mutation_rate override, fragment_mean path, default mean=read_len*2
- TestWriteBlockVcf: single SNV written, ref/alt/qual correct, empty
  input produces no output, multiple SNVs in sorted order, 10 VCF columns
- TestReadSimulatorSingle: return tuple structure, thread_idx/contig_name
  passthrough, ContigVariants return, all file_dict keys present, FASTQ
  file created and valid, content has FASTQ records, seed reproducibility,
  vcf=None when produce_vcf=False

Note: quality_score_model.py is a standalone Markov analysis script with
hardcoded BAM paths — not a library module and not tested here.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_error_models.py: 4 new indel error tests (deletion, insertion,
  blacklist deduplication); fix monkeypatch approach for numpy Generator
- test_single_runner.py: 2 new tests covering produce_bam=True path
  (lines 127-152) and bam=None when not requested
- test_runner.py: 8 new integration tests covering discard_bed,
  mutation_bed, ploidy=1, ploidy=4, min_mutations, and produce_bam=True;
  also input VCF, target_bed, and mutation_rate override
- tests/test_variants/: new test_variant_types.py (73 tests for all
  comparison operators, __repr__, contains(), get_alt()) and
  test_contig_variants.py (24 tests documenting known bugs in
  remove_variant and check_if_ins)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
generate_variants.py:
  - N in mutation region (lines 139-157 avoidance logic)
  - N-heavy subsequence skip (line 204)
  - Trinucleotide with N triggers disallowed_chars path (lines 246-249)
  - High-rate deletion overlap handling (lines 291-302)

error_models.py:
  - MarkovQualityModel stub coverage (lines 112-118)
  - Score clamped to min 1 and max 42 via extreme qual_score_probs (lines 102-104)
  - Documents unreachable indel branches (lines 209-235, 252) caused by
    the total_indel_length > read_length//4 circular gate; includes
    TODO comment with proposed assertions for after the bug fix

vcf_func.py:
  - Three WP-genotype tests document that the WP condition is always False
    (string "WP" never equals a list); each includes a TODO comment with
    exact updated assertions to apply once the condition is corrected to
    any(x.split('=')[0] == "WP" for x in ...)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Duplicates removed:
- test_error_and_mut_models.py: test_sequencing_error_model_zero_error_returns_none_or_empty
  (covered by test_error_models.py::test_sem_zero_error_rate_returns_empty)
- test_error_and_mut_models.py: test_traditional_quality_model_reproducible_with_seed
  (covered by test_error_models.py::test_tqm_get_quality_scores_reproducible)
- test_seq_error.py: test_no_errors_when_avg_zero (same as above)
- test_models/test_stitch_outputs.py: test_concat_joins_files_in_order
  (covered by test_read_simulator/test_stitch_outputs.py)

Vacuous assertions fixed:
- test_error_models.py: rename indel dead-code tests and assert == 0 (not >= 0)
- test_output_file_writer.py: replace `assert True` with bam_handle.tell() > pos_before
- test_output_file_writer.py: strengthen test_reg2bin_same_16kb_bin to check determinism
- test_runner.py: test_filter_bed_regions_returns_list now checks content
- test_single_runner.py: test_returns_four_element_tuple now checks element types
- test_vcf_func.py: test_variant_genotype_returns_correct_ploidy_length checks values
- test_generate_variants.py: test_generate_variants_variant_types_are_valid checks attributes
- test_contig_variants.py: test_remove_variant_method_exists adds TODO comment for post-fix update

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- generate_reads.py lines 234-236: paired-end discard check for read2
  (test_generate_reads_paired_discard_region_removes_all)
- generate_reads.py: paired no-discard regression guard
  (test_generate_reads_paired_no_discard_produces_read_pairs)
- single_runner.py line 85: "Record too small" debug log path,
  with generate_reads/generate_variants patched to avoid infinite loop
  (test_record_too_small_logs_and_continues)
- bed_func.py line 209: mutation rate > 0.3 warning log
  (test_parse_single_bed_mutation_high_rate_logs_warning)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2. Change version to 4.3.6 in pyproject.toml
3. Fixed channel order in environment.ylm which cause crash due to not satisfied requirements of bcftools
@joshfactorial joshfactorial force-pushed the feature/claude-assisted-tests branch from 24396b2 to cc549dc Compare April 18, 2026 05:17
@joshfactorial joshfactorial merged commit cb4bde9 into develop Apr 18, 2026
1 check passed
@joshfactorial joshfactorial deleted the feature/claude-assisted-tests branch April 18, 2026 05:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants