fix: add missing commas in Alliance column lists to prevent silent string concatenation #120

Open
AaryanCode69 wants to merge 1 commit into reactome:main from AaryanCode69:fix/alliance-missing-commas-column-lists

@AaryanCode69

Summary

Fix missing commas in Alliance of Genome Resources column definition lists that caused Python's implicit string concatenation to silently merge adjacent column names into single malformed entries.

Problem

src/data_generation/alliance/__init__.py defines column name lists for parsing Alliance TSV files. Two lists had missing commas between adjacent string literals, triggering Python's implicit string concatenation — a language feature where "A" "B" silently becomes "AB" with no error.
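The concatenation behaviour can be reproduced in isolation. This is a minimal sketch using a shortened, hypothetical list (`columns_buggy` and `columns_fixed` are illustrative names, not the real definitions in `__init__.py`):

```python
# Hypothetical, shortened column lists for illustration only.
columns_buggy = [
    "Host organism(s)" "Interaction parameter(s)",  # no comma between the literals
    "Creation date",
]

columns_fixed = [
    "Host organism(s)",
    "Interaction parameter(s)",
    "Creation date",
]

# The buggy list has one entry fewer, and its first entry is the merged string.
print(len(columns_buggy), repr(columns_buggy[0]))
print(len(columns_fixed))
```

Python raises no warning here, because adjacent string literals are joined at compile time.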

Location 1 — molecular_interaction (line ~141)

# Before: 1 entry instead of 2
"Host organism(s)" "Interaction parameter(s)",
# Produced: "Host organism(s)Interaction parameter(s)"

Location 2 — genetic_interaction (lines ~180–184)

# Before: all 6 strings silently concatenated into 1
"Annotation(s) interactor A"
"Annotation(s) interactor B"
"Interaction annotation(s)"
"Host organism(s)"
"Interaction parameter(s)"
"Creation date",
# Produced: one mega-string ending in "Creation date" → 1 entry instead of 6
# ("Interaction parameter(s)" also lacks a trailing comma, so "Creation date" is absorbed too)

This would cause incorrect column-to-name mapping when parsing Alliance TSV files, with every column after the concatenation point shifted by 1 (Location 1) or 5 (Location 2).
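To make the shift concrete, here is a small sketch that pairs a column list with one row of values. The row contents and the use of `zip` are assumptions for illustration; the actual parsing code in the repository may map names to columns differently.

```python
# Hypothetical row with three tab-separated values, already split.
row = ["Homo sapiens", "kd=5nM", "2024-01-01"]

# Two names merged by a missing comma, as in Location 1.
names_buggy = ["Host organism(s)" "Interaction parameter(s)", "Creation date"]
names_fixed = ["Host organism(s)", "Interaction parameter(s)", "Creation date"]

mapped_buggy = dict(zip(names_buggy, row))
mapped_fixed = dict(zip(names_fixed, row))

# With the merged name, "Creation date" receives the value one column to its left.
print(mapped_buggy["Creation date"])
print(mapped_fixed["Creation date"])
```

In this sketch `zip` also silently drops the last value, so neither the wrong label nor the lost column raises an error.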

Fix

Added the missing commas after each string literal so they are treated as separate list entries:

# Location 1
- "Host organism(s)" "Interaction parameter(s)",
+ "Host organism(s)",
+ "Interaction parameter(s)",

# Location 2
- "Annotation(s) interactor A"
- "Annotation(s) interactor B"
- "Interaction annotation(s)"
- "Host organism(s)"
- "Interaction parameter(s)"
+ "Annotation(s) interactor A",
+ "Annotation(s) interactor B",
+ "Interaction annotation(s)",
+ "Host organism(s)",
+ "Interaction parameter(s)",
  "Creation date",
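A regression of this kind could be caught mechanically. The following is a sketch, not part of this PR, of a checker built on the standard `tokenize` module. The function name `find_implicit_concatenations` is hypothetical. It reports every plain string literal that directly follows another one, so it would also flag intentional concatenations of long strings.

```python
import io
import tokenize


def find_implicit_concatenations(source: str) -> list[int]:
    """Return line numbers of string literals that directly follow another string literal."""
    # Line breaks inside brackets (NL) and comments do not separate literals.
    ignorable = {tokenize.NL, tokenize.COMMENT}
    hits = []
    prev_was_string = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in ignorable:
            continue
        if tok.type == tokenize.STRING:
            if prev_was_string:
                hits.append(tok.start[0])
            prev_was_string = True
        else:
            # Any other token (comma, bracket, end of statement) resets the state.
            prev_was_string = False
    return hits


sample = 'cols = [\n    "A"\n    "B",\n    "C",\n]\n'
print(find_implicit_concatenations(sample))
```

For the sample above, the only reported line is the one holding `"B"`, which is the literal merged into `"A"`.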

Changes

File | Change
src/data_generation/alliance/__init__.py | Added 6 missing commas across two column definition lists (molecular_interaction, genetic_interaction)

Impact Assessment

  • No runtime breakage — the current upload_to_chromadb() loop only processes filetype == "genes", so these column lists are not yet consumed in production.
  • No deployment risk — purely additive comma insertions; no logic changes, no new dependencies.
  • Future-proofing — when the Alliance pipeline is expanded to process molecular_interaction and genetic_interaction data, these lists will now correctly match the actual TSV headers from the Alliance API.
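When these file types are eventually processed, a header check along the following lines could confirm that a column list matches the file. This is a sketch under stated assumptions: `header_mismatches` is a hypothetical helper, and it assumes the first line of the TSV is the header row, which may not hold for files that begin with comment lines.

```python
import csv
import io


def header_mismatches(expected: list[str], tsv_text: str) -> tuple[list[str], list[str]]:
    """Compare expected column names against the first row of a TSV.

    Returns (names missing from the file, names in the file that were not expected).
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # assumption: the first line is the header
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Hypothetical three-column file for illustration.
tsv = "Host organism(s)\tInteraction parameter(s)\tCreation date\nHomo sapiens\tkd=5nM\t2024-01-01\n"

buggy = ["Host organism(s)" "Interaction parameter(s)", "Creation date"]
fixed = ["Host organism(s)", "Interaction parameter(s)", "Creation date"]

print(header_mismatches(buggy, tsv))
print(header_mismatches(fixed, tsv))
```

A merged name shows up as one missing entry plus two unexpected ones, which points directly at the missing comma.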

Related Issue

Closes #118
