
generate from homologs with hf #9

Merged
sarahalamdari merged 2 commits into main from refactor_query_homolgs on May 6, 2025


@samirchar
Collaborator

  • Created a new script, generate_from_homologs, to generate from homologs with hf, based on query_from_homologs.
  • The new script lets users specify either a glob pattern for MSA files or a list of file names. It also adds arguments to avoid hard-coding values, such as the minimum and maximum number of sequences per MSA and the maximum sequence length.
  • For tracking, left query_from_homologs untouched and moved it to analysis.
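
To illustrate the file-selection behavior described above, here is a minimal sketch, not the script's actual implementation. The `--include-pattern`, `--msa-file-names`, and `--max-seqs-msa` flags and their defaults are taken from the diff under review. The `select_msa_files` helper, the `.a3m` extension in the usage comments, and the rule that an explicit file list takes precedence over the pattern are all assumptions made for the example.

```python
import argparse
import fnmatch


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three flags shown in the PR diff."""
    parser = argparse.ArgumentParser(description="Illustrative MSA file selection.")
    parser.add_argument("--include-pattern", type=str, default="*",
                        help="glob pattern for MSA files to include from the directory.")
    parser.add_argument("--msa-file-names", nargs="*", type=str, default=None,
                        help="List of MSA file names to include.")
    parser.add_argument("--max-seqs-msa", type=int, default=57,
                        help="The maximum number of sequences in an MSA.")
    return parser


def select_msa_files(available, include_pattern="*", msa_file_names=None):
    """Hypothetical helper: choose which MSA files to process.

    Assumption: an explicit list of names, when given, takes precedence
    over the glob pattern. The real script may combine them differently.
    """
    if msa_file_names:
        wanted = set(msa_file_names)
        return sorted(name for name in available if name in wanted)
    # fnmatchcase keeps matching case-sensitive on every platform.
    return sorted(name for name in available
                  if fnmatch.fnmatchcase(name, include_pattern))


# Example usage (file names are made up):
#   args = build_parser().parse_args(["--include-pattern", "*.a3m"])
#   select_msa_files(["b.a3m", "a.a3m", "notes.txt"],
#                    args.include_pattern, args.msa_file_names)
```

Keeping selection in a pure function that takes a list of names makes it easy to test without touching the file system; in practice the `available` list would come from a directory listing.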

Comment thread src/generate_from_homologs.py Outdated
parser.add_argument("--include-pattern", type=str, default="*", help="glob pattern for MSA files to include from the directory.")
parser.add_argument("--msa-file-names", nargs='*', type=str, default=None, help="List of MSA file names to include.")
parser.add_argument("--max-length", type=int, default=768, help="The maximum length of the generated text.")
parser.add_argument("--max-seqs-msa", type=int, default=57, help="The maximum number of sequences in an MSA.")
Contributor


Does the max test MSA contain 57 sequences, or should this default to 64?

Same question about the min.

Collaborator Author


I used the hard-coded values Kevin had in the original query_from_homologs script: https://github.com/microsoft/dayhoff/blob/main/analysis/query_from_homologs.py

If these are incorrect, let me know and I can change them.

Contributor


noted - will look into this

@sarahalamdari sarahalamdari merged commit e540277 into main May 6, 2025
3 checks passed