Implementation of the sample sheet in the pipeline #102

Closed
marissaDubbelaar opened this issue Oct 27, 2021 · 8 comments

Comments

@marissaDubbelaar
Contributor

marissaDubbelaar commented Oct 27, 2021

I suggest the following annotation for the input file:

  • sample ID
  • vcf file
  • alleles (A*01:01)
  • peptide sequences file*
  • protein file

* The question here is what the id and sequence stand for (UniProt ID and peptide sequence). Would it not be useful to include the associated protein here as well?
@marissaDubbelaar marissaDubbelaar created this issue from a note in Hackathon-October-2021 (epitopeprediction) Oct 27, 2021
@christopher-mohr
Collaborator

alleles is mandatory, and ONE of vcf file, peptide sequences file, or protein file is mandatory.

@christopher-mohr christopher-mohr added the enhancement New feature or request label Oct 27, 2021
@jonasscheid
Contributor

Proposed solution for the sample sheet structure:

SampleID | Alleles | FileName

where the Alleles column contains EITHER a string of alleles (A*02:01;A*24:01;B*07:02;B*08:01;C*04:01;C*07:01) OR a text file containing one allele per line (no header)

and the FileName column contains EITHER a tsv file containing the peptides, like
id | sequence
peptid1 | SYFPEITHI

OR a fasta file containing protein sequences
OR an annotated vcf file

The contents of the provided files need to be validated.
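
A rough sketch of what such a check could look like, for illustration only. It is not the pipeline's actual check_samplesheet.py. The comma separator, the extension-to-type mapping, and the function names (`parse_alleles`, `detect_input_type`, `check_samplesheet`) are all assumptions, since the thread does not fix any of them.

```python
import csv
import io

# Assumed mapping from file extension to input type (not specified in the thread).
INPUT_TYPES = {
    ".tsv": "peptide",
    ".fasta": "protein",
    ".fa": "protein",
    ".vcf": "variant",
}

EXPECTED_HEADER = ["SampleID", "Alleles", "FileName"]


def parse_alleles(value):
    """Return alleles from a ';'-separated string or a one-allele-per-line text file."""
    if value.endswith(".txt"):
        # Text file without header, one allele per line.
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [allele.strip() for allele in value.split(";") if allele.strip()]


def detect_input_type(filename):
    """Guess the input type from the file extension (hypothetical convention)."""
    for suffix, input_type in INPUT_TYPES.items():
        if filename.lower().endswith(suffix):
            return input_type
    raise ValueError(f"Unsupported input file: {filename!r}")


def check_samplesheet(handle):
    """Validate a comma-separated sample sheet and return one dict per row."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"Header must be {EXPECTED_HEADER}, got {reader.fieldnames}")
    rows = []
    for line_number, row in enumerate(reader, start=2):
        if not row["SampleID"]:
            raise ValueError(f"Line {line_number}: missing sample ID")
        alleles = parse_alleles(row["Alleles"] or "")
        if not alleles:
            raise ValueError(f"Line {line_number}: alleles are mandatory")
        rows.append(
            {
                "sample": row["SampleID"],
                "alleles": alleles,
                "input_type": detect_input_type(row["FileName"] or ""),
            }
        )
    return rows


if __name__ == "__main__":
    sheet = io.StringIO(
        "SampleID,Alleles,FileName\n"
        "s1,A*02:01;B*07:02,peptides.tsv\n"
    )
    print(check_samplesheet(sheet))
```

This only checks the sheet itself. Validating the contents of the peptide, fasta, or vcf files would be a separate step.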

@marissaDubbelaar
Contributor Author

Maybe we can rename SampleID to ID to keep the annotation uniform.

@marissaDubbelaar marissaDubbelaar moved this from epitopeprediction to Done - Day 2 in Hackathon-October-2021 Oct 28, 2021
@marissaDubbelaar marissaDubbelaar moved this from Done - Day 2 to epitopeprediction in Hackathon-October-2021 Oct 28, 2021
@christopher-mohr
Collaborator

As discussed, we will stick to the columns sample, alleles and filename.

christopher-mohr added a commit that referenced this issue Oct 29, 2021
Update check_samplesheet.py script for new format #102
@marissaDubbelaar
Contributor Author

marissaDubbelaar commented Nov 1, 2021

@jonasscheid, @christopher-mohr
I noticed that check_requested_models.py doesn't take "A*01:01;A*02:01" as input, only "HLA-A*01:01;HLA-A*02:01".
So we either need to update check_samplesheet.py so that it checks whether the HLA types start with "H-2-" or "HLA-", or we need to remove this check from check_requested_models.py.

@jonasscheid
Copy link
Contributor

How about leaving the mouse nomenclature as is and allowing two notations for the HLA alleles: with and without the "HLA-" prefix?
I noticed that I need to allow mouse alleles in check_samplesheet.py as well, so I need to update it anyway. Let's wait for @christopher-mohr's suggestion.
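
To make the idea concrete, here is a minimal sketch of how both notations could be accepted and mapped to one form. It is an illustration, not the code that ended up in the pipeline. The choice to normalize towards the "HLA-" prefixed form, the regular expression for an unprefixed allele, and the names `normalize_allele` and `normalize_alleles` are assumptions.

```python
import re

MOUSE_PREFIX = "H-2-"
HLA_PREFIX = "HLA-"

# Assumed shape of an HLA allele without prefix, e.g. "A*01:01" or "DRB1*15:01".
HLA_PATTERN = re.compile(r"^[A-Z]+\d*\*\d{2,3}:\d{2,3}$")


def normalize_allele(allele):
    """Leave mouse alleles untouched; return HLA alleles with the 'HLA-' prefix."""
    allele = allele.strip()
    if allele.startswith(MOUSE_PREFIX):
        # Mouse nomenclature is kept as is.
        return allele
    # Accept both notations by stripping the prefix if it is present.
    bare = allele[len(HLA_PREFIX):] if allele.startswith(HLA_PREFIX) else allele
    if not HLA_PATTERN.match(bare):
        raise ValueError(f"Unrecognized allele: {allele!r}")
    return HLA_PREFIX + bare


def normalize_alleles(value):
    """Normalize a ';'-separated allele string."""
    return ";".join(normalize_allele(a) for a in value.split(";") if a.strip())
```

With this, `normalize_alleles("A*01:01;HLA-A*02:01")` gives `"HLA-A*01:01;HLA-A*02:01"`, which is the form check_requested_models.py is reported to expect above.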

@ggabernet
Member

Done in #124
