Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing stitchr for use in other scripts - obtain stitched aa sequence without C region information #35

Closed
dcarbajo opened this issue Oct 9, 2023 · 5 comments

Comments

@dcarbajo
Copy link

dcarbajo commented Oct 9, 2023

I am actually trying to run stitchr inside a Shiny R app, for which I use reticulate to run it inside a conda environment.

For that, I do the following:

reticulate::use_condaenv('myenv')
reticulate::py_install("stitchr", pip = TRUE)

Then I create a run_stitchr.py script as detailed in https://jamieheather.github.io/stitchr/importing.html

run_stitchr.py looks exactly like the one in the link above:

# import stitchr
from Stitchr import stitchrfunctions as fxn
from Stitchr import stitchr as st

# specify details about the locus to be stitched
chain = 'TRB'
species = 'HUMAN'

# initialise the necessary data
tcr_dat, functionality, partial = fxn.get_imgt_data(chain, st.gene_types, species)
codons = fxn.get_optimal_codons('', species)

# provide details of the rearrangement to be stitched
tcr_bits = {'v': 'TRBV7-3*01', 'j': 'TRBJ1-1*01', 'cdr3': 'CASSYLQAQYTEAFF',
            'l': 'TRBV7-3*01', 'c': 'TRBC1*01',
            'skip_c_checks': False, 'species': species, 'seamless': False,
            '5_prime_seq': '', '3_prime_seq': '', 'name': 'TCR'}

# then run stitchr on that rearrangement
stitched = st.stitch(tcr_bits, tcr_dat, functionality, partial, codons, 3, '')

print(stitched)

and I run it as follows:

run_results <- reticulate::py_run_file(run_script)

This works perfectly, and run_results$stitched contains the following output:

[[1]]
[1] "TCR"             "TRBV7-3*01"      "TRBJ1-1*01"      "TRBC1*01"        "CASSYLQAQYTEAFF"
[6] "TRBV7-3*01(L)"  

[[2]]
[1] "ATGGGCACCAGGCTCCTCTGCTGGGCAGCCCTGTGCCTCCTGGGGGCAGATCACACAGGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGGGAAAATATGTAGAGCTCAGGTGTGATCCAATTTCAGGTCATACTGCCCTTTACTGGTACCGACAAAGCCTGGGGCAGGGCCCAGAGTTTCTAATTTACTTCCAAGGCACGGGTGCGGCAGATGACTCAGGGCTGCCCAACGATCGGTTCTTTGCAGTCAGGCCTGAGGGATCCGTCTCTACTCTGAAGATCCAGCGCACAGAGCGGGGGGACTCAGCCGTGTATCTCTGTGCCAGCAGCTACCTGCAGGCCCAGTACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAGAGGACCTGAACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAGGCCACACTGGTGTGCCTGGCCACAGGCTTCTTCCCCGACCACGTGGAGCTGAGCTGGTGGGTGAATGGGAAGGAGGTGCACAGTGGGGTCAGCACGGACCCGCAGCCCCTCAAGGAGCAGCCCGCCCTCAATGACTCCAGATACTGCCTGAGCAGCCGCCTGAGGGTCTCGGCCACCTTCTGGCAGAACCCCCGCAACCACTTCCGCTGTCAAGTCCAGTTCTACGGGCTCTCGGAGAATGACGAGTGGACCCAGGATAGGGCCAAACCCGTCACCCAGATCGTCAGCGCCGAGGCCTGGGGTAGAGCAGACTGTGGCTTTACCTCGGTGTCCTACCAGCAAGGGGTCCTGTCTGCCACCATCCTCTATGAGATCCTGCTAGGGAAGGCCACCCTGTATGCTGTGCTGGTCAGCGCCCTTGTGTTGATGGCCATGGTCAAGAGAAAGGATTTC"

[[3]]
[1] 0

Now here I have 2 questions:

1- using this approach, is there a possibility to obtain the aminoacid stitched sequence instead (similarly to command-line stitchr, that returns both DNA and aa sequences)?

2- I don't have C region information, but if I just write 'c': '' in the tcr_bits section of the run_stitchr.py above, I get the following:

Error: a CONSTANT sequence region has not been found for gene in the IMGT data for this chain/species. Please check your TCR and species data.

No C region information does not seem to be a problem for command-line stitchr... what is the correct way to run stitchr with no C region information using the approach above with run_stitchr.py?

Many thanks!
Daniel

@JamieHeather
Copy link
Owner

Hello hello,

Glad to see that the import option is getting some attention! It is however a relatively advanced use of the tool, so unfortunately you have to do a little more legwork to recapitulate the fuller functionality of the current scripts. To answer your questions:

1- The way that the tool works is that the core stitch function does the actual legwork of sticking the V/J/CDR3 together and then the different scripts take that information and process it to whatever the relevant format is. As this importing example only runs that st.stitch function you don't get the full output that the command line stitchr.py script produces.

However it's pretty simple to get the same information out of the results provided - you just need to translate the stitched nt sequence, which is the second item in the output, e.g.:

aa = fxn.translate_nt(stitched[1])
print(aa)
MGTRLLCWAALCLLGADHTGAGVSQTPSNKVTEKGKYVELRCDPISGHTALYWYRQSLGQGPEFLIYFQGTGAADDSGLPNDRFFAVRPEGSVSTLKIQRTERGDSAVYLCASSYLQAQYTEAFFGQGTRLTVVEDLNKVFPPEVAVFEPSEAEISHTQKATLVCLATGFFPDHVELSWWVNGKEVHSGVSTDPQPLKEQPALNDSRYCLSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQIVSAEAWGRADCGFTSVSYQQGVLSATILYEILLGKATLYAVLVSALVLMAMVKRKDF

2- stitchr always requires there to be a constant region supplied, as adding the C is one of the required parts of its function. You're right, the command line version does cope without one, but that's because it's doing some extra checks the under the hood to infer the C before running the stitch function:

For human and mouse TCRs, the script will use the TRBC gene located in the same cluster as the J gene (i.e. TRBJ1-1 through TRBJ1-6 will get TRBC1, while TRBJ2-1 through TRBJ2-7 will get TRBC2). This can be overriden (see optional arguments). Unfortunately we are not experts in TCR loci architecture of all species, so we haven’t hard-wired any other constant region assumptions, so for all other species you’ll need to explicitly state which constant region you want used.

When using the bare function you're going to have to make that determination yourself for each of your rearrangement. Alternatively if you don't care about the constant you can just specify the same C for all of them and trim it off later, or use the extra gene file to apply some arbitrary sequence in place of the C.

Hope that helps!
Jamie

@dcarbajo
Copy link
Author

Thanks for the info!

@JamieHeather
Copy link
Owner

No probs, my pleasure - glad to see people using the tool!

@dcarbajo
Copy link
Author

dcarbajo commented Dec 4, 2023

Sorry to re-float this, I am trying to run Stitchr inside R in the same way as above, but this time trying to get rid of the python file that I create, and instead passing the arguments as variables to the stitch function.

I'm not very proficient with Python, so I think the below code needs converting the variables with reticulate, using r_to_py() or something.

this is my best attempt:

fxn <- reticulate::import("Stitchr.stitchrfunctions")
st <- reticulate::import("Stitchr.stitchr")
chain <- "TRB"
species <- "HUMAN"
output <- fxn$get_imgt_data(chain, st$gene_types, species)
tcr_dat <- output[[1]]
functionality <- output[[2]]
partial <- output[[3]]
codons <- fxn$get_optimal_codons('', species)
tcr_bits <- list(v = 'TRBV7-3*01',
                 j = 'TRBJ1-1*01',
                 cdr3 = 'CASSYLQAQYTEAFF',
                 l = 'TRBV7-3*01',
                 c = 'TRBC1*01',
                 skip_c_checks = 'False',
                 species = species,
                 seamless = 'False',
                 '5_prime_seq' = '',
                 '3_prime_seq' = '',
                 name = 'TCR')
stitched <- st$stitch(tcr_bits, tcr_dat, functionality, partial, codons, 3, '')

but I get:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  KeyError: 'TRBV7-3'

Do you have any insight on how to solve this? Thanks!

@JamieHeather
Copy link
Owner

I'm afraid that I don't use R much (or reticulate at all), and that's not a stitchr error, so I can't offer much insight here. I've confirmed that the TCR details and stitchr commands in the python example in the docs work, and it looks like they've been converted OK here.

Is the underlying data there/OK? What's the content of tcr_dat and output? Ordinarily stitchr will throw errors if it can't find the data in the right place, but I don't know how/if reticulate handles python errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants