Tcrdist working with sample data but not with my data #94
The most likely issue is that your V-gene names are not recognized. Do they
have allele-level resolution? If not, you can append "*01" to each name for an
approximate result. V-genes must match one of the values in the id column of
this file:
https://github.com/kmayerb/tcrdist3/blob/master/tcrdist/db/alphabeta_gammadelta_db.tsv
Alternatively, you can define cdr1_a_aa, cdr2_a_aa, and pmhc_a_aa yourself
instead of relying on the TCRdist initialization to infer them;
see `infer_cdrs = False`:
https://github.com/kmayerb/tcrdist3/blob/55d906b19e4c5038f5fdde843eb2edf8293efd88/tcrdist/repertoire.py#L14-L69
Can you provide 10 lines of your input data?
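For the allele suggestion above, a minimal pandas sketch might look like the following. The helper names `add_default_allele` and `unrecognized_genes` are hypothetical, and the small `known_ids` set is an illustrative stand-in; in practice the valid names would be read from the id column of the `alphabeta_gammadelta_db.tsv` file linked above. Appending "*01" is only an approximation when the true allele is unknown.

```python
import pandas as pd


def add_default_allele(df, gene_cols, allele="*01"):
    """Return a copy of df in which gene names without an allele get `allele` appended."""
    out = df.copy()
    for col in gene_cols:
        names = out[col]
        # Only touch non-missing names that do not already contain "*".
        needs = names.notna() & ~names.astype(str).str.contains("*", regex=False)
        out.loc[needs, col] = names[needs].astype(str).str.strip() + allele
    return out


def unrecognized_genes(df, gene_cols, known_ids):
    """Return the sorted gene names found in gene_cols that are not in known_ids."""
    found = set()
    for col in gene_cols:
        found.update(df[col].dropna().unique())
    return sorted(found - set(known_ids))


# Toy input (assumption): one row without alleles, one row with alleles.
df = pd.DataFrame({
    "v_a_gene": ["TRAV12-1", "TRAV1-2*01"],
    "j_a_gene": ["TRAJ33", "TRAJ20*01"],
})

# Illustrative subset only; not read from the real db file.
known_ids = {"TRAV12-1*01", "TRAV1-2*01", "TRAJ33*01", "TRAJ20*01"}

before = unrecognized_genes(df, ["v_a_gene", "j_a_gene"], known_ids)
fixed = add_default_allele(df, ["v_a_gene", "j_a_gene"])
after = unrecognized_genes(fixed, ["v_a_gene", "j_a_gene"], known_ids)
print("unrecognized before:", before)
print("unrecognized after:", after)
```

Running the check before building the `TCRrep` makes it easier to see which names would fail to match, rather than discovering the problem through a downstream error.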
On Thu, Nov 2, 2023 at 1:31 PM rutha32 wrote:
Hi, tcrdist works fine when I use the sample data, but when I try it with
other datasets, I'm getting errors. These are my columns: 'subject',
'epitope', 'count', 'v_a_gene', 'd_call', 'j_a_gene',
'cdr3_a_aa', 'cdr3_a_nucseq', 'junction', 'decombinator_id', 'rev_comp',
'productive', 'sequence_aa', 'cdr1_aa', 'cdr2_aa', 'chain', 'clone_id',
'time' (dtype='object').
This is the error I get:
ValueError: zero-size array to reduction operation maximum which has no identity
*My code*
import pandas as pd
from tcrdist.repertoire import TCRrep

# Define the file path
file_path = r'C:\Users\pythonProject\ResearchProject\alpha_TCR_all_sample_100.csv'

# Read the CSV file into a DataFrame
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame
df.head()

# Assuming you've already loaded your data into the 'df' DataFrame
tr = TCRrep(
    cell_df=df,
    organism='human',
    chains=['alpha'],
    db_file='alphabeta_gammadelta_db.tsv'
)

# Calculate pairwise distances for the alpha chain
pw_alpha = tr.pw_alpha
Thanks
Hi, thanks for the reply. I got it working when I added the "*01". I also removed some of the columns and kept only the core columns: count, v_a_gene, j_a_gene and cdr3_a_aa.
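The column trimming described above could be sketched roughly as follows. The helper names `keep_core_columns` and `build_alpha_rep` and the toy rows are assumptions for illustration; the `TCRrep` arguments are copied from the code in this thread, and that call is wrapped in a function so the pandas part runs even without tcrdist3 installed.

```python
import pandas as pd

CORE_COLS = ["count", "v_a_gene", "j_a_gene", "cdr3_a_aa"]


def keep_core_columns(df, core_cols=CORE_COLS):
    """Return a copy of df restricted to the core alpha-chain columns."""
    missing = [c for c in core_cols if c not in df.columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    return df[core_cols].copy()


def build_alpha_rep(core_df):
    """Build an alpha-chain TCRrep using the same arguments as in this thread."""
    # Imported lazily so the rest of the sketch does not require tcrdist3.
    from tcrdist.repertoire import TCRrep
    return TCRrep(
        cell_df=core_df,
        organism="human",
        chains=["alpha"],
        db_file="alphabeta_gammadelta_db.tsv",
    )


# Toy rows (assumption) with two extra columns that get dropped.
df = pd.DataFrame({
    "subject": ["s1", "s2"],
    "count": [3, 1],
    "v_a_gene": ["TRAV1-2*01", "TRAV12-1*01"],
    "j_a_gene": ["TRAJ33*01", "TRAJ20*01"],
    "cdr3_a_aa": ["CAVRDSNYQLIW", "CAVRDSNYQLIW"],
    "time": ["t0", "t1"],
})

core = keep_core_columns(df)
print(core.columns.tolist())
# tr = build_alpha_rep(core)  # requires tcrdist3
# pw_alpha = tr.pw_alpha
```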
tcrdist_alpha_sample.pdf
Hi, tcrdist works fine when I use the sample data (dash.csv), but when I try it with other datasets, I'm getting errors.
These are my columns: 'subject', 'epitope', 'count', 'v_a_gene', 'd_call', 'j_a_gene',
'cdr3_a_aa', 'cdr3_a_nucseq', 'junction', 'decombinator_id', 'rev_comp',
'productive', 'sequence_aa', 'cdr1_aa', 'cdr2_aa', 'chain', 'clone_id',
'time' (dtype='object').
This is the error I get:
ValueError: zero-size array to reduction operation maximum which has no identity
My code
import pandas as pd
from tcrdist.repertoire import TCRrep

file_path = r'C:\Users\pythonProject\ResearchProject\alpha_TCR_all_sample_100.csv'
df = pd.read_csv(file_path)
df.head()

tr = TCRrep(
    cell_df=df,
    organism='human',
    chains=['alpha'],
    db_file='alphabeta_gammadelta_db.tsv'
)
pw_alpha = tr.pw_alpha
Thanks