Tcrdist working with sample data but not with my data #94

rutha32 · 2023-11-02T20:30:57Z

Hi, tcrdist works fine when I use the sample data (dash.csv), but when I try it with other datasets, I'm getting errors.

These are my columns: 'subject', 'epitope', 'count', 'v_a_gene', 'd_call', 'j_a_gene',
'cdr3_a_aa', 'cdr3_a_nucseq', 'junction', 'decombinator_id', 'rev_comp',
'productive', 'sequence_aa', 'cdr1_aa', 'cdr2_aa', 'chain', 'clone_id',
'time' (dtype='object')

This is the error I get:
ValueError: zero-size array to reduction operation maximum which has no identity

My code
import pandas as pd
from tcrdist.repertoire import TCRrep

file_path = r'C:\Users\pythonProject\ResearchProject\alpha_TCR_all_sample_100.csv'

df = pd.read_csv(file_path)
df.head()

tr = TCRrep(
    cell_df=df,
    organism='human',
    chains=['alpha'],
    db_file='alphabeta_gammadelta_db.tsv'
)

pw_alpha = tr.pw_alpha

Thanks

kmayerb · 2023-11-03T18:37:05Z

The most likely issue is that your V-gene names are not recognized. Do they have allele-level resolution? If not, you can append "*01" for an approximate result. V-gene names must match one of the values in the id column of https://github.com/kmayerb/tcrdist3/blob/master/tcrdist/db/alphabeta_gammadelta_db.tsv

Alternatively, you can define cdr1_a_aa, cdr2_a_aa, and pmhc_a_aa yourself instead of using TCRdist initialization to infer them; see `infer_cdrs = False`: https://github.com/kmayerb/tcrdist3/blob/55d906b19e4c5038f5fdde843eb2edf8293efd88/tcrdist/repertoire.py#L14-L69

Can you provide 10 lines of your input data?
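For readers hitting the same error, the allele fix suggested in this reply can be sketched roughly as follows. This is a minimal illustration, not part of tcrdist3: `add_default_allele` and `unrecognized_genes` are hypothetical helper names, and `known_ids` is a small hand-written stand-in for the ids you would load from the reference file yourself.

```python
def add_default_allele(gene, allele="*01"):
    """Append a default allele suffix when a gene name has none."""
    if not isinstance(gene, str):
        return gene  # leave missing values (e.g. NaN) untouched
    gene = gene.strip()
    if not gene or "*" in gene:
        return gene  # empty, or already has allele-level resolution
    return gene + allele


def unrecognized_genes(genes, known_ids):
    """Return the sorted gene names that are absent from the reference ids."""
    return sorted({g for g in genes if isinstance(g, str)} - set(known_ids))


# Hypothetical values for illustration only; in practice the gene names
# come from your v_a_gene / j_a_gene columns and known_ids from the db file.
raw = ["TRAV12-1", "TRAV1-2*01", "TRAV9-2"]
known_ids = {"TRAV12-1*01", "TRAV1-2*01", "TRAV9-2*01"}

fixed = [add_default_allele(g) for g in raw]
print(unrecognized_genes(raw, known_ids))    # names that would not match
print(unrecognized_genes(fixed, known_ids))  # names still unmatched after the fix
```

With a pandas DataFrame, the same helper could be applied before building the TCRrep object, for example `df['v_a_gene'] = df['v_a_gene'].map(add_default_allele)`. Note that defaulting to "*01" is an approximation, as the reply above says; any names still reported as unrecognized need to be corrected by hand.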


rutha32 · 2023-11-06T17:49:05Z

Hi, thanks for the reply. I got it working after adding "*01". I also removed some of the columns and kept only the core ones: count, v_a_gene, j_a_gene, and cdr3_a_aa.

rutha32 closed this as completed Nov 6, 2023
