List of Tools #15

grst · 2020-04-09T14:45:38Z

In GitLab by @grst on Jan 30, 2020, 12:59

Tools are functions that work with the data parsed from 10x/tracer and add one of the following:

new columns to obs
new matrices to obsm (e.g. distance matrices)
other summary data to uns.

They are usually required as an additional processing step before running certain plotting functions.
Here's a list of tools we want to implement.

@szabogtamas, feel free to add to/edit the list.
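To make the obsm case concrete, here is a minimal sketch of a distance tool. It assumes each cell is represented by a single CDR3 amino-acid sequence and uses plain Levenshtein (edit) distance as a stand-in. It is not TCRdist and not the Kidera distance. It returns a cell-by-cell matrix whose row i belongs to cell i, which is the shape an obsm entry needs (first dimension = number of cells). The function names are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def distance_matrix(seqs: Sequence[str]) -> List[List[int]]:
    """Symmetric n_cells x n_cells matrix; row i belongs to cell i (obsm-style)."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = levenshtein(seqs[i], seqs[j])
    return dist
```

In practice the result would be a NumPy array stored as something like adata.obsm["tcr_dist"]; nested lists just keep the sketch dependency-free.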

List of tools

st.tl.define_clonotypes(adata) assigns clonotypes to cells based on their CDR3 sequences
st.tl.tcr_dist(adata, chains=["TRA_1", "TRB_1"], combination=np.min) adds TCR dist to obsm (TCR dist #11)
st.tl.kidera_dist adds Kidera distances to obsm
st.tl.chain_convergence(adata, groupby) adds column to obs that contains the number of nucleotide versions for each CDR3 AA sequence
st.tl.alpha_diversity(adata, groupby, diversityforgroup) So far we have only thought about calculating the diversity of clonotypes in different groups, but the diversity of any grouping could be calculated just as well.
st.tl.sequence_logos(adata, ?forgroup?) Precompute MSAs and sequence logos for plotting with st.pl.sequence_logos.
st.tl.dendrogram(adata, groupby) Compute a dendrogram on an arbitrary distance matrix (e.g. from tcr_dist).
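A minimal pandas sketch of two of the obs-writing tools above. It assumes the CDR3 sequences live in obs columns named TRA_1_cdr3, TRB_1_cdr3 and TRB_1_cdr3_nt. These are hypothetical names modeled on the TRA_1/TRB_1 convention used in this thread. The functions take a plain DataFrame standing in for adata.obs and return a copy. The sketch of chain_convergence leaves out the groupby argument.

```python
import pandas as pd


def define_clonotypes(obs: pd.DataFrame,
                      cdr3_cols=("TRA_1_cdr3", "TRB_1_cdr3")) -> pd.DataFrame:
    """Give cells with identical CDR3 sequences the same clonotype id."""
    obs = obs.copy()
    # Build one string key per cell; missing chains become "" so they still match.
    key = obs[cdr3_cols[0]].fillna("").astype(str)
    for col in cdr3_cols[1:]:
        key = key + "|" + obs[col].fillna("").astype(str)
    # factorize numbers the unique keys in order of first appearance.
    obs["clonotype"] = pd.factorize(key)[0]
    return obs


def chain_convergence(obs: pd.DataFrame, aa_col: str = "TRB_1_cdr3",
                      nt_col: str = "TRB_1_cdr3_nt") -> pd.DataFrame:
    """Count distinct nucleotide sequences encoding each CDR3 amino-acid sequence."""
    obs = obs.copy()
    # transform broadcasts the per-sequence count back to every cell.
    obs["chain_convergence"] = obs.groupby(aa_col)[nt_col].transform("nunique")
    return obs
```

A real tool would more likely write the new column into adata.obs in place. Returning a copy just keeps the sketch easy to test.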

Needs discussion

st.tl.create_group(group_membership={"Group1": ["barcode1", "barcode2"]}) adds a group membership to each cell by adding a column to obs and the name of the grouping to a list in uns. By default, groupings based on samples, V gene usage and even clonotypes could be created on the initial run. It might also call the chain_convergence and alpha_diversity functions to calculate these measures right when a group is created.

Ideas, might be implemented at later stage

Shared Kmers
GLIPH
Chains recognizing the same epitopes based on McPAS-TCR
epitope reactivity -> query external database
tcellmatch (Fischer, Theis et al.)


grst · 2020-04-09T14:45:41Z

In GitLab by @szabogtamas on Jan 30, 2020, 15:27

changed the description

grst · 2020-04-09T14:45:44Z

In GitLab by @grst on Jan 30, 2020, 15:52

st.tl.create_group(group_membership={"Group1": ["barcode1", "barcode2"]})

A common pattern for handling this with scanpy/anndata is to add another column to obs, like this:

adata.obs["cell_type"] = "na"  # initialize a default value for all cells
adata.obs.loc[["barcode1", "barcode2"], "cell_type"] = "CD8+ T cells"  # label the selected barcodes

So I don't think we need a tool for that.

grst · 2020-04-09T14:45:47Z

In GitLab by @szabogtamas on Jan 31, 2020, 10:03

The reason I was thinking about a separate tool was that diversity (and also convergence) could be calculated at the same time a group is created.

Knowing which columns in obs are groupings would be useful when we want to look at some information across all possible groupings (e.g. CDR3 length by sample, V genes, receptor pairing status, cell types, etc.): it would be convenient to just loop through the grouping columns and make the same plot for each.
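A rough sketch of that idea. It assumes uns is a plain dict and stores the list of groupings under a hypothetical tcr_groupings key. obs is a pandas DataFrame standing in for adata.obs. The per-group mean is only a placeholder for whatever a plotting function would show.

```python
from typing import Any, Dict, List, MutableMapping

import pandas as pd


def register_grouping(obs: pd.DataFrame, uns: MutableMapping[str, Any], column: str) -> None:
    """Remember that `column` of obs is a grouping (stored under a hypothetical uns key)."""
    if column not in obs.columns:
        raise KeyError(f"{column!r} is not a column of obs")
    groupings: List[str] = uns.setdefault("tcr_groupings", [])
    if column not in groupings:  # registering the same column twice is a no-op
        groupings.append(column)


def summarize_all_groupings(obs: pd.DataFrame, uns: MutableMapping[str, Any],
                            value_col: str) -> Dict[str, Dict[Any, float]]:
    """Compute the same per-group summary (mean of value_col) for every registered grouping."""
    return {
        grouping: obs.groupby(grouping)[value_col].mean().to_dict()
        for grouping in uns.get("tcr_groupings", [])
    }
```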

grst · 2020-04-09T14:45:51Z

In GitLab by @grst on Jan 31, 2020, 10:11

changed the description

grst · 2020-04-09T14:45:54Z

In GitLab by @grst on Jan 31, 2020, 10:18

I'm not a big fan of implicitly computing stuff that might not even be needed by the user.

Also, in such a case you can simply do

for group in ["TRA_1_cdr3_len", "TRB_1_cdr3_len", "TRA_1_v_gene", ...]:
   st.pl.cdr3_length(adata, group=group)

Or we might want to support an API that accepts multiple groups, as scanpy does for the color parameter of umap:

st.pl.cdr3_length(adata, group=["TRA_1_cdr3_len", "TRB_1_cdr3_len", "TRA_1_v_gene", ...])

But we can discuss that in more detail the next time we meet.
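A sketch of the multi-group variant, loosely modeled on how scanpy's color argument accepts either a string or a list. cdr3_length and _cdr3_length_single are hypothetical here. The single-group function is only a stub that returns a label where a real implementation would draw one panel.

```python
from typing import Any, List, Sequence, Union


def as_group_list(group: Union[str, Sequence[str]]) -> List[str]:
    """Accept either a single column name or a list of column names."""
    # A bare string is itself a sequence, so wrap it instead of iterating over it.
    if isinstance(group, str):
        return [group]
    return list(group)


def _cdr3_length_single(adata: Any, group: str) -> str:
    # Stub: a real implementation would draw one panel for `group` here.
    return f"panel for {group}"


def cdr3_length(adata: Any, group: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical multi-group front end: one panel per requested grouping."""
    return [_cdr3_length_single(adata, g) for g in as_group_list(group)]
```

With this shape, the explicit loop and the list form produce the same panels, so both calling styles can coexist.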

grst · 2020-04-09T14:45:57Z

In GitLab by @szabogtamas on Jan 31, 2020, 10:34

It definitely makes sense to have a tool function that precomputes data and a plotting function to visualize it. My only concern was that in our case, much of that information is never reused by anything other than the actual plot. What I would see as a great saving here, however, is writing the computed values to a table (or JSON) so that the plot can be generated (and, more importantly, modified to match a given visual style, remove groups, add p-values, etc.) by just parsing a small file instead of loading the whole big obs table... Just wondering.
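A minimal sketch of the "small file" idea using only the standard library. It assumes the group-level statistics are already a JSON-serializable dict. The function names are hypothetical. drop_groups illustrates an edit, such as removing groups, that could be applied to the small file without reloading obs.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Union


def save_plot_data(stats: Dict[str, Any], path: Union[str, Path]) -> None:
    """Write precomputed group-level statistics to a small JSON file."""
    # sort_keys and indent keep the file stable and easy to edit by hand.
    Path(path).write_text(json.dumps(stats, indent=2, sort_keys=True))


def load_plot_data(path: Union[str, Path]) -> Dict[str, Any]:
    """Read the statistics back without touching the (large) obs table."""
    return json.loads(Path(path).read_text())


def drop_groups(values: Dict[str, float], exclude: Iterable[str]) -> Dict[str, float]:
    """Remove groups from a {group: value} mapping before plotting."""
    excluded = set(exclude)
    return {group: value for group, value in values.items() if group not in excluded}
```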

So let us stick to the canonical way and write tool functions, even if it seems redundant. Since they will mostly compute group-level statistics, I would suggest adding the results to uns.

Should this be a nested dictionary in uns, like the following?

{sample_grouping: {name: 'Samples',
                   groups: {g1: 'Sample 1', g2: 'Sample 2'},
                   diversities: {div_by_clonotypes: {name: 'Repertoire diversity', values: {g1: 2.3, g2: 2.7}}},
                   convergences: {conv_by_clonotypes: {name: 'Repertoire convergence'}},
                   cdrlengths: {cdrlength: {name: 'Length distribution of CDR regions', values: {g1: [6, 9, 11], g2: [6, 9, 6]}}},
                   pairingratios: {cpr_by_clonotypes: {name: 'Ratio of unconventional number of chains', labels: {orphan_alpha: 'Alpha chain only', orphan_beta: 'Beta chain only'}, values: {orphan_alpha: 0.3, orphan_beta: 0.7}}}}}
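A sketch of how one entry of such a nested dictionary could be filled, assuming each cell is given as a (group, clonotype) pair. The thread does not specify a diversity index, so this sketch uses Shannon entropy (natural log), one common alpha-diversity measure. Key names follow the example above, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Any, Dict, Hashable, Iterable, Tuple


def shannon_entropy(counts: Iterable[int]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i) of a vector of counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def alpha_diversity_entry(cells: Iterable[Tuple[str, Hashable]], grouping_name: str,
                          labels: Dict[str, str]) -> Dict[str, Any]:
    """Build one nested entry holding the clonotype diversity of each group."""
    per_group: Dict[str, Counter] = defaultdict(Counter)
    for group, clonotype in cells:
        per_group[group][clonotype] += 1  # number of cells per clonotype in this group
    return {
        "name": grouping_name,
        "groups": {g: labels.get(g, g) for g in per_group},
        "diversities": {
            "div_by_clonotypes": {
                "name": "Repertoire diversity",
                "values": {g: shannon_entropy(c.values()) for g, c in per_group.items()},
            }
        },
    }
```

The result could then be stored as, for example, uns["sample_grouping"] = alpha_diversity_entry(...).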

Or should we go for objects stored in the uns?

grst · 2020-04-09T14:46:00Z

In GitLab by @grst on Jan 31, 2020, 10:40

Or should we go for objects stored in the uns?

This is not a good idea, since it cannot be stored by AnnData (yet). See the discussion at scverse/anndata#115.
We should stick to standard Python dictionaries here.

Should this be a nested dictionary in uns

That's probably the way to go.

grst · 2020-04-09T14:46:03Z

In GitLab by @szabogtamas on Jan 31, 2020, 10:46

Well, I am not against dictionaries, and in Python it usually ends up like this for me: objects are nice, but let's just stay with a dictionary...

We only have to agree on a structure for the dictionary then. But I guess this will be flexible in the beginning and evolve as more tool functions are implemented.

grst · 2020-04-09T14:46:06Z

In GitLab by @grst on Jan 31, 2020, 10:50

I would go for one entry for each tool. What's done within this entry is flexible and can be decided on a tool-by-tool basis.
Also, because scanpy uses uns as well, I would prefix every entry with tcr_ to avoid name conflicts.

Example

adata.uns["tcr_alpha_diversity"] = { ... }
adata.uns["tcr_sequence_logos"] = { ... }

Alternatively, we could go for another sub-dictionary:

if "sctcrpy" not in adata.uns:  # create the sub-dictionary on first use
    adata.uns["sctcrpy"] = {}
adata.uns["sctcrpy"]["alpha_diversity"] = { ... }
adata.uns["sctcrpy"]["sequence_logos"] = { ... }
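Either layout can be wrapped in a small helper so that every tool writes its result the same way. This is a sketch that assumes uns behaves like a plain dict. The helper names are hypothetical.

```python
from typing import Any, Dict, MutableMapping


def set_tool_result(uns: MutableMapping[str, Any], tool: str, result: Dict[str, Any],
                    namespace: str = "sctcrpy") -> None:
    """Sub-dictionary layout: uns[namespace][tool] = result."""
    # setdefault creates the namespace on the first write and reuses it afterwards.
    uns.setdefault(namespace, {})[tool] = result


def set_tool_result_prefixed(uns: MutableMapping[str, Any], tool: str,
                             result: Dict[str, Any], prefix: str = "tcr_") -> None:
    """Prefix layout: uns[prefix + tool] = result."""
    uns[prefix + tool] = result
```

The sub-dictionary variant keeps all package output under one key, while the prefixed variant keeps uns flat. Existing scanpy entries are untouched in both cases.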

grst · 2020-04-09T14:46:09Z

In GitLab by @szabogtamas on Jan 31, 2020, 10:54

Yes, we can discuss this later; we don't need to deal with it right now. It is probably also a matter of which level of users we want to support: this idea was heading towards automatically generating an exploratory report that the user can refine, but that points out right away some issues worth investigating. At this point, we should just leave it.

grst · 2020-04-09T14:46:12Z

In GitLab by @szabogtamas on Jan 31, 2020, 10:57

I would prefer the sub-dictionary just because it saves us the prefix. But this is not crucial.

grst · 2020-04-09T14:46:15Z

In GitLab by @grst on Feb 2, 2020, 19:48

marked the task st.tl.alpha_diversity(adata, groupby, diversityforgroup) Now we were only thinking about calculating diversity of clonotypes in different groups. But the diversity of any group could just as well be calculated. as completed

grst · 2020-04-09T14:46:19Z

In GitLab by @grst on Feb 4, 2020, 14:14

changed the description

grst · 2020-04-09T14:46:22Z

In GitLab by @szabogtamas on Feb 12, 2020, 13:47

marked the task st.tl.tcr_dist(adata, chains=["TRA_1", "TRB_1"], combination=np.min) adds TCR dist to obsm (#11) as completed

grst · 2020-04-09T14:46:25Z

In GitLab by @szabogtamas on Feb 12, 2020, 13:47

marked the task st.tl.kidera_dist adds Kidera distances to obsm as completed

grst · 2020-04-09T14:46:28Z

In GitLab by @szabogtamas on Feb 12, 2020, 13:47

marked the task st.tl.chain_convergence(adata, groupby) adds column to obs that contains the number of nucleotide versions for each CDR3 AA sequence as completed

grst · 2020-04-09T14:46:31Z

In GitLab by @szabogtamas on Feb 12, 2020, 13:47

closed

grst closed this as completed Apr 9, 2020
