Copyright 2021 Regeneron Pharmaceuticals Inc. All rights reserved.

License for Non-Commercial Use of TCRAI code

All files in this repository (“source code”) are licensed under the following terms below:

“You” refers to an academic institution or academically employed full-time personnel only. 

“Regeneron” refers to Regeneron Pharmaceuticals, Inc.

Regeneron hereby grants You a right to use, reproduce, modify, or distribute the source code to the TCRAI algorithms, in whole or in part, whether in original or modified form, for academic research purposes only.  The foregoing right is royalty-free, worldwide, revocable, non-exclusive, and non-transferable.  

Prohibited Uses:  The rights granted herein do not include any right to use by commercial entities or commercial use of any kind, including, without limitation, any integration into other code or software that is used for further commercialization, any reproduction, copy, modification or creation of a derivative work that is then incorporated into a commercial product or service or otherwise used for any commercial purpose, or distribution of the source code not in conformity with the restrictions set forth above, whether in whole or in part and whether in original or modified form, and any such commercial usage is not permitted.  

Except as expressly provided for herein, nothing in this License grants to You any right, title or interest in and to the intellectual property of Regeneron (either expressly or by implication or estoppel).  Notwithstanding anything else in this License, nothing contained herein shall limit or compromise the rights of Regeneron with respect to its own intellectual property or limit its freedom to practice and to develop its products and product candidates.

If the source code, whole or in part and in original or modified form, is reproduced, shared or distributed in any manner, it must (1) identify Regeneron Pharmaceuticals, Inc. as the original creator, and (2) include the terms of this License.  

UNLESS OTHERWISE SEPARATELY AGREED UPON, THE SOURCE CODE IS PROVIDED ON AN AS-IS BASIS, AND REGENERON PHARMACEUTICALS, INC. MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE SOURCE CODE, IN WHOLE OR IN PART AND IN ORIGINAL OR MODIFIED FORM, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHER REPRESENTATIONS OR WARRANTIES. THIS INCLUDES, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. 

In no case shall Regeneron be liable for any loss, claim, damage, or expenses, of any kind, which may arise from or in connection with this License or the use of the source code. You shall indemnify and hold Regeneron and its employees harmless from any loss, claim, damage, expenses, or liability, of any kind, from a third-party which may arise from or in connection with this License or Your use of the source code. 

You agree that this License and its terms are governed by the laws of the State of New York, without regard to choice of law rules or the United Nations Convention on the International Sale of Goods.

Please reach out to Regeneron Pharmaceuticals Inc./Administrator relating to any non-academic or commercial use of the source code.

### Notebook for plotting Gene Usage diagrams

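Note: running this notebook also requires plotly: pip install plotly. Saving figures with fig.write_image additionally needs a static-image export backend such as kaleido (pip install kaleido), depending on the installed plotly version.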

In [1]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import os

Define where to load the TCR data for each cluster from, and where to save the output.

motif_dir0 should be set to the location of the saved clusters for a dataset. scripts/motif_kmeans.py automatically saves its results in a directory structure that this notebook can read directly.

In [2]:
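motif_dir0 = '../analysis/motif/CNN-prediction-with-REGN-pilot-version-2'
output_dir = os.path.join(os.path.expanduser('~'),'TCRAI_OUTPUT','gene_plots')
# Create the output directory up front so that saving figures does not fail on a missing path
os.makedirs(output_dir, exist_ok=True)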
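
The loading cell further down looks for one CSV per cluster at motif_dir0/<pmhc>/cluster_<n>/<pmhc>_df_cluster<n>.csv. The sketch below spells out that layout and lists which cluster files are actually present before plotting. The helper names cluster_csv_path and available_clusters, and the max_clusters default, are assumptions made for illustration; they are not part of the TCRAI code.

```python
import os

def cluster_csv_path(motif_dir0, pmhc, cluster):
    """Hypothetical helper: path of the CSV for one cluster, following the layout read by this notebook."""
    cluster_dir = os.path.join(motif_dir0, pmhc, 'cluster_' + str(cluster))
    return os.path.join(cluster_dir, pmhc + '_df_cluster' + str(cluster) + '.csv')

def available_clusters(motif_dir0, pmhc, max_clusters=5):
    """Hypothetical helper: indices of the clusters whose CSV file exists on disk."""
    return [c for c in range(max_clusters)
            if os.path.isfile(cluster_csv_path(motif_dir0, pmhc, c))]
```

Calling available_clusters(motif_dir0, 'GLCTLVAML') before the plotting loop is one way to see which clusters will be skipped because no file was saved for them.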

In [3]:
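def save_figure(df_cluster, pmhc, cluster, counts=1):
    """Plot the V/J gene usage of one cluster as a parallel-categories diagram and save it as a PDF in output_dir."""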
    fig = go.Figure(go.Parcats(
        dimensions=[
            {'label': 'TRBV',
             'values': df_cluster['TRB_v_gene'].values},
            {'label': 'TRBJ',
             'values': df_cluster['TRB_j_gene'].values},
            {'label': 'TRAV',
             'values': df_cluster['TRA_v_gene'].values},
            {'label': 'TRAJ',
             'values': df_cluster['TRA_j_gene'].values},
        ],
        line={'color': pd.factorize(df_cluster['TRB_v_gene'])[0],  # one integer color code per TRBV gene
              'colorscale': 'Rainbow',
              'shape': 'hspline'},
        bundlecolors=True,
        sortpaths='forward',
        labelfont=go.parcats.Labelfont(size=14),
        tickfont=go.parcats.Tickfont(size=11),
        counts=counts
    ))

    fig.update_layout(
        autosize=False,
        width=500,
        height=200,
        margin=dict(
            l=40,
            r=40,
            b=40,
            t=40,
            pad=4
        ),
    )
    fig.write_image(os.path.join(output_dir, pmhc+'_cluster'+str(cluster)+'.pdf'))

In [4]:
for pmhc in ['GLCTLVAML','GILGFVFTL']:
    for cluster in range(5):
        motif_dir = os.path.join(motif_dir0, pmhc, 'cluster_'+str(cluster))

        try:
            df_cluster = pd.read_csv(os.path.join(motif_dir, pmhc+'_df_cluster'+str(cluster)+'.csv'))
        except FileNotFoundError:
            # Skip clusters for which no file was saved
            continue

        # Concatenate the four gene names into a single key per TCR
        df_cluster['joint_genes'] = (df_cluster['TRB_v_gene'] + df_cluster['TRB_j_gene']
                                     + df_cluster['TRA_v_gene'] + df_cluster['TRA_j_gene'])

        # Number of TCRs in the cluster that share each gene combination
        gene_counts = df_cluster['joint_genes'].value_counts()

        df_cluster['size'] = df_cluster['joint_genes'].map(gene_counts)
        print(pmhc,cluster)
        save_figure(df_cluster, pmhc, cluster)
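        # For GILGFVFTL cluster 0, redraw the figure with only the 30 most frequent
        # gene combinations; this overwrites the file saved just above.
        if pmhc == 'GILGFVFTL' and cluster == 0:
            joint_counts = df_cluster['joint_genes'].value_counts()
            print(len(joint_counts))
            top_combinations = joint_counts.index[:30]
            save_figure(df_cluster[df_cluster['joint_genes'].isin(top_combinations)],
                        pmhc,
                        cluster,
                        counts=1)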


GLCTLVAML 0
GLCTLVAML 1
GLCTLVAML 2
GLCTLVAML 3
GLCTLVAML 4
GILGFVFTL 0
235
GILGFVFTL 1
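
To make the counting step in the loop above easier to follow, here is a minimal sketch on a toy table. The four rows are made-up values chosen only for illustration and do not come from the TCRAI data; top_n is a hypothetical stand-in for the value 30 used above.

```python
import pandas as pd

# Toy input: four TCRs, two of which share the same gene combination (illustrative values only)
toy = pd.DataFrame({
    'TRB_v_gene': ['TRBV19', 'TRBV19', 'TRBV20-1', 'TRBV19'],
    'TRB_j_gene': ['TRBJ2-7', 'TRBJ2-7', 'TRBJ1-1', 'TRBJ2-7'],
    'TRA_v_gene': ['TRAV27', 'TRAV27', 'TRAV5', 'TRAV38-1'],
    'TRA_j_gene': ['TRAJ42', 'TRAJ42', 'TRAJ31', 'TRAJ42'],
})

# Same key construction as in the notebook: concatenate the four gene names
toy['joint_genes'] = (toy['TRB_v_gene'] + toy['TRB_j_gene']
                      + toy['TRA_v_gene'] + toy['TRA_j_gene'])

# value_counts sorts by frequency, most common combination first
joint_counts = toy['joint_genes'].value_counts()

# Attach to each row the number of TCRs sharing its gene combination
toy['size'] = toy['joint_genes'].map(joint_counts)

# Keep only the rows belonging to the top_n most frequent combinations
top_n = 1
top = toy[toy['joint_genes'].isin(joint_counts.index[:top_n])]

print(len(joint_counts))
print(toy['size'].tolist())
```

With this toy input there are three distinct gene combinations, and the filter with top_n = 1 keeps the two rows that share the most frequent one.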
