In [1]:
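from dotenv import load_dotenv

# Load API keys (for example OPENAI_API_KEY) from a local .env file
# before any client is created.
load_dotenv()

import pandas as pd
import scanpy as sc
import scllm as sl
from IPython.display import Markdown, display
from langchain_openai import ChatOpenAI


def pprint(text):
    """Render a Markdown string in the notebook output."""
    display(Markdown(text))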


In [2]:
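openai_model = "gpt-4o-mini"
# temperature=0.0 keeps the model output as repeatable as possible
llm = ChatOpenAI(temperature=0.0, model=openai_model)
# Preprocessed 3k PBMC dataset bundled with scanpy
pbmc = sc.datasets.pbmc3k_processed()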

In [3]:
sc.tl.leiden(pbmc)


 To achieve the future defaults please pass: flavor="igraph" and n_iterations=2.  directed must also be False to work with igraph's implementation.
  sc.tl.leiden(pbmc)


## Gather Descriptions

In [5]:
m0 = sl.tl.ClusterAnnotationDescription('leiden')

In [5]:
r0 = m0.fit(pbmc, llm)

Cluster 0 (the first entry, `r0[0]`)

In [6]:
pprint(r0[0]['target'])

The list of genes you provided includes a mix of ribosomal protein genes (RPS and RPL genes), T-cell markers (CD3D, CD3E, IL7R), and other genes that are often associated with immune responses and cellular metabolism (such as LDHB and IL32). 

1. **Ribosomal Proteins (RPS and RPL)**: The presence of many ribosomal protein genes (RPS and RPL) suggests that the cells are likely to be actively translating proteins, which is characteristic of many cell types, but particularly those that are rapidly proliferating or have high metabolic activity.

2. **T-cell Markers**: The genes CD3D, CD3E, and IL7R are well-known markers for T-cells. CD3 is part of the T-cell receptor complex, and IL7R is important for T-cell development and homeostasis. 

3. **Immune Response Genes**: The presence of IL32, which is involved in immune responses, further supports the idea that these cells are part of the immune system.

Given this combination of ribosomal proteins and T-cell markers, it is likely that the cell type represented by these genes is **T-cells**, possibly activated or proliferating T-cells, such as CD4+ helper T-cells or CD8+ cytotoxic T-cells. The expression of these genes could also indicate a specific state of activation or differentiation within the T-cell lineage.

In [7]:
pprint(r0[2]['target'])

The list of genes you provided is primarily associated with B cells and antigen-presenting cells, particularly those involved in the immune response. Here are some key points regarding the genes:

1. **Major Histocompatibility Complex (MHC) Class II Genes**: 
   - Genes such as **HLA-DRA**, **HLA-DPB1**, **HLA-DRB1**, **HLA-DPA1**, **HLA-DQA1**, **HLA-DQB1**, **HLA-DMA**, **HLA-DMB**, and **HLA-DRB5** are part of the MHC class II molecules, which are crucial for presenting antigens to CD4+ T cells. These genes are typically expressed in professional antigen-presenting cells (APCs) like dendritic cells, macrophages, and B cells.

2. **B Cell Markers**:
   - Genes like **CD79A**, **CD79B**, **MS4A1** (also known as CD20), **FCRLA**, **CD37**, and **BLK** are markers associated with B cells. They play roles in B cell receptor signaling and development.

3. **Other Immune-Related Genes**:
   - **TCL1A** is involved in T cell activation and is often associated with certain types of lymphomas.
   - **BANK1** is involved in B cell signaling and is associated with autoimmune diseases.
   - **FCER2** (CD23) is a low-affinity IgE receptor expressed on B cells and is involved in B cell activation and regulation.

4. **Additional Genes**:
   - **HVCN1** is involved in the regulation of reactive oxygen species in B cells.
   - **LTB** (lymphotoxin beta) is important for lymphoid organ development and B cell function.
   - **TSPAN13** and **EAF2** are less specific but can be involved in various cellular processes.

Given this information, the cell type that these genes likely refer to is **B cells**, particularly activated B cells or germinal center B cells, which are involved in the adaptive immune response. The presence of MHC class II genes also suggests that these B cells may be functioning as antigen-presenting cells.
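The descriptions returned under `target` are free-form Markdown, and in the outputs shown here the predicted cell type appears in bold near the end. A minimal sketch for pulling the bold spans out of such a string follows. `bold_spans` is a hypothetical helper, not part of scllm, and it assumes the `**...**` convention seen in these outputs.

```python
import re


def bold_spans(markdown_text):
    """Return the text of every **bold** span, in order of appearance."""
    return re.findall(r"\*\*(.+?)\*\*", markdown_text)


# Shortened excerpt of the B cell description shown above.
sample = (
    "Given this information, the cell type that these genes likely refer to "
    "is **B cells**, particularly activated B cells."
)
print(bold_spans(sample))
```

Gene names and list titles are bold too in the full descriptions, so the result is better treated as a hint than as a parsed label.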

## Multiple terms

In [4]:
m1 = sl.tl.ClusterAnnotationTerms('leiden')

In [5]:
r1 = m1.fit(pbmc, llm)

In [6]:
len(r1)

8

In [11]:
r1[0]['term']

['T cell activation',
 'Regulation of immune response',
 'Ribosome biogenesis',
 'Cellular response to cytokine stimulus',
 'Translation']
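Each record in `r1` carries a list of terms under the `term` key. One way to see which terms recur across clusters is to count them. The sketch below uses stand-in records (the first list is copied from the output above, the second is invented purely for illustration) and assumes nothing about scllm beyond the `term` key; `term_counts` is a hypothetical helper.

```python
from collections import Counter

# Stand-in for r1: the first list is copied from the output above,
# the second is invented purely for illustration.
records = [
    {"term": ["T cell activation", "Regulation of immune response",
              "Ribosome biogenesis", "Cellular response to cytokine stimulus",
              "Translation"]},
    {"term": ["Antigen processing and presentation", "Translation"]},
]


def term_counts(results):
    """Count in how many records each term appears."""
    counts = Counter()
    for record in results:
        counts.update(set(record["term"]))  # set(): count each term once per record
    return counts


counts = term_counts(records)
print(counts.most_common(1))
```

Terms that show up in most clusters, such as generic translation terms, are probably less useful for telling clusters apart than terms that appear only once.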

In [8]:
m2 = sl.tl.ClusterAnnotation('leiden')

TypeError: Can't instantiate abstract class ClusterAnnotation without an implementation for abstract method '_get_chain'

In [8]:
m2.fit(pbmc, llm)

[{'group': '0',
  'data': ['LDHB',
   'CD3D',
   'RPS25',
   'RPS27',
   'RPS27A',
   'RPS12',
   'RPL31',
   'LTB',
   'CD3E',
   'RPL9',
   'RPL30',
   'RPS29',
   'RPS3',
   'RPS6',
   'MALAT1',
   'RPS15A',
   'RPL21',
   'RPS3A',
   'RPLP2',
   'RPL23A',
   'RPL3',
   'IL7R',
   'RPS20',
   'IL32',
   'RPSA',
   'RPS14',
   'RPL32',
   'RPL13',
   'RPL27A',
   'TPT1'],
  'init': 0,
  'term': 'T cell',
  'features': ['CD3D',
   'CD3E',
   'IL7R',
   'LTB',
   'RPS25',
   'RPS27',
   'RPS12',
   'RPL31',
   'RPL9',
   'RPL30',
   'RPS29',
   'RPS3',
   'RPS6',
   'RPS15A',
   'RPL21',
   'RPS3A',
   'RPLP2',
   'RPL23A',
   'RPL3',
   'RPS20',
   'RPSA',
   'RPS14',
   'RPL32',
   'RPL13',
   'RPL27A',
   'TPT1',
   'MALAT1',
   'IL32']},
 {'group': '1',
  'data': ['FTL',
   'LYZ',
   'TYROBP',
   'S100A9',
   'CST3',
   'S100A8',
   'FTH1',
   'LGALS1',
   'S100A6',
   'FCN1',
   'S100A4',
   'AIF1',
   'LGALS2',
   'LST1',
   'GSTP1',
   'TYMP',
   'GPX1',
   'CTSS',
   'OAZ1',
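The records shown above pair each `group` with a single `term`, which is what is needed to label cells. A minimal sketch of turning such records into a lookup table follows. The two records are abbreviated stand-ins, the second term is a placeholder because the output above is cut off before it, and `group_to_term` is a hypothetical helper name.

```python
def group_to_term(results):
    """Map each cluster label to its annotated term."""
    return {record["group"]: record["term"] for record in results}


# Abbreviated stand-ins for the records above.
records = [
    {"group": "0", "term": "T cell"},
    {"group": "1", "term": "<term for group 1>"},  # placeholder: not shown in the output
]

mapping = group_to_term(records)
print(mapping["0"])
```

With a mapping like this, `pbmc.obs['leiden'].map(mapping)` should give a per-cell label column, assuming the cluster labels in `obs` are the same strings as the `group` values.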
  

In [12]:
m3  = sl.tl.FactorAnnotation(
    'PCs',
)

In [13]:
m3._prepare(pbmc)[0].keys()

dict_keys(['data', 'factor', 'genes', 'sign', 'init'])

In [14]:
r3 = m3.fit(pbmc, llm)

In [15]:
pd.DataFrame(r3)

| | data | factor | genes | sign | init | term | features |
|---|---|---|---|---|---|---|---|
| 0 | [CST3, TYROBP, FCN1, LST1, AIF1, S100A8, TYMP,... | 0 | [CST3, TYROBP, FCN1, LST1, AIF1, S100A8, TYMP,... | + | 0 | Monocyte/Macrophage | [CST3, TYROBP, FCN1, LST1, AIF1, S100A8, TYMP,... |
| 1 | [NKG7, GZMB, PRF1, CST7, GZMA, FGFBP2, GNLY, C... | 1 | [NKG7, GZMB, PRF1, CST7, GZMA, FGFBP2, GNLY, C... | + | 0 | Cytotoxic T Cell | [NKG7, GZMB, PRF1, CST7, GZMA, FGFBP2, GNLY, C... |
| 2 | [PF4, PPBP, SDPR, SPARC, GNG11, HIST1H2AC, GP9... | 2 | [PF4, PPBP, SDPR, SPARC, GNG11, HIST1H2AC, GP9... | + | 0 | Platelet | [PF4, PPBP, SDPR, SPARC, GNG11, HIST1H2AC, GP9... |
| 3 | [CD79A, HLA-DQA1, CD79B, MS4A1, HLA-DQB1, HLA-... | 3 | [CD79A, HLA-DQA1, CD79B, MS4A1, HLA-DQB1, HLA-... | + | 0 | B cell | [CD79A, HLA-DQA1, CD79B, MS4A1, HLA-DQB1, HLA-... |
| 4 | [FCGR3A, CTD-2006K23.1, IFITM3, ABI3, CEBPB, C... | 4 | [FCGR3A, CTD-2006K23.1, IFITM3, ABI3, CEBPB, C... | + | 0 | Monocyte | [FCGR3A, IFITM3, C1QA, C1QB, CD79B, FCGR2A, AIF1] |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | [NARS, AGPAT1, HPRT1, SHOC2, FBXO4, TMEM50A, R... | 45 | [NARS, AGPAT1, HPRT1, SHOC2, FBXO4, TMEM50A, R... | - | 0 | Cardiomyocyte | [NARS, AGPAT1, HPRT1, SHOC2, FBXO4, TMEM50A, R... |
| 96 | [ALG5, OSBPL7, ARAP1, TMX2, ARMCX5, TIGIT, OAS... | 46 | [ALG5, OSBPL7, ARAP1, TMX2, ARMCX5, TIGIT, OAS... | - | 0 | T cells | [ALG5, OSBPL7, ARAP1, TMX2, ARMCX5, TIGIT, OAS... |
| 97 | [ATP6V1C1, NME6, LRRK2, REXO2, SCP2, SLC4A7, N... | 47 | [ATP6V1C1, NME6, LRRK2, REXO2, SCP2, SLC4A7, N... | - | 0 | Macrophage | [ATP6V1C1, NME6, LRRK2, REXO2, SCP2, SLC4A7, N... |
| 98 | [LMF2, SAMD4B, TMEM80, TRUB2, BACH1, DESI1, R3... | 48 | [LMF2, SAMD4B, TMEM80, TRUB2, BACH1, DESI1, R3... | - | 0 | T cell | [LMF2, SAMD4B, TMEM80, TRUB2, BACH1, DESI1, R3... |
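The table appears to hold one row per factor and `sign`, so it can help to view the annotations for both directions of a factor side by side. The sketch below groups records by `factor` using plain dictionaries. The keys `factor`, `sign`, and `term` come from the output above, the sample rows are a hand-copied subset, and `terms_by_factor` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-copied subset of the rows shown above.
rows = [
    {"factor": 0, "sign": "+", "term": "Monocyte/Macrophage"},
    {"factor": 1, "sign": "+", "term": "Cytotoxic T Cell"},
    {"factor": 45, "sign": "-", "term": "Cardiomyocyte"},
]


def terms_by_factor(results):
    """Group annotated terms as {factor: {sign: term}}."""
    grouped = defaultdict(dict)
    for row in results:
        grouped[row["factor"]][row["sign"]] = row["term"]
    return dict(grouped)


by_factor = terms_by_factor(rows)
# .get() returns None for a sign that is missing from the subset.
print(by_factor[0].get("+"), "|", by_factor[0].get("-"))
```

Applied to the full `r3`, the same grouping would make it easier to spot labels that look out of place for a blood dataset, such as the `Cardiomyocyte` term on factor 45.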


In [8]:
m4 = sl.tl.FactorAnnotationTerms('PCs', factors='1')

In [9]:
res = m4.fit(pbmc, llm)

In [14]:
res[1]['term']

['B cell', 'Naive B cell', 'Memory B cell', 'Activated B cell', 'Plasma cell']

In [None]:
res[0]['term']

In [3]:
m5 = sl.tl.FactorDescription('PCs', factors='1')

In [4]:
r5 = m5.fit(pbmc, llm)

In [10]:
pprint(r5[0]['target'])

The list of genes you provided is primarily associated with immune cell functions, particularly those related to cytotoxic T cells and natural killer (NK) cells. Here’s a brief overview of some of the key genes and their roles:

1. **Cytotoxicity Markers**:
   - **NKG7**, **GZMB**, **PRF1**, **GZMA**, **GZMH**: These genes encode proteins involved in the cytotoxic activity of NK cells and CD8+ T cells. They are associated with the release of cytotoxic granules that induce apoptosis in target cells.
   - **CST7**: This gene encodes cystatin F, which is involved in the regulation of proteases in immune responses.

2. **Cytokines and Chemokines**:
   - **CCL4**, **CCL5**, **XCL1**, **XCL2**: These are chemokines that play roles in the recruitment and activation of immune cells.
   - **FGFBP2**: This gene is involved in cell signaling and may play a role in immune responses.

3. **Receptors and Signaling**:
   - **FCGR3A**: This gene encodes a receptor for the Fc region of immunoglobulins, which is important for antibody-dependent cellular cytotoxicity (ADCC).
   - **S1PR5**: This gene encodes a sphingosine-1-phosphate receptor, which is involved in lymphocyte trafficking and immune responses.

4. **Other Immune-Related Genes**:
   - **CD247**: This gene encodes a component of the T cell receptor complex, essential for T cell activation.
   - **HOPX**: This gene is involved in the regulation of gene expression in immune cells.

Given the expression of these genes, the cell type most likely referred to is **cytotoxic T lymphocytes (CTLs)** or **natural killer (NK) cells**. These cells are crucial components of the adaptive and innate immune systems, respectively, and are involved in the direct killing of infected or malignant cells. The presence of chemokines and receptors also suggests a role in immune cell signaling and migration.