In [5]:
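# Load environment variables (e.g. OPENAI_API_KEY) from a local .env file
from dotenv import load_dotenv
load_dotenv()

import scllm as sl
import scanpy as sc
import pandas as pd
from langchain_openai import ChatOpenAI
from IPython.display import display, Markdown


def pprint(text):
    """Render a string as Markdown in the notebook output."""
    display(Markdown(text))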


In [3]:
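openai_model = "gpt-4o-mini"
# temperature=0.0 reduces sampling randomness in the generated annotations
llm = ChatOpenAI(temperature=0.0, model=openai_model)
# Preprocessed PBMC 3k dataset (downloaded on first use)
pbmc = sc.datasets.pbmc3k_processed()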

In [4]:
# Leiden clustering; labels are stored in pbmc.obs['leiden'].
# Recent scanpy versions emit a FutureWarning here recommending the future
# defaults: flavor="igraph", n_iterations=2 and directed=False.
sc.tl.leiden(pbmc)


## Gather descriptions

In [7]:
m0 = sl.tl.ClusterAnnotationDescription()

In [8]:
r0 = m0.fit(pbmc, llm, 'leiden')
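The descriptions below answer with phrasing like "the list of genes you provided", which suggests each prompt carries a cluster's gene list. As a rough illustration only, a prompt of that kind could be assembled as follows; `build_gene_prompt`, `max_genes`, and the prompt wording are hypothetical and are not scllm's actual implementation.

```python
# Hypothetical sketch: turn a cluster's marker genes into an LLM prompt.
# `build_gene_prompt` and the wording are illustrative assumptions,
# not how scllm builds its prompts.


def build_gene_prompt(genes: list, max_genes: int = 50) -> str:
    """Return a prompt asking which cell type a list of marker genes suggests."""
    # Keep only the first `max_genes` entries (assumed to be ranked by score)
    top = genes[:max_genes]
    gene_str = ", ".join(top)
    return (
        "Here is a list of marker genes for a cluster of cells: "
        f"{gene_str}. "
        "Describe the biological processes these genes suggest and "
        "which cell type the cluster most likely represents."
    )


# Usage with genes named in the first description below
prompt = build_gene_prompt(["CD3D", "CD3E", "IL7R", "IL32", "LDHB", "RPS25"])
print(prompt)
```

Capping the list keeps the prompt short; whether the real tool truncates, and at what length, is not shown in this notebook.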

Cluster 1

In [14]:
pprint(r0[0]['target'])

The list of genes you provided includes a mix of ribosomal protein genes (RPS and RPL genes), T-cell markers (CD3D, CD3E, IL7R), and other genes that are often associated with immune responses and cellular metabolism (such as LDHB and IL32). 

1. **Ribosomal Proteins (RPS and RPL)**: The presence of numerous ribosomal protein genes (RPS and RPL) suggests that the cells are likely to be actively translating proteins, which is a characteristic of many cell types, but particularly those that are rapidly proliferating or have high metabolic activity.

2. **T-cell Markers**: The genes CD3D, CD3E, and IL7R are well-known markers for T-cells. CD3 is a part of the T-cell receptor complex, and IL7R is important for T-cell development and homeostasis. 

3. **Other Immune-Related Genes**: The presence of IL32, which is involved in immune responses, further supports the idea that these cells are part of the immune system.

Given this combination of ribosomal proteins and T-cell markers, it is likely that the cell type represented by these genes is a **T-lymphocyte (T-cell)**, possibly a subtype such as CD4+ helper T-cells or CD8+ cytotoxic T-cells, depending on the specific context and additional markers that might be present in the dataset. 

In summary, the genes you listed are indicative of T-cells, particularly those that are actively engaged in immune responses.
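The description spends a whole point on ribosomal protein genes, which are highly expressed in most cell types and say little about identity. A simple sketch of separating them from the more informative markers uses the common RPS/RPL symbol-prefix heuristic; `split_ribosomal` is a hypothetical helper, and nothing here implies scllm filters genes this way.

```python
import re

# Symbols starting with RPS or RPL: the usual prefix heuristic for cytosolic
# ribosomal protein genes. It also matches a few non-ribosomal genes
# (e.g. the kinase RPS6KA1), so treat it as a rough filter.
RIBO_PATTERN = re.compile(r"^RP[SL]")


def split_ribosomal(genes):
    """Split a gene list into (ribosomal, other), preserving order."""
    ribo = [g for g in genes if RIBO_PATTERN.match(g)]
    other = [g for g in genes if not RIBO_PATTERN.match(g)]
    return ribo, other


genes = ["CD3D", "CD3E", "IL7R", "IL32", "LDHB",
         "RPS25", "RPL31", "RPLP2", "RPSA", "TPT1"]
ribo, other = split_ribosomal(genes)
print(ribo)   # ['RPS25', 'RPL31', 'RPLP2', 'RPSA']
print(other)  # ['CD3D', 'CD3E', 'IL7R', 'IL32', 'LDHB', 'TPT1']
```

What remains after the split (CD3D, CD3E, IL7R, IL32) is exactly the set the model used to call this cluster T cells.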

In [None]:
pprint(r0[2]['target'])

The list of genes you provided is primarily associated with B cells and antigen-presenting cells, particularly those involved in the immune response. Here are some key points regarding the genes:

1. **Major Histocompatibility Complex (MHC) Class II Genes**: 
   - Genes such as **HLA-DRA**, **HLA-DRB1**, **HLA-DPB1**, **HLA-DQA1**, **HLA-DQB1**, **HLA-DPA1**, **HLA-DMA**, and **HLA-DMB** are part of the MHC class II molecules, which are crucial for presenting antigens to CD4+ T cells. This suggests a role in immune response and antigen presentation.

2. **B Cell Markers**:
   - Genes like **CD79A**, **CD79B**, **MS4A1** (also known as CD20), **FCRLA**, and **BLK** are well-known markers of B cells. They are involved in B cell receptor signaling and development.

3. **Other Immune-Related Genes**:
   - **CD74** is involved in MHC class II presentation and is often expressed in B cells and antigen-presenting cells.
   - **TCL1A** is associated with T cell activation and has been implicated in B cell malignancies.
   - **FCER2** (CD23) is a low-affinity IgE receptor expressed on B cells and is involved in B cell activation and regulation.

4. **Additional Genes**:
   - **BANK1** is involved in B cell signaling and is associated with autoimmune diseases.
   - **HVCN1** is involved in the regulation of reactive oxygen species in B cells.
   - **LTB** (lymphotoxin beta) is important for lymphoid organ development and B cell function.

Given this information, the genes you listed are indicative of **B cells**, particularly activated B cells or germinal center B cells, which are involved in the adaptive immune response. They may also represent a subset of B cells that are engaged in antigen presentation, possibly in the context of an immune response or in a disease state such as lymphoma or autoimmune disorders.
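Both answers hinge on recognizing canonical markers: CD3D, CD3E and IL7R for T cells; CD79A, CD79B and MS4A1 for B cells. A deterministic overlap score over those same markers can serve as a quick sanity check on the LLM's call. The marker sets below contain only genes named in the descriptions above and are not a curated reference panel; `score_markers` and `best_match` are hypothetical helpers.

```python
# Illustrative marker sets built from genes named in the descriptions above;
# NOT a curated reference panel.
MARKERS = {
    "T cell": {"CD3D", "CD3E", "IL7R"},
    "B cell": {"CD79A", "CD79B", "MS4A1"},
}


def score_markers(genes, markers=MARKERS):
    """Return {cell_type: fraction of that type's markers found in `genes`}."""
    present = set(genes)
    return {ct: len(ms & present) / len(ms) for ct, ms in markers.items()}


def best_match(genes, markers=MARKERS):
    """Return the cell type with the highest marker overlap."""
    scores = score_markers(genes, markers)
    # Ties are resolved by the insertion order of `markers`
    return max(scores, key=scores.get)


cluster_a = ["CD3D", "CD3E", "IL7R", "IL32", "LDHB"]
cluster_b = ["HLA-DRA", "CD79A", "CD79B", "MS4A1", "CD74", "TCL1A"]
print(score_markers(cluster_a))  # {'T cell': 1.0, 'B cell': 0.0}
print(best_match(cluster_b))     # B cell
```

Unlike the LLM, this check cannot weigh partial or conflicting evidence, so it is most useful for flagging clusters where the two disagree.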

## Multiple terms

In [17]:
m1 = sl.tl.ClusterAnnotationTerms()

In [19]:
r1 = m1.fit(pbmc, llm, 'leiden')

In [22]:
r1[0]['term']

['Regulation of immune response',
 'Translation',
 'Cellular response to cytokine stimulus',
 'T cell activation',
 'Metabolic process']

In [23]:
r1[0]['features']

[['CD3D', 'CD3E', 'IL7R', 'IL32'],
 ['RPS25',
  'RPS27',
  'RPS12',
  'RPL31',
  'RPL9',
  'RPL30',
  'RPS29',
  'RPS3',
  'RPS6',
  'RPS15A',
  'RPL21',
  'RPS3A',
  'RPLP2',
  'RPL23A',
  'RPL3',
  'RPS20',
  'RPSA',
  'RPS14',
  'RPL32',
  'RPL13',
  'RPL27A',
  'TPT1'],
 ['IL7R', 'IL32'],
 ['CD3D', 'CD3E', 'IL7R'],
 ['LDHB']]
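In this output the i-th feature list appears to support the i-th term (for example, `['LDHB']` lines up with `'Metabolic process'`). Treating the two lists as parallel is an assumption based on how they line up. Under that assumption, inverting the pairing shows which genes drive several terms at once. The snippet copies the printed lists into plain Python.

```python
from collections import defaultdict

# Copied from r1[0]['term'] and r1[0]['features'] above; the lists are
# assumed to be parallel, i.e. features[i] supports terms[i].
terms = ['Regulation of immune response', 'Translation',
         'Cellular response to cytokine stimulus', 'T cell activation',
         'Metabolic process']
features = [
    ['CD3D', 'CD3E', 'IL7R', 'IL32'],
    ['RPS25', 'RPS27', 'RPS12', 'RPL31', 'RPL9', 'RPL30', 'RPS29', 'RPS3',
     'RPS6', 'RPS15A', 'RPL21', 'RPS3A', 'RPLP2', 'RPL23A', 'RPL3', 'RPS20',
     'RPSA', 'RPS14', 'RPL32', 'RPL13', 'RPL27A', 'TPT1'],
    ['IL7R', 'IL32'],
    ['CD3D', 'CD3E', 'IL7R'],
    ['LDHB'],
]
assert len(terms) == len(features), "term and feature lists must align"

# Invert the mapping: gene -> list of terms it was assigned to
gene_to_terms = defaultdict(list)
for term, genes in zip(terms, features):
    for gene in genes:
        gene_to_terms[gene].append(term)

# Genes that support more than one term
shared = {g: ts for g, ts in gene_to_terms.items() if len(ts) > 1}
for gene, ts in shared.items():
    print(gene, "->", ts)
```

IL7R ends up under three terms and CD3D, CD3E and IL32 under two each, while the ribosomal genes map only to 'Translation'.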