<center> Visium HD data set: comparing 2 micron and 8 micron bins </center>

## Analyzing Spatial Transcriptomics Data from Visium Output

In this post we work through a spatial transcriptomics analysis using the 'Mouse_Brain_square_example' data set provided by 10x Genomics. The example is a practical illustration of how spatial transcriptomic technologies help us study biological processes within a specific tissue section.

## Comparison Between 2 Micron Bins and 8 Micron Bins

A central part of the analysis is comparing the same data at two bin sizes: 2 micron and 8 micron. Smaller bins give finer spatial detail but fewer reads per bin, while larger bins pool more reads at a coarser resolution. The comparison shows how the choice of resolution affects our interpretation of the transcriptomic data, and with it the conclusions we can draw about cellular functions and interactions within the tissue.
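To make the scale difference concrete: assuming both bin sizes tile the same regular grid, each 8 micron bin covers a 4 × 4 block of 2 micron bins. The sketch below shows only that arithmetic. The helper `coarse_bin_index` and the use of zero-based row/column grid indices are illustrative assumptions, not something taken from the Space Ranger output.

```python
# Assumption: square bins on one regular grid, so coarse bins tile fine bins exactly.
FINE_UM = 2
COARSE_UM = 8


def coarse_bin_index(row_fine, col_fine, fine_um=FINE_UM, coarse_um=COARSE_UM):
    """Map a zero-based fine-grid (row, col) index to the enclosing coarse bin.

    Hypothetical helper for illustration only.
    """
    factor = coarse_um // fine_um  # fine bins along one side of a coarse bin
    return row_fine // factor, col_fine // factor


factor = COARSE_UM // FINE_UM
bins_per_coarse = factor ** 2  # number of fine bins inside one coarse bin

print(f"{bins_per_coarse} bins of {FINE_UM} um fit in one {COARSE_UM} um bin")
print(coarse_bin_index(5, 11))
```

Under this assumption, an 8 micron bin pools the reads of up to 16 neighbouring 2 micron bins, which is why per-bin counts differ so much between the two resolutions.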

## Displaying Different Representations of Areas with Low and High Read Counts

It is also useful to visualize how areas of low and high read counts appear in the data. These plots show how transcript density is distributed across the tissue section. Examining the differences gives a first view of the heterogeneity of gene expression and what it implies for tissue structure and function.
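One simple way to define "low" and "high" read-count areas is to sum the counts per bin and cut at quantiles. The sketch below does this on a small toy matrix. The function name `split_by_total_counts` and the 10% / 90% cut-offs are assumptions chosen for illustration, not values used elsewhere in this post.

```python
import numpy as np


def split_by_total_counts(counts, low_q=0.1, high_q=0.9):
    """Return per-bin totals plus boolean masks for low- and high-count bins.

    `counts` is a bins x genes matrix (dense here; a scipy sparse matrix
    also works because only `.sum(axis=1)` is used).
    """
    totals = np.asarray(counts.sum(axis=1)).ravel()
    low_cut, high_cut = np.quantile(totals, [low_q, high_q])
    return totals, totals <= low_cut, totals >= high_cut


# Toy data: 5 bins x 3 genes
toy_counts = np.array([
    [0, 0, 1],
    [5, 3, 2],
    [0, 1, 0],
    [9, 9, 9],
    [2, 2, 2],
])
totals, low_mask, high_mask = split_by_total_counts(toy_counts)
print(totals, low_mask, high_mask)
```

The same helper could be applied to the count matrix of an AnnData object (for example `adata8mc.X` once it is loaded), and the two masks can then be used to colour or subset bins in a spatial plot.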



In [2]:
import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

In [4]:
def display_directory_map(dir_path, indent=0):
    # Print the current directory name
    print('  ' * indent + '|-- ' + os.path.basename(dir_path))
    # Get all items in the directory
    for item in os.listdir(dir_path):
        # Full path of the item
        path = os.path.join(dir_path, item)
        # Check if it's a directory and recurse into it
        if os.path.isdir(path):
            display_directory_map(path, indent + 1)
        else:
            # It's a file, print its name
            print('  ' * (indent + 1) + '|-- ' + item)

# Start directory map display from the specified path
display_directory_map('/data/kanferg/Sptial_Omics/playGround/Data/Visium_HD_Mouse_Brain_square_example')

|-- Visium_HD_Mouse_Brain_square_example
  |-- ._.DS_Store
  |-- ._square_008um
  |-- ._square_002um
  |-- square_008um
    |-- ._analysis
    |-- ._.DS_Store
    |-- raw_feature_bc_matrix
      |-- ._features.tsv.gz
      |-- ._matrix.mtx.gz
      |-- features.tsv.gz
      |-- barcodes.tsv.gz
      |-- ._barcodes.tsv.gz
      |-- matrix.mtx.gz
    |-- analysis
      |-- clustering
        |-- ._gene_expression_graphclust
        |-- ._gene_expression_kmeans_4_clusters
        |-- gene_expression_kmeans_6_clusters
          |-- clusters.csv
          |-- ._clusters.csv
        |-- ._gene_expression_kmeans_5_clusters
        |-- ._gene_expression_kmeans_10_clusters
        |-- gene_expression_kmeans_4_clusters
          |-- clusters.csv
          |-- ._clusters.csv
        |-- ._gene_expression_kmeans_2_clusters
        |-- gene_expression_kmeans_2_clusters
          |-- clusters.csv
          |-- ._clusters.csv
        |-- gene_expression_kmeans_3_clusters
          |-- clusters.csv
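
The `._*` and `.DS_Store` entries in the listing are macOS metadata files, not part of the Visium output. A variant of the directory map that skips them and returns the lines instead of printing them could look like the sketch below. The name `directory_map_lines` and the `skip_prefixes` parameter are hypothetical, and the demo runs on a temporary toy folder rather than the real data directory.

```python
import os
import tempfile


def directory_map_lines(dir_path, indent=0, skip_prefixes=("._", ".DS_Store")):
    """Build the directory map as a list of lines, skipping macOS metadata files."""
    name = os.path.basename(os.path.normpath(dir_path))
    lines = ["  " * indent + "|-- " + name]
    for item in sorted(os.listdir(dir_path)):
        if item.startswith(skip_prefixes):
            continue  # ignore AppleDouble (._*) and .DS_Store entries
        path = os.path.join(dir_path, item)
        if os.path.isdir(path):
            lines.extend(directory_map_lines(path, indent + 1, skip_prefixes))
        else:
            lines.append("  " * (indent + 1) + "|-- " + item)
    return lines


# Demo on a temporary toy folder that mimics part of the layout above
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "square_008um", "spatial"))
    for junk in ("._square_008um", ".DS_Store"):
        open(os.path.join(root, junk), "w").close()
    open(os.path.join(root, "square_008um", "spatial", "tissue_positions.parquet"), "w").close()
    demo_lines = directory_map_lines(root)

print("\n".join(demo_lines))
```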
  

In [3]:
path2mc = '/data/kanferg/Sptial_Omics/playGround/Data/Visium_HD_Mouse_Brain_square_example/square_002um'
path8mc = '/data/kanferg/Sptial_Omics/playGround/Data/Visium_HD_Mouse_Brain_square_example/square_008um'

In [19]:
adata8mc = sc.read_visium(path = path8mc)
adata8mc

  utils.warn_names_duplicates("var")
  utils.warn_names_duplicates("var")


OSError: Could not find '/data/kanferg/Sptial_Omics/playGround/Data/Visium_HD_Mouse_Brain_square_example/square_008um/spatial/tissue_positions_list.csv'

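The call fails because the Visium HD output does not include ```tissue_positions_list.csv```. The bin positions are stored in ```tissue_positions.parquet``` instead, so we convert that file to CSV under the file name that ```sc.read_visium``` looks for.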

In [6]:
def parquet_to_csv(path):
    # Read the Parquet file with the bin positions
    df = pd.read_parquet(os.path.join(path, 'tissue_positions.parquet'))
    # Write it as a CSV file under the name sc.read_visium looks for
    df.to_csv(os.path.join(path, 'tissue_positions_list.csv'), index=False)

# Convert the positions file for both bin sizes
parquet_to_csv(os.path.join(path2mc, 'spatial'))
parquet_to_csv(os.path.join(path8mc, 'spatial'))
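
One caveat worth checking: the `dtype=object` coordinates and the `pd.read_csv(` warning fragments in the outputs further down suggest that the header row written by `to_csv` is being parsed as data. As far as I know, `sc.read_visium` reads `tissue_positions_list.csv` without a header row, but treat that as an assumption to verify against your scanpy version. Under that assumption, writing the file without a header keeps the columns numeric. The sketch below shows the idea on a toy table. The helper name, the barcode strings, and the column names are illustrative assumptions, and in the real workflow the DataFrame would come from `pd.read_parquet`.

```python
import os
import tempfile

import pandas as pd


def write_headerless_positions(df, out_dir):
    """Write positions as tissue_positions_list.csv without a header row."""
    out_path = os.path.join(out_dir, "tissue_positions_list.csv")
    df.to_csv(out_path, index=False, header=False)
    return out_path


# Toy stand-in for the parquet contents (column names are assumed)
toy_positions = pd.DataFrame({
    "barcode": ["s_002um_00000_00000-1", "s_002um_00000_00001-1"],
    "in_tissue": [1, 0],
    "array_row": [0, 0],
    "array_col": [0, 1],
    "pxl_row_in_fullres": [9295.4, 5556.6],
    "pxl_col_in_fullres": [18095.4, 5393.0],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = write_headerless_positions(toy_positions, tmp)
    # Read back the way a header-less positions file would be read
    round_trip = pd.read_csv(out_path, header=None, index_col=0)

print(round_trip.dtypes)
```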

In [4]:
adata2mc = sc.read_visium(path = path2mc)
adata8mc = sc.read_visium(path = path8mc)

  utils.warn_names_duplicates("var")
  utils.warn_names_duplicates("var")
  positions = pd.read_csv(
  utils.warn_names_duplicates("var")
  utils.warn_names_duplicates("var")
  positions = pd.read_csv(


In [9]:
adata2mc

AnnData object with n_obs × n_vars = 6296688 × 19059
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

The ```adata2mc``` object holds 6,296,688 observations (```n_obs```; here these are 2 micron bins rather than individual cells) and 19,059 variables (```n_vars```; genes).

In [11]:
adata2mc.obsm['spatial'],adata2mc.obsm['spatial'].shape

(array([[9295.414883943817, 18095.4075623532],
        [5556.6583091044295, 5393.048694317109],
        [13608.742910510004, 2385.629487758882],
        ...,
        [5004.368167173434, 3620.2772970198544],
        [13680.331660278203, 2289.6081731694144],
        [9735.187714137932, 6533.6050688085215]], dtype=object),
 (6296688, 2))

```adata2mc.obsm['spatial']``` holds one pair of spatial coordinates per bin, so its shape is 6,296,688 by 2. Note that the array is reported with ```dtype=object``` rather than a numeric dtype.
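
Because plotting and distance computations generally expect numeric coordinates, it may help to coerce the object array to floats before going further. The sketch below shows this on a toy array that mixes strings and floats. The helper name `coords_as_float` is hypothetical, and whether the real array actually contains strings is an assumption based only on the `dtype=object` shown above.

```python
import numpy as np


def coords_as_float(coords):
    """Coerce an object-dtype coordinate array to float64.

    Raises ValueError if any entry cannot be interpreted as a number.
    """
    return np.asarray(coords, dtype=np.float64)


# Toy array mimicking mixed string / float entries
toy_coords = np.array([["9295.4", "18095.4"], [5556.6, 5393.0]], dtype=object)
fixed = coords_as_float(toy_coords)
print(fixed.dtype, fixed.shape)
```

On the real data this would be applied as `adata2mc.obsm['spatial'] = coords_as_float(adata2mc.obsm['spatial'])`, and likewise for `adata8mc`.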

In [12]:
adata2mc.var_names_make_unique()
adata8mc.var_names_make_unique()
adata2mc

AnnData object with n_obs × n_vars = 6296688 × 19059
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

In [14]:
adata2mc.uns

OrderedDict([('spatial',
              {'Visium_HD_Mouse_Brain': {'images': {'hires': array([[[0.92941177, 0.92941177, 0.92941177],
                         [0.9254902 , 0.9254902 , 0.92941177],
                         [0.92156863, 0.9254902 , 0.93333334],
                         ...,
                         [0.91764706, 0.91764706, 0.9254902 ],
                         [0.9137255 , 0.91764706, 0.9254902 ],
                         [0.9137255 , 0.91764706, 0.9254902 ]],
                 
                        [[0.9254902 , 0.9254902 , 0.93333334],
                         [0.9254902 , 0.9254902 , 0.93333334],
                         [0.9254902 , 0.92941177, 0.9372549 ],
                         ...,
                         [0.92156863, 0.9254902 , 0.93333334],
                         [0.91764706, 0.92156863, 0.92941177],
                         [0.91764706, 0.92156863, 0.92941177]],
                 
                        [[0.92941177, 0.92941177, 0.9372549 ],
              

The next steps follow the scanpy tutorial [Analysis and visualization of spatial transcriptomics data](https://scanpy.readthedocs.io/en/stable/tutorials/spatial/basic-analysis.html).

In [25]:
adata2mc.var["mt"] = adata2mc.var_names.str.startswith("mt-")
adata8mc.var["mt"] = adata8mc.var_names.str.startswith("mt-")

| | gene_ids | feature_types | genome | mt |
|---|---|---|---|---|
| Xkr4 | ENSMUSG00000051951 | Gene Expression | mm10 | False |
| Rp1 | ENSMUSG00000025900 | Gene Expression | mm10 | False |
| Sox17 | ENSMUSG00000025902 | Gene Expression | mm10 | False |
| Lypla1 | ENSMUSG00000025903 | Gene Expression | mm10 | False |
| Tcea1 | ENSMUSG00000033813 | Gene Expression | mm10 | False |
| ... | ... | ... | ... | ... |
| mt-Nd4 | ENSMUSG00000064363 | Gene Expression | mm10 | True |
| mt-Nd5 | ENSMUSG00000064367 | Gene Expression | mm10 | True |
| mt-Nd6 | ENSMUSG00000064368 | Gene Expression | mm10 | True |
| mt-Cytb | ENSMUSG00000064370 | Gene Expression | mm10 | True |
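
The `mt` flag is typically used to compute the share of mitochondrial counts per bin as a quality-control metric. The sketch below illustrates the calculation on a toy count matrix using a few gene names from the table above. The counts are made up for the example, and in scanpy the equivalent metric is normally obtained with `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)`.

```python
import numpy as np

# A few gene names from the table above; the counts are toy values
gene_names = ["Xkr4", "Sox17", "mt-Nd4", "mt-Cytb"]
is_mt = np.array([name.startswith("mt-") for name in gene_names])

toy_counts = np.array([
    [3, 1, 4, 2],  # bin with mitochondrial reads
    [0, 0, 0, 0],  # empty bin, common at 2 micron resolution
    [5, 5, 0, 0],  # bin without mitochondrial reads
])

total = toy_counts.sum(axis=1)
mt_total = toy_counts[:, is_mt].sum(axis=1)

# Percentage of mitochondrial counts; empty bins are set to 0 to avoid dividing by zero
pct_mt = np.divide(100.0 * mt_total, total, out=np.zeros(len(total)), where=total > 0)
print(pct_mt)
```

Handling empty bins explicitly matters here, since many 2 micron bins may contain no reads at all.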
