## In this notebook, I will demonstrate how to
1. read from mtx
2. read from txt
3. add cell metadata
4. add gene metadata
5. add global variables (uns)
6. add UMAP coordinates (obsm)
7. add a kNN adjacency matrix (obsp)

In [None]:
import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad

## Loading from mtx: make sure the following three files are in the folder
1. barcodes.tsv
2. genes.tsv
3. matrix.mtx

In [5]:
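# load from a 10x mtx folder, using gene symbols (not gene IDs) as var_names
adata = sc.read_10x_mtx('/Users/ligk2e/PycharmProjects/scanpy/data/filtered_gene_bc_matrices/hg19', var_names='gene_symbols')
# gene symbols can repeat, so make var_names unique
adata.var_names_make_unique()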

In [9]:
adata

AnnData object with n_obs × n_vars = 2700 × 32738
    obs: 'cell_meta_data'
    var: 'gene_ids', 'gene_meta_data'
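
To get a feel for what those three files hold, here is a small sketch that writes a toy matrix in the same layout and reads it back by hand with `scipy.io.mmread` and pandas. It is only an illustration, not how `sc.read_10x_mtx` is implemented. The gene and barcode names are made up, and I am assuming the legacy layout where `genes.tsv` has two tab-separated columns (gene ID, gene symbol) and `matrix.mtx` is stored as genes × cells.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy import io, sparse

# Toy data (made up): 3 genes x 2 cells, stored genes x cells (assumed 10x layout).
counts = sparse.csr_matrix(np.array([[1, 0], [0, 2], [3, 4]]))
genes = pd.DataFrame({"gene_ids": ["ENSG01", "ENSG02", "ENSG03"],
                      "gene_symbols": ["GeneA", "GeneB", "GeneC"]})
barcodes = pd.DataFrame({"barcode": ["AAAC-1", "AAAG-1"]})

with tempfile.TemporaryDirectory() as folder:
    # Write the three files into a temporary folder.
    io.mmwrite(os.path.join(folder, "matrix.mtx"), counts)
    genes.to_csv(os.path.join(folder, "genes.tsv"), sep="\t", header=False, index=False)
    barcodes.to_csv(os.path.join(folder, "barcodes.tsv"), sep="\t", header=False, index=False)

    # Read them back by hand.
    mat = io.mmread(os.path.join(folder, "matrix.mtx")).tocsr()
    gene_table = pd.read_csv(os.path.join(folder, "genes.tsv"), sep="\t",
                             header=None, names=["gene_ids", "gene_symbols"])
    cell_table = pd.read_csv(os.path.join(folder, "barcodes.tsv"), sep="\t",
                             header=None, names=["barcode"])

# AnnData expects cells x genes, so the matrix is transposed.
X = mat.T.tocsr()
print(X.shape, len(cell_table), len(gene_table))
```

The transpose at the end is the step that is easy to forget when reading the files manually: rows of `X` should line up with barcodes and columns with genes.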

## Loading from txt: I prefer to build the AnnData object myself

In [None]:
# from txt (not run because the file is on the cluster; the code is shown for reference)
data = pd.read_csv('./counts.Kit_TA.txt', sep='\t', index_col=0)
# the table is genes x cells, so transpose it to the cells x genes layout AnnData expects
adata = ad.AnnData(X=data.values.T, var=pd.DataFrame(index=data.index), obs=pd.DataFrame(index=data.columns))

## Modify your AnnData at will

In [12]:
# add cell metadata; obs is a pandas DataFrame with one row per cell
cell_meta_data = np.random.rand(len(adata.obs_names))
adata.obs['cell_meta_data'] = cell_meta_data
print(adata)

AnnData object with n_obs × n_vars = 2700 × 32738
    obs: 'cell_meta_data'
    var: 'gene_ids', 'gene_meta_data'


In [13]:
# add gene metadata; var is a pandas DataFrame with one row per gene
gene_meta_data = np.random.rand(len(adata.var_names))
adata.var['gene_meta_data'] = gene_meta_data
print(adata)

AnnData object with n_obs × n_vars = 2700 × 32738
    obs: 'cell_meta_data'
    var: 'gene_ids', 'gene_meta_data'


In [14]:
# add uns data; uns behaves like a Python dictionary
data1 = [1,2,3]
data2 = {'test':[1,2,3]}
adata.uns['data1'] = data1
adata.uns['data2'] = data2
print(adata)

AnnData object with n_obs × n_vars = 2700 × 32738
    obs: 'cell_meta_data'
    var: 'gene_ids', 'gene_meta_data'
    uns: 'data1', 'data2'


In [15]:
# add UMAP coordinates; each obsm entry needs one row per cell (random values here)
umap_x = np.random.rand(len(adata.obs_names))
umap_y = np.random.rand(len(adata.obs_names))
adata.obsm['X_umap'] = np.column_stack([umap_x,umap_y])
print(adata)

AnnData object with n_obs × n_vars = 2700 × 32738
    obs: 'cell_meta_data'
    var: 'gene_ids', 'gene_meta_data'
    uns: 'data1', 'data2'
    obsm: 'X_umap'


In [16]:
# add a cell-cell matrix (n_obs x n_obs) to obsp; random values stand in for a real kNN graph
adjacency = np.random.rand(len(adata.obs_names),len(adata.obs_names))
adata.obsp['distances'] = adjacency
print(adata)

AnnData object with n_obs × n_vars = 2700 × 32738
    obs: 'cell_meta_data'
    var: 'gene_ids', 'gene_meta_data'
    uns: 'data1', 'data2'
    obsm: 'X_umap'
    obsp: 'distances'
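
The random dense matrix above is only a placeholder. A real kNN graph is mostly zeros, so it is usually stored as a sparse matrix. Here is a sketch of building one from 2-D coordinates with `scipy.spatial.cKDTree`. The helper name `knn_distance_matrix` is hypothetical, and this is a plain Euclidean kNN, not the procedure scanpy uses in `sc.pp.neighbors`.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree


def knn_distance_matrix(coords, k):
    """Sparse (n, n) matrix holding each point's distances to its k nearest neighbours."""
    n = coords.shape[0]
    tree = cKDTree(coords)
    # Query k + 1 neighbours: the closest hit for each point is the point itself
    # (this assumes there are no duplicate points).
    dist, idx = tree.query(coords, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    vals = dist[:, 1:].ravel()
    return sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))


rng = np.random.default_rng(0)
coords = rng.random((100, 2))  # stand-in for an embedding such as adata.obsm['X_umap']
knn = knn_distance_matrix(coords, k=5)
print(knn.shape, knn.nnz)
```

A matrix built this way from the real embedding could be assigned with `adata.obsp['distances'] = knn`, since `obsp` accepts sparse matrices as long as the shape is n_obs × n_obs.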


## Other operations and caveats
1. The same approach works for adding varm and varp.
2. When adding cell metadata, it will automatically detect whether your input array is discrete or continuous. Discrete values (like cluster information) will be stored as the pandas Categorical dtype to save space, but this also limits the use of a lot of the syntax we are familiar with from a normal pandas Series. I sometimes prefer to convert a pandas Categorical column to a pandas Series with dtype 'str'; by doing that, using a
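
The Categorical-to-string conversion can be sketched with pandas alone. The `obs` DataFrame below is a made-up stand-in for `adata.obs`, and the column names are hypothetical.

```python
import pandas as pd

# Stand-in for adata.obs with a categorical cluster column (values are made up).
obs = pd.DataFrame(
    {"cluster": pd.Categorical(["c0", "c1", "c0"])},
    index=["cell1", "cell2", "cell3"],
)

# Convert the categorical column to plain strings.
obs["cluster_str"] = obs["cluster"].astype(str)

# Plain strings accept an arbitrary new label directly.
obs.loc["cell3", "cluster_str"] = "c9"

# The categorical column needs the label registered as a category first.
obs["cluster"] = obs["cluster"].cat.add_categories(["c9"])
obs.loc["cell3", "cluster"] = "c9"
print(obs.dtypes)
```

On the real object the conversion would be `adata.obs['cluster'] = adata.obs['cluster'].astype(str)`, with `cluster` replaced by whatever the column is called.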