## If you work with single-cell data using the scanpy package (https://scanpy.readthedocs.io/en/stable/), you will encounter the AnnData object (https://anndata.readthedocs.io/en/latest/).

## Some properties of AnnData are often overlooked, so here is a summary of the main caveats

1. The `obs_names` and `var_names` (the indices of the `obs` and `var` DataFrames) should not have a name; otherwise the object will not be written to h5ad properly.
2. Depending on how you construct your AnnData, `X` can be sparse or dense, so make sure to check which one you have.
3. Subsetting an AnnData usually returns an AnnData view (not always, depending on how you subset the adata), which differs from pandas behavior. Modifying a view triggers copy-on-modify, but it is still good practice not to modify a view freely. In particular, replacing `.X` does not trigger copy-on-modify.
4. Subsetting an AnnData and then accessing `X` returns an ArrayView object; I recommend copying it manually if you want to modify it.
5. Assigning `adata = old_adata` is not the same as subsetting: the new name simply points to the original object, so any change made through it also changes the original.

In [1]:
import anndata as ad
import pandas as pd
import numpy as np

## Now let's talk about concatenation

In [2]:
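# Using the AnnData.concatenate() method (deprecated in recent anndata releases in favor of ad.concat()).
# This is the simpler option: obsm, varm and uns are ignored,
# it can only concatenate along the obs axis,
# and only obs and var are considered.
# This example assumes the var names are identical in both objects.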
a_obs = pd.DataFrame(data=np.random.random((3,2)),index=['i1','i2','i3'],columns=['obs_a','obs_c'])
a_var = pd.DataFrame(index=['i1','i2'],data=np.random.random((2,2)),columns=['var_a','var_c'])
a = ad.AnnData(X=np.random.random([3,2]),obs=a_obs,var=a_var)

b_obs = pd.DataFrame(data=a_obs.values,index=['i1','i2','i3'],columns=['obs_b','obs_c'])
b_var = pd.DataFrame(index=['i1','i2'],data=np.random.random((2,2)),columns=['var_b','var_c'])
b = ad.AnnData(X=np.random.random([3,2]),obs=b_obs,var=b_var)

In [3]:
a

AnnData object with n_obs × n_vars = 3 × 2
    obs: 'obs_a', 'obs_c'
    var: 'var_a', 'var_c'

In [4]:
b

AnnData object with n_obs × n_vars = 3 × 2
    obs: 'obs_b', 'obs_c'
    var: 'var_b', 'var_c'

In [5]:
a.concatenate(b) 
# All obs columns, shared and unique, are kept; a column unique to one object is filled with missing values for the other.
# Because the var names are identical, every var column from both objects is kept, suffixed with its batch index (e.g. 'var_c-0' and 'var_c-1').

AnnData object with n_obs × n_vars = 6 × 2
    obs: 'obs_a', 'obs_c', 'obs_b', 'batch'
    var: 'var_a-0', 'var_c-0', 'var_c-1', 'var_b-1'

In [6]:
# The ad.concat() function also lets you concatenate along the var axis,
# and obsm and varm can be concatenated as well.
# uns is not considered here.
# Again, this assumes the var names are identical.
new = ad.concat([a,b],axis=0,join='outer',merge='first',label='batch',keys=['a','b'])
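# label='batch' adds a 'batch' column to obs that records the origin of each observation, using the values in `keys`.
# join controls how the other axis (here var) is combined; merge controls how annotations along that other axis (here the var columns) are resolved. We want to keep everything in both cases.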

Observation names are not unique. To make them unique, call `.obs_names_make_unique`.


In [7]:
new

AnnData object with n_obs × n_vars = 6 × 2
    obs: 'obs_a', 'obs_c', 'obs_b', 'batch'
    var: 'var_a', 'var_c', 'var_b'

In [8]:
a.var

|    | var_a    | var_c    |
|----|----------|----------|
| i1 | 0.621512 | 0.273783 |
| i2 | 0.365935 | 0.165045 |


In [9]:
b.var

|    | var_b    | var_c    |
|----|----------|----------|
| i1 | 0.245504 | 0.44106  |
| i2 | 0.458893 | 0.774106 |


In [10]:
new.var

|    | var_a    | var_c    | var_b    |
|----|----------|----------|----------|
| i1 | 0.621512 | 0.273783 | 0.245504 |
| i2 | 0.365935 | 0.165045 | 0.458893 |


In [11]:
# This shows how merge='first' handles a var column shared by both objects. Unlike the concatenate() method,
# which keeps one suffixed copy per object, it keeps the value it encounters first, here the one from object `a`.
# So if two objects share a column in var, make sure the values are the same in both.