# The Metadata object

Metadata must be provided as a table.

There are two required columns: 'sample_ID' and 'file_name'. We strongly recommend adding a 'staining' column as well, since the automated cofactor calculation requires this information.

'sample_ID' can contain any values, as long as the entries are unique. Here, we use ascending integers.
'file_name' must contain the .fcs file names, including the .fcs file extension.
Only the files listed here will be read.
'staining' must be either 'unstained' or 'stained'.
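
These rules can be checked before the table is handed to FACSPy. The following is a minimal stdlib sketch. `validate_metadata` is a hypothetical helper, not part of FACSPy, and it only covers the constraints listed in this section: the required columns, unique 'sample_ID' entries, the .fcs extension and the two allowed 'staining' values.

```python
import csv
import io

# Hypothetical helper (not part of FACSPy) that checks the rules described above.
REQUIRED_COLUMNS = ("sample_ID", "file_name")
ALLOWED_STAINING = {"stained", "unstained"}


def validate_metadata(columns, rows):
    """Return a list of problems; an empty list means no problems were found."""
    missing = [col for col in REQUIRED_COLUMNS if col not in columns]
    if missing:
        # Without the required columns, the row-level checks cannot run.
        return [f"missing required column: {col}" for col in missing]

    problems = []
    if "staining" not in columns:
        problems.append("recommended column 'staining' is absent")

    sample_ids = [row["sample_ID"] for row in rows]
    if len(sample_ids) != len(set(sample_ids)):
        problems.append("sample_ID entries are not unique")

    for row in rows:
        if not row["file_name"].endswith(".fcs"):
            problems.append(f"missing .fcs extension: {row['file_name']}")
        staining = row.get("staining")
        if staining is not None and staining not in ALLOWED_STAINING:
            problems.append(f"unexpected staining value: {staining}")
    return problems


# Two rows modelled on the example metadata, in semicolon-separated form.
example = "sample_ID;file_name;staining\n1;3742.fcs;stained\n2;4337.fcs;stained\n"
reader = csv.DictReader(io.StringIO(example), delimiter=";")
rows = list(reader)
print(validate_metadata(reader.fieldnames, rows))  # []
```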

Here, we read the example metadata into a regular dataframe with the `pandas` library. The example file is semicolon-separated, so we pass `sep = ";"`:

In [1]:
import pandas as pd

user_metadata = pd.read_csv("../Tutorials/spectral_dataset/metadata.csv", sep = ";")
user_metadata.head()

|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
| 2 | 3 | 4449.fcs | healthy | 4449 | PB | stained | healthy | healthy | 4449 | PBMC | 2 |
| 3 | 4 | 5143.fcs | healthy | 5143 | PB | stained | healthy | healthy | 5143 | PBMC | 2 |
| 4 | 5 | 6042.fcs | healthy | 6042 | PB | stained | healthy | healthy | 6042 | PBMC | 1 |
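
The separator matters because a reader that splits on the wrong character puts each whole line into a single field. A quick illustration with Python's standard `csv` module (used here instead of pandas so the example is self-contained), on a header and one row laid out like the example file:

```python
import csv
import io

# Header plus one row, laid out like the tutorial's semicolon-separated file.
text = "sample_ID;file_name;staining\n1;3742.fcs;stained\n"

# With the default comma delimiter, each line ends up as one single field.
wrong = list(csv.reader(io.StringIO(text)))
# With ";" as delimiter (the equivalent of sep=";"), the columns separate.
right = list(csv.reader(io.StringIO(text), delimiter=";"))

print(wrong[0])  # ['sample_ID;file_name;staining']
print(right[0])  # ['sample_ID', 'file_name', 'staining']
```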


## Create metadata from a pandas dataframe

To create a Metadata object that FACSPy can read, we use the `fp.dt.Metadata` class, where `fp` is the alias for FACSPy and `dt` stands for dataset.

Here, we pass the table that we read with pandas above via the `metadata` parameter.

The result is a `Metadata` object with 36 entries. The 'factors' are the column names of the user-specified parameters; the 'sample_ID', 'file_name' and 'staining' columns are not listed among them.

In [2]:
import FACSPy as fp

In [3]:
metadata = fp.dt.Metadata(metadata = user_metadata)
metadata

Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])

## Create metadata from a .csv file


We can also read the metadata table directly from disk by passing the file path to the `file` parameter of `fp.dt.Metadata`. Any file format readable by `pd.read_csv()` is supported.

In [4]:
metadata = fp.dt.Metadata(file = "../Tutorials/spectral_dataset/metadata.csv")
metadata

Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])
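
The call above passes no separator, and this tutorial does not say how FACSPy determines one. If a file does not parse as expected, the standard library's `csv.Sniffer` can report which delimiter the file actually uses. A small sketch; `detect_delimiter` is a hypothetical helper, and restricting the candidates to `";,\t"` is an assumption that covers common metadata exports:

```python
import csv


def detect_delimiter(sample, candidates=";,\t"):
    """Guess the delimiter of a CSV-like text sample (hypothetical helper)."""
    # Restricting the candidates keeps the sniffer from picking e.g. "." or "_".
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


semicolon_sample = "sample_ID;file_name;staining\n1;3742.fcs;stained\n2;4337.fcs;stained\n"
comma_sample = "sample_ID,file_name,staining\n1,3742.fcs,stained\n2,4337.fcs,stained\n"

print(detect_delimiter(semicolon_sample))  # ;
print(detect_delimiter(comma_sample))      # ,
```

For a file on disk, pass its first few kilobytes, e.g. `with open(path, newline="") as handle: detect_delimiter(handle.read(4096))`. `csv.Sniffer` raises `csv.Error` when it cannot decide.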

## Access the metadata table

The underlying table is stored in the `.dataframe` attribute, where it can be accessed and modified.

To retrieve the table, either access `.dataframe` directly or call the `.to_df()` method; both options are shown here.

In [5]:
df = metadata.dataframe
df.head()

|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
| 2 | 3 | 4449.fcs | healthy | 4449 | PB | stained | healthy | healthy | 4449 | PBMC | 2 |
| 3 | 4 | 5143.fcs | healthy | 5143 | PB | stained | healthy | healthy | 5143 | PBMC | 2 |
| 4 | 5 | 6042.fcs | healthy | 6042 | PB | stained | healthy | healthy | 6042 | PBMC | 1 |


In [6]:
df = metadata.to_df()
df.head()

|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
| 2 | 3 | 4449.fcs | healthy | 4449 | PB | stained | healthy | healthy | 4449 | PBMC | 2 |
| 3 | 4 | 5143.fcs | healthy | 5143 | PB | stained | healthy | healthy | 5143 | PBMC | 2 |
| 4 | 5 | 6042.fcs | healthy | 6042 | PB | stained | healthy | healthy | 6042 | PBMC | 1 |


## Access metadata factors

To access the parameters (factors) that the user has specified in the metadata, use the `.get_factors()` method.

In [7]:
metadata.get_factors()

['group_fd',
 'internal_id',
 'organ',
 'diag_main',
 'diag_fine',
 'donor_id',
 'material',
 'batch']
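
The returned factors match the table's columns minus 'sample_ID', 'file_name' and 'staining'. The sketch below reproduces that list from the column names of the example table. Treating exactly these three columns as reserved is an assumption inferred from the outputs in this tutorial, not taken from FACSPy's source.

```python
# Column names of the example table, copied from the outputs in this tutorial.
columns = [
    "sample_ID", "file_name", "group_fd", "internal_id", "organ", "staining",
    "diag_main", "diag_fine", "donor_id", "material", "batch",
]

# Assumption inferred from this tutorial: these columns are not reported as factors.
RESERVED_COLUMNS = {"sample_ID", "file_name", "staining"}

factors = [col for col in columns if col not in RESERVED_COLUMNS]
print(factors)
```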

## Rename columns

Metadata table columns can be renamed with the `.rename_column()` method. It expects two arguments: the current column name (`current_name`) and the new column name (`new_name`). Note that the change happens in place.

In [8]:
metadata.rename_column(current_name = "batch", new_name = "newly_named_batch")
metadata.dataframe.columns

Index(['sample_ID', 'file_name', 'group_fd', 'internal_id', 'organ',
       'staining', 'diag_main', 'diag_fine', 'donor_id', 'material',
       'newly_named_batch'],
      dtype='object')
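
Because the rename happens in place, the object itself is changed instead of a modified copy being returned. The sketch below mimics that behaviour on a plain list of column names and a list of row dictionaries. `rename_column` here is a hypothetical stand-in, and raising errors for unknown or already existing names is our own assumption about sensible behaviour.

```python
def rename_column(columns, rows, current_name, new_name):
    """Rename a column in place (hypothetical stand-in); returns None."""
    if current_name not in columns:
        raise KeyError(f"unknown column: {current_name}")
    if new_name in columns:
        raise ValueError(f"column already exists: {new_name}")
    # Keep the column at its original position in the column list.
    columns[columns.index(current_name)] = new_name
    for row in rows:
        row[new_name] = row.pop(current_name)


columns = ["sample_ID", "file_name", "batch"]
rows = [{"sample_ID": "1", "file_name": "3742.fcs", "batch": "1"}]

result = rename_column(columns, rows, current_name="batch", new_name="newly_named_batch")
print(result)   # None: the objects themselves were changed
print(columns)  # ['sample_ID', 'file_name', 'newly_named_batch']
print(rows)     # [{'sample_ID': '1', 'file_name': '3742.fcs', 'newly_named_batch': '1'}]
```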

## Subset metadata

To subset the metadata, use the `.subset()` method. It expects a column name (`column`) and a list of entries in that column (`values`); only the samples whose entry is in that list are kept. As the output shows, the subset is applied in place.

In [9]:
metadata.subset(column = "file_name", values = ["3742.fcs", "4337.fcs"])
metadata.to_df()

|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
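
The subset logic can be sketched the same way: keep only the rows whose entry in `column` is in `values`, and modify the original list instead of returning a new one. This is a hypothetical stdlib stand-in for illustration. Wrapping a single string into a list is an extra guard we added, since iterating over a bare string would otherwise yield its characters.

```python
def subset(rows, column, values):
    """Keep only rows whose entry in `column` is in `values`; modifies `rows` in place."""
    if isinstance(values, str):
        values = [values]  # a bare string would otherwise be split into characters
    keep = set(values)
    rows[:] = [row for row in rows if row[column] in keep]


rows = [
    {"sample_ID": "1", "file_name": "3742.fcs"},
    {"sample_ID": "2", "file_name": "4337.fcs"},
    {"sample_ID": "3", "file_name": "4449.fcs"},
]
subset(rows, column="file_name", values=["3742.fcs", "4337.fcs"])
print([row["file_name"] for row in rows])  # ['3742.fcs', '4337.fcs']
```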


## Add annotations to metadata

To add a new annotation to the metadata, use the `.annotate()` method. The first argument selects the samples, either by file name or by sample ID, passed as a list or as a single value (here, `file_names`). The second argument, `column`, names the column to be created, and the third, `value`, specifies the value to be added.

In [10]:
metadata.annotate(
    file_names = ["3742.fcs", "4337.fcs"],
    column = "new_col",
    value = "new_val"
)
metadata.dataframe.head()


|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | new_val |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | new_val |
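
A minimal sketch of the annotation step on row dictionaries. `annotate` here is a hypothetical stand-in that only handles file names. We assume that, as with pandas when a new column is assigned for only some rows, samples that are not listed receive a missing value (`None` here, `NaN` in pandas).

```python
def annotate(rows, file_names, column, value):
    """Write `value` into `column` for the listed files (hypothetical stand-in)."""
    if isinstance(file_names, str):
        file_names = [file_names]  # accept a single file name as well as a list
    targets = set(file_names)
    for row in rows:
        if row["file_name"] in targets:
            row[column] = value
        else:
            # Assumption: unlisted samples get a missing value (NaN in pandas),
            # while an already existing entry is left untouched.
            row.setdefault(column, None)


rows = [
    {"sample_ID": "1", "file_name": "3742.fcs"},
    {"sample_ID": "2", "file_name": "4337.fcs"},
    {"sample_ID": "3", "file_name": "4449.fcs"},
]
annotate(rows, file_names=["3742.fcs", "4337.fcs"], column="new_col", value="new_val")
print([row["new_col"] for row in rows])  # ['new_val', 'new_val', None]
```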


## Rename values

Entries can be modified using pandas notation or with the convenience method `.rename_values()`. Passing a single value renames every entry in the column; in the next example, every entry of 'new_col' is renamed to 'renamed_val'.

In [11]:
metadata.rename_values("new_col", "renamed_val")
metadata.dataframe.head()

|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | renamed_val |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | renamed_val |


To rename the entries one by one, pass a list with one new value per entry:

In [12]:
metadata.rename_values("new_col", ["renamed_val1", "renamed_val2"])
metadata.dataframe.head()

|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | renamed_val1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | renamed_val2 |


Finally, we can also pass a dictionary whose keys are the old values and whose values are the new ones.

In [13]:
metadata.rename_values("new_col", {"renamed_val1": "final_var1",
                                   "renamed_val2": "final_var2"})
metadata.dataframe.head()

|   | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | final_var1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | final_var2 |
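
The three accepted forms (a single value, a list, a dictionary) can be summarised in one sketch. `rename_values` below is a hypothetical stdlib stand-in. Two details are assumptions rather than documented FACSPy behaviour: a list must contain exactly one value per row, and dictionary lookups leave values without a matching key unchanged.

```python
def rename_values(rows, column, replacement):
    """Rename entries of `column` (hypothetical stand-in for the three forms above)."""
    if isinstance(replacement, dict):
        # Old value -> new value; values without a key are kept (assumption).
        for row in rows:
            row[column] = replacement.get(row[column], row[column])
    elif isinstance(replacement, list):
        # One new value per row, in row order (assumption).
        if len(replacement) != len(rows):
            raise ValueError("the list needs exactly one value per row")
        for row, new_value in zip(rows, replacement):
            row[column] = new_value
    else:
        # A single value overwrites every entry of the column.
        for row in rows:
            row[column] = replacement


rows = [{"new_col": "new_val"}, {"new_col": "new_val"}]
rename_values(rows, "new_col", "renamed_val")
rename_values(rows, "new_col", ["renamed_val1", "renamed_val2"])
rename_values(rows, "new_col", {"renamed_val1": "final_var1", "renamed_val2": "final_var2"})
print([row["new_col"] for row in rows])  # ['final_var1', 'final_var2']
```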


## Write metadata to the hard drive

To write the metadata table to disk, use the `.write()` method and pass the file path, including the file name.

In [14]:
metadata.write("../Tutorials/spectral_dataset/vignette_metadata.csv")
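
This tutorial does not state which delimiter `.write()` uses. If you need to control the output format yourself, for example to match the semicolon-separated input file, the standard `csv` module is sufficient. A sketch with a hypothetical `write_metadata` helper, writing to an in-memory buffer and reading it back:

```python
import csv
import io


def write_metadata(rows, columns, handle, delimiter=";"):
    """Write row dictionaries as a delimited table (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)


columns = ["sample_ID", "file_name", "staining"]
rows = [
    {"sample_ID": "1", "file_name": "3742.fcs", "staining": "stained"},
    {"sample_ID": "2", "file_name": "4337.fcs", "staining": "stained"},
]

buffer = io.StringIO()
write_metadata(rows, columns, buffer)

# Read the text back to confirm the round trip.
round_trip = list(csv.DictReader(io.StringIO(buffer.getvalue()), delimiter=";"))
print(round_trip == rows)  # True
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends, e.g. `with open("metadata_out.csv", "w", newline="") as handle:`.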