# Tutorial on MultiPhenotypeObject Functionality

In [1]:
import os
import sys
import numpy as np
import pandas as pd

# Add the parent directory to the import path so that snputils can be imported
root_dir = os.path.abspath('../')
if root_dir not in sys.path:
    sys.path.append(root_dir)

from snputils.phenotype.io.read import MultiPhenTabularReader

### 1. Read a TSV/MAP File into a MultiPhenotypeObject

`MultiPhenTabularReader` loads a tabular phenotype file (for example, a TSV or MAP file) into a MultiPhenotypeObject, which stores the phenotype data in a DataFrame accessible as `phen_df`.

In [2]:
# Path to the phenotype file
path = '../data/samples_pops.tsv'

# Read the file into a MultiPhenotypeObject with specified delimiter, no header, and a phenotype name
phenobj = MultiPhenTabularReader(path).read(sep='\t', header=None, phen_names=['ancestry'])

# Display the DataFrame containing phenotype data
phenobj.phen_df

|   | samples | ancestry |
|---|---------|----------|
| 0 | HG00096 | EUR      |
| 1 | HG00097 | EUR      |
| 2 | HG00099 | AFR      |
| 3 | HG00100 | AFR      |
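
For orientation, the read step above is roughly comparable to loading a headerless, tab-separated file with pandas and naming the columns yourself. The sketch below uses pandas only, with inline text standing in for `samples_pops.tsv`. The two-column layout and the `samples` column name are taken from the output above. It is an illustration, not a description of how `MultiPhenTabularReader` works internally.

```python
import io
import pandas as pd

# Inline stand-in for a headerless TSV file: one sample per row,
# sample ID first, phenotype value second (assumed layout).
tsv_text = "HG00096\tEUR\nHG00097\tEUR\nHG00099\tAFR\nHG00100\tAFR\n"

# header=None tells pandas that the first row is data, not column names;
# names= supplies the column labels explicitly.
phen_df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    header=None,
    names=["samples", "ancestry"],
)
print(phen_df)
```

Passing `header=None` matters here: without it, pandas would treat the first sample row as column names and silently drop it from the data.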


### 2. Filter MultiPhenotypeObject by Samples

The `filter_samples()` method filters the phenotype data by sample name (`samples`) or by sample index (`indexes`). By default the listed samples are kept; pass `include=False` to exclude them instead.

#### 2.1. Filter by Sample Names

Include specific samples by their names.

In [3]:
phenobj.filter_samples(samples=['HG00096', 'HG00097']).phen_df

|   | samples | ancestry |
|---|---------|----------|
| 0 | HG00096 | EUR      |
| 1 | HG00097 | EUR      |
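
To make the name-based selection concrete, here is a minimal pandas-only sketch of the same idea on a plain DataFrame. It assumes a `samples` column like the one shown above and uses a boolean mask built with `Series.isin`. It is not the `filter_samples()` implementation, only an illustration of the result it produces.

```python
import pandas as pd

# Small DataFrame mirroring the phenotype table shown earlier.
phen_df = pd.DataFrame({
    "samples": ["HG00096", "HG00097", "HG00099", "HG00100"],
    "ancestry": ["EUR", "EUR", "AFR", "AFR"],
})

keep = ["HG00096", "HG00097"]

# Boolean mask: True for rows whose sample name is in `keep`.
mask = phen_df["samples"].isin(keep)

# Keep the matching rows and renumber the row index from zero.
included = phen_df[mask].reset_index(drop=True)
print(included)
```

Negating the mask (`phen_df[~mask]`) would give the complementary selection, which is the name-based analogue of excluding samples.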


#### 2.2. Filter by Sample Indexes

Exclude specific samples by their zero-based positions in the data by passing `indexes` together with `include=False`.

In [4]:
filtered_phen_df_exclude = phenobj.filter_samples(indexes=[0, 3], include=False).phen_df
filtered_phen_df_exclude

|   | samples | ancestry |
|---|---------|----------|
| 0 | HG00097 | EUR      |
| 1 | HG00099 | AFR      |
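
The exclusion above can be reproduced on a plain DataFrame as a rough sketch. The output suggests that the remaining rows are renumbered from zero, so the example mimics that with `reset_index(drop=True)`. This is an assumption about the observed behavior, not a statement about how `filter_samples()` is written.

```python
import pandas as pd

# Small DataFrame mirroring the phenotype table shown earlier.
phen_df = pd.DataFrame({
    "samples": ["HG00096", "HG00097", "HG00099", "HG00100"],
    "ancestry": ["EUR", "EUR", "AFR", "AFR"],
})

exclude_positions = [0, 3]

# Translate positions into index labels, drop those rows,
# then renumber the remaining rows from zero.
remaining = (
    phen_df
    .drop(phen_df.index[exclude_positions])
    .reset_index(drop=True)
)
print(remaining)
```

Looking up labels through `phen_df.index[...]` keeps the example positional even when the DataFrame's index is not a simple 0..n-1 range.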
