## Add callable attribute
Because an attribute can be any Python object, we can store functions as attributes, which makes it syntactically simple to ask complex questions of a proteome.

In this example, we read in a proteome, define two lambda functions, and assign each one to the proteome as an attribute. This means the proteome effectively gains new functions that we can call with our own arguments!
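To make the mechanism concrete, here is a minimal sketch of an object whose attributes live in a plain dictionary. `AttributeHolder` is a hypothetical stand-in written for illustration, not SHEPHARD's implementation; it only mirrors the `add_attribute` / `attribute` call pattern used in this example.

```python
class AttributeHolder:
    """Hypothetical stand-in for an object with a name -> value attribute store."""

    def __init__(self):
        # attributes are kept in a plain dict, so any Python object can be a value
        self._attributes = {}

    def add_attribute(self, name, val):
        # store the value (which may be a function) under the given name
        self._attributes[name] = val

    def attribute(self, name):
        # return the stored value unchanged; callers may then call it
        return self._attributes[name]


holder = AttributeHolder()

# store a plain value and a callable side by side
holder.add_attribute('species', 'human')
holder.add_attribute('double', lambda x: 2 * x)

# a callable attribute is retrieved first, then called with an argument
result = holder.attribute('double')(21)
print(result)  # 42
```

Note the two-step call: `attribute(...)` returns the function object itself, and the trailing `(21)` invokes it.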



In [1]:
import numpy as np
from shephard.apis import uniprot

In [2]:
# read in the proteome
# define file 
in_file = '../shprd_data/human_proteome_validated.fasta'

# Build proteome 
proteome = uniprot.uniprot_fasta_to_proteome(in_file)


In [3]:
# determine the length of all proteins in human proteome
protein_lengths = []

# for each protein in the human proteome
for p in proteome:
    
    # append the protein's length to the protein_lengths list
    protein_lengths.append(len(p))

# convert to a NumPy array so that 'protein_lengths < a' (used below) is evaluated element-wise
protein_lengths = np.array(protein_lengths)
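The conversion to a NumPy array matters because comparison operators behave differently on arrays and on lists. The sketch below uses made-up lengths (not values from the human proteome) to show that `<` on an array is applied element-wise, whereas on a plain list it raises a `TypeError`. The loop above could equivalently be written as `np.array([len(p) for p in proteome])`.

```python
import numpy as np

lengths_list = [120, 350, 80, 1500]
lengths_array = np.array(lengths_list)

# with an array, '<' is applied element-wise and returns a boolean array
mask = lengths_array < 200
print(mask)  # [ True False  True False]

# True/False count as 1/0, so the mean of the mask is the fraction below the cutoff
fraction_below = mask.mean()
print(fraction_below)  # 0.5

# a plain list cannot be compared to an int in Python 3
try:
    lengths_list < 200
except TypeError as err:
    print("list comparison fails:", err)
```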
    

In [4]:
# define some custom functions related to the proteome
# lambda functions are a convenient way of defining simple functions inline;
# the general syntax is
#
# lambda <argument>: <expression that uses <argument>>


# here we define a lambda function that takes a percentile (i.e. a value between 0 and 100)
# and returns the protein length at that percentile of the human proteome's length
# distribution (np.percentile interpolates between values; int() truncates the result)
proteome.add_attribute('get_length_from_percentile', lambda a: int(np.percentile(protein_lengths, a)))

# here we do the opposite of the above: define a function that takes a protein length and
# returns the percentage of proteins strictly shorter than it (i.e. a value between 0 and 100)
proteome.add_attribute('get_percentile_from_length', lambda a: 100 * np.mean(protein_lengths < a))
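For readers new to lambdas, the short sketch below shows that a lambda expression is just an anonymous function whose body is a single expression. `square` and `square_def` are illustrative names only.

```python
# a lambda expression...
square = lambda x: x * x

# ...behaves the same as this named function definition
def square_def(x):
    return x * x

print(square(7), square_def(7))  # 49 49

# a lambda can also be called inline, without ever being given a name
print((lambda a: a + 1)(41))  # 42
```

A lambda cannot contain statements (loops, assignments, `return`), which is why it suits one-line calculations like the percentile lookups here.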

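To spell out what the two attributes compute, here is a pure-Python sketch on made-up toy lengths. `length_at_percentile` assumes NumPy's default `'linear'` method for `np.percentile` (fractional position `(n - 1) * q / 100` in the sorted data, interpolated between neighbours), and `percentile_of_length` counts values strictly below the cutoff, as `protein_lengths < a` does. Both helper names are hypothetical.

```python
import math

def length_at_percentile(lengths, q):
    """Sketch of int(np.percentile(lengths, q)) with linear interpolation."""
    data = sorted(lengths)
    # fractional position of the q-th percentile in the sorted data
    h = (len(data) - 1) * q / 100
    lo = math.floor(h)
    hi = min(lo + 1, len(data) - 1)
    value = data[lo] + (h - lo) * (data[hi] - data[lo])
    # int() truncates, as in the original lambda
    return int(value)

def percentile_of_length(lengths, a):
    """Percentage of lengths strictly shorter than a."""
    return 100 * sum(1 for x in lengths if x < a) / len(lengths)

toy_lengths = [50, 100, 200, 400, 800]
print(length_at_percentile(toy_lengths, 50))   # 200
print(length_at_percentile(toy_lengths, 20))   # 90
print(percentile_of_length(toy_lengths, 200))  # 40.0
```

Because of the strict `<` and the truncating `int()`, the two attributes need not be exact inverses of each other on real data.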

In [5]:
# now let's say we are doing analysis and want to know which percentile a given sequence length falls at
length_example = 100
p_out = proteome.attribute('get_percentile_from_length')(length_example)
print("Seq. Length of %i in the %.1f percentile" % (length_example, p_out))

# here we ask what length threshold corresponds to a given percentile of the data
percentile_example = 20
l_out = proteome.attribute('get_length_from_percentile')(percentile_example)
print("Seq. Length threshold for the %ith percentile: %i" % (percentile_example, l_out))

Seq. Length of 100 in the 3.6 percentile
Seq. Length threshold for the 20th percentile: 214
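One design point worth knowing: both lambdas look up the name `protein_lengths` each time they are called, so reassigning that variable later in the notebook silently changes what the attributes return. If you would rather freeze the data when the function is created, one option (a sketch, not part of SHEPHARD) is a small factory that captures a sorted copy. `make_percentile_of_length` is a hypothetical helper; `bisect.bisect_left` on sorted data returns the count of values strictly below `a`, matching the original `<` comparison.

```python
import bisect

def make_percentile_of_length(lengths):
    """Return a function bound to a sorted copy of `lengths` (hypothetical helper)."""
    # sorting once lets each call use a binary search instead of a full scan
    data = sorted(lengths)
    n = len(data)

    def percentile_of_length(a):
        # bisect_left gives the number of values strictly less than a
        return 100 * bisect.bisect_left(data, a) / n

    return percentile_of_length

lengths = [50, 100, 200, 400, 800]
frozen = make_percentile_of_length(lengths)
late = lambda a: 100 * sum(x < a for x in lengths) / len(lengths)

print(frozen(200), late(200))  # 40.0 40.0

# rebinding the name changes the lambda's answer but not the frozen function's
lengths = [1, 2, 3]
print(frozen(200), late(200))  # 40.0 100.0
```

The returned function could be passed to `add_attribute` in place of the lambda.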
