# Testing Inference Functions

The goal of this notebook is to test out functions in the inference.py file. These functions are all related to network inference -- that is, the challenge of trying to infer the structure of a gene regulatory network (GRN) from gene expression data.

If, like me, you prefer not to work in Jupyter notebooks, the same code is also available in the testing_inference.py file. I'm only making a notebook out of it because I think it makes the tests easier to explain.

In [5]:
import numpy as np
import inference

Let's start off by making some fake data to work with. We'll generate some random numbers that will represent data for 20 genes, across 100 samples.

Importantly, the data matrix will be in the shape [genes,samples]. All expression matrices in this project will take that shape.

In [6]:
num_genes = 20
num_samps = 100

## gene and sample labels: G1..G20 and sample1..sample100
genes = [f"G{i}" for i in range(1, num_genes + 1)]
samps = [f"sample{i}" for i in range(1, num_samps + 1)]


## expression data, rows are genes and columns are samples
data = np.random.rand(num_genes,num_samps)
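Since everything in this project assumes the [genes,samples] orientation, a quick shape check before running any inference can catch a transposed matrix. Here is a minimal sketch of such a check; the helper name `check_orientation` is my own invention for illustration and is not part of inference.py.

```python
import numpy as np

num_genes = 20
num_samps = 100
genes = [f"G{i}" for i in range(1, num_genes + 1)]
samps = [f"sample{i}" for i in range(1, num_samps + 1)]
data = np.random.rand(num_genes, num_samps)


def check_orientation(data, genes, samps):
    """Hypothetical helper (not in inference.py): return data as [genes,samples]."""
    if data.shape == (len(genes), len(samps)):
        return data
    if data.shape == (len(samps), len(genes)):
        # the matrix came in as [samples,genes], so flip it
        return data.T
    raise ValueError(f"unexpected shape {data.shape}")


data = check_orientation(data, genes, samps)

# each row holds one gene's expression across all samples
g5 = data[genes.index("G5")]
print(g5.shape)
```

Note that this check cannot tell the two orientations apart when the number of genes equals the number of samples, so in that case you'd have to confirm the orientation by hand.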

### Inference Methods: General Format

Each inference function takes in the data matrix with shape [genes,samples], and returns a matrix with shape [genes,genes]. In this output matrix, which I'll call "edges", the number at index [i,j] corresponds to the relationship between genes[i] and genes[j].

For example, in the code below, we pass our data into the correlation() function, and get an output matrix called "edges". If we want to know the correlation between the i-th gene and the j-th gene in the "genes" list, we can find this at edges[i,j].
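To make the indexing concrete, here is a small sketch of looking up an edge by gene name rather than by position. So that it runs on its own, it uses `np.corrcoef` as a stand-in for `inference.correlation`; that substitution is an assumption on my part, and the real function may differ in its details.

```python
import numpy as np

genes = [f"G{i}" for i in range(1, 21)]
data = np.random.rand(20, 100)

# stand-in for inference.correlation(data) (assumption):
# np.corrcoef treats each row as one variable, so the result is [genes,genes]
edges = np.corrcoef(data)

# translate gene names into row/column positions
i = genes.index("G3")
j = genes.index("G7")
print(f"{genes[i]} vs {genes[j]}: {edges[i, j]:.3f}")
```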

This general setup holds for every inference method in this project. You will always have the option to simply call, say, edges = inference.example_function(data), and this will yield an edges matrix.

Any additional options for these functions will be included as parameters that have some default value. For example, the correlation function has a default parameter signed=1. If you prefer to get an unsigned correlation matrix, you can set this parameter to signed=0. Similarly, setting get_p_value=True makes the function return a matrix of p-values alongside the edges matrix.

In [14]:
edges = inference.correlation(data)
# print(edges)
print(edges.shape)

edges = inference.correlation(data,signed=0)
# print(edges)
print(edges.shape)

edges, p_values = inference.correlation(data, get_p_value=True)
# print(p_values)
print(p_values.shape)

(20, 20)
(20, 20)
(20, 20)
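If you're curious what the signed option might look like under the hood, here is a rough sketch. It is not the code from inference.py: the function name `correlation_sketch` is hypothetical, and I'm assuming that "unsigned" simply means taking the absolute value of the Pearson correlation.

```python
import numpy as np


def correlation_sketch(data, signed=1):
    """Hypothetical stand-in for inference.correlation, not the project's code."""
    # rows are genes, so np.corrcoef returns a [genes,genes] matrix
    edges = np.corrcoef(data)
    if not signed:
        # ASSUMPTION: unsigned means the absolute value of the correlation
        edges = np.abs(edges)
    return edges


data = np.random.rand(20, 100)
signed_edges = correlation_sketch(data)
unsigned_edges = correlation_sketch(data, signed=0)

print(signed_edges.shape)
print(unsigned_edges.min() >= 0)
```

Putting the option behind a default value keeps the plain call `correlation_sketch(data)` working, which matches the "general format" described above.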


### Methods List

Right now, the following methods are available:

Pearson correlation -- correlation()

Linear regression -- linear_regression()

Example code for each is given in the following cell.



In [15]:
## CORRELATION
edges = inference.correlation(data)
# print(edges)
print(edges.shape)


## LINEAR REGRESSION
edges = inference.linear_regression(data)
# print(edges)
print(edges.shape)

(20, 20)
(20, 20)
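For intuition, here is one way a regression-based edges matrix could be built. This is purely an illustration under my own assumptions, and inference.linear_regression may well work differently: each gene is regressed on all the other genes with ordinary least squares (plus an intercept), and edges[i,j] is the coefficient of gene j when predicting gene i. The name `linear_regression_sketch` is hypothetical.

```python
import numpy as np


def linear_regression_sketch(data):
    """Hypothetical illustration only; inference.linear_regression may differ."""
    num_genes, num_samps = data.shape
    edges = np.zeros((num_genes, num_genes))
    for i in range(num_genes):
        # every gene except the target acts as a predictor
        others = [j for j in range(num_genes) if j != i]
        # design matrix: one row per sample, one column per predictor, plus intercept
        X = np.column_stack([data[others].T, np.ones(num_samps)])
        coefs, *_ = np.linalg.lstsq(X, data[i], rcond=None)
        # keep the gene coefficients, drop the intercept; the diagonal stays 0
        edges[i, others] = coefs[:-1]
    return edges


data = np.random.rand(20, 100)
edges = linear_regression_sketch(data)
print(edges.shape)
```

Unlike a correlation matrix, a matrix built this way is generally not symmetric, since predicting gene i from gene j is a different fit than predicting gene j from gene i.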


If there are any other particular methods you'd like to see, let me know and I'll try to add them!