# Pipeline Code
Pipeline for inferring ecological interactions between microbes in tree hole communities. An explanation of how the different steps of the pipeline work can be found below the pipeline, in the 'Pipeline Explanation' section.

Pipeline steps:
1. Importing tree hole data
2. Applying FlashWeave to infer co-occurrence networks
3. Applying functionInk to infer communities from co-occurrence networks
4. Visualising communities using Cytoscape
5. Comparing co-occurrence network motifs

## 1 - Importing tree hole data
Importing the raw data, and converting it to the correct input format for FlashWeave.
The code below is written in the Python kernel.

In [None]:
# Importing pandas
import pandas as pd

# Importing data
asv_table = pd.read_csv('../data/seqtable_readyforanalysis.csv', sep='\t') # Importing ASV table
metadata = pd.read_csv('../data/metadata_Time0D-7D-4M_May2022_wJSDpart_ext.csv', sep='\t') # Importing metadata
taxonomy_data = pd.read_csv('../data/taxa_wsp_readyforanalysis.csv', sep='\t') # Importing taxonomy

##### Cleaning the data, and giving it the correct format. #####

# Getting rid of the samples belonging to experiment 4M:
asv_table.reset_index(inplace=True) # Making the sample ID into a column for the ASV table
asv_table.rename(columns={'index': 'sampleid'}, inplace=True) # renaming this new column to 'sampleid'
asv_table = pd.merge(asv_table, metadata, on='sampleid') # Merging metadata and asv_table by 'sampleid' so that the 4M samples can be removed
asv_table = asv_table[asv_table['Experiment'] != '4M'] # Taking rows which are not 4M samples

# Splitting the ASV table into starting and final communities
starting_asv_table = asv_table[asv_table['Experiment'] == '0D']
final_asv_table = asv_table[asv_table['Experiment'] != '0D']

# Splitting the metadata and ASV tables
starting_metadata = starting_asv_table[['partition']] # The metadata only contains the column for the community classes
final_metadata = final_asv_table[['partition']]
columns_to_drop = ['sampleid', 'Name.2', 'Community', 'Species', 'replicate',
       'BreakingBag', 'parent', 'Location', 'Experiment', 'Part_Time0D_17',
       'Part_Time0D_6', 'Part_Time4M_64', 'Part_Time7D_rep1_2',
       'Part_Time7D_rep2_2', 'Part_Time7D_rep3_2', 'Part_Time7D_rep4_2',
       'replicate.partition', 'partition', 'ExpCompact',
       'exp.replicate.partition', 'exp.partition'] # Columns to drop for ASV tables
starting_asv_table = starting_asv_table.drop(columns=columns_to_drop) # The ASV tables are now in the correct input format for FlashWeave
final_asv_table = final_asv_table.drop(columns=columns_to_drop)

# Exporting to .tsv files, which can then be inputted to FlashWeave
starting_metadata.to_csv('../data/starting_metadata.tsv', sep='\t', index=False)
final_metadata.to_csv('../data/final_metadata.tsv', sep='\t', index=False)
starting_asv_table.to_csv('../data/starting_asv_table.tsv', sep='\t', index=False)
final_asv_table.to_csv('../data/final_asv_table.tsv', sep='\t', index=False)


## 2 - Applying FlashWeave
Applying heterogeneous FlashWeave to the starting communities and the final communities with community classes as a factor, and applying homogeneous FlashWeave to the starting communities and the final communities, ignoring community classes. This creates 4 networks in total. The code below is written in the Julia kernel.

In [None]:
##### Applying FlashWeave #####
using FlashWeave # Loading FlashWeave once for the whole cell

# Co-occurrence network for starting communities, ignoring classes
starting_data_path = "../data/starting_asv_table.tsv"
starting_netw_results = learn_network(starting_data_path, sensitive=true, heterogeneous=false)

# Co-occurrence network for final communities, ignoring classes
final_data_path = "../data/final_asv_table.tsv"
final_netw_results = learn_network(final_data_path, sensitive=true, heterogeneous=false)

# Co-occurrence network for starting communities, taking into account classes
starting_classes_data_path = "../data/starting_asv_table.tsv"
starting_classes_meta_data_path = "../data/starting_metadata.tsv"
starting_classes_netw_results = learn_network(starting_classes_data_path, starting_classes_meta_data_path, sensitive=true, heterogeneous=true)

# Co-occurrence network for final communities, taking into account classes
final_classes_data_path = "../data/final_asv_table.tsv"
final_classes_meta_data_path = "../data/final_metadata.tsv"
final_classes_netw_results = learn_network(final_classes_data_path, final_classes_meta_data_path, sensitive=true, heterogeneous=true)

# Saving the networks
save_network("../data/starting_network_output.edgelist", starting_netw_results)
save_network("../data/starting_classes_network_output.edgelist", starting_classes_netw_results)
save_network("../data/final_network_output.edgelist", final_netw_results)
save_network("../data/final_classes_network_output.edgelist", final_classes_netw_results)

## 3 - Applying functionInk
First, converting the FlashWeave outputs into the correct input format for functionInk. The below code is written in the Python kernel.

In [None]:
##### Converting the FlashWeave outputs into the correct input format for functionInk #####

import pandas as pd # Importing Pandas again, as switched back to Python kernel

# Removing the two header lines from each FlashWeave output
# (run this once only: re-running it strips two more lines from each file)
network_names = ['starting', 'starting_classes', 'final', 'final_classes']
for name in network_names:
    edgelist_path = f'../data/{name}_network_output.edgelist'
    with open(edgelist_path, 'r') as f:
        lines = f.readlines()
    with open(edgelist_path, 'w') as f:
        f.writelines(lines[2:])

# Adding new headers and a column for the type of interaction (here, all assumed to be the same)
starting_network_data = pd.read_csv("../data/starting_network_output.edgelist", sep="\t", header=None, names=["ASV_A", "ASV_B", "Interaction"])
starting_network_data['Type'] = 1

starting_classes_network_data = pd.read_csv("../data/starting_classes_network_output.edgelist", sep="\t", header=None, names=["ASV_A", "ASV_B", "Interaction"])
starting_classes_network_data['Type'] = 1

final_network_data = pd.read_csv("../data/final_network_output.edgelist", sep="\t", header=None, names=["ASV_A", "ASV_B", "Interaction"])
final_network_data['Type'] = 1

final_classes_network_data = pd.read_csv("../data/final_classes_network_output.edgelist", sep="\t", header=None, names=["ASV_A", "ASV_B", "Interaction"])
final_classes_network_data['Type'] = 1

# Outputting as .tsv files
starting_network_data.to_csv('../data/starting_network_data.tsv', sep='\t', index=False, header=['#ASV_A', 'ASV_B', 'Interaction', 'Type'])
starting_classes_network_data.to_csv('../data/starting_classes_network_data.tsv', sep='\t', index=False, header=['#ASV_A', 'ASV_B', 'Interaction', 'Type'])
final_network_data.to_csv('../data/final_network_data.tsv', sep='\t', index=False, header=['#ASV_A', 'ASV_B', 'Interaction', 'Type'])
final_classes_network_data.to_csv('../data/final_classes_network_data.tsv', sep='\t', index=False, header=['#ASV_A', 'ASV_B', 'Interaction', 'Type'])
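
The header removal above drops a fixed number of lines in place, so running the cell twice would also delete the first two edges. A minimal sketch of a re-runnable alternative is given below. It assumes that the header lines written by FlashWeave begin with `#` (an assumption that should be checked against the actual `.edgelist` files); the function name `strip_comment_header` and the toy file contents are hypothetical.

```python
import os
import tempfile

def strip_comment_header(path, comment_prefix="#"):
    """Rewrite `path` without its leading comment lines; return how many were removed."""
    with open(path, "r") as f:
        lines = f.readlines()
    # Count only the leading lines that start with the comment prefix
    n_header = 0
    while n_header < len(lines) and lines[n_header].startswith(comment_prefix):
        n_header += 1
    with open(path, "w") as f:
        f.writelines(lines[n_header:])
    return n_header

# Toy demonstration on a temporary file (header text and edge are made up)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.edgelist")
with open(demo_path, "w") as f:
    f.write("# header line 1\n# header line 2\nASV_1\tASV_2\t0.5\n")

first_pass = strip_comment_header(demo_path)   # removes the two header lines
second_pass = strip_comment_header(demo_path)  # nothing left to remove
with open(demo_path) as f:
    remaining = f.read()
print(first_pass, second_pass, repr(remaining))
```

Because only leading comment lines are removed, a second call leaves the edge data untouched.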

Applying functionInk to each of the 4 networks. The below code is written in the Python kernel.

In [None]:
##### Applying functionInk #####

import os # Importing os package
os.chdir('../code/functionInk') # Moving from the directory in which this notebook is found into the root of the functionInk repository

# The first step to the pipeline - computing similarities between nodes
!./NodeSimilarity.pl -w 1 -d 0 -t 0 -f ../../data/starting_network_data.tsv
!./NodeSimilarity.pl -w 1 -d 0 -t 0 -f ../../data/starting_classes_network_data.tsv
!./NodeSimilarity.pl -w 1 -d 0 -t 0 -f ../../data/final_network_data.tsv
!./NodeSimilarity.pl -w 1 -d 0 -t 0 -f ../../data/final_classes_network_data.tsv

# The second step - clustering nodes using the similarity metrics calculated
!./NodeLinkage.pl -fn ../../data/starting_network_data.tsv -fs Nodes-Similarities_starting_network_data.tsv
!./NodeLinkage.pl -fn ../../data/starting_classes_network_data.tsv -fs Nodes-Similarities_starting_classes_network_data.tsv
!./NodeLinkage.pl -fn ../../data/final_network_data.tsv -fs Nodes-Similarities_final_network_data.tsv
!./NodeLinkage.pl -fn ../../data/final_classes_network_data.tsv -fs Nodes-Similarities_final_classes_network_data.tsv

Extracting the partition densities. Please switch to the R kernel to run the code below.

In [None]:
##### Sourcing function that extracts partition densities #####
library(ggplot2) # Loading ggplot
source("functionInk/scripts/analysis_R/extractPartDensity.R") # Sourcing the function that extracts the partition densities
setwd("functionInk") # Moving to the functionInk repository

In [None]:
##### Extracting partition densities #####
# Importing the partition histories and cleaning them
hist_comp_starting=read.table(file="HistCompact-NL_Average_NoStop_starting_network_data.tsv")
colnames(hist_comp_starting) <- as.character(unlist(hist_comp_starting[1, ]))
hist_comp_starting <- hist_comp_starting[-1, ]
columns_to_convert <- c("Step", "Similarity", "Density", "DensityInt", "DensityExt", "NumNodesA", "NumEdgesA", 
                       "NumNodesB", "NumEdgesB", "NumNodesAB", "NumEdgesAB", "NumIntNodesA", "NumIntNodesB",
                       "NumExtNodesA", "NumExtNodesB", "NumIntNodesAB", "NumExtNodesAB", "NumIntEdgesA",
                       "NumIntEdgesB", "NumExtEdgesA", "NumExtEdgesB", "NumIntEdgesAB", "NumExtEdgesAB",
                       "NcumInt", "NcumExt", "Ncum")
hist_comp_starting[columns_to_convert] <- lapply(hist_comp_starting[columns_to_convert], as.numeric)

hist_comp_starting_classes=read.table(file="HistCompact-NL_Average_NoStop_starting_classes_network_data.tsv")
colnames(hist_comp_starting_classes) <- as.character(unlist(hist_comp_starting_classes[1, ]))
hist_comp_starting_classes <- hist_comp_starting_classes[-1, ]
hist_comp_starting_classes[columns_to_convert] <- lapply(hist_comp_starting_classes[columns_to_convert], as.numeric)

hist_comp_final=read.table(file="HistCompact-NL_Average_NoStop_final_network_data.tsv")
colnames(hist_comp_final) <- as.character(unlist(hist_comp_final[1, ]))
hist_comp_final <- hist_comp_final[-1, ]
hist_comp_final[columns_to_convert] <- lapply(hist_comp_final[columns_to_convert], as.numeric)

hist_comp_final_classes=read.table(file="HistCompact-NL_Average_NoStop_final_classes_network_data.tsv")
colnames(hist_comp_final_classes) <- as.character(unlist(hist_comp_final_classes[1, ]))
hist_comp_final_classes <- hist_comp_final_classes[-1, ]
hist_comp_final_classes[columns_to_convert] <- lapply(hist_comp_final_classes[columns_to_convert], as.numeric)


# Calculating partition densities, plotting them, and moving the plot into results
print("Starting network partition densities:")
part_density_starting=extractPartDensity(hist.comp=hist_comp_starting, plot = TRUE)
system(paste("mv", "figures/Plot_PartitionDensityVsStep.pdf", "../../results/starting_Plot_PartitionDensityVsStep.pdf"))
print("Step of the clustering in which the maximum of the total partition density was found: ")
part_density_starting$total_dens_step
print("Step of the clustering in which the maximum of the internal partition density was found: ")
part_density_starting$int_dens_step
print("Step of the clustering in which the maximum of the external partition density was found: ")
part_density_starting$ext_dens_step

print("Starting classes network partition densities:")
part_density_starting_classes=extractPartDensity(hist.comp=hist_comp_starting_classes, plot = TRUE)
system(paste("mv", "figures/Plot_PartitionDensityVsStep.pdf", "../../results/starting_classes_Plot_PartitionDensityVsStep.pdf"))
print("Step of the clustering in which the maximum of the total partition density was found: ")
part_density_starting_classes$total_dens_step
print("Step of the clustering in which the maximum of the internal partition density was found: ")
part_density_starting_classes$int_dens_step
print("Step of the clustering in which the maximum of the external partition density was found: ")
part_density_starting_classes$ext_dens_step

print("Final network partition densities:")
part_density_final=extractPartDensity(hist.comp=hist_comp_final, plot = TRUE)
system(paste("mv", "figures/Plot_PartitionDensityVsStep.pdf", "../../results/final_Plot_PartitionDensityVsStep.pdf"))
print("Step of the clustering in which the maximum of the total partition density was found: ")
part_density_final$total_dens_step
print("Step of the clustering in which the maximum of the internal partition density was found: ")
part_density_final$int_dens_step
print("Step of the clustering in which the maximum of the external partition density was found: ")
part_density_final$ext_dens_step

print("Final classes network partition densities:")
part_density_final_classes=extractPartDensity(hist.comp=hist_comp_final_classes, plot = TRUE)
system(paste("mv", "figures/Plot_PartitionDensityVsStep.pdf", "../../results/final_classes_Plot_PartitionDensityVsStep.pdf"))
print("Step of the clustering in which the maximum of the total partition density was found: ")
part_density_final_classes$total_dens_step
print("Step of the clustering in which the maximum of the internal partition density was found: ")
part_density_final_classes$int_dens_step
print("Step of the clustering in which the maximum of the external partition density was found: ")
part_density_final_classes$ext_dens_step
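
The steps reported above are what get passed to `NodeLinkage.pl` through `-v` in the next stage. As an illustration of the idea only, the sketch below finds the step at which a density column is largest in a clustering history table. It is not the implementation of `extractPartDensity`; it assumes a whitespace-delimited table whose first line holds the column names, including `Step` and `Density` as listed in `columns_to_convert`. The helper names and the toy values are hypothetical.

```python
def read_history(path):
    """Read a whitespace-delimited table whose first line holds the column names."""
    with open(path) as f:
        header = f.readline().split()
        return [dict(zip(header, line.split())) for line in f if line.strip()]

def step_of_max(rows, value_column, step_column="Step"):
    """Return the step at which `value_column` is largest (the first one on ties)."""
    best = max(rows, key=lambda row: float(row[value_column]))
    return int(float(best[step_column]))

# Toy rows standing in for a clustering history (values are invented)
toy_history = [
    {"Step": "1", "Density": "0.10"},
    {"Step": "2", "Density": "0.35"},
    {"Step": "3", "Density": "0.20"},
]
best_step = step_of_max(toy_history, "Density")
print(best_step)
```

With a real history file, `step_of_max(read_history(path), "Density")` would give a value to cross-check against the step printed by the R cell.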


Obtaining communities. Please switch to the Python kernel to run the code below.

In [None]:
import os # Importing os again, as switched back to Python kernel
os.chdir('functionInk') # Moving from the directory in which this notebook is found into the root of the functionInk repository

In [None]:
##### Running the clustering until the step at which the maximum total partition density is reached #####
!./NodeLinkage.pl -fn ../../data/starting_network_data.tsv -fs Nodes-Similarities_starting_network_data.tsv -s step -v 857
!./NodeLinkage.pl -fn ../../data/starting_classes_network_data.tsv -fs Nodes-Similarities_starting_classes_network_data.tsv -s step -v 214
!./NodeLinkage.pl -fn ../../data/final_network_data.tsv -fs Nodes-Similarities_final_network_data.tsv -s step -v 861
!./NodeLinkage.pl -fn ../../data/final_classes_network_data.tsv -fs Nodes-Similarities_final_classes_network_data.tsv -s step -v 328

Moving the output files of the functionInk pipeline from the functionInk directory into the results directory.

In [None]:
##### Moving the output files from the functionInk pipeline #####
# Moving every functionInk output file for each network to the results folder
destination_directory = '../../results'

# Step at which the clustering was stopped for each network (the -v values used above)
stop_steps = {
    'starting_network_data': 857,
    'starting_classes_network_data': 214,
    'final_network_data': 861,
    'final_classes_network_data': 328,
}

for network, stop_step in stop_steps.items():
    output_files = [
        f'Nodes-Similarities_{network}.tsv', # Node similarity file
        f'HistCompact-NL_Average_NoStop_{network}.tsv', # Compact node clustering history without the stop
        f'HistExtend-NL_Average_NoStop_{network}.tsv', # Extended node clustering history without the stop
        f'HistCompact-NL_Average_StopStep-{stop_step}_{network}.tsv', # Compact node clustering history with the stop
        f'HistExtend-NL_Average_StopStep-{stop_step}_{network}.tsv', # Extended node clustering history with the stop
        f'Clusters-NL_Average_StopStep-{stop_step}_{network}.tsv', # Cluster description file
        f'Partition-NL_Average_StopStep-{stop_step}_{network}.tsv', # File describing the cluster to which each node belongs
    ]
    for source_file in output_files:
        destination_path = os.path.join(destination_directory, os.path.basename(source_file))
        os.rename(source_file, destination_path)

## 4 - Visualisation in Cytoscape

Cytoscape visualisations of the four networks can be found in the results folder.

## 5 - Comparing co-occurrence network structure

There are different methods of comparing co-occurrence network structure:
- Compare whether a given pair of ASVs are found in the same cluster, between networks, for all ASV pairs
- Compare whether a given pair of ASVs has a similar correlation value between networks, for all ASV pairs

#### 5.1 Comparing whether ASVs have similar links between networks
Comparing the strength of links between ASVs across networks. In each network, every linked pair of ASVs has a correlation value (the correlation between the two ASVs in the pair). The aim is to test whether ASV pairs are correlated similarly in both networks. The code below is written in R.

In [None]:
# 
# Can alternatively use functionInk's node similarities
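
One possible way to carry out this comparison is sketched here in Python rather than R, purely as an illustration. It matches ASV pairs that have an edge in both networks, ignoring the order of the two ASVs, and computes the Pearson correlation between their edge weights. The edge lists are invented toy data, the helper names are hypothetical, and pairs that are missing from one network are simply skipped (treating them as weight 0 would be another defensible choice).

```python
from math import sqrt

def edge_weights(edges):
    """Map each unordered ASV pair to its edge weight."""
    return {tuple(sorted((asv_a, asv_b))): weight for asv_a, asv_b, weight in edges}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy edge lists in (ASV_A, ASV_B, Interaction) form; all values are invented
network_1 = [("ASV_1", "ASV_2", 0.8), ("ASV_2", "ASV_3", -0.4), ("ASV_1", "ASV_3", 0.1)]
network_2 = [("ASV_2", "ASV_1", 0.6), ("ASV_3", "ASV_2", -0.2),
             ("ASV_1", "ASV_3", 0.2), ("ASV_3", "ASV_4", 0.5)]

weights_1 = edge_weights(network_1)
weights_2 = edge_weights(network_2)

# Only pairs with an edge in both networks can be compared directly
shared_pairs = sorted(set(weights_1) & set(weights_2))
r = pearson([weights_1[p] for p in shared_pairs], [weights_2[p] for p in shared_pairs])
print(len(shared_pairs), round(r, 3))
```

A value of `r` close to 1 would indicate that shared ASV pairs are linked with similar strength in both networks.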


#### 5.2 Comparing clusters between networks
Comparing the clusters between the networks. The main way of comparing clusters seems to be to check whether a given pair of ASVs is found within the same cluster in each network. For example, if ASV_A and ASV_B are found in the same cluster in both the starting_classes network and the final_classes network, then this is a similarity between the networks. Perhaps later on it will be possible to identify larger motifs: for example, if ASV_A is connected to ASV_B and ASV_B is connected to ASV_C within both networks, then both networks share this larger motif.

Below, I use Cohen's Kappa to compare whether or not each pair of ASVs is found within the same cluster in each network. First, I construct a table recording, for each network, whether each ASV pair is in the same cluster.
The code below is written in R.

In [None]:

# Data frame that will contain a row for each ASV pair, and a column for each network.
# Each cell will indicate whether or not the ASV pair are found in the same cluster.
all_pair_cluster_data <- data.frame(matrix(nrow = 0, ncol = 5))
colnames(all_pair_cluster_data) <- c("ASV.pair", "starting.same.cluster", "starting.classes.same.cluster",
                                "final.same.cluster", "final.classes.same.cluster")

# Importing the cluster data for the starting network - this indicates which cluster each ASV is in within the starting network.
starting_cluster_data <- read.table("../results/Partition-NL_Average_StopStep-857_starting_network_data.tsv", sep = "\t", header = FALSE)

# Enumerating every unordered pair of ASVs once (A.B but not B.A).
# Each column of pair_indices holds the row numbers of one pair.
pair_indices <- combn(nrow(starting_cluster_data), 2)
first_asv <- pair_indices[1, ]
second_asv <- pair_indices[2, ]

# Data frame containing a row for each pair and a column for whether or not that pair is found within the same cluster in the starting network.
# Building it in one step avoids growing the data frame with rbind inside a loop.
starting_pair_cluster_data <- data.frame(
    ASV.pair = paste(starting_cluster_data[first_asv, 1], starting_cluster_data[second_asv, 1], sep = "."),
    starting.same.cluster = ifelse(starting_cluster_data[first_asv, 2] == starting_cluster_data[second_asv, 2], "yes", "no")
)

starting_pair_cluster_data
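
Once a "yes"/"no" column exists for two networks, Cohen's Kappa compares the observed agreement between the columns with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below, written in Python for illustration, applies this formula to two invented label lists. The function name and the toy labels are hypothetical, and each list position is assumed to refer to the same ASV pair in both networks.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long lists of categorical labels."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: fraction of positions where both lists agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over categories
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    # Undefined (division by zero) if both lists contain a single identical category
    return (observed - expected) / (1 - expected)

# Toy same-cluster labels for six ASV pairs in two networks (invented values)
starting_same_cluster = ["yes", "yes", "no", "no", "yes", "no"]
final_same_cluster = ["yes", "no", "no", "no", "yes", "yes"]

kappa = cohens_kappa(starting_same_cluster, final_same_cluster)
print(round(kappa, 3))
```

Here four of the six pairs agree (p_o = 2/3) and chance agreement is p_e = 0.5, so kappa is 1/3. A value of 1 would mean the two networks place exactly the same pairs together.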

Pipeline Explanation
-----

### Data set

The data set includes samples from 275 tree holes. There is 1 initial sample from each tree hole (275 in total), as well as 4 replicate samples per tree hole taken after about a week (1100 in total). The data are in the form of an ASV table listing the counts of each ASV in each sample. Each sample represents a community, and the communities have been grouped into community classes, separately for the starting samples and for each replicate set of the final samples, based on beta-diversity dissimilarity.

### What is FlashWeave?
FlashWeave is often considered the gold standard for inferring co-occurrences between microbial ASVs. On synthetic data, it has shown improved accuracy and performance compared with other commonly used methods such as SparCC and SPIEC-EASI.

### How does FlashWeave work?
For each ASV, FlashWeave identifies its directly associated neighbouring ASVs.















