In [None]:
#ignore
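%matplotlib inline
from matplotlib.pyplot import xlabel, ylabel, xticks, yticks, xlim, ylim, tight_layout, savefig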
%load_ext rpy2.ipython
%config InlineBackend.figure_format='retina'
from SuchTree import SuchTree, SuchLinkedTrees
import pandas
import seaborn
import matplotlib

from rpy2 import robjects

# tell R to be quiet
import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings( 'ignore', category=RRuntimeWarning )
robjects.r( 'options( warn = -1 )' )
robjects.r( 'sink( "/dev/null" )' )

# load libraries into the R global context
robjects.r( 'library("phytools")' )
robjects.r( 'library("igraph")' )

In [5]:
%%R
#ignore

tr1 <- read.tree( "../../SuchTree/data/gopher-louse/gopher.tree" )
tr2 <- read.tree( "../../SuchTree/data/gopher-louse/lice.tree" )
links <- read.csv( "../../SuchTree/data/gopher-louse/links.csv", row.names=1, stringsAsFactors = FALSE )
im <- graph_from_incidence_matrix( as.matrix( links ) )
assoc <- as_edgelist( im )
obj <- cophylo( tr1, tr2, assoc=assoc )
svg( "figures/gopher_louse_cophylo.svg", width = 4, height = 4 )
plot( obj )
dev.off()

In [None]:
#ignore
gl_host  = SuchTree( '../../SuchTree/data/gopher-louse/gopher.tree' )
gl_guest = SuchTree( '../../SuchTree/data/gopher-louse/lice.tree' )
gl_links = pandas.read_csv( '../../SuchTree/data/gopher-louse/links.csv', index_col=0 )

gl_SLT = SuchLinkedTrees( gl_host, gl_guest, gl_links )

gl_ld = gl_SLT.linked_distances()

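# stat_func has been removed from seaborn, and x/y are now keyword arguments.
# jointplot returns a JointGrid, so style its joint axes directly instead of
# relying on pyplot's "current axes" (which may be a marginal plot).
g = seaborn.jointplot( x=gl_ld['TreeA'], y=gl_ld['TreeB'], kind='reg', height=4 )
g.set_axis_labels( 'Gopher', 'Louse' )
g.ax_joint.set_xticks( [ 0.0, 0.1, 0.2, 0.3 ] )
g.ax_joint.set_yticks( [ 0.0, 0.1, 0.2, 0.3 ] )
g.ax_joint.set_xlim( -0.05, 0.32 )
g.ax_joint.set_ylim( -0.05, 0.32 )

# savefig has no size argument; the 4x4 inch size is set by height=4 above,
# and JointGrid.savefig trims the margins (bbox_inches='tight')
g.savefig( 'figures/gopher_louse_correlation.svg' )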

## What microbial ecologists can learn from parasites

Biology has a major problem with charisma. Humans relate to some organisms more easily than others. Fuzzy creatures get more attention than scaly or slimy ones. Creatures with faces get more attention than creatures without. Animals get more attention than plants. By abundance, diversity, age or metabolic wattage, eukaryotes make up a small fraction of life on Earth, but they occupy almost all of our attention. The same phenomenon happens in ecology.

Ecology is the study of how organisms interact with one another and with their environment. Some relationships are more charismatic than others, and those relationships dominate our attention. Trophic strategies come in myriad forms, but predator-prey interactions make the best television. Parasites give us the creeps as much as they fascinate us. Mutualisms speak powerfully to aspirations and anxieties regarding our own societies. Disease has shaped and reshaped nations, and drives a large part of the moral imperative behind biological research. Nevertheless, the great majority of ecological relationships are none of the above, or cannot neatly fit into any one category.

Charisma is one of many heuristics that help us identify things that are likely to be relevant to our own experience. Heuristics are useful, but usefulness should not be mistaken for accuracy. Because charisma cannot be separated from the observer, it generalizes poorly. Because it is only a heuristic measure of importance, it can be wrong more often than it is right. It is a form of bias, and it distorts the model of reality we use to understand how things work.

Biologists address charisma bias by applying other metrics for importance, and often look to ecology for parameters to include in these metrics. Importance is contextual, after all, and ecology is the study of the relationships that comprise the context in which organisms live. How, then, can ecology correct for its own charisma bias? Relationships can be categorized by their effects or their dynamics, but unless they exhibit a charismatic property (often a symmetry in structure, a simplicity of concept or an analogy to human experience), they can defy categorization or escape notice altogether. The symmetry of the Red Queen's Race, the straightforwardness of a predator's relationship with its prey, and the (projected) virtue of cooperation are conspicuous because they are unusual. Most relationships are too complicated or ambiguous to clearly exhibit such properties. They are not charismatic.

<style>
.rendered_html td.p { text-align: left !important; }
</style>
<table>
<tr>
    <td> <img src="figures/gopher_louse_cophylo.svg" width="280"> </td>
    <td> <img src="figures/gopher_louse_correlation.svg" width="280"> </td>
</tr>
<tr>
    <td colspan="2">
        <p><b>Figure 1 :</b> The relationship between pocket gophers and their chewing lice parasites has served as a benchmark case in the literature on coevolution since its appearance in Hafner <i>et al.</i> [Hafner 1994]. However, despite the strong case for coevolution from multiple lines of evidence, the agreement between the two trees (as measured by the correlation of pairwise patristic distances through the two trees [Hommola 2009]) is modest, with a Pearson's $r$ of 0.49. If one were to exclude the relationships between the outgroups as outliers, the correlation would collapse. Without other forms of evidence, detecting such relationships is challenging.</p>
    </td>
</tr>
</table>
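The agreement statistic in Figure 1 is an ordinary Pearson correlation computed over paired distances: for each pair of linked taxa, one patristic distance through the host tree and one through the guest tree, $r = \sum_i (a_i - \bar{a})(b_i - \bar{b}) \big/ \sqrt{\sum_i (a_i - \bar{a})^2 \sum_i (b_i - \bar{b})^2}$. As a minimal sketch, assuming the two distance arrays are already aligned pair by pair (as the `TreeA` and `TreeB` entries returned by `linked_distances()` appear to be in the notebook code), $r$ can be computed directly with NumPy. The helper name `linked_distance_correlation` is hypothetical, and it computes only the coefficient, not any significance test.

```python
import numpy as np

def linked_distance_correlation(dist_a, dist_b):
    """Pearson's r between paired patristic distances (hypothetical helper).

    Assumes dist_a[i] and dist_b[i] are the distances through the host tree
    and the guest tree for the same i-th pair of linked taxa.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.shape != b.shape or a.size < 2:
        raise ValueError("need two equal-length arrays with at least 2 values")
    # center both samples on their means
    a_c = a - a.mean()
    b_c = b - b.mean()
    denom = np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())
    if denom == 0:
        raise ValueError("correlation is undefined when either input is constant")
    # covariance divided by the product of the standard deviations
    return float((a_c * b_c).sum() / denom)
```

In the notebook above, the call would be `linked_distance_correlation( gl_ld['TreeA'], gl_ld['TreeB'] )`; `numpy.corrcoef` and `scipy.stats.pearsonr` return the same coefficient, so the hand-written version is only there to make the arithmetic visible.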

This is especially evident in microbial interactions, where high-throughput sequencing has made it possible to generate spectacular quantities of complex, nuanced and mostly inconclusive data. This is frustrating, but the scale of the problem represents a unique opportunity to address charisma bias in ecology.

Correcting a bias always begins with the same task : find a way to collect data that isn't subject to the bias. Microbiome surveys are vulnerable to a variety of technical biases (there is _always_ another bias), but sequencing machines are not impressed by charisma. From such data, it is possible to construct and test new metrics of importance, and to see which relationships merit further attention.

Unfortunately, we do not have very many theoretical tools that address the question of how to categorize ecological relationship data gathered using a non-targeted, unsupervised process.

Fortunately, ecology is not the only field that cares about the problem of detecting and assigning categories to complex relationships from non-targeted data. Vast resources have been devoted to this problem when it appears in the form of social networks of human beings. Technology companies want to know how to detect and identify categories within social networks so that they can sell more advertisements, and pursuit of this capability steers R&D budgets in the billions of dollars. If meaningful representations of microbiome data can be constructed in the same mathematical framework, we can divert some of those resources into our own projects. We can parasitize the tech companies.
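Host-parasite association data already fits the graph framework that social-network tools are built around: the `links.csv` table read in the notebook cells is an incidence matrix between two sets of taxa, the same structure the R cell hands to igraph's `graph_from_incidence_matrix`. As a minimal illustration (a sketch, not the strategy for the next post), here is one way to flatten such a table into an edge list, the plain node-pair format most network-analysis libraries accept. The function name `incidence_to_edgelist`, the column labels `row_taxon` and `col_taxon`, and the toy table are all hypothetical.

```python
import numpy as np
import pandas

def incidence_to_edgelist(links):
    """Flatten a taxon-by-taxon incidence table into a bipartite edge list.

    Assumes the index holds one tree's taxa, the columns hold the other
    tree's taxa, and any nonzero cell marks an observed association.
    """
    # treat missing cells as "no association", then find every nonzero cell
    rows, cols = np.nonzero(links.fillna(0).to_numpy())
    return pandas.DataFrame({
        'row_taxon': links.index[rows],
        'col_taxon': links.columns[cols],
    })

# toy example: three hosts, two guests (made-up data)
toy = pandas.DataFrame([[1, 0], [0, 1], [1, 1]],
                       index=['h1', 'h2', 'h3'], columns=['g1', 'g2'])
edges = incidence_to_edgelist(toy)
print(edges)
```

Each row of the result is one host-guest link, so the table can be handed to any graph library that builds a network from node pairs; the choice of `numpy.nonzero` keeps the cells in row-major order, which makes the output predictable.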

A successful parasite must have a strategy for invading its host. If we want to abscond with their shiny computational tools, we need a way to inject our data into them. In the next post, I will lay out my strategy for injecting microbiome data into off-the-shelf machine learning frameworks, and in the subsequent posts, I will try to give you a taste of why this is awesome.

But before we get into that, it needs to be said that there are downsides to these tools. In particular, the intermediate steps in machine learning can be difficult to interpret intuitively, and machine learning stacks tend to both reflect and obfuscate the biases of their designers. Nevertheless, if the goal is to address charisma bias, placing _all_ of the available data into the same context is a good start. Machine learning does not eliminate bias; it formalizes it. This can cut two ways. If used uncritically, machine learning can atomize and dissolve the designer's bias into the mathematics in ways that can be very harmful. Or, it can provide a framework within which one can select, quantify, explore and hopefully understand one's own biases.

Always compute responsibly.



### Further reading :

* [**Role of the Gut Microbiome in Vertebrate Evolution**](http://msystems.asm.org/content/3/2/e00174-17), by Thomas J. Sharpton (doi:10.1128/mSystems.00174-17).
* [**The microbiome beyond the horizon of ecological and evolutionary theory**](https://www.nature.com/articles/s41559-017-0340-2), by Britt Koskella, Lindsay J. Hall and C. Jessica E. Metcalf (doi:10.1038/s41559-017-0340-2)
* [**Testing the Context and Extent of Host-Parasite Coevolution**](https://academic.oup.com/sysbio/article-abstract/28/3/299/1651575), by Daniel R. Brooks (doi:10.1093/sysbio/28.3.299)
* [**Disparate rates of molecular evolution in cospeciating hosts and parasites**](http://science.sciencemag.org/content/265/5175/1087), by Hafner MS, Sudman PD, Villablanca FX, Spradling TA, Demastes JW, Nadler SA (doi:10.1126/science.8066445)