# Examine the effects of geography and divergence on proportion of variants shared among samples 

June 24, 2020 

We would like to determine whether samples collected in the same geographic area share more variants than expected by chance alone. In addition to the permutation test, I would also like to perform some sort of metric for whether being close together on the tree also predicts having shared variation. There are probably a few different ways to do this: 

1. Compare some sort of raw measure of sequence divergence, like hamming distance (number of differences/length of sequence)
2. Compare the branch length of the path between the 2 sequences. 
3. Compare the tmrca, where more divergent sequences will have older tmrcas.

All 3 of these could be proxies for how close together sequences are on the tree. It would be good to test this out using the Wisconsin-only build as well as the Wisconsin-focused build with other sequences in there for context.

In [17]:
import imp
import importlib, json
import glob
import re,copy,json
import Bio.Phylo
import requests
import pandas as pd 
import numpy as np

import copy
from scipy.special import binom
import datetime as dt
    
import rpy2
%load_ext rpy2.ipython

# for this to work, you will need to download the most recent version of baltic, available here 
bt = imp.load_source('baltic', '/Users/lmoncla/src/baltic/baltic/baltic.py')

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [18]:
def compute_shared_variant_proportion(sample1,sample2,df):
    shared_variants = 0
    
    s1_df = df[df['strain_name'] == sample1]
    variants_in_s1 = set(s1_df['minor_nuc_muts'].tolist())
    
    s2_df = df[df['strain_name'] == sample2]
    variants_in_s2 = set(s2_df['minor_nuc_muts'].tolist())
    
    total_variants = len(variants_in_s1) + len(variants_in_s2)
    
    for v in variants_in_s1:
        if v in variants_in_s2:
            shared_variants += 2
            
    proportion_shared = float(shared_variants/total_variants)
            
    return(proportion_shared)

## Read in VCF data and output SNVs to query into a dataframe

In [19]:
"""to load in an ipython notebook as a module, just run the following. You will now have access to all of the 
functions written in that jupyter notebook"""

%run vcf-module.ipynb

In [20]:
"""now, input the strain names file/metadata file, the directory containing the vcfs, and return the dataframess"""

strain_names_file = "/Users/lmoncla/src/ncov-WI-within-host/data/sample-metadata.tsv"
fasta_file = "/Users/lmoncla/src/ncov-WI-within-host/data/sample-avrl.fasta"
clades_file = "/Users/lmoncla/src/ncov-WI-within-host/data/clades-file-2020-08-28.txt"
vcf_directory = "/Users/lmoncla/src/ncov-WI-within-host/data/vcfs-all/"
samples_to_ignore = ["N_transcript"]

snvs_only, indels_only, all_intersection_variants,metadata_dict = return_dataframes(strain_names_file, clades_file,vcf_directory,samples_to_ignore,fasta_file)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexin

In [21]:
snvs_to_query = set(snvs_only['minor_nuc_muts'])
indels_to_query = set(indels_only['minor_nuc_muts'])
all_variants_to_query = snvs_to_query.copy()
all_variants_to_query.update(indels_to_query)
print(len(snvs_to_query))
print(len(indels_to_query))
print(len(all_variants_to_query))

376
124
500


## Code for parsing through tree

In [22]:
"""This is a small, recursive function to return the TMRCA for 2 tips. Starting with the parental node of tip1,
go recursively backwards in the tree until you find an internal node whose children contains both tip1 and 
tip2. Return that node."""

def return_TMRCA_node(input_node,tip1,tip2):
    
    # for a given internal node, generate a list of all its children, i.e., tips descending from that node
    node = input_node
    children = list(node.children)   # .children will output all of the direct descendants as baltic objects
    leaves = list(node.leaves)       # .leaves will output the names of all tips descending from the node
    
    if tip2 in leaves and tip1 in leaves: 
        node_to_return = node
    else:
        node_to_return = return_TMRCA_node(node.parent,tip1,tip2)
            
    return(node_to_return)

In [23]:
"""given 2 tips and a tree, iterate through the tree. when we reach tip 1, run return_TMRCA_node, to find the 
internal node that is the TMRCA for tips 1 and 2. Extract its date and return the node object and date"""

def return_TMRCA(tip1,tip2,tree):
    for k in tree.Objects: 
        if k.branchType == "leaf" and k.name == tip1:
            tmrca_node = return_TMRCA_node(k.parent,tip1,tip2)
            date = tmrca_node.traits['node_attrs']['num_date']['value']  # output the mean inferred date
            
    return(tmrca_node, date)

In [24]:
"""Given a starting internal node, and a tip you would like to end at, traverse the full path from that node to
tip. Along the way, gather nucleotide mutations that occur along that path. Once you have reached the ending 
tip, return the list of mutations that fell along that path"""

def return_divergence_on_path_to_tip(starting_node, ending_tip):
    
    children = starting_node.children
    
    for child in children:
        
        """if the child is a leaf: if leaf is the target end tip, collect its divergence and return; 
        if leaf is not the target end tip, move on"""
        """if the child is an internal node: first, test whether that child node contains the target tips in its 
        children. child.leaves will output a list of the names of all tips descending from that node. If not, pass. 
        if the node does contain the target end tip in its leaves, keep traversing down that node recursively"""

        if child.branchType == "leaf":
            if child.name != ending_tip:
                pass
            elif child.name == ending_tip:
                child_divergence = child.traits['node_attrs']['div']
                return(child_divergence)
         
        elif child.branchType == "node":
            if ending_tip not in child.leaves:
                pass
            else:
                child_divergence = return_divergence_on_path_to_tip(child, ending_tip)
    
    return(child_divergence)

In [25]:
def return_clade(tipname, tree):
    for k in tree.Objects:
        if k.branchType == "leaf" and k.name == tipname:
            clade = k.traits['node_attrs']['clade_membership']['value']
    return(clade)

In [26]:
def return_all_Wisconsin_tips(tree):
    Wisconsin_leaves = []
    
    for k in tree.Objects: 
        if k.branchType == "leaf":
            division = k.traits['node_attrs']['division']['value']
            if division == "Wisconsin":
                Wisconsin_leaves.append(k.name)
                
    return(Wisconsin_leaves)

In [27]:
def return_mean_Ct(metadata,tip):
    Cts = []
    
    Ct1 = metadata[tip]['Ct1']
    Ct2 = metadata[tip]['Ct2']
    
    if Ct1 not in ["","-"]:
        Cts.append(Ct1)
    if Ct2 not in ["","-"]:
        Cts.append(Ct2)
    
    # now find mean 
    Ct_sum = 0
    for c in Cts: 
        Ct_sum += float(c)
        
    if len(Cts) > 0:
        mean = float(Ct_sum)/len(Cts)
    else: 
        mean = 'NaN'
    
    return(mean)

In [28]:
def compare_Cts(tip1,tip2,metadata):
    tip1_Ct = return_mean_Ct(metadata,tip1)
    tip2_Ct = return_mean_Ct(metadata,tip2)
    
    if tip1_Ct != 'NaN' and tip2_Ct != 'NaN':
        difference = abs(tip1_Ct-tip2_Ct)
    else:
        difference = 'NaN'
    return(difference)

In [29]:
def compare_location(tip1,tip2,metadata):
    geo1 = metadata[tip1]["location"]
    geo2 = metadata[tip2]["location"]
    
    if geo1 == geo2: 
        location = 1
    else: 
        location = 0
        
    return(location,geo1,geo2)

In [30]:
def return_lat_longs_dictionary(lat_longs_file):
    
    output_dict = {}
    
    with open(lat_longs_file, "r") as infile: 
        for line in infile:
            if len(line.split("\t")) == 4:
                location = line.split("\t")[1]
                longitude = line.split("\t")[2]
                latitude = line.split("\t")[3].strip()
                output_dict[location] = {"latitude":latitude, "longitude":longitude}
    return(output_dict)

In [31]:
"""I took this from here, https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points
A decent overview of this formula can be found here: https://www.movable-type.co.uk/scripts/latlong.html"""
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees). 
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

In [32]:
def return_distance_between_locations(tip1,tip2,metadata,lat_longs):
    geo1 = metadata[tip1]["location"]
    geo2 = metadata[tip2]["location"]
    
    if geo1 != "" and geo2 != "":
        lat1 = lat_longs_dict[geo1]['latitude']
        lat2 = lat_longs_dict[geo2]['latitude']

        long1 = lat_longs_dict[geo1]['longitude']
        long2 = lat_longs_dict[geo2]['longitude']

        distance_km = haversine(float(long1), float(lat1), float(long2), float(lat2))
    else:
        distance_km = 'NaN'
    return(distance_km)

# run!

In [34]:
# test this out first on the Wisconsin-only build json
WI_tree_path = "/Users/lmoncla/src/ncov-WI-within-host/data/ncov_wisconsin_global-context.json"
#WI_tree_path = "/Volumes/gradschool-and-postdoc-backups/post-doc/stored_files_too_big_for_laptop/ncov-build-forced-WI/ncov/auspice/ncov_usa_wisconsin.json"

analysis_level = "division"

with open(WI_tree_path) as json_file:
    WI_tree_json = json.load(json_file)
WI_tree_object=WI_tree_json['tree']
WI_meta=WI_tree_json['meta']
json_translation={'absoluteTime':lambda k: k.traits['node_attrs']['num_date']['value'],'name':'name'} ## allows baltic to find correct attributes in JSON, height and name are required at a minimum
json_meta={'file':WI_meta,'traitName':analysis_level} ## if you want auspice stylings you can import the meta file used on nextstrain.org

WI_tree=bt.loadJSON(WI_tree_object,json_translation,json_meta)


Tree height: 0.632565
Tree length: 418.027815
multitype tree
annotations present

Numbers of objects in tree: 15126 (7176 nodes and 7950 leaves)



The putative transmission pairs are: 
1. USA/WI-UW-65/2020 and USA/WI-UW-32/2020  (2 days apart)
2. USA/WI-UW-41/2020 and USA/WI-UW-48/2020 (same day)
3. USA/WI-UW-74/2020 and USA/WI-UW-29/2020 (4 days apart)
4. USA/WI-UW-120/2020 and USA/WI-UW-119/2020 (3 days apart)
5. USA/WI-UW-333/2020 and USA/WI-UW-334/2020 (same day)
6. USA/WI-UW-337/2020 and USA/WI-UW-338/2020 (same day) -> I am actually excluding these, because they do not have the same consensus sequence; they are 3 mutations different, which seems pretty unlikely
7. USA/WI-UW-158/2020 and USA/WI-UW-159/2020 and USA/WI-UW-160/2020  (these are all in the same household, all collected on the same day) -> I am also going to call 158 and 160 true transmission pairs because they have the same consensus sequence, while 159 has 2 additional mutations (158 and 160 isolated same day). 159 does look like it be descendant from 158 and 160, but becuase it has 2 mutations different tthey were all collected on the same day, it seems a little weird.

pairs 5 and 6 are from the Milwaukee area, whereas the rest are more from the Madison area

In [35]:
transmission_pairs = [["USA/WI-UW-65/2020","USA/WI-UW-32/2020"],["USA/WI-UW-41/2020","USA/WI-UW-48/2020"],
                      ["USA/WI-UW-74/2020","USA/WI-UW-29/2020"],["USA/WI-UW-120/2020","USA/WI-UW-119/2020"],
                     ["USA/WI-UW-333/2020","USA/WI-UW-334/2020"],["USA/WI-UW-158/2020","USA/WI-UW-160/2020"]]

In [36]:
Wisconsin_tips_in_tree = return_all_Wisconsin_tips(WI_tree)
print(len(Wisconsin_tips_in_tree))

1096


In [37]:
tips_to_query = set(all_intersection_variants['strain_name'].tolist())

not_in_tree = []
for t in tips_to_query:
    if t not in Wisconsin_tips_in_tree:
        print(t)
        not_in_tree.append(t)
        
print(len(tips_to_query))

USA/DO NOT UPLOAD - time series sample/2020
USA/DO NOT UPLOAD/2020
103


In [38]:
for n in not_in_tree:
    tips_to_query.remove(n)
print(len(tips_to_query))

101


In [39]:
# read in metadata and latitude and longitude files
metadata_input_file = "/Users/lmoncla/src/ncov-WI-within-host/data/sample-metadata.tsv"
metadata_dict = return_metadata_dict(metadata_input_file, clades_file)
lat_longs_dict = return_lat_longs_dictionary("/Users/lmoncla/src/ncov/defaults/lat_longs.tsv")
wh_df_to_use = snvs_only
tree = WI_tree

In [43]:
df = pd.DataFrame()

combos = []
for t in tips_to_query: 
    tip1 = t
    
    for a in tips_to_query:
        tip2 = a
        combo = set([tip1,tip2])
        
        if tip1 != tip2 and combo not in combos:   # to prevent doing the pairwise comparisons twice
            
            # output Cts
            Ct_diff = compare_Cts(tip1,tip2,metadata_dict)
            
            # are the locations the same? 0 means no, 1 means yes
            location,loc1,loc2 = compare_location(tip1,tip2,metadata_dict)
            
            # output great circle distance between locations
            great_circle_distance = return_distance_between_locations(tip1,tip2,metadata_dict,lat_longs_dict)

            # output their clades
            tip1_clade = return_clade(tip1, tree)
            tip2_clade = return_clade(tip2, tree)
            if tip1_clade == tip2_clade:
                clades_same = 1
            else:
                clades_same = 0

            # output the tmrca and divergence
            parental_node,tmrca_date = return_TMRCA(tip1,tip2,tree)
            parent_divergence = parental_node.traits['node_attrs']['div']

            tip1_divergence = return_divergence_on_path_to_tip(parental_node, tip1)
            tip2_divergence = return_divergence_on_path_to_tip(parental_node, tip2)

            node_to_tip1 = tip1_divergence - parent_divergence
            node_to_tip2 = tip2_divergence - parent_divergence
            total_divergence = node_to_tip1 + node_to_tip2

            # calculate the proportion of variants shared
            shared_proportion_snvs = compute_shared_variant_proportion(tip1,tip2,wh_df_to_use)

            d = pd.DataFrame.from_dict({"tip1":[tip1],"tip2":[tip2],"tmrca":[tmrca_date],"clades_same":[clades_same],
                                        "divergence":[total_divergence],"prop_snvs_shared":[shared_proportion_snvs],
                                       "Ct_diff":[Ct_diff], "location1":[loc1],"location2":[loc2],
                                       "great_circle_distance_km":[great_circle_distance],"location_same":[location],})

            df = df.append(d)
            combos.append(combo)

In [44]:
df.head()
len(df)

5050

In [45]:
# write to csv so I can use it in R 
df.to_csv("../data/WI-variants-vs-geo-2020-09-09.csv")

## First, evaluate divergence 

I will first model shared diversity as a function of divergence. I will then model it as a combination of divergence, Ct differences and having the same clade

In [60]:
# evaluate the proportion of variants shared as a function of divergence
%R -i df
%R model.div = glm(prop_snvs_shared~divergence,data=df,family = gaussian(link="identity"),na.action(na.omit))
%R print(summary(model.div))

  res = PandasDataFrame.from_items(items)



Call:
glm(formula = prop_snvs_shared ~ divergence, family = gaussian(link = "identity"), 
    data = df, weights = na.action(na.omit))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.42374  -0.09296  -0.00088   0.08480   0.71872  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.423743   0.004360   97.19   <2e-16 ***
divergence  -0.020665   0.000372  -55.55   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.01898181)

    Null deviance: 143.286  on 4464  degrees of freedom
Residual deviance:  84.716  on 4463  degrees of freedom
AIC: -5025.4

Number of Fisher Scoring iterations: 2



In [61]:
# evaluate the proportion of variants shared as a function of great circle distance
%R -i df
%R model.geo = glm(prop_snvs_shared~great_circle_distance_km,data=df,family = gaussian(link="identity"),na.action(na.omit))
%R print(summary(model.geo))


Error in eval(predvars, data, env) : 
  object 'great_circle_distance_km' not found

Error in summary(model.geo) : object 'model.geo' not found


  object 'great_circle_distance_km' not found




In [100]:
# evaluate the proportion of variants shared as a function of Ct difference
%R -i df
%R model.div2 = glm(prop_snvs_shared~Ct_diff,data=df,family = gaussian(link="identity"),na.action(na.omit))
%R print(summary(model.div2))
%R print(anova(model.div2, test="Chisq"))


Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Error in summary(model.div2) : object 'model.div2' not found

Error in anova(model.div2, test = "Chisq") : 
  object 'model.div2' not found


  contrasts can be applied only to factors with 2 or more levels



In [44]:
# lastly, try with clade same
%R -i df
%R model.div2 = glm(prop_snvs_shared~clades_same,data=df,family = gaussian(link="identity"),na.action(na.omit))
%R print(summary(model.div2))
%R print(anova(model.div2, test="Chisq"))


Call:
glm(formula = prop_snvs_shared ~ clades_same, family = gaussian(link = "identity"), 
    data = df, weights = na.action(na.omit))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.20805  -0.17344  -0.01959   0.12656   0.62656  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.208053   0.005015  41.489  < 2e-16 ***
clades_same -0.034615   0.005846  -5.921 3.44e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.02967363)

    Null deviance: 133.47  on 4464  degrees of freedom
Residual deviance: 132.43  on 4463  degrees of freedom
AIC: -3030.5

Number of Fisher Scoring iterations: 2



Analysis of Deviance Table

Model: gaussian, link: identity

Response: prop_snvs_shared

Terms added sequentially (first to last)


            Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                         4464     133.47              
clades_same  1   1.0402      4463     132.43 3.203e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


## now try all together

In [102]:
# evaluate the proportion of variants shared as a linear combination of divergence and whether the clade is the same
%R -i df
%R model.marsh = glm(prop_snvs_shared~divergence+clades_same+Ct_diff+great_circle_distance_km,data=df,family = gaussian(link="identity"),na.action(na.omit))
%R print(summary(model.marsh))
%R print(coef(model.marsh))
%R print(anova(model.marsh, test="Chisq"))


Error in eval(predvars, data, env) : 
  object 'great_circle_distance_km' not found

Error in summary(model.marsh) : object 'model.marsh' not found

Error in coef(model.marsh) : object 'model.marsh' not found

Error in anova(model.marsh, test = "Chisq") : 
  object 'model.marsh' not found




  object 'model.marsh' not found



In [508]:
# evaluate the proportion of variants shared as a linear combination of divergence and whether the clade is the same
%R -i df
%R model.marsh = glm(prop_snvs_shared~divergence+clades_same+Ct_diff,data=df,family = gaussian(link="identity"),na.action(na.omit))
%R print(summary(model.marsh))
%R print(coef(model.marsh))
%R print(anova(model.marsh, test="Chisq"))


Call:
glm(formula = prop_snvs_shared ~ divergence + clades_same + Ct_diff, 
    family = gaussian(link = "identity"), data = df, weights = na.action(na.omit))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.19530  -0.08062  -0.02077   0.08819   0.28793  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.569131   0.023748  23.965   <2e-16 ***
divergence  -0.003896   0.002032  -1.917   0.0571 .  
clades_same  0.006918   0.022554   0.307   0.7595    
Ct_diff      0.003609   0.003121   1.156   0.2494    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.01235454)

    Null deviance: 1.9210  on 152  degrees of freedom
Residual deviance: 1.8408  on 149  degrees of freedom
AIC: -232.1

Number of Fisher Scoring iterations: 2



 (Intercept)   divergence  clades_same      Ct_diff 
 0.569130801 -0.003895600  0.006918280  0.003608546 


Analysis of Deviance Table

Model: gaussian, link: identity

Response: prop_snvs_shared

Terms added sequentially (first to last)


            Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
NULL                          152     1.9210           
divergence   1 0.062623       151     1.8584  0.02436 *
clades_same  1 0.001026       150     1.8573  0.77322  
Ct_diff      1 0.016516       149     1.8408  0.24759  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


In [509]:
# evaluate the proportion of variants shared as a linear combination of divergence and whether the clade is the same
%R -i df
%R model.marsh = glm(prop_snvs_shared~great_circle_distance_km+clades_same+Ct_diff,data=df,family = gaussian(link="identity"),na.action(na.omit))
%R print(summary(model.marsh))
%R print(coef(model.marsh))
%R print(anova(model.marsh, test="Chisq"))


Call:
glm(formula = prop_snvs_shared ~ great_circle_distance_km + clades_same + 
    Ct_diff, family = gaussian(link = "identity"), data = df, 
    weights = na.action(na.omit))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.19447  -0.08212  -0.01615   0.07871   0.27799  

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)               0.5659632  0.0231075  24.493   <2e-16 ***
great_circle_distance_km -0.0010747  0.0005681  -1.892   0.0605 .  
clades_same              -0.0044782  0.0200395  -0.223   0.8235    
Ct_diff                   0.0036859  0.0031183   1.182   0.2391    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.01236245)

    Null deviance: 1.921  on 152  degrees of freedom
Residual deviance: 1.842  on 149  degrees of freedom
AIC: -232

Number of Fisher Scoring iterations: 2



             (Intercept) great_circle_distance_km              clades_same 
             0.565963186             -0.001074677             -0.004478218 
                 Ct_diff 
             0.003685932 


Analysis of Deviance Table

Model: gaussian, link: identity

Response: prop_snvs_shared

Terms added sequentially (first to last)


                         Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
NULL                                       152     1.9210           
great_circle_distance_km  1 0.060703       151     1.8603   0.0267 *
clades_same               1 0.001011       150     1.8593   0.7749  
Ct_diff                   1 0.017272       149     1.8420   0.2372  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
