## Population Structure
### MDS
To construct an MDS plot for fairy tern populations, we first ran ANGSD as shown below. Notably, we add the `-doGeno 8` flag here.  

In [None]:
angsd -P 26 -b GLOBAL.list -ref $REF -anc $ANC -out ${ANGSD}structure_MDS/GLOBAL \
    -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 0 -C 50 -baq 1 \
    -minMapQ 20 -minQ 20 -minInd 34 -setMinDepth 272 -setMaxDepth 630 -doCounts 1 \
    -GL 1 -doMajorMinor 1 -doMaf 1 -skipTriallelic 1 -SNP_pval 1e-6 -doGeno 8 -doPost 1

This command generates not only the required `*.mafs.gz` file but also the `*.geno.gz` file, which contains the posterior probabilities of all possible genotypes and is required for estimating genetic distance with [ngsDist v1.0.10](https://github.com/mfumagalli/ngsTools). In addition, estimating genetic distance requires a `pops.label` file denoting the population of origin, with one entry per line for each sample.  

In [None]:
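# Count the sites in the MAF file, skipping the header line
NSITES=$(zcat GLOBAL.mafs.gz | tail -n +2 | wc -l)
echo "$NSITES"

# Strip the directory and BAM suffix so that each line holds one sample label
sed -e 's%/path/to/bams/%%g' -e 's/_autosomes_nodup.bam//g' GLOBAL.list > ${ANGSD}GLOBAL_mds_list

# The -labels file must be the list written in the previous step
ngsDist -verbose 1 -geno ${ANGSD}structure/GLOBAL.geno.gz -probs \
    -n_ind 34 -n_sites "$NSITES" -labels ${ANGSD}GLOBAL_mds_list -o ${ANGSD}structure/GLOBAL_dist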
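
The label-cleaning step above can also be sketched in Python, which may be convenient when the labels are needed later for plotting. This is a minimal sketch, not part of the original pipeline: the function name `bam_to_label` and the sample names are hypothetical, and unlike the `sed` command it strips any leading directory rather than only `/path/to/bams/`.

```python
from pathlib import PurePosixPath


def bam_to_label(bam_path, suffix="_autosomes_nodup.bam"):
    """Return a sample label by dropping the directory and the BAM suffix."""
    name = PurePosixPath(bam_path).name
    # Only strip the suffix when it is actually present
    return name[: -len(suffix)] if name.endswith(suffix) else name


# Hypothetical BAM paths, for illustration only
bam_paths = [
    "/path/to/bams/SAMPLE01_autosomes_nodup.bam",
    "/path/to/bams/SAMPLE02_autosomes_nodup.bam",
]
labels = [bam_to_label(p) for p in bam_paths]
print("\n".join(labels))
```

Writing `labels` to a file, one entry per line, would give the same kind of list that is passed to `ngsDist` with `-labels`.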

A total of 3,283,545 sites were used in the whole-genome data set and X,XXX,XXX in the neutral data set.  

Next, we find the percentage of variance explained by each axis.  
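
One common way to obtain these percentages is classical (metric) MDS, where each axis corresponds to an eigenvalue of the double-centered squared-distance matrix. The sketch below assumes this approach; the notebook does not show which tool was actually used. The function `classical_mds` and the toy distance matrix are illustrative only and do not come from the fairy tern data.

```python
import numpy as np


def classical_mds(dist, n_axes=2):
    """Classical MDS: return coordinates and percent variance per axis."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    eigval, eigvec = np.linalg.eigh(b)
    # eigh returns ascending eigenvalues, so reorder to descending
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Negative eigenvalues (non-Euclidean distances) are treated as zero
    positive = np.clip(eigval, 0, None)
    pct = 100 * positive / positive.sum()
    coords = eigvec[:, :n_axes] * np.sqrt(positive[:n_axes])
    return coords, pct[:n_axes]


# Toy distances: two tight pairs of samples that are far from each other
dist = np.array([
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
])
coords, pct = classical_mds(dist)
print(pct.round(1))
```

In this toy case almost all of the variance falls on the first axis because the only large distances are those between the two groups.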

### SNP- and SV-based Inference of Structure with Multidimensional Scaling (MDS)
The high F<sub>ST</sub> inferred from SNP data corresponds to the high differentiation observed in our MDS, with over 97% (SNPs) and nearly 90% (SVs) of the variance represented along the first dimension.  

In [None]:
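import matplotlib.pyplot as plt
import pandas as pd

# MDS coordinates for the neutral SNP and SV data sets, plus population labels
mds = pd.read_csv('angsd/structureS/GLOBAL_neutral.mds', sep='\t')
svmds = pd.read_csv('graphtyper/fairy_mds/GLOBAL_sv.mds', sep='\t')
pops = pd.read_csv('angsd/structure_MDS/mds_pops.tsv', sep='\t', header=None)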

color = ['gold'] * 19 + ['steelblue'] * 54

fig, axes = plt.subplots(1, 2, figsize=(16, 6), sharex=False, sharey=False)

axes[0].scatter(mds['D1_97.5736950999342'], mds['D2_0.142602294031928'], c=color)
# Grey guide lines through the mean of each axis
axes[0].axvline(x=mds['D1_97.5736950999342'].mean(), color='grey', linewidth=0.25)
axes[0].axhline(y=mds['D2_0.142602294031928'].mean(), color='grey', linewidth=0.25)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[0].set_xlabel('Dimension 1 (97.6%)', fontsize=18)
axes[0].set_ylabel('Dimension 2 (0.14%)', fontsize=18)
axes[0].set_title('A)', fontsize=20, loc='left')

axes[1].scatter(svmds['D1_89.9822458572729'], svmds['D2_0.646326592188632'], c=color)
# Grey guide lines through the mean of each axis
axes[1].axvline(x=svmds['D1_89.9822458572729'].mean(), color='grey', linewidth=0.25)
axes[1].axhline(y=svmds['D2_0.646326592188632'].mean(), color='grey', linewidth=0.25)
axes[1].set_xticks([])
axes[1].set_yticks([])
axes[1].set_xlabel('Dimension 1 (90.0%)', fontsize=18)
axes[1].set_ylabel('Dimension 2 (0.65%)', fontsize=18)
axes[1].set_title('B)', fontsize=20, loc='left')

plt.savefig('plots/GLOBAL_MDS.png', dpi=300, bbox_inches='tight')
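
The `.mds` column names used above appear to encode the axis number and its percentage of variance (for example `D1_97.5736950999342`). Assuming that naming pattern holds, the axis labels could be derived from the column names instead of being typed by hand. The helper `axis_label` below is hypothetical and only illustrates the idea.

```python
def axis_label(column, decimals=1):
    """Turn a column name such as 'D1_97.57' into 'Dimension 1 (97.6%)'."""
    axis, sep, pct = column.partition("_")
    if not sep or not axis.startswith("D"):
        raise ValueError(f"unexpected MDS column name: {column!r}")
    return f"Dimension {int(axis[1:])} ({float(pct):.{decimals}f}%)"


print(axis_label("D1_97.5736950999342"))
print(axis_label("D2_0.142602294031928", decimals=2))
```

With this approach, a call such as `axes[0].set_xlabel(axis_label(mds.columns[0]), fontsize=18)` would stay in sync with the file if the MDS is re-run, provided the first column is the first MDS axis.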

### SNP-based Estimates of F<sub>ST</sub>
Windowed F<sub>ST</sub> between Australian fairy tern and tara iti, plotted for the whole genome and for putatively neutral sites.  

*Note: should this use the theta statistics from the later file, or should the two be merged?*

In [None]:
import seaborn as sns

# Sliding-window FST (50 kb windows, 10 kb steps) across the whole genome
fst = pd.read_csv('angsd/distance/GLOBAL_whole-genome_stat2_50KBwindow_10KBstep.tsv', sep='\t', usecols=['chr', 'midPos', 'Nsites', 'FST'])

fst['x'] = range(len(fst))

plt.figure(figsize=(20,5))
palette=['grey', 'black']
ax=sns.scatterplot(data=fst, x='x', y='FST', hue='chr', palette=palette, alpha=0.5, s=3, legend=False)

ax.set_title('A)', fontsize=20, loc='left')
ax.set_xlabel('Chromosome', fontsize=20)
ax.set_ylabel('$F_{ST}$', fontsize=18)
ax.set_xlim(0, 108800)
ax.set_ylim(0, 1)
ax.set_xticks([])

plt.savefig('plots/whole-genome_Fst_allChr.png', dpi=300, bbox_inches='tight')

In [None]:
neutralfst = pd.read_csv('angsd/distance/GLOBAL_neutral_stat2_50KBwindow_10KBstep.tsv', sep='\t', usecols=['chr', 'midPos', 'Nsites', 'FST'])

neutralfst['x'] = range(len(neutralfst))

plt.figure(figsize=(20,5))
palette=['grey', 'black']
ax=sns.scatterplot(data=neutralfst, x='x', y='FST', hue='chr', palette=palette, alpha=0.5, s=3, legend=False)

ax.set_title('C)', fontsize=20, loc='left')
ax.set_xlabel('Chromosome', fontsize=20)
ax.set_ylabel('$F_{ST}$', fontsize=18)
ax.set_xlim(0, 101500)
ax.set_ylim(0, 1)
ax.set_xticks([])

plt.savefig('plots/neutral_Fst_allChr.png', dpi=300, bbox_inches='tight')
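
The genome-wide plots above rely on seaborn cycling a two-color palette across the chromosomes in `hue`. A more explicit alternative, sketched below under the assumption that windows are already sorted by chromosome and position, assigns the alternating shade and a tick position per chromosome directly. The function `add_manhattan_columns` and the toy chromosome names are hypothetical.

```python
import pandas as pd


def add_manhattan_columns(df, chr_col="chr"):
    """Add a running x index and an alternating shade per chromosome."""
    out = df.reset_index(drop=True).copy()
    out["x"] = range(len(out))
    # factorize numbers chromosomes in order of first appearance: 0, 1, 2, ...
    codes, _ = pd.factorize(out[chr_col])
    out["shade"] = ["grey" if code % 2 == 0 else "black" for code in codes]
    # Tick position for each chromosome: the mean x of its windows
    ticks = out.groupby(chr_col, sort=False)["x"].mean()
    return out, ticks


# Toy windows on two hypothetical chromosomes
toy = pd.DataFrame({
    "chr": ["chrA", "chrA", "chrA", "chrB", "chrB"],
    "FST": [0.91, 0.88, 0.93, 0.79, 0.85],
})
out, ticks = add_manhattan_columns(toy)
print(out)
print(ticks)
```

The `shade` column can be passed as the `c` argument of `ax.scatter`, and `ticks` gives positions for chromosome labels if the x axis should show them rather than being left empty.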

In [None]:
# Copy the subset so that adding a column does not modify a view of neutralfst
chr1fst = neutralfst[neutralfst['chr'] == 'CM020437.1_RagTag'].copy()

chr1fst['x'] = range(len(chr1fst))

plt.figure(figsize=(14, 7))
ax = sns.scatterplot(data=chr1fst, x='x', y='FST', color='black', alpha=0.5, s=3, legend=False)

ax.set_xlabel('Chromosome Position', fontsize=14)
ax.set_ylabel('$F_{ST}$', fontsize=14)
ax.set_xlim(0, 20800)
ax.set_ylim(0, 1)

plt.savefig('plots/neutral_Fst_Chr1.png')