# Actinopterygii order-level MDS

This notebook runs MDS separately for each order to compute 2D point locations for the taxa in that order. Each
order's group of points is then grafted onto the scaffold constructed in the previous notebook,
`130 Fish species tree scaffold.ipynb`.

An earlier version of this notebook tried to use the tree to build a monophyletic group for each order by
finding the MRCA of all the genera in that order. That did not seem to work correctly, so instead
we use the main distance matrix constructed in notebooks 104/105 and select each order's taxa
directly from it.
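The grafting itself happens outside this notebook and is not shown here. As a hedged illustration only, one simple way to graft an order's points would be to center them, scale them uniformly so they fit within a chosen radius, and translate them to the order's position on the scaffold. The function `graft_points`, the anchor coordinates and the radius below are all hypothetical and are not necessarily the method used in the scaffold notebook.

```python
import numpy as np

def graft_points(xy, anchor, radius):
    """Hypothetical grafting step: center an order's MDS points on its scaffold
    anchor and rescale them so the farthest point sits exactly at `radius`.

    xy:     (n, 2) array of MDS coordinates for one order
    anchor: (x, y) scaffold position assumed to belong to that order
    radius: target spread around the anchor (an assumed tuning parameter)
    """
    xy = np.asarray(xy, dtype=float)
    centered = xy - xy.mean(axis=0)
    max_dist = np.linalg.norm(centered, axis=1).max()
    if max_dist > 0:
        # Uniform scaling keeps the relative distances within the order.
        centered = centered * (radius / max_dist)
    return centered + np.asarray(anchor, dtype=float)

# Example: three points grafted around a hypothetical anchor at (10, -5).
grafted = graft_points([[0, 0], [2, 0], [1, 3]], anchor=(10, -5), radius=1.0)
```

Because the scaling is uniform, the shape of each order's MDS cloud survives grafting; only its size and position change. A single-taxon order simply lands on its anchor.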

In [1]:
# Packages.
import pandas as pd
import numpy as np
from sklearn.manifold import MDS
import plotly.express as px

In [2]:
# Load in the main distance matrix and the taxonomy list we made earlier.
dist_matrix = pd.read_csv("output/Actinopterygii_tree_distance_matrix_py.csv", index_col=0)
taxonomy_list = pd.read_csv("output/Actinopterygii_genus_order_family_taxon.csv", index_col=0)

In [3]:
# Change the index of the taxonomy_list to be the taxon name.
taxonomy_list.index = taxonomy_list['taxon']
taxonomy_list.index.name = 'taxon'
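Selecting an order's taxa with `dist_matrix.loc[taxa, taxa]` raises a `KeyError` if any taxon in the taxonomy list is absent from the matrix, so it may be worth checking the two inputs against each other before running MDS. A minimal sketch; `check_distance_matrix` is a hypothetical helper, not part of the original pipeline.

```python
import numpy as np
import pandas as pd

def check_distance_matrix(dist, taxa):
    """Basic sanity checks on a labelled distance matrix (hypothetical helper).

    Raises ValueError if the matrix is not a proper distance matrix, and
    returns the list of taxa that are missing from its labels.
    """
    if list(dist.index) != list(dist.columns):
        raise ValueError("row and column labels differ")
    values = dist.to_numpy(dtype=float)
    if not np.allclose(values, values.T):
        raise ValueError("matrix is not symmetric")
    if not np.allclose(np.diag(values), 0.0):
        raise ValueError("diagonal is not zero")
    return [t for t in taxa if t not in dist.index]

# Toy example with made-up labels and distances.
toy = pd.DataFrame([[0, 1, 2], [1, 0, 3], [2, 3, 0]],
                   index=list("abc"), columns=list("abc"))
missing = check_distance_matrix(toy, ["a", "c", "z"])
```

In this notebook it could be called as `check_distance_matrix(dist_matrix, taxonomy_list['taxon'])`; an empty result means every taxon can be looked up with `.loc`.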

In [4]:
# Make a list of all the fish orders. At some point, it would be useful to
# have a Misof-style tree of fish orders in phylogenetic order from most
# basal to most derived, but for now we'll just use the order they appear in the
# taxonomy list.
fish_orders = taxonomy_list['order'].unique()
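`Series.unique()` returns values in order of first appearance rather than sorted, which is why `fish_orders` follows the row order of the taxonomy CSV. A toy illustration with made-up rows:

```python
import pandas as pd

# Toy order column; repeated values keep the position where they first appeared.
orders = pd.Series(["Lepisosteiformes", "Amiiformes", "Lepisosteiformes",
                    "Osmeriformes", "Amiiformes"])
first_seen = list(orders.unique())
```

If a phylogenetic ordering of orders becomes available later, it could replace `fish_orders` as a plain list without changing the rest of the notebook.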

In [8]:
# Let's make an output dir for the order-level mds results.
import pathlib

mds_by_order_output_dir = pathlib.Path('output/mds_by_order')

mds_by_order_output_dir.mkdir(exist_ok=True)

## MDS for each order

We need to run MDS for each order individually. We already have the list of orders (`fish_orders`), so next we write a function that runs MDS on one order, test it on a single order, and then loop over all the orders.

In [5]:
# Let's make a quick table with the number of taxa in each order.
order_counts = taxonomy_list['order'].value_counts().reset_index()
order_counts.columns = ['order', 'count']
order_counts = order_counts.sort_values('count', ascending=False)
print(order_counts)

                   order  count
0            Perciformes   5731
1           Siluriformes   1891
2          Cypriniformes   1878
3          Characiformes   1019
4     Cyprinodontiformes    809
5        Scorpaeniformes    547
6      Pleuronectiformes    328
7      Tetraodontiformes    267
8         Atheriniformes    250
9         Anguilliformes    228
10       Syngnathiformes    210
11            Gadiformes    197
12          Clupeiformes    185
13         Gymnotiformes    176
14          Osmeriformes    142
15          Beloniformes    131
16        Myctophiformes    130
17          Lophiiformes    113
18          Stomiiformes    103
19     Osteoglossiformes    103
20         Salmoniformes     89
21          Aulopiformes     78
22          Beryciformes     77
23          Mugiliformes     64
24         Ophidiiformes     64
25       Gobiesociformes     53
26      Synbranchiformes     49
27             Zeiformes     28
28      Acipenseriformes     26
29     Batrachoidiformes     20
30     G
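The counts are very uneven, and that matters for runtime: each iteration of the SMACOF algorithm behind scikit-learn's metric MDS works on the full n × n distance matrix, so its cost grows roughly with n². A rough back-of-the-envelope comparison using the counts above (it ignores differences in how many iterations each order needs to converge, so treat it as an order-of-magnitude estimate):

```python
# Rough per-iteration cost ratio for SMACOF, assuming cost scales with n**2.
order_sizes = {"Perciformes": 5731, "Siluriformes": 1891, "Tetraodontiformes": 267}

def relative_cost(n_big, n_small):
    """Ratio of n**2 terms; a crude proxy that ignores iteration counts and overheads."""
    return (n_big / n_small) ** 2

ratio = relative_cost(order_sizes["Perciformes"], order_sizes["Tetraodontiformes"])
print(f"Perciformes vs Tetraodontiformes: ~{ratio:.0f}x per iteration")
```

A factor of several hundred is broadly consistent with the jump from about a second for Tetraodontiformes to about 12 minutes for Perciformes in the run logs below.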

In [9]:
# Tetraodontiformes (pufferfishes and allies) is a mid-sized order (267 taxa),
# so we'll use it as a test case.

current_order = 'Tetraodontiformes'  # Change this to process a different order

def do_mds_for_order(current_order):
    """Run 2D metric MDS on the taxa in one order and save the coordinates to CSV."""
    print(f"Processing order: {current_order}")

    # Get the list of taxa in this order.
    taxa_in_order = taxonomy_list.loc[taxonomy_list['order'] == current_order, 'taxon'].tolist()
    print(f"Number of taxa in {current_order}: {len(taxa_in_order)}")

    # Subset the main distance matrix to those taxa (rows and columns).
    filtered_matrix = dist_matrix.loc[taxa_in_order, taxa_in_order]
    print(f"Filtered matrix shape for {current_order}: {filtered_matrix.shape}")

    # Run metric MDS on the precomputed distances. No random_state is set, so
    # reruns can produce a different (e.g. rotated or reflected) layout.
    print("Running MDS...", end='', flush=True)
    mds = MDS(n_components=2, max_iter=4000, eps=1e-6, dissimilarity='precomputed',
              n_jobs=-1, verbose=10)
    embedding = mds.fit_transform(filtered_matrix)
    print("done.")

    df_mds = pd.DataFrame(embedding, index=filtered_matrix.index, columns=['x', 'y'])
    df_mds.index.name = 'taxon'

    # Add the order and family information back in, joining on the 'taxon' index
    # (taxonomy_list was re-indexed by taxon earlier).
    df_mds = df_mds.merge(taxonomy_list[['order', 'family']], left_index=True, right_index=True)

    # Save the MDS results to a CSV file.
    output_path = mds_by_order_output_dir / f"{current_order}_2D_mMDS_sklearn.csv"
    df_mds.to_csv(output_path)

do_mds_for_order(current_order)

Processing order: Tetraodontiformes
Number of taxa in Tetraodontiformes: 267
Filtered matrix shape for Tetraodontiformes: (267, 267)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.


[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.7s remaining:    1.7s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.7s finished
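The test only confirms that MDS finished. It could also be useful to record how well each 2D layout preserves the tree distances before grafting. A minimal NumPy-only sketch computing a normalized stress (definitions vary; this one divides by the sum of squared input distances) and the Pearson correlation of the Shepard plot. The function name is hypothetical; inside `do_mds_for_order` it could be called as `normalized_stress_and_r(filtered_matrix.to_numpy(), mds.embedding_)` after the fit. scikit-learn also exposes `mds.stress_`, but for metric MDS that value is by default not normalized, which makes it harder to compare across orders of very different sizes.

```python
import numpy as np

def normalized_stress_and_r(dist, embedding):
    """Goodness of fit for a 2D MDS embedding.

    dist:      (n, n) symmetric matrix of input distances
    embedding: (n, 2) array of MDS coordinates
    Returns (normalized stress, Pearson r) over each pair counted once, where
    normalized stress = sqrt(sum((d_in - d_emb)**2) / sum(d_in**2)).
    """
    dist = np.asarray(dist, dtype=float)
    emb = np.asarray(embedding, dtype=float)
    # Pairwise Euclidean distances in the embedding via |a|^2 + |b|^2 - 2ab,
    # which needs n x n memory rather than n x n x 2.
    sq = (emb ** 2).sum(axis=1)
    emb_dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T, 0.0))
    iu = np.triu_indices(len(dist), k=1)  # upper triangle: each pair once
    d_in, d_emb = dist[iu], emb_dist[iu]
    stress = np.sqrt(((d_in - d_emb) ** 2).sum() / (d_in ** 2).sum())
    r = np.corrcoef(d_in, d_emb)[0, 1]
    return stress, r

# Toy check: collinear points reproduce their own distances exactly.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
toy_dist = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
stress, r = normalized_stress_and_r(toy_dist, pts)
```

Values near 0 (stress) and near 1 (r) indicate that the 2D layout preserves the input distances well; orders with one or two taxa give degenerate values and are best skipped.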


## Loop over all orders

That works, so now run it on every order. The full loop takes about 15 minutes on an Alienware Area-51, most of it spent on Perciformes (5,731 taxa, about 12 minutes in the log below).
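Several orders contain a single taxon (Amiiformes, Spariformes and Centrarchiformes in the log below), where MDS has nothing to optimize. A hedged sketch of placing orders with one or two taxa directly instead: one point goes at the origin, and two points are placed on the x-axis so that their separation equals their distance. `place_small_order` is a hypothetical helper; the notebook as written sends these orders through MDS anyway, and the result would still need the order and family columns merged in as `do_mds_for_order` does.

```python
import numpy as np
import pandas as pd

def place_small_order(dist):
    """Exact 2D coordinates for an order with 1 or 2 taxa (hypothetical helper).

    dist: square pandas DataFrame of distances, labelled by taxon.
    Returns a DataFrame with columns x, y, or None if MDS is actually needed.
    """
    n = len(dist)
    if n == 1:
        xy = np.zeros((1, 2))
    elif n == 2:
        half = float(dist.iloc[0, 1]) / 2.0
        # Symmetric about the origin, reproducing the single pairwise distance.
        xy = np.array([[-half, 0.0], [half, 0.0]])
    else:
        return None
    out = pd.DataFrame(xy, index=dist.index, columns=["x", "y"])
    out.index.name = "taxon"
    return out

two = pd.DataFrame([[0.0, 4.0], [4.0, 0.0]], index=["A", "B"], columns=["A", "B"])
placed = place_small_order(two)
```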

In [None]:
for current_order in fish_orders:
    do_mds_for_order(current_order)

Processing order: Lepisosteiformes
Number of taxa in Lepisosteiformes: 7
Filtered matrix shape for Lepisosteiformes: (7, 7)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Amiiformes
Number of taxa in Amiiformes: 1
Filtered matrix shape for Amiiformes: (1, 1)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.6s remaining:    1.6s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.6s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Osmeriformes
Number of taxa in Osmeriformes: 142
Filtered matrix shape for Osmeriformes: (142, 142)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.4s remaining:    1.4s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.4s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Argentiniformes
Number of taxa in Argentiniformes: 8
Filtered matrix shape for Argentiniformes: (8, 8)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.2s remaining:    1.2s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Esociformes
Number of taxa in Esociformes: 12
Filtered matrix shape for Esociformes: (12, 12)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.1s remaining:    1.1s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Salmoniformes
Number of taxa in Salmoniformes: 89
Filtered matrix shape for Salmoniformes: (89, 89)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.1s remaining:    1.1s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Lampriformes
Number of taxa in Lampriformes: 16
Filtered matrix shape for Lampriformes: (16, 16)
Running MDS...done.
Processing order: Ophidiiformes
Number of taxa in Ophidiiformes: 64
Filtered matrix shape for Ophidiiformes: (64, 64)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.2s remaining:    1.2s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished


done.
Processing order: Perciformes
Number of taxa in Perciformes: 5731
Filtered matrix shape for Perciformes: (5731, 5731)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed: 11.3min remaining: 11.3min
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed: 11.9min finished


done.
Processing order: Acanthuriformes
Number of taxa in Acanthuriformes: 7
Filtered matrix shape for Acanthuriformes: (7, 7)
Running MDS...done.
Processing order: Spariformes
Number of taxa in Spariformes: 1
Filtered matrix shape for Spariformes: (1, 1)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Tetraodontiformes
Number of taxa in Tetraodontiformes: 267
Filtered matrix shape for Tetraodontiformes: (267, 267)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.5s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.5s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Lophiiformes
Number of taxa in Lophiiformes: 113
Filtered matrix shape for Lophiiformes: (113, 113)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.5s remaining:    1.5s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.5s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Centrarchiformes
Number of taxa in Centrarchiformes: 1
Filtered matrix shape for Centrarchiformes: (1, 1)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.4s remaining:    1.4s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.5s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Scorpaeniformes
Number of taxa in Scorpaeniformes: 547
Filtered matrix shape for Scorpaeniformes: (547, 547)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.8s remaining:    1.8s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.8s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    2.9s remaining:    2.9s


done.
Processing order: Gasterosteiformes
Number of taxa in Gasterosteiformes: 20
Filtered matrix shape for Gasterosteiformes: (20, 20)
Running MDS...

[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    4.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Synbranchiformes
Number of taxa in Synbranchiformes: 49
Filtered matrix shape for Synbranchiformes: (49, 49)
Running MDS...done.
Processing order: Pleuronectiformes
Number of taxa in Pleuronectiformes: 328
Filtered matrix shape for Pleuronectiformes: (328, 328)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    1.4s remaining:    1.4s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.4s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Cichliformes
Number of taxa in Cichliformes: 7
Filtered matrix shape for Cichliformes: (7, 7)
Running MDS...done.
Processing order: Atheriniformes
Number of taxa in Atheriniformes: 250
Filtered matrix shape for Atheriniformes: (250, 250)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.4s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.5s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Beloniformes
Number of taxa in Beloniformes: 131
Filtered matrix shape for Beloniformes: (131, 131)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished


done.
Processing order: Cyprinodontiformes
Number of taxa in Cyprinodontiformes: 809
Filtered matrix shape for Cyprinodontiformes: (809, 809)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    5.5s remaining:    5.5s


done.
Processing order: Gobiesociformes
Number of taxa in Gobiesociformes: 53
Filtered matrix shape for Gobiesociformes: (53, 53)
Running MDS...done.
Processing order: Mugiliformes
Number of taxa in Mugiliformes: 64
Filtered matrix shape for Mugiliformes: (64, 64)
Running MDS...

[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    7.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished


done.
Processing order: Syngnathiformes
Number of taxa in Syngnathiformes: 210
Filtered matrix shape for Syngnathiformes: (210, 210)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s


done.
Processing order: Batrachoidiformes
Number of taxa in Batrachoidiformes: 20
Filtered matrix shape for Batrachoidiformes: (20, 20)
Running MDS...done.
Processing order: Beryciformes
Number of taxa in Beryciformes: 77
Filtered matrix shape for Beryciformes: (77, 77)
Running MDS...done.
Processing order: Cetomimiformes
Number of taxa in Cetomimiformes: 13
Filtered matrix shape for Cetomimiformes: (13, 13)
Running MDS...done.
Processing order: Stephanoberyciformes
Number of taxa in Stephanoberyciformes: 15
Filtered matrix shape for Stephanoberyciformes: (15, 15)
Running MDS...done.
Processing order: Gadiformes
Number of taxa in Gadiformes: 197
Filtered matrix shape for Gadiformes: (197, 197)
Running MDS...

[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished

done.
Processing order: Zeiformes
Number of taxa in Zeiformes: 28
Filtered matrix shape for Zeiformes: (28, 28)
Running MDS...done.
Processing order: Percopsiformes
Number of taxa in Percopsiformes: 12
Filtered matrix shape for Percopsiformes: (12, 12)
Running MDS...done.
Processing order: Polymixiiformes
Number of taxa in Polymixiiformes: 4
Filtered matrix shape for Polymixiiformes: (4, 4)
Running MDS...done.
Processing order: Myctophiformes
Number of taxa in Myctophiformes: 130
Filtered matrix shape for Myctophiformes: (130, 130)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.


done.
Processing order: Aulopiformes
Number of taxa in Aulopiformes: 78
Filtered matrix shape for Aulopiformes: (78, 78)
Running MDS...done.
Processing order: Ateleopodiformes
Number of taxa in Ateleopodiformes: 6
Filtered matrix shape for Ateleopodiformes: (6, 6)
Running MDS...done.
Processing order: Stomiiformes
Number of taxa in Stomiiformes: 103
Filtered matrix shape for Stomiiformes: (103, 103)
Running MDS...

[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s


done.
Processing order: Gymnotiformes
Number of taxa in Gymnotiformes: 176
Filtered matrix shape for Gymnotiformes: (176, 176)
Running MDS...done.
Processing order: Siluriformes
Number of taxa in Siluriformes: 1891
Filtered matrix shape for Siluriformes: (1891, 1891)
Running MDS...

[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:   42.3s remaining:   42.3s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:   52.8s finished


done.
Processing order: Characiformes
Number of taxa in Characiformes: 1019
Filtered matrix shape for Characiformes: (1019, 1019)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:   16.0s remaining:   16.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:   19.2s finished


done.
Processing order: Cypriniformes
Number of taxa in Cypriniformes: 1878
Filtered matrix shape for Cypriniformes: (1878, 1878)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:   56.5s remaining:   56.5s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:  1.2min finished


done.
Processing order: Gonorynchiformes
Number of taxa in Gonorynchiformes: 14
Filtered matrix shape for Gonorynchiformes: (14, 14)
Running MDS...done.
Processing order: Clupeiformes
Number of taxa in Clupeiformes: 185
Filtered matrix shape for Clupeiformes: (185, 185)
Running MDS...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s


done.
Processing order: Elopiformes
Number of taxa in Elopiformes: 7
Filtered matrix shape for Elopiformes: (7, 7)
Running MDS...done.
Processing order: Notacanthiformes
Number of taxa in Notacanthiformes: 9
Filtered matrix shape for Notacanthiformes: (9, 9)
Running MDS...done.
Processing order: Anguilliformes
Number of taxa in Anguilliformes: 228
Filtered matrix shape for Anguilliformes: (228, 228)
Running MDS...

[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished


done.
Processing order: Saccopharyngiformes
Number of taxa in Saccopharyngiformes: 5
Filtered matrix shape for Saccopharyngiformes: (5, 5)
Running MDS...done.
Processing order: Albuliformes
Number of taxa in Albuliformes: 8
Filtered matrix shape for Albuliformes: (8, 8)
Running MDS...done.
Processing order: Osteoglossiformes
Number of taxa in Osteoglossiformes: 103
Filtered matrix shape for Osteoglossiformes: (103, 103)
Running MDS...done.
Processing order: Acipenseriformes
Number of taxa in Acipenseriformes: 26
Filtered matrix shape for Acipenseriformes: (26, 26)
Running MDS...done.
Processing order: Polypteriformes
Number of taxa in Polypteriformes: 14
Filtered matrix shape for Polypteriformes: (14, 14)
Running MDS...done.


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.

## Plotting

What do these look like? Plot one order at a time; set `current_order` to the order you want to view.

In [13]:
current_order = 'Anguilliformes'  # Change this to visualize a different order

df_mds = pd.read_csv(mds_by_order_output_dir / f"{current_order}_2D_mMDS_sklearn.csv", index_col=0)

# Create the scatter plot, colored by family.
fig = px.scatter(df_mds, x='x', y='y', color='family', hover_name=df_mds.index)
fig.update_layout(title=f"2D mMDS of {current_order} Genera", xaxis_title="MDS1", yaxis_title="MDS2")
fig.update_layout(height=800, width=800)
# Lock the y-axis scale to the x-axis so equal MDS distances look equal in both directions.
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()

## Overlapping points

For each order, let's see how many points overlap in the MDS plot.
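The cell below tests for exactly identical coordinates, which is a strict test for floating-point MDS output: points that are merely very close would still overplot. A hedged sketch of a tolerance-based count; the function name and the tolerance are assumptions, and a useful tolerance would have to be chosen relative to the scale of the plotted coordinates.

```python
import numpy as np

def count_near_coincident(xy, tol):
    """Count points lying within `tol` (Euclidean) of at least one other point.

    Compares each point only with later points, so memory stays O(n)
    while the work is O(n**2).
    """
    xy = np.asarray(xy, dtype=float)
    close = np.zeros(len(xy), dtype=bool)
    for i in range(len(xy) - 1):
        # Distances from point i to every later point (each pair checked once).
        d = np.linalg.norm(xy[i + 1:] - xy[i], axis=1)
        hits = np.nonzero(d <= tol)[0] + i + 1
        if hits.size:
            close[i] = True
            close[hits] = True
    return int(close.sum())

# Toy example: the first two points are 1e-9 apart, the third is far away.
n_close = count_near_coincident([[0.0, 0.0], [1e-9, 0.0], [5.0, 5.0]], tol=1e-6)
```

In the loop it could be applied as `count_near_coincident(df_mds[['x', 'y']].to_numpy(), tol=...)`, with `tol=0.0` reproducing the exact-match check.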

In [14]:
# Load the CSV for each order and count the points that share identical (x, y)
# coordinates with at least one other point.

for current_order in fish_orders:
    order_path = mds_by_order_output_dir / f"{current_order}_2D_mMDS_sklearn.csv"
    df_mds = pd.read_csv(order_path, index_col=0)
    coord_counts = df_mds.groupby(['x', 'y']).size()
    # Sum the group sizes of every location occupied by more than one point.
    num_coincident = coord_counts[coord_counts > 1].sum()
    total_points = len(df_mds)
    print(f"{current_order}: {num_coincident} coincident points out of {total_points} total points.")

Lepisosteiformes: 0 coincident points out of 7 total points.
Amiiformes: 0 coincident points out of 1 total points.
Osmeriformes: 0 coincident points out of 142 total points.
Argentiniformes: 0 coincident points out of 8 total points.
Esociformes: 0 coincident points out of 12 total points.
Salmoniformes: 0 coincident points out of 89 total points.
Lampriformes: 0 coincident points out of 16 total points.
Ophidiiformes: 0 coincident points out of 64 total points.
Perciformes: 0 coincident points out of 5731 total points.
Acanthuriformes: 0 coincident points out of 7 total points.
Spariformes: 0 coincident points out of 1 total points.
Tetraodontiformes: 0 coincident points out of 267 total points.
Lophiiformes: 0 coincident points out of 113 total points.
Centrarchiformes: 0 coincident points out of 1 total points.
Scorpaeniformes: 0 coincident points out of 547 total points.
Gasterosteiformes: 0 coincident points out of 20 total points.
Synbranchiformes: 0 coincident points out of 49 