# Phytoplankton Group Annotation v3

Updated annotation from Yubin for phytoplankton-specific taxa. Data from:
- 20230822_Supplementary_Table_PPG_Assignment_V1.docx
- 20230823_Dinoflagellate_Supplementary_Table_2.docx
- Table S1_gomez_a review on dinoflagellates.txt

Created 3 separate tables to recombine into one phytoplankton table:
- plastids (represents 18S groups except for dinoflagellates)
- 16S phytoplankton (cyanobacteria)
- dinoflagellates
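
A minimal sketch of how three such tables could be recombined, assuming a simple row-wise concatenation with `pd.concat`. The toy rows and `asv_*` IDs are illustrative placeholders, not values from the dataset, and the notebook's own recombination step may differ.

```python
import pandas as pd

# Toy stand-ins for the three tables (hypothetical rows, not real data).
plastids = pd.DataFrame({
    "ASV_ID": ["asv_a", "asv_b"],
    "Phytoplankton_Broad_Group": ["Plastid", "Plastid"],
    "Phytoplankton_Group": ["Diatom", "Prymnesiophyte"],
})
cyanos = pd.DataFrame({
    "ASV_ID": ["asv_c"],
    "Phytoplankton_Broad_Group": ["Cyanobacteria"],
    "Phytoplankton_Group": ["Picocyanobacteria"],
})
dinos = pd.DataFrame({
    "ASV_ID": ["asv_d"],
    "Phytoplankton_Broad_Group": ["Dinoflagellate"],
    "Trophic_Strategy": ["Photoautotroph"],
})

# Stack the rows; columns missing from a table are filled with NaN.
master = pd.concat([plastids, cyanos, dinos], ignore_index=True)

# Each ASV should appear in exactly one of the three tables.
assert master["ASV_ID"].is_unique
print(master)
```

Because the dinoflagellate table carries `Trophic_Strategy` instead of `Phytoplankton_Group`, the concatenated table has NaN in whichever of those columns does not apply to a row.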


Lexi Jones-Kellett

Date Created: 02/01/24

Last Edited: 08/12/25

In [1]:
import csv
import pandas as pd
import numpy as np

from config import *  # supplies highcov_dir, the data directory used below

Open current taxonomy file

In [2]:
taxonomy = []
with open(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        taxonomy.append(row)
        
taxonomy[0]

['ASV_ID', 'Taxonomy', 'Broad_Plankton_Group']
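
As an aside, the same file could probably be read in one step with `pd.read_csv`. The sketch below uses an in-memory CSV with made-up rows so that it runs on its own. `dtype=str` and `keep_default_na=False` are chosen here to mimic `csv.reader`, which returns every field as a plain string.

```python
import io
import pandas as pd

# Hypothetical two-row file with the same header as the taxonomy CSV.
csv_text = """ASV_ID,Taxonomy,Broad_Plankton_Group
asv_a,d__Bacteria; p__Cyanobacteria,Phytoplankton
asv_b,Eukaryota; Alveolata; Dinoflagellata,Dinoflagellate
"""

# Read everything as strings and do not convert empty fields to NaN.
df_example = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
print(df_example.columns.tolist())
```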

In [3]:
# Build the DataFrame with a default integer index; ASV_ID stays a regular column
df = pd.DataFrame(taxonomy[1:],columns=taxonomy[0])
df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group
0,e9f376c867c2a1506d5b0996f9839a20,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
1,86cd70d2450e3f44f4a7c543f53008ee,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
2,0e775fc1584ed21db02c44461c5e120e,d__Bacteria; p__Proteobacteria; c__Alphaproteo...,Non-autotrophic Prokaryote
3,d089e76cb77366eba904ed413876e849,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
4,1ce3b5c6d85ce967f8677e23e3e9b0be,d__Bacteria; p__Proteobacteria; c__Alphaproteo...,Non-autotrophic Prokaryote
...,...,...,...
39616,abf2e4102d9b84d8db80f0ca5de000f4,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate
39617,0a691617faeddaae308a6c1678b5e72a,Eukaryota; Opisthokonta; Metazoa; Cnidaria; Cn...,Metazoa
39618,ede7aace70c4e3cb7bdfb5bd0cfd3132,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate
39619,a046065f49af925d69c1e4326d06003e,Eukaryota; Opisthokonta; Metazoa; Cnidaria; Cn...,Metazoa


## Annotate Plastids 

The plastid group should represent all eukaryotic phytoplankton except dinoflagellates. Dinoflagellate plastid ASVs are excluded so that they are not double counted with the 18S dinoflagellate ASVs.
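
The rule above can be written as a single boolean mask. This is a sketch on made-up taxonomy strings; the notebook itself labels the groups first and drops the dinoflagellate plastids afterwards. Passing `regex=False` makes `str.contains` match the text literally instead of treating it as a regular expression.

```python
import pandas as pd

# Hypothetical taxonomy strings covering the relevant cases.
tax = pd.Series([
    "Eukaryota:plas; Stramenopiles:plas; Ochrophyta:plas",   # plastid, keep
    "Eukaryota:plas; Alveolata:plas; Dinoflagellata:plas",   # dinoflagellate plastid, drop
    "Eukaryota; Alveolata; Dinoflagellata; Dinophyceae",     # 18S dinoflagellate, not a plastid
    "d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia",      # 16S cyanobacterium, not a plastid
])

is_plastid = tax.str.contains(":plas", regex=False)
is_dino_plastid = tax.str.contains("Dinoflagellata:plas", regex=False)

# Plastid ASVs that are not dinoflagellate plastids.
keep = is_plastid & ~is_dino_plastid
print(keep.tolist())
```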

In [4]:
print(len(df.loc[df['Taxonomy'].str.contains(':plas')]['Taxonomy'])) # was string 'plas;'
print(len(np.unique(df.loc[df['Taxonomy'].str.contains(':plas')]['Taxonomy'])))
print(np.unique(df.loc[df['Taxonomy'].str.contains(':plas')]['Taxonomy']))

3511
138
['Eukaryota:plas'
 'Eukaryota:plas; Alveolata:plas; Dinoflagellata:plas; Dinophyceae:plas; Gymnodiniales:plas; Kareniaceae:plas; Karenia:plas; Karenia_brevis:plas'
 'Eukaryota:plas; Alveolata:plas; Dinoflagellata:plas; Dinophyceae:plas; Gymnodiniales:plas; Kareniaceae:plas; Karenia:plas; Karenia_mikimotoi:plas'
 'Eukaryota:plas; Archaeplastida:plas'
 'Eukaryota:plas; Archaeplastida:plas; Chlorophyta:plas'
 'Eukaryota:plas; Archaeplastida:plas; Chlorophyta:plas; Chlorodendrophyceae:plas; Chlorodendrales:plas; Chlorodendraceae:plas; Tetraselmis:plas'
 'Eukaryota:plas; Archaeplastida:plas; Chlorophyta:plas; Chlorodendrophyceae:plas; Chlorodendrales:plas; Chlorodendraceae:plas; Tetraselmis:plas; Tetraselmis_cordiformis:plas'
 'Eukaryota:plas; Archaeplastida:plas; Chlorophyta:plas; Chlorodendrophyceae:plas; Chlorodendrales:plas; Chlorodendraceae:plas; Tetraselmis:plas; Tetraselmis_sp.:plas'
 'Eukaryota:plas; Archaeplastida:plas; Chlorophyta:plas; Chloropicophyceae:plas; Chloropical

In [5]:
plas_df = df.loc[df['Taxonomy'].str.contains(':plas')]
plas_df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group
120,2f52bfb075d038a1b72a4dd9d53a01e8,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton
135,a90268a55b92cdf24a7be460a63b6f60,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton
165,3a9e3efa6d0f0df69668b602c9bac618,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton
209,c11382dbe93c5cf9e98818644c10014b,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton
229,34178fed607a631f7a37c6b2b70ca571,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton
...,...,...,...
23967,7e7f22c537fc8f61dbbdef992facaa11,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton
23970,14dd57563a34ccd98d0bcdd5b7e21246,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton
23971,4ca831b4a5bad16c7d25cf9f6dc27f44,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton
23976,659f218bcb141dbcce6fb58c95039935,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton


In [6]:
plas_df['Phytoplankton_Broad_Group'] = 'Plastid'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  plas_df['Phytoplankton_Broad_Group'] = 'Plastid'
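
The warning appears because `plas_df` was created by filtering `df`, so pandas cannot tell whether the assignment is meant to reach the original table. One common way to avoid it is to take an explicit copy when filtering. The sketch below uses a toy DataFrame (`df_demo` and `subset` are illustrative names).

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Taxonomy": ["Eukaryota:plas; Hacrobia:plas", "Eukaryota; Alveolata"],
})

# .copy() makes the subset an independent DataFrame, so adding a column
# to it is unambiguous and leaves df_demo untouched.
subset = df_demo.loc[df_demo["Taxonomy"].str.contains(":plas", regex=False)].copy()
subset["Phytoplankton_Broad_Group"] = "Plastid"

print(subset)
```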


In [7]:
plas_df['Phytoplankton_Group'] = ''
plas_df.loc[plas_df['Taxonomy'].str.contains('Chlorophyta:plas'),'Phytoplankton_Group'] = 'Chlorophyte'
plas_df.loc[plas_df['Taxonomy'].str.contains('Rhodophyta:plas'),'Phytoplankton_Group'] = 'Seaweed'
plas_df.loc[plas_df['Taxonomy'].str.contains('Bacillariophyta:plas'),'Phytoplankton_Group'] = 'Diatom'
plas_df.loc[plas_df['Taxonomy'].str.contains('Dinoflagellata:plas'),'Phytoplankton_Group'] = 'Dinoflagellate'
plas_df.loc[plas_df['Taxonomy'].str.contains('Prymnesiophyceae:plas'),'Phytoplankton_Group'] = 'Prymnesiophyte' # includes Phaeocystis & coccolithophores
plas_df.loc[plas_df['Taxonomy'].str.contains('Pelagophyceae:plas'),'Phytoplankton_Group'] = 'Pelagophyte'
plas_df.loc[plas_df['Taxonomy'].str.contains('Dictyochophyceae:plas'),'Phytoplankton_Group'] = 'Dictyochophyte'
plas_df.loc[plas_df['Taxonomy'].str.contains('Rhizaria:plas'),'Phytoplankton_Group'] = 'Rhizaria' # Rhizaria:plas may be symbionts. Not "traditional phytoplankton"
plas_df.loc[plas_df['Taxonomy'].str.contains('Excavata:plas'),'Phytoplankton_Group'] = 'Excavata' #ask Jed about these
plas_df.loc[plas_df['Taxonomy'].str.contains('Prasinodermophyta:plas'),'Phytoplankton_Group'] = 'Prasinodermophyte' # found deeper in water column because of C4 
plas_df.loc[plas_df['Taxonomy'].str.contains('Chrysophyceae:plas'),'Phytoplankton_Group'] = 'Chrysophyte'
plas_df.loc[plas_df['Taxonomy'].str.contains('Cryptophyta:plas'),'Phytoplankton_Group'] = 'Cryptophyte'
plas_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  plas_df['Phytoplankton_Group'] = ''
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)


Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
120,2f52bfb075d038a1b72a4dd9d53a01e8,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
135,a90268a55b92cdf24a7be460a63b6f60,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
165,3a9e3efa6d0f0df69668b602c9bac618,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
209,c11382dbe93c5cf9e98818644c10014b,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
229,34178fed607a631f7a37c6b2b70ca571,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
...,...,...,...,...,...
23967,7e7f22c537fc8f61dbbdef992facaa11,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
23970,14dd57563a34ccd98d0bcdd5b7e21246,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
23971,4ca831b4a5bad16c7d25cf9f6dc27f44,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
23976,659f218bcb141dbcce6fb58c95039935,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
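
The repeated `.loc` assignments in the cell above could also be driven from a dictionary, which keeps the token-to-group mapping in one place. This is a sketch with a shortened mapping and made-up rows; `group_by_token` and `demo` are illustrative names. As in the notebook, later entries overwrite earlier ones when a row matches more than one token.

```python
import pandas as pd

# Subset of the taxonomy tokens used in the notebook, mapped to group labels.
group_by_token = {
    "Chlorophyta:plas": "Chlorophyte",
    "Bacillariophyta:plas": "Diatom",
    "Prymnesiophyceae:plas": "Prymnesiophyte",
}

demo = pd.DataFrame({
    "Taxonomy": [
        "Eukaryota:plas; Archaeplastida:plas; Chlorophyta:plas",
        "Eukaryota:plas; Stramenopiles:plas; Ochrophyta:plas; Bacillariophyta:plas",
        "Eukaryota:plas; Hacrobia:plas; Haptophyta:plas",
    ],
})
demo["Phytoplankton_Group"] = ""

# Apply each rule in order; unmatched rows keep the empty string.
for token, group in group_by_token.items():
    hit = demo["Taxonomy"].str.contains(token, regex=False)
    demo.loc[hit, "Phytoplankton_Group"] = group

print(demo["Phytoplankton_Group"].tolist())
```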


In [8]:
print(np.unique(plas_df['Broad_Plankton_Group']))
print(len(np.unique(plas_df.loc[plas_df['Phytoplankton_Group'].eq('')]['Taxonomy'])))
print(np.unique(plas_df.loc[plas_df['Phytoplankton_Group'].eq('')]['Taxonomy']))

['Dinoflagellate' 'Phytoplankton']
18
['Eukaryota:plas' 'Eukaryota:plas; Archaeplastida:plas'
 'Eukaryota:plas; Archaeplastida:plas; Streptophyta:plas'
 'Eukaryota:plas; Archaeplastida:plas; Streptophyta:plas; Embryophyceae:plas; Embryophyceae_X:plas; Embryophyceae_XX:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Pavlovophyceae:plas; Pavlovales:plas; Pavlovaceae:plas; Pavlova:plas; Pavlova_sp.:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Rappephyceae:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Rappephyceae:plas; Pavlomulinales:plas; Pavlomulinaceae:plas; Pavlomulina:plas; Pavlomulina_ranunculiformis:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Rappephyceae:plas; Rappephyceae_X:plas; Rappephyceae_XX:plas; Rappephyceae_XXX:plas; Rappephyceae_XXX_sp.:plas'
 'Eukaryota:plas; Stramenopiles:plas; Ochrophyta:plas'
 'Eukaryota:plas; Stramenopiles:plas; Ochrophyta:plas; Bolidophyceae:plas; Parmales:pla

Remove some groups from the table for the phytoplankton analysis:
- Rhizaria (not conventional phytoplankton)
- Excavata (? negligible in dataset)
- Dinoflagellates (do not want to double count with the 18S dinos)
- Seaweed
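
The four removals listed above can be expressed as one filter with `isin`. The sketch below runs on a toy table; `excluded`, `demo`, and `kept` are illustrative names.

```python
import pandas as pd

excluded = ["Rhizaria", "Excavata", "Dinoflagellate", "Seaweed"]

demo = pd.DataFrame({
    "ASV_ID": ["asv_a", "asv_b", "asv_c", "asv_d"],
    "Phytoplankton_Group": ["Diatom", "Rhizaria", "Seaweed", ""],
})

# Keep every row whose group is not in the exclusion list;
# rows that are still unlabelled ('') are kept as well.
kept = demo[~demo["Phytoplankton_Group"].isin(excluded)]
print(kept)
```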

In [9]:
plas_df[plas_df['Phytoplankton_Group'].eq('Excavata')]

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
13592,c4d2fdc3c4c63b146e26efe3e676d3cb,Eukaryota:plas; Excavata:plas; Discoba:plas; E...,Phytoplankton,Plastid,Excavata
16259,8b3a44a2fb9c3e08972953d3ac8ae9f6,Eukaryota:plas; Excavata:plas; Discoba:plas; E...,Phytoplankton,Plastid,Excavata


In [10]:
plas_df = plas_df[plas_df['Phytoplankton_Group'].ne('Rhizaria')]
plas_df = plas_df[plas_df['Phytoplankton_Group'].ne('Excavata')]
plas_df = plas_df[plas_df['Phytoplankton_Group'].ne('Dinoflagellate')]
plas_df = plas_df[plas_df['Phytoplankton_Group'].ne('Seaweed')]

print(np.unique(plas_df['Broad_Plankton_Group']))
print(len(plas_df))
print(len(np.unique(plas_df.loc[plas_df['Phytoplankton_Group'].eq('')]['Taxonomy'])))
print(np.unique(plas_df.loc[plas_df['Phytoplankton_Group'].eq('')]['Taxonomy']))

['Phytoplankton']
3402
18
['Eukaryota:plas' 'Eukaryota:plas; Archaeplastida:plas'
 'Eukaryota:plas; Archaeplastida:plas; Streptophyta:plas'
 'Eukaryota:plas; Archaeplastida:plas; Streptophyta:plas; Embryophyceae:plas; Embryophyceae_X:plas; Embryophyceae_XX:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Pavlovophyceae:plas; Pavlovales:plas; Pavlovaceae:plas; Pavlova:plas; Pavlova_sp.:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Rappephyceae:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Rappephyceae:plas; Pavlomulinales:plas; Pavlomulinaceae:plas; Pavlomulina:plas; Pavlomulina_ranunculiformis:plas'
 'Eukaryota:plas; Hacrobia:plas; Haptophyta:plas; Rappephyceae:plas; Rappephyceae_X:plas; Rappephyceae_XX:plas; Rappephyceae_XXX:plas; Rappephyceae_XXX_sp.:plas'
 'Eukaryota:plas; Stramenopiles:plas; Ochrophyta:plas'
 'Eukaryota:plas; Stramenopiles:plas; Ochrophyta:plas; Bolidophyceae:plas; Parmales:plas; Triparmac

In [11]:
plas_df.loc[plas_df['Phytoplankton_Group'].eq(''),'Phytoplankton_Group'] = 'Unknown Eukaryote Chloroplast'
np.unique(plas_df['Phytoplankton_Group'])

array(['Chlorophyte', 'Chrysophyte', 'Cryptophyte', 'Diatom',
       'Dictyochophyte', 'Pelagophyte', 'Prasinodermophyte',
       'Prymnesiophyte', 'Unknown Eukaryote Chloroplast'], dtype=object)

In [12]:
#plas_df.to_csv(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups_plastid_phytoplankton.csv',index=False)

## Annotate 16S phytoplankton

"20230822_Supplementary_Table_PPG_Assignment_V1.docx"

In [13]:
pp_df = df.loc[(df['Broad_Plankton_Group'] == 'Phytoplankton')]

# remove plastids
#pp_df = pp_df.loc[~pp_df['Taxonomy'].str.contains('plas;')] 
pp_df = pp_df.loc[~pp_df['Taxonomy'].str.contains(':plas')]

print(len(pp_df))
print(len(np.unique(pp_df['Taxonomy'])))
print(np.unique(pp_df['Taxonomy']))

425
19
['d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Chroococcidiopsaceae; g__uncultured; s__uncultured_bacterium'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Microcystaceae; g__Atelocyanobacterium_(UCYN-A)'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Microcystaceae; g__Atelocyanobacterium_(UCYN-A); s__Candidatus_Atelocyanobacterium'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Microcystaceae; g__Synechocystis_CCALA_700; s__uncultured_bacterium'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Nostocaceae'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Nostocaceae; g__Richelia_HH01; s__Richelia_intracellularis'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f_

In [14]:
pp_df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group
0,e9f376c867c2a1506d5b0996f9839a20,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
1,86cd70d2450e3f44f4a7c543f53008ee,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
3,d089e76cb77366eba904ed413876e849,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
11,27ca2b8f287c3a7e4d864cc870cf67b7,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
21,7a1011dfbc8b253cb0032a00e74454e9,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
...,...,...,...
23604,e945de40b7009c66ef4a603497de8fd0,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
23824,7914f2cb180e56767913c618e10584ca,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
23848,4b5700f02013b72be1c6b38ee82bccb5,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton
23850,cf6481db05a7943d19332115012f0cf8,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton


In [15]:
pp_df['Phytoplankton_Broad_Group'] = 'Cyanobacteria'
pp_df['Phytoplankton_Group'] = ''

In [16]:
# Picocyanobacteria
pp_df.loc[pp_df['Taxonomy'].str.contains('Prochlorococcus'),'Phytoplankton_Group'] = 'Picocyanobacteria'
pp_df.loc[pp_df['Taxonomy'].str.contains('Synechococcus'),'Phytoplankton_Group'] = 'Picocyanobacteria'
pp_df.loc[pp_df['Taxonomy'].str.contains('Synechococcales'),'Phytoplankton_Group'] = 'Picocyanobacteria'

## Diazotroph
pp_df.loc[pp_df['Taxonomy'].str.contains('UCYN'),'Phytoplankton_Group'] = 'Diazotroph'
pp_df.loc[pp_df['Taxonomy'].str.contains('Trichodesmium'),'Phytoplankton_Group'] = 'Diazotroph'
pp_df.loc[pp_df['Taxonomy'].str.contains('Richelia'),'Phytoplankton_Group'] = 'Diazotroph'
#pp_df.loc[pp_df['Taxonomy'].str.contains('Crocosphaera'),'Phytoplankton_Group'] = 'Diazotroph' # none in this dataset

print(len(np.unique(pp_df.loc[pp_df['Phytoplankton_Group'].eq('')]['Taxonomy'])))
print(np.unique(pp_df.loc[pp_df['Phytoplankton_Group'].eq('')]['Taxonomy']))

5
['d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Chroococcidiopsaceae; g__uncultured; s__uncultured_bacterium'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Microcystaceae; g__Synechocystis_CCALA_700; s__uncultured_bacterium'
 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Nostocaceae']


In [17]:
pp_df[pp_df['Taxonomy'].eq('d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Chroococcidiopsaceae; g__uncultured; s__uncultured_bacterium')]

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
23577,fe2272feb5ca26b071064b5b94b4a159,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,


In [18]:
pp_df[pp_df['Taxonomy'].eq('d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Microcystaceae; g__Synechocystis_CCALA_700; s__uncultured_bacterium')]

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
6837,f66d397e6dfc0e59ada0c237f0d31c8b,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,
15747,bebd8483a2fd2e0ab2d71db69c11b93d,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,
19668,9534e7ddf0f2f873d004bd0842ac2717,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,


In [19]:
pp_df[pp_df['Taxonomy'].eq('d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Cyanobacteriales; f__Nostocaceae')]

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
9549,c910e90a48a5f0789b196d65dd20b589,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,


#### "Other Cyanobacteria"

Chroococcidiopsaceae: family containing Chroococcidiopsis, a photosynthetic, coccoid cyanobacterium (negligible abundance here)

Synechocystis_CCALA_700: ? (negligible abundances for the 3 ASVs)

Nostocaceae: these bacteria can contain photosynthetic pigments in their cytoplasm to perform photosynthesis. Some are nitrogen fixing. (The unidentified Nostocaceae ASV is negligible in abundance here.)

In [20]:
pp_df.loc[pp_df['Phytoplankton_Group'].eq(''),'Phytoplankton_Group'] = 'Unknown Cyanobacteria'
np.unique(pp_df['Phytoplankton_Group'])

array(['Diazotroph', 'Picocyanobacteria', 'Unknown Cyanobacteria'],
      dtype=object)

In [21]:
pp_df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
0,e9f376c867c2a1506d5b0996f9839a20,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
1,86cd70d2450e3f44f4a7c543f53008ee,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
3,d089e76cb77366eba904ed413876e849,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
11,27ca2b8f287c3a7e4d864cc870cf67b7,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
21,7a1011dfbc8b253cb0032a00e74454e9,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
...,...,...,...,...,...
23604,e945de40b7009c66ef4a603497de8fd0,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
23824,7914f2cb180e56767913c618e10584ca,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
23848,4b5700f02013b72be1c6b38ee82bccb5,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
23850,cf6481db05a7943d19332115012f0cf8,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria


In [22]:
#pp_df.to_csv(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups_16S_phytoplankton.csv',index=False)

## Annotate dinoflagellates

"20230823_Dinoflagellate_Supplementary_Table_2.docx"

In [22]:
dino_df = df.loc[(df['Broad_Plankton_Group'] == 'Dinoflagellate')]
dino_df = dino_df.loc[~dino_df['Taxonomy'].str.contains(':plas')] # remove plastids
dino_df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group
23984,58ff3b080834d0d5d0ad00c0b93f6cdb,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate
23990,f1ec76ed24d2f190db8ccfc3c8829853,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate
23995,587f06540735b3a03a706a18ac60ca8a,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate
23997,0021da5e24a76f1d7754222e977e2c4e,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate
24000,c104ba5a3c3ed119dbab8240b1ba1972,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate
...,...,...,...
39609,5440c5a1225f28b53adce26ad76ef79b,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate
39612,daf6f5d5d6492c130e0d457362e9273f,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate
39614,01da90a3b913ed1e9e85c6c87f88ccac,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate
39616,abf2e4102d9b84d8db80f0ca5de000f4,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate


Trophic strategies (a taxon can also be assigned a combination of these, e.g. 'Photoautotroph_Mixotroph'):
- Parasite
- Heterotroph
- Photoautotroph
- Mixotroph
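
Since combined strategies are stored as underscore-joined labels, they can be expanded into one indicator column per strategy when needed downstream. This is a sketch using `Series.str.get_dummies` on made-up values; whether the later analysis needs this expansion is an assumption.

```python
import pandas as pd

# Example labels in the same style as the Trophic_Strategy column.
strategies = pd.Series([
    "Parasite",
    "Photoautotroph_Mixotroph",
    "Parasite_Heterotroph",
    "Unknown",
])

# Split each label on '_' and build one 0/1 column per distinct strategy.
flags = strategies.str.get_dummies(sep="_")
print(flags)
```

A row such as `Photoautotroph_Mixotroph` then has a 1 in both the `Photoautotroph` and `Mixotroph` columns.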

In [23]:
dino_df['Phytoplankton_Broad_Group'] = 'Dinoflagellate'
dino_df['Trophic_Strategy'] = ''
dino_df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Trophic_Strategy
23984,58ff3b080834d0d5d0ad00c0b93f6cdb,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate,Dinoflagellate,
23990,f1ec76ed24d2f190db8ccfc3c8829853,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate,Dinoflagellate,
23995,587f06540735b3a03a706a18ac60ca8a,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate,Dinoflagellate,
23997,0021da5e24a76f1d7754222e977e2c4e,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,
24000,c104ba5a3c3ed119dbab8240b1ba1972,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate,Dinoflagellate,
...,...,...,...,...,...
39609,5440c5a1225f28b53adce26ad76ef79b,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate,Dinoflagellate,
39612,daf6f5d5d6492c130e0d457362e9273f,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate,Dinoflagellate,
39614,01da90a3b913ed1e9e85c6c87f88ccac,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,
39616,abf2e4102d9b84d8db80f0ca5de000f4,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,


In [24]:
print(len(dino_df))
print(len(np.unique(dino_df['Taxonomy'])))
print(np.unique(dino_df['Taxonomy']))

5659
176
['Eukaryota; Alveolata; Dinoflagellata'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophyceae_X; Dinophyceae_XX'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophyceae_X; Dinophyceae_XX; Abedinium; Abedinium_dasypus'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophyceae_X; Dinophyceae_XX; Cucumeridinium; Cucumeridinium_lira'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophyceae_X; Dinophyceae_XX; Dinophyceae_XXX; Dinophyceae_XXX_sp.'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophysiales'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophysiales; Amphisoleniaceae; Amphisolenia; Amphisolenia_bidentata'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophysiales; Dinophysiaceae'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophysiales; Oxyphysiaceae; Phalachroma'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Gonyaulacales'
 'Eukaryota; Alve

In [25]:
# Annotation from Yubin's chart

dino_df.loc[dino_df['Taxonomy'].str.contains('Syndiniales'),'Trophic_Strategy'] = 'Parasite'
dino_df.loc[dino_df['Taxonomy'].str.contains('Balechina'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Gyrodinium'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Gymnodinium'),'Trophic_Strategy'] = 'Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Lepidodinium'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Margalefidinium'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Chytriodinium'),'Trophic_Strategy'] = 'Parasite_Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Ceratoperidinium_falcatum'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Warnowiaceae'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Takayama'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Karlodinium'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Karenia'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Kareniaceae'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'

dino_df.loc[dino_df['Taxonomy'].str.contains('Blastodinium_contortum'),'Trophic_Strategy'] = 'Parasite_Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Blastodinium_crassum'),'Trophic_Strategy'] = 'Parasite_Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Blastodinium_mangini'),'Trophic_Strategy'] = 'Parasite_Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Blastodinium_spinulosum'),'Trophic_Strategy'] = 'Parasite_Photoautotroph'

# All 'Blastodinium' that are not the 4 species above are parasitic mixotrophs
dino_df.loc[(dino_df['Trophic_Strategy'].eq('')) & dino_df['Taxonomy'].str.contains('Blastodinium'),'Trophic_Strategy'] = 'Parasite_Mixotroph' 

dino_df.loc[dino_df['Taxonomy'].str.contains('Azadinium'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Heterocapsa'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Podolampadaceae'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Heterodinium'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Alexandrium'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Goniodoma'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Gonyaulax'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Tripos'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Prorocentrum'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Torodinium'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Biecheleria'),'Trophic_Strategy'] = 'Photoautotroph_Mixotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Phalachroma'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Ellobiopsis'),'Trophic_Strategy'] = 'Parasite_Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Kofoidinium'),'Trophic_Strategy'] = 'Heterotroph'

# Annotation from Gomez chart
dino_df.loc[dino_df['Taxonomy'].str.contains('Abedinium_dasypus'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Amphisolenia'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Lingulodinium'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Ceratocorys'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Pyrocystis'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Fragilidium'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Pyrophacus'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Paragymnodinium'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Protoperidinium'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Scrippsiella'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Thoracosphaera_heimii'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Pelagodinium'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Symbiodinium'),'Trophic_Strategy'] = 'Photoautotroph'

# There are other "Noctiluca" annotations that have different trophic strategies (e.g. Kofoidinium is heterotrophic)
dino_df.loc[dino_df['Taxonomy'].str.contains('Noctiluca_scintillans'),'Trophic_Strategy'] = 'Heterotroph'
dino_df.loc[dino_df['Taxonomy'].str.contains('Spatulodinium'),'Trophic_Strategy'] = 'Photoautotroph'
dino_df.loc[dino_df['Taxonomy'].eq('Eukaryota; Alveolata; Dinoflagellata; Noctilucophyceae; Noctilucales; Noctilucaceae; Noctiluca'),'Trophic_Strategy'] = 'Heterotroph'

print(len(np.unique(dino_df.loc[dino_df['Trophic_Strategy'].eq('')]['Taxonomy'])))
print(np.unique(dino_df.loc[dino_df['Trophic_Strategy'].eq('')]['Taxonomy']))

20
['Eukaryota; Alveolata; Dinoflagellata'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophyceae_X; Dinophyceae_XX'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophyceae_X; Dinophyceae_XX; Cucumeridinium; Cucumeridinium_lira'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophyceae_X; Dinophyceae_XX; Dinophyceae_XXX; Dinophyceae_XXX_sp.'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophysiales'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Dinophysiales; Dinophysiaceae'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Gonyaulacales'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Gymnodiniales'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Gymnodiniales; Gymnodiniaceae'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Gymnodiniales; Gymnodiniaceae; Polykrikos; Polykrikos_geminatus'
 'Eukaryota; Alveolata; Dinoflagellata; Dinophyceae; Peridiniales'
 'Eukaryota; Alveola

In [26]:
print(len(dino_df.loc[dino_df['Trophic_Strategy'].eq('')]['Taxonomy']))
print(dino_df.loc[dino_df['Trophic_Strategy'].eq('')]['Taxonomy'])

1607
23990    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
24000    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
24001    Eukaryota; Alveolata; Dinoflagellata; Dinophyc...
24027    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
24040    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
                               ...                        
39554    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
39556    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
39558                 Eukaryota; Alveolata; Dinoflagellata
39593    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
39609    Eukaryota; Alveolata; Dinoflagellata; Dinophyceae
Name: Taxonomy, Length: 1607, dtype: object


ASVs whose taxonomy is too broad to assign a trophic strategy are annotated as "Unknown". It is worth checking whether any of these ASVs have significant abundance.
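
A sketch of what such an abundance check could look like. The abundance table, its `Mean_Relative_Abundance` column, and the 0.01 cutoff are all hypothetical; none of them come from this notebook.

```python
import pandas as pd

# Toy annotation table (hypothetical ASV IDs).
dinos = pd.DataFrame({
    "ASV_ID": ["asv_a", "asv_b", "asv_c"],
    "Trophic_Strategy": ["Unknown", "Parasite", "Unknown"],
})

# Hypothetical per-ASV abundance summary; the real data layout may differ.
abundance = pd.DataFrame({
    "ASV_ID": ["asv_a", "asv_b", "asv_c"],
    "Mean_Relative_Abundance": [0.0001, 0.02, 0.015],
})

threshold = 0.01  # arbitrary illustrative cutoff

# Attach abundances, then keep 'Unknown' ASVs above the cutoff.
merged = dinos.merge(abundance, on="ASV_ID", how="left")
flagged = merged[
    merged["Trophic_Strategy"].eq("Unknown")
    & (merged["Mean_Relative_Abundance"] > threshold)
]
print(flagged)
```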

In [29]:
dino_df.loc[dino_df['Trophic_Strategy'].eq(''),'Trophic_Strategy'] = 'Unknown'

In [27]:
#dino_df.to_csv(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups_non_plastid_dinos_trophic_strategy.csv',index=False)

## Save master phytoplankton table

In [28]:
plas_df = pd.read_csv(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups_plastid_phytoplankton.csv')
plas_df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
0,2f52bfb075d038a1b72a4dd9d53a01e8,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
1,a90268a55b92cdf24a7be460a63b6f60,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
2,3a9e3efa6d0f0df69668b602c9bac618,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
3,c11382dbe93c5cf9e98818644c10014b,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
4,34178fed607a631f7a37c6b2b70ca571,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
...,...,...,...,...,...
3397,7e7f22c537fc8f61dbbdef992facaa11,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
3398,14dd57563a34ccd98d0bcdd5b7e21246,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
3399,4ca831b4a5bad16c7d25cf9f6dc27f44,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
3400,659f218bcb141dbcce6fb58c95039935,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte


In [29]:
print(np.unique(plas_df['Broad_Plankton_Group']))
print(np.unique(plas_df['Phytoplankton_Group']))

['Phytoplankton']
['Chlorophyte' 'Chrysophyte' 'Cryptophyte' 'Diatom' 'Dictyochophyte'
 'Pelagophyte' 'Prasinodermophyte' 'Prymnesiophyte'
 'Unknown Eukaryote Chloroplast']


In [30]:
pp_16S = pd.read_csv(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups_16S_phytoplankton.csv')
pp_16S

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
0,e9f376c867c2a1506d5b0996f9839a20,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
1,86cd70d2450e3f44f4a7c543f53008ee,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
2,d089e76cb77366eba904ed413876e849,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
3,27ca2b8f287c3a7e4d864cc870cf67b7,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
4,7a1011dfbc8b253cb0032a00e74454e9,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
...,...,...,...,...,...
420,e945de40b7009c66ef4a603497de8fd0,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
421,7914f2cb180e56767913c618e10584ca,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
422,4b5700f02013b72be1c6b38ee82bccb5,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria
423,cf6481db05a7943d19332115012f0cf8,d__Bacteria; p__Cyanobacteria; c__Cyanobacteri...,Phytoplankton,Cyanobacteria,Picocyanobacteria


In [36]:
print(np.unique(pp_16S['Broad_Plankton_Group']))
print(np.unique(pp_16S['Phytoplankton_Group']))

['Phytoplankton']
['Diazotroph' 'Picocyanobacteria' 'Unknown Cyanobacteria']


In [31]:
dino_df = pd.read_csv(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups_non_plastid_dinos_trophic_strategy.csv')
dino_df

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Trophic_Strategy
0,58ff3b080834d0d5d0ad00c0b93f6cdb,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate,Dinoflagellate,Parasite
1,f1ec76ed24d2f190db8ccfc3c8829853,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate,Dinoflagellate,Unknown
2,587f06540735b3a03a706a18ac60ca8a,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate,Dinoflagellate,Parasite
3,0021da5e24a76f1d7754222e977e2c4e,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
4,c104ba5a3c3ed119dbab8240b1ba1972,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate,Dinoflagellate,Unknown
...,...,...,...,...,...
5654,5440c5a1225f28b53adce26ad76ef79b,Eukaryota; Alveolata; Dinoflagellata; Dinophyceae,Dinoflagellate,Dinoflagellate,Unknown
5655,daf6f5d5d6492c130e0d457362e9273f,Eukaryota; Alveolata; Dinoflagellata; Syndinia...,Dinoflagellate,Dinoflagellate,Parasite
5656,01da90a3b913ed1e9e85c6c87f88ccac,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Heterotroph
5657,abf2e4102d9b84d8db80f0ca5de000f4,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Heterotroph


In [32]:
print(np.unique(dino_df['Broad_Plankton_Group']))
print(np.unique(dino_df['Trophic_Strategy']))

['Dinoflagellate']
['Heterotroph' 'Mixotroph' 'Parasite' 'Parasite_Heterotroph'
 'Parasite_Mixotroph' 'Parasite_Photoautotroph' 'Photoautotroph'
 'Photoautotroph_Mixotroph' 'Unknown']
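
`np.unique` lists the categories but not how many ASVs fall in each one. A possible alternative, sketched here on toy data rather than the real table, is `value_counts`, which also reports missing values when `dropna=False` is passed. The toy column below is an assumption for illustration.

```python
import pandas as pd

# Toy column standing in for dino_df['Trophic_Strategy'], including one missing value
toy = pd.DataFrame({'Trophic_Strategy': ['Parasite', 'Unknown', 'Parasite',
                                         'Heterotroph', None]})

# Count ASVs per category; dropna=False keeps missing values as their own row
counts = toy['Trophic_Strategy'].value_counts(dropna=False)
print(counts)
```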


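Subset the dinoflagellate table so that only ASVs whose trophic strategy includes "Photoautotroph" are retained (Photoautotroph, Photoautotroph_Mixotroph, and Parasite_Photoautotroph). ASVs labeled only Mixotroph or Parasite_Mixotroph are excluded.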

In [33]:
auto_dino = dino_df.loc[dino_df['Trophic_Strategy'].str.contains('Photoautotroph')]
auto_dino

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Trophic_Strategy
3,0021da5e24a76f1d7754222e977e2c4e,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
7,04bb4879ae86944b8984acaabc6967cb,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
8,973d656acbfcaa9a45325de371c89a5d,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
16,15b4e28e2e78e651ff42b1c6c4e7fb5c,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
22,1af82836b76bafcb5c93ec82c7ef68cb,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
...,...,...,...,...,...
5514,a5df7d2966b0227ef63d7305b268b702,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
5558,05a96f7af2184d3e2c1639514aaaa3f5,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
5598,aae3c59707710a2f53d26b6a727950e7,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
5608,455f5695a94dbd7d00403fa5e99d5dee,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Photoautotroph_Mixotroph
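
To make the effect of the substring filter explicit, the sketch below applies the same `str.contains('Photoautotroph')` test to the nine labels printed earlier. The `regex=False` and `na=False` arguments are additions not used in the cell above: they treat the pattern as a literal string and count missing values as non-matches.

```python
import pandas as pd

# The nine Trophic_Strategy labels printed by np.unique above
labels = pd.Series(['Heterotroph', 'Mixotroph', 'Parasite', 'Parasite_Heterotroph',
                    'Parasite_Mixotroph', 'Parasite_Photoautotroph', 'Photoautotroph',
                    'Photoautotroph_Mixotroph', 'Unknown'])

# Literal substring match; missing values would be treated as non-matches
mask = labels.str.contains('Photoautotroph', regex=False, na=False)

kept = labels[mask].tolist()
dropped = labels[~mask].tolist()
print('kept:', kept)
print('dropped:', dropped)
```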


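Reformat the dinoflagellate table to match the others: drop the Trophic_Strategy column and add a Phytoplankton_Group column set to "Dinoflagellate".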

In [34]:
auto_dino = auto_dino.drop('Trophic_Strategy', axis=1)
auto_dino['Phytoplankton_Group'] = 'Dinoflagellate'
auto_dino

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
3,0021da5e24a76f1d7754222e977e2c4e,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
7,04bb4879ae86944b8984acaabc6967cb,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
8,973d656acbfcaa9a45325de371c89a5d,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
16,15b4e28e2e78e651ff42b1c6c4e7fb5c,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
22,1af82836b76bafcb5c93ec82c7ef68cb,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
...,...,...,...,...,...
5514,a5df7d2966b0227ef63d7305b268b702,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
5558,05a96f7af2184d3e2c1639514aaaa3f5,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
5598,aae3c59707710a2f53d26b6a727950e7,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
5608,455f5695a94dbd7d00403fa5e99d5dee,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate


Combine the three phytoplankton tables. Each table keeps its original row index, so index values repeat in the combined table.

In [35]:
all_pp = pd.concat([plas_df,pp_16S,auto_dino])
all_pp

Unnamed: 0,ASV_ID,Taxonomy,Broad_Plankton_Group,Phytoplankton_Broad_Group,Phytoplankton_Group
0,2f52bfb075d038a1b72a4dd9d53a01e8,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
1,a90268a55b92cdf24a7be460a63b6f60,Eukaryota:plas; Hacrobia:plas; Haptophyta:plas...,Phytoplankton,Plastid,Prymnesiophyte
2,3a9e3efa6d0f0df69668b602c9bac618,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
3,c11382dbe93c5cf9e98818644c10014b,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
4,34178fed607a631f7a37c6b2b70ca571,Eukaryota:plas; Stramenopiles:plas; Ochrophyta...,Phytoplankton,Plastid,Dictyochophyte
...,...,...,...,...,...
5514,a5df7d2966b0227ef63d7305b268b702,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
5558,05a96f7af2184d3e2c1639514aaaa3f5,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
5598,aae3c59707710a2f53d26b6a727950e7,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
5608,455f5695a94dbd7d00403fa5e99d5dee,Eukaryota; Alveolata; Dinoflagellata; Dinophyc...,Dinoflagellate,Dinoflagellate,Dinoflagellate
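
Before saving, it may be worth confirming that the three tables share the same columns and that no ASV_ID appears more than once. The sketch below is one way to do that on toy data. The helper name `combine_tables` is hypothetical, and `ignore_index=True` is a variation on the cell above that renumbers rows from 0 instead of keeping each table's original index.

```python
import pandas as pd

def combine_tables(tables):
    """Concatenate tables with identical columns and list any duplicated ASV_IDs."""
    expected = list(tables[0].columns)
    for t in tables[1:]:
        if list(t.columns) != expected:
            raise ValueError(f'column mismatch: {list(t.columns)} != {expected}')
    # ignore_index=True gives the combined table a fresh 0..n-1 index
    combined = pd.concat(tables, ignore_index=True)
    is_dup = combined['ASV_ID'].duplicated(keep=False)
    dup_ids = combined.loc[is_dup, 'ASV_ID'].unique().tolist()
    return combined, dup_ids

# Toy stand-ins for plas_df, pp_16S, and auto_dino
t1 = pd.DataFrame({'ASV_ID': ['a', 'b'], 'Phytoplankton_Group': ['Diatom', 'Diatom']})
t2 = pd.DataFrame({'ASV_ID': ['c'], 'Phytoplankton_Group': ['Picocyanobacteria']})
t3 = pd.DataFrame({'ASV_ID': ['d', 'b'],
                   'Phytoplankton_Group': ['Dinoflagellate', 'Dinoflagellate']},
                  index=[3, 7])

combined, dup_ids = combine_tables([t1, t2, t3])
print(combined)
print('duplicated ASV_IDs:', dup_ids)
```

Because the final CSV is written with `index=False`, the repeated index values do not reach the saved file either way; the duplicate ASV_ID check is the part that affects downstream joins.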


In [36]:
#all_pp.to_csv(highcov_dir + '1.20230126_G4_ASV_ID_Taxonomy_Broad_Plankton_Groups_all_phytoplankton.csv',index=False)