The full functionality of `seaflowpy` is available through its submodules, e.g. `seaflowpy.db` and `seaflowpy.evt`. For convenience, however, a few of the most commonly used functions and classes are also exposed at the package level.

* **seaflowpy.EVT**  
Class for EVT particle data

* **seaflowpy.find_evt_files**  
Function to recursively find EVT/OPP file paths within a directory

* **seaflowpy.concat_evts**  
Function to create a single EVT pandas DataFrame from a list of EVT objects
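`concat_evts` is described only as building one DataFrame from a list of EVT objects. As a rough mental model (not the library's implementation), the sketch below stacks each object's `evt` DataFrame with `pandas.concat`. The `SimpleNamespace` stand-ins, the helper name `concat_particle_frames`, and the added `file_id` column are assumptions for illustration; check the real function for how it labels rows.

```python
from types import SimpleNamespace

import pandas as pd


def concat_particle_frames(evt_objects, add_file_id=True):
    """Hypothetical helper: stack the .evt DataFrames of EVT-like objects.

    Assumes each object has an .evt DataFrame and a .file_id string,
    like the seaflowpy.EVT objects shown later in this walkthrough.
    """
    frames = []
    for obj in evt_objects:
        df = obj.evt.copy()
        if add_file_id:
            # Record which file each particle came from (an assumption,
            # not necessarily what seaflowpy.concat_evts does)
            df["file_id"] = obj.file_id
        frames.append(df)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Two stand-in objects in place of real seaflowpy.EVT instances
a = SimpleNamespace(file_id="2014_185/a", evt=pd.DataFrame({"fsc_small": [1.0, 2.0]}))
b = SimpleNamespace(file_id="2014_185/b", evt=pd.DataFrame({"fsc_small": [3.0]}))
combined = concat_particle_frames([a, b])
print(combined)
```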

The code below uses the test dataset in this repository at `./tests/testcruise/`. The files in this directory hold raw EVT data, but in this example workflow we'll treat them as though they contain filtered OPP data. For example, we may have already filtered EVT to OPP data on the command line with `filterevt`.

A note on terminology: in this package the term EVT can have two subtly different meanings.

* Any binary file or Python data structure which holds SeaFlow particle data, regardless of whether raw or filtered.
* In the context of a filtering workflow, EVT refers to the raw, unfocused/unfiltered particle data, as distinct from OPP data, which refers to the filtered/focused particles.

When filtering, we talk about converting EVT data to OPP data. We may read a raw EVT file into Python as a `seaflowpy.EVT` object, which stores the raw particle data as a pandas DataFrame in its `EVT.evt` attribute. We then filter the raw particle data with `seaflowpy.EVT.filter`, and the filtered particle data becomes accessible as a pandas DataFrame in the `EVT.opp` attribute. This is essentially what `filterevt` does.

When we read filtered OPP files from disk, however, `seaflowpy.EVT` treats them exactly as it would raw EVT files: the particle data is stored in the new `EVT` object as a pandas DataFrame in the `EVT.evt` attribute (even though we know this is OPP data).
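To make the two meanings concrete, here is a toy stand-in (not the real `seaflowpy.EVT` class) that mirrors the attribute layout described above: whatever is read from disk lands in `evt`, and only an explicit filter step populates `opp`. The class name `ToyParticleFile` and the caller-supplied `keep` predicate are hypothetical; the real `filter` method applies its own focusing criteria.

```python
import pandas as pd


class ToyParticleFile:
    """Hypothetical stand-in illustrating the evt/opp attribute split."""

    def __init__(self, particles):
        # Whatever was read from disk, raw EVT or already-filtered OPP,
        # is stored in .evt
        self.evt = particles
        # Nothing has been filtered yet, so .opp starts as an empty frame
        self.opp = particles.iloc[0:0]

    def filter(self, keep):
        # keep: a function mapping the .evt DataFrame to a boolean mask.
        # It is a placeholder for the real focusing criteria.
        self.opp = self.evt[keep(self.evt)].reset_index(drop=True)
        return self.opp


raw = pd.DataFrame({"fsc_small": [1.0, 5.0, 9.0], "pe": [0.5, 2.0, 3.0]})
toy = ToyParticleFile(raw)
print(len(toy.evt), len(toy.opp))  # 3 0
toy.filter(lambda df: df["fsc_small"] > 2)
print(len(toy.evt), len(toy.opp))  # 3 2
```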

```python
import seaflowpy as sfp
```

```python
opp_files = sfp.find_evt_files("./tests/testcruise_opp")
opp_files
```

```
['./tests/testcruise_opp/2014_185/2014-07-04T00-00-02+00-00.opp.gz',
 './tests/testcruise_opp/2014_185/2014-07-04T00-03-02+00-00.opp.gz']
```
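`find_evt_files` is described as a recursive search of a directory. A minimal standard-library sketch of that idea with `os.walk` is below. The helper name `find_particle_files` and the accepted suffixes (`.evt`, `.opp`, optionally gzipped) are assumptions based on the paths shown here, not the library's actual matching rules.

```python
import os
import tempfile

# Assumed suffixes; seaflowpy's real matching rules may differ
SUFFIXES = (".evt", ".evt.gz", ".opp", ".opp.gz")


def find_particle_files(root):
    """Hypothetical helper: sorted paths under root whose names end in SUFFIXES."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SUFFIXES):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Build a tiny throwaway directory tree to search
with tempfile.TemporaryDirectory() as root:
    day = os.path.join(root, "2014_185")
    os.makedirs(day)
    for name in ["2014-07-04T00-03-02+00-00.opp.gz",
                 "2014-07-04T00-00-02+00-00.opp.gz",
                 "notes.txt"]:
        open(os.path.join(day, name), "w").close()
    files = [os.path.relpath(p, root) for p in find_particle_files(root)]

print(files)
```

Sorting the result keeps files in timestamp order, since the file names begin with an ISO-style timestamp.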

Let's read the OPP files into memory as `EVT` objects. In many cases we don't need all ten channels (columns) of particle data, so here we'll select only three of them. This can significantly speed up data import when transforming (exponentiating log-scaled data) and lowers the memory footprint.

```python
# The possible column names to choose from
sfp.EVT.all_columns
```

```
['time',
 'pulse_width',
 'D1',
 'D2',
 'fsc_small',
 'fsc_perp',
 'fsc_big',
 'pe',
 'chl_small',
 'chl_big']
```

```python
# Read each file, keeping only three channels and exponentiating log-scaled values
opps = []
for f in opp_files:
    opps.append(sfp.EVT(f, transform=True, columns=["fsc_small", "chl_small", "pe"]))
```
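The `transform=True` option converts log-scaled values to linear values. The sketch below assumes the 16-bit log-to-linear mapping `10 ** ((x / 2**16) * 3.5)`, which is our understanding of SeaFlow's convention rather than something stated in this document; verify it against the `seaflowpy` source before relying on it. It also shows why selecting columns first helps: only the chosen columns are exponentiated. The helper name `log_to_linear` and the sample values are hypothetical.

```python
import pandas as pd


def log_to_linear(values):
    # Assumed mapping from 16-bit log-scaled values to linear values
    return 10 ** ((values / 2**16) * 3.5)


raw = pd.DataFrame({
    "fsc_small": [0, 32768, 65536],
    "chl_small": [0, 0, 65536],
    "pe": [0, 65536, 0],
    "D1": [100, 200, 300],  # a column we do not need
})

wanted = ["fsc_small", "chl_small", "pe"]
# Select first, then transform: unused columns are never exponentiated
linear = log_to_linear(raw[wanted].astype(float))
print(linear)
```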

Now `opps` holds `EVT` objects. Printing one of them gives a quick summary of its contents.

```python
print(opps[0])
```

```
{
  "path": "./tests/testcruise_opp/2014_185/2014-07-04T00-00-02+00-00.opp.gz",
  "file_id": "2014_185/2014-07-04T00-00-02+00-00",
  "evt_count": 345,
  "opp_count": 0,
  "notch1": null,
  "notch2": null,
  "offset": null,
  "origin": null,
  "width": null,
  "columns": [
    "fsc_small",
    "pe",
    "chl_small"
  ]
}
```
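The summary above is printed as JSON, so its fields can be recovered with the standard `json` module. The snippet parses a copy of the printed text; note that `opp_count` is 0 because these objects were never filtered, and all particles sit in `evt`. Whether `str()` of an `EVT` object always returns exactly this JSON is an assumption.

```python
import json

# Copy of the summary printed above
summary_text = """{
  "path": "./tests/testcruise_opp/2014_185/2014-07-04T00-00-02+00-00.opp.gz",
  "file_id": "2014_185/2014-07-04T00-00-02+00-00",
  "evt_count": 345,
  "opp_count": 0,
  "notch1": null,
  "notch2": null,
  "offset": null,
  "origin": null,
  "width": null,
  "columns": ["fsc_small", "pe", "chl_small"]
}"""

summary = json.loads(summary_text)
# JSON null becomes None; counts become ints
print(summary["evt_count"], summary["opp_count"], summary["notch1"])  # 345 0 None
# With a real object, json.loads(str(opps[0])) may work (assumption)
```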


The underlying particle data can be accessed as a pandas DataFrame in the `evt` attribute.

```python
opps[0].evt.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 345 entries, 0 to 344
Data columns (total 3 columns):
fsc_small    345 non-null float64
pe           345 non-null float64
chl_small    345 non-null float64
dtypes: float64(3)
memory usage: 8.2 KB
```
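The reported memory usage matches simple arithmetic: 345 rows × 3 float64 columns × 8 bytes = 8,280 bytes of column data, plus a small amount for the index. The sketch below checks this with an all-zero DataFrame of the same shape and compares it with all ten columns, assuming every column would also be stored as float64 (the real dtypes of `time` and `pulse_width` may differ). The helper name `column_bytes` is hypothetical.

```python
import numpy as np
import pandas as pd

n_particles = 345


def column_bytes(n_rows, n_cols):
    # Bytes used by float64 column data only (index excluded)
    df = pd.DataFrame(np.zeros((n_rows, n_cols)))
    return int(df.memory_usage(index=False).sum())


three = column_bytes(n_particles, 3)
ten = column_bytes(n_particles, 10)
print(three, ten)  # 8280 27600
```

Under that assumption, loading three columns instead of ten cuts the column data by 70%.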


```python
opps[0].evt.head()
```

|   | fsc_small | pe | chl_small |
|---|---|---|---|
| 0 | 5.910258 | 1.207901 | 7.622597 |
| 1 | 2.859557 | 1.269581 | 2.958641 |
| 2 | 1.569201 | 1.399968 | 3.347019 |
| 3 | 2.330406 | 1.275528 | 5.276099 |
| 4 | 2.386083 | 1.172052 | 7.77406 |


Let's assume this dataset has already been analyzed and that population classifications exist in a directory called `./tests/testcruise_vct`. We can add these per-particle classifications to our `EVT` objects.

```python
vct_files = sfp.find_vct_files("./tests/testcruise_vct")
vct_files
```

```
['./tests/testcruise_vct/2014_185/2014-07-04T00-00-02+00-00.vct.gz',
 './tests/testcruise_vct/2014_185/2014-07-04T00-03-02+00-00.vct']
```
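The OPP and VCT files are related by a shared file ID (the `file_id` field in the summary printed earlier, e.g. `2014_185/2014-07-04T00-00-02+00-00`), and the VCT list mixes a gzipped and an uncompressed file. As an illustration only, the hypothetical helper `file_id_from_path` below derives such an ID from a path by keeping the day directory and stripping assumed suffixes, then pairs files by ID. How `seaflowpy` matches files internally is not shown in this document.

```python
import os

# Assumed file suffixes for particle and classification files
KNOWN_SUFFIXES = (".opp.gz", ".evt.gz", ".vct.gz", ".opp", ".evt", ".vct")


def file_id_from_path(path):
    """Hypothetical helper: '<day dir>/<timestamp>' with a known suffix removed."""
    day_dir = os.path.basename(os.path.dirname(path))
    name = os.path.basename(path)
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return day_dir + "/" + name


example_opp_paths = [
    "./tests/testcruise_opp/2014_185/2014-07-04T00-00-02+00-00.opp.gz",
    "./tests/testcruise_opp/2014_185/2014-07-04T00-03-02+00-00.opp.gz",
]
example_vct_paths = [
    "./tests/testcruise_vct/2014_185/2014-07-04T00-00-02+00-00.vct.gz",
    "./tests/testcruise_vct/2014_185/2014-07-04T00-03-02+00-00.vct",
]

# Map file ID -> VCT path, then look up each OPP file's partner (None if missing)
vct_by_id = {file_id_from_path(p): p for p in example_vct_paths}
pairs = [(p, vct_by_id.get(file_id_from_path(p))) for p in example_opp_paths]
for opp_path, vct_path in pairs:
    print(file_id_from_path(opp_path), "->", vct_path)
```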

```python
vcts = []
for f in vct_files:
    vcts.append(sfp.VCT(f))

# Add a "pop" column to each EVT object's evt DataFrame (opps holds EVT objects)
sfp.combine_evts_vcts(opps, vcts)
```
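Conceptually, attaching classifications adds one population label per particle as a new `pop` column. A minimal pandas sketch of that step is below, assuming the labels are stored in the same row order as the particles. The helper name `add_population_labels` and its length check are hypothetical, not the actual code of `combine_evts_vcts`.

```python
import pandas as pd


def add_population_labels(particles, labels):
    """Hypothetical helper: return a copy of particles with a 'pop' column.

    Assumes labels[i] classifies row i of particles, so lengths must match.
    """
    if len(labels) != len(particles):
        raise ValueError(
            f"{len(labels)} labels for {len(particles)} particles; files may be mismatched"
        )
    out = particles.copy()
    # Assign a plain list so values are placed by position, not aligned on an index
    out["pop"] = list(labels)
    return out


particles = pd.DataFrame({"fsc_small": [5.9, 2.9, 60.0], "pe": [1.2, 1.3, 690.0]})
labeled = add_population_labels(particles, ["prochloro", "prochloro", "beads"])
print(labeled)
```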

```python
# A new "pop" column has been added
opps[0].evt.head()
```

|   | fsc_small | pe | chl_small | pop |
|---|---|---|---|---|
| 0 | 5.910258 | 1.207901 | 7.622597 | prochloro |
| 1 | 2.859557 | 1.269581 | 2.958641 | prochloro |
| 2 | 1.569201 | 1.399968 | 3.347019 | prochloro |
| 3 | 2.330406 | 1.275528 | 5.276099 | prochloro |
| 4 | 2.386083 | 1.172052 | 7.77406 | prochloro |


```python
# Get per-population particle statistics
opps[0].evt.groupby(by=["pop"]).describe()
```

| pop |   | chl_small | fsc_small | pe |
|---|---|---|---|---|
| beads | count | 36.0 | 36.0 | 36.0 |
| beads | mean | 61.036754 | 64.869435 | 690.968765 |
| beads | std | 20.508398 | 15.970808 | 186.070659 |
| beads | min | 25.312976 | 28.523328 | 340.508047 |
| beads | 25% | 49.357957 | 59.772211 | 645.037259 |
| beads | 50% | 60.999744 | 62.600107 | 691.476145 |
| beads | 75% | 67.779372 | 66.897961 | 738.465171 |
| beads | max | 135.056675 | 127.487836 | 1269.157805 |
| picoeuks | count | 12.0 | 12.0 | 12.0 |
| picoeuks | mean | 323.516201 | 203.238659 | 1.780989 |
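`describe()` reports every statistic for every column, which gets long as populations and channels grow. When only a few numbers are needed, `groupby(...).agg(...)` with named aggregations (pandas 0.25 or later) gives a compact table with one row per population. The particle values below are made up for illustration.

```python
import pandas as pd

# Made-up particles with population labels, shaped like opps[0].evt above
df = pd.DataFrame({
    "pop": ["beads", "beads", "picoeuks", "prochloro", "prochloro", "prochloro"],
    "fsc_small": [60.0, 64.0, 200.0, 2.0, 3.0, 4.0],
    "chl_small": [58.0, 62.0, 320.0, 3.0, 5.0, 7.0],
})

# One row per population: particle count plus mean of two channels
summary = df.groupby("pop").agg(
    n=("fsc_small", "count"),
    fsc_small_mean=("fsc_small", "mean"),
    chl_small_mean=("chl_small", "mean"),
)
print(summary)
```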
