# Bloom Filtering

This notebook describes how to run the bloom filter. We start by importing the necessary packages, including two custom functions from `bloom`, `remove_seqs` and `trim_seqs`, which perform the actual filtering.

In [1]:
import os
import skbio
from biom import Table, load_table
from biom.util import biom_open
from bloom import remove_seqs, trim_seqs

Next we define the input and output file paths. All of the raw data can be found in `data_dir`, and the filtered results will be saved to `results_dir`. The `seqs_file` points to the bloom sequences that we wish to remove, and the `biom_file` points to the biom table built by deblur.

In [2]:
data_dir = '../data/'
results_dir = '../results'
seqs_file = '30_seqs.fna'
biom_file = 'ercolini.feces.clean.withtax.biom'
filtered_file = 'filtered.fna'  # note: holds a BIOM (HDF5) table despite the .fna extension

seqs_file = os.path.join(data_dir, seqs_file)
biom_file = os.path.join(data_dir, biom_file)
filtered_file = os.path.join(results_dir, filtered_file)

Next we decompress the gzipped FASTA file to extract the sequences.

In [3]:
!gunzip ../data/30_seqs.fna.gz
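
By default `gunzip` deletes the `.gz` file after decompressing it, so re-running the cell above fails once the archive is gone. As a minimal sketch of an alternative, the standard-library `gzip` module can decompress the file while leaving the archive in place. The helper name `gunzip_keep` is hypothetical and is not part of `bloom`.

```python
import gzip
import shutil


def gunzip_keep(src, dst):
    """Decompress the gzip file `src` into `dst`, leaving `src` in place."""
    with gzip.open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
        # Stream the decompressed bytes so large files are not loaded at once.
        shutil.copyfileobj(f_in, f_out)
    return dst


# Hypothetical usage with the paths from this notebook:
# gunzip_keep('../data/30_seqs.fna.gz', '../data/30_seqs.fna')
```

Because the archive is kept, this version can be run repeatedly without error.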

Now we will read in the sequences with scikit-bio, along with the deblurred biom table.
Remember that the feature ids in the deblurred table are the actual 16S V4 sequences,
so we will filter out rows of this table according to the sequences found in the
decompressed `30_seqs.fna` file.

In [4]:
seqs = skbio.read(seqs_file, format='fasta')
table = load_table(biom_file)

When removing sequences, we need to make sure that the bloom sequences and the sequences in the table are of the same length so that they can be compared.
So we'll trim the bloom sequences to the length of the shortest sequence found in the biom table.

In [5]:
length = min(map(len, table.ids(axis='observation')))
seqs = trim_seqs(seqs, length=length)
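
To illustrate the trimming step, here is a minimal sketch that uses plain strings in place of scikit-bio sequences. It is not the implementation of `trim_seqs`. The helper `trim_to_length` and the toy sequences are hypothetical, and the sketch assumes that trimming keeps the leading characters of each sequence.

```python
def trim_to_length(seqs, length):
    """Yield each sequence truncated to its first `length` characters."""
    for seq in seqs:
        yield seq[:length]


# Toy stand-ins for the table's feature ids and the bloom sequences.
table_ids = ['ACGTACGT', 'ACGTAC', 'TTGACCAA']
bloom_seqs = ['ACGTACGTAA', 'GGGTTTAAACCC']

# Same rule as in the cell above: use the shortest feature id as the length.
length = min(map(len, table_ids))
trimmed = list(trim_to_length(bloom_seqs, length))

print(length)   # 6
print(trimmed)  # ['ACGTAC', 'GGGTTT']
```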

Now we remove the bloom sequences from the table and save the filtered table in HDF5 format.

In [6]:
outtable = remove_seqs(table, seqs)
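
The idea behind the removal step can be sketched with a plain dictionary standing in for the biom table, mapping each feature sequence to its per-sample counts. This is an illustration under stated assumptions, not the code of `remove_seqs`: the helper `remove_matching` is hypothetical, and it assumes that a feature is dropped when its first `length` characters match a trimmed bloom sequence.

```python
def remove_matching(table, bloom_seqs, length):
    """Return a copy of `table` without features matching a bloom sequence.

    Features and bloom sequences are compared on their first `length`
    characters (an assumption made for this sketch).
    """
    blocked = {seq[:length] for seq in bloom_seqs}
    return {seq: counts for seq, counts in table.items()
            if seq[:length] not in blocked}


# Toy table: feature sequence -> counts in three samples.
table = {
    'ACGTACGT': [10, 0, 3],
    'ACGTAC': [1, 2, 0],
    'TTGACCAA': [0, 5, 7],
}
bloom_seqs = ['ACGTACGTAA']

length = min(map(len, table))
filtered = remove_matching(table, bloom_seqs, length)
removed = sorted(set(table) - set(filtered))

print(filtered)  # {'TTGACCAA': [0, 5, 7]}
print(removed)   # ['ACGTAC', 'ACGTACGT']
```

Comparing the ids before and after filtering, as in `removed`, is a quick way to check which features were dropped.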

In [7]:
with biom_open(filtered_file, 'w') as f:
    outtable.to_hdf5(f, "filterbiomseqs")

This can also be run with the command `filter_seqs_from_biom.py`, which is available after installation. Overall, bloom filtering takes only a few steps: trim the bloom sequences, remove them from the table, and save the result.