# Filter By Groups Demo

This example demonstrates how to filter structures that contain specified groups (residues). Groups are specified by their one-, two-, or three-letter codes, e.g. "F", "MG", "ATP".

For the full list, please refer to the [PDB Chemical Component Dictionary](https://www.wwpdb.org/data/ccd).

## Imports

In [1]:
from pyspark import SparkConf, SparkContext
from mmtfPyspark.io import MmtfReader
from mmtfPyspark.filters import containsGroup

## Configure Spark

In [2]:
conf = SparkConf().setMaster("local[*]") \
                  .setAppName("FilterByGroupsDemo")
sc = SparkContext(conf=conf)

## Read in MMTF Files

In [7]:
path = "full path to your MMTF file directory"

pdb = MmtfReader.readSequenceFile(path, sc)

## Filter by groups and count

In [8]:
count = pdb.filter(containsGroup("ATP","MG"))\
         .count()

print(f"Number of structures with ATP + MG : {count}")

Number of structures with ATP + MG : 595
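
To make the filter semantics concrete, here is a minimal Spark-free sketch in plain Python. It assumes that the filter keeps a structure only when every listed group is present, as the "ATP + MG" label above suggests. The function `contains_all_groups` and the toy data are hypothetical and only illustrate the idea; they are not the mmtfPyspark implementation.

```python
from typing import Iterable


def contains_all_groups(structure_groups: Iterable[str], *required: str) -> bool:
    """Return True if every required group code occurs in the structure."""
    present = set(structure_groups)
    return all(code in present for code in required)


# Hypothetical toy data: structure ID -> group codes found in that structure.
toy_structures = {
    "toy1": ["ALA", "GLY", "ATP", "MG"],  # has both ATP and MG
    "toy2": ["ALA", "ATP"],               # ATP only
    "toy3": ["MG", "HOH"],                # MG only
}

# Keep only the structures that contain both ATP and MG.
matches = [
    sid for sid, groups in toy_structures.items()
    if contains_all_groups(groups, "ATP", "MG")
]
print(f"Structures with ATP + MG: {matches}")
```

Under this assumption, a structure that contains only one of the two groups is dropped, so adding more codes to the filter can only shrink the result.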


## Terminate Spark

In [9]:
sc.stop()