# Wild Type Query Demo

This demo shows how to read a Hadoop sequence file and filter the PDB structures by wild type.

![RCSB PDB](https://cdn.rcsb.org/rcsb-pdb/v2/common/images/Logo_wwpdb.png)

## Imports

In [1]:
from pyspark import SparkConf, SparkContext
from mmtfPyspark.io import MmtfReader
from mmtfPyspark.webFilters import WildTypeQuery

## Configure Spark

In [2]:
conf = SparkConf().setMaster("local[*]") \
                  .setAppName("wildTypeQuery")
sc = SparkContext(conf = conf)

## Read in Hadoop Sequence Files and filter by WildType

In [4]:
path = "../../resources/mmtf_reduced_sample/"

pdb = MmtfReader.read_sequence_file(path, sc) \
                .filter(WildTypeQuery(includeExpressionTags=True,
                                      percentSequenceCoverage=WildTypeQuery.SEQUENCE_COVERAGE_95))
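The `.filter(...)` call above passes an object rather than a lambda, so that object has to be callable on one `(pdbId, structure)` element and return a truth value. The sketch below illustrates that pattern in plain Python, without Spark. `IdPrefixFilter` and the toy records are hypothetical stand-ins invented for illustration; they are not part of mmtfPyspark and do not reproduce what `WildTypeQuery` checks.

```python
# Hypothetical stand-in for a filter class; NOT part of mmtfPyspark.
class IdPrefixFilter:
    """Keeps (pdb_id, structure) pairs whose ID starts with a given prefix."""

    def __init__(self, prefix):
        # Configuration is captured once, when the filter object is built.
        self.prefix = prefix

    def __call__(self, pair):
        # Called once per element; returns True to keep the element.
        pdb_id, _structure = pair
        return pdb_id.startswith(self.prefix)


# Toy records standing in for (pdbId, structure) tuples.
records = [("1GBS", None), ("2ABC", None), ("1GAX", None)]

# Python's built-in filter() plays the role of RDD.filter() here.
kept = list(filter(IdPrefixFilter("1G"), records))
print(kept)
```

Keeping the configuration in the constructor and the per-element test in `__call__` is what lets a configured filter object be handed to `filter` in one expression.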

## Count results and show top 5 structures

In [5]:
count = pdb.count()

print(f"Number of structures after filtering : {count}")

pdb.top(5)

Number of structures after filtering : 1441


[('1GBS', <mmtfPyspark.io.mmtfStructure.mmtfStructure at 0x7f222c72aa20>),
 ('1GAX', <mmtfPyspark.io.mmtfStructure.mmtfStructure at 0x7f2227ddd748>),
 ('1GAR', <mmtfPyspark.io.mmtfStructure.mmtfStructure at 0x7f2227ce5ac8>),
 ('1GAL', <mmtfPyspark.io.mmtfStructure.mmtfStructure at 0x7f2227cf5ac8>),
 ('1GAJ', <mmtfPyspark.io.mmtfStructure.mmtfStructure at 0x7f2227d01be0>)]
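Note that `top(5)` returns the five largest elements in descending order. For `(pdbId, structure)` tuples with unique IDs, this amounts to a reverse alphabetical sort on the PDB ID, not a ranking by any structural property. A minimal plain-Python sketch of the same ordering follows, using `heapq.nlargest` as a stand-in for the Spark call and placeholder strings in place of structure objects; the ID `1FXZ` is added only to show an element being dropped.

```python
import heapq

# Toy (pdbId, placeholder) pairs; the placeholders stand in for structure objects.
pairs = [
    ("1GAJ", "a"),
    ("1GBS", "b"),
    ("1GAL", "c"),
    ("1GAX", "d"),
    ("1GAR", "e"),
    ("1FXZ", "f"),  # illustrative extra entry that sorts lowest
]

# Tuples compare element by element, so the ID decides the order here.
top5 = heapq.nlargest(5, pairs)
ids = [pdb_id for pdb_id, _ in top5]
print(ids)
```

To preview structures without imposing an order, `pdb.take(5)` is an alternative to `pdb.top(5)`.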

## Terminate Spark

In [6]:
sc.stop()