# Simple Query Demo

![pdbj](https://pdbj.org/content/default.svg)

This demo runs a SQL keyword search against the PDBj Mine 2 relational database (RDB) and uses the returned pdbids to filter MMTF structures.

[PDBj Mine Search Website](https://pdbj.org/mine)

## Imports

In [1]:
from pyspark import SparkConf, SparkContext
from mmtfPyspark.webFilters import PdbjMine
from mmtfPyspark.datasets import PdbjMineService
from mmtfPyspark.io import mmtfReader

## Configure Spark Context

In [2]:
conf = SparkConf().setMaster("local[*]") \
                  .setAppName("SimpleQuerySearch")

sc = SparkContext(conf=conf)

## Read in MMTF files from local directory

In [3]:
path = "../../resources/mmtf_full_sample/"

pdb = mmtfReader.read_sequence_file(path, sc)

## Apply a SQL search on PDBj using a filter

A very simple query: it retrieves the pdbids of all entries modified on or after 2016-06-28 with a resolution better than 1.5 Å.

In [4]:
sql = "select pdbid from brief_summary where modification_date >= '2016-06-28' and resolution < 1.5"

search = PdbjMine(sql)
count = pdb.filter(search).keys().count()
print(f"Number of entries using sql to filter: {count}")

Number of entries using sql to filter: 5
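
To see what the `where` clause selects without contacting PDBj, the same query text can be run against a small in-memory SQLite table. This is only a sketch: the table below is a hypothetical stand-in for `brief_summary`, the rows and their IDs are invented for illustration, and dates are stored as ISO-8601 text (which compares correctly as strings) rather than as a database date type.

```python
import sqlite3

# Hypothetical stand-in for PDBj's brief_summary table (assumption: only
# the three columns used by the query are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table brief_summary (pdbid text, modification_date text, resolution real)"
)

# Invented rows, not real PDB entries.
rows = [
    ("aaaa", "2017-01-15", 1.20),  # recent and high resolution: selected
    ("bbbb", "2015-03-02", 1.10),  # modified too early: rejected
    ("cccc", "2018-09-30", 2.40),  # resolution too coarse: rejected
    ("dddd", "2016-06-28", 1.49),  # boundary date is included by >=
]
conn.executemany("insert into brief_summary values (?, ?, ?)", rows)

# Same query text as in the cell above.
sql = "select pdbid from brief_summary where modification_date >= '2016-06-28' and resolution < 1.5"

# The ordering clause is added only to make the printed result deterministic.
selected = [row[0] for row in conn.execute(sql + " order by pdbid")]
print(selected)
conn.close()
```

Note that "better than 1.5 Å" translates to `resolution < 1.5`, because a smaller resolution value means finer detail.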


## Apply a SQL search on PDBj and get a dataset

In [5]:
dataset = PdbjMineService.getDataset(sql)
dataset.show(5)
search = PdbjMine(dataset = dataset)
count = pdb.filter(search).keys().count()
print(f"Number of entries using dataset to filter: {count}")


+-----+
|pdbid|
+-----+
| 2yke|
| 2yny|
| 2ynz|
| 2z12|
| 2z18|
+-----+
only showing top 5 rows

Number of entries using dataset to filter: 5
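
Conceptually, filtering by pdbid amounts to keeping the key-value pairs whose key is in the set of IDs returned by the query. The plain-Python sketch below illustrates that idea only; `IdSetFilter` is a hypothetical class, not the `PdbjMine` implementation, and the case normalization and the four-character key prefix are assumptions. The records are placeholders rather than real structure data.

```python
class IdSetFilter:
    """Keep (key, value) pairs whose key starts with an ID from a given set."""

    def __init__(self, pdb_ids):
        # Assumption: IDs are compared case-insensitively.
        self.pdb_ids = {pdb_id.upper() for pdb_id in pdb_ids}

    def __call__(self, pair):
        key, _ = pair
        # Assumption: the first four characters of the key are the PDB ID.
        return key[:4].upper() in self.pdb_ids


# Placeholder records standing in for (structure ID, structure data) pairs.
records = [
    ("2YKE", "structure-a"),
    ("1ABC", "structure-b"),
    ("2z12", "structure-c"),
]

keep = IdSetFilter(["2yke", "2z12"])
kept_keys = [key for key, _ in filter(keep, records)]
print(kept_keys)
```

Because the object is callable and returns a boolean, the same shape of predicate could be passed to any API that expects a filter function over key-value pairs.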


## Terminate Spark Context

In [6]:
sc.stop()