# Secondary Structure Element Demo

This demo shows how to extract a dataset of secondary structure elements from a protein structure using mmtfPyspark.

## Imports

In [1]:
from pyspark.sql import SparkSession
from mmtfPyspark.io import mmtfReader
from mmtfPyspark.mappers import StructureToPolymerChains
from mmtfPyspark.filters import ContainsLProteinChain
from mmtfPyspark.datasets import secondaryStructureElementExtractor

## Configure Spark

In [2]:
spark = SparkSession.builder.appName("SecondaryStructureElementDemo").getOrCreate()

2022-01-23 16:31:56 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


2022-01-23 16:32:02 WARN  Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


## Download protein (1STP)

### Note: The call below passes no SparkContext explicitly; it relies on the Spark session configured above to download the MMTF files

In [3]:
pdb = mmtfReader.download_mmtf_files(['1STP']).cache()

## Map protein to polymer chains and apply LProteinChain filter

In [4]:
pdb = pdb.flatMap(StructureToPolymerChains()) \
         .filter(ContainsLProteinChain())
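
The two-step pattern in this cell (expand each structure into its chains, then keep only the chains that pass a predicate) can be illustrated without Spark. The following is a minimal plain-Python sketch of the same flatMap-then-filter idea. The records, the `to_chains` and `is_l_protein` helpers, and the `"L-protein"` tag are all hypothetical and are not part of mmtfPyspark.

```python
from itertools import chain

# Hypothetical toy records: (structure_id, [(chain_id, polymer_type), ...])
structures = [
    ("XXXX", [("A", "L-protein"), ("B", "DNA")]),
    ("YYYY", [("A", "L-protein")]),
]

def to_chains(structure):
    """One structure in, a list of (key, polymer_type) chain records out."""
    structure_id, chains = structure
    return [(f"{structure_id}.{chain_id}", polymer_type)
            for chain_id, polymer_type in chains]

def is_l_protein(record):
    """Predicate used by the filter step (hypothetical tag comparison)."""
    return record[1] == "L-protein"

# flatMap: apply to_chains to every structure and flatten the resulting lists
chain_records = list(chain.from_iterable(map(to_chains, structures)))

# filter: keep only the records the predicate accepts
protein_chains = [record for record in chain_records if is_l_protein(record)]

print(protein_chains)
```

In the notebook itself, `StructureToPolymerChains()` and `ContainsLProteinChain()` play the roles of the mapping function and the predicate.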

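## Extract secondary structure element 'E'

`'E'` is the DSSP code for an extended strand (beta strand). Every sequence in the output below has at least 6 residues, which is consistent with the third argument, `6`, acting as a minimum element length.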

In [5]:
ds = secondaryStructureElementExtractor.get_dataset(pdb, 'E', 6)

ds.show(50, False)



+-------------+-----+
|sequence     |label|
+-------------+-----+
|TFIVTA       |E    |
|ALTGTYE      |E    |
|VLTGRY       |E    |
|TALGWTVAWK   |E    |
|NAHSATTWSGQYV|E    |
|INTQWLLTS    |E    |
|TLVGHDTFT    |E    |
+-------------+-----+
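
To make the shape of this output concrete, here is a minimal plain-Python sketch of extracting runs of one secondary structure label from a per-residue DSSP string. It is not the mmtfPyspark implementation. It assumes the length argument is a minimum run length (consistent with the table above, but not confirmed by it), and the `extract_elements` helper and the toy sequence are hypothetical, not taken from 1STP.

```python
from itertools import groupby

def extract_elements(sequence, dssp, label, min_length):
    """Return the subsequences covered by runs of `label` in `dssp`
    that are at least `min_length` residues long."""
    if len(sequence) != len(dssp):
        raise ValueError("sequence and DSSP string must have equal length")
    elements = []
    position = 0
    # groupby yields one group per run of identical consecutive codes
    for code, run in groupby(dssp):
        run_length = len(list(run))
        if code == label and run_length >= min_length:
            elements.append(sequence[position:position + run_length])
        position += run_length
    return elements

# Hypothetical toy input: one residue letter per DSSP code
sequence = "GSVKVTFEDGNRYEPLK"
dssp     = "CCEEEEEECCCEEECCC"

long_strands = extract_elements(sequence, dssp, "E", 6)
all_strands = extract_elements(sequence, dssp, "E", 3)
print(long_strands)
print(all_strands)
```

With a minimum length of 6 only the first strand is kept; lowering it to 3 also keeps the short second strand.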


## Terminate Spark

In [6]:
spark.stop()