# Rosetta Filter API

@Author: 吴炜坤 @email：weikun.wu@xtalpi.com

Further reference: https://new.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Filters/Filters-RosettaScripts



This chapter walks through the commonly used filters in PyRosetta, with an example for each. Treat it as a reference and look up a filter when you need it.

In [1]:
from pyrosetta.rosetta.protocols.rosetta_scripts import *
from pyrosetta import *
init()

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: {0} Checking for fconfig files in pwd and ./rosetta/flags
core.init: {0} Rosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
core.init: {0} command: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
basic.random.init_random_generator: {0} 'RNG device' seed mode, using '/dev/urandom', seed=1549073982 seed_offset=0 real_seed=1549073982 thread_index=0
basic.random.init_random_generator: {0} RandomGenerator:init: Normal mode, seed=1549073982 R

### 1. SimpleMetricFilter (brief introduction)
A filter that decides whether to keep a conformation based on the value computed by a SimpleMetric.

In [2]:
from pyrosetta.rosetta.protocols.simple_filters import SimpleMetricFilter
from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector
from pyrosetta.rosetta.core.simple_metrics.metrics import SasaMetric
from pyrosetta.rosetta.protocols.simple_filters import comparison_type

# Load the pose
pose = pose_from_pdb('./data/1ubq_clean.pdb')
print(pose.pdb_info())

core.chemical.GlobalResidueTypeSet: {0} Finished initializing fa_standard residue type set.  Created 984 residue types
core.chemical.GlobalResidueTypeSet: {0} Total time to initialize 0.640261 seconds.
core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
PDB file name: ./data/1ubq_clean.pdb
 Pose Range  Chain    PDB Range  |   #Residues         #Atoms

0001 -- 0076    A 0001  -- 0076  |   0076 residues;    01234 atoms
                           TOTAL |   0076 residues;    01234 atoms



In [3]:
# Define the SimpleMetric calculator
sasa_sel = ResidueIndexSelector('1-76')  # e.g. total SASA over residues 1-76
sasa_metrics = SasaMetric(sasa_sel)

In [4]:
# Define the SimpleMetricFilter
sasa_filter = SimpleMetricFilter()
sasa_filter.set_simple_metric(sasa_metrics)  # attach the SimpleMetric
sasa_filter.set_cutoff(500)  # cutoff value the metric is compared against
sasa_filter.set_comparison_type(comparison_type.gt)  # gt = "greater than": pass if metric > cutoff
sasa_filter.apply(pose)

protocols.simple_filters.SimpleMetricFilter: {0} 4738.4 gt 500 ?
protocols.simple_filters.SimpleMetricFilter: {0} Filter passed: 1


True

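**Comment**: When working in Python there is really no need to set up a SimpleMetricFilter. You can take the value returned by the SimpleMetric and test it directly for True or False, which is trivial in Python.
This is only a simple case to illustrate what SimpleMetricFilter does.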
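
The comparison step described above can be sketched in plain Python. The helper `passes` and the `COMPARISONS` table are illustrative names, not part of PyRosetta; the value 4738.4 is the SASA reported in the log above, and in a real session it would presumably come from the metric itself.

```python
import operator

# Hypothetical lookup table mapping comparison names to Python operators.
COMPARISONS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}

def passes(value, cutoff, comparison="gt"):
    """Return True if `value <comparison> cutoff` holds."""
    return COMPARISONS[comparison](value, cutoff)

sasa_value = 4738.4  # SASA value taken from the filter log above
print(passes(sasa_value, 500, "gt"))
```

Keeping the comparison in a small function makes it easy to reuse the same cutoff logic for any numeric metric.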

### 2. Basic Filters
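This part follows the official Filter documentation and covers the usage of ResidueCount and NetCharge.

#### 2.1 ResidueCount
A filter that counts (or computes the frequency of) residues by residue type, residue property, or packable status, with a configurable threshold. When several properties or types are set, they are combined with OR logic.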

In [5]:
from pyrosetta.rosetta.protocols.simple_filters import ResidueCountFilter

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
res_count_filter = ResidueCountFilter()
res_count_filter.add_residue_property_by_name('POLAR')
res_count_filter.score(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB


41.0

In [6]:
from pyrosetta.rosetta.core.chemical import ChemicalManager

# Get the ResidueTypeSet
chm = ChemicalManager.get_instance()
residue_type_sets = chm.residue_type_set("fa_standard")

# Define the filter
res_count_filter = ResidueCountFilter()
res_count_filter.add_residue_type_by_name(residue_type_sets, 'ALA')
res_count_filter.score(pose)

2.0
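
As a rough cross-check of counting by residue type, the same number can be obtained from the one-letter sequence alone. This is a minimal sketch, not the filter's implementation: `count_residues` is a hypothetical helper, and `UBQ_SEQ` is the 1ubq sequence as returned by `pose.sequence()`.

```python
from collections import Counter

# 1ubq sequence in one-letter code (as returned by pose.sequence()).
UBQ_SEQ = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"

def count_residues(sequence, one_letter_codes):
    """Count residues matching ANY of the given one-letter codes (OR logic)."""
    counts = Counter(sequence)
    return sum(counts[aa] for aa in set(one_letter_codes))

ala_count = count_residues(UBQ_SEQ, "A")
print(ala_count)
```

Passing several codes at once, for example `count_residues(UBQ_SEQ, "KR")`, mirrors the OR behaviour described for the filter.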

#### 2.2 NetCharge
A filter based on the total charge of the protein sequence. NetCharge assigns +1 to LYS and ARG and -1 to the acidic residues ASP and GLU.

In [7]:
from pyrosetta.rosetta.protocols.simple_filters import NetChargeFilter
netcharge = NetChargeFilter()
netcharge.apply(pose)

protocols.simple_filters.NetChargeFilter: {0} AA:  +1  LYS 6
protocols.simple_filters.NetChargeFilter: {0} AA:  +1  LYS 11
protocols.simple_filters.NetChargeFilter: {0} AA:  -1  GLU 16
protocols.simple_filters.NetChargeFilter: {0} AA:  -1  GLU 18
protocols.simple_filters.NetChargeFilter: {0} AA:  -1  ASP 21
protocols.simple_filters.NetChargeFilter: {0} AA:  -1  GLU 24
protocols.simple_filters.NetChargeFilter: {0} AA:  +1  LYS 27
protocols.simple_filters.NetChargeFilter: {0} AA:  +1  LYS 29
protocols.simple_filters.NetChargeFilter: {0} AA:  -1  ASP 32
protocols.simple_filters.NetChargeFilter: {0} AA:  +1  LYS 33
protocols.simple_filters.NetChargeFilter: {0} AA:  -1  GLU 34
protocols.simple_filters.NetChargeFilter: {0} AA:  -1  ASP 39
protocols.simple_filters.NetChargeFilter: {0} AA:  +1  ARG 42
protocols.simple_filters.NetChargeFilter: {0} AA:  +1  LYS 48
protocols.simple_

True
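
The charge rule stated for NetCharge (+1 for LYS/ARG, -1 for ASP/GLU) is simple enough to reproduce from the sequence. The sketch below is an illustration of that rule only; `net_charge` and `CHARGES` are hypothetical names, and the sequence is the 1ubq sequence as returned by `pose.sequence()`.

```python
# 1ubq sequence in one-letter code (as returned by pose.sequence()).
UBQ_SEQ = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"

# Charge rule described above: K/R count as +1, D/E as -1, everything else as 0.
CHARGES = {"K": 1, "R": 1, "D": -1, "E": -1}

def net_charge(sequence):
    """Sum the per-residue charges over a one-letter sequence."""
    return sum(CHARGES.get(aa, 0) for aa in sequence)

print(net_charge(UBQ_SEQ))
```

Under this rule the 11 basic and 11 acidic residues of ubiquitin cancel out.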

### 3. Energy/Score Filters

#### 3.1 ScoreTypeFilter
A filter based on a specific score term. If no score term is specified, it filters on the total score by default.

In [8]:
from pyrosetta.rosetta.protocols.score_filters import ScoreTypeFilter
from pyrosetta.rosetta.core.scoring import ScoreType
from pyrosetta import create_score_function

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Create the score function
ref2015 = create_score_function('ref2015')

# Define the filter
st_filter = ScoreTypeFilter()
st_filter.set_scorefxn(ref2015)
st_filter.set_score_type(ScoreType.fa_atr)  # score the van der Waals attractive term; see the ScoreType enum for more
st_filter.set_threshold(-400)
st_filter.apply(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
core.scoring.etable: {0} Starting energy table calculation
core.scoring.etable: {0} smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: {0} smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: {0} smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: {0} Finished calculating energy tables.
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io

False

#### 3.2 TaskAwareScoreType
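The main difference between the TaskAwareScoreType filter and ScoreTypeFilter is that it evaluates the energy only for the residues that the TaskOperations leave repackable.

mode: one of "total", "average", or "individual"

This filter can be applied specifically to interface residues; in individual mode in particular it can flag abnormal residues or rotamers.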

In [9]:
from pyrosetta.rosetta.protocols.simple_filters import TaskAwareScoreTypeFilter
from pyrosetta.rosetta.core.scoring import ScoreType
from pyrosetta import create_score_function
from pyrosetta.rosetta.core.pack.task import TaskFactory
from pyrosetta.rosetta.core.pack.task.operation import PreventRepackingRLT
from pyrosetta.rosetta.core.pack.task.operation import OperateOnResidueSubset
from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Select the residue range
select_pos = ResidueIndexSelector('2,3,4,5,6,7,8,9,10,11,12,13')
# Build a TaskOperation with OperateOnResidueSubset
packing_taskop = OperateOnResidueSubset(PreventRepackingRLT(), select_pos, False)

# Create the score function
ref2015 = create_score_function('ref2015')

# Create the TaskFactory
tf = TaskFactory()
tf.push_back(packing_taskop)

# Define the filter
tast_filter = TaskAwareScoreTypeFilter()
tast_filter.bb_bb(True)  # include backbone-backbone energies
tast_filter.score_type(ScoreType.fa_atr)
tast_filter.scorefxn(ref2015)
tast_filter.task_factory(tf)
tast_filter.threshold(-1.0)
tast_filter.unbound(False)  # must be set to False explicitly
tast_filter.mode('individual')  # evaluate each residue individually
tast_filter.score(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB


0.0

#### 3.3 BindingStrain
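A filter on the energetic strain of a monomer in its bound state. It detects symmetry automatically.

PS: judging from the source code, this filter simply pulls the two rigid-body partners apart, repacks, and then computes the energy in the bound state minus the energy in the unbound state.

The larger the absolute value of this difference, the more the monomer's energy changes between the bound and unbound states.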

In [10]:
from pyrosetta.rosetta.protocols.protein_interface_design.filters import BindingStrainFilter
from pyrosetta.rosetta.core.pack.task.operation import PreventRepackingRLT
from pyrosetta.rosetta.core.select.residue_selector import ChainSelector
init('-ex1 -ex2 -corrections::beta_nov16')

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Create the score function
beta_16 = create_score_function('beta_nov16')
receptor_chain = ChainSelector('A')

# Create the TaskFactory
no_repack_receptor_op = OperateOnResidueSubset(PreventRepackingRLT(), receptor_chain)
tf = TaskFactory()
tf.push_back(no_repack_receptor_op)

# Define the filter
bsf = BindingStrainFilter()
bsf.scorefxn(beta_16)
bsf.threshold(0)
bsf.jump(1)   # jump number between the binder and the receptor
bsf.task_factory(tf)
bsf.compute(complex_pose)

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: {0} Checking for fconfig files in pwd and ./rosetta/flags
core.init: {0} Rosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
core.init: {0} command: PyRosetta -ex1 -ex2 -corrections::beta_nov16 -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
basic.random.init_random_generator: {0} 'RNG device' seed mode, using '/dev/urandom', seed=1905387258 seed_offset=0 real_seed=1905387258 thread_index=0
basic.random.init_random_generator: {0} RandomGenerator:init: Normal m

-11.974517803396395

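#### 3.4 ConstraintScore (buggy, has no effect)
A filter on the score terms computed from a set of constraints produced by ConstraintGenerators.

Notes:
1. The constraints produced by the generators must already have been added to the pose with the AddConstraints mover.
2. The corresponding score term must be enabled.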


In [11]:
# Generate constraints with a ConstraintGenerator
from pyrosetta.rosetta.protocols.simple_moves import VirtualRootMover

# Load the pose from 1ubq_clean.pdb
pose = pose_from_pdb("./data/1ubq_clean.pdb")

# Reweight the score function
score = create_score_function('ref2015')
score.set_weight(ScoreType.atom_pair_constraint, 1.0)  # enable the atom_pair_constraint term

# Define the constraint generator
from pyrosetta.rosetta.protocols.constraint_generator import TerminiConstraintGenerator
termin_cst = TerminiConstraintGenerator()
termin_cst.set_min_distance(8)
termin_cst.set_max_distance(20)
termin_cst.set_sd(1.0)
termin_cst.set_id('test_nc')

# Add the constraints from TerminiConstraintGenerator to the pose
from pyrosetta.rosetta.protocols.constraint_generator import AddConstraints
add_cst = AddConstraints()
add_cst.add_generator(termin_cst)
add_cst.apply(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
protocols.constraint_generator.TerminiConstraintGenerator: {0} Constraining atoms  atomno= 2 rsd= 1  and  atomno= 2 rsd= 76 , min_distance=8 max_distance=20
protocols.constraint_generator.AddConstraints: {0} Adding 1 constraints generated by ConstraintGenerator named test_nc


In [12]:
from pyrosetta.rosetta.protocols.constraint_filters import ConstraintScoreFilter
from pyrosetta.rosetta.protocols.relax import FastRelax

# Break the N-/C-terminal arrangement by extending the chain into a linear peptide:
for i in range(1, pose.total_residue()+1):
    pose.set_phi(i, -150)
    pose.set_psi(i, 150)

# Define the filter
cst_score_filter = ConstraintScoreFilter()
cst_score_filter.set_user_defined_name('test_nc')
cst_score_filter.apply(pose)

protocols.constraint_filters.ConstraintScoreFilter: {0}
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 atom_pair_constraint         1.000       0.000       0.000
 coordinate_constraint        1.000       0.000       0.000
 angle_constraint             1.000       0.000       0.000
 dihedral_constraint          1.000       0.000       0.000
 res_type_constraint          1.000       0.000       0.000
 backbone_stub_constraint     1.000       0.000       0.000
---------------------------------------------------
 Total weighted score:                        0.000


False

#### 3.5 ScorePoseSegmentFromResidueSelectorFilter
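This filter scores and filters on the region defined by a user-specified ResidueSelector, for example a particular region or a single chain.

in_context option: controls whether the selected region is extracted into a separate Pose before scoring.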

In [13]:
from pyrosetta.rosetta.protocols.fold_from_loops.filters import ScorePoseSegmentFromResidueSelectorFilter
from pyrosetta.rosetta.core.select.residue_selector import ChainSelector

# Select the chain
chain_A = ChainSelector('A')

# Define the filter
score_from_selector_filter = ScorePoseSegmentFromResidueSelectorFilter()
score_from_selector_filter.residue_selector(chain_A)
score_from_selector_filter.in_context(True)
score_from_selector_filter.scorefxn(ref2015)
score_from_selector_filter.compute(pose)

core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: beta_nov16.wts


1709.9821163242043

#### 3.6 ReadPoseExtraScoreFilter
Extracts a score from the ExtraScore data stored in the Pose and optionally filters on it.

In [14]:
# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# set ExtraScore to pose:
from pyrosetta.rosetta.core.pose import setPoseExtraScore
setPoseExtraScore(pose, 'test_score', '100')

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB


Next, extract the score and filter on it.

In [15]:
# Define the filter
from pyrosetta.rosetta.protocols.simple_filters import ReadPoseExtraScoreFilter
extra_score_filter = ReadPoseExtraScoreFilter()
extra_score_filter.set_term_name('test_score')  # score term to filter on
extra_score_filter.set_threshold(300)  # returns false if the score is greater than this threshold
extra_score_filter.apply(pose)

True

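#### 3.7 Delta (no need to use it at all!)
Computes the difference between a filter's value on the current pose and on the input structure. Put simply, once a filter is specified, it compares that filter's value for the native pose and the current pose.

(Omitted.) In Python it is not hard to compare the values for the native pose and the current pose directly.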
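
A direct comparison of this kind can be sketched as follows. The helper names `delta` and `passes_delta` are hypothetical, `metric` stands for any callable that returns a number for a pose (for instance a filter's `score` method), and the "lower or equal passes" convention is an assumption made for the example. The toy call at the end uses strings and `len` only so the snippet runs without PyRosetta.

```python
def delta(metric, pose, reference_pose):
    """Difference between the metric on the current pose and on the reference."""
    return metric(pose) - metric(reference_pose)

def passes_delta(metric, pose, reference_pose, threshold):
    """Assumed convention: pass when the difference does not exceed the threshold."""
    return delta(metric, pose, reference_pose) <= threshold

# Toy stand-ins: strings play the role of poses, len() the role of a metric.
current, native = "ACDEFG", "ACDE"
print(delta(len, current, native))
print(passes_delta(len, current, native, threshold=1))
```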

### 4. Distance Filter

#### 4.1 ResidueDistance
Computes the distance between two residues, measured between each residue's neighbor atom (usually the C-β atom). This filter accepts either PDB or pose numbering.

In [16]:
from pyrosetta.rosetta.protocols.simple_filters import ResidueDistanceFilter

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
res1 = '5'
res2 = '10'
two_res_dis = ResidueDistanceFilter(res1, res2, distance_threshold=10)
two_res_dis.apply(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
protocols.simple_filters.ResidueDistanceFilter: {0} Distance between residues 5 and 10 is 10.8498


False

#### 4.2 AtomicContact
Checks whether any atoms of two residues are within the cutoff distance of each other.

In [17]:
from pyrosetta.rosetta.protocols.simple_filters import AtomicContactFilter
is_atom_between_res = AtomicContactFilter(res1=1, res2=5, distance=10.0, sidechain=True, backbone=True, protons=False)
is_atom_between_res.apply(pose)

True
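
The underlying test, whether any atom pair from two residues lies within a cutoff, can be illustrated with plain coordinates. This is a minimal sketch with made-up coordinates; `has_atomic_contact` is a hypothetical helper and does not model the sidechain/backbone/proton switches of the real filter.

```python
import math
from itertools import product

def has_atomic_contact(atoms1, atoms2, cutoff):
    """True if any atom in atoms1 is within `cutoff` of any atom in atoms2."""
    return any(math.dist(a, b) <= cutoff for a, b in product(atoms1, atoms2))

# Made-up (x, y, z) coordinates for the atoms of two residues.
res_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
res_b = [(6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]

print(has_atomic_contact(res_a, res_b, 5.0))  # closest pair is 4.5 apart
print(has_atomic_contact(res_a, res_b, 4.0))
```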

#### 4.3 AtomicContactCount(xmlobject)
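Counts the contacts between residues. This filter accepts task operations; when they are set, it only counts contacts between side-chain carbon atoms of packable residues.

The filter has three modes:
1. "All" mode: counts contacts among all side-chain carbon atoms (suited to single-chain structures).
2. "jump" mode: counts atomic contacts across the complex interface (suited to interaction interfaces).
3. "chain" mode: counts atomic contacts between chains (suited to pairwise chain comparisons).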

In [18]:
from pyrosetta.rosetta.protocols import rosetta_scripts 
# "All" mode
# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <AtomicContactCount name="all_atomic_contact" partition="none" distance="4.5"/>
</FILTERS>
''')
all_atomic_contact_filter = xml.get_filter('all_atomic_contact')
# all_atomic_contact_filter.compute(pose)  # very verbose output; run it yourself

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
protocols.rosetta_scripts.RosettaScriptsParser: {0} Generating XML Schema for rosetta_scripts...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Initializing schema validator...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Validating input script...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Parsed script:
<ROSETTASCRIPTS>
	<FILTERS>
		<AtomicContactCount distance="4.5" name="all_atomic_contact" partition="none"/>
	</FILTERS>
	<PROTOCOLS/>
</ROSETTASCRIPTS>
core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: beta_nov16.wts
core.scoring.etable: {0} Starting energy table calculation
core.scorin

In [19]:
# Define the filter
# "jump" mode
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <AtomicContactCount name="all_atomic_contact" partition="jump" distance="4.5" jump="1"/>
</FILTERS>
''')
all_atomic_contact_filter = xml.get_filter('all_atomic_contact')

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')
# all_atomic_contact_filter.compute(complex_pose)  # verbose output; run it yourself

protocols.rosetta_scripts.RosettaScriptsParser: {0} Generating XML Schema for rosetta_scripts...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Initializing schema validator...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Validating input script...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Parsed script:
<ROSETTASCRIPTS>
	<FILTERS>
		<AtomicContactCount distance="4.5" jump="1" name="all_atomic_contact" partition="jump"/>
	</FILTERS>
	<PROTOCOLS/>
</ROSETTASCRIPTS>
core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: beta_nov16.wts
protocols.rosetta_scripts.RosettaScriptsParser: {0} Defined filter named "all_atomic_contact" of type AtomicContactCount
protocols.rosetta_scripts.ParsedProtocol: {0} Pars

In [20]:
# Define the filter
# "chain" mode
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <AtomicContactCount name="all_atomic_contact" partition="chain" distance="4.5"/>
</FILTERS>
''')
all_atomic_contact_filter = xml.get_filter('all_atomic_contact')

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')
# all_atomic_contact_filter.compute(complex_pose)  # verbose output; run it yourself

#### 4.4 AtomicDistance
Checks whether the distance between two specified atoms is within the cutoff distance.

In [21]:
from pyrosetta.rosetta.protocols.simple_filters import AtomicDistanceFilter

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Inspect the atom types:
atom_NZ_index = pose.residue(11).atom_index("NZ")
atom_type = pose.residue(11).atom_type(atom_NZ_index)
print(atom_type)

atom_C_index = pose.residue(34).atom_index("C")
atom_type = pose.residue(34).atom_type(atom_C_index)
print(atom_type)

# Define the filter
# atom_desig1, atom_desig2: atom names, or atom type names when the two boolean flags are True
# res1, res2: residue numbers
# distance_filter = AtomicDistanceFilter(res1=11, res2=34, atom_desig1='NZ', atom_desig2='OE1')
distance_filter = AtomicDistanceFilter(11, 34, 'Nlys', 'CObb', True, True, 3.0)
print(distance_filter.score(pose))
print(distance_filter.apply(pose))

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
Atom Type: Nlys
	element: N
	Lennard Jones: radius=1.80245 wdepth=0.161725
	Lazaridis Karplus: lambda=3.5 volume=16.514 dgfree=-20.8646
	properties: DONOR 
Extra Parameters: 1.75 1.55 0.79 1.55 1.44 1.5 1.55 -20 -10.695 -1.145 -20 -0.62 0 0 0 1.85 8.52379 0.025 0.01 0.005 -289.292 -0.697267 -1933.88 -1.56243 -93.2613 93.2593 0.00202205 715.165 74.6559 -74.6539 0.00268963 -1282.36 0.633 -0.367 0.926 -0.537 0.633 -0.367

Atom Type: CObb
	element: C
	Lennard Jones: radius=1.91666 wdepth=0.141799
	Lazaridis Karplus: lambda=3.5 volume=13.221 dgfree=3.10425
	properties: 
Extra Parameters: 2.14 1.7 0.72 1.7 1.89 1.76 1.65 0 0 0 1 0.51 0 0 0 2 8.81363 0.025 0.01 0.005 147.227 -0.811304 -8117.41 -2.17625 -85.8924 85.8904 0.00196363 900.14 168.481 -168.287 0.00113765 -6725.43 0 0 0 0 0 0

8.27940468874423
False


#### 4.5 TerminusDistance(xmlobject)
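Checks whether residues near the N- or C-terminus lie at the protein-protein interface, with the distance measured along the primary sequence. The purpose of this filter is to keep residues of the flexible N- or C-terminus out of the interface.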

In [22]:
from pyrosetta.rosetta.protocols import rosetta_scripts 

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Define the filter
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <TerminusDistance name="nc_filter" jump_number="1" distance="5"/>
</FILTERS>
''')

terminus_distance_filter = xml.get_filter('nc_filter')
terminus_distance_filter.apply(complex_pose)

core.import_pose.import_pose: {0} File './data/denovo_binder.pdb' automatically determined to be of type PDB
core.conformation.Conformation: {0} Found disulfide between residues 148 186
core.conformation.Conformation: {0} current variant for 148 CYS
core.conformation.Conformation: {0} current variant for 186 CYS
core.conformation.Conformation: {0} current variant for 148 CYD
core.conformation.Conformation: {0} current variant for 186 CYD
protocols.rosetta_scripts.RosettaScriptsParser: {0} Generating XML Schema for rosetta_scripts...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Initializing schema validator...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Validating input script...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta

True
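
The sequence-distance idea behind this check can be sketched for a single chain. This is only an illustration under stated assumptions: `termini_clear_of_interface` is a hypothetical helper, the interface positions are supplied by hand rather than detected, and the exact boundary convention ("more than `min_distance` residues away") is assumed.

```python
def termini_clear_of_interface(interface_positions, chain_length, min_distance):
    """True if every interface residue is more than `min_distance` residues
    (in sequence) away from both the N- and the C-terminus.

    Positions are 1-based indices within a single chain.
    """
    for pos in interface_positions:
        from_n_term = pos - 1
        from_c_term = chain_length - pos
        if min(from_n_term, from_c_term) <= min_distance:
            return False
    return True

# Hand-picked interface positions on a 76-residue chain.
print(termini_clear_of_interface([20, 35, 40], chain_length=76, min_distance=5))
print(termini_clear_of_interface([3, 20], chain_length=76, min_distance=5))
```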

### 5. Sequence analysis

#### 5.1 LongestContinuousPolarSegment
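A filter that finds the longest continuous stretch of polar residues along the pose's primary sequence.

Options:
- exclude_chain_termini: false counts polar stretches that extend to the N- or C-terminus; true ignores them (default true, so only internal polar stretches are counted)
- count_gly_as_polar: true treats Gly as a polar residue (default true)
- filter_out_high: true rejects poses whose longest polar stretch exceeds the cutoff; false rejects poses below the cutoff (default true)
- cutoff: threshold on the longest polar stretch, default 5
- residue_selector: a residue selector, defined beforehand (optional)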

In [23]:
from pyrosetta.rosetta.protocols.simple_filters import LongestContinuousPolarSegmentFilter

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
lps = LongestContinuousPolarSegmentFilter()
lps.set_exclude_chain_termini(True)
# lps.residue_selector() # use when needed
# lps.filter_out_high(False)  # use when needed
lps.set_count_gly_as_polar(False)
lps.set_cutoff(10)
lps.score(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB


5.0
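
To make the options concrete, here is a sequence-only sketch of the longest-polar-stretch calculation. It is not the Rosetta implementation: `longest_polar_segment` is a hypothetical helper, and the set of residues treated as polar (`POLAR`) is an assumption that may differ from Rosetta's POLAR property.

```python
# Assumed polar set in one-letter code; Rosetta's own definition may differ.
POLAR = set("DEHKNQRST")

def longest_polar_segment(sequence, count_gly_as_polar=True,
                          exclude_chain_termini=True):
    """Length of the longest run of polar residues in a one-letter sequence.

    With exclude_chain_termini=True, runs touching either end are ignored.
    """
    polar = POLAR | ({"G"} if count_gly_as_polar else set())
    n = len(sequence)
    best, start = 0, None
    # A trailing sentinel ('#', never polar) closes a run that reaches the end.
    for i, aa in enumerate(sequence + "#"):
        if aa in polar:
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            touches_terminus = start == 0 or end == n - 1
            if not (exclude_chain_termini and touches_terminus):
                best = max(best, end - start + 1)
            start = None
    return best

print(longest_polar_segment("AKDESLLNQRKTWV"))
print(longest_polar_segment("KDEALLNQ", exclude_chain_termini=False))
```

A pose would then be rejected or kept by comparing this length with the cutoff, in the direction chosen by filter_out_high.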

#### 5.2 LongestContinuousApolarSegment
A filter that finds the longest continuous stretch of apolar residues along the pose's primary sequence.

In [24]:
from pyrosetta.rosetta.protocols.simple_filters import LongestContinuousApolarSegmentFilter

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
lps = LongestContinuousApolarSegmentFilter()
lps.set_exclude_chain_termini(True)
# lps.residue_selector() # use when needed
# lps.filter_out_high(False)  # use when needed
lps.set_count_gly_as_polar(False)
lps.set_cutoff(10)
lps.score(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB


5.0

#### 5.3 SequenceDistanceFilter
Computes the Hamming distance between two sequences.

https://zh.wikipedia.org/wiki/%E6%B1%89%E6%98%8E%E8%B7%9D%E7%A6%BB

In [25]:
from pyrosetta.rosetta.protocols.simple_filters import SequenceDistance

# Mutate the sequence
mut_seq = pose.sequence()
mut_seq = mut_seq.replace('Q', 'C')  # str.replace returns a new string, so assign it back
print(mut_seq)

# Define the filter
seq_dis_filter = SequenceDistance()
seq_dis_filter.target_seq(mut_seq)
seq_dis_filter.threshold(10)
seq_dis_filter.score(pose)
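
For reference, the Hamming distance itself is just the number of positions at which two equal-length sequences differ. The sketch below is plain Python and independent of the filter; `hamming_distance` is an illustrative helper, and the sequence is the 1ubq sequence as returned by `pose.sequence()`.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

# 1ubq sequence in one-letter code (as returned by pose.sequence()).
UBQ_SEQ = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"

# Replace every Q with C; note that str.replace returns a new string.
mutant = UBQ_SEQ.replace("Q", "C")
print(hamming_distance(UBQ_SEQ, mutant))
```

The distance here equals the number of glutamines in ubiquitin, since each one becomes a mismatch.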

### 6. Geometry

#### 6.1 Torsion
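A filter based on backbone dihedral angles:
- lower and upper: lower and upper thresholds
- resnum: PDB or Rosetta numbering
- torsion: "phi" or "psi"
- task_operations: reports the residues defined by the task operations; all designable residues are reported. task_operations and resnum cannot be used together! If torsion is not set, both phi and psi are reported; if resnum is not set, all residues are reported.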

In [26]:
from pyrosetta.rosetta.protocols.protein_interface_design.filters import Torsion

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
torsion_filter = Torsion()
torsion_filter.lower(110)
torsion_filter.upper(180)
torsion_filter.resnum(4)
torsion_filter.torsion('phi')
torsion_filter.apply(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
protocols.protein_interface_design.filters.Torsion: {0} Residue F4A	 phi -115.991


False

#### 6.2 HelixKink
A helix kink is a short turn within a continuous helix that bends the helix by a large angle. This filter evaluates how strongly the helices are bent.

In [27]:
from pyrosetta.rosetta.protocols.fldsgn.filters import HelixKinkFilter
from pyrosetta.rosetta.protocols.moves import DsspMover

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Assign secondary structure with DsspMover
DsspMover().apply(pose)

# Define the filter
hk_filter = HelixKinkFilter()
hk_filter.apply(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
protocols.DsspMover: {0} LEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEELLLLL
protocols.fldsgn.filters.HelixKinkFilter: {0}  Pose does not have HBOND_SET. Checking hbonds will be skipped.
protocols.fldsgn.filters.HelixKinkFilter: {0} Helix 1, res 23-34, is bended angle=19.7744
protocols.fldsgn.filters.HelixKinkFilter: {0} is OK.
protocols.fldsgn.filters.HelixKinkFilter: {0} Helix 2, res 38-40, is bended angle=0
protocols.fldsgn.filters.HelixKinkFilter: {0} is OK.
protocols.fldsgn.filters.HelixKinkFilter: {0} Helix 3, res 57-59, is bended angle=0
protocols.fldsgn.filters.HelixKinkFilter: {0} is OK.
protocols.fldsgn.filters.HelixKinkFilter: {0}  Filter success !


True

#### 6.3 Geometry(xmlobject)
A filter based on bond geometry and omega angles; it checks the protein backbone for abnormal dihedral angles and bond angles.

In [28]:
from pyrosetta.rosetta.protocols import rosetta_scripts 

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <Geometry name="geometry_filter"
      omega="165"
      cart_bonded="20"
      count_bad_residues="true" />
</FILTERS>
''')

geometry_filter = xml.get_filter('geometry_filter')
geometry_filter.apply(pose)  # apply to the structure loaded in this cell

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
protocols.rosetta_scripts.RosettaScriptsParser: {0} Generating XML Schema for rosetta_scripts...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Initializing schema validator...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Validating input script...
protocols.rosetta_scripts.RosettaScriptsParser: {0} ...done
protocols.rosetta_scripts.RosettaScriptsParser: {0} Parsed script:
<ROSETTASCRIPTS>
	<FILTERS>
		<Geometry cart_bonded="20" count_bad_residues="true" name="geometry_filter" omega="165"/>
	</FILTERS>
	<PROTOCOLS/>
</ROSETTASCRIPTS>
core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: beta_nov16.wts
protocols.rosetta_scripts.RosettaScriptsParser: {0} Defined filt

#### 6.4 PreProlineFilter
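Residues preceding a proline generally disfavor the helical region of torsion space (ABEGO type A), but Rosetta may not capture this. In its default mode, this filter examines every residue that precedes a proline and counts those that are neither ABEGO type B nor type E (fewer is better).
1. use_statistical_potential: true evaluates the torsion angles with a bicubic spline fit to the Ramachandran space; false counts the residues that fall in the bins of unfavorable torsion angles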

In [29]:
from pyrosetta.rosetta.protocols.denovo_design.filters import PreProlineFilter

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
prepro_filter = PreProlineFilter()
prepro_filter.set_use_statistical_potential(False)
# prepro_filter.set_selector() # set a residue selector
prepro_filter.apply(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
basic.io.database: {0} Database file opened: protocol_data/denovo_design/preproline_normalized.gz
protocols.denovo_design.PreProlineFilter: {0} X start: -180 delta: 11.25
protocols.denovo_design.PreProlineFilter: {0} Y start: -180 delta: 11.25
protocols.denovo_design.PreProlineFilter: {0} Spline for preproline residues has been trained from protocol_data/denovo_design/preproline_normalized.gz.
protocols.denovo_design.PreProlineFilter: {0} Prolines in pose: 3 Bad pre-proline torsions: 0


True
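
The counting mode (use_statistical_potential set to false) can be illustrated on strings. This is a simplified sketch, not the Rosetta code: `count_bad_preproline` is a hypothetical helper, and the per-residue ABEGO string is assumed to be given rather than derived from torsion angles.

```python
def count_bad_preproline(sequence, abego):
    """Count residues that precede a proline and are not ABEGO type B or E.

    `sequence` is a one-letter sequence and `abego` a per-residue ABEGO string
    of the same length (assumed to be precomputed).
    """
    if len(sequence) != len(abego):
        raise ValueError("sequence and abego must have equal length")
    bad = 0
    for i in range(1, len(sequence)):
        if sequence[i] == "P" and abego[i - 1] not in ("B", "E"):
            bad += 1
    return bad

# Toy input: the first proline follows an 'A' residue, the second a 'B' residue.
print(count_bad_preproline("APGP", "ABBA"))
```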

#### 6.5 SecondaryStructure
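A filter based on secondary structure. It compares the pose's secondary structure with a user-defined one and reports N_MATCHING / N_TOTAL:
- N_MATCHING is the number of selected residues whose secondary structure matches the desired one
- N_TOTAL is the total number of selected residues.

If set_use_dssp is false (the default), the secondary structure must be computed for the pose beforehand, for example with DsspMover; if it is true, DSSP is called automatically to compute the pose's secondary structure.

The desired secondary structure can be supplied to the filter in several ways:
1. A user-defined secondary structure string
2. A blueprint file (also a user-defined secondary structure)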

In [30]:
from pyrosetta.rosetta.protocols.fldsgn.filters import SecondaryStructureFilter

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
# Use a user-defined secondary structure string:
ss_contain_filter = SecondaryStructureFilter()
ss_contain_filter.filtered_ss('HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH')

# Use a blueprint file:
# ss_contain_filter.set_blueprint(bp_file_name)  # bp_file_name: path to a blueprint file

ss_contain_filter.set_use_dssp(True)
ss_contain_filter.apply(pose)

core.import_pose.import_pose: {0} File './data/1ubq_clean.pdb' automatically determined to be of type PDB
protocols.denovo_design.residue_selectors.PairedSheetResidueSelector: {0} Could not determine strand pairings! You must specify them using the "sheet_topology" option or attach a StructureData object to the pose. No residues will be selected.
protocols.fldsgn.filters.SecondaryStructureFilter: {0} SS filter fail: current/filtered = L/H at position 1
protocols.fldsgn.filters.SecondaryStructureFilter: {0} SS filter fail: current/filtered = E/H at position 2
protocols.fldsgn.filters.SecondaryStructureFilter: {0} SS filter fail: current/filtered = E/H at position 3
protocols.fldsgn.filters.SecondaryStructureFilter: {0} SS filter fail: current/filtered = E/H at position 4
protocols.fldsgn.filters.SecondaryStructureFilter: {0} SS filter fail: current/filtered = E/H at position 5
protocols.fldsgn.filters.SecondaryStructureFilter:

False
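
The reported ratio N_MATCHING / N_TOTAL can be reproduced on two secondary structure strings. The following is a minimal sketch under the assumption of an exact per-position character match; `matching_fraction` is a hypothetical helper and ignores any wildcard characters the real filter may accept.

```python
def matching_fraction(pose_ss, target_ss, positions=None):
    """Fraction of selected positions whose secondary structure matches.

    `positions` are 0-based indices; by default every residue is selected.
    """
    if len(pose_ss) != len(target_ss):
        raise ValueError("secondary structure strings must have equal length")
    selected = list(range(len(pose_ss))) if positions is None else list(positions)
    n_matching = sum(pose_ss[i] == target_ss[i] for i in selected)
    return n_matching / len(selected)

# Toy example: two of five residues are helical, the target is all-helix.
print(matching_fraction("HHEEL", "HHHHH"))
print(matching_fraction("HHEEL", "HHHHH", positions=[0, 1]))
```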

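#### 6.6 SecondaryStructureCount(xmlobject)
A filter based on counting individual secondary-structure elements. It counts the DSSP-defined secondary-structure elements of each requested type.

- filter_helix, filter_sheet, filter_loop: set to true to filter on helices, sheets, or loops, respectively
- filter_helix_sheet: filter on helix and sheet together
- num_helix, num_sheet, num_loop, num_helix_sheet: how many elements of the corresponding type are required to pass the filter
- min_helix_length, max_helix_length: the minimum and maximum number of residues for a segment to be counted as a helix; the defaults are 4 and 9999
- min_sheet_length, max_sheet_length, min_loop_length, max_loop_length: same as the previous item, for sheets and loops
- return_total: if true, the total number of filtered secondary-structure elements is written to the score file. Default is 0.
- residue_selector: a residue selector
- min_element_resis: the minimum number of residues a secondary-structure element must have to be counted; default is 1.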
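
To make the counting rule concrete, the following plain-Python sketch counts contiguous runs in a DSSP string under minimum and maximum length limits. It is an illustration under stated assumptions, not the filter's actual code: the helper `count_ss_elements` is hypothetical, and the DSSP string is the one DsspMover prints for 1ubq in the log of the example cell below.

```python
from itertools import groupby


def count_ss_elements(dssp, ss_type, min_length=1, max_length=9999):
    """Count contiguous runs of `ss_type` whose length lies in [min_length, max_length]."""
    run_lengths = (len(list(group)) for key, group in groupby(dssp) if key == ss_type)
    return sum(1 for length in run_lengths if min_length <= length <= max_length)


# DSSP string for 1ubq, copied from the DsspMover log line in this notebook.
dssp = "LEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEELLLLL"

n_helix = count_ss_elements(dssp, "H", min_length=4)  # helices of at least 4 residues
n_sheet = count_ss_elements(dssp, "E", min_length=3)  # strands of at least 3 residues
n_loop = count_ss_elements(dssp, "L", min_length=1)   # loops of at least 1 residue
print(n_helix, n_sheet, n_loop, n_helix + n_sheet + n_loop)
```

With these limits the sketch finds 1 helix, 4 strands, and 9 loops, 14 elements in total, which agrees with the value returned in the example below. The real filter may still treat edge cases (residue selectors, min_element_resis) differently.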

In [31]:
from pyrosetta.rosetta.protocols import rosetta_scripts 
from pyrosetta.rosetta.protocols.moves import DsspMover

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')
DsspMover().apply(pose)

# Define the filter
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <SecondaryStructureCount name="ss_count_filter"
        filter_helix_sheet = "false"
        filter_helix="true" 
        filter_sheet="true" 
        filter_loop="true"
        num_helix="2" 
        num_sheet="2" 
        num_loop="2"
        min_helix_length="4" 
        max_helix_length="999"
        min_sheet_length="3" 
        max_sheet_length="999"
        min_loop_length="1" 
        max_loop_length="999"
        return_total="true"
        min_element_resis="1" />
</FILTERS>
''')

ss_count_filter = xml.get_filter('ss_count_filter')
ss_count_filter.compute(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/1ubq_clean.pdb' automatically determined to be of type PDB
[0mprotocols.DsspMover: {0} [0mLEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEELLLLL
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mGenerating XML Schema for rosetta_scripts...
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0m...done
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mInitializing schema validator...
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0m...done
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mValidating input script...
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0m...done
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mParsed script:
<ROSETTASCRIPTS>
	<FILTERS>
		<SecondaryStructureCount filter_helix="true" filter_helix_sheet="false" filter_loop="true" filter_sheet="true" max_helix_length="999" max_loop_length="999" max_sheet_length="999" min

14

#### 6.7 SecondaryStructureHasResidue(xmlobject)
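Checks whether the checked positions of each secondary-structure element contain N or more residues of the specified types. In de novo design it is used, for example, to verify that every secondary-structure element contains at least one hydrophobic residue.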

In [32]:
from pyrosetta.rosetta.protocols import rosetta_scripts 
from pyrosetta.rosetta.protocols.moves import DsspMover

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the filter
xml = rosetta_scripts.XmlObjects.create_from_string("""
<TASKOPERATIONS>
    <LayerDesign name="layer_core_boundary" layer="core_boundary" verbose="False" use_sidechain_neighbors="True" />
</TASKOPERATIONS>
<FILTERS>
    <SecondaryStructureHasResidue name="ss_contributes_core" 
        secstruct_fraction_threshold="1.0"
        res_check_task_operations="layer_core_boundary" 
        required_restypes="VILMFYW"
        nres_required_per_secstruct="1" 
        filter_helix="1" 
        filter_sheet="1"
        filter_loop="0" 
        min_helix_length="4"
        min_sheet_length="3"
        min_loop_length="1"/>
</FILTERS>""")

ss_contributes_core = xml.get_filter('ss_contributes_core')
# ss_contributes_core.compute(pose)  # verbose output; run it yourself to inspect the result

[0mcore.import_pose.import_pose: {0} [0mFile './data/1ubq_clean.pdb' automatically determined to be of type PDB
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mGenerating XML Schema for rosetta_scripts...
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0m...done
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mInitializing schema validator...
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0m...done
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mValidating input script...
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0m...done
[0mprotocols.rosetta_scripts.RosettaScriptsParser: {0} [0mParsed script:
<ROSETTASCRIPTS>
	<TASKOPERATIONS>
		<LayerDesign layer="core_boundary" name="layer_core_boundary" use_sidechain_neighbors="True" verbose="False"/>
	</TASKOPERATIONS>
	<FILTERS>
		<SecondaryStructureHasResidue filter_helix="1" filter_loop="0" filter_sheet="1" min_helix_length="4" min_loop_length="1" min_sheet_length="3" 

#### 6.8 LoopAnalyzerFilter
Uses LoopAnalyzerMover to compute several loop-related metrics:
- whether any omega angle in the loop backbone is abnormal
- whether the loop backbone contains a chainbreak
- whether the rama (backbone dihedral) energy term in the loop exceeds 20 energy units (highly unreasonable)

In [33]:
from pyrosetta.rosetta.protocols.loops.filters import LoopAnalyzerFilter
from pyrosetta.rosetta.protocols.loops import Loop, Loops

# Load the structure
pose = pose_from_pdb('./data/1ubq_clean.pdb')

# Define the loop region
loop = Loop(44, 56, 45)  # start_res, end_res, cut_res
loops = Loops()
loops.add_loop(loop)  # add the loop to the Loops object

loop_analyzer = LoopAnalyzerFilter()
loop_analyzer.set_loops(loops)
loop_analyzer.report_sm(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/1ubq_clean.pdb' automatically determined to be of type PDB
[0mprotocols.loops.filters.LoopAnalyzerFilter: {0} [0mrunning LoopAnalyzerFilter
[0mprotocols.analysis.LoopAnalyzerMover: {0} [0mrunning LoopAnalyzerMover
[0mprotocols.analysis.LoopAnalyzerMover: {0} [0mLoopAnalyzerMover will consider these positions (Rosetta numbering) - remember that it includes an extra residue on both sides of each loop, conditions permitting: 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
[0mprotocols.evaluation.ChiWellRmsdEvaluatorCreator: {0} [0mEvaluation Creator active ...


-32.30644512437591

#### 6.9 HelixPairing
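Checks whether the packing geometry between pairs of helices in the pose is reasonable. The filter first computes the secondary structure of the pose and then uses it to locate the helical segments.

For helix pairing, the filter provides three parameters: dist, cross, and align. As the "Filter condition" lines in the log below show, a pose passes only if each value is at or below its threshold; otherwise it is rejected.

- dist: the distance between the midpoints of the two helices;
- cross: the packing angle between the two helices, given by the angle between their helix vectors. Each helix vector is obtained by subtracting the xyz coordinates of the geometric centers of the C-terminal and N-terminal ends of the helix.
- align: when beta strands lie between the two helices, the helix vectors are first projected onto the beta strands, and the angle is then computed from the projections.

The most important setting is helix_pairings, whose general format is "helix_id1-helix_id2.Type":
- helix_id1/2: the index of each helix;
- Type: A or P, meaning antiparallel or parallel, respectively;

For example, to check the packing between the first and second helices, use "1-2.A": the pairing of helices 1 and 2, arranged antiparallel.

To check several pairings, separate them with semicolons: "1-2.A;2-3.A;1-3.P".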
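
The string format described above is easy to get wrong, so it can help to validate it before handing it to the filter. The sketch below is plain Python and independent of PyRosetta; `HelixPair` and `parse_helix_pairings` are hypothetical helpers written for this tutorial, not part of the Rosetta API.

```python
from typing import List, NamedTuple


class HelixPair(NamedTuple):
    helix1: int
    helix2: int
    orientation: str  # "A" = antiparallel, "P" = parallel


def parse_helix_pairings(spec: str) -> List[HelixPair]:
    """Parse a string such as "1-2.A;2-3.A" into a list of HelixPair tuples."""
    pairs = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        ids, _, orientation = item.partition(".")
        first, sep, second = ids.partition("-")
        if not sep or orientation not in ("A", "P"):
            raise ValueError(f"malformed helix pairing: {item!r}")
        pairs.append(HelixPair(int(first), int(second), orientation))
    return pairs


pairs = parse_helix_pairings("1-2.A;2-3.A;1-3.P")
for pair in pairs:
    print(pair)
```

Each parsed tuple corresponds to one packing check: helices 1 and 2 antiparallel, helices 2 and 3 antiparallel, and helices 1 and 3 parallel.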

In [34]:
from pyrosetta.rosetta.protocols.fldsgn.filters import HelixPairingFilter

# Load a three-helix topology
pose = pose_from_pdb('./data/denovo_helix.pdb')

# Define the filter
hpair_filter = HelixPairingFilter()
hpair_filter.helix_pairings('1-2.A;2-3.A')
# hpair_filter.dist(15)
# hpair_filter.cross_angle(45)
# hpair_filter.align_angle(25)
# hpair_filter.bend_angle(20)
hpair_filter.apply(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_helix.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 6 44
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 6 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 44 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 6 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 44 CYD
[0mprotocols.fldsgn.filters.HelixPairingFilter: {0} [0mHelix 1is bent, angle=30.2685
[0mprotocols.fldsgn.filters.HelixPairingFilter: {0} [0m Filter condition:
[0mprotocols.fldsgn.filters.HelixPairingFilter: {0} [0m bend ( intra helix ) <= 20
[0mprotocols.fldsgn.filters.HelixPairingFilter: {0} [0m dist <= 15
[0mprotocols.fldsgn.filters.HelixPairingFilter: {0} [0m cross <= 45
[0mprotocols.fldsgn.filters.HelixPairingFilter: {0} [0m align <= 25
[0mprotocols.fldsgn.filters.HelixPairingFilter: {0} [0m#### H

False

#### 6.10 HSSTriplet
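Evaluates a given helix-strand-strand (HSS) triplet. It computes the distance between the strand pair and the helix, and the angle between the sheet plane and the helix. The filter returns true if the distance lies between min_dist and max_dist and the angle lies between min_angle and max_angle.

Key parameters (defaults):
- min_dist="(7.5 &Real)" 
- max_dist="(13.0 &Real)" 
- min_angle="(-12.5 &Real)" 
- max_angle="(90.0 &Real)"

The key setting is add_hsstriplets, whose general format is "helix_id,strand_id1-strand_id2":
- helix_id: the index of the helix;
- strand_id1/2: the indices of the two beta strands;

To check several HSS packings, separate them with semicolons: "1,2-3;2,3-4".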
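
The pass/fail rule can be written down directly. The following plain-Python sketch parses the triplet string and applies the default distance and angle window. Both helpers are hypothetical, and which of the distances printed in the log the real filter compares against is an assumption here (the sketch uses hsheet_dist); the numbers are copied from the log of the example below.

```python
def parse_hss_triplets(spec):
    """Parse "helix,strand1-strand2;..." into (helix, strand1, strand2) tuples."""
    triplets = []
    for item in spec.split(";"):
        helix, _, strands = item.strip().partition(",")
        strand1, _, strand2 = strands.partition("-")
        triplets.append((int(helix), int(strand1), int(strand2)))
    return triplets


def hss_in_window(dist, angle, min_dist=7.5, max_dist=13.0,
                  min_angle=-12.5, max_angle=90.0):
    """Return True when both the distance and the angle fall inside their windows."""
    return min_dist <= dist <= max_dist and min_angle <= angle <= max_angle


triplets = parse_hss_triplets("1,2-3;2,3-4")
print(triplets)

# Values reported in the log for denovo_hee.pdb: hsheet_dist=9.14642, hs_angle=-150.393
passed = hss_in_window(dist=9.14642, angle=-150.393)
print(passed)
```

The distance is inside the 7.5 to 13.0 window, but the angle of about -150 degrees is far below min_angle, which is consistent with the "Filter failed" message in the log.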

In [35]:
from pyrosetta.rosetta.protocols.fldsgn.filters import HSSTripletFilter

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter
hss_filter = HSSTripletFilter()
hss_filter.add_hsstriplets('1,1-2')
# hss_filter.filter_max_angle(90)
# hss_filter.filter_min_angle(-12.5)
# hss_filter.filter_max_dist(13)
# hss_filter.filter_min_dist(7.5)
hss_filter.apply(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_hee.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.fldsgn.filters.HSSTripletFilter: {0} [0mHelix:1 Strand1:1 Strand2:2 hsheet_dist=9.14642, hs_angle=-150.393, hs_dist1=9.10143, hs_dist2=9.1914
[0mprotocols.fldsgn.filters.HSSTripletFilter: {0} [0m Filter failed !


False

### 7. Packing/Connectivity

#### 7.1 AverageDegree
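Computes the average degree of connectivity of a selected set of residues, i.e. the average number of residues lying within a defined distance of each selected residue.

When Rosetta is used to design a complex interface, it may over-optimize: rotamer conformations that look "perfect" may in fact be unstable in the monomer.

This filter helps discriminate non-interacting designs from natural complexes.

Parameters:
- threshold: the minimum number of residues that must lie within range of the selected residues (9.4)
- distance_threshold: the distance cutoff used for the calculation (8.0)
- task_operations: defines which residues the calculation is applied to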
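
As a rough illustration of what "average degree" means, the sketch below counts, for each selected point, how many other points lie within the distance cutoff and then averages the counts. It is a toy model in plain Python: one coordinate per residue, made-up coordinates, and the residue itself excluded from its own count are all assumptions, and the real filter may count neighbors differently.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_degree(coords, selected, distance_threshold=8.0):
    """Average number of neighbors within the cutoff over the selected indices."""
    degrees = []
    for i in selected:
        count = sum(
            1
            for j, other in enumerate(coords)
            if j != i and distance(coords[i], other) <= distance_threshold
        )
        degrees.append(count)
    return sum(degrees) / len(degrees)


# Hypothetical per-residue coordinates on a line (in Angstrom).
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0),
          (15.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
avg = average_degree(coords, selected=[0, 1], distance_threshold=8.0)
print(avg)
```

Residue 0 has one neighbor within 8 Å and residue 1 has two, so the average degree is 1.5. A design would pass when this average reaches the chosen threshold.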

In [36]:
from pyrosetta.rosetta.protocols.protein_interface_design.filters import AverageDegreeFilter
from pyrosetta.rosetta.core.pack.task import TaskFactory
from pyrosetta.rosetta.protocols.simple_task_operations import RestrictToInterface

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Task factory restricted to the protein-protein interface
tf = TaskFactory()
tf.push_back(RestrictToInterface())

# Define the filter
average_degree_filter = AverageDegreeFilter()
average_degree_filter.task_factory(tf)
average_degree_filter.distance_threshold(8)
average_degree_filter.threshold(9.4)
average_degree_filter.compute(complex_pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_binder.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 148 186
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYD
[0mprotocols.protein_interface_design.filters.AverageDegreeFilter: {0} [0mConnectivity of ALA19 is 11
[0mprotocols.protein_interface_design.filters.AverageDegreeFilter: {0} [0mConnectivity of LYS22 is 9
[0mprotocols.protein_interface_design.filters.AverageDegreeFilter: {0} [0mConnectivity of ILE23 is 11
[0mprotocols.protein_interface_design.filters.AverageDegreeFilter: {0} [0mConnectivity of ASP25 is 9
[0mprotocols.protein_interface_design.filters.AverageDegreeFilter: {0} [0mConnectivity of SER2

9.871794871794872

#### 7.2 PackStat
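A filter based on the packing statistic (packstat).

Parameters:
- threshold: the minimum value required to pass
- chain: the jump at which the complex is separated before packstat is computed; 0 means no separation
- repeats: how many times the calculation is repeated

In [37]:
from pyrosetta.rosetta.protocols.simple_filters import PackStatFilter

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter
packsat_filter = PackStatFilter()
packsat_filter.repeats_ = 5
packsat_filter.chain_ = 0  # this example is not a complex; set the jump number here if needed
packstate = packsat_filter.compute(pose)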

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_hee.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.filters.PackStatFilter: {0} [0mrepeat 1: packscore: 0.737467
[0mprotocols.filters.PackStatFilter: {0} [0mrepeat 2: packscore: 0.777977
[0mprotocols.filters.PackStatFilter: {0} [0mrepeat 3: packscore: 0.714652
[0mprotocols.filters.PackStatFilter: {0} [0mrepeat 4: packscore: 0.76972
[0mprotocols.filters.PackStatFilter: {0} [0mrepeat 5: packscore: 0.790474


#### 7.3 Holes
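Looks for cavities (voids) in the packing. It still uses Will Sheffler's packing code (packstat) to measure the volume of cavities inside the protein, but this filter also accepts a residue selector so that only part of the protein is scored. Note that the calculation is still run on the whole pose; only when the score is reported are the contributions summed over the atoms in the selector. (The Holes score is a sum of per-atom/per-residue scores.)

A positive result means more cavities than in native protein structures (from the PDB); a negative result means fewer.

**Important: DAlphaBall must be compiled and installed separately**:
- Two prebuilt DAlphaBall binaries (macOS and Ubuntu) are provided in the data folder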

In [38]:
from pyrosetta.rosetta.core.select.residue_selector import LayerSelector
from pyrosetta.rosetta.protocols.simple_filters import HolesFilter

# Initialize with the path to DAlphaBall
DAlphaBall_path = './data/DAlphaBall.macgcc'
init(f'-holes:dalphaball {DAlphaBall_path}')

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Select the core-layer residues
layer = LayerSelector()
layer.set_use_sc_neighbors(True)
layer.set_layers(1, 0, 0)  # pick core
layer.set_ball_radius(2.0)
layer.set_cutoffs(3.5, 1.5)  # residues with >= 4 neighbors are treated as core; suited to a miniprotein

# Define the filter
void_filter = HolesFilter()
void_filter.set_threshold(0)
void_filter.set_residue_selector(layer)  # set the selector
void_score = void_filter.compute(pose)
void_score = void_filter.compute(pose)

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
[0mcore.init: {0} [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: {0} [0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
[0mcore.init: {0} [0mcommand: PyRosetta -holes:dalphaball ./data/DAlphaBall.macgcc -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
[0mbasic.random.init_random_generator: {0} [0m'RNG device' seed mode, using '/dev/urandom', seed=1495719393 seed_offset=0 real_seed=1495719393 thread_index=0
[0mbasic.random.init_random_generator: {0} [0mRandomGenerator:init: 

#### 7.4 InterfaceHoles
Computes cavities at a protein-protein interface using Will Sheffler's packstat code. The reported score is the difference between the holes scores of the bound and unbound states. The -holes:dalphaball option must be set.

In [39]:
from pyrosetta.rosetta.core.select.residue_selector import LayerSelector
from pyrosetta.rosetta.protocols.protein_interface_design.filters import InterfaceHolesFilter

# Initialize with the path to DAlphaBall
DAlphaBall_path = './data/DAlphaBall.macgcc'
init(f'-holes:dalphaball {DAlphaBall_path}')

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Define the filter
jump_num = 1
interface_void_filter = InterfaceHolesFilter(jump_num, 200)
interface_void_filter.score(complex_pose)

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
[0mcore.init: {0} [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: {0} [0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
[0mcore.init: {0} [0mcommand: PyRosetta -holes:dalphaball ./data/DAlphaBall.macgcc -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
[0mbasic.random.init_random_generator: {0} [0m'RNG device' seed mode, using '/dev/urandom', seed=-1486178363 seed_offset=0 real_seed=-1486178363 thread_index=0
[0mbasic.random.init_random_generator: {0} [0mRandomGenerator:init

-0.5898811825867023

#### 7.5 ResInInterface
A filter based on the total number of residues at the interface.

In [40]:
from pyrosetta.rosetta.protocols.simple_filters import ResiduesInInterfaceFilter

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Define the filter
residues_cutoff = 20
jump_num = 1
interface_resnum_filter = ResiduesInInterfaceFilter(residues_cutoff, jump_num)
interface_resnum_filter.score(complex_pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_binder.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 148 186
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYD
[0mcore.scoring.ScoreFunctionFactory: {0} [0mSCOREFUNCTION: [32mref2015[0m


39.0

#### 7.6 ShapeComplementarity
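A filter based on interface shape complementarity. It computes the Lawrence & Colman shape complementarity coefficient (values of about 0.6-0.8 are typical); the larger the coefficient, the better the two rigid bodies fit together.

In [41]:
from pyrosetta.rosetta.protocols.simple_filters import ShapeComplementarityFilter

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Define the filter
jump_num = 1
sc_filter = ShapeComplementarityFilter()
sc_filter.jump_id(jump_num)  # jump separating the two rigid bodies
sc_filter.multicomp(True)
sc_filter.filtered_sc(0.6)  # threshold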
sc_filter.score(complex_pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_binder.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 148 186
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYD
[0mbasic.io.database: {0} [0mDatabase file opened: scoring/score_functions/sc/sc_radii.lib


0.7477233707904816

#### 7.7 SSShapeComplementarity
A filter based on the shape complementarity of secondary-structure elements. Each contiguous secondary-structure element in the pose is separated out in turn, and its ShapeComplementarity against the rest of the pose is computed.

Reference values:
For antibody-antigen interfaces, a value of 0.65-0.67 is typical, while complementarity among intra-protein secondary structure elements is typically higher, on the order of 0.7-0.8.

Note: sheet structures are not currently supported.

In [42]:
from pyrosetta.rosetta.protocols.denovo_design.filters import SSShapeComplementarityFilter

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter
ss_helix = SSShapeComplementarityFilter()
ss_helix.compute(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_hee.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.denovo_design.SSShapeComplementarityFilter: {0} [0mSUM=373.625; area=458.515; sc=0.814857; num_res=15
[0mprotocols.denovo_design.SSShapeComplementarityFilter: {0} [0mSUM=410.377; area=48.0544; sc=0.764812; num_res=1
[0mprotocols.denovo_design.SSShapeComplementarityFilter: {0} [0mSUM=429.074; area=26.9846; sc=0.692857; num_res=2
[0mprotocols.denovo_design.SSShapeComplementarityFilter: {0} [0mSUM=522.793; area=129.851; sc=0.721743; num_res=3
[0mprotocols.denovo_design.SSShapeComplementarityFilt

0.7873935628335386

### 8. Burial

#### 8.1 TotalSasa
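A filter based on the total solvent-accessible surface area (SASA) of the pose. It returns true when the SASA is above the set threshold.

Parameters:
- upper_threshold: the maximum SASA
- hydrophobic: only count the SASA associated with hydrophobic residues
- polar: only count the SASA associated with polar residues
- task_operations: report the SASA only for the residues that can be packed (as specified by the task operations). If none are given, the SASA of all residues is computed
- report_per_residue_sasa: report the SASA of each individual residue

In [43]:
from pyrosetta.rosetta.protocols.simple_filters import TotalSasaFilter

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter (hydrophobic SASA only)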
tsasa = TotalSasaFilter(lower_threshold=1200, hydrophobic=True, polar=False, upper_threshold=2000, report_per_residue_sasa=True)
tsasa.score(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_hee.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.simple_filters.TotalSasaFilter: {0} [0mASP1 A HYDROPHOBIC SASA : 30.7547
[0mprotocols.simple_filters.TotalSasaFilter: {0} [0mSER2 A HYDROPHOBIC SASA : 4.61109
[0mprotocols.simple_filters.TotalSasaFilter: {0} [0mLEU3 A HYDROPHOBIC SASA : 56.5835
[0mprotocols.simple_filters.TotalSasaFilter: {0} [0mHIS4 A HYDROPHOBIC SASA : 41.266
[0mprotocols.simple_filters.TotalSasaFilter: {0} [0mILE5 A HYDROPHOBIC SASA : 1.53703
[0mprotocols.simple_filters.TotalSasaFilter: {0} [0mASN6 A HYDROPHOBIC SASA : 

1484.7443573729436

#### 8.2 InterfaceSasaFilter
A filter based on the solvent-accessible surface area at a protein-protein interface. It returns true when the value is above the set threshold.

Parameters:
- upper_threshold: the maximum SASA
- jump: the interface jump across which the SASA is computed
- sym_dof_names: for a pose with a symmetry definition, specifies which interface's SASA is computed

In [44]:
from pyrosetta.rosetta.protocols.simple_filters import InterfaceSasaFilter
from pyrosetta.rosetta.core.pose.metrics import CalculatorFactory, simple_calculators

# Register the SASA calculator used for the interface SASA
calculator_factory = CalculatorFactory.Instance()
if not calculator_factory.check_calculator_exists("sasa"):
    sasa_calculator = simple_calculators.SasaCalculatorLegacy()
    calculator_factory.register_calculator("sasa", sasa_calculator)

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Define the filter
jump_num = 1
cutoff = 1200
dsasa = InterfaceSasaFilter(cutoff)  # 1200 Å^2
dsasa.add_jump(jump_num)
dsasa.score(complex_pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_binder.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 148 186
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYD
[0mprotocols.moves.RigidBodyMover: {0} [0mTranslate: Jump (before): RT 0.475346 -0.778522 0.409816 -0.138636 0.393705 0.908722 -0.868807 -0.488773 0.0792154 17.3442 -1.09276 -32.123
[0mprotocols.moves.RigidBodyMover: {0} [0mTranslate: Jump (after):  RT 0.475346 -0.778522 0.409816 -0.138636 0.393705 0.908722 -0.868807 -0.488773 0.0792154 501.418 498.401 -750.578


1605.8146206639158

#### 8.3 ResidueBurial
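In short, this filter counts the other residues lying within a given interaction distance (the distance option) of a target residue. With neighbors set to 1, it simply checks whether any residue is present near the protein-protein interface.

Parameters:
- residue_fraction_buried: the fraction of all residues defined as designable by the task operations; default is 0.0001

In [45]:
from pyrosetta.rosetta.protocols.simple_filters import ResidueBurialFilter

# Load the complex structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Define the filter
rb_filter = ResidueBurialFilter()
# rb_filter.neighbors(1)  # only check whether any residue lies near the interface
rb_filter.residue('30')  # pose index of the residue to check
# rb_filter.residue_fraction_buried(1.0)  # 1.0 means every residue must be buried to pass the filter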
rb_filter.apply(complex_pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_binder.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 148 186
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 148 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 186 CYD
[0mprotocols.simple_filters.ResidueBurialFilter: {0} [0mResidue: 30 is serialized to: 30
[0mprotocols.simple_filters.ResidueBurialFilter: {0} [0mchain span 1 142
[0mprotocols.simple_filters.ResidueBurialFilter: {0} [0mNumber of interface neighbors of residue THR30 is 2


True

#### 8.4 BuriedSurfaceArea
Computes the buried surface area of a pose or of a selection. Returns false if the value is below the set threshold. The filter applies only to canonical amino acids in L- or D-form; every other residue type contributes 0.

Parameters:
- select_only_FAMILYVW: true means only F, A, M, I, L, Y, V, and W residues are counted; false means all residues are counted. The set is intersected with the residue_selector.
- filter_out_low: default true, meaning a pose/selection whose value is below the threshold is rejected
- cutoff_buried_surface_area: default 500
- atom_mode: default "all_atoms"; "hydrophobic_atoms" and "polar_atoms" are also accepted

In [46]:
from pyrosetta.rosetta.core.pose.metrics import CalculatorFactory, simple_calculators
from pyrosetta.rosetta.protocols.simple_filters import BuriedSurfaceAreaFilter

# Register the SASA calculator
calculator_factory = CalculatorFactory.Instance()
if not calculator_factory.check_calculator_exists("sasa"):
    sasa_calculator = simple_calculators.SasaCalculatorLegacy()
    calculator_factory.register_calculator("sasa", sasa_calculator)

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter
bsa = BuriedSurfaceAreaFilter()
bsa.set_filter_out_low(True)
bsa.set_cutoff_buried_surface_area(500)
bsa.set_atom_mode("hydrophobic_atoms")
bsa.set_select_only_FAMILYVW(True)
# bsa.residue_selector()  # set a residue selector here if needed
bsa.score(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_hee.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.simple_filters.BuriedSurfaceAreaFilter: {0} [0mRES	BURIED AREA (A^2)
[0mprotocols.simple_filters.BuriedSurfaceAreaFilter: {0} [0mASP1	0.0
[0mprotocols.simple_filters.BuriedSurfaceAreaFilter: {0} [0mSER2	0.0
[0mprotocols.simple_filters.BuriedSurfaceAreaFilter: {0} [0mLEU3	126.417
[0mprotocols.simple_filters.BuriedSurfaceAreaFilter: {0} [0mHIS4	0.0
[0mprotocols.simple_filters.BuriedSurfaceAreaFilter: {0} [0mILE5	180.463
[0mprotocols.simple_filters.BuriedSurfaceAreaFilter: {0} [0mASN6	0.0


1410.7419778812703

#### 8.5 ExposedHydrophobics
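Computes the SASA of each hydrophobic residue (A, F, I, M, L, W, V, Y). The score reflects both how many hydrophobic residues are solvent-exposed and how exposed they are: for every hydrophobic residue whose SASA exceeds the cutoff (sasa_cutoff, default 20), the value SASA - sasa_cutoff is added to the score. The filter returns true if the final score is below the defined threshold.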
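
The scoring rule described above amounts to a thresholded sum, sketched here in plain Python. The function name, the sequence, and the per-residue SASA values are hypothetical; only the residue set and the default cutoff of 20 come from the description, and the real filter computes SASA from the pose itself.

```python
HYDROPHOBIC = set("AFIMLWVY")


def exposed_hydrophobics_score(sequence, sasa_per_residue, sasa_cutoff=20.0):
    """Sum (SASA - sasa_cutoff) over hydrophobic residues whose SASA exceeds the cutoff."""
    score = 0.0
    for aa, sasa in zip(sequence, sasa_per_residue):
        if aa in HYDROPHOBIC and sasa > sasa_cutoff:
            score += sasa - sasa_cutoff
    return score


# Hypothetical one-letter sequence and per-residue SASA values (in Å^2).
sequence = "DSLHIN"
sasa = [80.0, 40.0, 56.5, 41.0, 1.5, 60.0]
score = exposed_hydrophobics_score(sequence, sasa)
print(score)
```

Only the leucine contributes here (56.5 - 20 = 36.5): the isoleucine is below the cutoff, and the polar residues are ignored however exposed they are.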

In [47]:
from pyrosetta.rosetta.protocols.denovo_design.filters import ExposedHydrophobicsFilter

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter
expose_hdro_filter = ExposedHydrophobicsFilter()
expose_hdro_filter.score(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_hee.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.DsspMover: {0} [0mLEEEEEELLEEEEEELLLHHHHHHHHHHHHHHHL
[0mprotocols.denovo_design.ExposedHydrophobicsFilter: {0} [0mExposedHydrophobics value=97.5573


97.55734626536554

### 9. Comparison

#### 9.1 Rmsd
A filter that computes the RMSD between the current pose and a reference pose (ref_pose).
Note: pose and ref_pose must have the same length.
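
The same-length requirement follows from how RMSD is defined: atoms are compared pairwise, position by position. A minimal plain-Python sketch of the calculation without superposition is shown below; the function and the coordinates are illustrative assumptions, not the filter's implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical CA coordinates; only the last atom differs, by 3 Å along y.
pose_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ref_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
value = rmsd(pose_ca, ref_ca)
print(value)
```

One atom displaced by 3 Å out of three gives sqrt(9 / 3), about 1.73 Å. With superposition enabled, the two coordinate sets would first be optimally aligned, which generally lowers the value.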

In [48]:
from pyrosetta.rosetta.protocols.protein_interface_design.filters import RmsdFilter
from pyrosetta.rosetta.core.select.residue_selector import ChainSelector

# Load the PDB files
pose = pose_from_pdb('./data/pose.pdb')
ref_pose = pose_from_pdb('./data/ref_pose.pdb')

# Define the filter
rmsd_filter = RmsdFilter()
rmsd_filter.reference_pose(ref_pose)
rmsd_filter.set_selection(ChainSelector(1))  # residue selector
rmsd_filter.superimpose(False)
rmsd_filter.compute(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/pose.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mcore.import_pose.import_pose: {0} [0mFile './data/ref_pose.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 3 23
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 3 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 23 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 3 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 23 CYD
[0mcore.conformation.Conformation: {0} [0mFound

2.5254119698867075

#### 9.2 IRmsd(xmlobject)
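Computes the RMSD over the interface, using all backbone atoms of the interface residues. Interface residues are all residues within 8 Å of the interface on either side.

It is typically used to assess how much two complex conformations from docking differ.

In [49]:
from pyrosetta.rosetta.protocols import rosetta_scripts 
from pyrosetta import pose_from_pdb, init

# The native (reference) structure is the same file as the pose, so an IRmsd close to 0 is expected
init('-native ./data/denovo_binder.pdb')

# Load the structure
complex_pose = pose_from_pdb('./data/denovo_binder.pdb')

# Define the filter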
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <IRmsd name="irmsd" jump="1" threshold="5" />
</FILTERS>
''')
irmsd_filter = xml.get_filter('irmsd')
irmsd_filter.compute(complex_pose)

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
[0mcore.init: {0} [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: {0} [0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
[0mcore.init: {0} [0mcommand: PyRosetta -native ./data/denovo_binder.pdb -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
[0mbasic.random.init_random_generator: {0} [0m'RNG device' seed mode, using '/dev/urandom', seed=-1513252079 seed_offset=0 real_seed=-1513252079 thread_index=0
[0mbasic.random.init_random_generator: {0} [0mRandomGenerator:init: Normal m

4.731549552161596e-07

#### 9.3 RmsdFromResidueSelectorFilter
Similar to the Rmsd filter, except that separate ResidueSelectors are supplied for the reference pose and the current pose.

**Important: ref_pose and pose may differ in length, but the two selectors must select the same number of residues**

In [50]:
from pyrosetta.rosetta.protocols.fold_from_loops.filters import RmsdFromResidueSelectorFilter
from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector

# Load the PDB files
pose = pose_from_pdb('./data/pose.pdb')
ref_pose = pose_from_pdb('./data/ref_pose.pdb')

# selection
ri_sel = ResidueIndexSelector('1-10')

# Define the filter
sel_rmsd_filter = RmsdFromResidueSelectorFilter()
sel_rmsd_filter.CA_only(True)
sel_rmsd_filter.reference_pose(ref_pose)
sel_rmsd_filter.reference_selector(ri_sel)
sel_rmsd_filter.query_selector(ri_sel)
sel_rmsd_filter.superimpose(True)
sel_rmsd_filter.compute(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/pose.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mcore.import_pose.import_pose: {0} [0mFile './data/ref_pose.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 3 23
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 3 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 23 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 3 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 23 CYD
[0mcore.conformation.Conformation: {0} [0mFound

0.4574945569038391
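
The length rule for this filter can be illustrated without PyRosetta. The following is a minimal sketch in plain Python, not the filter's implementation: `selection_rmsd` and the toy coordinates are hypothetical, and the RMSD is computed without superposition (roughly what one would expect with `superimpose(False)`), since a superposition step would need a linear-algebra library.

```python
import math

def selection_rmsd(ref_coords, query_coords, ref_sel, query_sel):
    """RMSD between selected CA coordinates of two structures (no superposition).

    ref_coords / query_coords: lists of (x, y, z), one entry per residue.
    ref_sel / query_sel: 1-based residue indices, paired in order.
    """
    if len(ref_sel) != len(query_sel):
        raise ValueError("selections must contain the same number of residues")
    sq_sum = 0.0
    for i, j in zip(ref_sel, query_sel):
        sq_sum += sum((a - b) ** 2
                      for a, b in zip(ref_coords[i - 1], query_coords[j - 1]))
    return math.sqrt(sq_sum / len(ref_sel))

# Hypothetical toy data: the reference has 5 residues, the query only 3.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
       (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
query = [(3.8, 1.0, 0.0), (7.6, 1.0, 0.0), (11.4, 1.0, 0.0)]

# The structures differ in length, but both selections contain 3 residues.
rmsd = selection_rmsd(ref, query, ref_sel=[2, 3, 4], query_sel=[1, 2, 3])
print(rmsd)
```

Each selected pair is offset by exactly 1 along y, so the RMSD of the toy data is 1.0. Passing selections of different sizes raises a `ValueError`, which mirrors the constraint stated above.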

#### 9.4 SequenceRecovery
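Filter on how much of the reference pose's sequence the current pose recovers. The designable residues are specified through task_operations. If -in:file:native is given on the command line, the pose read from that file is used as the reference; otherwise the starting pose is used.

Parameters:
- rate_threshold: minimum recovery rate required to pass.
- mutation_threshold: maximum number of mutations allowed.
- report_mutations: false by default (mutations are not reported). If true, the filter uses the number of mutations instead of the recovery rate.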

In [51]:
from pyrosetta.rosetta.protocols.protein_interface_design.filters import SequenceRecoveryFilter
from pyrosetta.rosetta.core.pack.task import TaskFactory

# Load the PDB files
pose = pose_from_pdb('./data/pose.pdb')
ref_pose = pose_from_pdb('./data/ref_pose.pdb')

# Define the TaskFactory (a default one is fine)
tf = TaskFactory()

# Define the filter
sr_filter = SequenceRecoveryFilter()
sr_filter.mutation_threshold(999)
sr_filter.reference_pose(ref_pose)
sr_filter.task_factory(tf)  # sets the packable residues
sr_filter.score(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/pose.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mcore.import_pose.import_pose: {0} [0mFile './data/ref_pose.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 3 23
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 3 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 23 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 3 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 23 CYD
[0mcore.conformation.Conformation: {0} [0mFound

0.34375
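
To make the two pass criteria concrete, here is a small plain-Python sketch of the idea, not the Rosetta implementation. The helper names `sequence_recovery` and `passes`, the toy sequences, and the threshold values are all hypothetical. The sketch assumes the recovery rate is the fraction of designable positions whose residue matches the reference.

```python
def sequence_recovery(ref_seq, seq, designable):
    """Return (recovery_rate, n_mutations) over the designable positions (1-based)."""
    if len(ref_seq) != len(seq):
        raise ValueError("sequences must have the same length")
    n_mut = sum(1 for i in designable if ref_seq[i - 1] != seq[i - 1])
    rate = 1.0 - n_mut / len(designable) if designable else 1.0
    return rate, n_mut

def passes(rate, n_mut, rate_threshold, mutation_threshold, report_mutations):
    """Assumed decision rule: filter on mutation count or on recovery rate."""
    if report_mutations:
        return n_mut <= mutation_threshold
    return rate >= rate_threshold

# Hypothetical toy sequences: positions 4 and 7 differ from the reference.
ref_seq = "MQIFVKTLTG"
cur_seq = "MQIYVKSLTG"
designable = list(range(1, 11))  # every position is designable in this example

rate, n_mut = sequence_recovery(ref_seq, cur_seq, designable)
print(rate, n_mut)
print(passes(rate, n_mut, rate_threshold=0.9, mutation_threshold=5, report_mutations=False))
print(passes(rate, n_mut, rate_threshold=0.9, mutation_threshold=5, report_mutations=True))
```

With two mutations in ten designable positions, the same design fails a rate threshold of 0.9 but passes a limit of five mutations, which is why the choice of `report_mutations` matters.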

### 10. bonding

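#### 10.1 ChainBreak (not working)
Filter on the number of chain breaks in the pose. A break is counted when a peptide bond length deviates from the mean bond length (1.33 Å) by more than the tolerance (0.13 Å by default).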

In [52]:
from pyrosetta.rosetta.protocols.simple_filters import ChainBreak

# Load the structure
pose = pose_from_pdb('./data/break_pose.pdb')

# Define the filter
chain_break_filter = ChainBreak()
chain_break_filter.chain_num(1)  # chain number to check
chain_break_filter.compute(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/break_pose.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.simple_filters.ChainBreak: {0} [0mWill check peptide bond lengths between 1 to 33
[0mprotocols.simple_filters.ChainBreak: {0} [0mbond length tolerance value is:0.13


0
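
Since the filter itself is marked as not working here, the criterion described above can be checked by hand. The sketch below is a plain-Python illustration under stated assumptions: `count_chain_breaks` and the toy coordinates are hypothetical, and the inputs are the backbone C and N coordinates of consecutive residues in one chain.

```python
import math

MEAN_PEPTIDE_BOND = 1.33  # mean C-N peptide bond length quoted in the text

def count_chain_breaks(c_coords, n_coords, tolerance=0.13):
    """Count C(i)-N(i+1) bonds whose length deviates from 1.33 by more than tolerance.

    c_coords[i] / n_coords[i]: (x, y, z) of the backbone C / N atom of residue i.
    """
    breaks = 0
    for c_xyz, n_xyz in zip(c_coords[:-1], n_coords[1:]):
        length = math.sqrt(sum((a - b) ** 2 for a, b in zip(c_xyz, n_xyz)))
        if abs(length - MEAN_PEPTIDE_BOND) > tolerance:
            breaks += 1
    return breaks

# Hypothetical toy chain laid out along the x axis: 4 residues, with a gap
# between residue 3 and residue 4.
n_atoms = [(0.0, 0.0, 0.0), (3.73, 0.0, 0.0), (7.46, 0.0, 0.0), (13.0, 0.0, 0.0)]
c_atoms = [(2.4, 0.0, 0.0), (6.13, 0.0, 0.0), (9.86, 0.0, 0.0), (15.4, 0.0, 0.0)]

n_breaks = count_chain_breaks(c_atoms, n_atoms)
print(n_breaks)
```

The first two bonds are 1.33 long and the third is 3.14, so the toy chain contains one break. With real structures the coordinates would come from the pose's backbone atoms.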

#### 10.2 HbondsToResidue
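Filter on the number of hydrogen bonds made to a given residue. It counts the residues that form hydrogen bonds with the target residue, where each hydrogen bond must pass the energy_cutoff. To include backbone-backbone hydrogen bonds, enable the bb_bb option.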

In [53]:
from pyrosetta.rosetta.protocols.protein_interface_design.filters import HbondsToResidueFilter

# Load the structure
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter
hbond_filter = HbondsToResidueFilter()
hbond_filter.set_resnum(5)
hbond_filter.set_bb_bb(True)  # include backbone-backbone hydrogen bonds?
hbond_filter.set_sidechain(True)  # check side-chain hydrogen bonds
hbond_filter.set_from_same_chain(True)  # count hydrogen bonds from residues on the same chain
hbond_filter.set_from_other_chains(False)  # count hydrogen bonds to resnum from residues on other chains
# hbond_filter.set_selector()  # if set, only the selected region is checked for hydrogen bonds to resnum
hbond_filter.apply(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/denovo_hee.pdb' automatically determined to be of type PDB
[0mcore.conformation.Conformation: {0} [0mFound disulfide between residues 18 20
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYS
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 18 CYD
[0mcore.conformation.Conformation: {0} [0mcurrent variant for 20 CYD
[0mprotocols.protein_interface_design.filters.HbondsToResidueFilter: {0} [0mNo scorefunction loaded.  Getting global default scorefunction.
[0mcore.scoring.ScoreFunctionFactory: {0} [0mSCOREFUNCTION: [32mref2015[0m
[0mprotocols.protein_interface_design.design_utils: {0} [0m         5        12         I         F     0.000    -2.211     0.000     0.000     4.712
         5        12         I         F     0.000    -2.211     0.000     0.000     4.712
[0mprotocols.protein_interface_design.filters.HbondsTo

True
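
The counting rule can be sketched in plain Python as follows. This is an illustration only, not the Rosetta code: the `Hbond` record, `count_partners`, and the toy hydrogen-bond list are hypothetical, and the sketch assumes that a hydrogen bond "passes" the cutoff when its energy is at or below it (more negative means stronger).

```python
from typing import NamedTuple

class Hbond(NamedTuple):
    don_res: int         # donor residue index
    acc_res: int         # acceptor residue index
    energy: float        # hydrogen-bond energy (more negative = stronger)
    backbone_only: bool  # True if both atoms are backbone atoms

def count_partners(hbonds, resnum, energy_cutoff, bb_bb):
    """Count distinct residues hydrogen-bonded to resnum under the given options."""
    partners = set()
    for hb in hbonds:
        if resnum not in (hb.don_res, hb.acc_res):
            continue  # this bond does not involve the target residue
        if hb.energy > energy_cutoff:
            continue  # too weak to count
        if hb.backbone_only and not bb_bb:
            continue  # backbone-backbone bonds are switched off
        partner = hb.acc_res if hb.don_res == resnum else hb.don_res
        if partner != resnum:
            partners.add(partner)
    return len(partners)

# Hypothetical hydrogen-bond list for a small structure.
hbonds = [
    Hbond(5, 12, -2.2, True),    # backbone-backbone bond to residue 12
    Hbond(9, 5, -0.3, False),    # weaker than the cutoff used below
    Hbond(5, 20, -1.1, False),   # side-chain bond to residue 20
    Hbond(7, 8, -1.5, True),     # does not involve residue 5
]

with_bb = count_partners(hbonds, resnum=5, energy_cutoff=-0.5, bb_bb=True)
without_bb = count_partners(hbonds, resnum=5, energy_cutoff=-0.5, bb_bb=False)
print(with_bb, without_bb)
```

In the toy data residue 5 has two partners when backbone-backbone bonds are counted and one when they are not, which shows why bb_bb has to be enabled for backbone contacts.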

#### 10.3 SimpleHbondsToAtom
Filter on the number of hydrogen bonds made to a given atom. It checks whether the target atom has at least n_partners hydrogen-bond partners.

In [1]:
from pyrosetta.rosetta.protocols import rosetta_scripts
from pyrosetta import init, pose_from_pdb

# Load the complex structure
init()
pose = pose_from_pdb('./data/denovo_hee.pdb')

# Define the filter for atom O of residue 26 (a glutamate)
xml = rosetta_scripts.XmlObjects.create_from_string('''
<FILTERS>
    <SimpleHbondsToAtomFilter name="atom_hbonds_filter" n_partners="1" hb_e_cutoff="-0.5"
                              target_atom_name="O" res_num="26"/>

</FILTERS>
''')

atom_hbonds_filter = xml.get_filter('atom_hbonds_filter')
atom_hbonds_filter.apply(pose)

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
[0mcore.init: {0} [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: {0} [0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12
[0mcore.init: {0} [0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
[0mbasic.random.init_random_generator: {0} [0m'RNG device' seed mode, using '/dev/urandom', seed=817792676 seed_offset=0 real_seed=817792676 thread_index=0
[0mbasic.random.init_random_generator: {0} [0mRandomGenerator:init: Normal mode, seed=817792676 RG_t

True

#### 10.4 PeptideInternalHbondsFilter
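Filter on the number of hydrogen bonds within a pose or a selection. exclusion_distance excludes hydrogen bonds between residues that lie within the given distance of each other in the primary sequence.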

In [55]:
from pyrosetta.rosetta.core.select.residue_selector import ChainSelector
from pyrosetta.rosetta.protocols.cyclic_peptide import PeptideInternalHbondsFilter

# Load the structure
init()
pose = pose_from_pdb('./data/t6c.40.92.pdb')
pep_selector = ChainSelector(1)

# Define the filter
total_hbonds = PeptideInternalHbondsFilter()
total_hbonds.set_hbond_cutoff(2)  # hydrogen-bond count threshold (2 here)
total_hbonds.set_exclusion_distance(1)
total_hbonds.set_residue_selector(pep_selector)
total_hbonds.score(pose)

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
[0mcore.init: {0} [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: {0} [0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
[0mcore.init: {0} [0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
[0mbasic.random.init_random_generator: {0} [0m'RNG device' seed mode, using '/dev/urandom', seed=-1780545172 seed_offset=0 real_seed=-1780545172 thread_index=0
[0mbasic.random.init_random_generator: {0} [0mRandomGenerator:init: Normal mode, seed=-178054517

4.0
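
The effect of exclusion_distance can be sketched in plain Python. This is a simplified illustration, not the filter's code: `count_internal_hbonds` and the toy pairs are hypothetical, sequence separation is treated as linear (no wrap-around for cyclic peptides), and the sketch assumes that pairs separated by exclusion_distance positions or fewer are skipped.

```python
def count_internal_hbonds(hbond_pairs, exclusion_distance):
    """Count hydrogen bonds between residues more than exclusion_distance apart.

    hbond_pairs: iterable of (donor_residue, acceptor_residue) indices.
    """
    return sum(1 for don, acc in hbond_pairs
               if abs(don - acc) > exclusion_distance)

# Hypothetical donor/acceptor residue pairs within one peptide.
pairs = [(1, 2), (1, 4), (3, 7), (5, 6), (2, 8)]

n_excl_1 = count_internal_hbonds(pairs, exclusion_distance=1)
n_excl_3 = count_internal_hbonds(pairs, exclusion_distance=3)
print(n_excl_1, n_excl_3)
```

With an exclusion distance of 1, the two bonds between sequence neighbours are dropped and three remain; raising it to 3 also drops the (1, 4) pair. The resulting count would then be compared with the hbond_cutoff threshold.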

#### 10.5 BuriedUnsatHbonds
Filter on the maximum number of buried unsatisfied hydrogen bonds.

More information: https://new.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Filters/filter_pages/BuriedUnsatHbondsFilter

A working (bug-free) version is given below:

In [56]:
from pyrosetta.rosetta.protocols.simple_filters import BuriedUnsatHbondFilter
# Initialize PyRosetta with the path to the DAlphaBall executable
DAlphaBall_path = './data/DAlphaBall.macgcc'
init(f'-holes:dalphaball {DAlphaBall_path}')

# Load the structure
pose = pose_from_pdb('./data/t6c.40.92.pdb')

# Define the filter
cutoff = 0
byhf = BuriedUnsatHbondFilter(cutoff)
byhf.set_report_all_heavy_atom_unsats(True)
byhf.set_residue_surface_cutoff(20)
byhf.set_ignore_surface_res(True)
byhf.set_dalphaball_sasa()
byhf.set_probe_radius(1.1)  # a probe_radius≈1.1 best correlates with new_buns_all_heavy
byhf.compute(pose)

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
[0mcore.init: {0} [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: {0} [0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
[0mcore.init: {0} [0mcommand: PyRosetta -holes:dalphaball ./data/DAlphaBall.macgcc -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
[0mbasic.random.init_random_generator: {0} [0m'RNG device' seed mode, using '/dev/urandom', seed=1814558810 seed_offset=0 real_seed=1814558810 thread_index=0
[0mbasic.random.init_random_generator: {0} [0mRandomGenerator:init: 

0.0

#### 10.6 OversaturatedHbondAcceptorFilter
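Filter on oversaturated hydrogen-bond acceptors. An acceptor that receives hydrogen bonds from more than one donor falls into this category, which is treated as physically unrealistic.

Key parameters:
- max_allowed_oversaturated: maximum number of oversaturated acceptors allowed. The default is 0, i.e. a passing pose contains no oversaturated hydrogen-bond acceptors.
- set_consider_mainchain_only: whether to consider only main-chain hydrogen bonds.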

In [57]:
from pyrosetta.rosetta.core.select.residue_selector import ChainSelector
from pyrosetta.rosetta.protocols.cyclic_peptide import OversaturatedHbondAcceptorFilter

# Load the structure
pose = pose_from_pdb('./data/t6c.40.92.pdb')
pep_selector = ChainSelector(1)

# Define the filter
overhbond = OversaturatedHbondAcceptorFilter()
overhbond.set_consider_mainchain_only(False)  # also consider side-chain hydrogen bonds
overhbond.set_max_allowed_oversaturated(0)
overhbond.set_acceptor_selector(pep_selector)
overhbond.set_donor_selector(pep_selector)
overhbond.score(pose)

[0mcore.import_pose.import_pose: {0} [0mFile './data/t6c.40.92.pdb' automatically determined to be of type PDB
[0mcore.scoring.ScoreFunctionFactory: {0} [0mSCOREFUNCTION: [32mref2015[0m


0.0
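
The counting behind this filter can be sketched in plain Python. The sketch is illustrative rather than the Rosetta implementation: `count_oversaturated`, the `max_donors_per_acceptor` parameter, and the atom labels are hypothetical. The default of one donor per acceptor simply follows the description above.

```python
from collections import Counter

def count_oversaturated(hbonds, max_donors_per_acceptor=1):
    """Count acceptor atoms that receive hydrogen bonds from too many donors.

    hbonds: iterable of (donor_atom, acceptor_atom) labels such as ("3:N", "7:O").
    """
    # De-duplicate first so that a repeated record is not counted twice.
    donors_per_acceptor = Counter(acc for _, acc in set(hbonds))
    return sum(1 for n in donors_per_acceptor.values()
               if n > max_donors_per_acceptor)

# Hypothetical hydrogen bonds: the carbonyl O of residue 7 accepts from two donors.
hbonds = [
    ("3:N", "7:O"),
    ("4:N", "7:O"),
    ("8:N", "2:O"),
    ("9:OG", "2:OD1"),
]

n_over = count_oversaturated(hbonds)
max_allowed_oversaturated = 0
print(n_over, n_over <= max_allowed_oversaturated)
```

The toy data contains one oversaturated acceptor, so a pose like this would not pass with `max_allowed_oversaturated` set to 0.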