-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added peakfinding utility #8
Merged
Merged
Changes from 1 commit
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
adc21f3
added peakfinding utility
75d07f6
fix typo
822189f
added option for map weights
b6799b6
add separate entry point for diff maps peaks
591265b
sort outpute by `peakz`
f2941e4
fix obsolete parser reference
2f16522
fix sort for empty results
2316281
add -d as alternate to --distance-cutoff
8bc77b4
move z-scores up in columns
322ac09
add default sigma cutoff 1.5/3.0 regular/diff maps
442a87e
fixes for broken parse args + sort by abs peakz
39a0a36
remove legacy __main__ clause
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,235 @@ | ||
import warnings | ||
import pandas as pd | ||
import numpy as np | ||
import gemmi | ||
|
||
|
||
def peak_report( | ||
structure, | ||
grid, | ||
sigma_cutoff, | ||
min_volume = 0., | ||
min_score = 0., | ||
min_peak = 0., | ||
distance_cutoff = 4., | ||
use_long_names = False, | ||
negate=False, | ||
): | ||
""" | ||
Build a report summarizing peaks in a map which are in the vicinity of atoms in the structure. | ||
|
||
For example, | ||
|
||
```python | ||
structure = gemmi.read_pdb(pdb_file_name) | ||
mtz = gemmi.read_mtz_file(mtz_file_name) | ||
grid = mtz.transform_f_phi_to_map( | ||
"ANOM", "PHANOM", sample_rate=3.0 | ||
) | ||
|
||
report = peak_report( | ||
structure, grid, | ||
sigma_cutoff=10., | ||
) | ||
``` | ||
will find peaks in an anomalous difference map above 10 sigma. | ||
|
||
For difference maps, it might make sense to use a pattern such as, | ||
|
||
```python | ||
report = peak_report(structure, grid, sigma_cutoff=3.5), | ||
report = pd.concat(( | ||
report, | ||
peak_report(structure, grid, sigma_cutoff=3.5, negate=True), | ||
)) | ||
``` | ||
which will find positive and negative difference map peaks above | ||
3.5 sigma and concatenate them into the same report. | ||
|
||
|
||
Parameters | ||
---------- | ||
structure : gemmi.Structure | ||
The structure which will be searched for atoms near the electron density peaks. | ||
grid : gemmi.FloatGrid | ||
This is an electron density map in the form of a gemmi.FloatGrid instance. | ||
sigma_cutoff : float | ||
The z-score cutoff at which pixels are thresholded for peak finding. | ||
min_volume : float (optional) | ||
The minimum volume of peaks which defaults to zero. | ||
min_score : float (optional) | ||
The minimum score of peaks which defaults to zero. See gemmi.find_blobs_by_flood_fill | ||
min_peak : float (optional) | ||
The minimum peak height of peaks which defaults to zero. See gemmi.find_blobs_by_flood_fill | ||
distance_cutoff : float (optional) | ||
This is the radius around atoms within which peaks will be kept. The default is 4 Angstroms. | ||
Making this number large may impact performance. | ||
use_log_names : bool (optional) | ||
Optionally use more descriptive column names for the report. These may contain characters that | ||
make them less pleasant to work with in pandas. The default is False. | ||
negate : bool (optional) | ||
Optionally find peaks in the negative electron density. This can be useful for difference maps. | ||
The default is False. | ||
|
||
Returns | ||
------- | ||
peak_report : pd.DataFrame | ||
A dataframe summarizing the locations of found peaks and how they correspond to atoms in the structure. | ||
""" | ||
|
||
if len(structure) > 1: | ||
warnings.warn( | ||
f"Multi-model PDBs are not supported. Using first model from file {pdb_file}.", | ||
UserWarning | ||
) | ||
|
||
cell = structure.cell | ||
model = structure[0] | ||
|
||
#Compute z-score cutoff | ||
mean,sigma = np.mean(grid),np.std(grid) | ||
cutoff = mean + sigma_cutoff * sigma | ||
|
||
#In gemmi peaks are blobs. So it goes. | ||
#This returns a list of `gemmi.Blob` objects | ||
blobs = gemmi.find_blobs_by_flood_fill( | ||
grid, | ||
cutoff=cutoff, | ||
min_volume=min_volume, | ||
min_score=min_score, | ||
min_peak=min_peak, | ||
negate=negate, | ||
) | ||
|
||
#This neighbor search object can find the atoms closest to query positions | ||
ns = gemmi.NeighborSearch(model, structure.cell, distance_cutoff).populate() | ||
|
||
peaks = [] | ||
for blob in blobs: | ||
#This is a list of weird pointer objects. It is safest to convert them `gemmi.CRA` objects (see below) | ||
marks = ns.find_atoms(blob.centroid) | ||
if len(marks) == 0: | ||
continue | ||
|
||
cra = dist = None | ||
for mark in marks: | ||
image_idx = mark.image_idx | ||
_cra = mark.to_cra(model) | ||
_dist = cell.find_nearest_pbc_image(blob.centroid, _cra.atom.pos, mark.image_idx).dist() | ||
if cra is None: | ||
dist = _dist | ||
cra = _cra | ||
elif _dist < dist: | ||
dist = _dist | ||
cra = _cra | ||
|
||
record = { | ||
"chain" : cra.chain.name, | ||
"seqid" : cra.residue.seqid.num, | ||
"residue" : cra.residue.name, | ||
"atom" : cra.atom.name, | ||
"dist" : dist, | ||
"peak" : blob.peak_value, | ||
"peakz" : (blob.peak_value-mean)/sigma, | ||
"score" : blob.score, | ||
"scorez" : (blob.score-mean)/sigma, | ||
"cenx" : blob.centroid.x, | ||
"ceny" : blob.centroid.y, | ||
"cenz" : blob.centroid.x, | ||
"coordx" : cra.atom.pos.x, | ||
"coordy" : cra.atom.pos.y, | ||
"coordz" : cra.atom.pos.z, | ||
} | ||
if negate: | ||
negative_keys = ['peak', 'peakz', 'score', 'scorez'] | ||
for k in negative_keys: | ||
record[k] = -record[k] | ||
peaks.append(record) | ||
|
||
long_names = { | ||
"chain" : "Chain", | ||
"seqid" : "SeqID", | ||
"residue" : "Residue", | ||
"name" : "Atom Name", | ||
"dist" : "Dist (Å)", | ||
"peak" : "Peak Value", | ||
"peakz" : "Peak Value (Z-score)", | ||
"score" : "Peak Score", | ||
"scorez" : "Peak Score (Z-score)", | ||
"cenx" : "Centroid (x)", | ||
"ceny" : "Centroid (y)", | ||
"cenz" : "Centroid (z)", | ||
"coordx" : "Coord (x)", | ||
"coordy" : "Coord (y)", | ||
"coordz" : "Coord (z)", | ||
} | ||
|
||
out = pd.DataFrame.from_records(peaks) | ||
|
||
if use_long_names: | ||
out.rename(columns = long_names) | ||
return out | ||
|
||
def parse_args(): | ||
from argparse import ArgumentParser | ||
|
||
program_description = """ | ||
Search an electron density maps for | ||
peaks in the vicinity of a structure. | ||
""" | ||
parser = ArgumentParser(description=program_description) | ||
|
||
# Required / Common options | ||
parser.add_argument("-f", "--structure-factor-key", type=str, | ||
required=True, help="column label of the structure factor you want to use.") | ||
parser.add_argument("-p", "--phase-key", type=str, | ||
required=True, help="column label of the phase you want to use.") | ||
parser.add_argument("mtz_file") | ||
parser.add_argument("pdb_file") | ||
parser.add_argument("-o", "--csv-out", type=str, default=None, help="output the report to a csv file") | ||
parser.add_argument("-z", "--sigma-cutoff", required=True, type=float, | ||
help="the z-score cutoff for voxels to be included in the peak search.") | ||
parser.add_argument("-d", "--difference-map", action='store_true', | ||
help="search for negative peaks as well as positive.") | ||
|
||
# More esoteric options | ||
parser.add_argument("--sample-rate", type=float, default=3., | ||
help="change fft oversampling from the default (3).") | ||
parser.add_argument("--min-volume", type=float, default=0., | ||
help="the minimum volume of peaks in voxels with default zero.") | ||
parser.add_argument("--min-score", type=float, default=0., | ||
help="the minimum score of peaks with default zero.") | ||
parser.add_argument("--min-peak", type=float, default=0., | ||
help="the minimum peak value with default zero.") | ||
parser.add_argument("--distance-cutoff", type=float, default=4., | ||
help="the distance cutoff of nearest neighbor search with default of 4 angstroms.") | ||
parser.add_argument("--use-long-names", action='store_true', | ||
help="use more verbose column names in the peak report.") | ||
parser = parser.parse_args() | ||
return parser | ||
|
||
def main(): | ||
parser = parse_args() | ||
structure = gemmi.read_pdb(parser.pdb_file) | ||
mtz = gemmi.read_mtz_file(parser.mtz_file) | ||
grid = mtz.transform_f_phi_to_map( | ||
parser.structure_factor_key, parser.phase_key, sample_rate=parser.sample_rate | ||
) | ||
|
||
out = peak_report( | ||
structure, grid, | ||
sigma_cutoff=parser.sigma_cutoff, | ||
) | ||
if parser.difference_map: | ||
out = pd.concat(( | ||
out, | ||
peak_report(structure, grid, sigma_cutoff=parser.sigma_cutoff, negate=True), | ||
)) | ||
|
||
if parser.csv_out is not None: | ||
out.to_csv(parser.csv_out) | ||
|
||
print(out.to_csv()) | ||
|
||
if __name__=="__main__": | ||
main() | ||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be worth removing this since it no longer works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well I guess it works because you have default values... feel free to leave it if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i removed it. we're providing a CLI entry point. no reason to leave it there imo.