Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added peakfinding utility #8

Merged
merged 12 commits into from
May 5, 2022
Merged

added peakfinding utility #8

merged 12 commits into from
May 5, 2022

Conversation

kmdalton
Copy link
Member

@kmdalton kmdalton commented May 4, 2022

This adds a command line script that finds peaks in maps and matches them to atoms. I think this works fine but might benefit from some reviewers.

@kmdalton
Copy link
Member Author

kmdalton commented May 4, 2022

Here is the documentation for this program from ArgParse:

$ efxtools.findpeaks -h
usage: efxtools.findpeaks [-h] -f STRUCTURE_FACTOR_KEY -p PHASE_KEY [-o CSV_OUT] -z SIGMA_CUTOFF [-d] [--sample-rate SAMPLE_RATE] [--min-volume MIN_VOLUME] [--min-score MIN_SCORE]
                          [--min-peak MIN_PEAK] [--distance-cutoff DISTANCE_CUTOFF] [--use-long-names]
                          mtz_file pdb_file

Search an electron density maps for peaks in the vicinity of a structure.

positional arguments:
  mtz_file
  pdb_file

optional arguments:
  -h, --help            show this help message and exit
  -f STRUCTURE_FACTOR_KEY, --structure-factor-key STRUCTURE_FACTOR_KEY
                        column label of the structure factor you want to use.
  -p PHASE_KEY, --phase-key PHASE_KEY
                        column label of the phase you want to use.
  -o CSV_OUT, --csv-out CSV_OUT
                        output the report to a csv file
  -z SIGMA_CUTOFF, --sigma-cutoff SIGMA_CUTOFF
                        the z-score cutoff for voxels to be included in the peak search.
  -d, --difference-map  search for negative peaks as well as positive.
  --sample-rate SAMPLE_RATE
                        change fft oversampling from the default (3).
  --min-volume MIN_VOLUME
                        the minimum volume of peaks in voxels with default zero.
  --min-score MIN_SCORE
                        the minimum score of peaks with default zero.
  --min-peak MIN_PEAK   the minimum peak value with default zero.
  --distance-cutoff DISTANCE_CUTOFF
                        the distance cutoff of nearest neighbor search with default of 4 angstroms.
  --use-long-names      use more verbose column names in the peak report.

@kmdalton
Copy link
Member Author

kmdalton commented May 4, 2022

Here's an example of the output from search a time-resolved difference map for peaks int he vicinity of a photoactive yellow protein structure:

$ efxtools.findpeaks -d -f "DeltaF" -p "PhiFoff" -z 4. ~/opt/careless-examples/pyp/difference.mtz ~/opt/careless-examples
/pyp/reference_data/off.pdb
,chain,seqid,residue,atom,dist,peak,peakz,score,scorez,cenx,ceny,cenz,coordx,coordy,coordz
0,A,126,HC4,C1',1.2493168828617782,0.0013570099836215377,5.57138723710649,0.00048139700120323315,1.9764402188353645,50.99162808199085,0.14171747032899862,50.99162808199085,14.87,0.383,-18.885
0,A,126,HC4,O4',0.17460930882807968,-0.0014174100942909718,-5.81936802555327,-0.0008162189510563102,-3.3510968244874473,49.12977049929481,1.6215186512756787,49.12977049929481,17.939,-1.579,-17.197
1,A,69,CYS,SG,0.28868905398404665,-0.0014208063948899508,-5.833311995042771,-0.0004891447823862661,-2.0082497778773467,2.1446618366752164,11.735229937338932,2.1446618366752164,11.068,3.807,-18.673

@kmdalton
Copy link
Member Author

kmdalton commented May 4, 2022

here's an example of finding peaks in a sulfur SAD data set:

$ efxtools.findpeaks -d -f "ANOM" -p "PHANOM" -z 9. anomdiff.mtz 7L84.pdb 
,chain,seqid,residue,atom,dist,peak,peakz,score,scorez,cenx,ceny,cenz,coordx,coordy,coordz
0,A,30,CYS,SG,0.9028443597197562,0.021081814542412758,20.68339866625151,0.06569293045265796,64.45142885483993,69.35342890626659,40.82992499825315,69.35342890626659,38.145,10.11,16.56
1,A,6,CYS,SG,0.9259183053499811,0.015859980136156082,15.560249395727116,0.04290730550386347,42.09641934036914,9.083987661833884,57.736759956405635,9.083987661833884,30.985,18.615,25.406
2,A,105,MET,SD,0.02963401331055362,0.02007979154586792,19.70031245852985,0.03479196606565453,34.13444810797952,74.0172179014407,27.566140083276427,74.0172179014407,34.338,12.096,8.716
3,A,64,CYS,SG,0.12395125219700674,0.018734920769929886,18.38085779975302,0.028753104026761695,28.209711848192708,45.13910723597603,25.552650647223444,45.13910723597603,45.246,25.595,1.937
4,A,12,MET,SD,0.048727818838572926,0.01866207644343376,18.309390125921137,0.027840489464246637,27.314344384104427,59.895988209784264,45.775576736139655,59.895988209784264,33.562,19.462,14.187
5,A,80,CYS,SG,0.0978436754314195,0.01839478313922882,18.047148279490152,0.02519126636778784,24.71518778162443,46.93369958915004,26.89821535063044,46.93369958915004,46.838,26.878,1.837
6,A,94,CYS,SG,0.1269839606226892,0.015390253625810146,15.099400038661836,0.01836266616065573,18.015638265470745,38.86686352213625,23.35517604416916,38.86686352213625,38.949,23.267,1.651
7,A,76,CYS,SG,0.10574698245905027,0.014169059693813324,13.901284909928124,0.01565690939245131,15.3610163960691,38.280269770019906,23.439406911944587,38.280269770019906,38.286,23.543,-0.236

Copy link
Member

@JBGreisman JBGreisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me -- I think adding support for a weights column will be useful for how we commonly work with difference maps. Other than that, this seems good as is.

@JBGreisman JBGreisman added the enhancement Improvement to existing feature label May 4, 2022
@kmdalton
Copy link
Member Author

kmdalton commented May 4, 2022

@JBGreisman i added a parameter for weights:

  -w WEIGHT_KEY, --weight-key WEIGHT_KEY                                           
                        column label of any weights you wish to apply to the map.  

Copy link
Member

@JBGreisman JBGreisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome -- thanks. The weight key was the only missing feature that seemed useful to me.

@dennisbrookner
Copy link
Contributor

Great stuff! Just ran it successfully. A couple of user-friendliness things:

  • It seems that the peaks are output in whatever order Gemmi happened to notice them, which (unless I'm missing something) doesn't have a clear human-readable meaning. By default, the csv should probably be sorted, either by residue number or (my preference) z-score.
  • I think there should be a sensible default picked for z-score, so that doesn't have to always be specified? But happy to be overruled on that.
  • Unless there's a set of use cases that I'm forgetting about, I think that finding negative peaks as well as positive should be the default behavior. I would essentially always want that. Implementation would be simplest by making the -d flag a Boolean, I guess.

@dennisbrookner
Copy link
Contributor

I keep going back and forth on column names here. As much as the keystrokes annoy me, I do think requiring explicit specification of column names for amplitudes and phases is the right call.

I think where it becomes a user-friendliness thing is that this forces me to answer the question, "what are the columns called in this particular dataset?" In practice, I just do this with rs.mtzdump in the command line. I wonder if there's a way to streamline this process for the user? E.g., if the user forgets to include -f and/or -p, print out the names of the columns in the .mtz that are of the correct types?

@JBGreisman
Copy link
Member

  • I definitely support sorting the peaks by height as @dennisbrookner suggests
  • z-score of +/- 3.0 seems reasonable with -d, and 1.5 without. That seems to be pretty sensible for difference map vs. not.
  • I hesitate making the default assume a difference map though.

@dennisbrookner
Copy link
Contributor

Maybe I'm being sloppy with my nomenclature here; my implication was that one is far more likely to use this code on an FoFc map than a 2FoFc map. In the FoFc case, you certainly want negative peaks. And even if someone did run this code on a 2FoFc map and forgot to turn off the negative peak functionality, I don't think that would cause any issues - there are no negative peaks in a 2FoFc map.

@kmdalton
Copy link
Member Author

kmdalton commented May 5, 2022

I am perhaps being selfish here, but I will personally use this on anomalous difference maps much more frequently than isomorphous difference maps. In that case, any negative peaks are meaningless artifacts. I can make separate entry points

  • efxtools.find_peaks
  • efxtools.find_difference_peaks

It doesn't really save keystrokes but it would tab complete and you wouldn't have to remember the -d flag. Thoughts?

@JBGreisman
Copy link
Member

I support that. An alternative option to clarify behavior might be to only include --difference-map as the flag (no -d abbreviation). Given that --distance-cutoff is also an argument, that might clarify what -d refers to.

However, I do also support your proposal to split it into two commandline tools.

@dennisbrookner
Copy link
Contributor

Seconded (thirded?) that splitting into two separate tab-completable command line tools makes sense and will minimize confusion.

@kmdalton
Copy link
Member Author

kmdalton commented May 5, 2022

I

  1. refactored to use two entry points efxtools.find_peaks and efxtools.find_difference_peaks
  2. sort values by the absolute value of the peak z-score. admittedly this will do funny things if you supply a negative sigma cutoff. however, i cannot imagine why one would do that.
  3. fixed a bunch of parser arguments that were being ignored.

can you both have another look?

@kmdalton kmdalton requested a review from JBGreisman May 5, 2022 18:02
@dennisbrookner
Copy link
Contributor

Looks great, thanks Kevin!

Comment on lines 277 to 279

if __name__=="__main__":
main()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth removing this since it no longer works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well I guess it works because you have default values... feel free to leave it if you want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed it. we're providing a CLI entry point. no reason to leave it there imo.

@JBGreisman
Copy link
Member

Left a small comment that you can feel free to ignore, but otherwise this looks good to me

@kmdalton kmdalton merged commit 5411b2c into main May 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement to existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants