1015 binary cif support #1040
d93a5e3
to
2c96719
Compare
Thanks again, I'll have a proper look asap. Just to note that some of our examples are already failing due to this, e.g. EDIT: To clarify, they are failing because they can't grab the mmtf file, not because of changes in this PR!
I did not implement the alphaCarbondsOnly flag, so this might be it.
Okay - looks good, I've gone through all the parser examples and they all work except for:
I can dig some more into these but thought I'd flag first as it might be something simple when you're familiar with the code.
Apologies, the first one that wasn't working is parser/map (edited above, previously said ccp4, but adding this comment in case you're following by email too).
Oh, and you should make yourself the author for pdbe-datasource.ts!
Good catch @fredludlow! The issue with the 4UJD.cif.gz file was due to how compression is handled. The streamer returns an ArrayBuffer, which needs to be converted to a string to be processed by the CIF parsing library. The second one is a bug with handling altlocs (they were in fact not processed correctly). Both were pretty major issues. Maybe we should add more tests to better cover this code?
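A minimal sketch of the ArrayBuffer-to-string conversion described above, assuming the decompressed payload is UTF-8 text. `toText` is a hypothetical helper name, not the function used in this PR:

```typescript
// Hypothetical helper (not the actual NGL streamer code): normalise whatever
// the streamer hands back into a string before passing it to a text CIF parser.
function toText(data: ArrayBuffer | Uint8Array | string): string {
  if (typeof data === "string") return data; // already text, nothing to do
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  // Assumption: the decompressed CIF content is UTF-8 encoded.
  return new TextDecoder("utf-8").decode(bytes);
}
```

Accepting strings as well keeps the call site identical whether or not the file was compressed.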
Can confirm both those are now working for me. I've got a local PDB mirror and am running a script to try NGL.autoLoad on every mmCIF formatted entry - if this works there may still be other classes of bug, but it would definitely be reassuring. Happy for you to merge this in the meantime (and thank you again!)
Hmm, 7a4p is causing issues
That's a tricky file: one of the chains (identifier
7a4p was the only one that threw an error / rejected the promise. There were approx 250 entries where the spacegroup was either undefined or another one that isn't recognized ( For reference, script is here: https://gist.github.com/fredludlow/e0a2a4af29d902350c872162315538d1
Thanks @fredludlow, that's so useful!
The CIF reader from Mol* is wrapped in async calls, which requires making the _parse function async in the binary cif parser.
mmCIF files use the struct_conf table to define the alpha helices, whereas the sheets are defined in the struct_sheet_range table. AlphaFold ModelCIF files contain every DSSP assignment in the struct_conf table using DSSP mmCIF codes (such as `TURN_TY1_P1`).
This table is available in the "Updated mmcif files" distributed by PDBe. In this commit, the list of bonds defined for each residue is stored in a new dictionary in a ChemCompMap object. Bonds in the mmcif file are defined using atom names (e.g. CA), which need to be converted to indices in the atomList of a given residue type. The atomList contains the indices of the AtomTypes from the structure's AtomMap. Given an atom index from the atomList, the AtomMap.get(idx) method returns an AtomType object that contains the atomname property.
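The name-to-index conversion can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: `AtomTypeLike`, `AtomMapLike`, `NamedBond`, `IndexedBond` and `resolveBonds` are hypothetical names, and only the `get(idx)` method and the `atomname` property come from the description above.

```typescript
// Hypothetical, simplified stand-ins for NGL's AtomType and AtomMap.
interface AtomTypeLike { atomname: string }
interface AtomMapLike { get(idx: number): AtomTypeLike }

// A bond as written in the mmcif file (atom names) and after resolution
// (positions in the residue type's atomList).
interface NamedBond { atom1: string; atom2: string; order: number }
interface IndexedBond { index1: number; index2: number; order: number }

function resolveBonds(
  bonds: NamedBond[],
  atomList: number[],
  atomMap: AtomMapLike
): IndexedBond[] {
  // Build a lookup from atom name to its position in atomList.
  const positionByName = new Map<string, number>();
  atomList.forEach((atomTypeIdx, pos) => {
    positionByName.set(atomMap.get(atomTypeIdx).atomname, pos);
  });

  const resolved: IndexedBond[] = [];
  for (const bond of bonds) {
    const index1 = positionByName.get(bond.atom1);
    const index2 = positionByName.get(bond.atom2);
    // Assumption: bonds to atoms absent from this residue are skipped.
    if (index1 === undefined || index2 === undefined) continue;
    resolved.push({ index1, index2, order: bond.order });
  }
  return resolved;
}
```

Building the name lookup once per residue type avoids rescanning the atomList for every bond.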
Previously, the default was to use the mmtf format served by RCSB. This format contained the full connectivity, which is currently missing from the bcif files distributed by RCSB. PDBe distributes "Updated" mmcif files containing this data. The same content is available in their bcif files.
Jest cannot import code from ES modules, which is the case for the modules from MolStar (not bundled). Jest fails with some hints about tweaking the jest config using the transformIgnorePatterns property. After many trials and much research I was not able to make it work and decided to switch the test runner to vitest, which solved the issue.
valueKind has 3 values: 0 if present, 1 if not present ('.' in Cif), 2 if unknown ('?' in Cif)
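As a rough illustration of these three values, the sketch below classifies a raw text-CIF token. The numeric values come from the note above; the names `ValueKind` and `classifyToken` are assumptions made for this example, and quoted `'.'` or `'?'` tokens are not handled.

```typescript
// Numeric values as stated above; the member names are illustrative.
const ValueKind = { Present: 0, NotPresent: 1, Unknown: 2 } as const;
type ValueKind = (typeof ValueKind)[keyof typeof ValueKind];

// Hypothetical helper: map an unquoted CIF token to its value kind.
function classifyToken(token: string): ValueKind {
  if (token === ".") return ValueKind.NotPresent; // '.' means "not present"
  if (token === "?") return ValueKind.Unknown;    // '?' means "unknown"
  return ValueKind.Present;
}
```

A parser can check the kind first and only read the cell value when it is `Present`.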
When splitting `(1,2,6,10,23,24)` on `(`, the first item is an empty string. The fix consists of filtering out falsy values from the split array.
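A small sketch of the split-and-filter step. `splitOperGroups` is a hypothetical name, and the removal of the closing parenthesis and the comma split are added here for illustration only:

```typescript
// Hypothetical helper: split an expression such as "(1,2,6,10,23,24)" into
// groups of identifiers, dropping the empty string produced by the leading "(".
function splitOperGroups(expr: string): string[][] {
  return expr
    .split("(")       // "(1,2)" -> ["", "1,2)"]
    .filter(Boolean)  // drop falsy items, i.e. the leading empty string
    .map((part) => part.replace(")", "").split(","));
}

// splitOperGroups("(1,2,6,10,23,24)") -> [["1","2","6","10","23","24"]]
```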
The CIF library returns `0` when a string column is converted to an int array. The fix here is to map the string array from the column using the String.charCodeAt() function.
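A sketch of that mapping, assuming each cell holds at most one character and that an empty cell should become `0`. `toCharCodes` is a hypothetical name, not the function from this commit:

```typescript
// Hypothetical helper: convert single-character string cells to char codes
// instead of relying on the library's string-to-int conversion.
function toCharCodes(values: readonly string[]): Uint8Array {
  const codes = new Uint8Array(values.length);
  values.forEach((value, i) => {
    // Assumption: empty cells are stored as 0.
    codes[i] = value.length > 0 ? value.charCodeAt(0) : 0;
  });
  return codes;
}

// toCharCodes(["A", "B", ""]) -> Uint8Array [65, 66, 0]
```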
In that case the `chainIndexDict` does not have the corresponding key. This fix still creates the corresponding `Entity` but with an empty chain list.
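The fallback can be sketched like this. The shapes below are simplified assumptions for illustration (`EntitySketch`, `buildEntity` and the dictionary type are hypothetical), not NGL's actual `Entity` class:

```typescript
// Hypothetical, simplified entity record.
interface EntitySketch { id: string; chainIndexList: number[] }

// Assumption: chainIndexDict maps an entity id to the indices of its chains.
function buildEntity(
  id: string,
  chainIndexDict: Record<string, number[] | undefined>
): EntitySketch {
  const chains = chainIndexDict[id];
  // If the key is missing, still create the entity, with an empty chain list.
  return { id, chainIndexList: chains ? [...chains] : [] };
}
```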
63dbf30
to
cc34e1d
Compare
I'm not sure if the data source is set up properly for this. The following doesn't work for me (on a development server): new Stage(...).loadFile('rscb://5z6y'); This tries to access
The protocol part (http:// vs https://) comes from the current location (i.e. the server that serves the current page, here your development server). We should make this always https then (I think it's already the case for PDBe).
Oh, I see. I think that would be good. Regarding the compression: I was referring to the RCSB model API, which offers several endpoints to make requests to. I was wondering if instead of using links like
@panda-byte #1043 has been merged and published as v2.3.1 with the http/https fix for rcsb
This PR adds support for parsing Binary CIF files and changes the RCSB data source provider to use this new format instead of the deprecated MMTF format.
Changes made
- `pdbe` as a new datasource. Data can be loaded from PDBe using the pseudo protocol `pdbe://4hhb`, which downloads a binary cif (uncompressed) with the full connectivity
- The `jest` library as a test runner has been replaced with `vitest`. This was due to a bug with Jest when parsing the cif-parser file. The import from molstar is not an import of bundled code, but an import of an ES module, which is not supported natively by node and requires a transformer. But Jest does not transform files from node_modules. Despite trying various approaches, I could not make it work and resolved to use vitest, which worked out of the box.
Fixes
Comments
Small benchmark, using the pdb 5z6y (relatively small structure, GFP):
(*) RCSB response is gzipped
Despite the claim that bcif achieves better compression, it seems that there are still some caveats and, generally speaking, forcing the transition from mmtf to bcif creates regressions (as well as some improvements for specific use cases where the extra data content is relevant).