<a href="https://colab.research.google.com/github/glevans/PDB_Notebooks/blob/main/GemmiRecipes/Validating_mmCIF_format_with_Gemmi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Setting up Notebook

The cell below installs the `gemmi` package from the [PyPI](https://pypi.org/) package index.

The `gemmi` Python library offers a versatile set of options.

These options greatly aid work with structural biology data.

In [1]:
!pip install gemmi
import gemmi

Collecting gemmi
  Downloading gemmi-0.7.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (2.3 kB)
Downloading gemmi-0.7.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.6 MB)
Installing collected packages: gemmi
Successfully installed gemmi-0.7.3


Installing **Gemmi-program** in the cell below makes more options available.

Some of these options are not available from the `gemmi` Python library.

The [Gemmi-program documentation](https://gemmi.readthedocs.io/en/latest/program.html) lists the options available from the command line.

In this Jupyter notebook we access these additional options with `!gemmi`.

Examples using **Gemmi-program**:


```
# List every tag (data and metadata item name) found in a '.cif' file.
# '>' overwrites structure_tags.txt, so re-running the cell does not duplicate the list.
!gemmi tags structure.cif > structure_tags.txt
```
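
The `!` shell escape only works inside a notebook. As a minimal sketch, the same call can be made from plain Python with `subprocess`, assuming the `gemmi` executable installed by **Gemmi-program** is on `PATH`. The helper names `gemmi_command` and `run_gemmi` are illustrative and not part of Gemmi.

```python
import shutil
import subprocess


def gemmi_command(subcommand, *args):
    """Build the argument list for a gemmi command-line call."""
    return ["gemmi", subcommand, *[str(a) for a in args]]


def run_gemmi(subcommand, *args):
    """Run gemmi and return its standard output as text.

    Returns None when the gemmi executable is not installed.
    """
    if shutil.which("gemmi") is None:
        return None
    result = subprocess.run(
        gemmi_command(subcommand, *args),
        capture_output=True,  # keep the output instead of printing it
        text=True,            # decode bytes to str
    )
    return result.stdout


# Equivalent of `!gemmi tags structure.cif`, with the output kept in a variable.
tags_text = run_gemmi("tags", "structure.cif")
```

Keeping the output in a variable makes it possible to filter or count tags in Python rather than redirecting them to a text file.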

In [2]:
!pip install gemmi-program

Collecting gemmi-program
  Downloading gemmi_program-0.7.3-py2.py3-none-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.6 kB)
Downloading gemmi_program-0.7.3-py2.py3-none-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.6 MB)
Installing collected packages: gemmi-program
Successfully installed gemmi-program-0.7.3


In [3]:
# More useful software packages

import requests
import os

## 2. Retrieve useful files

In [4]:
def download_file(url, save_as):
    """
    Downloads a file from the given URL and saves it locally.

    Parameters:
    - url (str): URL of the file to download.
    - save_as (str): Local filename to save the downloaded file.

    Returns:
    - None. The file is written to `save_as`.
    """
    # A timeout stops the call from hanging if the server does not respond
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # Raise an error for bad status codes

    with open(save_as, 'wb') as f:
        f.write(response.content)

    print(f"File downloaded and saved as '{save_as}'")


In [5]:
download_file("https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_ddl.dic", "mmcif_ddl.dic")
download_file("https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_pdbx_v50.dic", "mmcif_pdbx_v50.dic")
download_file("https://www.ebi.ac.uk/pdbe/entry-files/download/1xxx.cif", "1xxx.cif")

File downloaded and saved as 'mmcif_ddl.dic'
File downloaded and saved as 'mmcif_pdbx_v50.dic'
File downloaded and saved as '1xxx.cif'
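
A successful HTTP status does not guarantee that the saved file is a CIF file, since a server can return an HTML page instead. The sketch below is a rough sanity check, not a validator: it assumes that a CIF or dictionary file begins with a `data_` block header, possibly preceded by blank lines and `#` comments. The function name `looks_like_cif` is illustrative.

```python
import os


def looks_like_cif(path):
    """Return True if the first meaningful line opens a CIF data block."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            # Skip blank lines and '#' comment lines that may precede the block.
            if not stripped or stripped.startswith("#"):
                continue
            # CIF keywords are case-insensitive, so compare in lower case.
            return stripped.lower().startswith("data_")
    return False  # empty file, or comments only


# Check whichever of the downloaded files are present in the working directory.
for name in ("mmcif_ddl.dic", "mmcif_pdbx_v50.dic", "1xxx.cif"):
    if os.path.exists(name):
        print(name, looks_like_cif(name))
```

A `False` result for a downloaded file suggests opening it in a text editor before passing it to `gemmi validate`.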


In [6]:
# Define the original and new file names
old_filename = "mmcif_pdbx_v50.dic"
new_filename = "mmcif_pdbx.dic"

# Check if the target file already exists
if os.path.exists(new_filename):
    print(f"Target file '{new_filename}' already exists. Rename aborted.")
else:
    # Attempt to rename the file
    try:
        os.rename(old_filename, new_filename)
        print(f"Renamed '{old_filename}' to '{new_filename}' successfully.")
    except FileNotFoundError:
        print(f"Source file '{old_filename}' not found.")
    except Exception as e:
        print(f"Error renaming file: {e}")

Renamed 'mmcif_pdbx_v50.dic' to 'mmcif_pdbx.dic' successfully.


## 3. Helpful insight - Gemmi dictionary options

In [7]:
!gemmi validate --help

Usage: gemmi validate [options] FILE [...]

Options:
  -h, --help       Print usage and exit.
  -V, --version    Print version and exit.
  -v, --verbose    Verbose output.
  -q, --quiet      Show only errors.
  -f, --fast       Syntax-only check.
  -s, --stat       Show token statistics
  -r, --recursive  Recurse directories and process all CIF files.
  -d, --ddl=PATH   DDL for validation.

Optional checks (when using DDL2):
  -c, --context    Check _pdbx_{category|item}_context.type.
  --no-regex       Skip regex checking
  --no-mandatory   Skip checking if mandatory tags are present.
  --no-unique      Skip checking if category keys are unique.
  -p               Check if parent items are present.
  --depo           Deposition checks (_pdbx_item_range not _item_range, etc).

Validation specific to CCP4 monomer files:
  -m, --monomer    Run checks specific to monomer dictionary.
  --z-score=Z      Use Z for validating _chem_comp_atom.[xyz] (default: 2.0).
  --ccd=PATH       CCD file f
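
The options listed above can be combined freely. As a sketch, the hypothetical helper `validate_args` below assembles a `gemmi validate` argument list from keyword arguments. It covers only a few of the flags shown in the help text (`-s`, `-v`, `-q`, `-c`, `-p`, `-d`), and its name and parameters are illustrative, not part of Gemmi.

```python
def validate_args(files, ddl=None, stats=False, verbose=False,
                  quiet=False, check_context=False, check_parents=False):
    """Assemble the argument list for a `gemmi validate` call."""
    args = ["gemmi", "validate"]
    if stats:
        args.append("-s")          # show token statistics
    if verbose:
        args.append("-v")          # verbose output
    if quiet:
        args.append("-q")          # show only errors
    if check_context:
        args.append("-c")          # check _pdbx_{category|item}_context.type
    if check_parents:
        args.append("-p")          # check if parent items are present
    if ddl is not None:
        args += ["-d", str(ddl)]   # dictionary (DDL) to validate against
    args += [str(f) for f in files]
    return args


command = validate_args(["1xxx.cif"], ddl="mmcif_pdbx.dic", stats=True, verbose=True)
print(" ".join(command))
# prints: gemmi validate -s -v -d mmcif_pdbx.dic 1xxx.cif
```

The resulting list can be passed to `subprocess.run`, or joined into a string and pasted after `!` in a notebook cell.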

We can validate all mmCIF dictionaries that are available at:
[https://mmcif.wwpdb.org/](https://mmcif.wwpdb.org/)  

We validate mmCIF dictionaries against a reference, the DDL dictionary that defines how mmCIF dictionaries themselves are written:

`mmcif_ddl.dic`



---



[Homepage of the dictionary for mmCIF dictionaries](https://mmcif.wwpdb.org/dictionaries/mmcif_ddl.dic/Index/)

**The cell below checks the dictionary for mmCIF dictionaries against itself.**


In [8]:
!gemmi validate -s -d mmcif_ddl.dic mmcif_ddl.dic

      1 block(s)
    289 frames
      5 non-loop items:  char:5  numb:0  '.':0  '?':0
      7 loops w/
             28 tags:  char:24  numb:2  '.':2  '?':0
           1216 values



## 4. Running Gemmi file validation against a dictionary

Relevant wwPDB dictionary file:
`mmcif_pdbx.dic`

Example PDB entry file to test with:
`1xxx.cif`

[Homepage of the wwPDB mmCIF dictionary](https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/)

In [12]:
!gemmi validate -s --verbose 1xxx.cif -d mmcif_pdbx.dic

Reading 1xxx.cif...
      1 block(s)
      0 frames
    388 non-loop items:  char:100  numb:149  '.':2  '?':137
     48 loops w/
            531 tags:  char:237  numb:193  '.':10  '?':91
         461196 values

OK
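
The token statistics printed by `-s` can be turned into numbers for comparing files. The sketch below parses the output shown above with a regular expression. It assumes the layout of this particular run (a count followed by a label on each line), which may differ between Gemmi versions, and the name `parse_token_stats` is illustrative.

```python
import re

# A count followed by one of the labels seen in the `-s` output above.
STAT_LINE = re.compile(
    r"^\s*(\d+)\s+(block\(s\)|frames|non-loop items|loops|tags|values)"
)


def parse_token_stats(text):
    """Map each statistics label to its integer count."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            count, label = match.groups()
            stats[label] = int(count)
    return stats


# Output copied from the validation run of 1xxx.cif above.
sample = """Reading 1xxx.cif...
      1 block(s)
      0 frames
    388 non-loop items:  char:100  numb:149  '.':2  '?':137
     48 loops w/
            531 tags:  char:237  numb:193  '.':10  '?':91
         461196 values

OK
"""

stats = parse_token_stats(sample)
print(stats["loops"], stats["tags"], stats["values"])
# prints: 48 531 461196
```

Lines that do not start with a count, such as `Reading 1xxx.cif...` and `OK`, are ignored.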
