<a href="https://colab.research.google.com/github/glevans/PDB_Notebooks/blob/main/GemmiRecipes/Validating_mmCIF_format_with_Gemmi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Setting up Notebook

The cell below installs the `gemmi` package from the Python Package Index ([PyPI](https://pypi.org/)).

The `gemmi` Python library offers a versatile set of tools that greatly help when working with structural biology data.

In [None]:
!pip install gemmi
import gemmi

Installing **Gemmi-program** (further below) makes more options available, some of which are not available from the `gemmi` Python library.

The options available from the command line are documented here:
[Gemmi-program](https://gemmi.readthedocs.io/en/latest/program.html)

In this Jupyter notebook we access these additional options by calling `!gemmi`.

Example using **Gemmi-program**:


```
# List all tags (data and metadata items) found in a '.cif' file.
# Note: '>>' appends to structure_tags.txt, so re-running adds a second copy.
!gemmi tags structure.cif >> structure_tags.txt
```

In [None]:
!pip install gemmi-program
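
The same `gemmi tags` call can also be driven from Python with the standard `subprocess` module, which sidesteps the shell's `>>` append behaviour. The sketch below is illustrative only: `build_tags_command` and `save_tags` are hypothetical helper names, not part of Gemmi, and it assumes the `gemmi` executable installed above is on the `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def build_tags_command(cif_path):
    """Return the argument list for `gemmi tags <cif_path>` (hypothetical helper)."""
    return ["gemmi", "tags", str(cif_path)]


def save_tags(cif_path, out_path):
    """Run `gemmi tags` and overwrite out_path with its output.

    Returns the output path, or None when the gemmi executable is not found.
    """
    if shutil.which("gemmi") is None:
        print("gemmi executable not found; install gemmi-program first.")
        return None
    # check=True raises CalledProcessError if gemmi exits with an error status
    result = subprocess.run(
        build_tags_command(cif_path), capture_output=True, text=True, check=True
    )
    out = Path(out_path)
    out.write_text(result.stdout)  # overwrite instead of appending
    return out
```

Calling `save_tags("structure.cif", "structure_tags.txt")` would then write a fresh tag list each time the cell is run.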

In [None]:
# More useful software packages

import requests
import urllib.request
import os

## 2. Retrieve useful files

In [None]:
def download_file(url, save_as):
    """
    Downloads a file from the given URL and saves it locally.

    Parameters:
    - url (str): URL of the file to download.
    - save_as (str): Local filename to save the downloaded file.

    Returns:
    - str: Path to the saved file.
    """
    response = requests.get(url, timeout=60)  # Avoid hanging indefinitely
    response.raise_for_status()  # Raise an error for bad status codes

    with open(save_as, 'wb') as f:
        f.write(response.content)

    print(f"File downloaded and saved as '{save_as}'")
    return save_as


In [None]:
download_file("https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_ddl.dic", "mmcif_ddl.dic")
download_file("https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_pdbx_v50.dic", "mmcif_pdbx_v50.dic")
download_file("https://www.ebi.ac.uk/pdbe/entry-files/download/1xxx.cif", "1xxx.cif")
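
A download can return HTTP status 200 and still deliver an HTML page rather than a CIF file, for instance when a URL points at a web index instead of the raw file. One cheap sanity check is to confirm that the first meaningful line starts with `data_`, as a CIF data block header does. This is a minimal sketch: `looks_like_cif` is a hypothetical helper, not part of `gemmi`, and it is a heuristic rather than a real parse.

```python
import os


def looks_like_cif(path, max_lines=200):
    """Heuristic: is the first non-blank, non-comment line a `data_` block header?"""
    with open(path, "r", encoding="utf-8-sig", errors="replace") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break  # give up if only blanks/comments were seen so far
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and CIF comments
            return stripped.lower().startswith("data_")
    return False


# Check whichever of the downloaded files are present in the working directory
for name in ("mmcif_ddl.dic", "mmcif_pdbx_v50.dic", "1xxx.cif"):
    if os.path.exists(name):
        status = "looks like CIF" if looks_like_cif(name) else "does NOT look like CIF"
        print(f"{name}: {status}")
```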

In [None]:
def download_github_file(raw_url: str, output_filename: str):
    """
    Downloads a file from a GitHub raw URL and saves it locally.

    Parameters:
    - raw_url (str): The direct raw URL to the file on GitHub.
    - output_filename (str): The name to save the downloaded file as.
    """
    try:
        urllib.request.urlretrieve(raw_url, output_filename)
        print(f"Downloaded and saved as '{output_filename}'.")
    except Exception as e:
        print(f"Download failed: {e}")

In [None]:
download_github_file(
    "https://raw.githubusercontent.com/glevans/PDB_Notebooks/main/GemmiRecipes/mmcif_pdbx.dic",
    "mmcif_pdbx.dic")

In [None]:
# Define the original and new file names
old_filename = "mmcif_pdbx_v50.dic"
new_filename = "mmcif_pdbx.dic"

# Check if the target file already exists
if os.path.exists(new_filename):
    print(f"Target file '{new_filename}' already exists. Rename aborted.")
else:
    # Attempt to rename the file
    try:
        os.rename(old_filename, new_filename)
        print(f"Renamed '{old_filename}' to '{new_filename}' successfully.")
    except FileNotFoundError:
        print(f"Source file '{old_filename}' not found.")
    except Exception as e:
        print(f"Error renaming file: {e}")

## 3. Helpful insight - Gemmi validation options

In [None]:
!gemmi validate --help

Any of the mmCIF dictionaries available at
[https://mmcif.wwpdb.org/](https://mmcif.wwpdb.org/) can themselves be validated.

mmCIF dictionaries are validated against a reference, the dictionary that defines how mmCIF dictionaries are written (the DDL dictionary):

`mmcif_ddl.dic`



---



[Homepage of the dictionary for mmCIF dictionaries (`mmcif_ddl.dic`)](https://mmcif.wwpdb.org/dictionaries/mmcif_ddl.dic/Index/)

**The cell below checks the dictionary for mmCIF dictionaries against itself.**


In [None]:
!gemmi validate -s -d mmcif_ddl.dic mmcif_ddl.dic

Files downloaded from [PDBe.org](https://www.ebi.ac.uk/pdbe/) (and other [wwPDB](https://www.wwpdb.org/) partners), such as the 3D coordinate files of macromolecules, should validate against the wwPDB mmCIF dictionary:

`mmcif_pdbx.dic`



---



[mmCIF dictionary homepage for wwPDB dictionary](https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/)



## 4. Running Gemmi file validation against a dictionary

Relevant wwPDB dictionary file:
`mmcif_pdbx.dic`

Example PDB entry file to test with:
`1xxx.cif`



In [None]:
!gemmi validate -s --verbose 1xxx.cif -d mmcif_pdbx.dic
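
Because `download_github_file` above reports a failed download without stopping the notebook, it can help to confirm that both input files exist before validating. The sketch below wraps the same `gemmi validate` call in Python. `missing_files` and `run_validation` are hypothetical helper names, and the flags are simply the ones used in the cell above.

```python
import os
import shutil
import subprocess


def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]


def run_validation(cif_path, ddl_path):
    """Run `gemmi validate -s --verbose <cif> -d <ddl>`.

    Returns the CompletedProcess, or None if an input file or the
    gemmi executable is missing.
    """
    missing = missing_files([cif_path, ddl_path])
    if missing:
        print(f"Missing input file(s): {', '.join(missing)}")
        return None
    if shutil.which("gemmi") is None:
        print("gemmi executable not found; install gemmi-program first.")
        return None
    cmd = ["gemmi", "validate", "-s", "--verbose", cif_path, "-d", ddl_path]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.stderr:
        print(result.stderr)
    return result


run_validation("1xxx.cif", "mmcif_pdbx.dic")
```

Keeping the returned object makes it possible to inspect `result.returncode` and `result.stdout` later, for example when looping over several entry files.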