<a href="https://colab.research.google.com/github/glevans/PDB_Notebooks/blob/main/GemmiRecipes/Extracting_collection_stats_from_mtz_with_Gemmi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Setting up the notebook

The cell below installs the `gemmi` package from the Python Package Index ([PyPI](https://pypi.org/)).

The `gemmi` Python library offers a versatile set of options that greatly help when working with structural biology data.

In [1]:
!pip install gemmi
import gemmi

Collecting gemmi
  Downloading gemmi-0.7.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (2.3 kB)
Downloading gemmi-0.7.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.6 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.6/2.6 MB[0m [31m29.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: gemmi
Successfully installed gemmi-0.7.3


The cell below installs **Gemmi-program**, which provides the `gemmi` command-line program and makes more options available.

Some of these options are not available from the `gemmi` Python library.

The options available *via* the command line are documented at
[Gemmi-program](https://gemmi.readthedocs.io/en/latest/program.html).

In this Jupyter notebook, we access these additional options by prefixing the command with `!`, as in `!gemmi`.
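The `!` prefix is IPython syntax, so it only works inside Jupyter or Colab. To make the same Gemmi-program calls from an ordinary Python script, a minimal sketch using the standard-library `subprocess` module is shown below. `run_command` is a hypothetical helper name, and the commented `gemmi` call assumes the `gemmi` executable installed by Gemmi-program is on your `PATH`.

```python
import subprocess
import sys


def run_command(args):
    """Run a command-line program and return its standard output as text.

    A plain-Python stand-in for IPython's `!command` syntax. Raises
    subprocess.CalledProcessError if the program exits with a non-zero status.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Self-check that needs no extra tools: run the current Python interpreter
print(run_command([sys.executable, "-c", "print('run_command works')"]))

# Assumed usage once gemmi-program has put the `gemmi` executable on PATH:
# appendix = run_command(["gemmi", "mtz", "-A", "staraniso_alldata-unique.mtz"])
```

Passing the arguments as a list avoids shell quoting issues. Because `check=True` is set, a failing command raises an exception instead of silently continuing.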

Examples using **Gemmi-program**:


```
# List all tags (names of the data and metadata items) in an mmCIF file
# '>' overwrites the output file; '>>' would append to it on every rerun
!gemmi tags structure.cif > structure_tags.txt
```
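For a rough, tool-free look at which tags a CIF file contains, the sketch below counts tag names in plain Python. It is a simplified approximation under stated assumptions, not how `gemmi tags` works internally: it only recognises tags at the start of a line and skips `;`-delimited text fields. `list_cif_tags` is a hypothetical helper name.

```python
from collections import Counter


def list_cif_tags(text):
    """Count how often each tag name appears in CIF text.

    Simplified scan for illustration only: a token that starts with '_' at the
    beginning of a line (outside ;-delimited text fields) is treated as a tag.
    """
    counts = Counter()
    in_text_field = False
    for line in text.splitlines():
        if line.startswith(";"):
            # A line beginning with ';' opens or closes a multi-line text field
            in_text_field = not in_text_field
            continue
        if in_text_field or not line.strip():
            continue
        token = line.split(maxsplit=1)[0]
        if token.startswith("_"):
            counts[token] += 1
    return counts
```

Usage, assuming `structure.cif` exists locally: `list_cif_tags(open("structure.cif").read()).most_common(10)` returns the ten most frequent tags. A tag is counted once per data block it appears in.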

In [2]:
!pip install gemmi-program

Collecting gemmi-program
  Downloading gemmi_program-0.7.3-py2.py3-none-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.6 kB)
Downloading gemmi_program-0.7.3-py2.py3-none-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.6 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.6/1.6 MB[0m [31m14.6 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: gemmi-program
Successfully installed gemmi-program-0.7.3


In [3]:
# Additional packages used for downloading and renaming files

import requests
import urllib.request
import os

## 2. Retrieve useful files

In [4]:
def download_file(url, save_as):
    """
    Downloads a file from the given URL and saves it locally.

    Parameters:
    - url (str): URL of the file to download.
    - save_as (str): Local filename to save the downloaded file.

    Returns:
    - str: Path to the saved file.
    """
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # Raise an error for bad status codes

    with open(save_as, 'wb') as f:
        f.write(response.content)

    print(f"File downloaded and saved as '{save_as}'")
    return save_as  # Return the path, as promised in the docstring


In [5]:
def download_github_file(raw_url: str, output_filename: str):
    """
    Downloads a file from a GitHub raw URL and saves it locally.

    Parameters:
    - raw_url (str): The direct raw URL to the file on GitHub.
    - output_filename (str): The name to save the downloaded file as.
    """
    try:
        urllib.request.urlretrieve(raw_url, output_filename)
        print(f"Downloaded and saved as '{output_filename}'.")
    except Exception as e:
        print(f"Download failed: {e}")

In [6]:
download_github_file(
    "https://raw.githubusercontent.com/glevans/PDB_Notebooks/main/GemmiRecipes/115347504_truncate-unique.mtz",
    "115347504_truncate-unique.mtz")

Download failed: HTTP Error 404: Not Found


In [7]:
# Define the original and new file names
old_filename1 = "115347504_truncate-unique.mtz"
new_filename1 = "truncate-unique.mtz"

# Check if the target file already exists
if os.path.exists(new_filename1):
    print(f"Target file '{new_filename1}' already exists. Rename aborted.")
else:
    # Attempt to rename the file
    try:
        os.rename(old_filename1, new_filename1)
        print(f"Renamed '{old_filename1}' to '{new_filename1}' successfully.")
    except FileNotFoundError:
        print(f"Source file '{old_filename1}' not found.")
    except Exception as e:
        print(f"Error renaming file: {e}")

Source file '115347504_truncate-unique.mtz' not found.


In [8]:
download_github_file(
    "https://raw.githubusercontent.com/glevans/PDB_Notebooks/main/GemmiRecipes/Example_Files/08112025_DLS/115347681_staraniso_alldata-unique.mtz",
    "115347681_staraniso_alldata-unique.mtz")

Downloaded and saved as '115347681_staraniso_alldata-unique.mtz'.


In [9]:
# Define the original and new file names
old_filename2 = "115347681_staraniso_alldata-unique.mtz"
new_filename2 = "staraniso_alldata-unique.mtz"

# Check if the target file already exists
if os.path.exists(new_filename2):
    print(f"Target file '{new_filename2}' already exists. Rename aborted.")
else:
    # Attempt to rename the file
    try:
        os.rename(old_filename2, new_filename2)
        print(f"Renamed '{old_filename2}' to '{new_filename2}' successfully.")
    except FileNotFoundError:
        print(f"Source file '{old_filename2}' not found.")
    except Exception as e:
        print(f"Error renaming file: {e}")

Renamed '115347681_staraniso_alldata-unique.mtz' to 'staraniso_alldata-unique.mtz' successfully.
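The two rename cells above repeat the same checks with different file names. A minimal sketch of a reusable version is shown below. `rename_if_absent` is a hypothetical helper name, and it catches `OSError` rather than the broader `Exception` used in the cells above.

```python
import os


def rename_if_absent(old_name, new_name):
    """Rename old_name to new_name unless new_name already exists.

    Returns True if the rename happened and False otherwise, printing the same
    messages as the rename cells in this notebook.
    """
    if os.path.exists(new_name):
        print(f"Target file '{new_name}' already exists. Rename aborted.")
        return False
    try:
        os.rename(old_name, new_name)
    except FileNotFoundError:
        print(f"Source file '{old_name}' not found.")
        return False
    except OSError as e:
        print(f"Error renaming file: {e}")
        return False
    print(f"Renamed '{old_name}' to '{new_name}' successfully.")
    return True


# Usage with the file names from the cells above:
# rename_if_absent("115347504_truncate-unique.mtz", "truncate-unique.mtz")
# rename_if_absent("115347681_staraniso_alldata-unique.mtz", "staraniso_alldata-unique.mtz")
```

Returning a boolean lets later cells decide whether to run `gemmi` on the renamed file.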


## 3. Helpful insight: Gemmi extraction options

In [10]:
!gemmi mtz -A truncate-unique.mtz

ERROR: Failed to open truncate-unique.mtz: No such file or directory


In [11]:
!gemmi mtz -A truncate-unique.mtz > truncate-unique_data_collection.cif

ERROR: Failed to open truncate-unique.mtz: No such file or directory
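Both calls above fail because the download in cell [6] returned HTTP 404, and `download_github_file` only prints the error instead of stopping. A small guard like the sketch below lets you check which inputs are absent before running the `gemmi` cells. The helper name `missing_files` is hypothetical.

```python
import os


def missing_files(paths):
    """Return the paths from `paths` that do not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]


# The two MTZ files this notebook expects after renaming
expected_inputs = ["truncate-unique.mtz", "staraniso_alldata-unique.mtz"]
absent = missing_files(expected_inputs)
if absent:
    print("Missing input(s), skip the matching gemmi cells:", ", ".join(absent))
```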


In [12]:
!gemmi mtz -A staraniso_alldata-unique.mtz


#MTZAPPENDIX-START
#MTZAPPENDIX-ITEM CIF DatasetID=1
data_1
# data quality metrics from unmerged data as calculated by MRFANA
_symmetry.space_group_name_H-M 'P 41 21 2'


_cell.length_a 57.929
_cell.length_b 57.929
_cell.length_c 151.528
_cell.angle_alpha 90.000
_cell.angle_beta 90.000
_cell.angle_gamma 90.000


_reflns.pdbx_ordinal 1
_reflns.details
;
Some remarks regarding the mmCIF items written, the PDB Exchange Dictionary (PDBx/mmCIF) Version 5.0 supporting the data files in the current PDB archive (dictionary version 5.325, last updated 2020-04-13: http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/) and the actual quantities provided by MRFANA (https://github.com/githubgphl/MRFANA) from the autoPROC package (https://www.globalphasing.com/autoproc/). In general, the mmCIF categories here should provide items that are currently used in the PDB archive. If there are alternatives, the one recommended by the PDB developers has been selected.

The distinction between *_all

In [13]:
!gemmi mtz -A staraniso_alldata-unique.mtz > staraniso_alldata-unique.cif
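Once the appendix is saved as a `.cif` file, single-value items such as `_cell.length_a` or `_symmetry.space_group_name_H-M` can be pulled out in plain Python. The sketch below is a minimal parser under stated assumptions. It only reads `_tag value` pairs written on one line. It skips comments (including the `#MTZAPPENDIX` markers), `;`-delimited text fields such as `_reflns.details`, and `loop_` tables. If a tag repeats across data blocks, the last value wins. `parse_cif_pairs` is a hypothetical helper name.

```python
import re

# A single-line "_category.item value" pair. Tags with no value on the same
# line (loop headers, items followed by a ;-text field) do not match.
PAIR_RE = re.compile(r"^\s*(_\S+)\s+(\S.*?)\s*$")


def parse_cif_pairs(text):
    """Collect single-line `_tag value` pairs from mmCIF-style text."""
    pairs = {}
    in_text_field = False
    for line in text.splitlines():
        if line.startswith(";"):
            # A line beginning with ';' opens or closes a multi-line text field
            in_text_field = not in_text_field
            continue
        if in_text_field or line.lstrip().startswith("#"):
            continue
        match = PAIR_RE.match(line)
        if not match:
            continue
        tag, value = match.groups()
        # Drop matching single or double quotes around the value
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        pairs[tag] = value
    return pairs
```

Usage, assuming the previous cell produced `staraniso_alldata-unique.cif`: `stats = parse_cif_pairs(open("staraniso_alldata-unique.cif").read())`, then `float(stats["_cell.length_a"])`. Shell-by-shell statistics stored in `loop_` tables are not captured by this sketch. For those tables, gemmi's own CIF reader (`gemmi.cif.read`) handles the full syntax and is the more robust choice.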