CIFFile

A comprehensive Python library for reading, creating, processing, validating, and writing Crystallographic Information Files (CIF), including Protein Data Bank Exchange macromolecular Crystallographic Information Files (PDBx/mmCIF) used by the Worldwide Protein Data Bank (wwPDB).

Features

📖 Read CIF Files: Parse CIF files from strings, file paths, or file-like objects
✏️ Create CIF Files: Build CIF files from tabular data (Polars, Pandas, dictionaries, etc.)
✅ Validate: Validate CIF files against DDL2 dictionaries
🧱 Cast: Automatically cast data values from strings into type-specific formats and back
💾 Write: Export CIF files with customizable formatting and styling
🔄 Convert: Transform between different data representations
🎯 Query: Access data blocks, save frames, categories, and items with intuitive indexing
🐼 DataFrames: Work seamlessly with Polars and Pandas DataFrames
🧪 Two Variants: Support for both CIF 1.1 and mmCIF formats

Installation

CIFFile requires Python 3.12 or later and can be installed from PyPI using pip:

pip install ciffile

Dependencies

Polars >= 1.0 (high-performance DataFrame library)
Pydantic >= 2.0 (data validation)
FileEx >= 0.2.10 (file handling utilities)
tqdm >= 4.0 (progress bars)

Quick Start

Reading CIF Files

import ciffile

# Read from file path
cif = ciffile.read("path/to/file.cif")

# Read from string content
cif_content = """
data_example
_item_name  'value'
loop_
_atom_site.id
_atom_site.symbol
1 C
2 N
"""
cif = ciffile.read(cif_content)

# Access data blocks
block = cif["example"]  # or cif[0]

# Access categories
atom_site = block["atom_site"]

# Access data as DataFrame
df = atom_site.df
print(df)
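
To make the mapping from CIF text to tabular data concrete, here is a minimal standard-library sketch of how a loop_ construct turns into columns. It is not how CIFFile parses files. `parse_simple_cif` and `_is_structural` are hypothetical helpers that handle only whitespace-separated tokens, quoted strings, single name/value pairs, and loops (no semicolon text fields, no save frames).

```python
import shlex


def _is_structural(token: str) -> bool:
    """True for tokens that end a run of loop values (names and keywords)."""
    lowered = token.lower()
    return token.startswith("_") or lowered == "loop_" or lowered.startswith("data_")


def parse_simple_cif(text: str) -> dict[str, dict[str, list[str]]]:
    """Parse a tiny CIF subset into {block_code: {data_name: [values]}}."""
    # shlex strips quotes and '#' comments; good enough for this subset only.
    tokens = shlex.split(text, comments=True)
    blocks: dict[str, dict[str, list[str]]] = {}
    current: dict[str, list[str]] = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.lower().startswith("data_"):
            current = blocks.setdefault(token[5:], {})
            i += 1
        elif token.lower() == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not _is_structural(tokens[i]):
                values.append(tokens[i])
                i += 1
            # Values are stored row by row; every len(names)-th one is a column.
            for col, name in enumerate(names):
                current[name] = values[col::len(names)]
        elif token.startswith("_"):
            current[token] = [tokens[i + 1]]
            i += 2
        else:
            raise ValueError(f"unexpected token: {token!r}")
    return blocks


sample = """
data_example
_item_name  'value'
loop_
_atom_site.id
_atom_site.symbol
1 C
2 N
"""
parsed = parse_simple_cif(sample)
print(parsed["example"]["_atom_site.symbol"])  # ['C', 'N']
```

Because quotes are stripped during tokenizing, a quoted value that begins with an underscore would be misread as a data name here; a real parser has to keep track of quoting.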

Creating CIF Files

import ciffile
import polars as pl

# Create from dictionary
data = {
	"block": ["my_data"] * 4,
	"category": ["atom_site"] * 2 + ["cell"] * 2,
	"keyword": ["id", "symbol", "length_a", "length_b"],
	"values": [["1", "2"], ["C", "N"], ["10.0"], ["20.0"]],
}
cif = ciffile.create(data, variant="mmcif")

# Write to string
print(cif)

# Write to file
with open("output.cif", "w") as f:
	cif.write(f.write)
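
As an illustration of what the long-form input above encodes, the following standard-library sketch renders the same block/category/keyword/values layout as CIF text. `render_cif` is a hypothetical helper, not part of CIFFile, and it deliberately skips quoting and alignment, so the library's real output will look different.

```python
def render_cif(data: dict[str, list]) -> str:
    """Render long-form rows (block, category, keyword, values) as CIF text."""
    # Regroup rows as {block: {category: {keyword: values}}}, keeping order.
    blocks: dict[str, dict[str, dict[str, list[str]]]] = {}
    for block, category, keyword, values in zip(
        data["block"], data["category"], data["keyword"], data["values"]
    ):
        blocks.setdefault(block, {}).setdefault(category, {})[keyword] = values

    lines = []
    for block, categories in blocks.items():
        lines.append(f"data_{block}")
        for category, items in categories.items():
            n_rows = len(next(iter(items.values())))
            if n_rows == 1:
                # Single-row categories become plain name/value pairs.
                for keyword, values in items.items():
                    lines.append(f"_{category}.{keyword} {values[0]}")
            else:
                # Multi-row categories become a loop_ table.
                lines.append("loop_")
                lines.extend(f"_{category}.{keyword}" for keyword in items)
                for row in zip(*items.values()):
                    lines.append(" ".join(row))
    return "\n".join(lines) + "\n"


example = {
    "block": ["my_data"] * 4,
    "category": ["atom_site"] * 2 + ["cell"] * 2,
    "keyword": ["id", "symbol", "length_a", "length_b"],
    "values": [["1", "2"], ["C", "N"], ["10.0"], ["20.0"]],
}
text = render_cif(example)
print(text)
```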

Validating CIF Files

import ciffile

# Read the data file to validate and the DDL2 dictionary file
cif = ciffile.read("path/to/file.cif")
dictionary_cif = ciffile.read("mmcif_pdbx_v50.dic")

# Convert to validator dictionary format
validator_dict = dictionary_cif.to_validator_dict(variant="ddl2")

# Create validator using the public API
validator = ciffile.validator(validator_dict)

# Validate and cast data (modifies cif in-place, returns error DataFrame)
errors = validator.validate(cif)

Usage Examples

Working with Data Blocks

# Iterate over blocks
for block in cif:
	print(f"Block: {block.code}")
	print(f"  Categories: {len(block)}")
	print(f"  Frames: {len(block.frames)}")

# Access multiple blocks
block1, block2 = cif[0, 1]

# Check if block exists
if "my_block" in cif:
	block = cif["my_block"]

Working with Categories

# Get category
category = block["atom_site"]

# Access as DataFrame
df = category.df

# Get item names
print(category.item_names)

# Iterate over items
for item in category:
	print(f"{item.name}: {item.value}")

# Set category keys (for sorting)
category.keys = ["id"]

Working with Save Frames

# Access save frames (for dictionary files)
frames = block.frames

# Get specific frame
frame = frames["atom_site.id"]

# Access frame categories
for category in frame:
	print(category.code)

Customizing Output Format

# Customize writing style
cif.write(
	writer=print,  # or file.write
	# String representations
	bool_true="yes",
	bool_false="no",
	null_str="?",
	empty_str=".",
	# Formatting
	list_style="horizontal",  # or "tabular", "vertical"
	table_style="tabular-horizontal",  # or "horizontal", "tabular-vertical", "vertical"
	space_items=3,
	min_space_columns=2,
	indent=0,
	indent_inner=2,
	delimiter_preference=("single", "double", "semicolon"),
)
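
To show what an ordered delimiter preference can mean in practice, here is a simplified sketch that picks the first usable delimiter for a value. `quote_value` is a hypothetical helper, not CIFFile's writer, and its rules are stricter than CIF 1.1 requires (for instance, it never places a quote character inside a string delimited by that same character).

```python
RESERVED_PREFIXES = ("data_", "save_", "loop_", "global_", "stop_")


def quote_value(value: str, preference=("single", "double", "semicolon")) -> str:
    """Return `value` delimited for CIF output, trying styles in order."""
    needs_quoting = (
        value == ""
        or any(ch.isspace() for ch in value)
        or value[0] in "_#$'\";[]"
        or value.lower().startswith(RESERVED_PREFIXES)
    )
    if not needs_quoting:
        return value

    multiline = "\n" in value or "\r" in value
    for style in preference:
        if style == "single" and not multiline and "'" not in value:
            return f"'{value}'"
        if style == "double" and not multiline and '"' not in value:
            return f'"{value}"'
        # A text field ends at a line starting with ';', so the value
        # itself must not contain such a line.
        if style == "semicolon" and "\n;" not in value:
            return f"\n;{value}\n;"
    raise ValueError(f"no usable delimiter for {value!r}")


print(quote_value("C"))
print(quote_value("hello world"))
print(quote_value("it's here"))
```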

Converting to Dictionary Format

# Convert DataFrame to nested dictionary
data_dict = category.to_id_dict(
	ids="id",  # or ["id1", "id2"] for multiple keys
	flat=False,  # nested structure
	single_row="value",  # return value directly for single rows
	multi_row="list",  # return list for multiple rows
)
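
The exact behavior of to_id_dict is defined by the library's docstrings. The sketch below is only one plausible reading of a nested, ID-keyed conversion, written against plain row dictionaries instead of a DataFrame. `rows_to_id_dict` and `_unwrap_single` are hypothetical helpers: IDs with a single row map to that row's remaining fields, and IDs with several rows map to a list.

```python
def _unwrap_single(node: dict, depth: int) -> None:
    """Replace one-element lists at the leaf level with their only element."""
    for key, child in node.items():
        if depth > 1:
            _unwrap_single(child, depth - 1)
        elif len(child) == 1:
            node[key] = child[0]


def rows_to_id_dict(rows: list[dict], ids) -> dict:
    """Group row dicts into a nested dict keyed by the `ids` column(s)."""
    id_cols = [ids] if isinstance(ids, str) else list(ids)
    result: dict = {}
    for row in rows:
        node = result
        # Walk (or create) one nesting level per leading ID column.
        for col in id_cols[:-1]:
            node = node.setdefault(row[col], {})
        payload = {k: v for k, v in row.items() if k not in id_cols}
        node.setdefault(row[id_cols[-1]], []).append(payload)
    _unwrap_single(result, depth=len(id_cols))
    return result


rows = [
    {"id": "1", "symbol": "C"},
    {"id": "2", "symbol": "N"},
    {"id": "2", "symbol": "O"},
]
by_id = rows_to_id_dict(rows, ids="id")
print(by_id)
```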

Extracting Categories Across Blocks

# Extract specific categories from all blocks/frames
categories = cif.category("atom_site", "cell")

# Returns dict of CIFDataCategory objects
atom_site_cat = categories["atom_site"]
cell_cat = categories["cell"]

# DataFrames include block/frame columns for tracking
print(atom_site_cat.df)

Architecture

CIFFile provides a hierarchical structure for CIF data:

CIFFile
├── CIFBlock (data_*)
│   ├── CIFDataCategory
│   │   └── CIFDataItem
│   └── CIFBlockFrames (save frames)
│       └── CIFFrame (save_*)
│           └── CIFDataCategory
│               └── CIFDataItem

Each level supports:

Indexing: Access by code/name or integer index
Iteration: Loop over contained elements
Length: Count of direct children
Membership: Check existence with the in operator
DataFrame representation: Access the underlying data as a Polars DataFrame
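
The container behavior listed above corresponds to Python's standard protocol methods. The following sketch shows the general pattern with a hypothetical `NamedContainer` class; it illustrates the protocol only and is not CIFFile's implementation.

```python
class NamedContainer:
    """Ordered collection of children, addressable by code or position."""

    def __init__(self, children: dict[str, object]):
        self._children = dict(children)

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # Several keys at once, e.g. container[0, 1]
            return [self[k] for k in key]
        if isinstance(key, int):
            return list(self._children.values())[key]
        return self._children[key]

    def __iter__(self):
        return iter(self._children.values())

    def __len__(self):
        return len(self._children)

    def __contains__(self, code):
        return code in self._children


blocks = NamedContainer({"first": "block one", "second": "block two"})
print(blocks["first"], blocks[1], len(blocks), "second" in blocks)
```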

Supported CIF Variants

CIF 1.1

Standard crystallographic CIF format
Data names without category requirement
Suitable for small molecule structures

mmCIF (PDBx/mmCIF)

Macromolecular CIF format
Data names must have category.keyword format
Used by Protein Data Bank
Supports large biological macromolecules
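
The category.keyword naming rule for mmCIF can be checked with a few lines of plain Python. `split_data_name` below is a hypothetical helper for illustration, not a CIFFile function; it splits at the first period and rejects names that lack either part.

```python
def split_data_name(name: str) -> tuple[str, str]:
    """Split an mmCIF data name such as '_atom_site.id' into its two parts."""
    if not name.startswith("_"):
        raise ValueError(f"data names must start with '_': {name!r}")
    category, sep, keyword = name[1:].partition(".")
    if not sep or not category or not keyword:
        raise ValueError(f"not in category.keyword form: {name!r}")
    return category, keyword


print(split_data_name("_atom_site.id"))  # ('atom_site', 'id')
```

A CIF 1.1 style name such as _cell_length_a has no period, so this helper raises ValueError for it.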

File Types

CIFFile distinguishes between:

Data Files: Regular CIF files with data blocks containing scientific data
Dictionary Files: CIF files with save frames defining data item semantics

Advanced Features

Parser Options

cif = ciffile.read(
	"file.cif",
	variant="mmcif",  # or "cif1"
	encoding="utf-8",
	case_normalization="lower",  # or "upper", None
	raise_level=2,  # 0: all errors, 1: errors only, 2: fatal only
	col_name_block="block",  # customize column names
	col_name_frame="frame",
	col_name_cat="category",
	col_name_key="keyword",
	col_name_values="values",
)

Isolating File Parts

# Separate data and dictionary sections
parts = cif.part("data", "dict")
data_file = parts["data"]
dict_file = parts["dict"]

# Or just one part
data_only = cif.part("data")

Converting Back to Strings

After validating and working with typed data, you can convert columns back to CIF string format for writing:

# Convert typed columns back to strings (in-place modification)
validator.values_to_str(cif)
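
For intuition about the round trip between CIF strings and typed values, here is a small standard-library sketch. `cast_column` and `column_to_str` are hypothetical helpers, and the conventions used (? for missing values, yes/no for booleans) are assumptions borrowed from the writer options shown earlier; the library derives types from the DDL2 dictionary instead of a `kind` argument.

```python
NULL_STR = "?"

CASTERS = {
    "int": int,
    "float": float,
    "str": str,
    "bool": lambda s: s.lower() == "yes",
}


def cast_column(values: list[str], kind: str) -> list:
    """Cast CIF strings to Python values; the null marker becomes None."""
    cast = CASTERS[kind]
    return [None if v == NULL_STR else cast(v) for v in values]


def column_to_str(values: list) -> list[str]:
    """Convert typed values back to CIF strings."""
    out = []
    for v in values:
        if v is None:
            out.append(NULL_STR)
        elif isinstance(v, bool):  # check bool first: bool is a subclass of int
            out.append("yes" if v else "no")
        else:
            out.append(str(v))
    return out


typed = cast_column(["1", "2", "?"], "int")
print(typed)                 # [1, 2, None]
print(column_to_str(typed))  # ['1', '2', '?']
```

A round trip like this does not always reproduce the original text: a float written as 10.00 would come back as 10.0, for example.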

Documentation

For more detailed examples and tutorials, see:

QUICKSTART.ipynb - Interactive Jupyter notebook with examples
Inline documentation - All classes and functions have comprehensive docstrings
Official CIF Specification
mmCIF Documentation

Testing

Run the test suite with pytest:

cd test
pytest -v

Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This library implements the CIF 1.1 specification as defined by the International Union of Crystallography (IUCr) and supports the PDBx/mmCIF format used by the Worldwide Protein Data Bank (wwPDB).

Note: Currently only CIF Version 1.1 is supported. CIF Version 2.0 support is planned for future releases.
