Macromolecule Census

Macromolecule Census is a set of tools for creating machine-learning datasets from macromolecular structure data, especially data made available by the Protein Data Bank (PDB). These tools are meant to:

- Filter for high-quality (e.g. high-resolution, low R-factor) and low-redundancy (i.e. below a sequence identity cutoff) structures.
- Make robust training/validation/test splits by accounting for domain-level structural similarities.
- Store atomic coordinates in a compact, portable, standard format (SQLite).
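
The three goals above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the macromol_census API or its database schema: the table layout, the column names (``resolution_A``, ``r_free``, ``domain_cluster``), the cutoff values, and the helper functions are all assumptions made for illustration, and the structure IDs in the demo are made-up placeholders. The sketch stores structures and atoms in SQLite, filters by quality, and assigns whole clusters of similar structures to a single split.

```python
import hashlib
import sqlite3

# Hypothetical schema for illustration only; not the macromol_census schema.
SCHEMA = """
CREATE TABLE structure (
    pdb_id TEXT PRIMARY KEY,
    resolution_A REAL,       -- lower is better
    r_free REAL,             -- lower is better
    domain_cluster INTEGER   -- assumed to come from a prior clustering step
);
CREATE TABLE atom (
    pdb_id TEXT REFERENCES structure(pdb_id),
    element TEXT,
    x REAL, y REAL, z REAL
);
"""


def build_db(structures, atoms):
    """Create an in-memory SQLite database and load the given rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO structure VALUES (?, ?, ?, ?)", structures)
    db.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?)", atoms)
    db.commit()
    return db


def select_high_quality(db, max_resolution_A=2.5, max_r_free=0.25):
    """Return (pdb_id, domain_cluster) for structures passing both cutoffs."""
    rows = db.execute(
        "SELECT pdb_id, domain_cluster FROM structure "
        "WHERE resolution_A <= ? AND r_free <= ? "
        "ORDER BY pdb_id",
        (max_resolution_A, max_r_free),
    )
    return rows.fetchall()


def assign_split(cluster_id, val_frac=0.1, test_frac=0.1):
    """Map a whole cluster to one split using a stable hash.

    Every structure in a cluster gets the same split, so structurally
    similar entries never end up on both sides of a train/test boundary.
    """
    digest = hashlib.sha256(str(cluster_id).encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


if __name__ == "__main__":
    # Made-up placeholder rows, not real PDB statistics.
    db = build_db(
        structures=[
            ("1abc", 1.8, 0.21, 7),
            ("2xyz", 3.2, 0.30, 7),
            ("3def", 2.0, 0.19, 9),
        ],
        atoms=[("1abc", "C", 0.0, 1.0, 2.0)],
    )
    for pdb_id, cluster in select_high_quality(db):
        print(pdb_id, cluster, assign_split(cluster))
```

Hashing the cluster identifier, rather than shuffling individual structures, keeps the assignment reproducible across runs and guarantees that related structures stay together. A real pipeline would also need a redundancy-reduction step and a principled way to derive the clusters, both of which are outside the scope of this sketch.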


kalekundert/macromol_census
