
Adding new resources in pyMBE

Pablo M. Blanco edited this page May 7, 2024 · 3 revisions

pyMBE stores several resources from the literature, including:

  • Sets of $pK_a$ values
  • Sets of parameters of coarse-grained models of peptides

These resources must be formatted as JSON files, which are both machine-readable and human-readable, and they must include metadata about the publication from which the data was taken. For example:

{
  "metadata": {
    "summary": "pKa-values from CRC 72nd edition",
    "source": "Handbook of Chemistry and Physics, 72nd Edition, CRC Press, Boca Raton, FL, 1991.",
    "isbn": "0-8493-0565-9",
    "citekey": "lide1991a"
  },
  "data": {
    "D": {"pka_value": 3.65, "acidity": "acidic"},
    "E": {"pka_value": 4.25, "acidity": "acidic"}
  }
}
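A resource laid out this way can be checked with the Python standard library before it is added. The following is a minimal sketch, not pyMBE's own loader: the function name `parse_resource` and the list of required metadata keys are assumptions made for illustration, based only on the example above.

```python
import json

# Assumption: these key names mirror the example above; pyMBE itself may
# require a different set of metadata fields.
REQUIRED_METADATA_KEYS = ("summary", "source", "citekey")


def parse_resource(text):
    """Parse a resource given as JSON text and run basic structure checks.

    Hypothetical helper, not part of the pyMBE API.
    """
    resource = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    for section in ("metadata", "data"):
        if section not in resource:
            raise ValueError(f"missing top-level section: {section!r}")
    missing = [k for k in REQUIRED_METADATA_KEYS if k not in resource["metadata"]]
    if missing:
        raise ValueError(f"missing metadata keys: {missing}")
    return resource


example = """
{
  "metadata": {
    "summary": "pKa-values from CRC 72nd edition",
    "source": "Handbook of Chemistry and Physics, 72nd Edition, CRC Press, Boca Raton, FL, 1991.",
    "isbn": "0-8493-0565-9",
    "citekey": "lide1991a"
  },
  "data": {
    "D": {"pka_value": 3.65, "acidity": "acidic"},
    "E": {"pka_value": 4.25, "acidity": "acidic"}
  }
}
"""

resource = parse_resource(example)
for name, entry in resource["data"].items():
    print(name, entry["pka_value"], entry["acidity"])
```

Because `json.loads` is strict, a syntax slip such as a missing comma between two metadata fields is reported as a `json.JSONDecodeError` rather than silently accepted.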

This format follows several guidelines of FAIR data[^fair-data-wikipedia],[^fair-data-wilkinson], in particular:

  • F2. Data are described with rich metadata
  • I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
  • I3. (Meta)data include qualified references to other (meta)data

Regarding an accessible language for knowledge representation: a file that is only JSON-like, for example one containing comment lines, cannot be read by a JSON parser, because the JSON standard does not permit comments. There are competing standards such as JSON5 ("JSON5 Data Interchange Format") or Microsoft JSONC ("JSON with comments"), but those two use C-style comments like // or /* */ instead of the pound symbol (#). Strict JSON is also more human-readable, as it can be split over multiple lines or opened in Firefox, which has a built-in JSON viewer. Pretty-printing existing JSON files can be done automatically in Python:

import json
pka_set = json.loads(
  '{"D": {"pka_value": 3.65, "acidity": "acidic"}, "E": {"pka_value": 4.25, "acidity": "acidic"}}'
)
print(json.dumps(pka_set, indent=2))

Output:

{
  "D": {
    "pka_value": 3.65,
    "acidity": "acidic"
  },
  "E": {
    "pka_value": 4.25,
    "acidity": "acidic"
  }
}
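The point about comment lines can be checked the same way. The sketch below feeds a JSON-like text with a pound-symbol comment to Python's standard `json` module. The sample text and the helper name `is_valid_json` are hypothetical and used only for illustration.

```python
import json

# Hypothetical JSON-like text that starts with a pound-symbol comment line.
json_like = """
# pKa-values
{"D": {"pka_value": 3.65, "acidity": "acidic"}}
"""

# The same data as strict JSON, without the comment line.
strict = '{"D": {"pka_value": 3.65, "acidity": "acidic"}}'


def is_valid_json(text):
    """Return True if the standard-library parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(json_like))  # False: the comment line is rejected
print(is_valid_json(strict))     # True
```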

References:

[^fair-data-wikipedia]: Concise summary of FAIR data on Wikipedia: https://en.wikipedia.org/wiki/FAIR_data

[^fair-data-wilkinson]: Wilkinson et al. 2016. "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data. 3(1): 160018. doi:10.1038/SDATA.2016.18