PySCF QCSchema is not JSON serializable #1363

Open
S-Erik opened this issue May 14, 2024 · 0 comments

Environment

  • Qiskit Nature version: 0.7.2
  • Python version: 3.12.1
  • Operating system: Ubuntu 22.04.4 LTS
  • PySCF version: 2.5.0

What is happening?

The QCSchema object produced by a PySCFDriver is not JSON serializable: calling the to_json() method on it raises TypeError: Object of type int64 is not JSON serializable.

How can we reproduce the issue?

The following code results in the error TypeError: Object of type int64 is not JSON serializable:

from qiskit_nature.units import DistanceUnit
from qiskit_nature.second_q.drivers import PySCFDriver

driver = PySCFDriver(
    atom="H 0 0 0; H 0 0 0.735",
    basis="sto3g",
    charge=0,
    spin=0,
    unit=DistanceUnit.ANGSTROM,
)

problem = driver.run()

schema = driver.to_qcschema()

# Trying to convert QCSchema to JSON
schema.to_json()

Output:

Traceback (most recent call last):
  File "pyscf_json.py", line 17, in <module>
    schema.to_json()
  File "../qiskit-nature/qiskit_nature/second_q/formats/qcschema/qc_base.py", line 67, in to_json
    return json.dumps(self.to_dict(), indent=2)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/usr/lib/python3.12/json/encoder.py", line 202, in encode
    chunks = list(chunks)
             ^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.12/json/encoder.py", line 326, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.12/json/encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "/usr/lib/python3.12/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable
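
The traceback shows to_json() delegating to json.dumps(self.to_dict(), indent=2), so a possible caller-side workaround is to serialize the dictionary directly and pass a default hook. This is only a sketch and not part of Qiskit Nature: np_default is a hypothetical helper name, and the payload dictionary merely stands in for the output of schema.to_dict().

```python
import json

import numpy as np


def np_default(obj):
    """Fallback for json.dumps: turn NumPy scalars and arrays into native Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    # Keep the standard behaviour for everything else.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical stand-in for schema.to_dict(); the real dictionary has many more fields.
payload = {
    "properties": {"calcinfo_nbasis": np.int64(2)},
    "molecule": {"masses": [np.int64(1), np.int64(1)]},
}

text = json.dumps(payload, indent=2, default=np_default)
print(text)
```

With a real schema the call would presumably look like json.dumps(schema.to_dict(), indent=2, default=np_default).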

What should happen?

I expect the QCSchema produced by the PySCFDriver to be JSON serializable, so the code above should run without errors.

Any suggestions?

PySCF stores some properties as NumPy scalars, e.g. the Mole.nao property, which is extracted in pyscfdriver.py and is of type numpy.int64. These NumPy scalars are never converted to native Python types (int, float, etc.), neither in electronic_structure_driver.py nor in pyscfdriver.py. As a result, some properties in the QCSchema object remain NumPy scalars and are not JSON serializable.
The return value of PySCF's atom_mass_list is also converted incorrectly in pyscfdriver.py: wrapping it in list() yields a list of NumPy scalars instead of a list of native Python values. Since atom_mass_list returns a numpy ndarray, I think its tolist method should be used instead.
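
The difference between list() and tolist() can be checked without PySCF. In this minimal sketch the masses array is a made-up stand-in for the return value of atom_mass_list(); its dtype is set to int64 only to match the error message above.

```python
import json

import numpy as np

# Stand-in for the ndarray returned by atom_mass_list() (values are illustrative).
masses = np.array([1, 1], dtype=np.int64)

via_list = list(masses)       # elements stay numpy.int64
via_tolist = masses.tolist()  # elements become Python int

print(type(via_list[0]).__name__, type(via_tolist[0]).__name__)  # int64 int

try:
    json.dumps(via_list)
    list_ok = True
except TypeError:
    # json does not know how to encode numpy.int64
    list_ok = False

tolist_json = json.dumps(via_tolist)
print(list_ok, tolist_json)  # False [1, 1]
```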

Therefore, I suggest the following changes to the files electronic_structure_driver.py and pyscfdriver.py:
pyscfdriver.py line 558:

- data.masses = list(self._mol.atom_mass_list())
+ data.masses = self._mol.atom_mass_list().tolist()

electronic_structure_driver.py line 227 onward:

-        properties = QCProperties()
-        properties.calcinfo_natom = len(data.symbols) if data.symbols is not None else None
-        properties.calcinfo_nbasis = data.nbasis
-        properties.calcinfo_nmo = data.nmo
-        properties.calcinfo_nalpha = data.nalpha
-        properties.calcinfo_nbeta = data.nbeta
-        properties.return_energy = data.e_ref
-        properties.nuclear_repulsion_energy = data.e_nuc
-        properties.nuclear_dipole_moment = data.dip_nuc
-        properties.scf_dipole_moment = data.dip_ref

-        def format_np_array(arr):
-            if isinstance(arr, Tensor):
-                # NOTE: this also deals with symmetry-reduced integral classes and ensures that
-                # they are not automatically unfolded to 1-fold symmetry
-                arr = arr.array
-            return arr.ravel().tolist()

+        def format_np_generic(value):
+            # Convert numpy generic types, like numpy.int64, to their Python equivalents
+            if isinstance(value, np.generic):
+                value = value.item()
+            return value

+        def format_np_array(arr):
+            if isinstance(arr, Tensor):
+                # NOTE: this also deals with symmetry-reduced integral classes and ensures that
+                # they are not automatically unfolded to 1-fold symmetry
+                arr = arr.array
+            return arr.ravel().tolist()

+        properties = QCProperties()
+        properties.calcinfo_natom = len(data.symbols) if data.symbols is not None else None
+        properties.calcinfo_nbasis = format_np_generic(data.nbasis)
+        properties.calcinfo_nmo = format_np_generic(data.nmo)
+        properties.calcinfo_nalpha = format_np_generic(data.nalpha)
+        properties.calcinfo_nbeta = format_np_generic(data.nbeta)
+        properties.return_energy = format_np_generic(data.e_ref)
+        properties.nuclear_repulsion_energy = format_np_generic(data.e_nuc)
+        properties.nuclear_dipole_moment = format_np_array(data.dip_nuc)
+        properties.scf_dipole_moment = format_np_array(data.dip_ref)

electronic_structure_driver.py line 335:

- return_result=data.e_ref,
+ return_result=format_np_generic(data.e_ref),
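
The scalar conversion proposed above can also be tried in isolation. A minimal sketch, reusing the format_np_generic name from the diff; the raw dictionary is a made-up stand-in for a few QCProperties fields, and the energy value is illustrative only.

```python
import json

import numpy as np


def format_np_generic(value):
    """Convert a NumPy scalar to the equivalent Python scalar; pass other values through."""
    if isinstance(value, np.generic):
        return value.item()
    return value


# Hypothetical stand-in for a few QCProperties fields.
raw = {
    "calcinfo_natom": None,               # None passes through unchanged
    "calcinfo_nalpha": 1,                 # already a Python int
    "calcinfo_nbasis": np.int64(2),       # NumPy integer scalar
    "return_energy": np.float64(-1.1),    # illustrative value, not a computed energy
}

converted = {key: format_np_generic(value) for key, value in raw.items()}
serialized = json.dumps(converted)
print(serialized)
```

Because the helper leaves None and native values untouched, it can wrap every assignment without extra checks.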

Further, it seems reasonable to me to add unit tests covering the to_json and to_hdf5 methods of the QCSchema produced by the PySCFDriver. I am thinking about something like this in test_driver_pyscf.py:

    def test_to_json(self):
        """Check JSON-serializability of the driver"""
        driver = PySCFDriver(
            atom="H .0 .0 .0; H .0 .0 0.735",
            unit=DistanceUnit.ANGSTROM,
            charge=0,
            spin=0,
            basis="sto3g",
        )
        _driver_result = driver.run()
        schema = driver.to_qcschema()
        schema.to_json()

    def test_to_hdf5(self):
        """Check HDF5-serializability of the driver"""
        driver = PySCFDriver(
            atom="H .0 .0 .0; H .0 .0 0.735",
            unit=DistanceUnit.ANGSTROM,
            charge=0,
            spin=0,
            basis="sto3g",
        )
        _driver_result = driver.run()
        schema = driver.to_qcschema()

        # Assumes h5py, pathlib.Path and tempfile.TemporaryDirectory are imported in the test module.
        with TemporaryDirectory() as tmp_dir:
            file_path = Path(tmp_dir) / "tmp.hdf5"
            with h5py.File(file_path, "w") as file:
                schema.to_hdf5(file)
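
A test like test_to_json only reports that serialization failed, not which field caused it. One option is a small diagnostic helper that walks the dictionary and lists every value json.dumps would reject by default. This is only a sketch: find_non_json_values is a hypothetical name, and decimal.Decimal is used as a stand-in for a non-native numeric type so that the snippet runs without NumPy or PySCF.

```python
from decimal import Decimal

# Leaf types that json.dumps encodes without a custom default hook.
JSON_NATIVE = (str, int, float, bool, type(None))


def find_non_json_values(obj, path="$"):
    """Return (path, type name) pairs for values json.dumps cannot encode by default."""
    if isinstance(obj, dict):
        found = []
        for key, value in obj.items():
            found += find_non_json_values(value, f"{path}.{key}")
        return found
    if isinstance(obj, (list, tuple)):
        found = []
        for index, value in enumerate(obj):
            found += find_non_json_values(value, f"{path}[{index}]")
        return found
    if isinstance(obj, JSON_NATIVE):
        return []
    return [(path, type(obj).__name__)]


# Made-up nested dictionary; Decimal plays the role of numpy.int64 here.
sample = {
    "properties": {"calcinfo_nbasis": Decimal(2)},
    "molecule": {"masses": [1, Decimal(1)]},
}
offenders = find_non_json_values(sample)
print(offenders)
```

In the unit test, asserting that find_non_json_values(schema.to_dict()) is an empty list would then name the offending fields in the failure message.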

Please let me know what you think of the suggested changes. If you wish, I can prepare a pull request to resolve this issue.

@S-Erik S-Erik added the bug label May 14, 2024
@S-Erik S-Erik changed the title PySCF is not JSON serializable PySCF QCSchema is not JSON serializable May 14, 2024