
Commit

Merge pull request #6 from deepmodeling/devel
Devel update
iProzd committed Apr 23, 2021
2 parents 7374920 + 7defc15 commit 7f45c18
Showing 76 changed files with 4,336 additions and 653 deletions.
79 changes: 28 additions & 51 deletions README.md
@@ -5,48 +5,38 @@

# Table of contents
- [About DeePMD-kit](#about-deepmd-kit)
- [Highlights in v2.0](#highlights-in-deepmd-kit-v20)
- [Highlighted features](#highlighted-features)
- [License and credits](#license-and-credits)
- [Deep Potential in a nutshell](#deep-potential-in-a-nutshell)
- [Download and install](#download-and-install)
- [Use DeePMD-kit](#use-deepmd-kit)
- [Code structure](#code-structure)
- [Troubleshooting](#troubleshooting)

# About DeePMD-kit
DeePMD-kit is a package written in Python/C++, designed to minimize the effort required to build deep learning based models of interatomic potential energy and force fields and to perform molecular dynamics (MD). This brings new hope to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems.

For more information, check the [documentation](https://deepmd.readthedocs.io/).

## Highlights in DeePMD-kit v2.0

* [Model compression](doc/use-deepmd-kit.md#compress-a-model). Accelerates model inference by 4-15 times.
* [New descriptors](doc/use-deepmd-kit.md#write-the-input-script). Including [`se_e2_r`](doc/train-se-e2-r.md) and [`se_e3`](doc/train-se-e3.md).
* [Hybridization of descriptors](doc/train-hybrid.md). A hybrid descriptor constructed by concatenating several descriptors.
* Atom type embedding.
* Training and inference of the dipole (vector) and polarizability (matrix).
* Splitting of training and validation datasets (see the sketch after this list).
* Optimized training on GPUs.
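
A minimal sketch of what the training/validation split looks like in a v2.0 training input is given below; the `training_data`/`validation_data` keys are not shown in this diff, so treat the snippet as illustrative, and the system paths are placeholders.
```json
"training": {
    "training_data": {
        "systems": ["../data/data_0", "../data/data_1"],
        "batch_size": "auto"
    },
    "validation_data": {
        "systems": ["../data/data_2"],
        "batch_size": 1
    }
}
```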


## Highlighted features
* **interfaced with TensorFlow**, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. In addition, TensorBoard can be used to visualize the training procedure.
* **interfaced with high-performance classical MD and quantum (path-integral) MD packages**, i.e., LAMMPS and i-PI, respectively.
* **implements the Deep Potential series models**, which have been successfully applied to finite and extended systems, including organic molecules, metals, semiconductors, and insulators.
* **implements MPI and GPU support**, making it highly efficient for high-performance parallel and distributed computing.
* **highly modularized**, easy to adapt to different descriptors for deep learning based potential energy models.

## Code structure
The code is organized as follows:

* `data/raw`: tools manipulating the raw data files.

* `examples`: example json parameter files.

* `source/3rdparty`: third-party packages used by DeePMD-kit.

* `source/cmake`: cmake scripts for building.

* `source/ipi`: source code of i-PI client.

* `source/lib`: source code of DeePMD-kit library.

* `source/lmp`: source code of the LAMMPS module.

* `source/op`: TensorFlow op implementations, working with the library.

* `source/train`: Python modules and scripts for training and testing.


## License and credits
The project DeePMD-kit is licensed under [GNU LGPLv3.0](./LICENSE).
If you use this code in any future publications, please cite this using
@@ -87,43 +77,30 @@ A quick-start on using DeePMD-kit can be found [here](doc/use-deepmd-kit.md).
A full [document](doc/train-input.rst) on options in the training input script is available.


# Troubleshooting
Problems may occur because of differences between computers and systems. Some common circumstances are listed below.
If other unexpected problems occur, you are welcome to contact us for help.
# Code structure
The code is organized as follows:

## Model compatibility
* `data/raw`: tools manipulating the raw data files.

When the version of DeePMD-kit used to train the model differs from the version of DeePMD-kit running the MD, one has the problem of model compatibility.
* `examples`: examples.

DeePMD-kit guarantees that codes with the same major and minor revisions are compatible. That is to say, v0.12.5 is compatible with v0.12.0, but not with v0.11.0 or v1.0.0.
* `deepmd`: DeePMD-kit python modules.

## Installation: inadequate versions of gcc/g++
Sometimes the available gcc/g++ is of version <4.9. If you have a gcc/g++ of version >4.9, say 7.2.0, you may choose to use it by doing
```bash
export CC=/path/to/gcc-7.2.0/bin/gcc
export CXX=/path/to/gcc-7.2.0/bin/g++
```
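
To double-check which compilers will be picked up, one can, for example, run
```bash
$CC --version
$CXX --version
```
Note that the exported variables only take effect for a fresh `cmake` configuration, not for an already-configured build directory.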
* `source/api_cc`: source code of DeePMD-kit C++ API.

If, for whatever reason, you only have a gcc/g++ of version 4.8.5, you can still compile all parts of TensorFlow and most parts of DeePMD-kit; i-PI will be disabled automatically.
* `source/ipi`: source code of i-PI client.

## Installation: build files left in DeePMD-kit
When you try to build DeePMD-kit a second time, files produced by the previous build may cause the build to fail. You may clear them by
```bash
cd build
rm -r *
```
and redo the `cmake` process.
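
Equivalently, one can rebuild from a clean build directory. A sketch under the usual from-source setup; the variables `$tensorflow_root` and `$deepmd_root` are placeholders for your TensorFlow C++ prefix and install prefix, and the cmake options shown are only illustrative:
```bash
cd $deepmd_source_dir/source
rm -rf build && mkdir build && cd build
cmake -DTENSORFLOW_ROOT=$tensorflow_root -DCMAKE_INSTALL_PREFIX=$deepmd_root ..
make -j4
make install
```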
* `source/lib`: source code of DeePMD-kit library.

* `source/lmp`: source code of the LAMMPS module.

## MD: cannot run LAMMPS after installing a new version of DeePMD-kit
This typically happens when you install a new version of DeePMD-kit, copy the generated `USER-DEEPMD` directly into the LAMMPS source code folder, and re-install LAMMPS.
* `source/op`: TensorFlow op implementations, working with the library.

To solve this problem, it suffices to first remove `USER-DEEPMD` from the LAMMPS source code by
```bash
make no-user-deepmd
```
and then install the new `USER-DEEPMD`.

If this does not solve your problem, try decompressing the LAMMPS source tarball and installing LAMMPS from scratch again, which typically should be very fast.
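
For reference, re-enabling the package afterwards follows the same pattern. A sketch assuming the traditional make-based LAMMPS build; the machine target is whatever you normally use:
```bash
# run from the LAMMPS src/ directory after copying in the new USER-DEEPMD
make yes-user-deepmd
make mpi
```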

# Troubleshooting

See the [troubleshooting page](doc/troubleshooting.md).


[1]: http://www.global-sci.com/galley/CiCP-2017-0213.pdf
12 changes: 10 additions & 2 deletions deepmd/loss/tensor.py
@@ -78,6 +78,9 @@ def build (self,
polar_hat = label_dict[self.label_name]
polar = model_dict[self.tensor_name]

# YWolfeee: get the 2 norm of label, i.e. polar_hat
normalized_term = tf.sqrt(tf.reduce_sum(tf.square(polar_hat)))

# YHT: added for global / local dipole combination
l2_loss = global_cvt_2_tf_float(0.0)
more_loss = {
@@ -117,7 +120,7 @@ def build (self,
self.l2_loss_global_summary = tf.summary.scalar('l2_global_loss',
tf.sqrt(more_loss['global_loss']) / global_cvt_2_tf_float(atoms))

# YHT: should only consider atoms with dipole, i.e. atoms
# YWolfeee: should only consider atoms with dipole, i.e. atoms
# atom_norm = 1./ global_cvt_2_tf_float(natoms[0])
atom_norm = 1./ global_cvt_2_tf_float(atoms)
global_loss *= atom_norm
@@ -128,7 +131,12 @@ def build (self,
self.l2_l = l2_loss

self.l2_loss_summary = tf.summary.scalar('l2_loss', tf.sqrt(l2_loss))
return l2_loss, more_loss

# YWolfeee: loss normalization, do not influence the printed loss,
# just change the training process
#return l2_loss, more_loss
return l2_loss / normalized_term, more_loss


def eval(self, sess, feed_dict, natoms):
atoms = 0
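
The change above scales the returned training loss by the 2-norm of the label tensor. A minimal NumPy sketch of the idea, illustrative only and not the DeePMD-kit API:
```python
import numpy as np

def normalized_l2_loss(pred: np.ndarray, label: np.ndarray) -> float:
    """L2 loss divided by the 2-norm of the label, mirroring the diff above."""
    l2 = np.sum((pred - label) ** 2)
    norm = np.sqrt(np.sum(label ** 2))  # plays the role of normalized_term
    return l2 / norm

# example: a 3-component (dipole-like) label
label = np.array([1.0, 2.0, 2.0])
pred = np.array([1.1, 1.9, 2.2])
print(normalized_l2_loss(pred, label))
```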
12 changes: 6 additions & 6 deletions deepmd/train/trainer.py
@@ -113,22 +113,22 @@ def _init_param(self, jdata):
# elif fitting_type == 'wfc':
# self.fitting = WFCFitting(fitting_param, self.descrpt)
elif fitting_type == 'dipole':
if descrpt_type == 'se_a':
if descrpt_type == 'se_e2_a':
self.fitting = DipoleFittingSeA(**fitting_param)
else :
raise RuntimeError('fitting dipole only supports descrptors: se_a')
raise RuntimeError('fitting dipole only supports descrptors: se_e2_a')
elif fitting_type == 'polar':
# if descrpt_type == 'loc_frame':
# self.fitting = PolarFittingLocFrame(fitting_param, self.descrpt)
if descrpt_type == 'se_a':
if descrpt_type == 'se_e2_a':
self.fitting = PolarFittingSeA(**fitting_param)
else :
raise RuntimeError('fitting polar only supports descrptors: loc_frame and se_a')
raise RuntimeError('fitting polar only supports descrptors: loc_frame and se_e2_a')
elif fitting_type == 'global_polar':
if descrpt_type == 'se_a':
if descrpt_type == 'se_e2_a':
self.fitting = GlobalPolarFittingSeA(**fitting_param)
else :
raise RuntimeError('fitting global_polar only supports descrptors: loc_frame and se_a')
raise RuntimeError('fitting global_polar only supports descrptors: loc_frame and se_e2_a')
else :
raise RuntimeError('unknow fitting type ' + fitting_type)

27 changes: 24 additions & 3 deletions deepmd/utils/argcheck.py
@@ -393,19 +393,40 @@ def loss_ener():
Argument("relative_f", [float,None], optional = True, doc = doc_relative_f)
]

# YWolfeee: Modified to support tensor type of loss args.
def loss_tensor(default_mode):
if default_mode == "local":
doc_global_weight = "The prefactor of the weight of global loss. It should be larger than or equal to 0. If not provided, training will be atomic mode, i.e. atomic label should be provided."
doc_local_weight = "The prefactor of the weight of atomic loss. It should be larger than or equal to 0. If it's not provided and global weight is provided, training will be global mode, i.e. global label should be provided. If both global and atomic weight are not provided, training will be atomic mode, i.e. atomic label should be provided."
return [
Argument("pref_weight", [float,int], optional = True, default = None, doc = doc_global_weight),
Argument("pref_atomic_weight", [float,int], optional = True, default = None, doc = doc_local_weight),
]
else:
doc_local_weight = "The prefactor of the weight of atomic loss. It should be larger than or equal to 0. If not provided, training will be global mode, i.e. global label should be provided."
doc_global_weight = "The prefactor of the weight of global loss. It should be larger than or equal to 0. If it's not provided and atomic weight is provided, training will be atomic mode, i.e. atomic label should be provided. If both global and atomic weight are not provided, training will be global mode, i.e. global label should be provided."
return [
Argument("pref_weight", [float,int], optional = True, default = None, doc = doc_global_weight),
Argument("pref_atomic_weight", [float,int], optional = True, default = None, doc = doc_local_weight),
]

def loss_variant_type_args():
doc_loss = 'The type of the loss. \n\.'
doc_loss = 'The type of the loss. The loss type should be set to the fitting type or left unset.\n\.'


return Variant("type",
[Argument("ener", dict, loss_ener())],
[Argument("ener", dict, loss_ener()),
Argument("dipole", dict, loss_tensor("local")),
Argument("polar", dict, loss_tensor("local")),
Argument("global_polar", dict, loss_tensor("global"))
],
optional = True,
default_tag = 'ener',
doc = doc_loss)


def loss_args():
doc_loss = 'The definition of loss function. The type of the loss depends on the type of the fitting. For fitting type `ener`, the prefactors before energy, force, virial and atomic energy losses may be provided. For fitting type `dipole`, `polar` and `global_polar`, the loss may be an empty `dict` or unset.'
doc_loss = 'The definition of loss function. The loss type should be set to the fitting type or left unset.\n\.'
ca = Argument('loss', dict, [],
[loss_variant_type_args()],
optional = True,
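
With the arguments above, a tensor loss switches between atomic and global training via the two prefactor weights. A hypothetical `loss` section of the training input, using only the keys defined in this diff (values illustrative):
```json
"loss": {
    "type": "polar",
    "pref_atomic_weight": 1.0
}
```
Per the docstrings, providing only `pref_weight` instead selects global training, and omitting both falls back to the default mode of the loss type (atomic for `dipole`/`polar`, global for `global_polar`).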
41 changes: 41 additions & 0 deletions doc/data-conv.md
@@ -0,0 +1,41 @@
# Data


In this example we will convert DFT-labeled data stored in the VASP `OUTCAR` format into the data format used by DeePMD-kit. The example `OUTCAR` can be found in the directory.
```bash
$deepmd_source_dir/examples/data_conv
```


## Definition

DeePMD-kit organizes data in **`systems`**. Each `system` is composed of a number of **`frames`**. One may roughly view a `frame` as a snapshot of an MD trajectory, but it does not necessarily come from an MD simulation. A `frame` records the coordinates and types of the atoms, the cell vectors (if periodic boundary conditions are assumed), the energy, the atomic forces and the virial. Note that the `frames` in one `system` share the same number of atoms of the same types.



## Data conversion

It is convenient to use [dpdata](https://github.com/deepmodeling/dpdata) to convert data generated by DFT packages into the data format used by DeePMD-kit.

To install one can execute
```bash
pip install dpdata
```

An example of converting [VASP](https://www.vasp.at/) data in `OUTCAR` format to DeePMD-kit data can be found at
```
$deepmd_source_dir/examples/data_conv
```

Switch to that directory; then the data can be converted with the following Python script:
```python
import dpdata
dsys = dpdata.LabeledSystem('OUTCAR')
dsys.to('deepmd/npy', 'deepmd_data', set_size = dsys.get_nframes())
```

The `get_nframes()` method gets the number of frames in the `OUTCAR`, and the argument `set_size` enforces that the set size is equal to the number of frames in the system, i.e. only one `set` is created in the `system`.
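
If several smaller sets are preferred instead of a single one, a variant using only the `dpdata` calls already shown above would be:
```python
import dpdata

dsys = dpdata.LabeledSystem('OUTCAR')
# split the frames into sets of at most 200 frames each
dsys.to('deepmd/npy', 'deepmd_data', set_size=200)
```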

The data in DeePMD-kit format is stored in the folder `deepmd_data`.
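
For orientation, the resulting folder typically looks something like the sketch below; this is an illustration of the DeePMD-kit npy layout rather than output generated from the example, and the exact files depend on which labels are present in the `OUTCAR`.
```
deepmd_data
├── type.raw
├── type_map.raw
└── set.000
    ├── box.npy
    ├── coord.npy
    ├── energy.npy
    ├── force.npy
    └── virial.npy
```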

A list of all [supported data formats](https://github.com/deepmodeling/dpdata#load-data) and more features of `dpdata` can be found on the [official website](https://github.com/deepmodeling/dpdata).
25 changes: 25 additions & 0 deletions doc/train-hybrid.md
@@ -0,0 +1,25 @@
# Train a Deep Potential model using descriptor `"hybrid"`

This descriptor hybridizes multiple descriptors to form a new one. For example, given a list of descriptors denoted by D_1, D_2, ..., D_N, the hybrid descriptor is the concatenation of the list, i.e. D = (D_1, D_2, ..., D_N).

To use this descriptor in DeePMD-kit, one first sets the `type` to `"hybrid"`, then provides the definitions of the constituent descriptors as the items of the `list`:
```json
"descriptor" :{
"type": "hybrid",
"list" : [
{
"type" : "se_e2_a",
...
},
{
"type" : "se_e2_r",
...
}
]
},
```

A complete training input script of this example can be found in the directory
```bash
$deepmd_source_dir/examples/water/hybrid/input.json
```