Add mpi support to compute the descriptor #30

Open
qzhu2017 opened this issue Dec 1, 2020 · 20 comments

Owner

qzhu2017 commented Dec 1, 2020

@macstein
I will ask you tomorrow how to enable mpi for this loop

with connect(db_filename) as db:
    count = 0
    for row in db.select():
        count += 1
        atoms = db.get_atoms(id=row.id)
        energy = row.data.energy
        force = row.data.force
        # subtract the energy/force offsets due to the base_potential
        if self.base_potential is not None:
            energy_off, force_off, _ = self.compute_base_potential(atoms)
            energy -= energy_off
            force -= force_off
        energy_in = row.data.energy_in
        force_in = row.data.force_in
        # QZ: todo, add mpi support, this is the most expensive part
        d = self.descriptor.calculate(atoms)
        ele = [Element(ele).z for ele in d['elements']]
        ele = np.array(ele)
        if energy_in:
            pts_to_add["energy"].append((d['x'], energy/len(atoms), ele))
        for id in force_in:
            ids = np.argwhere(d['seq'][:,1]==id).flatten()
            _i = d['seq'][ids, 0]
            pts_to_add["force"].append((d['x'][_i,:], d['dxdr'][ids], force[id], ele[_i]))
        pts_to_add["db"].append((atoms, energy, force, energy_in, force_in))
        if count % 50 == 0:
            print("Processed {:d} structures".format(count))
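
One possible way to divide this loop among MPI ranks, as a minimal sketch rather than the project's actual implementation: each rank takes every size-th row id, so the expensive descriptor calculation runs on only a share of the structures per rank. The helper name ids_for_rank is hypothetical, and the sketch assumes that the row ids run contiguously from 1 to the number of rows. Rank and size are passed as plain integers so the logic can be checked without MPI; in an mpi4py run they would come from MPI.COMM_WORLD.

```python
def ids_for_rank(n_rows, rank, size):
    """Return the 1-based row ids assigned to `rank` out of `size` ranks.

    Round-robin assignment: rank 0 gets ids 1, 1+size, ...; rank 1 gets
    2, 2+size, ... Assumes ids are contiguous from 1 to n_rows.
    """
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return list(range(1 + rank, n_rows + 1, size))


# Simulate how 10 rows would be shared among 3 ranks (no MPI needed here).
assignment = [ids_for_rank(10, rank, 3) for rank in range(3)]
for rank, ids in enumerate(assignment):
    print("rank", rank, "->", ids)
```

Each rank would then run the loop body only for its own ids, and the per-rank lists would be collected on one rank afterwards.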

@qzhu2017 qzhu2017 self-assigned this Dec 1, 2020
Collaborator

macstein commented Dec 1, 2020

I will take a look at it. Let's talk tomorrow.

Owner Author

qzhu2017 commented Dec 1, 2020

@macstein

Below are the steps to run the code

  • clone the code
  • python setup.py install
  • cd examples
  • mpiexec -n 6 python example_validate.py models/test_2.json database/PtHO.db

Collaborator

macstein commented Dec 3, 2020

@qzhu2017 On NERSC, "python setup.py install --user" appears to install successfully. However, running "python example_validate.py models/test_2.json database/PtHO.db" gives the error "OSError: Could not load shared object file: libllvmlite.so". Do you have any idea what causes this? If not, I will look into it myself.

Collaborator

macstein commented Dec 3, 2020

@qzhu2017 Never mind! "pip install llvmlite==0.16 --user" seems to work.

Collaborator

macstein commented Dec 3, 2020

@qzhu2017 @pedroantoniosantosf @yanxon
Do you have any idea about the error "sqlite3.OperationalError: unable to open database file"? I installed CSP_BO on NERSC and ran "srun -n 1 python example_validate.py models/test_2.json /global/homes/b/bkang/bk/UNLV/CSP_BO-master/examples/database/PtHO.db", which ended with this error message.

Owner Author

qzhu2017 commented Dec 3, 2020

@macstein
In the code, we attempt to extract the data from two db files.
One is /global/homes/b/bkang/bk/UNLV/CSP_BO-master/examples/database/PtHO.db
The other is models/test_2.db, which is specified in models/test_2.json.
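
Because the second path is relative, it is resolved against the directory the script is launched from. A quick check along the following lines may help rule out a missing or unreadable file before connecting. This is only a sketch: the helper name check_db_paths is hypothetical, and the two relative paths are the ones from the run command in this thread.

```python
import os


def check_db_paths(paths, base_dir=None):
    """Map each path to True if it resolves to a readable regular file.

    Relative paths are resolved against base_dir (default: current directory).
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    report = {}
    for path in paths:
        full = path if os.path.isabs(path) else os.path.join(base_dir, path)
        report[path] = os.path.isfile(full) and os.access(full, os.R_OK)
    return report


# Run this from the examples folder, where the relative paths are expected.
for path, ok in check_db_paths(["models/test_2.db", "database/PtHO.db"]).items():
    print(path, "found" if ok else "missing or unreadable")
```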

I suspect that ASE may not be installed correctly. Can you try the following:

(base) qiangzhu@Qiangs-MacBook-Pro-2 CSP_BO (master) $ ase db examples/models/test_2.db 
id|age|user    |formula   |natoms
 1|18d|tg849380|Pt48H96O48|   192
 2|18d|tg849380|Pt48H96O48|   192
 3|18d|tg849380|Pt48H96O48|   192
 4|18d|tg849380|Pt48H96O48|   192
 5|18d|tg849380|Pt48H96O48|   192
 6|18d|tg849380|Pt48H96O48|   192
 7|18d|tg849380|Pt48H96O48|   192
 8|18d|tg849380|Pt48H96O48|   192
 9|18d|tg849380|Pt48H96O48|   192
10|18d|tg849380|Pt48H96O48|   192
11|18d|tg849380|Pt48H96O48|   192
12|18d|tg849380|Pt48H96O48|   192
13|18d|tg849380|Pt48H96O48|   192
14|18d|tg849380|Pt48H96O48|   192
15|18d|tg849380|Pt48H96O48|   192
16|18d|tg849380|Pt48H96O48|   192
17|18d|tg849380|Pt48H96O48|   192
18|18d|tg849380|Pt48H96O48|   192
19|18d|tg849380|Pt48H96O48|   192
20|18d|tg849380|Pt48H96O48|   192
Rows: 150 (showing first 20)

If you do not get the same output, please run pip install "ase>=3.18.0" (the quotes keep the shell from treating > as a redirection).

Collaborator

macstein commented Dec 3, 2020

@qzhu2017
I did "pip install ase --user". Now, when running with mpi, I get "OSError: Can not read new ase.db format (version 9). Please update to latest ASE.". On NERSC, "pip install ase>=3.18.0" gives me "ERROR: Could not find a version that satisfies the requirement 3.18.0 (from versions: none)
ERROR: No matching distribution found for 3.18.0".

Owner Author

qzhu2017 commented Dec 3, 2020

try:

pip install ase==3.20.1 --user
Make sure that you are using pip3 in a python3 environment.
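
To confirm which interpreter and which ASE version are actually picked up, a small check like the one below can be run with the same python that launches the example. It is a sketch using only the standard library (importlib.metadata needs Python 3.8 or newer); the helper names are hypothetical, and the 3.18.0 threshold is the minimum version mentioned earlier in this thread.

```python
import re
import sys
from importlib import metadata


def version_tuple(text):
    """Turn a version string such as '3.20.1' into a tuple like (3, 20, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("python", ".".join(str(n) for n in sys.version_info[:3]))
found = installed_version("ase")
if found is None:
    print("ase is not installed for this interpreter")
else:
    status = "ok" if version_tuple(found) >= (3, 18, 0) else "too old"
    print("ase", found, status)
```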

Collaborator

macstein commented Dec 3, 2020

I am getting "sqlite3.OperationalError: unable to open database file" again. @qzhu2017 Can you help me install CSP_BO properly on NERSC via Webex?

Owner Author

qzhu2017 commented Dec 3, 2020

Collaborator

macstein commented Dec 7, 2020

@qzhu2017 When we append data to pts_to_add = {"energy": [], "force": [], "db": []}, is it OK to append the entries in arbitrary order? Or do I have to keep the order of the "for row in db.select():" loop?

Owner Author

qzhu2017 commented Dec 7, 2020 via email

Collaborator

macstein commented Dec 7, 2020

Do you mean I can append in arbitrary order? I will try this and check whether the results are the same.
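
If the order does turn out to matter, one option would be to tag every result with its row id and sort once all per-rank lists have been collected, so the merged list matches the serial loop. The sketch below assumes that layout; merge_in_row_order is a hypothetical helper, and the nested list stands in for what a gather over the ranks might return.

```python
def merge_in_row_order(gathered):
    """Flatten per-rank lists of (row_id, payload) and restore row order.

    `gathered` holds one list per rank; each entry is a (row_id, payload)
    pair. The result contains the payloads sorted by row_id.
    """
    flat = [item for part in gathered for item in part]
    flat.sort(key=lambda item: item[0])
    return [payload for _, payload in flat]


# Stand-in for results collected from 3 ranks with round-robin row ids.
gathered = [
    [(1, "e1"), (4, "e4")],
    [(2, "e2"), (5, "e5")],
    [(3, "e3")],
]
merged = merge_in_row_order(gathered)
print(merged)
```

Comparing the merged list against the output of the serial loop would be a direct way to check whether the two runs agree.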

Owner Author

qzhu2017 commented Dec 7, 2020 via email

Collaborator

macstein commented Dec 7, 2020

@qzhu2017 Please let me know what can replace "for row in db.select():" so that a row can be obtained by index or key. Also, please let me know an efficient way to get the total row count, if there is one.

Collaborator

macstein commented Dec 8, 2020

@qzhu2017 I found that "rowt=db.get(id=3)" works.
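
Building on that call, rows could be fetched by explicit id instead of iterating over db.select(). The sketch below assumes the database object also offers a count() method for the total number of rows and that ids are contiguous from 1. FakeDB is a hypothetical stand-in so the example runs without a database file; with ASE, the object returned by connect would take its place.

```python
class FakeDB:
    """Hypothetical stand-in for a database connection (illustration only)."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def count(self):
        # Total number of rows in the database.
        return len(self._rows)

    def get(self, id):
        # Fetch a single row by its integer id.
        return self._rows[id]


def iter_rows_by_id(db, start=1, step=1):
    """Yield (row_id, row) pairs by explicit id, from `start` in steps of `step`."""
    for row_id in range(start, db.count() + 1, step):
        yield row_id, db.get(id=row_id)


db = FakeDB({1: "row-1", 2: "row-2", 3: "row-3", 4: "row-4"})
every_other = list(iter_rows_by_id(db, start=2, step=2))
print(every_other)
```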

Collaborator

macstein commented Dec 8, 2020

@qzhu2017 @yanxon I am facing a serious issue when using ase with mpi. It may be caused by a version problem in my installation. Can you test this simple python code on your computer?

from ase.db import connect
from mpi4py import MPI

comm=MPI.COMM_WORLD
rank=comm.Get_rank()

db_file="PtHO.db"
print("db_file",db_file)
db=connect(db_file)

if rank==0:
    rowt=db.get(id=1)
    print("rank,energy_in",rank,rowt.data.energy_in)
else:
    rowt=db.get(id=2)
    print("rank,energy_in",rank,rowt.data.energy_in)

Go to the CSP_BO/examples/models folder.
Create PtHO_mpi.py by copying the code above.
Run "mpiexec -n 2 python PtHO_mpi.py".
Please let me know your result.

On NERSC, it shows:

db_file PtHO.db
rank,energy_in 0 True
db_file PtHO.db
rank,energy_in 1 True

It should be

db_file PtHO.db
rank,energy_in 0 True
db_file PtHO.db
rank,energy_in 1 **False**

Owner Author

qzhu2017 commented Dec 8, 2020

I am getting

qzhu@cms models (master) $ mpiexec -n 2 python PtHO_mpi.py
db_file PtHO.db
db_file PtHO.db
rank,energy_in 0 True
rank,energy_in 1 True

Collaborator

macstein commented Dec 9, 2020

@qzhu2017
Owner Author

  • to check if the sequence of energy/force data will impact the results
