
OpenMP Issue #1

Closed
msehabibur opened this issue Feb 13, 2024 · 11 comments


@msehabibur

When I tried to train the model it shows:

Creating out...
Found vocab_size = 371 (inside tokens_v1_all/meta.pkl)
Initializing a new model from scratch...
number of parameters: 85.24M
Compiling the model (takes a ~minute)...
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
zsh: abort python bin/train.py dataset=tokens_v1_all

@lantunes
Owner

Hi, this error is due to there being different, conflicting OpenMP libraries in your Python environment. This sounds like an environment-specific issue, and not directly related to the code itself.

Sometimes this happens if dependencies are installed from different sources (e.g. some from wheels, some from conda, etc.). If you haven't already done so, I would recommend creating a virtual Python environment:

python -m venv crystallm_venv
source crystallm_venv/bin/activate
poetry install

There's also this issue in the PyTorch repo, which may provide some guidance.
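If rebuilding the environment isn't practical, the unsafe workaround named in the error message can also be set from Python rather than the shell (a sketch only; as the OpenMP hint warns, this may cause crashes or silently incorrect results):

```python
import os

# Unsafe workaround from the OpenMP error hint: allow duplicate OpenMP
# runtimes to coexist in the process. This must run before torch is
# imported, because the conflicting runtimes are loaded at import time.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```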

@msehabibur
Author

Hi, thanks. I set export KMP_DUPLICATE_LIB_OK=TRUE to bypass the error. Can I ask: how long does it take to train a simple model? I wanted to train a model based on 50 crystals, but it seems to be taking forever.
(screenshot of training output attached)

@lantunes
Owner

Training a model normally requires a GPU with sufficient memory, like an NVIDIA A10G or A100. It isn't practical to train on something like a laptop, as training will be extremely slow. If you're trying to train on a laptop, that may be why it's taking so long. Recall that these models have millions of parameters, even if the training set size is very small. You can adjust the number of parameters in the model by setting the number of layers and embedding size. However, even a model with 25 million parameters (what we call a small model) is going to be slow without a GPU. Inference (i.e. sampling from the model), on the other hand, may be practical on a laptop.
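As a rough guide, the parameter count of a GPT-style model scales with the number of layers times the square of the embedding size. A minimal sketch of the standard estimate (the layer and embedding values below are illustrative, not necessarily this repo's defaults):

```python
def estimate_gpt_params(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Rough parameter count for a GPT-style decoder:
    ~12 * n_layer * n_embd^2 for the transformer blocks, plus a
    token-embedding table of vocab_size * n_embd (assuming the
    output head shares the embedding weights)."""
    return 12 * n_layer * n_embd ** 2 + vocab_size * n_embd

# Illustrative: 8 layers and a 512-dim embedding with the 371-token
# vocabulary lands near the ~25M-parameter "small" model size.
print(f"{estimate_gpt_params(8, 512, 371) / 1e6:.2f}M")  # 25.36M
```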

@msehabibur
Author

When I used model's checkpoints to generate crystal, it just gave me some text, I didn't see any CIF files. Can you please explain how I can get the CIF file? Also, I don't see any prompt.txt files when the training is completed.

python bin/sample.py \
out_dir=out/my_model \
start=FILE:out/prompt.txt \
num_samples=2 \
top_k=10 \
max_new_tokens=3000 \
device=cuda

(screenshot of sampling output attached)

@lantunes
Owner

lantunes commented Feb 14, 2024

Hi, thanks for bringing this up. I have added more information to the README related to prompting the model and generating CIF files. I have also added a new script to help create prompt files (bin/make_prompt_file.py), and enhanced the bin/sample.py script so that generated CIF files can optionally be saved locally.

The strange-looking sequence of characters in your output was generated by the model you trained. However, I'm not sure how long you trained the model, and your model also has a relatively small number of parameters (0.01M). I therefore suspect that your model may not have sufficiently learned the data. Going forward, I will use the pre-trained small model (with 25M parameters) for my examples, which runs fine on my macbook for sampling.

First, download the model. From the root of the project:

python bin/download.py crystallm_v1_small.tar.gz

Unpack the file:

tar xvf crystallm_v1_small.tar.gz

This will result in a folder named crystallm_v1_small.

Next, make a prompt file. We'll generate a structure for NaCl with Z=2 and let the model determine the space group:

python bin/make_prompt_file.py Na2Cl2 my_prompt.txt
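The prompt file itself is plain text. Judging from the sampled output below, which begins with a data_ line, a hand-written equivalent would look roughly like this (a sketch; bin/make_prompt_file.py is the supported way to create it):

```python
# Sketch of the prompt file's contents: the model completes a CIF, so
# the prompt is the opening data block line for the cell composition
# (format inferred from the sampled output, not from the script itself).
formula = "Na2Cl2"
with open("my_prompt.txt", "w") as f:
    f.write(f"data_{formula}\n")
```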

Finally, sample from the model:

python bin/sample.py \
out_dir=crystallm_v1_small \
start=FILE:my_prompt.txt \
num_samples=1 \
top_k=10 \
max_new_tokens=3000 \
device=cpu

This gives me the output:

Using configuration:
out_dir: crystallm_v1_small
start: FILE:my_prompt.txt
num_samples: 1
max_new_tokens: 3000
temperature: 0.8
top_k: 10
seed: 1337
device: cpu
dtype: bfloat16
compile: false
target: console

number of parameters: 25.36M
data_Na2Cl2
loop_
_atom_type_symbol
_atom_type_electronegativity
_atom_type_radius
_atom_type_ionic_radius
Na 0.9300 1.8000 1.1600
Cl 3.1600 1.0000 0.7800
_symmetry_space_group_name_H-M P4/nmm
_cell_length_a 4.9038
_cell_length_b 4.9038
_cell_length_c 3.9603
_cell_angle_alpha 90.0000
_cell_angle_beta 90.0000
_cell_angle_gamma 90.0000
_symmetry_Int_Tables_number 129
_chemical_formula_structural NaCl
_chemical_formula_sum 'Na2 Cl2'
_cell_volume 95.2964
_cell_formula_units_Z 2
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 'x, y, z'
loop_
_atom_site_type_symbol
_atom_site_label
_atom_site_symmetry_multiplicity
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
Na Na0 2 0.0000 0.0000 0.0000 1
Cl Cl1 2 0.0000 0.5000 0.9817 1


---------------

You can also have the generated content be saved locally to a file by including the target=file argument:

python bin/sample.py \
out_dir=crystallm_v1_small \
start=FILE:my_prompt.txt \
num_samples=1 \
top_k=10 \
max_new_tokens=3000 \
target=file \
device=cpu

A file called sample_1.cif would be created, in this case.

Note that these generated CIF files must be post-processed. Create a directory called my_raw_cifs and place any generated CIF files in there. Then:

python bin/postprocess.py my_raw_cifs my_processed_cifs

The post-processed CIF files will be in a new directory called my_processed_cifs.
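Before post-processing, it can be useful to sanity-check a generated file. Since a CIF is plain text, values like the cell parameters can be pulled out with a few lines of Python (a sketch, run here against a fragment of the output shown above):

```python
import re

def cell_params(cif_text: str) -> dict:
    """Extract the six cell parameters from a CIF's raw text."""
    keys = ("_cell_length_a", "_cell_length_b", "_cell_length_c",
            "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")
    params = {}
    for key in keys:
        m = re.search(rf"{key}\s+([-\d.]+)", cif_text)
        if m:
            params[key] = float(m.group(1))
    return params

cif = """\
_cell_length_a 4.9038
_cell_length_b 4.9038
_cell_length_c 3.9603
_cell_angle_alpha 90.0000
_cell_angle_beta 90.0000
_cell_angle_gamma 90.0000
"""
print(cell_params(cif)["_cell_length_a"])  # 4.9038
```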

@msehabibur
Author

Hi, thanks for the detailed explanation. I think the sample.py script does not accept a target argument; I also looked into the script. Can you please clarify?
(screenshot attached)

@msehabibur
Author

Oh, I see. The sample.py script has been updated to accept the argument.

@msehabibur
Author

Is it possible to put multiple compositions in the same prompt file?

python bin/make_prompt_file.py Na2Cl2 my_prompt.txt

Can I write python bin/make_prompt_file.py Na2Cl2 CdTe AlCl3 my_prompt.txt?

@lantunes
Owner

No, a prompt .txt file is expected to contain only a single prompt. You will need to make separate prompt files for each composition. Running that command with multiple compositions should result in an error.
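To cover several compositions, a simple loop that writes one prompt file per composition works (a sketch; the data_ line format is inferred from the sampled output earlier in the thread, and bin/make_prompt_file.py could equally be invoked once per composition):

```python
# One prompt file per composition, since each file may hold only a
# single prompt. The file names here are illustrative.
for comp in ["Na2Cl2", "CdTe", "AlCl3"]:
    with open(f"prompt_{comp}.txt", "w") as f:
        f.write(f"data_{comp}\n")
```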

@msehabibur
Author

I do have another question. When it generates new CIF files, it only changes the lattice parameters and the coordinates of each atom; is that all I should expect? For example, if I ask for a CdTe-based crystal with a certain space group, it gives me nearly identical crystal files. We could alternatively have applied strain and compression to a reference CdTe structure to make these crystals too. So what is the point of training these generative models? Apologies if I am not understanding the key points of generative models. Can you please explain the benefits of these models in the bigger picture?

@lantunes
Owner

Sorry but I don't understand what else you're expecting to see. These models generate a structure for you when all you know is the cell composition (and possibly space group). If you prompt the model with the same cell composition and space group, it will likely give you a very similar structure each time. I'm not sure what you mean by "parent CdTe", as there is no concept of a parent structure in this context. These models generate structures based on given compositions, independent of any "parent" structure.

Have a look at our manuscript, located at https://arxiv.org/abs/2307.04340, if you'd like to know more about what the point is of these generative models.

Thanks for your questions, but this discussion is no longer related to the original issue. Please create a new Issue if you have additional questions related to the code in this repo, thanks!
