
element profile/hyperparameter optimization #385

Closed
Rana-Phy opened this issue Jan 13, 2022 · 5 comments

Comments

@Rana-Phy

Dear Developers,

I am trying to optimize the element profile for a multicomponent system.
I am a beginner in Python and am doing this manually with nested 'for' loops.
I am afraid it will take 15 years to finish (200x200x200 searches).
I see that the authors previously did this for several multicomponent systems.

Could you suggest a more efficient and faster way to do it?

####################

# Grid search over the three element cutoffs.
# Imports added below (the maml import path assumes a recent release);
# Ti, Si, C are the element weights, and tsc_train_structures, tsc_df
# and weights are defined earlier in my script.
import numpy as np
from maml.describers import BispectrumCoefficients
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rcut_grid = []
for rc_1 in np.arange(4, 6, 0.01):
    for rc_2 in np.arange(4, 6, 0.01):
        for rc_3 in np.arange(4, 6, 0.01):
            element_profile = {'Ti': {'r': rc_1, 'w': Ti},
                               'Si': {'r': rc_2, 'w': Si},
                               'C': {'r': rc_3, 'w': C}}
            describer = BispectrumCoefficients(rcutfac=0.5, twojmax=6,
                                               element_profile=element_profile,
                                               quadratic=False, pot_fit=True,
                                               include_stress=False, n_jobs=4)
            tsc_features = describer.transform(tsc_train_structures)
            y = tsc_df['y_orig'] / tsc_df['n']
            x = tsc_features
            simple_model = LinearRegression(n_jobs=4)
            simple_model.fit(x, y, sample_weight=weights)
            energy_indices = np.argwhere(np.array(tsc_df["dtype"]) == "energy").ravel()
            forces_indices = np.argwhere(np.array(tsc_df["dtype"]) == "force").ravel()
            simple_predict_y = simple_model.predict(x)
            original_energy = y[energy_indices]
            original_forces = y[forces_indices]
            simple_predict_energy = simple_predict_y[energy_indices]
            simple_predict_forces = simple_predict_y[forces_indices]
            e_e = mean_absolute_error(original_energy, simple_predict_energy) * 10000
            e_f = mean_absolute_error(original_forces, simple_predict_forces)

            rcut_grid.append((rc_1, rc_2, rc_3, e_e, e_f))
@JiQi535
Collaborator

JiQi535 commented Jan 13, 2022

Hi Rana, I can give two pieces of advice:

  1. Try to parallelize your grid search for the best combination of parameters.
    Since each combination of parameters is independent of the others, we can run them in parallel and make the selection afterwards. There are Python packages that help parallelize a grid search, for example the multiprocessing package (see the sketch after this list). If you can divide your search into 24 parallel processes, the search will likely be accelerated several-fold, or even more than 10 times.
  2. Make the search space for the optimal parameters a reasonable size.
    In your case, a 200x200x200 grid seems to include far more cases than are practical or necessary. I won't suggest an exact range for your search, but you can decide the intervals and the total number of searches depending on the resources you have available.

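A minimal sketch of what the parallel search could look like with multiprocessing. The evaluate function just wraps the body of your original loop; Ti, Si, C, tsc_train_structures, tsc_df and weights are assumed to be defined as in your snippet, and the maml import path assumes a recent release:

from itertools import product
from multiprocessing import Pool

import numpy as np
from maml.describers import BispectrumCoefficients
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error


def evaluate(rcuts):
    # Fit one (rc_Ti, rc_Si, rc_C) combination and return its energy/force MAEs.
    rc_1, rc_2, rc_3 = rcuts
    element_profile = {'Ti': {'r': rc_1, 'w': Ti},
                       'Si': {'r': rc_2, 'w': Si},
                       'C': {'r': rc_3, 'w': C}}
    describer = BispectrumCoefficients(rcutfac=0.5, twojmax=6,
                                       element_profile=element_profile,
                                       quadratic=False, pot_fit=True,
                                       include_stress=False, n_jobs=1)
    x = describer.transform(tsc_train_structures)
    y = tsc_df['y_orig'] / tsc_df['n']
    model = LinearRegression()
    model.fit(x, y, sample_weight=weights)
    y_pred = model.predict(x)
    e_idx = np.argwhere(np.array(tsc_df["dtype"]) == "energy").ravel()
    f_idx = np.argwhere(np.array(tsc_df["dtype"]) == "force").ravel()
    e_e = mean_absolute_error(y[e_idx], y_pred[e_idx]) * 10000
    e_f = mean_absolute_error(y[f_idx], y_pred[f_idx])
    return rc_1, rc_2, rc_3, e_e, e_f


if __name__ == "__main__":
    # A coarser step (0.1 instead of 0.01) already reduces the grid to 20x20x20 points.
    grid = list(product(np.arange(4, 6, 0.1), repeat=3))
    with Pool(processes=24) as pool:  # match the number of cores on your node
        rcut_grid = pool.map(evaluate, grid)
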

@Rana-Phy
Author

Thanks for your suggestion. It is fast now!
Is there any technical reason behind 'divide your search into 24 parallel processes'?

@JiQi535
Collaborator

JiQi535 commented Jan 14, 2022

> Thanks for your suggestion. It is fast now! Is there any technical reason behind 'divide your search into 24 parallel processes'?

Happy to know that it helps! I used "24" as an example, as there are 24 cores on each node of the computer cluster our group has access to. This value should be adjusted on different machines to achieve the best efficiency.
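
If you don't know the core count in advance, a small sketch (note that os.cpu_count() reports the cores visible to the operating system, which may differ from what a job scheduler actually allots to your job):

import os
from multiprocessing import Pool

n_procs = os.cpu_count() or 1  # fall back to 1 if the count cannot be determined
pool = Pool(processes=n_procs)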

@Rana-Phy
Author

Dear Ji Qi,

OK, now I am seeing that multiprocessing is at least three times slower than n_jobs=24 in scikit-learn for the larger dataset.
Maybe this is because of our cluster setup or my script.
I am trying to understand the maml base model classes, and my understanding could be completely wrong.
So, with
skl_model = SKLModel(describer=describer, model=LinearRegression())
isn't SKLModel the model, where the describer contains the hyperparameters (rcut, weight, twojmax) and the parameters of model=LinearRegression() are learned during training/fitting?
Could I put element_profile into a hyperparameter optimization package like Optuna, Hyperopt, or anything you suggest, instead of using for loops? For some reason I haven't been able to make it work.
I look forward to hearing from you.

Best regards,
Rana

@JiQi535
Collaborator

JiQi535 commented Jan 25, 2022

@Rana-Phy

The describer here is the local environment describer of the SNAP potential, which describes the material structures in a mathematical form. The LinearRegression is the model used in the ML training process to map the local environment descriptors (input) to the target properties, which are energies, forces and stresses.

For parameter tuning, I'm not aware of any existing automatic algorithms for SNAP training. Please let me know if there is one; I would be interested. In previous works from our group, we used the differential evolution implemented in scipy for parameter tuning of a SNAP for Mo (http://dx.doi.org/10.1103/PhysRevMaterials.1.043603), and we also used a stepwise grid search for SNAPs for alloy systems (http://dx.doi.org/10.1038/s41524-020-0339-0). Those may be good references for you.
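
As an illustration, a minimal sketch of tuning the three cutoffs with scipy's differential_evolution, reusing the evaluate function sketched earlier; the relative weighting of energy and force errors below is only an assumption, not the value used in the cited papers:

from scipy.optimize import differential_evolution


def objective(rcuts):
    # Collapse the two errors into one scalar; the factor of 10 on forces is an assumed weighting.
    _, _, _, e_e, e_f = evaluate(tuple(rcuts))
    return e_e + 10.0 * e_f


result = differential_evolution(objective, bounds=[(4.0, 6.0)] * 3,
                                maxiter=20, seed=0)
print(result.x, result.fun)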

@shyuep shyuep closed this as completed Mar 22, 2022