Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LAMMPS input files for examples would be helpful #18

Closed
bernstei opened this issue May 7, 2020 · 11 comments · Fixed by #60
Closed

LAMMPS input files for examples would be helpful #18

bernstei opened this issue May 7, 2020 · 11 comments · Fixed by #60
Assignees
Labels
Milestone

Comments

@bernstei
Copy link

bernstei commented May 7, 2020

I think it would be useful for users who are trying to use this package for their own input to have the LAMMPS input files corresponding to the examples provided here. I'm trying to do exactly that, and getting terrible disagreement between my conventional autocorrelation integration analysis and the cepstrum-based output. I don't know if I'm somehow generating bad data, and if I had the LAMMPS structures and potentials (and things like the time step used for the MD), I could check if my problem is somehow inherent to my system, or merely the way I'm setting up the heat flux equilibration and sampling, for example.

@rikigigi rikigigi added documentation Documentation and comments help wanted labels May 7, 2020
@rikigigi
Copy link
Member

rikigigi commented May 7, 2020

Yes, this can be a good suggestion for new users that want to calculate thermal coefficient.
Regarding your problem: if you use the same time series for both methods you should get the same result if the time series is long enough. And note that this method was created for getting out a number from very bad time series, so this should not be a problem. If the time series is very bad and you are able to calculate additional non diffusive currents in your simulation, you may have exceptional improvements by using the multicomponent analysis with one of those non diffusive currents (like for example, the center of mass velocity of the oxigens in a liquid water system).
You can also check convergence with respect to the cutoff frequency FSTAR (try to make it as small as possible) and length of the trajectory. In some cases you can have a very steep descent near the zero frequency, that means that you need a longer trajectory or some better multicomponent analysis. If by a visual inspection the periodogram near zero goes to the value that the program tell, I would trust the thermocepstrum code (assuming that the units are correct :D ).

@lorisercole
Copy link
Member

Hi @bernstei, thanks for your comment.
We shall include the LAMMPS inputs files asap, where possible, if that helps. I acknowledge that the documentation would also benefit from some improvement: we are working on it.

In any case, if you are computing the heat flux with LAMMPS using the compute heat/flux command, you should analyse the first 3 components that it returns you, like it is shown in the Silica example (the data file that we read is simply a copy-paste of the 3 columns of LAMMPS output that contain the heat current).
Make sure of the units that you use in LAMMPS: right now thermocepstrum only supports metal and real units.

Once you have defined a heat current j like in the example, you can check if the integral of the autocorrelation function is the same as what you were expecting, by entering:

j.compute_acf()        # computes the auto-correlation function
j.compute_gkintegral() # computes the integral of the acf
plt.figure()
plt.plot(j.timeseries(), j.tau * j.kappa_scale)

If the numbers look wrong (e.g. the integral seems to converge to a wrong value), it might be a problem of units. Otherwise, you can go ahead and perform the cepstral analysis.

Resampling and choosing a low enough FSTAR are important for the final result, but this should not be too far off already.
The shape of the spectrum (periodogram) and its behavior at low frequencies will ultimately determine the performance of the method. As @rikigigi mentioned, there are methods/tricks that can help to deal with these difficult cases, but it is hard to tell if this applies to you without seeing at least the periodogram of your data.

@bernstei
Copy link
Author

bernstei commented May 7, 2020

Thanks - I've been using the command line routine so far, but I'll try to set up python like the silica example and take a look at what you suggest.

@bernstei
Copy link
Author

bernstei commented May 7, 2020

is j.tau * j.kappa_scale in your suggestion above supposed to be returning a thermal conductivity in W/m K ?

@bernstei
Copy link
Author

bernstei commented May 7, 2020

Here is the command line output

-----------------------------------------------------
  RESAMPLE TIME SERIES
-----------------------------------------------------
 Original Nyquist freq  f_Ny =     100.00000 THz
 Resampling freq          f* =      10.00000 THz
 Sampling time         TSKIP =            10 steps
                             =        50.000 fs
 Original  n. of frequencies =        125001
 Resampled n. of frequencies =         12501
 PSD      @cutoff  (pre-filter) = 201372.66626
                  (post-filter) = 204089.37539
 log(PSD) @cutoff  (pre-filter) =     12.21291
                  (post-filter) =     12.22631
 min(PSD)          (pre-filter) =      0.22651
 min(PSD)         (post-filter) =  83699.03976
 % of original PSD Power f<f* (pre-filter)  = 95.685439
-----------------------------------------------------

-----------------------------------------------------
  CEPSTRAL ANALYSIS
-----------------------------------------------------
  AIC_Kmin  = 750  (P* = 751, corr_factor = 2.000000)
  L_0*   =          16.623282 +/-   0.050278
  S_0*   =    16924280.540605 +/- 850923.918536
-----------------------------------------------------
  kappa* =           1.328149 +/-   0.066777  W/mK
-----------------------------------------------------

I'm also attaching the PSD and the autocorrelation plots, as suggested in the a-SiO2 sample and your comment above. I'm still looking into the units when comparing your and my autocorrelation, but any other suggestions you have would be welcome.

psd.pdf
ac.pdf

@lorisercole
Copy link
Member

is j.tau * j.kappa_scale in your suggestion above supposed to be returning a thermal conductivity in W/m K ?

Yes if units are set correctly it should give you a thermal conductivity in W/m K.

However, there is something weird in your autocorrelation plot.
If the heat current was a (N, 3) array (where N is the number of steps of the trajectory), I would expect to see only 3 curves in the plot, corresponding to the x, y, z components.
Make sure that the shape of the array you use to define j is (N, 3) (if Jx, Jy, Jz are equivalent, i.e. if the material is isotropic).

j = tc.HeatCurrent(array, UNITS, DT_FS, TEMPERATURE, VOLUME)
print(j.traj.shape)

@bernstei
Copy link
Author

bernstei commented May 8, 2020

I was able to get good agreement in the autocorrelation-based kappa only if I multiply the integral by DT_FS. If I check the actual values, it is not actually an integral but a discrete sum, and I don't see anyplace else where the time step factor is applied. I am now able to get more or less good agreement between compute_gkintegral and my own implementation. It's not perfect - for some reason compute_acf() returns values that are very close to mine for small time lag, but become increasingly smaller as lag increases.

The autocorrelation plot has so many traces because I'm using 16 independent runs, including all 3*16 columns as c_flux[N] N=1..48.

I'm still looking into the comparison with cepstral analysis.

@lorisercole
Copy link
Member

You are actually right: you should multiply by the time step DT_FS, I forgot about that.
j.kappa_scale contains all the factors but the time step.

The function compute_acf() does a simple trapezoidal integration of the ACF.
I would not worry about the slight difference between your ACF and the one computed by compute_acf(): this function computes the ACF via a Fourier transform, and uses the "unbiased" definition (it divides the sum by N-k, where k is the time lag).
If you use the biased definition (dividing by N instead) you will get a difference at large time lags.
See the code we use here: https://github.com/lorisercole/thermocepstrum/blob/2fb9270a93e3ce8235ae754c55ae387e434914fb/thermocepstrum/md/mdsample.py#L288-L306 and https://github.com/lorisercole/thermocepstrum/blob/2fb9270a93e3ce8235ae754c55ae387e434914fb/thermocepstrum/md/acf.py#L6-L35
Anyways, the cepstral analysis will note be affected by this, as we do not need to compute the ACF.

The autocorrelation plot has so many traces because I'm using 16 independent runs, including all 3*16 columns as c_flux[N] N=1..48.

Ok I see.

Having a look at your psd spectrum there might be some difficulty due to the sharp peak at near zero frequency. However you seem to have a big amount of data, which surely helps.
I can suggest you to reproduce the last plot of the example (or the file output.cepstrumfiltered_psd.dat if you use the command-line). You should compare the "cepstrum-filtered" spectrum with the "resampled" one: at low frequencies near zero they should be quite similar.
If not, reducing FSTAR in the resample can help (probably you can safely go down to 0.25-0.5THz). You should try to see if the result changes as you decrease FSTAR.

@bernstei
Copy link
Author

bernstei commented May 8, 2020

Thanks - now I'm getting pretty good agreement, as long as I set the correction factor to 2. Is there a heuristic for figuring out what's a good value for that parameter?

@bernstei
Copy link
Author

bernstei commented May 8, 2020

Also, should I expect the command line's psd and resampled_psd to vary at all? They don't seem to:

# freqs_THz  psd  fpsd  logpsd  flogpsd
# 
0.000000000000000000e+00 2.707877126081777364e+07 2.707877126081777364e+07 1.711426063056300961e+01 1.711426063056300961e+01
7.999999999999999299e-04 2.819196909925082326e+07 2.819196909925082326e+07 1.715454771163524939e+01 1.715454771163524939e+01
1.599999999999999860e-03 2.624597693873424828e+07 2.624597693873424828e+07 1.708302327577968427e+01 1.708302327577968427e+01
2.399999999999999790e-03 2.797262338531310111e+07 2.797262338531310111e+07 1.714673685360479283e+01 1.714673685360479283e+01
3.199999999999999720e-03 2.305922836236362904e+07 2.305922836236362904e+07 1.695357661007548344e+01 1.695357661007548344e+01
4.000000000000000083e-03 2.015438349101626873e+07 2.015438349101626873e+07 1.681893236567639960e+01 1.681893236567639960e+01
4.799999999999999580e-03 2.433287676416046545e+07 2.433287676416046545e+07 1.700733894715183325e+01 1.700733894715183325e+01
5.599999999999999943e-03 2.304247028344745189e+07 2.304247028344745189e+07 1.695284960520813300e+01 1.695284960520813300e+01

==> output.resampled_psd.dat <==
# freqs_THz  psd  fpsd  logpsd  flogpsd
# 
0.000000000000000000e+00 2.696351353376904875e+07 2.696351353376904875e+07 1.710999515946981120e+01 1.710999515946981120e+01
8.012820512820512543e-04 2.812943695145658031e+07 2.812943695145658031e+07 1.715232716432034366e+01 1.715232716432034366e+01
1.602564102564102509e-03 2.628008224026321992e+07 2.628008224026321992e+07 1.708432188096175608e+01 1.708432188096175608e+01
2.403846153846153980e-03 2.798279925522816926e+07 2.798279925522816926e+07 1.714710056705907704e+01 1.714710056705907704e+01
3.205128205128205017e-03 2.288070048892340809e+07 2.288070048892340809e+07 1.694580433978184075e+01 1.694580433978184075e+01
4.006410256410256054e-03 2.010930342816803977e+07 2.010930342816803977e+07 1.681669312306677710e+01 1.681669312306677710e+01
4.807692307692307959e-03 2.425509772156803682e+07 2.425509772156803682e+07 1.700413736857801439e+01 1.700413736857801439e+01
5.608974358974358997e-03 2.302947096469915286e+07 2.302947096469915286e+07 1.695228529996111888e+01 1.695228529996111888e+01

@lorisercole lorisercole self-assigned this May 11, 2020
@lorisercole
Copy link
Member

Thanks - now I'm getting pretty good agreement, as long as I set the correction factor to 2. Is there a heuristic for figuring out what's a good value for that parameter?

You should plot either L0 or kappa as a function of P*, with the functions:
jf.plot_L0_Pstar()
or
jf.plot_kappa_Pstar()
(cells 13 and 14 of the example)
or columns 3 & 5 of the file output.cepstral.dat.
The vertical dashed line represents the P* "cutoff" chosen by the AIC.

If your system has a particularly slow convergence (as it might, due to the sharp peak at near-zero frequency), the convergence of L0 or kappa will likely be slow, and the P* chosen might be too low. So you can try to double it with the correction factor. This will increase the statistical error, but will probably reduce the bias.
In any case, consider this convergence behavior with care. It is difficult to estimate the right correction factor: if P* is too large you will pick up random noise (you can notice L0 or kappa start to behave randomly at high P*).
Using a small FSTAR may be a good choice (this will make the convergence of L0(P*) faster). Just do not go too small or the numbers will explode :)

Also, should I expect the command line's psd and resampled_psd to vary at all? They don't seem to:

You should plot them: the frequency resolution of this data is 8e-4 THz. You want to see if the features of the low-frequency region of spectrum are reproduced by the resampled_psd.

@lorisercole lorisercole added type/question and removed documentation Documentation and comments labels May 14, 2020
@lorisercole lorisercole mentioned this issue May 22, 2020
3 tasks
@lorisercole lorisercole added this to the Documentation milestone Oct 13, 2020
@lorisercole lorisercole added the documentation Documentation and comments label Jun 22, 2022
@lorisercole lorisercole linked a pull request Jun 25, 2022 that will close this issue
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants