how to ? #1352
Comments
Let's look at the tutorial notebook you mentioned. The call
pdb = mdshare.fetch('alanine-dipeptide-nowater.pdb', working_directory='data')
ensures that the file is available in the data directory, so the result is equivalent to
pdb = 'data/alanine-dipeptide-nowater.pdb'
Likewise,
files = mdshare.fetch('alanine-dipeptide-*-250ns-nowater.xtc', working_directory='data')
would be equivalent to
files = [
    'data/alanine-dipeptide-0-250ns-nowater.xtc',
    'data/alanine-dipeptide-1-250ns-nowater.xtc',
    'data/alanine-dipeptide-2-250ns-nowater.xtc']
And that is exactly the kind of information you need to pass to pyemma's loading functions: the relative or absolute paths of your files as strings. Once you have the location of your PDB file stored in the variable pdb and the locations of your trajectory files in files, you can define a featurizer
feat = pyemma.coordinates.featurizer(pdb)
feat.add_backbone_torsions(periodic=False)  # load only backbone torsions
and load the selected molecular features into memory
data = pyemma.coordinates.load(files, features=feat)
or create a reader object (recommended for huge data sets)
reader = pyemma.coordinates.source(files, features=feat) |
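For local files that are not fetched through mdshare, the same kind of path list can be built with the standard library. This is a minimal sketch, assuming the trajectories sit in one directory and share a naming pattern; the helper name find_trajectories is hypothetical and not part of pyemma or mdshare.

```python
import glob
import os


def find_trajectories(directory, pattern):
    """Return the sorted paths in `directory` that match the glob `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Example call; this prints an empty list if no matching files exist.
files = find_trajectories('data', 'alanine-dipeptide-*-250ns-nowater.xtc')
print(files)
```

The resulting list of strings has the same form as the files variable above.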
It's working, but I got another error. I am following this tutorial: https://github.com/markovmodel/pyemma_tutorials/blob/master/notebooks/01-data-io-and-featurization.ipynb and at the step data_concatenated = np.concatenate(data) I get the following error:
IndexError Traceback (most recent call last)
~/miniconda3/envs/lib/python3.6/site-packages/pyemma/plots/plots1d.py in plot_feature_histograms(xyzall, feature_labels, ax, ylog, outfile, n_bins, ignore_dim_warning, **kwargs)
IndexError: tuple index out of range |
Yes, that exception is raised if you want to plot the histograms of more than 50 features. You can either plot your features in batches, e.g., via
pyemma.plots.plot_feature_histograms(data_concatenated[:, 0:10])
pyemma.plots.plot_feature_histograms(data_concatenated[:, 10:20])
...
or use the option mentioned in the Traceback to suppress the exception:
pyemma.plots.plot_feature_histograms(
    data_concatenated, feature_labels=feat, ignore_dim_warning=True)
The latter, however, will most likely result in a completely unusable figure. |
Actually, can you show us what |
Yes, you are right @thempel, I misread the Traceback. |
type of data: <class 'numpy.ndarray'> |
alanine-dipeptide-0-250ns-nowater.xtc and alanine-dipeptide-nowater.pdb |
Thanks. Unfortunately, we are still having trouble following you. Could you please provide the code that you are trying to run? A minimal example would be great so that we can reproduce the issue. If you have only a single trajectory, it is possible that you should not concatenate the data. If that is the case, try using the original data instead of the concatenated data. Concatenation only makes sense if you have multiple trajectories that you need to join, e.g., for histogram plotting. |
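The point about concatenation can be illustrated with plain NumPy. This is a sketch under the assumption that a single trajectory arrives as one 2-D array (as the reported type numpy.ndarray suggests) and several trajectories arrive as a list of such arrays; the helper name as_frame_matrix is hypothetical. Calling np.concatenate on a single 2-D array joins its rows end to end and returns a 1-D array, which has no second axis and might explain the IndexError above.

```python
import numpy as np


def as_frame_matrix(data):
    """Return a 2-D (frames, features) array from one trajectory or a list of them."""
    if isinstance(data, np.ndarray):
        # A single trajectory is already a 2-D array; concatenating it
        # would flatten it to 1-D, so it is returned unchanged.
        return data
    return np.concatenate(data)


single = np.zeros((100, 4))                        # one trajectory, 4 features
several = [np.zeros((100, 4)), np.zeros((50, 4))]  # two trajectories

print(as_frame_matrix(single).shape)   # (100, 4)
print(as_frame_matrix(several).shape)  # (150, 4)
print(np.concatenate(single).shape)    # (400,)
```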
Actually, I want to analyze my own simulation files with this Jupyter notebook: https://github.com/markovmodel/deeptime/blob/master/vampnet/examples/Alanine_dipeptide_multiple_files.ipynb How did you get the two files for heavy atom positions and backbone dihedrals in npz format? Is it necessary to use three files? Currently I have one xtc file and one pdb file, so how can I get an npz file and use this code? |
OK, a few points on this:
The functions In the vampnet example you mentioned, we are using precomputed molecular features. In detail, we have run the code
feat = pyemma.coordinates.featurizer(pdb)
feat.add_backbone_torsions(periodic=False)
data = pyemma.coordinates.load(files, features=feat)
np.savez('alanine-dipeptide-3x250ns-backbone-dihedrals.npz', *data)
to extract the backbone dihedrals from the three trajectory files. Now, if we want to run a vampnet calculation using backbone dihedrals, we can load this precomputed data via
with np.load('alanine-dipeptide-3x250ns-backbone-dihedrals.npz') as fh:
    data = [fh['arr_0'], fh['arr_1'], fh['arr_2']]
Unfortunately, pyemma cannot directly read |
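The save and load steps are not tied to exactly three trajectories. The following sketch, using only NumPy, writes any number of feature arrays into one npz file and reads them back in order; the helper names save_trajectories and load_trajectories and the random stand-in data are hypothetical. It relies on np.savez naming positional arrays arr_0, arr_1, and so on.

```python
import os
import tempfile

import numpy as np


def save_trajectories(path, data):
    """Store a list of arrays in one npz file under the keys arr_0, arr_1, ..."""
    np.savez(path, *data)


def load_trajectories(path):
    """Read the arrays back in their original order."""
    with np.load(path) as fh:
        return [fh['arr_%d' % i] for i in range(len(fh.files))]


# Stand-in for featurized data: a single trajectory is wrapped in a list.
data = [np.random.rand(200, 2)]
path = os.path.join(tempfile.mkdtemp(), 'my-backbone-dihedrals.npz')
save_trajectories(path, data)
restored = load_trajectories(path)
print(len(restored), restored[0].shape)  # 1 (200, 2)
```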
I got that, but you have not cleared up my other doubt: you used 3 xtc files, and because of that we have 3 npy files to use in this code:
# Save the files separately
np.save('traj0.npy', alanine_files['arr_0'])
# Separate data files between training data and validation data
train_data_files_list = [
valid_data_files_list = [ |
Yes, you can use .gro files as topology files. The number of files is arbitrary; you can structure the data as you like. The crucial part is that you subsample your data such that there is no overlap between training and validation data. In the above case, we had 3 independent trajectories and chose the first two for training and the third for validation. If you have multiple trajectories, you can take an arbitrary subset for training and the remainder for validation. If you have only a single trajectory, you need to subsample this trajectory into blocks. Generally, this split does not require the data to be in different files. More information on this kind of splitting is provided in introductions to cross-validation. This should be explained in the PyEMMA tutorials that you already mentioned (notebooks 00 and 01). If you have further issues with VAMPNets in particular, please consider opening an issue in the deeptime repository. |
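For the multi-trajectory case, the split described above (the first trajectories for training, the last for validation) can be written in a few lines. This is only a sketch; the helper name split_trajectories and its n_validation parameter are hypothetical, and the file names are placeholders.

```python
def split_trajectories(trajectories, n_validation=1):
    """Use the last `n_validation` trajectories for validation, the rest for training."""
    if not 0 < n_validation < len(trajectories):
        raise ValueError('need at least one training and one validation trajectory')
    return trajectories[:-n_validation], trajectories[-n_validation:]


train, valid = split_trajectories(['traj0.npy', 'traj1.npy', 'traj2.npy'])
print(train)  # ['traj0.npy', 'traj1.npy']
print(valid)  # ['traj2.npy']
```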
Thanks a lot, I understand everything now. I only have one last doubt. As you mentioned above: "If you have only a single trajectory, you need to subsample this trajectory into blocks." How do I do this? Do you have an example where you did this for a single trajectory? |
Let us assume your single trajectory is loaded into the variable data. Then
n = len(data) // 2
data_train = data[:n]
data_validation = data[n:]
would split your trajectory into two non-overlapping parts of roughly equal size. This is a crude but simple example. If you want a more elaborate example, please consider working through this block subsampling function from deeptime's time-lagged autoencoder project: https://github.com/markovmodel/deeptime/blob/f2b97328baa1c38c92616f058195fa5803ff05d9/time-lagged-autoencoder/tae/utils.py#L190-L211 |
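A slightly finer split cuts the trajectory into several contiguous blocks and sets every k-th block aside for validation. The sketch below uses only NumPy and is not the linked deeptime function; the helper name split_into_blocks and its parameters are hypothetical. It assumes that each block is much longer than the lag time used later, since time-lagged pairs cannot span block boundaries.

```python
import numpy as np


def split_into_blocks(traj, n_blocks, validation_every=5):
    """Cut `traj` into contiguous blocks; every `validation_every`-th block is validation."""
    blocks = np.array_split(traj, n_blocks)  # splits along the frame axis
    train = [b for i, b in enumerate(blocks) if (i + 1) % validation_every != 0]
    valid = [b for i, b in enumerate(blocks) if (i + 1) % validation_every == 0]
    return train, valid


data = np.arange(100).reshape(50, 2)  # stand-in for 50 frames with 2 features
data_train, data_validation = split_into_blocks(data, n_blocks=10)
print(len(data_train), len(data_validation))  # 8 2
```

Both return values are lists of arrays, so they can be handled like lists of independent trajectories.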
How do I resolve this?
TypeError Traceback (most recent call last)
<ipython-input-18-03fc53fef753> in <module>
5 output_size),
6 steps = np.sum(np.ceil((total_data_source.trajectory_lengths()-tau)/batch_size)),
----> 7 verbose = 0)
8 states_prob_t = states_prob_all[:,:output_size]
9 states_prob_lag = states_prob_all[:,output_size:]
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in predict_generator(self, generator, steps, max_queue_size, workers, use_multiprocessing, verbose)
1534 workers=workers,
1535 use_multiprocessing=use_multiprocessing,
-> 1536 verbose=verbose)
1537
1538 def _get_callback_model(self):
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, **kwargs)
174 progbar.on_epoch_begin(epoch, epoch_logs)
175
--> 176 for step in range(steps_per_epoch):
177 batch_data = _get_next_batch(output_generator, mode)
178 if batch_data is None:
TypeError: 'numpy.float64' object cannot be interpreted as an integer
…On Mon, Sep 3, 2018 at 2:42 PM Tim Hempel ***@***.***> wrote:
Closed #1352.
|
Thank you very much for posting your tensorflow problem to the pyemma issue tracker and for putting so much effort into formatting it. I assume you can resolve this by not using floats but integers when calling |
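The float-versus-integer point can be reproduced without tensorflow. In the sketch below, the trajectory lengths, tau, and batch_size are hypothetical stand-ins for the quantities named in the traceback. np.ceil and np.sum return a numpy.float64, which range() rejects; wrapping the result in int() gives a value that can be used as a step count.

```python
import numpy as np

# Hypothetical values standing in for the quantities named in the traceback.
trajectory_lengths = np.array([1000, 750])
tau = 1
batch_size = 100

steps_float = np.sum(np.ceil((trajectory_lengths - tau) / batch_size))
steps = int(steps_float)  # convert before passing it on as a step count

print(type(steps_float).__name__)  # float64
print(steps)                       # 18

try:
    range(steps_float)
except TypeError as err:
    # Same kind of error as in the traceback above.
    print(err)
```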
What is the difference between a working directory and a path? I am at a loss. I am trying to fetch the file pH10-amber-R1-dry.xtc via the command
files = fetch('pH10-amber-R1-dry.xtc', working_directory='C:/Users/giova/data/')
but I keep getting the following error message:
pH10-amber-R1-dry.xtc [no match in repository]
I assure you that the file pH10-amber-R1-dry.xtc is in the directory /data. Why do I get this message? Thank you very much for your attentive reply! |
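As stated earlier in this thread, pyemma's loading functions take the relative or absolute paths of your files as strings, so a file that already exists locally can be referenced by its path. The error message suggests that fetch looks the file name up in a repository rather than on the local disk, but that reading is an assumption. A minimal standard-library sketch for building and checking such a path; the helper name local_file_path is hypothetical:

```python
from pathlib import Path


def local_file_path(directory, filename):
    """Return the path of an existing local file as a string."""
    path = Path(directory) / filename
    if not path.is_file():
        raise FileNotFoundError('no such file: %s' % path)
    return str(path)


# Example with the names from the question; this reports if the file is absent.
try:
    files = local_file_path('C:/Users/giova/data', 'pH10-amber-R1-dry.xtc')
    print(files)
except FileNotFoundError as err:
    print(err)
```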
Thanks