dataprep and data.log #152

Closed
AndreaYCT opened this issue Feb 23, 2024 · 8 comments

Comments

@AndreaYCT

Hi,

I am running dataprep on an HPC, so I check its progress via data.log. However, I found that data.log stopped receiving new entries after 4 hours, while dataprep still hadn't finished after 12 hours. Should I keep waiting, or is something wrong?

```bash
echo "m6Anet dataprep start: STM2457_2uM"
date
m6anet dataprep --eventalign STM2457_2uM_eventalign.txt \
  --out_dir dataprep/STM2457_2uM \
  --n_processes 28 \
  --readcount_max 2000000
echo "m6Anet dataprep done: STM2457_2uM"
date
```

Thanks!

Andrea

@yuukiiwa
Collaborator

Hi @AndreaYCT,

I suppose your HPC run log didn't echo "m6Anet dataprep done: STM2457_2uM".

Because you ran m6anet with 28 processes, it will take 28 errors for the run to halt entirely, which explains why it was still "running" after 12 hours. You can check which transcript IDs are missing from your data.log, subset your eventalign.txt to those transcript IDs, and run m6anet on that subset locally to troubleshoot.
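A minimal bash sketch of that check, assuming nanopolish's tab-separated eventalign.txt with a header row whose first column (`contig`) holds the transcript ID, and a data.log in which each line starts with a transcript ID followed by a colon. The data.log layout is an assumption, so look at a few lines of your own file and adjust the `cut` call if it differs. `missing_transcripts` and `subset_eventalign` are hypothetical helpers, not part of m6anet.

```bash
# Hypothetical helpers (not part of m6anet) for narrowing down a stalled dataprep run.
# ASSUMPTION: eventalign.txt is tab-separated, has a header row, and column 1 is the transcript ID.
# ASSUMPTION: each data.log line begins with "<transcript_id>:"; adjust the cut below if yours differs.

# Print transcript IDs that appear in eventalign.txt but not in data.log.
missing_transcripts() {
  local eventalign="$1" datalog="$2"
  LC_ALL=C comm -23 \
    <(tail -n +2 "$eventalign" | cut -f1 | LC_ALL=C sort -u) \
    <(cut -d: -f1 "$datalog" | LC_ALL=C sort -u)
}

# Write the header plus every eventalign row whose transcript ID is listed in ids_file.
# FILENAME == ARGV[1] (rather than NR == FNR) keeps this correct even if ids_file is empty.
subset_eventalign() {
  local ids_file="$1" eventalign="$2" out="$3"
  awk -F'\t' 'FILENAME == ARGV[1] { keep[$1]; next } FNR == 1 || ($1 in keep)' \
    "$ids_file" "$eventalign" > "$out"
}
```

For example, assuming data.log sits in the `--out_dir`: `missing_transcripts STM2457_2uM_eventalign.txt dataprep/STM2457_2uM/data.log > missing_ids.txt`, then `subset_eventalign missing_ids.txt STM2457_2uM_eventalign.txt subset_eventalign.txt`.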

Thanks!

Best wishes,
Yuk Kei

@AndreaYCT
Author

Hi Yuk Kei,

Where can I find the error information? I don't see any error messages in error.txt.

Or is there a way to skip them?

Many thanks! I'm still learning how to use HPC.

@yuukiiwa
Collaborator

Hi @AndreaYCT,

You can use the interactive mode of your HPC to test running the subset of transcript IDs that weren't included in data.info.

There's no way to skip those transcripts if there are problems with eventalign.txt; you will have to troubleshoot them in the interactive mode of your HPC.
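When testing the subset interactively, it can help to run dataprep with a single process and keep everything it prints, together with its exit status, in a file you can inspect afterwards. `run_logged` is a hypothetical helper, not part of m6anet; the usage line reuses only flags from the original command.

```bash
# Hypothetical helper: run a command, mirror its stdout and stderr to a log file,
# and append the command's own exit status (not tee's) to the same log.
run_logged() {
  local log="$1"; shift
  "$@" 2>&1 | tee "$log"
  local status=${PIPESTATUS[0]}   # exit status of "$@", read before anything else runs
  echo "exit status: $status" | tee -a "$log"
  return "$status"
}
```

For example: `run_logged subset_debug.log m6anet dataprep --eventalign subset_eventalign.txt --out_dir dataprep/subset_debug --n_processes 1`. With one process, an error from the subset should appear directly in `subset_debug.log` instead of being scattered across many workers.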

Thanks!

Best wishes,
Yuk Kei

@AndreaYCT
Author

Hi, @yuukiiwa

I tried it and interrupted it with Ctrl+C. Here is some of the output:
```
Process Consumer-9:
Process Consumer-2:
Process Consumer-7:
Process Consumer-12:
Process Consumer-3:
Process Consumer-5:
Process Consumer-11:
Process Consumer-13:
Process Consumer-6:
Process Consumer-8:
Process Consumer-1:
Process Consumer-4:
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/bin/m6anet", line 8, in <module>
Process Consumer-10:
    sys.exit(main())
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/__init__.py", line 30, in main
    args.func(args)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/scripts/dataprep.py", line 63, in main
    args.out_dir, args.n_processes)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/dataprep_utils.py", line 250, in parallel_index
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
    lines = [len(eventalign_file.readline()) for i in range(chunk_concat_size)]
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/dataprep_utils.py", line 250, in <listcomp>
    lines = [len(eventalign_file.readline()) for i in range(chunk_concat_size)]
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 85, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/dataprep_utils.py", line 202, in index
    with locks['index'], open(out_paths['index'],'a', encoding='utf-8') as f_index:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 85, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/dataprep_utils.py", line 202, in index
    with locks['index'], open(out_paths['index'],'a', encoding='utf-8') as f_index:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
KeyboardInterrupt
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 85, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/dataprep_utils.py", line 202, in index
    with locks['index'], open(out_paths['index'],'a', encoding='utf-8') as f_index:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 85, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 81, in run
    next_task_args = self.task_queue.get()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/dataprep_utils.py", line 202, in index
    with locks['index'], open(out_paths['index'],'a', encoding='utf-8') as f_index:
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
KeyboardInterrupt
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/helper.py", line 85, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/m6anet/utils/dataprep_utils.py", line 205, in index
    pos_end += eventalign_result.loc[_index]['line_length'].sum()
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexing.py", line 873, in __getitem__
    return self._getitem_tuple(key)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexing.py", line 1044, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexing.py", line 774, in _getitem_lowerdim
    result = self._handle_lowerdim_multi_index_axis0(tup)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexing.py", line 1066, in _handle_lowerdim_multi_index_axis0
    return self._get_label(tup, axis=axis)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexing.py", line 1059, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/generic.py", line 3491, in xs
    loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2857, in get_loc_level
    return partial_selection(key)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2844, in partial_selection
    indexer = self.get_loc(key)
  File "/opt/ohpc/Taiwania3/pkg/biology/Python/Python_v3.7.10/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2741, in get_loc
    mask = self.codes[i][loc] == self._get_loc_single_level_index(
KeyboardInterrupt
```

Any suggestions would be really appreciated!

@yuukiiwa
Collaborator

yuukiiwa commented Mar 6, 2024

Hi @AndreaYCT,

I have skimmed through the error message; it is specific to your KeyboardInterrupt rather than an error raised by m6anet dataprep itself.

Could you try letting m6anet dataprep run without interrupting it from the keyboard, please?
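With a log as long and interleaved as the one above, one quick way to see whether anything besides the Ctrl+C shows up is to pull out the distinct exception names it mentions. `list_exceptions` is a hypothetical helper and only a rough text filter: it assumes Python-style exception names ending in `Error` or `Exception` and does not parse tracebacks.

```bash
# Hypothetical helper: list the distinct Python-style exception names (ending in
# "Error" or "Exception") that appear anywhere in a log. KeyboardInterrupt does
# not match, so empty output suggests the log only records the manual interrupt.
list_exceptions() {
  grep -owE '[A-Za-z_][A-Za-z0-9_]*(Error|Exception)' "$1" | sort -u || true
}
```

For example, `list_exceptions dataprep_output.log`, where the file name is a placeholder for wherever the run's output was saved.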

Thanks!

Best wishes,
Yuk Kei

@AndreaYCT
Author

Hi @yuukiiwa,

How do I run it without interrupting it from the keyboard? I submitted the commands as a bash script, and I can also try again.

But this doesn't happen when I use the ENSEMBL reference.

Thanks!

Andrea

@yuukiiwa
Collaborator

yuukiiwa commented Mar 6, 2024

Hi @AndreaYCT,

If the ENSEMBL reference works, then we would suggest you use it instead.

I was asking you to run m6anet dataprep locally to see whether you could reproduce the error that halted the run on the HPC, so you would have to turn on your computer and let it run.
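For a long local run, one option is to start the job in the background under `nohup`, so closing the terminal does not stop it and a Ctrl+C typed at the prompt does not reach it, then follow its log with `tail -f` (pressing Ctrl+C then only stops `tail`). `run_in_background` is a hypothetical helper, and the file and directory names in the usage line are placeholders. On the HPC itself, submitting the script as a batch job serves the same purpose.

```bash
# Hypothetical helper: start a command that ignores hangups, with stdin detached
# from the terminal and stdout+stderr written to a log file. The PID is stored in
# BG_PID so the caller can `wait "$BG_PID"` or check it with `kill -0 "$BG_PID"`.
run_in_background() {
  local log="$1"; shift
  nohup "$@" < /dev/null > "$log" 2>&1 &
  BG_PID=$!
}
```

For example: `run_in_background dataprep_local.log m6anet dataprep --eventalign subset_eventalign.txt --out_dir dataprep/local_test --n_processes 4`, then `tail -f dataprep_local.log`.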

But just to check: are you using the same transcriptome.fa file for both minimap2 and eventalign? If not, we recommend re-running both with the ENSEMBL transcriptome.fa that worked.
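One way to check for a reference mismatch is to compare the transcript IDs that eventalign wrote against the sequence names in the FASTA you used. This sketch assumes the transcript ID is the first column of eventalign.txt (after a header row) and that each sequence name is the first whitespace-delimited word of a FASTA header line, which is the name minimap2 reports. `contigs_missing_from_fasta` is a hypothetical helper.

```bash
# Hypothetical helper: print transcript IDs used in eventalign.txt that are not
# sequence names in the given FASTA. Empty output means every ID was found.
# ASSUMPTION: eventalign.txt has a header row and the transcript ID in column 1;
# FASTA names are the first whitespace-delimited word after ">".
contigs_missing_from_fasta() {
  local eventalign="$1" fasta="$2"
  LC_ALL=C comm -23 \
    <(tail -n +2 "$eventalign" | cut -f1 | LC_ALL=C sort -u) \
    <(awk '/^>/ { sub(/^>/, ""); print $1 }' "$fasta" | LC_ALL=C sort -u)
}
```

For example, `contigs_missing_from_fasta STM2457_2uM_eventalign.txt transcriptome.fa | head`; any output means eventalign.txt refers to transcripts that this FASTA does not name.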

Thanks!

Best wishes,
Yuk Kei

@AndreaYCT
Author

Hi @yuukiiwa,

I redid all the steps with the ENSEMBL reference, and m6anet inference now finishes without any warnings or other messages.

Many thanks!

For now I don't have a computer powerful enough to run the whole process, so I'll try that later (depending on whether I get my own computer).

Andrea

@yuukiiwa yuukiiwa closed this as completed Mar 7, 2024