
There is one error in GUI with multiprocess #12

Open · szwfive opened this issue Mar 7, 2024 · 7 comments

szwfive commented Mar 7, 2024

[screenshot of the error]

The hardware environment is a single RTX 3090, and the timing of this error is unpredictable. Although the SLAM process keeps running afterwards, the stuck GUI causes a memory overflow. This may be the reason why the case reported in issue #7 cannot run.

muskie82 (Owner) commented Mar 7, 2024

Hi,

We previously encountered a similar issue randomly in our environment, though it hasn't appeared recently.
If you happen to find a way to reproduce the error deterministically, could you share it with us?

The problem seems to be:

  1. PyTorch's multiprocessing communication between the main process and the GUI fails for some reason, causing unpredictable GUI crashes.
  2. The main process continues to send messages to the queue (q_main2vis), but with the GUI down, these items are never consumed, leading to memory overflow.

This issue occurs only when the GUI is enabled, so it should not be a problem when running in headless mode.
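
For illustration, the failure mode in point 2 is an unbounded producer pattern. A minimal sketch of a producer-side guard (not the repo's actual code; send_to_gui and the maxsize value are illustrative) would bound the queue and drop packets rather than accumulate them:

    import queue

    import torch.multiprocessing as mp

    # create a bounded queue instead of an unbounded one,
    # e.g. q_main2vis = mp.Queue(maxsize=8)

    def send_to_gui(q_main2vis, packet):
        try:
            q_main2vis.put_nowait(packet)
        except queue.Full:
            # the GUI has stopped draining the queue (stalled or crashed);
            # drop this packet rather than let memory grow without bound
            pass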

szwfive (Author) commented Mar 9, 2024

Thank you for your answer.
I have located the source of the issue: line 400 of slam_frontend.py, "gaussians=clone_obj(self.gaussians)". It seems an unknown error occurs during the copy.deepcopy step inside clone_obj. I am trying to pin down the problem more precisely.
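
For context, clone_obj is a deepcopy-based helper; a rough sketch of the pattern (an approximation for readability, not the repo's exact code) is:

    import copy

    import torch

    def clone_obj(obj):
        # deep-copy the object, then detach and re-clone any tensor
        # attributes so the copy carries no autograd state when it is
        # pickled and sent to the GUI process
        cloned = copy.deepcopy(obj)
        for attr, value in list(vars(cloned).items()):
            if isinstance(value, torch.Tensor):
                setattr(cloned, attr, value.detach().clone())
        return cloned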

@Tianci-Wen

I also have the same problem while running this repo on a single RTX 4090, and I am also trying to solve it.
[screenshot of the same error]

whwh747 commented Apr 3, 2024

Same error.

hnglp commented Apr 4, 2024

> Thank you for your answer. I have located the source of the issue: line 400 of slam_frontend.py, "gaussians=clone_obj(self.gaussians)". It seems an unknown error occurs during the copy.deepcopy step inside clone_obj. I am trying to pin down the problem more precisely.

Hello, is there any progress on solving this issue? I would also like to ask whether I can run this project with an RTX 3070 (39GB). Regards.

whwh747 commented Apr 4, 2024

I'm sorry, I can't resolve it.

JIAZHAOQIAN commented Jul 5, 2024

I modified the get_latest_queue function in gui_utils.py as follows:

import queue

def get_latest_queue(q):
    # Drain the queue, keeping only the most recent message.
    message = None
    while True:
        try:
            message_latest = q.get_nowait()
            if message is not None:
                del message  # discard the stale message
            message = message_latest
        except queue.Empty:
            # double-check emptiness before giving up
            if q.qsize() < 1:
                break
        # zajia: workaround for an unsolved bug related to
        # "torch.storage._UntypedStorage" raised during deserialization
        except TypeError:
            print("got a torch.storage._UntypedStorage error!")
            break
    return message

I catch the error and leave the 'message' variable unmodified. It seems to work properly.
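
For anyone applying this patch, a hypothetical GUI-side polling loop (gui_running, render, and q_main2vis are illustrative names, not the repo's actual ones) would use it like this:

    while gui_running:
        # returns None when the queue was empty, otherwise the newest packet
        packet = get_latest_queue(q_main2vis)
        if packet is not None:
            render(packet)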
