
ru_generators sometimes fails when started before run starts, runs fine on restart of ru_generator #56

Closed
ps-account opened this issue Jun 19, 2020 · 5 comments

@ps-account

ps-account commented Jun 19, 2020

If I start ru_generators before the run has started, it goes into a wait mode, starts when the run starts, and often immediately fails. Restarting ru_generators fixes the issue.

2020-06-19 10:06:43,253 Manager ru_generators --experiment-name test --device MN26516 --toml example.toml --log-file RU_log.log
2020-06-19 10:06:43,254 Manager batch_size=512
2020-06-19 10:06:43,254 Manager cache_size=512
2020-06-19 10:06:43,254 Manager channels=[1, 512]
2020-06-19 10:06:43,254 Manager chunk_log=chunk_log.log
2020-06-19 10:06:43,254 Manager device=MN26516
2020-06-19 10:06:43,254 Manager dry_run=False
2020-06-19 10:06:43,254 Manager experiment_name=test
2020-06-19 10:06:43,254 Manager host=127.0.0.1
2020-06-19 10:06:43,254 Manager log_file=RU_log.log
2020-06-19 10:06:43,254 Manager log_format=%(asctime)s %(name)s %(message)s
2020-06-19 10:06:43,254 Manager log_level=info
2020-06-19 10:06:43,254 Manager paf_log=paflog.log
2020-06-19 10:06:43,254 Manager port=9501
2020-06-19 10:06:43,254 Manager read_cache=AccumulatingCache
2020-06-19 10:06:43,255 Manager run_time=172800
2020-06-19 10:06:43,255 Manager throttle=0.1
2020-06-19 10:06:43,255 Manager toml=example.toml
2020-06-19 10:06:43,255 Manager unblock_duration=0.1
2020-06-19 10:06:43,255 Manager workers=1
2020-06-19 10:06:43,308 Manager Initialising minimap2 mapper
2020-06-19 10:06:43,334 Manager Mapper initialised
2020-06-19 10:06:43,334 read_until_api_v2.main Client type: many chunk
2020-06-19 10:06:43,334 read_until_api_v2.main Cache type: AccumulatingCache
2020-06-19 10:06:43,335 read_until_api_v2.main Filter for classes: adapter and strand
2020-06-19 10:06:43,335 read_until_api_v2.main Creating rpc connection for device MN26516.
2020-06-19 10:06:43,759 read_until_api_v2.main Loaded RPC
2020-06-19 10:06:43,759 read_until_api_v2.main Waiting for device to start processing

Once the flow cell starts running it often crashes as follows:

...
2020-06-19 10:07:47,668 Manager Creating 1 workers
2020-06-19 10:07:47,669 read_until_api_v2.main Processing started
2020-06-19 10:07:47,669 read_until_api_v2.main Sending init command, channels:1-512, min_chunk:0
2020-06-19 10:07:47,673 read_until_api_v2.main <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.FAILED_PRECONDITION
        details = "Data acquisition not running, or analysis not enabled"
        debug_error_string = "{"created":"@1592554067.669713569","description":"Error received from peer ipv4:127.0.0.1:8002","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Data acquisition not running, or analysis not enabled","grpc_status":9}"

Then, starting ru_generators again with the run already underway works just fine. Any idea what the problem is here? Could this be a timeout issue?

@mattloose
Contributor

This is a frustrating issue which we think is caused by the MinKNOW RPC.

It appears that the RPC reports that it is ready when it actually isn't - and this may come down to millisecond timing.

My advice at present would be to wait until the run has actually started before starting ru_generators.
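Until the underlying timing issue is fixed, one stopgap would be to retry the failing init call client-side when the server reports it isn't ready yet. A minimal sketch, not the actual read_until API: `send_init`, the retry counts, and the use of `RuntimeError` as a stand-in for gRPC's FAILED_PRECONDITION error are all assumptions for illustration.

```python
import time


def retry_until_ready(send_init, attempts=5, delay=1.0):
    """Call send_init(), retrying on RuntimeError up to `attempts` times.

    RuntimeError here is a stand-in for the gRPC FAILED_PRECONDITION
    ("Data acquisition not running, or analysis not enabled") error seen
    in the log above; a real client would catch grpc.RpcError and check
    the status code instead.
    """
    last_err = None
    for _ in range(attempts):
        try:
            return send_init()
        except RuntimeError as err:
            last_err = err
            time.sleep(delay)  # give MinKNOW time to finish starting up
    raise last_err
```

This mirrors what restarting ru_generators does by hand: the second attempt succeeds once data acquisition is genuinely running.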

I'll tag @alexomics for any further comment.

@ps-account
Author

ps-account commented Jun 19, 2020

Thanks! Good to know it's probably not a configuration error on our side or something.

If you check MinKNOW during a normal start, it actually shows a "Finishing up" status at some point during the initialization (I added a screenshot). Could this be related to that?

In the meantime (maybe I missed it), it could be helpful to mention in the instructions that this situation can occur.

[screenshot: MinKNOW showing the "Finishing up" status]

@mattloose
Contributor

Yeah - these things should be caught.

We will add something to the readme (@alexomics) - unfortunately this particular scenario hasn't happened on any of our testing machines; it's only been reported to us.

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 20, 2023
@github-actions

This issue was closed because there has been no response for 5 days after becoming stale.

@github-actions github-actions bot closed this as not planned Oct 25, 2023