Database is locked in thread ... error #111

Open
JensRichnow opened this issue Nov 22, 2023 · 18 comments
Labels
bug Something isn't working

Comments

@JensRichnow commented Nov 22, 2023

I'm using v0.17.0. When running grafana-wtf --drop-cache find xyz, I get a DB lock error:

2023-11-22 14:11:14,368 [grafana_wtf.core                    ] INFO   : Fetching dashboards in parallel with 5 concurrent requests                                             | 0/578 [00:00<?, ?it/s]
2023-11-22 14:11:19,952 [requests_cache.backends.sqlite      ] WARNING: Database is locked in thread 6272069632; retrying (1/3)
2023-11-22 14:11:25,210 [requests_cache.backends.sqlite      ] WARNING: Database is locked in thread 6204764160; retrying (1/3)
2023-11-22 14:11:30,478 [requests_cache.backends.sqlite      ] WARNING: Database is locked in thread 6221590528; retrying (1/3)
2023-11-22 14:11:35,742 [requests_cache.backends.sqlite      ] WARNING: Database is locked in thread 6238416896; retrying (1/3)
2023-11-22 14:11:41,009 [requests_cache.backends.sqlite      ] WARNING: Database is locked in thread 6255243264; retrying (1/3)

Running on Apple M2 Max.

How can I recover from that?
Cheers

@JensRichnow (Author) commented:

Update on the above:

  • I uninstalled and reinstalled (and also removed the grafana-wtf.sqlite database).
  • I now get a segmentation fault during the fetch process. I might have missed this error the first time ...
Segmentation fault: 11                                                                                                                                                 | 27/578 [00:02<00:38, 14.16it/s]
/Users/jens/.pyenv/versions/3.9.12/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@amotl (Contributor) commented Dec 3, 2023

Dear Jens,

thank you for writing in, and apologies for the late reply.

grafana-wtf was conceived as a quick-and-dirty tool to fulfill some of our needs and to exercise grafana-client. It looks like we did not get the code quite right when combining concurrent HTTP downloads with the SQLite-based request cache -- apparently it "just worked" for us and others until now, when you reported this flaw.

We will have to look into where the corresponding locks are missing that would prevent concurrent access to shared resources. In the meantime, you may have success with the --concurrency= command-line option to decrease download concurrency. Please let us know how this goes.
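
For example, re-running the command from your initial report with minimum concurrency (xyz being the placeholder search term):

grafana-wtf --drop-cache --concurrency=1 find xyz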

With kind regards,
Andreas.

@amotl (Contributor) commented Dec 3, 2023

There are a few discussions on the issue tracker of the requests-cache package about this topic.

In general, they indicate that SQLite-based caching has limits in high-concurrency situations, but they also contain specific recommendations that may need to be applied to the grafana-wtf code base. If you have the capacity to dive into the details, we would be more than happy.

That it already trips at a concurrency of 5 indicates either that the grafana-wtf code needs to be improved, as mentioned above, or that you have discovered something in this area that is much more likely to trip on M2 machines, even at lower concurrency. In the latter case, it may be relevant to report this to the requests-cache developers as well.

Another recommendation I've picked up from quickly scanning those discussions is to use a different cache backend. Please let us know if you have such needs; we think it will be easy to make grafana-wtf offer the different cache backends provided by requests-cache.
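
As a rough illustration, switching backends with requests-cache would look something like this; the filesystem backend and the cache name here are just example choices, not anything grafana-wtf implements today:

from requests_cache import CachedSession

# Example only: the filesystem backend stores responses as individual
# files, sidestepping contention on a single SQLite database file.
session = CachedSession("grafana-wtf", backend="filesystem", use_cache_dir=True)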

@amotl (Contributor) commented Dec 3, 2023

There is another option we may want to consider.

grafana-wtf --drop-cache

Some people also use that option to emulate a --no-cache behavior.

Maybe it is also the right time to finally offer such an option, so that grafana-wtf can be used more easily as a library, without making request caching obligatory.

In this spirit, a --no-cache option, when implemented, would turn off caching entirely, eventually resolving your issue in a different manner.
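
A purely hypothetical sketch of what that could look like, assuming the cache is currently installed unconditionally via requests_cache.install_cache(); the flag handling is made up for illustration:

import requests_cache

# Hypothetical: `no_cache` would come from a new --no-cache CLI flag.
if not no_cache:
    requests_cache.install_cache("grafana-wtf")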

@amotl (Contributor) commented Dec 3, 2023

Hi again,

taking all this into consideration, specifically @JWCook's comment at requests-cache/requests-cache#870 (comment), ...

The solution is to use CachedSession instead, which is thread-safe.

... and reviewing grafana-wtf's code at the spots where the cache enablement and the concurrent downloads happen, we can see that it does not use CachedSession at all.

So we figure a good first measure would be to actually use requests-cache's CachedSession, which may improve the situation right away if we are lucky.

With kind regards,
Andreas.

@JWCook commented Dec 3, 2023

It looks like the actual requests are sent by the grafana-client library, so I'm guessing that's why install_cache() is being used here. That's a convenient way to patch an external library you don't control, but as you found, it's not thread-safe.

I don't think you need to switch backends; SQLite should be more than able to handle this. For reference, though, there are some recommendations on choosing a different backend here.

Thankfully, there is a requests.Session object stored on GrafanaClient.s. You can replace that with a requests_cache.CachedSession object, which would look something like this:

from grafana_client import GrafanaApi
from requests_cache import CachedSession

api = GrafanaApi(...)
# Swap the plain requests.Session on the client for a cached one.
session = CachedSession(cache_name=__appname__, expire_after=expire_after, use_cache_dir=True)
session.headers['User-Agent'] = api.client.user_agent
api.client.s = session

Good luck! Feel free to open an issue or discussion on requests-cache if you get stuck.

@JensRichnow (Author) commented:

Thanks @amotl! Makes perfect sense, and setting --concurrency=1 for the time being worked nicely!

@amotl (Contributor) commented Dec 3, 2023

Hi there,

thank you both for your excellent replies. Jordan, the suggestion to patch GrafanaClient accordingly sounds good, and at the same time, I am happy to hear that Jens has been able to use --concurrency=1 as a workaround.

Cheers,
Andreas.

@amotl (Contributor) commented Mar 29, 2024

Hi again. There is a patch now, which may improve the situation.

@amotl (Contributor) commented Mar 31, 2024

Dear Jens and Jordan,

grafana-wtf 0.19.0 has been released, including corresponding patches to improve the situation. Can you tell us if it works better for you now? Thanks for your support!

With kind regards,
Andreas.

@edgarasg commented:

Issue still persists with 0.19.0

@amotl (Contributor) commented Apr 20, 2024

Dear Edgaras,

thank you for writing in. Do you think grafana-wtf should start providing an alternative cache backend, which is less problematic in concurrent access situations?

@JWCook and @Ousret: Can you spot any obvious problems with the current implementation, after the improvements based on your feedback? Maybe we are still doing something unfortunate, or even completely wrong?

With kind regards,
Andreas.

@JWCook commented Apr 20, 2024

@amotl I looked over your changes and they look good to me. This issue could potentially be fixable with some backend settings, for example SQLite write-ahead logging. This comes with a few tradeoffs, but notably it allows read operations to not block writes. This can be enabled by passing wal=True to CachedSession.
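
Concretely, reusing the parameter names from the earlier snippet in this thread, that would look like:

from requests_cache import CachedSession

# wal=True is forwarded to the SQLite backend and enables
# write-ahead logging.
session = CachedSession(
    cache_name=__appname__,
    expire_after=expire_after,
    use_cache_dir=True,
    wal=True,
)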

It's also worth mentioning that I can't make any guarantees about niquests' compatibility with requests-cache. From a quick look at its Session class, it should behave the same as requests.Session, but it may or may not remain that way.

I can help debug further, but I'll need some more information. @edgarasg can you provide the exact error message you're getting, and post a complete example that reproduces this issue? Are you getting exceptions, or just log warnings? And @amotl, roughly how many threads are making concurrent requests? Is there a user-configurable thread limit, automatic based on the CPU, or other?

@amotl (Contributor) commented Apr 20, 2024

Dear Jordan,

thanks for your swift reply. We used your suggestion about wal=True right away.

grafana-wtf 0.19.1 includes this improvement. May we ask you to use that version for further testing, @edgarasg?

It's also worth mentioning that I can't make any guarantees about niquests' compatibility with requests-cache. From a quick look at its Session class, it should behave the same as requests.Session, but it may or may not remain that way.

This is spot on. There could be something fishy, but until further notice I am also assuming it will "just work". Do you have any idea why this could go south in concurrency situations, for example through unfortunate side effects of Niquests being async, @Ousret?

I can help debug further, but I'll need some more information.

Actually, I didn't mean to bother you too much with this issue, so I very much appreciate your reply and offer. Let's hear what @edgarasg reports about 0.19.1 before conducting another round of investigation.

How many threads are making concurrent requests? Is there a user-configurable thread limit, automatic based on the CPU, or other?

The default value for the concurrency setting is 5. It can be adjusted by using the --concurrency command-line argument.

https://github.com/panodata/grafana-wtf/blob/c28d961d04e2e81219770d0220ac902690fd6465/grafana_wtf/commands.py#L55
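
To illustrate the failure mode, this is roughly the pattern in play: several worker threads sharing one cached session. A minimal sketch with a made-up Grafana URL, not grafana-wtf's actual code:

from concurrent.futures import ThreadPoolExecutor
from requests_cache import CachedSession

session = CachedSession("grafana-wtf", use_cache_dir=True)
# Hypothetical instance; grafana-wtf fetches one URL per dashboard.
urls = [f"https://grafana.example.org/api/dashboards/uid/{n}" for n in range(100)]

# Five workers (the default concurrency) all read from and write to
# the same SQLite cache database.
with ThreadPoolExecutor(max_workers=5) as executor:
    responses = list(executor.map(session.get, urls))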

With kind regards,
Andreas.

@Ousret commented Apr 21, 2024

It's also worth mentioning that I can't make any guarantees about niquests' compatibility with requests-cache. From a quick look at its Session class, it should behave the same as requests.Session, but it may or may not remain that way.

Niquests advertises itself as a drop-in replacement. It would lose its purpose if this were broken or if it diverged too much.
We are making sure it works.

This is right on the spot. There could be something fishy, but up until further notice, I am also assuming "it will just work". Do you have any idea why that could go south in concurrency situations, because of unfortunate side effects because Niquests is async, @Ousret?

Niquests is thread-safe in a synchronous context and task-safe in an asynchronous one, so there is nothing to be concerned about there.
I believe you don't do async in grafana-wtf, so the thread-safe aspect applies.
And yes, it should "just work". We keep in close contact in case there's a need to improve.

regards,

@edgarasg commented:

Thanks for the swift response. Now I'm getting this error:

❯ grafana-wtf find Phj18H0nk --drop-cache
[grafana_wtf.core                    ] INFO    : Fetching dashboards in parallel with 5 concurrent requests                                                           | 0/479 [00:00<?, ?it/s]
[1]    9213 segmentation fault  grafana-wtf find Phj18H0nk --drop-cache
/Users/edgarasg/.pyenv/versions/3.11.6/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@amotl (Contributor) commented Apr 25, 2024

Sorry to hear that, and thanks for reporting. Please keep using --concurrency=1 as a workaround until this problem is resolved properly.

@amotl (Contributor) commented Sep 28, 2024

Hi again @edgarasg,

can you confirm whether it is a hard error on your side with version 0.19.1, or just a warning, so that in general it works for you now?

@JensRichnow: Can I also humbly ask you to try 0.19.1 or higher in your environment and report back?

With kind regards,
Andreas.
