Verification not working anymore, restart container helped #53

Closed
DarianAnjuhal opened this issue Dec 13, 2022 · 23 comments · Fixed by #70

@DarianAnjuhal

Hi,

After a couple of months without issues, the verification step stopped working.
The widget still looked correct.

Unfortunately I do not have the errors from the client console (I will try to get more info).
But in the mcaptcha service I have the following log (Broken pipe (os error 32)):

```
Defense { levels: [Level { visitor_threshold: 2000, difficulty_factor: 50000 }, Level { visitor_threshold: 10000, difficulty_factor: 3000000 }, Level { visitor_threshold: 20000, difficulty_factor: 5000000 }], current_visitor_threshold: 0 }
 ERROR mcaptcha::errors              > Broken pipe (os error 32)
 INFO  actix_web::middleware::logger > 10.42.0.226 "POST /api/v1/pow/config HTTP/1.1" 400 37 "https://*********/widget/?sitekey=*****" "Mozilla/5.0 (X11; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0" 0.081178
 INFO  sqlx::query                   > /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 339.746µs
```

Is it possible to monitor the verification endpoint in some way? Next time I would prefer to get a mail from our monitoring rather than a customer's call. :-)

Are there additional logs I can have a look at?
Please let me know if I can help solve this issue. If I fix something myself, I will of course open a pull request (as always).

Bye and have a nice day
Darian

@realaravinth
Member

Hello,

Are you using mCaptcha/cache with this deployment? If yes, how was it doing at the time of this error? I've encountered broken pipe errors before when a dependent service has crashed.

Also, which version are you running? Build details can be obtained from /api/v1/meta/build.

> Is it possible to monitor the verification endpoint in some way?

There's a health endpoint, but it only checks whether the database and the cache are reachable.
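For illustration, an external probe for that endpoint could be as small as the following sketch. It assumes the health route lives at /api/v1/meta/health (verify against your instance), uses a placeholder base URL, and requires `reqwest = { version = "0.11", features = ["blocking"] }`:

```rust
use std::{process::exit, time::Duration};

// Hypothetical standalone probe: exits non-zero when the instance is
// unhealthy, so cron/systemd plus your alerting can mail you before a
// customer calls.
fn main() {
    let base = "https://captcha.example.org"; // placeholder base URL
    let url = format!("{base}/api/v1/meta/health"); // path assumed
    let client = reqwest::blocking::Client::new();
    match client.get(&url).timeout(Duration::from_secs(5)).send() {
        Ok(resp) if resp.status().is_success() => println!("OK: {url}"),
        Ok(resp) => {
            eprintln!("FAIL: {url} returned HTTP {}", resp.status());
            exit(1);
        }
        Err(e) => {
            eprintln!("FAIL: {url}: {e}");
            exit(1);
        }
    }
}
```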

@DarianAnjuhal
Author

Hi!

@realaravinth Thanks for your answer and your support.

Yes, I am using mCaptcha/cache:latest. Unfortunately I don't have logs from the cache right now. I have added monitoring for your health endpoint as well as for /pow/config. If the error occurs again, I can provide more logs.

Here is my version info:
{"version":"0.1.0","git_commit_hash":"c1f6ce3ae29321f0fdecf801ba789f60e4f89511"}

Bye and have a nice day
Darian

@wzrdtales
Contributor

This is apparently a big issue...

@realaravinth
Member

> This is apparently a big issue...

Did you face the issue as well?

@wzrdtales
Contributor

Permanently, yes. It's terrible, every few hours. Also, there seems to be no option to actually disable this cache module.

@realaravinth
Member

> Permanently, yes. It's terrible, every few hours.

Please provide logs.

> Also, there seems to be no option to actually disable this cache module.

Commenting out the redis section in the config file will make mCaptcha use an embedded cache instead. I just realized that this is not documented; I'll improve the docs to reflect it. But please note that the embedded cache doesn't persist data to disk.
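For reference, that change would look something like this in the config file (a sketch only; the exact key names may differ, so check the sample config.toml shipped with your release):

```toml
# With the [redis] section commented out, mCaptcha falls back to the
# embedded cache. Note: the embedded cache does not persist data to disk.
#[redis]
#url = "redis://mcaptcha-redis"
#pool = 4
```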

@wzrdtales
Contributor

I would love to provide logs, but there are none...

Is there an undocumented setting, a tracing mode, or something that actually generates logs meaningful to you?

@wzrdtales
Contributor

By "there are none" I mean that there are literally only Apache-like access logs; no errors whatsoever were in the logs.

@realaravinth
Member

> By "there are none" I mean that there are literally only Apache-like access logs; no errors whatsoever were in the logs.

Weird. Are you running this using Docker or just the binary?

@realaravinth
Member

I was able to reproduce the error; it happens when the Redis container crashes. mCaptcha/libmcaptcha#10 should fix that 🤞

```
mcaptcha-copy-mcaptcha-1           | Defense { levels: [Level { visitor_threshold: 100, difficulty_factor: 500 }, Level { visitor_threshold: 110, difficulty_factor: 5000 }, Level { visitor_threshold: 200, difficulty_factor: 50000 }, Level { visitor_threshold: 1000, difficulty_factor: 500000 }, Level { visitor_threshold: 1700, difficulty_factor: 1000000 }, Level { visitor_threshold: 2000, difficulty_factor: 2000000 }, Level { visitor_threshold: 2500, difficulty_factor: 5000000 }], current_visitor_threshold: 0 }
mcaptcha-copy-mcaptcha-1           |  ERROR mcaptcha::errors              > Broken pipe (os error 32)
```

cc: @wzrdtales @DarianAnjuhal

@realaravinth realaravinth self-assigned this Feb 21, 2023
@wzrdtales
Contributor

Actually, it does not need a Redis crash; our Redis never crashed. A connection dropped due to TCP connection lifecycles, or simply a Redis service that gets moved to another server for maintenance, is probably enough.

@wzrdtales
Contributor

> > By "there are none" I mean that there are literally only Apache-like access logs; no errors whatsoever were in the logs.
>
> Weird. Are you running this using Docker or just the binary?

Docker in Kubernetes.

@realaravinth
Member

> Actually, it does not need a Redis crash; our Redis never crashed. A connection dropped due to TCP connection lifecycles, or simply a Redis service that gets moved to another server for maintenance, is probably enough.

Right, all things that make Redis unavailable :D

> Docker in Kubernetes.

Interesting. Setting RUST_LOG=info should enable logging, but the program should do that automatically if the env var is unset 🤷
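For context, the usual env_logger idiom for defaulting the level when RUST_LOG is unset looks like this (a sketch of the common pattern, not necessarily mCaptcha's exact code; assumes `env_logger = "0.10"` in Cargo.toml):

```rust
use env_logger::Env;

fn init_logging() {
    // Honor RUST_LOG when it is set; otherwise fall back to info-level
    // logging for the whole binary.
    env_logger::Builder::from_env(Env::default().default_filter_or("info")).init();
}
```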

@wzrdtales
Contributor

That is the thing: we already set RUST_LOG to debug, with no effect. To be honest, all in all, everything from the Rust universe still feels quite immature.

@wzrdtales
Contributor

> Right, all things that make Redis unavailable :D

Handling a dropped connection should normally be a default capability of any library dealing with Redis, or of any library for a network-accessed protocol.
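The kind of handling being asked for here could look like the following sketch, using the `redis` crate (assumed at version 0.23). This is illustrative only, not libmcaptcha's actual fix; see mCaptcha/libmcaptcha#10 for that:

```rust
use redis::{Client, Connection, RedisResult};

/// Run `op`, reconnecting once if the cached connection turns out to be
/// stale (e.g. broken pipe after Redis was moved to another host).
fn with_retry<T>(
    client: &Client,
    conn: &mut Option<Connection>,
    mut op: impl FnMut(&mut Connection) -> RedisResult<T>,
) -> RedisResult<T> {
    for attempt in 0..2 {
        // (Re)establish the connection if we don't currently have one.
        if conn.is_none() {
            *conn = Some(client.get_connection()?);
        }
        match op(conn.as_mut().unwrap()) {
            Ok(v) => return Ok(v),
            // First failure on an I/O error: drop the dead connection
            // and retry once with a fresh one.
            Err(e) if e.is_io_error() && attempt == 0 => *conn = None,
            Err(e) => return Err(e),
        }
    }
    unreachable!("second attempt always returns")
}
```

A caller would then do something like `with_retry(&client, &mut conn, |c| redis::cmd("PING").query::<String>(c))`, so a single dropped TCP connection costs one retry instead of a permanent broken-pipe state.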

@realaravinth
Member

> That is the thing: we already set RUST_LOG to debug, with no effect.

I'm unfamiliar with Kubernetes. I used kompose to convert the docker-compose file shipped with this repository into Kubernetes deployment configurations for mCaptcha and deployed them. Logs appear to be working:

```
16:10 atm@lab tmp → kubectl logs -f deployment/mcaptcha
 INFO  mcaptcha > mcaptcha: mCaptcha - a PoW-based CAPTCHA system.
For more information, see: https://mcaptcha.org
Build info:
Version: 0.1.0 commit: f78669955c2150864d2ee8a9a5d90e134aaa52aa
 INFO  mcaptcha::settings > Loading config file from /etc/mcaptcha/config.toml
 INFO  mcaptcha::settings > Overriding [server].port with environment variable
 INFO  mcaptcha::settings > Overriding [database].url and [database].database_type with environment variable
 INFO  mcaptcha::data     > Initializing credential manager
 INFO  mcaptcha::data     > Initialized credential manager
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DBError(Io(Custom { kind: Uncategorized, error: "failed to lookup address information: No address associated with hostname" }))', src/db.rs:36:53
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

> To be honest, all in all, everything from the Rust universe still feels quite immature.

Rust is a young language, and it takes time to build things. The majority of the work done within the Rust community, including this project, is volunteer-driven.

@wzrdtales
Contributor

A similar problem seems to exist with the DB connections as well; completely deactivating the Redis module results in the same issues after some time.

@realaravinth
Member

> A similar problem seems to exist with the DB

Unable to reproduce:

[screenshot]

@wzrdtales
Contributor

I can't really tell how to reproduce it, but we have completely disabled Redis in the meantime and still get lock-ups. About the environment: it uses a highly available Postgres DB behind PgBouncer; maybe that helps as background info. Something locks the application up completely, so this is hard to evaluate if you can't reproduce it. I can only suggest running it in k8s with the Zalando operator for a HA Postgres DB, as that is the setup where it currently locks up (roughly once every 24-48 hours the service stalls completely; you don't even get an answer from the config endpoint, and sometimes the widget doesn't load at all anymore).

@wzrdtales
Contributor

We actually have a locked-up one right now, this very second...

@wzrdtales
Contributor

wzrdtales commented Mar 31, 2023

This time:

[screenshot]

It is still serving the configs, but verification is no longer working.

And here is the complete lifetime log:
https://pastes.l00m32.wizardtales.net/?5017c6a279993f10#FK5y9kMaNYdvg3L6x9dWezcCxmNUkCB34xCDUYP57uTV

In particular:

> from conversin: pool timed out while waiting for an open connection

This was the reason I was saying there are issues with the DB connection as well.
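That message comes from the connection pool failing to hand out a connection before its acquire timeout. A sketch of the pool tuning that usually mitigates this with sqlx behind a proxy like PgBouncer (assumes `sqlx = "0.6"` with the `runtime-tokio-rustls` and `postgres` features plus tokio with `macros`; all values illustrative, not mCaptcha's defaults):

```rust
use std::time::Duration;

use sqlx::postgres::PgPoolOptions;

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let pool = PgPoolOptions::new()
        .max_connections(10)
        // Fail fast with an error instead of stalling when the pool is empty.
        .acquire_timeout(Duration::from_secs(5))
        // Recycle connections periodically: long-lived idle connections can
        // be dropped silently by proxies such as PgBouncer.
        .max_lifetime(Duration::from_secs(30 * 60))
        .idle_timeout(Duration::from_secs(10 * 60))
        .connect("postgres://user:pass@localhost/mcaptcha") // placeholder URL
        .await?;

    // Sanity check that the pool hands out working connections.
    let (one,): (i32,) = sqlx::query_as("SELECT 1").fetch_one(&pool).await?;
    assert_eq!(one, 1);
    Ok(())
}
```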

@realaravinth
Member

realaravinth commented Mar 31, 2023

@wzrdtales: Please create a separate issue for the situation you describe. Also, please provide complete context when you report that something isn't working as it should.

k8s is unsupported for the time being. I don't have the bandwidth for it, and mCaptcha doesn't even have an alpha release yet, so supporting k8s seems unjustified.

The DB timeout usually happens when the DB library is unable to acquire a connection; it can be simulated by killing the database and then doing something in the app that triggers DB activity:

[screenshot]

I don't know why it is happening in your environment, especially when you say your database is HA.


Please feel free to reopen the ticket if the problem persists.

@wzrdtales
Contributor

You tested this within a few minutes. This only happens if you run it long term with sufficient traffic. No traffic, no problems; and usually no problems in the beginning, right after a start.

No problem though; we are currently taking care of it ourselves. I am just giving you feedback, not expecting anything from you.
