As seen in recent incidents (ooni/sysadmin#183, ooni/sysadmin#216), the legacy Python-based backends are quite fragile and hard to deploy.
Moreover, I think there is some value in improving our collector protocol.
The current collector protocol is not too hard to implement, and I think there is value in writing a replacement collector in Go, which should also be easier to maintain and deploy.
Some of the issues with the current collector stem from the fact that too many things live in the same service (a collector is actually also a bouncer, 5 different test helpers, etc.). This makes the configuration very fragile, and in some cases it needs valid values for configuration options that are irrelevant to the collector.
Designing it as a microservice that does only one thing (collect measurements) and does it well should not require too much effort, I think, and will pay off in the long run.
For the purpose of deploying an MVP, I think the protocol should stay essentially the same, with only some improvements.
The requirements I would propose for an MVP (and a strawman design for it) are the following:
Requirements
It must support all the legacy endpoints and be 100% backward compatible
It must not require setting up any cronjobs in order to work (e.g. the daily-tasks.sh and rename-reports.py scripts)
It must support being restarted into a consistent state (reports in progress should be able to resume when it comes back up)
It should support a better way of transmitting measurements to the pipeline, one that does not require ssh access to sync them
It should support passing in a minimal configuration via environment variables
It should support "registering" with the registry so that the /collector endpoint knows of the collectors that are out there
It should publish, via a Prometheus endpoint, some telemetry about the reports being submitted and the probe versions submitting them
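To make the "minimal configuration via environment variables" requirement more concrete, here is a rough sketch in Go. The variable names (`COLLECTOR_LISTEN_ADDR`, `COLLECTOR_REPORT_DIR`), the `Config` fields and the default values are all assumptions made up for illustration, not something that exists today. The point is only that every option has a sane default, so the service never needs valid values for options it does not care about.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds the few settings a collector-only service might need.
// The fields and the COLLECTOR_* variable names are hypothetical.
type Config struct {
	ListenAddr string // address the HTTP server binds to
	ReportDir  string // directory where incoming reports are written
}

// getenvDefault returns the value of key according to lookup,
// or def when the variable is unset or empty.
func getenvDefault(lookup func(string) (string, bool), key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// LoadConfig builds a Config from environment variables. The lookup
// function is injected (os.LookupEnv in production) so it can be tested.
func LoadConfig(lookup func(string) (string, bool)) Config {
	return Config{
		ListenAddr: getenvDefault(lookup, "COLLECTOR_LISTEN_ADDR", "127.0.0.1:8080"),
		ReportDir:  getenvDefault(lookup, "COLLECTOR_REPORT_DIR", "./reports"),
	}
}

func main() {
	cfg := LoadConfig(os.LookupEnv)
	fmt.Printf("listen=%s report_dir=%s\n", cfg.ListenAddr, cfg.ReportDir)
}
```

Binding to localhost by default fits with the idea of leaving TLS termination to a separate system service.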
Notes:
I think that the current design of making the collector responsible for starting tor and running a TLS server is sub-optimal, and we should rather delegate these tasks to dedicated system services.
I also think it would be best to rely on a database for keeping track of the collector's state. As part of this, it could also record some telemetry that would help us better understand our existing probe userbase and investigate issues more easily.