Integration between Orchestrator and ProxySQL #161

leeparayno opened this issue Apr 26, 2017 · 4 comments

@leeparayno

Percona Live 2017 - Birds of a Feather Discussion: Integration Between Orchestrator and ProxySQL

Notes on the BoF discussion tonight. (Apologies if I misquoted anyone).

Amsterdam

  • Shlomi - watched 5 presentations on ProxySQL; interest in it was exploding
  • Orchestrator and ProxySQL can complement nicely, with Orchestrator handling the topology and failover and ProxySQL buffering connections from the application, especially in times of failover

Rene - looking for a solution that fits most of the use cases

Problem Statement

No overview of entire cluster

  • hard to see whether the setup is effective or not
  • detecting the active writer is a problem
    • If you fail over, possibly across a network partition, it is hard to determine the true master if the old master returns
    • Different ProxySQL instances can have different views of who the master is (depending on which side of a network partition they are on)

Solution provided by someone:

Hooked Consul up with Orchestrator

  • A user is testing consul-template for ProxySQL (see the sketch after this list)
    • if it fully works, it can be contributed to ProxySQL
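
Roughly what this could look like, as a sketch rather than the actual setup discussed: orchestrator's PostMasterFailoverProcesses hooks can run shell commands with placeholders such as {successorHost}, so they could push the new master into Consul KV, and consul-template on each ProxySQL host could render that key into admin statements. The KV paths, hostgroup id 10 and cluster alias "mycluster" below are invented for illustration.

```json
{
  "PostMasterFailoverProcesses": [
    "consul kv put mysql/master/{failureClusterAlias}/hostname {successorHost}",
    "consul kv put mysql/master/{failureClusterAlias}/port {successorPort}"
  ]
}
```

```sql
-- consul-template fragment, rendered and applied against the ProxySQL admin
-- interface whenever the Consul key changes (writer hostgroup 10 is illustrative)
DELETE FROM mysql_servers WHERE hostgroup_id = 10;
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (10, '{{ key "mysql/master/mycluster/hostname" }}', {{ key "mysql/master/mycluster/port" }});
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```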

An individual ProxySQL instance has no knowledge of the whole architecture (Rene)

Jessica (Github)

  • Reference implementation
    • debate - to test
    • can make changes and branch deploy
    • if trying to deploy two complex systems, any attempt to unite them will be fraught with issues
  • Orchestrator
    • not dependent on external systems
    • currently supports pushing metrics to Graphite
    • no way of pushing data to Consul or another external system

Consul in production

  • highly available?
    • told not to treat it as highly available (Shlomi)
    • but we need to treat whatever we depend on for this external source of truth (if we use something external) as HA

Chubby - at Google (Sugu)

  • If it is up 100% for 3 months, they actually take it down, so users don't get used to it being up
  • Expect application to cache what they need and react to a system being down

What is the Source of Truth?

  • What are we looking to put into the "external system"?

Orchestrator

  • HA - runs with backend database

  • Service should be highly available

  • Github

    • running against a master-master backend
    • with HAProxy in front (see the sketch after this list)
    • generally Orchestrator is HA
    • at time of failure
      • should we rely on it to be always HA, to replace Consul?
  • Simon Mudd - always running 2 clusters

    • Don't do failover
      • free to upgrade whichever cluster
      • then you need to ask
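
As a sketch of the "HAProxy in front" setup mentioned above: HAProxy can health-check each orchestrator node and route traffic only to the active one. This assumes the /api/leader-check endpoint from the orchestrator/raft work referenced later in this thread (#175); hostnames and ports are illustrative.

```
listen orchestrator
    bind 0.0.0.0:80
    mode http
    # only the leader answers 200 on this endpoint
    option httpchk GET /api/leader-check
    default-server fall 1 rise 1
    server orchestrator-1 orchestrator-1.example.com:3000 check
    server orchestrator-2 orchestrator-2.example.com:3000 check
    server orchestrator-3 orchestrator-3.example.com:3000 check
```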

We don't need the coupling between these systems to be 100% HA necessarily (Lee)

  • Netflix has often posited the notion/pattern of Circuit Breakers in their infrastructure (a sketch follows this list).
    • If a service is unavailable or times out, the requestor can fail fast and drop down to a secondary option/alternative resolution
  • In the Elastic stack, the Beats have a notion of throttling/queuing: when Elasticsearch indexing is backed up, individual Beats such as Filebeat or Metricbeat hold back the data they need to ship until the service becomes available again
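
For reference, the circuit-breaker idea Lee mentions boils down to something like the following sketch (Go chosen arbitrarily; the threshold, cooldown and fallback are illustrative, not anything from the discussion): after enough consecutive failures the caller stops hitting the primary dependency for a cooldown period and uses a fallback instead.

```go
package breaker

import (
	"sync"
	"time"
)

// Breaker trips open after Threshold consecutive failures and stays open for Cooldown.
type Breaker struct {
	mu        sync.Mutex
	failures  int
	openUntil time.Time

	Threshold int           // consecutive failures before opening the circuit
	Cooldown  time.Duration // how long to stay open before trying the primary again
}

// Call runs primary unless the circuit is open; on an open circuit or a failure
// it falls back to the secondary option instead of blocking on the broken dependency.
func (b *Breaker) Call(primary, fallback func() error) error {
	b.mu.Lock()
	open := time.Now().Before(b.openUntil)
	b.mu.Unlock()

	if open {
		return fallback()
	}

	if err := primary(); err != nil {
		b.mu.Lock()
		b.failures++
		if b.failures >= b.Threshold {
			b.openUntil = time.Now().Add(b.Cooldown)
			b.failures = 0
		}
		b.mu.Unlock()
		return fallback()
	}

	b.mu.Lock()
	b.failures = 0
	b.mu.Unlock()
	return nil
}
```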

Potential Solutions

  • dependent on read_only = 0 (see the sketch after this list)
    • ProxySQL will direct write traffic based on read_only
      • read_only = 0: MySQL instances will be placed in the master (writer) hostgroup
      • read_only = 1: MySQL instances will be placed in the reader hostgroup
    • Orchestrator doesn't have to notify anything
    • Issues
      • What happens if old master didn't really go away
        • now two writable servers
        • inherently too dangerous
      • Matthias
        • master and slave both configured in the master hostgroup - only one gets picked
          • because the slave also has read_only = 1
    • ProxySQL could be first to identify issues
      • Could send a notification to Orchestrator
        • Orchestrator could check more aggressively
      • This should be the case, since ProxySQL would be receiving connections from the application
  • Orchestrator notifies ProxySQL of failure
    • Issue
      • If 2 of 5 ProxySQL cannot be contacted, what is Orchestrator's reaction?
    • At the end of the day, edge cases stop being edge cases
    • Only a problem if Orchestrator cannot talk to 2 of the ProxySQL servers, the master is partitioned away, and the old master comes back
      • ALWAYS set read_only = 1 in my.cnf and only turn writes on dynamically (Shlomi)
  • Clustering ProxySQL together?
    • If network partition, quorum would keep configuration in sync
      • considering it (Rene)
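
A rough sketch of the read_only-based solution above, against the ProxySQL admin interface (hostgroup ids 10/20 and hostnames are made up): mysql_replication_hostgroups tells ProxySQL which writer/reader hostgroup pair belongs to a replication cluster, and its monitor then moves backends between the two groups based on each server's read_only value.

```sql
-- declare the writer/reader hostgroup pair for this cluster
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'mycluster');

-- register the backends; the monitor keeps read_only=0 servers in hostgroup 10
-- (writer) and read_only=1 servers in hostgroup 20 (reader)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, 'db-master.example.com', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, 'db-replica1.example.com', 3306);

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

Shlomi's point about my.cnf then corresponds to a fragment like this on every server, so that a restarted old master always comes back read-only and only the promoted master is made writable dynamically:

```ini
[mysqld]
read_only = 1
```
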
@shlomi-noach
Collaborator

@leeparayno thank you so much for taking notes!

cc @renecannao

@renecannao

These are very detailed notes!
Thanks

@tapuhi

tapuhi commented May 12, 2017

Hi @leeparayno, thanks a lot for your detailed summary of this discussion. I attended as well and told you guys I've done some basic tests with orchestrator + consul + ProxySQL.

We are currently, at Wix.com, designing and building the full solution and architecture. I will describe our progress on the design, build and POC of the final solution in a separate blog post (I will send the link next week) and will be more than happy to share it with the community.

Meanwhile please have a look at a question/issue/discussion I've posted regarding the design of the orchestrator placement in the network topology: #172

@shlomi-noach
Collaborator

Related: #175, an orchestrator/raft setup.
