Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MXS-1690: Need Backpressure mechanism #169

Merged
merged 9 commits into from Mar 7, 2018

Conversation

ybbct
Copy link
Contributor

@ybbct ybbct commented Mar 2, 2018

I am contributing the new code of the whole pull request, including one or
several files that are either new files or modified ones, under the BSD-new
license.

Prevent buffer too much data buffer in write queue when client or server is busy.

  1. Add a single linked list to session for tracking all backend dcbs.
  2. Use poll_remove_dcb to block new data to comming, and use poll_add_dcb to release the blocking.
  3. Audit write queue length in dcb write operations, call releate callbacks when writeq length above high water mark or below low water mark.

Downstream throttling

  1. Register DCB_REASON_HIGH_WATER/DCB_REASON_LOW_WATER callbacks for every client_dcb at dcb_accept;

  2. Use struct session to get all backend dcbs, throttle on these backend dcbs;

Upstream throttling

  1. Register calllbacks for every backend dcbs at dcb_connect
  2. Use struct session to get client_dcb, throttle on client_dcb.

@@ -28,7 +28,8 @@ MXS_BEGIN_DECLS
#define DEFAULT_NTHREADS 1 /**< Default number of polling threads */
#define DEFAULT_QUERY_RETRIES 0 /**< Number of retries for interrupted queries */
#define DEFAULT_QUERY_RETRY_TIMEOUT 5 /**< Timeout for query retries */

#define DEFAULT_WRITEQ_HIGH_WATER 8096 /**< Defalut high water mark of dcb write queue */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disable throttling by default to keep current behavior. The value 0 can be used to tell that the throttling is disabled.

@@ -3415,8 +3438,6 @@ int poll_add_dcb(DCB *dcb)
{
ss_dassert(dcb->dcb_role == DCB_ROLE_CLIENT_HANDLER ||
dcb->dcb_role == DCB_ROLE_BACKEND_HANDLER);
ss_dassert(Worker::get_current_id() != -1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These must not be removed.

@@ -176,6 +178,7 @@ typedef struct dcb
bool ssl_write_want_read; /*< Flag */
bool ssl_write_want_write; /*< Flag */
bool was_persistent; /**< Whether this DCB was in the persistent pool */
bool high_warter_has_reached; /** High warter has reached, to determine whether need release throttle */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo, should be high_water_has_reached.

@@ -128,6 +128,9 @@ void dcb_global_init()
this_unit.dcb_initialized.state = DCB_STATE_ALLOC;
this_unit.dcb_initialized.ssl_state = SSL_HANDSHAKE_UNKNOWN;
this_unit.dcb_initialized.poll.handler = dcb_poll_handler;
this_unit.dcb_initialized.high_warter_has_reached = false;
this_unit.dcb_initialized.low_water = 65536;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use values from config_writeq_high_water and config_writeq_low_water here. This way the value is not assigned twice.

@@ -888,6 +895,13 @@ dcb_write(DCB *dcb, GWBUF *queue)
dcb->stats.n_buffered++;
dcb_drain_writeq(dcb);

if (DCB_ABOVE_HIGH_WATER(dcb) && !dcb->high_warter_has_reached)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with something like this the router API changes could be removed:

bool add_func(DCB *dcb, void *data)
{
    MXS_SESSION* session = (MXS_SESSION*)data;

    if (dcb->session == session && dcb->dcb_role == DCB_ROLE_BACKEND_HANDLER)
    {
        poll_add_dcb(dcb);
    }

    return true;
}

bool remove_func(DCB *dcb, void *data)
{
    MXS_SESSION* session = (MXS_SESSION*)data;

    if (dcb->session == session && dcb->dcb_role == DCB_ROLE_BACKEND_HANDLER)
    {
        poll_remove_dcb(dcb);
    }

    return true;
}

/* Code in dcb.cc:898 */

    if (DCB_ABOVE_HIGH_WATER(dcb) && !dcb->high_water_has_reached)
    {
        dcb_foreach(remove_func, dcb->session)
        dcb->high_warter_has_reached = true;
        dcb->stats.n_high_water++;
    }

/* ... snip ... */

    if (dcb->high_warter_has_reached && DCB_BELOW_LOW_WATER(dcb))
    {
        dcb_foreach(add_func, dcb->session);
        dcb->high_warter_has_reached = false;
        dcb->stats.n_low_water++;
    }

Further improvements to efficiency can be done by linking the DCBs into the MXS_SESSION object. This can be done as a part of general refactoring work for the core.

@@ -176,6 +178,7 @@ typedef struct dcb
bool ssl_write_want_read; /*< Flag */
bool ssl_write_want_write; /*< Flag */
bool was_persistent; /**< Whether this DCB was in the persistent pool */
bool high_warter_has_reached; /** High warter has reached, to determine whether need release throttle */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it by intention this is called waRter and not water?

@@ -2453,6 +2478,10 @@ dcb_accept(DCB *dcb)
return NULL;
}

/* Register downstream throttling callbacks */
dcb_add_callback(client_dcb, DCB_REASON_HIGH_WATER, session_downstream_throttle_callback, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callbacks should only be added if the throttling is enabled.

@@ -508,6 +512,10 @@ dcb_connect(SERVER *server, MXS_SESSION *session, const char *protocol)
dcb_free_all_memory(dcb);
return NULL;
}

/* Register throttling callbacks */
dcb_add_callback(dcb, DCB_REASON_HIGH_WATER, session_upstream_throttle_callback, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callbacks should only be added if the throttling is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#define DCB_BELOW_LOW_WATER(x)          ((x)->low_water && (x)->writeqlen < (x)->low_water)
#define DCB_ABOVE_HIGH_WATER(x)         ((x)->high_water && (x)->writeqlen > (x)->high_water)


if (DCB_ABOVE_HIGH_WATER(dcb) && !dcb->high_water_has_reached)
{
        dcb_call_callback(dcb, DCB_REASON_HIGH_WATER);
        ....
}

Since throttle is disable when high_water == 0, so I think it is safe to add throttling callbacks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is safe to add them but it will cause an extra memory allocation. For this reason the callbacks should not be added unless the throttling is enabled.

@@ -278,6 +279,9 @@ void session_link_backend_dcb(MXS_SESSION *session, DCB *dcb)
dcb->service = session->service;
/** Move this DCB under the same thread */
dcb->poll.thread.id = session->client_dcb->poll.thread.id;
/** Track session's backends */
dcb->next_backend = session->backends.head;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list must be updated every time a DCB is closed as otherwise the pointer will point to freed memory.
The correct place to do it seems to be in dcb_final_close inside the following branch.

    if (dcb->dcb_role == DCB_ROLE_BACKEND_HANDLER &&  // Backend DCB
        dcb->state == DCB_STATE_POLLING &&            // Being polled
        dcb->persistentstart == 0 &&                  /** Not already in (> 0) or being evicted from (-1)
                                                       * the persistent pool. */
        dcb->server)                                  // And has a server
    {
        /* May be a candidate for persistence, so save user name */
        const char *user;
        user = session_get_user(dcb->session);
        if (user && strlen(user) && !dcb->user)
        {
            dcb->user = MXS_STRDUP_A(user);
        }

       ///////////////////
       // ADD CODE HERE //
       ///////////////////

        if (dcb_maybe_add_persistent(dcb))
        {
            dcb->n_close = 0;
        }
    }

Creating a new function (e.g. void session_unlink_backend_dcb(MXS_SESSION *session, DCB *dcb)) that updates the list would solve the problem.

@@ -481,4 +486,24 @@ MXS_SESSION* session_get_current();
**/
uint64_t session_get_current_id();

/**
* @brief DCB callback for upstream throtting
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some more comments on when these are called and for what reason would help to explain their intended usage.

For example, explaining why and when the session_upstream_throttle_callback function is called and what happens when throttling is enabled or disabled.

@ybbct
Copy link
Contributor Author

ybbct commented Mar 6, 2018

CI build failed for the develop branch is not work well now

@markus456
Copy link
Contributor

Ah, the CI failure appears to be a bug in the Travis configuration file. We'll fix that right away.

@@ -176,11 +178,13 @@ typedef struct dcb
bool ssl_write_want_read; /*< Flag */
bool ssl_write_want_write; /*< Flag */
bool was_persistent; /**< Whether this DCB was in the persistent pool */
bool high_water_has_reached; /** High water mark has reached, to determine whether need release throttle */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing, but high_water_reached would be better, as high_water_has_reached is not grammatically correct. high_water_has_been_reached would be, but that is unnecessarily long.

@@ -1700,6 +1711,32 @@ handle_global_item(const char *name, const char *value)

gateway.users_refresh_time = users_refresh_time;
}
else if (strcmp(name, CN_WRITEQ_HIGH_WATER) == 0)
{
char* endptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use here

gateway.writeq_high_water = get_suffixed_size(value);

which would allow the high and low water marks to be specified as described here: https://github.com/mariadb-corporation/MaxScale/blob/2.2/Documentation/Getting-Started/Configuration-Guide.md#sizes

Should there be a minimum value for the high water mark?

int intval = strtol(value, &endptr, 0);
if (*endptr == '\0' && intval >= 0)
{
gateway.writeq_high_water = intval;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add here a

MXS_NOTICE("Writeq high water mark set to: %d", intval);

@@ -508,6 +512,13 @@ dcb_connect(SERVER *server, MXS_SESSION *session, const char *protocol)
dcb_free_all_memory(dcb);
return NULL;
}

/* Register upstream throttling callbacks */
if (dcb->high_water && dcb->low_water && dcb->high_water > dcb->low_water)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a check somewhere that ensures that high_water > low_water and if it isn't logs an error and sets both to 0. There should also be MIN_WRITEQ_HIGH_WATER and MIN_WRITEQ_LOW_WATER below which the values are not allowed to be set.

@@ -878,6 +889,7 @@ dcb_log_errors_SSL(DCB *dcb, int ret)
int
dcb_write(DCB *dcb, GWBUF *queue)
{
dcb->writeqlen += gwbuf_length(queue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note. We have had situations where calling gwbuf_length(...) caused a performance hit. In practice a situation where the GWBUF actually was a very long chain. We need to optimize that case so that the cost is always constant irrespective of how long a chain the GWBUF is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get every GWBUF's length only once, so I think maybe it is ok; I can test it by select a huge resultset and very row is small;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is OK. We created https://jira.mariadb.org/browse/MXS-1704 to remind us that it would be good to optimize gwbuf_length at some point.

@@ -1016,6 +1035,15 @@ int dcb_drain_writeq(DCB *dcb)
dcb_call_callback(dcb, DCB_REASON_DRAINED);
}

dcb->writeqlen -= total_written;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an assert here

ss_dassert(dcb->writeqlen >= total_written);

struct
{
struct dcb *next; /**< Next DCB in owning thread's list */
struct dcb *tail; /**< Last DCB in owning thread's list */
} thread;
struct dcb *next_backend; /** Next Backend DCB in owning session's list */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be here. A DCB is basically an abstraction for a connection, while the session object binds together one client connection and indirectly a number of backend connections. With this change, that separation of concerns is broken and that is not good.

Copy link
Contributor

@markus456 markus456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this looks good.

gateway.writeq_high_water <= gateway.writeq_low_water)
{
rval = -1;
MXS_ERROR("Invaild configuration writeq_high_water should greater than writeq_low_water");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typos should be corrected.

Invaild -> Invalid
configuration -> configuration,
should greater -> should be greater

gateway.writeq_high_water = get_suffixed_size(value);
if (gateway.writeq_high_water < MIN_WRITEQ_HIGH_WATER)
{
MXS_ERROR("Writeq high water mark should greater than %d", MIN_WRITEQ_HIGH_WATER);
Copy link
Contributor

@jhnwkmn jhnwkmn Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this into a warning

MXS_WARNING("The specified writeq high water mark %d, is smaller than the minimum allowed size %d. Changing to minimum.",
             gateway.writeq_high_water, MIN_WRITEQ_HIGH_WATER);
gateway.writeq_high_water = MIN_WRITEQ_HIGH_WATER;

And same approach for the low water.

Copy link
Contributor

@jhnwkmn jhnwkmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With those minor things corrected this looks good.

@jhnwkmn jhnwkmn merged commit 82bb624 into mariadb-corporation:develop Mar 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants