Skip to content

Commit

Permalink
Remove serialized HTTP headers from storeClientCopy() (#1335)
Browse files Browse the repository at this point in the history
Do not send serialized HTTP response header bytes in storeClientCopy()
answers. Ignore serialized header size when calling storeClientCopy().

This complex change adjusts storeClientCopy() API to addresses several
related problems with storeClientCopy() and its callers. The sections
below summarize storeClientCopy() changes and then move on to callers.

Squid incorrectly assumed that serialized HTTP response headers are read
from disk in a single storeRead() request. In reality, many situations
lead to store_client::readBody() receiving partial HTTP headers,
resulting in parseCharBuf() failure and a level-0 cache.log message:

    Could not parse headers from on disk object

Inadequate handling of this failure resulted in a variety of problems.
Squid now accumulates storeRead() results to parse larger headers and
also handles parsing failures better, but we could not just stop there.

With the storeRead() accumulation in place, it is no longer possible to
send parsed serialized HTTP headers to storeClientCopy() callers because
those callers do not provide enough buffer space to fit larger headers.
Increasing caller buffer capacity does not work well because the actual
size of the serialized header is unknown in advance and may be quite
large. Always allocating large buffers "just in case" is bad for
performance. Finally, larger buffers may jeopardize hard-to-find code
that uses hard-coded 4KB buffers without using HTTP_REQBUF_SZ macro.

Fortunately, storeClientCopy() callers either do not care about
serialized HTTP response headers or should not care about them! The API
forced callers to deal with serialized headers, but callers could (and
some did) just use the parsed headers available in the corresponding
MemObject. With this API change, storeClientCopy() callers no longer
receive serialized headers and do not need to parse or skip them.
Consequently, callers also do not need to account for response headers
size when computing offsets for subsequent storeClientCopy() requests.

Restricting storeClientCopy() API to HTTP _body_ bytes removed a lot of
problematic caller code. Caller changes are summarized further below.

A similar HTTP response header parsing problem existed in shared memory
cache code. That code was actually aware that headers may span multiple
cache slices but incorrectly assumed that httpMsgParseStep() accumulates
input as needed (to make another parsing "step"). It does not. Large
response headers cached in shared memory triggered a level-1 message:

    Corrupted mem-cached headers: e:...

Fixed MemStore code now accumulates serialized HTTP response headers as
needed to parse them, sharing high-level parsing code with store_client.

Old clientReplyContext methods worked hard to skip received serialized
HTTP headers. The code contained dangerous and often complex/unreadable
manipulation of various raw offsets and buffer pointers, aggravated by
the perceived need to save/restore those offsets across asynchronous
checks (see below). That header skipping code is gone now. Several stale
and misleading comments related to Store buffers management were also
removed or updated.

We replaced reqofs/reqsize with simpler/safer lastStreamBufferedBytes,
while becoming more consistent with that "cached" info invalidation. We
still need this info to resume HTTP body processing after asynchronous
http_reply_access checks and cache hit validations, but we no longer
save/restore this info for hit validation: No need to save/restore
information about the buffer that hit validation does not use and must
never touch!

The API change also moved from-Store StoreIOBuffer usage closer to
StoreIOBuffers manipulated by Clients Streams code. Buffers in both
categories now contain just the body bytes, and both now treat zero
length as EOF only _after_ processing the response headers.

These changes improve overall code quality, but this code path and these
changes still suffer from utterly unsafe legacy interfaces like
StoreIOBuffer and clientStreamNode. We cannot rely on the compiler to
check our work. The risk of these changes exposing/causing bugs is high.

asHandleReply() expected WHOIS response body bytes where serialized HTTP
headers were! The code also had multiple problems typical for manually
written C parsers dealing with raw input buffers. Now replaced with a
Tokenizer-based code.

To skip received HTTP response headers, peerDigestHandleReply() helper
functions called headersEnd() on the received buffer. Twice. We have now
merged those two parsing helper functions into one (that just checks the
already parsed headers). This merger preserved "304s must come with
fetch->pd->cd" logic that was hidden/spread across those two functions.

urnHandleReply() re-parsed received HTTP response headers. We left its
HTTP body parsing code unchanged except for polishing NUL-termination.

netdbExchangeHandleReply() re-parsed received HTTP response headers to
find where they end (via headersEnd()). We improved handing of corner
cases and replaced some "tricky bits" code, reusing the new
Store::ParsingBuffer class. The net_db record parsing code is unchanged.

Mgr::StoreToCommWriter::noteStoreCopied() is a very special case. It
actually worked OK because, unlike all other storeClientCopy() callers,
this code does not get serialized HTTP headers from Store: The code
adding bytes to the corresponding StoreEntry does not write serialized
HTTP headers at all. StoreToCommWriter is used to deliver kid-specific
pieces of an HTTP body of an SMP cache manager response. The HTTP
headers of that response are handled elsewhere. We left this code
unchanged, but the existence of the special no-headers case does
complicate storeClientCopy() API documentation, implementation, and
understanding.

Co-authored-by: Eduard Bagdasaryan <eduard.bagdasaryan@measurement-factory.com>
  • Loading branch information
2 people authored and kinkie committed Oct 11, 2023
1 parent 29805b0 commit a27bf4b
Show file tree
Hide file tree
Showing 24 changed files with 1,090 additions and 727 deletions.
34 changes: 34 additions & 0 deletions src/HttpReply.cc
Expand Up @@ -21,7 +21,9 @@
#include "HttpReply.h"
#include "HttpRequest.h"
#include "MemBuf.h"
#include "sbuf/Stream.h"
#include "SquidConfig.h"
#include "SquidMath.h"
#include "Store.h"
#include "StrList.h"

Expand Down Expand Up @@ -456,6 +458,38 @@ HttpReply::parseFirstLine(const char *blk_start, const char *blk_end)
return sline.parse(protoPrefix, blk_start, blk_end);
}

size_t
HttpReply::parseTerminatedPrefix(const char * const terminatedBuf, const size_t bufSize)
{
auto error = Http::scNone;
const bool eof = false; // TODO: Remove after removing atEnd from HttpHeader::parse()
if (parse(terminatedBuf, bufSize, eof, &error)) {
debugs(58, 7, "success after accumulating " << bufSize << " bytes and parsing " << hdr_sz);
Assure(pstate == Http::Message::psParsed);
Assure(hdr_sz > 0);
Assure(!Less(bufSize, hdr_sz)); // cannot parse more bytes than we have
return hdr_sz; // success
}

Assure(pstate != Http::Message::psParsed);
hdr_sz = 0;

if (error) {
throw TextException(ToSBuf("failed to parse HTTP headers",
Debug::Extra, "parser error code: ", error,
Debug::Extra, "accumulated unparsed bytes: ", bufSize,
Debug::Extra, "reply_header_max_size: ", Config.maxReplyHeaderSize),
Here());
}

debugs(58, 3, "need more bytes after accumulating " << bufSize << " out of " << Config.maxReplyHeaderSize);

// the parse() call above enforces Config.maxReplyHeaderSize limit
// XXX: Make this a strict comparison after fixing Http::Message::parse() enforcement
Assure(bufSize <= Config.maxReplyHeaderSize);
return 0; // parsed nothing, need more data
}

void
HttpReply::configureContentLengthInterpreter(Http::ContentLengthInterpreter &interpreter)
{
Expand Down
7 changes: 7 additions & 0 deletions src/HttpReply.h
Expand Up @@ -125,6 +125,13 @@ class HttpReply: public Http::Message
/// parses reply header using Parser
bool parseHeader(Http1::Parser &hp);

/// Parses response status line and headers at the start of the given
/// NUL-terminated buffer of the given size. Respects reply_header_max_size.
/// Assures pstate becomes Http::Message::psParsed on (and only on) success.
/// \returns the number of bytes in a successfully parsed prefix (or zero)
/// \retval 0 implies that more data is needed to parse the response prefix
size_t parseTerminatedPrefix(const char *, size_t);

private:
/** initialize */
void init();
Expand Down
6 changes: 6 additions & 0 deletions src/MemObject.cc
Expand Up @@ -353,6 +353,12 @@ MemObject::policyLowestOffsetToKeep(bool swap) const
*/
int64_t lowest_offset = lowestMemReaderOffset();

// XXX: Remove the last (Config.onoff.memory_cache_first-based) condition
// and update keepForLocalMemoryCache() accordingly. The caller wants to
// remove all local memory that is safe to remove. Honoring caching
// preferences is its responsibility. Our responsibility is safety. The
// situation was different when ff4b33f added that condition -- there was no
// keepInLocalMemory/keepForLocalMemoryCache() call guard back then.
if (endOffset() < lowest_offset ||
endOffset() - inmem_lo > (int64_t)Config.Store.maxInMemObjSize ||
(swap && !Config.onoff.memory_cache_first))
Expand Down
9 changes: 9 additions & 0 deletions src/MemObject.h
Expand Up @@ -89,6 +89,15 @@ class MemObject
bool appliedUpdates = false;

void stat (MemBuf * mb) const;

/// The offset of the last memory-stored HTTP response byte plus one.
/// * HTTP response headers (if any) are stored at offset zero.
/// * HTTP response body byte[n] usually has offset (hdr_sz + n), where
/// hdr_sz is the size of stored HTTP response headers (zero if none); and
/// n is the corresponding byte offset in the whole resource body.
/// However, some 206 (Partial Content) response bodies are stored (and
/// retrieved) as regular 200 response bodies, disregarding offsets of
/// their body parts. \sa HttpStateData::decideIfWeDoRanges().
int64_t endOffset () const;

/// sets baseReply().hdr_sz (i.e. written reply headers size) to endOffset()
Expand Down
75 changes: 37 additions & 38 deletions src/MemStore.cc
Expand Up @@ -17,6 +17,8 @@
#include "MemObject.h"
#include "MemStore.h"
#include "mime_header.h"
#include "sbuf/SBuf.h"
#include "sbuf/Stream.h"
#include "SquidConfig.h"
#include "SquidMath.h"
#include "StoreStats.h"
Expand Down Expand Up @@ -311,19 +313,25 @@ MemStore::get(const cache_key *key)
// create a brand new store entry and initialize it with stored info
StoreEntry *e = new StoreEntry();

// XXX: We do not know the URLs yet, only the key, but we need to parse and
// store the response for the Root().find() callers to be happy because they
// expect IN_MEMORY entries to already have the response headers and body.
e->createMemObject();

anchorEntry(*e, index, *slot);

const bool copied = copyFromShm(*e, index, *slot);

if (copied)
return e;
try {
// XXX: We do not know the URLs yet, only the key, but we need to parse and
// store the response for the Root().find() callers to be happy because they
// expect IN_MEMORY entries to already have the response headers and body.
e->createMemObject();

anchorEntry(*e, index, *slot);

// TODO: make copyFromShm() throw on all failures, simplifying this code
if (copyFromShm(*e, index, *slot))
return e;
debugs(20, 3, "failed for " << *e);
} catch (...) {
// see store_client::parseHttpHeadersFromDisk() for problems this may log
debugs(20, DBG_IMPORTANT, "ERROR: Cannot load a cache hit from shared memory" <<
Debug::Extra << "exception: " << CurrentException <<
Debug::Extra << "cache_mem entry: " << *e);
}

debugs(20, 3, "failed for " << *e);
map->freeEntry(index); // do not let others into the same trap
destroyStoreEntry(static_cast<hash_link *>(e));
return nullptr;
Expand Down Expand Up @@ -469,6 +477,8 @@ MemStore::copyFromShm(StoreEntry &e, const sfileno index, const Ipc::StoreMapAnc
Ipc::StoreMapSliceId sid = anchor.start; // optimize: remember the last sid
bool wasEof = anchor.complete() && sid < 0;
int64_t sliceOffset = 0;

SBuf httpHeaderParsingBuffer;
while (sid >= 0) {
const Ipc::StoreMapSlice &slice = map->readableSlice(index, sid);
// slice state may change during copying; take snapshots now
Expand All @@ -491,10 +501,18 @@ MemStore::copyFromShm(StoreEntry &e, const sfileno index, const Ipc::StoreMapAnc
const StoreIOBuffer sliceBuf(wasSize - prefixSize,
e.mem_obj->endOffset(),
page + prefixSize);
if (!copyFromShmSlice(e, sliceBuf, wasEof))
return false;

copyFromShmSlice(e, sliceBuf);
debugs(20, 8, "entry " << index << " copied slice " << sid <<
" from " << extra.page << '+' << prefixSize);

// parse headers if needed; they might span multiple slices!
auto &reply = e.mem().adjustableBaseReply();
if (reply.pstate != Http::Message::psParsed) {
httpHeaderParsingBuffer.append(sliceBuf.data, sliceBuf.length);
if (reply.parseTerminatedPrefix(httpHeaderParsingBuffer.c_str(), httpHeaderParsingBuffer.length()))
httpHeaderParsingBuffer = SBuf(); // we do not need these bytes anymore
}
}
// else skip a [possibly incomplete] slice that we copied earlier

Expand Down Expand Up @@ -526,6 +544,9 @@ MemStore::copyFromShm(StoreEntry &e, const sfileno index, const Ipc::StoreMapAnc
debugs(20, 5, "mem-loaded all " << e.mem_obj->endOffset() << '/' <<
anchor.basics.swap_file_sz << " bytes of " << e);

if (e.mem().adjustableBaseReply().pstate != Http::Message::psParsed)
throw TextException(ToSBuf("truncated mem-cached headers; accumulated: ", httpHeaderParsingBuffer.length()), Here());

// from StoreEntry::complete()
e.mem_obj->object_sz = e.mem_obj->endOffset();
e.store_status = STORE_OK;
Expand All @@ -541,40 +562,18 @@ MemStore::copyFromShm(StoreEntry &e, const sfileno index, const Ipc::StoreMapAnc
}

/// imports one shared memory slice into local memory
bool
MemStore::copyFromShmSlice(StoreEntry &e, const StoreIOBuffer &buf, bool eof)
void
MemStore::copyFromShmSlice(StoreEntry &e, const StoreIOBuffer &buf)
{
debugs(20, 7, "buf: " << buf.offset << " + " << buf.length);

// from store_client::readBody()
// parse headers if needed; they might span multiple slices!
const auto rep = &e.mem().adjustableBaseReply();
if (rep->pstate < Http::Message::psParsed) {
// XXX: have to copy because httpMsgParseStep() requires 0-termination
MemBuf mb;
mb.init(buf.length+1, buf.length+1);
mb.append(buf.data, buf.length);
mb.terminate();
const int result = rep->httpMsgParseStep(mb.buf, buf.length, eof);
if (result > 0) {
assert(rep->pstate == Http::Message::psParsed);
} else if (result < 0) {
debugs(20, DBG_IMPORTANT, "Corrupted mem-cached headers: " << e);
return false;
} else { // more slices are needed
assert(!eof);
}
}
debugs(20, 7, "rep pstate: " << rep->pstate);

// local memory stores both headers and body so copy regardless of pstate
const int64_t offBefore = e.mem_obj->endOffset();
assert(e.mem_obj->data_hdr.write(buf)); // from MemObject::write()
const int64_t offAfter = e.mem_obj->endOffset();
// expect to write the entire buf because StoreEntry::write() never fails
assert(offAfter >= 0 && offBefore <= offAfter &&
static_cast<size_t>(offAfter - offBefore) == buf.length);
return true;
}

/// whether we should cache the entry
Expand Down
2 changes: 1 addition & 1 deletion src/MemStore.h
Expand Up @@ -80,7 +80,7 @@ class MemStore: public Store::Controlled, public Ipc::StoreMapCleaner
void copyToShm(StoreEntry &e);
void copyToShmSlice(StoreEntry &e, Ipc::StoreMapAnchor &anchor, Ipc::StoreMap::Slice &slice);
bool copyFromShm(StoreEntry &e, const sfileno index, const Ipc::StoreMapAnchor &anchor);
bool copyFromShmSlice(StoreEntry &e, const StoreIOBuffer &buf, bool eof);
void copyFromShmSlice(StoreEntry &, const StoreIOBuffer &);

void updateHeadersOrThrow(Ipc::StoreMapUpdate &update);

Expand Down
65 changes: 57 additions & 8 deletions src/StoreClient.h
Expand Up @@ -13,10 +13,23 @@
#include "base/AsyncCall.h"
#include "base/forward.h"
#include "dlink.h"
#include "store/ParsingBuffer.h"
#include "StoreIOBuffer.h"
#include "StoreIOState.h"

typedef void STCB(void *, StoreIOBuffer); /* store callback */
/// A storeClientCopy() callback function.
///
/// Upon storeClientCopy() success, StoreIOBuffer::flags.error is zero, and
/// * HTTP response headers (if any) are available via MemObject::freshestReply();
/// * HTTP response body bytes (if any) are available via StoreIOBuffer.
///
/// STCB callbacks may use response semantics to detect certain EOF conditions.
/// Callbacks that expect HTTP headers may call store_client::atEof(). Similar
/// to clientStreamCallback() callbacks, callbacks dedicated to receiving HTTP
/// bodies may use zero StoreIOBuffer::length as an EOF condition.
///
/// Errors are indicated by setting StoreIOBuffer flags.error.
using STCB = void (void *, StoreIOBuffer);

class StoreEntry;
class ACLFilledChecklist;
Expand Down Expand Up @@ -88,7 +101,13 @@ class store_client

void dumpStats(MemBuf * output, int clientNumber) const;

int64_t cmp_offset;
// TODO: When STCB gets a dedicated Answer type, move this info there.
/// Whether the last successful storeClientCopy() answer was known to
/// contain the last body bytes of the HTTP response
/// \retval true requesting bytes at higher offsets is futile
/// \sa STCB
bool atEof() const { return atEof_; }

#if STORE_CLIENT_LIST_DEBUG

void *owner;
Expand Down Expand Up @@ -123,32 +142,51 @@ class store_client
dlink_node node;

private:
bool moreToSend() const;
bool moreToRead() const;
bool canReadFromMemory() const;
bool answeredOnce() const { return answers >= 1; }
bool sendingHttpHeaders() const;
int64_t nextHttpReadOffset() const;

void fileRead();
void scheduleDiskRead();
void scheduleMemRead();
void readFromMemory();
void scheduleRead();
bool startSwapin();
void handleBodyFromDisk();
void maybeWriteFromDiskToMemory(const StoreIOBuffer &);

bool parseHttpHeadersFromDisk();
bool tryParsingHttpHeaders();
void skipHttpHeadersFromDisk();

void fail();
void callback(ssize_t);
void noteCopiedBytes(size_t);
void noteEof();
void noteNews();
void finishCallback();
static void FinishCallback(store_client *);

int type;
bool object_ok;

/// \copydoc atEof()
bool atEof_;

/// Storage and metadata associated with the current copy() request. Ought
/// to be ignored when not answering a copy() request.
StoreIOBuffer copyInto;

/// The number of bytes loaded from Store into copyInto while answering the
/// current copy() request. Ought to be ignored when not answering.
size_t copiedSize;
/// the total number of finishCallback() calls
uint64_t answers;

/// Accumulates raw bytes read from Store while answering the current copy()
/// request. Buffer contents depends on the source and parsing stage; it may
/// hold (parts of) swap metadata, HTTP response headers, and/or HTTP
/// response body bytes.
std::optional<Store::ParsingBuffer> parsingBuffer;

StoreIOBuffer lastDiskRead; ///< buffer used for the last storeRead() call

/* Until we finish stuffing code into store_client */

Expand All @@ -173,7 +211,18 @@ class store_client
} _callback;
};

/// Asynchronously read HTTP response headers and/or body bytes from Store.
///
/// The requested zero-based HTTP body offset is specified via the
/// StoreIOBuffer::offset field. The first call (for a given store_client
/// object) must specify zero offset.
///
/// The requested HTTP body portion size is specified via the
/// StoreIOBuffer::length field. The function may return fewer body bytes.
///
/// See STCB for result delivery details.
void storeClientCopy(store_client *, StoreEntry *, StoreIOBuffer, STCB *, void *);

store_client* storeClientListAdd(StoreEntry * e, void *data);
int storeUnregister(store_client * sc, StoreEntry * e, void *data);
int storePendingNClients(const StoreEntry * e);
Expand Down
3 changes: 3 additions & 0 deletions src/StoreIOBuffer.h
Expand Up @@ -43,6 +43,9 @@ class StoreIOBuffer
return Range<int64_t>(offset, offset + length);
}

/// convenience method for changing the offset of a being-configured buffer
StoreIOBuffer &positionAt(const int64_t newOffset) { offset = newOffset; return *this; }

void dump() const {
if (fwrite(data, length, 1, stderr)) {}
if (fwrite("\n", 1, 1, stderr)) {}
Expand Down

0 comments on commit a27bf4b

Please sign in to comment.