
Adding arbitrary information to execution graph #166

Merged Nov 2, 2021 (15 commits)

Conversation

csegarragonz
Collaborator

@csegarragonz csegarragonz commented Oct 27, 2021

In this PR I introduce a feature to count arbitrary events in a per-message fashion. The motivation for this PR is to be able to shed light on the system's behaviour in a low-intrusion manner. Given that protobuf messages are first-class citizens in the system, it feels natural to include this tracing information in the message objects.

To start the recording, we must set the recordexecgraph flag in the message.

At the moment, the fields populated are not serialised to and from JSON, because serialising an arbitrary map would add a lot of complexity to the already convoluted methods.

After some local testing, I think it is a better option to go for #167 first.

@csegarragonz csegarragonz added the enhancement New feature or request label Oct 27, 2021
@csegarragonz csegarragonz self-assigned this Oct 27, 2021
@csegarragonz csegarragonz marked this pull request as ready for review October 27, 2021 16:32
@csegarragonz csegarragonz force-pushed the log-calls branch 2 times, most recently from bc3f886 to 7290d36 Compare October 27, 2021 16:49
@Shillaker
Collaborator

Shillaker commented Oct 28, 2021

This is good, as any info from the message will automatically be put onto the execution graph result.

However, I have a couple of points/ questions:

  • Enabling this only in debug builds is somewhat against the original aim of this work, which was to trace distributed deployments (which will be using a release build). Our primary use-case will be when there's a big deployment and something is going wrong, and we don't want to have to rebuild and redeploy just to do the tracing.
  • I would suggest adding another boolean parameter to the Message called recordExecGraph, then wrap any exec graph interaction in a conditional based on that (defaulting to false). This way a user can reinvoke their function with this flag set to true, then query the exec graph without redeploying or recompiling.
  • If we have to add a new protobuf object for every type of tracing we are going to bloat the protobuf definitions quite a lot, and it makes it more fiddly to add new information. I would suggest having an arbitrary key/ value approach, e.g.
message ExecGraphDetail {
    string key = 1;
    string value = 2;
}

message Message {
...
    bool recordExecGraph = 38;
    repeated ExecGraphDetail execGraphDetails = 39;
}

Then adding info would be done through some utility functions:

void addExecGraphDetail(faabric::Message &msg, const std::string &key, const std::string &value) {
    // Add an entry to execGraphDetails
}

void incrementExecGraphCounter(faabric::Message &msg, const std::string &key) {
    // Get the current value, parse as an int, increment, write back
}

Then for now, for simple MPI tracing we'd be interested in the following; however, I'm not sure if it's possible:

  • mpi_cross_host_msg_count - counter saying how many messages were sent outside this host
  • mpi_in_host_msg_count - counter saying how many messages were sent locally

The execution graph currently relies on Redis, which we'd like to remove one day, but I think we can switch this to use Faabric state under the hood instead (eventually).

@csegarragonz
Collaborator Author

csegarragonz commented Oct 28, 2021

I agree with most of your comments, some observations.

  • I think there's value in starting and stopping the recording, keeping the records in memory, and then modifying the message once. This way we ensure: never editing the wrong message, no race conditions, and more (?) efficient tracing/no-opping.
  • Given that this won't be disabled in Release builds, I've added a bit of complexity to the code in order to properly no-op functions if the message value is not set.
  • If we keep track of the number of messages sent to each rank in the exec graph, then using other message entries like mpirank and masterhost, it is possible to work out how many messages are cross host and in host. It becomes a matter of post-processing the execution graph.

@Shillaker
Collaborator

Shillaker commented Oct 28, 2021

I think there's value in starting and stopping the recording, keeping the records in memory, and then modifying the message once. This way we ensure: never editing the wrong message, no race conditions, and more (?) efficient tracing/no-opping.

I think this is unnecessary complexity: it means we have to maintain another map of message IDs and values, when it seems like we will always have a reference to the target message when doing the tracing (so could just set it directly). However, perhaps I don't understand the risks/difficulties. When would we modify the wrong message? If we're passing a reference to the message object itself into the functions that add info (and modifying it in place), then by definition it's the right one. The place where I can see this going wrong is if we're passing a copy of the original message at some point, so edits don't persist on the original; however, we should only ever be passing messages around by reference and never copying, so I'm not sure this would be a problem. I don't see race conditions being a problem either, as each message is only ever handled by a single executor (or single scheduler thread) IIRC.

Given that this won't be disabled in Release builds, I've added a bit of complexity to the code in order to properly no-op functions if the message value is not set.

I'm not sure I understand the points about no-opping and efficiency. What I'm saying is that we won't do any execution graph work by default in either Release or Debug builds (i.e. less than we do now, where we record the graph for every request), unless someone sets the recordExecGraph flag (i.e. before adding any info we'd have if (msg.recordExecGraph()) { // do something }). Once this is set, I'm not sure we need to worry too much about performance (as the user has explicitly asked for it).

If we keep track of the number of messages sent to each rank in the exec graph, then using other message entries like mpirank and masterhost, it is possible to work out how many messages are cross host and in host. It becomes a matter of post-processing the execution graph.

Yes, good point, although I'm not sure how straightforward it will be to map ranks to hosts, especially if we ever do migration. I would say recording counts of messages to hosts as well as to ranks would be great if possible (each would just be a different key/value).

@@ -33,6 +34,9 @@ static thread_local std::unordered_map<
std::unique_ptr<faabric::transport::AsyncSendMessageEndpoint>>
ranksSendEndpoints;

// Id of the message that created this thread-local instance
static thread_local int thisMsgId;
Collaborator

Having thread-local state makes me nervous and I'd like to avoid it if at all possible. If we edit the message objects directly then perhaps we could avoid this.


checkMessageNotLinked();

linkedMsg = std::make_shared<faabric::Message>(msg);
Collaborator

Does this not make a copy of the message?

linkedMsg = std::make_shared<faabric::Message>(msg);

// If message flag is not set, no-op the increment functions for minimal
// overhead
Collaborator

@Shillaker Shillaker Oct 28, 2021

This is quite a lot of black magic and potentially premature optimisation. Could we avoid this complexity by editing messages directly, then putting in an if(!msg.recordexecgraph()) { return; } in those methods (or the same check but wrapping the call to those methods)?

@@ -148,6 +153,10 @@ message Message {
string sgxTag = 35;
bytes sgxPolicy = 36;
bytes sgxResult = 37;

// Exec-graph utils
bool recordExecGraph = 38;
Collaborator

@Shillaker Shillaker Oct 28, 2021

This flag needs to be added to the message JSON serialisation/ deserialisation to allow clients to pass it in. I.e. the request JSON would look something like:

{
    "user": "mpi",
    "func": "some_mpi_func",
    ...
    "exec_graph": true
}

Collaborator Author

As mentioned in the PR description, I think we should go for #167. The amount of complexity reduced is vast, and definitely worth the time.

Collaborator

@Shillaker Shillaker Oct 29, 2021

Agreed, although for now we should still add it to what we have. The change in complexity may be good, but it's a bit of code that currently requires very little maintenance, and making the change would inevitably take at least a few hours (especially as it would require testing the operations from the two language clients and all the experiments).

Collaborator

@Shillaker Shillaker Oct 29, 2021

However, I take the point around complexity of a map. Unfortunately this work isn't usable until we pass it back to the client, and using it to debug an issue with an experiment is our top priority. Is it possible to use the protobuf serialisation to serialise just this map to JSON, put that as a string into the JSON sent back to the client, then have the client deserialise it?

@@ -9,9 +9,9 @@ class MpiContext
public:
MpiContext();

int createWorld(const faabric::Message& msg);
int createWorld(faabric::Message& msg);
Collaborator Author

Need to de-constify this so that we can actually edit the message when tracing.

Collaborator

@Shillaker Shillaker left a comment

Nice, this is looking good, just a couple of tweaks and it's good to go.

tests/test/scheduler/test_exec_graph.cpp (resolved)
const std::string& key,
const int valueToIncrement = 1);

static inline std::string const mpiMsgCountPrefix = "mpi-msgcount-torank-";
Collaborator

This constant is MPI-specific and only used by MPI stuff, therefore should live in an MPI header (and I would also use a #define to fit with the style of the other constants we define, but an inline std::string is probably equivalent).

Collaborator Author

@csegarragonz csegarragonz Oct 29, 2021

I know this is not how we define constants elsewhere, and I thought about the #define option, but I liked the idea of having the string constant sit inside a namespace, which is why I went for the inline option.

Orthogonally, the constant is MPI-specific but also exec-graph-specific, i.e. it is only used to record these exec graph details, and is not needed in MPI headers. That's why I placed it here; I can see us keeping some of these "prefix" strings together here, but I'm happy to move it elsewhere.

src/scheduler/MpiWorld.cpp (resolved)
const auto& map = msg.execgraphdetails();
for (const auto& it : map) {
out = fmt::format("{},{}:{}", out, it.first, it.second);
}
Collaborator

Won't this result in a leading ,?

Could instead do this with a sstream, appending a comma and skipping the comma on the last element (a bit like here: https://github.com/faasm/faabric/blob/master/src/util/bytes.cpp#L77)

Collaborator Author

👍

auto& map = *msg.mutable_execgraphdetails();
map["foo"] = "bar";
auto& intMap = *msg.mutable_intexecgraphdetails();
intMap["foo"] = 0;
Collaborator

Is this test checking the serialisation of the maps? I can't see a string that looks like "foo:bar,qux:blah".

Collaborator Author

We don't have to actually hardcode the strings. The way we check for correctness is:

  • Set message fields ( =: msgA)
  • Serialise message
  • De-serialise message ( =: msgB)
  • Check msgA == msgB

I was missing the de-serialise and equality check bits.

@@ -266,6 +307,54 @@ std::string getStringFromJson(Document& doc,
return std::string(valuePtr, valuePtr + it->value.GetStringLength());
}

std::map<std::string, std::string> getStringStringMapFromJson(
Collaborator Author

I actually completely overlooked the string-to-JSON conversion, and updating the checkMessageEquality function.

@Shillaker Shillaker changed the title In-faabric tracing of arbitrary calls Adding arbitrary information to execution graph Nov 1, 2021
@csegarragonz csegarragonz merged commit 832aafe into master Nov 2, 2021
@csegarragonz csegarragonz deleted the log-calls branch November 2, 2021 09:44