Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bucket logging endpoint changes #7426

Merged
merged 1 commit into from
Oct 10, 2023
Merged

bucket logging endpoint changes #7426

merged 1 commit into from
Oct 10, 2023

Conversation

aspandey
Copy link
Contributor

@aspandey aspandey commented Aug 2, 2023

Explain the changes

  1. This is a draft of bucket logging feature.
    2.Basic idea is to log bucket operation so that replication engine can use it to replicate
    that operation like put object.
    As soon as operation request return a reply, we get the status and other information for that operation and
    create a message with S3 log format and send it to noobaa log bucket configured by the user.
    Mainly this task is done in handle_request function of s3_rest.js

@aspandey aspandey marked this pull request as draft August 2, 2023 16:35
@aspandey
Copy link
Contributor Author

aspandey commented Aug 2, 2023

@dannyzaken

@aspandey
Copy link
Contributor Author

@Neon-White @tangledbytes This patch contains creation of log records at the endpoint handler request.
Could you please review this patch? Along with other comments , I need your specific comments on the log format and the information (in log records) which is required by replication engine for the replicating objects.

src/api/bucket_api.js Show resolved Hide resolved
@@ -533,6 +551,7 @@ function consume_usage_report() {
}



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this newline can be removed

src/endpoint/s3/s3_bucket_logging.js Outdated Show resolved Hide resolved
@@ -120,6 +120,9 @@ RUN chmod 775 /noobaa_init_files && \
chgrp -R 0 /noobaa_init_files/ && \
chmod -R g=u /noobaa_init_files/

RUN chmod 777 /etc/rsyslog.conf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove rsyslog related changes for now

@guymguym
Copy link
Member

guymguym commented Sep 7, 2023

@aspandey @dannyzaken Can we include in this PR an option for src/cmd/nsfs to enable bucket logging?

@guymguym
Copy link
Member

guymguym commented Sep 7, 2023

@aspandey @dannyzaken Can we include in this PR an option for src/cmd/nsfs to enable bucket logging?

To clarify - I can set up the syslog interface manually - but I do need a flag to enable sending out the logs for nsfs command line.

return format_logging_response(reply.logging);
}

function format_logging_response(log_set) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function doesn't seem correct to me.

  1. we shouldn't throw an error if there is no bucket logging enabled.
  2. it does not format the response. see the required response here.

Also, we should probably also implement s3_put_bucket_logging. I think both can be done in a different PR together with tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I will make changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I have removed this change. Will send different PR as suggested by you.

@@ -120,6 +120,7 @@ RUN chmod 775 /noobaa_init_files && \
chgrp -R 0 /noobaa_init_files/ && \
chmod -R g=u /noobaa_init_files/


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

let bucket_info = null;
if (req.params && req.params.bucket) {
try {
bucket_info = await req.object_sdk.internal_rpc_client.bucket.read_bucket_sdk_info({ name: req.params.bucket });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will perform an RPC call to core on every S3 request. object_sdk should have this data in its cache and we should not perform an RPC call to get it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it...will make appropriate changes .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

try {
bucket_info = await req.object_sdk.internal_rpc_client.bucket.read_bucket_sdk_info({ name: req.params.bucket });
if (bucket_info) {
dbg.info("bucket_info =", bucket_info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no dbg.info. use dbg.log0 to dbg.log5 for different log levels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

dbg.info("bucket_info =", bucket_info);
}
} catch (err) {
dbg.log1("Could not get bucket info from cache", err);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, for most errors (unless expected ones) you should use dbg.error and rethrow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

let bucket_info = null;
if (req.params && req.params.bucket) {
bucket_info = await req.object_sdk.read_bucket_sdk_config_info(req.params.bucket);
dbg.log0("read_bucket_sdk_config_info = ", bucket_info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be dbg.log0 as it happens on every request and can overload the log. you can change it to log2 or 3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
// Check if logging is enabled or not and send the S3 logs accordingly.
if (s3_logging.is_bucket_logging_enabled(bucket_info, req.params.bucket)) {
dbg.log0("Bucket logging is enabled on Bucket : ", req.params.bucket);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

collect_bucket_usage(op, req, res);

let bucket_info = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to init it to null. it is initialized as undefined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dbg.log0("read_bucket_sdk_config_info = ", bucket_info);
}
// Check if logging is enabled or not and send the S3 logs accordingly.
if (s3_logging.is_bucket_logging_enabled(bucket_info, req.params.bucket)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this code should be under the if in line 153.

Also, passing req.params.bucket is redundant, since bucket_info should already contain the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


function endpoint_bucket_op_logs(op_name, req, res, source_bucket) {

dbg.info("Sending op logs for op name = ", op_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no dbg.info. you can change to dbg.log2 or dbg.log3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
values = values.slice(0, -2);
//Just to log it to endpoint logs. Need to remove it later.
dbg.log0("!!!Object operation log!!! :", values);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though we plan to replace it with a syslog call eventually, let's change it to dbg.log1 to avoid excessive logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/endpoint/s3/s3_bucket_logging.js Outdated Show resolved Hide resolved
request_id: req.request_id
};

const s3_log = new Aws_log_entry();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment in response to Ben. I think we should format it in the (syslog) server side before writing the log object. I think that for now you can just return log as the log record and send it as JSON

Comment on lines 72 to 76
let values = "";
for (const key of Object.keys(s3_log)) {
const value = s3_log[key];
values += `${value} `;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should send it as a json (use JSON.stringify()). it is more generic and can be easily parsed later

Comment on lines 14 to 15
const enable = check_logging_config_perm(source_bucket.bucket_info.logging);
return enable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is unnecessary now. we can add it later if we see a reason to.

Suggested change
const enable = check_logging_config_perm(source_bucket.bucket_info.logging);
return enable;
return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dannyzaken
Copy link
Contributor

dannyzaken commented Sep 14, 2023

@aspandey also pay attention to the DCO and deepscan tests

collect_bucket_usage(op, req, res);

let bucket_info;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bucket_info is not used outside the following if scope.
it's better to remove this line and add const inside the scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/server/system_services/bucket_server.js Show resolved Hide resolved
}]
}
});
}

async function get_bucket_logging(req) {
dbg.log0('get_bucket_logging:', req.rpc_params);

dbg.log0('delete_bucket_logging:', req.rpc_params);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/server/system_services/bucket_server.js Show resolved Hide resolved
function is_bucket_logging_enabled(source_bucket) {

if (!source_bucket || !source_bucket.bucket_info.logging) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dbg.log2("read_bucket_sdk_config_info = ", bucket_info);

if (s3_logging.is_bucket_logging_enabled(bucket_info)) {
dbg.log2("Bucket logging is enabled on Bucket : ", req.params.bucket);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for bucket:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// 2 - Format it and send it to log bucket/syslog.

const s3_log = get_bucket_log_record(op_name, source_bucket, req, res);
dbg.log1("!!!BUCKET Operation Logs!!! ", s3_log);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit dramatic...

Suggested change
dbg.log1("!!!BUCKET Operation Logs!!! ", s3_log);
dbg.log1("bucket op log = ", s3_log);

Also, consider prefixing all logs realted to bucket op logs with a common prefix.
It would make it easier to grep for them.
Eg-
dbg.log("bucket op log: sending op logs for op name = ")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you on prefix part. For now, I am keeping it as it is as we also need to decide on log format.


const client_ip = http_utils.parse_client_ip(req);

const log = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some fields are missing?
-time
-requester (if we have such a concept)
-error code
-bytes_send (if available?)
-object size
etc. see https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. We may not log the exact S3 format. So will decide later.

@aspandey aspandey force-pushed the logging branch 3 times, most recently from a8d6644 to f6f1a6e Compare October 9, 2023 11:02
@alphaprinz alphaprinz dismissed dannyzaken’s stale review October 9, 2023 17:46

This version is meeting the current "definition of done". Further changes as warranted.

Signed-off-by: Ashish Pandey <aspandey@redhat.com>
@aspandey aspandey merged commit a3e960b into noobaa:master Oct 10, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants