Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retain logging from Hive #33

Closed
wants to merge 1 commit into from
Closed

Retain logging from Hive #33

wants to merge 1 commit into from

Conversation

findepi
Copy link
Contributor

@findepi findepi commented Sep 25, 2018

When logging it silenced from Hive as a whole, it makes it cumbersome to troubleshoot problems with Avro.
This is because AvroSerDe logs errors instead of rethrowing them.
https://github.com/apache/hive/blob/e161b01136ea14d1f9358d9b71aea61951bfaab0/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java#L197-L202

While, same AvroSerDe logs a log on INFO level, we should address the problem with pre-packaged log.properties file.

@findepi
Copy link
Contributor Author

findepi commented Sep 25, 2018

Even if check & throw when org.apache.hadoop.hive.serde2.AbstractSerDe#getConfigurationErrors is present, we will get only the message, without stacktrace and causal chain.

Copy link

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% positive answer to my question

@@ -56,7 +55,7 @@

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-nop</artifactId>
<artifactId>slf4j-jdk14</artifactId>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, thanks to usage of slf4j-jdk we will be able to control that with Presto log.properties. Am I right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum
Copy link
Contributor

electrum commented Oct 2, 2018

My concern with this change is that the logs will be flooded with garbage. We want logging to be useful out of the box at the default INFO level. Users should not need to silence a bunch of things by default.

Can we solve this by subclassing AvroSerDe? Or doing a patch here?

@findepi
Copy link
Contributor Author

findepi commented Oct 2, 2018

Can we solve this by subclassing AvroSerDe? Or doing a patch here?

AvroSerDe has two problems:

  • it logs way to much at INFO level
  • it reports errors at WARN level instead of throwing exceptions

I thought about

  1. unlocking logging from Hive
    • i think we should not silence WARN/ERROR from Hive unless we see this as irrelevant as well
    • currently, when there a problem related to Hadoop lib (like Avro), there seems no way to get any useful information except of attaching a debugger
  2. providing some default log.properties in Presto installation package, so that users won't be flooded with garbage.
    • packaging log.properties is IMO good idea anyway, since it will make it easier to configure the system from scratch -- just find the file and edit, without having to think where is should be placed

Alternatively, we can override org.apache.hadoop.hive.serde2.avro.AvroSerDe#determineSchemaOrReturnErrorSchema with same code but sans the catch block (+ load our subclass whenever Avro serde is refernced). This solves the problem of Avro specifically. Is this the direction you would prefer?

@findepi
Copy link
Contributor Author

findepi commented Oct 2, 2018

Yes another option would be to inspect org.apache.hadoop.hive.serde2.AbstractSerDe#configErrors after initializing a SerDe (eg in com.facebook.presto.hive.HiveUtil#initializeDeserializer). This won't contain the stacktrace though.

@findepi
Copy link
Contributor Author

findepi commented Oct 9, 2018

For Avro, the following is helpful -- prestodb/presto#11673

Still, i think we should unlock logging from Hive. With the help of prestodb/presto#11671 we can provide silence all overly verbose Hive logging out of the box.

findepi added a commit to prestodb/presto that referenced this pull request Oct 10, 2018
For convenience of administrators, provide `log.properties` as part of
an RPM install. That way, people do not have to think where the file
should be put. Later, we can provide useful logging configuration:
- suppress overly verbose logging from Hive, see
  prestodb/presto-hive-apache#33
- logging configuration for troubleshooting problems ("uncomment ...
  when diagnosing ..."), see e.g.
  https://github.com/prestodb/presto/wiki/Security-Troubleshooting-Guide#before-you-begin
@findepi
Copy link
Contributor Author

findepi commented Oct 29, 2018

I propose airlift/airlift#683 as a way forward:

@findepi
Copy link
Contributor Author

findepi commented Nov 2, 2018

Superseded by #34

@findepi findepi closed this Nov 2, 2018
@findepi findepi deleted the findepi/master/retain-logging-from-hive-342ff7 branch November 2, 2018 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants