New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very long lines in an application log are loaded surprisingly slow when browsing from Console #195

Closed
szpak opened this Issue Jan 15, 2013 · 10 comments

Comments

Projects
None yet
3 participants
@szpak

szpak commented Jan 15, 2013

Viewing log files with very long lines from agent takes surprisingly long time for a file to load in a console web interface. For a line with ~20K chars it is about one minute (per line). Having more those line in file (e.g. SQL queries dumped by Hibernate) makes this feature unfit to use. Tested with Glu 4.5.1.
When reading a file one core is 100% busy with a Glu agent process. I haven't looked into code, but maybe it would be possible to change a way how Glu allocates memory in that place.

@ypujante

This comment has been minimized.

Member

ypujante commented Jan 15, 2013

I will look into the problem. In the end the code that gets executed is the following: https://github.com/linkedin/glu/blob/master/agent/org.linkedin.glu.agent-impl/src/main/groovy/org/linkedin/glu/agent/impl/capabilities/ShellImpl.groovy#L805 which essentially forks to execute the "native" tail command.

Would you mind trying a tail command (outside of glu) similar to what glu executes on one of your log file with very long lines? (tail -500 xxx)

Are you seeing the issue like this? Besides forking to execute this command, glu does not do much this is why I am trying to figure out where the problem is.

Thanks

@szpak

This comment has been minimized.

szpak commented Jan 15, 2013

On 2013-01-15 18:18, Yan Pujante wrote:

I will look into the problem. In the end the code that gets executed is the following: https://github.com/linkedin/glu/blob/master/agent/org.linkedin.glu.agent-impl/src/main/groovy/org/linkedin/glu/agent/impl/capabilities/ShellImpl.groovy#L805 which essentially forks to execute the "native" tail command.

Would you mind trying a tail command (outside of glu) similar to what glu executes on one of your log file with very long lines? (tail -500 xxx)

Are you seeing the issue like this? Besides forking to execute this command, glu does not do much this is why I am trying to figure out where the problem is.

I had this issue a few weeks age and can be wrong, but AFAIR I was
browsing the file with less in tail mode ("F") and worked fine.

I will probably be able to test it with pure tail on Monday.

Maybe the problem is with passing content from Agent to Console and
displaying it in a browser?

@szpak

This comment has been minimized.

szpak commented Jan 17, 2013

I checked a situation with "tail -500 log" from a shell and it shown me the last lines very quickly. Glu (4.6.1) for the same file slows down on long lines and shows them one by one with a large delay.

@ypujante

This comment has been minimized.

Member

ypujante commented Jan 17, 2013

Ok thanks for checking I will investigate.

@ypujante

This comment has been minimized.

Member

ypujante commented Jan 20, 2013

I found the culprit. What threw me off is that you said that the agent process was 100% CPU. This is not the case and the issue is in the console: the StreamFileContentPlugin (https://github.com/linkedin/glu/blob/v4.6.1/orchestration/org.linkedin.glu.orchestration-engine/src/main/groovy/org/linkedin/glu/orchestration/engine/plugins/builtin/StreamFileContentPlugin.groovy#L75) creates a DataMaskingInputStream which is the culprit as it issues regexp matching on every single line of the file (so with line of siez 20K, this is really where the slowness happen).

As a workaround, you can disable the StreamFileContentPlugin entirely by removing it from the array in the config file:

orchestration.engine.plugins = [
  'org.linkedin.glu.orchestration.engine.plugins.builtin.StreamFileContentPlugin'
]

=> 

orchestration.engine.plugins = [
]

Note that if you do that then the other part of the plugin which is checking for unrestricedLocation won't be executed anymore.

I will release a fix in the next couple of days (4.6.2) which will add a new config parameter to mask or not the output stream.

@szpak

This comment has been minimized.

szpak commented Jan 20, 2013

Great to hear you were able to find a reason. Thanks!

@sodul

This comment has been minimized.

Contributor

sodul commented Jan 21, 2013

Question on this. What is being searched and how do we add entries to be masked? The source code mention passwords, but I'm not sure which password it is.

@ypujante

This comment has been minimized.

Member

ypujante commented Jan 21, 2013

The class being used to do the masking is this one: https://github.com/linkedin/linkedin-utils/blob/master/org.linkedin.util-groovy/src/main/groovy/org/linkedin/groovy/util/io/DataMaskingInputStream.groovy

I did not write it but the comment explains what it does. The class is very specific in what it is doing (as in looking for a very specific pattern). Although it looks like you can change the pattern, the code around line 100 is very specific in the form of the pattern.

If you are really interested in masking your own passwords, the easiest is to write your own InputStream wrapper and use a plugin to inject it.

@ypujante

This comment has been minimized.

Member

ypujante commented Jan 21, 2013

added option to disable masking file content in 4.6.2. It does not "fix" the issue as the issue comes from the DataMaskingInputStream which applies regular expressions which slows down the process by a factor of 1000x easily. But if you have this kind of slow down issues this is the way to go.

@ypujante ypujante closed this Jan 21, 2013

@sodul

This comment has been minimized.

Contributor

sodul commented Jan 21, 2013

I might be interested in masking password but on the agent side. Sometimes we have config files that are stored under $mountPoint as part of the configuration step and being able to mask them would be good. Since these passwords differ from machine (mostly by fabric), it would be best to be able to mask them at the agent level.

That might be a feature request :-)

Stephane

On Jan 21, 2013, at 12:36 PM, Yan Pujante wrote:

The class being used to do the masking is this one: https://github.com/linkedin/linkedin-utils/blob/master/org.linkedin.util-groovy/src/main/groovy/org/linkedin/groovy/util/io/DataMaskingInputStream.groovy

I did not write it but the comment explains what it does. The class is very specific in what it is doing (as in looking for a very specific pattern). Although it looks like you can change the pattern, the code around line 100 is very specific in the form of the pattern.

If you are really interested in masking your own passwords, the easiest is to write your own InputStream wrapper and use a plugin to inject it.


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment