Reporting review #90

Closed
bmr-cymru opened this issue Dec 13, 2012 · 29 comments

@bmr-cymru
Member

Reporting seems to be in a funny state at the moment. We have the old HTML and XML reporting code (the XML stuff seems to be dead right now, or at least, does not run when --report is given). The legacy HTML stuff works, just about, but is ugly and a maintenance headache.

The new Report class is pretty cool and gives /much/ cleaner looking code but only implements PlainTextReport as a concrete class.

I'm also wondering if we shouldn't just turn reporting on by default (and invert --report -> --no-report) since it seems to take up very little runtime.
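As a rough illustration of the inversion, here is a minimal argparse sketch. The parser, the `build_parser` and `report_enabled` helpers, and the help text are assumptions made for the example; this is not the actual sos option handling.

```python
import argparse


def build_parser():
    # Hypothetical parser for illustration only; not the real sos option table.
    parser = argparse.ArgumentParser(prog="sosreport")
    parser.add_argument(
        "--no-report",
        dest="report",
        action="store_false",  # passing the flag switches reporting off
        default=True,          # reporting is on unless the flag is given
        help="disable report generation",
    )
    return parser


def report_enabled(argv):
    """Return True when reporting should run for the given argument list."""
    return build_parser().parse_args(argv).report
```

With this shape, `report_enabled([])` is true and `report_enabled(["--no-report"])` is false, so existing callers get reports without passing anything.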

@jhjaggars
Contributor

I never removed the xml or html reports because they were so far down on the list, but I agree with you. I think there might be some value in implementing an HTML concrete class that uses the reporting stuff and dumping the old things.
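To make the idea of a second concrete class more tangible, here is a small sketch. The `ReportRenderer`, `PlainTextRenderer` and `HTMLRenderer` names and the `sections` mapping are invented for the example and do not reflect the real Report class interface.

```python
from abc import ABC, abstractmethod
from html import escape


class ReportRenderer(ABC):
    """Hypothetical base class: turns in-memory report data into text."""

    @abstractmethod
    def render(self, sections):
        """sections maps a plugin name to a list of collected file paths."""


class PlainTextRenderer(ReportRenderer):
    def render(self, sections):
        lines = []
        for name, paths in sorted(sections.items()):
            lines.append(name)
            lines.extend("  * " + path for path in paths)
        return "\n".join(lines) + "\n"


class HTMLRenderer(ReportRenderer):
    def render(self, sections):
        parts = ["<html><body>"]
        for name, paths in sorted(sections.items()):
            parts.append("<h2>%s</h2>" % escape(name))
            parts.append("<ul>")
            # Escape both the link target and the link text.
            parts.extend(
                '<li><a href="%s">%s</a></li>' % (escape(path), escape(path))
                for path in paths
            )
            parts.append("</ul>")
        parts.append("</body></html>")
        return "\n".join(parts) + "\n"
```

Both renderers consume the same in-memory structure, so adding a format means adding one class rather than another pass over the plugins.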

RE: inverting --report, I think it's a good idea. I think there is an issue around here somewhere, which I never got around to, about making --report on by default.

@adam-stokes

What about dumping a JSON file with certain metadata that external tools could use for their own reporting? I'm not saying rip out the existing reporting (well, maybe XML, because it's just ugh), but something in addition to what's there.
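A minimal sketch of what such a metadata file could look like, assuming a made-up schema: the `version` field, the `commands` and `files` keys, and both function names are hypothetical and not an existing sos format.

```python
import json


def build_manifest(plugins):
    """Build a JSON-serialisable summary.

    plugins maps a plugin name to a dict with optional 'commands' and
    'files' lists (a hypothetical structure chosen for this example).
    """
    return {
        "version": 1,  # hypothetical schema version
        "plugins": [
            {
                "name": name,
                "commands": sorted(data.get("commands", [])),
                "files": sorted(data.get("files", [])),
            }
            for name, data in sorted(plugins.items())
        ],
    }


def dump_manifest(plugins):
    """Serialise the manifest straight from memory, with stable key order."""
    return json.dumps(build_manifest(plugins), indent=2, sort_keys=True)
```

External tools could then load the file with any JSON parser and build their own views without parsing the HTML or text reports.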

@bmr-cymru
Member Author

This is the idea with SOMA (Sos object model archive) - making the archive more discoverable and presenting the data in an abstracted fashion. Discussions about this have been going on since $forever with little actual movement.

@ghost ghost assigned adam-stokes Aug 1, 2013
@adam-stokes

This is a pretty aggressive time slot for resolving this bug, but I'll try to get it done by 3.1.

@adam-stokes

@bmr-cymru could we set up an IRC meeting to discuss how we want to tackle SOMA and also the dbus interface?

Thanks!

@ghost ghost assigned bmr-cymru Oct 31, 2013
@adam-stokes

For the HTML output generation, should we use a template library like Cheetah or Jinja? Or are we thinking we should manually create the HTML and its elements within an HTML report type class?

@adam-stokes adam-stokes modified the milestones: 3.2, 3.3 Aug 19, 2014
@bmr-cymru bmr-cymru modified the milestones: 3.3, 3.2 Sep 17, 2014
@bmr-cymru
Member Author

Moving this to 3.3 as nothing is broken by it and we don't have time to get anything new in for 3.2.

@prayther

augtool dump-xml /files > /tmp/augtool_dump_xml_all_files.xml

Could augtool maybe help, using the lenses that have already been created?

just a thought.

@bmr-cymru
Member Author

Not really (we've looked at Augeas several times; if we are to use it it'll be via the Python API):
dumping yet-another-cryptic file in an awkward encoding (XML) into the reports does not help anyone.

If we address this it needs to be in a manner that's readily consumable and doesn't just layer on more inconvenience.

Anyone who wants augeas-formatted XML for an sosreport can easily get it right now by just pointing the tool at a report archive.

@Amitgb14
Contributor

@bmr-cymru @battlemidget is anybody working on this issue?

@adam-stokes

Not yet

@Amitgb14
Contributor

I would like to work on this issue. Here is what I have in mind:

  1. First, the report is generated in JSON format and written to a temporary file inside the /tmp directory.
  2. Create a reporting directory containing the html_report.py, xml_report.py and plaintext_report.py scripts.
  3. Finally, generate the report in sos.html, sos.txt, sos.xml and sos.json formats.

If you have any suggestions, please share.

@bmr-cymru
Member Author

  1. First report is generate in json format and write temporary file inside /tmp directory.

Nack; there is no need for this. All the data to be reported is in-memory. Writing it to disk and then reading it back and writing it again is pointless make-work.

  2. Create reporting directory and put html_report.py, xml_report.py and plaintext_report.py scripts

Nack (unless I misunderstood): why do these need to be external scripts? The current project structure uses Python modules to assemble various subsystems that interact via defined interfaces. The only time we use an exec() style of interface is when interacting with truly external components (e.g. commands run by plugins or during policy loading and evaluation).

  3. and finally generate report in sos.html, sos.txt, sos.xml and sos.json format.

This is an admirable goal to work toward but I do not think it depends on either point (1) or (2).

@Amitgb14
Contributor

On point (1): a large amount of data is not efficient to store in main memory, so it needs to be written to a temporary file and then read back. Reading the data back is required only when writing the reports (.html, .txt and .xml).

On point (2): separate reporting scripts are easier to manage and reduce the size of the sosreport.py script. In future, a developer can easily change the look and structure of a report (for example, the HTML or plain-text style), and this also reduces complexity.

@bmr-cymru
Member Author

  1. : large size of data is not efficient store in main-memory

It is already there - look at the current reporting code. It iterates over the set of loaded plugins and interrogates them for the data to be stored in the report fields. If you are making a case that the repetitive formatting (for XML, HTML, text, etc.) is inefficient, that is a different argument, and one that I don't see being solved by merely writing the JSON data out to disk.

  2. reporting scripts can be easily manage and reduce sosreport.py script size,

So would abstracting this out into sos/report.py (and if necessary xmlreport.py, jsonreport.py etc.). Separate scripts would also drive UP the memory and IO costs that you seem concerned about - each script will start as a new process with a brand new address space. If we are lucky then shared data may reside in the pagecache, but if that is then read in anew by those processes we are unlikely to benefit from sharing unless we use complex IO models like memory-mapping (not at all easy in Python).

@bmr-cymru
Member Author

I think a good first step would be to move all the still-desired reporting functionality out of sosreport.py and into the current report.py - deleting the legacy report code at the same time and re-implementing it using Jesse's classes where it makes sense.

This would help to ensure the interfaces we have are sane and workable and de-clutters the main sosreport.py (another very worthy goal).

I think at this stage making any design decision on the basis of presumed performance improvements is a mistake - Knuth is right - "premature optimisation is the root of all evil". There are known parts of sos that have very suboptimal memory usage right now, but the reporting code is certainly not one that I lose any sleep over (PackageManager is a different matter, for example).

@Amitgb14
Contributor

Ah, my mistake on the first point.

@Amitgb14
Contributor

Please correct me if I have this wrong: we add xmlreport.py and htmlreport.py as modules used from sosreport.py, so there is no need to call an extra process.

@bmr-cymru
Member Author

We add xmlreport.py, htmlreport.py as module don't need to call extra process, inside sosreport.py

Right - I think for now this is the best approach. It keeps to existing project conventions and it would be a big improvement in the code structure and maintainability. If at the end of all that work there are measurable performance concerns then we can look at optimisations like caching or writing data to the file system.

@Amitgb14
Contributor

ok 👍 ..

@Amitgb14
Contributor

Amitgb14 commented Apr 4, 2016

@bmr-cymru, I wrote a small web application to list and browse the reports:
https://github.com/Amitgb14/sosweb

@Amitgb14
Contributor

Amitgb14 commented Apr 8, 2016

Is there any update?

@TurboTurtle
Member

Cycling back around on this: I just dealt with a situation where a sosreport took over 4 hours to run, with the vast majority of that time (3+ hours) spent on generating the reports. I think the reason this happened was the sheer volume of files that the sosreport created, due to it being run on a heavily utilized OCP node - there were just shy of 114k files in the archive.

That is a lot, but is it really expected to take 3 hours at that volume, or is this indicative of a lower level issue? Also, what consumes the HTML and XML reports today? Would it be beneficial to dynamically set reporting to be on or off based on how large the sosreport is by the time we finish running the plugins?
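For the sake of discussion, the dynamic on/off idea could be as simple as a file-count check once the plugins have finished. The threshold value and both function names below are made up for the sketch; nothing like this exists in sos today as far as this thread shows.

```python
import os

# Hypothetical cut-off; a real value would need measuring, not guessing.
DEFAULT_MAX_FILES = 50000


def count_files(root):
    """Count the files found under root by walking the collected tree."""
    return sum(len(files) for _dirpath, _dirs, files in os.walk(root))


def should_generate_reports(root, max_files=DEFAULT_MAX_FILES):
    """Skip report generation when the collected tree is very large."""
    return count_files(root) <= max_files
```

This only works around the symptom; it does not explain why report generation scales so badly with the number of files.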

@TurboTurtle TurboTurtle modified the milestones: 3.3, 3.7 Jul 10, 2018
@bmr-cymru
Member Author

  • there were just shy of 114k files in the archive.

Do we know why there was such a volume? I.e. is this sane, either in terms of the node configuration, or what we are attempting to collect?

@TurboTurtle
Member

It was a fairly heavily used OCP node: 150 running containers, another 130 stopped, and a total of 1100 images on it. The docker plugin collected all of its usual data for those, but more importantly the cgroups plugin grabbed the /sys/fs/cgroup/* entries for the kubernetes pods, which is where the bulk of this came from:

$ find sys/fs/cgroup/ -type f | wc -l
88516

@TurboTurtle
Member

Sorry, that didn't actually answer your question. The volume would be sane for the size of the OpenShift environment it was on, but that is probably at the upper end of such environments. So I imagine there are other end users running into similarly long run times and just "dealing with it" at the moment.

@bmr-cymru
Member Author

My biggest problem with reports is that it kinda feels like they should be post-processable. We should be able to take an archive, and comprehend it to produce that output, entirely independently of the collection host (it's just pretty printing, effectively).

That way we could turn it off by default and let users do something like:

    $ sos report --html --from sosreport-blah-blah.tar.gz

(or whatever)
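As a sketch of what "comprehending" an archive after the fact might involve, the standard tarfile module can walk an existing tarball without unpacking it. The `summarise_archive` function is hypothetical, and it assumes the archive wraps everything in a single top-level directory.

```python
import tarfile
from collections import Counter


def summarise_archive(archive_path):
    """Count regular files per first-level directory inside the archive.

    Assumes every member lives under one root directory, for example
    <root>/sys/fs/..., which is an assumption made for this sketch.
    """
    counts = Counter()
    # "r:*" lets tarfile detect the compression type transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = member.name.split("/")
            # parts[0] is the archive root; files directly under it have
            # no first-level directory of their own.
            key = parts[1] if len(parts) > 2 else "(top level)"
            counts[key] += 1
    return counts
```

A renderer could take a summary like this, or a richer index built the same way, and produce HTML on any machine that has the archive.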

@bmr-cymru bmr-cymru removed this from the 3.7 milestone Mar 26, 2019
@TurboTurtle
Member

Since 2018, we've overhauled the actual reports generation mechanisms. A previous informal survey on the RH side also showed that while HTML reports are not ubiquitously used they are consumed to some degree. Given those two points, I wonder if this can be closed?

Or is the post-processing suggestion above still desirable?

@bmr-cymru @pmoravec

@pmoravec
Contributor

pmoravec commented Jul 23, 2020

+1 to close this. The HTML report generation was re-written in #1728, no issues since then.
