WIP: Refactor reporting #239

Closed
wants to merge 6 commits into from

@saulshanabrook saulshanabrook commented Jul 3, 2017

to address: https://push-language.hampshire.edu/t/cleaning-up-clojush-reporting/864

What this will improve

This changes all logging (previously called reporting) so that data is generated only when it is actually logged. It also deduplicates the computation of various summary stats on each individual, like the mean error, the number of code points, and the percentage of parentheses.

Scope of this pull request

This PR does not add any new features, and all existing logging will behave the same. It also doesn't touch any problem-specific logging, leaving that for a later change.

How does it do this?

The basic idea is that we want the statistics for each generation, and for each individual, to be a lazy map. The values of this map are computed dynamically, only when they are needed. We use Plumatic's Graph library to implement this. Reading that library's introduction first will help you get a sense of how it is used.

Another way to think about it is as a form of dependency injection: each value declares the inputs it needs, and the graph supplies them.
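
As a rough illustration of the lazy-map idea, here is a minimal sketch using `plumbing.graph/lazy-compile` from Plumatic's plumbing library. The graph `individual-stats`, its node names, and the `total-calls` counter are hypothetical and not taken from this PR; the sketch assumes `prismatic/plumbing` is on the classpath.

```clojure
(require '[plumbing.core :refer [fnk]]
         '[plumbing.graph :as graph])

;; Counts how often the (pretend-expensive) total is computed.
(def total-calls (atom 0))

;; Hypothetical graph of per-individual summary stats.
;; Each node lists the keys it depends on in its fnk argument vector.
(def individual-stats
  {:error-count (fnk [errors] (count errors))
   :total-error (fnk [errors]
                  (swap! total-calls inc)
                  (reduce + errors))
   :mean-error  (fnk [total-error error-count]
                  (/ total-error error-count))})

;; lazy-compile yields a function from an input map to a lazy map:
;; a node runs only when its key is first looked up, and the result is cached.
(def compute-stats (graph/lazy-compile individual-stats))

(def stats (compute-stats {:errors [2 4 6]}))

(:error-count stats) ; does not need :total-error, so that node has not run
(:mean-error stats)  ; forces :total-error once
(:total-error stats) ; reuses the cached value
```

Because `:mean-error` only names `total-error` and `error-count` as arguments, the graph decides what to compute and in which order, which is the dependency-injection reading mentioned above.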

What I still need to do

I am pretty sure this approach will achieve the goals, but there are likely some bugs (because of all the code changes). So I still need to:

  • Add comprehensive testing
    • test multiple problems
    • test text output
    • test CSV output
    • test JSON output
    • test EDN output
    • make sure lexicase report works
    • make sure cosmos report works
    • add test for succeeding run with everything enabled
    • get more tests from people in the group
  • rebase
  • try to benchmark this to see what speed costs are
  • get passing on CI
  • move logging and profiling to another file
  • figure out repl bug from tom
  • improve profiling by increasing stuff computed in graphs
  • profile autoconstruction to see speed difference
  • merge individual into individual input
  • make everything with side effects have ! and make explicit

Future improvements

If this PR is accepted, we can then add on a few other things for free.

Allow arbitrary logging of any parts of the generation.

For example, all the text reporting could be replaced by passing in a list of paths to log and printing the values at those paths. We could say:

lein run <whatever-problem> :text-log "{:config [[:config :git-hash] [:config :problem-file]] :generation [[:generation :best :mean-error] [:generation :index]]}"

and this would log:

:config
[:config :git-hash] sdfsdf-asdfas-asdf
[:config :problem-file] whatever.whatever
:generation
[:generation :best :mean-error] 1000
[:generation :index] 0
:generation
[:generation :best :mean-error] 900
[:generation :index] 1

If this PR is accepted, changing to something like this would remove a lot of code and also simplify understanding the logs.
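
A minimal sketch of how such path-based logging might work, using only `clojure.core` and `clojure.edn`. The function `log-lines` and the data in `sample-data` are hypothetical placeholders; in the proposal the data would be the lazy maps described earlier rather than a plain map.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical event data with placeholder values.
(def sample-data
  {:config     {:git-hash "abc123" :problem-file "example.clj"}
   :generation {:index 0 :best {:mean-error 1000}}})

(defn log-lines
  "Returns the lines to print for one event: the event keyword, then one
  line per requested path with the value found at that path."
  [event paths data]
  (cons (pr-str event)
        (for [path paths]
          (str (pr-str path) " " (pr-str (get-in data path))))))

;; The string below mirrors the proposed (not yet existing) :text-log argument.
(def text-log
  (edn/read-string
   "{:config [[:config :git-hash] [:config :problem-file]]
     :generation [[:generation :best :mean-error] [:generation :index]]}"))

;; Print the lines for one :generation event.
(doseq [line (log-lines :generation (:generation text-log) sample-data)]
  (println line))
```

With lazy maps as the data source, `get-in` would trigger computation of only the requested values, so unlisted statistics would never be computed.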

~~Automatic profiling of all reporting~~ Done

We could easily enable automatic profiling of everything we compute for the logs, using the graph/profiled function.
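
As a sketch of what that might look like, the example below wraps a small graph with `plumbing.graph/profiled`. The graph `stats-graph` and the key `::profile` are hypothetical names chosen for illustration; the sketch assumes `prismatic/plumbing` is on the classpath.

```clojure
(require '[plumbing.core :refer [fnk]]
         '[plumbing.graph :as graph])

;; Hypothetical two-node graph standing in for the logging graphs.
(def stats-graph
  {:total-error (fnk [errors] (reduce + errors))
   :mean-error  (fnk [total-error errors] (/ total-error (count errors)))})

;; graph/profiled wraps every node with timing and adds one extra key
;; (here ::profile) holding an atom that maps node keys to elapsed time.
(def profiled-stats
  (graph/compile (graph/profiled ::profile stats-graph)))

(def result (profiled-stats {:errors [2 4 6]}))

(:mean-error result)
@(::profile result) ; a map with one timing entry per node that ran
```

The timings vary from run to run, so no sample output is shown here.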

Expand the use of graphs during the run as well

Having individuals be lazy maps with dynamically generated attributes could be a nice way to represent the computation of things like errors or any other attribute. Also, if you check out the individual logging file, you can see how it pulls in values from the argmap to compute things. This concept could be extended to, for example, letting the Push executor get what it needs from the argmap.
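
A rough sketch of an individual whose errors are derived on demand from values in the argmap. All names here (`individual-graph`, `make-individual`, `toy-argmap`, and the toy error function) are hypothetical and do not reflect Clojush's actual individual record or argmap keys.

```clojure
(require '[plumbing.core :refer [fnk]]
         '[plumbing.graph :as graph])

;; Hypothetical individual graph: :errors is derived from the program
;; using an error function supplied through the argmap input.
(def individual-graph
  {:errors      (fnk [argmap program] ((:error-function argmap) program))
   :total-error (fnk [errors] (reduce + errors))})

(def make-individual (graph/lazy-compile individual-graph))

;; Toy argmap; this error function only looks at program length.
(def toy-argmap {:error-function (fn [program] [(count program) 1])})

(def individual
  (make-individual {:argmap toy-argmap :program '(a b c)}))

(:total-error individual) ; evaluates :errors on demand, then sums them
```

An individual that is never asked for `:total-error` would never have its error function called, which is the appeal of representing attributes this way.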

~~Auto-documenting the types of data available~~ Done

We should be able to use the Graph tools to print out all the data available for the different events. Maybe this means making Clojush a real CLI app with multiple entrypoints.

@coveralls
Coverage Status

Coverage increased (+3.5%) to 26.86% when pulling 638bf0a on refactor-reporting into 47b44e3 on master.

@saulshanabrook saulshanabrook force-pushed the refactor-reporting branch 2 times, most recently from ecab9e9 to 636d1f5 Compare July 4, 2017 02:52
@coveralls
Coverage Status

Coverage increased (+3.6%) to 26.935% when pulling 636d1f5 on refactor-reporting into 47b44e3 on master.

@coveralls
Coverage Status

Coverage increased (+3.6%) to 26.935% when pulling c642016 on refactor-reporting into 47b44e3 on master.

@saulshanabrook saulshanabrook force-pushed the refactor-reporting branch 7 times, most recently from 8183101 to 5f046d3 Compare July 7, 2017 21:34
@thelmuth thelmuth left a comment

@saulshanabrook

Closing for now. Not sure if the code disruption is worth the gains.
