WIP: Refactor reporting #239

Closed
wants to merge 6 commits into from
191 changes: 186 additions & 5 deletions CONTRIBUTING.md
@@ -3,13 +3,189 @@
[Here](https://gist.github.com/thelmuth/1361411) is a document describing how
to contribute to this project.

## Computation Graphs

We use
[Plumatic's Graph framework](https://github.com/plumatic/plumbing#graph-the-functional-swiss-army-knife)
for dependency injection and lazy evaluation. All graphs are compiled with `lazy-compile`,
so their outputs are lazy maps. This means that a value is only calculated
when we ask for it.
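
As a minimal sketch of what lazy compilation means in practice, the snippet below builds a small graph with Plumbing and compiles it with `plumbing.graph/lazy-compile`. The names `stats-graph`, `stats`, and `calls` are hypothetical and not part of Clojush; the `calls` atom exists only to show which nodes have actually run.

```clojure
(ns example.lazy-graph
  (:require [plumbing.core :refer [fnk]]
            [plumbing.graph :as graph]))

;; Hypothetical bookkeeping: records which nodes have been evaluated.
(def calls (atom []))

;; Each node is a keyword function; its argument names are its dependencies.
(def stats-graph
  {:n    (fnk [xs]    (swap! calls conj :n)    (count xs))
   :sum  (fnk [xs]    (swap! calls conj :sum)  (reduce + xs))
   :mean (fnk [sum n] (swap! calls conj :mean) (/ sum n))})

;; lazy-compile returns a function from an input map to a lazy map.
(def stats (graph/lazy-compile stats-graph))

;; Nothing is computed when the lazy map is created.
(def out (stats {:xs [1 2 3 6]}))

;; Asking for :n runs only the :n node.
(:n out)
```

Asking for `(:mean out)` afterwards would run `:sum` and `:mean`, and would reuse the already computed `:n`.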

The graphs we use are in `clojush.graphs`. The top-level graphs all have a `:log`
subgraph, which has a key for each logger on that graph.


### Population
There are a number of things we might want to know about each
individual in the population for logging purposes. These might
be as simple as the mean of its error vector or as complicated
as the program string of its partially simplified version. Some
of these things we only want to know about the best individual and others
about all individuals, and we never want to compute them unless we need to.

So the `generation` graph computes an augmented `population`, where
each item has all the original keys of the individual plus some extra
lazily computed ones. It does this by, again, computing a lazy map based on
a graph. That graph is defined at `clojush.graphs.individual/graph`
and takes as input the original individual, under the key `:individual`, as
well as the argmap and some computed stats on the population. The lazy map output
is merged with the original individual record.

Since we need these extra values on the "best" individual for logging,
we compute the best from this augmented population.

After the generation has finished, the `pushgp` function needs to know whether
we have succeeded, and it needs the "best" individual so that it can return it.
So it gets the computed data from the generation and accesses
the `outcome` and the `best`.

### Modifying graph computation

#### Adding computed data
OK let's say you want to log some more data during the run. First, decide
which graph it should be in:

* Depends on the command line arguments? -> `init`
* Computes something about the machine environment (like the git hash) or depends
on the push argmap? -> `config`
* Computed every generation and is population-wide? -> `generation`
* Computed for each individual in every generation? -> `individual`

Then add it to that graph. The most straightforward way to do that
is to define a keyword function (`defnk`) in that graph's file and put
that keyword function in the `compute-graph` in the same file. Use the logic
above and the `input` and `output` commands in `clojush.cli.graphs` to understand
what you can ask for as input in the `defnk`.
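
As a rough illustration of how a keyword function declares what it asks for, the sketch below defines a hypothetical `mean-error` (not part of Clojush) and inspects its required keys with `plumbing.fnk.pfnk/input-schema-keys`. The argument names in the `defnk` binding vector are exactly the keys the graph has to provide.

```clojure
(ns example.inspect-inputs
  (:require [plumbing.core :refer [defnk]]
            [plumbing.fnk.pfnk :as pfnk]))

;; Hypothetical keyword function: the mean of an error vector.
(defnk mean-error
  "Mean of the values under :errors."
  [errors]
  (/ (reduce + errors) (count errors)))

;; The keys this keyword function needs from its graph.
(def required-keys
  (set (pfnk/input-schema-keys mean-error)))

;; A keyword function is called with a map that contains those keys.
(def example-result
  (mean-error {:errors [1 2 3]}))
```

Here `required-keys` is `#{:errors}`, so this function could be added to any graph that already computes or receives `:errors`.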


#### Adding a handler
If you want to create a new handler to support logging in some new format
or new source, you should:

1. Add a toggle for the handler in `clojush.args/argmap`.
2. Create a `clojush.graphs.handlers.<label>` file. In it, define a `handler`
var that maps from event labels to handle functions. Make those handle
functions execute based on the toggle you defined in the argmap.
3. Add that handler to `clojush.graphs.handlers/handlers`.
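
To make the shape of a `handler` concrete, here is a small sketch of how a collection of handler maps could be dispatched for one event label. Everything in it (`summary-handler`, `best-handler`, `dispatch`, and the data keys) is hypothetical; it only illustrates the "map from event label to handle function" idea and is not how Clojush itself necessarily calls its handlers.

```clojure
;; Hypothetical handlers: each maps an event label to a function of the event data.
(def summary-handler
  {:config     (fn [data] (str "run label: " (:label data)))
   :generation (fn [data] (str "generation " (:index data)))})

;; A handler does not need to handle every event.
(def best-handler
  {:generation (fn [data] (str "best error " (:best-error data)))})

(def handlers [summary-handler best-handler])

(defn dispatch
  "Calls every handle function registered for `label` and returns the results."
  [handlers label data]
  (mapv (fn [handle] (handle data))
        ;; (label handler) is nil when a handler ignores this event.
        (keep label handlers)))

(dispatch handlers :generation {:index 3 :best-error 0})
;; => ["generation 3" "best error 0"]

(dispatch handlers :config {:label "demo"})
;; => ["run label: demo"]
```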


#### Example
For example, let's create a handler that logs, to a file, the number of empty
genomes in each population every 20 generations.

First we add a couple of options to `clojush.args/argmap`:

```clojure
:print-empty-genome-logs false
:empty-genome-logs-every-n-generations 20
:empty-genome-logs-filename "empty-genome-logs.txt"
```

Then, let's make a file for this handler at `clojush.graphs.handlers.empty-genome`:

```clojure
(ns clojush.graphs.handlers.empty-genome
  (:require [plumbing.core :refer [defnk]]
            [clojure.java.io :as io]))

(defnk handle-config
  "Save the header of the file before the run starts"
  [[:config [:argmap print-empty-genome-logs empty-genome-logs-filename]]]
  (when print-empty-genome-logs
    (spit empty-genome-logs-filename "Generation NumEmptyGenomes\n")))

(defnk handle-generation
  "At every generation, if it's the nth generation, save the # of empty genomes"
  [[:config [:argmap print-empty-genome-logs
                     empty-genome-logs-filename
                     empty-genome-logs-every-n-generations]]
   [:generation index :as generation]]
  (when (and print-empty-genome-logs
             (= 0 (mod index empty-genome-logs-every-n-generations)))
    (spit
      empty-genome-logs-filename
      ;; end each row with a newline so rows don't run together
      (str index " " (:empty-genomes-n generation) "\n")
      :append true)))

;; Maps each event label to the function that handles it
(def handler
  {:config handle-config
   :generation handle-generation})
```

Then add that `handler` to `clojush.graphs.handlers/handlers`. As you can
see, I didn't actually do any computation in the handler to figure out
the `empty-genomes-n`. Instead, I just asked for that value from the
`generation`. So let's define this key on the generation. First
we add a new keyword function in `clojush.graphs.events.generation`:

```clojure
(defnk empty-genomes-n [population]
  (count (filter #(empty? (:genome %)) population)))


(def compute-graph
  (graph/graph
    ...

    empty-genomes-n))
```

The `population` input is itself computed by a different keyword
function.

As shown above, we add the new keyword function to
`clojush.graphs.events.generation/compute-graph`.
The graph infers the key by looking at the name of the keyword function.

By moving the computation out of the handler, any other handler can now access
this attribute of the generation.

One other thing we could do to clean this up is to add an `empty-genome?` value on
each individual. To do this, add a keyword function in `clojush.graphs.events.generation.individual`:

```clojure
(defnk empty-genome? [genome]
  (empty? genome))

(def compute-graph
  (graph/graph
    ...
    empty-genome?))
```

Then we can clean up the generation level attribute:

```clojure
(defnk empty-genomes-n [population]
  (count (filter :empty-genome? population)))
```
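
As a quick sanity check of that refactoring, the sketch below compares the two ways of counting on a hypothetical population. It uses plain maps with a precomputed `:empty-genome?` key as a stand-in for the augmented individuals, which in Clojush would compute that key lazily.

```clojure
;; Hypothetical population: plain maps standing in for augmented individuals.
(def population
  [{:genome []      :empty-genome? true}
   {:genome [:a :b] :empty-genome? false}
   {:genome []      :empty-genome? true}])

;; Original version: looks inside each genome.
(defn count-empty-by-genome [population]
  (count (filter #(empty? (:genome %)) population)))

;; Cleaned-up version: a keyword is a function, so it can be the predicate.
(defn count-empty-by-key [population]
  (count (filter :empty-genome? population)))

(count-empty-by-genome population) ;; => 2
(count-empty-by-key population)    ;; => 2
```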

### Debugging the graph

If you set the `CLOJUSH_DEBUG_GRAPH` environment variable, then it will
print to stderr as each value in the graph is calculated.


### Profiling the graph

We can also produce a flame graph for the run, to understand where time is being
spent. First, download the [FlameGraph](https://github.com/brendangregg/FlameGraph)
library. Then, do a run with the `CLOJUSH_FLAME_GRAPH_FILE` environment variable set
to the file you want the profiling output written to. Then you can run `flamegraph.pl`
on that file to get an SVG output.

```bash
git clone git@github.com:brendangregg/FlameGraph.git ~/FlameGraph
env CLOJUSH_FLAME_GRAPH_FILE=profile.kern_folded lein run clojush.problems.integer-regression.nth-prime
~/FlameGraph/flamegraph.pl profile.kern_folded > profile.svg
open profile.svg
```

## Travis CI
Recently we have begun using [Travis CI](travis-ci.org) to automate multiple
parts of development.

We use [Travis CI](travis-ci.org) for...

### Testing

Primarily it serves as a way to test every branch and pull request, using commands
It tests every branch and pull request, using commands
like `lein check` and `lein test`.


@@ -34,21 +210,26 @@ fail. You will need to regenerate the saved output with:
lein run -m clojush.test.integration-test/regenerate [<label> ...]
```

If the tests are not passing because something has changed, I often find it easier to regenerate
the test outputs and then use `git diff` to see what has changed, instead of having the Clojure
test checker do the diff.

Since there are some things that will always change (like the time and git hash),
there is some manual find and replace logic in `clojush.test.integration-test`
that tries to replace things that will change with `xxx` in the test output.
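
A minimal sketch of that kind of find and replace, assuming the volatile parts are a 40-character git hash and an elapsed time in seconds. The function name `scrub` and both patterns are hypothetical; the actual logic in `clojush.test.integration-test` may match different things.

```clojure
(require '[clojure.string :as str])

(defn scrub
  "Replaces values that differ between runs with a fixed placeholder."
  [output]
  (-> output
      ;; a full-length git commit hash
      (str/replace #"\b[0-9a-f]{40}\b" "xxx")
      ;; an elapsed time such as "12.5 seconds"
      (str/replace #"\d+(\.\d+)? seconds" "xxx seconds")))

(scrub "git hash: 0123456789abcdef0123456789abcdef01234567")
;; => "git hash: xxx"

(scrub "Run took 12.5 seconds")
;; => "Run took xxx seconds"
```

With output scrubbed this way, two runs of the same code produce identical text, so `git diff` only shows real changes.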


### Docs

Docs are auto generated from function metadata using
The docs are auto generated from function metadata using
[`codox`](https://github.com/weavejester/codox).

On every commit to master, the docs are automatically regenerated and pushed
to the [`gh-pages` branch](http://lspector.github.io/Clojush/).

To generate them locally run `lein codox` and then open `doc/index.html`.

Currently, generating the docs has the side effect of running some examples,
Generating the docs has the side effect of running some examples,
[because I couldn't figure out how to stop codox from loading all example files](https://github.com/weavejester/codox/issues/100).

In the metadata, you can [skip functions](https://github.com/weavejester/codox#metadata-options)
3 changes: 2 additions & 1 deletion project.clj
@@ -15,7 +15,8 @@
;; https://mvnrepository.com/artifact/org.apache.commons/commons-math3
[org.apache.commons/commons-math3 "3.2"]
[cheshire "5.7.1"]
[prismatic/plumbing "0.5.4"]]
[prismatic/plumbing "0.5.4"]
[mvxcvi/puget "1.0.1"]]
:plugins [[lein-codox "0.9.1"]
[lein-shell "0.5.0"]
[lein-gorilla "0.4.0"]
15 changes: 7 additions & 8 deletions src/clojush/args.clj
@@ -1,8 +1,7 @@
(ns clojush.args
(:require [clj-random.core :as random])
(:use [clojush globals random util pushstate]
[clojush.instructions.tag]
[clojush.pushgp report]))
[clojush.instructions.tag]))

(def push-argmap
(atom (sorted-map
@@ -89,8 +88,8 @@
:uniform-addition 0.0
:uniform-addition-and-deletion 0.0
:uniform-combination-and-deletion 0.0
:genesis 0.0
}
:genesis 0.0}

;; The map supplied to :genetic-operator-probabilities should contain genetic operators
;; that sum to 1.0. All available genetic operators are defined in clojush.pushgp.breed.
;; Along with single operators, pipelines (vectors) containing multiple operators are
@@ -376,11 +375,11 @@
;; The number of simplification steps that will happen during final report
;; simplifications.

:problem-specific-initial-report default-problem-specific-initial-report
:problem-specific-initial-report (fn [argmap] :no-problem-specific-initial-report-function-defined)
;; A function can be called to provide a problem-specific initial report, which happens
;; before the normal initial report is printed.

:problem-specific-report default-problem-specific-report
:problem-specific-report (fn [& args] :no-problem-specific-report-function-defined)
;; A function can be called to provide a problem-specific report, which happens before
;; the normal generational report is printed.

@@ -462,10 +461,10 @@
;; Should be in the format "<hostname>:<port>"
;; If set, will send logs of each run to a server running on this
;; host
:label nil
:label nil)))
;; If set, will send this in the configuration of the run, to the
;; external record
)))


(defn load-push-argmap
[argmap]
35 changes: 35 additions & 0 deletions src/clojush/cli/graphs.clj
@@ -0,0 +1,35 @@
(ns clojush.cli.graphs
  (:require [puget.printer :as puget]
            [plumbing.fnk.pfnk :as pfnk]
            [plumbing.core :refer [map-vals]])
  (:import (schema.core.Predicate)
           (schema.core.AnythingSchema)))

(def schema-handlers
  {schema.core.Predicate (fn [_1 _2] nil)
   schema.core.AnythingSchema (fn [_1 _2] nil)})

(defn my-print [form]
  (puget/cprint
    form
    {:print-handlers schema-handlers}))

(defn symbol->value
  "Takes in a symbol like 'my.project/function and returns
  the value that the symbol refers to."
  [s]
  (let [namespace_ (namespace s)]
    ; (if namespace_
    (-> namespace_ symbol require))
    ; (throw {:type ::no-namespace :symbol s :hint "Couldn't evaluate this symbol, should be like `project/function`"})))
  (eval s))

(defn input [s]
  (my-print
    (pfnk/input-schema
      (symbol->value (symbol s)))))

(defn output [s]
  (my-print
    (pfnk/output-schema
      (symbol->value (symbol s)))))
28 changes: 11 additions & 17 deletions src/clojush/core.clj
@@ -16,8 +16,9 @@
;; for more details.

(ns clojush.core
(:require [clojush.pushgp.record :as r])
(:use [clojush.pushgp pushgp report])
(:require [clojush.graphs.init :refer [->init]]
[clojush.graphs.utils :refer [end-profile!]]
[clojush.pushgp.pushgp :refer [pushgp]])
(:gen-class))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -31,18 +32,11 @@
This allows one to run an example with a call from the OS shell prompt like:
lein run examples.simple-regression :population-size 3000"
[& args]
(r/new-run!)
(println "Command line args:" (apply str (interpose \space args)))
(let [param-list (map #(if (.endsWith % ".ser")
(str %)
(read-string %))
(rest args))]
(require (symbol (r/config-data! [:problem-file] (first args))))
(let [example-params (eval (symbol (str (first args) "/argmap")))
params (merge example-params (apply sorted-map param-list))]
(println "######################################")
(println "Parameters set at command line or in problem file argmap; may or may not be default:")
(print-params (into (sorted-map) params))
(println "######################################")
(pushgp params)
(shutdown-agents))))
(let [init (->init {:args args})]
(-> init :log :all!)
(try
(do
(pushgp (:params init))
(end-profile!))
(finally
(shutdown-agents)))))