# Odds and ends

The first notebook was just starting to give you a taste of things. I didn't present things methodically. This notebook finishes up the
"odds and ends" introduction to Clojure and the environment -- still not methodical; we'll get to that. 

First things first: don't modify this notebook unless you've gotten it from the GitHub repository. I think you (Owen) said you
had git installed on your system. If not, install it. Then run `git clone https://github.com/pdenno/some-clojure.git` in
a shell. That will create a directory with this notebook in it, among other things.

Second, the Clojure cheatsheet that I use is [here](https://clojure.org/api/cheatsheet). When I am programming, I keep it open.

The rest of this notebook is about (1) one of Clojure's most powerful and essential functions, `reduce`, and (2) another
threading macro, `as->`. I don't expect that you will understand all of this right away, but we are still in the mode of looking
at the power and possibilities of the language. Then we'll get methodical. 

We were calculating the mode....

In [3]:
(def data [3 55 62 32 3 3243 4 56 4 454])

#'user/data

One tricky thing we did was use `group-by` with the function `identity`. `identity` just gives you back
the argument you sent it. Usually that's pretty useless! But it is just what we needed here.

In [8]:
(identity :foo)

:foo

`group-by` takes a function as its first argument and groups together the elements of a collection that produce the same value when that function is applied to them.
So if the function is `identity` and the elements are numbers, each number gets grouped with the other instances of that same number:

In [10]:
(group-by identity data)

{3 [3 3], 55 [55], 62 [62], 32 [32], 3243 [3243], 4 [4 4], 56 [56], 454 [454]}
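`group-by` works with any function, not just `identity`. Here is a small sketch that uses the built-in predicate `even?` instead. The keys of the resulting map are whatever the function returned, which here is `true` or `false`:

```clojure
;; group-by calls even? on each number and collects numbers that give the same answer
(group-by even? [1 2 3 4 5])
;; => {false [1 3 5], true [2 4]}
```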

Let's try `group-by` to group people by the color of their shoes. 

In [11]:
(def people-shoes [{:name "Fred" :shoe-color "brown"}
                   {:name "Wilma" :shoe-color "brown"}
                   {:name "Barney" :shoe-color "blue"}
                   {:name "Betty" :shoe-color "green"}])

#'user/people-shoes

Remember that you can use a keyword as a function. So we can apply `:shoe-color` to each of the maps in `people-shoes`:

In [12]:
(group-by :shoe-color people-shoes)

{"brown" [{:name "Fred", :shoe-color "brown"} {:name "Wilma", :shoe-color "brown"}], "blue" [{:name "Barney", :shoe-color "blue"}], "green" [{:name "Betty", :shoe-color "green"}]}

Cool, right?
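You can also try the keyword-as-function trick by itself. In this quick sketch, `fred` is just the first map from `people-shoes`, and `:hat-color` is a made-up key that the map doesn't contain:

```clojure
(def fred {:name "Fred" :shoe-color "brown"})

(:shoe-color fred)        ; => "brown"
(:hat-color fred)         ; => nil, because the key isn't there
(:hat-color fred "none")  ; => "none", because an optional second argument acts as the default
```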

### Reduce

Okay, the second thing I did yesterday was pass the result of `group-by` (the vectors of grouped numbers) to the function `reduce-kv`.
`reduce` usually takes 3 arguments, and its special flavor `reduce-kv` always does. The three are a function, a starting thing into which you 'concentrate'
the result, and the collection of things you want to run through, concentrating them into the result. (If you leave out the starting thing,
`reduce` uses the first element of the collection instead.) We could use `reduce` to get the names of all the people in `people-shoes`:

In [13]:
(reduce
 (fn [result p-s] (conj result (:name p-s))) ; first argument, a function
 [] ; second argument, the starting thing to concentrate into
 people-shoes)

["Fred" "Wilma" "Barney" "Betty"]
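Here is a small sketch of the "starting thing" idea using `+` on plain numbers, first with a starting value and then without one:

```clojure
;; With a starting value: 100 + 1 + 2 + 3 + 4
(reduce + 100 [1 2 3 4])   ; => 110

;; Without one: the first element (1) becomes the starting value
(reduce + [1 2 3 4])       ; => 10

;; The function is called as (f result next-element) each time,
;; so this builds up a vector of squares, one element at a time
(reduce (fn [result x] (conj result (* x x))) [] [1 2 3]) ; => [1 4 9]
```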

The above used a __function__ that we didn't give a name (an *anonymous* function). We introduced it with `fn`. The vector after
`fn`, here `[result p-s]`, holds the __arguments__ of the function. In this case they are, respectively,
the thing we are concentrating into, called `result` here, and the people-shoe map currently being run through
the function, called `p-s`.
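The anonymous function could just as well have a name. The sketch below defines a hypothetical `add-name` with `defn` and hands it to `reduce`. A two-person vector stands in for `people-shoes`. For this particular job, `(mapv :name people-shoes)` would also work, but the point here is to see how `reduce` calls the function:

```clojure
;; add-name is a hypothetical helper with the same body as the anonymous fn above
(defn add-name [result p-s]
  (conj result (:name p-s)))

(reduce add-name [] [{:name "Fred" :shoe-color "brown"}
                     {:name "Betty" :shoe-color "green"}])
;; => ["Fred" "Betty"]
```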

`conj`, called in the function, returns a collection with a value added to it (for a vector, the value goes at the end).
For example, we can add 333 to `data`:

In [14]:
(conj data 333)

[3 55 62 32 3 3243 4 56 4 454 333]

Remember, it didn't change `data`. It just returned a new vector that includes everything in `data`.
Vectors like this are *immutable*, which means they can't be changed.
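A quick way to convince yourself of this is to sketch it with two throwaway names, `nums` and `nums2`. The last line also shows that `conj` adds the new value wherever that is cheapest for the collection type, so lists get it at the front:

```clojure
(def nums [1 2 3])
(def nums2 (conj nums 4))

nums    ; => [1 2 3]    nums is untouched
nums2   ; => [1 2 3 4]  a new vector

;; conj on a list adds at the front rather than the end
(conj '(1 2 3) 0)  ; => (0 1 2 3)
```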

Okay, back to calculating the mode. Recall what our group-by did:

In [16]:
(group-by identity data)

{3 [3 3], 55 [55], 62 [62], 32 [32], 3243 [3243], 4 [4 4], 56 [56], 454 [454]}

Now we want to "concentrate" that result (which we never assigned to a variable the way we did with `(def data ...)`)
by running through it, finding which elements occur most often, and collecting just those. We did
that with:

In [None]:
(reduce-kv
 (fn [result k v]
     (cond (> (count v) (:cnt result))
           {:cnt (count v) :elems [v]},
           (= (count v) (:cnt result))
           (update result :elems conj v),
           :else result))
 {:cnt 0 :elems []}
 (group-by identity data))
           

{:cnt 2, :elems [[3 3] [4 4]]}

### cond

This uses `cond`, which takes a sequence of pairs of forms, `<test-form> <execute-form>`. `cond` tries
each `<test-form>` in order and returns the result of the `<execute-form>` paired with the __first__
`<test-form>` that returns a true value (anything other than `false` or `nil`). The last test, `:else`, is just a keyword,
and keywords are always true, so it acts as a catch-all. So.... remember that `reduce` runs through each of the
things given as its third argument in turn. If we ran vanilla `reduce`, those things would be pairs (map entries)
made up of a key and a value from the argument, which was:

`{3 [3 3], 55 [55], 62 [62], 32 [32], 3243 [3243], 4 [4 4], 56 [56], 454 [454]}`

So the function would get `[3 [3 3]]`, then `[55 [55]]`, etc.
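To look at `cond` on its own, without `reduce-kv` in the way, here is a sketch of a hypothetical little `classify` function. It shows the test/execute pairs and the `:else` catch-all:

```clojure
(defn classify [n]
  (cond (neg? n)  :negative   ; the first test that is true wins
        (zero? n) :zero
        :else     :positive)) ; :else is always true, so it catches everything left

(classify -5) ; => :negative
(classify 0)  ; => :zero
(classify 7)  ; => :positive
```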

`reduce-kv` (short for "reduce key value") is just like `reduce`, but it splits each of those
pairs into the key and the value it came from. So the function we pass to `reduce-kv` takes
`[result k v]`, not `[result some-kv-thing]`.
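Here is a side-by-side sketch on a smaller stand-in for the `group-by` result. The `reduce` version uses a trick we haven't covered yet, *destructuring* (`[k v]` in the argument list), which pulls each map entry apart by hand. `reduce-kv` does that splitting for us:

```clojure
(def groups {3 [3 3], 55 [55], 4 [4 4]})  ; a smaller stand-in for the group-by result

;; Vanilla reduce: each element is a map entry like [3 [3 3]],
;; which we take apart ourselves with [k v]
(reduce (fn [result [k v]] (assoc result k (count v))) {} groups)
;; => {3 2, 55 1, 4 2}

;; reduce-kv hands the function k and v separately
(reduce-kv (fn [result k v] (assoc result k (count v))) {} groups)
;; => {3 2, 55 1, 4 2}
```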

## Putting it all together, `as->`

Okay, so now we've got a useful intermediate result for calculating the mode of `data`;
that intermediate result is the map `{:cnt 2, :elems [[3 3] [4 4]]}`.
Since 3 and 4 tie for most frequent, for the mode we'll take the average of these numbers.
You could just grab them with `:elems` and `flatten` them
(see the [Cheatsheet](https://clojure.org/api/cheatsheet)), but we want to be more dignified than
that ;^). We'll take the first element of each vector and take the average of that collection.
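Both options are shown below on the intermediate map. Every group in `:elems` has the same size (that's what `:cnt` records), so both routes lead to the same average, 7/2 here. Taking `first` just keeps one copy of each number:

```clojure
(def intermediate {:cnt 2, :elems [[3 3] [4 4]]})

;; The less dignified way: flatten everything into one sequence
(flatten (:elems intermediate))    ; => (3 3 4 4)

;; One representative per group
(mapv first (:elems intermediate)) ; => [3 4]
```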

Q: How do you do that without adding more pesky variables using `def`?

A: You use a threading macro.

You've seen `->` already. 
Yesterday, to calculate the mode we used a mysterious `as->` form, another threading macro.
Recall that `->` has a topic and flows an object (starting as the topic)
through each form inside the `->`, as the form's first argument. Each form can transform the object
(really, produce a new one) and pass it on. This kind of thing is so much a part of functional programming....

But what if the object flowing doesn't belong as the first argument of the form? Then `->` isn't what you want.
There is another threading macro `->>` that flows the object through as the _last_ argument in each form.
Clojure programmers tend to use `->` and `->>` a lot. But what if where you need to put the argument isn't
always first or last? Well, if you think about it, you can switch from `->` to `->>` just by starting a `->>`
inside a `->`. But you can't go in the other direction; you can't switch back to `->` inside a `->>`.
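The difference between `->` and `->>` is easiest to see with a function where argument order matters, such as `-`. The last form is a sketch of switching from `->` to `->>` partway through:

```clojure
;; -> puts the value first:  (- 10 3)
(-> 10 (- 3))    ; => 7

;; ->> puts the value last:  (- 3 10)
(->> 10 (- 3))   ; => -7

;; Start with ->, then switch to ->> by nesting it
(-> [1 2 3]
    (conj 4)             ; [1 2 3 4]    value goes first
    (->> (map inc)       ; (2 3 4 5)    value goes last from here on
         (reduce +)))    ; 14
```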

There is another threading macro, one that I use quite a bit: `as->`. Right after the topic, it takes
a name for the flowing object. (Typically I use a name that starts with a question mark, like `?r`.)
That name gets bound to the topic, and then to the result of each form in turn, so each form can use it in whatever position it needs.
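Here is a tiny sketch of `as->` using a name in different positions. `->` or `->>` alone couldn't do this:

```clojure
(as-> 10 ?x
      (- 100 ?x)   ; ?x is 10 here, used as the LAST argument  => 90
      (/ ?x 2)     ; ?x is 90 here, used as the FIRST argument => 45
      (inc ?x))    ; => 46
```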

So here is what we ended up with. 

In [18]:
(as-> data ?r ; ?r for "result": bound first to data, then to each subsequent result.
      (reduce-kv
       (fn [result k v]
           (cond (> (count v) (:cnt result))
                 {:cnt (count v) :elems [v]},
                 (= (count v) (:cnt result))
                 (update result :elems conj v),
                 :else result))
       {:cnt 0 :elems []}
       (group-by identity ?r))    ; The first form ends way down here! We use ?r (bound to data) rather than data.
      (:elems ?r)                 ; This is the second form. We get the :elems out of ?r.
      (mapv first ?r)             ; This is the third form. We take the first element of [3 3] and [4 4], getting [3 4].
      (/ (apply + ?r) (count ?r)) ; This is the fourth form. It takes the average of [3 4], just as we did before.
      (float ?r))                 ; This is the fifth and final form. The average is 7/2; let's see it as a floating-point number.

3.5

TaDa! One tiny note: instead of doing things like `(/ (apply + ?r) (count ?r))` all over the place, you
can write your own function:

In [None]:
(defn avg
  "Take a vector V and return the average of its values."
  [v]
  (float (/ (apply + v) (count v)))) ; could also use -> to do this with fewer parentheses.
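Here is one way to do what that comment suggests, sketched with `->`, followed by a couple of calls. One caveat: with an empty vector, this divides 0 by 0 and throws an `ArithmeticException`. Handling that case is left to you:

```clojure
(defn avg
  "Take a vector V and return the average of its values."
  [v]
  (-> (apply + v)     ; sum of the values
      (/ (count v))   ; divided by how many there are
      float))         ; shown as a floating-point number

(avg [3 4])      ; => 3.5
(avg [1 2 3 4])  ; => 2.5
```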

## You try something!

Make a function out of the mode calculation above, then, as suggested yesterday, use `assoc` to build a map
of all the statistics you collected.

## Extra credit

A [concordance](https://en.wikipedia.org/wiki/Concordance_(publishing)) is a list of the words used in
a text, traditionally alphabetical and showing each word in its context. Concordances are useful in [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing)
(NLP), something we can screw around with. Here is a rough concordance (just word counts, hardly perfect) for the paragraph of text in the file
`resources/paragraph.txt`. See what you can do with it!

In [21]:
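(require 'clojure.string)           ; make sure clojure.string is loaded
(-> "resources/paragraph.txt"
    slurp                           ; read the whole file into one string
    (clojure.string/split #"\s+")   ; split on whitespace into a vector of words
    (->> (group-by identity)        ; switch to ->>: group identical words together
         (reduce-kv
          (fn [result k v] (assoc result k (count v))) ; word -> number of occurrences
          {})))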

{"technology" 1, "ability" 1, "queries" 1, "for." 1, "(such" 1, "means" 1, "comparable" 1, "identifying" 1, "word" 2, "semantic" 1, "techniques" 1, "of" 3, "reduced" 1, "latent" 1, "every" 1, "linguistic" 1, "offered" 1, "In" 2, "long" 1, "something" 1, "era," 1, "precomputing" 1, "they" 1, "mathematical" 1, "Today," 1, "results" 1, "for" 2, "addition," 1, "searching" 1, "words" 1, "likely" 1, "words)" 1, "was" 1, "that" 1, "have" 2, "based" 1, "a" 2, "on" 1, "concordance" 2, "context." 1, "and" 1, "publishing." 1, "concerning" 1, "indexing" 1, "such" 2, "terms" 1, "other" 1, "interest" 1, "would" 1, "unavailable," 1, "information" 1, "readers" 1, "has" 1, "works" 1, "combine" 1, "to" 3, "search" 3, "as" 4, "automatically" 1, "the" 4, "proposed" 1, "been" 2, "result" 1, "near" 1, "Bible" 1, "in" 1, "multiple" 1}
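Two ideas you might try are sketched below. First, the built-in `frequencies` does the `group-by` plus counting step in one call. Second, some of the imperfections above ("for." vs "for", "In" vs "in", "(such") come from splitting only on whitespace. The hypothetical helper `word-counts` lower-cases the text and splits on anything that isn't a letter a-z. That is an assumption about what counts as a word: it will also break up words like "don't" and drop digits and accented letters. You could call it as `(word-counts (slurp "resources/paragraph.txt"))`.

```clojure
(require '[clojure.string :as str])

;; frequencies does the group-by + count step in one call
(frequencies ["as" "the" "as"])   ; => {"as" 2, "the" 1}

;; word-counts is a hypothetical helper: lower-case the text, then split on
;; runs of non-letters so "era," and "Era" both count as "era"
(defn word-counts [text]
  (->> (str/split (str/lower-case text) #"[^a-z]+")
       (remove str/blank?)   ; drop the empty string a leading non-letter would leave
       frequencies))

(word-counts "In the era, in THE end.")
;; => {"in" 2, "the" 2, "era" 1, "end" 1}
```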