## Chapter 7
# Using associative containers

## Exercises

**7-0.** Compile, execute, and test the programs in this chapter.

Implemented in `counting_words.cpp`, `xref.cpp`, `sentence_generator.cpp` and `sentence_generator_main.cpp`:
```
$ gcc counting_words.cpp -o counting_words.out -lstdc++
$ ./counting_words.out
hey hey hey you you guys

guys    1
hey     3
you     2

$ gcc xref.cpp ../chapter_5/split.cpp -o xref.out -lstdc++
$ ./xref.out
hey there
I
am learning that there
c++
because c++
is a language I like
I occurs on line(s): 2, 6
a occurs on line(s): 6
am occurs on line(s): 3
because occurs on line(s): 5
c++ occurs on line(s): 4, 5
hey occurs on line(s): 1
is occurs on line(s): 6
language occurs on line(s): 6
learning occurs on line(s): 3
like occurs on line(s): 6
that occurs on line(s): 3
there occurs on line(s): 1, 3

$ gcc sentence_generator.cpp sentence_generator_main.cpp ../chapter_5/split.cpp -o sentence_generator.out -lstdc++
$ cat grammar.txt
<noun>          batman
<noun>          cutting board
<noun>          bachelor
<noun-phrase>   <noun>
<noun-phrase>   <adjective> <noun-phrase>
<adjective>     jovial
<adjective>     fearless
<adjective>     scrumtrulescent
<verb>          earns
<verb>          basks
<location>      in the city
<location>      out on the town
<location>      nowhere
<sentence>      the <noun-phrase> <verb> <location>

$ cat grammar.txt | ./sentence_generator.out
the bachelor earns out on the town
```

**7-1.** Extend the program from 7.2/124 to produce its output sorted by occurrence count. That is, the output should group all the words that occur once, followed by those that occur twice, and so on.

Implemented in `counting_words_sorted_by_count.cpp`:
```
$ gcc counting_words_sorted_by_count.cpp -o counting_words_sorted_by_count.out -lstdc++
$ ./counting_words_sorted_by_count.out
hey hey yo yo there dude dude dude
there	1
hey	2
yo	2
dude	3
```
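One way to get this ordering is to invert the map so that the counts become keys. This sketch assumes the counts are already in the `map<string, int>` built by the 7.2 program. The names `group_by_count` and `write_by_count` are hypothetical, and `counting_words_sorted_by_count.cpp` may not work this way:

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using std::map;
using std::string;
using std::vector;

// Invert a word -> count map into count -> words. A map keeps its keys sorted,
// so iterating over the result visits counts in increasing order. Within each
// count, words appear in the order the original map visited them, which is
// alphabetical.
map<int, vector<string> > group_by_count(const map<string, int>& counters)
{
    map<int, vector<string> > ret;
    for (map<string, int>::const_iterator it = counters.begin();
         it != counters.end(); ++it)
        ret[it->second].push_back(it->first);
    return ret;
}

// Write each word followed by a tab and its count, lowest counts first.
void write_by_count(std::ostream& out, const map<string, int>& counters)
{
    map<int, vector<string> > groups = group_by_count(counters);
    for (map<int, vector<string> >::const_iterator g = groups.begin();
         g != groups.end(); ++g)
        for (vector<string>::const_iterator w = g->second.begin();
             w != g->second.end(); ++w)
            out << *w << '\t' << g->first << '\n';
}
```

For the input above, `group_by_count` yields `1 -> [there]`, `2 -> [hey, yo]` and `3 -> [dude]`.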

**7-2.** Extend the program in 4.2.3/64 to assign letter grades by ranges:  
```
A 90-100
B 80-89.99...
C 70-79.99...
D 60-69.99...
F < 60
```
The output should list how many students fall into each category.

Implemented in `grade_ranges.cpp`:
```
$ gcc grade_ranges.cpp ../chapter_6/grade.cpp ../chapter_6/Student_info.cpp -o grade_ranges.out -lstdc++
$ ./grade_ranges.out
Karl 80 80 85 90 85 80 d
Jarl 100 90 95 90 85 85 d
Jade 70 70 75 76 80 d
Jake 50 50 90 50 40 60 d
Blake 90 90 100 100 95 d
Grade	Number of students
A	    2
B	    1
C	    1
F	    1
```
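The classification step could look like the sketch below. It assumes each student's final grade has already been computed as a `double`, for example by the chapter 6 `grade` function. `letter_grade` and `count_letters` are hypothetical names:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using std::map;
using std::string;
using std::vector;

// Map a numeric grade onto a letter, using the ranges from the exercise.
string letter_grade(double grade)
{
    if (grade >= 90) return "A";
    if (grade >= 80) return "B";
    if (grade >= 70) return "C";
    if (grade >= 60) return "D";
    return "F";
}

// Count how many of the given final grades fall into each letter category.
map<string, int> count_letters(const vector<double>& grades)
{
    map<string, int> ret;
    for (vector<double>::const_iterator it = grades.begin(); it != grades.end(); ++it)
        ++ret[letter_grade(*it)];
    return ret;
}
```

Assume the book's weighting: 20% midterm, 40% final and 40% median homework. Under that weighting, the five students in the run above finish with 82, 91, 72.4, 52 and 94. This sketch counts those as A 2, B 1, C 1, F 1, which matches the output. The letters happen to sort in grade order, so iterating over the `map<string, int>` lists the categories from A to F. A category with no students, like D here, never gets an entry.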

**7-3.** The cross-reference program from 7.3/126 could be improved. As it stands, if a word occurs more than once on the same input line, the program will report that line multiple times. Change the code so that it detects multiple occurrences of the same line number and inserts the line number only once.

I think the right solution here would be to change the return type of `xref` from `map<string, vector<int> >` to a map from strings to _sets_ of ints, which would ensure uniqueness "for free". The closest data structure we've been given so far is a `map`. It ensures uniqueness of its keys, but it requires a matching value that would be superfluous in this case.

Without a set, my solution is more expensive, because every occurrence of a word triggers a linear search through that word's line numbers. It does work, though: manually search through the vector of line numbers, and only insert a line number if it is not already mapped to the word:

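```
// Linear search for i in ints. The vector is taken by const reference, so the
// returned iterator (including ints.end()) refers to the caller's vector rather
// than to a copy that is destroyed when the function returns.
vector<int>::const_iterator find(const vector<int>& ints, int i) {
    for (vector<int>::const_iterator it = ints.begin(); it != ints.end(); ++it) {
        if (*it == i) {
            return it;
        }
    }
    return ints.end();
}

...

        for (vector<string>::const_iterator it = words.begin(); it != words.end(); ++it) {
            // Bind a reference (not a copy) so that push_back updates the vector in the map
            vector<int>& line_numbers = ret[*it];
            if (find(line_numbers, line_number) == line_numbers.end()) {
                line_numbers.push_back(line_number);
            }
        }
```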

Here is a run:
```
$ ./xref.out
hey hey hey
hey occurs on line(s): 1
```
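For comparison, here is a sketch of the set-based version described above. It uses the standard library's `std::set`, which the book has not introduced at this point. `split_ws` is a hypothetical whitespace splitter that stands in for the chapter 5 `split`, and `xref_set` is likewise a made-up name:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using std::map;
using std::set;
using std::string;
using std::vector;

// Hypothetical whitespace splitter standing in for the book's split().
vector<string> split_ws(const string& s)
{
    std::istringstream in(s);
    vector<string> ret;
    string word;
    while (in >> word)
        ret.push_back(word);
    return ret;
}

// Cross-reference keyed by word. Inserting a line number that is already in
// the set has no effect, so a word repeated on one line is recorded once.
map<string, set<int> > xref_set(std::istream& in)
{
    string line;
    int line_number = 0;
    map<string, set<int> > ret;
    while (std::getline(in, line)) {
        ++line_number;
        vector<string> words = split_ws(line);
        for (vector<string>::const_iterator it = words.begin(); it != words.end(); ++it)
            ret[*it].insert(line_number);
    }
    return ret;
}
```

No explicit search is needed here. For a word seen on $k$ distinct lines, each insertion costs $O(\log k)$ instead of a linear scan, and iterating over the set yields the line numbers in increasing order.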

**7-4.** The output produced by the cross-reference program will be ungainly if the input file is large. Rewrite the program to break up the output if the lines get too long.

```
$ ./xref.out
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1 occurs on line(s): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                     20, 21, 22, 23
2 occurs on line(s): 24
```
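The wrapping could be handled by a helper along these lines. This is a sketch under some assumptions. `write_wrapped` is a hypothetical name, and the column limit is passed in as `max_width`. Continuation lines are indented by the width of the header so they line up under the first number, as in the run above:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

using std::string;
using std::vector;

// Write `header`, then the numbers in `lines` separated by ", ". If ", " plus
// the next number would not fit within max_width columns, end the current
// output line with a comma instead and continue on a new line indented by the
// header's width, so that the numbers line up.
void write_wrapped(std::ostream& out, const string& header,
                   const vector<int>& lines, string::size_type max_width)
{
    const string indent(header.size(), ' ');
    string::size_type width = header.size();   // columns used on the current line
    out << header;

    for (vector<int>::size_type i = 0; i != lines.size(); ++i) {
        std::ostringstream formatted;
        formatted << lines[i];
        const string item = formatted.str();

        if (i != 0) {
            if (width + 2 + item.size() > max_width) {
                out << ",\n" << indent;
                width = indent.size();
            } else {
                out << ", ";
                width += 2;
            }
        }
        out << item;
        width += item.size();
    }
    out << '\n';
}
```

For example, writing the numbers 1 to 5 after the header `"x: "` with `max_width` set to 10 produces `x: 1, 2, 3,`, followed by `   4, 5` on the next line.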

**7-5.** Reimplement the grammar program using a `list` as the data structure in which we build the sentence.

Implemented in `sentence_generator_with_list.cpp`:

```
$ gcc sentence_generator_with_list.cpp ../chapter_5/split.cpp -o sentence_generator_with_list.out -lstdc++
$ cat grammar.txt | ./sentence_generator_with_list.out
the bachelor earns out on the town
```
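The recursion only ever appends words at the end, so switching to a `list` mostly means changing the type of the output container. The sketch below assumes the chapter's `Grammar`/`Rule` typedefs, `bracketed`, and the book's `nrand`. It is not necessarily the actual contents of `sentence_generator_with_list.cpp`:

```cpp
#include <cassert>
#include <cstdlib>
#include <list>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using std::list;
using std::map;
using std::string;
using std::vector;

typedef vector<string> Rule;
typedef vector<Rule> Rule_collection;
typedef map<string, Rule_collection> Grammar;

// True for category names such as "<noun>".
bool bracketed(const string& s)
{
    return s.size() > 1 && s[0] == '<' && s[s.size() - 1] == '>';
}

// Random integer in [0, n), as in the book; requires 0 < n <= RAND_MAX.
int nrand(int n)
{
    if (n <= 0 || n > RAND_MAX)
        throw std::domain_error("Argument to nrand is out of range");
    const int bucket_size = RAND_MAX / n;
    int r;
    do r = std::rand() / bucket_size;
    while (r >= n);
    return r;
}

// Expand `word`, appending the resulting terminal words to the end of `ret`.
void gen_aux(const Grammar& g, const string& word, list<string>& ret)
{
    if (!bracketed(word)) {
        ret.push_back(word);
        return;
    }
    Grammar::const_iterator it = g.find(word);
    if (it == g.end())
        throw std::logic_error("empty rule");
    const Rule_collection& c = it->second;
    const Rule& r = c[nrand(static_cast<int>(c.size()))];
    for (Rule::const_iterator i = r.begin(); i != r.end(); ++i)
        gen_aux(g, *i, ret);
}

list<string> gen_sentence(const Grammar& g)
{
    list<string> ret;
    gen_aux(g, "<sentence>", ret);
    return ret;
}
```

`push_back` is available on `list` just as on `vector`, so `gen_aux` is unchanged apart from its parameter type.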

**7-6.** Reimplement the `gen_sentence` program using two `vector`s: One will hold the fully unwound, generated sentence, and the other will hold the rules and will be used as a stack. Do not use any recursive calls.

Implemented in `sentence_generator_with_stack.cpp`:

```
$ gcc sentence_generator_with_stack.cpp ../chapter_5/split.cpp -o sentence_generator_with_stack.out -lstdc++
$ cat grammar.txt | ./sentence_generator_with_stack.out
the cutting board basks in the city
```
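A non-recursive version can keep the symbols still to be expanded on a `vector` used as a stack. Each chosen rule's symbols are pushed in reverse, so the leftmost one is expanded first and the sentence comes out in the same order recursion would give. The helpers mirror the chapter's (`Grammar`, `bracketed`, `nrand`), and `gen_sentence_iterative` is a hypothetical name. This is a sketch, not necessarily how `sentence_generator_with_stack.cpp` works:

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using std::map;
using std::string;
using std::vector;

typedef vector<string> Rule;
typedef vector<Rule> Rule_collection;
typedef map<string, Rule_collection> Grammar;

// True for category names such as "<noun>".
bool bracketed(const string& s)
{
    return s.size() > 1 && s[0] == '<' && s[s.size() - 1] == '>';
}

// Random integer in [0, n), as in the book; requires 0 < n <= RAND_MAX.
int nrand(int n)
{
    if (n <= 0 || n > RAND_MAX)
        throw std::domain_error("Argument to nrand is out of range");
    const int bucket_size = RAND_MAX / n;
    int r;
    do r = std::rand() / bucket_size;
    while (r >= n);
    return r;
}

vector<string> gen_sentence_iterative(const Grammar& g)
{
    vector<string> sentence;   // fully expanded output, in order
    vector<string> stack;      // symbols still to expand; the top is back()
    stack.push_back("<sentence>");

    while (!stack.empty()) {
        const string word = stack.back();
        stack.pop_back();
        if (!bracketed(word)) {
            sentence.push_back(word);
            continue;
        }
        Grammar::const_iterator it = g.find(word);
        if (it == g.end())
            throw std::logic_error("empty rule");
        const Rule_collection& c = it->second;
        const Rule& r = c[nrand(static_cast<int>(c.size()))];
        // Push in reverse so the rule's first symbol is expanded first.
        for (Rule::const_reverse_iterator i = r.rbegin(); i != r.rend(); ++i)
            stack.push_back(*i);
    }
    return sentence;
}
```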

**7-7.** Change the driver for the cross-reference program so that it writes `line` if there is only one line and `lines` otherwise.

```
$ ./xref.out
1 2
1
1 occurs on lines: 1, 2
2 occurs on line: 1
```
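The change only affects the label written before the line numbers. Here is a minimal sketch, where `occurrence_header` is a hypothetical helper that the driver would call with each word and its vector of line numbers:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the text written before a word's line numbers, choosing "line" or
// "lines" according to how many line numbers follow.
std::string occurrence_header(const std::string& word, const std::vector<int>& lines)
{
    return word + " occurs on " + (lines.size() == 1 ? "line" : "lines") + ": ";
}
```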

**7-8.** Change the cross-reference program to find all the URLs in a file, and write all the lines on which each distinct URL occurs.

This amounts to simply passing the `find_urls` function from chapter 6 as the argument for `xref`'s `find_words` parameter.

Here is an example run:

```
$ gcc xref.cpp ../chapter_5/split.cpp ../chapter_6/find_urls.cpp -o xref_with_urls.out -lstdc++
$ ./xref_with_urls.out
blah blah http://localhost:8888/notebooks/accelerated_c%2B%2B/chapter_6/chapter_6.ipynb blah blah https://www.khanacademy.org/math/statistics-probability blahhhhhh
blah blah http://localhost:8888/notebooks/accelerated_c%2B%2B/chapter_6/chapter_6.ipynb blah blah

http://localhost:8888/notebooks/accelerated_c%2B%2B/chapter_6/chapter_6.ipynb occurs on lines: 1, 2
https://www.khanacademy.org/math/statistics-probability occurs on line: 1
```
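`xref` takes the word-finding function as a parameter (defaulting to `split`), so no other change is needed. The sketch below illustrates the idea with a simplified `xref` modeled on the chapter's and two hypothetical stand-ins. `split_ws` replaces `split`. `find_urls_simple` is a much cruder URL finder than the chapter 6 `find_urls`: it just keeps whitespace-separated tokens that contain `://`.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using std::map;
using std::string;
using std::vector;

// Hypothetical whitespace splitter standing in for the book's split().
vector<string> split_ws(const string& s)
{
    std::istringstream in(s);
    vector<string> ret;
    string word;
    while (in >> word)
        ret.push_back(word);
    return ret;
}

// Hypothetical, much simpler stand-in for chapter 6's find_urls():
// keeps only the whitespace-separated tokens that contain "://".
vector<string> find_urls_simple(const string& s)
{
    vector<string> words = split_ws(s);
    vector<string> ret;
    for (vector<string>::const_iterator it = words.begin(); it != words.end(); ++it)
        if (it->find("://") != string::npos)
            ret.push_back(*it);
    return ret;
}

// Cross-reference whose notion of "word" is supplied by the caller.
map<string, vector<int> > xref(std::istream& in,
                               vector<string> find_words(const string&) = split_ws)
{
    string line;
    int line_number = 0;
    map<string, vector<int> > ret;
    while (std::getline(in, line)) {
        ++line_number;
        vector<string> words = find_words(line);
        for (vector<string>::const_iterator it = words.begin(); it != words.end(); ++it)
            ret[*it].push_back(line_number);
    }
    return ret;
}
```

`xref(std::cin)` then gives the ordinary word cross-reference, while `xref(std::cin, find_urls_simple)` indexes only the URL-like tokens.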

**7-9.** (difficult) The implementation of `nrand` in 7.4.4/135 will not work for arguments greater than `RAND_MAX`. Usually, this restriction is no problem, because `RAND_MAX` is often the largest possible integer anyway. Nevertheless, there are implementations under which `RAND_MAX` is much smaller than the largest possible integer. For example, it is not uncommon for `RAND_MAX` to be `32767` ($2^{15} - 1$) and the largest possible integer to be `2147483647` ($2^{31} - 1$). Reimplement `nrand` so that it works well for all values of `n`.

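My solution is implemented in `large_nrand.cpp`. It relies on the fact that adding two samples drawn from the ranges $[0, n)$ and $[0, m)$ yields a value in the range $[0, n + m - 1)$. However, the sum is _not_ uniformly distributed, even when both samples are. With two dice, a total of 7 is more likely than a total of 2, and in the same way values near the middle of the combined range occur more often than values near its ends. The approach below therefore reaches every value in the target range, but not with equal probability.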

My approach is to split the draw into blocks. First make `nDraws = n / RAND_MAX` (integer division) full draws from `[0, RAND_MAX)`, then make one final draw from `[0, (n % RAND_MAX) + nDraws)`. The `+ nDraws` is needed because the draws are right-exclusive. Each full draw tops out at `RAND_MAX - 1` rather than `RAND_MAX`, so together the full draws fall `nDraws` short of the target maximum, and the final draw has to make up the difference. The example below should make this clear:

_Example_: for simplicity, assume `RAND_MAX = 4` and `n = 13`. We want a result in the range `[0, 13) = [0, 12]`.

Then `nDraws = 13 / 4 = 3`, and the result will be the sum of four uniform samples. Three are in the range `[0, 4) = [0, 3]`, and one is in the range `[0, 13 % 4 + nDraws) = [0, 1 + 3) = [0, 4) = [0, 3]`. Thus, the total range will be `[0, 3 * 3 + 3] = [0, 12]`, as required.
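Brute force makes the lack of uniformity easy to see. Assume the four draws of the example are each uniform on `[0, 3]`, so the sum spans `[0, 12]`. Then all $4^4 = 256$ combinations are equally likely, and counting how many of them produce each sum gives the distribution of the result:

```cpp
#include <cassert>
#include <vector>

// Enumerate all outcomes of four independent draws, each uniform on [0, 3].
// Every one of the 4^4 = 256 outcomes is equally likely, so counts[s] / 256
// is the probability that the sum equals s.
std::vector<int> sum_distribution()
{
    std::vector<int> counts(13, 0);   // possible sums 0..12
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b)
            for (int c = 0; c < 4; ++c)
                for (int d = 0; d < 4; ++d)
                    ++counts[a + b + c + d];
    return counts;
}
```

The counts for sums 0 and 12 are both 1, but the count for 6 is 44. A result of 6 is therefore 44 times as likely as a result of 0, whereas a uniform `nrand(13)` would return each value with probability 1/13.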

Here are some runs, with `n = INT_MAX`:
```
$ gcc large_nrand.cpp -o large_nrand.out -lstdc++
$ ./large_nrand.out
1926931274
$ ./large_nrand.out
61922876
$ ./large_nrand.out
344398125
$ ./large_nrand.out
623514601
$ ./large_nrand.out
905989850
$ ./large_nrand.out
1470940348
$ ./large_nrand.out
2035890846
```
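A different technique does give uniform results. Treat several `rand()` calls as the digits of one number written in base `RAND_MAX + 1`. Then use rejection sampling to discard draws that would bias the final reduction modulo `n`. This sketch is not the summation method described above, and `large_nrand` is a hypothetical name. It assumes two things: `rand()` is itself uniform on `[0, RAND_MAX]`, and `(RAND_MAX + 1) * n` fits in an `unsigned long long`, which holds with a 32-bit `int`.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <stdexcept>

// Return an integer uniformly distributed in [0, n) for any n in [1, INT_MAX],
// even when RAND_MAX < n. Assumes rand() is uniform on [0, RAND_MAX] and that
// (RAND_MAX + 1) * n fits in an unsigned long long (true for a 32-bit int).
int large_nrand(int n)
{
    if (n <= 0)
        throw std::domain_error("Argument to large_nrand is out of range");

    typedef unsigned long long ull;
    const ull base = static_cast<ull>(RAND_MAX) + 1;   // distinct values of rand()

    // Smallest power of base that is >= n: the number of values that
    // k rand() "digits" can represent.
    ull range = base;
    while (range < static_cast<ull>(n))
        range *= base;

    // Largest multiple of n not exceeding range. Draws at or above it are
    // rejected, so that every residue modulo n is equally likely.
    const ull limit = range - range % static_cast<ull>(n);

    ull r;
    do {
        r = 0;
        // Build a uniform value in [0, range) one base-(RAND_MAX + 1) digit at a time.
        for (ull span = 1; span < range; span *= base)
            r = r * base + static_cast<ull>(std::rand());
    } while (r >= limit);

    return static_cast<int>(r % static_cast<ull>(n));
}
```

Each attempt is rejected with probability less than 1/2, so on average fewer than two attempts are needed. With C++11 or later, `std::uniform_int_distribution` from `<random>` solves the same problem directly.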