## Chapter 5
# Using sequential containers and analyzing strings

## Exercises

**5-0.** Compile, execute, and test the programs in this chapter.

Splitting implemented in `split.h`, `split_main.cpp`, `split_test.cpp`.  
Character pictures implemented in `character_pictures.cpp` (also uses `split.h` for input).


**5-1.** Design and implement a program to produce a permuted index.  A permuted index is one in which each phrase is indexed by every word in the phrase. So, given the following input,  
```
The quick brown fox
jumped over the fence
```
the output would be
```
      The quick    brown fox
jumped over the    fence
The quick brown    fox
                   jumped over the fence
         jumped    over the fence
            The    quick brown fox
    jumped over    the fence
                   The quick brown fox
```

A good algorithm is suggested in _The AWK Programming Language_. That solution divides the problem into three steps:  
1. Read each line of the input and generate a set of rotations of that line. Each rotation puts the next word of the input in the first position and rotates the previous first word to the end of the phrase. So the output of this phrase for the first line of our input would be
```
The quick brown fox
quick brown fox The
brown fox The quick
fox The quick brown
```
Of course, it will be important to know where the original phrase ends and where the rotated beginning begins.
2. Sort the rotations.
3. Unrotate and write the permuted index, which involves finding the separator, putting the phrase back together, and writing it properly formatted.


**Solution**  
I took a slightly different approach that I believe is simpler than the proposed solution.  

I hold each permutation in a struct, `IndexPermutation`, containing one string each for `index` and `phrase` (including spaces).  

I then iterate through all possible `pivotIndex`s, and for each `pivotIndex`, split the line and put the left half into `index` and the right half into `phrase`.  

I then sort the `IndexPermutation` vector by its constituent `phrase`s, and print the results in the correct format.
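
The approach above can be sketched roughly as follows. This is a minimal illustration, not the contents of `permuted_index.cpp`: the helpers `split_words`, `join`, `permutations_of`, `by_phrase` and `build_index` are hypothetical names, and only `IndexPermutation`, `index`, `phrase` and `pivotIndex` come from the description.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// One entry of the permuted index: the words left of the pivot and the
// words from the pivot onward, each joined with single spaces.
struct IndexPermutation {
    std::string index;
    std::string phrase;
};

typedef std::vector<std::string>::size_type index_type;

// Hypothetical helper: split a line into whitespace-separated words.
std::vector<std::string> split_words(const std::string& line) {
    std::vector<std::string> words;
    std::string::size_type i = 0;
    while (i != line.size()) {
        while (i != line.size() && std::isspace(static_cast<unsigned char>(line[i]))) ++i;
        std::string::size_type j = i;
        while (j != line.size() && !std::isspace(static_cast<unsigned char>(line[j]))) ++j;
        if (i != j) words.push_back(line.substr(i, j - i));
        i = j;
    }
    return words;
}

// Hypothetical helper: join words[first, last) with single spaces.
std::string join(const std::vector<std::string>& words, index_type first, index_type last) {
    std::string out;
    for (index_type i = first; i != last; ++i) {
        if (i != first) out += ' ';
        out += words[i];
    }
    return out;
}

// One IndexPermutation per possible pivotIndex in the line.
std::vector<IndexPermutation> permutations_of(const std::string& line) {
    std::vector<std::string> words = split_words(line);
    std::vector<IndexPermutation> result;
    for (index_type pivotIndex = 0; pivotIndex != words.size(); ++pivotIndex) {
        IndexPermutation p;
        p.index = join(words, 0, pivotIndex);
        p.phrase = join(words, pivotIndex, words.size());
        result.push_back(p);
    }
    return result;
}

// Order entries by phrase (case-sensitive, plain string comparison).
bool by_phrase(const IndexPermutation& a, const IndexPermutation& b) {
    return a.phrase < b.phrase;
}

// Collect the permutations of every line and sort them by phrase.
std::vector<IndexPermutation> build_index(const std::vector<std::string>& lines) {
    std::vector<IndexPermutation> all;
    for (index_type i = 0; i != lines.size(); ++i) {
        std::vector<IndexPermutation> perms = permutations_of(lines[i]);
        all.insert(all.end(), perms.begin(), perms.end());
    }
    std::sort(all.begin(), all.end(), by_phrase);
    return all;
}
```

Printing then amounts to right-aligning each `index` in a fixed-width column and writing the `phrase` after it.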

Example run:  
```
$ gcc permuted_index.cpp split.cpp -o permuted_index.out  -lstdc++
$ ./permuted_index.out
      the quick     brown fox
jumped over the     fence
the quick brown     fox
                    jumped over the fence
         jumped     over the fence
            the     quick brown fox
    jumped over     the fence
                    the quick brown fox
```

_**Note** the lack of capitalization.  I think the authors made a mistake in including capitalization since we haven't learned of a standard library way of comparing strings while ignoring case.  I think it's beyond the scope of the exercise to write a custom string `compare` function using `std::tolower` on each char_.

**5-2.** Write the complete new version of the student-grading program, which extracts records for failing students, using `vector`s. Write another that uses `list`s. Measure the performance difference on input files of ten lines, 1000 lines, and 10000 lines.


_**Note:** I am adding a 100,000-line file as well, since the 10,000-line file runs in under 0.5s in the vector case on my machine._

**Pt1: Implementation**  
To avoid duplication, I am using the original `Student_info`, `median` and `grade` definitions from Ch4:
```
$ gcc failing_students_vector.cpp ../chapter_4/grade.cpp ../chapter_4/Student_info.cpp ../chapter_4/median.cpp -o failing_students_vector.out  -lstdc++
$ gcc failing_students_list.cpp ../chapter_4/grade.cpp ../chapter_4/Student_info.cpp ../chapter_4/median.cpp -o failing_students_list.out  -lstdc++
$ cat grades
Karl 50 40 50 39 19 d
Jake 90 90 80 70 95 d%
$ cat grades | ./failing_students_vector.out
Passing students:

Jake 86
Failing students:

Karl 41.6
```

_**Also note** I used the modified, extended version from the Ch4 exercises that reads hw input in the `grade` function and only stores `finalGrade` on the student object._

**Pt2.1: Timing: vector**  
```
$ wc -l grades1000
1000 grades1000
$ wc -l grades10000
10000 grades10000
$ wc -l grades100000
100000 grades100000
$ time cat grades1000 | ./failing_students_vector.out
...
./failing_students_vector.out  0.02s user 0.00s system 74% cpu 0.027 total
$ time cat grades10000 | ./failing_students_vector.out
...
./failing_students_vector.out  0.33s user 0.01s system 98% cpu 0.346 total
$ time cat grades100000 | ./failing_students_vector.out
...
./failing_students_vector.out  18.32s user 0.08s system 99% cpu 18.449 total

```
**Pt2.2: Timing: list**  
```
$ time cat grades1000 | ./failing_students_list.out
...
./failing_students_list.out  0.02s user 0.00s system 72% cpu 0.026 total
$ time cat grades10000 | ./failing_students_list.out
./failing_students_list.out  0.15s user 0.01s system 90% cpu 0.170 total
$ time cat grades100000 | ./failing_students_list.out
./failing_students_list.out  1.37s user 0.07s system 97% cpu 1.478 total
```

Thus, for the 100,000-student file, the vector implementation takes 18.3s and the list implementation only 1.37s: the list version runs in 1.37/18.3 = 7.5% of the time! This is expected, since erasing from the middle of a `vector` shifts every later element, while erasing from a `list` takes constant time per element.



**5-3.** By using a `typedef`, we can write one version of the program that implements either a `vector`-based solution or a `list`-based one. Write and test this version of the program.

Implemented in `failint_students_generic.cpp`.  Note that the `sort` has been removed, since `std::sort` from `<algorithm>` requires random-access iterators, which `list` doesn't provide.

**5-4.** Look again at the driver functions you wrote in the previous exercise. Note that it is possible to write a driver that differs only in the declaration of the type for the data structure that holds the input file. If your vector and list test drivers differ in any other way, rewrite them so that they differ only in this declaration.

Yup! The only line that needs changing in the program to change from `list` to `vector` or vice-versa is this one:  
`typedef list<Student_info> student_collection;`
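
As a rough sketch of how the rest of the program can stay container-agnostic: the `Student_info` below is a simplified stand-in (an assumption: just a name and a `finalGrade`), and `fgrade` uses a failing threshold of 60 as in the book. The actual source file may differ.

```cpp
#include <list>
#include <string>
#include <vector>

// Simplified stand-in for the Ch4 Student_info (assumption: name + final grade only).
struct Student_info {
    std::string name;
    double finalGrade;
};

// Change this one line to switch containers,
// e.g. to: typedef std::vector<Student_info> student_collection;
typedef std::list<Student_info> student_collection;

// A student fails with a final grade below 60.
bool fgrade(const Student_info& s) {
    return s.finalGrade < 60;
}

// Move the failing students out of `students` and return them.
// Only operations shared by vector and list are used.
student_collection extract_fails(student_collection& students) {
    student_collection fail;
    student_collection::iterator iter = students.begin();
    while (iter != students.end()) {
        if (fgrade(*iter)) {
            fail.push_back(*iter);
            iter = students.erase(iter);  // erase returns the next valid iterator
        } else {
            ++iter;
        }
    }
    return fail;
}
```

Because `extract_fails` only relies on `begin`, `end`, `push_back` and `erase`, it compiles unchanged with either definition of `student_collection`.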

**5-5.** Write a function named `center(const vector<string>&)` that returns a picture in which all the lines of the original picture are padded out to their full width, and the padding is as evenly divided as possible between the left and right sides of the picture. What are the properties of pictures for which such a function is useful? How can you tell whether a given picture has those properties?

```
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> center(const std::vector<std::string>& picture) {
    // Find the width of the widest line.
    std::string::size_type maxlen = 0;
    for (std::vector<std::string>::const_iterator it = picture.begin(); it != picture.end(); ++it) {
        maxlen = std::max(maxlen, it->size());
    }

    std::vector<std::string> ret;
    for (std::vector<std::string>::const_iterator it = picture.begin(); it != picture.end(); ++it) {
        // Split the padding as evenly as possible; an odd leftover space goes on the right.
        std::string::size_type total = maxlen - it->size();
        std::string::size_type left = total / 2;
        ret.push_back(std::string(left, ' ') + *it + std::string(total - left, ' '));
    }

    return ret;
}

int main() {
    std::vector<std::string> testPicture;
    testPicture.push_back("*");
    testPicture.push_back("***");
    testPicture.push_back("*****");
    testPicture.push_back("*******");
    testPicture.push_back("*********");

    std::vector<std::string> centeredPicture = center(testPicture);
    for (std::vector<std::string>::const_iterator it = centeredPicture.begin(); it != centeredPicture.end(); ++it) {
        std::cout << *it << std::endl;
    }

    return 0;
}
```

Output:
```
    *    
   ***   
  *****  
 ******* 
*********
```


As the output above shows, `center` is useful for pictures in which at least some lines have different widths and need centering (such as the triangle shape).  One could also imagine centering text _before_ putting a border on it.  

One could programmatically tell whether a picture has these properties by checking that its lines are borderless and do not all have the same width.
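
The width half of that check could look something like the sketch below. The function name `has_uneven_widths` is hypothetical, and the border check is not attempted here.

```cpp
#include <string>
#include <vector>

// True if at least two lines of the picture differ in width,
// i.e. centering would actually change something.
bool has_uneven_widths(const std::vector<std::string>& picture) {
    for (std::vector<std::string>::size_type i = 1; i < picture.size(); ++i) {
        if (picture[i].size() != picture[0].size()) {
            return true;
        }
    }
    return false;  // empty, single-line, or uniformly wide picture
}
```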

**5-6.** Rewrite the `extract_fails` function from §5.1.1/77 so that instead of erasing each failing student from the input vector `students`, it copies the records for the passing students to the beginning of `students`, and then uses the `resize` function to remove the extra elements from the end of `students`. How does the performance of this version compare with the one in 5.1.1/77?

Implemented in `failing_students_copy_resize.cpp`.  
The performance, as timed with my 100,000 grades file using a `list`, is _slightly_ faster, but almost exactly the same (less than 1% speedup).
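
A minimal sketch of the copy-and-`resize` idea, assuming a simplified `Student_info` (just a name and a `finalGrade`) and a failing threshold of 60. It illustrates the technique the exercise describes and is not necessarily what `failing_students_copy_resize.cpp` contains.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the Ch4 Student_info (assumption: name + final grade only).
struct Student_info {
    std::string name;
    double finalGrade;
};

// A student fails with a final grade below 60.
bool fgrade(const Student_info& s) {
    return s.finalGrade < 60;
}

// Copy passing students toward the front of `students`, collect failing
// students in `fail`, then cut off the leftover tail with resize.
std::vector<Student_info> extract_fails(std::vector<Student_info>& students) {
    std::vector<Student_info> fail;
    std::vector<Student_info>::size_type kept = 0;  // number of passing records so far
    for (std::vector<Student_info>::size_type i = 0; i != students.size(); ++i) {
        if (fgrade(students[i])) {
            fail.push_back(students[i]);
        } else {
            students[kept] = students[i];  // never overwrites an unread record, since kept <= i
            ++kept;
        }
    }
    students.resize(kept);
    return fail;
}
```

Each record is copied at most once, so the whole pass is linear in the number of students, with no per-erase shifting.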

**5-7.** Given the implementation of `frame` in 5.8.1/93, and the following code fragment  
```
vector<string> v;
frame(v);
```
describe what happens in this call. In particular, trace through how both the `width` function and the `frame` function operate. Now, run this code. If the results differ from your expectations, first understand why your expectations and the program differ, and then change one to match the other.

This call will produce a `vector<string>` with two entries, both 4-character strings filled with asterisks.  

I tested by making this call and printing the results:
```
$ ./empty_character_pictures.out
****
****
```
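
To trace why, here is a reconstruction of `width` and `frame` along the lines of the book's versions (written from memory, so check it against 5.8.1/93). For an empty vector the loop in `width` never runs, so it returns 0; `frame` then builds a border of `0 + 4` asterisks, skips its own loop, and returns just the top and bottom borders.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Width of the widest line; 0 for an empty picture because the loop never runs.
std::string::size_type width(const std::vector<std::string>& v) {
    std::string::size_type maxlen = 0;
    for (std::vector<std::string>::size_type i = 0; i != v.size(); ++i) {
        maxlen = std::max(maxlen, v[i].size());
    }
    return maxlen;
}

// Surround the picture with a one-character border and one space of margin.
std::vector<std::string> frame(const std::vector<std::string>& v) {
    std::vector<std::string> ret;
    std::string::size_type maxlen = width(v);
    std::string border(maxlen + 4, '*');  // "****" when maxlen == 0

    ret.push_back(border);  // top border
    for (std::vector<std::string>::size_type i = 0; i != v.size(); ++i) {
        ret.push_back("* " + v[i] + std::string(maxlen - v[i].size(), ' ') + " *");
    }
    ret.push_back(border);  // bottom border
    return ret;
}
```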

**5-8.** In the `hcat` function from 5.8.3/95, what would happen if we defined `s` outside the scope of the `while`? Rewrite and execute the program to confirm your hypothesis.

If we defined `s` outside the scope of the `while`, and `right` was shorter than `left`, the output would be exactly the same.  However, if `left` was shorter than `right`, the returned list would contain duplicated strings from `right` for each of the lines where `i >= left.size() && i < right.size()`.  In other words, each line matching that criterion would contain all of the text in the previous line plus a set of spaces and the content from the next line in `right`.

**Not true!** After actually running the program, I realized I missed something:
```
libc++abi.dylib: terminating with uncaught exception of type std::length_error: basic_string
```

This is because the statement:
```
s += string(width1 - s.size(), ' ');
```
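will be given a huge size argument when `s` is longer than `width1`, which happens in this case for the reasons described above. The subtraction is done in the unsigned `string::size_type`, so the result cannot go negative; it wraps around to a very large value, and the `string` constructor throws `std::length_error`.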
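
The failure can be reproduced in isolation. In this sketch `pad_to` and `pad_to_safe` are hypothetical helper names; the first mirrors the padding statement, the second guards it. The exception relies on the wrapped-around length exceeding `max_size()`, which holds on common implementations.

```cpp
#include <stdexcept>
#include <string>

// Mirrors the padding statement from hcat: pads s with spaces up to width1.
// If s is already longer than width1, the unsigned subtraction wraps around
// to a huge value and the string constructor throws std::length_error.
std::string pad_to(std::string s, std::string::size_type width1) {
    s += std::string(width1 - s.size(), ' ');
    return s;
}

// Guarded variant: only pads when s is actually shorter than width1.
std::string pad_to_safe(std::string s, std::string::size_type width1) {
    if (s.size() < width1) {
        s += std::string(width1 - s.size(), ' ');
    }
    return s;
}
```

Calling `pad_to("abcdef", 3)` throws, while `pad_to_safe("abcdef", 3)` returns the string unchanged.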

**5-9.** Write a program to write the words in the input that do not contain any uppercase letters followed by the words that contain one or more uppercase letters.

Implemented in `case_separation.cpp`:
```
$ gcc case_separation.cpp -o case_separation.out  -lstdc++
$ ./case_separation.out
hi Bye dry Fye
hi
dry
Bye
Fye
```
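
One way to structure this, as a sketch rather than the contents of `case_separation.cpp`: the names `has_upper` and `separate_by_case` are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// True if the word contains at least one uppercase letter.
bool has_upper(const std::string& word) {
    for (std::string::size_type i = 0; i != word.size(); ++i) {
        if (std::isupper(static_cast<unsigned char>(word[i]))) {
            return true;
        }
    }
    return false;
}

// Words without uppercase letters first, then words with them.
// Input order is preserved within each group.
std::vector<std::string> separate_by_case(const std::vector<std::string>& words) {
    std::vector<std::string> lower;
    std::vector<std::string> upper;
    for (std::vector<std::string>::size_type i = 0; i != words.size(); ++i) {
        if (has_upper(words[i])) {
            upper.push_back(words[i]);
        } else {
            lower.push_back(words[i]);
        }
    }
    lower.insert(lower.end(), upper.begin(), upper.end());
    return lower;
}
```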

**5-10.** Palindromes are words that are spelled the same right to left as left to right. Write a program to find all the palindromes in a dictionary. Next, find the longest palindrome.

```
$ gcc palindromes.cpp -o palindromes.out  -lstdc++
$ cat /usr/share/dict/words | ./palindromes.out
All palindromes:

A
a
aa
aba
acca
adda
affa
aga
aha
ajaja
aka
ala
alala
alula
ama
amma
ana
anana
...
waw
wow
X
x
Y
y
yaray
yoy
Z
z

Longest palindrome:
deedeed
```
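
The core check can be sketched as below; `is_palindrome` and `longest_palindrome` are hypothetical names, and `palindromes.cpp` may do it differently. The comparison is case-sensitive, which is consistent with both `A` and `a` appearing in the output.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A word is a palindrome if it reads the same forwards and backwards.
// std::equal compares the word with its own reverse traversal.
bool is_palindrome(const std::string& s) {
    return std::equal(s.begin(), s.end(), s.rbegin());
}

// Longest palindrome among the words; the first one wins on ties.
// Returns an empty string if there is none.
std::string longest_palindrome(const std::vector<std::string>& words) {
    std::string longest;
    for (std::vector<std::string>::size_type i = 0; i != words.size(); ++i) {
        if (is_palindrome(words[i]) && words[i].size() > longest.size()) {
            longest = words[i];
        }
    }
    return longest;
}
```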

**5-11.** In text processing it is sometimes useful to know whether a word has any ascenders or descenders. Ascenders are the parts of lowercase letters that extend above the text line; in the English alphabet, the letters b, d, f, h, k, l, and t have ascenders. Similarly, the descenders are the parts of lowercase letters that descend below the line; in English, the letters g, j, p, q, and y have descenders. Write a program to determine whether a word has any ascenders or descenders. Extend that program to find the longest word in the dictionary that has neither ascenders nor descenders.

```
$ gcc no_ascenders_or_descenders.cpp -o no_ascenders_or_descenders.out  -lstdc++
$ cat /usr/share/dict/words | ./no_ascenders_or_descenders.out

Longest word with no ascenders or descenders:
overconsciousness
```
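
A possible shape for the check, using the letter lists from the exercise statement. The function names are hypothetical, and uppercase letters are simply treated as having neither ascenders nor descenders, which is an assumption of this sketch rather than something the exercise specifies.

```cpp
#include <string>
#include <vector>

// True if the word contains any lowercase letter with an ascender
// (b, d, f, h, k, l, t) or a descender (g, j, p, q, y).
bool has_ascender_or_descender(const std::string& word) {
    static const std::string marked = "bdfhkltgjpqy";
    return word.find_first_of(marked) != std::string::npos;
}

// Longest word with neither ascenders nor descenders; the first one wins on ties.
// Returns an empty string if no word qualifies.
std::string longest_plain_word(const std::vector<std::string>& words) {
    std::string longest;
    for (std::vector<std::string>::size_type i = 0; i != words.size(); ++i) {
        if (!has_ascender_or_descender(words[i]) && words[i].size() > longest.size()) {
            longest = words[i];
        }
    }
    return longest;
}
```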