# Chapter 3
## Working with batches of data

Because standard input does not yet work in C++ notebooks, the programs that read input live in `.cpp` files in the same folder as this notebook.

## Exercises

**3-0.** Compile, execute, and test the programs in this chapter.

Done in `chapter_3_code.cpp`.

**3-1.** Suppose we wish to find the median of a collection of values. Assume that we have read some of the values so far, and that we have no idea how many values remain to be read.  
Prove that we cannot afford to discard any of the values that we have read.  
_Hint:_ One proof strategy is to assume that we can discard a value, and then find values for the unread - and therefore unknown - part of our collection that would cause the median to be the value that we discarded.

We are asked to prove that there exists _at least one_ possible set of current and future values such that discarding _at least one_ value from the current set will give us an incorrect result for the median after reading _at least one more_ value.  If we can show one such valid case, then it is enough to prove we cannot afford to discard any values, since our program is intended to generalize to any valid input.  

Assume we have a sorted vector `V` of `n` elements. Assume that all `n` values are unique, and that `n` is odd and greater than `1`. Thus, the element at index `(n - 1) / 2` is the median (and is unique in the vector). Call this element `m`. Also assume that the two neighboring values, `x` and `y`, at indices `((n - 1) / 2) - 1` and `((n - 1) / 2) + 1` respectively, satisfy `(y + x) / 2 != m`.

If we discard the value `m`, then the length of `V`, `n - 1`, will be even, and the median will thus equal the average of the two values that previously surrounded `m`, namely `x` and `y`. Since `(y + x) / 2 != m`, the median of the vector has changed. Reading more values does not necessarily repair this: if the remaining input is one value smaller than `x` and one value larger than `y`, the true median is still `m`, while the collection without `m` still yields `(y + x) / 2`.  
QED
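
The hint's strategy can also be checked mechanically. The sketch below is not part of the original solution: `median` and `adversarialFuture` are hypothetical helper names, and it assumes all values read so far are distinct. For any value we might be tempted to discard, it builds unread input that makes exactly that value the median.

```cpp
#include <algorithm>
#include <vector>

// Median of a non-empty collection. Takes a copy so the caller's data keeps its order.
double median(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    std::vector<double>::size_type mid = values.size() / 2;
    return values.size() % 2 == 0 ? (values[mid] + values[mid - 1]) / 2
                                  : values[mid];
}

// Hypothetical helper: returns unread input that makes seen[discarded] the
// median of everything read. Assumes the values in seen are distinct.
std::vector<double> adversarialFuture(const std::vector<double>& seen,
                                      std::vector<double>::size_type discarded) {
    const double d = seen[discarded];
    std::vector<double>::size_type below = 0, above = 0;
    for (std::vector<double>::size_type i = 0; i < seen.size(); ++i) {
        if (seen[i] < d) ++below;
        else if (seen[i] > d) ++above;
    }

    // Pad the smaller side until d has as many values below it as above it.
    std::vector<double> future;
    for (; below < above; ++below) future.push_back(d - 1);
    for (; above < below; ++above) future.push_back(d + 1);
    return future;
}
```

With the values `3, 1, 4, 9, 7` read so far and `9` discarded, the helper returns four values of `10`. The full collection then has median `9`, while the collection without the discarded value yields `8.5`.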

**3-2.** Write a program to compute and print the quartiles (that is, the quarter of the numbers with the largest values, the next highest quarter, and so on) of a set of integers.

In [1]:
#include<vector>
#include<iostream>
#include<algorithm>

void computeQuartiles(std::vector<double> values) {
    std::vector<double>::size_type size = values.size();
    if (size == 0) {
        std::cout << "Empty values given" << std::endl;
        return;
    }

    std::sort(values.begin(), values.end());

    bool isEven = values.size() % 2 == 0;
    // Guard the lower index explicitly: size / 4 - 1 would wrap around for an unsigned size below 4.
    double firstQuartile = isEven ? (values[size / 4] + values[size / 4 == 0 ? 0 : size / 4 - 1]) / 2
                                  : values[size / 4];
    double secondQuartile = isEven ? (values[size / 2] + values[size / 2 - 1]) / 2
                                   : values[size / 2];
    double thirdQuartile = isEven ? (values[3 * size / 4] + values[3 * size / 4 - 1]) / 2
                                  : values[3 * size / 4];
    std::cout << "Quartiles: " << firstQuartile << ", " << secondQuartile << ", " << thirdQuartile << std::endl;
}

std::vector<double> v;
v.push_back(0);
v.push_back(10);
v.push_back(20);
v.push_back(30);
v.push_back(100);
v.push_back(90);
v.push_back(40);
v.push_back(50);
v.push_back(60);
v.push_back(70);
v.push_back(80);

computeQuartiles(v);

v.clear();
v.push_back(0);
computeQuartiles(v);

v.clear();
v.push_back(0);
v.push_back(1);
computeQuartiles(v);

Quartiles: 20, 50, 80
Quartiles: 0, 0, 0
Quartiles: 0, 0.5, 0.5


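
The exercise statement can also be read as asking for the four groups themselves rather than the cut points. A minimal sketch under that reading follows. `splitIntoQuarters` is a hypothetical name, and the rule for a count that is not a multiple of four is an assumption: group `k` covers sorted positions `k * n / 4` up to, but not including, `(k + 1) * n / 4`.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Returns four groups, largest values first. Takes a copy because it sorts.
std::vector<std::vector<int> > splitIntoQuarters(std::vector<int> values) {
    std::sort(values.begin(), values.end(), std::greater<int>());

    const std::vector<int>::size_type n = values.size();
    std::vector<std::vector<int> > quarters(4);
    for (std::vector<int>::size_type k = 0; k < 4; ++k) {
        // Assumed split rule: quarter k spans positions [k * n / 4, (k + 1) * n / 4).
        quarters[k].assign(values.begin() + k * n / 4,
                           values.begin() + (k + 1) * n / 4);
    }
    return quarters;
}
```

For eight distinct integers each group holds two values. For fewer than four values some groups are empty under this split rule.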


**3-3.** Write a program to count how many times each distinct word appears in its input.

Implemented in `distinct_words.cpp`.
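
The contents of `distinct_words.cpp` are not shown in this notebook, so the following is only a sketch of one possible approach using the sorting tools from this chapter: sort the words so that equal ones become adjacent, then count each run. `countWords` is a hypothetical name, and the words are assumed to have been read already (for example with `std::cin >> word` in a loop).

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Returns each distinct word paired with its number of occurrences, in sorted order.
std::vector<std::pair<std::string, int> > countWords(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());

    std::vector<std::pair<std::string, int> > counts;
    for (std::vector<std::string>::size_type i = 0; i < words.size(); ++i) {
        if (!counts.empty() && counts.back().first == words[i]) {
            ++counts.back().second;                         // same word as before: extend the run
        } else {
            counts.push_back(std::make_pair(words[i], 1));  // first occurrence of a new word
        }
    }
    return counts;
}
```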

**3-4.** Write a program to report the length of the longest and shortest `string` in its input.

Implemented in `longest_and_shortest_strings.cpp`.
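
As with the previous exercise, the file itself is not reproduced here. A minimal sketch of the core step, assuming the strings have already been read into a vector, might look like this. `lengthExtremes` is a hypothetical name, and reporting an empty input through a `false` return value is a design assumption.

```cpp
#include <string>
#include <vector>

// Stores the shortest and longest lengths found in words.
// Returns false, leaving both outputs untouched, when words is empty.
bool lengthExtremes(const std::vector<std::string>& words,
                    std::string::size_type& shortest,
                    std::string::size_type& longest) {
    if (words.empty()) return false;

    shortest = longest = words[0].size();
    for (std::vector<std::string>::size_type i = 1; i < words.size(); ++i) {
        if (words[i].size() < shortest) shortest = words[i].size();
        if (words[i].size() > longest) longest = words[i].size();
    }
    return true;
}
```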

**3-5.** Write a program that will keep track of grades for several students at once. The program could keep two `vectors` in sync: The first should hold the student's names, and the second the final grades that can be computed as input is read.  For now, you should assume a fixed number of homework grades.

Implemented in `multiple_students.cpp`.
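
The idea of keeping two vectors in sync can be sketched as follows. This is not the code from `multiple_students.cpp`: `addStudent` is a hypothetical name, and the weighting (20% midterm, 40% final exam, 40% homework mean) is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Appends one student to both vectors so that names[i] and finals[i]
// always describe the same student.
void addStudent(std::vector<std::string>& names, std::vector<double>& finals,
                const std::string& name, double midterm, double finalExam,
                const std::vector<double>& homework) {
    double homeworkTotal = 0;
    for (std::vector<double>::size_type i = 0; i < homework.size(); ++i) {
        homeworkTotal += homework[i];
    }
    // Avoid dividing by zero when no homework grades were entered.
    double homeworkMean = homework.empty() ? 0 : homeworkTotal / homework.size();

    names.push_back(name);
    // Assumed weighting: 20% midterm, 40% final exam, 40% homework mean.
    finals.push_back(0.2 * midterm + 0.4 * finalExam + 0.4 * homeworkMean);
}
```

Because both `push_back` calls happen in the same function, the two vectors always have the same length, and index `i` refers to the same student in each.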

**3-6.** The average-grade computation might divide by zero if the student didn't enter any grades. Division by zero is undefined in C++, which means that the implementation is permitted to do anything it likes. What does your C++ implementation do in this case?  
Rewrite the program so that its behavior does not depend on how the implementation treats division by zero.

In [2]:
1 / 0;

 1 / 0;
   ^ ~

(int) 0


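As seen above, the interpreter flags the division (the caret points at the `/`), and the `(int) 0` it reports is not a value to rely on, since the behavior is undefined.

Here's a mean calculation that avoids the divide-by-zero problem by returning `0` for an empty input: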

In [3]:
#include<vector>

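// Returns the mean of values, or 0 when there are no values to average.
double calculateMean(const std::vector<double>& values) {
    if (values.empty()) {
        return 0;
    }

    double runningTotal = 0;
    for (std::vector<double>::size_type i = 0; i < values.size(); i++) {
        runningTotal += values[i];
    }

    return runningTotal / values.size();
}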



In [4]:
#include<iostream>

std::vector<double> allGrades;
allGrades.push_back(90.0);
allGrades.push_back(80.0);
allGrades.push_back(50.0);
allGrades.push_back(50.0);

std::cout << calculateMean(allGrades) << std::endl;

67.5




In [5]:
allGrades.clear();

std::cout << calculateMean(allGrades) << std::endl;

0


