# Data Abstraction

As we consider the wide set of things in the world that we would like to represent in our programs, we find that most of them have compound structure. For example, a geographic position has latitude and longitude coordinates. To represent positions, we would like our programming language to have the capacity to couple together a latitude and longitude to form a pair, a compound data value that our programs can manipulate as a single conceptual unit, but which also has two parts that can be considered individually.

The use of compound data enables us to increase the modularity of our programs. If we can manipulate geographic positions as whole values, then we can shield parts of our program that compute using positions from the details of how those positions are represented. The general technique of isolating the parts of a program that deal with how data are represented from the parts that deal with how data are manipulated is a powerful design methodology called data abstraction. Data abstraction makes programs much easier to design, maintain, and modify.

Data abstraction is similar in character to functional abstraction. When we create a functional abstraction, the details of how a function is implemented can be suppressed, and the particular function itself can be replaced by any other function with the same overall behavior. In other words, we can make an abstraction that separates the way the function is used from the details of how the function is implemented. Analogously, data abstraction isolates how a compound data value is used from the details of how it is constructed.

The basic idea of data abstraction is to structure programs so that they operate on abstract data. That is, our programs should use data in such a way as to make as few assumptions about the data as possible. At the same time, a concrete data representation is defined as an independent part of the program.

These two parts of a program, the part that operates on abstract data and the part that defines a concrete representation, are connected by a small set of functions that implement abstract data in terms of the concrete representation. To illustrate this technique, we will consider how to design a set of functions for manipulating rational numbers.

## Example: Rational Numbers

A rational number is a ratio of integers, and rational numbers constitute an important sub-class of real numbers. A rational number such as $1/3$ or $17/29$ is typically written as:

`<numerator>/<denominator>`

where both the `<numerator>` and `<denominator>` are placeholders for integer values. Both parts are needed to exactly characterize the value of the rational number. Actually dividing the integers with `/` produces a float approximation, so the exact value given by the pair of integers is lost.

In [None]:
1/3

In [None]:
1/3 == 0.333333333333333300000

However, we can create an exact representation for rational numbers by combining together the numerator and denominator.
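The contrast can be made concrete with a small sketch. It compares a float computation with the same computation carried out on integer numerators and denominators. The variable names below are illustrative assumptions and are not yet the representation developed in the rest of this section.

```python
# Float arithmetic: each quotient is rounded to a nearby representable value,
# so the sum of the approximations need not equal the approximation of the sum.
float_sum = 1 / 10 + 2 / 10
print(float_sum == 3 / 10)  # False

# Exact arithmetic on integer numerators and denominators:
# a/b + c/d = (a*d + c*b) / (b*d)
sum_numer = 1 * 10 + 2 * 10  # 30
sum_denom = 10 * 10          # 100

# 30/100 equals 3/10 exactly when the cross products agree.
exact_equal = sum_numer * 10 == 3 * sum_denom
print(exact_equal)  # True
```

Because only integer addition and multiplication are involved in the second half, no rounding can occur.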

We know from using functional abstractions that we can start programming productively before we have an implementation of some parts of our program. Let us begin by assuming that we already have a way of constructing a rational number from a numerator and a denominator. We also assume that, given a rational number, we have a way of selecting its numerator and its denominator component. Let us further assume that the constructor and selectors are available as the following three functions:

- `rational(n, d)` returns the rational number with numerator `n` and denominator `d`.
- `numer(x)` returns the numerator of the rational number `x`.
- `denom(x)` returns the denominator of the rational number `x`.

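We are using here a powerful strategy for designing programs: *wishful thinking*. We haven't yet said how a rational number is represented, or how the functions `numer`, `denom`, and `rational` should be implemented. Even so, if we did define these three functions, we could then add, multiply, print, and test equality of rational numbers. The cell below supplies simple tuple-based definitions of the three functions so that the code runs; the operations that follow them rely only on the constructor and selectors, not on the tuple itself: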

In [None]:
def rational(n, d):
    return (n, d)


def numer(x):
    return x[0]


def denom(x):
    return x[1]


def add_rational(x, y):
    nx, dx = numer(x), denom(x)
    ny, dy = numer(y), denom(y)
    return rational(nx * dy + ny * dx, dx * dy)


def mul_rational(x, y):
    return rational(numer(x) * numer(y), denom(x) * denom(y))


def print_rational(x):
    print(numer(x), "/", denom(x))


def rationals_are_equal(x, y):
    return numer(x) * denom(y) == numer(y) * denom(x)

In [None]:
half = rational(1, 2)
print_rational(half)

In [None]:
third = rational(1, 3)

In [None]:
print_rational(mul_rational(half, third))

In [None]:
print_rational(add_rational(third, third))

As the example above shows, our rational number implementation does not reduce rational numbers to lowest terms. We can remedy this flaw by changing the implementation of `rational`. If we have a function for computing the greatest common divisor of two integers, we can use it to reduce the numerator and the denominator to lowest terms before constructing the pair. As with many useful tools, such a function already exists in the Python standard library.

In [None]:
from math import gcd


def rational(n, d):
    # Divide out the greatest common divisor so the pair is in lowest terms.
    g = gcd(n, d)
    return (n // g, d // g)

In [None]:
print_rational(add_rational(third, third))

This improvement was accomplished by changing the constructor without changing any of the functions that implement the actual arithmetic operations.
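One way to check this claim is to run both constructors through the same unchanged operation. The sketch below is self-contained; the names `rational_unreduced` and `rational_reduced` are hypothetical, introduced only so that both versions of the constructor can coexist in one cell.

```python
from math import gcd


def rational_unreduced(n, d):
    # First version of the constructor: store the pair as given.
    return (n, d)


def rational_reduced(n, d):
    # Second version: divide out the greatest common divisor first.
    g = gcd(n, d)
    return (n // g, d // g)


def numer(x):
    return x[0]


def denom(x):
    return x[1]


def rationals_are_equal(x, y):
    # Cross-multiplication works whether or not the inputs are reduced.
    return numer(x) * denom(y) == numer(y) * denom(x)


a = rational_unreduced(6, 9)
b = rational_reduced(6, 9)
print(a, b)                       # (6, 9) (2, 3)
print(rationals_are_equal(a, b))  # True
```

The two constructors store different pairs, yet the equality test treats them as the same rational number without any modification.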

## Abstraction Barriers

Before continuing with more examples of compound data and data abstraction, let us consider some of the issues raised by the rational number example. We defined operations in terms of a constructor `rational` and selectors `numer` and `denom`. In general, the underlying idea of data abstraction is to identify a basic set of operations in terms of which all manipulations of values of some kind will be expressed, and then to use only those operations in manipulating the data. By restricting the use of operations in this way, it is much easier to change the representation of abstract data without changing the behavior of a program.
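As a minimal sketch of what changing the representation might look like, the cell below stores a rational number in a dictionary instead of a tuple. The dictionary layout and its key names are assumptions made purely for illustration; `add_rational` and `rationals_are_equal` are copied unchanged from the earlier definitions.

```python
def rational(n, d):
    # Assumed alternative representation: a dict rather than a tuple.
    return {"numer": n, "denom": d}


def numer(x):
    return x["numer"]


def denom(x):
    return x["denom"]


# The two functions below are identical to the tuple-based versions.
def add_rational(x, y):
    nx, dx = numer(x), denom(x)
    ny, dy = numer(y), denom(y)
    return rational(nx * dy + ny * dx, dx * dy)


def rationals_are_equal(x, y):
    return numer(x) * denom(y) == numer(y) * denom(x)


total = add_rational(rational(1, 2), rational(1, 3))
print(numer(total), "/", denom(total))               # 5 / 6
print(rationals_are_equal(total, rational(10, 12)))  # True
```

Only the constructor and the two selectors had to be rewritten; every function built on top of them keeps working as before.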

For rational numbers, different parts of the program manipulate rational numbers using different operations, as described in this table:

<table>
<tr>
    <th>Parts of the program that...</th> 
    <th>Treat rationals as...</th>
    <th>Using only...</th>
</tr>

<tr>
    <td>Use rational numbers to perform computation</td>
    <td>whole data values</td>
    <td>add_rational, mul_rational, rationals_are_equal, print_rational</td>
</tr>

<tr>
    <td>Create rationals or implement rational operations</td>
    <td>numerators and denominators</td>
    <td>rational, numer, denom</td>
</tr>

<tr>
    <td>Implement selectors and constructor for rationals</td>
    <td>two-element tuples</td>
    <td>tuple literals and element selection</td>
</tr>
</table>

In each layer above, the functions in the final column enforce an abstraction barrier. These functions are called by a higher level and implemented using a lower level of abstraction.

An abstraction barrier violation occurs whenever a part of the program that can use a higher level function instead uses a function in a lower level. For example, a function that computes the square of a rational number is best implemented in terms of `mul_rational`, which does not assume anything about the implementation of a rational number.

In [None]:
def square_rational(x):
    return mul_rational(x, x)

Referring directly to numerators and denominators would violate one abstraction barrier.

Assuming that rationals are represented as two-element tuples would violate two abstraction barriers.

In [None]:
def square_rational_violating_once(x):
    return rational(numer(x) * numer(x), denom(x) * denom(x))

def square_rational_violating_twice(x):
    return (x[0] * x[0], x[1] * x[1])

Abstraction barriers make programs easier to maintain and to modify. The fewer functions that depend on a particular representation, the fewer changes are required when one wants to change that representation. All of these implementations of `square_rational` have the correct behavior, but only the first is robust to future changes. The `square_rational` function would not require updating even if we altered the representation of rational numbers. By contrast, `square_rational_violating_once` would need to be changed whenever the selector or constructor signatures changed, and `square_rational_violating_twice` would require updating whenever the implementation of rational numbers changed.
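To see the difference in robustness, consider a sketch in which the representation is replaced by something that is not indexable at all. The closure-based representation below is a hypothetical choice made only to provoke the failure; the variable names `two_thirds`, `squared`, and `broke` are likewise illustrative.

```python
def rational(n, d):
    # Hypothetical new representation: a function that answers "n" or "d".
    def select(name):
        return n if name == "n" else d
    return select


def numer(x):
    return x("n")


def denom(x):
    return x("d")


def mul_rational(x, y):
    return rational(numer(x) * numer(y), denom(x) * denom(y))


def square_rational(x):
    # Respects every barrier: relies only on mul_rational.
    return mul_rational(x, x)


def square_rational_violating_twice(x):
    # Assumes a rational is an indexable pair.
    return (x[0] * x[0], x[1] * x[1])


two_thirds = rational(2, 3)
squared = square_rational(two_thirds)
print(numer(squared), "/", denom(squared))  # 4 / 9

try:
    square_rational_violating_twice(two_thirds)
    broke = False
except TypeError:
    # A function object does not support indexing.
    broke = True
print(broke)  # True
```

Here `square_rational` needs no edits, while the version that indexes into the representation fails as soon as the representation stops being a pair. In this particular sketch `square_rational_violating_once` would also keep working, because the constructor and selector signatures are unchanged.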