
# Lecture IV: Pointers and Arrays




- A **pointer** is a variable that contains the **address of a variable**. 
- Pointers and arrays are closely related.
- *Pointers have been lumped with the goto statement as a marvelous way to create impossible-to-understand programs*. This is certainly true when they are used carelessly, and it is easy to create pointers that point somewhere unexpected.
- Pointers are what distinguish the C language and make it a **low-level programming language**.
- With pointers, *memory management and code efficiency are in your hands*.
- The C language **assumes** that you know what you are doing. With pointers you can break your code, and in some cases even the whole machine, in a very large number of ways, some of them unrecoverable. So **please** know what you are doing: the compiler will *not* help you.





# Pointers and Addresses


- Computer RAM: addresses and contents
- Every variable has two properties: its memory address and its content
- The content of a pointer is the memory address of another variable

![image.png](attachment:image.png)

The **unary operator &** gives the address of an object, so the statement

<code>
p = &c;
</code>

assigns the *address* of c to the *variable* p, and p is said to "point to" c. 

- The **& operator** only applies to objects in memory: variables and array elements.
- It cannot be applied to expressions, constants, or register variables.
- The **unary operator \*** is the **indirection** or **dereferencing** operator; when applied to a pointer, it accesses the object the pointer points to (that is, the content of the variable whose memory address is stored in p).

Example:
<code>
    int x = 1, y = 2, z[10];
    int * ip; /* ip is a pointer to int */
    ip = &x; /* ip now points to x */
    y = *ip; /* y is now 1 */
    *ip = 0; /* x is now 0 */
    ip = &z[0]; /* ip now points to z[0] */
</code>


The declarations of x, y, and z are what we've seen all along. The declaration
of the pointer ip,
<code>
    int *ip;
 </code>
is intended as a mnemonic; it says that the expression 
<code>
    *ip
</code>
 is an int. 

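**Pointers in declarations:** the syntax of a declaration mimics the expressions in which the variable may appear, and this also applies to function declarations. For example,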
<code>
    double *dp, atof(char *);
</code>
says that in an expression <code> *dp </code> and <code>atof(s)</code> have values of type double, and
that the argument of atof is a pointer to char.

- A pointer is constrained to point to a particular kind of object: every pointer points to a specific data type.
- Exception: a "pointer to void" (void *) can hold any type of pointer, but it cannot itself be dereferenced.

If ip points to the integer x, then <code>*ip</code> can occur in any context where x
could, so

<code>
    *ip = *ip + 10
</code>
increments <code>*ip</code>  by 10.
The unary operators * and & *bind more tightly* than arithmetic operators, so
the assignment
<code>
    y = *ip + 1
</code>
takes whatever ip points at, adds 1, and assigns the result to y, while
<code>
    *ip += 1
</code>
increments what ip points to, as do
<code>
    ++*ip
</code>
and
<code>
    (*ip)++
</code>
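- Pay attention to operator precedence: the parentheses are necessary in <code>(*ip)++</code>. Without them, the expression would increment ip instead of what it points to, because unary operators like * and ++ associate right to left.
- Pointers can also be used without dereferencing. If iq is another pointer to int,
<code>
    iq = ip
</code>
copies the contents of ip into iq, thus making iq point to whatever ip pointed to.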

# Pointers and Function Arguments

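- Since C passes arguments to functions by value, formal parameters are *local* copies, and the called function cannot directly alter a variable in the calling function. For example, the call

<code>
    swap(a, b);
    </code>
where the swap function is defined as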
<code>
    void swap(int x, int y) /* WRONG */
    {
        int temp;
        temp = x;
        x = y;
        y = temp;
    }
    </code>
is **wrong**.
Because of call by value, swap can't affect the arguments a and b in the routine
that called it. The function above only swaps copies of a and b.

Using pointers:
<code>
    swap(&a, &b)
    </code>
- &a is a pointer to a. 
- In swap itself, the parameters are declared to be pointers, and the operands are accessed indirectly through them.
<code>
    void swap(int *px, int *py) /* interchange *px and *py */
    {
        int temp;
        temp = *px;
        *px = *py;
        *py = temp;
    }
</code>


**Pointer arguments enable a function to access and change objects in the function that called it.**

*(Could you obtain the same result in another way?)*

As an example, consider a function getint that performs free-format input conversion by breaking a stream of characters into integer values, one integer per call. getint has to return the value it found and also signal end of file when there is no more input. These values have to be passed back by separate paths, for no matter what value is used for EOF, that could also be the value of an input integer.

One solution is to have getint return the end of file status as its function value, while using a pointer argument to store the converted integer back in the calling function. This is the scheme used by scanf as well; see Section 7.4.


The following loop fills an array with integers by calls to getint:
<code>
    int n, array[SIZE], getint(int *);
    for (n = 0; n < SIZE && getint(&array[n]) != EOF; n++)
        ;
     </code>
    
Each call sets <code>array[n]</code> to the next integer found in the input and increments
n. Notice that it is essential to pass the address of array[n] to getint.
Otherwise there is no way for getint to communicate the converted integer
back to the caller.
Our version of getint returns EOF for end of file, zero if the next input is
not a number, and a positive value if the input contains a valid number.   


~~~c
#include <ctype.h>
#include <stdio.h>   /* for EOF */

int getch(void);
void ungetch(int);

/* getint: get next integer from input into *pn */
int getint(int *pn)
{
    int c, sign;

    while (isspace(c = getch()))   /* skip white space */
        ;
    if (!isdigit(c) && c != EOF && c != '+' && c != '-')
    {
        ungetch(c);                /* it's not a number */
        return 0;
    }
    sign = (c == '-') ? -1 : 1;
    if (c == '+' || c == '-')      /* skip the sign character */
        c = getch();

    for (*pn = 0; isdigit(c); c = getch())
        *pn = 10 * *pn + (c - '0');
    *pn *= sign;

    if (c != EOF)
        ungetch(c);                /* push back the character after the number */

    return c;
}
~~~

Throughout getint, <code>*pn</code> is used as an ordinary int variable. We have also used getch and ungetch (described in Section 4.3) so that the one extra character that must be read can be pushed back onto the input.
    
Exercise 5-1. As written, getint treats a + or - not followed by a digit as a
valid representation of zero. Fix it to push such a character back on the input.

**Supplementary exercise**. Write a program that finds the solutions of ax^2+bx+c=0, where:
- a, b, c must be declared in main;
- their values must be read in a *function*. The I/O instruction for reading a float is
<code>
    scanf("%f", &variable);
    </code> 
(why is an address used?)
- the solutions must be computed in a separate *function*. Note that there can be TWO solutions.
- the solutions must be printed in main (<code>printf("%f\n", variable);</code> why is the value used here and not the address?)

**Exercise 5-2.** Write getfloat, the floating-point analog of getint. What
type does getfloat return as its function value?


#  Pointers and Arrays

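- There is a strong relationship between pointers and arrays: any operation that can be achieved by array subscripting can also be done with pointers.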

The declaration
<code>
    int a[10];
    </code>
defines an array a of size 10, that is, a block of 10 consecutive objects named
a[0], a[1], ..., a[9].


The notation a[i] refers to the i-th element of the array. If pa is a pointer to
an integer, declared as
<code>
    int *pa;
    </code>
then the assignment
<code>
    pa = &a[0];
</code>
sets pa to point to element zero of a; that is, pa contains the address of a[0].

![image.png](attachment:image.png)

Now the assignment
<code>
    x = *pa;
    </code>
    
will copy the contents of a[0] into x.

If <code>pa</code> points to a particular element of an array, then by definition <code>pa+1</code> points to the next element, <code>pa+i</code> points i elements after pa, and <code>pa-i</code> points i elements before. Thus, if pa points to <code>a[0]</code>,
<code>
    *(pa+1)
</code>
refers to the contents of a[1], pa+i is the address of a[i], and
<code>
    *(pa+i)
</code>
is the contents of a[i].


- Pointer arithmetic: incrementing a pointer adds to the memory address it contains the appropriate number of bytes, depending on the pointer type (sizeof(int) bytes for an int *, for example).
- The correspondence between indexing and pointer arithmetic is very close.
- By definition, the value of a variable or expression of type array is the address of element zero of the array. Thus after the assignment

<code>
    pa = &a[0];
</code>

- pa and a have **identical values**. 
- The name of an array is a synonym for the location of the initial element, so the assignment <code>pa=&a[0]</code> can also be written as
<code>
    pa = a;
    </code>

*It means exactly the same thing and it is compiled to the same machine code!*

- A reference to a[i] can also be written as <code>*(a+i)</code>. 
- An array-and-index expression is equivalent to one written as a pointer and offset.

In evaluating a[i], C converts it to <code>*(a+i)</code> immediately; the two forms are equivalent. Applying the operator & to both parts of this equivalence, it follows that &a[i] and a+i are also identical: a+i is the address of the i-th element beyond a. As the other side of this coin, if pa is a pointer, expressions may use it with a subscript; pa[i] is identical to <code>*(pa+i)</code>. 


- A pointer is a variable, so pa=a and pa++ are legal. 
- But an array name is *not* a variable; constructions like a=pa and a++ are illegal.
- When an array name is passed to a function, what is passed is the location of the initial element. 

Within the called function, this argument is a local variable, and so an **array name parameter is a pointer not an array name**, that is, a variable containing an address. We can use this fact to write another version of strlen, which computes the length of a string.

Note also that when you pass an array to a function:
<code>
    ...
    int a[7]={0,1,2,3,4,5,6};
    x=f(a);
    ...    
    double f(int a[])
        {
        double x;
        a[5]=15;
        x=a[0]+a[5];
        return x;
        }
    </code>
the content of the array can be **changed** inside the function (because what the function actually receives is a pointer to the first element, not a copy of the array).

The result of the code fragment above is that x is set to 15.0 **and** after the call, a[5] is also set to 15.

~~~c
/* strlen: return length of string s */
int strlen(char *s)
{
    int n;

    for (n = 0; *s != '\0'; s++)
        n++;
    return n;
}
~~~

Since s is a pointer, incrementing it is perfectly legal; s++ has no effect on the character string in the function that called strlen, but merely increments
strlen's private copy of the pointer. That means that calls like
<code>
    strlen("hello, world"); /* string constant */
    strlen(array);          /* char array[100]; */
    strlen(ptr);            /* char *ptr; */
</code>
all work.

As formal parameters in a function definition,
<code>
    char s[]
    </code>
and
<code>
    char * s
    </code>
 **are equivalent**
 
It is possible to pass part of an array to a function, by passing a pointer to the beginning of the subarray. For example, if a is an array,
<code>
    f(&a[2])
    </code>
and
<code>
    f(a+2)
</code>       
both pass to the function f the address of the subarray that starts at a[2].
Within f, the parameter declaration can read
<code>
    f(int arr[])
         { ... }
</code>
or
<code>
    f(int * arr)
         { ... }
</code>
So as far as f is concerned, the fact that the parameter refers to part of a larger array is of no consequence.
If one is sure that the elements exist, *it is also possible to index backwards in an array*: p[-1], p[-2], and so on are syntactically legal, and refer to the elements that immediately precede p[0]. Of course, it is illegal to refer to objects that are not within the array bounds.

**BEWARE. This will *not* always result in an error if the memory location is accessible. Another really nasty bug.**
 
   

# Address Arithmetic

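- p++ increments p to point to the next element
- p-- decrements it to point to the previous element
- p += i makes p point i elements beyond the one it currently points to. The compiler does **not** check that the result is still inside an array: it is just an address **in RAM**, and using it is only defined within the same array (or one past its end).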



Let us illustrate by writing a rudimentary storage allocator. There are two routines. The first, alloc(n), returns a pointer p to n consecutive character positions, which can be used by the caller of alloc for storing characters. The second, afree(p), releases the storage thus acquired so it can be re-used later. The routines are "rudimentary" because the calls to afree must be made in the opposite order to the calls made on alloc. That is, **the storage managed by alloc and afree is a stack, or last-in, first-out list.** 

The standard library provides analogous functions called malloc and free that have no
such restrictions; in Section 8.7 we will show how they can be implemented.

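**BEWARE: K&R make this sound easier than it is. malloc and free do not require calls in last-in, first-out order, but real implementations are considerably more complex than alloc/afree: they typically manage the heap with free lists or similar structures, and freeing blocks in arbitrary order can fragment memory.**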


- The easiest implementation is to have alloc hand out pieces of a large **character array** (bytes!) that we will call allocbuf. 
- This array is private to alloc and afree. 
- Since they deal in pointers, not array indices, no other routine need know the name of the array, which can be declared *static* in the source file containing alloc and afree, and thus be invisible outside it. 

The other information needed is how much of allocbuf has been used. We use a pointer, called allocp, that points to the next free element. When alloc is asked for n characters, it checks to see if there is enough room left in allocbuf. If so, alloc returns the current value of allocp (i.e., the beginning of the free block), then increments it by n to point to the next free area. If there is no room, alloc returns zero. afree (p) merely sets allocp to p if p is inside allocbuf.

before call to alloc:
![cap4image1.png](attachment:cap4image1.png)

after call to alloc:
![cap4image2.png](attachment:cap4image2.png)



~~~c
#define ALLOCSIZE 10000 /* size of available space */

static char allocbuf[ALLOCSIZE]; /* storage for alloc */
static char *allocp = allocbuf;  /* next free position */

char *alloc(int n) /* return pointer to n characters */
{
    if (allocbuf + ALLOCSIZE - allocp >= n)  /* it fits */
    {
        allocp += n;
        return allocp - n; /* old p */
    }
    else /* not enough room */
        return 0;
}

void afree(char *p) /* free storage pointed to by p */
{
    if (p >= allocbuf && p < allocbuf + ALLOCSIZE)
        allocp = p;
}
~~~


In general a pointer can be initialized just as any other variable can, though normally the only meaningful values are zero or an expression involving the addresses of previously defined data of appropriate type. The declaration
<code>
    static char *allocp = allocbuf;
</code>

defines allocp to be a character pointer and initializes it to point to the beginning of allocbuf, which is the next free position when the program starts.

The test
<code>
    if (allocbuf + ALLOCSIZE - allocp >= n) { /* it fits */
</code>
checks if there's enough room to satisfy a request for n characters. If there is, the new value of allocp would be at most one beyond the end of allocbuf.

If the request can be satisfied, alloc returns a pointer to the beginning of a block of characters (notice the declaration of the function itself). If not, alloc must return some signal that no space is left. **C guarantees that zero is never a valid address for data, so a return value of zero can be used to signal an abnormal event**, in this case, no space.

Pointers and integers are not interchangeable. Zero is the sole exception: the
constant zero may be assigned to a pointer, and a pointer may be compared
with the constant zero. **The symbolic constant NULL is often used in place of
zero,** as a mnemonic to indicate more clearly that this is a special value for a
pointer. NULL is defined in <stdio.h>. We will use NULL henceforth.
Tests like
<code>
    if (allocbuf + ALLOCSIZE - allocp >= n) { /* it fits */
</code>
and
<code>
    if (p >= allocbuf && p < allocbuf + ALLOCSIZE)
     </code>
show several important facets of pointer arithmetic. First, pointers may be compared under certain circumstances. If p and q point to members of the same array, then relations like ==, !=, <, >=, etc., work properly. For example,
<code>
    p < q
</code>
is true if p points to an earlier member of the array than q does. Any pointer can be meaningfully compared for equality or inequality with zero. But the behavior is undefined for arithmetic or comparisons with pointers that do not point to members of the same array. (There is one exception: the address of the first element past the end of an array can be used in pointer arithmetic.)
        
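**BEWARE. In practice most compilers accept relational comparisons (<, >, <=, >=) between any two pointers of the same type, but the C standard still leaves them undefined for pointers into different arrays; only == and != are always well defined. And even when such a comparison compiles and runs, it is not meaningful for data not belonging to the same array.**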
        
Second, we have already observed that a pointer and an integer may be added or subtracted. The construction
<code>
    p + n
</code>
means the address of the n-th object beyond the one p currently points to. This is true regardless of the kind of object p points to; n is scaled according to the size of the objects p points to, which is determined by the declaration of p. If an int is four bytes, for example, n will be scaled by four.

Pointer subtraction is also valid: if p and q point to elements of the same array, and p<q, then q-p+1 is the number of elements from p to q inclusive. This fact can be used to write yet another version of strlen:


~~~c
/* strlen: return length of string s */
int strlen(char *s)
{
    char *p = s;

    while (*p != '\0')
        p++;
    return p - s;
}
~~~

p-s gives the number of characters advanced over, that is, the string length. 

**Pointer arithmetic is consistent:** if we had been dealing with floats, which occupy more storage than chars, and if p were a pointer to float, p++ would advance to the next float. Thus we could write another version of alloc that maintains floats instead of chars, merely by changing char to float throughout alloc and afree. All the pointer manipulations automatically take into account the size of the object pointed to.

The **valid pointer operations** are 
- assignment of pointers of the same type, 
- adding or subtracting a pointer and an integer, 
- subtracting or comparing two pointers to members of the same array, 
- assigning or comparing to zero. 

All other pointer arithmetic is illegal. It is not legal to add two pointers, or to multiply or divide or shift or mask them, or to add float or double to them, or even, except for <code>void *</code>, to assign a pointer of one type to a pointer of another type without a cast.


# Character Pointers and Functions

A string constant, written as
~~~text
"I am a string"
~~~
is an array of characters. 

In the internal representation, the array is **terminated with the null character '\0'** so that programs can find the end. 
The length in storage is thus one more than the number of characters between the double quotes.
Perhaps the most common occurrence of string constants is as arguments to functions, as in
<code>
    printf("hello, world\n");
</code>
When a character string like this appears in a program, access to it is through a **character pointer**; printf receives a pointer to the beginning of the character array. That is, a string constant is accessed by a pointer to its first element.
String constants need not be function arguments. If pmessage is declared
as
<code>
    char * pmessage;
</code>
then the statement
<code>
    pmessage = "now is the time";
</code>
assigns to pmessage **a pointer to the character array.** 
*Warning: this is a read-only string constant, typically stored inside the executable itself!*

This is not a string copy; only pointers are involved. C does not provide any operators for processing an entire string of characters as a unit.

There is an important difference between these definitions:
<code>
    char amessage[] = "now is the time"; /* an array */
    char *pmessage = "now is the time"; /* a pointer */
    </code>
amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. 

**On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents.**

<code>
    *(pmessage+4) = 'I';
    </code>
is undefined behavior: on most modern systems it fails with a run-time error (access violation, or segmentation fault), and you can **NOT** rely on it changing the string to
~~~text
    "now Is the time"
~~~

We will illustrate more aspects of pointers and arrays by studying versions of two useful functions adapted from the standard library. The first function is strcpy(s, t), which copies the string t to the string s. It would be nice just to say s=t but this copies the pointer, not the characters. To copy the characters, we need a loop. The array version is first:

<code>
/* strcpy: copy t to s; array subscript version */
    void strcpy(char *s, char *t)
    {
        int i;
        i = 0;
        while ((s[i] = t[i]) != '\0')
            i++; 
    }
</code>

For contrast, here is a **version of strcpy with pointers**:

<code>
    /* strcpy: copy t to s; pointer version 1 */
    void strcpy(char *s, char *t)
    {
        while((*s = *t) != '\0') 
        {
            s++;
            t++;
        }
    }
    </code>
Because arguments are passed by value, strcpy can use the parameters s and t in any way it pleases. Here they are conveniently initialized pointers, which are marched along the arrays a character at a time, until the '\0' that terminates t has been copied to s.

In practice, strcpy would not be written as we showed it above. Experienced C programmers would prefer
    

~~~c
/* strcpy: copy t to s; pointer version 2 */
void strcpy(char *s, char *t)
{
    while ((*s++ = *t++) != '\0')
        ;
}
~~~


This moves the increment of s and t into the test part of the loop. The value of <code>*t++</code> is the character that t pointed to before t was incremented; the postfix ++ doesn't change t until after this character has been fetched. In the same way, the character is stored into the old s position before s is incremented.

This character is also the value that is compared against '\0' to control the loop. The net effect is that characters are copied from t to s, up to and including the terminating '\0'.

As the final abbreviation, observe that a comparison against '\0' is redundant, since the question is merely whether the expression is zero. So the function would likely be written as


~~~c
/* strcpy: copy t to s; pointer version 3 */
void strcpy(char *s, char *t)
{
    while (*s++ = *t++)
        ;
}
~~~


Although this may seem cryptic at first sight, the notational convenience is considerable, and the idiom should be mastered, because you will see it frequently in C programs.

**WARNING: K&R are absolutely right here. You will find a lot of code lines written as above!**

The strcpy in the standard library <string.h> returns the target string as its function value.

The second routine that we will examine is strcmp(s, t), which compares the character strings s and t, and returns negative, zero or positive if s is lexicographically less than, equal to, or greater than t. The value is obtained by subtracting the characters at the first position where s and t disagree.


~~~c
/* strcmp: return <0 if s<t, 0 if s==t, >0 if s>t */
int strcmp(char *s, char *t)
{
    int i;

    for (i = 0; s[i] == t[i]; i++)
        if (s[i] == '\0')
            return 0;

    return s[i] - t[i];
}
~~~

The pointer version of strcmp:

~~~c
/* strcmp: return <0 if s<t, 0 if s==t, >0 if s>t */
int strcmp(char *s, char *t)
{
    for ( ; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return *s - *t;
}
~~~


Since ++ and -- are either prefix or postfix operators, other combinations of * and ++ and -- occur, although less frequently. For example, 

<code>
    *--p
    </code>
decrements p before fetching the character that p points to. In fact, the pair of expressions
<code>
    *p++ = val; /* push val onto stack */
    val = *--p; /* pop top of stack into val */
</code>
are the standard idioms for pushing and popping a stack.

The header <string.h> contains declarations for the functions mentioned
in this section, plus a variety of other string-handling functions from the standard library.

**Exercise 5-3**. Write a pointer version of the function strcat that we showed in Chapter 2: strcat(s, t) copies the string t to the end of s. *You could also write this one from scratch...*

**Exercise 5-4**. Write the function strend(s, t), which returns 1 if the string t occurs at the end of the string s, and zero otherwise. 

**Exercise 5-5**. Write versions of the library functions strncpy, strncat, and strncmp, which operate on at most the first n characters of their argument strings. For example, strncpy(s, t, n) copies at most n characters of t to s.

**Exercise 5-6**. Rewrite appropriate programs from earlier chapters and exercises with pointers instead of array indexing. Good possibilities include getline (Chapters 1 and 4), atoi, itoa, and their variants (Chapters 2, 3, and 4), reverse (Chapter 3), and strindex and getop (Chapter 4). 


# Pointer Arrays; Pointers to Pointers

- pointers are themselves variables and can be organized in arrays

In Chapter 3 we presented a Shell sort function that would sort an array of integers, and in Chapter 4 we improved on it with a quicksort. 
The same algorithms will work, except that **now we have to deal with lines of text**, which are of different lengths, and which, unlike integers, can't be compared or moved in a single operation. We need a data representation that will cope efficiently and conveniently with variable-length text lines.

This is where the array of pointers enters. If the lines to be sorted are stored end-to-end in one long character array, then each line can be accessed by a pointer to its first character. The pointers themselves can be stored in an array.

Two lines can be compared by passing their pointers to strcmp. When two out-of-order lines have to be exchanged, the pointers in the pointer array are exchanged, not the text lines themselves.

![image.png](attachment:image.png)


The sorting process has three steps:

~~~text
read all the lines of input
sort them
print them in order
~~~

As usual, it's best to divide the program into functions that match this natural division, with the main routine controlling the other functions. Let us defer the sorting step for a moment, and concentrate on the data structure and the input and output.

The input routine has to collect and save the characters of each line, and build an array of pointers to the lines. It will also have to count the number of input lines, since that information is needed for sorting and printing. Since the input function can only cope with a finite number of input lines, it can return some illegal line count like -1 if too much input is presented. 

The output routine only has to print the lines in the order in which they appear in the array of pointers.

~~~c
#include <stdio.h>
#include <string.h>

#define MAXLINES 5000     /* max lines to be sorted */
char *lineptr[MAXLINES];  /* pointers to text lines */
                          /* note that this is a GLOBAL variable */

int readlines(char *lineptr[], int nlines);
void writelines(char *lineptr[], int nlines);
void qsort(char *lineptr[], int left, int right);

/* sort input lines */
int main(void)
{
    int nlines; /* number of input lines read */

    if ((nlines = readlines(lineptr, MAXLINES)) >= 0)
    {
        qsort(lineptr, 0, nlines-1);
        writelines(lineptr, nlines);
        return 0;
    }
    else
    {
        printf("error: input too big to sort\n");
        return 1;
    }
}

#define MAXLEN 1000 /* max length of any input line */
/* getline is K&R's version (Section 1.9); if your <stdio.h> already declares
   the POSIX getline, rename this one (e.g. get_line) to avoid a type conflict */
int getline(char *, int);
char *alloc(int);

/* readlines: read input lines */
int readlines(char *lineptr[], int maxlines)
{
    int len, nlines;
    char *p, line[MAXLEN];

    nlines = 0;
    while ((len = getline(line, MAXLEN)) > 0)
        if (nlines >= maxlines || (p = alloc(len)) == NULL)  /* note that HERE is where each string is allocated */
            return -1;
        else
        {
            line[len-1] = '\0';    /* delete newline */
            strcpy(p, line);
            lineptr[nlines++] = p; /* ...and HERE is where the POINTER to the string is stored in the array */
        }

    return nlines;
}

/* writelines: write output lines */
void writelines(char *lineptr[], int nlines)
{
    int i;

    for (i = 0; i < nlines; i++)
        printf("%s\n", lineptr[i]);
}
~~~



The function getline is from Section 1.9.

The main new thing is the declaration for lineptr:
<code>
    char * lineptr[MAXLINES]
</code>
says that lineptr is an array of MAXLINES elements, **each element of which
is a pointer to a char**. That is, lineptr[i] is a character pointer, and
<code>*lineptr[i]</code> is the character it points to, the first character of the i-th saved text line.

**NOTE: this thing is often called double indirection**

Since lineptr is itself the name of an array, it can be treated as a pointer in the same manner as in our earlier examples, and writelines can be written instead as

<code>
     /* writelines: write output lines */
    void writelines(char *lineptr[], int nlines)
    {
        while (nlines-- > 0)
            printf ("%s\n", *lineptr++);
    }
</code>

Initially <code>*lineptr</code> points to the first line; each increment advances it to the next line pointer while nlines is counted down.

With input and output under control, we can proceed to sorting. 
The quick-sort from Chapter 4 needs minor changes: the declarations have to be modified, and the comparison operation must be done by calling strcmp. The algorithm remains the same, which gives us some confidence that it will still work.



~~~c
#include <string.h>   /* for strcmp */

/* qsort: sort v[left]...v[right] into increasing order */
void qsort(char *v[], int left, int right)
{
    int i, last;
    void swap(char *v[], int i, int j);

    if (left >= right) /* do nothing if array contains */
        return;        /* fewer than two elements */

    swap(v, left, (left + right)/2);
    last = left;
    for (i = left+1; i <= right; i++)
        if (strcmp(v[i], v[left]) < 0) /* note the strcmp! here is where the double indirection is at work */
            swap(v, ++last, i);
    swap(v, left, last);
    qsort(v, left, last-1);
    qsort(v, last+1, right);
}

/* swap, with trivial changes: interchange v[i] and v[j] */
void swap(char *v[], int i, int j)
{
    char *temp;

    temp = v[i];
    v[i] = v[j];
    v[j] = temp;
}
~~~


Since **any individual element of v (alias lineptr) is a character pointer**, temp must be also, so one can be copied to the other.

**Exercise 5-7. Rewrite readlines to store lines in an array supplied by main, rather than calling alloc to maintain storage. How much faster is the program?**


# Multi-dimensional Arrays

 - C provides rectangular multi-dimensional arrays
 - K&R say that in practice they are much less used than arrays of pointers, but this is **not** true in scientific computation!

Consider the problem of date conversion, from *day of the month* to *day of the year* and vice versa. For example, March 1 is the 60th day of a non-leap year, and the 61st day of a leap year. Let us define two functions to do the conversions: day_of_year converts the month and day into the day of the year, and month_day converts the day of the year into the month and day.

Since this latter function computes two values, the month and day arguments will be pointers:
<code>
    month_day(1988, 60, &m, &d)
</code>
sets m to 2 and d to 29 (February 29th). 
These functions both need the same information, a table of the number of days in each month ("thirty days hath September ..."). Since the number of days per month differs for leap years and non-leap years, it's easier to separate them into **two rows of a two-dimensional array** than to keep track of what happens to February during computation. The array and the functions for performing the transformations are as follows:



In [None]:
static char daytab[2][13] = {
{0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}, /* note: in C the LAST subscript varies fastest in memory (row-major) */
{0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}  /* in Fortran it is the opposite (column-major) */
} ;

/* day_of_year: set day of year from month & day */
int day_of_year(int year, int month, int day)
{
    int i, leap;
    leap = year%4 == 0 && year%100 != 0 || year%400 == 0;
    for (i = 1; i < month; i++)
        day += daytab[leap][i];
    return day;
}

/* month_day: set month, day from day of year */
void month_day(int year, int yearday, int *pmonth, int *pday)
{
    int i, leap;

    leap = year%4 == 0 && year%100 != 0 || year%400 == 0;
    for (i = 1; yearday > daytab[leap][i]; i++)
        yearday -= daytab[leap][i];
    *pmonth = i;
    *pday = yearday;
}
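A hand trace of the two functions on the dates used in the text shows how the leap row of daytab is selected and how month_day undoes day_of_year:

$$
\begin{aligned}
\mathrm{day\_of\_year}(1987, 3, 1) &= 1 + 31 + 28 = 60 && \text{leap} = 0\\
\mathrm{day\_of\_year}(1988, 3, 1) &= 1 + 31 + 29 = 61 && \text{leap} = 1\\
\mathrm{month\_day}(1988, 60) &: 60 > 31 \Rightarrow 60 - 31 = 29;\ 29 > 29 \text{ is false} \Rightarrow m = 2,\ d = 29
\end{aligned}
$$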

**Recall that the arithmetic value of a logical expression, such as the one for leap, is either zero (false) or one (true), so it can be used as a subscript of the array daytab.**

The array daytab has to be external to both day_of_year and month_day, so they can both use it. We made it char to illustrate a legitimate use of char for storing small non-character integers.

daytab is the first two-dimensional array we have dealt with. In C, **a two-dimensional array is really a one-dimensional array, each of whose elements is an array**. Hence subscripts are written as
<code>
    daytab[i][j]  /* [row][col] */
</code>
rather than
<code>
    daytab[i,j]  /* WRONG */
</code>

Other than this notational distinction, a two-dimensional array can be treated in much the same way as in other languages. Elements are stored by rows, so the rightmost subscript, or column, varies fastest as elements are accessed in storage order.

An array is initialized by a list of initializers in braces; each row of a two-dimensional array is initialized by a corresponding sub-list. We started the array daytab with a column of zero so that month numbers can run from the natural 1 to 12 instead of 0 to 11. Since space is not at a premium here, this is clearer than adjusting the indices.

If a two-dimensional array is to be passed to a function, **the parameter declaration in the function must include the number of columns; the number of rows is irrelevant**, since what is passed is, as before, a pointer to the first row, where each row is an array of 13 chars (daytab was declared char). Thus if the array daytab is to be passed to a function f, the declaration of f would be
<code>
    void f(char daytab[2][13]) { ... }
</code>
It could also be
<code>
    void f(char daytab[][13]) { ... }
</code>
since the number of rows is irrelevant, or it could be
<code>
    void f(char (*daytab)[13]) { ... }
</code>
which says that **the parameter is a pointer to an array of 13 chars**. The
parentheses are necessary since brackets [] have higher precedence than * .
*Without parentheses, the declaration*
<code>
    char *daytab[13]
</code>
*is an array of 13 pointers to char*. 

More generally, **only the first dimension (subscript) of an array is free; all the others have to be specified.** 
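A minimal sketch of a function that receives daytab through a pointer-to-array parameter, as described above. The names row_total and NCOLS are hypothetical; the only requirement is that the column count in the parameter matches the array's second dimension.

```c
#define NCOLS 13   /* hypothetical: must equal the second dimension of the array */

/* row_total: sum the entries of row r of a table with NCOLS columns.
   tab points to arrays of NCOLS chars, so tab[r][c] lies at offset
   r*NCOLS + c from the start of the table. */
int row_total(char (*tab)[NCOLS], int r)
{
    int c, sum = 0;

    for (c = 0; c < NCOLS; c++)
        sum += tab[r][c];
    return sum;
}
```

Called as row_total(daytab, 1), the array name daytab is converted to a pointer to its first row, of type char (*)[13]; the number of rows never enters the parameter type.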


**We will make wide use of multidimensional arrays in the rest of our course, so there is no specific exercise here.**

Exercise 5-8. There is no error checking in day_of_year or month_day. Remedy this defect. 



# Initialization of Pointer Arrays

Consider the problem of writing a function month_name(n), which returns a pointer to a character string containing the name of the n-th month. This is an ideal application for an internal static array. month_name contains a private array of character strings, and returns a pointer to the proper one when called. This section shows how that array of names is initialized. 
The syntax is similar to previous initializations:


In [None]:
/* month_name: return name of n-th month */
char *month_name(int n)
{
    static char *name[] = {
    "Illegal month",
    "January", "February", "March",
    "April", "May", "June",
    "July", "August", "September",
    "October", "November", "December"
    } ;

    return (n < 1 || n > 12) ? name[0] : name[n];
}

The declaration of name, which is **an array of character pointers**, is the same as lineptr in the sorting example. 

The initializer is a list of character strings; each is assigned to the corresponding position in the array. 
*Note that the pointers refer to string literals, so with such an initialization it is not possible to modify any of the strings* (attempting it is undefined behavior).

The characters of the i-th string are placed somewhere, and a pointer to them is stored in name[i]. Since the size of the array name is not specified, the compiler counts the initializers and fills in the correct number.
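The claim that the compiler counts the initializers can be checked with sizeof. This sketch moves the table to file scope, so that the count can be inspected, and declares it const, since the strings are literals; both changes, and the names month_names, NMONTHNAMES and month_name_checked, are assumptions for illustration, not K&R's code.

```c
/* No size is given: the compiler counts the 13 initializers. */
static const char *month_names[] = {
    "Illegal month",
    "January", "February", "March",
    "April", "May", "June",
    "July", "August", "September",
    "October", "November", "December"
};

/* number of elements = size of the whole array / size of one char pointer */
#define NMONTHNAMES (sizeof month_names / sizeof month_names[0])

/* month_name_checked: like month_name, but the upper bound comes from the table */
const char *month_name_checked(int n)
{
    return (n < 1 || n >= (int) NMONTHNAMES) ? month_names[0] : month_names[n];
}
```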


# Pointers vs. Multi-dimensional Arrays

Newcomers to C are sometimes confused about the difference between a
two-dimensional array and an array of pointers, such as name in the example
above. Given the definitions
<code>
    int a[10][20];
    int *b[10];
    </code>
then a[3][4] and b[3][4] are both syntactically legal references to a single int.

But **a is a true two-dimensional array**: 200 int-sized locations have been set aside, and the conventional rectangular subscript calculation 20 × row + col is used to find the element a[row][col]. 

For b, however, **the definition only allocates 10 pointers and does not initialize them**; initialization must be done explicitly, either statically or with code. 

Assuming that each element of b does point to a twenty-element array, then there will be 200 ints set aside, plus ten cells for the pointers. 

*The important advantage of the pointer array is that the rows of the array may be of different lengths*. That is, each element of b need not point to a twenty-element vector; some may point to two elements, some to fifty,
and some to none at all.
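As a sketch of the ragged case, the hypothetical helpers make_ragged and free_ragged below make each element of an array like b point to its own row, allocated with calloc, so rows can have different lengths (or none at all). Error handling is kept minimal.

```c
#include <stdlib.h>

/* make_ragged: point each b[i] at a new, zero-filled row of len[i] ints.
   A length of 0 leaves b[i] as a null pointer.
   Returns 0 on success; on failure frees the rows already made and returns -1. */
int make_ragged(int *b[], const int len[], int nrows)
{
    int i;

    for (i = 0; i < nrows; i++) {
        b[i] = NULL;
        if (len[i] <= 0)
            continue;                                  /* "some to none at all" */
        b[i] = calloc((size_t) len[i], sizeof *b[i]);  /* zero-filled row of ints */
        if (b[i] == NULL) {
            while (--i >= 0) {                         /* undo the earlier rows */
                free(b[i]);
                b[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* free_ragged: release every row; free(NULL) does nothing */
void free_ragged(int *b[], int nrows)
{
    int i;

    for (i = 0; i < nrows; i++) {
        free(b[i]);
        b[i] = NULL;
    }
}
```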

Although we have phrased this discussion in terms of integers, by far the
most frequent use of arrays of pointers is to store character strings of diverse
lengths, as in the function month_name. Compare the declaration and picture
for an array of pointers:
<code>
    char *name[] = { "Illegal month", "Jan", "Feb", "Mar" };
</code>
    
![cap4image3.png](attachment:cap4image3.png)

with those for a two-dimensional array:
<code>
    char aname[][15] = { "Illegal month", "Jan", "Feb", "Mar" };
    </code>
   
![cap4image4.png](attachment:cap4image4.png)

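...and note the wasted space: every row of a two-dimensional array has the same length (15 chars here, because in C all dimensions except the leftmost must be specified), so the short names leave most of their row unused. Also, unlike a fixed-size array, pointers can **point to memory locations that are dynamically allocated and deallocated at run time**. This is not possible with an ordinary multidimensional array, whose size is fixed by its declaration.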
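The storage difference can be measured with sizeof. A minimal sketch using the two declarations above (with const added to the pointer version, since the strings are literals); string_bytes is a hypothetical helper that adds up the characters the pointers refer to.

```c
#include <string.h>

static const char *name[] = { "Illegal month", "Jan", "Feb", "Mar" };
static char aname[][15]  = { "Illegal month", "Jan", "Feb", "Mar" };

#define NNAMES (sizeof name / sizeof name[0])

/* string_bytes: bytes occupied by the strings that name[] points to,
   each counted with its terminating '\0'; the pointers themselves come on top */
size_t string_bytes(void)
{
    size_t i, total = 0;

    for (i = 0; i < NNAMES; i++)
        total += strlen(name[i]) + 1;
    return total;
}
```

Here aname takes 4 × 15 = 60 bytes, while the pointer version needs 26 bytes of characters plus four pointers (32 bytes on a typical 64-bit machine). With names this short the saving is small; it grows as the string lengths become more uneven.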

**Exercise 5-9. Rewrite the routines day_of_year and month_day with pointers instead of indexing.**


# Command-line Arguments

- a way to pass command-line arguments to a program
- main() receives two arguments. 
- First (conventionally argc, for "argument count"): the number of command-line arguments that the program receives, including the program name.
- Second (argv, for "argument vector"): a pointer to an **array** of character strings. These strings are the command-line arguments.
- By convention, argv[0] is the name by which the program was invoked
- ...thus argc is always *at least* 1. If argc == 1, no command-line arguments have been given

The simplest illustration is the program echo, which echoes its command-line
arguments on a single line, separated by blanks. That is, the command
<code>
    echo hello, world
    </code>
prints the output
~~~text
hello, world
~~~

In the example above, argc is 3, and argv[0], argv[1], and argv[2] are "echo", "hello,", and "world" respectively. 
The first optional argument is argv[1] and **the last is argv[argc-1].**  
Additionally, the standard requires that argv[argc] be a null pointer.

![cap4commandline.png](attachment:cap4commandline.png)

The first version of echo treats argv as an array of character pointers:
~~~text
    #include <stdio.h>
    /* echo command-line arguments; 1st version */
    int main(int argc, char *argv[])
    {
        int i;
        for (i = 1; i < argc; i++)  /* start from 1: argv[0] is the program name */
            printf("%s%s", argv[i], (i < argc-1) ? " " : ""); /* no \n here: blank between arguments, none after the last */
        printf("\n");
        return 0;
    }    
~~~
- Since argv is a pointer to an array of pointers, *we can manipulate the pointer* rather than index the array. 
- Note that main() behaves exactly like every other function in this regard: the parameter <code>char *argv[]</code> is really a pointer (<code>char **argv</code>) holding a copy of the address of an array that lives **outside** main, so argv itself can be incremented. If argv were an array declared *inside* main(), its name could not be changed, like the name of any other array.

This next variation is based on incrementing argv, which is a pointer to pointer to char, while argc is counted down:
    
<code>
    #include <stdio.h> /* echo command-line arguments; 2nd version */
    int main(int argc, char *argv[])
    {
        while (--argc > 0)
            printf ("%s%s", *++argv, (argc > 1) ? " " :"");
        printf( "\n");
        return 0;
    }
    </code>
    

Alternatively, we could write the printf statement as
<code>
    printf((argc > 1) ? "%s " : "%s", *++argv);
</code>

This shows that the format argument of printf can be an expression too.
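To see that argv is just a local pointer, here is a sketch of the second echo rewritten as a hypothetical function echo_join that copies the arguments into a buffer instead of printing them, so its result can be checked. The buffer-size check is an addition for safety, not part of K&R, and it assumes outsize is at least 1.

```c
#include <string.h>

/* echo_join: copy argv[1]..argv[argc-1] into out, separated by single blanks,
   walking argv with *++argv as the second echo does.
   Returns 0, or -1 if out (of outsize bytes) is too small. */
int echo_join(int argc, char *argv[], char *out, size_t outsize)
{
    size_t used = 0, len;

    out[0] = '\0';
    while (--argc > 0) {
        len = strlen(*++argv);                        /* advance to the next argument */
        if (used + len + (argc > 1) + 1 > outsize)    /* text + blank + '\0' */
            return -1;
        strcpy(out + used, *argv);
        used += len;
        if (argc > 1)                                 /* blank between arguments only */
            out[used++] = ' ';
        out[used] = '\0';
    }
    return 0;
}
```

After the call, the caller's array is unchanged: only the function's own copies of argc and argv were modified.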

As a second example, let us make some enhancements to the pattern-finding program from Section 4.1. 
If you recall, *we wired the search pattern deep into the program*, an obviously unsatisfactory arrangement. 

Following the lead of the UNIX program grep, let us change the program so the pattern to be matched is specified by the first argument on the command line.



In [None]:
#include <stdio.h>
#include <string.h>
#define MAXLINE 1000

/* getline as in the pattern-finding program of Section 4.1; POSIX <stdio.h> also
   declares a different getline, so on some systems this one must be renamed (e.g. get_line) */
int getline(char *line, int max);

/* find: print lines that match pattern from 1st arg */
int main(int argc, char *argv[])
{
    char line[MAXLINE];
    int found = 0;

    if (argc != 2)
        printf("Usage: find pattern\n");
    else
        while (getline(line, MAXLINE) > 0)
            if (strstr(line, argv[1]) != NULL) 
            {
                printf("%s", line);
                found++;
            }
    return found;
}

The standard library function **strstr(s, t)** returns a pointer to the first occurrence of the string t in the string s, or NULL if there is none. It is declared in <string.h>.
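A small sketch of strstr at work: count_matches (a hypothetical helper) counts how many strings in an array contain the pattern, which is what find does one input line at a time.

```c
#include <string.h>

/* count_matches: how many of the n strings in lines[] contain pattern;
   strstr returns a pointer to the first occurrence, or NULL if there is none */
int count_matches(char *lines[], int n, const char *pattern)
{
    int i, found = 0;

    for (i = 0; i < n; i++)
        if (strstr(lines[i], pattern) != NULL)
            found++;
    return found;
}
```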

The model can now be elaborated to illustrate further pointer constructions. Suppose we want to allow two optional arguments. One says "print all lines except those that match the pattern;" the second says "precede each printed line by its line number."

A common convention for C programs on UNIX systems is that an argument that begins with a minus sign introduces an optional flag or parameter. If we choose -x (for "except") to signal the inversion, and -n ("number") to request
line numbering, then the command
<code>
    find -x -n pattern
    </code>
will print **each line that doesn't match the pattern, preceded by its line number**.

*Optional arguments should be permitted in any order, and the rest of the program should be independent of the number of arguments that were present*. 

Furthermore, it is convenient for users if option arguments can be combined, as in
<code>
    find -nx pattern
    </code>
Here is the program:


In [None]:
#include <stdio.h>
#include <string.h>
#define MAXLINE 1000

int getline(char *line, int max);

/* find: print lines that match pattern from 1st arg */
int main(int argc, char *argv[])
{
    char line[MAXLINE];
    long lineno = 0;
    int c, except = 0, number = 0, found = 0;

    while (--argc > 0 && (*++argv)[0] == '-')  /* advance argv to the next argument: is it an option? */
        while ((c = *++argv[0]))               /* advance argv[0] through the option's letters */
            switch (c)     /* here you have a very common use of the "switch" statement */
            {
            case 'x':
                except = 1;
                break;
            case 'n':
                number = 1;
                break;
            default:
                printf("find: illegal option %c\n", c);
                argc = 0;
                found = -1;
                break;
            }
    if (argc != 1)
        printf("Usage: find -x -n pattern\n");
    else
        while (getline(line, MAXLINE) > 0) 
        {
            lineno++;
            if ((strstr(line, *argv) != NULL) != except) 
            {
                if (number)
                    printf("%ld:", lineno);
                printf("%s", line);
                found++;
            }
        }
    return found;
}


- argc is decremented and argv is incremented before each optional argument. 
- At the end of the loop, if there are no errors, argc tells how many arguments remain unprocessed and argv points to the first of these. 
- Thus argc should be 1 and <code>* argv</code> should point at the pattern. 
- Notice **that <code>*++argv</code> is a pointer to an argument string, so <code>(*++argv)[0]</code> is its first character**. An alternate valid form would be <code>**++argv</code>. 
- Because [] binds tighter than * and ++, the parentheses are necessary; without them the expression would be taken as <code>*++(argv[0])</code>. In fact, that is what we used in the inner loop, where the task is to walk along a specific argument string. 
- In the inner loop, the expression <code>*++argv[0]</code> increments the pointer argv[0] and then fetches the character it now points to.

It is rare that one uses pointer expressions more complicated than these; in such cases, breaking them into two or three steps will be more intuitive.

**In my opinion, even what K&R illustrate here is far too complex to be read easily. Try to rephrase this code into something more readable, even if less elegant.**
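One possible more verbose rephrasing, offered as a sketch rather than the answer: the option scan moves into a hypothetical function parse_options that uses an explicit index i for the arguments and a second index k for the letters of each flag, instead of incrementing argv and argv[0]. It assumes the same rules as find: options first, then exactly one pattern.

```c
#include <stdio.h>

/* parse_options: set *except and *number from flags such as "-x", "-n" or "-nx".
   Returns the index in argv of the pattern, or -1 on an illegal option
   or when the number of remaining arguments is not exactly one. */
int parse_options(int argc, char *argv[], int *except, int *number)
{
    int i = 1;                               /* argv[0] is the program name */

    *except = 0;
    *number = 0;
    while (i < argc && argv[i][0] == '-') {
        const char *flag = argv[i];
        int k;

        for (k = 1; flag[k] != '\0'; k++) {  /* k = 1 skips the leading '-' */
            if (flag[k] == 'x')
                *except = 1;
            else if (flag[k] == 'n')
                *number = 1;
            else {
                printf("find: illegal option %c\n", flag[k]);
                return -1;
            }
        }
        i++;                                 /* on to the next argument */
    }
    if (argc - i != 1)                       /* exactly one argument must remain */
        return -1;
    return i;
}
```

main would then call parse_options once and, if it returns an index p >= 0, use argv[p] as the pattern. The subscripts make each step visible at the cost of two extra variables.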



Exercise 5-10. Write the program expr, which evaluates a reverse Polish
expression from the command line, where each operator or operand is a separate
argument. For example,
<code>
    expr 2 3 4 + *
</code>
evaluates 2 × (3+4).
 
Exercise 5-11. Modify the programs entab and detab (written as exercises in
Chapter 1) to accept a list of tab stops as arguments. Use the default tab
settings if there are no arguments. 


Exercise 5-12. Extend entab and detab to accept the shorthand
<code>
    entab -m +n
    </code>
to mean tab stops every n columns, starting at column m. Choose convenient (for the user) default behavior. 

**Exercise 5-13. Write the program tail, which prints the last n lines of its
input. By default, n is 10, let us say, but it can be changed by an optional
argument, so that
<code>
    tail -number
    </code>
prints the last number lines. The program should also accept the format 
<code>
    tail -n number
    </code>
The program should behave rationally no matter how unreasonable the input or the value of number. Write the program so it makes the best use of available storage; lines should be stored as in the sorting program of Section 5.6, not in a two-dimensional array of fixed size.** 


# Pointers to Functions

In C, a function itself is not a variable, but it is possible to define pointers to functions, which can be assigned, placed in arrays, passed to functions, returned by functions, and so on. 
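A minimal sketch of function pointers stored in an array and passed as arguments. The functions square and negate and the helpers apply and apply_twice are hypothetical examples.

```c
/* two functions with the same type: they take an int and return an int */
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

/* ops: an array of pointers to functions taking an int and returning an int.
   A function name used as a value becomes a pointer, so no & is needed. */
static int (*ops[])(int) = { square, negate };

/* apply: call the k-th function of ops; ops[k](x) would work as well */
int apply(int k, int x)
{
    return (*ops[k])(x);
}

/* apply_twice: a function pointer passed as an argument */
int apply_twice(int (*f)(int), int x)
{
    return (*f)((*f)(x));
}
```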

We will illustrate this by modifying the sorting procedure written earlier in this chapter so that if the optional argument -n is given, it will sort the input lines numerically instead of lexicographically.

A sort often consists of three parts:
- a comparison that determines the ordering of any pair of objects
- an exchange that reverses their order, 
- a sorting algorithm that makes comparisons and exchanges until the objects are in order. 

**The sorting algorithm is independent of the comparison and exchange operations, so by passing different comparison and exchange functions to it, we can arrange to sort by different criteria**. 
This is the approach taken in our new sort.

Note that, using this approach, the *same* code can be reused and applied to different cases just changing the function pointers.

Lexicographic comparison of two lines is done by **strcmp**, as before; we will also need a routine **numcmp** that compares two lines on the basis of numeric value and returns the same kind of condition indication as strcmp does. 

These functions are declared ahead of main and **a pointer to the appropriate one is passed to qsort**. We have skimped on error processing for arguments, so as to concentrate on the main issues.



In [None]:
#include <stdio.h>
#include <string.h>

#define MAXLINES 5000 /* max #lines to be sorted */
char *lineptr[MAXLINES]; /* pointers to text lines */

int readlines(char *lineptr[], int nlines);
void writelines(char *lineptr[], int nlines);

/* our own qsort: its name clashes with the qsort of <stdlib.h>, so the two must not meet in one file */
void qsort(void *lineptr[], int left, int right,
    int (*comp)(void *, void *)); /* here is where a pointer to a function, "(*comp)(void *, void *)", is used */

int numcmp(char *, char *);

/* sort input lines */
int main(int argc, char *argv[])
{
    int nlines; /* number of input lines read */
    int numeric = 0; /* 1 if numeric sort */

    if (argc > 1 && strcmp(argv[1], "-n") == 0)
        numeric = 1;
    if ((nlines = readlines(lineptr, MAXLINES)) >= 0) 
    {
        qsort((void **) lineptr, 0, nlines-1,
            (int (*)(void *, void *))(numeric ? numcmp : strcmp)); /* here is where the pointer to the correct function is passed */
        writelines(lineptr, nlines);
        return 0;
    } 
    else 
    {
        printf("input too big to sort\n");
        return 1;
    }
}

- In the call to qsort, strcmp and numcmp are addresses of functions. 
- Since they are known to be functions, the & operator is not necessary, in the same way that it is not needed before an array name.

We have written qsort so it can process any data type, not just character strings. As indicated by the function prototype, qsort expects an array of pointers, two integers, and a function with two pointer arguments. 

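- **The generic pointer type void * is used for the pointer arguments**. 
- Any pointer to an object can be cast to void * and back again without loss of information (so we can call qsort by casting arguments to void * ).  
- The elaborate cast of the function argument casts the arguments of the comparison function. These will generally have no effect on actual representation, but assure the compiler that all is well. Strictly speaking, ISO C leaves undefined a call to strcmp through a pointer of a different function type; it works on common platforms, but comparison functions declared with exactly the type int (void *, void *) avoid the issue.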



In [None]:
/* qsort: sort v[left]...v[right] into increasing order */
void qsort(void *v[], int left, int right,
        int (*comp)(void *, void *) )
{
    int i, last;
    void swap(void *v[], int, int);

    if (left >= right)  /* do nothing if array contains */
        return;         /* fewer than two elements */
    swap(v, left, (left + right)/2);
    last = left;
    for (i = left+1; i <= right; i++)
        if ((*comp)(v[i], v[left]) < 0)   /* here the appropriate function is called. */
            swap(v, ++last, i);           /* so no need to write two DIFFERENT versions of qsort */
    swap(v, left, last);
    qsort(v, left, last-1, comp);
    qsort(v, last+1, right, comp);
} 
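Below is a self-contained sketch of the generic sort with both comparison functions. It departs from the text in two labeled ways: the sort is renamed gsort (hypothetical) to avoid the clash with qsort in <stdlib.h>, and strcmp and the numeric comparison are wrapped in functions cmp_lex and cmp_num that already have the type int (*)(void *, void *), so no cast of function pointers is needed.

```c
#include <stdlib.h>
#include <string.h>

/* wrappers with exactly the type gsort expects; each converts its
   void * arguments back to char * before comparing */
static int cmp_lex(void *a, void *b)
{
    return strcmp((char *) a, (char *) b);
}

static int cmp_num(void *a, void *b)
{
    double v1 = atof((char *) a), v2 = atof((char *) b);

    return (v1 > v2) - (v1 < v2);    /* -1, 0 or 1, like numcmp */
}

static void gswap(void *v[], int i, int j)
{
    void *temp = v[i];

    v[i] = v[j];
    v[j] = temp;
}

/* gsort: the generic quicksort of the text; comp decides the order */
void gsort(void *v[], int left, int right, int (*comp)(void *, void *))
{
    int i, last;

    if (left >= right)
        return;
    gswap(v, left, (left + right) / 2);
    last = left;
    for (i = left + 1; i <= right; i++)
        if ((*comp)(v[i], v[left]) < 0)
            gswap(v, ++last, i);
    gswap(v, left, last);
    gsort(v, left, last - 1, comp);
    gsort(v, last + 1, right, comp);
}
```

The strings "10", "9", "100" come out as 10, 100, 9 with cmp_lex but as 9, 10, 100 with cmp_num, which is exactly why the -n option exists.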

The declarations should be studied with some care. The fourth parameter of qsort is
<code> 
    int (* comp) (void * , void * )
</code>
which says that comp is a pointer to a function that has two void * arguments
and returns an int.
The use of comp in the line
<code>
    if ((*comp)(v[i], v[left]) < 0)
</code>
is consistent with the declaration: comp is a pointer to a function, <code>*comp</code> is the function, and
<code>
    (* comp)(v[i], v[left])
    </code>
is the call to it. The parentheses are needed so the components are correctly
associated; without them,
<code>
    int * comp(void *, void *) /* WRONG */
    </code>
says that comp is a function returning a pointer to an int, which is very different.
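Declarations like this become easier to read if the function-pointer type is given a name with typedef. A sketch, where cmp_fn, by_length, lex and pick_cmp are hypothetical names:

```c
#include <string.h>

/* cmp_fn names the type "pointer to function taking two void * and returning int";
   qsort's fourth parameter could then be written  cmp_fn comp  */
typedef int (*cmp_fn)(void *, void *);

static int by_length(void *a, void *b)
{
    size_t la = strlen((char *) a), lb = strlen((char *) b);

    return (la > lb) - (la < lb);
}

static int lex(void *a, void *b)
{
    return strcmp((char *) a, (char *) b);
}

/* pick_cmp: a function returning a pointer to a function */
cmp_fn pick_cmp(int use_length)
{
    return use_length ? by_length : lex;
}
```

Without the typedef, pick_cmp would have to be declared as int (*pick_cmp(int use_length))(void *, void *), which is the kind of declaration discussed in the next section.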
    
We have already shown strcmp, which compares two strings. Here is numcmp, which compares two strings on a leading numeric value, computed by calling atof:
~~~text
    #include <stdlib.h>
    /* numcmp: compare s1 and s2 numerically */
    int numcmp(char *s1, char *s2)
    {
        double v1, v2;
        v1 = atof(s1);
        v2 = atof(s2);    
        if (v1 < v2)
            return -1;
        else if (v1 > v2)
            return 1;
        else
            return 0;
     }
~~~
    
The swap function, which exchanges two pointers, is identical to what we presented earlier in the chapter, except that the declarations are changed to void *.
<code>
    void swap(void * v[], int i, int j)
    {
        void * temp;
        temp = v[i];
        v[i] = v[j];
        v[j] = temp;
     }
</code>

A variety of other options can be added to the sorting program; some make challenging exercises.

**Exercise 5-14. Modify the sort program to handle a -r flag, which indicates sorting in reverse (decreasing) order. Be sure that -r works with -n. Of course, try to use pointers to functions!**

Exercise 5-15. Add the option -f to fold upper and lower case together, so that case distinctions are not made during sorting; for example, a and A compare equal. 

Exercise 5-16. Add the -d ("directory order") option, which makes comparisons only on letters, numbers and blanks. Make sure it works in conjunction with -f. 

Exercise 5-17. Add a field-handling capability, so sorting may be done on fields within lines, each field sorted according to an independent set of options. (The index for this book was sorted with -df for the index category and -n for the page numbers.) 

**Supplementary exercise: generalize the integrator of the transcendental equation to three different equations. 
Accept a command-line argument to choose between them. Use pointers to functions to select the appropriate equation. I suggest the following additional equations:**

**3x=exp(x)** 

**x^3 + x + 1=0** 


# Complicated Declarations

You can try to read and understand this section in the K&R text. However, this level of complexity is considered beyond the scope of the present course.

I have to add that I would strongly advise against declarations of that kind, and would be quite annoyed to find them in code written by somebody else but intended to be understood by other collaborators.

Personally I do prefer code readability to code elegance.
