# Lecture III: Functions and Program Structure

Functions break large computing tasks into smaller ones, and enable people to build on what others have done instead of starting over from scratch.

Appropriate functions hide details of operation from parts of the program that don't need to know about them.

C programs generally consist of many small functions rather than a few big ones.

To begin, let us design and write a program to print each line of its input that contains a particular "pattern" or string of characters.

Consider <code>pattern="ould"</code> and this text:

~~~text
Ah Love! could you and I with Fate conspire 
To grasp this sorry Scheme of Things entire, 
Would not we shatter it to bits -- and then
Re-mould it nearer to the Heart’s Desire!
~~~

If we apply our algorithm to this text, we print only the lines that include "ould":

~~~text
Ah Love! could you and I with Fate conspire 
Would not we shatter it to bits -- and then
Re-mould it nearer to the Heart’s Desire!
~~~

The job falls neatly into three pieces:

~~~c
while (there's another line)
    if (the line contains the pattern)
        print it
~~~

Although it's certainly possible to put the code for all of this in main, a better way is to use the structure to advantage by making each part a separate function.

The loop test ("while there's another line") is <code>getline</code>, a function that we wrote in Lecture 2 (it is called <code>getline2</code> in the code below), and the print step is <code>printf</code>. This means we need only write a routine that decides whether the line contains an occurrence of the pattern; we call it <code>strindex</code>. The standard library provides a similar function called <code>strstr</code>.
<code>strindex</code> returns the position (index) in the string where the pattern begins, or -1 if the string doesn't contain the pattern.



~~~c
#include <stdio.h>
#define MAXLINE 1000 /* maximum input line length */

int getline2(char line[], int max);
int strindex(char source[], char searchfor[]);

char pattern[] = "ould"; /* pattern to search for */

/* find all lines matching pattern */
int main()
{
    char line[MAXLINE];
    int found = 0;

    while (getline2(line, MAXLINE) > 0)
        if (strindex(line, pattern) >= 0) {
            printf("%s", line);
            found++;
        }
    return found;
}

/* getline2: get line into s, return length */
int getline2(char s[], int lim)
{
    int c, i;

    i = 0;
    while (--lim > 0 && (c = getchar()) != EOF && c != '\n')
        s[i++] = c;
    if (c == '\n')
        s[i++] = c;
    s[i] = '\0';
    return i;
}

/* strindex: return index of t in s, -1 if none */
int strindex(char s[], char t[])
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        for (j = i, k = 0; t[k] != '\0' && s[j] == t[k]; j++, k++)
            ;
        if (k > 0 && t[k] == '\0')
            return i;
    }
    return -1;
}
~~~


Each function definition has the form

~~~C
return-type function-name(argument declarations)
{
    declarations and statements
}
~~~
Various parts may be absent; a minimal function is

~~~c
dummy() {}
~~~

which does nothing and returns nothing.

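If the return type is omitted, <code>int</code> is assumed. This implicit <code>int</code> is a feature of older C; C99 and later standards require the return type to be written explicitly.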

A program is just a set of definitions of variables and functions. Communication between the functions is by arguments and values returned by the functions, and through external variables.

The <code>return</code> statement is the mechanism for returning a value from the called function to its caller. Any expression can follow return:
~~~c
return expression;
~~~

The _expression_ will be converted to the return type of the function if necessary. Parentheses are often used around the _expression_, but they are optional.

The calling function is free to ignore the returned value.

<code>main</code> can also return a value (see the previous example). This value is available to the environment that called the program. In a Unix shell, <code>echo $?</code> prints the exit status of the last program executed. By convention, an exit status of 0 means success; non-zero values can be used to signal errors or warnings.

## Handling multiple source files

The mechanics of how to compile and load a C program that resides on multiple source files vary from one system to the next. In Linux we use GCC.

Suppose that the three functions are stored in three files called _main.c_, _getline.c_ and _strindex.c_. Then the command

~~~bash
> gcc main.c getline.c strindex.c -o main.x
~~~
compiles the three files and loads them into an executable file called _main.x_. Note that gcc does not keep the intermediate object files _main.o_, _getline.o_, and _strindex.o_ unless it is told to. The command

~~~bash
> gcc -c main.c getline.c strindex.c
~~~

produces the three object files without linking them. Once the object files are present, if there is an error, say in _main.c_, that file can be recompiled by itself and the result loaded with the previous object files, with the command

~~~shell
> gcc main.c getline.o strindex.o -o main.x
~~~

## Functions Returning Non-Integers


So far our examples of functions have returned either no value (<code>void</code>) or an <code>int</code>, but functions can also return other types (for example, <code>sqrt(x)</code> returns a <code>double</code>).

To illustrate how to deal with this, let us write and use the function <code>atof(s)</code>, which converts the string s to its double-precision floating-point equivalent.

<code>atof</code> must declare the type of value it returns, and the calling routine must know that <code>atof</code> returns a non-int value. One way to ensure this is to declare <code>atof</code> explicitly in the calling routine, as shown in the code below. (In this case the function and <code>main</code> are in the same file, but usually this is not the case.)

~~~c
#include <ctype.h>
#include <stdio.h>

int main()
{
    double result = 0, atof(char s[]);
    char stringa[] = "1234.56";

    result = atof(stringa);
    printf("String to double = %f\n", result);
    return 0;
}

/* atof: convert string s to double */
double atof(char s[])
{
    double val, power;
    int i, sign;

    for (i = 0; isspace(s[i]); i++) /* skip white space */
        ;
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (val = 0.0; isdigit(s[i]); i++)
        val = 10.0 * val + (s[i] - '0');
    if (s[i] == '.')
        i++;
    for (power = 1.0; isdigit(s[i]); i++) {
        val = 10.0 * val + (s[i] - '0');
        power *= 10.0;
    }
    return sign * val / power;
}
~~~

~~~text
String to double = 1234.560000
~~~

~~~c
    double result = 0, atof(char s[]);
~~~
says that result is a double variable, and that <code>atof(s)</code> is a function that takes one char[] argument and returns a double.

The function <code>atof(s)</code> must be declared and defined consistently.

**Warning:** If the function takes arguments, declare them; if it takes no arguments, use _void_.

Given <code>atof</code>, properly declared, we could write <code>atoi</code> (convert a string to int) in terms of it:

~~~c
/* atoi: convert string s to integer using atof */
int atoi(char s[])
{
    double atof(char s[]);
    return (int) atof(s); /*cast*/
}
~~~

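The value of the expression in a <code>return</code> is converted to the return type of the function. The value of <code>atof</code>, a double, would therefore be converted automatically to _int_ in this return, since the function <code>atoi</code> returns an int. The conversion may discard information, so the cast states explicitly that it is intended.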

## External Variables

A C program consists of a set of external objects, which are either variables or functions. The adjective "external" is used in contrast to "internal," which describes the arguments and variables defined inside functions. External variables are defined outside of any function, and are thus potentially available to many functions.

Functions themselves are always external. 

All references to external variables by the same name, even from functions compiled separately, are references to the same thing.

Because external variables are globally accessible, they provide an alternative to function arguments and return values for communicating data between functions.  This reasoning should be applied with some caution, for it can have a bad effect on program structure, and lead to programs with too many data connections between functions.

External variables are also useful because of their greater scope and lifetime.

Let us see an example of the use of external variables: a calculator based on reverse Polish notation.

What is reverse Polish notation?

In reverse Polish notation, each operator follows its operands; an infix expression like

~~~shell
(1 - 2) * (4 + 5) 
~~~~

is entered as

~~~shell
1 2 - 4 5 + *
~~~

Parentheses are not needed; the notation is unambiguous as long as we know how many operands each operator expects.

The structure of the program is a loop that performs the proper operation on each operator and operand as it appears:

~~~c
while (next operator or operand is not end-of-file indicator)
    if (number)
        push it
    else if (operator)
        pop operands
        do operation
        push result
    else if (newline)
        pop and print top of stack
    else
        error
~~~

The operations of pushing and popping a stack are trivial, but by the time error detection and recovery are added, they are long enough that it is better to _put each in a separate function_ than to repeat the code throughout the whole program.

The main design decision that has not yet been discussed is where the stack is, that is, which routines access it directly: 

* _main_ can  pass the stack and the current stack position to the routines;
* store the stack and its associated information in external variables accessible to the _push_ and _pop_ functions.

_main_ does not need to know about the stack so the second option is more elegant and functional.

Translating this outline into code is easy enough. If for now we think of the program as existing in one source file, it will look like this:

~~~c
#includes
#defines
function declarations for main
main() { .... }
external variables for push and pop
void push(double f) { .... }
double pop(void) { .... }
int getop(char s[]) { .... }
routines called by getop
~~~

The function <code>main</code> is a loop containing a big switch on the type of operator or operand:

~~~c
#include <stdio.h>
#include <stdlib.h>  /* for atof() */

#define MAXOP   100  /* max size of operand or operator */
#define NUMBER  '0'  /* signal that a number was found */

int getop(char s[]);
void push(double);
double pop(void);

/* reverse Polish calculator */
int main(void)
{
    int type;
    double op2;
    char s[MAXOP];

    while ((type = getop(s)) != EOF) {
        switch (type) {
        case NUMBER:
            push(atof(s));
            break;
        case '+':
            push(pop() + pop());
            break;
        case '*':
            push(pop() * pop());
            break;
        case '-':
            op2 = pop();
            push(pop() - op2);
            break;
        case '/':
            op2 = pop();
            if (op2 != 0.0)
                push(pop() / op2);
            else
                printf("error: zero divisor\n");
            break;
        case '\n':
            printf("\t%.8g\n", pop());
            break;
        default:
            printf("error: unknown command %s\n", s);
            break;
        }
    }
    return 0;
}
~~~

Because + and * are commutative operators, the order in which the popped operands are combined is irrelevant, but for - and / the left and right operands must be distinguished.

Now the stack functions:

~~~c
#define MAXVAL 100   /* maximum depth of val stack */

int sp = 0;          /* next free stack position */
double val[MAXVAL];  /* value stack */

/* push: push f onto value stack */
void push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
    else
        printf("error: stack full, can't push %g\n", f);
}

/* pop: pop and return top value from stack */
double pop(void)
{
    if (sp > 0)
        return val[--sp];
    else {
        printf("error: stack empty\n");
        return 0.0;
    }
}
~~~


A variable is external if it is defined outside of any function. Thus the stack and stack index that must be shared by <code>push</code> and <code>pop</code> are defined outside of these functions.

Let us now turn to the implementation of getop, the function that fetches the next operator or operand. 

~~~c
#include <ctype.h>

int getch(void);
void ungetch(int);

/* getop: get next operator or numeric operand */
int getop(char s[])
{
    int i, c;

    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if (!isdigit(c) && c != '.')
        return c;      /* not a number */
    i = 0;
    if (isdigit(c))    /* collect integer part */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.')      /* collect fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);
    return NUMBER;
}
~~~

<code>getch</code> delivers the next input character to be considered; <code>ungetch</code> remembers the characters put back on the input, so that subsequent calls to <code>getch</code> will return them before reading new input.

~~~c
#define BUFSIZE 100

char buf[BUFSIZE];  /* buffer for ungetch */
int bufp = 0;       /* next free position in buf */

int getch(void)     /* get a (possibly pushed back) character */
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

void ungetch(int c) /* push character back on input */
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
~~~

How they work together is simple. <code>ungetch</code> puts the pushed-back characters into a shared buffer, a character array. <code>getch</code> reads from the buffer if there is anything there, and calls <code>getchar</code> if the buffer is empty. There must also be an index variable that records the position of the current character in the buffer.

Since the buffer and the index are shared by getch and ungetch and must retain their values between calls, they must be external to both routines.

## Scope Rules

The scope of a name is the part of the program within which the name can be used. For an automatic variable declared at the beginning of a function, the scope is the function in which the name is declared. Local variables of the same name in different functions are unrelated. The same is true of the parameters of the function, which are in effect local variables.
The scope of an external variable or a function lasts from the point at which it is declared to the end of the file being compiled.

On the other hand, if an external variable is to be referred to before it is defined, or if it is defined in a different source file from the one where it is being used, then an extern declaration is mandatory.

It is important to distinguish between the _declaration_ of an external variable and its _definition_. A _declaration_ announces the properties of a variable (primarily its type); a _definition_ also causes storage to be set aside.

If the lines
~~~c
int sp;
double val[MAXVAL];
~~~

appear outside of any function, they define the external variables _sp_ and _val_, cause storage to be set aside, and also serve as the declaration for the rest of that source file. On the other hand, the lines
~~~c
extern int sp; 
extern double val[];
~~~
declare for the rest of the source file that _sp_ is an int and that _val_ is a double array (whose size is determined elsewhere), but they do not create the variables or reserve storage for them.

There must be only one definition of an external variable among all the files that make up the source program; other files may contain extern declarations to access it.

## Header Files
Let us now consider dividing the calculator program into several source files, as it might be if each of the components were substantially bigger. 

* <code>main.c</code>: main function
* <code>stack.c</code>: push, pop and their variables 
* <code>getop.c</code>: getop function
* <code>getch.c</code>:  getch and ungetch


Definitions and declarations should be centralized, so that there is only one copy to get right and keep right as the program evolves. Accordingly, we will place this common material in a header file, <code>calc.h</code>, which will be included as necessary.

~~~c
 // calc.h:
#define NUMBER '0' 
void push(double); 
double pop(void); 
int getop(char []); 
int getch(void); 
void ungetch(int);
~~~

So the main program becomes

~~~c
#include <stdio.h> 
#include <stdlib.h> 
#include "calc.h" 
#define MAXOP 100 
main() {
    ...;
}
~~~

and the other files similarly include <code>calc.h</code> (**notice the double quotes**, used here instead of angle brackets because it is a local header):

~~~c
#include <stdio.h>
#include <ctype.h>
#include "calc.h"

getop()
{ .... }
~~~

Not every file needs to include <code>calc.h</code>: which one could do without it?

For larger programs, more organization and more headers would be needed.

## Static variables

The static declaration, applied to an external variable or function, limits the scope of that object to the rest of the source file being compiled.

Static storage is specified by prefixing the normal declaration with the word _static_. For example:

~~~c
#include <stdio.h>

#define BUFSIZE 100

static char buf[BUFSIZE]; /* buffer for ungetch */
static int bufp = 0;      /* next free position in buf */

int getch(void)           /* get a (possibly pushed back) character */
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

void ungetch(int c)       /* push character back on input */
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
~~~

The same can be done for the stack variables shared by <code>push</code> and <code>pop</code>:

~~~c
#include <stdio.h>

#define MAXVAL 100

static int sp = 0;          /* next free stack position */
static double val[MAXVAL];  /* value stack */

/* push: push f onto value stack */
void push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
    else
        printf("error: stack full, can't push %g\n", f);
}

/* pop: pop and return top value from stack */
double pop(void)
{
    if (sp > 0)
        return val[--sp];
    else {
        printf("error: stack empty\n");
        return 0.0;
    }
}
~~~

If a function is declared _static_,  its name is invisible outside of the file in which it is declared.
 
The _static_ declaration can also be applied to internal variables.  Internal _static_ variables provide private, permanent storage within a single function.

## Register Variables


A _register_ declaration advises the compiler that the variable in question will be heavily used. The idea is that _register_ variables are to be placed in machine registers, which may result in smaller and faster programs.

~~~c
register int x;
register char c;
~~~

The _register_ declaration can only be applied to automatic variables and to the formal parameters of a function.

~~~c
f(register unsigned m, register long n)
{
    register int i;
    .... 
}
~~~

## Block Structure

Variables can be defined in a block-structured fashion within a function. Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function.

~~~c
if (n>0) {
    int i; /* declare a new i */
    for (i=0; i<n; i++)
        ....; 
}
~~~

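This _i_ is unrelated to any _i_ outside the block.

Automatic variables, including formal parameters, also hide external variables of the same name. The following is therefore legal, although reusing names in this way is easy to get wrong: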

~~~c
int x; 
int y;

f(double x) {
    double y;
    .... 
}
~~~

## Initialization

In the absence of explicit initialization, external and static variables are guaranteed to be initialized to zero; automatic and register variables have undefined (i.e., garbage) initial values.

Scalar variables may be initialized when they are defined, by following the name with an equals sign and an expression:

~~~c
int x=1;
char squote='\'';
long day=1000L*60L*60L*24L;
~~~

For automatic and register variables, initialization is done each time the function or block is entered.

For automatic and register variables, the initializer is not restricted to being a constant: it may be any expression involving previously defined values, even function calls.

~~~C
int binsearch(int x, int v[], int n)
{
    int low = 0;
    int high = n - 1;
    int mid;
    ....
}
~~~

An array may be initialized by following its declaration with a list of initializers enclosed in braces and separated by commas.

~~~c

int days[]={31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
char pattern[]="ould";
~~~

## Recursion

C functions may be used recursively; that is, a function may call itself either directly or indirectly. 

When a function calls itself recursively, each invocation gets a fresh set of all the automatic variables, independent of the previous set.

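A good example of recursion is the quicksort sorting algorithm: given an array, one element is chosen and the others are partitioned into two subsets, those less than the partition element and those greater than or equal to it. The same process is then applied recursively to the two subsets.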


~~~c
#include <stdio.h>

void qsort1(int v[], int left, int right);

int main()
{
    int v[] = {2, 45, 65, 7, 18, 95, 4, 33};
    int l = 0;
    int r = 7;

    qsort1(v, l, r);
    for (int i = 0; i < 8; i++)
        printf("%d ", v[i]);
    printf("\n");
    return 0;
}

/* qsort1: sort v[left]...v[right] into increasing order */
void qsort1(int v[], int left, int right)
{
    int i, last;
    void swap(int v[], int i, int j);

    if (left >= right)  /* do nothing if the array contains */
        return;         /* fewer than two elements */
    swap(v, left, (left + right) / 2);  /* move partition element to v[left] */
    last = left;
    for (i = left + 1; i <= right; i++) /* partition */
        if (v[i] < v[left])
            swap(v, ++last, i);
    swap(v, left, last);                /* restore partition element */
    qsort1(v, left, last - 1);
    qsort1(v, last + 1, right);
}

/* swap: interchange v[i] and v[j] */
void swap(int v[], int i, int j)
{
    int temp;

    temp = v[i];
    v[i] = v[j];
    v[j] = temp;
}
~~~

~~~text
2 4 7 18 33 45 65 95
~~~

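Recursion may provide no saving in storage, since somewhere a stack of the values being processed must be maintained. Nor will it be faster. Recursive code is, however, more compact, and often much easier to write and understand than the non-recursive equivalent.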

## The C Preprocessor

C provides certain language facilities by means of a preprocessor, which is conceptually a separate first step in compilation.


File _inclusion_ makes it easy to handle collections of <code>#define</code>s and declarations (among other things). Any source line of the form

~~~C
#include "filename"  /*local file*/
~~~
or
~~~c
#include <filename> /* searched for in the system include directories */
~~~

is replaced by the contents of the file _filename_. <code>#include</code> is the preferred way to tie the declarations together for a large program.

A _macro_ is  defined in the form
~~~c
#define name replacement text
~~~

It calls for a macro substitution of the simplest kind: subsequent occurrences of the token _name_ will be replaced by the _replacement text_.

The scope of a name defined with #define is from its point of definition to the end of the source file being compiled. A definition may use previous definitions.

Examples of define:

~~~c
#define forever for(;;) /* infinite loop */
~~~

or 

~~~c
#define  max(A, B)  ((A)>(B)?(A):(B))
~~~

Although it looks like a function call, a use of max expands into in-line code, so the line

~~~c
x=max(p+q, r+s);
~~~
is replaced by the preprocessor with:

~~~c
x=((p+q)>(r+s)?(p+q):(r+s));
~~~

If you examine the expansion of max, you will notice some pitfalls. The expressions are evaluated twice; this is bad if they involve side effects like increment operators or input and output. For instance, the following increments the larger value twice:

~~~c
max(i++,j++) /*WRONG*/
~~~

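Some care also has to be taken with parentheses to make sure the order of evaluation is preserved. Consider the macro

~~~c
#define square(x) x*x /*WRONG*/
~~~

when it is invoked as <code>square(z+1)</code>: it expands to <code>z+1*z+1</code>.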

Names may be undefined with <code>#undef</code>, usually to ensure that a routine is really a function, not a macro:

~~~c
#undef getchar
int getchar (void) { ...}
~~~

If a parameter name is preceded by a _#_ in the replacement text, the combination will be expanded into a quoted string with the parameter replaced by the actual argument.
For example:

~~~c
#define dprint(expr) printf(#expr " = %g\n", expr)
~~~

when invoked

~~~c
dprint(x/y);
~~~

it is expanded into

~~~c
printf("x/y" " = %g\n", x/y);
~~~

and the strings are concatenated, so the effect is 

~~~c
printf("x/y = %g\n", x/y);
~~~

The preprocessor operator _##_ provides a way to concatenate actual arguments during macro expansion.

~~~c
#define paste(front, back) front ## back
~~~

so paste(name,1) creates the token _name1_.

**Exercise 4-14.** Define a macro swap(t,x,y) that interchanges two arguments of type t. (Block structure will help.)

It is possible to control preprocessing itself with conditional statements that are evaluated during preprocessing. This provides a way to include code selectively, depending on the value of conditions evaluated during compilation.

The _#if_ line evaluates a constant integer expression (which may not include sizeof, casts, or enum constants). If the expression is non-zero, subsequent lines until an _#endif_ or _#elif_ or _#else_ are included.

For example, to make sure that the contents of a file _hdr.h_ are included only once

~~~C
#if !defined(HDR) 
#define HDR
/* contents of hdr.h go here */ 

#endif
~~~~

or, more concisely, using the _#ifndef_ directive:

~~~c
#ifndef  HDR
#define  HDR
      /* contents of hdr.h go here */
#endif
~~~

As another example, this sequence tests the name _SYSTEM_ to decide which version of a header to include:

~~~c
#if SYSTEM==SYSV
    #define  HDR  "sysv.h"
#elif SYSTEM==BSD
    #define  HDR  "bsd.h"
#elif SYSTEM==MSDOS
    #define  HDR  "msdos.h"
#else
    #define  HDR  "default.h"
#endif

#include  HDR
~~~