# Lecture 3 : Introduction to C Programming

## Here are some major reasons for learning C (and why we use C in CMDA 3634):

* ### C code typically runs much faster than code written in high level languages such as Python and R.

* ### C code typically runs faster than Java code although the speedups are not as extreme as when comparing C to Python.

* ### Java and C share a large amount of syntax (in fact Java's syntax is derived from C and C++).

* ### It is good to start with C if you want to learn more advanced programming languages such as C++.

* ### Python and C can be combined to get the performance of C along with the high level programming of Python.  

* ### Supercomputers are usually programmed in C, C++, or Fortran (with extensions to handle parallel execution).

* ### Most popular parallel computing libraries such as OpenMP, MPI, and CUDA work best with C or C++.

# Part 1 : Our First C Program

## We start by creating a C program to print *Hello World!*.

In [1]:
%%writefile hello.c
#include <stdio.h>

int main () {
    printf ("Hello World!\n");

    /* program completed successfully */
    // this return statement is optional in C99
    return 0;
}

Writing hello.c


## Notes:
* ### Line 1 is the magic command that creates the source file *hello.c*.  This line is only needed if you want to write C source code within a Jupyter notebook.  
* ### Line 2 instructs the C preprocessor to include the **header file** that includes the interfaces to the standard input/output functions such as *printf*.  
* ### Lines 4-10 are the main function that is run when the compiled program is executed.  Every C program **must** have a main function.  
* ### Line 5 prints the message using the *printf* function which stands for **print formatted**.  The \n at the end of the "Hello World!\n" **string literal** (a sequence of characters or escape sequences enclosed in double quotation mark symbols) stands for **new line**.
* ### Lines 7-8 are C comments.  The C++/Java style comment syntax used in line 8 is valid in C99 and later.
* ### Line 9 returns 0 which indicates that the program completed successfully.  We can use, for example, *return 1*, to indicate that the program encountered an error and did not complete successfully.  In C99 and later, this statement is optional: if execution reaches the end of the main function without a return statement, the return value is automatically 0.  
* ### **Going forward we will omit the *return 0* statement at the end of main functions for brevity.**

## The file *hello.c* is called a C **source file**.  It must be compiled into a C **program** using a C **compiler**.

In [2]:
!gcc -o hello hello.c

## Notes:

* ### We use the **gcc** (GNU Compiler Collection) compiler.  

* ### Linux commands such as *gcc* given inside of a Jupyter notebook have to be preceded by the ! symbol.

* ### The *-o hello* part of the compilation command names the program.  If this part is not included, the program will be called *a.out*.


## Finally, we can run the *hello* program created by the compiler from the C source code hello.c using the command *!./hello*

In [3]:
!./hello

Hello World!


# Part 2 : Determining if a number is prime in C


## One way to determine if an integer $n \geq 2$ is prime is to check all integers $d$ between 2 and $\sqrt{n}$ to see if any are a factor of $n$.

## Why do we not have to look for factors $d$ larger than $\sqrt{n}$?
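
## One way to see this (a short sketch of the standard argument; the symbols $a$ and $b$ are introduced here for illustration): if $n$ is not prime, it can be written as a product of two factors $a \le b$, both larger than 1, and then

$$\large{n = ab, \quad 1 < a \le b \quad \Longrightarrow \quad a^2 \le ab = n \quad \Longrightarrow \quad a \le \sqrt{n}}$$

## So a composite $n$ always has a factor no larger than $\sqrt{n}$, and if no such factor exists then $n$ must be prime.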

## Fortunately, we can implement this algorithm without explicitly calculating $\sqrt{n}$.



## Here is our first attempt at a primality test in C.

In [4]:
%%writefile prime_v1.c
#include <stdio.h>

int main () {
    int n = 1234567;
    for (int d = 2; d*d <= n; d++) {
        if (n % d == 0) {
            printf ("The number %d is not prime since %d divides it.\n",n,d);
            return 0;
        }
    }
    printf ("The number %d is prime.\n",n);
}

Writing prime_v1.c


## Notes:

* ### Lines 6-11 contain a C for loop.  The variable *d* is called a loop counter.  For loops in C have the same syntax and behavior as for loops in Java.  Note that by ending the loop when $d^2 > n$ we avoid computing $\sqrt{n}$.
* ### Lines 7-10 contain a C if statement.  If statements have the same syntax and behavior as if statements in Java.  
* ### In line 7 we check to see if d divides n by using the mod operator.  
* ### **Note that we use == to check for equality rather than =**
* ### Line 8 uses the *printf* function to print that n is not prime.  Note that *%d* is the C format specifier for **int**.  Also note that we can use printf with multiple format specifiers and arguments.
* ### In line 9 we use *return 0* to exit the main function with a successful termination.  
* ### If we make it to line 12, then $n$ is prime.

In [5]:
!gcc -o prime_v1 prime_v1.c

In [6]:
!./prime_v1

The number 1234567 is not prime since 127 divides it.


## Exercise 1

* ### Recompile and run the program with $n=161218349$.  What do you observe?

* ### It is known that the number $n=5261656080911617$ is prime.  Recompile and run the program with this value of $n$.  What do you observe?

# Part 3 : Handling Large Integers

## A C int has 32 bits of storage on most current platforms (a Java int always has 32 bits).

* ### One of the 32 bits is a sign bit.
* ### A C int has a range of $-2^{31}$ to $2^{31}-1$ or $-2147483648$ to $2147483647$.

## A C long long has at least 64 bits of storage, and exactly 64 on most current platforms (a Java long always has 64 bits).  

* ### One of the 64 bits is a sign bit.  
* ### A C long long has a range of $-2^{63}$ to $2^{63}-1$ or $-9223372036854775808$ to $9223372036854775807$.

## Here is a modification of our primality tester that handles larger $n$.

In [7]:
%%writefile prime_v2.c
#include <stdio.h>

int main () {
    long long n = 5261656080911617;
    for (long long d = 2; d*d <= n; d++) {
        if (n % d == 0) {
            printf ("The number %lld is not prime since %lld divides it.\n",n,d);
            return 0;
        }
    }
    printf ("The number %lld is prime.\n",n);
}

Writing prime_v2.c


## Notes:
* ### On lines 8 and 12 we use the format specifier *%lld* for variables of type *long long*.
* ### In Java's version of printf we use %d for variables of type *int* and variables of type *long*.

In [8]:
!gcc -o prime_v2 prime_v2.c

In [9]:
!./prime_v2

The number 5261656080911617 is prime.


# Part 4 : Command Line Arguments

## Command line arguments allow us to alter the behavior of our program at runtime.  

## Here is a C program that prints out its command line arguments (one per line).  

In [10]:
%%writefile args.c
#include <stdio.h>

int main (int argc, char** argv) {
    for (int i=0;i<argc;i++) {
        printf ("%s\n",argv[i]);
    }
}

Writing args.c


## Notes:
* ### Line 4 includes the optional arguments *argc* and *argv*.  The variable *argc* tells us the number of command line arguments and *argv* is an array of pointers to the command line arguments.  We will discuss arrays and pointers in detail later.
* ### One key difference between arrays in C versus Java is that arrays in C are not objects and do not know how long they are.  
* ### Thus when using arrays in C we almost always need a separate variable that contains the array length.  
* ### Lines 5-7 contain a C for loop.  Note that C (like Java and Python) is a zero-based language which is why we start the loop counter at 0 and go up to argc-1.  
* ### Line 6 uses the *printf* function to print one command line argument per line.  Note that *%s* is the C format specifier for **string**.  Unlike Java, C does not have a built in String datatype.  In C, strings are null-terminated arrays of characters.  We will discuss strings in detail later.


In [11]:
!gcc -o args args.c
!./args abc 123 hello world!

./args
abc
123
hello
world!


## Note that argv[0] is just the name of the C command *./args*.
## Thus the actual command line arguments are *argv[1]*, *argv[2]*, etc.
## In Java the actual command line arguments are *args[0]*, *args[1]*, etc.

## Next let's look at a C program to print a personalized Hello message.

In [12]:
%%writefile greet_v1.c
#include <stdio.h>

int main (int argc, char** argv) {
    printf ("Hello %s!  How are you?\n",argv[1]);
}

Writing greet_v1.c


In [13]:
!gcc -o greet_v1 greet_v1.c
!./greet_v1 Jason

Hello Jason!  How are you?


In [14]:
!./greet_v1

Hello (null)!  How are you?


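## Note that running the command without a command line argument gives a strange result, and no runtime error was given!  Strictly speaking, the C standard guarantees that *argv[argc]* is a null pointer, so *argv[1]* is NULL here rather than a read past the end of the *argv* array.  Passing NULL to *%s* is undefined behavior, and the C library used here happens to print *(null)*.

## More generally, C does **not** do array bounds checking: indexing beyond *argv[argc]*, or past the end of any other array, is not detected.  

## **Reading or writing past the end (or beginning) of an array in C is not guaranteed to produce a runtime error but will likely produce unexpected results.**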

## It is important to provide error checking in your code where reading/writing past the end of an array is possible.  One simple way of handling an error is to *return 1* from main which will terminate the program with an abnormal execution status.  

## Here is a version of the code with error checking.  Note that if an error is encountered we provide instructions on how to correctly use the command and abnormally terminate the program.

In [15]:
%%writefile greet_v2.c
#include <stdio.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s name\n",argv[0]);
        return 1; // abnormal exit
    }
    printf ("Hello %s!  How are you?\n",argv[1]);
}

Writing greet_v2.c


In [16]:
!gcc -o greet_v2 greet_v2.c
!./greet_v2 Jason

Hello Jason!  How are you?

In [17]:
!./greet_v2

command usage: ./greet_v2 name


## If no command line arguments are provided, the program terminates with instructions on how to use the program rather than attempt to print a greeting.

In [18]:
!./greet_v2 Jason Wilson

Hello Jason!  How are you?

## Extra command line arguments are ignored by our program.

# Part 5 : Primality Test Revisited

## Let's revise our primality test to specify $n$ using a command line argument.

### We also add an efficiency improvement by first checking to see if the given number is divisible by 2.

In [19]:
%%writefile prime_v3.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n\n",argv[0]);
        return 1; // abnormal exit
    }
    long long n = atoll(argv[1]);
    if (n > 2 && n % 2 == 0) { // 2 itself is prime
        printf ("The number %lld is not prime since 2 divides it\n",n);
        return 0;
    }
    for (long long d = 3; d*d <= n; d+=2) {
        if (n % d == 0) {
            printf ("The number %lld is not prime since %lld divides it.\n",n,d);
            return 0;
        }
    }
    printf ("The number %lld is prime.\n",n);
}

Writing prime_v3.c


## Notes:

* ### On line 3 we include *stdlib.h* which includes interfaces for the C standard library including the function *atoll* that we are using on line 10.
* ### On line 10 we use the function *atoll* to convert the first command line argument string into a C *long long*.  Other useful conversion functions are *atoi* which converts a string into a C *int* and *atof* which converts a string into a C *double* (like in Java, a C *double* is a 64-bit double precision floating point number).

In [20]:
!gcc -o prime_v3 prime_v3.c

In [21]:
!./prime_v3 5261656080911617

The number 5261656080911617 is prime.


In [22]:
!./prime_v3 3439315899953761

The number 3439315899953761 is not prime since 58645681 divides it.


In [23]:
!./prime_v3 729476671297368179

The number 729476671297368179 is prime.


## It is known that 10918483718784063109 is prime.  

## Let's run our primality tester with this very large input.

In [24]:
!./prime_v3 10918483718784063109

The number 9223372036854775807 is not prime since 7 divides it.


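## Our program gave the wrong answer because the number $10918483718784063109$ is too big to store in a C *long long*.  The *atoll* function does not report this; here it returned the largest *long long* value, $9223372036854775807$.
## Note that unlike in Java, C does not produce a runtime error in this case!
## We can add runtime error checking by using the function *strtoll* instead of *atoll*.  On an out-of-range input, *strtoll* sets the variable *errno* (declared in *errno.h*) to *ERANGE*.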

In [25]:
%%writefile prime_v4.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n\n",argv[0]);
        return 1; // abnormal exit
    }
    errno = 0; // reset error status before calling strtoll
    long long n = strtoll(argv[1],NULL,10);
    if (errno == ERANGE) {
        printf ("error: the input provided is out of range!\n");
        return 1; // abnormal exit
    }

    if (n > 2 && n % 2 == 0) { // 2 itself is prime
        printf ("The number %lld is not prime since 2 divides it\n",n);
        return 0;
    }
    for (long long d = 3; d*d <= n; d+=2) {
        if (n % d == 0) {
            printf ("The number %lld is not prime since %lld divides it.\n",n,d);
            return 0;
        }
    }
    printf ("The number %lld is prime.\n",n);
}

Writing prime_v4.c


In [26]:
!gcc -o prime_v4 prime_v4.c

In [27]:
!./prime_v4 10918483718784063109

error: the input provided is out of range!


# Part 6 : Working with Real Numbers in C

## Let's start with a program that computes the average of the command line arguments which are assumed to be integers.

In [28]:
%%writefile average_v1.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n_1 n_2 ... n_k\n",argv[0]);
        return 1; // abnormal exit
    }
    int sum = 0;
    for (int i=1;i<argc;i++) {
        int next = atoi(argv[i]);
        sum += next;
    }
    printf ("average = %d\n",sum/(argc-1));
}

Writing average_v1.c


In [29]:
!gcc -o average_v1 average_v1.c

In [30]:
!./average_v1

command usage: ./average_v1 n_1 n_2 ... n_k


In [31]:
!./average_v1 1 2 3

average = 2


## Note that the average of the numbers $1$, $2$, $3$ is indeed $2$.

In [32]:
!./average_v1 1 2 3 4

average = 2


## In this case the average should be $10/4 = 2.5$ but the program prints $2$ since we are performing integer division and printing the average as an integer.

## Since the average of a set of numbers is typically a real number, we need to use a floating point type such as a float or double to compute the average.

In [33]:
%%writefile average_v2.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n_1 n_2 ... n_k\n",argv[0]);
        return 1; // abnormal exit
    }
    int sum = 0;
    for (int i=1;i<argc;i++) {
        int next = atoi(argv[i]);
        sum += next;
    }
    float average = sum/(argc-1);
    printf ("average = %f\n",average);
}

Writing average_v2.c


## Note that on line 16 we use *%f* for the format specifier for printing a variable of float type.

In [34]:
!gcc -o average_v2 average_v2.c

In [35]:
!./average_v2 1 2 3 4

average = 2.000000


## We see that this version still computes the incorrect average.  

## Note that in line 15 we are computing the quantity sum/(argc-1) and assigning it to average.  

## However, since both variables sum and argc have *int* type, the division performed is integer division.

## The next version corrects this problem using a *typecast*.


In [36]:
%%writefile average_v3.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n_1 n_2 ... n_k\n",argv[0]);
        return 1; // abnormal exit
    }
    int sum = 0;
    for (int i=1;i<argc;i++) {
        int next = atoi(argv[i]);
        sum += next;
    }
    float average = (float)sum/(argc-1);
    printf ("average = %f\n",average);
}

Writing average_v3.c


## In line 15 we first cast the integer sum to a float before performing the division.

## Since the numerator is a float, the integer denominator is automatically converted to a float and floating point division is performed.


In [37]:
!gcc -o average_v3 average_v3.c

In [38]:
!./average_v3 1 2 3 4

average = 2.500000


## Version 3 gives us the correct answer!

## In our final version, we drop the assumption that the command line inputs are integers and change the format of our printf statement.

In [39]:
%%writefile average_v4.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n_1 n_2 ... n_k\n",argv[0]);
        return 1; // abnormal exit
    }
    float sum = 0;
    for (int i=1;i<argc;i++) {
        float next = atof(argv[i]);
        sum += next;
    }
    float average = sum/(argc-1);
    printf ("average = %.2f\n",average);
}

Writing average_v4.c


## Here are some notes on version 4.

* ### In line 10 we change sum to have type float.
* ### In line 12 we change next to have type float and use *atof* to parse the command line argument as a float.
* ### On line 15 we discard the typecast which is no longer needed since sum is a float.
* ### On line 16 we use the format specifier *%.2f* which instructs printf to print only 2 digits after the decimal point.

In [40]:
!gcc -o average_v4 average_v4.c

In [41]:
!./average_v4 1 0.75 0.25

average = 0.67


## The exact average in this case is $2/3$ which was rounded to two decimal places in our output.


# Part 7 : Float or Double?

## A C float has 32 bits and a C double has 64 bits (same as Java).

## To see the difference in accuracy between a float and a double, let's approximate the value of:

$$\large{e \approx 2.718281828459045}$$

## We can approximate e using the Taylor series formula:

$$\large{e \approx 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots}$$

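## Version 1 adds up the first n terms of the above formula and accumulates the result in a *float*.  Since the factorial is stored in a 64-bit *long long*, which cannot hold $21!$, n should be at most 20.  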

In [42]:
%%writefile approx_v1.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n\n",argv[0]);
        return 1;
    }
    int n = atoi(argv[1]);
    float approx_e = 0;
    long long fact = 1;
    for (int i=1;i<=n;i++) {
        approx_e += 1.0/fact; // use 1.0 to ensure floating point division
        fact *= i;
    }
    printf ("exact  value of e is %.15f\n",2.718281828459045);
    printf ("approx value of e is %.15f\n",approx_e);
}

Writing approx_v1.c


In [43]:
!gcc -o approx_v1 approx_v1.c

In [44]:
!./approx_v1 10

exact  value of e is 2.718281828459045
approx value of e is 2.718281745910645


## With 10 terms we estimated the value of $e$ correctly to 6 decimal digits.  

In [45]:
!./approx_v1 20

exact  value of e is 2.718281828459045
approx value of e is 2.718281984329224


## With 20 terms we still only estimated the value of $e$ correctly to 6 decimal digits.  

## Version 2 adds up the first n terms of the above formula and accumulates the result in a *double*.

In [46]:
%%writefile approx_v2.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s n\n",argv[0]);
        return 1;
    }
    int n = atoi(argv[1]);
    double approx_e = 0;
    long long fact = 1;
    for (int i=1;i<=n;i++) {
        approx_e += 1.0/fact; // use 1.0 to ensure floating point division
        fact *= i;
    }
    printf ("exact  value of e is %.15f\n",2.718281828459045);
    printf ("approx value of e is %.15f\n",approx_e);
}

Writing approx_v2.c


In [47]:
!gcc -o approx_v2 approx_v2.c

In [48]:
!./approx_v2 10

exact  value of e is 2.718281828459045
approx value of e is 2.718281525573192


## With 10 terms we estimated the value of $e$ correctly to 6 decimal digits.

In [49]:
!./approx_v2 20

exact  value of e is 2.718281828459045
approx value of e is 2.718281828459046


## With 20 terms we estimated the value of $e$ correctly to 14 decimal digits.

## The decision to use float versus double is a tradeoff between accuracy and storage/performance.

## In certain machine learning applications, high accuracy is not needed so it is common to use *float* instead of *double* (or even floating point types that use fewer than 32 bits)!  This is especially true for large scale ML algorithms implemented using GPUs where high performance is essential.