# Lecture 6 : Arrays

# Part 1 : Introduction to Arrays

## Here is an example illustrating basic C array allocation, initialization, and usage.

In [1]:
%%writefile arrays.c
#include <stdio.h>

int main () {
    int a[2] = { 1, 2 };
    int b[] = { 1, 2, 3 }; // size of array inferred
    int c[4] = { 1, 2 }; // missing elements initialized to 0
    int d[5] = { 0 }; // initialize all elements to 0
    printf ("a = [ %d %d ]\n\n",a[0],a[1]);
    printf ("b = [ %d %d %d ]\n\n",b[0],b[1],b[2]);
    printf ("c = [ %d %d %d %d ]\n\n",c[0],c[1],c[2],c[3]);
    printf ("d = [ %d %d %d %d %d ]\n\n",d[0],d[1],d[2],d[3],d[4]);
}

Overwriting arrays.c


In [2]:
!gcc -o arrays arrays.c

In [3]:
!./arrays

a = [ 1 2 ]

b = [ 1 2 3 ]

c = [ 1 2 0 0 ]

d = [ 0 0 0 0 0 ]



## Here is an example illustrating that C arrays are not automatically initialized.

## If you do not initialize a local array, it will contain whatever values happen to be in that memory.  These values can depend on what code ran before you access the uninitialized array!

## Note that this example also includes C code for generating random integers.

In [4]:
%%writefile random.c
#include <stdio.h>
#include <stdlib.h> // for srandom and random functions
#include <time.h> // for time function

void random3 () {
    long long x = random();
    long long y = random();
    long long z = random();
    long long sum = x + y + z;
    printf ("here is the sum of three random integers : %lld\n\n",sum);
}

void fun_array () {
    int a[3]; // warning: arrays in C are not automatically initialized!
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);
}

int main () {
    srandom(time(NULL)); // seed random number generator
    fun_array();
    random3();
    fun_array();
}

Overwriting random.c


In [5]:
!gcc -o random random.c

In [6]:
!./random

a = [ 0 0 0 ]

here is the sum of three random integers : 1684863820

a = [ 0 309859899 0 ]



## Exercise : Change the fun_array function so that a is initialized to contain all zeros.

## Local arrays like the ones above are allocated on the stack, which is only good practice for small arrays.  

## Note that starting with C99 the size of an array can be determined at runtime (a *variable-length array*).  

## However as shown below, using this new C feature can be quite risky!

In [7]:
%%writefile bigarray.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** argv) {
    if (argc < 2) {
        printf ("command usage: %s %s\n",argv[0],"size");
        return 1;
    }
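    int size = atoi(argv[1]);
    if (size < 1) { // a variable-length array must have a positive size
        printf ("size must be positive\n");
        return 1;
    }
    int A[size]; // starting in C99 array size can be a variable
    printf ("A uses %zu bytes of storage\n",sizeof(A)); // sizeof gives a size_t, printed with %zu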
    for (int i=0;i<size;i++) {
        A[i] = i+1;
    }
    printf ("last element of A is %d\n",A[size-1]);
}

Writing bigarray.c


In [8]:
!gcc -o bigarray bigarray.c

## Note that we can redirect the output to a file.

In [9]:
!./bigarray 1000000 > out.txt
!cat out.txt

A uses 4000000 bytes of storage
last element of A is 1000000


## Thus storing an array of 1 million ints requires 4 million bytes, about 4 megabytes (each int uses 4 bytes of storage on this system).

In [10]:
!./bigarray 2500000 > out.txt
!cat out.txt

/bin/bash: line 1: 16249 Segmentation fault      (core dumped) ./bigarray 2500000 > out.txt


## **A segmentation fault in C is a runtime error that occurs when a program tries to access memory that it is not allowed to access.**

## Use *ulimit -s* to check the stack size.  The result is in kilobytes, where one kilobyte here means 1024 bytes (not 1000).  

In [11]:
!ulimit -s

8192


## Exercise : Using the stack size (in kilobytes) reported above, explain why we could allocate and use an array of 1 million ints on the stack but not an array of 2.5 million ints.  

## **For large arrays we will need to use dynamic memory allocation.**  

## Memory allocated dynamically is put on the heap, which is generally much larger than the stack.  

## We will discuss dynamic memory allocation later in the course.

# Part 2 : Arrays and Pointers

## Here is an example illustrating how pointers can be used with arrays.  

In [12]:
%%writefile arrayptrs.c
#include <stdio.h>

int main () {
    int a[3] = { 1, 2, 3 };
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

    int* b = a; // b is an integer pointer that points to the beginning of a
    *b = 4;
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

    b[1] = 5; // we can also use the "array syntax" for the pointer b!
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

    int* c = b+2; // using "pointer arithmetic" -> int* c = &(b[2])
    *c = 6;
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

}

Writing arrayptrs.c


In [13]:
!gcc -o arrayptrs arrayptrs.c

In [14]:
!./arrayptrs

a = [ 1 2 3 ]

a = [ 4 2 3 ]

a = [ 4 5 3 ]

a = [ 4 5 6 ]



## Here is an example illustrating how arrays are passed to functions in C.

## One big thing to remember is that **arrays are passed by pointer in C** for efficiency reasons (copying a large array on every function call would be very expensive).  The function receives a pointer to the array's first element.

In [15]:
%%writefile arrayfun.c
#include <stdio.h>

// arrays in C are always passed by pointer
void swap (int* b) {
    int temp = b[0];
    b[0] = b[1];
    b[1] = temp;
}

int main () {
    int a[2] = { 1, 2 };
    printf ("a = [ %d %d ]\n\n",a[0],a[1]);

    swap(a); // pass a pointer to the beginning of a (i.e. &(a[0]))
    printf ("a = [ %d %d ]\n\n",a[0],a[1]);
}

Writing arrayfun.c


In [16]:
!gcc -o arrayfun arrayfun.c

In [17]:
!./arrayfun

a = [ 1 2 ]

a = [ 2 1 ]



## The last two examples illustrate that arrays and pointers are quite similar in C.

## However, there are some big differences.  

## An array declaration such as **int a[3]** also sets aside memory to store 3 integers.  

## A pointer declaration such as **int* b** only sets aside memory to store the pointer b itself, not any integers for it to point to.  

## A pointer must be set to point to a variable (or array) before it can be dereferenced.  

## Dereferencing an uninitialized pointer usually causes a **segmentation fault** since the pointer is likely pointing to memory that is illegal to access.  

## In particular, on typical systems such as Linux, dereferencing a **null pointer** causes a segmentation fault (formally, the C standard says the behavior is undefined).  

In [18]:
%%writefile danger.c
#include <stdio.h>

int main () {
    int* a;
    a[0] = 3; // dereferencing an uninitialized pointer!
    printf ("a[0] = %d\n",a[0]);
}

Writing danger.c


In [19]:
!gcc -o danger danger.c

In [20]:
!./danger > out.txt

/bin/bash: line 1: 16278 Segmentation fault      (core dumped) ./danger > out.txt


# Part 3 : Sample Standard Deviation

## Here is a C program that computes the average and sample standard deviation of real numbers read from *stdin*.  For the standard deviation we use the formula:

$$\sigma = \sqrt{ \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2}$$

## where

$$\bar{x} = \displaystyle\frac{1}{N} \displaystyle\sum_{i=1}^N x_i$$

## In this case we will have to store the data in an array since we need to make two passes over it.
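
## For example, with the hypothetical data 1, 2, 3, 4, 5 (so N = 5), the first pass computes the mean and the second pass uses it:

$$
\begin{aligned}
\text{pass 1:}\quad \bar{x} &= \frac{1}{5}(1+2+3+4+5) = \frac{15}{5} = 3 \\
\text{pass 2:}\quad \sum_{i=1}^{5} (x_i - \bar{x})^2 &= (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10 \\
\sigma &= \sqrt{\frac{10}{5-1}} = \sqrt{2.5} \approx 1.58
\end{aligned}
$$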

## Note that we set a maximum number of points to prevent a potential stack overflow!

In [21]:
%%writefile stdev.c
#include <stdio.h>
#include <math.h>

#define MAX_POINTS 100000

int main () {
    float data[MAX_POINTS];
    float next;
    int num_points = 0;
    while (scanf("%f",&next) == 1) {
        if (num_points < MAX_POINTS) {
            data[num_points++] = next; // increment num_points after reading it
        } else {
            printf ("too many data points!\n");
            return 1;
        }
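    }
    if (num_points < 2) { // the sample standard deviation divides by num_points-1
        printf ("need at least 2 data points!\n");
        return 1;
    }
    float sum = 0;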
    for (int i=0;i<num_points;i++) {
        sum += data[i];
    }
    float mean = sum/num_points;
    printf ("mean = %.2f\n",mean);
    float sum_sqs = 0;
    for (int i=0;i<num_points;i++) {
        sum_sqs += (data[i]-mean)*(data[i]-mean);
    }
    float var = sum_sqs/(num_points-1);
    printf ("standard deviation = %.2f\n",sqrt(var));
}

Overwriting stdev.c


In [22]:
!gcc -o stdev stdev.c -lm

In [23]:
!echo 86.5 81.0 92.5 86.5 74.5 57.5 76.5 94.5 66.5 98.5 23.5 47.5 74.5 77.5 88.0 | ./stdev

mean = 75.03
standard deviation = 19.83


# Part 4 : Bubble Sort

## Here is a C program that bubble sorts the input data.

In [24]:
%%writefile bubble.c
#include <stdio.h>
#include <stdlib.h>

#define MAX_POINTS 100000

// bubble sort (returns the number of rounds needed)
int bubble (int* data, int num_points) {
    int done = 0;
    int round = 0;
    while (!done) {
        round += 1;
        done = 1;
        for (int i=0;i<num_points-round;i++) {
            if (data[i] > data[i+1]) {
                int temp = data[i];
                data[i] = data[i+1];
                data[i+1] = temp;
                done = 0;
            }
        }
    }
    return round;
}

int main () {
    int data[MAX_POINTS];
    int next;
    int num_points = 0;
    while (scanf("%d",&next) == 1) {
        if (num_points < MAX_POINTS) {
            data[num_points++] = next;
        } else {
            printf ("too many data points!\n");
            return 1;
        }
    }
    int num_rounds = bubble(data,num_points);
    printf ("num_rounds of bubble sort: %d\n",num_rounds);
    for (int i=0;i<num_points;i++) {
        printf ("%d\n",data[i]);
    }
}

Overwriting bubble.c


In [25]:
!gcc -o bubble bubble.c

In [26]:
!echo 1 2 3 4 | ./bubble

num_rounds of bubble sort: 1
1
2
3
4


In [27]:
!echo 4 3 2 1 | ./bubble

num_rounds of bubble sort: 4
1
2
3
4


In [28]:
!echo -45 65 -121 32 -456 34 -213423 3434 4533 -343 | ./bubble

num_rounds of bubble sort: 8
-213423
-456
-343
-121
-45
32
34
65
3434
4533


In [29]:
# clone cmda3634_materials repo to download a dataset
!git clone https://code.vt.edu/jasonwil/cmda3634_materials.git

fatal: destination path 'cmda3634_materials' already exists and is not an empty directory.


In [30]:
# copy the 100000 point dataset to our working directory
!cp cmda3634_materials/L06/* .

## Compile with optimization turned on (*-O3*) and with code tuned for the CPU of the machine doing the compiling (*-march=native*).

In [31]:
!gcc -O3 -march=native -o bubble bubble.c

In [32]:
!time cat num100k.txt | ./bubble > bubble100k.txt


real	0m32.474s
user	0m31.917s
sys	0m0.037s


In [33]:
!head -5 bubble100k.txt

num_rounds of bubble sort: 99850
-999983249
-999965924
-999942764
-999934988


## We use the Linux command *sed* (stream editor) to delete the first line (the num_rounds line) of the bubble sorted file.

In [34]:
!sed '1d' bubble100k.txt > sorted100k.txt

In [35]:
!time cat sorted100k.txt | ./bubble > bubble100k.txt


real	0m0.046s
user	0m0.036s
sys	0m0.008s


In [36]:
!head -5 bubble100k.txt

num_rounds of bubble sort: 1
-999983249
-999965924
-999942764
-999934988


## Note that for bubble sort the difference between the best case (file is already sorted) and the worst case (need all or almost all rounds) is huge!
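
## To see why, count comparisons. In round r the inner loop of bubble compares n - r adjacent pairs. A reverse-sorted input needs n rounds (the last round finds no swaps), and the random file above needed 99850 rounds, close to that worst case. For n = 100000 points:

$$
\begin{aligned}
\text{best case (1 round):}\quad & n - 1 = 99{,}999 \text{ comparisons} \\
\text{worst case } (n \text{ rounds}):\quad & \sum_{r=1}^{n} (n - r) = \frac{n(n-1)}{2} = 4{,}999{,}950{,}000 \text{ comparisons}
\end{aligned}
$$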

## As fast as C is, it cannot overcome a bad algorithm.

## We will learn about a much faster way to sort numbers later in the course!

## Exercise 1 : Write a program called *median* that finds the median of a dataset containing real numbers.  You can assume that the dataset has a maximum of 100000 points.  

### Hint: You can start by bubble sorting the data.