# Lecture 5 : Arrays and Strings

# Warmup Exercise : Pointer Review

## What is the following code designed to do?


In [1]:
%%writefile sort2_v1.c
#include <stdio.h>
#include <stdlib.h>

void swap (int a, int b) {
    int temp = a;
    a = b;
    b = temp;
}

int main (int argc, char* argv[]) {
    if (argc < 3) {
        printf ("command usage: %s %s %s\n",argv[0],"a","b");
        return 1;
    }
    int a = atoi(argv[1]);
    int b = atoi(argv[2]);
    if (b < a) {
        swap(a,b);
    }
    printf ("here are your two numbers sorted: %d %d\n",a,b);
}

Writing sort2_v1.c


In [2]:
!gcc -o sort2_v1 sort2_v1.c

In [3]:
!./sort2_v1 10 20

here are your two numbers sorted: 10 20


In [4]:
!./sort2_v1 20 10

here are your two numbers sorted: 20 10


## Note that the given code does not work if the two numbers need to be swapped.  

## Write a new version *sort2_v2.c* that works as expected by fixing the swap function using pointers.  

## Test your new version to make sure it works if the two numbers need to be swapped.

# Part 1 : Introduction to Arrays

## Here is an example illustrating basic array allocation, initialization, and usage.

## Note also that elements left out of an initializer list are set to 0 and that the size of an array can be inferred from its initializer list.

In [5]:
%%writefile arrays.c
#include <stdio.h>

int main () {
    int a[3] = { 1, 2, 3};
    int b[4] = { 1, 2 }; // missing elements initialized to 0
    int c[5] = { 0 }; // initialize all elements to 0
    int d[] = { 1, 2, 3, 4, 5, 6 }; // size of array inferred
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);
    printf ("b = [ %d %d %d %d ]\n\n",b[0],b[1],b[2],b[3]);
    printf ("c = [ %d %d %d %d %d ]\n\n",c[0],c[1],c[2],c[3],c[4]);
    printf ("d = [ %d %d %d %d %d %d]\n\n",d[0],d[1],d[2],d[3],d[4],d[5]);
}

Writing arrays.c


In [6]:
!gcc -o arrays arrays.c

In [7]:
!./arrays

a = [ 1 2 3 ]

b = [ 1 2 0 0 ]

c = [ 0 0 0 0 0 ]

d = [ 1 2 3 4 5 6]



## Here is an example illustrating that C arrays are not automatically initialized.

## If you do not initialize an array, it will contain whatever values happen to be in that memory.  This could depend on what code ran before you access the uninitialized array!

## Note that this example also includes C code for generating random integers.

In [8]:
%%writefile random.c
#include <stdio.h>
#include <stdlib.h> // for srandom and random functions
#include <time.h> // for time function

void random3 () {
    long long x = random();
    long long y = random();
    long long z = random();
    long long sum = x + y + z;
    printf ("here is the sum of three random integers : %lld\n\n",sum);
}

void fun_array () {
    int d[6]; // warning: arrays in C are not automatically initialized!
    printf ("d = [ ");
    for (int i=0;i<6;i++) {
        printf ("%d ",d[i]);
    }
    printf ("]\n\n");
}

int main () {
    srandom(time(NULL)); // seed random number generator
    fun_array();
    random3();
    fun_array();
}

Writing random.c


In [9]:
!gcc -o random random.c

In [10]:
!./random

d = [ 0 0 0 0 0 0 ]

here is the sum of three random integers : 3625960855

d = [ 1676751753 0 890479681 0 1058729421 0 ]



## Exercise : Change the fun_array function so that d is initialized to contain all zeros.

## Allocating arrays on the stack as in the above example is only good practice for small arrays.  

## Note that starting with C99 the size of an array can be determined at runtime.  

In [11]:
%%writefile bigarray.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char* argv[]) {
    if (argc < 2) {
        printf ("command usage: %s %s\n",argv[0],"size");
        return 1;
    }
    int size = atoi(argv[1]);
    int A[size]; // starting in C99 array size can be a variable
    printf ("an int uses %zu bytes of storage\n",sizeof(int)); // %zu is the format for size_t
    for (int i=0;i<size;i++) {
        A[i] = i+1;
    }
    printf ("last element of A is %d\n",A[size-1]);
}

Writing bigarray.c


In [12]:
!gcc -o bigarray bigarray.c

## Note that we can redirect the output to a file.

In [13]:
!./bigarray 1000000 > out.txt
!cat out.txt

an int uses 4 bytes of storage
last element of A is 1000000


## Using *sizeof* we can see that an int requires 4 bytes of storage (same as a Java int).  

## Thus storing an array of 1 million ints will require 4 megabytes or 4 million bytes.

In [14]:
!./bigarray 2500000 > out.txt
!cat out.txt

/bin/bash: line 1:   269 Segmentation fault      (core dumped) ./bigarray 2500000 > out.txt


## **A segmentation fault in C is a runtime error that occurs when a program tries to access memory that it is not allowed to access.**

## Use *ulimit -s* to check the stack size.  The result is in kilobytes (here one kilobyte is 1024 bytes).  

In [15]:
!ulimit -s

8192


## Exercise : Using the stack size (in kilobytes) shown above, explain why we could allocate and use an array of 1 million ints on the stack but not an array of 2.5 million ints on the stack.  

## **For large arrays we will need to use dynamic memory allocation.**  

## Memory allocated dynamically is put on the heap which is generally much larger than the stack.  

## We will discuss dynamic memory allocation later in the course.

# Part 2 : Arrays and Pointers

## Here is an example illustrating how pointers can be used with arrays.  

In [16]:
%%writefile arrayptrs.c
#include <stdio.h>

int main () {
    int a[3] = { 1, 2, 3 };
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

    int* b = a; // b is an integer pointer that points to the beginning of a
    *b = 4;
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

    b[1] = 5; // we can also use the "array syntax" for the pointer b!
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

    *(b+2) = 6; // using "pointer arithmetic" -> equivalent to b[2] = 6
    printf ("a = [ %d %d %d ]\n\n",a[0],a[1],a[2]);

}

Writing arrayptrs.c


In [17]:
!gcc -o arrayptrs arrayptrs.c

In [18]:
!./arrayptrs

a = [ 1 2 3 ]

a = [ 4 2 3 ]

a = [ 4 5 3 ]

a = [ 4 5 6 ]



## Here is an example illustrating how arrays are passed to functions in C.

## One big thing to remember is that **arrays are passed by pointer in C** for efficiency reasons (i.e. it would be very expensive to copy a large array every time a function is called).

In [19]:
%%writefile arrayfun.c
#include <stdio.h>

// arrays in C are always passed by pointer
void fun1 (int* b) {
    b[0] = 3;
}

// the following alternate syntax is commonly used.
// both methods of passing an array are equivalent.
void fun2 (int c[]) {
    c[1] = 4;
}

int main () {
    int a[2] = { 1, 2 };
    printf ("a = [ %d %d ]\n\n",a[0],a[1]);

    fun1(a); // pass a pointer to the beginning of a
    printf ("a = [ %d %d ]\n\n",a[0],a[1]);

    fun2(a); // pass a pointer to the beginning of a
    printf ("a = [ %d %d ]\n\n",a[0],a[1]);
}

Writing arrayfun.c


In [20]:
!gcc -o arrayfun arrayfun.c

In [21]:
!./arrayfun

a = [ 1 2 ]

a = [ 3 2 ]

a = [ 3 4 ]



## The last example illustrates that arrays and pointers are quite similar in C.

## However, there are some big differences.  

## An array declaration such as **int a[3]** also sets aside memory to store 3 integers.  

## A pointer declaration such as **int* b** only sets aside memory to store the pointer b.  

## A pointer must be set to point to a variable (or array) before it can be dereferenced.  

## Dereferencing an uninitialized pointer usually causes a **segmentation fault** since the pointer is likely pointing to memory that is illegal to access.  

## In particular, dereferencing a **null pointer** causes a segmentation fault on typical systems (formally the behavior is undefined).  

In [22]:
%%writefile danger.c
#include <stdio.h>

int main () {
    int* a;
    a[0] = 3; // dereferencing an uninitialized pointer!
    printf ("a[0] = %d\n",a[0]);
}

Writing danger.c


In [23]:
!gcc -o danger danger.c

In [24]:
!./danger > out.txt

/bin/bash: line 1:   296 Segmentation fault      (core dumped) ./danger > out.txt


# Part 2 : Characters and Strings

## The **C char type is one byte and is used to store characters** such as letters, digits, and punctuation.  

## To see which character each numeric value corresponds to we use an [ASCII-TABLE](https://www.ascii-code.com/).

In [25]:
%%writefile char.c
#include <stdio.h>

int main () {
    char c = 'A';
    printf ("c as a number is %d\n",c);
    printf ("c as a character is %c\n",c);
}

Writing char.c


In [26]:
!gcc -o char char.c

In [27]:
!./char

c as a number is 65
c as a character is A


## **In C a string is an array of char**.

In [28]:
%%writefile hello.c
#include <stdio.h>
#include <string.h>

int main () {
    char str[] = "Hello World!";
    int lower = 0;
    for (int i=0;i<strlen(str);i++) {
        if ((str[i] >= 'a') && (str[i] <= 'z')) {
            lower += 1;
        }
    }
    printf ("The string %s contains %d lower case letters\n",str,lower);
}

Writing hello.c


In [29]:
!gcc -o hello hello.c

In [30]:
!./hello

The string Hello World! contains 8 lower case letters


## Exercise : Modify the above code to also count the number of upper case letters.

## The next example illustrates that **strings in C are null terminated**.

## This null terminator is how functions such as printf and strlen know how long a string is.

In [31]:
%%writefile terminate.c
#include <stdio.h>
#include <string.h>

int main () {
    char str[] = "Go Hokies!";
    for (int i=0;i<strlen(str)+1;i++) {
        printf ("%c character has ASCII code %d\n",str[i],str[i]);
    }
}


Writing terminate.c


In [32]:
!gcc -o terminate terminate.c

In [33]:
!./terminate

G character has ASCII code 71
o character has ASCII code 111
  character has ASCII code 32
H character has ASCII code 72
o character has ASCII code 111
k character has ASCII code 107
i character has ASCII code 105
e character has ASCII code 101
s character has ASCII code 115
! character has ASCII code 33
  character has ASCII code 0


## You can create a string using an array of characters but be sure that it is null terminated!

In [34]:
%%writefile forgot.c
#include <stdio.h>
#include <string.h>

int main () {
    char str1[] = { 'H', 'e', 'l', 'l', 'o' }; // no null terminator!
    char str2[] = "other stuff";
    printf ("length of str1 is %zu\n",strlen(str1)); // undefined behavior: reads past the end of str1
    printf ("length of str2 is %zu\n",strlen(str2));
}

Writing forgot.c


In [35]:
!gcc -o forgot forgot.c

In [36]:
!./forgot

length of str1 is 16
length of str2 is 11


## Be careful when using **char* str** that you understand the memory you are pointing to!

In [37]:
%%writefile careful.c
#include <stdio.h>

int main () {
    char* str = "Hello World!";
    str[0] = 'h';
    printf ("%s",str);
}


Writing careful.c


In [38]:
!gcc -o careful careful.c

In [39]:
!./careful > out.txt
!cat out.txt

/bin/bash: line 1:   331 Segmentation fault      (core dumped) ./careful > out.txt


## Exercise : Fix the above code by changing **char* str** to **char str[]**.

## With the first version *str* is a pointer to a string literal stored in memory that can be read but not written to.  

## With the second version *str* is a read/write array that has been initialized to contain the given string of characters (plus the null terminator at the end).  

# Part 3 : Working with a list of possible Wordle answers.

## Let's use wget to grab a file containing possible Wordle answers.

In [40]:
!wget -O answers.txt https://gist.githubusercontent.com/cfreshman/a7b776506c73284511034e63af1017ee/raw/60531ab531c4db602dacaa4f6c0ebf2590b123da/wordle-nyt-answers-alphabetical.txt

--2024-01-31 22:44:30--  https://gist.githubusercontent.com/cfreshman/a7b776506c73284511034e63af1017ee/raw/60531ab531c4db602dacaa4f6c0ebf2590b123da/wordle-nyt-answers-alphabetical.txt
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13853 (14K) [text/plain]
Saving to: ‘answers.txt’


2024-01-31 22:44:30 (20.3 MB/s) - ‘answers.txt’ saved [13853/13853]



## The number of words in the file:

In [41]:
!wc -l answers.txt

2308 answers.txt


## The first 10 answers:

In [42]:
!head -10 answers.txt

aback
abase
abate
abbey
abbot
abhor
abide
abled
abode
abort


## Here is a program that searches the Wordle answer list for a given word.  

In [43]:
%%writefile search.c
#include <stdio.h>
#include <string.h>

int main (int argc, char* argv[]) {
    if (argc < 2) {
        printf ("command usage: %s %s\n",argv[0],"word");
        return 1;
    }
    char* word = argv[1];
    char next[6]; // Need 5 chars for Wordle word and 1 for null terminator.
    while (scanf("%5s",next) == 1) { // %5s tells scanf to read at most 5 characters
        if (strcmp(word,next) == 0) { // strcmp returns 0 if the strings are equal
            printf ("%s is a possible Wordle answer.\n",word);
            return 0;
        }
    }
    printf ("%s is not a possible Wordle answer.\n",word);
}

Writing search.c


In [44]:
!gcc -o search search.c

In [45]:
!cat answers.txt | ./search hello

hello is a possible Wordle answer.


In [46]:
!cat answers.txt | ./search aargh

aargh is not a possible Wordle answer.


## Here is a program that determines the most frequent letter for a given blank number.  

## The command line argument blank is a number from 0 to 4 where 0 is the first blank, 1 is the second blank, etc.

In [47]:
%%writefile frequent.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char* argv[]) {
    if (argc < 2) {
        printf ("command usage: %s %s\n",argv[0],"blank");
        return 1; // stop here: argv[1] is NULL when no argument is given
    }
    int blank = atoi(argv[1]); // blank is a number from 0 to 4
    int count[26] = { 0 };
    char next[6];
    int total_words = 0;
    while (scanf("%5s",next) == 1) {
        count[next[blank]-'a'] += 1;
        total_words += 1;
    }
    int max_count = 0;
    char most_common;
    for (int i=0;i<26;i++) {
        if (count[i] > max_count) {
            max_count = count[i];
            most_common = 'a'+i;
        }
    }
    printf ("The most frequently occurring letter in blank %d is %c.\n",
            blank,most_common);
    printf ("The letter %c occurs %d times in blank %d out of %d total words.\n",
            most_common,max_count,blank,total_words);
}

Writing frequent.c


In [48]:
!gcc -o frequent frequent.c

In [49]:
!cat answers.txt | ./frequent 0

The most frequently occurring letter in blank 0 is s.
The letter s occurs 365 times in blank 0 out of 2309 total words.


## Exercise: Add error checking to *frequent.c*.  In particular, what errors should you check for?