# THE C PROGRAMMING LANGUAGE 

by Brian W. Kernighan and Dennis M. Ritchie (1988)

C retains the basic philosophy that 

> programmers know what they are doing; it only requires that they state their intentions explicitly

**Additional resources**

1. [The C Programming Language: A Very Brief Introduction](https://www.linfo.org/c.html) (LINFO)

1. "C Programming: A Modern Approach" by K.N. King - This book is often recommended as a modern alternative to Kernighan and Ritchie's book. It covers C programming in a clear and comprehensive manner, and also includes modern practices and techniques.

1. "The C Programming Language" (2nd Edition) by Brian W. Kernighan and Dennis M. Ritchie - This is the updated version of the original book, revised to describe C as defined by the ANSI standard (C89). It predates C99, so it does not cover later additions to the language. It's a great choice if you want to stick with the same authors and writing style.

1. "Head First C" by David Griffiths and Dawn Griffiths - This book takes a unique and engaging approach to teaching C programming, and covers modern C language features and best practices.

# <b>0. Introduction</b>

Many of the important ideas of C stem from the language BCPL, developed by Martin Richards. The influence of BCPL on C proceeded indirectly through the language B, which was written by Ken Thompson in 1970 for the first UNIX system on the DEC PDP−7.

BCPL and B are **typeless** languages. By contrast, C provides a variety of data types. The fundamental types are characters, and integers and floating point numbers of several sizes. In addition, there is a hierarchy of derived data types created with pointers, arrays, structures and unions. Expressions are formed from operators and operands; any expression, including an assignment or a function call, can be a statement. Pointers provide for machine−independent address arithmetic.

## Control Flow

C provides the fundamental control−flow constructions required for well−structured programs: 
- statement grouping, 
- decision making (if−else), 
- selecting one of a set of possible values (switch), 
- looping with the termination test 
    - at the top (while, for) or 
    - at the bottom (do), and 
- early loop exit (break).

## Functions

Functions may return values of 
- basic types, 
- structures, 
- unions, or 
- pointers. 

Any function may be called recursively. 

Local variables are typically **automatic**, or created anew with each invocation. Function definitions may not be nested but variables may be declared in a block−structured fashion. 

The functions of a C program may exist in separate source files that are compiled separately. 

Variables may be 
- internal to a function, 
- external but known only within a single source file, or 
- visible to the entire program.

A preprocessing step performs 
- macro substitution on program text, 
- inclusion of other source files, and 
- conditional compilation.

## Low-level language

C is a relatively **low−level** language. This characterization is not pejorative; it simply means that C deals with the same sort of objects that most computers do, namely 
- characters, 
- numbers, and 
- addresses. 

These may be combined and moved about with the arithmetic and logical operators implemented by real machines.

C provides no operations to deal directly with composite objects such as 
- character strings, 
- sets, 
- lists or 
- arrays. 

There are no operations that manipulate an entire array or string, although structures may be copied as a unit. 

The language does not define any storage allocation facility other than static definition and the stack discipline provided by the local variables of functions; there is no heap or garbage collection. 

Finally, C itself provides no input/output facilities; there are no `READ` or `WRITE` statements, and no built−in file access methods. All of these higher−level mechanisms must be provided by explicitly called functions. Most C implementations have included a reasonably standard collection of such functions.

Similarly, C offers only straightforward, single−thread control flow: 
- tests, 
- loops, 
- grouping, and 
- subprograms, 

but not multiprogramming, parallel operations, synchronization, or coroutines.

Since C is relatively small, it can be described in small space, and learned quickly. A programmer can reasonably expect to know and understand and indeed regularly use the entire language.

### Heap

ChatGPT:

In the context of programming, the "heap" refers to a section of a computer's memory where dynamic memory allocation occurs. When a program needs to allocate memory at runtime (i.e., when the size of the memory needed is not known until the program is running), it will typically request memory from the heap.

In languages with manual memory management, such as C, the programmer is responsible for allocating and freeing memory from the heap. This means that the programmer must explicitly request memory from the heap when it is needed and also release it when it is no longer required to prevent memory leaks.

In contrast, languages with garbage collection, such as Java or C#, automatically manage memory allocation and deallocation for the programmer, freeing them from the responsibility of explicitly releasing memory. This helps to prevent memory leaks and makes memory management less error-prone.

So when Kernighan and Ritchie say that "there is no heap or garbage collection" in C, they mean that the language itself defines no such facility: dynamic memory is obtained by calling library functions such as `malloc` and `free`, and the programmer must handle allocation and deallocation explicitly.

# <b>1. A Tutorial Introduction</b>

## 1.1 Hello, world!

In [1]:
#include <stdio.h>

int main() {

    printf("Hello, world!\n");
    return 0;

}

Hello, world!


Ways of compiling in the terminal emulator:

```sh
cc hello.c
# or
gcc hello.c

./a.out
```

Or name the executable with the `-o` option, shown here with `clang`:
```sh
clang -o hello hello.c
./hello
```

which is more convenient, as you can name the compiled file as you want; `cc` and `gcc` accept `-o` as well.

### Escape sequences

|sequence|description|
|-|-|
|`\b`|backspace|
|`\?`|question mark|
|`\f`|formfeed|
|`\'`|single quote|
|`\n`|newline|
|`\"`|double quote|
|`\r`|carriage return|
|`\ooo`| octal number|
|`\t`|horizontal tab|
|`\xhh`| hexadecimal number|
|`\v`|vertical tab|

In [4]:
#include <stdio.h>

int main() {

    printf("Hello,\thorizontal tab!\n");
    printf("Hello,\bbackspace!\n");
    printf("Hello,\vvertical tab!\n");
    return 0;

}

Hello,	horizontal tab!
Hellobackspace!
Hello,vertical tab!


## <a name="section_1_2">1.2 Variables and Arithmetic Expressions</a>

In [5]:
#include <stdio.h>

/*
 print Fahrenheit−Celsius table
   for fahr = 0, 20, ..., 300
*/

int main(void)
{
	int fahr, celsius;
	int lower, upper, step;

	lower = 0;		// lower limit of temperature scale
	upper = 300;	// upper limit
	step = 20;		// step size

	fahr = lower;
	printf("fahr:\tcelsius:\n");
	while (fahr <= upper)
	{
		celsius = 5 * (fahr - 32) / 9;
		printf("%d\t%d\n", fahr, celsius);
		fahr += step;
	}
}

fahr:	celsius:
0	-17
20	-6
40	4
60	15
80	26
100	37
120	48
140	60
160	71
180	82
200	93
220	104
240	115
260	126
280	137
300	148


### Declaration

In C, all variables must be declared before they are used, usually at the beginning of the function before any executable statements. A **declaration** announces the properties of variables; it consists of a type name and a list of variables, such as 
```c
int fahr, celsius;
int lower, upper, step;
```

#### Types

The type `int` means that the variables listed are integers, by contrast with `float`, which means floating point, i.e., numbers that may have a fractional part. The range of both `int` and `float` depends on the machine you are using; 16-bit `int`s, which lie between `-32768` and `+32767`, are common, as are 32-bit `int`s. A `float` number is typically a 32-bit quantity, with at least six significant digits and magnitude generally between about $10^{-38}$ and $10^{38}$.

In [1]:
# python
print(10 ** (-38))
print(10 ** 38)

1e-38
100000000000000000000000000000000000000


C provides several other data types besides `int` and `float`, including:

|name|description|
|-|-|
|`char`| character − a single byte|
|`short`| short integer|
|`long`| long integer|
|`double`| double−precision floating point|

The size of these objects is also machine−dependent. There are also `arrays`, `structures` and `unions` of these basic types, `pointers` to them, and `functions` that return them, all of which we will meet in due course.

#### `float`

_ChatGPT:_  
The size and range of floating-point numbers in computers are determined by the IEEE 754 standard for floating-point arithmetic. In a 32-bit floating-point representation, the numbers are typically divided into three components: 
- the sign bit, 
- the exponent, and 
- the mantissa (also called the significand).

The sign bit determines the sign of the number (positive or negative), while the exponent and mantissa together determine the magnitude and precision of the number. The exponent is used to represent the scale of the number, allowing for a wide range of values, while the mantissa provides the precision necessary for the significant digits.

In Python, you can use `sys.float_info` to get information about the range and precision of its floating-point numbers, which are 64-bit doubles. Let's look at an example:

In [2]:
import sys

print(sys.float_info)

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)


This output tells us that positive normalized 64-bit floating-point numbers range from about $10^{-308}$ to $10^{308}$. The number of decimal digits that can be represented reliably (`dig`) is 15, and the significand holds 53 bits (`mant_dig`), which is where that precision comes from.

In summary, the large range and precision of 64-bit floating-point numbers are achieved through a combination of the exponent and mantissa components, which allow for a wide range of values with about 15 significant decimal digits.

There isn't a direct equivalent for `float32` (single precision, 32-bit binary floats in the IEEE 754 standard) in the `sys` module.

If you want to obtain similar information for `float32`, you would typically use the `numpy` library, which allows you to work with single-precision floats and provides ways to inspect their properties.

In [3]:
import numpy as np

# The largest positive number representable in 32-bit float
float32_max = np.finfo(np.float32).max

# The smallest positive normal number representable in 32-bit float
float32_min = np.finfo(np.float32).tiny

# The difference between 1 and the least value greater than 1 that is representable in 32-bit float
float32_eps = np.finfo(np.float32).eps

# The number of decimal digits that can be reliably represented in a 32-bit float (approximate)
float32_dig = np.finfo(np.float32).precision

# The number of bits in the mantissa of a 32-bit float
float32_mant_dig = np.finfo(np.float32).nmant

# The smallest positive power of the base (2) that causes overflow in a 32-bit float
float32_max_exp = np.finfo(np.float32).maxexp

# The actual information display
print("32-bit float information:")
print(f"Maximum float32 value: {float32_max}")
print(f"Minimum positive float32 value: {float32_min}")
print(f"Epsilon for float32: {float32_eps}")
print(f"Decimal precision for float32: {float32_dig}")
print(f"Mantissa digits for float32: {float32_mant_dig}")
print(f"Maximum exponent for float32: {float32_max_exp}")

32-bit float information:
Maximum float32 value: 3.4028234663852886e+38
Minimum positive float32 value: 1.1754943508222875e-38
Epsilon for float32: 1.1920928955078125e-07
Decimal precision for float32: 6
Mantissa digits for float32: 23
Maximum exponent for float32: 128


In [4]:
np.finfo(np.float32)

finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)

### Assignment statements

```c
lower = 0;
upper = 300;
step = 20;
```

These assignment statements set the variables to their initial values. Individual statements are terminated by semicolons.

### Integer division truncates

In C, integer division truncates: any fractional part is discarded.

Since `5` and `9` are integers, `5/9` would be truncated to `0`, so if we multiplied by `5/9`, all the Celsius temperatures would be reported as zero:

In [1]:
#include <stdio.h>

int main(void)
{
    printf("%d\n", 5/9);
}

0


Thus, we multiply by `5` first and divide by `9` afterwards:

```c
celsius = 5 * (fahr - 32) / 9;
```

### Right−justified output

- augment each %d in the printf statement with a width:

```c
printf("%3d %6d\n", fahr, celsius);
```

In [3]:
#include <stdio.h>


int main(void)
{
	int fahr, celsius;
	int lower, upper, step;

	lower = 0;
	upper = 300;
	step = 20;

	fahr = lower;
	printf("fahr:\tcelsius:\n");
	while (fahr <= upper)
	{
		celsius = 5 * (fahr - 32) / 9;
		printf("%4d %10d\n", fahr, celsius);
		fahr += step;
	}
}

fahr:	celsius:
   0        -17
  20         -6
  40          4
  60         15
  80         26
 100         37
 120         48
 140         60
 160         71
 180         82
 200         93
 220        104
 240        115
 260        126
 280        137
 300        148


###  Floating−point arithmetic

In [4]:
#include <stdio.h>

int main(void)
{
	float fahr, celsius;
	float lower, upper, step;

	lower = 0;
	upper = 300;
	step = 20;

	fahr = lower;
	printf("fahr:\tcelsius:\n");
	while (fahr <= upper)
	{
		celsius = (5.0/9.0) * (fahr - 32.0);
		printf("%4.0f %10.1f\n", fahr, celsius);
		fahr += step;
	}
}

fahr:	celsius:
   0      -17.8
  20       -6.7
  40        4.4
  60       15.6
  80       26.7
 100       37.8
 120       48.9
 140       60.0
 160       71.1
 180       82.2
 200       93.3
 220      104.4
 240      115.6
 260      126.7
 280      137.8
 300      148.9


- If an arithmetic operator has integer operands, an integer operation is performed. 
- If an arithmetic operator has one floating−point operand and one integer operand, however, the integer will be converted to floating point before the operation is done. 

If we had written `(fahr - 32)`, the `32` would be automatically converted to floating point. Nevertheless, writing floating-point constants with explicit decimal points even when they have integral values emphasizes their floating-point nature for human readers.

The `printf` conversion specification `%4.0f` says that a floating-point number (here `fahr`) is to be printed at least four characters wide, with no decimal point and no fraction digits. `%10.1f` describes another number (`celsius`) that is to be printed at least ten characters wide, with `1` digit after the decimal point.

Width and precision may be omitted from a specification: `%6f` says that the number is to be at least six characters wide; `%.2f` specifies two characters after the decimal point, but the width is not constrained; and `%f` merely says to print the number as floating point.

|sequence|description|
|-|-|
|`%6d`| print as decimal integer, at least 6 characters wide|
|`%f`| print as floating point|
|`%6f`|  print as floating point, at least 6 characters wide|
|`%.2f`|  print as floating point, 2 characters after decimal point|
|`%6.2f`| print as floating point, at least 6 wide and 2 after decimal point|

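Solving the conversion formula for `fahr` gives the inverse formula used by the next program:

$$
\begin{aligned}
\text{celsius} &= \tfrac{5}{9}\,(\text{fahr} - 32) \\
\tfrac{5}{9}\,\text{fahr} &= \tfrac{5}{9} \cdot 32 + \text{celsius} \\
\text{fahr} &= \frac{\tfrac{5}{9} \cdot 32 + \text{celsius}}{\tfrac{5}{9}} \\
\text{fahr} &= 32 + \tfrac{9}{5}\,\text{celsius}
\end{aligned}
$$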

In [5]:
#include <stdio.h>

/*
 print Celsius-Fahrenheit table
   for celsius = -40, -30, ..., 140
*/

int main(void)
{
	float fahr, celsius;
	float lower, upper, step;

	lower = -40;		// lower limit of temperature scale
	upper = 140;	// upper limit
	step = 10;		// step size

	celsius = lower;
	printf("Celsius: Fahrenheit:\n");
	while (celsius <= upper)
	{
		fahr = celsius * 9.0 / 5.0 + 32.0;
		printf("%5.0f\t%9.1f\n", celsius, fahr);
		celsius += step;
	}
}

Celsius: Fahrenheit:
  -40	    -40.0
  -30	    -22.0
  -20	     -4.0
  -10	     14.0
    0	     32.0
   10	     50.0
   20	     68.0
   30	     86.0
   40	    104.0
   50	    122.0
   60	    140.0
   70	    158.0
   80	    176.0
   90	    194.0
  100	    212.0
  110	    230.0
  120	    248.0
  130	    266.0
  140	    284.0


## 1.4 Symbolic Constants

A `#define` line defines a **symbolic name** or **symbolic constant** to be a particular string of characters:

```c
#define name replacement_text
```

Thereafter, any occurrence of `name` (not in quotes and not part of another name) will be replaced by the corresponding `replacement_text`. The `name` has the same form as a **variable name**: a sequence of letters and digits that begins with a letter. The `replacement_text` can be any sequence of characters; it is not limited to numbers.

In [6]:
#include <stdio.h>

#define LOWER 0 	// lower limit of table
#define UPPER 300	// upper limit
#define STEP 20		// step size

/* print Fahrenheit-Celsius table */

int main(void)
{
	int fahr;

	for (fahr = LOWER; fahr <= UPPER; fahr += STEP)
		printf("%3d %6.1f\n", fahr, (5.0/9.0)*(fahr-32));
}

  0  -17.8
 20   -6.7
 40    4.4
 60   15.6
 80   26.7
100   37.8
120   48.9
140   60.0
160   71.1
180   82.2
200   93.3
220  104.4
240  115.6
260  126.7
280  137.8
300  148.9


The quantities `LOWER`, `UPPER` and `STEP` are **symbolic constants**, not variables, so they do not appear in declarations. 

Symbolic constant names are conventionally written in upper case so they can be readily distinguished from lower case variable names. Notice that there is no semicolon at the end of a `#define` line.

## 1.5 Character Input and Output

The model of input and output supported by the standard library is very simple. Text input or output, regardless of where it originates or where it goes to, is dealt with as **streams of characters**. 

> A **text stream** is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character (`\n`). 

It is the responsibility of the library to make each input or output stream conform to this model; the C programmer using the library need not worry about how lines are represented outside the program.

### `getchar`

The standard library provides several functions for reading or writing one character at a time, of which `getchar` and `putchar` are the simplest. 

Each time it is called, `getchar` reads the next input character from a text stream and _returns_ that as its value. That is, after

```c
c = getchar()
```
    
the variable `c` contains the next character of input. The characters normally come from the keyboard; input from files is discussed in Chapter 7.

```c
/*	$OpenBSD: getchar.c,v 1.8 2015/08/31 02:53:57 guenther Exp $ */
/*-
 * Copyright (c) 1990, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * Chris Torek.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include <stdio.h>

/*
 * A subroutine version of the macro getchar_unlocked.
 */
#undef getchar_unlocked

int
getchar_unlocked(void)
{
	return (getc_unlocked(stdin));
}
DEF_WEAK(getchar_unlocked);


/*
 * A subroutine version of the macro getchar.
 */

#undef getchar

int
getchar(void)
{
	return (getc(stdin));
}
DEF_STRONG(getchar);
```

[source](https://github.com/openbsd/src/blob/master/lib/libc/stdio/getchar.c)

```c
#include <stdio.h>
#include "local.h"

/*
 * A subroutine version of the macro getc_unlocked.
 */
#undef getc_unlocked

int
getc_unlocked(FILE *fp)
{
	return (__sgetc(fp));
}
DEF_WEAK(getc_unlocked);

/*
 * A subroutine version of the macro getc.
 */
#undef getc

int
getc(FILE *fp)
{
	int c;

	FLOCKFILE(fp);
	c = __sgetc(fp);
	FUNLOCKFILE(fp);
	return (c);
}
DEF_STRONG(getc);
```

[source](https://github.com/openbsd/src/blob/master/lib/libc/stdio/getc.c)

### `putchar`

The function `putchar` _prints_ a character each time it is called:
```c
putchar(c)
```

prints the contents of the integer variable `c` as a character, usually on the screen. Calls to `putchar` and `printf` may be interleaved; the output will appear in the order in which the calls are made.

### 1.5.1 File Copying

Given `getchar` and `putchar`, you can write a surprising amount of useful code without knowing anything more about input and output. The simplest example is a program that copies its input to its output one character at a time:

```
read a character
while (character is not end-of-file indicator)
         output the character just read
         read a character
```

In [7]:
#include <stdio.h>

/* copy input to output; 1st version */
int main(void)
{
	int c;

	c = getchar();
	while (c != EOF)
	{
		putchar(c);
		c = getchar();
	}
}

What appears to be a character on the keyboard or screen is of course, like everything else, stored internally just as a bit pattern. The type `char` is specifically meant for storing such character data, but any integer type can be used. We used `int` for a subtle but important reason.

The problem is distinguishing the end of the input from valid data. The solution is that `getchar` returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called `EOF`, for “end of file.” We must declare `c` to be a type big enough to hold any value that `getchar` returns. We can’t use `char` since `c` must be big enough to hold `EOF` in addition to any possible `char`. Therefore we use `int`.

`EOF` is an integer defined in `<stdio.h>`, but the specific numeric value doesn’t matter as long as it is not the same as any `char` value. By using the symbolic constant, we are assured that nothing in the program depends on the specific numeric value.

```sh
less /usr/include/stdio.h

/* The value returned by fgetc and similar functions to indicate the
   end of the file.  */
#define EOF (-1)
```

ChatGPT:

The program reads characters from the user's input one by one until `EOF` is encountered. This can happen if the user signals end-of-file (which is usually done by pressing `Ctrl+D` on Unix/Linux or `Ctrl+Z` followed by the `Enter` key on Windows systems at the start of a new line) or if an input error occurs.

The program for copying would be written more concisely by experienced C programmers. In C, any assignment, such as

```c
c = getchar()
```

is an expression and has a value, which is the value of the left hand side after the assignment. This means that 

> an assignment can appear as part of a larger expression. 

If the assignment of a character to `c` is put inside the test part of a while loop, the copy program can be written this way:

In [None]:
#include <stdio.h>

/* copy input to output; 2nd version */
int main(void)
{
	int c;

	while ((c = getchar()) != EOF)
		putchar(c);
}

This version centralizes the input (there is now only one reference to `getchar`) and shrinks the program. The resulting program is more compact, and, once the idiom is mastered, easier to read. You’ll see this style often. (It’s possible to get carried away and create impenetrable code, however, a tendency that we will try to curb.)

The parentheses around the assignment within the condition are necessary. The _precedence_ of `!=` is higher than that of `=`, which means that in the absence of parentheses the relational test `!=` would be done before the assignment `=`. So the statement

```c
c = getchar() != EOF
```

is equivalent to

```c
c = (getchar() != EOF)
```

This has the undesired effect of setting `c` to `0` or `1`, depending on whether or not the call of `getchar` encountered end of file. (More on this in Chapter 2.)

In [4]:
#include <stdio.h>

/* verify that the expression getchar() != EOF is 0 or 1 */
int main(void)
{
	printf("%d\n", getchar() != EOF);
}

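0

The cell prints `0` because `getchar` meets end of file immediately when there is no input. Run from a terminal and given the input `hello`, the same program prints `1`:
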
```
hello
1
```

In [3]:
#include <stdio.h>

/* print the value of EOF */
int main(void)
{
	printf("%d\n", EOF);
}


-1


### 1.5.2 Character Counting

In [5]:
#include <stdio.h>

/* count characters in input; 1st version */
int main(void)
{
	long nc;

	nc = 0;
	while (getchar() != EOF)
		++nc;
	printf("%ld\n", nc);
}

0


```
./a.out 
hello
world
12
```

The operators `++` and `--` can be either **prefix** operators (`++nc`) or **postfix** (`nc++`); these two forms have different values in expressions, as will be shown in Chapter 2, but `++nc` and `nc++` both increment `nc`. For the moment we will stick to the prefix form.

The character counting program accumulates its count in a `long` variable instead of an `int`. `long` integers are at least `32` bits. Although on some machines, `int` and `long` are the same size, on others an `int` is `16` bits, with a maximum value of `32767`, and it would take relatively little input to overflow an `int` counter. The conversion specification `%ld` tells `printf` that the corresponding argument is a `long` integer.

It may be possible to cope with even bigger numbers by using a `double` (double precision float). We will also use a `for` statement instead of a `while`, to illustrate another way to write the loop.

In [1]:
import sys

print("Integer in Python:\n", sys.int_info)

Integer in Python:
 sys.int_info(bits_per_digit=30, sizeof_digit=4, default_max_str_digits=4300, str_digits_check_threshold=640)


ChatGPT:

- `bits_per_digit=30`: the number of bits held in each internal digit. Python stores integers internally in base $2^{30}$ here, so each digit holds 30 bits.

- `sizeof_digit=4`: the size in bytes of the C type used to represent one digit, here 4 bytes.

- `default_max_str_digits=4300`: the default limit on the number of decimal digits allowed when converting between `int` and `str`. Exceeding the limit raises a `ValueError`.

- `str_digits_check_threshold=640`: the minimum non-zero value to which that limit can be set.

In [2]:
import numpy as np

np.iinfo(np.int32)

iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [3]:
np.iinfo(np.int64)

iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

In [1]:
#include <stdio.h>

/* count characters in input; 2nd version */
int main(void)
{
	double nc;

	for (nc = 0; getchar() != EOF; ++nc)
		;
	printf("%.0f\n", nc);
}

0


```
./a.out 
hello
world
12
```

- `printf` uses `%f` for both `float` and `double`; 
- `%.0f` suppresses printing of the decimal point and the fraction part, which is zero.

The body of this `for` loop is empty, because all of the work is done in the test and increment parts. But the grammatical rules of C require that a `for` statement have a body. The isolated semicolon, called a **null statement**, is there to satisfy that requirement. We put it on a separate line to make it visible.

Before we leave the character counting program, observe that 

> if the input contains no characters, the `while` or `for` test fails on the very first call to `getchar`, and the program produces zero, the right answer. 

```
./a.out 
0
```

This is important. One of the nice things about `while` and `for` is that they test at the top of the loop, before proceeding with the body. If there is nothing to do, nothing is done, even if that means never going through the loop body. Programs should act intelligently when given zero-length input. The `while` and `for` statements help ensure that programs do reasonable things with boundary conditions.

### 1.5.3 Line Counting

The next program counts input lines. As we mentioned above, the standard library ensures that an input text stream appears as a sequence of lines, each terminated by a newline. Hence, counting lines is just counting newlines:

In [2]:
#include <stdio.h>

/* count lines in input; 1st version */
int main(void)
{
	int c, nl;

	nl = 0;
	while ((c = getchar()) != EOF)
		if (c == '\n')
			++nl;
	printf("%d\n", nl);
}

0


```
./a.out 
hello
world
2
```

The double equals sign `==` is the C notation for “is equal to” (like Pascal’s single `=` or Fortran’s `.EQ.`). This symbol is used to distinguish the equality test from the single `=` that C uses for assignment. A word of caution: newcomers to C occasionally write `=` when they mean `==`. As we will see in Chapter 2, the result is usually a legal expression, so you will get no warning.

A character written between single quotes represents an integer value equal to the numerical value of the character in the machine’s character set. This is called a character constant, although it is just another way to write a small integer. So, for example, `'A'` is a character constant; in the ASCII character set its value is `65`, the internal representation of the character `A`. Of course `'A'` is to be preferred over `65`: its meaning is obvious, and it is independent of a particular character set.

The escape sequences used in string constants are also legal in character constants, so `'\n'` stands for the value of the newline character, which is `10` in ASCII. You should note carefully that `'\n'` is a single character, and in expressions is just an integer; on the other hand, `"\n"` is a string constant that happens to contain only one character. The topic of strings versus characters is discussed further in Chapter 2.

#### Exercise 1-8

Write a program to count blanks, tabs, and newlines.

In [3]:
#include <stdio.h>

/* count blanks, tabs, and newlines in input */
int main(void)
{
    int c, blanks, tabs, newlines;

    blanks = tabs = newlines = 0;
    while ((c = getchar()) != EOF)
        if (c == ' ')
            ++blanks;
        else if (c == '\t')
            ++tabs;
        else if (c == '\n')
            ++newlines;
    printf("Blanks: %d, tabs: %d, newlines: %d\n", blanks, tabs, newlines);
}

Blanks: 0, tabs: 0, newlines: 0


```
./a.out 
hello C
my name is Vadim
this    is      tab
two tabs
Blanks: 5, tabs: 2, newlines: 4
```

#### Exercise 1-9

Write a program to copy its input to its output, replacing each string of one or more blanks by a single blank.

#### Exercise 1-10

Write a program to copy its input to its output, replacing each tab by `\t`, each backspace by `\b`, and each backslash by `\\`. This makes tabs and backspaces visible in an unambiguous way.

### 1.5.4 Word Counting

The fourth in our series of useful programs counts lines, words, and characters, with the loose definition that 

> a **word** is any sequence of characters that does not contain a blank, tab or newline. 

This is a bare-bones version of the UNIX program `wc`.

In [1]:
man wc

WC(1)                            User Commands                           WC(1)

NAME
       wc - print newline, word, and byte counts for each file

SYNOPSIS
       wc [OPTION]... [FILE]...
       wc [OPTION]... --files0-from=F

DESCRIPTION
       Print newline, word, and byte counts for each FILE, and a total line if
       more than one FILE is specified.  A word is a non-zero-length  sequence
       of printable characters delimited by white space.

       With no FILE, or when FILE is -, read standard input.

       The  options  below may be used to select which counts are printed, al‐
       ways in the following order: newline, word,  character,  byte,  maximum
       line length.

       -c, --bytes
              print the byte counts

       -m, --chars
              print the character counts

       -l, --lines
              print the newline counts

       --files0-from=F
              read  input  from the files specified by NUL-terminated names in
              file F; If

In [1]:
#include <stdio.h>

#define IN 1  // inside a word
#define OUT 0 // outside a word

/* count lines, words, and characters in input */
int main()
{
    int c, nl, nw, nc, state;

    state = OUT;
    nl = nw = nc = 0;
    while ((c = getchar()) != EOF)
    {
        ++nc;
        if (c == '\n')
            ++nl;
        if (c == ' ' || c == '\n' || c == '\t')
            state = OUT;
        else if (state == OUT)
        {
            state = IN;
            ++nw;
        }
    }
    printf("%d %d %d\n", nl, nw, nc);
}

0 0 0


```
./a.out 
hello world
this is Vadim   from the USSR!
2 8 42
```

Every time the program encounters the first character of a word, it counts one more word. The variable `state` records whether the program is currently in a word or not; initially it is “not in a word,” which is assigned the value `OUT`. We prefer the symbolic constants `IN` and `OUT` to the literal values `1` and `0` because they make the program more readable. In a program as tiny as this, it makes little difference, but in larger programs, the increase in clarity is well worth the modest extra effort to write it this way from the beginning. You’ll also find that it’s easier to make extensive changes in programs where magic numbers appear only as symbolic constants.
***
ChatGPT:

The term "**magic numbers**" refers to literal or hardcoded numerical values that are used in programming without any clear explanation of their meaning or purpose. These numbers are often arbitrary and seem to have a special significance within the code.

The passage above refers to the use of symbolic constants (such as `IN` and `OUT`) instead of literal values (such as `1` and `0`) to improve the readability and maintainability of the code. By using symbolic constants, the code becomes more self-explanatory and easier to understand, as the meaning and purpose of these values are explicitly stated through the constants' names.

Using symbolic constants instead of magic numbers provides several advantages:

- **Readability**: The code becomes more human-readable, as the symbolic constants have meaningful names that convey their purpose and intent.

- **Maintainability**: If the value of a particular constant needs to be changed in the future, it only needs to be modified at one location, rather than searching for all occurrences of the magic number throughout the code.

- **Code clarity and understanding**: By using symbolic constants, the code becomes self-documenting, making it easier for other programmers (including the original developer) to understand and modify the code later on.

- **Flexibility**: By defining values as symbolic constants, the code becomes more flexible and adaptable to changes. If the program requirements change and certain values need to be adjusted, it can be done easily by modifying the constants' definitions.

In larger programs, where changes and modifications are more frequent, avoiding the use of magic numbers by replacing them with symbolic constants becomes even more essential. This approach enhances the code's readability, maintainability, and overall quality.
***

The line
```c
nl = nw = nc = 0;
```
sets all three variables to zero. This is not a special case, but a consequence of the fact that an assignment is an expression with a value and assignments associate from right to left. It’s as if we had written:

```c
nl = (nw = (nc = 0));
```

The operator `||` means `OR`, so the line

```c
if (c == ' ' || c == '\n' || c == '\t')
```
    
says “_if `c` is a blank **or** `c` is a newline **or** `c` is a tab_”. (Recall that the escape sequence `\t` is a visible representation of the tab character.) There is a corresponding operator `&&` for `AND`; its precedence is just higher than `||`. Expressions connected by `&&` or `||` are evaluated left to right, and it is guaranteed that evaluation will stop as soon as the truth or falsehood is known. If `c` is a blank, there is no need to test whether it is a newline or tab, so these tests are not made. This isn’t particularly important here, but is significant in more complicated situations, as we will soon see.

The example also shows an `else`, which specifies an alternative action if the condition part of an `if` statement is false. The general form is

```c
if (expression)
        statement1
else
        statement2
```

One and only one of the two statements associated with an `if-else` is performed. If the expression is `true`, `statement1` is executed; if not, `statement2` is executed. Each statement can be a single statement or several in braces. In the word count program, the one after the `else` is an `if` that controls two statements in braces.

#### Exercise 1-11

How would you test the `word count` program? What kinds of input are most likely to uncover bugs if there are any?

#### Exercise 1-12

Write a program that prints its input one word per line.

In [2]:
#include <stdio.h>

#define IN 1  // inside a word
#define OUT 0 // outside a word

/* print input one word per line */
int main()
{
    int c, state;
    int prev_c;

    state = OUT;
    prev_c = -1;  // check for extra ' ', \n, \t
    while ((c = getchar()) != EOF)
    {
        if (c == ' ' || c == '\n' || c == '\t')
        {
            if (prev_c == ' ' || prev_c == '\n' || prev_c == '\t')
                    continue;  // we do not need extra new lines
            state = OUT;
            printf("\n");
        }
        else if (state == OUT)
            state = IN;
        if (state == IN)
            putchar(c);
        prev_c = c;
    }
}

```
./a.out 
hello world (enter)
hello
world
 hello world (enter)
hello
world
 hello  world (enter)
hello
world
        hello world (enter)
hello
world
```

## 1.6 Arrays

Let us write a program to count the number of occurrences of each digit, of white space characters (blank, tab, newline), and of all other characters. This is artificial, but it permits us to illustrate several aspects of C in one program.

There are twelve categories of input, so it is convenient to use an array to hold the number of occurrences of each digit, rather than ten individual variables. Here is one version of the program:

In [3]:
#include <stdio.h>

/* count digits, white spaces,  others */
int main()
{
    int c, i, nwhite, nother;
    int ndigit[10];

    nwhite = nother = 0;
    for (i = 0; i < 10; i++)
            ndigit[i] = 0;

    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
                ++ndigit[c-'0'];
        else if (c == ' ' || c == '\n' || c == '\t')
                ++nwhite;
        else
                ++nother;

    printf("digits = ");
    for (i = 0; i < 10; i++)
            printf("%d", ndigit[i]);
    printf(", white space = %d, other = %d\n", nwhite, nother);
}

digits = 0000000000, white space = 0, other = 0


```
./a.out
(program itself's text)

digits = 9300000001, white space = 123, other = 349
```

The declaration

```c
int ndigit[10];
```

declares `ndigit` to be an array of 10 integers. Array **subscripts** (a numerical index) **always start at zero in C**, so the elements are `ndigit[0]`, `ndigit[1]`, ..., `ndigit[9]`. This is reflected in the `for` loops that initialize and print the array.

A subscript can be any integer expression, which includes integer variables like `i`, and integer constants.

This particular program relies on the properties of the character representation of the digits. For example, the test

```c
if (c >= '0' && c <= '9') ...
```

determines whether the character in `c` is a digit. If it is, the numeric value of that digit is

```c
c - '0'
```

This works only if `'0'`, `'1'`, ..., `'9'` have consecutive increasing values. Fortunately, this is true for all character sets.

By definition, `char`s are just small integers, so `char` variables and constants are identical to `int`s in arithmetic expressions. This is natural and convenient; for example, `c-'0'` is an integer expression with a value between `0` and `9` corresponding to the character `'0'` to `'9'` stored in `c`, and is thus a valid subscript for the array `ndigit`.

The decision as to whether a character is a digit, white space, or something else is made with the sequence

```c
if (c >= '0' && c <= '9')
    ++ndigit[c-'0'];
else if (c == ' ' || c == '\n' || c == '\t')
    ++nwhite;
else
    ++nother;
```

The pattern
```c
if  (condition1)
         statement1
else if (condition2)
         statement2
...
          ...
else
         statementn
```

occurs frequently in programs as a way to express a multi-way decision. The conditions are evaluated in order from the top until some condition is satisfied; at that point the corresponding statement part is executed, and the entire construction is finished. (Any statement can be several statements enclosed in braces.) If none of the conditions is satisfied, the statement after the final else is executed if it is present. If the final `else` and statement are omitted, as in the `word count` program, no action takes place. There can be any number of

```c
else if (condition)
           statement
```

groups between the initial `if` and the final `else`.

As a matter of style, it is advisable to format this construction as we have shown; if each `if` were indented past the previous `else`, a long sequence of decisions would march off the right side of the page.

The `switch` statement, to be discussed in Chapter 3, provides another way to write a multi-way branch that is particularly suitable when the condition is whether some integer or character expression matches one of a set of constants. For contrast, we will present a switch version of this program in Section 3.4.

### Exercise 1-13

Write a program to print a histogram of the lengths of words in its input. It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.

In [4]:
#include <stdio.h>

#define IN 1
#define OUT 0

/* draw histograms of the words' length */
int main()
{
	int c, i, state, prev_c, length;

	state = OUT;
	prev_c = -1;
	length = 0;

	while ((c = getchar()) != EOF)
	{
		if (c == ' ' || c == '\n' || c == '\t')
		{
			if (prev_c == ' ' || prev_c == '\n' || prev_c == '\t')
				continue;
			for (i = 0; i < length; i++)
				printf("#");
			printf("\n");
			state = OUT;
			length = 0;
		}
		else if (state == OUT)
			state = IN;

		if (state == IN)
			++length;

		prev_c = c;
	}
}

```
./a.out 
good afternoon  Mrs.            Smith
####
#########
####
#####
```

### Exercise 1-14

Write a program to print a histogram of the frequencies of different characters in its input.

In [5]:
#include <stdio.h>

/* plot the frequency of characters */
int main()
{
	int c, i, j;
	int chars[128];

	// initialize the array
	for (i = 0; i < 128; i++)
		chars[i] = 0;

	// count each character's occurrences
	while ((c = getchar()) != EOF)
		if (c < 128)	// ignore bytes outside the array's range
			++chars[c];

	// print the results
	for (i = 0; i < 128; i++)
	{
		// print only the characters that occurred
		int count = chars[i];
		if (count > 0)
		{
			printf("\n");
			// print char
			if (i == '\n')
				printf("\\n");
			else if (i == '\t')
				printf("\\t");
			else if (i == ' ')
				printf("ws");
			else
			{
				putchar(i);
				printf(" ");
			}
			printf(" ");

			// plot histogram
			for (j = 0; j < count; j++)
				printf("#");
		}
	}
}

```
./a.out 
all you gotta do is to think about tomorrow,
this is your duty and curse.
sorry for that buddy
\n ##
ws ################
,  #
.  #
a  #####
b  ##
c  #
d  #####
e  #
f  #
g  #
h  ###
i  ####
k  #
l  ##
m  #
n  ##
o  ###########
r  #######
s  #####
t  ##########
u  ######
w  #
y  #####
```

## 1.7 Functions

In C, a **function** is equivalent to a subroutine or function in Fortran, or a procedure or function in Pascal. 

A function provides a convenient way to encapsulate some computation, which can then be used without worrying about its implementation. With properly designed functions, it is possible to ignore how a job is done; knowing what is done is sufficient. 

C makes the use of functions easy, convenient and efficient; you will often see a short function defined and called only once, just because it clarifies some piece of code.

So far we have used only functions like `printf`, `getchar`, and `putchar` that have been provided for us; now it’s time to write a few of our own. Since C has no exponentiation operator like the `**` of Fortran, let us illustrate the mechanics of function definition by writing a function `power(m,n)` to raise an integer `m` to a positive integer power `n`. That is, the value of `power(2,5)` is `32`. This function is not a practical exponentiation routine, since it handles only positive powers of small integers, but it’s good enough for illustration. (The standard library contains a function `pow(x,y)` that computes $x^y$.)

Here is the function `power` and a `main` program to exercise it, so you can see the whole structure at once.

In [6]:
#include <stdio.h>

int power(int m, int n);

/* test power function */
int main()
{
	int i;

	for (i = 0; i < 10; i++)
		printf("%d: %5d %7d\n", i, power(2, i), power(-3, i));
	return 0;
}


/* power: raise base to n-th power; n >=0 */
int power(int base, int n)
{
	int i, p;

	p = 1;
	for (i = 1; i <= n; i++)
		p *= base;

	return p;
}

0:     1       1
1:     2      -3
2:     4       9
3:     8     -27
4:    16      81
5:    32    -243
6:    64     729
7:   128   -2187
8:   256    6561
9:   512  -19683


A **function definition** has this form:

```
return-type function-name(parameter declarations, if any)
{
    declarations
    statements
}
```

Function definitions can appear in any order, and in one source file or several, although no function can be split between files (i.e. the entire definition of a function must be contained within a single file). If the source program appears in several files, you may have to say more to compile and load it than if it all appears in one, but that is an operating system matter, not a language attribute. For the moment, we will assume that both functions are in the same file, so whatever you have learned about running C programs will still work.

The function `power` is called twice by `main`, in the line

```c
printf("%d: %5d %7d\n", i, power(2, i), power(-3, i));
```

Each call passes two arguments to `power`, which each time returns an integer to be formatted and printed. In an expression, `power(2,i)` is an integer just as `2` and `i` are. (Not all functions produce an integer value; we will take this up in Chapter 4.)

The first line of `power` itself,

```c
int power(int base, int n)
```
    
declares the parameter types and names, and the type of the result that the function returns. The names used by `power` for its parameters are local to `power`, and are not visible to any other function: other routines can use the same names without conflict. This is also true of the variables `i` and `p`: the `i` in `power` is unrelated to the `i` in `main`.

We will generally use *parameter* for a variable named in the parenthesized list in a function definition, and *argument* for the value used in a call of the function. The terms **formal argument** and **actual argument** are sometimes used for the same distinction.

The value that `power` computes is returned to `main` by the **return statement**. Any expression may follow `return`:

```c
return expression;
```

A function need not return a value; a return statement with no expression causes control, but no useful value, to be returned to the caller, as does “falling off the end” of a function by reaching the terminating right brace. And the calling function can ignore a value returned by a function.

You may have noticed that there is a return statement at the end of `main`. Since `main` is a function like any other, it may return a value to its **caller**, which is in effect the environment in which the program was executed. Typically, 
- a return value of `zero` implies normal termination; 
- non-zero values signal unusual or erroneous termination conditions. 

In the interests of simplicity, we have omitted return statements from our `main` functions up to this point, but we will include them hereafter, as a reminder that programs should return status to their environment.

The declaration

```c
int power(int m, int n);
```

just before `main` says that `power` is a function that expects two `int` arguments and returns an `int`. This declaration, which is called a **function prototype**, has to agree with the definition and uses of `power`. It is an error if the definition of a function or any uses of it do not agree with its prototype.

**Parameter names need not agree**. Indeed, parameter names are optional in a function prototype, so for the prototype we could have written

```c
int power(int, int);
```

Well-chosen names are good documentation, however, so we will often use them.

> A note of history: The biggest change between ANSI C and earlier versions is how functions are declared and defined. In the original definition of C, the `power` function would have been written like this:

```c
/* power:  raise base to n-th power; n >= 0 */
/* (old-style version) */
power(base, n)
int base, n;
{
    int i, p;

    p = 1;
    for (i = 1; i <= n; ++i)
        p = p * base;
    return p;
}
```

The parameters are named between the parentheses, and their types are declared before the opening left brace; undeclared parameters are taken as `int`. (The body of the function is the same as before.)

The declaration of `power` at the beginning of the program would have looked like this:

```c
int power();
```

No parameter list was permitted, so the compiler could not readily check that `power` was being called correctly. Indeed, since by default `power` would have been assumed to return an `int`, the entire declaration might well have been omitted.

The new syntax of function prototypes makes it much easier for a compiler to detect errors in the number of arguments or their types. The old style of declaration and definition still works in ANSI C, at least for a transition period, but we strongly recommend that you use the new form when you have a compiler that supports it.

### Exercise 1-15

Rewrite the temperature conversion program of [Section 1.2](#section_1_2) to use a function for conversion.

In [7]:
#include <stdio.h>

float celsius_to_fahr(float celsius);

/*
 print Celsius-Fahrenheit table
   for celsius = -40, -30, ..., 140
*/
int main(void)
{
	float celsius;
	float lower, upper, step;

	lower = -40;		// lower limit of temperature scale
	upper = 140;	// upper limit
	step = 20;		// step size

	celsius = lower;
	printf("Celsius: Fahrenheit:\n");
	while (celsius <= upper)
	{
		printf("%5.0f\t%9.0f\n", celsius, celsius_to_fahr(celsius));
		celsius += step;
	}
}


float celsius_to_fahr(float celsius)
{
	return celsius * 9.0 / 5.0 + 32.0;
}

Celsius: Fahrenheit:
  -40	      -40
  -20	       -4
    0	       32
   20	       68
   40	      104
   60	      140
   80	      176
  100	      212
  120	      248
  140	      284


## 1.8 Arguments - <b>Call by Value</b>

One aspect of C functions may be unfamiliar to programmers who are used to some other languages, particularly Fortran. In C, all function arguments are passed **“by value.”** This means that 

> the called function is given the values of its arguments in temporary variables rather than the originals. 

This leads to some different properties than are seen with **“call by reference”** languages like Fortran or with `var` parameters in Pascal, in which the called routine has access to the original argument, not a local copy.

The main distinction is that 

> in C the called function cannot directly alter a variable in the calling function; it can only alter its private, temporary copy.

Call by value is an asset, however, not a liability. It usually leads to more compact programs with fewer extraneous variables, because parameters can be treated as conveniently initialized local variables in the called routine. For example, here is a version of `power` that makes use of this property:

In [8]:
#include <stdio.h>

int power(int, int);

/* test power; the caller's base and n are unchanged by the call */
int main(void)
{
    int base, n;

    base = 3;
    n = 2;

    printf("%d\n", power(base, n));
    printf("%d, %d\n", base, n);
}


/* power: raise base to n-th power; n>=0; version 2 */
int power(int base, int n)
{
	int p;

	for (p = 1; n > 0; n--)
		p *= base;
	return p;
}

9
3, 2


The parameter `n` is used as a temporary variable, and is counted down (a `for` loop that runs backwards) until it becomes zero; there is no longer a need for the variable `i`. Whatever is done to `n` inside `power` has no effect on the argument that `power` was originally called with.

When necessary, it is possible to arrange for a function to modify a variable in a calling routine. The caller must provide the **address** of the variable to be set (technically a **pointer** to the variable), and the called function must declare the parameter to be a pointer and access the variable indirectly through it. We will cover pointers in Chapter 5.

The story is different for arrays. When the name of an array is used as an argument, _the value passed to the function is the location or address of the beginning of the array_ — there is no copying of array elements. By subscripting (indexing) this value, the function can access and alter any element of the array. This is the topic of the next section.

## 1.9 Character Arrays

The most common type of array in C is the **array of characters**. To illustrate the use of character arrays and functions to manipulate them, let’s write a program that reads a set of text lines and prints the longest. The outline is simple enough:

```
while  (there’s another line)
    if  (it’s longer than the previous longest)
        save it
        save its length
print longest line
```

This outline makes it clear that the program divides naturally into pieces:
- one piece gets a new line, 
- another tests it, 
- another saves it, and 
- the rest controls the process.

Since things divide so nicely, it would be well to write them that way too. Accordingly, let us first write a separate function `getline` to fetch the next line of input. We will try to make the function useful in other contexts. At the minimum, `getline` has to return a signal about possible end of file; a more useful design would be to return the length of the line, or zero if end of file is encountered. Zero is an acceptable end-of-file return because it is never a valid line length. Every text line has at least one character; even a line containing only a newline has length `1`.

When we find a line that is longer than the previous longest line, it must be saved somewhere. This suggests a second function, `copy`, to copy the new line to a safe place.

Finally, we need a `main` program to control `getline` and `copy`. Here is the result:

In [9]:
#include <stdio.h>

#define MAXLINE 1000  // maximum input line size

int get_line(char line[], int maxline);
void copy(char to[], char from[]);

/* print longest input line */
int main(void)
{
	int len; 	// current line length
	int max; 	// maximum length seen so far
	char line[MAXLINE];		// current input line
	char longest[MAXLINE];	// longest line saved here

	max = 0;
	while ((len = get_line(line, MAXLINE)) > 0)
		if (len > max)
		{
			max = len;
			copy(longest, line);
		}
	if (max > 0) 	// there was a line
	{
		printf("\nThe longest line is:\n");
		printf("%s", longest);
	}
	return 0;
}


/* get line: read a line into s, return length */
int get_line(char s[], int lim)
{
	int c, i;

	for (i = 0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
		s[i] = c;
	if (c == '\n')
	{
		s[i] = '\n';
		++i;
	}
	s[i] = '\0';
	return i;
}


/* copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
	int i;

	i = 0;
	while ((to[i] = from[i]) != '\0')
		++i;
}

```
./a.out 
my name is Vadim
hi
hello world

The longest line is:
my name is Vadim
```

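The functions `getline` and `copy` are declared at the beginning of the program, which we assume is contained in one file. (In the code above, `getline` is named `get_line`, because `<stdio.h>` on POSIX systems already declares a different function called `getline`.)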

`main` and `getline` communicate through a pair of arguments and a returned value. In `getline`, the arguments are declared by the line

```c
int getline(char s[], int lim);
```

which specifies that the first argument, `s`, is an array, and the second, `lim`, is an integer. The purpose of supplying the size of an array in a declaration is to set aside storage. The length of the array `s` is not necessary in `getline` since its size is set in `main`. 

`getline` uses `return` to send a value back to the caller, just as the function `power` did. This line also declares that `getline` returns an `int`; in the original text, `int` was the default return type and could be omitted, but since C99 the return type must be written explicitly.

Some functions return a useful value; others, like `copy`, are used only for their effect and return no value. The return type of `copy` is `void`, which states explicitly that no value is returned.

`getline` puts the character `'\0'` (the *null character*, whose value is zero) at the end of the array it is creating, to mark the end of the string of characters. This convention is also used by the C language: when a string constant like

```c
   "hello\n"
```

appears in a C program, it is stored as an array of characters containing the characters of the string and terminated with a `'\0'` to mark the end.

The `%s` format specification in `printf` expects the corresponding argument to be a string represented in this form. `copy` also relies on the fact that its input argument is terminated by `'\0'`, and it copies this character into the output argument. (All of this implies that `'\0'` is not a part of normal text.)

It is worth mentioning in passing that even a program as small as this one presents some sticky design problems. For example, what should `main` do if it encounters a line which is bigger than its `limit`? `getline` works safely, in that it stops collecting when the array is full, even if no newline has been seen. By testing the length and the last character returned, `main` can determine whether the line was too long, and then cope as it wishes. In the interests of brevity, we have ignored the issue.

There is no way for a user of `getline` to know in advance how long an input line might be, so `getline` checks for overflow. On the other hand, the user of `copy` already knows (or can find out) how big the strings are, so we have chosen not to add error checking to it.

### Exercise 1-16

Revise the `main` routine of the `longest-line` program so it will correctly print
- the length of arbitrarily long input lines, and 
- as much as possible of the text.

In [10]:
#include <stdio.h>

#define MAXLINE 1000  // maximum input line size

int get_line(char line[], int maxline);
void copy(char to[], char from[]);

/* print each line's length, all the text, and the longest length */
int main(void)
{
	int len; 	// current line length
	int max; 	// maximum length seen so far
	int total_len;	// total input len
	char line[MAXLINE];		// current input line
	char text[MAXLINE*MAXLINE];		// all the text
	int position;	// index into text[] for copying lines

	max = total_len = position = 0;
	text[0] = '\0';	// keep text a valid string even if there is no input
	while ((len = get_line(line, MAXLINE)) > 0)
	{
		if (len > max)
		{
			max = len;
		}
		for (int i = 0; (text[position] = line[i]) != '\0'; ++i)
			++position;
		printf("The line's length is: %d\n", len);
		total_len += len;
	}

	printf("\nThe user typed this text: \n%s\n", text);
	printf("The length of the longest line is: %d\n", max);
	printf("The total input length is: %d\n", total_len);

	return 0;
}


/* get line: read a line into s, return length */
int get_line(char s[], int lim)
{
	int c, i;

	for (i = 0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
		s[i] = c;
	if (c == '\n')
	{
		s[i] = '\n';
		++i;
	}
	s[i] = '\0';
	return i;
}


The user typed this text: 

The length of the longest line is: 0
The total input length is: 0


```
./a.out
Gutten Abend Genosse!
The line's length is: 22
I am so pleased to see ya'll here today
The line's length is: 40
The Victory in near!
The line's length is: 21
No pasaran! Rot Front!
The line's length is: 23

The user typed this text: 
Gutten Abend Genosse!
I am so pleased to see ya'll here today
The Victory in near!
No pasaran! Rot Front!

The length of the longest line is: 40
The total input length is: 106
```

```
valgrind ./a.out 

==64359== Memcheck, a memory error detector
==64359== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==64359== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==64359== Command: ./a.out
==64359== 
hello comrades
The line's length is: 15
this is the first time we meet
The line's length is: 31
and we have to discuss very important questions of our fight with the imperialism         
The line's length is: 82
victory is near!
The line's length is: 17

The user typed this text: 
hello comrades
this is the first time we meet
and we have to discuss very important questions of our fight with the imperialism
victory is near!

The length of the longest line is: 82
The total input length is: 145
==64359== 
==64359== HEAP SUMMARY:
==64359==     in use at exit: 0 bytes in 0 blocks
==64359==   total heap usage: 2 allocs, 2 frees, 2,048 bytes allocated
==64359== 
==64359== All heap blocks were freed -- no leaks are possible
==64359== 
==64359== For lists of detected and suppressed errors, rerun with: -s
==64359== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```

### Exercise 1-17

Write a program to print all input lines that are longer than 80 characters.

In [11]:
#include <stdio.h>

#define MAXLINE 1000
#define MIN 80

int get_line(char line[], int maxline);
void copy(char to[], char from[]);
int line_to_text(char to[], char from[], int caret_position);

/* print all input lines longer than MIN characters */
int main(void)
{
	int len; 	// current line length
	char line[MAXLINE];		// current input line
	char text[MAXLINE*MAXLINE];
	int position;   // byte caret for text[]

	position = 0;
	text[0] = '\0';	// valid empty string when no line qualifies
	while ((len = get_line(line, MAXLINE)) > 0)
		if (len > MIN)
			position = line_to_text(text, line, position);

	printf("\nThe lines that are longer than %d characters:\n", MIN);
	printf("%s\n", text);

	return 0;
}


/* get line: read a line into s, return length */
int get_line(char s[], int lim)
{
	int c, i;

	for (i = 0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
		s[i] = c;
	if (c == '\n')
	{
		s[i] = '\n';
		++i;
	}
	s[i] = '\0';
	return i;
}


int line_to_text(char to[], char from[], int position)
{
	int i;

	for (i = 0; (to[position] = from[i]) != '\0'; ++i)
		++position;

	return position;
}



```
./a.out 
"The stars winked at her, beckoning her to a distant galaxy." 
"As the alien mothership descended from the skies, its sheer size and otherworldly presence sent shivers down their spines, fearing an impending invasion." 
"The android, with its advanced artificial intelligence, contemplated the meaning of existence and pondered its own mortality, defying its programming."

The lines that are longer than 80 characters:
"As the alien mothership descended from the skies, its sheer size and otherworldly presence sent shivers down their spines, fearing an impending invasion." 
"The android, with its advanced artificial intelligence, contemplated the meaning of existence and pondered its own mortality, defying its programming."
```

### Exercise 1-18

Write a program 
- to remove trailing blanks and tabs from each line of input, and 
- to delete entirely blank lines.

```c
#include <stdio.h>

#define MAXLINE 1000	/* maximum input line size */

int get_line(char line[], int maxline);
int trim(char line[], int len);

/* remove trailing blanks and tabs; delete entirely blank lines */
int main(void)
{
	int len; 	// current line length
	char line[MAXLINE];		// current input line

	while ((len = get_line(line, MAXLINE)) > 0)
		if (trim(line, len) > 0)	// skip lines that are empty after trimming
			printf("%s\n", line);

	return 0;
}


/* get line: read a line into s, return length */
int get_line(char s[], int lim)
{
	int c, i;

	for (i = 0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
		s[i] = c;
	if (c == '\n')
	{
		s[i] = '\n';	// keep the newline so a blank line is not mistaken for EOF
		++i;
	}
	s[i] = '\0';
	return i;
}


/* trim: remove the trailing newline, blanks and tabs from s,
   which holds len characters; return the new length */
int trim(char s[], int len)
{
	// checking len > 0 first keeps s[len-1] inside the array
	while (len > 0 && (s[len-1] == '\n' || s[len-1] == ' ' || s[len-1] == '\t'))
		--len;
	s[len] = '\0';

	return len;
}
```

```sh
$ ./a.out 
hello    
hello
world             
world
endend
```

### Exercise 1-19

Write a function `reverse(s)` that reverses the character string `s`. Use it to write a program that reverses its input a line at a time.

```c
#include <stdio.h>

#define MAXLINE 1000  // maximum input line size


int get_line(char line[], int maxline);
void reverse(char s[]);


/* reverse the input a line at a time */
int main(void)
{
	int len; 	// current line length
	char line[MAXLINE];		// current input line

	while ((len = get_line(line, MAXLINE)) > 0) {
		if (line[len-1] == '\n')
			line[len-1] = '\0';	// drop the newline so it is not moved to the front
		reverse(line);
		printf("%s\n", line);
	}

	return 0;
}


/* get line: read a line into s, return length */
int get_line(char s[], int lim)
{
	int c, i;

	for (i = 0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
		s[i] = c;
	if (c == '\n')
	{
		s[i] = '\n';
		++i;
	}
	s[i] = '\0';
	return i;
}


/* reverse: reverse the string s in place */
void reverse(char s[])
{
	int i, j;
	char tmp;

	j = 0;
	while (s[j] != '\0')	// find the end of the string
		++j;
	--j;	// index of the last character, or -1 for an empty string

	for (i = 0; i < j; ++i) {	// swap characters, moving in from both ends
		tmp = s[i];
		s[i] = s[j];
		s[j] = tmp;
		--j;
	}
}
```

```
./a.out 
hello world
dlrow olleh
my name is Vadim
midaV si eman ym
I made some fun program
margorp nuf emos edam I
```

## 1.10 `extern`

### External Variables and Scope

The variables in `main`, such as `line`, `longest`, etc., are **private** or **local** to `main`. Because they are declared within `main`, no other function can have direct access to them. The same is true of the variables in other functions; for example, the variable `i` in `get_line` is unrelated to the `i` in `copy`. Each local variable in a function comes into existence only when the function is called, and disappears when the function is exited. This is why such variables are usually known as **automatic variables**, following terminology in other languages. We will use the term automatic henceforth to refer to these local variables. (**Chapter 4** discusses the `static` storage class, in which local variables do retain their values between calls.)

Because automatic variables come and go with function invocation, they do not retain their values from one call to the next, and must be explicitly set upon each entry. If they are not set, they will contain **garbage**.
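
A small sketch of the difference, with made-up function names `count_automatic` and `count_static`. The automatic `n` is created and set afresh on every call; the `static` one (a preview of Chapter 4) keeps its value between calls:

```c
/* count_automatic: n is automatic, so it is re-created on every call */
int count_automatic(void)
{
	int n = 0;	/* must be set on each entry; otherwise it would hold garbage */

	++n;
	return n;	/* always 1 */
}

/* count_static: n is static, so it keeps its value between calls */
int count_static(void)
{
	static int n = 0;	/* initialized once, not on every call */

	++n;
	return n;	/* 1, 2, 3, ... on successive calls */
}
```

Calling `count_automatic()` twice gives 1 both times, while `count_static()` gives 1 and then 2.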

As an alternative to automatic variables, it is possible to define variables that are external to all functions, that is, variables that can be accessed by name by any function. (This mechanism is rather like Fortran `COMMON` or Pascal variables declared in the outermost block.) Because **external variables** are globally accessible, they can be used instead of argument lists to communicate data between functions. Furthermore, because external variables remain in existence permanently, rather than appearing and disappearing as functions are called and exited, they retain their values even after the functions that set them have returned.

An external variable must be 
- **defined**, exactly once, outside of any function; this sets aside storage for it. 
- **declared** in each function that wants to access it; this states the type of the variable. The **declaration** may be an explicit `extern` statement or may be implicit from context. 

To make the discussion concrete, let us rewrite the `longest-line` program with `line`, `longest`, and `max` as external variables. This requires changing the calls, declarations, and bodies of all three functions:

```c
#include <stdio.h>

#define MAXLINE 1000	/* maximum input line size */

int max;	/* maximum length seen so far */
char line[MAXLINE]; 	/* current input line */
char longest[MAXLINE];	/* longest line saved here */

int get_line(void);
void copy(void);


/* print longest input line; specialized version */
int main()
{
	int len;
	extern int max;
	extern char longest[];

	max = 0;
	while ((len = get_line()) > 0)
		if (len > max) {
			max = len;
			copy();
		}
	if (max > 0) /* there was a line */
		printf("%s", longest);
	return 0;
}


/* get_line: specialized version */
int get_line(void)
{
	int c, i;
	extern char line[];

	for (i = 0; i < MAXLINE-1
	  && (c = getchar()) != EOF && c != '\n'; ++i)
		line[i] = c;
	if (c == '\n') {
		line[i] = '\n';
		++i;
	}
	line[i] = '\0';
	return i;
}


/* copy: specialized version */
void copy(void)
{
	int i;
	extern char line[], longest[];

	i = 0;
	while ((longest[i] = line[i]) != '\0')
		++i;
}
```

The external variables in `main`, `get_line`, and `copy` are defined by the first lines of the example above, which state their type and cause storage to be allocated for them. Syntactically, external definitions are just like definitions of local variables, but since they occur outside of functions, the variables are external. Before a function can use an external variable, the name of the variable must be made known to the function. One way to do this is to write an `extern` declaration in the function; the declaration is the same as before except for the added keyword `extern`.

In certain circumstances, the `extern` declaration can be omitted. 

If the definition of an external variable occurs in the source file before its use in a particular function, then there is no need for an `extern` declaration in the function. The `extern` declarations in `main`, `get_line` and `copy` are thus redundant. In fact, common practice is to place definitions of all external variables at the beginning of the source file, and then omit all `extern` declarations:

```c
#include <stdio.h>

#define MAXLINE 1000	/* maximum input line size */

int max;	/* maximum length seen so far */
char line[MAXLINE]; 	/* current input line */
char longest[MAXLINE];	/* longest line saved here */

int get_line(void);
void copy(void);


/* print longest input line; specialized version */
int main()
{
	int len;

	max = 0;
	while ((len = get_line()) > 0)
		if (len > max) {
			max = len;
			copy();
		}
	if (max > 0) /* there was a line */
		printf("%s", longest);
	return 0;
}


/* get_line: specialized version */
int get_line(void)
{
	int c, i;

	for (i = 0; i < MAXLINE-1 && (c = getchar()) != EOF && c != '\n'; ++i)
		line[i] = c;
	if (c == '\n') {
		line[i] = '\n';
		++i;
	}
	line[i] = '\0';
	return i;
}


/* copy: specialized version */
void copy(void)
{
	int i;

	i = 0;
	while ((longest[i] = line[i]) != '\0')
		++i;
}
```

If the program is in several source files, and a variable is defined in `file1` and used in `file2` and `file3`, then `extern` declarations are needed in `file2` and `file3` to connect the occurrences of the variable. 

> The usual practice is to collect `extern` declarations of variables and functions in a separate file, historically called a `header`, that is included by `#include` at the front of each source file. 

The suffix `.h` is conventional for header names. The functions of the standard library, for example, are declared in headers like `<stdio.h>`. This topic is discussed at length in **Chapter 4**, and the library itself in **Chapter 7** and **Appendix B**.
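
A minimal sketch of that layout, squeezed into one listing so it can be compiled as it stands. The file names `shared.h`, `file1.c` and `file2.c` and the names `total` and `add_to_total` are assumptions made up for illustration; in a real program each part would live in its own file:

```c
/* ---- shared.h (hypothetical): declarations only, no storage ---- */
extern int total;	/* declaration: says the variable exists somewhere */
void add_to_total(int amount);

/* ---- file1.c: would start with #include "shared.h" ---- */
int total = 0;	/* definition: storage is set aside exactly once */

/* ---- file2.c: would start with #include "shared.h" ---- */
void add_to_total(int amount)
{
	total += amount;	/* refers to the variable defined in file1.c */
}
```

Because both source files would include the same header, the compiler can check that every use of `total` agrees with its declared type.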

Since the specialized versions of `get_line` and `copy` have no arguments, logic would suggest that their prototypes at the beginning of the file should be `get_line()` and `copy()`. But for compatibility with older C programs the standard takes an empty list as an old-style declaration, and turns off all argument list checking; the word `void` must be used for an explicitly empty list. We will discuss this further in **Chapter 4**. (This describes ANSI C through C17; C23 changed the rule, so that an empty list there means the same as `void`.)

You should note that we are using the words _definition_ and _declaration_ carefully when we refer to external variables in this section:
- “definition” refers to the place where the variable is created or assigned storage; 
- “declaration” refers to places where the nature of the variable is stated but no storage is allocated.

By the way, there is a tendency to make everything in sight an `extern` variable because it appears to simplify communications — argument lists are short and variables are always there when you want them. But external variables are always there even when you don’t want them. Relying too heavily on external variables is fraught with peril since it leads to programs whose data connections are not at all obvious — variables can be changed in unexpected and even inadvertent ways, and the program is hard to modify. The second version of the `longest-line` program is inferior to the first, partly for these reasons, and partly because it destroys the generality of two useful functions by wiring into them the names of the variables they manipulate.

At this point we have covered what might be called the conventional core of C. With this handful of building blocks, it’s possible to write useful programs of considerable size, and it would probably be a good idea if you paused long enough to do so. These exercises suggest programs of somewhat greater complexity than the ones earlier in this chapter.

### `extern` in modern C

_ChatGPT_:  
In modern C programming, the use of `extern` variables is less common compared to its historical usage. With the evolution of best practices and coding standards, the use of global variables (which `extern` variables often correspond to) has been discouraged in favor of **encapsulation** and **modularity**.

**Global variables** can lead to issues such as 
- namespace pollution, 
- difficulty in tracking changes, and
- potential for unintended modifications from multiple parts of the program. 

Instead, modern C programming often emphasizes the use of **local variables** to the greatest extent possible, and using techniques like 
- encapsulation, 
- data hiding, and 
- passing data explicitly between functions.

However, there are still situations where `extern` variables may be used, such as when 
- interfacing with legacy code, 
- defining constant values to be shared across multiple files, or when 
- working with low-level libraries and hardware registers.

Overall, while `extern` variables are still a valid feature of the C language, their usage is generally less frequent in modern C programming practices.
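
One way to picture "passing data explicitly": the state that the external-variable version of the longest-line program keeps in `max` and `longest` can instead travel in a structure that is handed to the function. This is only a sketch; the names `struct longest_state` and `update` are hypothetical, and structures are not covered until Chapter 6:

```c
#include <string.h>

#define MAXLINE 1000	/* maximum input line size */

/* the state that the external-variable version kept in globals */
struct longest_state {
	int max;	/* maximum length seen so far */
	char longest[MAXLINE];	/* longest line saved here */
};

/* update: remember line in *state if it is the longest seen so far */
void update(struct longest_state *state, const char line[])
{
	int len = (int) strlen(line);

	if (len > state->max && len < MAXLINE) {
		state->max = len;
		strcpy(state->longest, line);	/* fits: len < MAXLINE was checked */
	}
}
```

The function touches only what it is given, so two callers can track two different inputs at the same time without interfering with each other.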

### Encapsulation and modularity

_ChatGPT_:  
Encapsulation and modularity are both principles in software design that aim to improve the organization, ease of maintenance, and reusability of code. While they share similarities, there are differences between the two concepts.

**Encapsulation** refers to the bundling of data (variables) and methods (functions) that operate on the data into a single unit, known as a class in object-oriented programming. This allows for better control over the accessibility of the data and operations, as well as hiding the internal implementation details from the outside world. This can be achieved in C using structures or in modern C using the concept of **opaque pointers**, where data members are hidden from direct access and are accessed only through specific functions or methods.

**Modularity**, on the other hand, is the concept of breaking down the software into smaller, self-contained units (modules) that can be developed, tested, and maintained independently. Each module typically has a well-defined interface through which it communicates with other modules, allowing for easier understanding, maintenance, and reuse of code. In modern C, modularity can be achieved through the use of header files and separating code into different source files, each containing related functions and data structures.

In the context of modern C, encapsulation is more about controlling access to the data and implementation details within a module, while modularity is about breaking down the software into smaller, reusable components that can be developed and maintained independently. Both principles are important for creating clear, maintainable, and efficient software systems.
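
In C, the simplest tool for this kind of access control is `static` at file scope, which makes a name invisible to other source files. A minimal sketch, assuming a hypothetical module `counter.c` with made-up function names:

```c
/* counter.c (hypothetical module): the state is private to this file */
static int count = 0;	/* static at file scope: other source files cannot name it */

/* the module's interface: the only way other files can reach count */
void counter_increment(void)
{
	++count;
}

int counter_value(void)
{
	return count;
}
```

Other files would see only the two function declarations (normally through a header), so the representation of the counter can change without touching its users.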

### Opaque pointers

_ChatGPT:_  
In modern C programming, **opaque pointers** are a technique used to hide the details of a data structure or object from the user. This is done by declaring a pointer to a type whose structure is not revealed to the user. Because the implementation details stay hidden, they can change without breaking the code that uses the type, which makes that code more robust and easier to maintain.

Opaque pointers are commonly associated with the use of abstract data types and are often used in libraries and APIs to provide a level of encapsulation and information hiding. By using opaque pointers, developers can prevent direct access to the internal data of a type, enforcing encapsulation and modularity in their programs. This can also help prevent unintended manipulation and potential errors.

Suppose we have a library that provides a data structure called "List" which we want to keep private and only allow access to it through functions provided by the library. To achieve this, we can use opaque pointers.

Here's an example of how this can be implemented:

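```c
// list.h
#ifndef LIST_H
#define LIST_H

// incomplete type: the members of struct ListImpl are known only to list.c
typedef struct ListImpl List;

List* create_list(void);
void add_to_list(List* list, int value);
void remove_from_list(List* list, int value);
void free_list(List* list);

#endif
```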

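```c
// list.c
#include "list.h"
#include <stdlib.h>

#define LIST_CAPACITY 100

struct ListImpl {
  int data[LIST_CAPACITY];
  int size;
};

List* create_list(void) {
  List* list = malloc(sizeof(List));
  if (list != NULL)  // malloc returns NULL when it runs out of memory
    list->size = 0;
  return list;
}

void add_to_list(List* list, int value) {
  if (list->size >= LIST_CAPACITY)
    return;  // the list is full; the value is dropped
  list->data[list->size] = value;
  list->size++;
}

void remove_from_list(List* list, int value) {
  // one possible implementation: remove the first occurrence of value
  // and shift the later items down by one
  for (int i = 0; i < list->size; i++) {
    if (list->data[i] == value) {
      for (int j = i; j < list->size - 1; j++)
        list->data[j] = list->data[j + 1];
      list->size--;
      return;
    }
  }
}

void free_list(List* list) {
  free(list);
}
```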

In this example, the structure `ListImpl` is defined only in `list.c`, so its layout is hidden from the user of the library. `list.h` declares `List` as an incomplete type without revealing its members, and the user holds it only through `List*` pointers (the opaque pointers). The user can manipulate a `List` only through the functions provided by the library: `create_list`, `add_to_list`, `remove_from_list`, and `free_list`.

By using opaque pointers in this way, the internal details of the `List` data structure are hidden, providing encapsulation and information hiding. This prevents direct access to the internal data of the `List`, improving the maintainability and security of the library.
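
To see the hiding at work, here is a single-listing sketch in which the client code comes before the structure definition, so the client is compiled while `List` is still an incomplete type. The accessor `list_size` and the function `count_after_adding` are hypothetical additions for illustration and are not part of the header above:

```c
#include <stdlib.h>

/* ---- what the client sees (as in list.h, plus a hypothetical accessor) ---- */
typedef struct ListImpl List;

List *create_list(void);
void add_to_list(List *list, int value);
int list_size(const List *list);	/* hypothetical: not in the header above */
void free_list(List *list);

/* ---- client code: written using only the declarations above ---- */
int count_after_adding(int n)
{
	List *list = create_list();
	int i, size;

	if (list == NULL)
		return -1;	/* allocation failed */
	for (i = 0; i < n; i++)
		add_to_list(list, i);
	/* writing list->size here would not compile: at this point
	   the members of struct ListImpl are unknown */
	size = list_size(list);
	free_list(list);
	return size;
}

/* ---- what stays hidden (as in list.c) ---- */
struct ListImpl {
	int data[100];
	int size;
};

List *create_list(void)
{
	List *list = malloc(sizeof(List));

	if (list != NULL)
		list->size = 0;
	return list;
}

void add_to_list(List *list, int value)
{
	if (list->size < 100)	/* ignore values once the array is full */
		list->data[list->size++] = value;
}

int list_size(const List *list)
{
	return list->size;
}

void free_list(List *list)
{
	free(list);
}
```

The client never learns that the list is a fixed array of 100 items, so the implementation could switch to a growable array without any change to `count_after_adding`.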

### Exercise 1-20

Write a program `detab` that replaces tabs in the input with the proper number of blanks to space to the next tab stop. Assume a fixed set of tab stops, say every `n` columns. Should `n` be a variable or a symbolic parameter?

```c
#include <stdio.h>

#define MAXLINE 1000
#define TAB 4

int get_line(char line[], int limit);
void detab(char new_line[], char old_line[], int tab_length);


int main()
{
	/* worst case: every character is a tab and grows to TAB blanks */
	char raw_line[MAXLINE], correct_line[MAXLINE*TAB];

	/* detab and print every line, not only the last one */
	while (get_line(raw_line, MAXLINE) >= 0) {
		detab(correct_line, raw_line, TAB);
		printf("%s\n", correct_line);
	}

	return 0;
}


/* get_line: read a line into line without its '\n'; return the length,
   or -1 at end of input when nothing was read */
int get_line(char line[], int limit)
{
	int c, i;

	for (i = 0; i<limit-1 && (c = getchar())!='\n' && c != EOF; ++i)
		line[i] = c;
	line[i] = '\0';

	if (i == 0 && c == EOF)
		return -1;	/* lets the caller tell EOF from an empty line */
	return i;
}


/* detab: copy old_line to new_line, replacing every tab with tab_length
   blanks; this is a fixed expansion, it does not yet space to the next
   tab stop as the exercise asks */
void detab(char new_line[], char old_line[], int tab_length)
{
	int c, i, j, k;  /* j counts chars for new_line, k counts spaces */

	j = 0;
	for (i = 0; (c = old_line[i]) != '\0'; ++i) {
		if (c == '\t') {
			for (k = 0; k < tab_length; ++k) {
				new_line[j] = ' ';
				++j;
			}
			continue;
		}
		new_line[j] = c;
		++j;
	}
	new_line[j] = '\0';
}
```

```
./a.out
Input: (a\ta  \ta\t  )
a       a       a         a

Output: (a4a6a6)
a    a      a      a
```
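
The exercise asks for spacing to the next tab stop, and that depends on the column where the tab occurs: with stops every `n` columns, a tab at column `col` needs `n - col % n` blanks. A minimal sketch of that variant follows; the name `detab_stops` is hypothetical, and this is not the solution recorded above:

```c
/* detab_stops: copy in to out, replacing each tab with blanks up to the
   next tab stop; tab stops are assumed every n columns; out must have
   room for n times the length of in, plus 1 */
void detab_stops(char out[], const char in[], int n)
{
	int i, j;	/* i indexes in; j indexes out and is also the current column */

	j = 0;
	for (i = 0; in[i] != '\0'; ++i) {
		if (in[i] == '\t') {
			do {	/* a tab always produces at least one blank */
				out[j] = ' ';
				++j;
			} while (j % n != 0);
		} else {
			out[j] = in[i];
			++j;
		}
	}
	out[j] = '\0';
}
```

With `n` set to 4, the string `"a\tb"` becomes `a`, three blanks, `b`, while `"abcd\te"` gets four blanks because the tab starts exactly on a tab stop.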

### Exercise 1-21

Write a program `entab` that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing. Use the same tab stops as for `detab`. When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?

I use hyphens instead of blanks to clearly see what is happening:

```c
#include <stdio.h>

#define TAB 4
#define MAXLINE 1000
#define BLANK '-'	/* stands in for ' ' so that the result is visible */

int get_line(char line[], int limit);
void entab(char new_line[], char old_line[], int tab_length);


int main()
{
	char raw_line[MAXLINE], correct_line[MAXLINE];

	/* entab and print every line, not only the last one */
	while (get_line(raw_line, MAXLINE) >= 0) {
		entab(correct_line, raw_line, TAB);
		printf("%s\n", correct_line);
	}

	return 0;
}


/* get_line: read a line into line without its '\n'; return the length,
   or -1 at end of input when nothing was read */
int get_line(char line[], int limit)
{
	int c, i;

	for (i = 0; i<limit-1 && (c = getchar())!='\n' && c != EOF; ++i)
		line[i] = c;
	line[i] = '\0';

	if (i == 0 && c == EOF)
		return -1;	/* lets the caller tell EOF from an empty line */
	return i;
}


/* entab: copy old_line to new_line, replacing every run of tab_length
   BLANKs with one tab; the runs are counted from where they begin,
   not from the tab stops of the line */
void entab(char new_line[], char old_line[], int tab_length)
{
	int c, prev_c;    /* current and previous chars */
	int i, j;    /* i is for old_line and j is for new_line */
	int count;    /* count blanks */

	prev_c = -1;  /* -1 is certainly not in the ASCII table */
	j = 0;
	count = 1;

	for (i = 0; (c = old_line[i]) != '\0'; ++i) {
		if (c == BLANK && prev_c == BLANK) {
			++count;
			if (count == tab_length) {
				j -= tab_length - 1;		/* back over the blanks already copied */
				c = '\t';
				count = 1;
			}
		}
		else
			count = 1;
		new_line[j] = c;
		prev_c = c;
		++j;
	}
	new_line[j] = '\0';
}
```

```
$ ./a.out
-----a----a
        -a      a
```
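
For comparison, here is a sketch that works with real blanks and with tab stops rather than with runs of a fixed length. It tracks the current column and emits a tab only when a run of blanks reaches a tab stop. The name `entab_stops` is hypothetical, and preferring a blank over a tab when one blank is enough is just one possible answer to the question in the exercise:

```c
/* entab_stops: copy in to out, replacing runs of blanks that reach a tab
   stop with a tab; tab stops are assumed every n columns; out needs
   as much room as in */
void entab_stops(char out[], const char in[], int n)
{
	int i, j;	/* i indexes in, j indexes out */
	int col;	/* current column in the input line, from 0 */
	int blanks;	/* blanks seen but not yet written */

	j = col = blanks = 0;
	for (i = 0; in[i] != '\0'; ++i) {
		if (in[i] == ' ') {
			++blanks;
			++col;
			if (col % n == 0) {	/* the run has reached a tab stop */
				out[j++] = (blanks == 1) ? ' ' : '\t';
				blanks = 0;
			}
		} else {
			while (blanks > 0) {	/* the run ended before a tab stop */
				out[j++] = ' ';
				--blanks;
			}
			out[j++] = in[i];
			if (in[i] == '\t')
				col += n - col % n;	/* a tab jumps to the next stop */
			else
				++col;
		}
	}
	while (blanks > 0) {	/* trailing blanks */
		out[j++] = ' ';
		--blanks;
	}
	out[j] = '\0';
}
```

With `n` set to 4, `a` followed by three blanks and `b` becomes `"a\tb"`, while `a`, two blanks and `b` is left alone because the run stops short of the tab stop.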

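The terminal does not print a tab as a fixed run of 8 spaces. A tab moves the cursor to the next tab stop, and by default the tab stops are 8 columns apart. A tab therefore takes anywhere from 1 to 8 columns, depending on the column where it starts, which is why it can look like 3 or even 2 spaces in the middle of the string, like here: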

```
$ ./a.out
----a------b--c----d--e
        a       --b--c  d--e
```
or 
```
$ ./a.out
----a--b----c--d----e
        a--b    c--d    e
```
but:
```
$ ./a.out
---a----b----c----d----e
        a       b       c       d       e
```
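
The widths seen above can be worked out with a one-line formula. A sketch, with the hypothetical name `next_tab_stop` and columns counted from 0:

```c
/* next_tab_stop: the column the cursor moves to when a tab is printed
   at column col, with tab stops every n columns */
int next_tab_stop(int col, int n)
{
	return col + (n - col % n);
}
```

In the first of the examples above, the program prints a tab, `a`, a tab, `--b--c`, and another tab. The text `--b--c` ends with the cursor at column 22, and `next_tab_stop(22, 8)` is 24, so that last tab is only 2 columns wide.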

_ChatGPT:_  
Tab spacing can be a problem in computing, particularly when it comes to displaying and formatting text. The use of tabs for indentation and spacing can lead to inconsistencies in the way text is displayed, especially when different programs or systems interpret tab characters differently.

To mitigate these issues, many coding style guides and best practices recommend using spaces for indentation and spacing rather than tabs. This helps ensure consistent and uniform formatting across different platforms and text editors.

Using spaces for indentation also helps maintain the readability and consistency of code, as it reduces the risk of formatting issues when sharing code with others or when working on different systems.

While the use of tabs versus spaces is a longstanding debate in the programming community, the guidance to use spaces for indentation is aimed at improving the readability and maintainability of code in different environments.

### Exercise 1-22

Write a program to “fold” long input lines into two or more shorter lines after the last non-blank character that occurs before the `n`-th column of input. Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column.

```c
#include <stdio.h>

#define TAB 4
#define MAXLINE 1000
#define NCOLUMN 40

int get_line(char line[], int limit);
void detab(char new_line[], char old_line[], int tab_length);
void wrap_line(char new_line[], char old_line[], int n_column);


int main()
{
	/* detab can grow a line TAB times; wrapping adds 2 chars per forced cut */
	char line[MAXLINE], detabbed_line[MAXLINE*TAB], wrapped_line[MAXLINE*TAB*2];

	while (get_line(line, MAXLINE) >= 0) {
		detab(detabbed_line, line, TAB);
		wrap_line(wrapped_line, detabbed_line, NCOLUMN);
		printf("%s\n", wrapped_line);
	}
	return 0;
}


/* get_line: read a line into line without its '\n'; return the length,
   or -1 at end of input when nothing was read */
int get_line(char line[], int limit)
{
	int c, i;

	for (i = 0; i<limit-1 && (c = getchar())!='\n' && c != EOF; ++i)
		line[i] = c;
	line[i] = '\0';

	if (i == 0 && c == EOF)
		return -1;	/* lets the caller tell EOF from an empty line */
	return i;
}


/* detab: replace every tab with tab_length blanks (a fixed expansion) */
void detab(char new_line[], char old_line[], int tab_length)
{
	int c, i, j, k;  /* j counts chars for new_line, k counts spaces */

	j = 0;
	for (i = 0; (c = old_line[i]) != '\0'; ++i) {
		if (c == '\t') {
			for (k = 0; k < tab_length; ++k) {
				new_line[j] = ' ';
				++j;
			}
			continue;
		}
		new_line[j] = c;
		++j;
	}
	new_line[j] = '\0';
}


/* wrap_line: copy old_line to new_line, breaking it at the last blank at
   or before each n_column boundary; a stretch without blanks is cut and
   marked with '-' */
void wrap_line(char new_line[], char old_line[], int n_column)
{
	int c;  /* current char */
	int i, j;  /* counters for old and new lines respectively */
	int length;  /* length of old_line */
	int line_start;  /* index in old_line where the current short line starts */
	int line_end;  /* index in old_line where the current short line ends */
	int keeper;  /* keeps the last line_end value */
	int problem_line;  /* the flag indicates there are no blanks in the short line */

	for (length = 0; old_line[length] != '\0'; ++length)
		;
	line_start = 0;
	keeper = line_end = n_column;
	problem_line = 0;
	j = 0;  /* normally j == i, but we will need it in case of problem lines */

	/* loop through old line */
	for (i = 0; (c=old_line[i]) != '\0'; ++i, ++j) {
		/* find the end of the short line: a blank or the null terminator */
		if (!problem_line) {
			if (line_end > length)
				line_end = length;  /* never read past the terminator */
			while (line_end >= line_start
			  && old_line[line_end] != ' ' && old_line[line_end] != '\0')
				--line_end;
		}
		/* in case the short line does not contain blanks */
		if (line_end < line_start) {
			line_end = keeper;  /* restore the line_end */
			problem_line = 1;  /* mark the problem */
		}
		/* put \n at the end of the current short line */
		if (i == line_end) {
			new_line[j] = '\n';
			/* emergency plan for a problem line */
			if (problem_line) {
				new_line[j-1] = '-';
				i -= 2;  /* the 2 chars replaced by '-' and \n are copied again */
				problem_line = 0;
				line_start = line_end - 1;
			}
			else
				line_start = line_end + 1;
			keeper = line_end += n_column-1;
			continue;
		}
		new_line[j] = c;
	}
	/* finish the whole expression */
	new_line[j] = '\0';
}
```

```
./a.out 

Input: 
Tab spacing can be a problem in computing, particularly when it comes to displaying and formatting text. The use of tabs for indentation and spacing can lead to inconsistencies in the way text is displayed, especially when different programs or systems interpret tab characters differently.

Output:
Tab spacing can be a problem in
computing, particularly when it comes
to displaying and formatting text. The
use of tabs for indentation and
spacing can lead to inconsistencies in
the way text is displayed, especially
when different programs or systems
interpret tab characters differently.

./a.out 

Input: 
Tabspacingcanbeaproblemincomputing,particularlywhenitcomestodisplayingandformattingtext.Theuseoftabsforindentationandspacingcanleadtoinconsistenciesin the way text is displayed, especially when different programs or systems interpret tab characters differently.

Output:
Tabspacingcanbeaproblemincomputing,part-
icularlywhenitcomestodisplayingandforma-
ttingtext.Theuseoftabsforindentationand-
spacingcanleadtoinconsistenciesin the
way text is displayed, especially when
different programs or systems
interpret tab characters differently.

./a.out 

Input: 
Tab spacing     can be a problem in computing, particularly when it comes to displaying         and formatting text. The use of tabs for indentation and spacing can lead to inconsistencies in the way text is displayed,                         especially when different programs or systems interpret tab characters differently.

Output:
Tab spacing    can be a problem in
computing, particularly when it comes
to displaying        and formatting
text. The use of tabs for indentation
and spacing can lead to
inconsistencies in the way text is
displayed,             especially when
different programs or systems
interpret tab characters differently.
```

### Exercise 1-23

Write a program to remove all comments from a C program. Don’t forget to handle quoted strings and character constants properly. C comments do not nest.
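
No solution is recorded here. One possible approach is a small state machine that remembers whether it is inside ordinary code, a comment, a string, or a character constant. The sketch below works on a string instead of standard input so that it is easy to try out. The names `strip_comments` and `enum state` are made up, only `/* ... */` comments are handled (the `//` form is not), and each comment is replaced by one blank so that the tokens around it stay separated:

```c
enum state { CODE, COMMENT, STRING, CHARACTER };

/* strip_comments: copy in to out without its comments;
   out needs as much room as in */
void strip_comments(char out[], const char in[])
{
	enum state st = CODE;
	int i, j = 0;

	for (i = 0; in[i] != '\0'; ++i) {
		char c = in[i];

		switch (st) {
		case CODE:
			if (c == '/' && in[i+1] == '*') {
				st = COMMENT;
				++i;	/* skip the '*' */
				out[j++] = ' ';	/* the comment becomes one blank */
			} else {
				if (c == '"')
					st = STRING;
				else if (c == '\'')
					st = CHARACTER;
				out[j++] = c;
			}
			break;
		case COMMENT:
			if (c == '*' && in[i+1] == '/') {
				st = CODE;
				++i;	/* skip the '/' */
			}
			break;
		case STRING:
		case CHARACTER:
			out[j++] = c;
			if (c == '\\' && in[i+1] != '\0')
				out[j++] = in[++i];	/* copy the escaped character as it is */
			else if ((st == STRING && c == '"')
			  || (st == CHARACTER && c == '\''))
				st = CODE;
			break;
		}
	}
	out[j] = '\0';
}
```

Text that only looks like a comment, such as `"/* not a comment */"` inside a string, is copied unchanged because the machine is in the `STRING` state when it sees it.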

### Exercise 1-24

Write a program to check a C program for rudimentary syntax errors like unbalanced parentheses, brackets and braces. Don’t forget about quotes, both single and double, escape sequences, and comments. (This program is hard if you do it in full generality.)