# The Linux Programming Interface
- Kerrisk, Michael. **The Linux Programming Interface**. 2010. No Starch Press.
- obsidian://open?vault=obsidian&file=DevOps%2FLinux%2Fbook.The%20Linux%20Programming%20Interface

parts:
- Background and concepts: 1-3
- Fundamental features of the system programming interface: 4-12
- More advanced features of the system programming interface: 13-23
- Processes, programs, and threads: 24-33
- Advanced process and program topics: 34-42
- Interprocess communication (IPC): 43-55
- Sockets and network programming: 56-61
- Advanced I/O topics: 62-64

# Codes

In [None]:
# Working directory
# with Make: /mnt/d/workspace/github/workbench/OS/tlpi-make
%cd /mnt/d/workspace/github/workbench/OS/tlpi-cmake

/mnt/d/workspace/github/workbench/OS/tlpi-cmake


In [None]:
!rm -rf ./build && cmake -B build
!cd build && make

-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is GNU 11.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/d/workspace/github/workbench/OS/tlpi-cmake/build
[  4%] Building C object src/lib/CMakeFiles/lib.dir/error_functions.c.o
[  8%] Building C object src/lib/CMakeFiles/lib.dir/get_num.c.o
[ 13%] Linking C static library liblib.a
[ 13%] Built target lib
[ 17%] Building C object test/CMakeFiles/get_libc_version.dir/gnu/get_libc_version.c.o
[ 21%] Linking C executable get_libc

In [None]:
# Clean up
!rm -rf ./build

## Tests

These should be run as CMake tests.

In [None]:
!cd build/test && ./get_libc_version

GNU libc version: 2.35


In [None]:
!cd build/test && ./test_lib

blockSize=1234


# 1. History and Standards
# 2. Fundamental Concepts 
# 3. System Programming Concepts

# 4. File I/O: The Universal I/O Model
# 5. File I/O: Further Details 

# 6. Processes

Figure 6-1: Typical memory layout of a process on Linux/x86-32

<img src="./images/Linux X86-32 Process Memory Layout.png"/>

# 7. Memory Allocation
# 8. Users and Groups
# 9. Process Credentials 
# 10. Time
# 11. System Limits and Options
# 12. System and Process Information

# 13. File I/O Buffering
# 14. File Systems 
# 15. File Attributes 
# 16. Extended Attributes 
# 17. Access Control Lists
# 18. Directories and Links 
# 19. Monitoring File Events 
# 20. Signals: Fundamental Concepts 
# 21. Signals: Signal Handlers
# 22. Signals: Advanced Features 
# 23. Timers and Sleeping

# 24. Process Creation
# 25. Process Termination
# 26. Monitoring Child Processes 
# 27. Program Execution
# 28. Process Creation and Program Execution in More Detail
# 29. Threads: Introduction
# 30. Threads: Thread Synchronization
# 31. Threads: Thread Safety and Per-Thread Storage
# 32. Threads: Thread Cancellation
# 33. Threads: Further Details 

# 34. Process Groups, Sessions, and Job Control 
# 35. Process Priorities and Scheduling 
# 36. Process Resources 
# 37. Daemons 
# 38. Writing Secure Privileged Programs 
# 39. Capabilities 
# 40. Login Accounting
# 41. Fundamentals of Shared Libraries 
# 42. Advanced Features of Shared Libraries

# 43. Interprocess Communication Overview 

Classification of IPC facilities:
- communication
  - data transfer
    - byte stream: pipe, FIFO, stream socket
    - pseudoterminal
    - message: System V message queue, POSIX message queue, datagram socket
  - shared memory
    - System V shared memory
    - POSIX shared memory
    - memory mapping: anonymous mapping, mapped file
- synchronization
  - semaphore
    - System V semaphore
    - POSIX semaphore: named, unnamed
  - file lock
    - record lock: `fcntl()`
    - file lock: `flock()`
  - mutex (threads)
  - condition variable (threads)
- signal
  - standard signal
  - realtime signal

# 44. Pipes and FIFOs
# 45. Introduction to System V IPC
# 46. System V Message Queues 
# 47. System V Semaphores 
# 48. System V Shared Memory
# 49. Memory Mappings 
# 50. Virtual Memory Operations
# 51. Introduction to POSIX IPC
# 52. POSIX Message Queues 
# 53. POSIX Semaphores 
# 54. POSIX Shared Memory
# 55. File Locking

# 56. Sockets: Introduction 
# 57. Sockets: UNIX Domain 
# 58. Sockets: Fundamentals of TCP/IP Networks
# 59. Sockets: Internet Domains 
# 60. Sockets: Server Design 
# 61. Sockets: Advanced Topics

- Partial Reads and Writes on Stream Sockets
- The `shutdown()` System Call
- Socket-Specific I/O System Calls: `recv()` and `send()`
- The `sendfile()` System Call 
- Retrieving Socket Addresses 
- A Closer Look at TCP
- Monitoring Sockets: `netstat`
- Using `tcpdump` to Monitor TCP Traffic
- Socket Options
- The `SO_REUSEADDR` Socket Option
- Inheritance of Flags and Options Across `accept()`
- TCP Versus UDP 
- Advanced Features
  - Out-of-Band Data
  - The `sendmsg()` and `recvmsg()` System Calls
  - Passing File Descriptors
  - Receiving Sender Credentials
  - Sequenced-Packet Sockets
  - SCTP and DCCP Transport-Layer Protocols 

## The `sendfile()` System Call

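Example: read a file from disk and write it to a socket.

`sendfile()` transfers the file contents directly to the socket without passing them through user space. This is a zero-copy transfer.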

```c
#include <sys/sendfile.h>

// Returns number of bytes transferred, or -1 on error
ssize_t sendfile(
  int out_fd,        // destination; must be a socket
  int in_fd,         // source; must be a file that supports `mmap()`
  off_t *offset,     // starting offset within in_fd
  size_t count       // number of bytes to transfer
);
```

Since Linux 2.6.17 there are also newer system calls: `splice()`, `vmsplice()`, `tee()`.
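A minimal sketch of sending a whole file with `sendfile()`. The names `send_whole_file()` and `sendfile_demo()` are hypothetical helpers, not from the book, and the demo assumes a UNIX domain socket pair is an acceptable stand-in for a network connection.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send the whole of in_fd to the socket out_fd.
   Returns the number of bytes sent, or -1 on error. */
ssize_t send_whole_file(int out_fd, int in_fd)
{
    struct stat sb;
    if (fstat(in_fd, &sb) == -1)
        return -1;

    off_t offset = 0;                 /* sendfile() advances this for us */
    while (offset < sb.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(sb.st_size - offset));
        if (n == -1) {
            if (errno == EINTR)
                continue;             /* interrupted: retry */
            return -1;
        }
        if (n == 0)                   /* file shrank while we were sending */
            break;
    }
    return (ssize_t)offset;
}

/* Hypothetical self-check: write a short temporary file, send it over a
   socket pair, and compare what the peer receives.
   Returns 0 on success, -1 on any failure. */
int sendfile_demo(void)
{
    char path[] = "/tmp/sendfile_demo_XXXXXX";
    const char msg[] = "zero-copy";
    char buf[sizeof(msg)] = {0};
    int sv[2];

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                     /* file disappears when fd is closed */
    if (write(fd, msg, strlen(msg)) != (ssize_t)strlen(msg))
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    ssize_t sent = send_whole_file(sv[0], fd);
    ssize_t got = read(sv[1], buf, sizeof(buf) - 1);
    close(fd);
    close(sv[0]);
    close(sv[1]);
    return (sent == (ssize_t)strlen(msg) && got == sent &&
            strcmp(buf, msg) == 0) ? 0 : -1;
}
```

Because an `offset` pointer is passed, the file offset of `in_fd` itself is left unchanged, so the transfer starts at byte 0 regardless of where earlier writes left the file position.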

# 62. Terminals 


# 63. Alternative I/O Models

- I/O multiplexing: `select()`, `poll()`
- signal-driven I/O
- Linux `epoll` API

> One I/O model that we don’t describe in this chapter is POSIX asynchronous I/O (AIO).
>
> Currently, Linux provides a threads-based implementation of POSIX AIO within glibc. At the time of writing, work is ongoing toward providing an in-kernel implementation of POSIX AIO, which should provide better scaling performance. POSIX AIO is described in [Gallmeister, 1995] and [Robbins & Robbins, 2003].

Blocking I/O model

Nonblocking I/O model: `O_NONBLOCK`
- applies to: pipes, FIFOs, sockets, terminals, pseudoterminals, and some other types of devices
- the application must poll: periodically check whether I/O is possible on the descriptor

multiple processes/threads


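Alternatives: monitor one or more file descriptors simultaneously to see whether they are ready for I/O. A file descriptor becomes ready as the result of some I/O event. These techniques only report that a file descriptor is ready; they do not perform the actual I/O.
- I/O multiplexing: `select()`, `poll()`
  - portable
  - performs poorly when monitoring large numbers of file descriptors (hundreds or thousands)
- Signal-driven I/O
  - monitors large numbers of file descriptors efficiently
- `epoll`: Linux 2.6
  - monitors large numbers of file descriptors efficiently
  - can specify what to monitor: read readiness, write readiness, and so on
  - can choose level-triggered or edge-triggered notification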
[libevent](https://libevent.org/)

> The libevent API provides a mechanism to execute a callback function when a specific event occurs on a file descriptor or after a timeout has been reached. Furthermore, libevent also support callbacks due to signals or regular timeouts.
>
> Currently, libevent supports /dev/poll, kqueue(2), event ports, POSIX select(2), Windows select(), poll(2), and epoll(4)

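Two models of file descriptor readiness notification:
- Level-triggered notification
  - A file descriptor is considered ready if an I/O system call can be performed on it without blocking.
  - I/O models: `select()`, `poll()`, `epoll`
  - Readiness can be checked at any time, so there is no need to perform as much I/O as possible on each notification.
- Edge-triggered notification
  - A notification is issued if there has been new I/O activity on the file descriptor since it was last monitored.
  - I/O models: signal-driven I/O, `epoll`
  - A notification arrives only when an I/O event occurs, and no further notification arrives until the next I/O event. The notification does not say how much I/O is possible. So on each notification, perform as much I/O as possible: put the file descriptor in nonblocking mode and repeat the I/O until the system call fails with `EAGAIN` or `EWOULDBLOCK`.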
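The "read until `EAGAIN`" rule for edge-triggered notification can be sketched as follows. `set_nonblocking()` and `drain_fd()` are hypothetical helper names chosen for illustration; the sketch discards the data it reads, where a real program would process it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into nonblocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read from a nonblocking fd until no more input is available.
   Returns the total number of bytes read, or -1 on a real error. */
ssize_t drain_fd(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;               /* a real program would process buf here */
            continue;
        }
        if (n == 0)
            return total;             /* end of file */
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;             /* nothing left to read for now */
        return -1;
    }
}
```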

## IO Multiplexing

```c
#include <sys/time.h> /* For portability */
#include <sys/select.h>

// Returns number of ready file descriptors, 0 on timeout, or -1 on error
int select(int nfds,           // highest file descriptor number in the three sets, plus 1
  fd_set *readfds,             // descriptors to check for input readiness
  fd_set *writefds,            // descriptors to check for output readiness
  fd_set *exceptfds,           // descriptors to check for exceptional conditions
  struct timeval *timeout      // upper limit on how long select() blocks
);

// Operations on file descriptor sets
// FD_SETSIZE: maximum size of a file descriptor set; 1024 on Linux
void FD_ZERO(fd_set *fdset);         // initialize fdset to be empty
void FD_SET(int fd, fd_set *fdset);  // add fd to fdset
void FD_CLR(int fd, fd_set *fdset);  // remove fd from fdset
int FD_ISSET(int fd, fd_set *fdset); // Returns true (1) if fd is in fdset, or false (0) otherwise

struct timeval {
  time_t tv_sec; /* Seconds */
  suseconds_t tv_usec; /* Microseconds (long int) */
};
```
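As a small illustration of the `select()` calling sequence, the sketch below waits for one descriptor to become readable. `readable_within()` is a hypothetical helper name, and whole-second timeout granularity is an assumption made to keep it short.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to `seconds` for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int readable_within(int fd, long seconds)
{
    fd_set readfds;
    struct timeval timeout;

    if (fd < 0 || fd >= FD_SETSIZE)   /* fd_set cannot hold larger descriptors */
        return -1;

    FD_ZERO(&readfds);                /* sets must be rebuilt before every call */
    FD_SET(fd, &readfds);
    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;

    int ready = select(fd + 1, &readfds, NULL, NULL, &timeout);
    if (ready == -1)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

`select()` overwrites the sets with the ready descriptors, which is why the set is reinitialized on every call.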

```c
#include <poll.h>

// Returns number of ready file descriptors, 0 on timeout, or -1 on error
int poll(
  struct pollfd fds[],  // array of file descriptors to monitor
  nfds_t nfds,          // number of items in fds
  int timeout           // upper limit on how long poll() blocks, in milliseconds
  );

struct pollfd {
  int fd;         /* File descriptor */
  short events;   /* Requested events bit mask */
  short revents;  /* Returned events bit mask */
};
```
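A minimal sketch of filling in `struct pollfd` and reading back `revents`. `poll_two()` and its bit-mask return convention are hypothetical, chosen only to keep the example small.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for input on either descriptor.
   Returns a bit mask (bit 0: fd_a readable, bit 1: fd_b readable),
   0 on timeout, or -1 on error. */
int poll_two(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };

    int ready = poll(fds, 2, timeout_ms);
    if (ready == -1)
        return -1;

    int mask = 0;
    if (fds[0].revents & POLLIN)      /* revents is filled in by the kernel */
        mask |= 1;
    if (fds[1].revents & POLLIN)
        mask |= 2;
    return mask;
}
```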

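`select()` and `poll()` report whether an I/O operation would not block, not whether it would successfully transfer data.

On Linux, `select()` and `poll()` are both implemented using the same set of kernel `poll` routines:
- Each routine returns information about the readiness of a single file descriptor, as a bit mask corresponding to the `revents` field of `poll()`.

Problems:
- The Linux kernel does not remember the list of file descriptors to monitor between calls, so the full list must be passed in on every call.
- Signal-driven I/O and `epoll`, by contrast, let the kernel record the list of file descriptors a process is interested in.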

## Signal-Driven IO

- Signal: `SIGIO`
- Open file status flags: `O_ASYNC | O_NONBLOCK`

## The `epoll` API

epoll: event poll

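Data structure: the epoll instance
- It is referred to by an open file descriptor. This descriptor is not used for I/O; it is a handle for kernel data structures:
  - the interest list: the file descriptors that a process has declared an interest in monitoring.
  - the ready list: the file descriptors in the interest list that are ready for I/O.

Three system calls:
- `epoll_create()`: creates an epoll instance and returns its file descriptor.
- `epoll_ctl()`: manipulates the interest list of an epoll instance, for example adding a new fd, removing an existing fd, or modifying the event mask monitored for an fd.
- `epoll_wait()`: returns items from the ready list of an epoll instance.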

```c
#include <sys/epoll.h>

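// Returns file descriptor on success, or -1 on error
int epoll_create(int size); // since Linux 2.6.8 the size argument is ignored (but must be greater than 0)


// Returns 0 on success, or -1 on error
int epoll_ctl(int epfd,    // file descriptor of the epoll instance
  int op,                  // operation to perform: EPOLL_CTL_ADD, EPOLL_CTL_MOD, EPOLL_CTL_DEL
  int fd,                  // file descriptor in the interest list to operate on
  struct epoll_event *ev   // events of interest for fd (EPOLL_CTL_ADD and EPOLL_CTL_MOD; ignored for EPOLL_CTL_DEL)
);

struct epoll_event {
  uint32_t events;   /* epoll events (bit mask) */
  epoll_data_t data; /* User data */ // the only way to find out which file descriptor an event belongs to:
                                     // store fd, or a ptr to a structure that contains the file descriptor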
};
typedef union epoll_data {
  void *ptr;         /* Pointer to user-defined data */
  int fd;            /* File descriptor */
  uint32_t u32;      /* 32-bit integer */
  uint64_t u64;      /* 64-bit integer */
} epoll_data_t;

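// Returns number of ready file descriptors, 0 on timeout, or -1 on error
int epoll_wait(int epfd,       // file descriptor of the epoll instance
  struct epoll_event *evlist,  // information about ready file descriptors; memory allocated by the caller
  int maxevents,               // number of elements in evlist
  int timeout                  // upper limit on how long epoll_wait() blocks, in milliseconds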
);
```
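The three calls fit together roughly as in this sketch, which registers a few descriptors and reports the first one that becomes readable. `first_ready()` is a hypothetical helper; creating and closing an epoll instance on every call is done only to keep the example self-contained, whereas a real program would keep the instance open across calls.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeout_ms for any of fds[0..nfds-1] to become readable.
   Returns the ready file descriptor, or -1 on timeout or error. */
int first_ready(const int *fds, int nfds, int timeout_ms)
{
    int epfd = epoll_create(1);       /* size is ignored but must be > 0 */
    if (epfd == -1)
        return -1;

    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev;
        ev.events = EPOLLIN;          /* level-triggered by default */
        ev.data.fd = fds[i];          /* lets us identify the fd later */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) == -1) {
            close(epfd);
            return -1;
        }
    }

    struct epoll_event evlist[1];     /* caller-allocated result array */
    int ready = epoll_wait(epfd, evlist, 1, timeout_ms);
    int result = (ready == 1) ? evlist[0].data.fd : -1;

    close(epfd);
    return result;
}
```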

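Limit: `max_user_watches`
- Meaning: the total number of file descriptors that each user can register across all `epoll` interest lists.
- Reason: each file descriptor registered in an `epoll` interest list requires a small amount of nonswappable kernel memory.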
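The current limit can be read from the `/proc` file system. This sketch assumes the standard Linux location `/proc/sys/fs/epoll/max_user_watches`; `read_max_user_watches()` is a hypothetical helper name.

```c
#include <stdio.h>

/* Returns the per-user epoll watch limit, or -1 if it cannot be read. */
long read_max_user_watches(void)
{
    FILE *fp = fopen("/proc/sys/fs/epoll/max_user_watches", "r");
    if (fp == NULL)
        return -1;

    long value = -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;                   /* unexpected file contents */
    fclose(fp);
    return value;
}
```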

Table 63-8: Bit-mask values for the epoll `events` field

`epoll` provides level-triggered notification by default.

For edge-triggered notification, specify the `EPOLLET` flag in `ev.events` when calling `epoll_ctl()`.
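A short sketch of registering a descriptor with `EPOLLET`, plus a self-check that shows the difference from level-triggered behavior: after one write, the first `epoll_wait()` reports the event, and a second call reports nothing even though the data is still unread. `add_edge_triggered()` and `edge_notifications_for_one_write()` are hypothetical names.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Add fd to the interest list of epfd with edge-triggered input
   notification. Returns 0 on success, -1 on error. */
int add_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Write one byte to a pipe and call epoll_wait() twice without reading.
   Returns the total number of notifications received, or -1 on error. */
int edge_notifications_for_one_write(void)
{
    int p[2];
    struct epoll_event out;

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create(1);
    if (epfd == -1)
        return -1;
    if (add_edge_triggered(epfd, p[0]) == -1)
        return -1;
    if (write(p[1], "x", 1) != 1)
        return -1;

    int first = epoll_wait(epfd, &out, 1, 0);   /* new activity: reported */
    int second = epoll_wait(epfd, &out, 1, 0);  /* no new activity: silent */

    close(epfd);
    close(p[0]);
    close(p[1]);
    return (first < 0 || second < 0) ? -1 : first + second;
}
```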

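A possible problem with edge-triggered notification: file descriptor starvation
- Cause: edge-triggered notification requires performing as much I/O as possible, so a large amount of I/O on one file descriptor delays the next `epoll_wait()` check and starves the other descriptors.
- Solution: the application maintains its own list of ready file descriptors (the application fd list).
  - When `epoll_wait()` reports ready file descriptors, add them to the application fd list, and use a short timeout for the next `epoll_wait()` call.
  - Perform a limited amount of I/O on each descriptor in the application fd list (for example, round-robin). When the corresponding nonblocking I/O system call fails with `EAGAIN` or `EWOULDBLOCK`, remove the fd from the application fd list.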
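The "limited amount of I/O per descriptor" step might look like the sketch below. `service_fd()` and its byte budget are hypothetical; the return value tells the caller whether to keep the descriptor in its application fd list. The descriptor is assumed to be in nonblocking mode already, and the data is discarded rather than processed.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read at most `budget` bytes from fd, storing the count in *consumed.
   Returns 1 if the budget ran out (fd may still have data: keep it in
   the application fd list), 0 if the fd is drained or at end of file
   (remove it from the list), or -1 on error. */
int service_fd(int fd, size_t budget, size_t *consumed)
{
    char buf[512];

    *consumed = 0;
    while (*consumed < budget) {
        size_t want = budget - *consumed;
        if (want > sizeof(buf))
            want = sizeof(buf);

        ssize_t n = read(fd, buf, want);
        if (n > 0) {
            *consumed += (size_t)n;   /* a real program would process buf */
            continue;
        }
        if (n == 0)
            return 0;                 /* end of file */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                 /* drained for now */
        return -1;
    }
    return 1;                         /* budget exhausted */
}
```

A round-robin pass would call `service_fd()` once for each entry in the application fd list and drop the entries for which it returns 0, before going back to `epoll_wait()`.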

## Waiting on Signals and File Descriptors

Waiting for signals and ready file descriptors at the same time.

System call: `pselect()`

```c
#define _XOPEN_SOURCE 600
#include <sys/select.h>

// Returns number of ready file descriptors, 0 on timeout, or -1 on error
int pselect(int nfds,
  fd_set *readfds,
  fd_set *writefds,
  fd_set *exceptfds,
  struct timespec *timeout,
  const sigset_t *sigmask    // signal mask installed atomically for the duration of the call
);
```


# 64. Pseudoterminals