

# General
- **Office** ADA-216

# Virtual memory 
- From the programmer's point of view, an **operating system** adds a variety of new instructions beyond the ISA level
    - These instructions are called **system calls**; each one invokes a predefined operating system service, e.g. reading data from a file
    - The Operating System Level (OSL) is always interpreted


- **Virtual memory** lets a program use more address space than the physical memory that is available
    - The address space and the memory locations are separated: at any instant 4096 words of memory can be accessed directly, but they do not need to correspond to memory locations 0 to 4095
![memory_mapping](img/memory_mapping.png)
    - If the address map covers addresses 4096 to 8191 and the program branches to an address between 8192 and 12287, a machine with virtual memory performs the following steps (**paging**):
        1. The contents of main memory are saved on disk.
        2. Words 8192 to 12287 are located on disk.
        3. Words 8192 to 12287 are loaded into main memory.
        4. The address map is changed to map addresses 8192 to 12287 onto memory locations 0 to 4095.
        5. Execution continues as though nothing unusual had happened.
    - The chunks of program read in from disk are called **pages**
    - The addresses that a program can refer to form the **virtual address space**
    - The actual physical memory locations form the **physical address space**
    - A **memory map** or **page table** specifies for each virtual address what the corresponding physical address is
    - The paging mechanism is said to be **transparent** because the programmer does not need to know that it exists.

## Implementation
- The virtual address space is broken up into a number of equal-size pages
    - Page sizes range from 512 bytes to 64 KB per page
    - The page size is always a power of two, for example $2^k$, so that every address within a page can be represented in $k$ bits
    - The physical address space is broken into pieces in the same way as the virtual one
    - The pieces of main memory into which the pages go are called **page frames**
        - Typically thousands exist
        
        
- The device that does the virtual-to-physical mapping is called the **MMU (Memory Management Unit).**
    - It may be on the CPU chip or on a separate chip
    - To see whether a page is currently in memory, the MMU checks the **present/absent bit** in the page table entry
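
The translation the MMU performs can be sketched in C. This is only an illustration of the idea: the 4 KB page size, the `page_table_entry` layout and the function name `translate` are assumptions made for the example, not a description of any particular MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12u                      /* assumed page size: 2^12 = 4096 bytes */
#define PAGE_SIZE (1u << PAGE_BITS)

/* Hypothetical page-table entry: a present/absent bit plus a frame number. */
struct page_table_entry {
    bool     present;
    uint32_t frame;
};

/* Translate a virtual address. Returns true and stores the physical address
   on success; returns false to signal a page fault (page absent or outside
   the table). */
bool translate(const struct page_table_entry *table, size_t n_pages,
               uint32_t vaddr, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);

    uint32_t page   = vaddr >> PAGE_BITS;        /* high bits: virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);  /* low k bits: offset in the page */

    if (page >= n_pages || !table[page].present)
        return false;                            /* page fault */

    *paddr = (table[page].frame << PAGE_BITS) | offset;
    return true;
}
```

With a table in which only virtual page 2 is present and mapped to frame 0, virtual address 8197 translates to physical address 5, mirroring the mapping of addresses 8192 to 12287 onto locations 0 to 4095 described earlier.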


- When a reference is made to an address on a page not present in main memory, it is called a **page fault**.
    - After it has occurred the operating system must 
        - read the required page from disk
        - enter its new physical memory location in the page table
        - then repeat the instruction that caused the fault.
    - In **demand paging**, a page is brought into memory only when a request for it occurs, not in advance.
    - The **locality principle** is that references tend to cluster on a small number of pages.
    - The **working set** at any given time is the set of all pages used by the k most recent memory references

## Page-Replacement Policy
- When fetching a page, an algorithm needs to decide which page to send back to disk to make room
    - The **LRU (Least Recently Used)** algorithm evicts the page that was least recently used
    - **FIFO (First-In First-Out)** removes the least recently loaded page
    - A program that generates page faults frequently and continuously is said to be **thrashing**.
    - Many operating systems keep a bit per page that tells whether the page has been written to since it was loaded; a page that has not been written to does not need to be written back to disk when it is evicted
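
The difference between the two policies can be made concrete by counting page faults for a given sequence of page references. The following is a minimal simulation sketch; the function name `count_page_faults`, the `MAX_FRAMES` bound and the timestamp representation are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16   /* assumed upper bound for this sketch */

/* Count page faults for a reference string with n_frames page frames.
   If use_lru is nonzero, a hit refreshes the page's timestamp (LRU);
   otherwise the timestamp records only the load time (FIFO). */
int count_page_faults(const int *refs, size_t n_refs, int n_frames, int use_lru)
{
    int  page[MAX_FRAMES];
    long stamp[MAX_FRAMES];
    int  used = 0, faults = 0;

    assert(n_frames > 0 && n_frames <= MAX_FRAMES);

    for (size_t t = 0; t < n_refs; t++) {
        int hit = -1;
        for (int i = 0; i < used; i++)
            if (page[i] == refs[t]) { hit = i; break; }

        if (hit >= 0) {
            if (use_lru)
                stamp[hit] = (long)t;     /* LRU: remember the latest use */
            continue;
        }

        faults++;
        int victim;
        if (used < n_frames) {
            victim = used++;              /* a free frame is still available */
        } else {
            victim = 0;                   /* evict the smallest timestamp */
            for (int i = 1; i < n_frames; i++)
                if (stamp[i] < stamp[victim]) victim = i;
        }
        page[victim]  = refs[t];
        stamp[victim] = (long)t;
    }
    return faults;
}
```

For the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 and three page frames, this gives 9 faults with FIFO and 10 with LRU, so neither policy is better on every input.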


- When a program and its data do not fill a whole number of pages exactly, space is wasted on the last page; the problem of these wasted bytes is called **internal fragmentation**

## Segmentation
- A straightforward solution to overflowing memory is to provide many completely independent address spaces, called **segments**.
    - Each segment consists of a linear sequence of addresses, from 0 to some maximum.
    - The length of a segment may change during execution, e.g. a stack segment grows and shrinks as the stack changes
    - Different segments can grow and shrink independently without affecting each other
    - A segment can fill up, but because segments are usually very large this is very rare
    - To specify a segment the program must supply a two-part address: a segment number, and an address within the segment.
![paging_vs_segmentation](img/paging_vs_segmentation.png)
  

- Segmentation can be implemented in one of two ways: swapping and paging 
    - Segment swapping is not unlike demand paging: segments come and segments go as needed
        - The implementation differs from paging because segments differ in size whereas pages have a fixed size
        - **External fragmentation** is the phenomenon where, after the computer has been running for some time, memory is divided into chunks, some containing segments and some containing holes
        - **Compacting** is moving all the segments together to remove the external fragmentation
    - Paging is dividing each segment up into fixed-size pages and demand paging them.
    
        

# Threads 
- A **program** can contain multiple threads
    - Each thread has a lifetime extending from the time its first instruction executes to the time its last one does
    - If two threads have overlapping lifetimes, they are said to be concurrent

- Command to compile C program with threads: `gcc -pthread program.c -o program.out`


- Simple multi threaded program in C

```c
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

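/* Body of the child thread: sleep, report, and finish. */
static void *child(void *ignored){
    (void)ignored;  /* the argument is unused */
    sleep(3);
    printf("Child is done sleeping 3 seconds.\n");
    return NULL;
}

int main(void){
    pthread_t child_thread;
    int code;
    /* Start the child thread with default attributes and no argument. */
    code = pthread_create(&child_thread, NULL, child, NULL);
    if(code){
        fprintf(stderr, "pthread_create failed with code %d\n", code);
        return 1;
    }
    /* The parent sleeps longer than the child, so the child reports first. */
    sleep(5);
    printf("Parent is done sleeping 5 seconds.\n");
    return 0;
}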
```



## Switching Between Threads
- To switch between threads, the processor state of the outgoing thread must be stored in memory so that the thread can be resumed later
    - This includes the program counter, the contents of the registers and, for higher order languages, the stack pointer
    - A block of memory containing information about a thread is called a **thread control block** or **task control block** (TCB)


- The registers can be pushed onto the stack before thread switching so thread switching is done as follows
```
    push each register on the (outgoing thread’s) stack
    store the stack pointer into outgoing->SP
    load the stack pointer from next->SP
    store label L’s address into outgoing->IP
    load in next->IP and jump to that address
L:
    pop each register from the (resumed outgoing thread’s) stack
```

- Thread switching is often called **context switching**

## Preemptive Multitasking
- **Cooperative multitasking** is where each thread's program contains explicit code at each point where a thread switch should occur
- **Preemptive multitasking** is where the threads' programs do not contain explicit code for the points where a thread switch should occur
    - It can still be useful for a thread to voluntarily give way to the other threads, normally called yield
        - The pthreads API uses `sched_yield()`
    - The `ret` instruction is used in the Linux code for thread switching.
- When an I/O device or timer needs attention, an **interrupt** occurs and the processor jumps off to a special procedure called the **interrupt handler**
    - The handler is part of the operating system and deals with the hardware device
    - When it is done, it executes a return from interrupt instruction, which jumps back to the instruction that was interrupted
    - The handler needs to save all the registers at the start and restore them before returning.
    - Interrupts can be used by an operating system to provide preemptive multitasking.
        - If the interrupt came from the timer and the current thread has been executing for a long time, the handler may switch to another thread

# Scheduling
- The **scheduler** chooses which thread to run at any given time
- Repeatedly checking whether a desired event has occurred is called **busy waiting**
    - It is a bad idea because it wastes processor time
- The operating system keeps track of which threads are waiting and which can usefully run
    - The system does this by storing runnable threads in a **run queue** and the waiting threads in **waiting queues**, one per reason for waiting
        - The threads that are waiting for a desired amount of time to elapse are stored in time order
    - The scheduler only considers threads in the run queue
    - One of the services of the interrupt handler is to determine that a waiting thread has become runnable
- A thread can be in one of the following states 
    - Runnable, awaiting dispatch by the scheduler
    - Running on a processor
    - Waiting for some event    
![scheduling_states](img/scheduling_states.png)    

## Scheduling goals 
- **Throughput** is the rate at which useful work is accomplished.
    - An example measure of throughput is the number of search transactions per second
    - Only by using all the I/O devices efficiently can a scheduler maximize throughput.
    - Switching between threads more often than necessary will reduce throughput
    - A thread runs best on the processor it last ran on, because its data may still be in that processor's cache; this is called **processor affinity**, and scheduling a thread on the same core therefore helps maximize throughput
    - If a processor needs data from another processor's cache, it uses the system's **cache coherence protocol**
        - This typically means first transferring the data from the old cache to main memory and then transferring it from main memory to the new cache
        
        
- **Response Time** is the elapsed time from a triggering event to a completed response
    - A system with high throughput might have poor response time and vice versa
    - Systems intended for direct interaction with a user tend to be optimized for response time, whereas servers are optimized for throughput
    - It often involves trade-offs between responding to different interactions
        - One approach is to respond first to the interaction that takes the smallest amount of time to complete
            - Called **Shortest Job First** (SJF)
            - Normally an operating system does not know how much processor time each thread needs in order to respond, so it often guesses based on previous threads or on previous bursts
            - **Burst** is the amount of processing done between waits for external events.
        - Another approach is to switch frequently between threads

- A task is **urgent** if it needs to be done soon
- **Importance** indicates how much is at stake in accomplishing a task in a timely fashion.
- **Resource Allocation** is how the resources are allocated between tasks
    - It is a matter of fairness
    - **Proportional-share scheduling** balances the processing time given to threads over a much shorter time scale, such as a second.
        - The idea is to focus on the runnable threads and how much time the user has allocated to them
    - The niceness of a thread is akin to low priority.
        - On Linux, niceness is interpreted in terms of the amount of processor time a thread receives
        - On OS X, niceness is interpreted so that a thread with high niceness only gets processor time when the processor is otherwise idle
![scheduling_goals](img/scheduling_goals.png)

## Scheduling Mechanisms 
![scheduling_mechanisms_goals](img/scheduling_mechanisms_goals.png)
![scheduling_mechanisms](img/scheduling_mechanisms.png)

### Fixed-Priority Scheduling
- Threads with higher priority run before the ones with lower priority
- A thread cannot change priority
- In a fixed-priority scheduler the run queue can be kept in a data structure ordered by priority
    - It is typically represented as an array where the first entry contains a list of threads with the highest priority, the second entry contains a list of threads with the next highest priority, and so forth.
    - Whenever a processor becomes idle because a thread has terminated or entered a waiting state, the scheduler dispatches a runnable thread of highest available priority.
    - The scheduler also compares priorities when a thread becomes runnable
        - If the new thread has higher priority than the running thread, the scheduler performs a thread switch
    - To deal with ties there are two possible solutions
        - Run the thread that became runnable first until it waits for some event or voluntarily yields the processor (first in, first out (FIFO))
        - Share the processor between the threads that are tied in a **round-robin** fashion. (RR)
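
The array-of-lists run queue described above could be sketched as follows. The struct layouts, the number of priority levels and the names `rq_enqueue` and `rq_dispatch` are hypothetical; a real scheduler keeps far more state per thread.

```c
#include <assert.h>
#include <stddef.h>

#define N_PRIORITIES 4   /* assumed number of priority levels; 0 is highest */

struct thread {
    int id;
    int priority;
    struct thread *next;     /* link within one priority level's list */
};

struct run_queue {
    struct thread *head[N_PRIORITIES];
    struct thread *tail[N_PRIORITIES];
};

/* Append a runnable thread to the tail of its priority level's list. */
void rq_enqueue(struct run_queue *rq, struct thread *t)
{
    assert(t->priority >= 0 && t->priority < N_PRIORITIES);
    t->next = NULL;
    if (rq->tail[t->priority])
        rq->tail[t->priority]->next = t;
    else
        rq->head[t->priority] = t;
    rq->tail[t->priority] = t;
}

/* Remove and return the first thread of the highest non-empty priority
   level, or NULL if no thread is runnable. */
struct thread *rq_dispatch(struct run_queue *rq)
{
    for (int p = 0; p < N_PRIORITIES; p++) {
        struct thread *t = rq->head[p];
        if (t) {
            rq->head[p] = t->next;
            if (!rq->head[p])
                rq->tail[p] = NULL;
            return t;
        }
    }
    return NULL;
}
```

Because each level is a FIFO list, ties are broken in arrival order. Calling `rq_enqueue` again on a thread whose time slice has ended would put it behind its equals, which gives round-robin behavior within a level.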
____        
        
- Fixed-priority scheduling is not viable in a general-purpose system
- It is more suited to an environment where all the threads are part of a carefully quality-controlled system design.
- Two key theorems make it easy to analyze a periodic hard-real-time system under fixed-priority scheduling:
    - If the threads meet their deadlines under any fixed priority assignment, then they will do so under an assignment that prioritizes threads with shorter periods over those with longer periods. This policy is known as **rate-monotonic scheduling.**
    - To check that the deadlines are met, it suffices to check the worst-case scenario, where all the threads start at the same time
___    
    
- To test the feasibility of a real-time schedule, it is conventional to use a **Gantt chart**.
    - It is a bar, representing the passage of time, divided into regions labeled to show which thread is running during the corresponding time interval.
    - It can be used to check whether a rate-monotonic fixed-priority schedule will work for a given set of threads
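
Drawing the Gantt chart by hand amounts to simulating the schedule one time unit at a time. The sketch below does this for rate-monotonic priorities, starting from the worst case where all threads are released at time 0. It assumes integer time units and that each thread's deadline is the start of its next period; the function name `rm_feasible` and the `MAX_THREADS` bound are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8   /* assumed upper bound for this sketch */

/* Threads must be listed in order of increasing period, so that index 0
   has the highest rate-monotonic priority. Thread i needs cost[i] time
   units of processing every period[i] time units. */
bool rm_feasible(const int *period, const int *cost, int n)
{
    int remaining[MAX_THREADS];
    int horizon = 0;

    assert(n > 0 && n <= MAX_THREADS);
    for (int i = 0; i < n; i++) {
        assert(period[i] > 0 && cost[i] > 0);
        assert(i == 0 || period[i - 1] <= period[i]);
        remaining[i] = 0;
        if (period[i] > horizon)
            horizon = period[i];
    }

    /* Simulate until the first deadline of the longest-period thread. */
    for (int t = 0; t <= horizon; t++) {
        /* Start of a period: the previous job must be done, then a new one is released. */
        for (int i = 0; i < n; i++) {
            if (t % period[i] == 0) {
                if (remaining[i] > 0)
                    return false;            /* missed deadline */
                remaining[i] = cost[i];
            }
        }
        /* Run the highest-priority thread that has work left for one time unit. */
        for (int i = 0; i < n; i++) {
            if (remaining[i] > 0) {
                remaining[i]--;
                break;
            }
        }
    }
    return true;
}
```

For two threads with periods 4 and 6, processing needs of 1 and 2 time units fit, whereas needs of 2 and 3 do not: the second thread still has one unit left when its deadline arrives at time 6.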

### Dynamic Priority Scheduling
- In **Earliest Deadline First Scheduling**, each time a thread becomes runnable the priorities are reassigned according to the following rule: the sooner a thread's deadline, the higher its priority
- **Decay Scheduling** adjusts each thread's priority downward from the base priority by an amount that reflects recent processor usage by that thread.
    - The user-specified priorities can serve as base priorities, which the operating system will use as a starting point for its automatic adjustments.
        - Most of the time the users will use the default thread priorities for all their threads, and threads only differ in priority because of the automatic adjustments
        - The threads that are tied after automatic adjustment are processed in a round-robin fashion
    - The time that each thread is allowed to run before switching is called a **quantum**
        - The priority will not sink below some minimum value
        - If the thread has been running for a while, it has a low priority
        - If the thread has not run for a long time, its priority will be equal to the base priority
    - The thread's recent processor usage increases when the thread runs and **decays** when the thread waits
        - The currently running thread has its usage updated whenever it voluntarily yields the processor
![decay_osx](img/decay_osx.png)        

### Proportional-Share Scheduling
- Used when resource allocation is the primary user goal
- Researchers have proposed three basic mechanisms for controlling the rate at which threads are granted processor time:
    1. Each thread can be granted the use of the processor equally often
        - Just as in simple round robin
        - Threads with larger allocations are granted a larger time slice
        - Known as **Weighted Round Robin Scheduling** (WRR)
    2. A uniform time slice can be used for all threads.
        - Those with larger allocations are run more often
        - Those with smaller allocations sit out some rotations through the list
        - Several names are used
            - Weighted Fair Queuing (WFQ),
            - Stride Scheduling
            - Virtual Time Round-Robin Scheduling (VTRR).
    3. A uniform time slice can be used for all threads.
        - Those with larger allocations are chosen to be run more often
        - The threads are selected by a lottery with weighted odds
        - This is not terribly practical
        - Known as **lottery scheduling**
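
The second mechanism can be illustrated with a small stride-scheduling sketch: every thread gets the same time slice, and a per-thread "pass" value decides whose turn it is. The constant `STRIDE_ONE`, the struct fields and the function names are assumptions for the example, and the tie-breaking rule (lowest index first) is an arbitrary choice.

```c
#include <assert.h>

#define STRIDE_ONE 12000L   /* assumed large constant divided by each allocation */

struct share_thread {
    int  tickets;    /* the thread's allocation (larger means more processor time) */
    long stride;     /* STRIDE_ONE / tickets */
    long pass;       /* virtual time at which the thread should next run */
    int  runs;       /* number of time slices received so far */
};

void stride_init(struct share_thread *t, int tickets)
{
    assert(tickets > 0);
    t->tickets = tickets;
    t->stride  = STRIDE_ONE / tickets;
    t->pass    = 0;
    t->runs    = 0;
}

/* Give one uniform time slice to the thread with the smallest pass value
   (ties go to the lowest index) and return that thread's index. */
int stride_pick(struct share_thread *threads, int n)
{
    int best = 0;
    assert(n > 0);
    for (int i = 1; i < n; i++)
        if (threads[i].pass < threads[best].pass)
            best = i;
    threads[best].pass += threads[best].stride;
    threads[best].runs++;
    return best;
}
```

With allocations of 3 and 1, eight calls to `stride_pick` give the first thread six time slices and the second thread two, so the 3:1 ratio is already reached over a short run.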
        
___        
- The Linux system uses the **Completely Fair Scheduler** (CFS), which always runs the thread that is furthest behind on virtual runtime
    - A thread's virtual runtime is its actual runtime multiplied by a scale factor that depends on the thread's niceness
    - If a thread has been non-runnable for a certain amount of time its virtual runtime is set forward so as to be only slightly less than the minimum virtual runtime of any of the previously runnable threads.
    - The run queue is kept sorted in order of the runnable threads' virtual runtimes.
        - It is represented as a red-black search tree
    - The scheduler switches threads if one of the following two things happens
        - A thread's time slice has expired
        - A new thread enters the run queue

## Security 
- The kind of attack most relevant to scheduling is the denial of service (DoS) attack
    - It is an attack with the goal of preventing legitimate users of a system from being able to use it
    - e.g. giving an urgent thread a low priority
    - On systems with e.g. decay scheduling, an attacker must run many concurrent threads in order to drain off a significant fraction of the processor's time.
    - A limit on the number of threads per user will constrain denial of service attacks without causing most users much hardship.

# Processes and Protection
- In most mainstream systems, the definition of a **process** includes the following:
    - **One or more threads**
        - A process is often closely associated with a single thread
        - Some programs are designed to divide work between multiple threads
    - **Virtual memory accessible to those threads**
        - The mainstream protection approach is for each process to have its own virtual memory address space, shared by the threads within that process
        - The access rights are assigned to the process, not to the individual threads.
    - **Other access rights**
        - The process can either hold a specific **capability** (e.g. writing to a file) or a **credential** such as the identification of the user for whom the process is running
    - **Resource allocation context**
        - Limited resources are associated with a process for two reasons
            - The process's termination may serve to implicitly release some of the resources it is holding
            - The process may be associated with a limited resource quota
    - **Miscellaneous context**
        - The operating system tracks a single current working directory per process

## POSIX Process Management API
- All operating systems provide mechanisms for creating new processes, terminating existing processes, and performing related actions.
- In the POSIX approach each process is identified by a **process ID number**, which is a positive integer
- Each process comes into existence through the forking of a parent process.
    - The exception is the first process, which is started when the operating system starts running
    - A process forks off a new process whenever one of the threads running in the parent process calls the `fork` procedure
    - In the parent process the call to `fork` returns the process ID number of the child process
        - This may be important to the parent if it wants to exert some control over the child later or find out when the child terminates
    - The child process is in many regards a copy of the parent process
        - For protection purposes it has the same credentials as the parent
        - The child has the same capabilities, such as access to files that have been opened for reading or writing
        - The child contains a copy of the parent's address space
            - The operating system does not need to copy each page of the address space up front; a page is copied only when it is written to (copy on write, COW)
    - The child starts by returning from the call to the `fork` procedure
        - `fork` returns a value of 0 in the child.
        - The normal programming pattern is for any fork operation to be immediately followed by an if statement that checks the return value from `fork`.
            - That way the code can behave differently in the child and in the parent
        - Failure is signaled by a negative value
        - Used in Linux to start new programs


- In order to wait for a child process, the parent can use the `waitpid` procedure, which takes three arguments
    - The first argument is the process ID of the child for which the parent should wait
    - The other two arguments can be 0 if the parent should simply wait for the child process to finish
    - If the child process has already exited, `waitpid` returns without waiting
    - The operating system retains information about the terminated process until the parent waits for it.
    - A terminated process that has not yet been waited for is known as a **zombie**.
        - Waiting for a zombie is known as reaping the zombie.
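
The usual fork-then-check pattern, followed by reaping the child with `waitpid`, might look like the sketch below. The function name `fork_and_reap` is hypothetical, and unlike the simplest usage described above the sketch passes a status pointer as the second argument so that the child's exit status can be inspected.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with the given code, wait for it, and return the
   exit status observed by the parent, or -1 on failure. */
int fork_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed: no child was created */

    if (pid == 0)
        _exit(code);               /* child: fork returned 0 */

    /* parent: fork returned the child's process ID */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `fork_and_reap(7)` returns 7 in the parent. Until `waitpid` is called, a child that has already exited remains a zombie.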
        

- A program file can have a special set user ID (`setuid`) bit set on it
    - The process that executes the file acquires the credential of the file's owner.
    - The setuid mechanism provides an extremely general mechanism for granting access rights.

### Exec family
- The POSIX standard includes six different procedures, any one of which can be used to load in a new program and start it running.
    - They are commonly called the **exec family** because they have names starting with exec
    - Each member must be given enough information to find the new program stored in a file and to provide the program with any arguments and environment variables it needs.
    - The family members differ in exactly how the calling program provides this information.
    - Because the family members are so closely related, most systems define only the `execve` procedure in the kernel of the operating system itself
    - The procedures only return if an error occurs, because if all is well the new program starts running without any possibility of returning to the old program


- `execl` is one of the simpler members of the exec family
    - The first argument specifies where the program is located, e.g. `/bin/ps`
    - The remaining strings are the command line arguments, e.g. `ps` and `axl`
    - An inconvenience of `execl` is that the location of the program file is needed


- `execlp` does not need to know the location of the program file
    - It can be given a filename and will search through the directories to find the program
    - If used in combination with `fork`, a program can execute a new command
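
Combining `fork`, `execlp` and `waitpid` gives the usual way of running another program. The sketch below runs a shell command in a child process; the function name `run_shell_command` is hypothetical, and it assumes that a program called `sh` can be found through the search path.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `sh -c command` in a child process and return its exit status,
   or -1 if the child could not be created or did not exit normally. */
int run_shell_command(const char *command)
{
    assert(command != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* child: replace this program with the shell, found via the search path */
        execlp("sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                /* only reached if execlp failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The first `"sh"` is the filename to search for and the second is the program's own name as its first command line argument; the argument list is terminated by a null pointer. For example, `run_shell_command("exit 3")` returns 3.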

# Synchronization
- In a **race**, two threads use the same data structure without any mechanism to ensure that only one thread uses the data structure at a time.
    - Can be avoided using **locks**
- **Mutual exclusion** is when a thread temporarily excludes other threads from operating on a data structure while it is using it

## Mutexes and Monitors
- A programmer can arrange for exclusive access to a data structure by using a lock object associated with it
    - Only one thread can lock it at a time
    - When a thread has locked a lock, it **holds** the lock
- To support race prevention, operating systems and middleware generally provide **mutual exclusion locks**
    - Often called **mutex** for short

### The Mutex API 
- A mutex can be in two states: locked or unlocked
    - It needs two operations, one for locking and one for unlocking
    - When a thread uses the lock operation on a locked mutex, it waits for the mutex to be unlocked
    - If more than one thread is waiting for a mutex to be unlocked, only one of them can lock it when it is unlocked; the others keep waiting
    - There may also be operations for checking whether a mutex is locked and for removing it from memory


- In the POSIX API `my_mutex` can be declared to be a mutex and initialized with the default attributes as follows:
```c
pthread_mutex_t my_mutex;
pthread_mutex_init(&my_mutex, NULL); // NULL selects the default attributes
```
- A thread that wants to lock a mutex, operate on the associated data structure and then unlock the mutex would do the following:
```c
pthread_mutex_lock(&my_mutex);
// operate on the protected data structure
pthread_mutex_unlock(&my_mutex);
```
- A mutex can be destroyed with the following procedure call:
```c
pthread_mutex_destroy(&my_mutex);
```
- POSIX has a couple of variants of `pthread_mutex_lock` which can be useful under particular circumstances
    - `pthread_mutex_trylock` never waits to acquire a mutex but returns an error code immediately if it is unable to acquire the lock
    - `pthread_mutex_timedlock` allows the programmer to specify the maximum waiting time
        - If the mutex cannot be acquired within that time, the procedure returns an error code
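
The lock, operate, unlock pattern can be seen at work when two threads update the same counter. This is a minimal sketch; the names `counter_mutex`, `add_many` and `count_with_two_threads` are made up, and the mutex is initialized statically with `PTHREAD_MUTEX_INITIALIZER` instead of `pthread_mutex_init`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter;            /* the data structure protected by counter_mutex */

static void *add_many(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_mutex);
        counter++;              /* only one thread at a time executes this */
        pthread_mutex_unlock(&counter_mutex);
    }
    return NULL;
}

/* Start two threads that each add n to the counter; return the final value,
   or -1 if a thread could not be created. */
long count_with_two_threads(long n)
{
    pthread_t a, b;
    assert(n >= 0);
    counter = 0;
    if (pthread_create(&a, NULL, add_many, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, add_many, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

With the mutex in place, `count_with_two_threads(100000)` returns 200000. Without the lock and unlock calls the two increments could race and some updates could be lost.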


- Types of mutexes in the POSIX standard
    - `PTHREAD_MUTEX_DEFAULT`
        - If a thread tries to lock a mutex that it already holds, or unlocks one that it does not hold, all bets are off as to what will happen.
        - The programmer is responsible for making sure this never happens
        - Different POSIX systems may behave differently
    - `PTHREAD_MUTEX_ERRORCHECK`
        - If a thread tries to lock a mutex that it already holds, or unlock a mutex that it doesn't hold, the operation returns an error code.
    - `PTHREAD_MUTEX_NORMAL`
        - If a thread tries to lock a mutex that it already holds, it goes into a deadlock situation, waiting for itself to unlock the mutex
        - If it tries to unlock a mutex it does not hold, all bets are off and each POSIX-compliant system is free to respond however it likes.
    - `PTHREAD_MUTEX_RECURSIVE`
        - If a thread tries to unlock a mutex it does not hold, the operation returns an error code
        - When a thread tries to lock a mutex it already holds, it simply increments a counter and is allowed to proceed; when it unlocks, it decrements the counter

### Monitors
- In object-oriented programming the mutex can be used in a very rigidly structured way:
    - All state variables within an object should be kept private and only accessible by the code within that object
    - Every object should contain a mutex as an additional field
    - Every method should start by locking that object's mutex and end by unlocking it before returning


- When the mutex rules are applied, it is impossible for two threads to race on an object's state
    - A programming language can apply the mutex rules automatically, or the programmer can apply them manually
    - An object that automatically follows the mutex rules is called a **monitor**.
        - e.g. in Pascal using the keyword `monitor`, or in Java using the keyword `synchronized` at the beginning of every non-private method

### Underlying Mechanisms for Mutexes
- All modern processor architectures have at least one instruction that can be used both to change the contents of a memory location and to get information about the previous contents of the location
    - These instructions are executed atomically
    - One of them is the `exchange` operation, which atomically swaps the contents of a register with the contents of a memory location
  
  
- **The basic spinlock**
    - It can be represented as a memory location that contains 1 if the mutex is unlocked and 0 if the mutex is locked
    - The unlock operation can be trivial: to unlock a mutex, just store 1 into it
    - The lock operation uses the atomic exchange operation: it repeatedly swaps a 0 into the mutex until the value it gets back is 1
        - Pseudocode
```
        to lock mutex:
            let temp = 0
            repeat
                atomically exchange temp and mutex
            until temp = 1
```
    - Due to the cache coherence protocol, the basic spinlock is very inefficient when two threads are waiting at the same time
        - To avoid this, waiting threads can use ordinary reads instead, and only try to grab the mutex with an exchange once they see it become unlocked; this is referred to as the **cache-conscious spinlock**
    - A mutex that uses busy waiting is called a **spinlock**
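
A cache-conscious spinlock can be sketched in C with the C11 `<stdatomic.h>` operations, using `atomic_exchange` as the exchange instruction. The names `spin_lock`, `spin_unlock` and `spin_demo` are made up for the example, and a production lock would also consider memory ordering and back-off, which the sketch ignores.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Following the convention above: 1 means unlocked, 0 means locked. */
static atomic_int spin = 1;
static long shared_total;

static void spin_lock(atomic_int *m)
{
    for (;;) {
        /* Cache-conscious part: wait with plain reads until it looks free. */
        while (atomic_load(m) == 0)
            ;
        /* Then try to grab it: swap in 0 and see whether we got the 1. */
        if (atomic_exchange(m, 0) == 1)
            return;
    }
}

static void spin_unlock(atomic_int *m)
{
    atomic_store(m, 1);
}

static void *spin_worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        spin_lock(&spin);
        shared_total++;
        spin_unlock(&spin);
    }
    return NULL;
}

/* Two threads each add n to shared_total under the spinlock. */
long spin_demo(long n)
{
    pthread_t a, b;
    assert(n >= 0);
    shared_total = 0;
    if (pthread_create(&a, NULL, spin_worker, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, spin_worker, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_total;
}
```

Removing the inner `while` loop turns this back into the basic spinlock from the pseudocode, where every waiting iteration performs an exchange.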


- **The queuing lock**
    - A thread that cannot acquire the lock notifies the operating system that it needs to wait, instead of busy waiting
    - Notifying the operating system that the thread needs to wait involves some overhead
        - Therefore the relative efficiency of spinlocks and queuing locks depends on how long the thread has to wait for the lock
    - Used for cases where a thread might hold a mutex for a long time
    - Has three components
        - A memory location used to record the mutex's state, 1 for unlocked or 0 for locked
        - A list of threads waiting to acquire the mutex
            - This list allows the scheduler to place the threads in a waiting state
        - A cache-conscious spinlock, used to protect against races in operations on the mutex itself
    - When threads are waiting, the locked mutex is passed from one thread to another without being unlocked

## Other Synchronization Patterns

### Bounded Buffers
- Often two threads are linked together in a processing **pipeline**
    - One thread (the **producer**) produces output that is used by the other thread (the **consumer**)
    - This can be done using an intermediate storage area called a **buffer**, where the producer places results and from which the consumer retrieves them
    - If the consumer tries to retrieve a value from an empty buffer, it needs to wait for the producer to catch up
    - If the buffer has a limited size (**bounded buffer**), the producer needs to wait if it gets too far ahead
    - Found in the piping feature in UNIX: `ls | tr a-z A-Z`

### Reader/Writers Locks 
- A **readers/writers lock** is much like a mutex, except that when a thread locks the lock, it specifies whether it is planning to do any writing to the protected data structure or only reading
    - The lock operation, like that of a mutex, waits until the lock can be acquired
    - Any number of readers can hold the lock at the same time, whereas a writer holds it exclusively
    - It has a higher overhead than a mutex, since a mutex is simpler
    - To avoid starvation of waiting writers, some versions of readers/writers locks make new readers wait until after the waiting writers
    
    
- The POSIX standard includes readers/writers locks
    - Used with procedures such as `pthread_rwlock_init`, `pthread_rwlock_rdlock`, `pthread_rwlock_wrlock`, and `pthread_rwlock_unlock`
    - The POSIX standard leaves it up to each individual system how to prioritize new readers versus waiting writers
    - The POSIX standard also includes a more specialized form of readers/writers locks specifically associated with files
        - In the POSIX standard, file locks are available through the complex `fcntl` procedure
        - Most UNIX-family OSs also provide a simpler interface, `flock`

### Barriers 
- **Barriers** are most commonly used in programs that do large-scale numerical calculations for scientific or engineering applications
    - They may also be used in other applications, as long as there is a requirement for all threads in a group to finish one phase of the computation before any of them moves on to the next phase
- When a barrier is created, the programmer specifies how many threads will be sharing it
    - Each of the threads completes the first phase of the computation and then invokes the barrier's wait operation
    - For most threads the wait operation does not return immediately; only when the last thread invokes the wait operation does it return immediately, and the wait operations of all the other threads return as well
    - When all the threads are done with the first phase they proceed to the second phase, and the barrier is used again for this and the remaining phases
    - Barriers are provided as part of POSIX and other widely available APIs

## Condition Variables
- **Condition variables** can be used to help a thread wait until circumstances are appropriate for it to proceed
    - They work in partnership with monitors, or with mutexes used in the style of monitors
    - There are two basic operations on a condition variable: `wait` and `notify`
    - A thread that finds circumstances that are not to its liking invokes the wait operation and goes to sleep until another thread invokes the notify operation
    - In Java each object has a single condition variable automatically associated with it
        - An object's `wait` method waits on the object's condition variable
        - The `notifyAll` method wakes up all the waiting threads
    - When calling `wait`, the thread releases the lock so that other threads can use it
    - The waiting needs to be done inside a while loop that rechecks the condition after each wakeup, to ensure that the condition really holds before the thread proceeds
    
    
- The POSIX API allows multiple condition variables per mutex
    - They are initialized with `pthread_cond_init` independent of any particular mutex
    - The mutex is passed as an argument to `pthread_cond_wait` together with the condition variable being waited on
    - The operations corresponding to `notify` and `notifyAll` are called `pthread_cond_signal` and `pthread_cond_broadcast`; the API allows them to be called without holding the corresponding mutex
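A minimal sketch of the wait-in-a-while-loop idiom, using Python's `threading.Condition` (a mutex bundled with one condition variable). The one-slot `Mailbox` class is a hypothetical example chosen for illustration; it is not taken from the notes.

```python
import threading

class Mailbox:
    """One-slot mailbox guarded by a mutex and one condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._item = None

    def put(self, item):
        with self._cond:
            while self._full:            # recheck the condition after every wakeup
                self._cond.wait()        # releases the mutex while sleeping
            self._item, self._full = item, True
            self._cond.notify_all()      # wake any thread waiting in take()

    def take(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._cond.notify_all()      # wake any thread waiting in put()
            return item

box = Mailbox()
received = []
consumer = threading.Thread(target=lambda: received.extend(box.take() for _ in range(5)))
consumer.start()
for i in range(5):
    box.put(i)
consumer.join()
```

Using `notify_all` with a rechecking loop is the conservative choice here: every woken thread verifies for itself that the slot is in the state it needs.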

## Semaphores
- **Semaphores** are another synchronization mechanism with the same generality as monitors with condition variables
    - For most applications they are less natural, resulting in more error-prone code 
    - In the applications where they are more natural, e.g. bounded buffers, they result in very succinct, clear code 
    - Historically, they were in use before monitors
    - Available in Java as the class `Semaphore` from the package `java.util.concurrent`
    

- A semaphore is essentially an unsigned integer variable on which only three operations are allowed:
    - At the time the semaphore is created, it may be initialized to any nonnegative integer of the programmer's choice
    - It may be increased by 1
        - The operation is called either `release`, `up` or `V`
    - It may be decreased by 1
        - The operation is called either `acquire`, `down` or `P`
        - The thread performing an `acquire` operation waits if the value is 0. 
        - Only once another thread has performed a `release` operation to make the value positive does the waiting thread continue with its acquire operation 


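- Semaphores can be used as mutexes
    - The semaphore should be initialized to one
    - `acquire` is used as the locking operation and `release` as the unlocking operation
    - Can result in nasty behavior if unlocked twice: the value becomes 2, so two threads can then hold the "mutex" at the same time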


- Semaphores can be used for keeping track of the available quantity of some resource, such as free spaces or data values in a bounded buffer 
    - Whenever a thread creates a unit of the resource it does a `release` on the semaphore to increase it
    - Whenever a thread wants to consume a unit of resource it first does an `acquire` on the semaphore
    - It can be used e.g. on a `BoundedBuffer`
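A bounded buffer built from semaphores might look as follows. This is a sketch under assumptions: the method names `insert` and `retrieve` and the capacity of 3 are made up for illustration. One semaphore counts free spaces, one counts stored values, and a third, initialized to one, acts as a mutex.

```python
import threading
from collections import deque

class BoundedBuffer:
    def __init__(self, capacity):
        self._items = deque()
        self._mutex = threading.Semaphore(1)         # semaphore used as a mutex
        self._free = threading.Semaphore(capacity)   # counts free spaces
        self._filled = threading.Semaphore(0)        # counts stored values

    def insert(self, item):
        self._free.acquire()        # consume one free space, waiting if none is left
        self._mutex.acquire()
        self._items.append(item)
        self._mutex.release()
        self._filled.release()      # one more value is available

    def retrieve(self):
        self._filled.acquire()      # consume one stored value, waiting if none exists
        self._mutex.acquire()
        item = self._items.popleft()
        self._mutex.release()
        self._free.release()        # one more space is free
        return item

buffer = BoundedBuffer(3)
producer = threading.Thread(target=lambda: [buffer.insert(i) for i in range(10)])
producer.start()
result = [buffer.retrieve() for _ in range(10)]
producer.join()
```

The producer blocks whenever three values are pending, and the consumer blocks whenever the buffer is empty, without either side testing a condition explicitly.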
    

## Deadlocks
- A **deadlock** exists whenever there is a cycle of threads, each waiting for some resource held by the next, under the following defining conditions
    1. Threads hold resources exclusively ("mutual exclusion") 
    2. Threads hold some resources while waiting for additional ones ("wait for")
    3. Resources cannot be removed from threads forcibly ("No preemption")
    4. Threads wait in a circular chain such that each thread holds resources that are requested by the next thread in the chain
- Deadlocks are quite rare even if no measures are taken to avoid them

### Resource Ordering (Deadlock prevention)
- The ideal way to cope with deadlocks is preventing them from happening
    - **Deadlock prevention** aims to ensure that at least one of the four defining conditions is not satisfied
- One way of deadlock prevention targets the circular-wait situation characterized by condition 4, by imposing a linear order in which resources need to be locked 
    - For example, mutexes can be ordered by memory location 
    - Can only be done if the locks needed are known in advance
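A minimal sketch of resource ordering. Where the notes order mutexes by memory location, this example orders by a hypothetical account number instead; the `Account` class and `transfer` function are assumptions made for illustration, and the two accounts are assumed to be distinct.

```python
import threading

class Account:
    def __init__(self, number, balance):
        self.number = number
        self.balance = balance
        self.lock = threading.Lock()

def transfer(source, target, amount):
    # Always lock the lower-numbered account first, whatever the transfer direction,
    # so two opposite transfers can never wait for each other in a cycle.
    first, second = sorted((source, target), key=lambda acct: acct.number)
    with first.lock:
        with second.lock:
            source.balance -= amount
            target.balance += amount

a, b = Account(1, 100), Account(2, 100)
t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(1000)])
t2 = threading.Thread(target=lambda: [transfer(b, a, 2) for _ in range(1000)])
t1.start()
t2.start()
t1.join()
t2.join()
```

Without the `sorted` step, `t1` could hold `a.lock` while `t2` holds `b.lock`, each waiting for the other.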

### Ex Post Facto (Deadlock detection)
- To detect a deadlock, information about who is waiting for whom is needed 
    - Can be done by having each mutex record not just whether it is locked or unlocked, but also which thread holds it, if any, and which threads are waiting for it


- With information about who is waiting for whom a **resource allocation graph** can be constructed
    - Threads are represented as squares.
    - Mutexes are represented as circles
    - The arrows show which mutex each thread is waiting to acquire and which thread each mutex is currently held by
    - If a graph has a cycle it means that the system is deadlocked
    
    
- A system can test for deadlocks periodically or when a thread has waited an unreasonably long time for a lock
    - To test for a deadlock the system uses a standard graph algorithm to check whether the resource allocation graph contains a cycle 
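    - Since no node's out-degree can be greater than one (a thread waits for at most one mutex and a mutex is held by at most one thread), a simple search that follows the chain of edges can be used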


- When a deadlock is detected, one of the deadlocked threads must be forcibly terminated or rolled back to an earlier state to free up mutexes 
    - Ex Post Facto is not commonly used in general purpose operating system because of this, but can be useful in a database system
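The periodic cycle test can be sketched as below. The resource allocation graph is assumed to be stored as two dictionaries, `waiting_for` (thread to mutex) and `held_by` (mutex to thread); these names and the function `find_deadlock` are illustrative, not from the notes.

```python
def find_deadlock(waiting_for, held_by):
    """Return the threads on a cycle of the resource allocation graph, or None."""
    for start in waiting_for:
        seen = []
        thread = start
        # Follow thread -> mutex it waits for -> thread holding that mutex -> ...
        while thread in waiting_for and thread not in seen:
            seen.append(thread)
            mutex = waiting_for[thread]
            thread = held_by.get(mutex)   # None if the mutex is unlocked
        if thread == start:               # the chain led back to where it began
            return seen
    return None

deadlocked = find_deadlock({"T1": "M2", "T2": "M1"}, {"M1": "T1", "M2": "T2"})
healthy = find_deadlock({"T1": "M2"}, {"M1": "T1", "M2": "T2"})
```

In the first call `T1` waits for a mutex held by `T2` and vice versa, so both threads are reported; in the second call `T2` is not waiting for anything, so the chain simply ends.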

### Immediate Deadlock Detection
- **Immediate deadlock detection** is intervening at the very moment when the system would otherwise deadlock
    - As long as the deadlock is not allowed to occur the resource allocation graph will remain acyclic


- Each time a thread tries to lock a mutex, the system can act as follows
    - If the mutex is unlocked, lock it and add an edge from the mutex to the thread, to indicate that the thread holds the lock
    - If the mutex is locked, follow the chain of edges from it until the chain ends, and check whether the end of the chain is the same as the thread trying to lock the mutex
        - If not, add an edge showing that the thread is waiting for the mutex and put the thread into a waiting state
        - If the end of the chain is the same thread, adding the extra edge would complete a cycle 
            - Do not add the edge and do not put the thread into a waiting state
            - Return an error code from the lock request or throw an exception to indicate that the mutex could not be locked because a deadlock would have resulted


- When a possible deadlock is detected, the program can release all the locks it currently holds and restart 
    - The chance of running into the same conflict again can be reduced by sleeping briefly before retrying
    
- Used in `fcntl` in Linux and Mac OS for file locks
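The lock-time check described in this section can be sketched as a small bookkeeping class. It only models the graph and performs no real blocking; `LockTable`, `DeadlockError` and the return convention are assumptions for illustration.

```python
class DeadlockError(Exception):
    pass

class LockTable:
    def __init__(self):
        self.held_by = {}       # edge mutex -> thread holding it
        self.waiting_for = {}   # edge thread -> mutex it waits for

    def lock(self, thread, mutex):
        """Return True if the lock is granted, False if the thread must wait."""
        if mutex not in self.held_by:
            self.held_by[mutex] = thread          # unlocked: grant it
            return True
        current = mutex
        while current in self.held_by:            # follow the chain of edges
            holder = self.held_by[current]
            if holder == thread:                  # the new edge would close a cycle
                raise DeadlockError(f"{thread} locking {mutex} would deadlock")
            current = self.waiting_for.get(holder)
        self.waiting_for[thread] = mutex          # safe: record the wait
        return False

table = LockTable()
granted = [table.lock("T1", "M1"), table.lock("T2", "M2"), table.lock("T1", "M2")]
try:
    table.lock("T2", "M1")
    refused = False
except DeadlockError:
    refused = True
```

Because the would-be cycle is refused before the edge is added, the graph stays acyclic and the chain-following loop always terminates.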

## The Interaction of Synchronization with Scheduling
- Synchronization and scheduling interact with one another 
    - The scheduler controls which runnable thread runs on each processor, and the synchronization actions performed by the running threads control which threads are runnable

### Priority Inversion 
- If threads of different priority levels share mutexes or other blocking synchronization primitives, some minor violations of priority ordering are inevitable
    - A low-priority thread holding a mutex while a high-priority one waits for it is generally not a big violation, because programmers ensure that threads do not hold mutexes for very long


- **Priority inversion** occurs when a low-priority thread holds a mutex that a high-priority thread needs, and a medium-priority thread then runs instead of the low-priority one; the high-priority thread is kept from running because the low-priority thread cannot run and release the mutex
    - One workaround is to avoid fixed-priority scheduling and use a decay usage scheduler instead
    - The genuine solution is **priority inheritance**.
        - Any thread that is waiting for a mutex temporarily lends its priority to the thread that holds the mutex
        - A thread that holds mutexes runs with the highest priority among its own priority and those priorities it has been lent by the other threads waiting for the mutexes

### The Convoy Phenomenon 
- The queue of threads waiting for a popular mutex is called a **convoy**
    - Once a convoy has formed, each thread that gets the mutex runs only briefly, for less than a time slice, and then blocks when it tries to acquire the mutex again
    - This creates a long queue in which each thread in turn moves from the front of the queue, through a brief period of execution, and back to the rear of the queue 
    - This causes two problems
        - The context switching rate goes up
            - Instead of one context switch per time slice there is one per attempt to acquire the popular mutex
            - The overhead of all the context switches will affect the throughput
        - The scheduler's policy for choosing which thread to run is subverted 
            - This can be avoided by handling the mutex wait queue in priority order, the same way as the run queue 
    - The convoy can be avoided by making all the waiting threads runnable when the mutex is unlocked and leaving the mutex unlocked, so that the scheduler decides which thread acquires it next
    - The POSIX standard API for mutexes requires that one or the other of the two prioritization-preserving approaches be taken

## Nonblocking Synchronization
- A nonblocking version of an update operation can be built on the **compare-and-set** instruction, which does the following two things atomically (Java example: `AtomicReference` in the `java.util.concurrent.atomic` package):
    1. The instruction determines whether a variable contains a specified value and reports the answer
    2. The instruction sets the variable to a new value, but only if the answer to the preceding question was yes

- The compare-and-set can be retried in a loop until it succeeds
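The retry loop can be sketched as follows. Python has no user-level compare-and-set instruction, so the `AtomicReference` class below is only an emulation that borrows the name of the Java class: its internal lock stands in for the atomicity the hardware would provide. The `add` function is a hypothetical example of an update built on top of it.

```python
import threading

class AtomicReference:
    """Emulation only: a real compare-and-set is a single atomic instruction."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()   # stands in for hardware atomicity

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._guard:
            if self._value is expected:  # step 1: does it still hold the expected value?
                self._value = new        # step 2: only then store the new value
                return True
            return False

def add(ref, delta):
    while True:                          # retry until the compare-and-set succeeds
        old = ref.get()
        if ref.compare_and_set(old, old + delta):
            return

counter = AtomicReference(0)
workers = [threading.Thread(target=lambda: [add(counter, 1) for _ in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

A failed `compare_and_set` means another thread changed the value between the `get` and the update, so the loop simply rereads and tries again; no update is lost.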

# Network overview

## Layers
- LANs, IP and TCP are often called **layers**
    - They constitute the Link layer, the Internetwork layer and the Transport layer respectively
    - Together with the application layer they form the **"four-layer model"** for networks


- The LAN layer is in charge of actually delivering packets, using LAN-supplied addresses 
    - It is often subdivided into the physical layer, dealing with e.g. the radio signals and mechanisms involved, and above it an abstracted logical LAN layer that describes the digital, non-analog operations on packets 
    - This subdivision gives us a five-layer model
![network_layers.png](img/network_layers.png)

## Data Rate, Throughput and Bandwidth
- Any network connection, e.g. at the LAN layer, has a **data rate**
    - The rate at which bits are transmitted
    - The data rate can vary with time in some LANs e.g. Wi-Fi
- **Throughput** refers to the overall effective transmission rate
    - Takes into account things like transmission overhead, protocol inefficiencies and perhaps even competing traffic 
    - Is generally measured at a higher network layer than the data rate
- **Bandwidth** can refer to either data rate or throughput
    - Mostly used for data rate (in the book)
- The term **goodput** is used to refer to the "application layer throughput" when talking about TCP 
    - The amount of usable data being delivered to the receiving application 
- Data rates are generally calculated in kilobits per second (Kbps) or megabits per second (Mbps); the use of lowercase "b" denotes bits
    - In the context of data rates, a kilobit is $10^3$ bits (not $2^{10}$ ) and a megabit is $10^6$ bits

## Packets
- A packet is a modest-sized buffer of data, transmitted as a unit through some shared set of links 
    - Packets need to be prefixed with a **header** containing delivery information
    - In the common case known as **datagram forwarding**, the header contains a destination **address**
    - Headers in networks using **virtual-circuit** forwarding contain an identifier for the connection
    - Almost all networking today is packet-based 

- At the LAN layer, packets can be viewed as the imposition of a buffer (and addressing) structure on top of low-level serial lines
    - Additional layers then impose additional structure
    - Packets are often referred to as **frames** at the LAN layer and as **segments** at the Transport layer
    
- The max packet size supported by a given LAN is an intrinsic attribute of that LAN
    - Ethernet allows a maximum of 1500 bytes of data
- Each layer adds its own header


- In datagram-forwarding networks the appropriate header will contain the address of the destination and perhaps other delivery information
    - Internal nodes of the network called **routers** or **switches** try to ensure that the packet is delivered to the requested destination

## Datagram Forwarding 
- In the datagram-forwarding model of packet delivery, packet headers contain a destination address. 
    - It is up to the intervening switches or routers to look at this address and get the packets to the correct destination
    - It is used both by Ethernet switches and by IP routers
    
    
- Delivery is achieved by providing each switch with a forwarding table of $\langle$destination,next_hop$\rangle$ pairs. 
    - When a packet arrives the switch looks up the destination address in its forwarding table and finds the **next_hop** information: 
        - the immediate-neighbor address to which the packet should be forwarded to bring it closer to the destination
    - The next_hop value in the forwarding table is a single entry; each switch is only responsible for a single step in the packet's path 
    - The destination entries in the forwarding table do not have to correspond exactly with the packet destination address
        - They do for Ethernet datagram forwarding 
        - In IP routing, the table destinations correspond to **prefixes** of IP addresses 
    - The fundamental requirement is that a switch can determine the next hop using its forwarding table and the destination address in the arriving packet
    - Is also called **stateless** forwarding, since the forwarding tables hold no per-connection state
    - IP routers commonly have a **default** entry matching any nonlocal IP address 


- The fundamental alternative is **virtual circuits**
    - Each router maintains state about each connection passing through it

# Ethernet

## 10-Mbps Classic Ethernet
- The Ethernet originally consisted of a long cable, and when a station transmitted, the data went everywhere along the cable 
    - Is known as a **broadcast bus** 
    - All packets were, at the physical layer, broadcast onto a shared medium and could be seen by all other nodes 
    - The **network interface** took care of the details of transmitting, receiving and deciding which packet should be forwarded to the host via CPU interrupt
        - Made it appear logically as peer-to-peer
        
- If two stations transmitted at the same time, both signals would **collide** and fail as a result
    - In order to minimize collision loss, each station would implement the following
        1. Before transmission, wait for the line to become quiet 
        2. While transmitting continually monitor the line for signs that a collision has occurred; if a collision is detected cease transmitting 
        3. If a collision occurs, use backoff-and-retransmit strategy 
    - The collision avoidance properties can be summarized with the **CSMA/CD** acronym: Carrier Sense, Multiple Access, Collision Detect.


- Errors can occur
    - when packets have bits flipped or garbled by electrical noise on the cable
        - Ethernet packets contain a 32-bit CRC error-detecting code to detect bit errors
    - when the packet is misaddressed by the sending host or if they arrive 

### Ethernet Packet Format
- The format of a typical Ethernet packet, which is still used for newer, faster Ethernets:
![ethernet_format](img/ethernet_format.png)
    - The destination and source addresses are 48-bit quantities
    - The type is 16 bits
        - Identifies the higher protocol layer 
    - The data length is a variable up to a maximum of 1500 bytes  
    - The CRC checksum is 32-bits 
        - Added by the Ethernet hardware, never by host software 
    - There is also a preamble, which is a block of 1 bits followed by a 0, in front of the packet for synchronization 
    - Each Ethernet card has its own hardware address used for identification

### Ethernet Multicast
- Another category of Ethernet addresses is **multicast**
    - Used to transmit to a set of stations
    - The lowest-order bit of the first byte of the address indicates whether an address is physical or multicast
    - To receive packets addressed to a given multicast address, the host must inform the network interface that it wishes to do so 
        - Once done any arriving packets addressed to that multicast address are forwarded to the host
    - The set of subscribers to a given multicast address is called a **multicast group** 
    - If several hosts subscribe to the same multicast address they each receive a copy of each multicast packet transmitted
    - If switches are involved they normally forward multicast or broadcast packets on all outbound links 


- All the cases in which a network interface forwards a received packet up to its attached host:
    - if the destination address of the received packet matches the physical address of the interface
    - if the destination address of the received packet is a broadcast address
    - if the interface is in promiscuous mode (receive all packets)
    - if the destination address of the received packet is a multicast address and the host has told the network interface to accept packets sent to that multicast address
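The four cases above can be written as one predicate. This is a sketch with assumed names (`mac`, `is_multicast`, `accepts`), and the addresses in the usage lines are arbitrary example values.

```python
BROADCAST = bytes([0xFF] * 6)

def mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' notation into 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))

def is_multicast(addr):
    return bool(addr[0] & 0x01)       # lowest-order bit of the first byte

def accepts(dest, own_addr, groups=frozenset(), promiscuous=False):
    """Decide whether the interface forwards a received packet up to its host."""
    if promiscuous:
        return True                   # receive all packets
    if dest == own_addr or dest == BROADCAST:
        return True
    # Multicast packets are accepted only for groups the host has subscribed to
    return is_multicast(dest) and dest in groups

own = mac("00:1a:2b:3c:4d:5e")
group = mac("01:00:5e:00:00:fb")
other = mac("00:1a:2b:aa:bb:cc")
```

For instance, `accepts(group, own)` is false until the group is passed in `groups`, while `accepts(other, own, promiscuous=True)` is true regardless of the destination.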



### Ethernet Address Internal Structure
- The second-to-lowest-order bit of the first byte of a physical Ethernet address indicates whether the address is believed to be globally unique or if it is only locally unique 
    - Known as the **Universal/Local** bit 
    - For real Ethernet physical addresses, the multicast and universal/local bits of the first byte should both be 0
    - Global Ethernet IDs are assigned to the physical Ethernet card by the manufacturer 
        - The first three bytes serve to indicate the manufacturer 

### The LAN Layer
- The LAN layer, 
    - at its upper end, supplies to the network layer a mechanism for addressing a packet and sending it from one station to another
    - at its lower end, handles interactions with the physical layer
    - covers packet addressing, delivery and receipt, forwarding, error detection, collision detection and collision-related retransmission attempts
    - is divided into the **media access control**, or MAC, sublayer and a higher **logical link control**, or LLC, sublayer for higher-level flow-control functions that today have moved largely to the transport layer
        - The MAC sublayer is the one most often referred to, since it contains the most frequently used functions 
        - LAN layer addresses are often called MAC addresses 

### The Slot Time and Collisions
- The **diameter** of an Ethernet is the maximum distance between any pair of stations
    - The maximum allowed diameter, measured in bits, is limited to 232, which makes the round-trip time 464 bits
    - If a station involved in a collision discovers it, it transmits a special **jam signal** of up to 48 bits 
    - The time to send 512 bits is the **slot time** of an Ethernet
        - Often described in bit times, but in conventional time units the slot time is 51.2 µsec.
    - If a station has transmitted for one slot time it is said to have **acquired** the network
        - No collision can occur after that, since every other station has had enough time to find out that the first station is transmitting 
    - The Ethernet has a **minimum packet size** equal to the slot time
        - A station transmitting such a packet is assured that if a collision were to occur, the sender would detect it and apply the retransmission algorithm
    - The Ethernet has a **maximum** packet size, of 1500 bytes
        - It is primarily for the sake of fairness 
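As a worked check of the numbers above for 10-Mbps Ethernet (the 64-byte figure is simply the 512-bit slot time expressed in bytes):

$$
2 \cdot 232 + 48 = 464 + 48 = 512 \text{ bits}, \qquad \frac{512 \text{ bits}}{10^{7} \text{ bits/s}} = 51.2\ \mu\text{s}, \qquad \frac{512 \text{ bits}}{8 \text{ bits/byte}} = 64 \text{ bytes}
$$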

### Exponential Backoff Algorithm
- Whenever there is a collision the Exponential Backoff Algorithm operating at the MAC layer is used to determine when each station will retry its transmission  
- The full Ethernet transmission algorithm
    1. Listen before transmitting ("carrier detect")
    2. If line is busy, wait for sender to stop and then wait an additional 9.6 microseconds (96 bits).
        - One consequence of this is that there is always a 96-bit gap between packets, so packets do not run together.
    3. Transmit while simultaneously monitoring for collisions
    4. If a collision does occur, send the jam signal and choose a backoff time as follows
        - For transmission attempt $N$, $1\leq N \leq 10$, choose $k$ randomly with $0 \leq k < 2^N$. Wait $k$ slot times, check if the line is idle, waiting if necessary for someone else to finish, and then retry step 3. For $11 \leq N \leq 15$ choose $k$ randomly with $0 \leq k < 2^{10}$ 
    5. If we reach $N = 16$ give up 


- A problem that can occur when using the Exponential Backoff Algorithm is the **capture effect**
    - Is a potential lack of fairness
    - Happens when one station is lucky and keeps getting the line while the other station is still trying to send its first packet
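The choice of backoff time in step 4 can be sketched as follows. The function names are assumptions, and the slot time of 51.2 µs is the 10-Mbps value; the sketch covers only the delay calculation, not carrier sensing or the jam signal.

```python
import random

MAX_ATTEMPTS = 16
SLOT_TIME_US = 51.2          # slot time of 10-Mbps Ethernet in microseconds

def backoff_window(attempt):
    """Number of possible values of k for transmission attempt N (1-based)."""
    if attempt >= MAX_ATTEMPTS:
        raise RuntimeError("giving up after 16 attempts")
    return 2 ** min(attempt, 10)     # 2^N up to N = 10, then fixed at 2^10

def backoff_delay_us(attempt, rng=random):
    k = rng.randrange(backoff_window(attempt))   # 0 <= k < window
    return k * SLOT_TIME_US

windows = [backoff_window(n) for n in (1, 2, 3, 10, 15)]
```

The doubling window is also what makes the capture effect possible: a station that has just won resets to attempt 1 and draws from a small window, while the loser draws from an ever larger one.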

### CSMA persistence
- A carrier-sense/multiple-access transmission strategy is said to be **nonpersistent** if, when the line is busy, the sender waits a randomly selected time
- A strategy is said to be **p-persistent** if, after waiting for the line to clear, the sender sends with probability $p\leq 1$
- Ethernet uses 1-persistence 
    - A consequence of this is that if more than one station is waiting for the line to clear, a collision is certain 
    - Ethernet handles the resulting collision gracefully via the usual exponential backoff

## Ethernet Learning Algorithm
- The solution for a switch to act as a drop-in replacement for a hub is to start out with an empty forwarding table and then incrementally build the table through a learning process
    - If a switch does not have an entry for a particular destination, it will **fall back to flooding**
        - It will forward the packet out every interface other than the one on which it arrived
    - A switch learns address locations as follows
        - For each interface, the switch maintains a table of physical (MAC) addresses that have appeared as source addresses in packets arriving via that interface
        - When a packet arrives on interface $I$ with source address $S$ and destination unicast address $D$, the switch enters $\langle S, I \rangle$ into its forwarding table
    - To deliver a packet, the switch also looks up the destination $D$ in the forwarding table.
        - If an entry $\langle D,J \rangle$ with $J \ne I$ exists the switch **forwards** the packet out interface $J$
        - If an entry $\langle D,J \rangle$ with $J = I$ exists the packet does not get forwarded at all
        - If there is no entry for $D$ the switch must flood the packet out all interfaces $J$ with $J\ne I$ 
    - After a while the fallback-to-flooding alternative is needed less and less often
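The learning and forwarding rules above fit in a few lines. The class name `LearningSwitch`, the method `receive` and the single-letter station names are assumptions made for this sketch; a real switch would also age out old entries.

```python
class LearningSwitch:
    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self.table = {}                      # MAC address -> interface

    def receive(self, src, dst, arrival):
        """Process one unicast packet; return the interfaces it is sent out on."""
        self.table[src] = arrival            # learn <S, I>
        out = self.table.get(dst)
        if out is None:                      # no entry for D: fall back to flooding
            return [i for i in self.interfaces if i != arrival]
        if out == arrival:                   # D is on the arrival side: do not forward
            return []
        return [out]                         # forward out interface J only

switch = LearningSwitch([1, 2, 3])
trace = [
    switch.receive("A", "B", 1),   # B unknown: flood
    switch.receive("B", "A", 2),   # A was learned on interface 1
    switch.receive("A", "B", 1),   # B was learned on interface 2
    switch.receive("C", "A", 1),   # A and C share interface 1
]
```

After only two packets the switch no longer needs to flood traffic between A and B.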

## Spanning Tree Algorithm and Redundancy
- Used to avoid loops 
- The switch with the lowest ID is the root
- To create the spanning tree, each switch disables all ports that are not enabled by the following rules:
    1. It enables the port via which it reaches the root
    2. It enables any of its ports that further-out switches use to reach the root
    3. If a remaining port connects to a segment to which other “segment-neighbor” switches connect as well, the port is enabled if the switch has the minimum cost to the root among those segment-neighbors, or, if a tie, the smallest ID among those neighbors, or, if two ports are tied, the port with the smaller ID.
    4. If a port has no directly connected switch-neighbors, it presumably connects to a host or segment, and the port is enabled. Rules 1 and 2 construct the spanning tree; if S3 reaches the root via S2, then Rule 1 makes sure S3’s port

# Packets

## Packet Delay 
- There are several contributing sources to the packet delay
    - On LAN the most significant is **bandwidth delay**
        - The time needed for a sender to get the packet onto the wire 
        - This is simply the packet size divided by the bandwidth
    - There is also **propagation delay**
        - This relates to the propagation of the bits at the speed of light in the transmission medium
        - This delay is the distance divided by that speed 
    - The introduction of switches leads to **store-and-forward delay**
        - The time spent reading in the entire packet before any of it can be retransmitted 
    - A switch may also introduce **queuing delay** 
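The delay components can be combined in a small calculation. The sketch assumes that every link has the same bandwidth, that there is no queuing delay, and that the signal travels at roughly $2 \times 10^8$ m/s; the function names and the 1 km distance are illustrative.

```python
def bandwidth_delay(packet_bytes, bandwidth_bps):
    """Seconds needed to put the packet onto the wire."""
    return packet_bytes * 8 / bandwidth_bps

def propagation_delay(distance_m, signal_speed_mps=2e8):
    """Seconds the signal needs to travel the given distance."""
    return distance_m / signal_speed_mps

def one_way_delay(packet_bytes, bandwidth_bps, distance_m, switches=0):
    # Each store-and-forward switch reads in the whole packet and then
    # transmits it again, so the bandwidth delay is paid once per link.
    links = switches + 1
    return links * bandwidth_delay(packet_bytes, bandwidth_bps) + propagation_delay(distance_m)

direct = one_way_delay(1500, 10_000_000, 1000)
via_switch = one_way_delay(1500, 10_000_000, 1000, switches=1)
```

For a 1500-byte packet at 10 Mbps the bandwidth delay is 1.2 ms, while 1 km of propagation adds only about 5 µs, which illustrates why bandwidth delay dominates on a LAN.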

## Error Detection
- The basic strategy for error detection is to add some extra bits, formally known as an **error-detection code**, that will allow the receiver to determine if the packet has been corrupted in transit 
    - A corrupted packet will be discarded by the receiver 
    - Packet errors generally fall into two categories: 
        - low-frequency bit errors due to things like cosmic rays, and interference errors, typically generated by nearby electrical equipment. 
        - Errors of the latter type generally occur in bursts, with multiple bad bits per packet. Occasionally, a malfunctioning network device will introduce bursty errors as well.
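A minimal sketch of the add-and-check pattern, using the CRC-32 function from Python's `zlib` module. The helper names and the payload are assumptions, and the sketch appends the code as four big-endian bytes for simplicity rather than reproducing any particular wire format.

```python
import zlib

def add_crc(payload):
    """Append a 32-bit CRC to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_crc(frame):
    """Recompute the CRC over the payload and compare it with the received one."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

frame = add_crc(b"hello, network")
corrupted = bytearray(frame)
corrupted[3] ^= 0x04            # flip a single bit in transit
corrupted = bytes(corrupted)
```

`check_crc(frame)` succeeds while `check_crc(corrupted)` fails, so the receiver would discard the damaged packet.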

# IP Version 4 
- IP scales better than a LAN
- The IP network service should act like a giant LAN
- To support packet sizes larger than what a LAN allows, the IP protocol supports **fragmentation**
    - Breaks a large packet into units that the LAN can transport successfully 

## The IPv4 Header
- The IPv4 header needs to contain the following information:
    - destination and source address 
    - indication of IPv4 vs IPv6
    - a Time To Live (TTL) value to prevent infinite routing loops 
    - a field indicating what comes next in the packet (e.g. TCP vs UDP)
    - fields supporting fragmentation and reassembly.


- The header is organized as a series of 32-bit words as follows :
![ipv4_header](img/ipv4_header.png)
    - The **Version** field is, for IPv4, the number 4: 0100
    - The **IHL** field represents the total IPv4 Header Length in 32-bit words
        - an IPv4 Header can thus be at most 15 words long
        - The base header takes up five words, so the IPv4 Options can consist of at most ten words.
    - The **Differentiated Services** (DS) field is used by the Differentiated Services suite to specify preferential handling for designated packets
        - e.g. those involved in VoIP or other real-time protocols
    - The **Explicit Congestion Notification** bits are there to allow routers experiencing congestion to mark packets
        - This indicates to the sender that the transmission rate should be reduced
    - The **Total Length** field is present because an IPv4 packet may be smaller than the minimum LAN packet size or larger than the maximum
        - The IPv4 packet length, in other words, cannot be inferred from the LAN-level packet size. 
        - Because the Total Length field is 16 bits, the maximum IPv4 packet size is $2^{16}$ bytes.
    - The **Time-to-Live** (TTL) field is decremented by 1 at each router and used to avoid loops
        - If it reaches 0, the packet is discarded
        - A typical initial value is 64
            - It must be larger than the total number of hops in the path 
        - In most cases a  value of 32 would work 
    - The **Protocol** field contains a value to identify the contents of the packet body, such as
        - 1: an ICMP packet 
        - 4: an encapsulated IPv4 packet
    - The **Header Checksum** field is the "Internet checksum" applied to the header only
        - Its purpose is to allow discarding packets with corrupted headers 
        - When the TTL value is decremented the router must update the checksum; this can be done algebraically by adding a 1 in the correct place to compensate
        - It is also updated when the packet header is rewritten by a NAT router
    - The **Source** and **Destination Address** fields contain the IPv4 addresses 
        - Only updated by NAT firewalls
        - A sender can put a false source address in the header; this is known as IP **spoofing**
    - **IPv4 options**
        - The **Record Route** option in which routers are to insert their own IPv4 address into the IPv4 header option area
        - The **Timestamp** option where intermediate routers are requested to mark their address and a local time stamp +++
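The base header layout can be made concrete with Python's `struct` module. This is a sketch covering only the 20-byte header without options; the function names, the example addresses and the default field values are assumptions for illustration.

```python
import struct

HEADER_FORMAT = "!BBHHHBBH4s4s"   # five 32-bit words in network byte order

def internet_checksum(data):
    """One's-complement sum of the 16-bit words of data, complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(src, dst, ttl=64, protocol=1, data_length=0):
    fields = [(4 << 4) | 5,       # Version 4 and IHL of 5 words share one byte
              0,                  # DS and ECN bits
              20 + data_length,   # Total Length in bytes
              0, 0,               # IDENT, then flags and Fragment Offset
              ttl, protocol,
              0,                  # Header Checksum, filled in below
              src, dst]
    fields[7] = internet_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)

def parse_header(header):
    (version_ihl, _ds_ecn, total_length, _ident, _flags_offset,
     ttl, protocol, _checksum, src, dst) = struct.unpack(HEADER_FORMAT, header[:20])
    return {
        "version": version_ihl >> 4,
        "ihl": version_ihl & 0x0F,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
        # Summing a header that includes a correct checksum yields zero
        "checksum_ok": internet_checksum(header[:20]) == 0,
    }

header = build_header(bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), data_length=8)
info = parse_header(header)
```

Changing the TTL byte without adjusting the checksum makes `checksum_ok` false, which is why a router has to update the checksum whenever it decrements the TTL.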

## Interfaces
- IP addresses are, strictly speaking, assigned not to hosts or nodes, but to **interfaces**
- Each computer likely also has a **loopback** interface, which provides a way to deliver IP packets to other processes on the same machine
    - Often named `localhost`, which resolves to the IPv4 loopback address 127.0.0.1
    - Loopback delivery avoids the need to use the LAN at all
    - On Unix-based machines the loopback interface represents a genuine logical interface, commonly named `lo`. 
- When VPN connections are created each end of the logical connection terminates at a virtual interface
    - The virtual interfaces appear, to the systems involved, to be attached to a point-to-point link that leads to the other end
- When a computer hosts a virtual machine there is almost always a virtual network to connect the host and the virtual machine
    - The host uses a virtual interface and may act as a NAT router or as an Ethernet switch 
- A host with multiple "real" network interfaces is often said to be **multihomed**
    - Many computers have both an Ethernet interface and a Wi-Fi interface, which can be used at the same time with different addresses
- It is possible to assign multiple IP addresses to a single interface
    - e.g. to allow two IP networks to share a single physical LAN

## Special Addresses
- A few IPv4 addresses represent special cases
    - The standard loopback address is 127.0.0.1,
        - Any address beginning with 127 can serve as a loopback address
    - **Private addresses** are addresses only intended for internal use
        - If a packet shows up on a non-private router containing a private address it should be dropped
        - There are three standard private address blocks that have been defined 
            - `10.0.0.0/8`
            - `172.16.0.0/12`
            - `192.168.0.0/16`
    - **Broadcast addresses** are a special form of addresses that are intended to be used in conjunction with the LAN-layer broadcast
        - The most common forms are
            - *"Broadcast to this network"* consisting of all 1 bits 
            - *"Broadcast to network D"* consisting of D's network address followed by all 1-bits for the host address 
                - If trying to broadcast to a remote network the odds are that some router will refuse it 
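The special cases listed above can be checked with Python's `ipaddress` module. The `classify` function and its return strings are assumptions for this sketch; the private blocks are spelled out explicitly rather than relying on the module's broader `is_private` attribute.

```python
import ipaddress

LOOPBACK_BLOCK = ipaddress.ip_network("127.0.0.0/8")
PRIVATE_BLOCKS = [ipaddress.ip_network(block)
                  for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def classify(address, network=None):
    """Name the special case an IPv4 address falls into, if any."""
    addr = ipaddress.ip_address(address)
    if addr in LOOPBACK_BLOCK:
        return "loopback"
    if int(addr) == 0xFFFFFFFF:                       # all 1 bits
        return "broadcast to this network"
    if network is not None and addr == ipaddress.ip_network(network).broadcast_address:
        return "broadcast to network"                 # network bits, then all 1 bits
    if any(addr in block for block in PRIVATE_BLOCKS):
        return "private"
    return "ordinary"

examples = [classify("127.3.4.5"), classify("172.31.255.1"), classify("172.32.0.1")]
```

Note that `172.16.0.0/12` spans 172.16.x.x through 172.31.x.x, so 172.32.0.1 is already an ordinary address.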

## Fragmentation
- There are two potential fragmentation strategies 
    - **per-link** fragmentation and reassembly, where the reassembly is done at the opposite end of the link 
    - **path** fragmentation and reassembly, where the reassembly is done at the far end of the path 
        - is used by IPv4
    

- An IPv4 sender is supposed to use a different value for the **IDENT** field for different packets 
    - When an IPv4 datagram is fragmented, every fragment keeps the same IDENT field


- After fragmentation the **Fragment Offset** field marks the start position of the data portion of the fragment within the data portion of the original IPv4 packet.
    - The start position can be a number up to $2^{16}$, but the field has only 13 bits, so the offset is counted in units of 8 bytes 
    - The three flag bits in the header are the following
        - The first bit is reserved and must be 0 
        - The second bit is the **Don’t Fragment** bit; if it is set, a router should not fragment the packet and should drop it instead
        - The third bit is the **More Fragments** bit, which is set to 1 for all fragments except the final one 
            - tells the receiver where the fragments stop 
            - tells the receiver where the fragments stop 
            
            
- The receiver must take the fragments and reassemble the packet 
    - The fragments may not arrive in order 
    - The reassembler must identify when arriving fragments are part of the same packet 
    - Fragments are considered to belong to the same packet if they have the same IDENT field, source and destination addresses and same protocol. 
    - If a fragment arrives that is part of a new fragmented packet, a buffer is allocated
        - A bitmap is allocated to keep track of which parts of the buffer have arrived
        - As subsequent fragments arrive they are placed in the proper buffer in the proper position. 
        - If the bitmap shows that all fragments have arrived, the packet is sent on up as a complete IPv4 packet 
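Fragmentation and reassembly of the data portion can be sketched as below. The tuple layout, the function names and the parameter `max_data` (assumed to be at least 8) are illustrative; the sketch assumes all fragments passed to `reassemble` already share the same IDENT, addresses and protocol, and it uses a dictionary keyed by byte position in place of the buffer and bitmap.

```python
def fragment(payload, ident, max_data):
    """Split payload into (ident, offset in 8-byte units, more_fragments, data)."""
    chunk = max_data - max_data % 8       # every non-final fragment holds a multiple of 8 bytes
    fragments = []
    for start in range(0, len(payload), chunk):
        data = payload[start:start + chunk]
        more = start + chunk < len(payload)          # 1 for all fragments except the final one
        fragments.append((ident, start // 8, more, data))
    return fragments

def reassemble(fragments):
    """Return the original payload, or None while something is still missing."""
    pieces = {}
    total = None
    for _ident, offset, more, data in fragments:
        pieces[offset * 8] = data                    # place data at its byte position
        if not more:
            total = offset * 8 + len(data)           # the final fragment fixes the length
    if total is None:
        return None
    result = bytearray()
    for start in sorted(pieces):
        if start != len(result):                     # a gap: some fragment has not arrived
            return None
        result += pieces[start]
    return bytes(result) if len(result) == total else None

payload = bytes(range(100))
frags = fragment(payload, ident=7, max_data=44)
rebuilt = reassemble(list(reversed(frags)))
```

With `max_data=44` each non-final fragment carries 40 bytes, giving offsets 0, 5 and 10, and reassembly works even though the fragments are supplied in reverse order.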

## The Classless IP Delivery Algorithm
- To decide whether an IPv4 address D is **local** or **nonlocal**, the host or router involved compares, for each network address B/k assigned to one of its interfaces, the first k bits of B and D; that is, it asks whether D matches B/k.
    - If one of these comparisons yields a match, delivery is **local**
        - The host delivers the packet via the LAN connected to the corresponding interface
        - That means looking up the LAN address of the destination, if applicable, and sending the packet to that destination via the interface
    - If there is no match, delivery is **nonlocal**
        - The host passes $D$ to the `lookup()` routine of the forwarding table and sends the packet to the associated next_hop
        - It is now up to `lookup()` to split D into D$_{\text{net}}$ and D$_{\text{host}}$; this split cannot be made outside `lookup()`


- The forwarding table is abstractly a set of network addresses, with lengths, of the form B/k, with an associated next_hop for each
    - The `lookup()` routine will in principle compare D with each table entry B/k, looking for a match
    - There is no provision in the IPv4 header to store a net/host division point, and furthermore different routers along the path may use different values of k for the same destination
    - Routers receive the prefix length /k for a destination B/k as part of the process by which they receive $\langle$destination, next_hop$\rangle$ pairs
    - When there are multiple matches in one table, the **longest-match** rule is used to pick the best match
    - There may also be a default entry in the table, typically 0.0.0.0/0, which matches everything
    - Routers may also be configured to pass quality-of-service information to `lookup()` to help determine the best path
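
A minimal sketch of `lookup()` with the longest-match rule, using Python's standard `ipaddress` module. The table contents and next_hop names (`R1`, `R2`, `R_default`) are made up for illustration, and the linear scan is the "in principle" comparison described above, not how a production router stores its table.

```python
import ipaddress


def lookup(table, dest):
    """Return the next_hop of the longest prefix B/k in table that matches dest."""
    d = ipaddress.ip_address(dest)
    best = None  # (network, next_hop) of the best match so far
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if d in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


table = [
    ("0.0.0.0/0", "R_default"),  # default entry, matches everything
    ("10.0.0.0/8", "R1"),
    ("10.38.0.0/16", "R2"),
]
```

With this table, `10.38.4.5` matches all three entries and is sent to `R2`, because /16 is the longest matching prefix.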

## IPv4 Subnets
- A larger network can be divided into subnets 
    - Subnets introduce **hierarchical routing**: first we route to the primary network then inside that site we route to the subnet and finally the last hop delivers to the host 
    - To implement subnets, the site's IPv4 network is divided into some combination of physical LANs, and each is assigned a subnet address
    - For a site with network address A/n, a subnet address is an IPv4 network address B/k such that:
        - The address B/k is within the site: the first n bits of B are the same as A/n’s
        - B/k extends A/n: $k \geq n$
- For a host within one subnet to be able to send to a host in another subnet, a **subnet mask** is used
    - A subnet mask is created for each subnet address B/k, consisting of k 1-bits followed by enough 0-bits to make a total of 32.
    - We need to make sure that every host and router knows the subnet mask for every one of its interfaces
    - Hosts usually find their subnet mask the same way they find their IP address
- The hosts and routers apply the IP delivery algorithm, with the condition that, if a subnet mask for an interface is present, the subnet mask is used to determine the number of address bits rather than the Class A/B/C mechanism
    - This is done by comparing D&M and B&M, where D is the destination, B is the subnet address and M is the mask; if they are equal, delivery is local
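
The D&M versus B&M comparison can be written out directly. This is a small sketch; the function names and the example addresses are assumptions chosen for illustration.

```python
import ipaddress


def mask_from_prefix_len(k: int) -> int:
    """Build the 32-bit mask consisting of k 1-bits followed by 32-k 0-bits."""
    return (0xFFFFFFFF << (32 - k)) & 0xFFFFFFFF


def is_local(dest: str, subnet: str, k: int) -> bool:
    """True if dest lies in the subnet B/k, that is, if D&M == B&M."""
    d = int(ipaddress.IPv4Address(dest))
    b = int(ipaddress.IPv4Address(subnet))
    m = mask_from_prefix_len(k)
    return (d & m) == (b & m)


mask_text = str(ipaddress.IPv4Address(mask_from_prefix_len(20)))
```

For the hypothetical subnet 147.126.16.0/20, the mask is 255.255.240.0, so 147.126.17.9 is local while 147.126.32.1 is not.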

## Network Address Translation
- NAT has the ability to multiplex an arbitrarily large number of individual hosts behind a single IPv4 address (or a small number of addresses)
    - The basic idea is that, instead of assigning each host at a site a publicly visible IPv4 address, just one address is assigned to a special device known as a NAT router (often just called a router)
    - One side of the NAT router connects to the Internet; the other connects to the site's internal network
    - Hosts on the internal network are assigned private IP addresses, typically of the form 192.168.x.y or 10.x.y.z
    - Connections to internal hosts that originate in the outside world are banned
    - When an internal machine wants to connect to the outside, the NAT router intercepts the connection and forwards its packets after rewriting the source address to make it seem like they came from the NAT router's own IP address
        - When a remote machine responds, the NAT router remembers the connection (stored in a special forwarding table) and forwards the data to the correct internal host, rewriting the destination address field of the incoming packets
        - The NAT forwarding table also includes port numbers
    - Because unsolicited inbound connections are blocked, the NAT router improves security
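
A simplified sketch of the NAT forwarding table follows. The `NatRouter` class, its method names, the starting port 40000 and the addresses are all assumptions for illustration; a real NAT router also tracks the remote endpoint, the protocol and entry timeouts.

```python
class NatRouter:
    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (internal_ip, internal_port) -> public_port
        self.back = {}  # public_port -> (internal_ip, internal_port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet; create a table entry if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        """Find the internal destination of a reply, or None if no connection matches."""
        return self.back.get(dst_port)


nat = NatRouter("203.0.113.5")
first = nat.outbound("192.168.0.10", 5000)
second = nat.outbound("192.168.0.11", 5000)
```

Two internal hosts using the same source port get different public ports, which is why the table has to include port numbers. An inbound packet to a port with no entry returns `None`, matching the ban on connections that originate outside.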

## Address Resolution Protocol: ARP 
- ARP is used when a host or router A finds that the destination IP address D=D$_\text{IP}$ matches the network address of one of its interfaces, so that it is to deliver the packet via the LAN
- The basic idea of ARP is that the host A sends out a broadcast ARP query or "who has D$_\text{IP}$" request, which includes A's own IPv4 and LAN addresses
    - All hosts on the LAN receive this message.
    - The host for whom the message is intended, D, will recognize that it should reply and will return an ARP reply or "is-at" message containing D$_\text{LAN}$
    - Because the original request contained A$_\text{LAN}$, D's response can be sent directly to A 
    

- All hosts maintain an **ARP cache** consisting of $\langle$IPv4, LAN$\rangle$ address pairs for other hosts on the network
    - After an exchange, the involved hosts and/or routers put each other into their caches
    - ARP-cache entries eventually expire, after which a new ARP query must be sent for that entry
    - The cache cuts down the total amount of broadcast traffic
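
The cache behavior can be sketched with a small class. The `ArpCache` name, the 60-second lifetime and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions; actual timeout values depend on the operating system.

```python
class ArpCache:
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self.entries = {}  # IPv4 address -> (LAN address, time learned)

    def learn(self, ip, lan, now):
        """Store or refresh an entry after an ARP exchange."""
        self.entries[ip] = (lan, now)

    def resolve(self, ip, now):
        """Return the cached LAN address, or None if a broadcast query is needed."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        lan, learned = entry
        if now - learned > self.ttl:
            del self.entries[ip]  # expired: forget it and ask again
            return None
        return lan


cache = ArpCache(ttl_seconds=60.0)
cache.learn("10.0.0.7", "00:11:22:33:44:55", now=0.0)
```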

### ARP Finer Points
- Most hosts implement **self-ARP** or **gratuitous ARP** on startup
    - When a station A starts up, it sends out an ARP query for itself: *"who has A"*
    - Two things are gained from this
        - All stations that had A in their cache are now updated with A's most current A$_\text{LAN}$ address
        - If an answer is received, then presumably some other host on the network has the same IPv4 address as A.


- ARP can be used for **duplicate address detection** though it does not often work well 
    - Often only a single self-ARP query is sent, and if a reply is received then frequently the only response is to log an error message
    - The host may even continue using the duplicate address


- An improved mechanism has been defined, known as **Address Conflict Detection** (ACD)
    - A host using ACD sends out three ARP queries for its new IPv4 address, spaced over a few seconds.
    - It leaves the ARP field for the sender's IPv4 address filled with zeroes.
    - This means that any other host with that IPv4 address in its cache will ignore the packet, rather than update its cache.
    - If the original host receives no replies, it then sends out two more ARP queries for its new address
        - This time with the ARP field for the sender's IPv4 address filled in with the new address
        - This is the stage at which other hosts on the network will make any necessary cache updates.
    - Finally, ACD requires that hosts that do detect a duplicate address must discontinue using it.
    
    
- It is also possible for other stations to answer an ARP query on behalf of the actual destination D
    - This is called **proxy ARP** 
- It is important to use a timeout interval between repeated ARP queries to avoid flooding the network with broadcasts

## Dynamic Host Configuration Protocol (DHCP)
- DHCP is the most common mechanism by which hosts are assigned their IPv4 addresses
- DHCP involves a host, at startup, broadcasting a query containing its own LAN address and having a server reply telling the host what IPv4 address is assigned to it
    - This is the reverse of an ARP query, hence the name Reverse ARP for the protocol from which DHCP evolved


- The DHCP response message is likely to carry, piggybacked onto it, several other essential startup options
    - Unlike the IPv4 address, the additional network parameters do not usually depend on the specific host
    - A typical DHCP message includes the following
        - IPv4 address
        - subnet mask
        - default router
        - DNS Server
    - These four items are called the **minimal network configuration**
        - Hosts cannot function properly without these 
        
        
- The DHCP server has a range of IPv4 addresses to hand out
    - It maintains a database of which IPv4 address has been assigned to which LAN address.
    - Reservations can either be permanent or dynamic
    - If reservations are dynamic, hosts typically renew their DHCP reservation periodically
    

- The typical home/small-office "router" is a NAT router coupled with an Ethernet switch, and usually also coupled with a Wi-Fi access point and a DHCP server. 

## Internet Control Message Protocol
- The Internet Control Message Protocol (ICMP) is a protocol for sending IP-layer error and status messages
    - ICMP is, like IP, **host-to-host**: ICMP messages are never delivered to a specific port, even if they are sent in response to an error related to a specific port


- ICMP messages are identified by an 8-bit **type** field, followed by an 8-bit subtype, or **code**. The most common ICMP types are shown here, with subtypes listed in the description
![ICMP_types](img/ICMP_types.png)
- The Echo and Timestamp formats are queries, sent by one host to another. 

- Most of the other types are error messages, sent by a router to the sender of the offending packet.
    - Error-message formats contain the IP header and next 8 bytes of the packet in question; the 8 bytes will contain the TCP or UDP port numbers. 
    - Redirect and Router Solicitation messages are informational, but follow the error-message format. 
    - Query formats contain a 16-bit Query Identifier, assigned by the query sender and echoed back by the query responder.

# Routing-Update Algorithms

## Distance-Vector Routing-Update Algorithm
- Distance-vector is the simplest routing-update algorithm; it is used by the Routing Information Protocol (RIP)
    - Routers identify their router neighbors and add a third column to their forwarding tables representing the total **cost** for delivery to the corresponding destination
        - Neighbors are found through some sort of neighbor-discovery mechanism
        - The costs are the distances
    - Forwarding table entries are of the form $\langle$destination, next_hop, cost$\rangle$
    - Costs are administratively assigned to each link
    - The algorithm calculates the total cost as the sum of the link costs along the path
        - If a cost of 1 is assigned to each link, it is called the hopcount metric
        - Link costs can also reflect each link's bandwidth or delay
    - Each router reports the $\langle$destination, cost$\rangle$ portion of its table to its neighboring routers at regular intervals
        - These table portions are the vectors
    - Each router monitors its continued connectivity to each neighbor
        - If a neighbor becomes unreachable, its reachability cost is set to infinity

### Update rules
- Let $A$ be a router receiving a report $\langle D, c_D \rangle$ from neighbor $N$, which $A$ reaches at link cost $c_N$. This means that $A$ can reach $D$ via $N$ with cost $c=c_D+c_N$. $A$ updates its own table according to these rules
    1. **New destination**: $D$ is a previously unknown destination. $A$ adds $\langle D, N, c \rangle$ to its forwarding table
    2. **Lower cost**: $D$ is a known destination with entry $\langle D,M,c_{old} \rangle$, but the new total cost $c$ is less than $c_{old}$. $A$ switches to the cheaper route, updating its entry for $D$ to $\langle D,N,c \rangle$
        - It is possible that $M=N$
        - If $c=c_{old}$, $A$ ignores the new report
    3. **Next_hop increase:** $A$ has an existing entry $\langle D,N,c_{old} \rangle$ and the new total cost $c$ is greater than $c_{old}$. $A$ updates its entry for $D$ to $\langle D,N,c \rangle$
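
The three update rules can be sketched as one function. The name `dv_update`, the dictionary layout and the router names in the example are assumptions made for illustration.

```python
def dv_update(table, neighbor, link_cost, report):
    """Apply one distance-vector report from neighbor to table.

    table maps destination -> (next_hop, cost); report maps destination -> cost.
    Returns the set of destinations whose entries changed.
    """
    changed = set()
    for dest, reported_cost in report.items():
        c = reported_cost + link_cost
        if dest not in table:                     # rule 1: new destination
            table[dest] = (neighbor, c)
            changed.add(dest)
            continue
        next_hop, c_old = table[dest]
        if c < c_old:                             # rule 2: lower cost
            table[dest] = (neighbor, c)
            changed.add(dest)
        elif next_hop == neighbor and c > c_old:  # rule 3: next_hop increase
            table[dest] = (neighbor, c)
            changed.add(dest)
        # otherwise (equal cost, or a worse route via another neighbor): ignore
    return changed


table = {"B": ("B", 1)}
step1 = dv_update(table, "B", 1, {"C": 1, "D": 3})  # two new destinations
step2 = dv_update(table, "E", 1, {"D": 1})          # cheaper route to D via E
step3 = dv_update(table, "E", 1, {"D": 5})          # E's own route got worse
step4 = dv_update(table, "B", 1, {"D": 5})          # equal cost via B: ignored
```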
   


## Distance-Vector Slow-Convergence Problem
- The **Distance-Vector Slow-Convergence Problem** happens when a link breaks and a neighbor reports its old route, which ran over the broken link, before it has learned of the break
    - The result is a routing loop in which the two neighbors send packets for that destination back and forth to each other while their costs increase step by step toward infinity
    
- Fixes to the Distance-Vector Slow-Convergence Problem
    - The simplest fix to this problem is to use a small value for infinity
        - No path can be longer than this value
    - Under **split horizon**, if $A$ uses $N$ as its next_hop for destination $D$, then $A$ simply does not report to $N$ that it can reach $D$
        - When preparing a report to $N$, it first deletes all entries that have $N$ as next_hop
        - This prevents all linear routing loops but cannot prevent all nonlinear ones
        - One can also use **poison reverse**, where a cost of $\infty$ is reported for these entries instead of deleting them
    - Under **triggered updates**, any router should report immediately to its neighbors whenever it detects any change for the worse
    - **Hold down** dictates that the receiver does not use new alternative routes for a period of time following the discovery of unreachability
        - This gives time for the bad news to arrive
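
A sketch of how a router might prepare its report for one neighbor under split horizon and under poison reverse. The function name and table layout are assumptions; the small infinity of 16 is the value RIP uses.

```python
INFINITY = 16  # small value for infinity, as in RIP


def build_report(table, neighbor, poison_reverse=False):
    """Build the <destination, cost> report sent to neighbor.

    table maps destination -> (next_hop, cost).
    """
    report = {}
    for dest, (next_hop, cost) in table.items():
        if next_hop == neighbor:
            if poison_reverse:
                report[dest] = INFINITY  # explicitly report the route as unreachable
            # plain split horizon: leave the entry out of the report
        else:
            report[dest] = cost
    return report


table = {"D": ("N", 2), "E": ("M", 3)}
```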

# Abstract sliding windows

## Building Reliable Transport: Stop-and-Wait
- `Data[N]` represents the Nth data packet
    - It is acknowledged by `ACK[N]`

- In the **stop-and-wait** version of retransmit-on-timeout, the sender sends only one outstanding packet at a time
    - If there is no response the packet may be retransmitted
    - The sender does not send `Data[N+1]` until it has received `ACK[N]`
    - Each side has only one packet in play at a time
    - If `ACK[N]` is lost, the sender sends a duplicate `Data[N]`, and a receiver that implements **retransmit-on-duplicate** then retransmits `ACK[N]`
    - At least one side must implement **retransmit-on-timeout**
        - Otherwise a lost packet leads to a deadlock
    - The other side must implement either retransmit-on-timeout or **retransmit-on-duplicate**
    - To avoid the **Sorcerer’s Apprentice Bug**, where every packet ends up being sent twice, a side should implement only one of the two strategies
        - Usually the sender implements only **retransmit-on-timeout**
        

- Stop-and-wait provides a simple form of **flow control** to prevent data from arriving at the receiver faster than it can be handled
    - The stop-and-wait mechanism will prevent data from arriving too fast if the time to process a received packet is less than one RTT
    - If the processing time is slightly larger than the RTT, all the receiver has to do is wait to send `ACK[N]` until `Data[N]` has not only arrived but also been processed and the receiver is ready
    - To show that data has been received but not yet processed, ACK$_\text{WAIT}$[N] is used; when the receiver is ready, ACK$_\text{GO}$[N] is used
        - This creates a new problem if ACK$_\text{GO}$[N] is lost while the sender is waiting for it, which can be solved by the receiver using retransmit-on-timeout in that period
    

## Sliding Windows
- Sliding windows improves on the efficiency of stop-and-wait by allowing the sender to have multiple packets outstanding at once
    - `ACK[N]` cannot be sent until `Data[K]` has arrived for all $K \leq N$
    - The sender picks a **window size**, winsize, which is the number of packets the sender is allowed to send before waiting for an `ACK`
        - The sender keeps a state variable **last_ACKed**, the number of the last packet for which it has received an ACK from the other end (initially 0 if packets are numbered from 1)
    - At any instant, the sender may send packets numbered last_ACKed+1 through last_ACKed+winsize
        - This range is known as the **window**
    - If `ACK[N]` arrives with N>last_ACKed, the window slides forward
        - We set last_ACKed = N
    - If there is no packet reordering and no packet loss, the window slides forward one unit at a time
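
The sender side of these rules can be sketched as follows. `WindowSender`, `sendable` and `on_ack` are hypothetical names, and timeouts and retransmission are deliberately left out.

```python
class WindowSender:
    def __init__(self, winsize):
        self.winsize = winsize
        self.last_acked = 0    # packets are numbered from 1
        self.next_to_send = 1  # lowest packet number not yet sent

    def sendable(self):
        """Return the packet numbers that may be sent now, and mark them as sent."""
        upper = self.last_acked + self.winsize
        packets = list(range(self.next_to_send, upper + 1))
        self.next_to_send = max(self.next_to_send, upper + 1)
        return packets

    def on_ack(self, n):
        """Slide the window forward if ACK[n] is new; ignore old or duplicate ACKs."""
        if n > self.last_acked:
            self.last_acked = n


s = WindowSender(winsize=4)
burst = s.sendable()    # the whole initial window
stalled = s.sendable()  # nothing more until an ACK arrives
s.on_ack(1)
after_ack1 = s.sendable()
s.on_ack(3)
after_ack3 = s.sendable()
```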


- The bandwidth $\times$ RTT product is generally the optimum value for the window size
    - If the sender chooses a winsize larger than this, the RTT grows due to queuing delays
    - The sender is often more interested in bandwidth $\times$ RTT$_\text{noLoad}$
        - Sometimes referred to as the **transit capacity** of the route
        - A window size smaller than this means underutilization of the network
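
As a worked example of the bandwidth $\times$ RTT$_\text{noLoad}$ product, the sketch below converts it into a number of packets. The link speed, RTT and packet size are made-up values, and the `utilization` helper is a rough idealization that ignores losses.

```python
def transit_capacity_packets(bandwidth_bps, rtt_noload_ms, packet_bytes):
    """Bandwidth x RTT_noLoad, expressed as a number of packets in flight."""
    bits_in_flight = bandwidth_bps * rtt_noload_ms / 1000
    return bits_in_flight / (8 * packet_bytes)


def utilization(winsize, capacity_packets):
    """Fraction of the bandwidth used; a larger window only adds queuing delay."""
    return min(1.0, winsize / capacity_packets)


# 10 Mbit/s link, 100 ms no-load RTT, 1250-byte packets
capacity = transit_capacity_packets(10_000_000, 100, 1250)
```

Here the transit capacity is 100 packets, so a winsize of 25 would leave the path about three quarters idle.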


- Sliding windows can work pretty well with the receiver assuming winsize=1, even if the sender uses a larger value
    - Like the sender, the receiver will also maintain the state variable last_ACKed
    - At any instant the receiver is ready to receive Data[last_ACKed+1] through Data[last_ACKed+winsize].


- If no response is received, the sender retransmits only the first lost packet
    - By the time a full timeout has occurred, the sliding windows process has ground to a halt, which is called **pipeline drain**

# UDP 
- UDP header
![udp_header](img/udp_header.png)
- UDP is fairly basic 
    - The two features it adds beyond IP are **port numbers** and a **checksum** 
    - The port numbers are what make UDP a real transport protocol: with them, an application can connect to an individual server process
        - Rather than simply to a host
    - UDP is unreliable in that there is no UDP-layer attempt at timeouts, acknowledgment and retransmission
        - Applications written for UDP must implement these 
    - As with TCP, a UDP pair $\langle$host,port$\rangle$ is known as a socket
    - UDP is **unconnected** or **stateless**
        - If an application has opened a port on a host, any other host on the Internet may deliver packets to that  $\langle$host,port$\rangle$ socket without preliminary negotiation.
    - UDP packets use a 16-bit Internet **checksum** on the data
        - It can be disabled by setting the checksum field to all 0 bits
        - It covers the UDP header, the UDP data and also a *"pseudo-IP header"* that includes the source and destination IP addresses. 
        - If a NAT router rewrites an IP address or port the checksum must be updated
    - UDP packets can be dropped due to queue overflows either at an intervening router or at the receiving host
    - UDP is popular for local transport
        - Is typically used as a transport basis for RPC
    - UDP is well suited for *"request-reply"* semantics
        - Uses less overhead than TCP
    - UDP is popular for **real-time** transport
        - Because of the **loss tolerance**
        - RTP is built on top of UDP rather than TCP (common for VoIP calls)
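
The 16-bit Internet checksum mentioned above can be sketched as below. This shows only the arithmetic over a byte string; building the pseudo-IP header and the UDP header that a real UDP checksum covers is left out, and the function name is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement of the ones'-complement sum of the 16-bit words in data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
```

A receiver can verify a packet by running the same sum over the data with the checksum appended; the result is 0 when nothing was corrupted.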

## Trivial File Transfer Protocol (TFTP)
- TFTP supports file transfers in both directions
    - Does not support a mechanism for authentication
    - Uses stop-and-wait and often a fixed timeout interval 
    - Typically confined to internal use within a LAN 


- TFTP has five packet types
    - Read ReQuest, RRQ containing the filename and a text/binary indication
    - Write Request, WRQ
    - Data, containing a 16-bit block number and up to 512 bytes of data
    - ACK, containing a 16-bit block number
    - Error, for certain designated errors 
        - All errors other than "Unknown Transfer ID" are cause for sender termination


- Data block numbering begins at 1
    - The packet with the Nth block of data is denoted as Data[N] and acknowledgements are Ack[N]
    - All blocks of data contain 512 bytes except the final block
        - The final block is identified by containing less than 512 bytes of data
        - If the file size was divisible by 512, the final block will contain 0 bytes of data
    - TFTP block numbers are 16 bits in length, and are not allowed to wrap around
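
A sketch of how the packet types and the block rule could be encoded. The opcode values follow the TFTP specification (RFC 1350); the helper names and the default `octet` mode are choices made for this example.

```python
import struct

OP_RRQ, OP_WRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 2, 3, 4, 5
BLOCK_SIZE = 512


def rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: opcode, filename and mode, each string terminated by a zero byte."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def data_packet(block: int, payload: bytes) -> bytes:
    """Data[N]: opcode, 16-bit block number, then up to 512 bytes of data."""
    return struct.pack("!HH", OP_DATA, block) + payload


def ack_packet(block: int) -> bytes:
    """ACK[N]: opcode and 16-bit block number."""
    return struct.pack("!HH", OP_ACK, block)


def split_into_blocks(contents: bytes) -> list:
    """Cut a file into 512-byte blocks; the final block is always shorter than 512."""
    blocks = [contents[i:i + BLOCK_SIZE] for i in range(0, len(contents), BLOCK_SIZE)]
    if len(contents) % BLOCK_SIZE == 0:
        blocks.append(b"")  # size divisible by 512: final block carries 0 bytes
    return blocks
```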
    

- The TFTP server listens on UDP port 69 for arriving RRQ packets 
    - For each RRQ requesting a valid file, TFTP server implementations almost always create a separate process (or thread) to handle the transfer
        - That child process will obtain an entirely new UDP port, which will be used for all further interaction with the client for this particular transfer


- In the absence of packet loss or other errors, TFTP file requests typically proceed as follows:
    1. The client sends a RRQ to server port 69.
    2. The server creates a child process, which obtains a new port, s_port, from the operating system.
    3. The server child process sends Data[1] from s_port.
    4. The client receives Data[1], and thus learns the value of s_port. The client will verify that each future Data[N] arrives from this same port.
        - This is referred to as **latching on** to that port
    5. The client sends ACK[1] (and all future ACKs) to the server’s s_port.
    6. The server child process sends Data[2], etc, each time waiting for the client ACK[N] before sending Data[N+1].
    7. The transfer process stops when the server sends its final block, of size less than 512 bytes, and the client sends the corresponding ACK.

## Fundamental Transport Issues
- Some issues that any transport strategy must deal with
    - Old Duplicate Packets
        - Happens when a packet is delayed, a duplicate is sent in its place, and the late packet is then mistaken for part of a later connection
        - A packet from a previous instance of the connection is called an **external** old duplicate
            - Two separate instances of a connection between the same socket addresses are sometimes known as **incarnations** of the connection, particularly in the context of TCP.
            - The TFTP defense against this is that both endpoints try to choose a different port for each separate transfer
        - An **internal** old duplicate, from the same instance of the connection, could occur if the block numbers were allowed to wrap around
    - Lost Final ACK
        - The receiver cannot be certain that the final ACK was received, because the final ACK is itself never acknowledged
        - TFTP addresses this by recommending that the receiver enter a **DALLY** state once it has sent the final ACK
            - In this state the receiver responds only to duplicates of the final DATA packet, by retransmitting the final ACK
            - The dally state expires after an interval that should be at least twice the sender's timeout interval
            - This greatly reduces the risk that a lost final ACK goes unrepaired
    - Duplicated Connection Request
        - It happens when a client cancels a read transfer and starts another one
            - The new transfer can then receive the old one's packets by mistake
        - It can be solved by the receiver changing its port number
    - Reboot
        - TFTP has to take into account that one side may reboot between messages from the other side
        - This is a problem of great importance

# TCP
- The **End-to-End** Principle states that transport issues are the responsibility of the endpoints in question
- The TCP header:
![tcp_header](img/tcp_header.png)
    - The checksum covers the TCP header, the TCP data and an IP "pseudo header" that includes the source and destination IP addresses
        - Must be updated by a NAT router that modifies any header values
    - The **sequence** and **acknowledgement** numbers are for numbering the data at the byte level
        - This allows TCP to send 1024-byte blocks of data, incrementing the sequence number by 1024 between successive packets, or to send 1-byte telnet packets, incrementing the sequence number by 1 each time
        - There is no distinction between DATA and ACK packets
            - All packets carrying data from A to B also carry the most current acknowledgement of data sent from B to A. 
            - Many TCP applications are largely unidirectional 
        - It is traditional to refer to the data portion of TCP packets as **segments** 
        - The value of the sequence number is the position of the first byte of the packet in the data stream or the position of where the first byte would be in case no data was sent
        - The value of the acknowledgment number represents the byte position for the next byte expected
        - The sequence and acknowledgment numbers, as sent, represent these relative values plus an Initial Sequence Number, or ISN, that is fixed for the lifetime of the connection. 
            - Each direction of a connection has its own ISN
        - The TCP acknowledgements are **cumulative**
            - An acknowledgment confirms receipt of all data bytes numbered less than N, where N is the acknowledgment number
    - The TCP header defines the following flag bits
        - **SYN**: for SYNchronize, marks packets that are part of the new connection handshake
        - **ACK**: indicates that the header ACKnowledgment field is valid, which is the case in all but the first packet
        - **FIN**: for FINish, marks packets involved in the connection closing 
        - **PSH**: for PuSH, marks *"non-full"* packets that should be delivered promptly at the far end
        - **RST**: for ReSet, indicates various error conditions
        - **URG**: for URGent, part of a now-seldom-used mechanism for high-priority data
        - **CWR** and **ECE**: part of the Explicit Congestion Notification mechanism
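
The byte-level sequence numbering and the cumulative acknowledgments described above can be illustrated with a small sketch. The function names are hypothetical, segments are given as (relative start, length) pairs, and the handshake's own use of sequence numbers is ignored.

```python
MOD = 2 ** 32  # TCP sequence and acknowledgment numbers are 32 bits and wrap around


def wire_number(isn: int, relative_pos: int) -> int:
    """Value actually sent in the header: the relative position plus the ISN."""
    return (isn + relative_pos) % MOD


def next_expected(segments, first_byte=0):
    """Cumulative ACK: relative position of the next byte expected by the receiver."""
    expected = first_byte
    for start, length in sorted(segments):
        if start > expected:
            break  # a gap: data beyond it cannot be acknowledged yet
        expected = max(expected, start + length)
    return expected


received = [(0, 1024), (1024, 1024), (3072, 1024)]  # bytes 2048..3071 are missing
ack_with_gap = next_expected(received)
ack_after_fill = next_expected(received + [(2048, 1024)])
```

While the third block is missing the receiver keeps acknowledging 2048; once it arrives, a single acknowledgment of 4096 covers everything received so far.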

## TCP Connection Establishment 
- TCP connections are established via an exchange known as the **three-way handshake**. If A is the client and B is the LISTENing server, the handshake goes as follows
    - A sends B a packet with the SYN bit set (a SYN packet)
    - B responds with a SYN packet of its own, the ACK bit is now also set
    - A responds to B's SYN with its own ACK


- Normally a three way handshake is triggered by an application's request to connect
    - Data can only be sent after the handshake completes
    - The handshake is vulnerable to an attack known as **SYN flooding**
        - The attacker sends a large number of SYN packets to a server B
        - For each arriving SYN, B must allocate resources, and B's resources may face exhaustion
        
- To **close** the connection, a superficially similar exchange involving FIN packets may occur
    - A sends B a packet with the FIN bit set (a FIN packet), announcing that it has finished sending data
    - B sends A an ACK of the FIN
    - B may continue to send additional data to A
    - When B is also ready to cease sending, it sends its own FIN to A
    - A sends B an ACK of the FIN; this is the final packet in the exchange


- When closing a connection it is important to use `shutdown` instead of `close`
    - `close` just closes the connection and does not listen for further data
    - If the side that has not closed then tries to send data, the closed side may answer with an `RST`, which means that data in transit can be lost
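
The difference can be seen with Python's `socket` module. This is a local sketch in which `socket.socketpair()` stands in for two connected TCP endpoints, so no real network or FIN packets are involved; the helper name `read_until_eof` is an assumption.

```python
import socket


def read_until_eof(sock):
    """Collect bytes until the peer signals end-of-stream."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


a, b = socket.socketpair()    # local stand-in for a connected pair of endpoints
a.sendall(b"request")
a.shutdown(socket.SHUT_WR)    # a is done sending but can still receive
request = read_until_eof(b)   # b reads everything, then sees end-of-stream
b.sendall(b"late reply")      # b may keep sending after a's half-close
b.close()
reply = read_until_eof(a)     # a still receives the data sent after its shutdown
a.close()
```

Had `a` called `close()` instead of `shutdown(socket.SHUT_WR)`, it would no longer be able to read the late reply.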

## TIMEWAIT
- The TIMEWAIT state is entered by whichever side initiates the connection close
    - In the event of a simultaneous close both sides enter TIMEWAIT 
    - It is to last for a time $2 \ \times$ MSL
        - MSL is an agreed-upon value for the maximum lifetime on the Internet of an IP packet
        - Traditionally 60 seconds, though modern implementations often use 30 seconds
    - One function is to solve the external-old-duplicates problem
        - Requires that enough time has passed for old duplicates to disappear 
    - A second function of TIMEWAIT is to address the lost-final-ACK problem
        - TIMEWAIT only blocks reconnections for which both sides reuse the same port they used before

## TCP state diagram 
- The following is a **state diagram** for TCP
![tcp_state](img/tcp_state.png)
    - The blue arrows indicate the sequence of state transitions typically followed by the server
    - The brown arrows represent the client
    - Arrows are labeled with **event/action**
    - The ESTABLISHED state and the states below it are sometimes called the **synchronized** states
        - Since both sides have confirmed each other's ISN values

# The Web and HTTP
- A webpage consists of **objects**
    - An **object** is simply a file, such as an HTML file or an image
- HTTP uses TCP as its underlying transport protocol
    - The HTTP client first initiates a TCP connection
    - The client sends HTTP request messages into its socket interface and receives HTTP response messages from its socket interface. 
    - The server receives request messages from its socket interface and sends response messages into its socket interface
    - HTTP is said to be a **stateless** protocol because the server does not store any state about the different clients
- A **non-persistent connection** is where each request/response pair is sent over a separate TCP connection
- A **persistent connection** is where all request/response pairs are sent over the same TCP connection
    - This is the default mode in HTTP

## HTTP request message format
- There are two different types of HTTP messages, request messages and response messages
- Example of a typical HTTP request message
```
    GET /somedir/page.html HTTP/1.1
    Host: www.someschool.edu
    Connection: close
    User-agent: Mozilla/5.0
    Accept-language: fr
```
    - The first line of an HTTP request message is called the **request line** 
        - It has three fields: the method field, the URL field and the HTTP version field
        - The `HEAD` method is the same as `GET`, but the server leaves the requested object out of its response
    - The subsequent lines are called **header lines**
        - The header line starting with `Host:` specifies the host on which the object resides
        - By supplying `Connection: close` the client tells the server that it doesn't want to bother with a persistent connection
        - The `User-agent:` line specifies what browser type is making the request
        - The `Accept-language:` line tells the server that the client prefers the French version of the object
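
To make the request format concrete, here is a small sketch that splits the example request into its request line and header lines. `parse_request` is a hypothetical helper that handles only well-formed input; it is not a full HTTP parser.

```python
def parse_request(raw: str):
    """Split an HTTP request into method, URL, version and a dict of headers."""
    head = raw.split("\r\n\r\n", 1)[0]          # everything before the blank line
    lines = head.split("\r\n")
    method, url, version = lines[0].split(" ")  # the request line has three fields
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, url, version, headers


raw = ("GET /somedir/page.html HTTP/1.1\r\n"
       "Host: www.someschool.edu\r\n"
       "Connection: close\r\n"
       "User-agent: Mozilla/5.0\r\n"
       "Accept-language: fr\r\n"
       "\r\n")
method, url, version, headers = parse_request(raw)
```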

![http_request_header_format](img/http_request_header_format.png)

## HTTP Response Message
- Example of HTTP response message
```
    HTTP/1.1 200 OK
    Connection: close
    Date: Tue, 09 Aug 2011 15:44:04 GMT
    Server: Apache/2.2.3 (CentOS)
    Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
    Content-Length: 6821
    Content-Type: text/html
    
    (data data....)
```
    - The first line is the status line which has three fields: the protocol version, the status code and the corresponding status message
    - The other lines before the data are the header lines
    - The `Connection: close` line tells the client that the server is going to close the connection after sending this message
    - The `Date:` line indicates the time and date at which the HTTP response message was created
    - The `Server:` line indicates which server software created the message
    - The `Last-Modified:` line indicates the time and date the object was created or last modified
    - The `Content-Length:` line indicates the number of bytes in the object being sent
    - The `Content-Type:` line indicates the object's type

![http_response_header_format](img/http_response_header_format.png)

## Cookies 
- Cookies allow sites to keep track of users
- Example of a cookie header line when a new cookie is made: `Set-cookie: 1678`
- Example of a cookie in a normal request: `Cookie: 1678`
- Cookie technology has four components
    1. A cookie header line in the HTTP response message
    2. A cookie header line in the HTTP request message
    3. A cookie file kept on the user's end system and managed by the user's browser 
    4. A back-end database at the Web site. 

## Web-caching
- A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of an origin Web server.
    - It has its own disk storage and keeps copies of recently requested objects in this storage.
- The browser interacts with the Web cache in the following way
    1. The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.
    2. The Web cache checks to see if it has a copy of the object stored locally. 
        - If it does, the Web cache returns the object within an HTTP response message to the client browser.
    3. If the Web cache does not have the object, the Web cache opens a TCP connection to the origin server. The Web cache then sends an HTTP request for the object into the cache-to-server TCP connection. 
        - After receiving this request, the origin server sends the object within an HTTP response to the Web cache.
    4. When the Web cache receives the object, it stores a copy in its local storage and sends a copy, within an HTTP response message, to the client browser 
        - This is done over the existing TCP connection between the client browser and the Web cache.
- A Web cache is typically purchased and installed by an ISP
- A Web cache can substantially reduce traffic on an institution’s access link to the Internet.
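
The four steps above amount to a look-aside cache. In this sketch the network is replaced by a plain function call: `WebCache`, `fetch_from_origin` and the `origin` stub are hypothetical, and expiry of stored objects is ignored.

```python
class WebCache:
    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> object bytes
        self.store = {}                             # local copies of requested objects
        self.hits = 0
        self.misses = 0

    def get(self, url):
        """Serve from local storage if possible, otherwise fetch, store and serve."""
        if url in self.store:
            self.hits += 1
            return self.store[url]
        self.misses += 1
        obj = self.fetch_from_origin(url)  # steps 3-4: ask the origin server
        self.store[url] = obj
        return obj


origin_requests = []


def origin(url):
    """Stand-in for the origin server; records every request it receives."""
    origin_requests.append(url)
    return b"<html>" + url.encode("ascii") + b"</html>"


cache = WebCache(origin)
first = cache.get("/index.html")
second = cache.get("/index.html")
```

Only the first request reaches the origin server, which is where the saving on the access link comes from.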

## The conditional GET
- The mechanism **conditional GET** allows the web cache to verify that the objects are up to date
- An HTTP request is a conditional GET if 
    1. The request message uses the GET method
    2. The request message includes an `If-Modified-Since:` header line.

# FTP
- FTP is used to transfer files between a local system and a remote system
    - The user first provides the host name of the remote host
        - This causes the FTP client to establish a TCP connection with the FTP server process on the remote host
    - The user then provides the user identification and password
        - These are sent over the TCP connection as part of the FTP commands
    - Once the server has authorized the user, the user copies one or more files stored in the local file system into the remote file system (or vice versa).
    

- FTP uses two parallel TCP connections to transfer a file: a **control connection** and a **data connection**
    - The control connection is used for sending control information between the two hosts
        - Information such as user identification, password, commands to change remote directory, and commands to "put" and "get" files. 
    - The data connection is used to actually send a file. 
    
    
- Since FTP uses a separate control connection, FTP is said to send its control information **out-of-band**
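    - HTTP, by contrast, sends its request and response header lines over the same TCP connection that carries the transferred file, so it is said to send its control information **in-band**.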
    
    
- The client side of FTP first initiates a control TCP connection with the server side (remote host) on server port number 21.
    - The client side of FTP sends the user identification and password over this control connection
    - The client also sends commands to change the directory through this connection
    
    
- When the server side receives a command for a file transfer over the control connection, the server side initiates a TCP data connection to the client side.
    - FTP sends exactly one file over the data connection and then closes the data connection.
    - If, during the same session, the user wants to transfer another file, FTP opens another data connection
    
    
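- Throughout a session, the FTP server maintains **state** about the user, such as the user account associated with the control connection and the user's current remote directory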

## FTP commands and Replies 
- The commands, from client to server, and replies, from server to client, are sent across the control connection in 7-bit ASCII format. 
- Common FTP commands
    - `USER username`: Used to send user identification to the server
    - `PASS password`: Used to send the user password to the server
    - `LIST`: Used to ask the server to send a list of all the files in the current directory 
        - The list of files is sent over a new, non-persistent data connection
    - `RETR filename`: Used to retrieve (get) a file from the current directory on the remote server
    - `STOR filename`: Used to store (put) a file into the current directory on the remote server


- Each command is followed by a corresponding reply from the server to the client, which consists of a number with an optional message. Here are some examples
    - `331 Username OK, password required`
    - `125 Data connection already open; transfer starting`
    - `425 Can't open data connection`
    - `452 Error writing file`
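
Because every reply is a three-digit number followed by an optional message, a client can split the two apart with very little code. The sketch below is an assumption-laden illustration: `parse_ftp_reply` and `is_failure` are hypothetical helpers, and multi-line replies are not handled. In FTP the first digit of the code gives the outcome, with 4xx and 5xx meaning the command failed.

```python
from typing import Tuple


def parse_ftp_reply(line: str) -> Tuple[int, str]:
    """Split a single-line FTP reply into (code, message)."""
    code, _, message = line.strip().partition(" ")
    return int(code), message


def is_failure(code: int) -> bool:
    # 4xx = transient failure, 5xx = permanent failure.
    return code >= 400


code, message = parse_ftp_reply("331 Username OK, password required")
```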

# Electronic Mail in the Internet
- Internet mail has three major components: **user agents**, **mail servers** and **Simple Mail Transfer Protocol (SMTP)**
- User agents allow the user to read, reply to, forward, save, and compose messages
    - When the user is done composing an e-mail, the user agent sends it to the user's mail server, where it is placed in the mail server's outgoing message queue
    
    
- Mail servers form the core of the e-mail infrastructure. 
    - Each user has a mailbox located in one of the mail servers
    - Each user's mailbox manages and maintains the messages that have been sent to that user
    - A message starts in the sender's user agent, travels to the sender's mail server, then to the recipient's mail server, where it is deposited in the recipient's mailbox
    - When a user wants to access messages in the mailbox, the mail server authenticates the user
        - Done using usernames and passwords
    - A mail server must also deal with failures in other mail servers
    - If a mail server cannot deliver mail to another mail server, it holds the message in a message queue and attempts to transfer it later
        - Reattempts are often made every 30 minutes or so
        - If there is no success after several days, the server removes the message and notifies the sender with an e-mail message
        
        
- SMTP is the principal application-layer protocol for Internet electronic mail. 
    - It uses TCP to transfer mail from the sender's mail server to the recipient's mail server
    - SMTP has two sides
        1. A client side, which executes on the sender's mail server
        2. A server side, which executes on the recipient's mail server
    - Both the client and the server side run on every mail server
    - When a mail server sends mail to other servers it acts as SMTP client
    - When a mail server receives mail from other servers it acts as SMTP server 

## SMTP
- SMTP restricts the body (not just the headers) of all mail messages to simple 7-bit ASCII. 
- SMTP does not normally use intermediate mail servers for sending mail; instead, a direct TCP connection is established between the sender's mail server and the recipient's mail server
- How SMTP transfers a message from a sending mail server to a receiving mail server
    1. The client SMTP has TCP establish a connection to port 25 at the server SMTP
        - If the server is down the client tries again later
    2. The server and client perform some application-layer handshaking 
        - SMTP clients and servers introduce themselves before transferring information
        - The SMTP client indicates the e-mail address of the sender and the e-mail address of the recipient
    3. The client sends the message
        - The client repeats this process over the same TCP connection if it has other messages to send to the server
        
        
- Example where `S` is server and `C` is client
```
    S: 220 hamburger.edu
    C: HELO crepes.fr
    S: 250 Hello crepes.fr, pleased to meet you
    C: MAIL FROM: <alice@crepes.fr>
    S: 250 alice@crepes.fr ... Sender ok
    C: RCPT TO: <bob@hamburger.edu>
    S: 250 bob@hamburger.edu ... Recipient ok
    C: DATA
    S: 354 Enter mail, end with "." on a line by itself
    C: Do you like ketchup?
    C: How about pickles?
    C: .
    S: 250 Message accepted for delivery
    C: QUIT
    S: 221 hamburger.edu closing connection
```
    - If the client has more messages to send, it begins each new message with a new `MAIL FROM: crepes.fr` and issues `QUIT` only after all messages have been sent
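
The commands the client issues in the dialogue above follow a fixed order, which a short sketch can make explicit. `smtp_client_lines` is a hypothetical helper that only produces the lines the client would send; it does not open a connection, does not read the server's replies, and does not escape body lines that begin with a period.

```python
from typing import List


def smtp_client_lines(
    helo_host: str, sender: str, recipient: str, body_lines: List[str]
) -> List[str]:
    """Return the lines an SMTP client sends for one message, in order."""
    lines = [
        f"HELO {helo_host}",          # introduce the client
        f"MAIL FROM: <{sender}>",     # sender's e-mail address
        f"RCPT TO: <{recipient}>",    # recipient's e-mail address
        "DATA",                       # announce the message body
    ]
    lines.extend(body_lines)
    lines.append(".")                 # a lone period ends the message
    lines.append("QUIT")
    return lines


client_lines = smtp_client_lines(
    "crepes.fr",
    "alice@crepes.fr",
    "bob@hamburger.edu",
    ["Do you like ketchup?", "How about pickles?"],
)
```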

## SMTP header
- Example of a mail message header
```
    From: alice@crepes.fr
    To: bob@hamburger.edu
    Subject: Searching for the meaning of life.
```
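    - Every header must have a `From:` line and a `To:` line
    - The `Subject:` line is optional
    - The header lines are part of the mail message itself and are distinct from the SMTP commands `MAIL FROM` and `RCPT TO`, which belong to the SMTP handshake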
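
Since the header lines travel inside the message, a receiver can separate them from the body by looking for the first blank line. A minimal sketch using Python's standard `email` package; the message text is assembled here from the examples in these notes and is not output from a real mail server.

```python
from email import message_from_string

# Header lines, a blank line, then the body.
raw_message = (
    "From: alice@crepes.fr\n"
    "To: bob@hamburger.edu\n"
    "Subject: Searching for the meaning of life.\n"
    "\n"
    "Do you like ketchup?\n"
)

msg = message_from_string(raw_message)
sender = msg["From"]
recipient = msg["To"]
subject = msg["Subject"]
body = msg.get_payload()
```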

## Mail Access Protocols
- SMTP cannot be used to retrieve mail from the server, since SMTP is a push protocol and retrieving mail is a pull operation
- There are a number of popular mail access protocols, including Post Office Protocol—Version 3 (POP3), Internet Mail Access Protocol (IMAP), and HTTP. 
![mail_access](img/mail_access.png)

- When the user agent is web-based (an ordinary browser), it sends messages to and retrieves messages from its mail server using HTTP

### POP3
- POP3 is an extremely simple mail access protocol
    - It has limited functionality
    
    
- POP3 begins when the user agent opens a TCP connection to the mail server on port 110
- Once the connection has been established, POP3 progresses through three phases: authorization, transaction, and update
    - **Authorization**: the user agent sends a username and a password to authenticate the user
    - **Transaction**: the user agent retrieves messages and the user can
        - mark messages for deletion
        - remove deletion marks
        - obtain mail statistics
    - **Update**: the mail server deletes the messages that were marked for deletion
        - Occurs after the client has issued the quit command which ends the session


- In a POP3 transaction, the user agent issues commands and the server responds to each command with one of two possible replies
    - `+OK` which indicates that the previous command was fine 
        - It is sometimes followed by server-to-client data
    - `-ERR` which indicates that something was wrong with the previous command
        
        
- The authorization phase has two principal commands
    - `user <username>`
    - `pass <password>`


- Example of authorization phase on mail server
```
telnet mailServer 110
+OK POP3 server ready
user bob
+OK
pass hungry
+OK user successfully logged on
```

- During the transaction phase the user agent can often be configured to *"download and delete"* or *"download and keep"* 
    - In the download-and-delete mode, the user agent will issue the `list`, `retr`, and `dele` commands.
        - Since this mode deletes the messages from the server after they are downloaded, the user cannot read them again from another computer
    - In the download-and-keep mode, the user agent leaves the messages on the server after downloading them
 
 
- Example transaction phase using *download and delete*
```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off
```

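- A POP3 server maintains some state during a session, such as which messages are marked for deletion, but it does not carry state information across sessions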
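
The split between the transaction and update phases can be modelled with a small class: deletions are only marks until the client quits. This is a sketch of the server-side bookkeeping, not a protocol implementation. `Pop3Mailbox` and its method names are hypothetical, although they mirror the POP3 commands `list`, `retr`, `dele`, `rset` (which removes deletion marks), and `quit`.

```python
from typing import Dict, List, Set, Tuple


class Pop3Mailbox:
    """Toy model of one POP3 session's server-side state."""

    def __init__(self, messages: Dict[int, str]) -> None:
        self._messages = dict(messages)
        self._marked: Set[int] = set()   # session state: deletion marks

    def list_messages(self) -> List[Tuple[int, int]]:
        # (message number, size in bytes) for every unmarked message.
        return [
            (number, len(text.encode()))
            for number, text in sorted(self._messages.items())
            if number not in self._marked
        ]

    def retr(self, number: int) -> str:
        if number in self._marked or number not in self._messages:
            raise KeyError(f"no such message: {number}")
        return self._messages[number]

    def dele(self, number: int) -> None:
        # Transaction phase: only mark, do not delete yet.
        self._marked.add(number)

    def rset(self) -> None:
        # Remove all deletion marks.
        self._marked.clear()

    def quit(self) -> None:
        # Update phase: marked messages are actually removed.
        for number in self._marked:
            del self._messages[number]
        self._marked.clear()


mailbox = Pop3Mailbox({1: "hello", 2: "hi"})
mailbox.dele(1)
listing_before_quit = mailbox.list_messages()
mailbox.quit()
listing_after_quit = mailbox.list_messages()
```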

### IMAP
- It is not possible to maintain a folder hierarchy on a remote server using POP3
    - This and other problems are solved by the more complex IMAP
    
    
- An IMAP server associates each message with a folder
    - When a message first arrives at the server, it is associated with the recipient's INBOX folder
    - The recipient can move a message into a new, user-created folder, read the message, delete the message, and so on. 
    

- The IMAP protocol provides commands that allow users to create folders and move messages from one folder to another
    - It also provides commands that allow users to search remote folders for messages matching specific criteria


- An IMAP server maintains user state information across IMAP sessions 
    - Such as the names of folders and which messages are associated with which folder
    

- IMAP has commands that permit the user agent to obtain components of messages
    - A user agent can obtain just the header of a message or just one part of a multipart MIME message.
    - This is useful over a low-bandwidth connection, where the user may not want to download every message in full


# DNS 
- The task of DNS is to translate hostnames to IP addresses 
    - DNS is a distributed database implemented in a hierarchy of DNS servers
    - DNS is an application-layer protocol that allows hosts to query the distributed database


- DNS servers are often UNIX machines running the Berkeley Internet Name Domain (BIND) software.
    - The DNS protocol runs over UDP and uses port 53
    - It is commonly employed by other application-layer protocols, such as HTTP, SMTP, and FTP, to translate hostnames to IP addresses


- Example: how a browser running on a user's host obtains the IP address of www.someschool.edu
    1. The user's machine also runs the client side of the DNS application.
    2. The browser extracts the hostname, www.someschool.edu, from the URL and passes the hostname to the client side of the DNS application.
    3. The DNS client sends a query containing the hostname to a DNS server.
    4. The DNS client eventually receives a reply, which includes the IP address for the hostname.
    5. Once the browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address. 
    

- DNS provides a few other important services in addition to translating hostnames to IP addresses:
    - **Host aliasing**: A host with a complicated hostname can have one or more alias names. 
        - The original hostname is said to be the **canonical hostname**
    - **Mail server aliasing**: DNS can be invoked by a mail application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host. 
    - **Load distribution**: DNS is also used to perform load distribution among replicated servers, such as replicated Web servers
        - For replicated servers, a set of IP addresses is associated with one canonical hostname
        - When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply. 
        - DNS rotation is also used for e-mail so that multiple mail servers can have the same alias name
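
The rotation used for load distribution is simple to sketch: every reply carries the full address set, shifted by one position relative to the previous reply. `RotatingAddressSet` is a hypothetical name and the addresses are taken from a documentation-only range; clients that pick the first address in each reply end up spread across the replicas.

```python
from collections import deque
from typing import Deque, List


class RotatingAddressSet:
    """All addresses for one hostname, rotated on every reply."""

    def __init__(self, addresses: List[str]) -> None:
        self._addresses: Deque[str] = deque(addresses)

    def reply(self) -> List[str]:
        answer = list(self._addresses)
        # Shift left so the next reply starts with the next address.
        self._addresses.rotate(-1)
        return answer


replicas = RotatingAddressSet(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first_reply = replicas.reply()
second_reply = replicas.reply()
```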

## How DNS works
- To translate a hostname to an IP address, the application invokes the client side of DNS
    - It specifies the hostname that needs to be translated
        - On many UNIX-based machines, `gethostbyname()` is the function an application calls to perform the translation
    - All DNS query and reply messages are sent within UDP datagrams to port 53
    - DNS in the user’s host takes over, sending a query message into the network.
    - After a delay ranging from milliseconds to seconds, DNS in the user's host receives a DNS reply message that provides the desired mapping
        - This is passed to the invoking application
        
        
- From the perspective of the invoking application, DNS is a black box providing a simple, straightforward translation service
    - In reality, the service is complex, consisting of a large number of DNS servers distributed around the globe and an application-layer protocol that specifies how the DNS servers and querying hosts communicate


- Problems with a simple centralized design, in which a single DNS server holds all the mappings, include
    - **A single point of failure**: If the DNS server crashes, so does the entire Internet
    - **Traffic volume**: A single DNS server would have to handle all DNS queries 
    - **Distant centralized database**: A single DNS server cannot be "close to" all the querying clients
        - This can lead to significant delays
    - **Maintenance**: A single DNS server would have to keep records for all Internet hosts
        - It would not only have to be huge, but also have to be updated frequently


### A Distributed, Hierarchical Database
- DNS uses a large number of servers to deal with the issue of scale
    - They are organized in a hierarchical fashion and distributed around the world
    - No single DNS server has all of the mappings for all the hosts in the Internet
    - Mappings are distributed across the DNS servers


- There are three classes of DNS servers
    - **Root DNS servers**: There are 13 root DNS servers in the Internet
        - Each "server" is actually a network of replicated servers, for both security and reliability purposes
    - **Top-level-domain (TLD) servers**: These servers are responsible for top-level domains such as com, org, net, edu and gov, and all of the country top-level domains
    - **Authoritative DNS servers**: Every organization with publicly accessible hosts on the Internet, such as Web servers and mail servers, must provide publicly accessible DNS records that map the names of those hosts to IP addresses.
        - An organization's authoritative DNS server houses these DNS records
        - An organization can choose to implement its own authoritative DNS server to hold these records
            - Alternatively, the organization can pay to have these records stored in an authoritative DNS server of some service provider


- A **local DNS server** does not strictly belong to the hierarchy of servers but is nevertheless central to the DNS architecture.
    - Each ISP has a local DNS server
        - Such as a university, an academic department, an employee’s company, or a residential ISP
        - Also called a default name server
    - When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers
    - A host's local DNS server is typically "close to" the host
        - For an institutional ISP the DNS server may be on the same LAN
        - For a residential ISP it is typically separated from the host by no more than a few routers
    - When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy
        - It forwards the query into the DNS server hierarchy
        
   
- A query can be iterative or recursive
    - If it is **iterative**, the queried server replies with a referral instead of the final answer: a root DNS server returns the IP address of a TLD server, and a TLD server returns the IP address of an authoritative DNS server; it is the querier's job to contact each of them in turn
    - If it is **recursive**, the queried server does all the work of contacting the correct servers on the querier's behalf and returns the final IP address
    - The query from the host to its local DNS server is typically recursive, and the remaining queries are iterative
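
The typical mix of one recursive query followed by iterative ones can be sketched with three small lookup tables standing in for the server classes. All server names and the address below are made up for illustration, and the dictionaries replace what would really be UDP queries. The host calls `host_lookup` once (recursive from its point of view), while the local DNS server walks the hierarchy itself (iterative).

```python
from typing import Dict, List, Tuple

# Hypothetical tables: each maps a name to the next server or final address.
ROOT: Dict[str, str] = {"edu": "tld-edu"}
TLD: Dict[str, Dict[str, str]] = {
    "tld-edu": {"someschool.edu": "dns.someschool.edu"}
}
AUTHORITATIVE: Dict[str, Dict[str, str]] = {
    "dns.someschool.edu": {"www.someschool.edu": "192.0.2.10"}
}


def local_dns_resolve(hostname: str) -> Tuple[str, List[str]]:
    """Iteratively follow referrals; return (address, servers contacted)."""
    labels = hostname.split(".")
    contacted = ["root"]
    tld_server = ROOT[labels[-1]]              # root refers us to a TLD server
    contacted.append(tld_server)
    domain = ".".join(labels[-2:])
    auth_server = TLD[tld_server][domain]      # TLD refers us to an authoritative server
    contacted.append(auth_server)
    address = AUTHORITATIVE[auth_server][hostname]
    return address, contacted


def host_lookup(hostname: str) -> str:
    # The host asks once and gets the final answer back.
    address, _ = local_dns_resolve(hostname)
    return address


address, contacted = local_dns_resolve("www.someschool.edu")
```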

### DNS caching
- When a DNS server receives a DNS reply, it can cache the mapping contained in the reply in its local memory
    - If a hostname/IP address pair is cached in a DNS server and another query arrives for the same hostname, the DNS server can provide the desired IP address, even if it is not authoritative for the hostname
    - DNS servers discard cached information after a period of time, because mappings between hostnames and IP addresses are not permanent
        - Often two days
    
    
- A local DNS server can also cache the IP addresses of TLD servers, thereby allowing the local DNS server to bypass the root DNS servers in a query chain
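
A cache that discards entries after a fixed lifetime might look like the sketch below. `DnsCache` is a hypothetical class; the clock is injectable so that expiry can be demonstrated without waiting, and the two-day lifetime (172800 seconds) is only the typical figure mentioned above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DnsCache:
    """Hostname-to-address cache whose entries expire."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # hostname -> (address, time at which the entry expires)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, hostname: str, address: str, ttl_seconds: float) -> None:
        self._entries[hostname] = (address, self._clock() + ttl_seconds)

    def get(self, hostname: str) -> Optional[str]:
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Lifetime is over: discard the mapping.
            del self._entries[hostname]
            return None
        return address


now = [0.0]                                   # fake clock for the demonstration
dns_cache = DnsCache(clock=lambda: now[0])
dns_cache.put("www.someschool.edu", "192.0.2.10", ttl_seconds=172800)
fresh = dns_cache.get("www.someschool.edu")
now[0] = 172800.0                             # two days later
stale = dns_cache.get("www.someschool.edu")
```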

## DNS Records
- The DNS servers that implement the DNS distributed database store resource records (RRs)
    - Including RRs that provide hostname-to-IP address mappings
    - Each DNS reply message carries one or more resource records


- A resource record is a four-tuple that contains the following fields: `(Name, Value, Type, TTL)`
    - TTL is the time to live of the resource record
        - Determines when the record should be removed from a cache
        - The example records below omit the TTL field
    - The meaning of `Name` and `Value` depends on `Type`:
        - If `Type=A` the `Name` is a hostname and `Value` is the IP address for the hostname
            - Provides standard hostname-to-IP address mapping
            - Example: `(relay1.bar.foo.com, 145.37.93.126, A)`
        - If `Type=NS`, `Name` is a domain and `Value` is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in the domain
            - Used to route DNS queries further along in the query chain
            - Example: `(foo.com, dns.foo.com, NS)`
        - If `Type=CNAME`, `Value` is the canonical hostname for the alias hostname `Name`
            - This record can provide querying hosts the canonical name for a hostname.
            - Example: `(foo.com, relay1.bar.foo.com, CNAME)`
        - If `Type=MX`, `Value` is the canonical name of a mail server that has an alias hostname `Name`
            - Example: `(foo.com, mail.bar.foo.com, MX)`
            - MX records allow the hostnames of mail servers to have simple aliases


- If a DNS server is authoritative for a particular hostname, then the DNS server will contain a Type A record for the hostname.
    - Even if it is not authoritative, it may contain a Type A record in its cache
    
    
- If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname
    - It will also contain a Type A record that provides the IP address of the DNS server in the Value field of the NS record.
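
To see how the record types work together, the example records from this section can be placed in a list and queried by name and type. This is a sketch only: the `ResourceRecord` class and both functions are hypothetical, the TTL values are invented, and the address `192.0.2.25` for the mail server is an assumption added so that the MX example can be resolved.

```python
from typing import List, NamedTuple, Optional


class ResourceRecord(NamedTuple):
    name: str
    value: str
    rtype: str
    ttl: int


RECORDS = [
    ResourceRecord("relay1.bar.foo.com", "145.37.93.126", "A", 172800),
    ResourceRecord("foo.com", "dns.foo.com", "NS", 172800),
    ResourceRecord("foo.com", "relay1.bar.foo.com", "CNAME", 172800),
    ResourceRecord("foo.com", "mail.bar.foo.com", "MX", 172800),
    ResourceRecord("mail.bar.foo.com", "192.0.2.25", "A", 172800),  # assumed
]


def lookup(name: str, rtype: str) -> List[str]:
    """Return the Value of every record matching Name and Type."""
    return [r.value for r in RECORDS if r.name == name and r.rtype == rtype]


def resolve_address(name: str) -> Optional[str]:
    """Find an IP address, following one CNAME alias if necessary."""
    addresses = lookup(name, "A")
    if addresses:
        return addresses[0]
    for canonical in lookup(name, "CNAME"):
        addresses = lookup(canonical, "A")
        if addresses:
            return addresses[0]
    return None


web_address = resolve_address("foo.com")
mail_hosts = lookup("foo.com", "MX")
```

A mail client would query the `MX` type for `foo.com`, while other clients would follow the `CNAME`, which is how one alias can serve both purposes.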

## DNS messages
- There are only two kinds of DNS messages: query messages and reply messages
    - They have the same format
    

- The semantics of the DNS messages are as follows
![dns_message_format](img/dns_message_format.png)
    - The first 12 bytes are the header section, which has a number of fields
        - The first field is a 16-bit number that identifies the query
            - This identifier is copied into the reply message to a query, which allows the client to match received replies with sent queries
        - There are a number of flags in the flags field
            - A 1-bit query/reply flag indicates whether the message is a query (0) or a reply (1)
            - A 1-bit authoritative flag is set in a reply message when the DNS server is an authoritative server for the queried name
            - A 1-bit recursion-desired flag is set when a client desires that the DNS server perform recursion when it doesn't have the record
            - A 1-bit recursion-available flag is set in a reply if the DNS server supports recursion
        - There are also four number-of fields, which indicate the number of occurrences of the four types of data sections that follow the header
    - The question section contains information about the query that is being made and it includes
        1. A name field that contains the name that is being queried
        2. A type field that indicates the type of question being asked about the name, for example Type A or Type MX
    - In a reply the answer section contains the resource records for the name that was originally queried 
        - A reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses.
    - The authority section contains records for other authoritative servers
    - The additional section contains other helpful records
        - For example, the answer section in a reply to an MX query contains a resource record providing the canonical hostname of a mail server.
        - The additional section then contains a Type A record providing the IP address for the canonical hostname of the mail server.
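
The 12-byte header is six 16-bit fields in network byte order, so it can be packed and unpacked with Python's `struct` module. The helper names below are hypothetical. The flag bit positions follow the DNS specification (RFC 1035), which also defines a recursion-available bit that a server sets in replies when it supports recursion.

```python
import struct
from typing import Dict

# Identification, flags, and the four number-of fields: 6 x 16 bits = 12 bytes.
HEADER = struct.Struct("!HHHHHH")

QR_REPLY = 0x8000             # query (0) or reply (1)
AUTHORITATIVE = 0x0400        # server is authoritative for the queried name
RECURSION_DESIRED = 0x0100    # client asks the server to recurse
RECURSION_AVAILABLE = 0x0080  # server supports recursion (set in replies)


def pack_header(
    ident: int, flags: int, questions: int,
    answers: int, authority: int, additional: int,
) -> bytes:
    return HEADER.pack(ident, flags, questions, answers, authority, additional)


def unpack_header(data: bytes) -> Dict[str, object]:
    ident, flags, questions, answers, authority, additional = HEADER.unpack(
        data[:HEADER.size]
    )
    return {
        "id": ident,
        "is_reply": bool(flags & QR_REPLY),
        "authoritative": bool(flags & AUTHORITATIVE),
        "recursion_desired": bool(flags & RECURSION_DESIRED),
        "recursion_available": bool(flags & RECURSION_AVAILABLE),
        "questions": questions,
        "answers": answers,
        "authority": authority,
        "additional": additional,
    }


# A query with one question that asks for recursion.
query_header = pack_header(0x1234, RECURSION_DESIRED, 1, 0, 0, 0)
decoded = unpack_header(query_header)
```

A reply to this query would reuse the identifier `0x1234`, which is how the client matches it to the outstanding query.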