## Integer Operations and Threads
Even a simple integer operation such as increment (++) involves several steps: fetching the value from memory, incrementing it in a register of the processor core, and writing the new value back to cache and main memory. A thread can therefore be interrupted partway through a single integer operation, so shared integers and the operations on them must be protected, for example with a lock.
There is another way: we can tell the compiler that the integer variable is atomic. The compiler then generates special instructions that perform the operation as one indivisible step, typically by preventing the variable from being served from prefetched data and by making the result visible immediately instead of leaving it in the store buffer. The result is a single integer operation that other threads cannot interleave with. It also takes longer than a plain operation, because these optimizations are disabled.
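
The first option, locking the shared integer, can be sketched as follows. This is a minimal illustration rather than a definitive pattern; the names locked_count, shared_total and total_mutex are made up for this example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

//Hypothetical helper: runs thread_count threads that each add 1 to a
//plain (non-atomic) int, iterations times, holding a mutex for every increment
int locked_count(int thread_count, int iterations)
{
    int shared_total = 0;       //Plain int shared by all the threads
    std::mutex total_mutex;     //Protects shared_total

    auto task = [&]
    {
        for(int i = 0; i < iterations; ++i)
        {
            std::lock_guard<std::mutex> guard(total_mutex);
            ++shared_total;     //Fetch, increment and store all happen under the lock
        }
    };

    std::vector<std::thread> threads;
    for(int i = 0; i < thread_count; ++i)
    {
        threads.emplace_back(task);
    }
    for(auto &thr : threads)
    {
        thr.join();
    }
    return shared_total;        //Safe to read, all the threads have finished
}
```

Because every increment happens while the mutex is held, no update is lost: locked_count(4, 10000) returns 40000.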

## Atomic Types
When we make a variable an atomic type, all operations on it are atomic: other threads cannot interleave during an operation. C++ defines an atomic template whose template parameter is the type of the object, for example std::atomic\<int\> x = 0;. Atomic variables should always be initialized explicitly (before C++20 a default-constructed atomic holds an indeterminate value). The type parameter must be trivially copyable, which in practice means a built-in type, or a compound type whose members are all trivially copyable and which has no user-defined copy operations or destructor. Normally integer types and pointers are made atomic. For more complex types the compiler is allowed to implement the atomic with a lock; to avoid this, make a pointer to the complex type atomic instead. Assigning to or reading from an atomic object is a single atomic operation, but threads can still interleave between two separate atomic operations.
```
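#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<int> counter{0};

//Each task increments the shared counter 100,000 times
void task()
{
    for(int i = 0; i < 100'000; ++i)
    {
        ++counter;    //Atomic read-modify-write, no lock needed
    }
}

int main()
{
    std::vector<std::thread> tasks;
    
    for(int i = 0; i < 10; i++)
    {
        tasks.push_back(std::thread(task));
    }
    
    for(auto &thr : tasks)
    {
        thr.join();
    }
    
    //No increments are lost, so this prints 1000000
    std::cout << counter << std::endl;
}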
```
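Whether an atomic type falls back to a lock can be checked at compile time. The sketch below assumes C++17 (for is_always_lock_free); sensor_block and always_lock_free are hypothetical names used only for illustration. On common desktop platforms the check is expected to be true for int and for a pointer to sensor_block, while the result for sensor_block itself is up to the implementation.

```cpp
#include <atomic>
#include <type_traits>

//Hypothetical compound type, used only for illustration
struct sensor_block
{
    double samples[16];
};

//std::atomic requires a trivially copyable type
static_assert(std::is_trivially_copyable<sensor_block>::value,
              "sensor_block must be trivially copyable");

//True when std::atomic<T> never needs a lock on this platform (C++17)
template <typename T>
constexpr bool always_lock_free()
{
    return std::atomic<T>::is_always_lock_free;
}

//A pointer to the complex type is the usual way to stay lock-free
std::atomic<sensor_block*> current_block{nullptr};
```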
The volatile keyword has no impact on threading. In C++, declaring a variable volatile means its value can change at any time, so compiler optimizations that assume the value is modified only by the program are disabled. It is typically used when accessing hardware. For example, a network card may be mapped to a certain memory location. The processor never writes to that location (the network card does), so the compiler could optimize repeated reads away and the program would see stale data. The volatile keyword removes this optimization.

## Atomic Operations
* store() - Atomically replaces the object's value with its argument.
* load() - Atomically returns the object's value.
* operator =() - Assignment operator, same as store().
* operator T() - Conversion operator, same as load().
* exchange() - Atomically replaces the object's value with its argument and returns the previous value.
* Atomic pointers also support pointer arithmetic: increment/decrement (++, --), compound assignment (+=, -=) and the fetch_add()/fetch_sub() member functions.
* Atomic integer variables support the atomic compound bitwise operators (&=, |= and ^=) in addition to the increment/decrement operations.
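
The operations listed above can be tried out in a short single-threaded sketch. The struct op_results and the function demo_atomic_operations are hypothetical names introduced only to collect the values for inspection.

```cpp
#include <atomic>
#include <cstddef>

//Hypothetical struct that collects the values seen at each step
struct op_results
{
    int loaded;
    int previous;
    int after_add;
    int after_mask;
    std::ptrdiff_t pointer_offset;
};

op_results demo_atomic_operations()
{
    std::atomic<int> value{0};

    value.store(5);                     //value is now 5
    int loaded = value.load();          //Reads 5
    int previous = value.exchange(8);   //Returns the old value 5, value is now 8

    value += 2;                         //Atomic compound addition, value is now 10
    int after_add = value;              //Conversion operator, same as load()

    value &= 6;                         //Atomic bitwise AND: 1010 & 0110 = 0010
    int after_mask = value.load();      //Reads 2

    static int data[4] = {10, 20, 30, 40};
    std::atomic<int*> cursor{data};
    ++cursor;                           //Now points at data[1]
    cursor += 2;                        //Now points at data[3]

    return {loaded, previous, after_add, after_mask, cursor.load() - data};
}
```

Each line is atomic on its own, but the sequence as a whole is not: another thread could modify value between any two of these statements.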

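There is also a std::atomic_flag class that can be used instead of std::atomic\<bool\>. std::atomic_flag is guaranteed to be lock-free, so it typically has less overhead and is faster. It has only a few operations, mainly clear() (sets the flag to false) and test_and_set() (sets the flag to true and returns the previous value). Unlike std::atomic, it cannot be copied or assigned. Before C++20 it must be initialized to false with the predefined macro ATOMIC_FLAG_INIT, as in std::atomic_flag lock = ATOMIC_FLAG_INIT; (since C++20 a default-constructed flag is already false).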

Because test_and_set() sets the flag to true and returns the previous value, we can use it to implement a spin lock. A spin lock is an alternative to a mutex or condition variable. It is essentially a loop that keeps spinning until a condition becomes true. Each thread calls test_and_set() in a loop. If it returns true, some other thread has already set the flag and is in the critical section, so the thread iterates again. If it returns false, this thread has set the flag, so it exits the loop and proceeds into the critical section. After the critical section, the thread sets the flag back to false.
```
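#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

using namespace std::literals;    //For the 50ms literal

std::atomic_flag flag = ATOMIC_FLAG_INIT;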

void task(int n)
{
    //Loop until we can set the flag
    while(flag.test_and_set())
    {
    }
    
    //Critical section
    std::this_thread::sleep_for(50ms);
    std::cout << "I'm a task with argument " << n << std::endl;
    
    //Clear the flag
    flag.clear();
}

int main()
{
    std::vector<std::thread> threads;
    
    for(int i = 0; i < 10; i++)
    {
        threads.push_back(std::thread(task, i));
    }
    
    for(auto &thr : threads)
    {
        thr.join();
    }
}
```
A spinning thread remains active (it is not blocked as it would be on a mutex). It can continue immediately when it gets the lock, whereas a thread blocked on a mutex has to wait until the scheduler wakes it up. But spinning is very processor-intensive (the loop keeps a core busy), so it is only suitable when the wait is expected to be very short (very low contention). Spin locks are usually used only in operating systems and libraries.
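
The same idea can be packaged as a small class. This is only a sketch: spin_lock and spin_locked_count are hypothetical names, and calling std::this_thread::yield() inside the loop is one assumed way to reduce the processor load while waiting, not something the text above prescribes. Giving the class lock() and unlock() members lets it work with std::lock_guard.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

//Hypothetical wrapper around std::atomic_flag
class spin_lock
{
public:
    void lock()
    {
        //Spin until test_and_set() returns false, meaning we set the flag
        while(m_flag.test_and_set())
        {
            std::this_thread::yield();    //Let other threads run while we wait
        }
    }

    void unlock()
    {
        m_flag.clear();    //Leave the critical section
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};

//Hypothetical helper: increments a plain int from several threads under the spin lock
int spin_locked_count(int thread_count, int iterations)
{
    spin_lock lock;
    int total = 0;

    auto task = [&]
    {
        for(int i = 0; i < iterations; ++i)
        {
            std::lock_guard<spin_lock> guard(lock);    //Calls lock() and unlock()
            ++total;
        }
    };

    std::vector<std::thread> threads;
    for(int i = 0; i < thread_count; ++i)
    {
        threads.emplace_back(task);
    }
    for(auto &thr : threads)
    {
        thr.join();
    }
    return total;
}
```

The critical section here is a single increment, which is the kind of very short wait where spinning is reasonable.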

## Lock-free Programming
In lock-free programming, threads still execute critical sections concurrently, but we try to eliminate data races without using locking facilities. Locks have a number of drawbacks:
* Race conditions can be caused by forgetting to lock, or by using the wrong mutex.
* Locking is not composable: if we lock one mutex and then lock another while still holding the first, we risk a deadlock.
* High overhead and time consuming.
* Lack of scalability caused by coarse-grained locking.
* Increased overhead caused by fine-grained locking.

For these reasons, some time-critical systems, such as real-time systems, try to use lock-free programming. But it is very difficult to write lock-free code that is both efficient and free of data races, so it is only attempted where performance is critical.  
In lock-free programming, shared data may have different values in different threads, and the value of a shared variable may change between an if statement and its body. We use a transactional model for lock-free programming, described by the acronym ACID.
* Atomic : A transaction either completes successfully(commit), or it fails and leaves everything as it was(rollback).
* Consistent : The shared data goes from one consistent state to another. As seen by other users, the shared data is never in an inconsistent state.
* Isolated : Two transactions can never work on the same data simultaneously.
* Durable : Once a transaction is committed, it will not be lost until the next transaction is committed.
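
The check-then-act hazard and the commit/rollback idea can be sketched with compare_exchange_weak(), a std::atomic member function that stores a new value only if the object still holds the expected one. The withdrawal scenario and the names try_withdraw and count_successful_withdrawals are hypothetical, chosen only to illustrate the pattern.

```cpp
#include <atomic>
#include <thread>
#include <vector>

//Hypothetical transaction: withdraw amount only if the balance covers it
bool try_withdraw(std::atomic<int> &balance, int amount)
{
    int current = balance.load();
    while(current >= amount)
    {
        //Commit only if balance still holds the value we checked.
        //On failure, current is refreshed with the latest value and we retry.
        if(balance.compare_exchange_weak(current, current - amount))
        {
            return true;
        }
    }
    return false;    //Not enough left, nothing was changed
}

//Hypothetical helper: several threads each attempt a number of withdrawals
int count_successful_withdrawals(std::atomic<int> &balance, int amount,
                                 int thread_count, int attempts)
{
    std::atomic<int> successes{0};

    auto task = [&]
    {
        for(int i = 0; i < attempts; ++i)
        {
            if(try_withdraw(balance, amount))
            {
                ++successes;
            }
        }
    };

    std::vector<std::thread> threads;
    for(int i = 0; i < thread_count; ++i)
    {
        threads.emplace_back(task);
    }
    for(auto &thr : threads)
    {
        thr.join();
    }
    return successes;
}
```

Starting from a balance of 50 and withdrawing 10 at a time, exactly 5 withdrawals succeed no matter how many threads compete, and the balance never goes negative.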

The only way to write lock-free programs is to use atomic instructions, which follow all the 'ACID' semantics. We also need to think carefully about how the threads interact.  
Below is an implementation of a simple lock-free queue. It is meant to be accessed by only two threads: a producer thread that inserts elements and a consumer thread that removes them. The code is designed so that the two threads always work on different parts of the queue. The queue uses two iterators, head and tail. The consumer thread does not modify the list; it removes an element by incrementing the head iterator. The producer inserts elements, updates the tail iterator and erases the elements already removed by the consumer. This means only the producer thread modifies the list, and the two threads never work on the same elements.
```
#include <list>
#include <thread>

template <typename T>
class lock_free_queue
{
public:
    lock_free_queue()
    {
        m_list.push_back(T());     //Create a dummy element, this will help head and tail not to overlap
        m_head = m_list.begin();
        m_tail = m_list.end();
    }
    
    //Called only by the consumer thread
    bool consume(T &t)
    {
        auto first = m_head;
        ++first;
        if(first != m_tail)
        {
            m_head = first;
            t = *m_head;
            return true;
        }
        return false;
    }
    
    //Called only by the producer thread
    void produce(const T& t)
    {
        m_list.push_back(t);
        m_tail = m_list.end();
        m_list.erase(m_list.begin(), m_head);    //Erase the elements which the consumer thread has marked for removal 
    }
    
private:
    std::list<T> m_list;
    typename std::list<T>::iterator m_head, m_tail;    //Plain iterators shared by both threads
};

int main()
{
    lock_free_queue<int> lfq;
    
    //The single producer thread inserts the values 0 to 9
    std::thread producer([&lfq]
    {
        for(int i = 0; i < 10; ++i)
        {
            lfq.produce(i);
        }
    });
    
    //The single consumer thread makes 10 attempts to remove a value
    std::thread consumer([&lfq]
    {
        int value = 0;
        for(int i = 0; i < 10; ++i)
        {
            lfq.consume(value);
        }
    });
    
    producer.join();
    consumer.join();
}
```
Still, this will not give the desired results. Even though the list is modified only by the producer thread, the shared variables m_head and m_tail are read and written by both threads at the same time, which is a data race.
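
One way to remove this race is to make the shared positions atomic. The sketch below is not a fix of the list-based class above: it swaps the std::list for a fixed-size ring buffer so that head and tail become plain indexes that fit in std::atomic. The names spsc_ring_queue and transfer_sum are hypothetical, and the queue is still valid only for exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>

//Hypothetical single-producer, single-consumer queue, holds up to Capacity - 1 elements
template <typename T, std::size_t Capacity>
class spsc_ring_queue
{
public:
    //Called only by the producer thread, returns false when the queue is full
    bool produce(const T &value)
    {
        const std::size_t tail = m_tail.load();
        const std::size_t next = (tail + 1) % Capacity;
        if(next == m_head.load())
        {
            return false;
        }
        m_slots[tail] = value;    //Write the element first
        m_tail.store(next);       //Then publish it to the consumer
        return true;
    }

    //Called only by the consumer thread, returns false when the queue is empty
    bool consume(T &out)
    {
        const std::size_t head = m_head.load();
        if(head == m_tail.load())
        {
            return false;
        }
        out = m_slots[head];                  //Read the element first
        m_head.store((head + 1) % Capacity);  //Then hand the slot back to the producer
        return true;
    }

private:
    std::array<T, Capacity> m_slots{};
    std::atomic<std::size_t> m_head{0};    //Written only by the consumer
    std::atomic<std::size_t> m_tail{0};    //Written only by the producer
};

//Hypothetical helper: sends 1..count through the queue and returns the sum received
long transfer_sum(int count)
{
    spsc_ring_queue<int, 8> queue;
    long sum = 0;

    std::thread producer([&]
    {
        for(int i = 1; i <= count; ++i)
        {
            while(!queue.produce(i))
            {
                std::this_thread::yield();    //Queue full, wait for the consumer
            }
        }
    });

    std::thread consumer([&]
    {
        int value = 0;
        for(int received = 0; received < count; )
        {
            if(queue.consume(value))
            {
                sum += value;
                ++received;
            }
            else
            {
                std::this_thread::yield();    //Queue empty, wait for the producer
            }
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

Each index is written by only one thread and read by the other through atomic loads, so neither thread can observe a half-updated position. Sending the values 1 to 1000 through the queue gives a sum of 500500.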