A data structure holds multiple elements. When multiple threads access those elements, the accesses can conflict, so locks or atomic operations may be needed. Modifying one part of a data structure can also affect other parts. In a linked list, adding or removing an element modifies the neighbouring nodes. In a vector, string, or other dynamic array, adding or removing an element in the middle moves the following elements in memory, and adding an element may cause the whole block to be reallocated. If other threads access those elements while this happens, their pointers and references may dangle and their iterators may be invalidated.  
STL containers behave like ordinary "memory objects": concurrent reads of the same container object from different threads are safe, and a single writer with no concurrent readers or writers is also safe, but concurrent reads and writes to the same container are not safe without synchronization.
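
As a minimal sketch of the rule above, the following guards every access to one std::vector with a std::mutex while several threads append to it. The helper name `fill_concurrently` and its parameters are made up for this illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: several threads append to one vector under a mutex.
std::vector<int> fill_concurrently(int thread_count, int per_thread)
{
    std::vector<int> values;
    std::mutex values_mutex; // guards every access to `values`

    auto writer = [&](int id) {
        for (int i = 0; i < per_thread; ++i)
        {
            // push_back may reallocate the block, so it must never run
            // concurrently with another read or write of the same vector.
            std::lock_guard<std::mutex> lock(values_mutex);
            values.push_back(id);
        }
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < thread_count; ++id)
    {
        threads.emplace_back(writer, id);
    }
    for (std::thread& t : threads)
    {
        t.join();
    }
    return values; // all writers have finished, no lock needed here
}
```

Calling `fill_concurrently(4, 1000)` from main should return a vector with 4000 elements. Without the lock the program would have a data race.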

## std::shared_ptr
Different shared_ptr instances can share ownership of the same memory location, tracked with reference counting. When a shared_ptr is copied or assigned, no new memory is allocated for the data; only the reference counter is incremented. When a copy is destroyed, the counter is decremented. When the last copy is destroyed, the counter reaches zero and the allocated memory is released.
```
//Pass a pointer to the constructor
std::shared_ptr<int> ptr1(new int(42));

//Calling std::make_shared is usually better: the data and the reference counter are allocated together, so they sit close in memory
std::shared_ptr<int> ptr2 = std::make_shared<int>(42);
```
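The counting described above can be observed with use_count(). This is a small sketch; the `counts` struct and the `observe_use_count` helper are made-up names for illustration. From main, `observe_use_count()` should report 1, 2, and 1 for the three fields.

```cpp
#include <memory>

// Hypothetical record of the use counts seen at three points in time.
struct counts
{
    long after_create;
    long after_copy;
    long after_scope;
};

counts observe_use_count()
{
    counts c{};
    std::shared_ptr<int> first = std::make_shared<int>(42);
    c.after_create = first.use_count();      // one owner

    {
        std::shared_ptr<int> second = first; // copy: counter incremented, no new int allocated
        c.after_copy = first.use_count();    // two owners
    }                                        // second destroyed: counter decremented

    c.after_scope = first.use_count();       // back to one owner
    return c;
}
```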
std::unique_ptr with the default deleter has essentially the same overhead as a traditional pointer. std::shared_ptr has more overhead, so use it only when necessary.
When std::shared_ptr is used from multiple threads, both its reference counter and the data it points to can be subject to conflicting accesses. The reference counter is updated atomically, so it is safe to copy, move, or assign different shared_ptr instances that share the same object in a multithreaded program. Synchronizing access to the pointed-to data is the programmer's responsibility, and so is synchronizing access to a single shared_ptr instance that several threads modify; for the latter, C++20 provides std::atomic\<std::shared_ptr\>.
```
#include <memory>
#include <mutex>
#include <thread>

std::shared_ptr<int> shptr = std::make_shared<int>(42);

//Mutex to protect std::shared_ptr's data
std::mutex mut;

void func1()
{
    std::lock_guard<std::mutex> lgd(mut);
    *shptr = 5;
}

void func2()
{
    std::lock_guard<std::mutex> lgd(mut);
    *shptr = 7;
}

int main()
{
    std::thread thr1(func1);
    std::thread thr2(func2);
    
    thr1.join();
    thr2.join();
}
```
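
The example above protects the pointed-to data. As a complementary sketch, the following shows the part that needs no mutex: several threads copy the same shared_ptr (a read-only operation on the source) and the atomic reference counter keeps the count consistent. The helper name `copy_from_many_threads` is an assumption for this illustration.

```cpp
#include <memory>
#include <thread>
#include <vector>

// Hypothetical helper: many threads copy one shared_ptr without any lock.
long copy_from_many_threads(int thread_count, int copies_per_thread)
{
    const std::shared_ptr<int> shared = std::make_shared<int>(42);

    auto copier = [&shared, copies_per_thread]() {
        for (int i = 0; i < copies_per_thread; ++i)
        {
            // Copying increments the counter atomically; destroying the
            // copy at the end of the iteration decrements it atomically.
            std::shared_ptr<int> local = shared;
            int value = *local; // read-only access to the data is also safe
            (void)value;
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
    {
        threads.emplace_back(copier);
    }
    for (std::thread& t : threads)
    {
        t.join();
    }

    // All temporary copies are gone, so only `shared` owns the int.
    return shared.use_count();
}
```

After all threads have joined, the function should return 1, because every temporary copy has been destroyed.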

## Monitor Class
A monitor class is a class that is internally synchronized, so users of the class don't need to worry about synchronization.
```
class bank
{

public:
    void debit(const std::string& name, int amount)
    {
        std::lock_guard<std::mutex> lck(mut);
        //Read/write shared data
    }
    
    void credit(const std::string& name, int amount)
    {
        std::lock_guard<std::mutex> lck(mut);
        //Read/write shared data
    }
    
    void print(const std::string& name)
    {
        std::lock_guard<std::mutex> lck(mut);
        //Read/write shared data
    }

private:
    std::mutex mut;
    //Shared data
};
```
This naive solution has a number of problems. Member functions may call other member functions, which locks the same std::mutex recursively and leads to deadlock (formally undefined behavior). Clients may invoke the member functions many times in a row, causing many lock and unlock operations that slow the application down, and a sequence of calls is not atomic as a whole. Finally, this solution cannot be applied to classes that cannot be modified.
A slightly better solution is to add a wrapper around the object: the wrapper's member functions lock a mutex and forward the call to the actual object. This can be applied to any class, but the other problems remain.
```
class bank_monitor
{

public:
    void debit(const std::string& name, int amount)
    {
        std::lock_guard<std::mutex> lck(mut);
        m_bank.debit(name, amount);
    }
    
    void credit(const std::string& name, int amount)
    {
        std::lock_guard<std::mutex> lck(mut);
        m_bank.credit(name, amount);
    }
    
    void print(const std::string& name)
    {
        std::lock_guard<std::mutex> lck(mut);
        m_bank.print(name);
    }

private:
    std::mutex mut;
    bank m_bank; //The bank class no longer needs to be synchronized internally
};
```
In a more sophisticated monitor class we make the wrapper generic: the wrapped class type is a template parameter, so we can monitor any type of object. The monitor is a functor class whose overloaded operator() takes a callable object as its argument. The callable contains the sequence of member function calls that must be synchronized together (a transaction); operator() locks the mutex and invokes the callable with the wrapped object. The overloaded operator() is itself a member function template, with the callable's type as its template parameter.
```
template <typename T>
class monitor
{
public:
    //Constructor which takes the object of the type to monitor
    //If object is not given it uses default constructor to create object of the given type
    monitor(T data = T{}) : m_data(data)
    {
    }
    
    //Argument is a callable object of type F, which takes an argument of type T&
    template <typename F>
    auto operator()(F func)
    {
        std::lock_guard<std::mutex> lck(mut);
        return func(m_data);
    }

private:
    //The object to be monitored
    T m_data;
    std::mutex mut;
};

int main()
{
    //Monitor wrapper for the bank class
    monitor<bank> mon;
    
    //Invoke monitors function call operator
    //pass a callable object which takes a bank argument
    mon([](bank& b) //Take the bank by reference so the monitored object itself is modified
          {
              //Call the member functions, all run under the same lock
              b.debit("Peter", 1000);
              b.credit("Paul", 1000);
              b.print("Peter");
              b.print("Paul");
          });
}
```
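
To see the generic monitor at work with several threads, here is a self-contained sketch. The `account_pair` struct and the `transfer_concurrently` helper are made-up stand-ins for the bank class, and the monitor is repeated so the block compiles on its own (it needs C++14 for the deduced return type).

```cpp
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class monitor
{
public:
    explicit monitor(T data = T{}) : m_data(std::move(data)) {}

    // Runs the callable on the wrapped object while holding the lock.
    template <typename F>
    auto operator()(F func)
    {
        std::lock_guard<std::mutex> lck(m_mut);
        return func(m_data);
    }

private:
    T m_data;
    std::mutex m_mut;
};

// Hypothetical unsynchronized type standing in for the bank.
struct account_pair
{
    int peter{1000};
    int paul{0};
};

account_pair transfer_concurrently(int thread_count, int transfers_per_thread)
{
    monitor<account_pair> mon;

    auto worker = [&mon, transfers_per_thread]() {
        for (int i = 0; i < transfers_per_thread; ++i)
        {
            // Debit and credit form one transaction under a single lock.
            mon([](account_pair& a) {
                a.peter -= 1;
                a.paul += 1;
            });
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
    {
        threads.emplace_back(worker);
    }
    for (std::thread& t : threads)
    {
        t.join();
    }

    // The callable's return value is passed back through operator().
    return mon([](account_pair& a) { return a; });
}
```

With 4 threads doing 100 transfers each, the result should be 600 for peter and 400 for paul, since no update can be lost between the debit and the credit.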

## Semaphore
A semaphore has a counter. acquire() decrements the counter and release() increments it. The counter can be zero, in which case acquire() blocks until the counter becomes positive again. Below is an implementation using a condition variable.
```
class semaphore
{
public:
    void release()
    {
        std::lock_guard<std::mutex> lock(m_mtx);
        if(m_counter < MAX_COUNTER)
        {
            ++m_counter;
        }    
        m_cv.notify_all();
    }
    void acquire()
    {
        std::unique_lock<std::mutex> lock(m_mtx);
        while(m_counter == 0)
        {
            m_cv.wait(lock);
        }
        --m_counter;
    }

private:
    std::mutex m_mtx;
    std::condition_variable m_cv;
    int m_counter{0};
    int MAX_COUNTER{10}; //If MAX_COUNTER is 1 then it is a binary semaphore
};
```
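C++20 adds semaphores to the standard library as std::counting_semaphore and std::binary_semaphore in the \<semaphore\> header. Implementations may build them on the operating system's own primitives.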
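
A typical use of a counting semaphore is to limit how many threads run a section of code at once. The sketch below uses a compact condition-variable semaphore like the one above, with one assumed change: the initial count is passed to the constructor. The `peak_concurrency` helper is a made-up name that records the highest number of threads seen inside the guarded section.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class semaphore
{
public:
    // Assumption for this sketch: the initial count is a constructor argument.
    explicit semaphore(int initial) : m_counter(initial) {}

    void release()
    {
        {
            std::lock_guard<std::mutex> lock(m_mtx);
            ++m_counter;
        }
        m_cv.notify_one();
    }

    void acquire()
    {
        std::unique_lock<std::mutex> lock(m_mtx);
        m_cv.wait(lock, [this]() { return m_counter > 0; });
        --m_counter;
    }

private:
    std::mutex m_mtx;
    std::condition_variable m_cv;
    int m_counter;
};

// Hypothetical helper: returns the largest number of threads that were
// inside the guarded section at the same time.
int peak_concurrency(int thread_count, int permits)
{
    semaphore sem(permits);
    std::atomic<int> inside{0};
    std::atomic<int> peak{0};

    auto worker = [&]() {
        sem.acquire();
        int now = ++inside;
        // Raise `peak` to `now` if it is a new maximum.
        int seen = peak.load();
        while (now > seen && !peak.compare_exchange_weak(seen, now))
        {
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        --inside;
        sem.release();
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
    {
        threads.emplace_back(worker);
    }
    for (std::thread& t : threads)
    {
        t.join();
    }
    return peak.load();
}
```

With 8 threads and 3 permits, the returned peak should never exceed 3.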

## Concurrent Queue
A queue is a FIFO data structure: new items are added to the back of the queue and items are removed from the front. The std::queue in the C++ standard library is not thread safe. Here we implement a concurrent queue with a std::queue as a class member.
```
template <typename T>
class concurrent_queue
{
public:
    concurrent_queue() = default;
    concurrent_queue(int max) : m_max{max}
    {
    }
    
    concurrent_queue(const concurrent_queue&) = delete;
    concurrent_queue& operator=(const concurrent_queue&) = delete; 
    concurrent_queue(concurrent_queue&&) = delete;
    concurrent_queue& operator=(concurrent_queue&&) = delete;
    
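    void push(T value)
    {
        //std::unique_lock is needed because the lock is released and re-acquired below
        std::unique_lock<std::mutex> lck(m_mut);
        
        //If full, poll until a consumer has removed an item
        while(m_queue.size() >= static_cast<std::size_t>(m_max))
        {
            lck.unlock();
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            lck.lock();
        }
        
        m_queue.push(value);
        m_cv.notify_one();
    }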
    
    void pop(T& value)
    {
        //std::condition_variable::wait requires a std::unique_lock
        std::unique_lock<std::mutex> lck(m_mut);
        
        //If empty, wait till some data is pushed in
        m_cv.wait(lck, [this](){return !m_queue.empty();});
        
        value = m_queue.front();
        m_queue.pop();
    }

private:
    std::queue<T> m_queue;
    std::mutex m_mut;
    std::condition_variable m_cv;
    int m_max{50};
};
```
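
The following is a sketch of a variation on the queue above, not the same class: instead of polling with sleep_for when the queue is full, it uses a second condition variable so producers are woken as soon as space is available. The names `bounded_queue` and `produce_and_consume` are assumptions for this illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class bounded_queue
{
public:
    explicit bounded_queue(std::size_t max) : m_max{max} {}

    void push(T value)
    {
        std::unique_lock<std::mutex> lck(m_mut);
        // Block while the queue is full.
        m_not_full.wait(lck, [this]() { return m_queue.size() < m_max; });
        m_queue.push(std::move(value));
        lck.unlock();
        m_not_empty.notify_one(); // wake one waiting consumer
    }

    void pop(T& value)
    {
        std::unique_lock<std::mutex> lck(m_mut);
        // Block while the queue is empty.
        m_not_empty.wait(lck, [this]() { return !m_queue.empty(); });
        value = std::move(m_queue.front());
        m_queue.pop();
        lck.unlock();
        m_not_full.notify_one(); // wake one waiting producer
    }

private:
    std::queue<T> m_queue;
    std::mutex m_mut;
    std::condition_variable m_not_empty;
    std::condition_variable m_not_full;
    std::size_t m_max;
};

// Hypothetical demo: one producer pushes 1..item_count, one consumer sums them.
long produce_and_consume(int item_count)
{
    bounded_queue<int> queue(4);
    long sum = 0; // written only by the consumer, read after join

    std::thread producer([&]() {
        for (int i = 1; i <= item_count; ++i)
        {
            queue.push(i);
        }
    });
    std::thread consumer([&]() {
        for (int i = 0; i < item_count; ++i)
        {
            int value = 0;
            queue.pop(value);
            sum += value;
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

For 100 items the returned sum should be 5050, even though the queue never holds more than 4 items at a time.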

## Thread Pools
Creating a thread requires a lot of work from the OS: allocating an execution stack for the thread, creating the internal data used to manage it, and having the scheduler context-switch to it and run it. Creating a new thread can take on the order of 10,000 times as long as calling a function directly. To avoid this overhead we use a thread pool, which keeps a fixed set of threads alive and reuses them as needed. To implement one in C++ we can use a container of thread objects whose size is the number of cores on the machine minus 2 (to leave room for the main thread and the OS). std::thread::hardware_concurrency() returns the number of hardware threads, but it is only a hint and may return 0, so the result has to be checked before subtracting. The pool also has a queue of tasks for its threads to execute; the tasks are callable objects. Thread pools work best with many short, simple tasks.
```
class thread_pool
{
public:

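    thread_pool()
    {
        //hardware_concurrency() is unsigned and may return 0, so guard the subtraction
        const unsigned cores = std::thread::hardware_concurrency();
        m_thread_count = cores > 2 ? static_cast<int>(cores) - 2 : 1;
        for(int i = 0; i < m_thread_count; i++)
        {
            m_threads.push_back(std::thread(&thread_pool::worker, this));
        }
    }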
    
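    ~thread_pool()
    {
        //Note: worker() below never returns, so these joins block forever;
        //a real pool needs a way to tell its workers to stop
        for(std::thread& thr : m_threads)
        {
            thr.join();
        }
    }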

    //User of the thread_pool class can submit new tasks to execute
    void submit(std::function<void()> func)
    {
        m_work_queue.push(func);
    }

private:

    void worker()
    {
        while(true)
        {
            std::function<void()> task;
            //Pop will wait if there are no new tasks to run
            m_work_queue.pop(task);
            task();
        }
    }

    concurrent_queue<std::function<void()>> m_work_queue;
    std::vector<std::thread> m_threads;
    int m_thread_count;
};
```
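The pool above has no way to stop its workers. The following sketch shows one possible way to add shutdown, under stated assumptions: it keeps its own mutex, condition variable, and std::queue instead of using concurrent_queue, and the names `stoppable_pool`, `m_stopping`, and `run_tasks` are made up for this illustration. The destructor sets a flag, wakes every worker, and joins them after the remaining tasks have run, so `run_tasks(50)` should return 50.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class stoppable_pool
{
public:
    explicit stoppable_pool(unsigned thread_count)
    {
        for (unsigned i = 0; i < thread_count; ++i)
        {
            m_threads.emplace_back(&stoppable_pool::worker, this);
        }
    }

    ~stoppable_pool()
    {
        {
            std::lock_guard<std::mutex> lck(m_mut);
            m_stopping = true;
        }
        m_cv.notify_all(); // wake every worker so it can see the flag
        for (std::thread& thr : m_threads)
        {
            thr.join();
        }
    }

    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lck(m_mut);
            m_tasks.push(std::move(task));
        }
        m_cv.notify_one();
    }

private:
    void worker()
    {
        while (true)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lck(m_mut);
                m_cv.wait(lck, [this]() { return m_stopping || !m_tasks.empty(); });
                if (m_tasks.empty())
                {
                    return; // stopping and nothing left to run
                }
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // run the task outside the lock
        }
    }

    std::mutex m_mut;
    std::condition_variable m_cv;
    std::queue<std::function<void()>> m_tasks;
    bool m_stopping{false};
    std::vector<std::thread> m_threads;
};

// Hypothetical demo: submits task_count tasks and returns how many ran.
int run_tasks(int task_count)
{
    std::atomic<int> done{0};
    {
        stoppable_pool pool(3);
        for (int i = 0; i < task_count; ++i)
        {
            pool.submit([&done]() { ++done; });
        }
    } // the destructor drains the queue and joins the workers
    return done.load();
}
```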
In this example the queue can become a bottleneck, because only one thread at a time can pop a task for execution. An alternative is to have a separate queue for each thread, so that a thread never has to compete with the others to get its next task. To implement this we can replace the single queue above with a vector of queues, and in submit use round-robin scheduling to put each new task on the next thread's queue.  
One disadvantage of the queue-per-thread approach is that a thread whose queue is empty sits idle even though another queue still has tasks pending, for example because that queue's thread is busy with a long-running task. One way to reduce the problem is to drop round-robin scheduling in submit and push each task onto the queue with the fewest elements. Another is work stealing: if a thread's queue is empty, it takes a task from another thread's queue.
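
A rough sketch of the queue-per-thread idea with round-robin submission and work stealing follows. It makes several simplifying assumptions: each queue is a mutex-protected std::deque, idle workers spin with std::this_thread::yield() rather than blocking on a condition variable, and a thief takes from the front of the victim's queue. The names `stealing_pool`, `local_queue`, and `run_with_stealing` are made up for this illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class stealing_pool
{
public:
    explicit stealing_pool(unsigned thread_count) : m_queues(thread_count)
    {
        for (unsigned i = 0; i < thread_count; ++i)
        {
            m_threads.emplace_back(&stealing_pool::worker, this, i);
        }
    }

    ~stealing_pool()
    {
        m_stopping = true;
        for (std::thread& thr : m_threads)
        {
            thr.join();
        }
    }

    void submit(std::function<void()> task)
    {
        // Round-robin: each new task goes to the next thread's queue.
        const std::size_t index = m_next++ % m_queues.size();
        std::lock_guard<std::mutex> lck(m_queues[index].mut);
        m_queues[index].tasks.push_back(std::move(task));
    }

private:
    struct local_queue
    {
        std::mutex mut;
        std::deque<std::function<void()>> tasks;
    };

    bool try_pop(std::size_t index, std::function<void()>& task)
    {
        std::lock_guard<std::mutex> lck(m_queues[index].mut);
        if (m_queues[index].tasks.empty())
        {
            return false;
        }
        task = std::move(m_queues[index].tasks.front());
        m_queues[index].tasks.pop_front();
        return true;
    }

    void worker(std::size_t self)
    {
        while (true)
        {
            // Read the flag before scanning so no submitted task is missed.
            const bool stopping = m_stopping.load();
            std::function<void()> task;
            bool found = false;
            // Offset 0 is this thread's own queue; the rest are steal attempts.
            for (std::size_t offset = 0; offset < m_queues.size() && !found; ++offset)
            {
                found = try_pop((self + offset) % m_queues.size(), task);
            }
            if (found)
            {
                task();
            }
            else if (stopping)
            {
                return;
            }
            else
            {
                std::this_thread::yield();
            }
        }
    }

    std::vector<local_queue> m_queues;
    std::atomic<std::size_t> m_next{0};
    std::atomic<bool> m_stopping{false};
    std::vector<std::thread> m_threads;
};

// Hypothetical demo: submits task_count tasks and returns how many ran.
int run_with_stealing(int task_count)
{
    std::atomic<int> done{0};
    {
        stealing_pool pool(3);
        for (int i = 0; i < task_count; ++i)
        {
            pool.submit([&done]() { ++done; });
        }
    } // the destructor waits until every queue has been drained
    return done.load();
}
```

Calling `run_with_stealing(60)` should return 60. A production pool would more likely block idle workers instead of spinning, and many designs steal from the back of the victim's queue to reduce contention with its owner.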