# OOP: A computer simulation example

As you have learned, Python is an **Object-Oriented Programming (OOP) language**.  We can follow the standard OOP paradigm in Python by creating **classes** and then instantiating them; each instance of a class is called an **object**.  A class offers a higher level of abstraction for your code than functions alone.  One of the key benefits for data science is that you can design your algorithms to be reusable and composed from multiple classes.

**In this section you will learn:**
* How to define a class in Python
* How to apply basic object-oriented techniques to build an agent based simulation.

> To help visualise our problem we are going to use a package called **Matplotlib**.  We will cover this in detail later, but for now you just need to know that it allows us to create charts from our data.  The code is provided.

---

## Coding Problem: 'Oh data scientists - can't you all just get along!'

To learn the basics of OOP you will create an **Agent Based Simulation** (ABS).  ABS is a popular methodology in data science for modelling emergent behaviour when a group of simple **agents** interact with each other within an **environment**.  An agent can be thought of as an object with state and behaviour, which maps neatly onto the OOP concept of a class.  Emergence is important in ABS because we do not code the outcome of the model.  Instead, we code behavioural rules for how agents interact, then explore what happens when we run the model and set the agents loose.

> A full treatment of ABS is out of scope for this course, but it is used a lot in health data science. You may have seen a number of ABS models used to model the epidemiology of COVID-19.

**Here's a description of our problem:**

* Data scientists live in a 2-dimensional gridworld: an (n × m) grid of cells.  Each cell can be inhabited by at most one data scientist.
* In gridworld, data scientists favour either Python or R.
* Each data scientist has a similarity threshold: the minimum proportion of their neighbours that they want to code in their favourite language too!
* If an agent's neighbours are not similar enough, the data scientist becomes unhappy and moves to a random empty location in the world!
* The simulation runs for a maximum number of iterations.  If all agents are happy in any iteration, the simulation terminates early.
* Agents decide whether to move based on the state of the environment in the current iteration.  That is, they have no knowledge of whether other data scientists are happy or unhappy.
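To make the happiness rule concrete, here is a small worked example. `is_unhappy` is a hypothetical helper written only for this illustration: it compares the proportion of like-minded neighbours against the threshold and, as an assumption consistent with the zero-neighbour special case described later in this section, treats an agent with no neighbours as happy.

```python
def is_unhappy(n_similar, n_neighbours, threshold):
    """Return True if the share of like-minded neighbours is below threshold.

    Hypothetical helper for illustration only. An agent with no
    neighbours is treated as happy (an assumption).
    """
    if n_neighbours == 0:
        return False
    return n_similar / n_neighbours < threshold


# A Python coder with 5 neighbours, only 1 of whom also uses Python:
# 1 / 5 = 0.2, which is below a threshold of 0.3, so the agent moves.
print(is_unhappy(n_similar=1, n_neighbours=5, threshold=0.3))  # True
# With 2 of 5 similar neighbours: 2 / 5 = 0.4 >= 0.3, so the agent stays.
print(is_unhappy(n_similar=2, n_neighbours=5, threshold=0.3))  # False
```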

> This problem is based on a classic ABS modelling study by Schelling that explored the dynamics of segregation in populations.

## Designing your code

Before you write any code, take a moment to think about what the basic class structure of your code might look like.

### Agents

The most obvious class to include represents an **Agent**.  In some applications you may want multiple types of agent class, but here all agents behave in the same general way, i.e. they become unhappy when they are not surrounded by enough of their fellow coding peers.  So here you can simply define a single agent class and give it a `language` attribute of type `str` that determines the agent's preference for Python or R.

Remember that in the simulation we will be creating multiple objects of type `DataScientist` (our agent class).  Together these objects form your **population**.  The skeleton code for `DataScientist` is shown below.

> Reminder: we have overridden the `__init__()` method, which is called when creating an instance of `DataScientist`.  This is used to **parameterise** the agent.  Here we pass in a unique id, the coordinates in gridworld, the preferred coding language, the environment and the similarity threshold.

The method `is_unsatisified_with_neighbours` returns `True` when an agent is unhappy with the similarity of its neighbours in the grid.

```python
class DataScientist:
    def __init__(self, unique_id, row, col, language, env, threshold):
        pass

    def is_unsatisified_with_neighbours(self):
        pass
```

### Environment 

Beyond the agents, deciding which classes to include becomes a bit more fuzzy.  You could code the whole simulation with a single agent class and implement the rest using functions.  However, an environment object is a useful level of abstraction when creating a framework for running your model, and it makes the code much more readable.  Another benefit is that a more complicated problem might need multiple environment classes (e.g. hospitals containing agents).

Here you will define a class called `GridWorld` that contains a population of `DataScientist` objects.  We will look at how these agents are stored internally shortly.



```python
class GridWorld:
    def __init__(self, n_rows, n_cols, n_empty, random_seed=None):
        pass

    def get_neighbours(self, row, col):
        pass

    def relocate(self, agent):
        pass
```
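The internal storage is covered shortly, but as a rough idea of what `get_neighbours` has to do, here is a minimal sketch. It **assumes** (1) agents are stored in a `dict` keyed by `(row, col)`, with `None` marking an empty cell, and (2) an agent's neighbours are the up to eight surrounding cells (a Moore neighbourhood) with no wrap-around at the grid edges. `SketchGrid` is a hypothetical name, and plain strings stand in for agents so the example runs on its own.

```python
import itertools


class SketchGrid:
    """Hypothetical grid that stores agents in a dict keyed by (row, col).

    Empty cells map to None. This is an illustrative storage choice only.
    """

    def __init__(self, n_rows, n_cols):
        self.n_rows = n_rows
        self.n_cols = n_cols
        # every coordinate pair starts out empty
        self.cells = {coord: None for coord in
                      itertools.product(range(n_rows), range(n_cols))}

    def get_neighbours(self, row, col):
        """Return the agents in the (up to) 8 cells surrounding (row, col).

        Assumes a Moore neighbourhood with no wrap-around at the edges.
        Empty and off-grid cells are skipped.
        """
        neighbours = []
        for d_row, d_col in itertools.product((-1, 0, 1), repeat=2):
            if d_row == 0 and d_col == 0:
                continue  # skip the agent's own cell
            agent = self.cells.get((row + d_row, col + d_col))
            if agent is not None:  # None means empty or off the grid
                neighbours.append(agent)
        return neighbours


# strings stand in for DataScientist objects in this sketch
grid = SketchGrid(3, 3)
grid.cells[(0, 0)] = 'R'
grid.cells[(0, 1)] = 'PYTHON'
grid.cells[(2, 2)] = 'R'
print(grid.get_neighbours(1, 1))  # ['R', 'PYTHON', 'R']
print(grid.get_neighbours(0, 0))  # ['PYTHON']
```

Using `dict.get` means off-grid coordinates simply come back as `None`, so edge and corner cells need no special-case code.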

### The main loop

The simulation consists of a main loop that either runs for a maximum number of iterations or terminates early if all agents are happy.  Again, you could just implement this as a function.  However, you may want to explore multiple **scenarios** with the model.  A scenario might, for example, explore different similarity thresholds or ratios of 'Python' coders to 'R' coders (what happens in a world where all coders use Python and none use R!?).  For this reason you will create a simple `Model` class with a `run()` method that starts the main loop and runs the simulation.


```python
class Model:
    def __init__(self, environment, max_iter):
        pass

    def run(self):
        pass
```
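A minimal sketch of what `run()` might do is below. To keep it self-contained it uses hypothetical stand-ins: `StubAgent` reports itself unhappy a fixed number of times, and `StubEnvironment.relocate` only counts moves rather than changing any grid. It also **assumes** one interpretation of the rule that agents act on the current state of the environment: all unhappy agents are identified first, and only then does anyone move.

```python
class StubAgent:
    """Hypothetical agent: unhappy for a fixed number of checks, then happy."""

    def __init__(self, unhappy_checks):
        self.unhappy_checks = unhappy_checks

    def is_unsatisified_with_neighbours(self):
        if self.unhappy_checks > 0:
            self.unhappy_checks -= 1
            return True
        return False


class StubEnvironment:
    """Hypothetical environment that holds agents and counts relocations."""

    def __init__(self, agents):
        self.agents = agents
        self.n_moves = 0

    def relocate(self, agent):
        # a real GridWorld would move the agent to a random empty cell
        self.n_moves += 1


class SketchModel:
    def __init__(self, environment, max_iter):
        self.env = environment
        self.max_iter = max_iter

    def run(self):
        """Run until every agent is happy or max_iter is reached.

        Returns the number of iterations in which at least one agent moved.
        """
        for iteration in range(self.max_iter):
            # every agent judges the same snapshot of the current world
            unhappy = [agent for agent in self.env.agents
                       if agent.is_unsatisified_with_neighbours()]
            if not unhappy:
                return iteration  # all agents happy: terminate early
            for agent in unhappy:
                self.env.relocate(agent)
        return self.max_iter


env = StubEnvironment([StubAgent(0), StubAgent(2), StubAgent(1)])
model = SketchModel(env, max_iter=10)
print(model.run(), env.n_moves)  # 2 3
```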

## Coding the ABS using OOP techniques

### Imports

You will mostly be using standard Python in this example, but you will also make use of the following modules:

* `itertools`: we will use the `product` function to help us quickly generate a large number of coordinate pairs.
* `random`: the movement of agents in gridworld is stochastic.  We use the `choice` function to select a random location for unhappy agents to move to.
* `time`: for fun, we will time our code to see how long it takes to execute a model (using the imaginatively titled `time.time()` function).
* `matplotlib.pyplot`: used to plot a gridworld.  We will use circles of different colours to represent agents.
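A quick, self-contained illustration of the `itertools.product`, `random.choice` and `time.time()` calls described above. The grid size and the `empty_cells` list are made up purely for the demonstration, and the random generator is seeded so the choice is repeatable.

```python
import itertools
import random
import time

start = time.time()  # wall-clock time in seconds since the epoch

# every (row, col) pair in a small 3 x 4 grid
coords = list(itertools.product(range(3), range(4)))
print(len(coords))  # 12
print(coords[:3])   # [(0, 0), (0, 1), (0, 2)]

# a made-up list of empty cells; pick one at random for an unhappy agent
empty_cells = [(0, 2), (1, 3), (2, 0)]
rng = random.Random(42)  # seeded generator so the choice is reproducible
new_cell = rng.choice(empty_cells)
print(new_cell in empty_cells)  # True

elapsed = time.time() - start
print(f'elapsed: {elapsed:.6f} seconds')
```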

```python
import itertools
import random
import time
import matplotlib.pyplot as plt
```

### Constants and default parameter values

```python
# grid defaults
N_ROWS = 32
N_COLS = 32
N_CELLS = N_ROWS * N_COLS

# coder language constants
LANG_PYTHON = 'PYTHON'
LANG_R = 'R'

# default simulation parameters
RATIO_R_TO_PYTHON = 0.3
PERCENT_EMPTY = 0.2
SIMILARITY_THRESHOLD = 0.3
MAX_ITER = 500
```
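`GridWorld` takes the number of empty cells as a parameter, so the percentage defaults need converting into whole counts somewhere. The sketch below shows one way to do that and to build a shuffled list of languages for the population. Note that `RATIO_R_TO_PYTHON` is ambiguous: here it is **assumed** to mean the share of agents who code in R, not the ratio R:Python.

```python
import random

# defaults repeated here so the sketch runs on its own
N_ROWS, N_COLS = 32, 32
N_CELLS = N_ROWS * N_COLS
LANG_PYTHON, LANG_R = 'PYTHON', 'R'
RATIO_R_TO_PYTHON = 0.3
PERCENT_EMPTY = 0.2

n_empty = int(N_CELLS * PERCENT_EMPTY)    # cells left unoccupied
n_agents = N_CELLS - n_empty              # size of the population
# ASSUMPTION: RATIO_R_TO_PYTHON is the share of agents who code in R
n_r = round(n_agents * RATIO_R_TO_PYTHON)
n_python = n_agents - n_r
print(n_empty, n_agents, n_r, n_python)   # 204 820 246 574

# one language per agent, shuffled so R and Python coders start mixed
languages = [LANG_R] * n_r + [LANG_PYTHON] * n_python
random.Random(42).shuffle(languages)
```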

## The `DataScientist` class

When creating a new `DataScientist` we will use the following syntax:

```python
# create a new data scientist agent that prefers Python
agent = DataScientist(unique_id=1, row=0, col=0, language=LANG_PYTHON, 
                      env=grid_env, threshold=SIMILARITY_THRESHOLD)
```

> In the above code snippet, `grid_env` is an instance of the `GridWorld` class in which the agent lives and interacts with other agents.

As you have already learned, `agent` is an instance of the class `DataScientist`.  At this stage all the code has done is parameterise the agent and position it in the grid environment.  The code below demonstrates how you would store these parameters as instance attributes using `self`.

```python
class DataScientist:
    def __init__(self, unique_id, row, col, language, env,
                 threshold=SIMILARITY_THRESHOLD):
        # store the parameters as instance attributes
        self.unique_id = unique_id
        self.row = row
        self.col = col
        self.language = language
        self.env = env
        self.threshold = threshold
        
        
    def is_unsatisified_with_neighbours(self):
        pass
```

The only public method that `DataScientist` makes available is `is_unsatisified_with_neighbours()`.  For the agent to make its decision it must first gather its neighbours.  This can be done by calling `GridWorld.get_neighbours(row, col)`.  A `DataScientist` agent has the attribute `self.env`, which is a reference to the grid world.  You can therefore access the agent's neighbours like so:

```python
neighbours = self.env.get_neighbours(self.row, self.col)
```

We haven't implemented the `GridWorld` class yet, but for now note that `get_neighbours` returns a `list` of the `DataScientist` objects that live in the cells neighbouring the agent.  This list varies in length depending on the number of empty cells next to the agent.  From here it is straightforward to calculate the percentage of neighbours that use the same coding language.  For example, your code could:

* Calculate the number of neighbours by taking the `len` of the returned list.
* Calculate the number of similar neighbours using a loop or list comprehension.
* Check for the special case where the number of neighbours is zero and return `False`.
* Calculate the proportion of similar neighbours and check if this is less than the agent's similarity threshold.
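One possible implementation of the four steps above is sketched below; it is not the only valid answer. `StubGrid` is a hypothetical stand-in for `GridWorld` whose `get_neighbours` simply returns a fixed list, so the method can be tested before the real environment exists.

```python
SIMILARITY_THRESHOLD = 0.3  # repeated so the sketch runs on its own


class StubGrid:
    """Hypothetical environment whose get_neighbours returns a fixed list."""

    def __init__(self, neighbours):
        self.neighbours = neighbours

    def get_neighbours(self, row, col):
        return self.neighbours


class DataScientist:
    def __init__(self, unique_id, row, col, language, env,
                 threshold=SIMILARITY_THRESHOLD):
        self.unique_id = unique_id
        self.row = row
        self.col = col
        self.language = language
        self.env = env
        self.threshold = threshold

    def is_unsatisified_with_neighbours(self):
        """Return True if too few neighbours share this agent's language."""
        neighbours = self.env.get_neighbours(self.row, self.col)
        n_neighbours = len(neighbours)
        # special case: an isolated agent has nobody to disagree with
        if n_neighbours == 0:
            return False
        n_similar = sum(1 for other in neighbours
                        if other.language == self.language)
        return n_similar / n_neighbours < self.threshold


# neighbours only need a language here, so they get no environment
py = DataScientist(1, 0, 0, 'PYTHON', env=None)
r1 = DataScientist(2, 0, 1, 'R', env=None)
r2 = DataScientist(3, 1, 0, 'R', env=None)
agent = DataScientist(4, 1, 1, 'PYTHON', env=StubGrid([py, r1, r2]))
print(agent.is_unsatisified_with_neighbours())  # False, since 1/3 >= 0.3
```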



```python
class DataScientist:
    def __init__(self, unique_id, row, col, language, env,
                 threshold=SIMILARITY_THRESHOLD):
        self.unique_id = unique_id
        self.row = row
        self.col = col
        self.language = language
        self.env = env
        self.threshold = threshold

    def is_unsatisified_with_neighbours(self):
        pass
```

```python
class DataScientist:
    def __init__(self, unique_id, row, col, language, env,
                 threshold):
        pass

    def is_unsatisified_with_neighbours(self):
        pass
```