## Data structures
Depending on the task at hand, a different data structure is the best fit: each one has pros and cons and suits some situations better than others.
We saw:
- Lists, the basic one: an ordered collection of items
- Sets: an unordered collection of unique items
- Dictionaries: map keys (usually strings) to values.

Each comes with a set of handy methods and functions that you can apply to it.
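
As a quick reminder, here is a minimal sketch of the three structures side by side. The variable names and values are purely illustrative and do not come from the earlier material.

```python
# Illustrative names and values only.
shopping = ["eggs", "flour", "eggs"]   # list: ordered, duplicates allowed
shopping.append("sugar")               # list method: add an item at the end

unique_items = set(shopping)           # set: duplicates are dropped, no order

counts = {"eggs": 2, "flour": 1}       # dict: each key is matched with a value
counts["sugar"] = 1                    # add a new key/value pair

print(shopping)            # ['eggs', 'flour', 'eggs', 'sugar']
print(len(unique_items))   # 3
print(counts["eggs"])      # 2
```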

In Python, all these data structures are actually classes. We can define new classes to get new data structures with features and methods
tailored to our particular use cases.

In [1]:
l1 = list([1, 2])  # Creating a new list actually creates an instance of a class: a 'list' object.
l1.append(3)  # We can then call instance methods on that object, like 'append()' or '__str__()'
l1.__str__()  

'[1, 2, 3]'

In [2]:
# Reimplementation of the 'list' class
class MyList:
    #  All 'instance methods', i.e. functions that work on a specific object's state, must take 'self' as their first argument:
    #  'self' is how a method refers to the object it was called on.
    #  __init__ is a special method (magic method) that is called implicitly when we create a new object/instance of the class.
    def __init__(self, initial_list=None):
        if initial_list is None:
            self.inner_list = []
        else:
            self.inner_list = initial_list
    
    def append(self, number):
        self.inner_list.append(number)
        
    def __str__(self):
        return str(self.inner_list)  # __str__ must return a string, not the list itself
    
l2 = MyList([1, 2])
l2.append(3)
l2.__str__()
l2.inner_list

[1, 2, 3]

### Vocabulary
Example:
```
dict_1 = {"hey": "hoo"} 
dict_2 = {"mykey": 56}
```
dict_1 and dict_2 are both **instances** of the **dictionary class** (`dict` in Python), so they are both objects. The **dictionary class** only defines that every dictionary is made of keys (like "hey" in dict_1 and "mykey" in dict_2) and values ("hoo", 56, ...). The **dictionary instances** dict_1 and dict_2 then hold different keys and values.

- Object: an instance of a class
- Class: a mould or factory, i.e. the abstract, parametrized definition of what our objects should look like. For instance, the dictionary class enforces that all dictionaries are made of keys and values. A class consists of a set of attributes, which will later hold the 'state' of the objects, and a set of methods, which let users interact with that state.
- State: the set of values of an object's attributes (see the Water example below).
- Attributes: variables defined on an object. The class defines the names of these variables, and each new instance of the class sets its own values.
- Method: a Python function defined inside a class definition, which reads, modifies or sets the state of the class's objects when called. The methods of a class define how you should interact with objects of that class. For instance, the list class defines that you should use the append() method to add items to the end of the list.
- Magic methods: methods that Python calls implicitly in common contexts. Their names always start and end with \_\_, like \_\_init\_\_, \_\_add\_\_, \_\_repr\_\_...
    For instance, since everything in Python is an object, writing ```3 + 4``` is roughly equivalent to ```int(3).__add__(int(4))```:
    1. We create an instance of the 'int' class with value 3 
    2. We create an instance of the 'int' class with value 4
    3. We call the magic \_\_add\_\_ method on the first int object with the second int object as argument.
- Constructor (\_\_init\_\_()): a magic method called automatically whenever we create a new object of a class. It initialises the object by assigning values to its attributes.
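
To make the magic-method idea concrete, here is a small sketch. The `Money` class is hypothetical and exists only for this illustration; it defines `__add__` so that the `+` operator works on its instances, and `__repr__` so that they print readably.

```python
# Hypothetical class, used only to illustrate magic methods.
class Money:
    def __init__(self, amount):
        # Called implicitly by Money(...)
        self.amount = amount

    def __add__(self, other):
        # Called implicitly by: money_a + money_b
        return Money(self.amount + other.amount)

    def __repr__(self):
        # Used by print() here, because the class defines no __str__
        return f"Money({self.amount})"


print(3 + 4 == (3).__add__(4))  # True: '+' on ints goes through __add__
total = Money(5) + Money(10)
print(total)                    # Money(15)
```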

In [3]:
#  Example illustrating the notion of state: in programming, the 'state' of an object is
#  the set of all variables that describe how that object is right now.

#  Here, the state of a Water object is defined by a single attribute/property, which happens to be called 'state'.


class Water:
    def __init__(self):
        print("You just created a new water object!")
        self.state = "liquid"
    
    def heat(self):
        print("heating...")
        if self.state == "solid":
            self.state = "liquid"
        else:
            self.state = "gas"
    
    def cool(self):
        print("cooling...")
        if self.state == "gas":
            self.state = "liquid"
        else:
            self.state = "solid"      
            
my_water = Water() #  We create a new instance of the class 'Water'. This calls the constructor '__init__', which initializes the state of the water to 'liquid'
print("The state of my water is " + my_water.state)  #  We can access the attribute of the water object to check which state the water is in.
my_water.heat()  #  We can call a method on our water instance to modify its state.
print("The state of my water is now " + my_water.state)
my_water.cool() 
print("The state of my water is now " + my_water.state)
my_water.cool() 
print("The state of my water is now " + my_water.state)

You just created a new water object!
The state of my water is liquid
heating...
The state of my water is now gas
cooling...
The state of my water is now liquid
cooling...
The state of my water is now solid


## The Mould metaphor
Conceptually, classes are like moulds: they only define the 'shape' of the cake, and the cakes made from that mould are the objects/instances.

The conceptual shape of the cake, defined by the class, is a parametrized definition of a concept: the class defines that a cake is represented by a list of ingredients and a quantity of sugar without injecting any values in these attributes yet. Then, using this definition, we can describe/represent many different cakes as objects of the class by associating values with these attributes. 

In [4]:
class CakeMould:       
    def __init__(self, ingredients, sugar_quantity):
        self.ingredients = ingredients
        self.sugar = sugar_quantity
        
    def get_sugar_weight(self):
        return f"This cake has {self.sugar} g of sugar"

my_first_cake = CakeMould(["flour", "sugar"], 300)
my_second_cake = CakeMould(["strawberries", "chocolate", "eggs", "sugar"], 0)
print(my_first_cake.ingredients)
print(my_second_cake.ingredients)

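dir(CakeMould)  #  Lists the attributes and methods of the class (nothing is displayed here because it is not the last line of the cell)
print(my_second_cake)  #  CakeMould defines no __str__ or __repr__, so Python prints its default representation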

['flour', 'sugar']
['strawberries', 'chocolate', 'eggs', 'sugar']
<__main__.CakeMould object at 0x000001FA1C79CFA0>


In [5]:
#  Example showing inheritance: FlourAndSugarBasedCakeMould inherits from CakeMould (it has the same attributes and methods) and overrides the __init__ method
class FlourAndSugarBasedCakeMould(CakeMould):
    def __init__(self, ingredients, sugar_quantity):
        super().__init__(ingredients, sugar_quantity)  #  Here, we call the parent class' __init__ method, that is: we first initialize the object as we did for the previous class
        self.ingredients.append("flour")  #  Then, we do extra stuff that we didn't do in the CakeMould class. 
        self.ingredients.append("sugar")

my_first_flour_and_sugar_based_cake = FlourAndSugarBasedCakeMould(["eggs"], 300)
print(my_first_flour_and_sugar_based_cake.ingredients)

['eggs', 'flour', 'sugar']


Inheritance is a powerful tool: it lets us reuse existing code instead of duplicating it.
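
The sketch below repeats the two classes so that it runs on its own, and shows what the subclass gets for free: the `get_sugar_weight` method is never redefined, yet it is available on the child object. One assumption differs from the cell above: the ingredients list is copied with `list(...)` so that appending to it does not modify the list passed in by the caller.

```python
class CakeMould:
    def __init__(self, ingredients, sugar_quantity):
        self.ingredients = ingredients
        self.sugar = sugar_quantity

    def get_sugar_weight(self):
        return f"This cake has {self.sugar} g of sugar"


class FlourAndSugarBasedCakeMould(CakeMould):
    def __init__(self, ingredients, sugar_quantity):
        # Assumption for this sketch: copy the list so the caller's list stays untouched
        super().__init__(list(ingredients), sugar_quantity)
        self.ingredients.append("flour")
        self.ingredients.append("sugar")


base_ingredients = ["eggs"]
cake = FlourAndSugarBasedCakeMould(base_ingredients, 300)

print(cake.get_sugar_weight())      # inherited method: This cake has 300 g of sugar
print(isinstance(cake, CakeMould))  # True: a child object is also an instance of the parent class
print(base_ingredients)             # ['eggs'] (unchanged thanks to the copy)
```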

## Exercise: define the Material class

So far, we represented Materials as dictionaries, like:
```
{'id': 1,
 'name': '20 MPa concrete mix (30% FA)',
 'category': 'Minerals',
 'type': 'Concrete',
 'functional_unit': 'm³',
 'description': 'Concrete is a composite material combining sand or other fine aggregates. coarse aggregates. a binder and water. Portland cement is the most commonly used binder. however other binders. such as polymers. may also be used. Supplementary Cementitious Materials (SCM) such as Fly Ash and Ground. Granulated Blast Furnace Slag (GGBFS). are also commonly used as a part replacement for Portland cement. Additives. such as...',
 'common_uses': 'Floor slabs. suspended slabs. driveways. precast wall panels',
 'comments': nan,
 'embodied_energy': 2.026212111640545,
 'embodied_water': 4011.163289376414,
 'embodied_carbon': 250.8342200202523,
 'weight': 2335.0}
```
This is very permissive: nothing prevents us from adding any key, such as a key called 'Arta', to the dictionary describing a Material, which doesn't make much sense.
It also allows defining a material without a name, since all the dictionary class enforces is that we have keys and values.
In this case, it is useful to represent Materials in a more systematic way, by creating a class whose objects represent materials.
That way, we can enforce that all Materials are described by exactly these attributes:
```
['id', 'name', 'category', 'type', 'functional_unit', 'description', 'common_uses', 'comments', 'embodied_energy', 'embodied_water', 'embodied_carbon', 'weight']
```
We could then define a first method on Materials that combines some of these attributes to create a unique identifier for materials. 

In [6]:
# SOLUTION
class Material:
    def __init__(self, id, name, category, type, functional_unit, description, common_uses, comments, embodied_energy, embodied_water, embodied_carbon, weight):
        self.id = id
        self.name = name
        self.category = category
        self.type = type
        self.functional_unit = functional_unit
        self.description = description
        self.common_uses = common_uses
        self.comments = comments
        self.embodied_energy = embodied_energy
        self.embodied_water = embodied_water
        self.embodied_carbon = embodied_carbon
        self.weight = weight            
    
    def __repr__(self):
        #  __repr__ is called implicitly when an object is displayed in a notebook or inside a list, and by 'print(object)' when the class defines no __str__,
        #  so it should return a string that represents the object based on its attributes' values
        return f"({self.id})Material|{self.name}|{self.category}|{self.type}|{self.functional_unit}"    

### Useful helpers when the number of function arguments gets intense
#### Variable number of arguments
When a function has as many arguments as the \_\_init\_\_ method above, it can get hard to keep track of all of them.
Python functions can be defined to accept a variable number of arguments, using the \* notation.

In [7]:
def function_with_random_number_of_arguments(*arguments):
    #  Inside the body of the function, 'arguments' is a tuple holding all the positional arguments.
    print(len(arguments))
    
function_with_random_number_of_arguments()  # calling the function without any arguments
function_with_random_number_of_arguments(0)  #  calling the function with 1 argument, which is the number 0
function_with_random_number_of_arguments("hey", 34, [1, 2, 3]) # calling the function with 3 arguments, namely a string, a number, and a list. 

0
1
3


The \* notation has another meaning when used in a function call: it unpacks (destructures) a list. If I call ```my_function(*my_list)```, all the items of my_list are extracted and passed to my_function as separate arguments.

In [8]:
my_arguments = ["hey", 34, [1, 2, 3]]  # we put our arguments in a list
function_with_random_number_of_arguments(*my_arguments) # we destructure the list to feed its items as arguments 
#  so here, we have 3 arguments, this is equivalent to the last line of the last code cell

3


#### Named arguments
By default, when we call a function, the order in which we pass values matters, because values are matched to the function's arguments by position.
For instance:
```
def minus(value1, value2):
    return value1 - value2
```
Calling ```minus(3, 4)``` or ```minus(4, 3)``` is very different! In the first case, the value '3' is mapped to the variable called 'value1' and the value '4' to the variable called 'value2'; in the second case it's the opposite. This is manageable for a small example with 2 arguments, but it gets harder to keep track of the expected order when there are more of them.
To avoid this ambiguity, a Python function can be called while explicitly naming each argument, like:
```minus(value1=3, value2=4)```

In [9]:
def minus(value1, value2):
    return value1 - value2

print(minus(3, 4))
print(minus(4, 3))
print(minus(value1=3, value2=4)) #  Less ambiguity

-1
1
-1


### Putting it all together: variable number of named arguments
Combining named arguments with a variable number of arguments, we can write a function that accepts a variable number of named arguments, using the double \*\* notation:

In [10]:
def function_with_random_number_of_named_arguments(**named_arguments):
    #  Inside the body of the function, 'named_arguments' is a dict mapping each argument name to its value.
    print(named_arguments)
    
function_with_random_number_of_named_arguments(first="hey", second=34, random_name=[1, 2, 3]) 

{'first': 'hey', 'second': 34, 'random_name': [1, 2, 3]}


Similarly to the \* notation, we can also use the \*\* notation when calling a function, to pass named arguments to it from a dictionary:

In [11]:
my_named_arguments = {"first":"hey", "second":34, "random_name":[1, 2, 3]}
function_with_random_number_of_named_arguments(**my_named_arguments)

{'first': 'hey', 'second': 34, 'random_name': [1, 2, 3]}


Using all these tricks, we could rewrite the Material class in a simpler way:

In [12]:
class Material:
    def __init__(self, **keyword_arguments):
        for argument_name, argument_value in keyword_arguments.items():
            setattr(self, argument_name, argument_value) # setattr takes 3 parameters: an object, an attribute name and an attribute value and runs: object.<attribute name> = <attribute value>
    
    def __repr__(self):
        return f"({self.id})Material|{self.name}|{self.category}|{self.type}|{self.functional_unit}"   

But now we have come full circle: a Material accepts any named arguments we pass it. To make sure that only authorized attribute names are set, we could add:

In [13]:
class Material:
    def __init__(self, **keyword_arguments):
        allowed_attrs = ['id', 'name', 'category', 'type', 'functional_unit', 'description', 'common_uses', 
                         'comments', 'embodied_energy', 'embodied_water', 'embodied_carbon', 'weight']        
        for argument_name, argument_value in keyword_arguments.items():
            if argument_name in allowed_attrs:
                setattr(self, argument_name, argument_value) 
                
    def __repr__(self):
        return f"({self.id})Material|{self.name}|{self.category}|{self.type}|{self.functional_unit}"   
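
Note that the class above silently skips names that are not in `allowed_attrs`, and does not complain when an expected attribute is missing. If stricter behaviour is wanted, one possible variation is to raise an error instead. The sketch below is only one way to do it; `StrictMaterial` and `ALLOWED_ATTRS` are hypothetical names introduced for this illustration.

```python
# Sketch only: StrictMaterial and ALLOWED_ATTRS are hypothetical names.
ALLOWED_ATTRS = ['id', 'name', 'category', 'type', 'functional_unit', 'description', 'common_uses',
                 'comments', 'embodied_energy', 'embodied_water', 'embodied_carbon', 'weight']


class StrictMaterial:
    def __init__(self, **keyword_arguments):
        unknown = set(keyword_arguments) - set(ALLOWED_ATTRS)
        missing = set(ALLOWED_ATTRS) - set(keyword_arguments)
        if unknown:
            raise TypeError(f"Unknown attribute(s): {sorted(unknown)}")
        if missing:
            raise TypeError(f"Missing attribute(s): {sorted(missing)}")
        for argument_name, argument_value in keyword_arguments.items():
            setattr(self, argument_name, argument_value)


try:
    StrictMaterial(id=1, name="20 MPa concrete mix", Arta="?")
except TypeError as error:
    print(error)  # Unknown attribute(s): ['Arta']
```

Whether to skip or reject unexpected names is a design choice: skipping is more forgiving when the input file has extra columns, while raising an error surfaces mistakes earlier.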

In [14]:
#  We can also use the ** notation to easily convert our list of material dictionaries into a list of Material objects, like so:

## READING THE DATA
import pandas as pd
df_mats = pd.read_csv("../data/materials.csv", sep=";")
materials = df_mats.to_dict(orient="records")

material_objects = []
for material in materials:
    material_object = Material(**material) #  we destructure the material dictionary and pass its content as arguments to the Material class constructor
    material_objects.append(material_object)

print(material_objects[34].name) #  We can access attributes of any of the objects
material_objects[:5]  # Or just output the whole list, where each Material object will be represented using our very own __repr__ method! 

Cement mortar


[(1)Material|20 MPa concrete mix (30% FA)|Minerals|Concrete|m³,
 (2)Material|20 MPa concrete mix (30% GGBFS)|Minerals|Concrete|m³,
 (3)Material|20 MPa concrete mix|Minerals|Concrete|m³,
 (4)Material|25 MPa concrete mix (30% FA)|Minerals|Concrete|m³,
 (5)Material|25 MPa concrete mix (30% GGBFS)|Minerals|Concrete|m³]

### Writing conventions:
Although you are free to name your functions, variables, methods and classes however you like, there are a couple of good practices/conventions that are usually respected. 
These writing conventions sometimes differ from one programming language to another, but here are the common rules for Python:
- class names should be written in PascalCase: all words joined together, with the first letter of each word capitalized
- variables, functions and methods should be written in snake_case: all lowercase, with words separated by _
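
A short sketch of these conventions in practice; all names below are hypothetical and chosen only to show the two styles.

```python
# Hypothetical names, chosen only to illustrate the naming conventions.
class ShoppingCart:                  # class name: PascalCase
    def __init__(self):
        self.item_names = []         # attribute: snake_case

    def add_item(self, item_name):   # method and argument: snake_case
        self.item_names.append(item_name)


def count_items(shopping_cart):      # function: snake_case
    return len(shopping_cart.item_names)


my_cart = ShoppingCart()             # variable: snake_case
my_cart.add_item("eggs")
print(count_items(my_cart))          # 1
```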