<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/polyhedron-gdl/introduction-to-python/blob/main/notebooks/pragmatic_introduction_to_python_language_3.ipynb">
        <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

# Writing Structured Programs

By now you will have a sense of the capabilities of the Python programming language for processing natural language. However, if you're new to Python or to programming, you may still be wrestling with Python and not feel like you are in full control yet. In this chapter we'll address the following questions:

- How can you write well-structured, readable programs that you and others will be able to re-use easily?
- How do the fundamental building blocks work, such as loops, functions and assignment?
- What are some of the pitfalls with Python programming and how can you avoid them?

Along the way, you will consolidate your knowledge of fundamental programming constructs, learn more about using features of the Python language in a natural and concise way, and learn some useful techniques in visualizing natural language data. 

## Back to the Basics

### Assignment

Assignment would seem to be the most elementary programming concept, not deserving a separate discussion. However, there are some surprising subtleties here. Consider the following code fragment:

In [51]:
a = 'Monty'
b = a 
a = 'Python'
print(b)

Monty


This behaves exactly as expected. When we write `b = a` in the above code, the value of `a` (the string `'Monty'`) is assigned to `b`. That is, **`b` is a copy of `a`**, so when we overwrite `a` with a new string `'Python'`, the value of `b` is not affected.

However, assignment statements do not always involve making copies in this way. Assignment always copies the value of an expression, but a value is not always what you might expect it to be. In particular, the "value" of a structured object such as a list is actually just a **reference** to the object. In the following example, the statement `b = a` assigns the reference held by `a` to the new variable `b`. When we then modify an item inside `a` (the statement `a[1] = 'Jupyter'`), we can see that the contents of `b` have also changed.

In [23]:
a = ['Monty', 'Python']
b = a
print("Now we print b list...")
print(b)
a[1] = 'Jupyter'
print("Now we print again b list...")
print(b)

Now we print b list...
['Monty', 'Python']
Now we print again b list...
['Monty', 'Jupyter']


<!--
![list_assignment_and_memory.png](attachment:list_assignment_and_memory.png)
-->

In [24]:
empty=[]
nested = [empty, empty, empty]
nested

[[], [], []]

In [25]:
nested[0].append('Python')
nested

[['Python'], ['Python'], ['Python']]

In [26]:
nested[1] = 'Monty'
nested

[['Python'], 'Monty', ['Python']]

In [27]:
nested[0].append('Jupyter')
nested

[['Python', 'Jupyter'], 'Monty', ['Python', 'Jupyter']]

### Equality

Python provides two ways to check that a pair of items are the same: the `==` operator tests whether two objects have equal values, while the `is` operator tests for object identity. We can use `is` to verify our earlier observations about objects. First we create a list containing several copies of the same object, and demonstrate that they are not only identical according to `==`, but also that they are one and the same object:

In [28]:
size     = 5
x        = ['Python']
myList   = [x] * size
print(myList)
# check if values are the same...
print(myList[0] == myList[1] == myList[2] == myList[3] == myList[4])
# check if objects are the same...
print(myList[0] is myList[1] is myList[2] is myList[3] is myList[4])

[['Python'], ['Python'], ['Python'], ['Python'], ['Python']]
True
True


In [29]:
myList[1] = ['Python']
print(myList)
# check if values are the same...
print(myList[0] == myList[1] == myList[2] == myList[3] == myList[4])
# check if objects are the same...
print(myList[0] is myList[1] is myList[2] is myList[3] is myList[4])

[['Python'], ['Python'], ['Python'], ['Python'], ['Python']]
True
False


In [30]:
id_list = [id(x) for x in myList]
print(id_list)

[2446720697096, 2446720983368, 2446720697096, 2446720697096, 2446720697096]


## Functions: The Foundation of Structured Programming

Functions provide an effective way to package and re-use program code, as explained earlier. They also help make it reliable. When we re-use code that has already been developed and tested, we can be more confident that it handles a variety of cases correctly. We also remove the risk that we forget some important step, or introduce a bug. The program that calls our function also has increased reliability. The author of that program is dealing with a shorter program, and its components behave transparently.

To summarize, as its name suggests, a function captures functionality. It is a segment of code that can be given a meaningful name and which performs a well-defined task. Functions allow us to abstract away from the details, to see a bigger picture, and to program more effectively.

### Divide-and-Conquer Algorithm Pattern

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.
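As one illustration of this pattern, the sketch below uses merge sort, which is only one of many divide-and-conquer algorithms and was chosen here for brevity. The function name `merge_sort` is our own and is not used elsewhere in this chapter.

```python
def merge_sort(items):
    """Return a new sorted list using divide and conquer (merge sort)."""
    # Base case: a list of zero or one items is simple enough to solve directly.
    if len(items) <= 1:
        return list(items)
    # Divide: split the problem into two halves and solve each recursively.
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    # Combine: merge the two sorted halves into a single sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two slices is non-empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Each recursive call works on a smaller list, so the recursion always reaches the base case.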

### Function Inputs and Outputs

We pass information to functions using a function's parameters, the parenthesized list of variables and constants following the function's name in the function definition. Here's a complete example:

In [33]:
def repeat(msg, num):  
    return ' '.join([msg] * num)

monty = 'Monty Python'
repeat(monty, 3) 

'Monty Python Monty Python Monty Python'

- We first define the function to take two parameters, `msg` and `num`; 
- Then we call the function and pass it two arguments, `monty` and `3`; 
- these arguments fill the "placeholders" provided by the parameters and provide values for the occurrences of `msg` and `num` in the function body.

It is not necessary to have any parameters. A function usually communicates its results back to the calling program via the `return` statement, as we have just seen.

### Parameter Passing

As we have seen, assignment works on values, but the value of a structured object is a reference to that object. The same is true for functions. Python interprets function parameters as values (this is known as call-by-value). In the following code, `set_up()` has two parameters, both of which are modified inside the function. We begin by assigning the string `'PUT'` to `w` and a two-item list to `p`. After calling the function, `w` is unchanged, while `p` is changed:

In [34]:
def set_up(option_type, properties):
    option_type = 'CALL'
    properties.append('strike')
    properties = 5
    
w = 'PUT'
p = ['spot', 'volatility']

set_up(w, p)

print(w)
print(p)

PUT
['spot', 'volatility', 'strike']


Notice that `w` was not changed by the function. When we called `set_up(w, p)`, the value of `w` (the string `'PUT'`) was assigned to a new local variable called `option_type`. Inside the function, `option_type` was given a new value, `'CALL'`. However, that change did not propagate to `w`.

Let's look at what happened with the list `p`. When we called `set_up(w, p)`, the value of `p` (a reference to the list `['spot', 'volatility']`) was assigned to a new local variable `properties`, so both variables now reference the same memory location. The function modifies the list through `properties` (by appending `'strike'`), and this change is also reflected in the value of `p`, as we saw. The function also assigned a new value to `properties` (the number 5); this did not modify the contents at that memory location, but only made the local name `properties` refer to a different object.

Thus, to understand Python's call-by-value parameter passing, it is enough to understand how assignment works. Remember that you can use the id() function and is operator to check your understanding of object identity after each statement.
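The suggestion to check identities with `id()` can be tried on a variant of `set_up()`. This is a minimal sketch: the extra variable `id_on_entry` and the return value are our own additions for illustration and are not part of the function shown earlier.

```python
def set_up(option_type, properties):
    id_on_entry = id(properties)     # identity of the object that was passed in
    option_type = 'CALL'             # rebinds the local name only
    properties.append('strike')      # mutates the list shared with the caller
    properties = 5                   # rebinds the local name only
    return id_on_entry

w = 'PUT'
p = ['spot', 'volatility']

entry_id = set_up(w, p)
print(entry_id == id(p))  # True: the parameter referred to the caller's list
print(w)                  # PUT
print(p)                  # ['spot', 'volatility', 'strike']
```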

### Variable Scope

Function definitions create a new, local scope for variables. When you assign to a new variable inside the body of a function, the name is only defined within that function. The name is not visible outside the function, or in other functions. This behavior means you can choose variable names without being concerned about collisions with names used in your other function definitions.

When you refer to an existing name from within the body of a function, the Python interpreter first tries to resolve the name with respect to the names that are local to the function. If nothing is found, the interpreter checks if it is a global name within the module. Finally, if that does not succeed, the interpreter checks if the name is a Python built-in. This is the so-called LGB rule of name resolution: local, then global, then built-in.
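The lookup order described above can be observed with a small sketch. The names `label`, `show_local`, `show_global` and `show_builtin` are hypothetical and chosen only for this illustration.

```python
label = 'global label'          # a global (module-level) name

def show_local():
    label = 'local label'       # assignment creates a new local name
    return label                # resolved locally

def show_global():
    return label                # no local 'label', so the global one is found

def show_builtin():
    return len('abc')           # 'len' is neither local nor global: it is a built-in

print(show_local())    # local label
print(show_global())   # global label
print(show_builtin())  # 3
print(label)           # global label (not affected by show_local)
```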

### Checking Parameter Types

Python does not require us to declare the type of a variable when we write a program (type hints are optional and are not enforced at run time), and this permits us to define functions that are flexible about the type of their arguments. For example, a tagger might expect a sequence of words, but it wouldn't care whether this sequence is expressed as a list or a tuple (or an iterator, another sequence type that is outside the scope of the current discussion).

However, often we want to write programs for later use by others, and want to program in a defensive style, providing useful warnings when functions have not been invoked correctly. The author of the following tag() function assumed that its argument would always be a string.

In [35]:
def tag(word):
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'

tag('the')                                 # 'det'
tag('knight')                              # 'noun'
tag(["nothing", 'but', 'a', 'scratch'])    # 'noun'

'noun'

The function returns sensible values for the arguments 'the' and 'knight', but look what happens when it is passed a list — it fails to complain, even though the result which it returns is clearly incorrect. The author of this function could take some extra steps to ensure that the word parameter of the `tag()` function is a string. A naive approach would be to check the type of the argument using `if not type(word) is str`, and if word is not a string, to simply return Python's special empty value, `None`. This is a slight improvement, because the function is checking the type of the argument, and trying to return a "special", diagnostic value for the wrong input. However, it is also dangerous because the calling program may not detect that `None` is intended as a "special" value, and this diagnostic return value may then be propagated to other parts of the program with unpredictable consequences. Here's a better solution, using an `assert` statement.

In [36]:
def tag(word):
    assert isinstance(word, str), "argument to tag() must be a string"
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'

In [37]:
tag('the')

'det'

In [38]:
tag(['a','the','all'])        

AssertionError: argument to tag() must be a string

If the assert statement fails, it will produce an error that cannot be ignored, since it halts program execution. Additionally, the error message is easy to interpret. Adding assertions to a program helps you find logical errors, and is a kind of **defensive programming**. 

A more fundamental approach is to document the parameters to each function using docstrings as described later in this section.

### Functional Decomposition
Well-structured programs usually make extensive use of functions. When a block of program code grows longer than 10-20 lines, it is a great help to readability if the code is broken up into one or more functions, each one having a clear purpose. This is analogous to the way a good essay is divided into paragraphs, each expressing one main idea.

Functions provide an important kind of abstraction. They allow us to group multiple actions into a single, complex action, and associate a name with it.

Appropriate use of functions makes programs more readable and maintainable. Additionally, it becomes possible to reimplement a function — replacing the function's body with more efficient code — without having to be concerned with the rest of the program.



In [39]:
import nltk

from urllib import request
from bs4 import BeautifulSoup

def freq_words(url, freqdist, n):
    html = request.urlopen(url).read().decode('utf8')
    raw = BeautifulSoup(html, 'html.parser').get_text()
    for word in nltk.word_tokenize(raw):
        freqdist[word.lower()] += 1
    result = []
    for word, count in freqdist.most_common(n):
        result = result + [word]
    print(result)

In [40]:
book = "https://www.gutenberg.org/files/84/84-h/84-h.htm"
fd = nltk.FreqDist()
freq_words(book, fd, 30)

[',', 'the', 'and', 'i', '.', 'of', 'to', 'my', 'a', 'in', 'that', 'was', ';', 'me', 'with', 'but', 'had', 'you', 'he', 'not', 'which', 'it', 'as', 'his', 'for', 'by', '“', 'on', 'this', 'from']


This function has a number of problems. The function has two side-effects: it modifies the contents of its second parameter, and it prints a selection of the results it has computed. The function would be easier to understand and to reuse elsewhere if we initialize the `FreqDist()` object inside the function (in the same place it is populated), and if we moved the selection and display of results to the calling program. Given that its task is to identify frequent words, it should probably just return a list, not the whole frequency distribution.

In [41]:
def freq_words(url, n):
    html = request.urlopen(url).read().decode('utf8')
    text = BeautifulSoup(html, 'html.parser').get_text()
    freqdist = nltk.FreqDist(word.lower() for word in nltk.word_tokenize(text))
    return [word for (word, _) in freqdist.most_common(n)]

In [42]:
freq_words(book, 5)

[',', 'the', 'and', 'i', '.']

### Documenting Functions

In [43]:
def accuracy(reference, test):
    """
    Calculate the fraction of test items that equal the corresponding reference items.

    Given a list of reference values and a corresponding list of test values,
    return the fraction of corresponding values that are equal.
    In particular, return the fraction of indexes
    {0<=i<len(test)} such that C{test[i] == reference[i]}.

        >>> accuracy(['ADJ', 'N', 'V', 'N'], ['N', 'N', 'V', 'ADJ'])
        0.5

    :param reference: An ordered list of reference values
    :type reference: list
    :param test: A list of values to compare against the corresponding
        reference values
    :type test: list
    :return: the accuracy score
    :rtype: float
    :raises ValueError: If reference and test do not have the same length
    """

    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    num_correct = 0
    for x, y in zip(reference, test):
        if x == y:
            num_correct += 1
    return num_correct / len(reference)

### Functions as Arguments

So far the arguments we have passed into functions have been simple objects like strings, or structured objects like lists. Python also lets us pass a function as an argument to another function. Now we can abstract out the operation, and apply a different operation on the same data. As the following examples show, we can pass the built-in function `len()` or a user-defined function `last_letter()` as arguments to another function:

In [44]:
sent = "A beginning is the time for taking the most delicate care that the balances are correct."
sent = sent.split()

In [45]:
def extract_property(prop):
     return [prop(word) for word in sent]

In [46]:
extract_property(len)

[1, 9, 2, 3, 4, 3, 6, 3, 4, 8, 4, 4, 3, 8, 3, 8]

In [47]:
def last_letter(word):
    return word[-1]

In [48]:
for c in extract_property(last_letter):
    print(c, end = " ")

A g s e e r g e t e e t e s e . 

### Lambda Expressions

Python provides us with one more way to define functions as arguments to other functions, so-called lambda expressions. Suppose there was no need to use the above `last_letter()` function in multiple places, and thus no need to give it a name. We can then equivalently write the following:

In [49]:
for c in extract_property(lambda w: w[-1]):
    print(c, end = " ")

A g s e e r g e t e e t e s e . 

Lambda expressions in Python and other programming languages have their roots in lambda calculus, a model of computation invented by Alonzo Church. If you want to know more about this topic, read this [blog](https://realpython.com/python-lambda/).

In the example above, the expression is composed of:

- The keyword: lambda
- A bound variable: w
- A body: w[-1]

You can apply the function written in this way to an argument by surrounding the function and its argument with parentheses. For example:

In [50]:
(lambda x: x + 1)(2)

3

Because a lambda function is an expression, it can be named. Therefore you could write the previous code as follows:

In [51]:
add_one = lambda x: x + 1
add_one(2)

3

These functions all take a single argument. You may have noticed that, in the definition of the lambdas, the arguments don’t have parentheses around them. Multi-argument functions (functions that take more than one argument) are expressed in Python lambdas by listing arguments and separating them with a comma (,) but without surrounding them with parentheses:

In [52]:
point_in_space = lambda first, second: 'the x coordinate is : %s and the y coordinate is %s' % (first, second)

In [53]:
point_in_space(3,2)

'the x coordinate is : 3 and the y coordinate is 2'

The lambda function assigned to `point_in_space` takes two arguments and returns a string interpolating the two parameters `first` and `second`. As expected, the definition of the lambda lists the arguments with no parentheses, whereas calling the function is done exactly like a normal Python function, with parentheses surrounding the arguments.

### Accumulative Functions and Generators

These functions start by initializing some storage, and iterate over input to build it up, before returning some final object (a large structure or aggregated result). A standard way to do this is to initialize an empty list, accumulate the material, then return the list, as shown in the function `search1()`:

In [63]:
def search1(substring, words):
    result = []
    for word in words:
        if substring in word:
            result.append(word)
    return result

In [67]:
for item in search1('zz', nltk.corpus.brown.words()):
    print(item, end=" ")

Grizzlies' fizzled Rizzuto huzzahs dazzler jazz Pezza Pezza Pezza embezzling embezzlement pizza jazz Ozzie nozzle drizzly puzzle puzzle dazzling Sizzling guzzle puzzles dazzling jazz jazz Jazz jazz Jazz jazz jazz Jazz jazz jazz jazz Jazz jazz dizzy jazz Jazz puzzler jazz jazzmen jazz jazz Jazz Jazz Jazz jazz Jazz jazz jazz jazz Jazz jazz jazz jazz jazz jazz jazz jazz jazz jazz Jazz Jazz jazz jazz nozzles nozzle puzzle buzz puzzle blizzard blizzard sizzling puzzled puzzle puzzle muzzle muzzle muezzin blizzard Neo-Jazz jazz muzzle piazzas puzzles puzzles embezzle buzzed snazzy buzzes puzzled puzzled muzzle whizzing jazz Belshazzar Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie's Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie blizzard blizzards blizzard blizzard fuzzy Lazzeri Piazza piazza palazzi Piazza Piazza Palazzo Palazzo Palazzo Piazza Piazza Palazzo palazzo palazzo Palazzo Palazzo Piazza piazza piazza piazza Piazza Piazza Palazzo palazzo Piazza piazz

In [68]:
def search2(substring, words):
    for word in words:
        if substring in word:
            yield word

In [69]:
for item in search2('zz', nltk.corpus.brown.words()):
    print(item, end=" ")

Grizzlies' fizzled Rizzuto huzzahs dazzler jazz Pezza Pezza Pezza embezzling embezzlement pizza jazz Ozzie nozzle drizzly puzzle puzzle dazzling Sizzling guzzle puzzles dazzling jazz jazz Jazz jazz Jazz jazz jazz Jazz jazz jazz jazz Jazz jazz dizzy jazz Jazz puzzler jazz jazzmen jazz jazz Jazz Jazz Jazz jazz Jazz jazz jazz jazz Jazz jazz jazz jazz jazz jazz jazz jazz jazz jazz Jazz Jazz jazz jazz nozzles nozzle puzzle buzz puzzle blizzard blizzard sizzling puzzled puzzle puzzle muzzle muzzle muezzin blizzard Neo-Jazz jazz muzzle piazzas puzzles puzzles embezzle buzzed snazzy buzzes puzzled puzzled muzzle whizzing jazz Belshazzar Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie's Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie blizzard blizzards blizzard blizzard fuzzy Lazzeri Piazza piazza palazzi Piazza Piazza Palazzo Palazzo Palazzo Piazza Piazza Palazzo palazzo palazzo Palazzo Palazzo Piazza piazza piazza piazza Piazza Piazza Palazzo palazzo Piazza piazz

The function `search2()` is a **generator**. The first time this function is called, it gets as far as the `yield` statement and pauses. The calling program gets the first word and does any necessary processing. Once the calling program is ready for another word, execution of the function is continued from where it stopped, until the next time it encounters a `yield` statement. This approach is typically more efficient, as the function only generates the data as it is required by the calling program, and does not need to allocate additional memory to store the output.  
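The pause-and-resume behavior can be seen by pulling values one at a time with the built-in `next()`. The sketch below repeats `search2()` so that it is self-contained, and uses a small hypothetical word list instead of the Brown corpus.

```python
def search2(substring, words):
    for word in words:
        if substring in word:
            yield word

words = ['jazz', 'blues', 'puzzle', 'rock', 'pizza']  # hypothetical sample data

matches = search2('zz', words)  # creates the generator; no word has been examined yet
first = next(matches)           # runs until the first yield
second = next(matches)          # resumes after the first yield, stops at the second
rest = list(matches)            # consumes whatever is left

print(first, second, rest)      # jazz puzzle ['pizza']
```

Only the words requested so far are ever produced, which is why no intermediate result list is needed.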

## Error Handling

### Defensive Programming

In order to avoid some of the pain of debugging, it helps to adopt some defensive programming habits. Instead of writing a 20-line program then testing it, build the program bottom-up out of small pieces that are known to work. Each time you combine these pieces to make a larger unit, test it carefully to see that it works as expected. Consider adding `assert` statements to your code, specifying properties of a variable, e.g. `assert isinstance(text, list)`. If the value of the `text` variable later becomes a string when your code is used in some larger context, this will raise an `AssertionError` and you will get immediate notification of the problem.

Once you think you've found the bug, view your solution as a hypothesis. Try to predict the effect of your bugfix before re-running the program. If the bug isn't fixed, don't fall into the trap of blindly changing the code in the hope that it will magically start working again. Instead, for each change, try to articulate a hypothesis about what is wrong and why the change will fix the problem. Then undo the change if the problem was not resolved.

As you develop your program, extend its functionality, and fix any bugs, it helps to maintain a suite of test cases. This is called regression testing, since it is meant to detect situations where the code "regresses" — where a change to the code has an unintended side-effect of breaking something that used to work. Python provides a simple regression testing framework in the form of the doctest module. This module searches a file of code or documentation for blocks of text that look like an interactive Python session, of the form you have already seen many times in this book. It executes the Python commands it finds, and tests that their output matches the output supplied in the original file. Whenever there is a mismatch, it reports the expected and actual values. For details consult the doctest documentation at http://docs.python.org/library/doctest.html. Apart from its value for regression testing, the doctest module is useful for ensuring that your software documentation stays in sync with your code.
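As a minimal sketch of how `doctest` checks an example embedded in a docstring, the code below runs the examples of a single function. For a whole module, the usual entry point is `doctest.testmod()`; the explicit parser and runner are used here only so that the snippet behaves the same in a notebook cell. The function `last_letter()` is the one defined earlier in this chapter, with a docstring added for illustration.

```python
import doctest

def last_letter(word):
    """
    Return the final character of a word.

    >>> last_letter('Python')
    'n'
    >>> last_letter('jazz')
    'z'
    """
    return word[-1]

# Extract the interactive examples from the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(last_letter.__doc__,
                          {'last_letter': last_letter},  # names visible to the examples
                          'last_letter', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)  # 0 2
```

If the function body were changed so that an example no longer produced the documented output, the runner would report the expected and actual values and `results.failed` would be non-zero.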

Perhaps the most important defensive programming strategy is to set out your code clearly, choose meaningful variable and function names, and simplify the code wherever possible by decomposing it into functions and modules with well-documented interfaces.

### Catch Runtime Errors

First of all let's describe the difference between syntax errors and exceptions. Consider the following line of code:

In [75]:
print(0/0))

SyntaxError: invalid syntax (<ipython-input-75-ded43ae9deb7>, line 1)

The full error message includes an arrow that indicates where the parser ran into the syntax error. In this example, there was one bracket too many. Remove it and run your code again:

In [76]:
print(0/0)

ZeroDivisionError: division by zero

This time, you ran into an exception. This type of error occurs whenever syntactically correct Python code results in an error at run time. The last line of the message indicates what type of exception you ran into.

Rather than reporting a generic error, Python details what type of exception was encountered. In this case, it was a `ZeroDivisionError`. Python comes with various built-in exceptions and also lets you define your own.

In [74]:
x = 2
assert(x > 0), "The value of x must be greater than 0!"
print(5/x)

2.5


### The try and except Block: Handling Exceptions

The try and except block in Python is used to catch and handle exceptions. Python executes code following the try statement as a “normal” part of the program. The code that follows the except statement is the program’s response to any exceptions in the preceding try clause.

![image.png](exception_1.png)

As you saw earlier, when syntactically correct code runs into an error, Python will throw an exception error. This exception error will crash the program if it is unhandled. The except clause determines how your program responds to exceptions.

The following function can help you understand the try and except block:

In [1]:
def divide_by_n(m, n):
    assert(n != 0), 'n must be non 0!'
    return m/n

try:
    n = 0
    r = divide_by_n(3, n)
    print(r)
except:
    print('something went wrong with the calculation')

something went wrong with the calculation


The good thing here is that the program did not crash. However, the message says nothing about which exception occurred. To see exactly what went wrong, catch the specific error that the function raised and print it:

In [2]:
def divide_by_n(m, n):
    assert(n != 0), 'divide by 0!'
    return m/n

try:
    n = 0
    r = divide_by_n(3, n)
    print(r)
except AssertionError as error:
    print('Something went wrong with the calculation. The function returned the following error:')
    print(error)


Something went wrong with the calculation. The function returned the following error:
divide by 0!


Here’s another example where you open a file and use a built-in exception:

In [3]:
try:
    with open('file.log') as file:
        read_data = file.read()
except:
    print('Could not open file.log')

Could not open file.log


The message is informative, and the program continues to run. However, a bare `except` catches every exception, not only a missing file. The Python documentation lists many built-in exceptions that you can use here. One of them, `FileNotFoundError`, is raised when a file or directory is requested but does not exist. To catch this type of exception and print it to the screen, you could use the following code:

In [4]:
try:
    with open('file.log') as file:
        read_data = file.read()
except FileNotFoundError as fnf_error:
    print(fnf_error)

[Errno 2] No such file or directory: 'file.log'


You can have more than one function call in your `try` clause and anticipate catching various exceptions. A thing to note here is that the code in the `try` clause will stop as soon as an exception is encountered. Look at the following code. Here, you first call the `divide_by_n()` function and then try to open a file:

In [5]:
try:
    n = 1
    r = divide_by_n(3, n)
    print(r)
    with open('file.log') as file:
        read_data = file.read()
except FileNotFoundError as fnf_error:
    print(fnf_error)
except AssertionError as error:
    print('Something went wrong with the calculation. The function returned the following error:')
    print(error)


3.0
[Errno 2] No such file or directory: 'file.log'
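Besides the built-in exceptions used so far, you can define your own exception type by subclassing an existing one and then catch it by name in the same way. The sketch below is only an illustration: the class `NegativePriceError` and the function `check_price()` are hypothetical names that do not appear elsewhere in this notebook.

```python
class NegativePriceError(ValueError):
    """Hypothetical user-defined exception for an invalid price."""

def check_price(price):
    # Raise our own exception type instead of relying on an assert.
    if price < 0:
        raise NegativePriceError(f"price must be non-negative, got {price}")
    return price

try:
    check_price(-1.5)
except NegativePriceError as error:
    message = str(error)
    print(type(error).__name__)  # NegativePriceError
    print(message)               # price must be non-negative, got -1.5
```

Because `NegativePriceError` derives from `ValueError`, a handler written as `except ValueError` would catch it as well.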


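Imagine that you always had to implement some sort of action to clean up after executing your code. Python enables you to do so using the `finally` clause, which runs whether or not an exception occurred. An optional `else` clause runs only when the `try` clause raised no exception. In the following cell the name `undefined_name` has never been assigned, so the comparison raises a `NameError`, the `else` clause is skipped, and the `finally` clause still runs:

In [6]:
try:
  undefined_name > 3  # NameError: this name was never defined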
except:
  print("Something went wrong")
else:
  print("Nothing went wrong")
finally:
  print("The try...except block is finished")

Something went wrong
The try...except block is finished
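To see both paths side by side, the following sketch records which clauses run for a successful and for a failing division. The helper `safe_divide()` and the list `events` are hypothetical names introduced only for this illustration.

```python
def safe_divide(m, n):
    events = []                   # records which clauses were executed
    try:
        result = m / n
    except ZeroDivisionError:
        events.append('except')   # runs only when the division fails
        result = None
    else:
        events.append('else')     # runs only when no exception was raised
    finally:
        events.append('finally')  # runs in both cases
    return result, events

print(safe_divide(3, 2))  # (1.5, ['else', 'finally'])
print(safe_divide(3, 0))  # (None, ['except', 'finally'])
```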


## A Very Short Introduction to Object Oriented Programming

Object Oriented Programming (OOP for short) is a particular way of programming that focuses on which part of a program is responsible for each task. The idea behind object-oriented programming is that a computer program is composed of a collection of individual units, or objects, as opposed to a traditional view in which a program is a list of instructions to the computer. Each object is capable of receiving messages, processing data, and sending messages to other objects, and should be responsible only for a particular task. To get a first idea of what an object is, you can think of it as data and functionality packaged together to form a single, well-identified unit of code (examples will soon follow).

What is peculiar in this approach is that special attention is given to creating the appropriate objects, as opposed to focusing solely on solving the problem. For this reason OOP is often called a paradigm rather than a style or type of programming, to emphasize that OOP can change the way software is developed by changing the way that programmers think about it. A programming paradigm provides (and determines) the view that the programmer has of the execution of the program. On one hand, for instance, in functional programming a program can be thought of as a simple sequence of function evaluations. On the other hand, as we have already pointed out, in object-oriented programming programmers can think of a program as a collection of interacting objects. Therefore the paradigm of OOP is essentially a paradigm of design. The challenge in OOP is therefore to design a well-defined object system.

The best way to explain the methodological approach we are talking about is to give a practical example.

### A Practical Example: the Dice Class

Our first class will be used to create a basic game of dice. The `Dice` class will have two properties, `numberOfSides` and `value`, and one method, `throw()`, that will update the dice value by generating a random number between 1 and `numberOfSides` (e.g. between 1 and 6 for a standard 6-sided dice).

![image.png](oop_1.png)

In [7]:
#Dice Class 
import random

class Dice():
  #Constructor for the Dice class
  def __init__(self,numberOfSides=6):
    self.__numberOfSides = numberOfSides
    self.value = 1

  def throw(self):
    self.value = random.randint(1,self.__numberOfSides)
    return self.value

In [8]:
dice1 = Dice(6)
dice1.throw()
print("First Dice:")
print(dice1.value)

dice2 = Dice(6)
dice2.throw()
print("Second Dice:")
print(dice2.value)

if dice1.value>dice2.value:
  print("First dice wins!")
elif dice1.value<dice2.value:
  print("Second dice wins!")
else:
  print("It's a draw!")

First Dice:
4
Second Dice:
3
First dice wins!


As you can see from the above example, object-oriented programming is a programming paradigm that provides a means of structuring programs so that **properties and behaviors are bundled into individual objects**.

For instance, an object could represent a person with **properties** like a *name*, *age*, and *address* and **behaviors** such as *walking*, *talking*, *breathing*, and *running*. Or it could represent an email with **properties** like a *recipient list*, *subject*, and *body* and **behaviors** like *adding attachments* and *sending*.

Put another way, object-oriented programming is an approach for modeling concrete, real-world things, like cars, as well as relations between things, like companies and employees, students and teachers, and so on. OOP models real-world entities as software objects that have some data associated with them and can perform certain functions.

### Define a Class in Python

As we have seen, primitive data structures—like numbers, strings, and lists—are designed to represent simple pieces of information, such as the cost of an apple or the title of a book. What if you want to represent something more complex?

For example, let’s say you want to track employees in an organization. You need to store some basic information about each employee, such as their name, age, position, and the year they started working.

One way to do this is to represent each employee as a list:

In [None]:
kirk  = ["James Kirk", 34, "Captain", 2265]
spock = ["Spock", 35, "Science Officer", 2254]
mccoy = ["Leonard McCoy", "Chief Medical Officer", 2266]

There are a number of issues with this approach.

- First, it can make larger code files more difficult to manage. If you reference kirk[0] several lines away from where the kirk list is declared, will you remember that the element with index 0 is the employee’s name?

- Second, it can introduce errors if not every employee has the same number of elements in the list. In the mccoy list above, the age is missing, so mccoy[1] will return "Chief Medical Officer" instead of Dr. McCoy’s age.
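
The second issue is easy to reproduce. In the short sketch below, the constant `AGE` (a name introduced only for this illustration) records the position where we expect to find the age. That assumption holds for `kirk` but not for `mccoy`:

```python
kirk  = ["James Kirk", 34, "Captain", 2265]
mccoy = ["Leonard McCoy", "Chief Medical Officer", 2266]  # the age is missing

AGE = 1  # position where we expect to find the age

print(kirk[AGE])                    # 34
print(mccoy[AGE])                   # Chief Medical Officer
print(isinstance(mccoy[AGE], int))  # False
```

Python raises no error here: the list simply returns whatever happens to sit at index 1, so the mistake can go unnoticed until much later in the program.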

A great way to make this type of code more manageable and maintainable is to use classes. Classes are used to create user-defined data structures. They define functions called methods, which identify the behaviors and actions that an object created from the class can perform with its data.

A class is a blueprint for how something should be defined. It doesn’t actually contain any data. For example, a `Dog` class can specify that a name and an age are necessary for defining a dog, but it doesn’t contain the name or age of any specific dog. While the class is the blueprint, an instance is an object that is built from a class and contains real data. An instance of the `Dog` class is not a blueprint anymore. It’s an actual dog with a name, like Miles, who’s four years old.

All class definitions start with the class keyword, which is followed by the name of the class and a colon. Any code that is indented below the class definition is considered part of the class’s body.
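
As a minimal sketch of the difference between a blueprint and an instance, here is a hypothetical `Dog` class (it is not used elsewhere in this chapter) together with one instance built from it. The `__init__()` method it relies on is explained in detail further down.

```python
class Dog:
    # The class is only the blueprint: it says a dog has a name and an age
    def __init__(self, name, age):
        self.name = name
        self.age = age

# An instance holds real data
miles = Dog("Miles", 4)

print(miles.name)              # Miles
print(miles.age)               # 4
print(isinstance(miles, Dog))  # True
```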

In [42]:
'''
The "Stock" class has attributes "ticker", "open", "close", "volume" and 
"rate_return". 
Inside the class body, the first method is called __init__, which is a
special method. When we create a new instance of the class, 
the __init__ method is immediately executed with all the parameters that 
we pass to the "Stock" object. The purpose of this method is to set up a 
new "Stock" object using data we have provided.

'''
class Stock:
    def __init__(self, ticker, open, close, volume):
        self.ticker      = ticker
        self.open        = open
        self.close       = close
        self.volume      = volume
        self.rate_return = float(close)/open - 1

    def update(self, open, close):
        '''
        Update the open and close prices of a stock and
        recompute the rate of return. Inside a class,
        attributes and methods must be accessed as
        self.attribute or self.method(); a bare name is
        looked up as a local or global variable instead,
        which typically raises a NameError or silently
        uses the wrong value.
        '''
        self.open = open
        self.close = close
        self.rate_return = float(self.close)/self.open - 1

    def daily_return(self):
        return self.rate_return


The properties that all `Stock` objects must have are defined in a method called `__init__()`. Every time a new `Stock` object is created, `__init__()` sets the initial state of the object by assigning the values of the object’s properties. That is, `__init__()` initializes each new instance of the class.

You can give `__init__()` any number of parameters, but the first parameter always receives the instance itself and is, by convention, named `self`. When a new class instance is created, the instance is automatically passed to the `self` parameter in `__init__()` so that new attributes can be defined on the object.
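
To see what "automatically passed" means in practice, the sketch below uses a small hypothetical `Counter` class (introduced only for this illustration). Calling a method on an instance is equivalent to calling the function through the class and passing the instance explicitly as `self`:

```python
class Counter:
    def __init__(self, start=0):
        # self is the new instance being initialized
        self.count = start

    def increment(self):
        self.count += 1
        return self.count

c = Counter(10)

# The usual call: Python passes c as self for us
print(c.increment())         # 11

# The same call written out explicitly
print(Counter.increment(c))  # 12
```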

Notice that the `__init__()` method’s signature is indented four spaces and the body of the method is indented eight spaces. This indentation is vitally important: it tells Python that the `__init__()` method belongs to the `Stock` class.

In the body of `__init__()`, there are a few statements using the `self` variable. Attributes created in `__init__()` are called **instance attributes**. An instance attribute’s value **is specific to a particular instance of the class**. All `Stock` objects have a ticker, but the values for the `ticker` attribute will vary depending on the `Stock` instance.

On the other hand, **class attributes** are attributes that have the same value for all class instances. You can define a class attribute by assigning a value to a variable name in the class body, outside of `__init__()`.
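
The `Stock` class above has no class attributes, so the following sketch uses a reduced, hypothetical `CurrencyStock` class to illustrate the difference. The `currency` attribute is an assumption made for this example only.

```python
class CurrencyStock:
    # Class attribute: shared by every CurrencyStock instance
    currency = "USD"

    def __init__(self, ticker):
        # Instance attribute: specific to each instance
        self.ticker = ticker

s1 = CurrencyStock("AAPL")
s2 = CurrencyStock("GOOGL")

print(s1.currency, s2.currency)  # USD USD
print(s1.ticker, s2.ticker)      # AAPL GOOGL

# Changing the class attribute is visible through all instances
CurrencyStock.currency = "EUR"
print(s1.currency, s2.currency)  # EUR EUR
```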

### Instantiate an Object in Python

Creating a new object from a class is called instantiating an object. You can instantiate a new `Stock` object by typing the name of the class, followed by the arguments for `__init__()` in parentheses:


In [43]:
apple  = Stock('AAPL', 143.69, 144.09, 20109375)
google = Stock('GOOGL', 898.7, 911.7, 1561616)

![oop_3.png](attachment:oop_3.png)

**IMPORTANT:** creating two objects with identical arguments does not make them equal.

In [44]:
a = Stock('AAPL', 143.69, 144.09, 20109375)
b = Stock('AAPL', 143.69, 144.09, 20109375)
a == b

False

In [45]:
type(a)

__main__.Stock

In this code, you create two new `Stock` objects with the same properties and assign them to the variables `a` and `b`. When you compare `a` and `b` using the `==` operator, the result is `False`. Even though `a` and `b` are both instances of the same class, created with the same arguments, they are two distinct objects in memory, and by default `==` on instances of a user-defined class compares identity rather than contents.
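
If you want `==` to compare contents instead, a class can define the special method `__eq__()`. The following sketch uses a cut-down, hypothetical `ComparableStock` class (not part of the chapter's code) that treats two stocks as equal when their ticker and prices match:

```python
class ComparableStock:
    def __init__(self, ticker, open, close):
        self.ticker = ticker
        self.open = open
        self.close = close

    def __eq__(self, other):
        # Only compare with objects of the same class
        if not isinstance(other, ComparableStock):
            return NotImplemented
        return (self.ticker, self.open, self.close) == \
               (other.ticker, other.open, other.close)

x = ComparableStock('AAPL', 143.69, 144.09)
y = ComparableStock('AAPL', 143.69, 144.09)

print(x == y)  # True: same contents
print(x is y)  # False: still two distinct objects in memory
```

The `is` operator always tests identity, so it remains the way to ask whether two names refer to the very same object.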

In [46]:
apple.ticker

'AAPL'

In [47]:
google.daily_return()

0.014465338822744034

In [48]:
google.update(912.8,913.4)
google.daily_return()

0.0006573181419806673

We can check what names (i.e. attributes and methods) are defined on an object using the dir() function:
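
The cell below is a sketch of such a check. It uses a reduced, hypothetical `MiniStock` class so that it runs on its own, and it filters out the names surrounded by double underscores to keep the output short:

```python
class MiniStock:
    def __init__(self, ticker, open, close):
        self.ticker = ticker
        self.open = open
        self.close = close

    def daily_return(self):
        return self.close / self.open - 1

demo = MiniStock('AAPL', 143.69, 144.09)

# dir() returns a sorted list of names; skip the special __dunder__ names
public_names = [name for name in dir(demo) if not name.startswith('__')]
print(public_names)  # ['close', 'daily_return', 'open', 'ticker']
```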

### A Simple Example: Shopping Basket Class

One of the key features of any e-commerce website is the shopping basket. It is used to let the end-users add products to their basket before proceeding to checkout.

In this Python exercise we will create a Shopping Basket class to implement the different functionalities (behaviours) such as:

- Adding an item to the shopping basket,
- Removing an item from the shopping basket,
- Updating the desired quantity of an item from the shopping basket,
- Viewing/listing the content of the shopping basket,
- Calculating the total cost of the shopping basket,
- Emptying/Resetting the shopping basket,
- Checking if the shopping basket is empty.

To do so we will create the following two classes:

![image.png](attachment:image.png)

The `items` property of the `ShoppingBasket` class will be a dictionary of item:quantity pairs: each item is a key, and the quantity of that item in the basket is the value.
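
Using objects as dictionary keys works because instances of a plain class are hashable by default, with hashing and equality based on object identity. The sketch below repeats the `Item` constructor and shows the consequence: the same object always maps to the same entry, while a second object with identical fields is a separate key. The names `soup`, `sameFields` and `basket` are used only for this illustration.

```python
class Item:
  def __init__(self, name, description, price):
    self.name = name
    self.description = description
    self.price = price

soup = Item("Tomato Soup", "200mL can", 0.70)
sameFields = Item("Tomato Soup", "200mL can", 0.70)  # a different object

basket = {}
basket[soup] = 4
basket[soup] += 6       # same object, same key
basket[sameFields] = 1  # distinct object, new key

print(basket[soup])  # 10
print(len(basket))   # 2
```

This is why the basket code always reuses the same `Item` object (for example `tomatoSoup`) when adding more of a product.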



In [1]:
class Item:
  # Constructor
  def __init__(self,name,description,price):
    self.name = name
    self.description = description
    self.price = price

In [2]:
class ShoppingBasket:
  # Constructor
  def __init__(self):
    self.items = {} #A dictionary of all the items in the shopping basket: {item:quantity} 
    self.checkout = False
  
  # A method to add an item to the shopping basket  
  def addItem(self,item,quantity=1):
    if quantity > 0: 
      #Check if the item is already in the shopping basket
      if item in self.items:
        self.items[item] += quantity
      else: 
        self.items[item] = quantity
    else:
      print("Invalid operation - Quantity must be a positive number!")
      
  # A method to remove an item from the shopping basket (or reduce its quantity)  
  def removeItem(self,item,quantity=0):
    if quantity<=0: 
      #Remove the item
      self.items.pop(item, None)
    else:
      if item in self.items:
        if quantity<self.items[item]:
          #Reduce the required quantity for this item
          self.items[item] -= quantity
        else:
          #Remove the item
          self.items.pop(item, None)
          
  # A method to update the quantity of an item from the shopping basket  
  def updateItem(self,item,quantity):
    if quantity > 0: 
      self.items[item] = quantity
    else:
      self.removeItem(item)
  
  # A method to view/list the content of the basket.
  def view(self):
    totalCost = 0
    print("---------------------")
    for item in self.items:
      quantity = self.items[item]
      cost = quantity * item.price
      print(" + " + item.name + " - " + str(quantity) + " x EUR" + '{0:.2f}'.format(item.price) + \
            " = EUR" + '{0:.2f}'.format(cost))
      totalCost += cost
    print("---------------------")  
    print(" = EUR" + '{0:.2f}'.format(totalCost))
    print("---------------------")  
  
  # A method to calculate the total cost of the basket.
  def getTotalCost(self):
    totalCost = 0
    for item in self.items:
      quantity = self.items[item]
      cost = quantity * item.price
      totalCost += cost
    return totalCost
    
  # A method to empty the content of the basket
  def reset(self):
    self.items = {}
    
  # A method to return whether the basket is empty or not:
  def isEmpty(self):
    return len(self.items)==0
    
  

In [3]:
#Shopping Basket Class - www.101computing.net/shopping-basket-class/
#from Item import Item
#from ShoppingBasket import ShoppingBasket

tomatoSoup = Item("Tomato Soup","200mL can", 0.70)
spaghetti = Item("Spaghetti","500g pack", 1.10)
blackOlives = Item("Black Olives Jar","200g Jar", 2.10)
mozarella = Item("Mozarella","100g", 1.50)
gratedCheese = Item("Grated Cheese","100g",2.20)

In [4]:
myBasket = ShoppingBasket()

In [5]:
myBasket.addItem(tomatoSoup, 4)
myBasket.addItem(blackOlives, 1)
myBasket.addItem(mozarella, 2)
myBasket.addItem(tomatoSoup, 6)

In [6]:
myBasket.view()

---------------------
 + Tomato Soup - 10 x EUR0.70 = EUR7.00
 + Black Olives Jar - 1 x EUR2.10 = EUR2.10
 + Mozarella - 2 x EUR1.50 = EUR3.00
---------------------
 = EUR12.10
---------------------


# References & Credits

*Steven Bird, Ewan Klein and Edward Loper*, **Natural Language Processing with Python**, O'REILLY

*Said van de Klundert*, **[Python Exceptions: An Introduction](https://realpython.com/python-exceptions/)** 