#### Crash course for `pycodestyle`

We have completed the lesson modules for Basic Python and Advanced Python, and will be moving beyond introducing language features to thinking about how to structure, organise, and improve code.

In the industry, developers do not rely on people to catch syntax and other language feature errors. Tools exist for that, and I will introduce them on top of the lesson content from this point onwards.

In their own words:

> [`pycodestyle`](http://pycodestyle.pycqa.org/en/latest/intro.html) is a tool to check your Python code against some of the style conventions in PEP 8.

Such tools are known as **linters**. They help to pick out undesirable bits of code, kind of like picking lint off wool clothing.

The cell below loads `pycodestyle` into Jupyter Notebook and enables it in the subsequent cells:

In [None]:
# This code cell loads a PEP8 linter.
# Linting is the process of flagging programming errors,
# bugs, stylistic errors, and other code problems
# pycodestyle is a linter that highlights any syntax that
# is not PEP8-compliant.
# You only need to load it once in each notebook.

%load_ext pycodestyle_magic
%pycodestyle_on

When you run any subsequent code cells, it will flag lines that are not PEP 8-compliant. If you see any such warnings, please fix them. Doing so will greatly improve the readability of your code.
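As a rough illustration of what a style checker looks for, the sketch below writes the same function twice. The names `area` and `area_tidy` are made up for this example, and the comments describe what a PEP 8 checker would typically complain about, not the exact output of `pycodestyle`.

```python
# Untidy version: it runs, but it ignores several PEP 8 conventions.
def area(w,h):  # no space after the comma in the parameter list
    result=w*h  # no spaces around the assignment operator
    return result


# Tidy version: same behaviour, written the way PEP 8 recommends.
def area_tidy(width, height):
    result = width * height
    return result


print(area(2, 3), area_tidy(2, 3))
```

Both versions return the same answer; the linter only cares about how readable the code is.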

# Lesson 11a: Python Objects

We started with simple types, like `int`, `float`, `str`, and `bool`, which really just store a value and little else. Python made them really useful by giving them some methods (which other programming languages don’t implement in the same way), but they are otherwise just different representations of **data types**.

We saw that it quickly gets unwieldy to juggle so many variables, so we started to group them into **data structures** instead. These data structures: `list`, `dict`, `tuple`, and `set` let us work on more data with fewer variable names to juggle.

To make more complex programs, it makes sense to start to think about **objects** rather than data types and data structures. While other programming languages treat data types, data structures, and objects differently, Python implements them all as objects. For example, when you open a file for reading or writing, the file handle that you get is an object.

An object differs from a plain data type or data structure in a number of ways.

## Object attributes

Objects can have **attributes**:

In [9]:
f = open('anexistingfilename.txt','w')
f.name

'anexistingfilename.txt'

An attribute is a variable that is attached to an object. When you create a file handle `f`, it has an attribute `f.name`, which is a `str` representing the filename you opened. You can use `f.name` in almost any way you can use a `str`.
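To make that concrete, here is a small sketch of a file handle's `name` being used like any other `str`. It assumes a throwaway file created with the standard `tempfile` module; the names `folder`, `path`, and `handle` are mine, not part of the lesson.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')
    handle = open(path, 'w')

    # handle.name is an ordinary str, so str methods work on it.
    is_a_str = isinstance(handle.name, str)
    ends_with_txt = handle.name.endswith('.txt')
    shouting = os.path.basename(handle.name).upper()

    handle.close()

print(is_a_str, ends_with_txt, shouting)
```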

In [4]:
type(f)

_io.TextIOWrapper

The `f` file handle object is apparently a `_io.TextIOWrapper` type. We’ll see what that means shortly.

Objects can do a lot more than just bundle methods and attributes together. Within an object, you can also write your own code to make the object behave differently from a normal function or variable. For example:

    >>> f.name = 'anotherfilename.txt'
    AttributeError: attribute 'name' of '_io.TextIOWrapper' objects is not writable
    
Within itself, `f` has code that prevents us from overwriting its `name` attribute. This makes sense: `f` is a handle to the `'anexistingfilename.txt'` file (opened in `'w'` mode), and changing the `name` attribute would be misleading! If we could modify it, anyone else who later checks `f.name` would think it was a handle to `'anotherfilename.txt'`.
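You can see the same protection from inside a program by catching the error. This sketch assumes a throwaway file made with the standard `tempfile` module; the names `handle`, `original_name`, and `message` are only for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    handle = open(os.path.join(folder, 'notes.txt'), 'w')
    original_name = handle.name

    try:
        handle.name = 'anotherfilename.txt'
        message = 'name was changed'
    except AttributeError:
        # The object refused the assignment.
        message = 'name is read-only'

    name_unchanged = handle.name == original_name
    handle.close()

print(message)
```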

## Classes

Let’s create another file handle and see the similarities and differences with `f`.

In [10]:
g = open('yetanotherexistingfilename.txt','w')
g.name   

'yetanotherexistingfilename.txt'

`g` also has a `name` attribute, but its value is different.

In [3]:
type(g)    

_io.TextIOWrapper

It’s apparently the same “type” as `f`! Hmmm.

    >>> g.name = 'notanotherfilename.txt'
    AttributeError: attribute 'name' of '_io.TextIOWrapper' objects is not writable

So it looks like `f` and `g` are separate objects of the same type. In Python, we say that `f` and `g` are **instances** of the same **class**.
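A quick way to check this relationship in code is sketched below. It uses two throwaway files (via the standard `tempfile` module) and the hypothetical names `first` and `second`, so that it does not disturb `f` and `g`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    first = open(os.path.join(folder, 'first.txt'), 'w')
    second = open(os.path.join(folder, 'second.txt'), 'w')

    same_class = type(first) is type(second)   # both come from one class
    same_object = first is second              # but they are separate instances
    both_instances = isinstance(first, type(second))

    first.close()
    second.close()

print(same_class, same_object, both_instances)
```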

Let's clean up those two files instead of leaving them lying around.

In [11]:
f.close()
g.close()

import os
os.remove(f.name)
os.remove(g.name)

What is a class? Let’s illustrate with an example and some code.

## Introduction to vectors

If I am trying to describe points in a Cartesian grid with an x-coordinate and a y-coordinate, I could do this:

In [12]:
x0 = 0
y0 = 0

Two separate variables. What if I want another point?

In [13]:
x1 = 1
y1 = 1

What if I want the length of the line between the points (`x0`,`y0`) and (`x1`,`y1`)?

I will have to use the square root function, `sqrt()` from the `math` library:

In [14]:
import math
math.sqrt # This line is just to show you that sqrt() is a function of the math library

<function math.sqrt>

In [15]:
import math
x0_x1 = x1 - x0
y0_y1 = y1 - y0
math.sqrt(x0_x1**2 + y0_y1**2)

1.4142135623730951

I hope you can see that this is no way to be managing hundreds of points and lines, especially if you want to write your own image-editing software!

## (optional) What about using lists and dicts?

Okay, maybe we can make it simpler using lists? We could define a point `point0` as a pair of `x`,`y` values, and define a function `mod()` to get the length:

In [14]:
def mod(p1,p2):
    '''
    Returns the length of the straight line between points p1 and p2.
    '''
    import math

    if len(p1) != len(p2):
        raise ValueError(f'{p1} and {p2} do not have the same number of dimensions.')
    
    sum_of_squares = 0
    for i in range(len(p1)):
        sum_of_squares += (p2[i] - p1[i])**2
    return math.sqrt(sum_of_squares)

point0 = [0,0]
point1 = [1,1]
length_1_2 = mod(point0,point1)
print(length_1_2)

1.4142135623730951


And then how are you going to keep track of each point? Using another list, or a dict? Here’s a better way.

How do real-world programmers do it?

## Defining classes

In a Cartesian plane, we have _points_ and _lines_. Instead of trying to use simple variables to represent them, real-world programmers use **objects** so that they are easier to think about. So we should have a `Point` class and a `Line` class, from which we can create point objects and line objects.

How do we define a class?

In [16]:
#run this code cell
# Define a Point class
class Point:
    pass

This defines a class named `Point`. It’s that simple. (Now you know why you can’t use `class` as a variable name in Assignment 5.)

Notice that the class name starts with an uppercase letter.

How do we actually do something with it, like give it an `x` and `y` attribute?

In [18]:
#run this code cell
class Point:
    x = 1
    y = 2

print('x:', Point.x)
print('y:', Point.y)

x: 1
y: 2


You can access the class’s attributes using the same `class.attribute` format that you are so used to. `Point.x` returns the `x` attribute of `Point`, while `Point.y` returns the `y` attribute of `Point`.

Okay, how do we generate multiple points from this class?

In [19]:
#run this code cell
point0 = Point()
point1 = Point()

print('point0.x:',point0.x)
print('point0.y:',point0.y)
print('point1.x:',point1.x)
print('point1.y:',point1.y)

point0.x: 1
point0.y: 2
point1.x: 1
point1.y: 2


Uhh ... but I want point 1 to be (1,1) instead of (1,2).

Oops. Okay, let’s try this:

In [20]:
#run this code cell
point1.x = 1
point1.y = 1

print('point0.x:',point0.x)
print('point0.y:',point0.y)
print('point1.x:',point1.x)
print('point1.y:',point1.y)

point0.x: 1
point0.y: 2
point1.x: 1
point1.y: 1
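What happened there is worth a closer look: assigning to `point1.y` stored a value on that one instance, while the class itself and `point0` kept the original value. Here is a minimal sketch, repeating the `Point` class from above so that it runs on its own:

```python
class Point:
    x = 1
    y = 2


point0 = Point()
point1 = Point()

# Assigning through an instance creates an attribute on that instance only.
point1.y = 1

print(Point.y, point0.y, point1.y)

# vars() shows the attributes stored on each instance itself.
print(vars(point0), vars(point1))
```

`point0` has no attributes of its own, so it falls back to the values stored on the class.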


There’s got to be a better way than this, right? Could we possibly do this instead:

In [21]:
#run this code cell
point0 = Point(0, 0)
point1 = Point(1, 1)

TypeError: object() takes no parameters

Hmm, the `Point` class doesn’t take parameters. How do we make it take an input and set its own attributes?

## Introduction to dunder methods

Introducing the first dunder (<b>d</b>ouble <b>under</b>score) method you must learn: `__init__()`.

**Methods** are functions that belong to an object. You can only call them through the object (using the `object.method()` format); they are not available outside of the object.

**Dunder** methods are special methods that enable special features on the object. We will explore some of them gradually; it’s a bit much to release them all in one lesson. These special methods are identified by a special format in Python: they start and end with a double underscore (`__`).

When you use the `dir()` function on an object, you will often see many such methods. These methods are not meant to be used directly; they are called by Python when you try to use certain operators or built-in functions.
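As a small sketch of that idea, the built-in `len()` and the `+` operator both hand the work over to dunder methods. The variable names below are only for illustration.

```python
numbers = [1, 2, 3]

# dir() lists the object's attribute names, including its dunder methods.
dunder_names = [name for name in dir(numbers) if name.startswith('__')]

# len() and + are the friendly front doors; Python uses the dunder
# methods behind the scenes.
length_via_builtin = len(numbers)
length_via_dunder = numbers.__len__()
sum_via_operator = 1 + 2
sum_via_dunder = (1).__add__(2)

print('__len__' in dunder_names, length_via_builtin, length_via_dunder)
print(sum_via_operator, sum_via_dunder)
```

You would normally write `len(numbers)` and `1 + 2`; the dunder calls are shown only to make the connection visible.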

`__init__()` (short for "**init**ialise") is the special method that Python calls when you **initialise** an object (that’s the technical term for “create”). The `__init__()` method takes in the parameters that are given to the class upon initialisation, and runs its code on the newly instantiated object.

We want `Point` to take in a pair of values and set its own `x` and `y` attributes. Let’s upgrade it:

In [None]:
#run this code cell
class Point:
    def __init__(x, y):
        ??? = x
        ??? = y
        
point0 = Point(0, 0)

Hmm ... how do we write the code? Which variable do we put on the left to store the `x` and `y` variables?

We want the `__init__()` method of the object to store those attributes in it**self**. We could write it like this:

    class Point:
        def __init__(x,y):
            self.x = x
            self.y = y
        
Hold on a sec. There are only two parameters: `x` and `y`. If we use a name `self` inside `__init__()` without defining it anywhere, Python would raise `NameError: name 'self' is not defined`, wouldn’t it?

So we need to pass `self` in as the first parameter:

In [63]:
#run this code cell
class Point:
    x = None
    y = None
    # These class attributes stay as None placeholders;
    # each instance gets its own x and y in __init__().
    def __init__(self, b, c):
        # this is the local space of the method (function belonging to the class)
        self.x = b
        self.y = c
        print(f'y-attribute: {self.y}')

I know lots of things don’t make sense at this point. It gets a bit clearer when you see it in action. Let’s try making some points first. I will write:

    >>> point0 = Point(0,0)
    
Hold on—the `__init__()` method was defined with 3 parameters, but here we only give it 2 arguments! Is that going to work? Let’s try:

In [65]:
#run this code cell
# this is the global space
point0 = Point(0, 1)

print('point0.x:',point0.x)
print('point0.y:',point0.y)
print('Point.x:',Point.x)
print('Point.y:',Point.y)

y-attribute: 1
point0.x: 0
point0.y: 1
Point.x: None
Point.y: None


Huh, it worked.

Yup, the creator of Python, Guido van Rossum, didn’t think it made sense to have to call a method while still telling it that the first argument is itself. Imagine if you had to call the string `.join()` method like this:

    >>> comma = ', '
    >>> comma.join(comma,[1,2,3])
    
It would be tedious, confusing, and annoying.

So internally, you have to define the methods with `self` as the first parameter, but when you instantiate an object using the class, you do not need to provide `self` as the first parameter—Python already knows.
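The sketch below shows this rule at work for `__init__()` and for the built-in `str.join()` method. The list `words` and the two result names are only for illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Python creates the new object and passes it to __init__() as self,
# so we supply only x and y.
point0 = Point(0, 1)

# The same rule applies to built-in methods such as str.join().
words = ['a', 'b', 'c']
usual_call = ', '.join(words)           # ', ' is passed in as self
explicit_call = str.join(', ', words)   # the long-hand equivalent

print(point0.x, point0.y)
print(usual_call == explicit_call)
```

The long-hand form is legal Python, but you will almost never need to write it.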

We’re making good progress on the `Point` class. What about `Line`s? Lines will also need to have `x` and `y` dimensions, although they represent lengths instead of positions.

## Task 1: Define a `Line` class

Define a `Line` class that has an `x` attribute and a `y` attribute, representing the x and y dimensions respectively.

In [None]:
# Complete the class definition by replacing the
# underscores (_____) with appropriate keywords.
_____ Line:
    _____ __init__(self,x,y):
        self.x = x
        self.y = y

You should be able to instantiate a line using this code: `line0 = Line(2,3)`

In [22]:
#run this code cell
line0 = Line(2,3)
print(f'line0: x={line0.x}, y={line0.y}')

line0: x=2, y=3


In [20]:
#run this code cell
class Line:
    def __init__(self,x,y):
        self.x = x
        self.y = y

If we have a line, we should be able to get its length. I could write a function like this:

In [34]:
#run this code cell
def length_of(line):
    '''
    Returns the length of a Line object.
    '''
    import math
    length = math.sqrt(line.x**2 + line.y**2)
    return length

line = Line(2,3)
length = length_of(line)
length

3.605551275463989

That defeats the point of working with objects. The length is a property of the object, so I _want_ to be able to get it using an object method, like so:

    >>> line.length()
    3.605551275463989
    
Time to upgrade the `Line` class.

## Task 2: Add a `.length()` method to `Line`


Define a `length()` method that takes in no arguments and returns the length of the `Line` object.

In [None]:
import math

class Line:
    def __init__(self,x,y):
        self.x = x
        self.y = y
        
    def length(self):
        # Write code below to *return* the length of the object
        # Since we are within the scope of the length method,
        # we only have access to the `self` object.
        # This object, a Line object, has an x and y attribute.
        # We can assign its attributes to a variable for use.
        x = self.x
        y = self.y
        ### BEGIN SOLUTION
        return math.sqrt(x**2 + y**2)
        ### END SOLUTION


Notice that our new method, `.length()`, needs to have `self` as the first parameter too, even though it is called without any arguments. This may take some getting used to, but you will get the hang of it.
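One way to check a method like this is with a line whose length you already know. The sketch below assumes the `Line` class as completed in Task 2 and tries it on a 3-by-4 line, which should have a length of 5.

```python
import math


class Line:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def length(self):
        return math.sqrt(self.x**2 + self.y**2)


line = Line(3, 4)
print(line.length())        # called with no arguments
print(Line.length(line))    # the same call with self passed in by hand
```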

Great. We now have a `Point` class that can represent a `point` (x,y). We also have a `Line` class that can represent a `line` (x,y), and we can get its length using `line.length()`.

Let's do more ambitious things.

## Vector math, revisited

You learned that you can add vectors this way:

$$\vec{AB} + \vec{BC} = \vec{AC}$$

How would we implement this using our newly-baked `Line` class?

In pseudocode, it makes sense that `Line(1,2) + Line(2,3)` should return an object represented by `Line(3,5)` right?
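Written out component by component, that example works like this (the bracket notation for a line's x and y dimensions is my own shorthand here):

$$(1, 2) + (2, 3) = (1 + 2,\ 2 + 3) = (3, 5)$$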

Since we just learned methods, let’s try implementing it as a method.

## Task 3: Add an `.add()` method to `Line`

Define an `add()` method that takes in an argument `line` and returns a `Line` object representing the vector sum of the object with `line`.

In [35]:
# Add an `add()` method to the Line class
class Line:
    def __init__(self,x,y):
        self.x = x
        self.y = y
        
    def length(self):
        # Paste your `length()` method definition from above here
        return math.sqrt(self.x**2 + self.y**2)

        
    # Complete the method definition below by replacing
    # the underscores(_____) with appropriate keywords or names
    def _____(_____,line):
        new_x = self.x + line.x
        new_y = self.y + line.y
        return Line(new_x,new_y)

In [38]:
# Use this cell to test your code
line0 = Line(1,2)
line1 = Line(2,3)
line01 = line0.add(line1)
print('line01 x,y:',line01.x,line01.y)
### Expected output: line01 x,y: 3 5

line01 x,y: 3 5


Got it working? Great.

Time to learn another dunder method.

When we create a list, it is easy to examine the contents of the list by just putting in the name of the variable:

    >>> a_list = [1,2,3]
    >>> a_list
    [1, 2, 3]
    
Let’s try that with our newly instantiated line object:

In [29]:
line01

<__main__.Line at 0x7fa0b05a97f0>

That’s not very nice, I wanted to know what its x and y dimensions were.

We can make it give us something more helpful by defining another dunder method.

When we debug, Python has to have some way of showing us what the object is. The output to the console must be in text form, so Python needs a text <b>repr</b>esentation of this object. It uses the `__repr__()` dunder method to know what this text representation is. Let’s add a `__repr__()` method to our `Line` class. While we’re at it, let’s upgrade our `Point` class as well.

A _general guideline_ to implementing `__repr__()` is to use the expression we type to create that object. There will be many exceptions to this rule, but it is a good enough starting point for us.
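One payoff of this guideline is that the text can be run as code to rebuild an equivalent object. The sketch below assumes a `Line` class with a `__repr__()` along those lines, and uses the built-in `eval()` purely as a demonstration; only use `eval()` on text you trust.

```python
class Line:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Line({self.x},{self.y})'


original = Line(3, 5)
text = repr(original)   # the text Python would show for this object
rebuilt = eval(text)    # run that text as Python code

print(text)
print(rebuilt.x, rebuilt.y)
```

`rebuilt` is a new object with the same x and y values as `original`.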

We want to achieve the following:

### Example output

    >>> line01
    Line(3,5)
    >>> point0
    Point(0,0)
    
This is much more readable and helpful.

## Task 4: Define `__repr__()` dunder method for `Line` and `Point`

1. Add a docstring for the `Point` class
2. Define a `__repr__()` method for `Point`

In [40]:
# See how __repr__() is implemented for Line.
class Line:
    '''
    A class representing a line with x- and y-dimensions.
    
    Methods:
    - length()
    - add(another_line)
      Returns a Line object representing the vector sum of
      the current Line and another_line.
    '''
    def __init__(self,x,y):
        self.x = x
        self.y = y
 
    def __repr__(self):
        # Note that although the result appears in the output, you do not need
        # to print it yourself. Python will do that for you. You need to *return*
        # the result instead.
        return f'Line({self.x},{self.y})'
        
    # It is good practice to define all your dunder methods first,
    # starting with __init__(), before defining other methods after them.
    def length(self):
        return math.sqrt(self.x**2 + self.y**2)

    def add(self,line):
        new_x = self.x + line.x
        new_y = self.y + line.y
        return Line(new_x,new_y)
    
# Task 4
class Point:
    # (1) Add a docstring for the Point class
    def __init__(self,x,y):
        self.x = x
        self.y = y
        
    # (2) Define a __repr__() method for Point.
    def __repr__(self):
        # Type your code here
        ### BEGIN SOLUTION
        return f'Point({self.x},{self.y})'
        ### END SOLUTION

In [56]:
# Use this cell to test your code
line01 = Line(3,5)
line01
# Expected: Line(3,5)
point0 = Point(0,0)
point0
# Expected: Point(0,0)

Point(0,0)

This is helpful for debugging, but look at what happens when I use a `print()` call like this:

    >>> print(f'Points on this grid: {point0} and {point1}')
    Points on this grid: Point(0,0) and Point(1,1)
    
That’s helpful, but not very _pretty_. If we ever need to display these points in a list or output them in the command line, that’s going to be too wordy. It would be nice if we could have the following:

    >>> print(f'Points on this grid: {point0} and {point1}')
    Points on this grid: (0,0) and (1,1)
    
That’s more like how you would write it in Math, and it looks neater too without losing any clarity.

That means we need another method besides `__repr__()`. Python has just the thing for us, and it is called `__str__()`.

## Task 5: Define `__str__()` dunder method for `Line` and `Point`

In [70]:
# See how __str__() is implemented for Line.
class Line:
    '''
    A class representing a line with x- and y-dimensions.
    
    Methods:
    - length()
    - add(another_line)
      Returns a Line object representing the vector sum of
      the current Line and another_line.
    '''
    def __init__(self,x,y):
        self.x = x
        self.y = y
 
    def __repr__(self):
        return f'Line({self.x},{self.y})'
    
    def __str__(self):
        return f'({self.x},{self.y})'
        
    def length(self):
        return math.sqrt(self.x**2 + self.y**2)

    def add(self,line):
        new_x = self.x + line.x
        new_y = self.y + line.y
        return Line(new_x,new_y)
    
# Task 5
class Point:
    # (1) Add a docstring for the Point class
    def __init__(self,x,y):
        self.x = x
        self.y = y
        
    def __repr__(self):
        # Paste your __repr__() representation for Point here
        return f'Point({self.x},{self.y})'

    # (2) Define a __str__() method for Point.
    # Type your code here
    ### BEGIN SOLUTION
    def __str__(self):
        return f'({self.x},{self.y})'
    ### END SOLUTION


In [71]:
# Use this cell to test your code
line01 = Line(3,5)
point0 = Point(0,0)
point1 = Point(1,1)
print(f'Points on this grid: {point0} and {point1}')

Points on this grid: (0,0) and (1,1)


## `__repr__()` vs `__str__()`

These two methods seem very similar. Which is more important, and why do we need two?

These methods are called by Python when the functions `repr()` and `str()` are called. You have already been introduced to `str()`, but not to `repr()`.

The main purpose of `repr()`, as far as I can tell, is to get the “source code” of the object; to know how to create the same object in the Python shell. This is useful mainly for debugging, when the skimpy `print()` statements don’t give enough information or if you need to try creating that object in your test code.

`str()`, on the other hand, is called when the `print()` function encounters an object that is not a `str`. That is why it can handle non-`str` types—provided they have a `__str__()` that can be called. So the main purpose of `str()` is to give a concise, simple representation of the object for printing.
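The sketch below shows where each method gets used, assuming a `Point` class with both methods defined as in Task 5. The `!r` marker inside an f-string asks for the `repr()` version instead of the `str()` version.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point({self.x},{self.y})'

    def __str__(self):
        return f'({self.x},{self.y})'


point0 = Point(0, 0)

as_str = str(point0)              # what print() would show
as_repr = repr(point0)            # what the shell would show
in_fstring = f'{point0}'          # f-strings use str() by default
in_fstring_repr = f'{point0!r}'   # !r switches to repr()
in_list = str([point0])           # a list shows the repr() of its items

print(as_str, as_repr, in_fstring, in_fstring_repr, in_list)
```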

If `__str__()` is not implemented in an object, Python will check if the object has a `__repr__()` method and use that. But if `__repr__()` is not implemented, Python will just give the default representation (e.g. `<__main__.Line at 0x7fa0b05a97f0>`) without checking for a `__str__()` method.
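Here is a minimal sketch of that fallback behaviour. The class names `OnlyRepr` and `OnlyStr` are made up for this example.

```python
class OnlyRepr:
    def __repr__(self):
        return 'OnlyRepr()'


class OnlyStr:
    def __str__(self):
        return 'only str'


# str() falls back to __repr__() when there is no __str__().
str_of_only_repr = str(OnlyRepr())

# repr() does not fall back to __str__(); we get the default text.
repr_of_only_str = repr(OnlyStr())

print(str_of_only_repr)
print(repr_of_only_str)
```

The second line printed is the default representation, which includes the class name and a memory address that changes from run to run.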

For this reason, **you should always have a `__repr__()` method** for every object you create. If you need something different to use in `print()`, you can implement a `__str__()` method as well.

# Feedback and suggestions

Any feedback or suggestions for this assignment?