# Python functions and class methods

## What are functions?

Functions are self-contained pieces of code that perform some set of operations on one or more inputs and return an output. [Python has a number of built-in functions](https://docs.python.org/3/library/functions.html). You might think of a function as behaving analogously to a Bash command. For example, the Bash command `wc -l` takes a file as input and returns the number of lines in the file. Python has the function `len()`, which takes an object as input and returns the object's length.

In [1]:
mylist = [1, 2, 3]
len(mylist)

3

Functions are another Python object class. While `str`s, `int`s, etc. store data, functions store a sequence of commands to execute. If you run a function without parentheses you can see an output describing what the function is.

In [2]:
len

<function len(obj, /)>

Adding parentheses at the end of the function name tells Python to execute (or "call") the function. When executing a function, you must provide inputs if the function is expecting them. Otherwise, you get an error telling you that you called the function incorrectly.

In [3]:
len()

TypeError: len() takes exactly one argument (0 given)

Any time you put parentheses after an object name, Python tries to run it as if it were a function. If you add parentheses to a variable that does not hold a function (or another callable object), you get an error because Python tried to run it and could not.

In [4]:
mylist()

TypeError: 'list' object is not callable
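If you are unsure whether adding parentheses to a name will work, the built-in `callable()` function reports whether an object can be called. The following is a minimal sketch; the variable names `len_is_callable` and `list_is_callable` are only illustrative.

```python
mylist = [1, 2, 3]

# Functions can be called, so callable() returns True
len_is_callable = callable(len)

# A list stores data rather than commands, so it cannot be called
list_is_callable = callable(mylist)

print(len_is_callable, list_is_callable)
```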

## Function return classes

We've already talked a bit about the class of object you get back when you perform certain operations. For example, if you have a `list` of `int`s, then when you extract a single element from that `list` by its index, the returned object will be an `int`. If you instead extract a range of indices using a slice, you will get back a `list`. We can see this explicitly using another built-in Python function, `type()`.

`type()` returns the class of any object it is given. For example:

In [5]:
type(1)

int

In [6]:
type("a")

str

In [7]:
type([1,2,3])

list

In [8]:
# Indexing like this works exactly the same as if we store the list in a variable first
type([1,2,3][0])

int

In [9]:
type([1,2,3][:1])

list

We can use `type()` to check the return class of a function as well. You can think of a function call as behaving just like command substitution in Bash (i.e., `$(<command>)`). When a function is executed, it is as if the function is replaced by its output within the line of code.

For example, consider the `mylist` variable we made above, which has a length of 3:

In [10]:
mylist

[1, 2, 3]

In [11]:
# len(mylist) returns 3
len(mylist)

3

In [12]:
# This is the same as running type(3) 
type(len(mylist))

int

Knowing the class of object returned by a function means we know how we can interact with the output of a function. For example, `len()` always returns an `int`, so we can do things like addition with the output of `len()`.

In [13]:
(len(mylist)+3)*5

30

It is important to keep in mind the classes of objects you are working with when writing Python code. There are several reasons for this. Foremost is that each object class is able to do different things or be used in different ways.

We have already seen that `str`s, `int`s, and `list`s all support `+` operations. We have also seen that you can only use `+` to add objects of the same class together. However, class-specific behavior does more than just restrict your ability to add. Each class has methods that define functionality to perform operations on an instance of that class.
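To see why the return class of a function matters in practice, here is a small sketch that combines the output of `len()` with a `str`. The variable names `message` and `fixed_message` are made up for this illustration.

```python
mylist = [1, 2, 3]

# len() returns an int, so adding it to a str raises a TypeError
try:
    message = "Length: " + len(mylist)
except TypeError:
    message = None

# Converting the int to a str first makes the two classes match
fixed_message = "Length: " + str(len(mylist))
print(fixed_message)
```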

## Instances

I just said that class methods perform operations on an "instance", but what does that mean? Basically, any object which is a `str` is an instance of a `str`. The word "instance" simply refers to a thing of a given class. Every string is a `str` instance. Every list is a `list` instance. It's simply jargon to refer to individual objects that have a certain class.

Consider the following variables:

In [14]:
string1 = "a"
string2 = "b"

Both of those variables have the class `str`. However, they are different `str` variables; they contain different data. Here, `string1` and `string2` are said to be different instances of the `str` class.

Now consider an example from the first notebook. When illustrating an issue with trying to copy mutable classes like `list`s, I used the following example:

In [15]:
l = [1, 2, 3]
new_l = l

As we saw before, both `l` and `new_l` point to the same data. In fact, they point to the same instance of the `list` class. When changes are made to that instance, both variable names still point to that same instance and so either can be used to retrieve the data.
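One way to check whether two names point to the same instance is the `is` operator, which compares identity rather than contents. In the sketch below, `other_l` is a hypothetical third variable added only for comparison.

```python
l = [1, 2, 3]
new_l = l            # a second name for the same list instance
other_l = [1, 2, 3]  # a separate instance that happens to hold equal data

# "is" asks whether two names refer to the same instance
same_instance = l is new_l

# "==" asks whether the contents are equal, which can be true for separate instances
equal_but_separate = (l == other_l) and (l is not other_l)

new_l.append(4)      # modifies the one instance shared by l and new_l
print(l, other_l)
```

After the `.append()` call, `l` shows the new element because it shares an instance with `new_l`, while `other_l` is unchanged.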

You might be wondering why we need to worry about "instances" instead of treating all objects of the same class in the same way. The answer is that each instance has functions which act only on that instance and not on other objects. Those functions are called the "methods" of that class.

## Methods

### What are methods?

Methods are very similar to functions. They take inputs, perform operations, and return outputs. However, methods are associated with an instance of an object class and they often modify their instance. A good example of this behavior is the `list.append()` method, which adds a new element to the end of a list. `.append()` adds something to the end of the `list` instance for which it was called. That is, instead of calling append as `list.append()`, it is called using `<list instance>.append()`. For example:

In [16]:
# list1 and list2 are both instances of the list class
list1 = [1,2,3]
list2 = ['a', 'b', 'c']

In [17]:
# calling the append() method of list1 adds an element to list1
list1.append(4)
list1

[1, 2, 3, 4]

In [18]:
# Other list instances are not affected
list2

['a', 'b', 'c']

Another useful `list` method is `.index()`. It can be used to identify the index in a `list` where a certain value is located.

In [20]:
list2.index("b")

1

Notice that when we run `.append()` nothing is printed, but when we run `.index()` an `int` is printed. That is because `.append()` doesn't return anything useful (strictly speaking, it returns the special value `None`, which Jupyter does not display). Instead, `.append()` simply modifies the `list`, so there is no need for it to return anything. That means that we can use `.index()` as part of a larger process, while `.append()` doesn't have a useful return value. For example, we can use `.index()` to find the element that comes after "b" in `list2`:

In [21]:
# Add 1 to index of "b" to get next index
list2[list2.index("b")+1]

'c'
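A common mistake that follows from this difference is storing the output of `.append()` in a variable and expecting to get the list back. The sketch below shows what actually happens; `result` is just an illustrative name.

```python
list1 = [1, 2, 3]

# Common mistake: keeping the return value of .append(), which is None
result = list1.append(4)

print(result)  # the return value of .append()
print(list1)   # the list itself, which was modified in place
```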

### Other important methods to know

For today's class there are a few other class methods you need to know. They are enumerated below:

* `list` (We've already covered both of these)
    1. `.append()`
    2. `.index()`
* `str`
    1. `.split()`
    2. `.join()`
    3. `.strip()`
    4. `.replace()`

Let's work through how the `str` methods listed above work. With these methods, you should be able to do most of the things you could need to do with `list`s and `str`s.

#### `str.split()`

`str.split()` splits up a `str` using a delimiter and returns a `list` of the pieces. By default, the delimiter is any run of whitespace characters (spaces, tabs, newlines). For example:

In [22]:
x = "These are some words"
x.split()

['These', 'are', 'some', 'words']

You can also use other delimiters. Note that the delimiter is not included in the returned `list`.

In [23]:
x.split("e")

['Th', 's', ' ar', ' som', ' words']

In the above example, using "e" as a delimiter doesn't make any sense. However, a common format for storing column-based data is .csv (comma-separated values). When reading .csv files, we can get a list of the columns by splitting a line on commas.

In [24]:
line = "this is,column-separated,data"
line.split(",")

['this is', 'column-separated', 'data']

Note that the `str` instance being split is not modified when the `.split()` method is called.

In [25]:
line

'this is,column-separated,data'
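One subtlety worth knowing is that calling `.split()` with no input is not the same as calling `.split(" ")`. With no input, consecutive whitespace characters are treated as a single delimiter, whereas an explicit delimiter splits at every occurrence. A small sketch using a made-up string with two spaces in it:

```python
spaced = "a  b"  # two spaces between the letters

# Default: a run of whitespace counts as one delimiter
default_split = spaced.split()

# Explicit " ": every single space is a delimiter, leaving an empty str between them
space_split = spaced.split(" ")

print(default_split)
print(space_split)
```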

#### `str.join()`

`str.join()` joins a `list` of `str`s using a delimiter and returns a new `str`. In the case of the `.join()` method, the delimiter is the data stored in the `str` instance, while the `list` being joined is given as input, i.e., `<delimiter>.join(<list>)`. An example should make that clearer:

In [26]:
delim = " "
my_list = ["first", "second", "third"]
delim.join(my_list)

'first second third'

The delimiter used can be any `str`, but you'll typically join on whitespace or commas.

In [27]:
" and ".join(my_list)

'first and second and third'

In [28]:
",".join(my_list)

'first,second,third'
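Because `.split()` returns a `list` and `.join()` takes one as input, the two methods combine naturally. As an illustrative sketch, the comma-separated line from the `.split()` section can be converted to a tab-separated line; the names `columns` and `tab_line` are made up for this example.

```python
line = "this is,column-separated,data"

# Break the line into its columns...
columns = line.split(",")

# ...and glue them back together with a different delimiter
tab_line = "\t".join(columns)

print(tab_line)
```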

#### `str.strip()`

`str.strip()` removes characters from both ends of a `str` and returns a modified copy of the `str`. Sometimes, you'll have to deal with strings that have whitespace on either end, for example when reading lines of a file with indentation, or when there are trailing newlines on the ends of lines. `str.strip()` removes whitespace by default, but you can also give it a `str` of characters to remove. Note that the input is treated as a set of individual characters rather than as a substring: any of those characters, in any order, are removed from the beginning and end of the `str` until a character not in the set is reached.

In [30]:
some_string = "\t\tthis is the data xyz"
print(some_string) # To render tabs in a Jupyter notebook we need to use print()

		this is the data xyz


In [31]:
some_string.strip()

'this is the data xyz'

In [33]:
some_string.strip(' xyz')

'\t\tthis is the data'
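Since the input to `.strip()` is a set of characters, using it to remove a file extension can give surprising results. The sketch below illustrates this with a made-up filename, and also shows the related `.lstrip()` and `.rstrip()` methods, which strip only the left or right end.

```python
filename = "some_genome.fasta"

# Strips any of the characters ".", "f", "a", "s", "t" from BOTH ends,
# so the leading "s" of "some" is removed as well
stripped = filename.strip(".fasta")

padded = "  padded  "
left_only = padded.lstrip()   # removes leading whitespace only
right_only = padded.rstrip()  # removes trailing whitespace only

print(stripped)
print(repr(left_only), repr(right_only))
```

If the goal is to remove an exact trailing substring, Python 3.9 and later provide `str.removesuffix()` for that purpose.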

#### `str.replace()`

`str.replace()` replaces every occurrence of a substring in a `str` with another string and returns the modified `str`. The original `str` is not changed. You must provide two inputs: the substring to find, and the string to replace it with.

In [34]:
filename = "some_genome.fasta"
filename.replace(".fasta", ".fna")

'some_genome.fna'
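When the substring appears more than once, `.replace()` changes every occurrence. It also accepts an optional third input that limits how many replacements are made. A brief sketch, using a made-up `sequence` string:

```python
sequence = "ATATAT"

# Every occurrence of "AT" is replaced
all_replaced = sequence.replace("AT", "GC")

# An optional third input limits the number of replacements
first_only = sequence.replace("AT", "GC", 1)

print(all_replaced, first_only)
print(sequence)  # the original str is unchanged
```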