# Intro to Python

## Python Classes

There are several built-in Python classes. We won't discuss all the built-in classes in this course. However, some of the most useful will be illustrated in this demonstration.

### Strings

Strings (class name: `str`) are a data type which can store characters. Strings have a couple of properties that you should know:
* They are ordered. The position of each character within the string is meaningful and can be referred to.
* They are immutable. A string cannot be changed in place; any modification creates a new string.

Strings support the operations you would expect:

In [1]:
# Store a string in the variable frog
frog = "a frog on a log"
frog

'a frog on a log'

In [2]:
# Concatenation - combining two strings
frog_pond = frog + ' in a pond' # Note single or double quotes can be used
frog_pond

'a frog on a log in a pond'

We'll stick with our original variable, `frog`, for the following examples.

In [3]:
# Indexing - extracting individual or ranges of elements
# Indexes are specified using square brackets
frog[0] # Note counting starts at 0 (called zero-based numbering)

'a'

In [4]:
# Ranges of indices are called a "slice"
# Format is [<start>:<stop>:<step>]
frog[2:6]

'frog'

In [5]:
# If you omit start, stop, or step, the defaults are 0, the end of your string, and 1
frog[:6]

'a frog'

In [6]:
# Indices can also be specified relative to the end of the string (last index is -1)
frog[-5:]

'a log'
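The slice format above includes a `<step>`, which none of the examples so far use. As a minimal sketch (the variable names `every_other` and `reversed_frog` are only illustrative), a step of 2 takes every second character, and a negative step walks through the string backwards:

```python
frog = "a frog on a log"

# Omit start and stop, and take every second character
every_other = frog[::2]

# A step of -1 walks from the end of the string to the start
reversed_frog = frog[::-1]

print(every_other)    # afo nalg
print(reversed_frog)  # gol a no gorf a
```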

In [7]:
# Being immutable, you can't change parts of a string
frog[0] = "b"

TypeError: 'str' object does not support item assignment

Before we move on, take a look at the error message above. Notice that it points to the line and tells you what was wrong with the line. It also names a kind of error, "TypeError".

Python error messages are clearer than those of some other languages. However, part of learning Python is learning to interpret what the error message is telling you went wrong. This will take time. If you search the web for the last line of the error, you will often find a Stack Overflow thread describing the problem.
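To make the two parts of the error message concrete, the sketch below catches the error and prints its kind and its explanation separately. It relies on `try`/`except`, which this course has not introduced yet, so treat it as a preview rather than something you need to understand now. The names `error_kind` and `error_message` are only illustrative.

```python
frog = "a frog on a log"

try:
    # Strings are immutable, so item assignment raises an error
    frog[0] = "b"
except TypeError as err:
    error_kind = type(err).__name__  # the kind of error
    error_message = str(err)         # the explanation of what went wrong
    print(error_kind)
    print(error_message)
```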

### Numbers

Integers (class name: `int`) are whole numbers. Floats (class name: `float`) are numbers with a fractional component or decimal place. In most cases, you can use ints and floats in the same ways. A notable exception is indexing, which was shown above: only ints can be used to index other objects.

Python has the ability to perform mathematical calculations. Much of the syntax of mathematical operations in Python will be familiar to you. However, some operations use a syntax you may not have seen before. You can see the list of supported operations in the [Python docs](https://docs.python.org/3.11/library/stdtypes.html#numeric-types-int-float-complex).

In [8]:
# addition
5 + 10

15

In [9]:
# subtraction
10 - 5

5

In [10]:
# multiplication
2 * 3

6

In [11]:
# division
6 / 2 # Note division always returns a float

3.0

In [12]:
# floor division - divide then round down to nearest int
5 // 2 # Note floor division with ints returns an int

2
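"Round down" here means rounding toward negative infinity, not toward zero, which only makes a difference for negative results. The sketch below shows that, along with the remainder operator `%` and the built-in `divmod()`, neither of which is covered in this section. The variable names are only illustrative.

```python
# Floor division rounds down, so a negative result moves away from zero
negative_floor = -5 // 2
print(negative_floor)  # -3

# The remainder left over after floor division
remainder = 5 % 2
print(remainder)  # 1

# divmod returns the floor quotient and the remainder together
quotient_and_remainder = divmod(5, 2)
print(quotient_and_remainder)  # (2, 1)
```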

In [13]:
# powers
2 ** 8

256

In [14]:
# Specify order of operations with parentheses
2 * 1 + 1

3

In [15]:
2 * (1 + 1)

4

In [16]:
# Mixing ints and floats returns a float
5.0//2

2.0

In [17]:
# Note when a float is equal to a whole number, it can be represented with a dot but no decimal places
2 ** 4. # 4. is equivalent to 4.0

16.0

As you see in the above examples, ints and floats both support mathematical operations. You can even combine ints and floats in the same calculation.

Something you will see again and again is that whenever Python does a calculation or runs a function, it returns an object of one class or another. The class that is returned is predictable. For example, if we add two ints together, we will get an int back. If we add an int and a float, we will get a float back.

The class we get back is called the "return type" and is something you need to pay close attention to. As mentioned above, there are differences in what you can do with different classes. `float`s can't be used to index. Therefore we can index with the result of calculations that return an `int`, but not those that return a `float`.

In [18]:
frog[1+2]

'r'

Here, `frog[1+2]` is equivalent to `frog[3]`, as Python performs the calculation and uses the result to index the `frog` variable. However, if we use division instead, which returns a float, it doesn't work.

In [19]:
# This returns 3.0, which is numerically equivalent to 3, but the class is not supported
frog[6/2]

TypeError: string indices must be integers, not 'float'

Pay close attention to classes as we move forward. It is helpful to think about the operations you are performing in terms of the classes used and the return types.
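One way to check a return type is the built-in `type()` function, which has not been introduced yet and is used here only as a preview. The sketch below compares the return types of `+`, `/`, and `//`, then shows two ways to index with the result of a division. The variable names are only illustrative.

```python
frog = "a frog on a log"

int_sum = 1 + 2         # int + int returns an int
float_quotient = 6 / 2  # division always returns a float
int_quotient = 6 // 2   # floor division of two ints returns an int

print(type(int_sum).__name__)         # int
print(type(float_quotient).__name__)  # float
print(type(int_quotient).__name__)    # int

# Either use an operation that returns an int...
letter = frog[int_quotient]

# ...or convert the float to an int explicitly before indexing
converted_letter = frog[int(float_quotient)]

print(letter, converted_letter)  # r r
```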

### Lists

Lists (class name: `list`) are container objects. They have the following important properties:
* They are ordered
* They can contain any kind of Python object, even other lists
* They are mutable (i.e., you can change them without having to copy all their data)

`List`s are defined using a syntax like `[x, y, z]`, where square brackets enclose the elements of the `list` and commas (,) separate `list` elements. For example:

In [20]:
# Lists can contain elements of different classes
mylist = ["abc", 123, frog]
mylist

['abc', 123, 'a frog on a log']

As `list`s are ordered, they can be indexed using positional indices and slices (the same way as `str`s above).

In [21]:
# Index 0 is the first index, but is referred to as the zeroth index
mylist[0]

'abc'

In [22]:
# Slice syntax is the same as for str. i.e., [<start>:<stop>:<step>]
mylist[1:]

[123, 'a frog on a log']

In [23]:
# A slice begins at the start index and ends just before the stop index
mylist[1:2]

[123]

Note that when you extract a single element from a `list`, the class of the object returned is the class of the element extracted. For example, the zeroth element of `mylist` is `'abc'`, which is a `str`. `mylist[0]` therefore returns a `str`. However, the oneth element of `mylist` is `123`, which is an `int`. `mylist[1]` therefore returns an `int`.

When extracting a range of elements from a `list` using a slice, a `list` is always returned, no matter what the class of the extracted elements is or how long the slice is. You can see in the printed output above that the extracted slice is within square brackets. That indicates the object is a `list`. In addition, you can explicitly check the class of an object. We'll get to how you can do that a bit later. For now, just remember that the class returned by indexing a `list` is not always the same (it is whatever the class is of the element at that index), while a slice of a `list` always returns a `list`.
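The sketch below checks these return types with the built-in `type()` function (used here ahead of its introduction), and also shows the mutability listed in the properties above by replacing one element in place. The variable names other than `frog` and `mylist` are only illustrative.

```python
frog = "a frog on a log"
mylist = ["abc", 123, frog]

first = mylist[0]       # indexing returns the element itself
second = mylist[1]
one_slice = mylist[1:2] # slicing returns a list, even for one element

print(type(first).__name__)      # str
print(type(second).__name__)     # int
print(type(one_slice).__name__)  # list

# Lists are mutable, so an element can be replaced in place
mylist[1] = 456
print(mylist)  # ['abc', 456, 'a frog on a log']
```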

### Dicts

Dicts (class name: `dict`) are another kind of container object. A `dict` associates two objects: a key and a value. Keys can be any hashable object (basically anything immutable; e.g., `str`, `int`, etc.), while values can be any Python object. `Dict`s are useful in any case where you wish to store data and then retrieve it again later using a name or other identifier. If you will only want to retrieve data using an index, use a list instead.

`Dict`s have the following important properties:
* They are ordered (Python >=3.7). Note the order is simply the order keys were added. You cannot directly index `dict`s using their order, but can rely on order for iteration (more later).
* Keys can be any hashable Python object (basically just immutable objects like `str` and `int`)
* Values can be any Python object class
* Keys can only have a single associated value, but values can be container classes like a `list`, containing multiple elements
* They are mutable - keys can be added/removed and values can be changed
* Key lookup, value retrieval, and value change operations are very fast, and stay fast even as the `dict` grows
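To illustrate what "hashable" means in practice, the sketch below uses a `str`, an `int`, and a tuple as keys, then tries a `list`. Tuples (an immutable sequence written with parentheses) and `try`/`except` have not been covered yet and are assumptions of this example; the name `lookup` is only illustrative.

```python
lookup = {}

# Immutable objects such as str, int, and tuple can be keys
lookup["name"] = "str key"
lookup[42] = "int key"
lookup[(1, 2)] = "tuple key"

# A list is mutable, so it is not hashable and cannot be a key
try:
    lookup[[1, 2]] = "list key"
except TypeError as err:
    unhashable_message = str(err)
    print(unhashable_message)

print(len(lookup))  # 3
```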

`Dict`s are defined using a syntax like `{"key1": "value1", "key2": "value2"}`, where curly braces enclose the `dict`, key-value pairs are written as `key: value`, and each key-value pair is separated with a comma. For example:

In [24]:

fruit_veg_dict = {"carrot": "veg", "apple": "fruit", "banana": "fruit"}
fruit_veg_dict["banana"]

'fruit'

Keys and values can be added to a `dict` by using a syntax that looks just like indexing. Simply index the `dict` and then assign the value using the `=` assignment operation:

In [25]:
fruit_veg_dict["tomato"] = "fiercely debated"
fruit_veg_dict

{'carrot': 'veg',
 'apple': 'fruit',
 'banana': 'fruit',
 'tomato': 'fiercely debated'}

Note that while key creation involves indexing with a key that does not yet exist, if you try to index with a nonexistent key to retrieve a value, you will get an error.

In [26]:
fruit_veg_dict["not a key"]

KeyError: 'not a key'
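If you are not sure whether a key exists, two common ways to avoid a `KeyError` are the `in` operator and the `dict.get()` method. Neither has been introduced in this course yet, so this is a sketch for reference; the variable names other than `fruit_veg_dict` are only illustrative.

```python
fruit_veg_dict = {"carrot": "veg", "apple": "fruit", "banana": "fruit"}

# `in` checks whether a key exists without retrieving its value
has_banana = "banana" in fruit_veg_dict
has_missing = "not a key" in fruit_veg_dict
print(has_banana, has_missing)  # True False

# .get() returns None instead of raising an error for a missing key
missing_value = fruit_veg_dict.get("not a key")
print(missing_value)  # None

# A second argument supplies a fallback value to return instead
fallback = fruit_veg_dict.get("not a key", "unknown")
print(fallback)  # unknown
```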

Note that `dict`s remember the order that keys are added. You may see online that `dict`s are unordered. However, since Python version 3.7, `dict` order has been guaranteed to be reliable. Note that updating the value associated with a key does not change order.

In [27]:
fruit_veg_dict

{'carrot': 'veg',
 'apple': 'fruit',
 'banana': 'fruit',
 'tomato': 'fiercely debated'}

In [28]:
fruit_veg_dict["avocado"] = "fruit"
fruit_veg_dict

{'carrot': 'veg',
 'apple': 'fruit',
 'banana': 'fruit',
 'tomato': 'fiercely debated',
 'avocado': 'fruit'}

As you can see, the keys are printed in the order in which they were created. Changing the value of a key added earlier does not move that key to the end of the `dict`.

In [29]:
fruit_veg_dict["tomato"] = "fruit"
fruit_veg_dict

{'carrot': 'veg',
 'apple': 'fruit',
 'banana': 'fruit',
 'tomato': 'fruit',
 'avocado': 'fruit'}
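One consequence of insertion ordering is that removing a key and adding it again does move it to the end, because it counts as a new insertion. The sketch below uses the `del` statement and `list()` to show the key order; both are assumptions of this example that have not been covered yet, and the name `d` is only illustrative.

```python
d = {"carrot": "veg", "apple": "fruit", "banana": "fruit"}

# Updating a value keeps the key in its original position
d["carrot"] = "root veg"
order_after_update = list(d)  # list() of a dict gives its keys in order
print(order_after_update)  # ['carrot', 'apple', 'banana']

# Deleting a key and adding it again places it at the end
del d["carrot"]
d["carrot"] = "veg"
order_after_readd = list(d)
print(order_after_readd)  # ['apple', 'banana', 'carrot']
```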

## A note about copies and pointers

A common mistake for people starting out in Python is to assume that the following operations will make a copy of an object.

In [30]:
l = [1, 2, 3]
new_l = l
new_l

[1, 2, 3]

However, although both variables now refer to a list of numbers, look at what happens when the list is modified through one of them.

In [31]:
l += [4, 5, 6]
l

[1, 2, 3, 4, 5, 6]

In [32]:
new_l

[1, 2, 3, 4, 5, 6]

As you can see, a change to the original list was also reflected in our `new_l` object. That is because the `new_l` object was not a copy of the original list, but rather a new *pointer* to the same list. You can think of this as analogous to a hard link in Bash. Both variables are actually just pointers to data in your computer's memory. If the data in that memory is changed, you get back the modified data no matter which variable you use to access it.

Note that the analogy to hard links is not a perfect one. The structure of filepaths pointing to data on your hard drive is analogous to variable names pointing to data in memory. However, the way Python pointers work is different from how hard links work. For example, if you replace a variable's data with new data, that does not affect other variables which were, until that point, pointers to the same data.

In [33]:
l = "Now I'm a string"
new_l

[1, 2, 3, 4, 5, 6]
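You can check directly whether two variables point to the same object with the `is` operator, which has not been introduced in this course yet. In the sketch below, the names `same_before` and `same_after` are only illustrative.

```python
l = [1, 2, 3]
new_l = l

# Both names point to the same list object
same_before = new_l is l
print(same_before)  # True

# Rebinding l to new data leaves new_l pointing at the original list
l = "Now I'm a string"
same_after = new_l is l
print(same_after)  # False
print(new_l)       # [1, 2, 3]
```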

Note also that this is not a problem for immutable data types. That is because when you do something that might appear to modify the data, what you are actually doing is copying the data to a new instance. We can see that with a `str` example.

In [34]:
s = "some string"
new_s = s
new_s

'some string'

In [35]:
s += " and more string"
s

'some string and more string'

In [36]:
new_s

'some string'

Instead of changing the data that both `s` and `new_s` were pointing at, we made a new `str` and changed `s` so that it was pointing at the new data. `new_s` is still pointing to the same, unchanged data that it was before.

We can see the same thing for `int`s.

In [37]:
x = 5
y = x
x += 5
x

10

In [38]:
y

5

In both cases, the `+=` operation looks like it should be modifying the existing data, like it did in the `list` example. Instead, `+=` created a new object holding the result and pointed the variable at it. That means that what we see above for `str` and `int` is actually the same process as when we replaced `l` with a `str`.

These examples illustrate that mutable and immutable classes are working differently behind the scenes. The difference in behavior is important to bear in mind in order to predict how your code will behave if you want to make a copy of a variable.
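The difference can be seen side by side with the `is` operator (not yet introduced in this course), which tests whether two variables point to the same object. This is a sketch; the names `list_alias`, `str_alias`, `list_same`, and `str_same` are only illustrative.

```python
# Mutable: += changes the existing list in place
l = [1, 2, 3]
list_alias = l
l += [4, 5, 6]
list_same = l is list_alias
print(list_same)   # True
print(list_alias)  # [1, 2, 3, 4, 5, 6]

# Immutable: += builds a new str and points s at it
s = "some string"
str_alias = s
s += " and more string"
str_same = s is str_alias
print(str_same)   # False
print(str_alias)  # some string
```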

If you do wish to make a copy of a variable with a mutable class, the syntax may differ depending on the class. To make a shallow copy of a `list`, you can take a slice of the whole `list` with `[:]`. To make a shallow copy of a `dict`, you can use `.copy()`. Don't worry about understanding the `.copy()` syntax yet. We will cover that next.

In [39]:
l = [1, 2, 3]
new_l = l[:]
l += [4, 5, 6]
new_l

[1, 2, 3]

In [40]:
d = {'a': 'apple', 'b': 'banana'}
new_d = d.copy()
d['c'] = 'carrot'
new_d

{'a': 'apple', 'b': 'banana'}

While this is ahead of what we are going to cover just yet, you could also take a look at [copy.deepcopy](https://docs.python.org/3/library/copy.html#copy.deepcopy) to copy a variable along with everything it contains. This is important if you have nested objects like a list of lists. A shallow copy will only copy the outermost list, and will use pointers to the original objects for every level below the top level.
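As a minimal sketch of the difference, the example below copies a list of lists both ways and then modifies the original. The `import` statement has not been covered yet, and the names `nested`, `shallow`, and `deep` are only illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]           # copies only the outer list
deep = copy.deepcopy(nested)  # copies the outer list and the inner lists

# Modify an inner list in place
nested[0] += [99]

# Add a new inner list to the outer list
nested += [[5, 6]]

print(nested)   # [[1, 2, 99], [3, 4], [5, 6]]
print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

The shallow copy has its own outer list, so it does not gain `[5, 6]`, but its first element is still a pointer to the same inner list as `nested[0]`, so it sees the `99`. The deep copy is unaffected by both changes.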