# Mutable and Immutable Data Types in Python

**Welcome!** This notebook will teach you about comparing data in Python and the concepts of mutable and immutable data types. By the end of this notebook, you'll know why some data types are immutable, and how to properly compare mutable and immutable data types using different operators.

<hr>

## When are two pieces of data considered equal?

### Data in Memory

We have learned about <code>class</code>, and we know that when we create variables and assign values to them, we are creating **instances** of some classes.

In [None]:
# "hello, python!" is an instance of str
a = "hello, python!"

# 2024 is an instance of int
b = 2024

We know that computers have _memory_. Imagine that the _memory_ of our computer works like a list of empty slots, where each instance is stored in a specific slot.

In Python, we can use <code>id()</code> to check which "slot" an instance occupies.

In [None]:
id(a)

In [None]:
id(b)

### References

If we create new variables and assign our existing variables to them:

In [None]:
new_a = a
new_b = b

Their values are of course equal:

In [None]:
print(new_a == a)
print(new_b == b)

The <code>id</code> of each new variable stays the same as that of the original:

In [None]:
id(new_a)

In [None]:
id(new_b)

In [None]:
print(id(new_a) == id(a))
print(id(new_b) == id(b))

This makes sense, because both <code>a</code> and <code>new_a</code> are referring to the same <code>"hello, python!"</code> instance. The same goes for <code>b</code> and <code>new_b</code>.

We say <code>"hello, python!"</code> is the instance, and <code>a</code> and <code>new_a</code> are both references to the same instance.

### Values

What if we create new variables again, but this time we give their values explicitly, not from other variables?

In [None]:
new_a = "hello, python!"
new_b = 2024

Their values are again equal:

In [None]:
print(new_a == a)
print(new_b == b)

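However, the <code>id</code> of our new variables has changed. (This is what you will typically see in a notebook; Python is allowed to reuse an existing instance of an immutable value, so the result can differ in other environments.)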

In [None]:
id(new_a)

In [None]:
id(new_b)

In [None]:
print(id(new_a) == id(a))
print(id(new_b) == id(b))

This means that the <code>"hello, python!"</code> of <code>new_a</code> uses a different "slot" than that of <code>a</code>, and the same applies to <code>new_b</code> and <code>b</code>.

In other words, the two <code>"hello, python!"</code> are **different instances** of <code>str</code>, and <code>a</code> and <code>new_a</code> are referring to different instances.

### Two types of "being equal"

We have learned the equality operator <code>==</code>, which compares the **values** of two variables:

In [None]:
new_a == a

In [None]:
new_b == b

We just learned that we can use <code>id(a) == id(b)</code> to check if the data represented by <code>a</code> and <code>b</code> are located in the same "slot" of the memory, or in other words, if <code>a</code> and <code>b</code> are referring to the same **instance**.

In [None]:
id(new_a) == id(a)

In [None]:
id(new_b) == id(b)

For this, Python gives us the operator <code>is</code>, which does exactly the same thing:

In [None]:
new_a is a

In [None]:
new_b is b

Hence we have two types of "being equal":
1. two variables can be considered equal because they refer to the same instance
2. two variables can be considered equal because they represent the same value, for example, the same integer
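
As a quick check of the two notions above, here is a minimal sketch using lists, where the outcome does not depend on how the interpreter reuses instances. The helper name `compare` is made up for this illustration.

```python
def compare(x, y):
    """Return (same value?, same instance?) for two variables."""
    return (x == y, x is y)

first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with the same content
alias = first        # another reference to the first list

print(compare(first, second))  # (True, False)
print(compare(first, alias))   # (True, True)
```

Two variables that refer to the same instance always represent the same value, but the reverse is not necessarily true.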

### More examples

In [None]:
a = 1000
b = 10**3    # 10 to the Power of 3

print(a == b)
print(a is b)

In [None]:
a = "hello, python!"
b = "hello, " + "python!"

print(a == b)
print(a is b)

## References vs. Values, the problem

Let's take a look at this example: we have two lists that have identical values:

In [None]:
# a and b have the same value

a = [1, 2, 3]
b = [1, 2, 3]

print("a is", a)
print("b is", b)

a == b

Changing list <code>a</code> will not affect list <code>b</code>:

In [None]:
# changing a will not affect b

a.append(4)

print("a is", a)
print("b is", b)

a == b

However, when two variables are referring to the same list instance:

In [None]:
# a and b refer to the same list instance

a = [1, 2, 3]
b = a

print("a is", a)
print("b is", b)

a == b

Changing list <code>a</code> affects list <code>b</code>:

In [None]:
# changing a affects b

a.append(4)

print("a is", a)
print("b is", b)

a == b

Having several references to the same instance can make us confuse **the value a variable represents** with **the instance a variable is referring to**, and sometimes this can be a problem.

```
I didn't do anything, so why did b change?!!!
```
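
The same surprise can happen without any visible `b = a`, for example when a list is passed to a function. The sketch below uses a hypothetical function `add_total` to show the effect, and a copy made with `list()` as one possible way to keep the original untouched.

```python
def add_total(numbers):
    """Append the sum of the list to the list itself (changes the argument)."""
    numbers.append(sum(numbers))
    return numbers

scores = [1, 2, 3]
result = add_total(scores)        # scores and result are the same instance
print(scores)                     # [1, 2, 3, 6]
print(result is scores)           # True

scores = [1, 2, 3]
result = add_total(list(scores))  # list(scores) creates a new list instance
print(scores)                     # [1, 2, 3]
print(result)                     # [1, 2, 3, 6]
```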

## Mutable and Immutable

This problem would be especially serious for primitive data types such as numbers and strings, because with them we always care about the value, not about which instance holds it. We don't want the value of a number to change unexpectedly.

The solution to this problem is pretty simple: for certain data types, we don't allow changing the value of an instance once it is created. These data types are called immutable data types.

### Immutable data types

In Python, <code>int</code>, <code>float</code>, <code>bool</code>, <code>str</code>, <code>tuple</code>, and <code>complex</code> (complex numbers) are immutable data types.

In [None]:
a = 2024
b = a

a is b

In [None]:
# there is no way you can do something like
# a.set_value(2025), so you cannot change
# 2024 once you created it.
#
# the only thing you can do, is assign another
# instance to variable a, which does not affect
# variable b

a = 2025
b

In [None]:
a = "hello, python!"
b = a

a is b

In [None]:
# there is no way you can do something like
# a.set_value("code finished"), so you cannot
# change "hello, python!" once you created it.
#
# the only thing you can do, is assign another
# instance to variable a, which does not affect
# variable b

a = "code finished."
b
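
A small sketch of what "immutable" means in practice: operations that would change an existing instance are rejected, and operations that seem to change it actually create a new instance. The variable names are only for illustration.

```python
text = "hello"
point = (1, 2)

# Changing an instance in place is not allowed
try:
    text[0] = "H"
except TypeError as error:
    print("str:", error)

try:
    point[0] = 10
except TypeError as error:
    print("tuple:", error)

# "Modifying" operations return a new instance instead
shout = text.upper()
print(text)            # hello
print(shout)           # HELLO
print(shout is text)   # False
```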

### Mutable data types

Mutable data types are the opposite of immutable data types, which means you can change the value of an instance once it is created.

In Python, many other data types, including <code>list</code>, <code>set</code>, <code>dict</code>, and your own data types, are mutable.

In [None]:
a = [1, 2, 3]
b = a

a is b

In [None]:
a.append(4)
b
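
The same behaviour can be sketched for <code>dict</code> and <code>set</code>. The example also checks that changing a mutable instance in place does not change its <code>id()</code>: it is still the same instance, only its value is different. All names are chosen for illustration.

```python
prices = {"burger": 5}
same_prices = prices          # second reference to the same dict

identity_before = id(prices)
prices["fries"] = 3           # change the instance in place
identity_after = id(prices)

print(same_prices)                         # {'burger': 5, 'fries': 3}
print(identity_before == identity_after)   # True

tags = {"fast"}
same_tags = tags              # second reference to the same set
tags.add("cheap")
print(same_tags == {"fast", "cheap"})      # True
```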

Instances of our own classes are mutable, too:

In [None]:
class Person:
    def __init__(self, name):
        self.name = name

user1 = Person("john")
user2 = user1

user1 is user2

In [None]:
user1.name = "emma"
user2.name

### Comparing mutables and immutables

To avoid confusion and bugs, the general comparison rules are:
1. always use <code>==</code> for immutable data types,
2. normally, use <code>is</code> for mutable data types,
3. when you are explicitly comparing values, you can use <code>==</code> for mutable data types.

For our own classes, <code>==</code> by default checks whether two variables refer to the same instance, so

In [None]:
Person("john") == Person("john")

gives us False.

To change that, we need to define the <code>\_\_eq\_\_</code> method for our class. This is another reserved "magic" name in Python; you can take a look [here](https://docs.python.org/3/reference/datamodel.html#basic-customization) for more details.

In [None]:
class Person:
    def __init__(self, name):
        self.name = name
    
    def __eq__(self, other):
        if type(other) is Person:
            return self.name == other.name
        return False

In [None]:
Person("john") == Person("john")

### set and dict issue in our last session

In our last session we had the problem that we could not add a <code>dict</code> to a <code>set</code>. The reason behind this is also related to the topic of mutable and immutable data types.
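
As a reminder, the problem can be reproduced with the short sketch below. The names `menu_items` and `burger` are made up here, and storing a <code>tuple</code> of the items is only one possible workaround, shown because <code>tuple</code> is immutable.

```python
menu_items = set()
burger = {"name": "burger", "price": 5}

error_raised = False
try:
    menu_items.add(burger)        # a dict cannot be added to a set
except TypeError as error:
    error_raised = True
    print("TypeError:", error)

# An immutable snapshot of the dict's items can be added instead
menu_items.add(tuple(burger.items()))
print(menu_items)   # {(('name', 'burger'), ('price', 5))}
```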

A <code>set</code> contains unique items, and the way it decides whether two items are the same or different starts with looking at their _hash_ values.

In Python, we use <code>hash()</code> to get the hash value of an instance:

In [None]:
hash(3)

In [None]:
hash("hello")

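A _hash_ is an integer computed for an instance. If two instances are considered the same, then their _hash_ must be the same. The reverse is not strictly guaranteed: two different instances can occasionally share a _hash_, so a <code>set</code> uses the _hash_ as a fast first check and <code>==</code> to confirm.

In other words, by comparing _hash_ values, we can quickly check whether two instances might be the same.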

```
hash(a) == hash(b) is true, if a and b are the same
```

But wait a second, isn't this what <code>==</code> and <code>is</code> do?

The difference is that _hash_ does not define what "the same" means, so it is up to us to make that decision. The same _hash_ can mean that two items are the same instance, or that two items are different instances with the same value.

In Python, **for immutable data types, _hash_ represents the equality of values**.

In [None]:
# a and b are different instances,
# but they are equal and they have the same hash

a = "burger king"
b = "burger " + "king"

print(a == b)
print(a is b)
print(hash(a) == hash(b))

In [None]:
# despite being different instances,
# a and b represent the same key

type_of_food = {a : "burger"}
type_of_food[b]
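
The link between equal values and equal hashes can also be checked with numbers, as in the sketch below. The last part relies on a detail of CPython (the standard Python interpreter), where <code>-1</code> and <code>-2</code> happen to share a hash; it shows that an equal hash alone does not make two values the same item.

```python
# Equal values have equal hashes, even across different types
print(1 == 1.0 == True)                             # True
same_hash = hash(1) == hash(1.0) == hash(True)
print(same_hash)                                    # True

# So a set treats them as one item
unique_items = len({1, 1.0, True})
print(unique_items)                                 # 1

# In CPython, -1 and -2 share a hash but are different values,
# and a set keeps both of them
print(hash(-1) == hash(-2))                         # True in CPython
distinct_items = len({-1, -2})
print(distinct_items)                               # 2
```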

However, **for our own classes, which are mutable by default, _hash_ aligns with the default definition of equality, which is comparing instances.**

In [None]:
class Person:
    def __init__(self, name):
        self.name = name

In [None]:
# a and b are different instances,
# and they have different hashes

a = Person("john")
b = Person("john")

print(a == b)
print(a is b)
print(hash(a) == hash(b))

In [None]:
# despite both being named "john",
# a and b represent different keys

study = {a : "architecture"}
study[b]   # raises KeyError

If we change the default definition of "equality" by defining <code>\_\_eq\_\_</code>, our data type becomes "unhashable", because there is now a confusion between comparing by instances and comparing by values:

In [None]:
class Person:
    def __init__(self, name):
        self.name = name
    
    def __eq__(self, other):
        if type(other) is Person:
            return self.name == other.name
        return False

In [None]:
# a and b are different instances but comparable by value,
# they have no hash because of the confusion

a = Person("john")
b = Person("john")

print(a == b)
print(a is b)
print(hash(a) == hash(b))   # raises TypeError: Person is now unhashable

In [None]:
# unhashable data cannot be used as a key

study = {a : "architecture"}   # raises TypeError
study[b]

In this case we must explicitly implement the <code>\_\_hash\_\_</code> method to resolve this confusion:

In [None]:
class Person:
    def __init__(self, name):
        self.name = name
    
    def __eq__(self, other):
        if type(other) is Person:
            return self.name == other.name
        return False
    
    def __hash__(self):
        return hash(self.name)

In [None]:
# a and b are different instances but comparable by value,
# their hashes are computed using their values

a = Person("john")
b = Person("john")

print(a == b)
print(a is b)
print(hash(a) == hash(b))

In [None]:
# despite being different instances,
# a and b represent the same key because they have the same hash

study = {a : "architecture"}
study[b]
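
With both methods defined, instances can also be stored in a <code>set</code>, which then removes duplicates by value. The sketch below repeats the class definition so it can run on its own; the variable names are only for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if type(other) is Person:
            return self.name == other.name
        return False

    def __hash__(self):
        # the hash is computed from the same value that __eq__ compares
        return hash(self.name)

people = {Person("john"), Person("john"), Person("emma")}
unique_count = len(people)
print(unique_count)              # 2
print(Person("emma") in people)  # True
```

Because the hash is derived from <code>name</code>, changing <code>name</code> after an instance has been added to a <code>set</code> or used as a <code>dict</code> key can make it impossible to find again, so it is safer to treat such attributes as read-only.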

## Summary

In this session we learned that, in Python, there are two types of "being equal":

1. two variables are considered equal because they refer to the same instance; they share the same <code>id()</code>, and we can use the <code>is</code> operator to compare them.

2. two variables are considered equal because they represent the same value, for example, the same integer; they may have different <code>id()</code>, but we can still use the <code>==</code> operator to compare them.

Furthermore, we learned that many data types are immutable, because we want to avoid the confusion between **the value a variable represents** and **the instance a variable is referring to**. This confusion can cause the problem of

```
I didn't do anything, so why did the value of my variable change?!!!
```

Lastly, we learned that the reason Python cannot add a <code>dict</code> to a <code>set</code> is that <code>dict</code> is not hashable. This is because <code>dict</code> is mutable and value-comparable at the same time, which leads to confusion about whether we should compare <code>dict</code> by instances or by values.