# Primitive data structures aren't intended for complex data

Let's say that for any vehicle, we want to store some basic values associated with it:
- miles per gallon (mpg)
- fuel tank size
- max passengers
- transmission

We could store these values as a list:

In [12]:
sedan = [35, 15, 5, "automatic"]

What happens if we have missing data? For example, what if we want to define a pickup truck, but don't know how big the gas tank is?

In [13]:
pickup = [20, 3, "automatic"]

To look up the number of passengers, we now have to use a different index for each vehicle:

In [14]:
print(f"Sedan capacity: {sedan[2]} passengers")
print(f"Pickup capacity: {pickup[1]} passengers")

Sedan capacity: 5 passengers
Pickup capacity: 3 passengers
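To see why the shifted indexes are risky, here is a small sketch that reuses the sedan's index on the pickup. `PASSENGERS_INDEX` is a hypothetical constant introduced only for this illustration. No error is raised; we simply get the wrong field back.

```python
sedan = [35, 15, 5, "automatic"]
pickup = [20, 3, "automatic"]  # tank size is missing, so later items shift left

# Hypothetical constant: the position of "max passengers" in a complete list
PASSENGERS_INDEX = 2

print(sedan[PASSENGERS_INDEX])   # 5
print(pickup[PASSENGERS_INDEX])  # automatic (the transmission, not the passengers)
```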


Alternatively, we could use a dictionary. Because values are looked up by key rather than by position, missing data won't change how the remaining fields are accessed.

In [15]:
sedan = {"mpg": 35, "tank_size": 15, "passengers": 5, "transmission": "automatic"}
pickup = {"mpg": 20, "passengers": 3, "transmission": "automatic"}

print(f"Sedan capacity: {sedan['passengers']} passengers")
print(f"Pickup capacity: {pickup['passengers']} passengers")

Sedan capacity: 5 passengers
Pickup capacity: 3 passengers
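The missing field itself still needs some care. As a minimal sketch, the built-in `dict.get()` method returns `None` for an absent key, whereas square-bracket access raises a `KeyError`:

```python
pickup = {"mpg": 20, "passengers": 3, "transmission": "automatic"}

# .get() returns None when the key is absent
tank_size = pickup.get("tank_size")
print(tank_size)  # None

# Square-bracket access raises KeyError for a missing key
try:
    pickup["tank_size"]
except KeyError as error:
    print(f"Missing field: {error}")  # Missing field: 'tank_size'
```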


The downside of this approach is that every time a new vehicle is added, each field will have to be written out manually. As the data become more complicated, defining a new vehicle requires more complex code.

Let's look at another issue with a dictionary approach. Because we have the fuel tank capacity in gallons and the fuel efficiency in miles per gallon, we can define a function to calculate the maximum range of the vehicle by multiplying those two values.

In [16]:
def calculate_max_range(vehicle):
    return vehicle["mpg"] * vehicle["tank_size"]

Now we can add a new field to the dictionary for the maximum range:

In [17]:
sedan["max_range"] = calculate_max_range(sedan)
print(f"Sedan max range: {sedan['max_range']} miles")

Sedan max range: 525 miles


However, this field can't be defined at the same time as the rest of the dictionary, because the name `sports_car` isn't bound until the whole dictionary has been created:

In [19]:
sports_car = {
    "mpg": 17,
    "tank_size": 15,
    "passengers": 2,
    "transmission": "manual",
    "max_range": calculate_max_range(sports_car)
}

NameError: name 'sports_car' is not defined

The result is that if we want to define a new vehicle and use a function to calculate a value for a new field, we have to first define the vehicle and then perform the `max_range` assignment in a second line of code.

In [21]:
sports_car = {
    "mpg": 17,
    "tank_size": 15,
    "passengers": 2,
    "transmission": "manual",
}
sports_car["max_range"] = calculate_max_range(sports_car)
print(f"Sports car max range: {sports_car['max_range']} miles")

Sports car max range: 255 miles


If you need to calculate values for many fields using the other values associated with the vehicle, this problem becomes unsustainable quickly.
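One possible workaround, sketched below, is a helper function that builds the dictionary and then fills in the derived fields. `make_vehicle` is a hypothetical name used only for this illustration. It works, but every new derived field needs another line in the helper, and nothing ties the helper to the dictionaries it produces.

```python
def calculate_max_range(vehicle):
    return vehicle["mpg"] * vehicle["tank_size"]


def make_vehicle(mpg, tank_size, passengers, transmission):
    """Hypothetical helper: build the dictionary, then add derived fields."""
    vehicle = {
        "mpg": mpg,
        "tank_size": tank_size,
        "passengers": passengers,
        "transmission": transmission,
    }
    # Derived fields can only be added once the dictionary exists
    vehicle["max_range"] = calculate_max_range(vehicle)
    return vehicle


sports_car = make_vehicle(17, 15, 2, "manual")
print(f"Sports car max range: {sports_car['max_range']} miles")  # 255 miles
```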

# Classes in Python

A `Class` is an object that serves as a blueprint for data structures: it describes what data each `instance` holds and which functions can operate on that data. Let's think about an example of a built-in Python class: strings. We have used strings in many different ways so far in this course, and have even used some of the more advanced operations that the `str` class provides. If we use the `help()` function, we can get a better idea of what is associated with the `str` class.

In [24]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  
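The `help()` output is long, so here is a short sketch of how a few of the listed entries relate to code we have already written. The variable name `greeting` is just an illustration. Operators such as `+` and `in` are handled by the double-underscore functions shown in the listing.

```python
greeting = "hello"

# Every string is an instance of the str class
print(type(greeting))             # <class 'str'>
print(isinstance(greeting, str))  # True

# The + operator uses __add__ behind the scenes
print(greeting + " world")         # hello world
print(greeting.__add__(" world"))  # hello world

# The in operator uses __contains__ behind the scenes
print("ell" in greeting)             # True
print(greeting.__contains__("ell"))  # True
```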

We can define a custom class for our example vehicles. Let's start without defining anything within the class:

In [28]:
class Vehicle:
    pass

A couple of points immediately stand out. First, unlike the `snake_case` convention used for naming variables and functions, class names use the `CapWords` convention (for example, `BankAccount`). Second, a function definition lists its expected arguments in parentheses, but parentheses in a class definition perform a different task that will be covered later. Instead, arguments passed to the class are handled by the `__init__()` function, which runs whenever a new `instance` of the class is created.

In [29]:
class Vehicle:
    def __init__(self, mpg, tank_size, passengers, transmission):
        self.mpg = mpg
        self.tank_size = tank_size
        self.passengers = passengers
        self.transmission = transmission

The `__init__()` function is defined within the class as you would define any other function. The surrounding double underscores (or "dunders") indicate that this function is not normally called directly by the user; instead, Python calls it automatically when a particular event occurs. In the case of `__init__()`, that event is the creation of a new `instance`. There are many of these "magic" or "dunder" functions, some of which we will cover later.

Let's look more closely at the `__init__()` function we have defined for `Vehicle`. We are passing the arguments that we would expect for our vehicles, but there is an additional `self` argument, which is also referenced in the assignments that follow. `self` refers to the `instance` that has just been created. The variables assigned to it with dot notation are `instance attributes`: they belong to the current instance only and are not shared across all `Vehicle` instances.

We have now created a `Vehicle` class, but if we try to call it without any arguments, we get an error:

In [27]:
Vehicle()

TypeError: __init__() missing 4 required positional arguments: 'mpg', 'tank_size', 'passengers', and 'transmission'

As mentioned earlier, the `Class` is merely a blueprint for storing data in a specific structure. We need to define an `instance` that corresponds to our specific vehicles. This is done by simply calling the `Vehicle` class with the input values.

In [31]:
sedan = Vehicle(35, 15, 5, "automatic")
print(sedan)

<__main__.Vehicle object at 0x7fdb1f077790>


We can access the values for our sedan through dot notation:

In [32]:
print(f"Sedan fuel efficiency: {sedan.mpg} mpg")
print(f"Sedan fuel capacity: {sedan.tank_size} gallons")
print(f"Sedan passenger capacity: {sedan.passengers} passengers")
print(f"Sedan transmission: {sedan.transmission}")

Sedan fuel efficiency: 35 mpg
Sedan fuel capacity: 15 gallons
Sedan passenger capacity: 5 passengers
Sedan transmission: automatic
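Because these are instance attributes, each `Vehicle` keeps its own values. The sketch below repeats the class definition so that it runs on its own, then changes one instance to show that the other is unaffected. The new passenger count of 4 is an arbitrary value chosen for the illustration.

```python
class Vehicle:
    def __init__(self, mpg, tank_size, passengers, transmission):
        self.mpg = mpg
        self.tank_size = tank_size
        self.passengers = passengers
        self.transmission = transmission


sedan = Vehicle(35, 15, 5, "automatic")
sports_car = Vehicle(17, 15, 2, "manual")

# Changing an attribute on one instance leaves the other instance untouched
sedan.passengers = 4

print(f"Sedan passenger capacity: {sedan.passengers}")            # 4
print(f"Sports car passenger capacity: {sports_car.passengers}")  # 2
```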


Now we can make some tweaks to solve the problems we encountered with primitives. First, we can add default values to the `__init__()` arguments to handle missing data (we can use `None` as the default). Next, we can add our maximum range function to the class as a method (a function defined inside the class) and use it to define `max_range` for the `Vehicle`.

In [1]:
class Vehicle:
    def __init__(self, mpg=None, tank_size=None, passengers=None, transmission=None):
        self.mpg = mpg
        self.tank_size = tank_size
        self.passengers = passengers
        self.transmission = transmission
        self.max_range = self.calculate_max_range()
        
    def calculate_max_range(self):
        if (self.mpg is None) or (self.tank_size is None):
            return None
        return self.mpg * self.tank_size

`calculate_max_range` is defined similarly to our earlier example, with a couple of changes. It is indented within the class so that it is defined within the scope of `Vehicle`, and it is called through `self.calculate_max_range()`. Additionally, its only input argument is `self`. `self` gives us access to the `instance attributes` we have already defined, so when the function is called at the end of `__init__()`, it can use the input arguments to calculate the range of the vehicle automatically. I have also included some simple logic that returns `None` instead of raising an error if either of the input values is missing. Now let's try our sedan again:

In [3]:
sedan = Vehicle(35, 15, 5, "automatic")
print(f"Sedan max_range: {sedan.max_range} miles")

Sedan max_range: 525 miles
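The `self` argument is filled in automatically whenever the function is called on an instance. As a rough sketch of what that means, the two calls below are equivalent. The names `via_instance` and `via_class` are introduced only for this comparison.

```python
class Vehicle:
    def __init__(self, mpg=None, tank_size=None, passengers=None, transmission=None):
        self.mpg = mpg
        self.tank_size = tank_size
        self.passengers = passengers
        self.transmission = transmission
        self.max_range = self.calculate_max_range()

    def calculate_max_range(self):
        if (self.mpg is None) or (self.tank_size is None):
            return None
        return self.mpg * self.tank_size


sedan = Vehicle(35, 15, 5, "automatic")

# Calling through the instance passes `sedan` as `self` automatically
via_instance = sedan.calculate_max_range()

# Calling through the class requires passing the instance explicitly
via_class = Vehicle.calculate_max_range(sedan)

print(via_instance, via_class)  # 525 525
```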


And the pickup truck and sports car:

In [4]:
pickup = Vehicle(mpg=20, passengers=3, transmission="automatic")
sports_car = Vehicle(mpg=17, tank_size=15, passengers=2, transmission="manual")

print(f"Pickup max_range: {pickup.max_range}")
print(f"Sports car max_range: {sports_car.max_range} miles")

Pickup max_range: None
Sports car max_range: 255 miles
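Note that `max_range` is calculated once, when `__init__()` runs. If a missing value is filled in later, the range has to be recalculated explicitly. The sketch below assumes a hypothetical 26-gallon tank for the pickup; that figure is made up for the example.

```python
class Vehicle:
    def __init__(self, mpg=None, tank_size=None, passengers=None, transmission=None):
        self.mpg = mpg
        self.tank_size = tank_size
        self.passengers = passengers
        self.transmission = transmission
        self.max_range = self.calculate_max_range()

    def calculate_max_range(self):
        if (self.mpg is None) or (self.tank_size is None):
            return None
        return self.mpg * self.tank_size


pickup = Vehicle(mpg=20, passengers=3, transmission="automatic")
print(pickup.max_range)  # None

# Hypothetical value: suppose we later learn the tank holds 26 gallons
pickup.tank_size = 26
range_before_update = pickup.max_range
print(range_before_update)  # None, because __init__() does not run again

# Recalculate explicitly now that both inputs are known
pickup.max_range = pickup.calculate_max_range()
print(pickup.max_range)  # 520
```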


# Data encapsulation

Sometimes we need to restrict the ways certain data are accessed. Let's think about a class for bank accounts.

In [20]:
class BankAccount:
    def __init__(self, account_number, balance):
        self.__account_number = account_number
        self.__balance = balance

    def deposit(self, amount):
        if amount > 0:
            self.__balance += amount
            print(f"Deposited ${amount} into account {self.__account_number}.")
        else:
            print("Invalid deposit amount.")

    def withdraw(self, amount):
        if amount > 0 and amount <= self.__balance:
            self.__balance -= amount
            print(f"Withdrew ${amount} from account {self.__account_number}.")
        else:
            print("Invalid withdrawal amount or insufficient balance.")

    def get_balance(self):
        return self.__balance

In [21]:
# Creating an instance of BankAccount
account = BankAccount("1234567890", 1000)

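Trying to access the balance attribute directly from outside the class (an encapsulation violation) will produce an `AttributeError`, because Python stores attributes whose names begin with a double underscore under a modified name.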

In [22]:
print(account.__balance)

AttributeError: 'BankAccount' object has no attribute '__balance'

Instead, we must use the `get_balance()` method.

In [23]:
print("Account balance: ", account.get_balance())

Account balance:  1000


Additionally, trying to change the balance directly with `+=` fails, because the current value cannot be read from outside the class.

In [24]:
account.__balance += 500

AttributeError: 'BankAccount' object has no attribute '__balance'

Again, we have to use the class's methods to change the balance.

In [25]:
account.deposit(500)
print("Account balance: ", account.get_balance())

account.withdraw(200)
print("Account balance: ", account.get_balance())

Deposited $500 into account 1234567890.
Account balance:  1500
Withdrew $200 from account 1234567890.
Account balance:  1300


When sensitive information needs to be associated with a class instance, encapsulation lets the developer keep that information out of the public interface. For example, our `BankAccount` class has a private attribute for the account number, but provides no method for reading it from outside the instance.
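It is worth knowing how Python implements this. The double underscore triggers name mangling: inside the class body, `__balance` is stored as `_BankAccount__balance`. The sketch below uses a trimmed-down copy of `BankAccount` (only `get_balance()` is kept) to show that the mangled name is still reachable, so this is a guard against accidental access rather than a security mechanism.

```python
class BankAccount:
    def __init__(self, account_number, balance):
        self.__account_number = account_number
        self.__balance = balance

    def get_balance(self):
        return self.__balance


account = BankAccount("1234567890", 1000)

# The attributes are stored under mangled names
print(vars(account))
# {'_BankAccount__account_number': '1234567890', '_BankAccount__balance': 1000}

# The mangled name can still be read from outside the class
print(account._BankAccount__balance)  # 1000

# Plain assignment from outside creates a new, unrelated attribute
account.__balance = 0
print(account.get_balance())  # 1000, the real balance is unchanged
```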