# OOP 3: Special Methods (\_\_init\_\_, \_\_str\_\_, \_\_repr\_\_, etc.)

In this notebook, we'll explore special methods in Python classes, which allow us to define how objects are initialized, represented, and interacted with. We'll illustrate these concepts with examples relevant to data science.

## Table of Contents

1. [Introduction to Special Methods](#1)
2. [\_\_init\_\_: Object Initialization](#2)
3. [\_\_str\_\_ and \_\_repr\_\_: Object Representation](#3)
4. [Other Useful Special Methods](#4)
   1. [\_\_getitem\_\_, \_\_setitem\_\_, \_\_delitem\_\_](#4.1)
   2. [Comparison Methods (\_\_eq\_\_, \_\_lt\_\_, \_\_le\_\_, \_\_gt\_\_, \_\_ge\_\_, \_\_ne\_\_)](#4.2)
   3. [Arithmetic Methods (\_\_add\_\_, \_\_sub\_\_, \_\_mul\_\_, \_\_truediv\_\_)](#4.3)
   4. [\_\_len\_\_](#4.4)
   5. [\_\_class\_\_](#4.5)
   6. [\_\_dict\_\_](#4.6)
   7. [\_\_bases\_\_](#4.7)
5. [Example: DataFrame Wrapper Class](#5)
6. [Exercise: Implementing a Dataset Class](#6)

---
## 1. Introduction to Special Methods <a id="1"></a>

Special methods in Python, also known as dunder methods (short for "double underscore"), let us define how objects behave with built-in functions and operators. Their names begin and end with double underscores (e.g., \_\_init\_\_, \_\_str\_\_), and Python calls them implicitly: for example, `len(obj)` calls the object's `__len__` method, and `a + b` calls `a.__add__(b)`.
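To see this delegation in action, the cell below calls a few built-ins next to the dunder methods they use. It relies only on built-in types; calling dunder methods directly like this is for illustration only, and in real code you would use the built-in syntax.

```python
# Built-in functions and operators delegate to dunder methods
numbers = [25, 30, 35]

print(len(numbers), numbers.__len__())      # 3 3
print(10 + 5, (10).__add__(5))              # 15 15
print(numbers[0], numbers.__getitem__(0))   # 25 25
print(str(3.5), (3.5).__str__())            # 3.5 3.5
```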


---
## 2. \_\_init\_\_: Object Initialization <a id="2"></a>

The `__init__` method is called automatically right after a new object is created (for example, by `DataScientist("Alice")`). It initializes the object's attributes and must return `None`.

**Example**

In [1]:
class DataScientist:
    def __init__(self, name, expertise_level="Junior"):
        self.name = name
        self.expertise_level = expertise_level

ds = DataScientist("Alice")
print(ds.name, ds.expertise_level)

Alice Junior



In this example, the `__init__` method initializes the `name` and `expertise_level` attributes of the `DataScientist` object.
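A common use of `__init__` is to validate arguments before the object is used. The sketch below extends the example with such a check; the `VALID_LEVELS` set and the error message are hypothetical, chosen only for illustration. Each call creates a separate object with its own `name` and `expertise_level`.

```python
# Hypothetical set of allowed levels, used only for this illustration
VALID_LEVELS = {"Junior", "Mid", "Senior"}

class DataScientist:
    def __init__(self, name, expertise_level="Junior"):
        # __init__ runs once for every new object and must return None
        if expertise_level not in VALID_LEVELS:
            raise ValueError(f"Unknown expertise level: {expertise_level!r}")
        self.name = name
        self.expertise_level = expertise_level

alice = DataScientist("Alice")            # the default level is used
bob = DataScientist("Bob", "Senior")      # the default is overridden
print(alice.expertise_level, bob.expertise_level)  # Junior Senior

try:
    DataScientist("Cara", "Wizard")
except ValueError as err:
    print(err)  # Unknown expertise level: 'Wizard'
```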

---
## 3. \_\_str\_\_ and \_\_repr\_\_: Object Representation <a id="3"></a>

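The `__str__` method defines the informal, human-readable string representation of an object; it is used by `str()` and `print()`. The `__repr__` method defines the official string representation, intended to be unambiguous (ideally, text that looks like a valid Python expression for recreating the object); it is used by `repr()` and when the interactive interpreter or a notebook displays an object. If a class defines `__repr__` but not `__str__`, `str()` falls back to `__repr__`.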

**Example**

In [4]:
repr("This is a string")

"'This is a string'"

In [10]:
class DataScientist:
    def __init__(self, name, expertise_level):
        self.name = name
        self.expertise_level = expertise_level
    
    def __str__(self):
        return f"DataScientist: {self.name}, Level: {self.expertise_level}"
        
    def __repr__(self):
        return f"DataScientist(name='{self.name}', expertise_level='{self.expertise_level}')"
    


ds = DataScientist("Alice", "Senior")
print(str(ds))  # Calls __str__
print(repr(ds))  # Calls __repr__

DataScientist: Alice, Level: Senior
DataScientist(name='Alice', expertise_level='Senior')


In [14]:
# The last expression in a notebook cell is displayed using __repr__
ds

DataScientist(name='Alice', expertise_level='Senior')

In this example, `__str__` provides a readable summary for end users, while `__repr__` provides an unambiguous representation aimed at developers. The notebook also uses `__repr__` when `ds` is the last expression in a cell.
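If a class defines only `__repr__`, `str()` and `print()` fall back to it, and containers such as lists always show their elements with `repr()`. The sketch below is a variant of the class above with `__str__` deliberately omitted. It also uses the `!r` conversion in the f-string, which applies `repr()` to each value so quotes are handled correctly; manual quoting such as `'{self.name}'` would produce a broken representation for a name like `O'Brien`.

```python
class DataScientist:
    def __init__(self, name, expertise_level):
        self.name = name
        self.expertise_level = expertise_level

    # No __str__ here: str() and print() will use __repr__ instead
    def __repr__(self):
        # !r calls repr() on each value, so strings are quoted and escaped correctly
        return f"DataScientist(name={self.name!r}, expertise_level={self.expertise_level!r})"

ds = DataScientist("Alice", "Senior")
print(ds)     # DataScientist(name='Alice', expertise_level='Senior')
print([ds])   # [DataScientist(name='Alice', expertise_level='Senior')]
print(DataScientist("O'Brien", "Junior"))
# DataScientist(name="O'Brien", expertise_level='Junior')
```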

---
## 4. Other Useful Special Methods <a id="4"></a>


### 4.1 \_\_getitem\_\_, \_\_setitem\_\_, \_\_delitem\_\_ <a id="4.1"></a>

These methods let objects support the indexing syntax: `obj[key]` calls `__getitem__`, `obj[key] = value` calls `__setitem__`, and `del obj[key]` calls `__delitem__`.

**Example**

In [16]:
import pandas as pd

class DataFrameWrapper:
    def __init__(self, data):
        self._data = pd.DataFrame(data)
    
    def __getitem__(self, key):
        return self._data[key]
    
    def __setitem__(self, key, value):
        self._data[key] = value
    
    def __delitem__(self, key):
        del self._data[key]

# Usage
data = {'age': [25, 30, 35, 40], 'salary': [50000, 60000, 70000, 80000]}
df_wrapper = DataFrameWrapper(data)
print(df_wrapper['age'])  # __getitem__
df_wrapper['bonus'] = [5000, 6000, 7000, 8000]  # __setitem__
print(df_wrapper['bonus'])
del df_wrapper['age']  # __delitem__

0    25
1    30
2    35
3    40
Name: age, dtype: int64
0    5000
1    6000
2    7000
3    8000
Name: bonus, dtype: int64
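The wrapper above simply forwards each key to pandas. To see what Python actually passes to these methods, the sketch below uses a hypothetical `TimeSeries` class backed by a plain list: an integer index arrives as an `int`, while slice syntax such as `ts[1:3]` arrives as a built-in `slice` object, which this class handles by returning a new `TimeSeries`.

```python
class TimeSeries:
    """Hypothetical minimal container backed by a list."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, key):
        # ts[2] passes an int; ts[1:3] passes a slice object
        if isinstance(key, slice):
            return TimeSeries(self._values[key])
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __repr__(self):
        return f"TimeSeries({self._values!r})"

ts = TimeSeries([10, 12, 15, 11])
print(ts[0])      # 10
print(ts[1:3])    # TimeSeries([12, 15])
ts[0] = 9         # __setitem__
del ts[-1]        # __delitem__
print(ts)         # TimeSeries([9, 12, 15])
```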


### 4.2 Comparison Methods (\_\_eq\_\_, \_\_lt\_\_, \_\_le\_\_, \_\_gt\_\_, \_\_ge\_\_, \_\_ne\_\_) <a id="4.2"></a>

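These methods define behavior for the comparison operators: `==` (`__eq__`), `!=` (`__ne__`), `<` (`__lt__`), `<=` (`__le__`), `>` (`__gt__`) and `>=` (`__ge__`). If a method is missing or returns `NotImplemented`, Python tries the reflected method on the other operand (for example, `a > b` falls back to `b < a`), and by default `!=` returns the negation of `__eq__`.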

**Example**


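In [31]:
class DataScientist:
    def __init__(self, name, expertise_level, salary):
        self.name = name
        self.expertise_level = expertise_level
        self.salary = salary
    
    # All three comparisons use salary, so ==, < and <= agree with each other
    def __eq__(self, other):
        if not isinstance(other, DataScientist):
            return NotImplemented  # let Python try the other operand instead
        return self.salary == other.salary
    
    def __lt__(self, other):
        if not isinstance(other, DataScientist):
            return NotImplemented
        return self.salary < other.salary
    
    def __le__(self, other):
        if not isinstance(other, DataScientist):
            return NotImplemented
        return self.salary <= other.salary

ds1 = DataScientist("Alice", "Senior", 100000)
ds2 = DataScientist("Bob", "Junior", 80000)

print(ds1 == ds2)  # __eq__
print(ds1 < ds2)   # __lt__
print(ds1 <= ds2)  # __le__
print(ds1 > ds2)   # no __gt__, so Python tries the reflected ds2 < ds1
print(ds1 != ds2)  # default __ne__ negates __eq__

False
False
False
True
True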
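Writing all six comparison methods by hand is repetitive. The standard library's `functools.total_ordering` decorator fills in the missing ordering methods from `__eq__` plus one of `__lt__`, `__le__`, `__gt__` or `__ge__`. The sketch below is a simplified `DataScientist` (without `expertise_level`) ordered by salary, which also makes `sorted()`, `min()` and `max()` work on a list of instances.

```python
from functools import total_ordering

@total_ordering
class DataScientist:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def __eq__(self, other):
        if not isinstance(other, DataScientist):
            return NotImplemented
        return self.salary == other.salary

    def __lt__(self, other):
        if not isinstance(other, DataScientist):
            return NotImplemented
        return self.salary < other.salary

team = [DataScientist("Alice", 100000), DataScientist("Bob", 80000), DataScientist("Cara", 90000)]
print([ds.name for ds in sorted(team)])  # ['Bob', 'Cara', 'Alice']
print(team[0] >= team[2])                # True (__ge__ generated by total_ordering)
print(max(team).name)                    # Alice
```

Because this class defines `__eq__` without `__hash__`, Python sets `__hash__` to `None`, so its instances cannot be used as dictionary keys or set members unless you also define `__hash__`.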


### 4.3 Arithmetic Methods (\_\_add\_\_, \_\_sub\_\_, \_\_mul\_\_, \_\_truediv\_\_) <a id="4.3"></a>

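These methods define behavior for the arithmetic operators: `+` (`__add__`), `-` (`__sub__`), `*` (`__mul__`) and `/` (`__truediv__`). In the example below, adding or subtracting two `DataScientist` objects combines their salaries and returns a plain number.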

**Example**

In [24]:
class DataScientist:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
    
    def __add__(self, other):
        return self.salary + other.salary
    
    def __sub__(self, other):
        return self.salary - other.salary

ds1 = DataScientist("Alice", 100000)
ds2 = DataScientist("Bob", 80000)

print(ds1 + ds2)  # __add__
print(ds1 - ds2)  # __sub__

180000
20000
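The example above covers `+` and `-`. The sketch below is a hypothetical extension of the same class that also implements `*` and `/`, returns `NotImplemented` for unsupported operand types, and adds `__radd__` (the reflected version of `__add__`, called for `number + ds`) so that the built-in `sum()`, which starts from `0`, works on a list of data scientists. Returning a new object from `__mul__` but plain numbers from `__add__` and `__truediv__` is a design choice made only for illustration.

```python
class DataScientist:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def __add__(self, other):
        # ds + ds (or ds + number) -> combined salary as a plain number
        if isinstance(other, DataScientist):
            return self.salary + other.salary
        if isinstance(other, (int, float)):
            return self.salary + other
        return NotImplemented

    def __radd__(self, other):
        # Called for number + ds, e.g. the 0 + first_item step inside sum()
        return self.__add__(other)

    def __mul__(self, factor):
        # ds * 1.5 -> a new DataScientist with a scaled salary
        if isinstance(factor, (int, float)):
            return DataScientist(self.name, self.salary * factor)
        return NotImplemented

    def __truediv__(self, divisor):
        # ds / 4 -> salary split into equal parts
        if isinstance(divisor, (int, float)):
            return self.salary / divisor
        return NotImplemented

team = [DataScientist("Alice", 100000), DataScientist("Bob", 80000)]
print(sum(team))                  # 180000
raised = team[1] * 1.5
print(raised.name, raised.salary) # Bob 120000.0
print(team[0] / 4)                # 25000.0
```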


### 4.4 \_\_len\_\_ <a id="4.4"></a>

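The `__len__` method defines what the built-in `len()` function returns for an object.
It should return the number of items in the object as a non-negative integer. If a class defines `__len__` but not `__bool__`, Python also uses it for truthiness: an object whose length is 0 is considered false.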

**Example**

In [22]:
len([2,3,6,5])

4

In [None]:
class DataCollection:
    def __init__(self, data):
        self.data = data
    
    def __len__(self):
        return len(self.data)

# Usage
data = [25, 30, 35, 40, 45]
collection = DataCollection(data)
print(len(collection))  # __len__
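The truthiness rule mentioned above can be checked with the same `DataCollection` class: it defines no `__bool__`, so `bool()` and `if` statements use `__len__`. The sketch repeats the class so the cell runs on its own.

```python
class DataCollection:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

empty = DataCollection([])
full = DataCollection([25, 30])

# No __bool__ is defined, so Python falls back to __len__ for truthiness
print(bool(empty), bool(full))  # False True
if not empty:
    print("No data to process")
```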

### 4.5 \_\_class\_\_ <a id="4.5"></a>

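Unlike the methods above, `__class__`, `__dict__` and `__bases__` are special *attributes* that Python sets automatically. The `__class__` attribute refers to the class of an instance (the same object that `type()` returns).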

**Example**

In [25]:
ds = DataScientist("Alice", 100000)  # uses the (name, salary) class from section 4.3
print(ds.__class__)

<class '__main__.DataScientist'>
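The sketch below, which redefines a minimal `DataScientist` so it runs on its own, shows that `__class__` is the same object `type()` returns, and that it can be used like the class itself, for example to read its name or to create another instance of the same type.

```python
class DataScientist:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

ds = DataScientist("Alice", 100000)
print(ds.__class__ is type(ds))          # True: both refer to the same class object
print(ds.__class__.__name__)             # DataScientist
clone = ds.__class__("Bob", 80000)       # the class object can create new instances
print(isinstance(clone, DataScientist))  # True
```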


### 4.6 \_\_dict\_\_ <a id="4.6"></a>

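The `__dict__` attribute holds an object's writable attributes. On an instance, it is a dictionary of that instance's own attributes; on a class, it is a read-only mapping of the class namespace, including its methods. The example below inspects the class, so the output lists the methods defined in section 4.3 together with entries Python adds automatically (some of them, such as `__firstlineno__` and `__static_attributes__`, appear only in recent Python versions).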

**Example**

In [27]:
print(DataScientist.__dict__)

{'__module__': '__main__', '__firstlineno__': 1, '__init__': <function DataScientist.__init__ at 0x0000020362AEC860>, '__add__': <function DataScientist.__add__ at 0x0000020362AED580>, '__sub__': <function DataScientist.__sub__ at 0x0000020362AECFE0>, '__static_attributes__': ('name', 'salary'), '__dict__': <attribute '__dict__' of 'DataScientist' objects>, '__weakref__': <attribute '__weakref__' of 'DataScientist' objects>, '__doc__': None}
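On an instance, `__dict__` is an ordinary dictionary of the attributes set on that object. The self-contained sketch below shows that the built-in `vars()` returns the same dictionary, and that writing to it creates an attribute (shown only to illustrate the mechanism; normal attribute assignment is clearer).

```python
class DataScientist:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

ds = DataScientist("Alice", 100000)
print(ds.__dict__)               # {'name': 'Alice', 'salary': 100000}
print(vars(ds) is ds.__dict__)   # True: vars() returns the same dictionary
ds.__dict__["team"] = "NLP"      # adding a key creates a new attribute
print(ds.team)                   # NLP
```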


### 4.7 \_\_bases\_\_ <a id="4.7"></a>

The `__bases__` attribute is a tuple of a class's direct base classes. `DataScientist` does not explicitly inherit from anything, so its only base is `object`.

**Example**

In [28]:
print(DataScientist.__bases__)

(<class 'object'>,)
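With inheritance, `__bases__` lists the parents named in the `class` statement, in order. The sketch below uses a hypothetical hierarchy in which `DataScientist` inherits from `Employee` and `Researcher`. Since `__bases__` only covers direct parents, the related `__mro__` attribute is also printed to show the full order in which Python looks up attributes.

```python
class Employee:
    def __init__(self, name):
        self.name = name

class Researcher:
    pass

# Hypothetical hierarchy: DataScientist inherits from two base classes
class DataScientist(Employee, Researcher):
    pass

print(Employee.__bases__)                                   # (<class 'object'>,)
print([base.__name__ for base in DataScientist.__bases__])  # ['Employee', 'Researcher']
print([cls.__name__ for cls in DataScientist.__mro__])      # ['DataScientist', 'Employee', 'Researcher', 'object']
```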


---
## 5. Example: DataFrame Wrapper Class <a id="5"></a>

Let's create a more comprehensive example by implementing a `DataFrame` wrapper class that uses various special methods.

**Step-by-Step Example**

In [None]:
import pandas as pd

class DataFrameWrapper:
    def __init__(self, data):
        self._data = pd.DataFrame(data)
    
    def __str__(self):
        return f"DataFrameWrapper with {len(self._data)} rows and {len(self._data.columns)} columns"
    
    def __repr__(self):
        return f"DataFrameWrapper(data={self._data.to_dict()})"
    
    def __getitem__(self, key):
        return self._data[key]
    
    def __setitem__(self, key, value):
        self._data[key] = value
    
    def __len__(self):
        return len(self._data)

# Usage
data = {
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df_wrapper = DataFrameWrapper(data)

# Using special methods
print(str(df_wrapper))  # __str__
print(repr(df_wrapper))  # __repr__
print(df_wrapper['age'])  # __getitem__
df_wrapper['salary'] = [52000, 62000, 72000, 82000]  # __setitem__
print(len(df_wrapper))  # __len__
print(df_wrapper.__class__)  # __class__
print(df_wrapper.__dict__)  # __dict__
print(DataFrameWrapper.__bases__)  # __bases__

**Explanation**

1. **Initialization (\_\_init\_\_)**: Converts the provided data into a pandas DataFrame and stores it in `_data`.
2. **String Representation (\_\_str\_\_ and \_\_repr\_\_)**: `__str__` gives a short summary (row and column counts), while `__repr__` includes the full data.
3. **Item Access (\_\_getitem\_\_ and \_\_setitem\_\_)**: Allows reading and assigning DataFrame columns using bracket notation.
4. **Length (\_\_len\_\_)**: Returns the number of rows in the DataFrame.
5. **Special Attributes (\_\_class\_\_, \_\_dict\_\_ and \_\_bases\_\_)**: Show the wrapper's class, its instance attributes (here, just `_data`), and its base classes.

---
## 6. Exercise: Implementing a Dataset Class <a id="6"></a>

Create a `Dataset` class that wraps around a pandas `DataFrame` and includes several special methods to provide a robust and user-friendly interface for data science operations.

**Requirements**
- **Initialization (\_\_init\_\_)**:
  - It should take a dictionary of data, convert it into a pandas `DataFrame`, and store it in a protected attribute called `_data`.
- **String Representation (\_\_str\_\_ and \_\_repr\_\_)**:
  - Implement \_\_str\_\_ to return a readable summary of the dataset, including the number of rows and columns.
  - Implement \_\_repr\_\_ to return a detailed representation of the dataset, including its content.
- **Item Access and Modification (\_\_getitem\_\_, \_\_setitem\_\_, \_\_delitem\_\_)**:
  - Implement \_\_getitem\_\_ to allow column access using the indexing syntax.
  - Implement \_\_setitem\_\_ to allow column modification using the indexing syntax.
  - Implement \_\_delitem\_\_ to allow column deletion using the del keyword.
- **Length (\_\_len\_\_)**:
  - Implement it to return the number of rows in the dataset.
- **Comparison (\_\_eq\_\_)**:
  - Implement it to compare two `Dataset` objects based on the equality of their underlying DataFrames.
- **Addition (\_\_add\_\_)**:
  - Implement it to concatenate two `Dataset` objects along the rows and return a new `Dataset` object.

In [None]:
# your code here
class ...

><details>
><summary>Do you need some help?</summary>
> 
> Here is a working solution:
> ```python
> import pandas as pd
>
> class Dataset:
>     def __init__(self, data):
>         self._data = pd.DataFrame(data)
>
>     def __str__(self):
>         return f"Dataset with {len(self._data)} rows and {len(self._data.columns)} columns"
>
>     def __repr__(self):
>         return f"Dataset(data={self._data.to_dict()})"
>
>     def __getitem__(self, key):
>         return self._data[key]
>
>     def __setitem__(self, key, value):
>         self._data[key] = value
>
>     def __delitem__(self, key):
>         del self._data[key]
>
>     def __len__(self):
>         return len(self._data)
>
>     def __eq__(self, other):
>         if not isinstance(other, Dataset):
>             return NotImplemented
>         return self._data.equals(other._data)
>
>     def __add__(self, other):
>         if not isinstance(other, Dataset):
>             return NotImplemented
>         # ignore_index=True renumbers the rows of the combined DataFrame from 0
>         combined_data = pd.concat([self._data, other._data], ignore_index=True)
>         return Dataset(combined_data)
> ```

Now check whether your code works as expected by running the following cell:

In [None]:
# Initialize two datasets
data1 = {
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
data2 = {
    'age': [45, 50],
    'salary': [90000, 100000]
}
ds1 = Dataset(data1)
ds2 = Dataset(data2)

# String representation
print(str(ds1))  # Should print a summary of ds1
print(repr(ds1))  # Should print a detailed representation of ds1

# Item access and modification
print(ds1['age'])  # Should print the 'age' column of ds1
ds1['bonus'] = [5000, 6000, 7000, 8000]  # Should add/modify the 'bonus' column in ds1
print(ds1['bonus'])
del ds1['age']  # Should delete the 'age' column from ds1

# Length
print(len(ds1))  # Should print the number of rows in ds1

# Comparison
print(ds1 == ds2)  # Should compare ds1 and ds2 for equality

# Addition
ds3 = ds1 + ds2  # Should concatenate ds1 and ds2 and return a new Dataset
print(ds3)