# OOP 2: Abstraction and Encapsulation

In this notebook, we'll explore the concepts of abstraction and encapsulation in Object-Oriented Programming (OOP) using Python. We'll illustrate these concepts with examples relevant to data science.

## Table of Contents


1. [Introduction to Abstraction](#1)
2. [Abstraction in Python](#2)
3. [Introduction to Encapsulation](#3)
4. [Encapsulation in Python](#4)
5. [Example: Data Processing Pipeline](#5)
6. [Exercise: Build a different DataProcessor Class](#6)

---
## 1. Introduction to Abstraction <a id="1"></a>

Abstraction is the process of hiding the complex implementation details and showing only the essential features of an object. It helps to reduce complexity and allows the programmer to focus on interactions at a higher level.

In data science, a common abstraction might be a DataProcessor class that abstracts away the details of reading, cleaning, and transforming data, providing simple methods to perform these tasks.

---
## 2. Abstraction in Python <a id="2"></a>

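To implement abstraction in Python, we can define abstract base classes and abstract methods with the `abc` module. A class that inherits from `ABC` and still has unimplemented abstract methods cannot be instantiated.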

In this example, `DataProcessor` is an abstract class with abstract methods `read_data` and `process_data`. The `CSVDataProcessor` class provides concrete implementations of these methods.

In [None]:
from abc import ABC, abstractmethod

class DataProcessor(ABC):
    @abstractmethod
    def read_data(self, source):
        pass
    
    @abstractmethod
    def process_data(self):
        pass

class CSVDataProcessor(DataProcessor):
    def read_data(self, source):
        # Implementation for reading CSV data
        pass
    
    def process_data(self):
        # Implementation for processing CSV data
        pass
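
To see what the abstract class buys us, the following minimal sketch repeats the two classes above (with empty method bodies) and tries to instantiate each one. The variable names `abstract_error` and `processor` are only illustrative; the exact wording of the error message depends on the Python version.

```python
from abc import ABC, abstractmethod


class DataProcessor(ABC):
    @abstractmethod
    def read_data(self, source):
        pass

    @abstractmethod
    def process_data(self):
        pass


class CSVDataProcessor(DataProcessor):
    def read_data(self, source):
        pass  # placeholder, as in the cell above

    def process_data(self):
        pass  # placeholder, as in the cell above


# The abstract class itself cannot be instantiated.
try:
    DataProcessor()
    abstract_error = None
except TypeError as error:
    abstract_error = error
    print(f"TypeError: {error}")

# The concrete subclass implements every abstract method, so this works.
processor = CSVDataProcessor()
print(isinstance(processor, DataProcessor))  # True
```

The abstract class acts as a contract: any subclass must implement `read_data` and `process_data` before it can be used.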

---
## 3. Introduction to Encapsulation <a id="3"></a>

Encapsulation is the bundling of data and methods that operate on that data within a single unit (class), and restricting access to some of the object's components. This means that the internal representation of an object is hidden from the outside.

Applied to a `DataProcessor` object, encapsulation means that its data should be modified only through specific methods, which protects data integrity and prevents unintended interference.


---
## 4. Encapsulation in Python <a id="4"></a>

**In Python, encapsulation is more about convention than enforcement.** Python does not have true private variables like some other languages. Instead, it uses naming conventions to indicate the intended visibility of attributes.

### Single Underscore (_)

A single underscore before a method or attribute name is a convention indicating that it is intended for internal use only. It signals to other programmers that this is a "protected" member of the class and should not be accessed directly from outside the class. However, it does not enforce this restriction; it is just a convention.


### Double Underscore (__)

A double underscore before a method or attribute name triggers name mangling: inside the class body, the interpreter rewrites the name to include the class name, so `__load_data` in a class named `DataProcessor` becomes `_DataProcessor__load_data`. This makes it harder to accidentally override the member in a subclass or to access it from outside the class. It signals that the member is private and intended for use only within the class itself, although it is still reachable through the mangled name.

In this example, `_data` is a protected attribute, `_clean_data` is a protected method, and `__load_data` is a private method.

In [None]:
class DataProcessor:
    def __init__(self):
        self._data = None  # Protected attribute
    
    def read_data(self, source):
        self.__load_data(source)  # Private method
    
    def process_data(self):
        # Public method for processing data
        pass
    
    def _clean_data(self):
        # Protected method for cleaning data
        pass
    
    def __load_data(self, source):
        # Private method for loading data
        self._data = "Data from " + source
        
# Using the DataProcessor class
processor = DataProcessor()
processor.read_data("train.csv")
print(processor._data)  # Accessing protected attribute
# processor.__load_data("example.csv") # Raises AttributeError
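
The sketch below makes the effect of name mangling visible. It reuses a trimmed-down version of the `DataProcessor` class from the cell above; the helper variables `direct_call_failed` and `mangled_names` are only there for illustration.

```python
class DataProcessor:
    def __init__(self):
        self._data = None  # Protected attribute (by convention only)

    def read_data(self, source):
        self.__load_data(source)  # Inside the class, the short name works

    def __load_data(self, source):
        self._data = "Data from " + source


processor = DataProcessor()

# From outside the class, the unmangled name does not exist.
try:
    processor.__load_data("example.csv")
    direct_call_failed = False
except AttributeError:
    direct_call_failed = True
print(direct_call_failed)  # True

# The method is stored under the mangled name, which is still reachable.
processor._DataProcessor__load_data("example.csv")
print(processor._data)  # Data from example.csv

# dir() lists the mangled name rather than __load_data.
mangled_names = [name for name in dir(processor) if name.endswith("__load_data")]
print(mangled_names)  # ['_DataProcessor__load_data']
```

Calling the mangled name from outside the class is possible but defeats the purpose of marking the method as private, so it is best reserved for debugging.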




<br>

><details>
><summary>Curious about the underscores in Python?</summary>
> 
> You can find more info [here](https://realpython.com/python-double-underscore/#double-leading-underscore-in-classes-pythons-name-mangling).
</details>


---
## 5. Example: Data Processing Pipeline <a id="5"></a>

Let's create a more comprehensive example of a data processing pipeline that uses both abstraction and encapsulation.

**Step-by-Step Example**

In [None]:
from abc import ABC, abstractmethod
import pandas as pd

class DataProcessor(ABC):
    def __init__(self):
        self._data = None
    
    @abstractmethod
    def read_data(self, source):
        pass
    
    @abstractmethod
    def process_data(self):
        pass
    
    def get_data(self):
        return self._data

class CSVDataProcessor(DataProcessor):
    def read_data(self, source):
        self._data = pd.read_csv(source)
    
    def process_data(self):
        self._clean_data()
        self._transform_data()
    
    def _clean_data(self):
        self._data.dropna(inplace=True)
    
    def _transform_data(self):
        self._data['processed'] = True

# Using the CSVDataProcessor class
processor = CSVDataProcessor()
processor.read_data("sample.csv")
processor.process_data()
print(processor.get_data().head())


**Explanation**

1. **Abstract Class**: `DataProcessor` is an abstract class with abstract methods `read_data` and `process_data`.

2. **Concrete Class**: `CSVDataProcessor` provides concrete implementations of the abstract methods and defines protected methods `_clean_data` and `_transform_data`.

3. **Encapsulation**: Protected attributes and methods signal that the data should be manipulated only in a controlled manner, through the public methods.

4. **Usage**: An instance of `CSVDataProcessor` is created, data is read from a CSV file, processed, and the processed data is accessed using a public method.
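
The same structure can be tried without pandas or a CSV file. The sketch below is an assumption-laden variant, not part of the pipeline above: `InMemoryDataProcessor` is a hypothetical subclass whose `source` is a plain list of dictionaries, and a record counts as "missing" if any of its values is `None`.

```python
from abc import ABC, abstractmethod


class DataProcessor(ABC):
    def __init__(self):
        self._data = None

    @abstractmethod
    def read_data(self, source):
        pass

    @abstractmethod
    def process_data(self):
        pass

    def get_data(self):
        return self._data


class InMemoryDataProcessor(DataProcessor):
    """Hypothetical processor: `source` is a list of dicts, not a file."""

    def read_data(self, source):
        # Copy each record so the caller's list is left untouched.
        self._data = [dict(record) for record in source]

    def process_data(self):
        self._clean_data()
        self._transform_data()

    def _clean_data(self):
        # Drop records that contain a missing (None) value.
        self._data = [r for r in self._data if None not in r.values()]

    def _transform_data(self):
        # Flag every remaining record as processed.
        for record in self._data:
            record["processed"] = True


records = [
    {"name": "a", "value": 1},
    {"name": "b", "value": None},
    {"name": "c", "value": 3},
]

processor = InMemoryDataProcessor()
processor.read_data(records)
processor.process_data()
result = processor.get_data()
print(result)
```

The calling code uses only `read_data`, `process_data`, and `get_data`, exactly as with `CSVDataProcessor`. That is the practical benefit of the abstraction: a different processor can be swapped in without changing the code that uses it.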

---
## 6. Exercise: Build a different DataProcessor Class <a id="6"></a>

Create a subclass `JSONDataProcessor` that reads and processes JSON data. Implement the class with the following functionality:

### Requirements

- Subclass `JSONDataProcessor`

  - It inherits from `DataProcessor`
  
  - Methods:
    - `read_data(source)`: Reads JSON data from the specified source.

    - `process_data()`: Calls `_clean_data()` and `_transform_data()` to process the data.

    - `_clean_data()`: Handles data cleaning (e.g., removing missing values).
    
    - `_transform_data()`: Handles data transformation (e.g., adding a new column).

In [None]:
# your code here
class ...

><details>
><summary>Do you need some help?</summary>
> 
> Here is a working solution:
> ```python
> class JSONDataProcessor(DataProcessor):
>     def read_data(self, source):
>         self._data = pd.read_json(source)
>
>     def process_data(self):
>         self._clean_data()
>         self._transform_data()
>
>     def _clean_data(self):
>         self._data.dropna(inplace=True)
>
>     def _transform_data(self):
>         self._data['processed'] = True
> ```
</details>

Now check that your code works as expected by running the following cell:

In [None]:
# Using the JSONDataProcessor class
processor = JSONDataProcessor()
processor.read_data("sample.json")
processor.process_data()
print(processor.get_data().head())
