# OOP 1: Introduction to Classes and Objects

In this notebook, we'll explore the basic concepts of Object-Oriented Programming (OOP) in Python, focusing on classes and objects. We'll use examples related to data science to illustrate these concepts.

## Table of Contents

1. [What is a Class?](#1)
2. [Creating an Object](#2)
3. [Defining Attributes](#3)
4. [Defining Methods](#4)
5. [Example: Data Scientist Class](#5)
6. [Exercise: Build a Class](#6)


---
## 1. What is a Class? <a id="1"></a>
A class is a blueprint for creating objects. It bundles data (attributes) together with the functions (methods) that operate on that data.

**Syntax**:

```python
class ClassName:
    # class attributes and methods
```

**Example**:

Let's define a simple class `DataScientist`:



In [15]:
class DataScientist:
    pass

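This defines a new class `DataScientist` with no attributes or methods. The `pass` statement is a placeholder that keeps the otherwise empty class body syntactically valid.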


---
## 2. Creating an Object <a id="2"></a>

An object is an instance of a class. We can create an object by calling the class like a function.

**Example**:


In [3]:
# Creating an object of the DataScientist class
person1 = DataScientist()
print(person1)

<__main__.DataScientist object at 0x000001EC8A862660>


What can we do with an instance of that class? For one, we can attach attributes to it after it has been created:

In [4]:
person1.height = 1.80


In [5]:
person1.height

1.8

In [16]:
person2 = DataScientist()


In [19]:
type(person2)

__main__.DataScientist
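
Attributes assigned this way belong only to the instance they were set on. The sketch below redefines the empty class so that it runs on its own, and checks that `person2` does not pick up the `height` given to `person1`. `hasattr` and `vars` are Python built-ins.

```python
class DataScientist:
    pass

person1 = DataScientist()
person2 = DataScientist()

# Attach an attribute to person1 only
person1.height = 1.80

print(hasattr(person1, "height"))  # True
print(hasattr(person2, "height"))  # False
print(vars(person1))               # {'height': 1.8}
print(vars(person2))               # {}
```

Because such attributes exist only where they were assigned, code that relies on them can fail for other instances, which is one reason to set attributes in `__init__` instead.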

---
## 3. Defining Attributes <a id="3"></a>

### Instance Attributes
Instance attributes are specific to each object created from a class. They are usually defined within the `__init__` method, which Python calls automatically whenever a new object is created.

**Example**

In [25]:
class DataScientist:
    def __init__(self, name=None, expertise_level=None):
        self.name = name
        self.expertise_level = expertise_level

### Creating objects with instance attributes
ds1 = DataScientist("Alice", "Senior")
ds2 = DataScientist("Bob", "Junior")

print(ds1.name, ds1.expertise_level)
print(ds2.name, ds2.expertise_level)

Alice Senior
Bob Junior
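
To make the "specific to each object" point concrete, here is a small sketch that repeats the class definition and then changes one instance. The names `ds3` and the value `"Mid-level"` are only illustrative.

```python
class DataScientist:
    def __init__(self, name=None, expertise_level=None):
        self.name = name
        self.expertise_level = expertise_level

ds1 = DataScientist("Alice", "Senior")
ds2 = DataScientist("Bob", "Junior")

# Changing ds2 leaves ds1 untouched
ds2.expertise_level = "Mid-level"
print(ds1.expertise_level)  # Senior
print(ds2.expertise_level)  # Mid-level

# With the default arguments, both attributes start as None
ds3 = DataScientist()
print(ds3.name, ds3.expertise_level)  # None None
```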


### Class Attributes

Class attributes are shared by all instances of a class. They are defined directly within the class, outside any methods.

**Example**

In [30]:
class DataScientist:
    count = 0
    role = "Data Scientist"  # class attribute
    
    def __init__(self, name, expertise_level):
        self.name = name
        self.expertise_level = expertise_level
        DataScientist.count += 1

### Creating objects
print(DataScientist.count)
ds1 = DataScientist("Alice", "Senior")
print(DataScientist.count)
ds2 = DataScientist("Bob", "Junior")
print(DataScientist.count)

print(ds1.role)
print(ds2.role)
print(DataScientist.role)  # Accessing the class attribute directly

0
1
2
Data Scientist
Data Scientist
Data Scientist
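
One subtlety is worth a quick sketch: assigning to a class attribute through an instance does not change the shared value. It creates an instance attribute that shadows it. The example below uses a trimmed-down version of the class, and the role names are made up for illustration.

```python
class DataScientist:
    role = "Data Scientist"  # class attribute

    def __init__(self, name):
        self.name = name

ds1 = DataScientist("Alice")
ds2 = DataScientist("Bob")

# Assigning through an instance creates an instance attribute on ds1 only
ds1.role = "Lead Data Scientist"
print(ds1.role)            # Lead Data Scientist
print(ds2.role)            # Data Scientist
print(DataScientist.role)  # Data Scientist

# Assigning through the class changes the shared value
DataScientist.role = "ML Engineer"
print(ds2.role)  # ML Engineer
print(ds1.role)  # Lead Data Scientist (still shadowed by the instance attribute)
```

This is why the counter in the example above is updated with `DataScientist.count += 1` rather than `self.count += 1`.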


---
## 4. Defining Methods <a id="4"></a>

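Methods are functions defined inside a class that describe the behavior of its objects. Their first parameter, conventionally named `self`, refers to the instance the method is called on.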

**Example**:

In [8]:
class DataScientist:
    def __init__(self, name, expertise_level):
        self.name = name
        self.expertise_level = expertise_level

    def introduce(self):
        print(f"My name is {self.name} and I am a {self.expertise_level} Data Scientist.")

### Creating an object and calling a method
ds1 = DataScientist("Alice", "Senior")
ds1.introduce()

My name is Alice and I am a Senior Data Scientist.
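
As a brief illustration of how `self` gets its value, the sketch below calls the same method in two ways. Calling it on the instance passes the instance automatically; calling it on the class requires passing the instance explicitly.

```python
class DataScientist:
    def __init__(self, name, expertise_level):
        self.name = name
        self.expertise_level = expertise_level

    def introduce(self):
        print(f"My name is {self.name} and I am a {self.expertise_level} Data Scientist.")

ds1 = DataScientist("Alice", "Senior")

# These two calls are equivalent and print the same sentence
ds1.introduce()
DataScientist.introduce(ds1)
```

In practice the first form is the one you will use; the second is shown only to make the role of `self` visible.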


### Method with Parameters

In [9]:
class DataScientist:
    def __init__(self, name, expertise_level):
        self.name = name
        self.expertise_level = expertise_level

    def add_skill(self, skill):
        print(f"{self.name} has acquired a new skill: {skill}")

### Adding a skill to a Data Scientist
ds1 = DataScientist("Alice", "Senior")
ds1.add_skill("Machine Learning")

Alice has acquired a new skill: Machine Learning
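
Note that `add_skill` above only prints a message and does not remember the skill. A possible variant, assuming we add a hypothetical `skills` list attribute in `__init__`, could store each skill on the object:

```python
class DataScientist:
    def __init__(self, name, expertise_level):
        self.name = name
        self.expertise_level = expertise_level
        self.skills = []  # hypothetical attribute, not part of the class above

    def add_skill(self, skill):
        # Record the skill on this instance, then report it
        self.skills.append(skill)
        print(f"{self.name} has acquired a new skill: {skill}")

ds1 = DataScientist("Alice", "Senior")
ds1.add_skill("Machine Learning")
ds1.add_skill("Statistics")
print(ds1.skills)  # ['Machine Learning', 'Statistics']
```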


---
## 5. Example: Data Scientist Class <a id="5"></a>

Let's create a class `DataScientist` to illustrate the concepts we've learned so far. This class will include attributes for a data scientist's name, programming languages, and projects. We'll also define methods to add new languages and projects.

**Step-by-Step Example**:

In [38]:
class DataScientist:
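    def __init__(self, name, languages=None, projects=None) -> None:
        self.name = name
        # Use None as the default so that each instance gets its own list
        self.languages = languages if languages is not None else []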
        self.projects = projects if projects is not None else []

    def add_language(self, language):
        if language not in self.languages:
            self.languages.append(language)
        else:
            print(f"{language} is already known.")

    def add_project(self, project):
        self.projects.append(project)

    def display_info(self):
        print(f"Data Scientist: {self.name}")
        print(f"Known Languages: {', '.join(self.languages)}")
        print(f"Projects: {', '.join(self.projects)}")

# Creating a DataScientist object
ds1 = DataScientist("Bob")
ds1.add_language("Python")
ds1.add_language("R")
ds1.add_project("Customer Segmentation")
ds1.display_info()

# Creating another DataScientist object with initial data
ds2 = DataScientist("Carol", languages=["Python", "SQL"], projects=["Sales Prediction"])
ds2.display_info()

Data Scientist: Bob
Known Languages: Python, R
Projects: Customer Segmentation
Data Scientist: Carol
Known Languages: Python, SQL
Projects: Sales Prediction


In [40]:
# The class body declares no annotated attributes, so this dictionary is empty
DataScientist.__annotations__

{}

**Explanation**

1. **Class Definition**: We define a `DataScientist` class with an `__init__` method to initialize the attributes.

2. **Adding Languages**: We add a method `add_language` to add new programming languages.

3. **Adding Projects**: We add a method `add_project` to add new projects.

4. **Displaying Information**: We add a method `display_info` to print out the data scientist's details.
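
The `projects if projects is not None else []` pattern in `__init__` guards against a common pitfall: a mutable default such as `[]` is created once, when the function is defined, and is then shared by every call that relies on the default. The sketch below contrasts the two approaches using two hypothetical classes, `SharedDefault` and `FreshDefault`, invented purely for this illustration.

```python
class SharedDefault:
    def __init__(self, languages=[]):  # the same list object is reused on every call
        self.languages = languages

class FreshDefault:
    def __init__(self, languages=None):
        # A new list is created for each instance that does not pass one in
        self.languages = languages if languages is not None else []

a, b = SharedDefault(), SharedDefault()
a.languages.append("Python")
print(b.languages)  # ['Python'] (b sees the change made through a)

c, d = FreshDefault(), FreshDefault()
c.languages.append("Python")
print(d.languages)  # []
```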


---
## 6. Exercise: Build a Class <a id="6"></a>

Create a class `DataFrameAnalyzer` from scratch. This class will be used to analyze *pandas* DataFrames. Implement the class with the following functionalities:

**Requirements**

- Class Initialization: The class should be initialized with a pandas DataFrame.

- Methods to Implement:
    - `get_summary()`: Returns summary statistics for the DataFrame using `DataFrame.describe()`.
    - `get_missing_values()`: Returns the number of missing values in each column.
    - `get_column_types()`: Returns the data type of each column.

You can find the pandas API reference [here](https://pandas.pydata.org/docs/reference/index.html).

In [34]:
import pandas as pd

class DataFrameAnalyzer:
    def __init__(self, dataframe: pd.DataFrame):
        self.dataframe = dataframe

    def get_summary(self):
        return self.dataframe.describe()
    
    def get_missing_values(self):
        return self.dataframe.isnull().sum()

    def get_column_types(self):
        return self.dataframe.dtypes


><details>
><summary>Do you need some help?</summary>
> 
> Here is a working solution:
> 
> ```python
> import pandas as pd
>
> class DataFrameAnalyzer:
>    def __init__(self, dataframe):
>        self.dataframe = dataframe
>
>    def get_summary(self):
>        return self.dataframe.describe()
>
>    def get_missing_values(self):
>        return self.dataframe.isnull().sum()
>
>    def get_column_types(self):
>        return self.dataframe.dtypes
> ```
></details>

Now check whether your code works as expected by running the following cell:

In [35]:
# Creating a sample DataFrame
data = {
    'age': [25, 30, 35, 40, None],
    'salary': [50000, 60000, 70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Creating a DataFrameAnalyzer object
analyzer = DataFrameAnalyzer(df)

# Using the methods
print("Summary Statistics:\n", analyzer.get_summary())
print("\nMissing Values:\n", analyzer.get_missing_values())
print("\nColumn Types:\n", analyzer.get_column_types())

Summary Statistics:
              age        salary
count   4.000000      5.000000
mean   32.500000  70000.000000
std     6.454972  15811.388301
min    25.000000  50000.000000
25%    28.750000  60000.000000
50%    32.500000  70000.000000
75%    36.250000  80000.000000
max    40.000000  90000.000000

Missing Values:
 age       1
salary    0
dtype: int64

Column Types:
 age       float64
salary      int64
dtype: object
