
## Programming for Data Science

### Lecture 7, Part 1: Object-Oriented Programming (OOP)

### Instructor: Farhad Pourkamali 


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/farhad-pourkamali/CUSucceedProgrammingForDataScience/blob/main/Lecture7_OOP_Part1.ipynb)


### Introduction
<hr style="border:2px solid gray">

* Python is a versatile language that supports several programming approaches, including procedure-oriented programming (POP) and object-oriented programming (OOP).
    * Procedure-Oriented Programming (POP)
        * Emphasizes a step-by-step execution of procedures.
        * Program is a set of functions that perform specific tasks.
        * Data and functions are separate entities.
        * Reusability is often limited. 
        
    * Object-Oriented Programming (OOP)
        * Emphasizes the organization of code into reusable, self-contained units called `objects`.
        * Program is a collection of interacting `objects`.
        * Data (known as `attributes`) and functions (known as `methods`) are encapsulated within `objects`.
        
* Object-Oriented Programming (OOP) Components:
    * Class:
        * Blueprint or template for creating objects.
        * Defines attributes (data) and methods (functions) that objects of the class will have.
        * Acts as a user-defined data type.
    * Object:
        * Instance of a class, created based on the class blueprint.
        * Has its own attribute values and shares the behaviors (methods) defined by the class.
        * Represents a specific occurrence or entity in the program.
        
* Class names should be written in `CamelCase`, where each word begins with a capital letter. Examples: `MyClass` and `CarModel`.
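
To make the POP/OOP contrast above concrete, here is a minimal sketch that solves the same small task both ways. The names `describe_car`, `car_record`, and the `make`/`model` attributes are hypothetical and chosen only for illustration; the class syntax itself is explained in the rest of this lecture.

```python
# Procedure-oriented style: the data is a plain dict, the function is separate.
def describe_car(car):
    return f"{car['make']} {car['model']}"

car_record = {"make": "Toyota", "model": "Corolla"}
pop_description = describe_car(car_record)

# Object-oriented style: data (attributes) and behavior (methods) live together.
class CarModel:
    def __init__(self, make, model):
        self.make = make
        self.model = model

    def describe(self):
        return f"{self.make} {self.model}"

car_object = CarModel("Toyota", "Corolla")
oop_description = car_object.describe()

print(pop_description)  # Toyota Corolla
print(oop_description)  # Toyota Corolla
```

Both versions produce the same result; the difference is where the data and the code that operates on it are kept.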

* Let's break down the basics of Object-Oriented Programming (OOP) using a simple `Student` class with two attributes (`student_id` and `name`) and one method (`display_info`).

    * Constructor `def __init__(self, student_id, name)`
        * Initializes each new instance with its attributes (`student_id` and `name`).
        * `self` refers to the instance of the class being created.
        
* In Python, you access an attribute without parentheses (for example, `student.name`), because parentheses are used to call methods or functions (for example, `student.display_info()`). Attributes are variables that store data within an object, so there is nothing to call.

In [1]:
class Student:
    
    # Class initialization method (constructor)
    def __init__(self, student_id, name):
        self.student_id = student_id
        self.name = name

    # Method to display student information
    def display_info(self):
        print(f"Student ID: {self.student_id}")
        print(f"Name: {self.name}")

In [2]:
# Creating an instance of the Student class
student1 = Student(student_id=10372, name="John Doe")

# Accessing attributes
print(student1.student_id, student1.name)


10372 John Doe


In [3]:
# Calling the method
student1.display_info()

Student ID: 10372
Name: John Doe


In [4]:
# Creating another instance of the Student class with a different name
student2 = Student(student_id=10463, name="Jane Smith")

student2.display_info()

Student ID: 10463
Name: Jane Smith
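
The cells above can be condensed into one small sketch that illustrates two points: attributes are read without parentheses while methods are called with them, and each instance keeps its own attribute values. The `Student` class is repeated here only so the block runs on its own; the new name `"Jane A. Smith"` is made up for the demonstration.

```python
class Student:
    def __init__(self, student_id, name):
        self.student_id = student_id
        self.name = name

    def display_info(self):
        print(f"Student ID: {self.student_id}")
        print(f"Name: {self.name}")

student1 = Student(student_id=10372, name="John Doe")
student2 = Student(student_id=10463, name="Jane Smith")

# Attributes are read without parentheses
print(student1.name)  # John Doe

# Without parentheses, a method name refers to the method object, not its result
print(callable(student1.display_info))  # True

# With parentheses, the method is actually executed
student1.display_info()

# Changing an attribute on one instance leaves the other instance untouched
student2.name = "Jane A. Smith"
print(student1.name)  # John Doe
print(student2.name)  # Jane A. Smith
```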


* Let's add a method named `add_courses` to the `Student` class to store a list of courses that a student has taken. Here's the modified class.

In [5]:
class Student:
    def __init__(self, student_id, name):
        self.student_id = student_id
        self.name = name
        self.courses_taken = []  # Initialize an empty list for courses

    # modified method 
    def display_info(self):
        print(f"Student ID: {self.student_id}")
        print(f"Name: {self.name}")
        print("Courses Taken:", self.courses_taken)

    # new method 
    def add_courses(self, courses):
        self.courses_taken.extend(courses)

# Creating an instance of the Student class
student1 = Student(student_id=10372, name="John Doe")

# Displaying information for student1
student1.display_info()

Student ID: 10372
Name: John Doe
Courses Taken: []


In [6]:
# Adding courses for student1
student1.add_courses(["Mathematics", "History", "Programming"])

# Displaying information for student1
student1.display_info()

Student ID: 10372
Name: John Doe
Courses Taken: ['Mathematics', 'History', 'Programming']


In [7]:
# Adding more courses for student1
student1.add_courses(["Statistics"])

# Displaying information for student1
student1.display_info()

Student ID: 10372
Name: John Doe
Courses Taken: ['Mathematics', 'History', 'Programming', 'Statistics']


* Here's a concise summary of the main syntax for defining a class in Python.

```
class ClassName:
    def __init__(self, arguments):
        # Constructor method to initialize object attributes
        # self refers to the instance being created
        # arguments are parameters passed during instance creation
        pass  # placeholder so the skeleton is valid Python

    def other_methods(self, arguments):
        # Additional methods to define behaviors of the class
        # self refers to the instance calling the method
        # arguments are parameters specific to each method
        pass  # placeholder so the skeleton is valid Python
```
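
As one possible way to fill in this skeleton, the sketch below defines a hypothetical `Rectangle` class (not part of the lecture's examples) with a constructor, a method that only reads attributes, and a method that modifies them.

```python
class Rectangle:
    def __init__(self, width, height):
        # Constructor: store the arguments as instance attributes
        self.width = width
        self.height = height

    def area(self):
        # Method that reads attributes and returns a value
        return self.width * self.height

    def scale(self, factor):
        # Method that modifies the attributes of the calling instance
        self.width *= factor
        self.height *= factor

rect = Rectangle(width=3, height=4)
print(rect.area())  # 12

rect.scale(2)
print(rect.area())  # 48
```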

### Instance vs class attributes
<hr style="border:2px solid gray">

* Instance Attributes:

    * Belong to a specific instance of the class.
    * Unique to each instance.
    * Typically defined within the constructor method (`__init__`), using `self`.
    * Accessed and modified through the instance: `self.attribute` inside the class, or `instance_name.attribute` outside it.
    
* Class Attributes:
    * Belong to the class itself.
    * Shared by all instances of the class.
    * Defined outside of any method, typically at the top of the class.  

In [8]:
import math

class Circle:
    # "Class attribute" representing the value of pi
    pi = math.pi

    def __init__(self, radius):
        # "Instance attribute" for the radius of the circle
        self.radius = radius

    def calculate_area(self):
        # Method to calculate the area of the circle
        area = Circle.pi * (self.radius ** 2)
        return area

    def calculate_circumference(self):
        # Method to calculate the circumference of the circle
        circumference = 2 * Circle.pi * self.radius
        return circumference

# Creating instances of the Circle class
circle1 = Circle(radius=5)
circle2 = Circle(radius=8)

# Accessing class attribute
print(f"All circles use the value of pi: {Circle.pi}")

# Accessing instance attributes and calling methods
area1 = circle1.calculate_area()
circumference2 = circle2.calculate_circumference()

print(f"The area of circle1 is: {area1}")
print(f"The circumference of circle2 is: {circumference2}")

All circles use the value of pi: 3.141592653589793
The area of circle1 is: 78.53981633974483
The circumference of circle2 is: 50.26548245743669
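
The sharing behavior of class attributes can be checked directly. The sketch below uses a reduced version of `Circle` (without the area and circumference methods) and deliberately overwrites `pi` with rough values, purely to show where each value is stored.

```python
import math

class Circle:
    pi = math.pi  # class attribute, stored once on the class

    def __init__(self, radius):
        self.radius = radius  # instance attribute, stored on each object

circle1 = Circle(radius=5)
circle2 = Circle(radius=8)

# Instances fall back to the class when they have no attribute of that name
print(circle1.pi == Circle.pi)  # True

# Rebinding the attribute on the class is visible through every instance
Circle.pi = 3.14
print(circle1.pi, circle2.pi)  # 3.14 3.14

# Assigning through an instance creates a new instance attribute that
# shadows the class attribute for that one object only
circle1.pi = 3
print(circle1.pi, circle2.pi, Circle.pi)  # 3 3.14 3.14
```

This is one reason the methods above refer to `Circle.pi` explicitly: that lookup always reads the class attribute, even if an instance has shadowed the name.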


* In Python, every instance of a class normally has a `__dict__` attribute: a dictionary that maps the instance's attribute names to their current values.

In [9]:
class Example:
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Creating an instance of Example
obj = Example(name="John", age=25)

# Accessing the __dict__ attribute
obj_dict = obj.__dict__

# Displaying the dictionary of attributes
print(obj_dict)


{'name': 'John', 'age': 25}


In [10]:
type(obj_dict)

dict
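
A short sketch of how `__dict__` behaves in practice, under the assumption of a slightly extended `Example` class: the class attribute `species` and the extra attribute `city` are hypothetical additions used only for this illustration.

```python
class Example:
    species = "human"  # class attribute (hypothetical, for illustration)

    def __init__(self, name, age):
        self.name = name
        self.age = age

obj = Example(name="John", age=25)

# The built-in vars() returns the same dictionary as obj.__dict__
print(vars(obj) == obj.__dict__)  # True

# Attributes added after construction also appear in the dictionary
obj.city = "Denver"
print(obj.__dict__)  # {'name': 'John', 'age': 25, 'city': 'Denver'}

# Class attributes are stored on the class, not in the instance dictionary
print("species" in obj.__dict__)      # False
print("species" in Example.__dict__)  # True
```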

* Below is a basic implementation of a `CustomArray` class that mimics some of the functionality of a NumPy array. It wraps a Python list, stores the list's length in the attribute `n`, and provides two basic operations: the dot product and the vector (Euclidean) norm.

In [11]:
import math

class CustomArray:
    def __init__(self, data):
        if not isinstance(data, list):
            raise TypeError("Input data must be a list")
            
        self.data = data
        self.n = len(data)

    def dot_product(self, other):
        if self.n != other.n:
            raise ValueError("Input arrays must have the same length")
            
        return sum(x * y for x, y in zip(self.data, other.data))

    def vector_norm(self):
        return math.sqrt(sum(x**2 for x in self.data))


In [12]:
# Example usage:
array1 = CustomArray([1, 2, 3])
array2 = CustomArray([4, 5, 6])

# Dot product
result_dot_product = array1.dot_product(array2)
print("Dot Product:", result_dot_product)

# Vector norm
result_vector_norm = array1.vector_norm()
print("Vector Norm:", result_vector_norm)


Dot Product: 32
Vector Norm: 3.7416573867739413


In [13]:
# Do we get an error message?

array1 = CustomArray((1, 2, 3))

TypeError: Input data must be a list

In [14]:
# Do we get an error message?

array1 = CustomArray([1, 2, 3])

array2 = CustomArray([4, 5, 6, 8])

array1.dot_product(array2)

ValueError: Input arrays must have the same length

In [15]:
# instance attribute
array1.n

3
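
The two error cases above stop execution when they occur. If the program should continue instead, the exceptions can be caught with `try`/`except`. The sketch below repeats the `CustomArray` class so that it runs on its own; the helper `safe_dot` is a hypothetical name introduced only for this example.

```python
import math

class CustomArray:
    def __init__(self, data):
        if not isinstance(data, list):
            raise TypeError("Input data must be a list")
        self.data = data
        self.n = len(data)

    def dot_product(self, other):
        if self.n != other.n:
            raise ValueError("Input arrays must have the same length")
        return sum(x * y for x, y in zip(self.data, other.data))

    def vector_norm(self):
        return math.sqrt(sum(x**2 for x in self.data))

def safe_dot(a_data, b_data):
    # Hypothetical helper: returns the dot product, or None if the inputs are invalid
    try:
        return CustomArray(a_data).dot_product(CustomArray(b_data))
    except (TypeError, ValueError) as err:
        print(f"{type(err).__name__}: {err}")
        return None

print(safe_dot([1, 2, 3], [4, 5, 6]))     # 32
print(safe_dot((1, 2, 3), [4, 5, 6]))     # reports the TypeError, then prints None
print(safe_dot([1, 2, 3], [4, 5, 6, 8]))  # reports the ValueError, then prints None
```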

### HW 7

1. Add two additional methods, `pearson_correlation` and `lp_norm`, to the `CustomArray` class in this notebook. The Pearson correlation coefficient between two vectors $X$ and $Y$ is given by the formula:

$$
\rho(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}.
$$

Note that the above formula requires you to calculate the mean of both vectors or lists ($\bar{X}$ and $\bar{Y}$), the covariance between the two vectors (the term in the numerator), and the standard deviations of each vector (the two terms in the denominator).

The Lp norm of a vector $X$ with elements $X_i$ is defined as 

$$
\|X\|_p = \left( \sum_{i=1}^{n} |X_i|^p \right)^{\frac{1}{p}}.
$$

After implementation, verify the functionality of your class with the provided test cases to confirm it operates as expected.

In [None]:
import math

class CustomArray:
    def __init__(self, data):
        if not isinstance(data, list):
            raise TypeError("Input data must be a list")
            
        self.data = data
        self.n = len(data)

    def dot_product(self, other):
        if self.n != other.n:
            raise ValueError("Input arrays must have the same length")
            
        return sum(x * y for x, y in zip(self.data, other.data))

    def vector_norm(self):
        return math.sqrt(sum(x**2 for x in self.data))

    def pearson_correlation(self, other):
        if self.n != other.n:
            raise ValueError("Vectors must have the same length for Pearson correlation")

        # Your Code 

    def lp_norm(self, p):
        if p <= 0:
            raise ValueError("p must be greater than 0 for Lp norm")

        # Your Code


In [None]:
# Test case for Pearson Correlation
vector_x = CustomArray([1, 2, 3, 4, 5])
vector_y = CustomArray([2, 3, 4, 5, 6])

pearson_corr = CustomArray.pearson_correlation(vector_x, vector_y)
print(f"Pearson Correlation between X and Y: {pearson_corr}")

# Test case for Lp Norm
vector_z = CustomArray([1, -2, 3, -4, 5])
p = 3

lp_norm_z = CustomArray.lp_norm(vector_z, p)
print(f"Lp Norm of Z with p={p}: {lp_norm_z}")
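
As a sanity check for the two test cases above, the expected values can be worked out by hand. Here $Y_i = X_i + 1$, so $\bar{X} = 3$, $\bar{Y} = 4$, and the deviations from the mean are identical for both vectors:

$$
\rho(X, Y) = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{\sqrt{10 \cdot 10}} = \frac{10}{10} = 1.
$$

For $Z = (1, -2, 3, -4, 5)$ and $p = 3$:

$$
\|Z\|_3 = \left( 1 + 8 + 27 + 64 + 125 \right)^{\frac{1}{3}} = 225^{\frac{1}{3}} \approx 6.082.
$$

A floating-point implementation may differ from these values in the last few digits.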

2. Let's tackle a simple data science problem: linear regression. Create a custom class named `LinearRegression` to handle linear regression analysis.

* The linear regression model can be represented by the equation:

$$y=\beta_0+\beta_1 x$$

where $y$ is the dependent variable (output), $x$ is the independent variable (input), $\beta_0$ is the intercept (y-intercept), and $\beta_1$ is the slope (coefficient) of the independent variable.
    
* Coefficient Calculation: The coefficients $\beta_0$ and $\beta_1$ are calculated using the following formulas:

$$\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
$$

$$\beta_0= \bar{y} - \beta_1 \cdot \bar{x}
$$

where $n$ is the number of data points, $x_i$ and $y_i$ are individual data points, and $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$ vectors. The coefficients $\beta_0$ and $\beta_1$ are determined to best fit a line to the given data points. 

* Prediction: Once the coefficients are calculated, you can make predictions for new values of independent variable $x$ using:

$$y=\beta_0+\beta_1 \cdot \text{new }x$$

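In the following cell, you will find a template for the `LinearRegression` class. Add your code to the `calculate_coefficients` and `predict` methods. The example usage unpacks two values from `calculate_coefficients()`, so that method should store the coefficients in `self.beta_0` and `self.beta_1` and also return them as a pair. Once completed, run the subsequent cell to verify the correctness of your code.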


In [None]:
class LinearRegression:
    def __init__(self, x, y):
        if len(x) != len(y):
            raise ValueError("Input arrays must have the same length")
            
        self.x = x
        self.y = y
        self.n = len(x)
        self.beta_0 = None
        self.beta_1 = None

    def calculate_coefficients(self):
        if self.n == 0:
            raise ValueError("Empty dataset, cannot perform linear regression")
        
        # Your Code

    def predict(self, new_x):
        if self.beta_0 is None or self.beta_1 is None:
            raise ValueError("Coefficients are not calculated, please run calculate_coefficients() first")
        
        # Your Code 


In [None]:
# Example usage:
x_data = [1, 2, 3, 4, 5]

y_data = [2*xi + 3 for xi in x_data]

regression_model = LinearRegression(x_data, y_data)

# Calculate coefficients
beta_0, beta_1 = regression_model.calculate_coefficients()
print("Linear Regression Coefficients: beta_0 =", beta_0, ", beta_1 =", beta_1)

# Make predictions
new_x_value = 6
predicted_y = regression_model.predict(new_x_value)
print(f"Predicted y for x={new_x_value}: {predicted_y}")
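
The expected output for this example can be derived by hand. The data satisfy $y_i = 2x_i + 3$, so $\bar{x} = 3$, $\bar{y} = 9$, and $y_i - \bar{y} = 2(x_i - \bar{x})$:

$$\beta_1 = \frac{2\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{2 \cdot 10}{10} = 2
$$

$$\beta_0 = \bar{y} - \beta_1 \cdot \bar{x} = 9 - 2 \cdot 3 = 3
$$

The prediction for the new value $x = 6$ is therefore $y = 3 + 2 \cdot 6 = 15$, up to floating-point rounding.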


3. Create a Python class named `NumpyDatasetAnalyzer` that offers basic functionality for analyzing a dataset represented as a 2D NumPy array. Each row in the array represents a data record, and each column represents a different variable.

Requirements:

* Class Initialization: The `NumpyDatasetAnalyzer` class should initialize with a single parameter, `dataset`, which is a 2D NumPy array.

* Calculate Mean: Implement a method `calculate_mean(column_index)` that calculates and returns the mean of a specified column in the dataset.

* Calculate Median: Implement a method `calculate_median(column_index)` that calculates and returns the median of a specified column.

* Find Minimum and Maximum: Implement methods `find_minimum(column_index)` and `find_maximum(column_index)` that return the minimum and maximum values of a specified column, respectively.
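
Each of these methods comes down to selecting one column of a 2D array and applying a reduction to it. The sketch below shows only that building block, outside of any class, on a hypothetical array named `scores`; how to organize it into methods is left to the exercise.

```python
import numpy as np

# Hypothetical 3x2 dataset: rows are records, columns are variables
scores = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 60.0],
])

# Select every row (:) of the column with index 1
column = scores[:, 1]
print(column.tolist())    # [10.0, 20.0, 60.0]

# Reductions applied to the selected column
print(np.mean(column))    # 30.0
print(np.median(column))  # 20.0
print(column.min())       # 10.0
print(column.max())       # 60.0
```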

After implementation, verify the functionality of your class with the provided test cases to confirm it operates as expected.

In [None]:
import numpy as np

class NumpyDatasetAnalyzer:
    # Your code
    pass  # placeholder so the template is valid Python; replace with your implementation

In [None]:
# Test case 

import numpy as np

dataset = np.array([
    [1, 20, 300],
    [2, 30, 400],
    [3, 40, 500]
])

analyzer = NumpyDatasetAnalyzer(dataset)
mean = analyzer.calculate_mean(1)  # Mean of the second column
median = analyzer.calculate_median(1)  # Median of the second column
minimum = analyzer.find_minimum(2)  # Minimum of the third column
maximum = analyzer.find_maximum(2)  # Maximum of the third column

print(mean, median, minimum, maximum)