# Classes Group Exercise

Now let's try to do some things with classes.

## Recap: What we already know of classes

A class is a custom object type. It has:

* class attributes (shared between instances)
* instance attributes (not shared)
* methods

In [None]:
class DuckClass:
    """A class of data objects"""
    scientific_name = 'Anas platyrhynchos'     # A class attribute

    def __init__(self, name = 'Donald'):       # The init method
        self.name = name                       # An instance attribute
        self.fav_food = []                     # Another instance attribute

    def say_quack(self):                       # A method
        return 'Quack, quack!'

    def add_food(self, food):                  # Preferably, change instance attribs via a method
        self.fav_food.append(food)

We initialize a DuckClass object like so:

In [None]:
donald = DuckClass()

In [None]:
donald

<__main__.DuckClass at 0x7d11ba699720>

Check the instance attribs of Donald:

In [None]:
donald.name

'Donald'

In [None]:
donald.fav_food

[]

Check the class attrib.

From the user's perspective the syntax is the same, i.e. we cannot tell from here that one attrib is a class attrib (shared among all instances) and the other is an instance attrib (belonging only to `donald`).

In [None]:
donald.scientific_name

'Anas platyrhynchos'
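One way to make the difference visible is to create a second duck and compare the two. The sketch below repeats the `DuckClass` definition from above so it runs on its own; the second duck `'Daisy'` and the food `'bread'` are just illustrative. The built-in `vars()` returns an object's own attribute dictionary, which holds only instance attributes, not class attributes.

```python
class DuckClass:
    """A class of data objects"""
    scientific_name = 'Anas platyrhynchos'     # A class attribute

    def __init__(self, name='Donald'):
        self.name = name                       # An instance attribute
        self.fav_food = []                     # Another instance attribute

    def say_quack(self):
        return 'Quack, quack!'

    def add_food(self, food):
        self.fav_food.append(food)


donald = DuckClass()
daisy = DuckClass(name='Daisy')   # illustrative second instance

# Instance attributes are separate for each object
donald.add_food('bread')
print(donald.fav_food)   # ['bread']
print(daisy.fav_food)    # []

# The class attribute is shared: both instances (and the class itself) see the same value
print(donald.scientific_name == daisy.scientific_name == DuckClass.scientific_name)  # True

# vars() shows only the object's own (instance) attributes
print(vars(donald))      # {'name': 'Donald', 'fav_food': ['bread']}
```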

## A class of data objects

In this exercise, you will define a class that makes working with data objects easier from the user's perspective. Essentially, we hide some code complexity inside the class, so the user only needs to call a method.

We'll start out with a very simple class definition:

In [None]:
#collect imports here for readability
import pandas as pd
import numpy as np

class Data:
    def __init__(self, filename):
        """
        Initializes the Data class with a dataset.

        Parameters:
        - filename (str): The path to the file to load
        """
        self.filename = filename
        self.df = None

        #run the load data method at initialization
        self.load_data()

    #our only method for now: load the data
    def load_data(self):
        #we should probably wrap try ... except around this
        self.df = pd.read_csv(self.filename)
        #implement here some checks if data is loaded correctly


And now we can define a data object:

In [None]:
my_data = Data(filename = "https://raw.githubusercontent.com/Center-for-Health-Data-Science/Python_part2/main/data/diabetes_clean.csv")

In [None]:
my_data

<__main__.Data at 0x7bab1dd8a500>

Let's check the instance attributes

In [None]:
my_data.filename

'https://raw.githubusercontent.com/Center-for-Health-Data-Science/Python_part2/main/data/diabetes_clean.csv'

The class runs `load_data()` at initialization, so the data is already loaded into the `.df` instance attribute:

In [None]:
my_data.df

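| Unnamed: 0 | Age | Sex | BloodPressure | GeneticRisk | BMI | PhysicalActivity | Married | Work | Smoker | Diabetes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34 | Male | 84 | 0.619 | 24.7 | 93 | Yes | Self-employed | Unknown | 0 |
| 1 | 25 | Male | 74 | 0.591 | 22.5 | 102 | No | Public | Unknown | 0 |
| 2 | 50 | Male | 80 | 0.178 | 34.5 | 98 | Yes | Self-employed | Unknown | 1 |
| 3 | 27 | Female | 60 | 0.206 | 26.3 | 82 | Yes | Private | Never | 0 |
| 4 | 35 | Male | 84 | 0.286 | 35.0 | 58 | Yes | Private | Smoker | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 483 | 40 | Female | 88 | 0.403 | 34.5 | 72 | Yes | Private | Smoker | 1 |
| 484 | 58 | Male | 82 | 0.528 | 39.2 | 85 | No | Public | Never | 1 |
| 485 | 37 | Female | 84 | 0.696 | 24.5 | 128 | No | Private | Never | 0 |
| 486 | 29 | Female | 86 | 0.808 | 35.6 | 51 | No | Private | Smoker | 1 |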


`my_data.df` is an ordinary pandas DataFrame, so we can call any DataFrame method on it, e.g.:

In [None]:
my_data.df.describe()

## Adding more functionality to the class

So far so good. Now we have a basic skeleton of a class definition to work with.

Below we have defined some tasks for you to do so you can improve the functionality of your class. They are numbered, but you can do them out of order unless they depend on each other. You are also welcome to come up with your own modifications. You may find that as you progress through the tasks, you want to adjust some things you have done earlier.

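Every time you complete a task, re-run the code cell that defines the updated class so you get the new functionality, and **test** that it works as you intended. Objects created before the re-run still use the old class definition, so also re-run the `my_data = Data(...)` cell to get an object with the new methods.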

We recommend that you work in a group. Think about:

* What information will you need to do the task? And how can this information be supplied and passed around within the class?
* Should any of the user-supplied information be saved in an instance attribute?
* What about the result: do you want to save it as an instance attribute? See `my_data.df`.
* If it's a method, are there any parameters you want to be able to pass?

Add docstrings to methods and write comments where you think they are necessary. Have fun!

## Task 1

Expand the `load_data` method with a separator parameter so our class will also be able to read files that are e.g. separated by ';' instead of ','. Make this as general as possible: the user should supply the separator, since they should know it. Also, give this parameter a default value for comma-separated files. Have another look at the functions notebook to remember how you set default parameters.
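A minimal sketch of the idea, written as a standalone function rather than the finished method: `pd.read_csv` already accepts a `sep` argument, so the method only needs to pass it through, with `','` as the default. The function name `load_table` and the in-memory toy "file" (made-up values) are hypothetical, used so the example runs without a real file.

```python
import io
import pandas as pd

def load_table(filename, sep=','):
    """Load a delimited text file; sep defaults to ',' for comma-separated files."""
    return pd.read_csv(filename, sep=sep)

# Toy semicolon-separated "file" held in memory (made-up values)
semicolon_file = io.StringIO("Age;BMI\n34;24.7\n25;22.5\n")
df = load_table(semicolon_file, sep=';')
print(df.columns.tolist())  # ['Age', 'BMI']
```

Inside the class, the same pattern becomes `def load_data(self, sep=','):`. Because `__init__` calls `load_data()` itself, `__init__` would also need a `sep` parameter to pass along.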

## Task 2

Add a `try ... except` block to the data loading method that catches the case where the user gives a filename that doesn't exist.
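A minimal sketch, again as a hypothetical standalone function `load_table` instead of the method: for a local path that doesn't exist, `pd.read_csv` raises `FileNotFoundError`, which we catch and report. Returning `None` is our choice; inside the class you might leave `self.df` as `None` instead. For a URL such as the one used above, a missing file may surface as a different error (for example `urllib.error.HTTPError`), which would need its own `except` clause.

```python
import pandas as pd

def load_table(filename, sep=','):
    """Load a delimited file; return None if the file does not exist."""
    try:
        return pd.read_csv(filename, sep=sep)
    except FileNotFoundError:
        print(f"File not found: {filename}")
        return None

# A filename that (presumably) does not exist
result = load_table("this_file_does_not_exist_12345.csv")
print(result)  # None, after the 'File not found' message
```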

## Task 3

Add a method that allows the user to make one of the columns the target variable. This column should be removed from the pandas df and saved in its own attribute, e.g. `.target`.
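`DataFrame.pop(column)` removes a column from the frame in place and returns it as a Series, which covers both parts of this task in one call. Below is a sketch using a hypothetical stripped-down class `TargetDemo` and made-up toy values; in your own class, the method would work on the `self.df` that `load_data` created.

```python
import pandas as pd

class TargetDemo:
    """Hypothetical minimal class illustrating how to set a target column."""

    def __init__(self, df):
        self.df = df
        self.target = None

    def set_target(self, column_name):
        """Move column_name out of self.df and into self.target."""
        if column_name not in self.df.columns:
            raise KeyError(f"Column '{column_name}' not found")
        # pop() deletes the column from self.df and returns it as a Series
        self.target = self.df.pop(column_name)

toy = pd.DataFrame({'Age': [34, 25, 50], 'Diabetes': [0, 0, 1]})  # made-up values
demo = TargetDemo(toy)
demo.set_target('Diabetes')
print(demo.df.columns.tolist())  # ['Age']
print(demo.target.tolist())      # [0, 0, 1]
```

Note that calling `set_target('Diabetes')` a second time raises the `KeyError`, because the column is no longer in `self.df`; think about whether your method should handle that case differently.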

## Task 4

Add a method to create a boxplot of a requested column, e.g. with a call like `my_data.plot_column(column_name)`. Expand the method to let the user choose a plot type, e.g. histogram, violin or bar plot.

<details>
<summary>Hint</summary>

Remember, methods can return values just like functions, so you can do something like:

```python
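# inside the class definition! (assumes `import matplotlib.pyplot as plt`)
def plot_column(self, column_name):
    # do things, e.g. build the plot from self.df[column_name]

    return plt  # option 1: return the plot object
    # OR, instead of returning:
    # plt.show()  # option 2: force display of the plot object

# when you call the method, the plot should display under the code cell
my_data.plot_column('Age')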
```

</details>
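Building on the hint, here is a sketch of the plotting method written as a standalone function that takes the DataFrame as its first argument (inside the class you would use `self.df`). The `kind` parameter and its allowed values are our own choice. Box, histogram and violin plots use Matplotlib's `Axes.boxplot`, `Axes.hist` and `Axes.violinplot`; the bar plot shows category counts via pandas' `Series.plot.bar`. The function returns the Matplotlib `Axes`, which Jupyter displays under the cell. The toy data are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_column(df, column_name, kind='box'):
    """Plot one column of df. kind: 'box', 'hist', 'violin' or 'bar'."""
    values = df[column_name].dropna()
    fig, ax = plt.subplots()
    if kind == 'box':
        ax.boxplot(values.to_numpy())
    elif kind == 'hist':
        ax.hist(values.to_numpy())
    elif kind == 'violin':
        ax.violinplot(values.to_numpy())
    elif kind == 'bar':
        # Bar plot of how often each value occurs (suits categorical columns)
        values.value_counts().plot.bar(ax=ax)
    else:
        plt.close(fig)
        raise ValueError(f"Unknown plot kind: {kind}")
    ax.set_title(f"{column_name} ({kind})")
    return ax

toy = pd.DataFrame({'Age': [34, 25, 50, 27, 35],
                    'Sex': ['Male', 'Male', 'Male', 'Female', 'Male']})  # made-up values
ax = plot_column(toy, 'Age', kind='violin')
```

Box, histogram and violin plots assume a numeric column; for a text column such as `Sex`, the bar plot of counts is the one that makes sense.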

## Task 5

Add a method that will calculate a correlation matrix. The user should be able to indicate whether they also want it plotted as a heatmap.
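A sketch as a standalone function with a boolean `plot` flag (the names are our choice). It keeps only numeric columns with `select_dtypes` before calling `.corr()`, because text columns like `Sex` cannot be correlated, and recent pandas versions raise an error for them by default. The heatmap is drawn with Matplotlib's `imshow`; seaborn's `heatmap` is a common alternative if it is installed. Toy values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

def correlation_matrix(df, plot=False):
    """Return the Pearson correlation matrix of df's numeric columns.

    If plot is True, also draw it as a heatmap.
    """
    corr = df.select_dtypes(include='number').corr()
    if plot:
        fig, ax = plt.subplots()
        im = ax.imshow(corr.to_numpy(), cmap='coolwarm', vmin=-1, vmax=1)
        ax.set_xticks(range(len(corr.columns)))
        ax.set_xticklabels(corr.columns, rotation=90)
        ax.set_yticks(range(len(corr.columns)))
        ax.set_yticklabels(corr.columns)
        fig.colorbar(im, ax=ax)
    return corr

toy = pd.DataFrame({'Age': [34, 25, 50, 27],
                    'BMI': [24.7, 22.5, 34.5, 26.3],
                    'Sex': ['Male', 'Male', 'Male', 'Female']})  # made-up values
corr = correlation_matrix(toy, plot=True)
print(corr)
```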


## Task 6

Add a method that will let the user define which columns are numeric. How can you store this information?
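One possible approach, sketched with a hypothetical class `NumericDemo`: store the user's choice as a list of column names in an instance attribute, after checking that the names exist. A list of names is convenient because `df[list_of_names]` later selects exactly those columns. If the user gives no list, this sketch falls back to pandas' own guess via `select_dtypes`; that fallback is our choice, not a requirement. Toy values are made up.

```python
import pandas as pd

class NumericDemo:
    """Hypothetical minimal class illustrating how to store numeric columns."""

    def __init__(self, df):
        self.df = df
        self.numeric_columns = []

    def set_numeric_columns(self, columns=None):
        """Store which columns are numeric; if None, let pandas infer them."""
        if columns is None:
            columns = self.df.select_dtypes(include='number').columns.tolist()
        missing = [c for c in columns if c not in self.df.columns]
        if missing:
            raise KeyError(f"Columns not found: {missing}")
        self.numeric_columns = list(columns)

toy = pd.DataFrame({'Age': [34, 25], 'BMI': [24.7, 22.5], 'Sex': ['Male', 'Female']})
demo = NumericDemo(toy)
demo.set_numeric_columns()
print(demo.numeric_columns)            # ['Age', 'BMI']
print(demo.df[demo.numeric_columns])   # only the numeric columns
```

In the diabetes data, `Unnamed: 0` and `Diabetes` are stored as integers, so pandas' guess would count them as numeric; this is one reason to let the user decide.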

## Task 7

Add a method that produces a scaled data object (e.g. to use as input for a PCA or a linear regression). You need to complete **task 6** to be able to do this.

You can choose in which form (array, df, something else?) this scaled object should be saved inside the class, if at all.
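A sketch of standardization (z-scores: subtract each column's mean and divide by its standard deviation), written as a standalone function that takes the list of numeric column names. It returns a DataFrame so the column names survive. We use `ddof=0` (population standard deviation), as scikit-learn's `StandardScaler` does; pandas' own `.std()` defaults to `ddof=1`. Constant columns would otherwise divide by zero, so this sketch leaves them at 0. Toy values are made up.

```python
import pandas as pd

def scale_columns(df, numeric_columns):
    """Return a standardized copy of the given columns (mean 0, std 1)."""
    numeric = df[numeric_columns]
    std = numeric.std(ddof=0)
    std = std.replace(0, 1)  # constant columns: avoid division by zero, they become all 0
    return (numeric - numeric.mean()) / std

toy = pd.DataFrame({'Age': [20, 30, 40], 'BMI': [20.0, 25.0, 30.0],
                    'Sex': ['Male', 'Female', 'Male']})  # made-up values
scaled = scale_columns(toy, ['Age', 'BMI'])
print(scaled)
```

Inside the class, one option is to store the result as an attribute, e.g. `self.scaled_df`, so later methods such as PCA can use it.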

## Task 8

Add a method that performs PCA on the data. This could be based on functions you have already written for earlier exercises. You need to complete **tasks 6 and 7** to be able to do this.

Think about what you want to do with the PCA result: do you want to save it or return it?
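We don't know what your earlier PCA functions look like, so here is a self-contained NumPy sketch of PCA via singular value decomposition of the centered data (`np.linalg.svd`). It returns a dictionary with the scores (samples in PC space), the loadings (here the principal axes: how each variable contributes to each PC) and the explained variance ratio. The dictionary and its key names are our choice; scikit-learn's `sklearn.decomposition.PCA` is a common alternative. The toy data are random.

```python
import numpy as np
import pandas as pd

def run_pca(scaled_df, n_components=2):
    """PCA on an already scaled DataFrame via SVD.

    Returns a dict with 'scores', 'loadings' and 'explained_variance_ratio'.
    """
    X = scaled_df.to_numpy(dtype=float)
    X = X - X.mean(axis=0)                           # make sure columns are centered
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    pc_names = [f"PC{i + 1}" for i in range(n_components)]
    scores = pd.DataFrame(U[:, :n_components] * S[:n_components],
                          index=scaled_df.index, columns=pc_names)
    loadings = pd.DataFrame(Vt[:n_components].T,
                            index=scaled_df.columns, columns=pc_names)
    variance = S ** 2                                # proportional to variance per PC
    ratio = variance[:n_components] / variance.sum()
    return {'scores': scores, 'loadings': loadings,
            'explained_variance_ratio': ratio}

rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(10, 3)), columns=['Age', 'BMI', 'BloodPressure'])
scaled = (toy - toy.mean()) / toy.std(ddof=0)        # toy random data, standardized
pca = run_pca(scaled, n_components=2)
print(pca['explained_variance_ratio'])
```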

## Task 9

Add a method that takes a PCA result and creates a biplot. You will need to complete **task 8** to be able to do that.
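A minimal biplot sketch. It assumes the PCA result provides a `scores` DataFrame (samples × PCs) and a `loadings` DataFrame (variables × PCs), both with columns named `'PC1'`, `'PC2'`, …; that layout is our assumption, so adapt it to however you stored your result. The samples are scattered on two PCs and each variable's loading is drawn as an arrow with Matplotlib's `Axes.arrow`. Scaling the arrows by the largest absolute score is purely cosmetic, so they are visible on the same axes. The toy PCA is computed from random data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def biplot(scores, loadings, pcs=('PC1', 'PC2')):
    """Scatter the samples on two PCs and draw the variable loadings as arrows."""
    x_pc, y_pc = pcs
    fig, ax = plt.subplots()
    ax.scatter(scores[x_pc], scores[y_pc], alpha=0.6)
    # Stretch the arrows so they are visible next to the scores (cosmetic only)
    arrow_scale = np.abs(scores[[x_pc, y_pc]].to_numpy()).max()
    for variable, row in loadings.iterrows():
        dx, dy = row[x_pc] * arrow_scale, row[y_pc] * arrow_scale
        ax.arrow(0, 0, dx, dy, color='red', head_width=0.05 * arrow_scale)
        ax.text(dx * 1.1, dy * 1.1, variable, color='red')
    ax.set_xlabel(x_pc)
    ax.set_ylabel(y_pc)
    return ax

# Toy PCA result: random data, centered, PCA via SVD
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(20, 3)), columns=['Age', 'BMI', 'BloodPressure'])
X = (data - data.mean()).to_numpy()
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = pd.DataFrame(U[:, :2] * S[:2], columns=['PC1', 'PC2'])
loadings = pd.DataFrame(Vt[:2].T, index=data.columns, columns=['PC1', 'PC2'])
ax = biplot(scores, loadings)
```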