## Collaborator 2

### Function scat_plt (var1, var2, groups)

Given the variables var1 and var2, creates a scatterplot of the two variables, using different colors (or symbols) to show the group information given in groups. That is, observations belonging to group 1 are displayed in one color, observations belonging to group 2 in a different color, and so on. var1 is displayed on the x-axis and var2 on the y-axis. The plot should contain a legend identifying the groups.

- Inputs:
var1 and var2: Two given variables of the same length
groups: A variable of the same length as var1 and var2 that contains the group membership of each observation.

In [19]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def scat_plt(var1, var2, groups):
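    # Build a DataFrame so seaborn can map columns to plot aesthetics
    data = {'var1': var1, 'var2': var2, 'groups': groups}
    df = pd.DataFrame(data)

    # Treat groups as categorical so numeric labels get one color and one legend entry each
    df['groups'] = df['groups'].astype('category')

    # Use seaborn to create a scatter plot with a different color for each group
    sns.scatterplot(x='var1', y='var2', hue='groups', data=df, palette='rainbow')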

    # Set labels and legend
    plt.xlabel('Variable 1')
    plt.ylabel('Variable 2')
    plt.legend(title='Groups')
    plt.show()
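
The description above also allows groups to be distinguished by symbols. A minimal sketch of that option using matplotlib only is shown below. The name `scat_plt_markers`, the `ax` parameter and the marker list are assumptions made for illustration, not part of the assignment.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scat_plt_markers(var1, var2, groups, ax=None):
    """Hypothetical variant of scat_plt: one color and one marker per group."""
    # Convert to lists so pandas does not try to align on differing indexes
    df = pd.DataFrame({"var1": list(var1), "var2": list(var2), "groups": list(groups)})
    if ax is None:
        _, ax = plt.subplots()

    markers = ["o", "s", "^", "D", "v", "P", "X"]  # reused if there are more groups

    # groupby sorts the group labels, so the legend order is deterministic;
    # each scatter call takes the next color from matplotlib's color cycle
    for i, (label, sub) in enumerate(df.groupby("groups")):
        ax.scatter(sub["var1"], sub["var2"],
                   marker=markers[i % len(markers)], label=str(label))

    ax.set_xlabel("Variable 1")
    ax.set_ylabel("Variable 2")
    ax.legend(title="Groups")
    return ax
```

Calling `scat_plt_markers([1, 2, 3, 4], [4, 3, 2, 1], ["a", "b", "a", "b"])` followed by `plt.show()` should display two groups with distinct colors and markers. Returning the Axes rather than showing the figure inside the function makes the plot easier to adjust or save afterwards.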

### Function normalize (df, op)

Given a dataframe df, normalizes all variables according to the option op. op can take only two values: 0 to normalize the variables using the z-score and 1 to normalize them using the min-max approach. The function returns a dataframe consisting of the normalized variables.
Take care not to normalize variables that are meant to be categorical, even if their type is not explicitly categorical (that is, a variable can have a numeric type even when it represents categories).
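
For reference, the two options are assumed here to follow the standard definitions, which is what scikit-learn's StandardScaler and MinMaxScaler compute with their default settings. For a numeric column $x$ with mean $\mu$, population standard deviation $\sigma$ (computed with ddof = 0), minimum $x_{\min}$ and maximum $x_{\max}$:

$$
z_i = \frac{x_i - \mu}{\sigma} \quad (\text{op} = 0), \qquad x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \quad (\text{op} = 1)
$$

After z-score normalization each column has mean 0 and standard deviation 1. After min-max normalization each column lies in the interval $[0, 1]$.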

- Inputs:
df: A given dataframe
op: numeric variable (either 0 or 1)

- Output:
norm_df: normalized dataframe

In [25]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

def normalize(df, op):
    # Select numeric columns; string, boolean and category columns are left unchanged.
    # Note: numeric columns that encode categories are not detected here.
    numeric_cols = df.select_dtypes(include='number').columns

    # Choose the scaler for the selected option and reject any other value
    if op == 0:  # Z-score normalization
        scaler = StandardScaler()
    elif op == 1:  # Min-Max normalization
        scaler = MinMaxScaler()
    else:
        raise ValueError("op must be 0 (z-score) or 1 (min-max)")

    # Work on a copy so the original DataFrame is not modified
    norm_df = df.copy()
    norm_df[numeric_cols] = scaler.fit_transform(norm_df[numeric_cols])

    return norm_df
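
The requirement to skip variables that are numeric but represent categories cannot be met by inspecting dtypes alone. One possible approach, sketched below, is a cardinality heuristic: a numeric column with only a few distinct values is assumed to be categorical. The names `continuous_columns` and `normalize_continuous` and the `max_levels` threshold are hypothetical, and the threshold of 10 is an arbitrary choice that should be tuned to the data. The sketch uses plain pandas arithmetic instead of the scikit-learn scalers, with the population standard deviation so that the z-scores match what StandardScaler would produce.

```python
import pandas as pd


def continuous_columns(df, max_levels=10):
    """Hypothetical helper: numeric columns that do not look categorical.

    Assumption: a numeric column with at most `max_levels` distinct values
    encodes categories and should not be normalized.
    """
    numeric = df.select_dtypes(include="number").columns
    return [col for col in numeric if df[col].nunique() > max_levels]


def normalize_continuous(df, op, max_levels=10):
    """Sketch of normalize that leaves category-like columns unchanged."""
    if op not in (0, 1):
        raise ValueError("op must be 0 (z-score) or 1 (min-max)")

    norm_df = df.copy()
    cols = continuous_columns(df, max_levels)
    if not cols:
        return norm_df  # nothing to normalize

    if op == 0:
        # Population standard deviation (ddof=0), as StandardScaler uses
        norm_df[cols] = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    else:
        col_min, col_max = df[cols].min(), df[cols].max()
        norm_df[cols] = (df[cols] - col_min) / (col_max - col_min)
    return norm_df
```

A heuristic like this can misclassify columns in both directions (for example, a short continuous sample or an ID column with many values), so passing an explicit list of categorical column names is a reasonable alternative when they are known in advance.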
