In [1]:
import pandas as pd

# Generate sample data
data = pd.DataFrame({
    'age': [25, 30, 35, 40, None],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'gender': ['male', 'female', 'female', None, 'male'],
    'purchased': ['yes', 'no', 'yes', 'no', 'yes']
})

# Display the initial data
print("Initial data:")
print(data)

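# Handle missing values: median for numerical columns, mode for categorical ones.
# Assign the result back instead of calling fillna(..., inplace=True) on a column
# selection, which raises a FutureWarning and will not work in pandas 3.0.
data['age'] = data['age'].fillna(data['age'].median())
data['gender'] = data['gender'].fillna(data['gender'].mode()[0])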

print("\nData after handling missing values:")
print(data)

# Encode categorical variables using one-hot encoding
data_encoded = pd.get_dummies(data, columns=['gender', 'purchased'], drop_first=True)

print("\nData after encoding categorical variables:")
print(data_encoded)

# Standardize numerical features to z-scores (Series.std() uses the sample standard deviation, ddof=1)
data_encoded['age'] = (data_encoded['age'] - data_encoded['age'].mean()) / data_encoded['age'].std()
data_encoded['salary'] = (data_encoded['salary'] - data_encoded['salary'].mean()) / data_encoded['salary'].std()

print("\nData after scaling numerical features:")
print(data_encoded)

# Create new features (example: interaction term)
data_encoded['age_salary_interaction'] = data_encoded['age'] * data_encoded['salary']

print("\nData with new feature:")
print(data_encoded)


Initial data:
    age  salary  gender purchased
0  25.0   50000    male       yes
1  30.0   60000  female        no
2  35.0   70000  female       yes
3  40.0   80000    None        no
4   NaN   90000    male       yes

Data after handling missing values:
    age  salary  gender purchased
0  25.0   50000    male       yes
1  30.0   60000  female        no
2  35.0   70000  female       yes
3  40.0   80000  female        no
4  32.5   90000    male       yes

Data after encoding categorical variables:
    age  salary  gender_male  purchased_yes
0  25.0   50000         True           True
1  30.0   60000        False          False
2  35.0   70000        False           True
3  40.0   80000        False          False
4  32.5   90000         True           True

Data after scaling numerical features:
        age    salary  gender_male  purchased_yes
0 -1.341641 -1.264911         True           True
1 -0.447214 -0.632456        False          False
2  0.447214  0.000000        False         

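The missing-value step can also be written as a single DataFrame-level `fillna` call with a per-column dict, which avoids any `inplace=True` call on a column selection. The following is a minimal sketch, assuming the same `age` and `gender` sample columns as the cell above; the names `fill_values` and `filled` are illustrative and not part of the original cell.

```python
import pandas as pd

# Sample columns copied from the initial data in the cell above.
data = pd.DataFrame({
    'age': [25, 30, 35, 40, None],
    'gender': ['male', 'female', 'female', None, 'male'],
})

# One fill value per column: median for numeric, mode for categorical.
# (fill_values is an illustrative name, not from the original cell.)
fill_values = {
    'age': data['age'].median(),
    'gender': data['gender'].mode()[0],
}

# DataFrame.fillna accepts a dict mapping column names to fill values and
# returns a new DataFrame, so the original `data` is left unchanged.
filled = data.fillna(fill_values)

print(filled)
```

In this sample, `male` and `female` each appear twice, and `mode()` returns tied values in sorted order, so `mode()[0]` selects `female`. On real data a tie like this may deserve an explicit choice rather than relying on sort order.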
