# Creating or Removing Columns from a DataFrame

In data analysis with pandas, it is often necessary to add new columns to a DataFrame to derive insights or perform calculations based on existing data. This section shows how to create new columns, using the `apply` function for custom operations, and how to remove existing columns from a DataFrame.

## Using `apply` to Create New Columns

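pandas provides `apply` on both DataFrames and Series. `DataFrame.apply` applies a function along an axis of the DataFrame (to each column by default, or to each row with `axis=1`), while `Series.apply` applies a function to each value of a single column. The examples in this section use `Series.apply` to build new columns from existing ones.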
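As a minimal sketch of the difference between the two forms, the snippet below uses a small hand-made DataFrame. The `sample` name, its column names, and its values are illustrative assumptions, not part of the Gapminder data.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two forms of apply
sample = pd.DataFrame({
    'width': [2, 3, 4],
    'height': [10, 20, 30],
})

# Series.apply: the function receives one value at a time
sample['width_doubled'] = sample['width'].apply(lambda w: w * 2)

# DataFrame.apply with axis=1: the function receives one row (a Series) at a time
sample['area'] = sample.apply(lambda row: row['width'] * row['height'], axis=1)

print(sample)
```

Row-wise `apply` is the form to reach for when a new column depends on more than one existing column and the logic is hard to express with plain column arithmetic.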

#### Example Using the Gapminder Dataset

In [4]:
import pandas as pd

# URL to the raw CSV file on GitHub
url = 'https://raw.githubusercontent.com/kirenz/datasets/master/gapminder.csv'

# Read the CSV file into a DataFrame
gapminder = pd.read_csv(url)

#### Example 1: 

Create a new column 'population_millions' by dividing 'pop' by 1,000,000, so that population is expressed in millions.

In [2]:
gapminder['population_millions'] = gapminder['pop'].apply(lambda x: x / 1e6)
print(gapminder.head())

       country continent  year  lifeExp       pop   gdpPercap  \
0  Afghanistan      Asia  1952   28.801   8425333  779.445314   
1  Afghanistan      Asia  1957   30.332   9240934  820.853030   
2  Afghanistan      Asia  1962   31.997  10267083  853.100710   
3  Afghanistan      Asia  1967   34.020  11537966  836.197138   
4  Afghanistan      Asia  1972   36.088  13079460  739.981106   

   population_millions  
0             8.425333  
1             9.240934  
2            10.267083  
3            11.537966  
4            13.079460  
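For simple arithmetic like this, `apply` is not strictly required: pandas supports arithmetic directly on a column, which is usually faster and shorter. The sketch below rebuilds the first two rows of the output above by hand (the `df` name is an assumption made so the snippet runs without downloading the CSV) and checks that both approaches agree.

```python
import pandas as pd

# First two rows of the output shown above, rebuilt by hand so the snippet runs offline
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan'],
    'pop': [8425333, 9240934],
})

# Vectorized division: operates on the whole column at once
df['population_millions'] = df['pop'] / 1e6

# Element-wise version with apply, for comparison
via_apply = df['pop'].apply(lambda x: x / 1e6)

# True when every value matches
print((df['population_millions'] == via_apply).all())
```

`apply` remains the more general tool when the transformation cannot be written as column arithmetic, as in the next example.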


#### Example 2: 

Create a new column 'life_expectancy_category' that labels each row 'High' (life expectancy above 75), 'Medium' (above 60 and up to 75), or 'Low' (60 or below).

In [3]:
def categorize_life_expectancy(life_exp):
    if life_exp > 75:
        return 'High'
    elif life_exp > 60:
        return 'Medium'
    else:
        return 'Low'

gapminder['life_expectancy_category'] = gapminder['lifeExp'].apply(categorize_life_expectancy)
print(gapminder.head())

       country continent  year  lifeExp       pop   gdpPercap  \
0  Afghanistan      Asia  1952   28.801   8425333  779.445314   
1  Afghanistan      Asia  1957   30.332   9240934  820.853030   
2  Afghanistan      Asia  1962   31.997  10267083  853.100710   
3  Afghanistan      Asia  1967   34.020  11537966  836.197138   
4  Afghanistan      Asia  1972   36.088  13079460  739.981106   

   population_millions life_expectancy_category  
0             8.425333                      Low  
1             9.240934                      Low  
2            10.267083                      Low  
3            11.537966                      Low  
4            13.079460                      Low  


```{note} 
* **Example 1**: Uses `apply` with a lambda function to create a new column 'population_millions' by dividing the 'pop' column values by 1,000,000.

* **Example 2**: Defines a custom function `categorize_life_expectancy` to categorize life expectancy values into 'High', 'Medium', or 'Low'. The function is then applied to create a new column 'life_expectancy_category' based on the 'lifeExp' column.
```
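Binning a numeric column into labeled ranges can also be done with `pd.cut`. The sketch below is an alternative to the custom function, not the method used in this section. It assumes the same thresholds as `categorize_life_expectancy`, and the `life_exp` values are hypothetical, chosen to hit each range and both boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical life expectancy values covering each range and both boundaries
life_exp = pd.Series([28.801, 60.0, 60.5, 75.0, 78.2], name='lifeExp')

# Right-closed bins: (-inf, 60] -> Low, (60, 75] -> Medium, (75, inf) -> High
categories = pd.cut(
    life_exp,
    bins=[-np.inf, 60, 75, np.inf],
    labels=['Low', 'Medium', 'High'],
)

print(categories.tolist())  # ['Low', 'Low', 'Medium', 'Medium', 'High']
```

Unlike the `apply` version, which produces plain strings, `pd.cut` returns an ordered categorical column, which can be convenient for sorting and grouping.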

## Removing Columns from a DataFrame

To remove columns from a DataFrame, you can use the `.drop()` method or Python's `del` statement.

#### Example 3: 

Remove the 'gdpPercap' column from the gapminder DataFrame with `.drop()`, then remove the 'population_millions' column with `del`.


In [None]:
gapminder.drop(columns=['gdpPercap'], inplace=True)
print(gapminder.head())

# The del statement also removes a column in place (here, 'population_millions')
del gapminder['population_millions']
print(gapminder.head())

```{note}

* **`.drop()` Method**: Removes the columns listed in the `columns` parameter. Setting `inplace=True` modifies the DataFrame in place; without it, `.drop()` returns a new DataFrame and leaves the original unchanged.

* **`del` Statement**: Provides an alternative way to delete a single column by specifying it after `del`, as in `del gapminder['population_millions']`. It always modifies the DataFrame in place.
```
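Because both removals in Example 3 modify `gapminder` in place, running that cell a second time raises a `KeyError`: the columns are already gone. The sketch below shows a few variations that may help, using a hypothetical toy DataFrame (`df` and its values are assumptions standing in for `gapminder`).

```python
import pandas as pd

# Hypothetical toy DataFrame standing in for gapminder
df = pd.DataFrame({
    'country': ['A', 'B'],
    'pop': [100, 200],
    'gdpPercap': [1.5, 2.5],
})

# Without inplace=True, drop returns a new DataFrame and leaves df unchanged
trimmed = df.drop(columns=['gdpPercap'])

# errors='ignore' skips labels that are not present instead of raising KeyError
still_trimmed = trimmed.drop(columns=['gdpPercap'], errors='ignore')

# pop removes a column in place and returns it as a Series
pop_values = df.pop('pop')

print(list(df.columns))       # ['country', 'gdpPercap']
print(list(trimmed.columns))  # ['country', 'pop']
```

Assigning the result of `.drop()` to a new name keeps the original DataFrame available, which makes notebook cells safe to re-run.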