You’ve been hired by a nonprofit called ConnectSA, which provides digital literacy and streaming
services to underserved communities in South Africa. They have collected customer data from outreach
efforts in Cape Town, Durban, and Soweto. Your job is to clean, explore, and visualize this data to
help them understand who their customers are and how engagement has grown over time. Use the dataset
titled customers.csv, containing 100 customer records from a nonprofit streaming initiative.

In [3]:
# --- Solution for 1a ---
# Import necessary libraries
import pandas as pd
import numpy as np

# Load the dataset with a simple variable name
df = pd.read_csv('customers.csv')

# 1. Display the first 5 rows of the DataFrame
print("--- First 5 Rows of the DataFrame ---")
print(df.head())
print("\n" + "-"*40 + "\n") # Adding a separator for clarity

# 2. Use .info() to explore the structure of the data
print("--- DataFrame Info ---")
df.info()
print("\n" + "-"*40 + "\n") # Adding a separator for clarity

# 3. Use .describe() to get a statistical summary
print("--- Descriptive Statistics ---")
print(df.describe())

--- First 5 Rows of the DataFrame ---
   customer_id             name  age             city  \
0            1      John Rivers   31       New Angela   
1            2  Richard Mcclure   21        Jamieside   
2            3       Eric Smith   34   Port Mariastad   
3            4      Megan Price   47   South Samantha   
4            5     Pedro Guzman   20  South Laurabury   

                        country subscription_date  
0         Sao Tome and Principe        2024-09-17  
1  Svalbard & Jan Mayen Islands        2021-07-19  
2                     Nicaragua        2025-06-08  
3                        Turkey        2021-12-09  
4                     Mauritius        2023-03-11  

----------------------------------------

--- DataFrame Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   customer_id        100 non-null    int6
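
The brief asks for cleaning and for a view of engagement over time, so a natural follow-up to the first look at the data is to count missing values and parse `subscription_date` into real dates. The sketch below is only an illustration under stated assumptions: it rebuilds the five rows printed by `df.head()` as a small stand-in frame (named `sample`, a hypothetical name) instead of reading customers.csv, and it assumes the dates are ISO-formatted strings as they appear in that output.

```python
import pandas as pd

# Stand-in for the first five rows shown by df.head(); the real notebook
# would apply the same steps to df loaded from customers.csv.
sample = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "name": ["John Rivers", "Richard Mcclure", "Eric Smith",
             "Megan Price", "Pedro Guzman"],
    "age": [31, 21, 34, 47, 20],
    "subscription_date": ["2024-09-17", "2021-07-19", "2025-06-08",
                          "2021-12-09", "2023-03-11"],
})

# Count missing values per column before doing any analysis.
missing_per_column = sample.isna().sum()

# Parse the date strings; errors="coerce" turns unparseable values into NaT
# so they show up as missing instead of raising an exception.
sample["subscription_date"] = pd.to_datetime(
    sample["subscription_date"], errors="coerce"
)

# Count sign-ups per calendar year as a first look at growth over time.
signups_per_year = (
    sample["subscription_date"].dt.year.value_counts().sort_index()
)

print(missing_per_column)
print(signups_per_year)
```

Coercing bad dates to `NaT` is one possible design choice; it keeps the load from failing, at the cost of needing a second `isna()` check on the parsed column.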

In [None]:
# --- Solution for 1b ---
print("--- Question 1b: Age Statistics ---")

# Calculate the mean and standard deviation of the 'age' column using NumPy
# Note: np.std defaults to the population standard deviation (ddof=0),
# whereas df['age'].std() defaults to the sample standard deviation (ddof=1)
mean_age = np.mean(df['age'])
std_dev_age = np.std(df['age'])

print(f"Mean Age: {mean_age:.2f}")
print(f"Standard Deviation of Age: {std_dev_age:.2f}\n")

# Create a boolean array to identify customers under 25
# This will be a Series of True/False values
is_under_25 = df['age'] < 25
print("Boolean array for the first 5 customers under 25:")
print(is_under_25.head())
print("-" * 30)


# --- Solution for 1c ---
print("\n--- Question 1c: Filtering for Customers Under 25 ---")

# Use the boolean array 'is_under_25' to filter the DataFrame
# This selects only the rows where the 'is_under_25' value is True
customers_under_25 = df[is_under_25]

# Display the resulting DataFrame
print("Displaying all customers under the age of 25:")
print(customers_under_25)
print("-" * 30)


# --- Solution for 1d ---
print("\n--- Question 1d: Creating the 'age_group' Column ---")

# Define the conditions (bins) and the corresponding labels for our groups
# Bins: 0-24 (Youth), 25-59 (Adult), 60+ (Senior)
bins = [0, 24, 59, float('inf')]
labels = ['Youth', 'Adult', 'Senior']

# Create the new 'age_group' column using the pandas.cut() function
# This function segments the data into the bins we defined
# right=True makes each bin include its upper edge: (0, 24], (24, 59], (59, inf]
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels, right=True)

# Display the first 10 rows of the DataFrame to show the new column
print("DataFrame with the new 'age_group' column:")
print(df.head(10))
print("-" * 30)
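
Because the age groups depend on exactly where the bin edges fall, it can help to check `pd.cut` on a handful of boundary ages. The sketch below reuses the same `bins` and `labels` as the solution for 1d; the `ages` Series is made-up test data, not taken from customers.csv. It also shows one edge case worth knowing: with `right=True`, the first interval is open on the left, so an age of exactly 0 would get no label unless `include_lowest=True` is passed.

```python
import pandas as pd

# Made-up boundary ages for illustration only.
ages = pd.Series([0, 18, 24, 25, 59, 60], name="age")

# Same bin edges and labels as in the solution for 1d.
bins = [0, 24, 59, float("inf")]
labels = ["Youth", "Adult", "Senior"]

# Default left-open first bin: (0, 24], (24, 59], (59, inf]
groups = pd.cut(ages, bins=bins, labels=labels, right=True)

# include_lowest=True also places the lowest edge (0) in the first bin.
groups_incl = pd.cut(
    ages, bins=bins, labels=labels, right=True, include_lowest=True
)

# Side-by-side comparison of the two labelings.
check = pd.DataFrame({
    "age": ages,
    "age_group": groups,
    "with_include_lowest": groups_incl,
})
print(check)
```

For a customer dataset an age of 0 is unlikely, so the original call is probably fine as written; the check mainly confirms that 24 lands in Youth, 25 and 59 in Adult, and 60 in Senior.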

--- Question 1b: Age Statistics ---
Mean Age: 44.70
Standard Deviation of Age: 17.38

Boolean array for the first 5 customers under 25:
0    False
1     True
2    False
3    False
4     True
Name: age, dtype: bool
------------------------------

--- Question 1c: Filtering for Customers Under 25 ---
Displaying all customers under the age of 25:
    customer_id              name  age                 city  \
1             2   Richard Mcclure   21            Jamieside   
4             5      Pedro Guzman   20      South Laurabury   
10           11       Erin Lucero   21            Smithland   
37           38     Jerry Johnson   22       Lake Ashleyton   
39           40      Leroy Santos   18    South Thomasmouth   
46           47    Tiffany Arnold   21     South Samuelport   
47           48  Meredith Johnson   21         Foleychester   
56           57     Vanessa Perez   24         East Vincent   
58           59        Joel Brown   24  West Jefferyborough   
61           62    Cathy