## Using pandas to Analyze Age in the GSS Dataset

# Step 1. Import required library
import pandas as pd

# Step 2. Read the dataset 
DATA_PATH = "GSS.csv"  
df = pd.read_csv(DATA_PATH)

# Step 3. Check the dataset structure
print("Total rows:", len(df))
print("Columns:", len(df.columns))
print("Numeric columns:", df.select_dtypes(include='number').columns.tolist())

# Step 4. Select the target numeric column
target_col = 'age'

# Step 5. Clean the data: non-numeric entries become NaN and are then dropped
ages = pd.to_numeric(df[target_col], errors='coerce').dropna()

# Step 6. Compute mean, median, and mode
mean_age = ages.mean()
median_age = ages.median()
mode_age = ages.mode().tolist()

# Step 7. Print results
print(f"Mean age: {mean_age:.2f}")
print(f"Median age: {median_age:.2f}")
print(f"Mode age(s): {mode_age}")


Total rows: 2832
Columns: 152
Numeric columns: ['id', 'age', 'cohort', 'educ', 'prozfor2', 'prozfor3', 'sei', 'wordsum']
Mean age: 45.56
Median age: 42.00
Mode age(s): [34.0]
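
`mode()` returns a Series rather than a single number, because a column can have several equally common values. That is why the code above converts it to a list. A minimal sketch on made-up values (not taken from GSS.csv) shows both the `errors="coerce"` cleaning step and a two-way tie:

```python
import pandas as pd

# Made-up raw values standing in for a text-typed "age" column (not GSS data)
raw = pd.Series(["34", "42", "n/a", "34", "42", "29"])

# errors="coerce" turns unparseable entries such as "n/a" into NaN; dropna() removes them
ages = pd.to_numeric(raw, errors="coerce").dropna()

print(len(ages))              # 5 valid values remain
print(ages.mean())            # (34 + 42 + 34 + 42 + 29) / 5
print(ages.median())          # middle of the sorted values [29, 34, 34, 42, 42]
print(ages.mode().tolist())   # 34 and 42 both appear twice, so mode() returns both
```

With a single most frequent value, as in the GSS age column, the list simply has one element.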


## Calculating Mean, Median, and Mode the Hard Way

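# Step 1. Import standard libraries only
import csv
import math

# Step 2. Read the CSV file manually
DATA_PATH = "GSS.csv"

ages = []

# newline="" is what the csv module documentation recommends for files read by csv readers
with open(DATA_PATH, "r", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        try:
            age_value = float(row["age"])
        except (TypeError, ValueError):
            # Skip missing (None) or non-numeric entries such as ""
            continue
        # float() accepts the text "nan"; skip it so the result matches pandas' dropna()
        if not math.isnan(age_value):
            ages.append(age_value)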

print("Total valid age entries:", len(ages))

# Step 3. Compute mean (average)
mean_age = sum(ages) / len(ages)

# Step 4. Compute median
sorted_ages = sorted(ages)
n = len(sorted_ages)
if n % 2 == 1:
    median_age = sorted_ages[n // 2]
else:
    median_age = (sorted_ages[n // 2 - 1] + sorted_ages[n // 2]) / 2

# Step 5. Compute mode (most frequent value)
counts = {}
for value in ages:
    counts[value] = counts.get(value, 0) + 1

max_count = max(counts.values())
modes = [val for val, freq in counts.items() if freq == max_count]

# Step 6. Print results
print(f"Mean age (manual): {mean_age:.2f}")
print(f"Median age (manual): {median_age:.2f}")
print(f"Mode age(s) (manual): {modes}")


Total valid age entries: 2828
Mean age (manual): 45.56
Median age (manual): 42.00
Mode age(s) (manual): [34.0]
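
The manual results match the pandas ones. Python's standard-library `statistics` module offers another cross-check that needs no pandas (`statistics.multimode` requires Python 3.8 or later). A minimal sketch, assuming a hypothetical `manual_summary` helper that wraps Steps 3 to 5 above and a made-up list of ages:

```python
import statistics


def manual_summary(values):
    """Hypothetical helper: mean, median, and all modes, computed by hand."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")

    # Mean: total divided by count
    mean = sum(values) / n

    # Median: middle value, or the average of the two middle values when n is even
    s = sorted(values)
    mid = n // 2
    median = s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

    # Mode(s): every value that reaches the highest frequency
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    return mean, median, modes


sample = [34.0, 42.0, 29.0, 34.0, 42.0, 61.0]  # made-up toy ages, not GSS data
mean, median, modes = manual_summary(sample)

print(mean, statistics.mean(sample))
print(median, statistics.median(sample))
print(modes, statistics.multimode(sample))
```

`statistics.multimode` returns every tied value, mirroring the list comprehension in Step 5, whereas `statistics.mode` returns only the first one it encounters.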


## Text-Based Visualization with Emoji Bars
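
The cell below groups ages with `pd.cut(..., right=False)`, so each bin includes its lower edge and excludes its upper edge. This small sketch uses made-up boundary ages (not GSS data) to show where they land. A value of 90 or more falls outside the last bin and becomes NaN, which `value_counts()` then skips because it drops missing values by default:

```python
import pandas as pd

# Made-up ages chosen to sit on or near bin edges
ages = pd.Series([10, 19, 20, 29, 30, 89, 90])

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90]
labels = [f"{b}s" for b in range(10, 90, 10)]

# right=False makes every bin [lower, upper): 20 lands in "20s", not "10s"
groups = pd.cut(ages, bins=bins, labels=labels, right=False)

for age, group in zip(ages, groups):
    print(age, "->", group)
```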

# Step 1. Import pandas for data prep
import pandas as pd

DATA_PATH = "GSS.csv"
df = pd.read_csv(DATA_PATH)

# Step 2. Prepare clean age data
ages = pd.to_numeric(df["age"], errors="coerce").dropna()

# Step 3. Define decade bins and matching labels
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # 9 edges → 8 bins
labels = [f"{b}s" for b in range(10, 90, 10)]  # ['10s', '20s', ..., '80s']

# Step 4. Cut into bins; right=False makes each bin [lower, upper), so age 20 goes to '20s'
age_groups = pd.cut(ages, bins=bins, labels=labels, right=False)

# Step 5. Count frequency per group
age_counts = age_groups.value_counts().sort_index()

# Step 6. Visualize with plain print() calls (no plotting library)
SCALE = 10  # each symbol represents ~10 people; remainders are truncated
SYMBOL = "🟩"

print("📊 Age Distribution in the GSS Dataset")
print(f"(Each {SYMBOL} represents ~{SCALE} respondents)\n")

for group, count in age_counts.items():
    bar = SYMBOL * int(count / SCALE)
    print(f"{group:>4} | {bar} ({int(count)} people)")


📊 Age Distribution in the GSS Dataset
(Each 🟩 represents ~10 respondents)

 10s | 🟩🟩🟩🟩 (42 people)
 20s | 🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩 (495 people)
 30s | 🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩 (687 people)
 40s | 🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩 (580 people)
 50s | ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ðŸŸ©ð