# Project 3



## Gender Differences in Support for Abortion: Evidence from the General Social Survey

Public attitudes toward abortion have long been a central topic in American social and political life. While debates often focus on legal rights, morality, and religion, public opinion is also shaped by demographic characteristics such as gender. A common assumption is that women, who are more directly affected by reproductive policies, tend to be more supportive of abortion than men. But is this assumption actually supported by survey data?

In this project, I explore whether men and women differ in their support for abortion under several specific circumstances using data from the **General Social Survey (GSS)**. Rather than focusing on abortion in the abstract, the analysis examines support across four concrete scenarios: abortion in cases of rape, when the woman’s health is at risk, when the family is poor, and for any reason.

---

## Data Source

The data used in this project come from the **General Social Survey (GSS)**, accessed via the NORC GSS Data Explorer:

https://gssdataexplorer.norc.org/

The GSS is a nationally representative survey of U.S. adults that has been conducted regularly since 1972. It collects detailed information on social attitudes, demographics, and political beliefs. For this project, I use a **subset of the full GSS dataset**, provided as a CSV file (`GSS.csv`), and focus on a small number of variables related to gender and abortion attitudes.

---

## Research Question and Hypothesis

**Research Question:**  
Do men and women differ in their support for abortion under different circumstances?

**Key Variables:**
- `woman` (gender: Man or Woman)
- `abrape` (abortion in case of rape)
- `abhlth` (abortion when the woman’s health is at risk)
- `abpoor` (abortion when the family is poor)
- `abany` (abortion for any reason)

**Hypothesis:**  
I initially hypothesize that women are more likely than men to support abortion, especially in scenarios with fewer qualifying conditions, such as abortion for any reason.

In the sections that follow, I clean and reshape the data, compute gender-specific support rates across the four scenarios, and visualize the results to evaluate this hypothesis.


## Step 1. Import Libraries and Load the Dataset


In [13]:
# Step 1. Import required libraries
import os

import pandas as pd

# Path to the GSS subset (CSV file in the working directory)
DATA_PATH = "GSS.csv"

# Fail early with a clear message if the file is missing
if not os.path.exists(DATA_PATH):
    raise FileNotFoundError(f"File not found: {DATA_PATH}")

# Read the dataset
df = pd.read_csv(DATA_PATH)

# Preview the first five rows
df.head()


Unnamed: 0,id,abany,abdefctw,abdefect,abhlth,abnomore,abpoor,abpoorw,abrape,age,...,usepsyc1,usepsyc2,usepsyc3,usepsyc4,useskill,wordsum,wrkstat,xnorcsiz,woman,region4
0,1,,dk,dk,,dk,dk,dk,dk,60.0,...,,,,,,0.0,working fulltime,"suburb, lrg city",Man,South
1,2,,wrong only sometimes,,,,,dk,,27.0,...,,,,,,0.0,working parttime,"suburb, lrg city",Woman,South
2,3,,always wrong,,,,,always wrong,,36.0,...,,,,,,0.0,working fulltime,"suburb, lrg city",Man,South
3,4,no,wrong only sometimes,yes,yes,no,no,almost always wrong,yes,21.0,...,,,,,,0.0,working fulltime,"suburb, lrg city",Man,South
4,5,yes,,yes,yes,yes,yes,,yes,35.0,...,somewht unlikely,somewhat likely,somewhat likely,somewhat likely,a lot,7.0,working fulltime,"suburb, lrg city",Woman,South


## Step 2. Inspect Key Variables

In [14]:
# Step 2. Inspect column names
df.columns


Index(['id', 'abany', 'abdefctw', 'abdefect', 'abhlth', 'abnomore', 'abpoor',
       'abpoorw', 'abrape', 'age',
       ...
       'usepsyc1', 'usepsyc2', 'usepsyc3', 'usepsyc4', 'useskill', 'wordsum',
       'wrkstat', 'xnorcsiz', 'woman', 'region4'],
      dtype='object', length=152)

In [16]:
# Step 2. Inspect key variables that actually exist in this dataset

print("Gender (woman) distribution:")
print(df["woman"].value_counts(dropna=False))

print("\nSupport for abortion in case of rape:")
print(df["abrape"].value_counts(dropna=False))

print("\nSupport for abortion when woman's health is endangered:")
print(df["abhlth"].value_counts(dropna=False))

print("\nSupport for abortion when family is poor:")
print(df["abpoor"].value_counts(dropna=False))

print("\nSupport for abortion for any reason:")
print(df["abany"].value_counts(dropna=False))



Gender (woman) distribution:
woman
Woman    1600
Man      1232
Name: count, dtype: int64

Support for abortion in case of rape:
abrape
yes    1439
NaN     953
no      357
dk       83
Name: count, dtype: int64

Support for abortion when woman's health is endangered:
abhlth
yes    1578
NaN    1036
no      218
Name: count, dtype: int64

Support for abortion when family is poor:
abpoor
no     994
NaN    954
yes    789
dk      95
Name: count, dtype: int64

Support for abortion for any reason:
abany
NaN    1054
no     1050
yes     728
Name: count, dtype: int64
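
The counts above show that a sizeable share of respondents has no usable answer on each item, because the value is either missing (NaN) or "dk". As a rough check, the sketch below recomputes the share of yes/no answers from the counts printed above. The dictionary `counts` and the helper `valid_share` are illustrative names that do not appear in the notebook.

```python
# Counts copied from the value_counts() output above (NaN is labeled "missing").
counts = {
    "abrape": {"yes": 1439, "no": 357, "dk": 83, "missing": 953},
    "abhlth": {"yes": 1578, "no": 218, "missing": 1036},
    "abpoor": {"yes": 789, "no": 994, "dk": 95, "missing": 954},
    "abany": {"yes": 728, "no": 1050, "missing": 1054},
}


def valid_share(item_counts):
    """Return (number of yes/no answers, their share of all respondents in percent)."""
    valid = item_counts["yes"] + item_counts["no"]
    total = sum(item_counts.values())
    return valid, 100 * valid / total


for item, item_counts in counts.items():
    valid, share = valid_share(item_counts)
    print(f"{item}: {valid} yes/no answers ({share:.1f}% of respondents)")
```

Each item keeps roughly 63% of the 2,832 respondents, so the support rates computed later describe only those who gave a yes/no answer.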


## Step 3. Select Relevant Columns

In [17]:
# Step 3. Select only relevant variables for the analysis
gss_sub = df[["woman", "abrape", "abhlth", "abpoor", "abany"]].copy()

gss_sub.head()


Unnamed: 0,woman,abrape,abhlth,abpoor,abany
0,Man,dk,,dk,
1,Woman,,,,
2,Man,,,,
3,Man,yes,yes,no,no
4,Woman,yes,yes,yes,yes


## Step 4. Clean the Abortion Response Variables

In [19]:
# Step 4. Clean abortion response variables
# Keep only "yes" and "no" answers; any other value (e.g. "dk") becomes NaN.
# Rows are not dropped here; missing answers are removed after reshaping (Step 6).

valid_responses = ["yes", "no"]

for col in ["abrape", "abhlth", "abpoor", "abany"]:
    gss_sub[col] = gss_sub[col].where(gss_sub[col].isin(valid_responses))

gss_sub.head()


Unnamed: 0,woman,abrape,abhlth,abpoor,abany
0,Man,,,,
1,Woman,,,,
2,Man,,,,
3,Man,yes,yes,no,no
4,Woman,yes,yes,yes,yes
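
To make the cleaning step concrete, here is a small sketch on made-up rows (the `toy` frame is an illustration, not GSS data). It shows that `Series.where` combined with `isin` replaces anything other than "yes" or "no" with a missing value while leaving the number of rows unchanged.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from GSS.csv).
toy = pd.DataFrame(
    {
        "woman": ["Man", "Woman", "Man"],
        "abrape": ["dk", "yes", None],
    }
)

valid_responses = ["yes", "no"]

# where() keeps values for which the condition is True and sets the rest to NaN.
cleaned = toy["abrape"].where(toy["abrape"].isin(valid_responses))

print(cleaned)
print("rows before:", len(toy), "rows after:", len(cleaned))
```

Because no rows are removed at this stage, a respondent who answered only some of the four questions still contributes to the scenarios they did answer.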


## Step 5. Reshape Data from Wide to Long Format

In [20]:
# Step 5. Convert the dataset from wide format to long format
gss_long = gss_sub.melt(
    id_vars="woman",
    value_vars=["abrape", "abhlth", "abpoor", "abany"],
    var_name="abortion_scenario",
    value_name="support_abortion"
)

gss_long.head()


Unnamed: 0,woman,abortion_scenario,support_abortion
0,Man,abrape,
1,Woman,abrape,
2,Man,abrape,
3,Man,abrape,yes
4,Woman,abrape,yes


## Step 6. Create a Binary Support Indicator

In [21]:
# Step 6. Create a binary indicator for abortion support
# Drop rows with missing support first, so that NaN is never coded as 0
gss_long = gss_long[gss_long["support_abortion"].notna()].copy()

# 1 = yes, 0 = no
gss_long["support_flag"] = (gss_long["support_abortion"] == "yes").astype(int)

gss_long.head()


Unnamed: 0,woman,abortion_scenario,support_abortion,support_flag
3,Man,abrape,yes,1
4,Woman,abrape,yes,1
5,Man,abrape,yes,1
8,Woman,abrape,yes,1
9,Man,abrape,yes,1


## Step 7. Group by Gender and Abortion Scenario

In [22]:
# Step 7. Group by gender and abortion scenario
grouped = (
    gss_long
    .groupby(["woman", "abortion_scenario"])["support_flag"]
    .mean()
    .reset_index()
)

# Convert to percentages
grouped["support_percent"] = grouped["support_flag"] * 100

grouped


Unnamed: 0,woman,abortion_scenario,support_flag,support_percent
0,Man,abany,0.41779,41.778976
1,Man,abhlth,0.892388,89.238845
2,Man,abpoor,0.448322,44.832215
3,Man,abrape,0.823138,82.31383
4,Woman,abany,0.403475,40.34749
5,Woman,abhlth,0.868472,86.847195
6,Woman,abpoor,0.438343,43.834297
7,Woman,abrape,0.785441,78.544061
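
Group means alone hide how many respondents each percentage is based on. One way to keep that information, sketched here on made-up data, is named aggregation, which returns the mean and the count in one pass. The `toy_long` frame and the output column names `support_rate` and `n` are assumptions for illustration.

```python
import pandas as pd

# Made-up long-format rows for illustration only (not GSS data).
toy_long = pd.DataFrame(
    {
        "woman": ["Man", "Man", "Man", "Woman", "Woman", "Woman", "Woman"],
        "abortion_scenario": ["abany"] * 7,
        "support_flag": [1, 0, 0, 1, 1, 0, 0],
    }
)

# Named aggregation: mean gives the support rate, count gives the group size.
summary = (
    toy_long
    .groupby(["woman", "abortion_scenario"])["support_flag"]
    .agg(support_rate="mean", n="count")
    .reset_index()
)
summary["support_percent"] = summary["support_rate"] * 100

print(summary)
```

Applied to `gss_long`, the same pattern would report the number of yes/no answers behind each of the eight percentages.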


## Step 8. Reshape Again for Presentation

In [23]:
# Step 8. Pivot the grouped table to wide format
plot_data = grouped.pivot(
    index="abortion_scenario",
    columns="woman",
    values="support_percent"
)

plot_data


abortion_scenario,Man,Woman
abany,41.778976,40.34749
abhlth,89.238845,86.847195
abpoor,44.832215,43.834297
abrape,82.31383,78.544061


## Step 9. Create a Readable Summary Table

In [24]:
# Step 9. Map scenario codes to readable labels
scenario_labels = {
    "abrape": "Rape",
    "abhlth": "Health risk",
    "abpoor": "Family poor",
    "abany": "Any reason"
}

plot_data_readable = plot_data.rename(index=scenario_labels)

plot_data_readable.round(1)


abortion_scenario,Man,Woman
Any reason,41.8,40.3
Health risk,89.2,86.8
Family poor,44.8,43.8
Rape,82.3,78.5
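
The size of the gender gap can be read directly from this table by subtracting the two columns. The sketch below rebuilds the table from the unrounded percentages reported in Step 8 and adds a hypothetical `gap_pp` column (men minus women, in percentage points). In the notebook, the same subtraction could be applied to `plot_data_readable`.

```python
import pandas as pd

# Percentages copied from the pivoted table in Step 8.
summary = pd.DataFrame(
    {
        "Man": [41.778976, 89.238845, 44.832215, 82.31383],
        "Woman": [40.34749, 86.847195, 43.834297, 78.544061],
    },
    index=pd.Index(
        ["Any reason", "Health risk", "Family poor", "Rape"],
        name="abortion_scenario",
    ),
)

# Positive values mean that support is higher among men.
summary["gap_pp"] = summary["Man"] - summary["Woman"]

print(summary.round(1).sort_values("gap_pp", ascending=False))
```

The gap is largest for the rape scenario (about 3.8 points) and smallest when the family is poor (about 1.0 point).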


## Step 10. Reshape the Summary Table for Visualization


In [25]:
# Step 10. Convert the summary table to long format for Plotly

df_plot = (
    plot_data_readable
    .reset_index()
    .melt(
        id_vars="abortion_scenario",
        value_vars=["Man", "Woman"],
        var_name="Gender",
        value_name="Support_Percent"
    )
)

df_plot


Unnamed: 0,abortion_scenario,Gender,Support_Percent
0,Any reason,Man,41.778976
1,Health risk,Man,89.238845
2,Family poor,Man,44.832215
3,Rape,Man,82.31383
4,Any reason,Woman,40.34749
5,Health risk,Woman,86.847195
6,Family poor,Woman,43.834297
7,Rape,Woman,78.544061


## Step 11. Visualization of Gender Differences in Abortion Support


In [27]:
import plotly.express as px
from IPython.display import HTML

# Grouped bar chart: one pair of bars (Man, Woman) per abortion scenario
fig = px.bar(
    df_plot,
    x="abortion_scenario",
    y="Support_Percent",
    color="Gender",
    barmode="group",
    hover_data={
        "abortion_scenario": True,
        "Gender": True,
        "Support_Percent": ':.1f'
    },
    labels={
        "abortion_scenario": "Abortion Scenario",
        "Support_Percent": "Percent Supporting Abortion (%)",
        "Gender": "Gender"
    },
    title="Support for Abortion by Gender and Scenario (GSS)"
)

# Render the interactive figure as an HTML fragment; plotly.js is loaded from a CDN
HTML(fig.to_html(include_plotlyjs="cdn", full_html=False))


## Conclusion

This project examined whether men and women differ in their support for abortion under different circumstances using data from the General Social Survey (GSS).

After reshaping the data from wide to long format and grouping by both gender and abortion scenario, the results do not support my original hypothesis. I expected women to show higher levels of support for abortion than men, especially in scenarios with fewer qualifying conditions, such as abortion for any reason. Instead, men show slightly higher support in all four scenarios: rape (82.3% vs. 78.5%), risk to the woman’s health (89.2% vs. 86.8%), family poverty (44.8% vs. 43.8%), and any reason (41.8% vs. 40.3%).

The gender gap is small, ranging from about 1.0 to 3.8 percentage points, and both men and women show high levels of support in the “rape” and “health risk” scenarios (roughly 79% to 89%). Support is much lower, below 45% for both groups, when the family is poor or when abortion is sought for any reason. Still, the fact that male support is slightly higher in each case is an unexpected finding. One possible explanation is that other factors, such as age, religion, or political ideology, may be correlated with both gender and abortion attitudes, and these factors are not controlled for in this simple analysis.

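This study has several limitations. It uses only a small set of variables, relies on a single cross-sectional dataset, and reports unweighted percentages without testing whether the gender differences are statistically significant. In addition, about 37% of respondents have no yes/no answer on each item and are excluded from the corresponding rates. Future research could extend this work by adding more control variables, exploring time trends, or estimating regression models to better understand why men appear slightly more supportive of abortion than women in this sample.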
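
As one possible next step, the gaps could be checked with a two-proportion z-test. The sketch below is a minimal standard-library version, not part of the original analysis. The function name `two_proportion_z_test` and the counts passed to it are placeholders, since the notebook does not report the number of yes/no answers by gender. It treats the data as a simple random sample, which ignores any survey weights.

```python
import math


def two_proportion_z_test(yes_a, n_a, yes_b, n_b):
    """Two-sided z-test for the difference between two independent proportions.

    Returns (z, p_value). Assumes simple random sampling; survey weights
    and design effects are ignored. Not defined when the pooled proportion
    is exactly 0 or 1.
    """
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    p_pool = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts for illustration only (NOT taken from the GSS file)
z, p = two_proportion_z_test(yes_a=60, n_a=100, yes_b=40, n_b=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In the notebook, the real inputs could be obtained from `gss_long` by aggregating `support_flag` with `sum` and `count` for each gender within a scenario.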
