# FORMATIVE ASSESSMENT OF ADOLESCENT GIRLS AND YOUNG WOMEN’S HIV, GENDER-BASED VIOLENCE AND SEXUAL AND REPRODUCTIVE HEALTH STATUS

## Background
Teenage pregnancy and motherhood are a major health and social concern in Uganda: they infringe upon the human rights of girls and hinder their ability to achieve their full socioeconomic potential. Teenagers who engage in sexual intercourse at a young age face an elevated risk of becoming pregnant and giving birth. The 2022 Uganda Demographic and Health Survey (UDHS) indicated that 23.5% of women age 15-19 had begun childbearing by the time of the survey: 18.4% had already had a live birth and 5.1% were pregnant with their first child.

Patterns by background characteristics:
* By age 16, 1 in 10 women has begun childbearing. This rises to almost 4 in 10 by age 18 (Table 5.12).
* Teenage childbearing is more common in rural than in urban areas: 25% of women age 15-19 in rural areas have begun childbearing, compared with 21% in urban areas.
* Teenage childbearing varies by region. The percentage of women age 15-19 who have begun childbearing ranges from 15% in Kigezi region to 28-30% in Busoga and Bukedi sub-regions.
* The proportion of women age 15-19 who have begun childbearing decreases with both education and wealth.

Regions: The selection of surveyed districts was informed by HIV prevalence dynamics and implementing partner support. We surveyed districts where Global Fund-supported implementing partners were working to reduce the number of new HIV infections among adolescent girls and young women (AGYW) and to improve sexual and reproductive health (SRH; e.g. reducing teenage pregnancy) and gender-based violence (GBV) indicators.

## Data Analysis

The output of this notebook includes a data analysis responding to the research questions.

### Data Loading

In [3]:
# Libraries
import warnings
import os
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder
from scipy import stats
from scipy.stats import zscore
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm
from sklearn.metrics import r2_score

# Set-up environment
pd.options.display.float_format = '{:.2f}'.format
pd.set_option('display.max_colwidth', None)
sns.set_theme(style="whitegrid", context="paper")
os.chdir('/Users/nataschajademinnitt/Documents/5. Data Analysis/teenage_pregnancy')
print("Current directory:", os.getcwd())
warnings.filterwarnings("ignore")

Current directory: /Users/nataschajademinnitt/Documents/5. Data Analysis/teenage_pregnancy


In [56]:
# Load the data
df_raw = pd.read_csv("./data/processed_df.csv")

## 1. Wealth and teenage pregnancy

Research question: How does household wealth predict the likelihood of teenage pregnancy?

Sample:
* Cases: Girls who experienced pregnancy between the ages of 10–19 (irrespective of their current age) = 1,925
* Controls: Girls who have never been pregnant and are currently older than 19 years = 1,513. Restricting controls to girls who have aged out of the risk period avoids censoring: every non-pregnant girl has had the full 10–19 window in which a pregnancy could have occurred.

Control variables:
* Age (age_completed)
* Marital status (been_married_binary)

Results:
* Household Wealth: Compared to girls in the High wealth group (the reference category), those in the Low wealth group have 5.73 times the odds of experiencing a teenage pregnancy, and those in the Medium wealth group have 3.10 times the odds. Both effects are highly statistically significant (p < 0.001).
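
The odds ratios quoted above are exponentiated logit coefficients. As a minimal sketch, the snippet below reproduces the Low wealth odds ratio and its 95% confidence interval from the coefficient and interval bounds printed in the wealth_Low row of the regression summary (1.7452, 1.514 and 1.977). The variable names are illustrative and not part of the notebook.

```python
import numpy as np

# Values copied from the wealth_Low row of the Logit summary
coef_low = 1.7452                 # log-odds coefficient
ci_low, ci_high = 1.514, 1.977    # 95% CI bounds on the log-odds scale

# Exponentiate to move from log-odds to odds ratios
or_low = np.exp(coef_low)
or_ci = (np.exp(ci_low), np.exp(ci_high))

print(f"OR = {or_low:.2f}, 95% CI = ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
```

With a fitted statsmodels result, the same intervals for every predictor can be obtained with `np.exp(model_cat.conf_int())`.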

In [7]:
# Subset the data
df = df_raw.loc[
    ((df_raw['been_preg'] == 1) & (df_raw['age_preg'] <= 19)) |
    ((df_raw['been_preg'] == 0) & (df_raw['age_completed'] >= 20))
]

# Create wealth dummies (drop_first=True drops the first level, High, as the reference)
wealth_dummies = pd.get_dummies(df['wealth_tertile'], prefix='wealth', drop_first=True)

# Concatenate the wealth dummies with the original columns
df_model_cat = pd.concat([df, wealth_dummies], axis=1)

# Combine predictors: wealth dummies plus controls
controls = ['age_completed', 'been_married_binary']
predictors = list(wealth_dummies.columns) + controls

# Drop rows with missing values in the predictors or the outcome
df_model_cat = df_model_cat.dropna(subset=predictors + ['been_preg'])

# Build the design matrix (with intercept) and the outcome variable
X_cat = sm.add_constant(df_model_cat[predictors]).astype(float)
y_cat = df_model_cat['been_preg']

# Fit the logistic regression model
model_cat = sm.Logit(y_cat, X_cat).fit(disp=False)
print(model_cat.summary())

# Convert coefficients to odds ratios
or_cat = np.exp(model_cat.params)
print("Odds Ratios (categorical with controls):\n", or_cat)

                           Logit Regression Results                           
Dep. Variable:              been_preg   No. Observations:                 3438
Model:                          Logit   Df Residuals:                     3433
Method:                           MLE   Df Model:                            4
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                  0.3884
Time:                        15:54:11   Log-Likelihood:                -1442.3
converged:                       True   LL-Null:                       -2358.3
Covariance Type:            nonrobust   LLR p-value:                     0.000
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                   3.8658      0.593      6.521      0.000       2.704       5.028
wealth_Low              1.7452      0.118     14.786      0.000       1.514       1.977
wealth_Medium   

## 2. Pregnancy and school drop out

Research question: Does pregnancy increase dropout risk among adolescents?
* This model examines dropout differences among all adolescents and assesses how pregnancy status, among other factors, predicts dropout. The coefficient on been_preg tells you, adjusting for the other factors, how much more (or less) likely girls who have been pregnant are to drop out compared to those who have not.

Sample:
* Girls who have experienced teenage pregnancy (been_preg = 1; pregnancy occurred at ≤ 19)
* Girls who have not been pregnant (been_preg = 0) and who are still in the risk window (age ≤ 19)

Controls:
* Household Wealth: Represented by two dummy variables—wealth_Low and wealth_Medium (with the High wealth group as the reference).
* Age: Using age_completed (although within this age‐restricted sample, there remains variation).
* Marital Status: Using been_married_binary (1 if ever married, 0 otherwise).

Results:
* Teenage Pregnancy (been_preg): Girls who have experienced teenage pregnancy have 41% higher odds of dropping out relative to those who have not, after adjusting for these factors.
* Marital Status (been_married_binary): Girls who have been married have significantly lower odds of school dropout (OR ≈ 0.61), suggesting that marriage (which may be associated with intentional early childbearing) is inversely related to dropout in this context.
* Age (age_completed): Older adolescents (within the ≤19 group) have slightly lower odds of dropping out (OR ≈ 0.94 per additional year).
* Household Wealth: Girls from Low wealth households have about 4 times the odds of dropping out, while those from Medium wealth households have roughly 2.4 times the odds, compared to girls from High wealth households.
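* School Location: Urban location (scol_loc_2.0 – township) is associated with a very large reduction in dropout odds compared to the reference (rural), though the estimate for the "other" category (scol_loc_3.0) is unstable. The summary for this model reports converged: False, so the school location estimates in particular should be read with caution.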

Follow-up:
* How to handle scol_loc_3.0?
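
One way to start on this follow-up is to check whether any school location category has an empty cell against the outcome, since a category with no dropouts (or only dropouts) cannot support a finite odds ratio. The sketch below uses invented toy data, not the survey data; only the column names follow the notebook.

```python
import pandas as pd

# Toy data (NOT the survey data): category 3.0 has few rows and no dropouts
toy = pd.DataFrame({
    "scol_location": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
    "dropped_out":   [1,   0,   1,   0,   0,   1,   0,   0,   0],
})

# Cross-tabulate school location against the outcome
counts = pd.crosstab(toy["scol_location"], toy["dropped_out"])
print(counts)

# Categories with an empty cell are candidates for recoding or exclusion
sparse_levels = counts.index[(counts == 0).any(axis=1)].tolist()
print("Levels with an empty cell:", sparse_levels)
```

Running the same cross-tabulation on `df_ado` would show whether category 3 is the source of the instability.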

In [9]:
# Not been pregnant (currently 10-19) | Been pregnant (currently 10-19)
df_ado = df_raw.loc[(df_raw['age_completed'] <= 19)]

# Wealth tertiles
wealth_dummies = pd.get_dummies(
    df_ado['wealth_tertile'], prefix='wealth', drop_first=True
)

# School dummies
scol_loc_dummies = pd.get_dummies(df_ado['scol_location'], prefix='scol_loc', drop_first=True)

predictors = ['been_preg', 'been_married_binary', 'age_completed'] + list(wealth_dummies.columns) + list(scol_loc_dummies.columns)

# Build design matrix:
df_model_cat = pd.concat([df_ado, wealth_dummies, scol_loc_dummies], axis=1)
df_model_cat = df_model_cat.dropna(subset=predictors + ['dropped_out'])
X_new = df_model_cat[predictors]
X_new = sm.add_constant(X_new).astype(float)
y_new = df_model_cat['dropped_out']

# Fit the logistic regression:
model_new = sm.Logit(y_new, X_new).fit(disp=False)
print(model_new.summary())
or_new = np.exp(model_new.params)
print("\nAdjusted Odds Ratios with Additional Controls:\n", or_new)

                           Logit Regression Results                           
Dep. Variable:            dropped_out   No. Observations:                 4845
Model:                          Logit   Df Residuals:                     4837
Method:                           MLE   Df Model:                            7
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                  0.1078
Time:                        15:54:11   Log-Likelihood:                -1410.1
converged:                      False   LL-Null:                       -1580.5
Covariance Type:            nonrobust   LLR p-value:                 1.171e-69
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                  -1.9363      0.351     -5.511      0.000      -2.625      -1.248
been_preg               0.3469      0.161      2.158      0.031       0.032       0.662
been_married_bin

Dropping wealth (along with age and school location) from the model to understand the effect of marriage on school drop out.
* Change scol_loc 3 to township (peri-urban)
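
A minimal sketch of one possible reading of this note: folding category 3 into the township category before creating the dummies. It assumes township is code 2, which is an assumption based on the labels used elsewhere in this notebook; the toy data and the `scol_location_recoded` column name are invented for illustration.

```python
import pandas as pd

# Toy data (NOT the survey data)
toy = pd.DataFrame({"scol_location": [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]})

# Assumption: code 2 is township, so category 3 is folded into it
toy["scol_location_recoded"] = toy["scol_location"].replace({3.0: 2.0})

# Only one dummy remains after recoding (rural, code 1, is the reference)
scol_loc_dummies = pd.get_dummies(
    toy["scol_location_recoded"], prefix="scol_loc", drop_first=True
)
print(scol_loc_dummies.columns.tolist())
```

Keeping the original column untouched makes it easy to compare models with and without the recode.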

In [50]:
# Not been pregnant (currently 10-19) | Been pregnant (currently 10-19)
df_ado = df_raw.loc[(df_raw['age_completed'] <= 19)]

# Predictors: pregnancy and marital status only (wealth, age and school location dropped)
predictors = ['been_preg', 'been_married_binary']

# Build design matrix:
df_model_cat = df_ado.dropna(subset=predictors + ['dropped_out'])
X_new = df_model_cat[predictors]
X_new = sm.add_constant(X_new).astype(float)
y_new = df_model_cat['dropped_out']

# Fit the logistic regression:
model_new = sm.Logit(y_new, X_new).fit(disp=False)
print(model_new.summary())
or_new = np.exp(model_new.params)
print("\nOdds Ratios (pregnancy and marital status only):\n", or_new)

                           Logit Regression Results                           
Dep. Variable:            dropped_out   No. Observations:                 4845
Model:                          Logit   Df Residuals:                     4842
Method:                           MLE   Df Model:                            2
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                0.004151
Time:                        16:41:33   Log-Likelihood:                -1574.0
converged:                       True   LL-Null:                       -1580.5
Covariance Type:            nonrobust   LLR p-value:                  0.001416
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                  -2.2485      0.053    -42.668      0.000      -2.352      -2.145
been_preg               0.5714      0.154      3.722      0.000       0.271       0.872
been_married_bin

## 3. Marriage timing and school drop out among girls with a teenage pregnancy

Research question: Among girls who experienced a teenage pregnancy, what factors predict who drops out versus completes (or stays in) school?
* This model examines differences in dropout among girls who have had a teenage pregnancy, with a special focus on the timing of marriage (whether they married before or after the pregnancy) rather than just a binary "pregnant" indicator. The idea is to understand variation within the pregnant group rather than comparing pregnant to non-pregnant girls.

Sample:
* Girls who experienced a teenage pregnancy (pregnancy at ≤ 19).
* Outcome: School dropout (dropped_out).

Predictors/Controls:
* Marital Status (Timing):
  * married_after: Girls who married after their teenage pregnancy
  * married_before: Girls who married before their teenage pregnancy
  * (Reference: Girls who were never married)
* Household Wealth:
  * wealth_Low and wealth_Medium (compared to High wealth)
* School Location:
  * scol_loc_2.0 and scol_loc_3.0 (with rural [code 1] as the reference)
* Age at Pregnancy:
  * age_preg
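
The reference groups listed above depend on which dummy column is dropped. With `drop_first=True`, `pd.get_dummies` drops the first level, which for plain string labels is the alphabetically first one (High in the case of wealth). The sketch below, on invented toy data, shows how declaring an explicit categorical order makes that choice visible rather than implicit; the level labels are taken from the notebook.

```python
import pandas as pd

# Toy data (NOT the survey data)
toy = pd.DataFrame({"wealth_tertile": ["Low", "High", "Medium", "Low", "High"]})

# Declare the level order explicitly; the first level becomes the reference
toy["wealth_tertile"] = pd.Categorical(
    toy["wealth_tertile"], categories=["High", "Medium", "Low"]
)

wealth_dummies = pd.get_dummies(toy["wealth_tertile"], prefix="wealth", drop_first=True)
print(wealth_dummies.columns.tolist())
```

Putting a different level first in `categories` would switch the reference group without changing any other code.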

Interpretation: 
* Marital Status: Among girls who experienced a teenage pregnancy, being married (either before or after) is associated with substantially lower odds of school dropout compared to those who remain never married. Specifically, those who married after pregnancy have a 72% reduction (OR = 0.28) and those who married before have a 50% reduction (OR = 0.50) in dropout odds.
* Household Wealth: Lower household wealth is associated with a markedly higher risk of dropout. Girls from Low wealth households have about 4 times the odds of dropping out (OR = 4.09), and those from Medium wealth households about 2.36 times the odds, compared to girls from High wealth households.
* Age at Pregnancy and School Location: Age at pregnancy does not have a statistically significant effect in this model. School location (specifically category 2) shows a dramatic effect, but the estimates for school location, especially category 3, may be unstable and require further checking.
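* Model Fit and Considerations: The overall model has a pseudo R-squared of 0.0410. This is McFadden's measure of the improvement in log-likelihood over the intercept-only model rather than a share of variance explained, and low values are common in logistic regressions of behavioral outcomes. The summary also reports converged: False, so the coefficients (particularly for school location) should be treated as provisional.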
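
As a worked check, the pseudo R-squared and the likelihood-ratio test can be reproduced from the two log-likelihoods printed in the summary for this model (-644.20 and -671.73, with 7 model degrees of freedom). This is a sketch using McFadden's formula, which is what statsmodels reports as Pseudo R-squ.; the variable names are illustrative.

```python
from scipy.stats import chi2

# Values copied from the Logit summary for this model
llf = -644.20      # log-likelihood of the fitted model
llnull = -671.73   # log-likelihood of the intercept-only model
df_model = 7       # number of predictors (Df Model)

# McFadden's pseudo R-squared
pseudo_r2 = 1 - llf / llnull

# Likelihood-ratio test against the intercept-only model
lr_stat = 2 * (llf - llnull)
lr_pvalue = chi2.sf(lr_stat, df_model)

print(f"pseudo R2 = {pseudo_r2:.4f}, LR = {lr_stat:.2f}, p = {lr_pvalue:.3g}")
```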

In [11]:
# For the subgroup of girls who experienced teenage pregnancy
df_preg = df_raw.loc[(df_raw['been_preg'] == 1) & (df_raw['age_preg'] <= 19)].copy()

# Create dummies for marriage timing, with 'never' as the reference
timing_dummies = pd.get_dummies(df_preg['marriage_timing'], prefix='married', drop_first=False)
timing_dummies = timing_dummies.drop(columns=['married_never'])

# Create wealth dummies (using High as reference)
wealth_dummies = pd.get_dummies(df_preg['wealth_tertile'], prefix='wealth', drop_first=True)

# Create scol_location dummies
scol_loc_dummies = pd.get_dummies(df_preg['scol_location'], prefix='scol_loc', drop_first=True)

# Define additional controls
additional_controls = ['age_preg']

# Combine predictors
predictors = list(timing_dummies.columns) + list(wealth_dummies.columns) + list(scol_loc_dummies.columns) + additional_controls

# Build design matrix
df_model_dropout = pd.concat([df_preg, timing_dummies, wealth_dummies, scol_loc_dummies], axis=1)
df_model_dropout = df_model_dropout.dropna(subset=['dropped_out'] + predictors)
X = df_model_dropout[predictors]
X = sm.add_constant(X)
X = X.astype(float)
y = df_model_dropout['dropped_out']

# Fit the logistic regression
model_expanded = sm.Logit(y, X).fit(disp=False)
print(model_expanded.summary())

# Calculate odds ratios
or_expanded = np.exp(model_expanded.params)
print("\nAdjusted Odds Ratios with Additional Controls:\n", or_expanded)

                           Logit Regression Results                           
Dep. Variable:            dropped_out   No. Observations:                 1925
Model:                          Logit   Df Residuals:                     1917
Method:                           MLE   Df Model:                            7
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                 0.04100
Time:                        15:54:11   Log-Likelihood:                -644.20
converged:                      False   LL-Null:                       -671.73
Covariance Type:            nonrobust   LLR p-value:                 1.440e-09
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const             -1.4239      0.881     -1.616      0.106      -3.151       0.303
married_after     -1.2634      0.315     -4.014      0.000      -1.880      -0.647
married_before    -0.6916      0.156

In [52]:
# For the subgroup of girls who experienced teenage pregnancy
df_preg = df_raw.loc[(df_raw['been_preg'] == 1) & (df_raw['age_preg'] <= 19)].copy()

# Create dummies for marriage timing, with 'never' as the reference
timing_dummies = pd.get_dummies(df_preg['marriage_timing'], prefix='married', drop_first=False)
timing_dummies = timing_dummies.drop(columns=['married_never'])

# Predictors: marriage timing only (wealth, school location and age at pregnancy dropped)
predictors = list(timing_dummies.columns)

# Build design matrix
df_model_dropout = pd.concat([df_preg, timing_dummies], axis=1)
df_model_dropout = df_model_dropout.dropna(subset=['dropped_out'] + predictors)
X = df_model_dropout[predictors]
X = sm.add_constant(X)
X = X.astype(float)
y = df_model_dropout['dropped_out']

# Fit the logistic regression
model_expanded = sm.Logit(y, X).fit(disp=False)
print(model_expanded.summary())

# Calculate odds ratios
or_expanded = np.exp(model_expanded.params)
print("\nOdds Ratios (marriage timing only):\n", or_expanded)

                           Logit Regression Results                           
Dep. Variable:            dropped_out   No. Observations:                 1925
Model:                          Logit   Df Residuals:                     1922
Method:                           MLE   Df Model:                            2
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                 0.01997
Time:                        16:51:29   Log-Likelihood:                -658.32
converged:                       True   LL-Null:                       -671.73
Covariance Type:            nonrobust   LLR p-value:                 1.495e-06
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const             -1.7228      0.097    -17.742      0.000      -1.913      -1.532
married_after     -1.1630      0.312     -3.726      0.000      -1.775      -0.551
married_before    -0.6130      0.154

## 4. First sexual encounter and teenage pregnancy

Research question: How do the age at first sexual intercourse and the context of that initial encounter (including consent, protective behavior, substance influence, and partner type) influence the risk of teenage pregnancy among Ugandan adolescent girls?

Rationale and Variables
* Age at First Sex (sex_age): Earlier sexual initiation is often associated with higher risk of teenage pregnancy.
* Voluntariness (will_sex_binary): Whether the girl was willing to have sex may reflect her ability to negotiate safer behaviors.
* Protective Behavior (do_anything): This variable can capture whether any contraception or precaution was taken during the first encounter.
* Substance Influence (under_influe): Being under the influence may impair judgment and reduce the likelihood of using contraception.
* Partner Type (person_sex): The type of partner (e.g., boyfriend, stranger, teacher) can provide insight into power dynamics and associated pregnancy risks.
* Inclusion Criterion (life_sex): Restricting to those who have initiated sex ensures that you're looking at meaningful differences in the context of first sexual experiences.

Results:
* sex_age (OR = 0.92): For each additional year in age at first sexual intercourse, the odds of experiencing teenage pregnancy decrease by about 8% (p = 0.007).
* will_sex_binary (OR = 1.45): Girls whose first sexual encounter was classified as “willing” have 1.45 times the odds of experiencing teenage pregnancy compared to girls who were not willing, holding other factors constant (p = 0.004).
* do_anything_binary (OR = 0.36): Girls who reported taking some protective action (e.g., using a condom or other method) at their first sexual encounter have only 36% of the odds of becoming pregnant compared to those who did not, a 64% reduction in odds (p < 0.001).
* partner_husband (OR = 3.49): If the first sexual partner was a husband (as opposed to the reference category, likely “boyfriend”), the odds of experiencing teenage pregnancy are about 3.5 times as high (p < 0.001).
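
The percentage statements above follow directly from the odds ratios: an odds ratio of 0.92 is an 8% reduction in odds, and 0.36 is a 64% reduction. A minimal sketch of that arithmetic, which also shows how a per-year effect compounds over several years; `pct_change_in_odds` is a hypothetical helper, and the sex_age coefficient (-0.0855) is copied from the regression summary.

```python
import numpy as np

def pct_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1) * 100

# Odds ratios quoted in the results above
for name, or_value in [("sex_age", 0.92),
                       ("will_sex_binary", 1.45),
                       ("do_anything_binary", 0.36)]:
    print(f"{name}: {pct_change_in_odds(or_value):+.0f}% change in odds")

# Per-year effects compound multiplicatively: delaying first sex by 3 years
coef_sex_age = -0.0855            # from the Logit summary
or_three_years = np.exp(3 * coef_sex_age)
print(f"OR for a 3-year delay: {or_three_years:.2f}")
```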

Interpretation:
* A later age at first sex is associated with significantly lower odds of teenage pregnancy.
* Girls who report being willing at first sex have higher odds of teenage pregnancy.
* Taking protective measures at first sex is strongly protective.
* The type of partner matters greatly: having a husband as the first sexual partner is associated with more than threefold higher odds of teenage pregnancy compared to a boyfriend, while first sex with a stranger appears to lower the odds, though less robustly.

In [58]:
# Subset to adolescents who have initiated sex (life_sex == 1) and are ≤19 years old.
df_analysis = df_raw[(df_raw['life_sex'] == 1) & (df_raw['age_completed'] <= 19)].copy()

# Dropna
variables = ['sex_age', 
             'will_sex_binary', 
             'do_anything_binary', 
             'under_influe_binary', 
             'person_sex_group']
df_analysis = df_analysis.dropna(subset=variables)

# Create dummy variables for partner type from 'person_sex_group'
partner_dummies = pd.get_dummies(df_analysis['person_sex_group'], prefix='partner', drop_first=True)

# Define the predictors and outcome
predictor_vars = ['sex_age', 'will_sex_binary', 'do_anything_binary', 'under_influe_binary']
X_basic = df_analysis[predictor_vars]
X = pd.concat([X_basic, partner_dummies], axis=1)
X = sm.add_constant(X).astype(float)
y = df_analysis['been_preg']

# Fit logistic regression
model = sm.Logit(y, X).fit(disp=False)
print(model.summary())

# Calculate odds ratios
odds_ratios = np.exp(model.params)
print("\nOdds Ratios:\n", odds_ratios)

                           Logit Regression Results                           
Dep. Variable:              been_preg   No. Observations:                 1630
Model:                          Logit   Df Residuals:                     1623
Method:                           MLE   Df Model:                            6
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                 0.07077
Time:                        17:02:53   Log-Likelihood:                -978.04
converged:                       True   LL-Null:                       -1052.5
Covariance Type:            nonrobust   LLR p-value:                 1.277e-29
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                   1.0841      0.511      2.119      0.034       0.082       2.087
sex_age                -0.0855      0.032     -2.714      0.007      -0.147      -0.024
will_sex_binary 

## 5. Early and sustained contraceptive use

Research question: Does early initiation of contraceptive use predict sustained contraceptive use, after controlling for marital status and pregnancy-related concern?

Results:
* do_anything_binary (Early Contraceptive Use at First Sex) (OR = 3.07): Holding marital status and pregnancy concern constant, girls who reported taking any preventive action at their first sexual encounter have about 3.07 times the odds of sustained condom use compared to those who did not take any preventive action. This is a statistically significant effect (p < 0.001), suggesting that early initiation of contraceptive behavior is strongly associated with later sustained condom use.
* been_married_binary (Marital Status) (OR ≈ 0.22): Controlling for the other factors, girls who have ever been married have only 22% of the odds of sustained condom use compared to girls who have never been married. In other words, being married is associated with a substantial decrease (about 78% lower odds) in the likelihood of consistent condom use. This result is statistically significant (p < 0.001).
* worry_preg_reverse (Pregnancy Concern) (OR = 1.02): The association between the level of pregnancy concern and sustained condom use is very weak and not statistically significant (p = 0.833). This suggests that, after accounting for the other predictors, how concerned a girl is about getting pregnant does not meaningfully predict whether she consistently uses condoms.

Interpretation:
* There is strong evidence that initiating contraceptive behavior at first sex (do_anything_binary) is associated with higher odds of sustained condom use over a 12‑month period. Girls who took any preventive action at their first sexual encounter have more than three times the odds of reporting consistent condom use later.
* Being married is associated with a much lower likelihood of sustained condom use. This could reflect that marriage may involve different dynamics around fertility intentions, where pregnancy might be more acceptable or even desired.
* The level of concern about pregnancy (worry_preg_reverse) does not appear to significantly affect sustained condom use when other factors are taken into account.
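
An odds ratio of about 3 does not mean that the probability of sustained condom use triples. As an illustrative sketch, the snippet below converts the intercept and the do_anything_binary coefficient from the regression summary (-0.8610 and 1.1210) into predicted probabilities. The profile used (never married, worry_preg_reverse = 0) is an assumption chosen only to keep the arithmetic simple, and `inv_logit` is a hypothetical helper.

```python
import numpy as np

def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + np.exp(-x))

# Coefficients copied from the Logit summary
const = -0.8610
coef_do_anything = 1.1210

# Illustrative profile: never married, worry_preg_reverse = 0 (an assumption)
p_no_action = inv_logit(const)
p_action = inv_logit(const + coef_do_anything)

print(f"P(sustained use | no action) = {p_no_action:.2f}")
print(f"P(sustained use | action)    = {p_action:.2f}")
print(f"Probability ratio            = {p_action / p_no_action:.2f}")
print(f"Odds ratio                   = {np.exp(coef_do_anything):.2f}")
```

For this profile the probability roughly doubles while the odds roughly triple, which is why the results are phrased in terms of odds.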

In [15]:
# Subset to adolescents who have initiated sex (life_sex == 1) and are ≤19 years old.
df_model = df_raw[(df_raw['life_sex'] == 1) & (df_raw['age_completed'] <= 19)].copy()

# Create a binary variable for sustained condom use (some_times_binary): 1 if Always (code 3), 0 otherwise.
# Note: np.where also assigns 0 to rows where some_times is missing.
df_model['some_times_binary'] = np.where(df_model['some_times'] == 3, 1, 0)

# Predictors
predictors = ['do_anything_binary', 'been_married_binary', 'worry_preg_reverse']

# Drop rows with missing data in our predictors and the outcome variable
df_model = df_model.dropna(subset=predictors + ['some_times_binary'])

# Build the design matrix (with intercept) and the outcome
X = sm.add_constant(df_model[predictors])
y = df_model['some_times_binary']

# Fit the logistic regression model
model = sm.Logit(y, X).fit(disp=False)
print(model.summary())

# Calculate and display odds ratios
odds_ratios = np.exp(model.params)
print("\nOdds Ratios:\n", odds_ratios)

                           Logit Regression Results                           
Dep. Variable:      some_times_binary   No. Observations:                  669
Model:                          Logit   Df Residuals:                      665
Method:                           MLE   Df Model:                            3
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                 0.06320
Time:                        15:54:12   Log-Likelihood:                -434.39
converged:                       True   LL-Null:                       -463.70
Covariance Type:            nonrobust   LLR p-value:                 1.166e-12
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                  -0.8610      0.353     -2.438      0.015      -1.553      -0.169
do_anything_binary      1.1210      0.229      4.906      0.000       0.673       1.569
been_married_bin

## 6. Induced abortion among girls who experienced pregnancy

Research question: Among adolescent girls who have experienced a pregnancy, what is the prevalence of induced abortion and which factors (e.g., marital status, socioeconomic status, schooling) predict the likelihood of seeking an induced abortion?

Results:
* Intercept (const = -1.6143, OR ≈ 0.20): This is the estimated log‑odds of having had an induced abortion for a girl with the reference characteristics (i.e., not married, with high wealth, and in the reference school location [rural], and with age_preg = 0—which is not meaningful per se but serves as a baseline in the model).
* been_married_binary (coef = 0.6457, OR ≈ 1.91, p = 0.006): After controlling for the other predictors, being married is associated with a 91% increase in the odds of having had an induced abortion compared to not being married. This effect is statistically significant.
* age_preg (coef = -0.0903, OR ≈ 0.91, p = 0.153): For each additional year in age at pregnancy, the odds of having an induced abortion decline by about 9%; however, this effect is not statistically significant (p = 0.153).
* wealth_Low (coef = -0.2433, OR ≈ 0.78, p = 0.416) and wealth_Medium (coef = -0.0159, OR ≈ 0.98, p = 0.958): Neither of these indicators is statistically significant, implying that, in this model, belonging to a lower or medium wealth group (versus the high wealth reference) is not significantly associated with the odds of induced abortion.
* scol_loc_2.0 (coef = -0.3257, OR ≈ 0.72, p = 0.752): Girls in the urban category (if coded as 2) have an estimated 28% lower odds compared to those in rural areas, but this effect is not statistically significant.
* scol_loc_3.0 (coef = -23.1316, OR ≈ ~0, p = 1.000): This coefficient is extremely unstable (with an enormous standard error), likely because there are very few observations in this category (school location “other”). As a result, this estimate is unreliable and should be interpreted with caution or possibly recoded (for example, by combining with another category).
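
The p-values in this list come from Wald tests: each coefficient is divided by its standard error and compared against a standard normal distribution. A minimal sketch reproducing the been_married_binary row from the coefficient and standard error printed in the summary (0.6457 and 0.236); the result differs slightly from the printed z of 2.739 because the standard error is rounded.

```python
from scipy.stats import norm

# Values copied from the been_married_binary row of the Logit summary
coef = 0.6457
std_err = 0.236

# Wald statistic and two-sided p-value
z = coef / std_err
p_value = 2 * norm.sf(abs(z))

print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The same calculation explains the scol_loc_3.0 row: an enormous standard error drives z towards 0 and the p-value towards 1.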

Interpretation:
* Being married is the only statistically significant predictor in the model. Girls who have been married have about 1.91 times the odds of having had an induced abortion relative to those who have not been married. In the context of teenage pregnancy, this may indicate that marriage is associated with different fertility choices (or pressures) that lead to a higher likelihood of seeking an abortion. The model as a whole is only marginally significant (LLR p-value = 0.059) and the summary reports converged: False, so this finding should be treated as tentative.

In [17]:
# Been pregnant 10-19 (irrespective of current age); .copy() avoids SettingWithCopyWarning on the assignments below
df_preg = df_raw.loc[(df_raw['been_preg'] == 1) & (df_raw['age_preg'] <= 19)].copy()

# Map the had_abort variable to indicate induced abortion: 1 if abortion, 0 if not.
df_preg['had_abort_binary'] = df_preg['had_abort'].map({1.0: 1, 2.0: 0})
# Fill missing values (if any) with 0 and convert to integer.
df_preg['had_abort_binary'] = df_preg['had_abort_binary'].fillna(0).astype(int)

# Create wealth dummies (reference: High, the level dropped by drop_first=True)
wealth_dummies = pd.get_dummies(df_preg['wealth_tertile'], prefix='wealth', drop_first=True)

# Create dummies for school location.
# If scol_location is coded as 1 = Rural, 2 = Urban, 3 = Other, we treat it as categorical.
scol_loc_dummies = pd.get_dummies(df_preg['scol_location'], prefix='scol_loc', drop_first=True)

# Define predictors
predictors = ['been_married_binary', 'age_preg'] + list(wealth_dummies.columns) + list(scol_loc_dummies.columns)

# Combine predictors into our model DataFrame.
df_model_abortion = pd.concat([df_preg, wealth_dummies, scol_loc_dummies], axis=1)

# Drop missing values for the variables in our predictor list plus the outcome.
df_model_abortion = df_model_abortion.dropna(subset=['had_abort_binary'] + predictors)

# Build design matrix X and outcome variable y.
X = df_model_abortion[predictors]
X = sm.add_constant(X)
X = X.astype(float)
y = df_model_abortion['had_abort_binary']

# Fit the logistic regression model.
model_abortion = sm.Logit(y, X).fit(disp=False)
print(model_abortion.summary())

# Calculate and display odds ratios.
odds_ratios = np.exp(model_abortion.params)
print("\nOdds Ratios:\n", odds_ratios)

                           Logit Regression Results                           
Dep. Variable:       had_abort_binary   No. Observations:                 1925
Model:                          Logit   Df Residuals:                     1918
Method:                           MLE   Df Model:                            6
Date:                Tue, 08 Apr 2025   Pseudo R-squ.:                 0.01460
Time:                        15:54:12   Log-Likelihood:                -409.94
converged:                      False   LL-Null:                       -416.01
Covariance Type:            nonrobust   LLR p-value:                   0.05882
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                  -1.6143      1.122     -1.439      0.150      -3.813       0.584
been_married_binary     0.6457      0.236      2.739      0.006       0.184       1.108
age_preg        

## Other Questions

How do the timing and context of pregnancy relate to the decision to induce an abortion, and does this vary by schooling status?

Are there differences in reproductive health knowledge and contraceptive practices between pregnant and non‑pregnant adolescents?

How do social norms and attitudes influence teenage pregnancy risk and subsequent reproductive choices?

What role do different sources of sexual and reproductive health information play in shaping knowledge and practices that affect teenage pregnancy risk?

How do misconceptions or a lack of reproductive health knowledge correlate with the occurrence of teenage pregnancy?