# FORMATIVE ASSESSMENT OF ADOLESCENT GIRLS AND YOUNG WOMEN’S HIV, GENDER-BASED VIOLENCE AND SEXUAL AND REPRODUCTIVE HEALTH STATUS

## Background
Teenage pregnancy and motherhood have been a major health and social concern in Uganda, as they infringe upon the human rights of girls and hinder their ability to achieve their full socioeconomic potential. Teenagers who initiate sexual intercourse at a young age face an elevated risk of becoming pregnant and giving birth. The 2022 Uganda Demographic and Health Survey (UDHS) indicated that 23.5% of women age 15-19 had begun childbearing by the time of the survey: 18.4% had already had a live birth and 5.1% were pregnant with their first child.

Patterns by background characteristics:
* By age 16, 1 in 10 women age 15-19 has begun childbearing; this rises sharply to almost 4 in 10 by age 18 (Table 5.12).
* Teenagers in rural areas are more likely to have begun childbearing than those in urban areas: 25% of women age 15-19 in rural areas have begun childbearing, compared with 21% in urban areas.
* Teenage childbearing varies by region. The percentage of women age 15-19 who have begun childbearing ranges from 15% in Kigezi sub-region to 28%-30% in Busoga and Bukedi sub-regions.
* The proportion of women age 15-19 who have begun childbearing decreases with both education and wealth.

Regions: The districts surveyed were selected on the basis of HIV prevalence dynamics and implementing partner support. We went to districts where Global Fund-supported implementing partners were working to reduce the number of new HIV infections among adolescent girls and young women (AGYW) and to improve sexual and reproductive health (SRH) indicators (e.g., reducing teenage pregnancy) and gender-based violence (GBV) indicators.

## Data Analysis

This notebook presents the data analysis that addresses the research questions below.

### Data Loading

```python
# Libraries
import warnings
import os
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder
from scipy import stats
from scipy.stats import zscore
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm
from sklearn.metrics import r2_score

# Set-up environment
pd.options.display.float_format = '{:.2f}'.format
pd.set_option('display.max_colwidth', None)
sns.set_theme(style="whitegrid", context="paper")
os.chdir('/Users/nataschajademinnitt/Documents/5. Data Analysis/teenage_pregnancy')
print("Current directory:", os.getcwd())
warnings.filterwarnings("ignore")
```

```
Current directory: /Users/nataschajademinnitt/Documents/5. Data Analysis/teenage_pregnancy
```


```python
# Load the data
df_raw = pd.read_csv("./data/processed_df.csv")
```

```python
# BETWEEN GROUP
# Never been pregnant (currently 20 or older) | Been pregnant at 10-19 (irrespective of current age)
# .copy() makes df an independent DataFrame, so later column assignments
# (e.g. converting wealth_tertile) do not trigger SettingWithCopyWarning
df = df_raw.loc[
    ((df_raw['been_preg'] == 1) & (df_raw['age_preg'] <= 19))
    |
    ((df_raw['been_preg'] == 0) & (df_raw['age_completed'] >= 20))
].copy()

df['been_preg'].value_counts()
```

```
been_preg
1    1925
0    1513
Name: count, dtype: int64
```

```python
# BETWEEN GROUPS
# Not been pregnant (currently 10-19) | Been pregnant (currently 10-19)
df_ado = df_raw.loc[df_raw['age_completed'] <= 19].copy()
df_ado['been_preg'].value_counts()
```

```
been_preg
0    4216
1     629
Name: count, dtype: int64
```

```python
# WITHIN GROUP
# Been pregnant at 10-19 (irrespective of current age)
df_preg = df_raw.loc[(df_raw['been_preg'] == 1) & (df_raw['age_preg'] <= 19)].copy()
df_preg['been_preg'].value_counts()
```

```
been_preg
1    1925
Name: count, dtype: int64
```
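The three sampling frames above can also be written as one helper, which keeps the exposure-window rules in a single place. This is a minimal sketch assuming `df_raw` has the `been_preg`, `age_preg` and `age_completed` columns used above; the function name `build_samples` and the toy rows are hypothetical. Each sample is copied so that adding columns later never modifies a view of `df_raw`.

```python
import pandas as pd

def build_samples(df_raw: pd.DataFrame) -> dict:
    """Return the three analysis samples (hypothetical helper)."""
    preg_teen = (df_raw["been_preg"] == 1) & (df_raw["age_preg"] <= 19)
    never_preg_adult = (df_raw["been_preg"] == 0) & (df_raw["age_completed"] >= 20)
    return {
        # Between group: teenage pregnancy vs. women who passed age 19 without pregnancy
        "between": df_raw.loc[preg_teen | never_preg_adult].copy(),
        # Between group restricted to girls currently aged 10-19
        "adolescent": df_raw.loc[df_raw["age_completed"] <= 19].copy(),
        # Within group: everyone whose pregnancy occurred at 19 or younger
        "within": df_raw.loc[preg_teen].copy(),
    }

# Toy rows (hypothetical); NaN age_preg = never pregnant
toy = pd.DataFrame({
    "been_preg":     [1, 1, 0, 0, 1],
    "age_preg":      [17, 22, None, None, 18],
    "age_completed": [24, 25, 21, 16, 18],
})
samples = build_samples(toy)
print({name: len(s) for name, s in samples.items()})
# {'between': 3, 'adolescent': 2, 'within': 2}
```

Row 1 (pregnant at 22) falls outside every frame, and row 3 (never pregnant, aged 16) appears only in the adolescent frame, because she has not yet passed through the risk window.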

## Research Questions

### Socio-Demographic and Educational Factors

**1. How does household wealth predict the likelihood of teenage pregnancy?**

Sample: Between group (been pregnant = 1,925 | Not been pregnant = 1,513)

Interpretation:
* Girls in the Medium wealth group have odds of teenage pregnancy that are about 46% of the odds for girls in the Low wealth group (OR = exp(-0.7856) ≈ 0.46). This implies a 54% reduction in odds compared to the reference group (1 - 0.46 = 0.54).
* Girls in the High wealth group have odds of teenage pregnancy that are only 11% of those for girls in the Low wealth group (OR = exp(-2.2290) ≈ 0.11). This implies an 89% reduction in odds compared to the reference group (1 - 0.11 = 0.89).
* This model clearly shows a gradient in risk: as wealth increases, the odds of teenage pregnancy decrease significantly.
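Odds ratios are obtained by exponentiating the logit coefficients, OR = exp(β), and the percentage change in odds is (OR - 1) × 100. A short check using the two wealth coefficients printed in the model summary for this question (reference = Low wealth):

```python
import numpy as np

# Coefficients copied from the Logit summary for Question 1 (reference = Low wealth)
coefs = {"wealth_Medium": -0.7856, "wealth_High": -2.2290}

for name, beta in coefs.items():
    odds_ratio = np.exp(beta)            # odds relative to the Low wealth group
    pct_change = (odds_ratio - 1) * 100  # negative value = reduction in odds
    print(f"{name}: OR = {odds_ratio:.2f}, change in odds = {pct_change:.0f}%")

# wealth_Medium: OR = 0.46, change in odds = -54%
# wealth_High: OR = 0.11, change in odds = -89%
```

The same transformation applied to the `[0.025, 0.975]` columns of the summary gives 95% confidence intervals on the odds-ratio scale.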

Controls:
* attend_scol: Adds little information and can cause numerical problems, because 97% of the sample attend school.
* pre_preg_marriage: Almost all girls who have married go on to experience a teenage pregnancy, so using the marriage variable to predict pregnancy across groups can lead to (quasi-)complete separation and is not very informative.
* hh_vul: Household vulnerability was not a significant predictor, possibly because of the way the variable was computed.
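The first two exclusion reasons (a near-constant predictor and separation) can be screened before fitting. A minimal sketch, assuming binary 0/1 columns; the helper name `screen_binary_predictor` and the toy values are hypothetical:

```python
import pandas as pd

def screen_binary_predictor(df: pd.DataFrame, x: str, y: str) -> dict:
    """Flag a binary predictor that is nearly constant or separates the outcome."""
    # Share of observations in the most common level of x (e.g. 0.97 for attend_scol)
    share_of_mode = df[x].value_counts(normalize=True).max()
    table = pd.crosstab(df[x], df[y])
    # An empty cell means the outcome is perfectly predicted within one level of x,
    # which produces (quasi-)complete separation in a logistic regression
    has_empty_cell = bool((table == 0).any().any())
    return {"share_of_mode": round(float(share_of_mode), 2), "empty_cell": has_empty_cell}

# Toy data (hypothetical): every girl married before pregnancy has been pregnant
toy = pd.DataFrame({
    "pre_preg_marriage": [1, 1, 1, 0, 0, 0, 0, 0],
    "been_preg":         [1, 1, 1, 1, 0, 0, 1, 0],
})
print(screen_binary_predictor(toy, "pre_preg_marriage", "been_preg"))
```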

```python
# Set 'Low' as the reference category for wealth tertile
df['wealth_tertile'] = pd.Categorical(
    df['wealth_tertile'],
    categories=['Low', 'Medium', 'High'],
    ordered=True
)
# Create wealth dummies; with the category order above, drop_first drops 'Low'
wealth_dummies = pd.get_dummies(df['wealth_tertile'], prefix='wealth', drop_first=True)

# Concatenate dummies to df
df_model_cat = pd.concat([df, wealth_dummies], axis=1)

# Design matrix using the dummy column names
X_cat = df_model_cat[wealth_dummies.columns]
X_cat = sm.add_constant(X_cat)
X_cat = X_cat.astype(float)
y_cat = df_model_cat['been_preg']

# Fit the logistic regression
model_cat = sm.Logit(y_cat, X_cat).fit(disp=False)
print(model_cat.summary())

# Convert coefficients to odds ratios
or_cat = np.exp(model_cat.params)
print("Odds Ratios (categorical):\n", or_cat)
```

```
                           Logit Regression Results                           
Dep. Variable:              been_preg   No. Observations:                 3438
Model:                          Logit   Df Residuals:                     3435
Method:                           MLE   Df Model:                            2
Date:                Mon, 07 Apr 2025   Pseudo R-squ.:                  0.1326
Time:                        17:11:20   Log-Likelihood:                -2045.6
converged:                       True   LL-Null:                       -2358.3
Covariance Type:            nonrobust   LLR p-value:                1.543e-136
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             1.2050      0.067     17.938      0.000       1.073       1.337
wealth_Medium    -0.7856      0.090     -8.682      0.000      -0.963      -0.608
wealth_High      -2.2290      0.097    -
```

**2. Does pregnancy increase dropout risk among all adolescents?**

In the between-group sample used for Question 1, the non‑pregnant comparison group is limited to women aged ≥ 20. They have had more time to complete schooling, so the “out_completed” category would be over‑represented simply by virtue of age, which makes between‑group comparisons on school completion misleading. This question therefore compares girls who are currently aged 10-19.

Sample: Between group (been pregnant = 629 | Not been pregnant = 4,216)

Controls:
* Wealth: Richer girls may be both less likely to become pregnant and less likely to drop out, so wealth is a confounder.
* Marriage: Married adolescents have different dropout dynamics (e.g., spousal support) and different pregnancy risk.

Results:
* Adolescents who experienced a teenage pregnancy have 46% higher odds of dropping out, compared to their non‑pregnant peers of the same wealth and marital status (p = 0.016).

Interpretation:
* Because wealth and marital status are held constant in the model, the odds ratio for been_preg reflects the association between pregnancy and dropout net of differences in wealth and marriage.
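The same adjusted model can be sketched with the statsmodels formula interface, which names the reference category explicitly. In the array-based code for this question, if `wealth_tertile` is stored as plain strings, `pd.get_dummies(..., drop_first=True)` drops the alphabetically first level (`High`), so High becomes the reference. The sketch below runs on simulated data standing in for `df_ado`; choosing `Low` as the reference (to match Question 1) is an assumption, and the simulated effect sizes are arbitrary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for df_ado (hypothetical values, not survey data)
rng = np.random.default_rng(0)
n = 2000
toy = pd.DataFrame({
    "been_preg": rng.integers(0, 2, n),
    "been_married_binary": rng.integers(0, 2, n),
    "wealth_tertile": rng.choice(["Low", "Medium", "High"], n),
})
# Outcome generated so that pregnancy raises the log-odds of dropout
log_odds = -2.0 + 0.4 * toy["been_preg"] + 0.3 * toy["been_married_binary"]
toy["dropped_out"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Treatment(reference='Low') fixes Low as the wealth reference category
model = smf.logit(
    "dropped_out ~ been_preg + been_married_binary"
    " + C(wealth_tertile, Treatment(reference='Low'))",
    data=toy,
).fit(disp=False)

# Adjusted odds ratios with 95% confidence intervals
summary = pd.concat([np.exp(model.params), np.exp(model.conf_int())], axis=1)
summary.columns = ["OR", "CI_low", "CI_high"]
print(summary.round(2))
```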

```python
# Wealth tertiles: if wealth_tertile holds plain strings, drop_first drops the
# alphabetically first level ('High'), so High is the reference category
wealth_dummies = pd.get_dummies(
    df_ado['wealth_tertile'], prefix='wealth', drop_first=True
)

# Build design matrix: pregnancy, marriage and wealth dummies
X = pd.concat([
    df_ado[['been_preg', 'been_married_binary']],
    wealth_dummies
], axis=1)
X = sm.add_constant(X).astype(float)

y = df_ado['dropped_out']

# Fit adjusted logistic regression
model_adj = sm.Logit(y, X).fit(disp=False)

# Results
print(model_adj.summary())

or_adj = np.exp(model_adj.params)
print("\nAdjusted Odds Ratios:\n", or_adj)
```

```
                           Logit Regression Results                           
Dep. Variable:            dropped_out   No. Observations:                 4845
Model:                          Logit   Df Residuals:                     4840
Method:                           MLE   Df Model:                            4
Date:                Mon, 07 Apr 2025   Pseudo R-squ.:                 0.05202
Time:                        17:34:34   Log-Likelihood:                -1498.3
converged:                       True   LL-Null:                       -1580.5
Covariance Type:            nonrobust   LLR p-value:                 1.638e-34
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                  -3.2188      0.128    -25.185      0.000      -3.469      -2.968
been_preg               0.3751      0.155      2.416      0.016       0.071       0.679
been_married_bin
```

**3. Among girls who experienced a teenage pregnancy, what factors predict who drops out versus completes (or stays in) school?**

Isolate Protective vs. Risk Pathways:
* Before: Marriage may provide support (financial, social) that helps pregnant girls stay in school.
* After: Marriage as a response to pregnancy may not confer the same protective benefits.
* Never: Girls without spousal support may be most vulnerable to dropping out.
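The three pathways correspond to comparing age at first marriage with age at first pregnancy. The notebook does not show how `marriage_timing` was built, so the following is only a sketch: the column `age_marriage` is hypothetical, and the tie rule (marriage in the same year as the pregnancy counts as `after`) is an assumption.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical); NaN age_marriage = never married
toy = pd.DataFrame({
    "age_preg":     [16, 17, 18, 15],
    "age_marriage": [15, 18, np.nan, 15],
})

# np.select returns the label of the first condition that is True for each row
conditions = [
    toy["age_marriage"].isna(),               # never married
    toy["age_marriage"] < toy["age_preg"],    # married before the pregnancy
]
toy["marriage_timing"] = np.select(conditions, ["never", "before"], default="after")
print(toy)
```

With `pd.get_dummies(..., prefix='married')`, these labels produce the `married_after`, `married_before` and `married_never` columns used in the model for this question.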

Sample: Within group (been pregnant = 1,925)

Results:
* married_after (OR = 0.30): Girls who married after their teenage pregnancy have 70% lower odds of dropping out compared to girls who never married (the reference group).
* married_before (OR = 0.51): Girls who married before their teenage pregnancy have 49% lower odds of dropping out compared to never‑married pregnant girls.
* Low wealth (OR = 2.85): Low‑wealth pregnant girls have nearly 3 times the odds of dropping out compared to high‑wealth peers.
* Medium wealth (OR = 2.32): Medium‑wealth pregnant girls have over twice the odds of dropping out compared to high‑wealth peers.

Interpretation:
* Marriage Provides Protection: Both pre‑ and post‑pregnancy marriage are associated with substantially reduced dropout risk relative to never‑married pregnant girls.
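* Greater Protection for Post‑Pregnancy Marriage: The point estimate is stronger for “married_after” (70% reduction) than for “married_before” (49% reduction), which suggests that securing a marital partnership after becoming pregnant may offer critical support (financial, emotional, or social) that helps girls stay in or return to school. However, the married_before coefficient (-0.6752) lies within the 95% confidence interval for married_after (-1.808 to -0.581), so this difference should be interpreted cautiously.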
* Wealth Remains Crucial: Even among pregnant girls, those from poorer households are far more likely to drop out, underscoring the intersection of economic and marital support.

```python
# Create dummies for marriage timing with 'never' as reference
timing_dummies = pd.get_dummies(df_preg['marriage_timing'], prefix='married', drop_first=False)

# Drop the 'married_never' column to make it the reference
timing_dummies = timing_dummies.drop(columns=['married_never'])

# Wealth dummies: with string labels, drop_first drops 'High' (first alphabetically),
# so High is the reference category
wealth_dummies = pd.get_dummies(df_preg['wealth_tertile'], prefix='wealth', drop_first=True)

# Combine marriage-timing and wealth dummies into the design matrix
X = pd.concat([timing_dummies, wealth_dummies], axis=1)
X = sm.add_constant(X).astype(float)
y = df_preg['dropped_out']

model_ref = sm.Logit(y, X).fit(disp=False)
print(model_ref.summary())

# Odds ratios
or_ref = np.exp(model_ref.params)
print("\nOdds Ratios (reference = never married):\n", or_ref)
```

```
                           Logit Regression Results                           
Dep. Variable:            dropped_out   No. Observations:                 1925
Model:                          Logit   Df Residuals:                     1920
Method:                           MLE   Df Model:                            4
Date:                Mon, 07 Apr 2025   Pseudo R-squ.:                 0.03252
Time:                        17:24:13   Log-Likelihood:                -649.89
converged:                       True   LL-Null:                       -671.73
Covariance Type:            nonrobust   LLR p-value:                 7.454e-09
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const             -2.5551      0.270     -9.447      0.000      -3.085      -2.025
married_after     -1.1945      0.313     -3.817      0.000      -1.808      -0.581
married_before    -0.6752      0.155
```

### Sexual Behavior and Contraceptive Use

**4. Does the age at first sexual intercourse and the context of that encounter influence the risk of teenage pregnancy?**

Revised subsetting:
* Between‑groups: Compare girls who experienced pregnancy (n = 1,925) with girls who have fully passed through the risk period (n = 1,513).
* Rationale: Using girls older than 19 in the non‑pregnant group gives a full account of sexual behavior and risk exposure without the censoring of younger non‑pregnant girls.
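For this question, age at first sexual intercourse is typically banded before modelling so that any risk gradient is visible across categories. A minimal sketch with `pd.cut`; the column name `age_first_sex` and the band edges are assumptions, not variables confirmed in this dataset.

```python
import pandas as pd

# Toy ages at first sex (hypothetical values)
toy = pd.DataFrame({"age_first_sex": [12, 14, 15, 16, 17, 18, 19, 21]})

# Bins are right-closed: (9, 14] -> "10-14", (14, 17] -> "15-17",
# (17, 19] -> "18-19", (19, 49] -> "20+"
toy["first_sex_band"] = pd.cut(
    toy["age_first_sex"],
    bins=[9, 14, 17, 19, 49],
    labels=["10-14", "15-17", "18-19", "20+"],
)
print(toy["first_sex_band"].value_counts(sort=False))
```

The banded variable can then enter a logistic model as a categorical term alongside indicators for the context of the first encounter.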

**5. How does early initiation of contraceptive use predict sustained use and lower rates of teenage pregnancy?**

Dual approach:
* For contraceptive behavior: Analyze within the subset of sexually active girls regardless of current age, tracking those who used contraception at first sex versus those who did not.
* For pregnancy risk: Compare the contraceptive patterns of the 1,925 girls with teenage pregnancy to the 1,513 women who are older than 19 and never experienced pregnancy.
* Rationale: This approach allows you to assess both the immediate impact of early contraceptive use and its association with having avoided pregnancy over the full risk period.
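The pregnancy-risk half of this approach reduces to a two-by-two comparison of contraceptive use at first sex across the two groups. A sketch using `scipy.stats.chi2_contingency` together with the odds ratio from the same table; the column name `used_fp_first_sex` and the toy counts are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy individual-level data (hypothetical): 1 = used contraception at first sex
toy = pd.DataFrame({
    "been_preg":         [1] * 50 + [0] * 50,
    "used_fp_first_sex": [1] * 10 + [0] * 40 + [1] * 25 + [0] * 25,
})

# Rows: contraceptive use at first sex; columns: pregnancy group
table = pd.crosstab(toy["used_fp_first_sex"], toy["been_preg"])
chi2, p_value, dof, expected = chi2_contingency(table)

# Odds of being in the pregnant group among users vs. non-users
odds_users = table.loc[1, 1] / table.loc[1, 0]
odds_non_users = table.loc[0, 1] / table.loc[0, 0]
print(table)
print(f"OR = {odds_users / odds_non_users:.2f}, chi2 p = {p_value:.4f}")
```

With these toy counts the odds ratio is 0.25 (10/25 divided by 40/25). In the real analysis a logistic model would allow adjustment for wealth and age at first sex.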

**6. Are there differences in reproductive health knowledge and contraceptive practices between pregnant and non‑pregnant adolescents?**

Revised subsetting:
* Between‑groups: Again, compare the 1,925 girls (pregnancy event during 10–19) with the 1,513 women older than 19 who have never been pregnant.
* Rationale: This contrast ensures that non‑pregnant girls have had the full window of exposure, which makes differences in knowledge and practices more interpretable.

### Marital Status and Social Norms

**7. Does marriage or a consensual union mediate the relationship between teenage pregnancy and school dropout?**

Revised subsetting:
* Within‑group: Focus on the 1,925 girls who experienced teenage pregnancy, using retrospective data on marital status and schooling at the time of the event.
* Rationale: Since marital status can change over time, using the pregnant subgroup helps clarify the temporal ordering and mediating role of marriage.

**8. How do social norms and attitudes influence teenage pregnancy risk and subsequent reproductive choices?**

Revised subsetting:
* Between‑groups: Compare attitudes among the 1,925 girls (pregnant during 10–19) with those of the 1,513 women (non‑pregnant, aged >19).
* Rationale: This comparison leverages the complete exposure period for the non‑pregnant group, allowing you to assess whether certain attitudes correlate with having experienced pregnancy.

### Pregnancy Outcomes and Abortion Practices

**9. Among those who experienced teenage pregnancy, what is the prevalence of induced abortion, and what factors predict the likelihood of seeking an abortion?**

Subsetting remains:
* Within‑group: Focus on the 1,925 girls who experienced pregnancy between ages 10 and 19.
* Rationale: This focused subgroup allows for detailed analysis of pregnancy outcomes, including abortion practices.
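Prevalence here is a proportion within the pregnant subgroup, so it is usually reported with a confidence interval. A sketch using `statsmodels.stats.proportion.proportion_confint` with the Wilson method; the outcome name `induced_abortion` and the toy counts are hypothetical.

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

# Toy outcome (hypothetical): 1 = reported an induced abortion, one entry per girl
induced_abortion = np.array([1] * 50 + [0] * 450)

count = int(induced_abortion.sum())
nobs = induced_abortion.size
# Wilson intervals behave better than the normal approximation for small proportions
low, high = proportion_confint(count, nobs, alpha=0.05, method="wilson")
print(f"Prevalence = {count / nobs:.1%} (95% CI {low:.1%} to {high:.1%})")
```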

**10. How do the timing and context of pregnancy relate to the decision to induce an abortion, and does this vary by schooling status?**

Subsetting remains:
* Within‑group: Analyze the 1,925 girls who experienced teenage pregnancy.
* Rationale: This allows you to assess the interplay of timing, contextual factors (e.g., age at pregnancy, schooling status), and abortion decisions without additional confounding from non‑pregnant girls.
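Asking whether the relationship "varies by schooling status" is an effect-modification question, which a logistic model answers with an interaction term. A sketch on simulated data with the statsmodels formula interface; `induced_abortion`, `in_school_at_preg` and the simulated effect sizes are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the pregnant subgroup (hypothetical values)
rng = np.random.default_rng(2)
n = 1500
toy = pd.DataFrame({
    "age_preg": rng.integers(13, 20, n),
    "in_school_at_preg": rng.integers(0, 2, n),
})
log_odds = -4.0 + 0.1 * toy["age_preg"] + 0.5 * toy["in_school_at_preg"]
toy["induced_abortion"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# "a * b" expands to both main effects plus the a:b interaction
model = smf.logit(
    "induced_abortion ~ age_preg * in_school_at_preg", data=toy
).fit(disp=False)
print(model.summary())

# Ratio of per-year odds ratios for age_preg: in school vs. not in school
print("Interaction OR:", np.exp(model.params["age_preg:in_school_at_preg"]))
```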

### Information Sources and Health Knowledge

**11. What role do different sources of sexual and reproductive health information play in shaping knowledge and practices that affect teenage pregnancy risk?**

Revised subsetting:
* Between‑groups: Compare the 1,925 girls (pregnant during 10–19) with the 1,513 women (non‑pregnant, aged >19).
* Rationale: Using the full risk window for the non‑pregnant group provides a more definitive comparison of the influence of information sources.

**12. How do misconceptions or a lack of reproductive health knowledge correlate with the occurrence of teenage pregnancy?**

Revised subsetting:
* Between‑groups: Compare the 1,925 girls with teenage pregnancy to the 1,513 older, non‑pregnant women.
* Rationale: This approach minimizes the potential misclassification of non‑pregnant girls who are still at risk, as the older group has already passed through the adolescent risk window.