# Mothers’ Lifestyle Characteristics Impact on Infant Birth Weight

Team B1: Ayingfu (#82951880), Bryant Hartono (#66162710), Jeremy Davies (#74883935), Jingxuan Ma (#49183288)

## 1. Introduction

#### 1.1 Data Source

The dataset for this project is the “birthwt” dataset from the MASS package in R (Support Functions and Datasets for Venables and Ripley's MASS).

Relevant citation for the MASS package:  
Venables WN, Ripley BD (2002). Modern Applied Statistics with S, Fourth edition. Springer, New York. ISBN 0-387-95457-0, https://www.stats.ox.ac.uk/pub/MASS4/.

#### 1.2 Dataset Description

**Dataset detail**  
The dataset consists of 189 observations and 10 columns (see the column details below). The data were collected at Baystate Medical Center, Springfield, Massachusetts, during 1986; no specific study setting is documented. The study subjects were 189 mothers, 59 of whom had low-birth-weight babies and 130 of whom had normal-birth-weight babies. Characteristics of each mother’s lifestyle and health were recorded during the pregnancy, and the weight of the newborn infant was recorded at birth.
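
These counts can be checked directly against the data. The following is a minimal sketch, assuming the MASS package is installed. It relies on the `low` column, which MASS documents as an indicator of birth weight below 2.5 kg; the names `n_obs`, `n_cols`, and `low_counts` are ours and purely illustrative.

```r
library(MASS)  # provides the birthwt data frame

# Dimensions: number of observations (rows) and columns
n_obs  <- nrow(birthwt)
n_cols <- ncol(birthwt)

# Tabulate the low-birth-weight indicator (0 = normal, 1 = low)
low_counts <- table(birthwt$low)

print(c(observations = n_obs, columns = n_cols))
print(low_counts)
```

The `low` column is the tenth column of the dataset; it is derived from `bwt`, which is why it does not appear among the explanatory variables listed below.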

**Response variable**  
Neonate birth weight (‘bwt’), measured in grams.

**Potential explanatory variables**  
1) Numeric variables (continuous or count)  
- age: Mother’s age (years)
- lwt: Mother's weight (pounds) at last menstrual period
- ptl: Number of previous premature labors
- ftv: Number of physician visits during the first trimester  
  
2) Categorical variables  
- race: Mother's race (“1” = White, “2” = Black, “3” = Other)
- smoke: Smoking status during the pregnancy (“1” = Yes, “0” = No)
- ht: History of hypertension (“1” = Yes, “0” = No)
- ui: Presence of uterine irritability (“1” = Yes, “0” = No)
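
In the raw data, the categorical variables are stored as numeric codes rather than as factors, so they need to be converted before modelling. The sketch below, assuming the MASS package is installed, shows one way to inspect how each column is stored and which values the two count variables take. The names `col_types` and `count_values` are illustrative.

```r
library(MASS)  # provides the birthwt data frame

# Storage type of every column in the raw data
col_types <- sapply(birthwt, class)
print(col_types)

# Distinct values taken by the two count variables
count_values <- lapply(birthwt[c("ptl", "ftv")], function(x) sort(unique(x)))
print(count_values)
```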

#### 1.3 Research Question / Motivation

There is some evidence (and some speculation) to suggest that heavier babies have lower infant mortality rates and perhaps also a lower risk of cardiovascular disease later in life. Our group is therefore interested in studying the factors that explain an infant’s birth weight. In particular, we are interested in whether the mother’s weight, race, and smoking status, among other variables, are associated with infant birth weight.  
  
If we can identify the contributing factors and reasonably predict a baby’s birth weight, this would enable better preparation of healthcare at birth. For example, if a model predicts that a baby will have a low birth weight, the obstetrician could recommend that the birth take place in a dedicated neonatal ward, where more specialized resources are available to ensure a healthy birth.

## 2. Analysis

#### 2.1 Reading in data

In [1]:
# Necessary libraries to install:
# - 'tidyverse', 'cowplot' , 'tidymodels', 'repr', 'MASS', 'leaps'

# Load libraries
library(tidyverse)
library(cowplot)
library(tidymodels)
library(repr)
library(MASS)  # loaded after dplyr, so MASS::select masks dplyr::select; use dplyr::select() explicitly

# Suppress table outputs to a manageable # of rows
options(repr.matrix.max.rows = 10)

# Read data
data <- birthwt


#### 2.2 Clean and wrangle data

In [6]:
# Select only potentially meaningful columns (variables) and re-name categorical levels for easier understanding
data_selected <- data %>%
    dplyr::select(age, lwt, ptl, ftv, race, smoke, ht, ui, bwt) %>%
    mutate(race = case_when(
      race == 1  ~ "White",
      race == 2  ~ "Black",
      TRUE       ~ "Other"
    ), 
    smoke = case_when(smoke == 0  ~ "No", smoke == 1  ~ "Yes"), 
    ht = case_when(ht == 0  ~ "No", ht == 1  ~ "Yes"),
    ui = case_when(ui == 0  ~ "No", ui == 1  ~ "Yes")
)

# Convert categorical variables to factors in R
data_selected$race <- as.factor(data_selected$race)
data_selected$smoke <- as.factor(data_selected$smoke)
data_selected$ht <- as.factor(data_selected$ht)
data_selected$ui <- as.factor(data_selected$ui)

# Relevel race so that the level with the most observations (White) is the reference
data_selected$race <- relevel(data_selected$race, ref="White")  

# Inspect clean dataframe
data_selected

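| | age | lwt | ptl | ftv | race | smoke | ht | ui | bwt |
|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<int>` |
| 85 | 19 | 182 | 0 | 0 | Black | No | No | Yes | 2523 |
| 86 | 33 | 155 | 0 | 3 | Other | No | No | No | 2551 |
| 87 | 20 | 105 | 0 | 1 | White | Yes | No | No | 2557 |
| 88 | 21 | 108 | 0 | 2 | White | Yes | No | Yes | 2594 |
| 89 | 18 | 107 | 0 | 0 | White | Yes | No | Yes | 2600 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 79 | 28 | 95 | 0 | 2 | White | Yes | No | No | 2466 |
| 81 | 14 | 100 | 0 | 2 | Other | No | No | No | 2495 |
| 82 | 23 | 94 | 0 | 0 | Other | Yes | No | No | 2495 |
| 83 | 17 | 142 | 0 | 0 | Black | No | Yes | No | 2495 |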
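
As a sanity check on the recoding, the same result can be sketched in base R by attaching labels to the integer codes directly. This is an illustrative alternative, not the code used in the analysis; it assumes only that the MASS package is installed, and the data frame name `check` is ours.

```r
library(MASS)  # provides the birthwt data frame

# Keep the same nine columns as the cleaning step
check <- birthwt[, c("age", "lwt", "ptl", "ftv", "race", "smoke", "ht", "ui", "bwt")]

# Label the integer codes directly. Listing "White" first makes it the
# reference level, so no separate relevel() call is needed.
check$race  <- factor(check$race,  levels = c(1, 2, 3), labels = c("White", "Black", "Other"))
check$smoke <- factor(check$smoke, levels = c(0, 1), labels = c("No", "Yes"))
check$ht    <- factor(check$ht,    levels = c(0, 1), labels = c("No", "Yes"))
check$ui    <- factor(check$ui,    levels = c(0, 1), labels = c("No", "Yes"))

# No value should have been turned into NA by the conversion
stopifnot(sum(is.na(check)) == 0)

# Level order and counts per category
print(levels(check$race))
print(summary(check[c("race", "smoke", "ht", "ui")]))
```

One design difference is worth noting: `factor()` with explicit `levels` turns any unexpected code into `NA`, whereas the `TRUE ~ "Other"` branch of `case_when()` silently maps it to "Other". The `NA` check above therefore also confirms that `race` contains only the codes 1, 2, and 3.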


## 3. Conclusion