# Does smoking cause low birth weight?

We work for a clinic for healthy mothers and have been asked to determine whether smoking causes low birth weight. We would also like to identify other factors that contribute to low birth weight.

![](https://www.verywellfamily.com/thmb/xVJtWvUQ4OG6Zo688kUEUc-chto=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/baby-birth-weight-statistics-9b84ab741b2e4eefbdc4f399037558f9.jpg)

Image Source: [Physical Growth in Newborns](https://www.uofmhealth.org/health-library/te6295)

## Table of Contents
---
- [Birth Weight Analysis: Summary](#Birth-Weight-Analysis:-Summary)
- [Import Libraries and Configure Notebook Environment](#Import-Libraries-and-Configure-Notebook-Environment)
- [Understanding the Data](#Understanding-the-Data)
- [Hypothesis](#Hypothesis)
- [Conclusions](#Conclusions)

## Birth Weight Analysis: Summary

We set out to determine whether smoking causes low birth weight.

Low birth weight is associated with a wide range of negative health outcomes: fetal and neonatal mortality and morbidity, inhibited growth and cognitive development, and noncommunicable diseases in adulthood. Infants with low birth weight are about 20 times more likely to die than those with a higher birth weight. The clinic hopes to give prospective mothers information that could help them have children with healthy birth weights.

I am interested in understanding what causes low birth weight because I was born prematurely and weighed approximately 3.0 pounds.

**Our analysis tells us that ... to be filled in more completely after finishing analysis.**

## Import Libraries and Configure Notebook Environment

In [2]:
# Load libraries and configure environment
import pandas as pd
import numpy as np
# The inline backend renders figures directly in the notebook,
# so no separate matplotlib.use(...) call is needed
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Understanding the Data

To avoid spending money on organizing a survey, we'll first try to make use of existing data to determine whether we can reach any reliable result.

One good candidate for our purpose is the [Stats Lab](https://www.stat.berkeley.edu/~statlabs/labs.html) birth weight data, which is publicly available in [this GitHub repository](https://github.com/data-8/textbook). Below, we do a quick exploration of the `baby.csv` file stored in the repository's `assets/data` folder ([direct link](https://github.com/data-8/textbook/tree/main/assets/data/baby.csv)). The code reads the file from the working directory, so a copy of `baby.csv` must be saved there first.

In [5]:
births = pd.read_csv("baby.csv")

# Convert column headers to lower case for ease of coding
births.columns = births.columns.str.lower()

# Inspect data
print(births.shape)

births.info()
births.head()

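
`pd.read_csv("baby.csv")` only succeeds when a copy of the file sits in the working directory. One possible workaround, sketched below, is to fall back to a URL when the local file is missing. The helper name `births_source` and the constant `RAW_URL` are hypothetical, and the raw-file address is an assumption derived from the repository link above; it has not been verified here.

```python
from pathlib import Path

# Assumption: raw-file counterpart of the repository link given in the text.
RAW_URL = "https://raw.githubusercontent.com/data-8/textbook/main/assets/data/baby.csv"


def births_source(local_path="baby.csv", url=RAW_URL):
    """Return the local path if the file exists, otherwise the fallback URL."""
    return local_path if Path(local_path).exists() else url


# pd.read_csv accepts both file paths and URLs, so in the notebook:
# births = pd.read_csv(births_source())
print(births_source())
```

Preferring the local copy keeps the notebook usable offline and avoids downloading the file on every run.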

## Hypothesis

As discussed above, we want to test whether smoking affects birth weight.

**H₀:** In the population, the birth weight distribution is the same whether the mother smokes or not. Smoking does not impact birth weight.

**Hₐ:** In the population, smokers' babies have a lower birth weight on average than those of non-smokers.
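
One common way to test hypotheses of this form is a permutation test on the difference in group means; the notebook does not say which test it will use, so the following is only a sketch under that assumption. The function name `permutation_test` and its parameters are hypothetical, and the data are synthetic numbers generated for illustration, not values from `baby.csv`.

```python
import numpy as np


def permutation_test(weights, smoker, n_permutations=5000, seed=0):
    """One-sided permutation test for mean(smokers) - mean(non-smokers)."""
    weights = np.asarray(weights, dtype=float)
    smoker = np.asarray(smoker, dtype=bool)
    rng = np.random.default_rng(seed)

    def mean_difference(labels):
        return weights[labels].mean() - weights[~labels].mean()

    observed = mean_difference(smoker)
    # Under H0 the labels are exchangeable, so shuffle them and recompute
    simulated = np.array([
        mean_difference(rng.permutation(smoker)) for _ in range(n_permutations)
    ])
    # Share of shuffles whose difference is at least as negative as observed
    p_value = np.mean(simulated <= observed)
    return observed, p_value


# Synthetic data for illustration only; NOT the values in baby.csv
demo_rng = np.random.default_rng(42)
demo_weights = np.concatenate([
    demo_rng.normal(120, 17, 300),  # made-up non-smoker weights (oz)
    demo_rng.normal(110, 17, 200),  # made-up smoker weights (oz)
])
demo_smoker = np.array([False] * 300 + [True] * 200)

observed, p_value = permutation_test(demo_weights, demo_smoker, n_permutations=2000)
print(f"observed difference: {observed:.2f} oz, p-value: {p_value:.4f}")
```

With the real table, the call would look like `permutation_test(births["birth weight"], births["maternal smoker"])`, assuming the smoker column is read as boolean. The p-value is one-sided because Hₐ states a direction (smokers' babies weigh less).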

## Data

The data come from [Stats Lab](https://www.stat.berkeley.edu/~statlabs/labs.html) and the textbook [Computational and Inferential Thinking](https://github.com/data-8/textbook). The dataset contains 1,174 observations and 6 variables. More specifically,
* `birth weight`: represents the baby's birth weight in ounces
* `gestational days`: represents the number of gestational days
* `maternal age`: represents the mother's age in years
* `maternal height`: represents the mother's height in inches
* `maternal pregnancy weight`: represents the mother's pregnancy weight in pounds
* `maternal smoker`: represents whether or not the mother smoked during pregnancy
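
Because `birth weight` is recorded in ounces, deciding whether a birth counts as "low" requires a unit conversion and a cutoff. The notebook does not define one; the sketch below assumes the World Health Organization definition (below 2,500 g). The function and constant names are hypothetical.

```python
GRAMS_PER_OUNCE = 28.349523125  # one avoirdupois ounce in grams
LOW_BIRTH_WEIGHT_GRAMS = 2500   # assumed cutoff: WHO definition, below 2,500 g


def ounces_to_grams(ounces):
    """Convert a weight in ounces to grams."""
    return ounces * GRAMS_PER_OUNCE


def is_low_birth_weight(ounces):
    """Return True when the weight in ounces falls below the assumed cutoff."""
    return ounces_to_grams(ounces) < LOW_BIRTH_WEIGHT_GRAMS


print(is_low_birth_weight(88))   # 88 oz is about 2,495 g
print(is_low_birth_weight(120))  # 120 oz is about 3,402 g
```

Both functions use only arithmetic and comparison, so they should also work elementwise on a pandas column such as `births["birth weight"]`.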
