###### Content under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2017 L.A. Barba, N.C. Clementi

# Lead in Lipstick

After completing [Lesson 1](http://go.gwu.edu/engcomp2lesson1) and [Lesson 2](http://go.gwu.edu/engcomp2lesson2) of "Take off with stats," Module 2 of our course in _Engineering Computations_, here we'll work out a full example of what you can do with all that you've learned.

This example is based on the lecture by Prof. Kristin Sainani at Stanford, ["Exploring real data: lead in lipstick"](https://youtu.be/nlKIT-_b2jU), from her online course ["Statistics in Medicine"](https://lagunita.stanford.edu/courses/Medicine/MedStats-SP/SelfPaced/about). We followed along with her narration, searched online for the sources she cited and the data from the FDA studies, and worked out the descriptive statistics using Python. We hope you'll enjoy it!

## In the news

In 2007, some alarming reports appeared in the media: a US consumer-rights group had tested 33 brand-name lipsticks, and found that 61% had detectable lead levels of 0.03 to 0.65 parts per million (ppm). A full one-third of the lipsticks tested exceeded the lead level set by the US Food and Drug Administration (FDA) as the limit for candy: 0.1 ppm. Here are some media reports:

* Reuters published on Oct. 12, 2007: [Lipsticks contain lead, consumer group says](https://www.reuters.com/article/us-lipstick-lead/lipsticks-contain-lead-consumer-group-says-idUSN1140964520071012)—it quotes a doctor as saying: "Lead builds up in the body over time and lead-containing lipstick applied several times a day, every day, can add up to significant exposure levels."
* CTV.ca News published [FDA to examine claim of lead levels in lipstick](http://www.ctvnews.ca/fda-to-examine-claim-of-lead-levels-in-lipstick-1.259946)—it quoted one member of the Campaign for Safe Cosmetics as saying: "We want the companies to immediately re-formulate their products to get the lead out and ultimately, really we need to change the laws and force these companies to be accountable to women's health."
* The New York Times was more measured in [The Claim: Some Red Lipstick Brands Contain High Lead Levels](http://www.nytimes.com/2007/11/13/health/13real.html) (Nov. 13, 2007), concluding: "Studies have found that lead in lipstick is not a cause for concern, but research is continuing."

The FDA did carry out new studies in 2009 and 2012 to try to determine if lead content was a concern for lipstick users. These new studies generated some new scary headlines!

* In the Washington Post: [400 lipsticks found to contain lead, FDA says](https://www.washingtonpost.com/business/economy/400-lipstick-brands-contain-lead-fda-says/2012/02/14/gIQAhOyeDR_story.html?utm_term=.e3622592e0e7), where the FDA is quoted as stating: "We do not consider the lead levels we found in the lipsticks to be a safety concern…"
* In Time Magazine: [What’s in Your Lipstick? FDA Finds Lead in 400 Shades](http://healthland.time.com/2012/02/15/whats-in-your-lipstick-fda-finds-lead-in-400-shades/), where a campaigner is quoted as saying: "We want to see the FDA recommend a limit based on the lowest level a company can achieve, like candy manufacturers are required."

## The FDA studies

In [None]:
import numpy
import pandas
from matplotlib import pyplot
%matplotlib inline

# Import rcParams to set font styles
from matplotlib import rcParams

# Set font style and size
rcParams['font.family'] = 'serif'
rcParams['font.size'] = 16

In [None]:
# Load the FDA 2009 data set using pandas, and assign it to a dataframe
leadlips2009 = pandas.read_csv("../../data/FDA2009-lipstickdata.csv")

In [None]:
leadlips2009[0:5]

In [None]:
leadlips2009.hist(column='Pb ppm', bins=4, edgecolor='white');

In [None]:
lead2009 = leadlips2009['Pb ppm'].values

In [None]:
pyplot.figure(figsize=(6,4))
pyplot.hist(lead2009, bins=4, color='#3498db', histtype='bar', edgecolor='white') 
pyplot.title('Lead levels in lipstick, n=22 (2009) \n')
pyplot.xlabel('ppm')
pyplot.ylabel('Count');

In [None]:
print('The mean value is {:.2f}'.format(leadlips2009['Pb ppm'].mean()))
print('The median is {:.2f}'.format(leadlips2009['Pb ppm'].median()))
print('The standard deviation is {:.2f}'.format(leadlips2009['Pb ppm'].std()))
print('The maximum value is {:.2f}'.format(leadlips2009['Pb ppm'].max()))

In [None]:
print('The 99th percentile is {:.2f}'.format(leadlips2009['Pb ppm'].quantile(.99)))
print('The 95th percentile is {:.2f}'.format(leadlips2009['Pb ppm'].quantile(.95)))
print('The 90th percentile is {:.2f}'.format(leadlips2009['Pb ppm'].quantile(.90)))
print('The 75th percentile is {:.2f}'.format(leadlips2009['Pb ppm'].quantile(.75)))

In [None]:
# Load the FDA 2012 data set using pandas, and assign it to a dataframe
leadlips2012 = pandas.read_csv("../../data/FDA2012-lipstickdata.csv")

In [None]:
leadlips2012[0:5]

In [None]:
leadlips2012.hist(column='Lead (ppm)', bins=10, edgecolor='white');

In [None]:
print('The mean value is {:.2f}'.format(leadlips2012['Lead (ppm)'].mean()))
print('The median is {:.2f}'.format(leadlips2012['Lead (ppm)'].median()))
print('The standard deviation is {:.2f}'.format(leadlips2012['Lead (ppm)'].std()))
print('The maximum value is {:.2f}'.format(leadlips2012['Lead (ppm)'].max()))

The mean value, median, and standard deviation did not change much between the 2009 and 2012 studies, even though the earlier study only tested 22 samples. As Prof. Sainani points out, this goes to show that you can begin to describe a feature even with modest sample sizes.

The maximum value in the second study was a lot higher: 7.19 ppm compared to 3.06 ppm. The reason is that a _right-skewed_ distribution like this one has infrequent occurrences of much higher lead concentrations, and these are more likely to show up as the sample size grows.
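
To see this effect in isolation, here is a minimal sketch using simulated data. It is not the FDA data: we assume a lognormal distribution as a stand-in for a right-skewed population, with illustrative parameters that are not fitted to anything. The helper `simulate` is a hypothetical name introduced only for this example. We draw many samples of size 22 and of size 400, and compare the typical median and the typical maximum for each sample size.

```python
import numpy

# Hypothetical right-skewed population: a lognormal distribution.
# The parameters are illustrative only; they are NOT fitted to the FDA data.
rng = numpy.random.default_rng(seed=0)

def simulate(n, trials=2000):
    """Draw `trials` samples of size n; return each sample's median and maximum."""
    samples = rng.lognormal(mean=0.0, sigma=1.0, size=(trials, n))
    return numpy.median(samples, axis=1), samples.max(axis=1)

medians_small, maxima_small = simulate(22)
medians_large, maxima_large = simulate(400)

print('Average median,  n=22:  {:.2f}'.format(medians_small.mean()))
print('Average median,  n=400: {:.2f}'.format(medians_large.mean()))
print('Average maximum, n=22:  {:.2f}'.format(maxima_small.mean()))
print('Average maximum, n=400: {:.2f}'.format(maxima_large.mean()))
```

Under these assumptions, the average median should barely move between the two sample sizes, while the average maximum should be noticeably larger for the bigger samples, which matches the pattern we saw in the two FDA studies.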

In [None]:
print('The 99th percentile is {:.2f}'.format(leadlips2012['Lead (ppm)'].quantile(.99)))
print('The 95th percentile is {:.2f}'.format(leadlips2012['Lead (ppm)'].quantile(.95)))
print('The 90th percentile is {:.2f}'.format(leadlips2012['Lead (ppm)'].quantile(.90)))
print('The 75th percentile is {:.2f}'.format(leadlips2012['Lead (ppm)'].quantile(.75)))

In [None]:
leadlips2012.boxplot(column='Lead (ppm)', figsize=(6,8));

The box plot also indicates a right-skewed distribution, and shows a number of outliers on the high end of the range: some lipsticks have an especially high level of lead.
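
If you are curious how a box plot decides which points to draw as outliers, here is a small sketch of the usual rule. We assume the default whisker setting, where any point farther than 1.5 times the interquartile range (IQR) beyond the quartiles is flagged. The `values` array below is made up for illustration; it is not taken from the FDA data set.

```python
import numpy

# Made-up concentrations in ppm, for illustration only (NOT the FDA data)
values = numpy.array([0.2, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 1.8, 3.5, 7.0])

# First and third quartiles, and the interquartile range
q1, q3 = numpy.percentile(values, [25, 75])
iqr = q3 - q1

# Fences at 1.5 x IQR beyond the quartiles (assumed default whisker rule)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the ones drawn individually as outliers
outliers = values[(values < lower_fence) | (values > upper_fence)]

print('Upper fence: {:.3f}'.format(upper_fence))
print('Outliers:', outliers)
print('Mean: {:.2f}, median: {:.2f}'.format(values.mean(), numpy.median(values)))
```

In this made-up sample, only the two largest values fall above the upper fence, and the mean is larger than the median: both are typical signs of a right-skewed distribution.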

## Lipstick exposure

## References

1. [Limiting Lead in Lipstick and Other Cosmetics](https://www.fda.gov/cosmetics/productsingredients/products/ucm137224.htm#reference1), US Food and Drug Administration.
2. European consumer exposure to cosmetic products, a framework for conducting population exposure assessments  (2007). Hall, B., et al., _Food and Chemical Toxicology_ **45**(11): 2097-2108. [Available on PubMed.](https://www.ncbi.nlm.nih.gov/pubmed/17683841)

### Recommended viewing

This lesson was based on the following lecture from ["Statistics in Medicine"](https://lagunita.stanford.edu/courses/Medicine/MedStats-SP/SelfPaced/about), a free course in Stanford Online by Prof. Kristin Sainani:
* [Exploring real data: lead in lipstick](https://youtu.be/nlKIT-_b2jU)

In [None]:
# Execute this cell to load the notebook's style sheet, then ignore it
from IPython.display import HTML
css_file = '../../style/custom.css'
HTML(open(css_file, "r").read())