Building changes from commit c388f12

inferentialthinking · Jan 14, 2019 · 06870f7
1 parent c388f12

Showing 10 changed files with 28 additions and 34 deletions.
diff --git a/_build/chapters/01/1/2/statistical-techniques.md b/_build/chapters/01/1/2/statistical-techniques.md
@@ -33,6 +33,6 @@ focusing too much attention on simplistic summaries such as average values.
 Computers enable a family of methods based on resampling that apply to a wide
 range of different inference problems, take into account all available
 information, and require few assumptions or conditions. Although these
-techniques have often been reserved for graduate courses in statistics, their
+techniques have often been reserved for advanced courses in statistics, their
 flexibility and simplicity are a natural fit for data science applications.
 
diff --git a/_build/chapters/01/1/intro.md b/_build/chapters/01/1/intro.md
@@ -44,7 +44,7 @@ Applying this approach requires learning to program a computer, and so this
 text interleaves a complete introduction to programming that assumes no prior
 knowledge. Readers with programming experience will find that we cover several
 topics in computation that do not appear in a typical introductory computer
-science curriculum. Data science also requires careful reasoning about
+science curriculum. Data science also requires careful reasoning about numerical
 quantities, but this text does not assume any background in mathematics or
 statistics beyond basic algebra. You will find very few equations in this text.
 Instead, techniques are described to readers in the same language in which they

diff --git a/_build/chapters/01/2/why-data-science.md b/_build/chapters/01/2/why-data-science.md
@@ -13,7 +13,7 @@ Why Data Science?
 
 Most important decisions are made with only partial information and uncertain
 outcomes. However, the degree of uncertainty for many decisions can be reduced
-sharply by public access to large data sets and the computational tools
+sharply by access to large data sets and the computational tools
 required to analyze them effectively. Data-driven decision making has already
 transformed a tremendous breadth of industries, including finance, advertising,
 manufacturing, and real estate. At the same time, a wide range of academic
@@ -25,7 +25,7 @@ their work, their scientific endeavors, and their personal decisions. Critical
 thinking has long been a hallmark of a rigorous education, but critiques are
 often most effective when supported by data. A critical analysis of any aspect
 of the world, may it be business or social science, involves inductive
-reasoning; conclusions can rarely been proven outright, only supported by
+reasoning; conclusions can rarely been proven outright, but only supported by
 the available evidence. Data science provides the means to make precise,
 reliable, and quantitative arguments about any set of observations. With
 unprecedented access to information and computing, critical thinking about

diff --git a/_build/chapters/01/3/2/Another_Kind_Of_Character.md b/_build/chapters/01/3/2/Another_Kind_Of_Character.md
@@ -43,7 +43,7 @@ chars_periods_little_women = Table().with_columns([
 ```
 
 
-Here are the data for *Huckleberry Finn*. Each row of the table corresponds to one chapter of the novel and displays the number of characters as well as the number of periods in the chapter. Not surprisingly, chapters with fewer characters also tend to have fewer periods, in general – the shorter the chapter, the fewer sentences there tend to be, and vice versa. The relation is not entirely predictable, however, as sentences are of varying lengths and can involve other punctuation such as question marks. 
+Here are the data for *Huckleberry Finn*. Each row of the table corresponds to one chapter of the novel and displays the number of characters as well as the number of periods in the chapter. Not surprisingly, chapters with fewer characters also tend to have fewer periods, in general: the shorter the chapter, the fewer sentences there tend to be, and vice versa. The relation is not entirely predictable, however, as sentences are of varying lengths and can involve other punctuation such as question marks. 
 
 
 
@@ -159,7 +159,7 @@ chars_periods_little_women
 
 
 
-You can see that the chapters of *Little Women* are in general longer than those of *Huckleberry Finn*. Let us see if these two simple variables – the length and number of periods in each chapter – can tell us anything more about the two books. One way for us to do this is to plot both sets of data on the same axes. 
+You can see that the chapters of *Little Women* are in general longer than those of *Huckleberry Finn*. Let us see if these two simple variables – the length and number of periods in each chapter – can tell us anything more about the two books. One way to do this is to plot both sets of data on the same axes. 
 
 In the plot below, there is a dot for each chapter in each book. Blue dots correspond to *Huckleberry Finn* and gold dots to *Little Women*. The horizontal axis represents the number of periods and the vertical axis represents the number of characters.
 

diff --git a/_build/chapters/01/what-is-data-science.md b/_build/chapters/01/what-is-data-science.md
@@ -8,21 +8,20 @@ next_page:
   title: 'Introduction'
 comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /content***"
 ---
-What is Data Science
+What is Data Science?
 ====================
 
 Data Science is about drawing useful conclusions from large and diverse data
 sets through exploration, prediction, and inference.  Exploration involves
 identifying patterns in information.  Prediction involves using information
 we know to make informed guesses about values we wish we knew.  Inference
-involves quantifying our degree of certainty: will those patterns we found
-also appear in new observations? How accurate are our predictions? Our primary
+involves quantifying our degree of certainty: will the patterns that we found in our data also appear in new observations? How accurate are our predictions? Our primary
 tools for exploration are visualizations and descriptive statistics, for
 prediction are machine learning and optimization, and for inference are
 statistical tests and models.
 
 Statistics is a central component of data science because statistics
-studies how to make robust conclusions with incomplete information. Computing
+studies how to make robust conclusions based on incomplete information. Computing
 is a central component because programming allows us to apply analysis
 techniques to the large and diverse data sets that arise in real-world
 applications: not just numbers, but text, images, videos, and sensor readings.

diff --git a/...pters/02/1/observation-and-visualization-john-snow-and-the-broad-street-pump.md b/...pters/02/1/observation-and-visualization-john-snow-and-the-broad-street-pump.md
@@ -11,7 +11,7 @@ comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /con
 Observation and Visualization: John Snow and the Broad Street Pump
 ------------------------------------------------------------------
 
-One of the earliest examples of astute observation eventually leading to the
+One of the most powerful examples of astute observation eventually leading to the
 establishment of causality dates back more than 150 years. To get your mind into
 the right timeframe, try to imagine London in the 1850’s. It was the world’s
 wealthiest city but many of its people were desperately poor. Charles Dickens,
@@ -35,7 +35,7 @@ they were breathing the same air—and miasmas—as their neighbors, there was n
 compelling association between bad smells and the incidence of cholera.
 
 Snow had also noticed that the onset of the disease almost always involved
-vomiting and diarrhea. He therefore believed that that infection was carried by
+vomiting and diarrhea. He therefore believed that the infection was carried by
 something people ate or drank, not by the air that they breathed. His prime
 suspect was water contaminated by sewage.
 
@@ -44,9 +44,9 @@ London. As the deaths mounted, Snow recorded them diligently, using a method
 that went on to become standard in the study of how diseases spread: *he drew a
 map*. On a street map of the district, he recorded the location of each death.
 
-Here is Snow’s original map. Each black bar represents one death. The black
+Here is Snow’s original map. Each black bar represents one death. When there are multiple deaths at the same address, the bars corresponding to those deaths are stacked on top of each other. The black
 discs mark the locations of water pumps. The map displays a striking
-revelation–the deaths are roughly clustered around the Broad Street pump.
+revelation—the deaths are roughly clustered around the Broad Street pump.
 ![Snow’s Cholera Map](../../../images/snow_map.jpg)
 
 Snow studied his map carefully and investigated the apparent anomalies. All of

diff --git a/_build/chapters/02/2/snow-s-grand-experiment.md b/_build/chapters/02/2/snow-s-grand-experiment.md
@@ -12,7 +12,7 @@ Snow’s “Grand Experiment”
 -------------------------
 
 Encouraged by what he had learned in Soho, Snow completed a more thorough
-analysis of cholera deaths. For some time, he had been gathering data on cholera
+analysis. For some time, he had been gathering data on cholera
 deaths in an area of London that was served by two water companies. The Lambeth
 water company drew its water upriver from where sewage was discharged into the
 River Thames. Its water was relatively clean. But the Southwark and Vauxhall

diff --git a/_build/chapters/02/3/establishing-causality.md b/_build/chapters/02/3/establishing-causality.md
@@ -18,14 +18,13 @@ two groups were comparable to each other, apart from the treatment.
 
 In order to establish whether it was the water supply that was causing cholera,
 Snow had to compare two groups that were similar to each other in all but one
-aspect–their water supply. Only then would he be able to ascribe the differences
+aspect—their water supply. Only then would he be able to ascribe the differences
 in their outcomes to the water supply. If the two groups had been different in
 some other way as well, it would have been difficult to point the finger at the
 water supply as the source of the disease.  For example, if the treatment group
 consisted of factory workers and the control group did not, then differences
 between the outcomes in the two groups could have been due to the water supply,
-or to factory work, or both, or to any other characteristic that made the groups
-different from each other. The final picture would have been much more fuzzy.
+or to factory work, or both. The final picture would have been much more fuzzy.
 
 Snow’s brilliance lay in identifying two groups that would make his comparison
 clear. He had set out to establish a causal relation between contaminated water
@@ -48,8 +47,8 @@ diseases.
 Let us now return to more modern times, armed with an important lesson that we
 have learned along the way:
 
-In an observational study, if the treatment and control groups differ in ways
-other than the treatment, it is difficult to make conclusions about causality.
+**In an observational study, if the treatment and control groups differ in ways
+other than the treatment, it is difficult to make conclusions about causality.**
 
 An underlying difference between the two groups (other than the treatment) is
 called a *confounding factor*, because it might confound you (that is, mess you
@@ -58,10 +57,9 @@ up) when you try to reach a conclusion.
 **Example: Coffee and lung cancer.** Studies in the 1960’s showed that coffee
 drinkers had higher rates of lung cancer than those who did not drink coffee.
 Because of this, some people identified coffee as a cause of lung cancer. But
-coffee does not cause lung cancer. The analysis contained a confounding factor –
-smoking. In those days, coffee drinkers were also likely to have been smokers,
+coffee does not cause lung cancer. The analysis contained a confounding factor—smoking. In those days, coffee drinkers were also likely to have been smokers,
 and smoking does cause lung cancer. Coffee drinking was associated with lung
 cancer, but it did not cause the disease.
 
 Confounding factors are common in observational studies. Good studies take great
-care to reduce confounding.
+care to reduce confounding and to account for its effects.
diff --git a/_build/chapters/02/5/endnote.md b/_build/chapters/02/5/endnote.md
@@ -11,7 +11,7 @@ comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /con
 Endnote
 -------
 
-In the terminology of that we have developed, John Snow conducted an
+In the terminology that we have developed, John Snow conducted an
 observational study, not a randomized experiment. But he called his study a
 “grand experiment” because, as he wrote, “No fewer than three hundred thousand
 people … were divided into two groups without their choice, and in most cases,
@@ -26,7 +26,7 @@ quite a bit more complex. But every method of randomization consists of a
 sequence of carefully defined steps that allow chances to be specified
 mathematically. This has two important consequences.
 
-1. It allows us to account–mathematically–for the possibility that randomization
+1. It allows us to account—mathematically—for the possibility that randomization
    produces treatment and control groups that are quite different from each
    other.
 
@@ -37,10 +37,9 @@ mathematically. This has two important consequences.
 
 In this course, you will learn how to conduct and analyze your own randomized
 experiments. That will involve more detail than has been presented in this
-section. For now, just focus on the main idea: to try to establish causality,
+chapter. For now, just focus on the main idea: to try to establish causality,
 run a randomized controlled experiment if possible. If you are conducting an
-observational study, you might be able to establish association but not
-causation. Be extremely careful about confounding factors before making
+observational study, you might be able to establish association but it will be harder to establish causation. Be extremely careful about confounding factors before making
 conclusions about causality based on an observational study.
 
 **Terminology**
@@ -76,8 +75,8 @@ conclusions about causality based on an observational study.
    proof such as would be admitted in any scientific enquiry that there is any
    such thing as contagion.”
 
-3. A later RCT established that the conditions on which PROGRESA insisted –
-   children going to school, preventive health care – were not necessary to
+3. A later RCT established that the conditions on which PROGRESA insisted—children
+   going to school, preventive health care—were not necessary to
    achieve increased enrollment. Just the financial boost of the welfare
    payments was sufficient.
 
@@ -90,7 +89,6 @@ published by our own University of California Press, reads like a whodunit. It
 was one of the main sources for this section's account of John Snow and his
 work. A word of warning: some of the contents of the book are stomach-churning.
 
-[*Poor Economics*](http://www.pooreconomics.com), the best seller by Abhijit V.
-Banerjee and Esther Duflo of MIT, is an accessible and lively account of ways to
+[*Poor Economics*](http://www.pooreconomics.com), the best seller by Abhijit Banerjee and Esther Duflo of MIT, is an accessible and lively account of ways to
 fight global poverty. It includes numerous examples of RCTs, including the
 PROGRESA example in this section.
diff --git a/_build/chapters/02/causality-and-experiments.md b/_build/chapters/02/causality-and-experiments.md
@@ -32,8 +32,7 @@ group of individuals, a factor of interest called a *treatment*, and an
 
 It is easiest to think of the individuals as people. In a study of whether
 chocolate is good for the health, the individuals would indeed be people, the
-treatment would be eating chocolate, and the outcome might be a measure of blood
-pressure. But individuals in observational studies need not be people. In a
+treatment would be eating chocolate, and the outcome might be a measure of heart disease. But individuals in observational studies need not be people. In a
 study of whether the death penalty has a deterrent effect, the individuals could
 be the 50 states of the union. A state law allowing the death penalty would be
 the treatment, and an outcome could be the state’s murder rate.