# 7. Sampling, Instruments, and Bias
*Module: Experimental Design (Sprint 1 of 2)*

*Experiments and the scientific method are at the heart of how we “know” what we know when it comes to data analysis. But how do they translate to the different situations we encounter in practice, and what are some common pitfalls to be aware of?*

|Data Journalist| Data Engineer | Statistical Modeler| Business Analyst |
|:----------------:|:----:|:------------------:|:----:|
|I need to **understand the different ways to study sample populations and the potential biases introduced** so that I can assess the value of published research for my investigations|I need to be able to **implement valid sampling and collection procedures** for data at all scales so that I can support analyses without inadvertently introducing bias|I need to **understand how sampling and instruments introduce bias** so that I can design analyses that account for them|I need to **design effective data collection instruments** so that I can answer critical questions for my business|

## Analytical Process Big Picture
![Curriculum Summary](../curriculum_summary.png)

## When we formulate good questions...
- we need to know which questions can be answered by which evidence
- what kind of evidence we can gather
- how we gather evidence
- what our collection method tells us about how to interpret the answer




## Key Questions
- How do we interpret evidence?
- What is the goal of data analysis?
- What is the goal of an experiment? How is an experiment different from simply observing nature?
- How do we establish causality?
- How do the things we measure relate to the things we want to know?
- How are surveys like sensors?
- Why is a random sample important? What abilities does it afford us?
- How do we know that a sample represents a population?
- How do we guarantee a random sample and what do we do if we can't?

## Key Concepts and Definitions
- scientific method
- analytical method
- experimental study
- observational study
- evidence
- experimental evidence
- anecdotal evidence
- observational evidence
- causality
- representative sample
- outcome / dependent variable
- explanatory / independent / predictor variable
- simple random sampling
- stratified sampling
- cluster sampling
- systematic sampling
- convenience sample
- voluntary response sample
- central limit theorem
- scale / response formats
- Likert scale
- multiple response scale
- nominal variable
- ordinal variable
- ranking
- research bias
- response bias




## Themes of this Sprint
- Evidence / Experiments / Observation
- Causality and Relationships
- Variables
- Instrument
- Survey Design
- Samples / CLT


# Knowledge

This module connects the practice of analysis to knowledge more broadly: how do we know anything, what do we know, and what role do the analytical process, data, and evidence play in that?

From Experimental Design PDF
>"Experimental Design and Statistical Analysis go hand in hand, and neither can be understood without the other." 


>"I need to say a few things about the difficulties of learning about experimental design and analysis. A practical working knowledge requires understanding many concepts and their relationships. Luckily much of what you need to learn agrees with common sense, once you sort out the terminology. On the other hand, there is no ideal logical order for learning what you need to know, because everything relates to, and in some ways depends on, everything else. So be aware: many concepts are only loosely defined when first mentioned, then further clarified later when you have been introduced to other related material. Please try not to get frustrated with some incomplete knowledge as the course progresses. If you work hard, everything should tie together by the end of the course."

## Variables

In an experiment, we manipulate one variable (the explanatory variable) and observe the effect on another (the outcome variable).


## Related / Correlation / Causation
>If variables X and Y (e.g., the number of televisions (X) in various countries and the infant mortality rate (Y) of those countries) are found to be associated, then there are three basic possibilities. 
- First X could be causing Y (televisions lead to more health awareness, which leads to better prenatal care) 
- or Y could be causing X (high infant mortality leads to attraction of funds from richer countries, which leads to more televisions) 
- or unknown factor Z could be causing both X and Y (higher wealth in a country leads to more televisions and more prenatal care clinics). 

>It is worth memorizing these three cases, because they should always be considered when association is found in an observational study as opposed to a randomized experiment. (It is also possible that X and Y are related in more complicated ways including in large networks of variables with feedback loops.)

>Causation (“X causes Y”) can be logically claimed if X and Y are associated, and X precedes Y, and no plausible alternative explanations can be found, particularly those of the form “X just happens to vary along with some real cause of changes in Y” (called confounding).
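
The third case above (an unknown factor Z driving both X and Y) can be sketched with a small simulation. This is only an illustration on made-up data: the variable roles, the effect sizes, and the helper names `pearson` and `residuals` are assumptions, not part of the quoted text. X and Y never influence each other here, yet they are clearly correlated until the linear effect of Z is removed from both.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(y, z):
    """Residuals of y after a least-squares line fit on z (removes z's linear effect)."""
    mean_y, mean_z = statistics.fmean(y), statistics.fmean(z)
    slope = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
             / sum((zi - mean_z) ** 2 for zi in z))
    return [yi - (mean_y + slope * (zi - mean_z)) for yi, zi in zip(y, z)]

rng = random.Random(0)
n = 5000
# Hypothetical data: Z (e.g. wealth) drives both X and Y; X and Y do not affect each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

raw_r = pearson(x, y)                                   # association created by Z
adjusted_r = pearson(residuals(x, z), residuals(y, z))  # association after removing Z
print(f"raw r = {raw_r:.2f}, r after adjusting for Z = {adjusted_r:.2f}")
```

With these assumed effect sizes the raw correlation should land near 0.5 and the adjusted one near 0. In a real observational study Z is often unmeasured, which is why this adjustment is not always available.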

## Experimental Design
http://www.statisticshowto.com/experimental-design/
    
> Experimental design is a way to carefully plan experiments in advance so that your results are both objective and valid. The terms “Experimental Design” and “Design of Experiments” are used interchangeably and mean the same thing. However, the medical and social sciences tend to use the term “Experimental Design” while engineering, industrial and computer sciences favor the term “Design of experiments.”

## Confirmatory vs Exploratory Research
https://en.wikipedia.org/wiki/Research_design

>Confirmatory research tests a priori hypotheses — outcome predictions that are made before the measurement phase begins. Such a priori hypotheses are usually derived from a theory or the results of previous studies. The advantage of confirmatory research is that the result is more meaningful, in the sense that it is much harder to claim that a certain result is generalizable beyond the data set. The reason for this is that in confirmatory research, one ideally strives to reduce the probability of falsely reporting a coincidental result as meaningful. This probability is known as α-level or the probability of a type I error.

>Exploratory research on the other hand seeks to generate a posteriori hypotheses by examining a data-set and looking for potential relations between variables. It is also possible to have an idea about a relation between variables but to lack knowledge of the direction and strength of the relation. If the researcher does not have any specific hypotheses beforehand, the study is exploratory with respect to the variables in question (although it might be confirmatory for others). The advantage of exploratory research is that it is easier to make new discoveries due to the less stringent methodological restrictions. Here, the researcher does not want to miss a potentially interesting relation and therefore aims to minimize the probability of rejecting a real effect or relation; this probability is sometimes referred to as β and the associated error is of type II. In other words, if the researcher simply wants to see whether some measured variables could be related, he would want to increase the chances of finding a significant result by lowering the threshold of what is deemed to be significant.

>Sometimes, a researcher may conduct exploratory research but report it as if it had been confirmatory ('Hypothesizing After the Results are Known', HARKing—see Hypotheses suggested by the data); this is a questionable research practice bordering on fraud.
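
The trade-off between the α-level (type I error) and β (type II error) described above can be sketched by simulating many small studies. This is a rough illustration under stated assumptions: two groups of 50 drawn from normal distributions with a known standard deviation of 1, a two-sided z-test, and a hypothetical true effect of 0.3. The function names are made up for this example.

```python
import random
from statistics import NormalDist, fmean

def p_value(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (equal group sizes, known sigma)."""
    n = len(group_a)
    z = (fmean(group_a) - fmean(group_b)) / (sigma * (2 / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_effect, alpha, trials=2000, n=50, seed=0):
    """Fraction of simulated studies that report p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(true_effect, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        hits += p_value(a, b) < alpha
    return hits / trials

for alpha in (0.05, 0.20):
    type_1 = rejection_rate(true_effect=0.0, alpha=alpha)  # no real effect exists
    power = rejection_rate(true_effect=0.3, alpha=alpha)   # a real effect exists
    print(f"alpha={alpha:.2f}  type I rate={type_1:.3f}  power={power:.3f}  beta={1 - power:.3f}")
```

A looser threshold finds the real effect more often (lower β) but also flags more coincidences when nothing is there, which is why an exploratory finding reported as if it were confirmatory is misleading.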

## Sensor Design

https://www.clear.rice.edu/elec201/Book/sensors.html
    
> Without sensors, a robot is just a machine. Robots need sensors to deduce what is happening in their world and to be able to react to changing situations. This chapter introduces a variety of robotic sensors and explains their electrical use and practical application. The sensor applications presented here are not meant to be exhaustive, but merely to suggest some of the possibilities. Please do not be limited by the ideas contained in this chapter! Assembly instructions for the kit sensors are given in Section 2.6.


Sensors as Transducers (and Data Collectors)  

> The basic function of an electronic sensor is to measure some feature of the world, such as light, sound, or pressure and convert that measurement into an electrical signal, usually a voltage or current. Typical sensors respond to stimuli by changing their resistance (photocells), changing their current flow (phototransistors), or changing their voltage output (the Sharp IR sensor). The electrical output of a given sensor can easily be converted into other electrical representations.

## More on Experimental Design
From Experimental Design PDF

>Experimental design is a careful balancing of several features including **“power”, generalizability, various forms of “validity”, practicality and cost**. These concepts will be defined and discussed thoroughly in the next chapter. For now, you need to know that often an improvement in one of these features has a detrimental effect on other features.

> A thoughtful balancing of these features in advance will result in an experiment with the **best chance of providing useful evidence** to modify the current state of knowledge in a particular scientific field. On the other hand, it is unfortunate that many experiments are designed with avoidable flaws. It is only rarely in these circumstances that statistical analysis can rescue the experimenter. This is an example of the old maxim “an ounce of prevention is worth a pound of cure”.

>Our goal is always to actively design an experiment that has the best chance to produce meaningful, defensible evidence, rather than hoping that good statistical analysis may be able to correct for defects after the fact.

## Experimental Design and EDA
> Statistical analysis of experiments starts with graphical and non-graphical exploratory data analysis (EDA). EDA is useful for
- detection of mistakes
- checking of assumptions
- determining relationships among the explanatory variables
- assessing the direction and rough size of relationships between explanatory and outcome variables

## Experimental Design and Modeling relationships

>Most formal (confirmatory) statistical analyses are based on models. Statistical models are ideal, mathematical representations of observable characteristics. Models are best divided into two components. **The structural component of the model (or structural model) specifies the relationships between explanatory variables and the mean (or other key feature) of the outcome variables**. The **“random” or “error” component of the model (or error model) characterizes the deviations of the individual observations from the mean**. (Here, “error” does not indicate “mistake”.) The two model components are also called **“signal” and “noise”** respectively.

>Statisticians realize that no mathematical models are perfect representations of the real world, but some are close enough to reality to be useful. A full description of a model should include all assumptions being made because statistical inference is impossible without assumptions, and sufficient deviation of reality from the assumptions will invalidate any statistical inferences.

>A slightly different point of view says that models describe how the distribution of the outcome varies with changes in the explanatory variables.

 
> **Statistical models have both a structural component and a random component which describe means and the pattern of deviation from the mean, respectively.**
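
The split into a structural component and an error component can be made concrete with simulated data. The sketch below assumes a hypothetical straight-line signal (mean of y equal to 2 + 3x) and normally distributed noise with standard deviation 1.5. It estimates the signal by ordinary least squares and treats what is left over as an estimate of the noise.

```python
import random
from statistics import fmean, stdev

rng = random.Random(1)
# Hypothetical "true" model: mean of y is 2 + 3*x (signal); deviations are N(0, 1.5) (noise).
xs = [rng.uniform(0, 10) for _ in range(2000)]
ys = [2 + 3 * x + rng.gauss(0, 1.5) for x in xs]

# Estimate the structural component with ordinary least squares.
mean_x, mean_y = fmean(xs), fmean(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# What remains after removing the fitted signal estimates the error component.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
noise_sd = stdev(residuals)
print(f"signal: y = {intercept:.2f} + {slope:.2f} x   noise sd = {noise_sd:.2f}")
```

The fitted intercept, slope, and residual spread should come out close to the assumed 2, 3, and 1.5. With real data the true values are unknown, so the straight-line form and the constant noise level are assumptions that need checking.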


## Variable Selection
> Operationalizations define measures or variables which are quantities of interest or which serve as the practical substitutes for the concepts of interest. For example, if you have a theory about what affects people’s anger level, you need to operationalize the concept of anger.


## What makes a “good” variable?
> Regardless of what we are trying to measure, the qualities that make a good measure of a scientific concept are high reliability, absence of bias, low cost, practicality, objectivity, high acceptance, and high concept validity.
> **Reliability** is essentially the inverse of the statistical concept of variance, and a rough equivalent is “consistency”. Statisticians also use the word “precision”.

> **Bias** refers to the difference between the measure and some “true” value. A difference between an individual measurement and the true value is called an “error” (which implies the practical impossibility of perfect precision, rather than the making of mistakes). The bias is the average difference over many measurements. Ideally the bias of a measurement process should be zero. For example, a measure of weight that is made with people wearing their street clothes and shoes has a positive bias equal to the average weight of the shoes and clothes across all subjects.
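
Reliability and bias are separate properties, which a short simulation can make visible. The numbers below are invented for illustration: a hypothetical subject weighing 70 kg, a 1.2 kg clothing offset, and two scales with different noise levels. Bias is computed as the average error over many measurements, and reliability is summarized by the spread of repeated readings.

```python
import random
from statistics import fmean, stdev

rng = random.Random(2)
true_weight = 70.0  # kg, hypothetical subject

# Instrument A: precise scale, subject weighed in street clothes (+1.2 kg, hypothetical).
clothed = [true_weight + 1.2 + rng.gauss(0, 0.1) for _ in range(1000)]
# Instrument B: no clothing offset, but a noisy scale.
noisy = [true_weight + rng.gauss(0, 2.0) for _ in range(1000)]

for name, readings in (("clothed/precise", clothed), ("unclothed/noisy", noisy)):
    bias = fmean(readings) - true_weight  # average error over many measurements
    spread = stdev(readings)              # low spread = high reliability
    print(f"{name:16s} bias = {bias:+.2f} kg   sd = {spread:.2f} kg")
```

The first instrument is highly reliable but biased, and the second is roughly unbiased but unreliable. Averaging more readings helps with the second problem and does nothing for the first.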

> All other things being equal, when two measures are available, we will choose the **less expensive and easier to obtain (more practical) measures**. Measures that have a greater degree of subjectivity are generally less preferable. Although devising your own measures may improve upon existing measures, there may be a trade off with acceptability, resulting in reduced impact of your experiment on the field as a whole.

> Construct validity is a key criterion for variable definition. Under ideal conditions, after completing your experiment you **will be able to make a strong claim that changing your explanatory variable(s) in a certain way (e.g., doubling the amplitude of a background hum) causes a corresponding change in your outcome (e.g., score on an irritability scale)**. But if you want to convert that to meaningful statements about the effects of auditory environmental disturbances on the psychological trait or construct called “irritability”, you must be able to argue that the scales have good construct validity for the traits, namely that the **operationalization of background noise as an electronic hum has good construct validity for auditory environmental disturbances**, and that your **irritability scale really measures what people call irritability**. Although construct validity is critical to the impact of your experimentation, its detailed understanding belongs separately to each field of study, and will not be discussed much in this book beyond the discussion in Chapter 3.

## Construct Validity
> Construct validity is a characteristic of devised measurements that describes how well the measurement can stand in for the scientific concepts or “constructs” that are the real targets of scientific learning and inference.

## Instrumentation
https://www.quora.com/What-is-Definition-of-Instrumentation
> Instrumentation is the variety of measuring instruments to monitor and control a process. It is the art and science of measurement and control of process variables within a production, laboratory, or manufacturing area.

> Instrumentation is defined as the art and science of measurement and control of process variables within a production or manufacturing area. The process variables used in industries are Level, Pressure, Temperature, Humidity, Flow, pH, Force, Speed etc.

## Survey Design
https://en.wikiversity.org/wiki/Survey_design

> This learning resource is about how to design surveys (or questionnaires) in the social sciences.

> Surveys are commonly used in disciplines such as psychology, health, marketing, sociology, governance, and demographics.

> Survey research is an efficient way of gathering data to help address a research question. The main challenge is developing reliable and valid measures and sampling representative data.

> Survey design is critical in determining the quality of research. The potential for poor design is vast - whether intentionally on the part of the researcher or unintentionally. For example, watch this [2 min. episode of Yes, Minister](http://www.youtube.com/watch?v=G0ZZJXw4MTA) about politicians trying to get the poll results they want.

> **Before designing a survey**
>
> It can be very tempting to press ahead with designing a survey. But first, be clear about the purpose of the study and the research methodology.


> Designing a survey? Don't put the cart before the horse. Develop a proposal first, then design the survey.
> Before designing a survey, develop a research proposal which clearly explains the:
- research purpose
- research questions
- hypotheses
- research design: experimental, quasi-experimental, non-experimental
- sampling method
- target constructs - operationally define the:
    - independent variables
    - dependent variables


> **Types of questions**
>
> It is surprisingly difficult to develop a "good" survey question or item. Consider each of the following aspects of survey questions, their pros and cons, and examples:
- Objective vs. subjective
- Closed-ended vs. open-ended
- Leading and loaded questions
- Positive-, negative-, and double-negative-wording


> **Response formats**
>
> It is important to understand the implications of response formats on levels of measurement in survey design and quantitative data analysis.

>Some commonly used response formats include:
- Dichotomous: e.g., Yes or No
- Multi-chotomous: e.g., Yes, No, or Maybe
- Multiple response: e.g., Tick all that apply
- Likert scale: Equally-spaced intervals, usually 3 to 9 intervals
- Graphical rating: Can mark any point on a continuous scale
- Ranking: Compare items to each other by placing them in order of descending preference
- Semantic differential: Put two words at opposite ends of a scale with interval marks
- Idiographic: Use symbols/pictures instead of words and numbers
- For more info see: Rating scale (Wikipedia)
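
How a response format maps to a level of measurement determines which summaries make sense. The sketch below uses invented answers to a hypothetical 5-point Likert item and a hypothetical "tick all that apply" item. It treats the Likert answers as ordinal (order-based summaries such as the median) and the multiple-response answers as nominal (counts only).

```python
from collections import Counter
from statistics import median_low

# Hypothetical 5-point Likert item; the labels have an order but no guaranteed spacing.
LIKERT = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
rank = {label: i for i, label in enumerate(LIKERT)}

responses = ["Agree", "Neutral", "Agree", "Strongly agree", "Disagree", "Agree", "Neutral"]

# Ordinal data: encode by rank, then use an order-based summary such as the median.
codes = [rank[r] for r in responses]
median_label = LIKERT[median_low(codes)]
print("median response:", median_label)  # Agree

# Multiple response ("tick all that apply") is nominal: count each option separately.
ticked = [{"email", "phone"}, {"email"}, {"chat", "email"}, set()]
option_counts = Counter(option for answer in ticked for option in answer)
print("option counts:", dict(option_counts))
```

Taking an arithmetic mean of the Likert codes would assume the steps between labels are equally spaced, which is a stronger claim than the ordering alone supports.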



## More on Survey design
http://www.esourceresearch.org/eSourceBook/SampleSurveys/6DevelopingaSurveyInstrument/tabid/484/Default.aspx
    
> In general, survey questions should:
- Contain only one idea or question
- Define the scope to consider, such as the time period or activities that are relevant to the question
- Be written with neutral language to avoid leading the respondent to a specific answer
- Use language that enables less educated persons to easily understand the question.
- Contain response options that are simple, clear, consistent, and include the full range of responses that might occur
- For categorical responses, be mutually exclusive and exhaustive so that a respondent can pick one and only one option
- For numeric responses, guide the respondent to provide the response in a consistent format and units 

## Errors and Biases in Survey Research

https://blog.cruxresearch.com/2013/08/27/the-top-5-errors-and-biases-in-survey-research/

>The Top 5

>1.  Researcher Bias.

>The most important error that creeps into surveys isn’t statistical at all and is not measurable. The viewpoint of the researcher has a way of creeping into question design and analysis. Sometimes this is purposeful, and other times it is more subtle. All research designers are human, and have points-of-view. Even the most practiced and professional researchers can have subtle biases in the way they word questions or interpret results. How we frame questions and report results is always affected by our experiences and viewpoints – which can be a good thing, but can also affect the purity of the study.

>2. Poor match of the sample to the population.

>This is the source of some of the most famous errors in polling. Our industry once predicted the elections of future Presidents Alf Landon and Thomas Dewey based on this mistake. It is almost never the case that the sampling frame you use is a perfect match to the population you are trying to understand, so this error is present on most studies. You can sometimes recover from asking the wrong questions, but you can never recover from asking them of the wrong people.

>Most clients (and suppliers) like to focus on questionnaire development when a new project is awarded. The reality is the sampling and weighting plan is every bit as consequential to the success of the project, and rarely gets the attention it deserves. We can tell when we have a client that really knows what they are doing if they begin the project by focusing on sampling issues and not jumping to questionnaire design.

>3. Lack of randomness/response bias.

>Many surveys proceed without random samples. In fact, it is rare that a survey being done today can accurately claim to be using a random sample. Remember those statistics courses you took in college and graduate school? The one thing they have in common is pretty much everything they taught you statistically is only relevant if you have a random sample. And, odds are great that you don’t.

>A big source of “non-randomness” in a sample is response bias. A typical RDD phone survey being conducted today has a cooperation rate of less than 20%. 10% is considered a good response rate from an online panel. When we report results of these studies, we are assuming that the vast majority of people who didn’t respond would have responded in the same way as those who did. Often, this is a reasonable assumption. But, sometimes it is not. Response bias is routinely ignored in market research and polls because it is expensive to correct (the fix involves surveying the non-responders).
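
The assumption that non-responders would have answered like responders can be tested in a simulation, where the truth is known. Everything below is made up for illustration: a population in which 40% hold the opinion "yes", and response rates near 10% that either ignore or depend on that opinion. The function name `survey` is hypothetical.

```python
import random

rng = random.Random(4)
# Hypothetical population: about 40% hold the opinion "yes" (True).
population = [rng.random() < 0.40 for _ in range(100_000)]

def survey(response_rate_yes, response_rate_no):
    """Return (estimated 'yes' share among respondents, overall response rate)."""
    respondents = [
        opinion for opinion in population
        if rng.random() < (response_rate_yes if opinion else response_rate_no)
    ]
    return sum(respondents) / len(respondents), len(respondents) / len(population)

same = survey(0.10, 0.10)    # willingness to respond is unrelated to opinion
skewed = survey(0.15, 0.07)  # "yes" holders respond more often
print(f"unrelated: estimate = {same[0]:.2f}, response rate = {same[1]:.2f}")
print(f"related:   estimate = {skewed[0]:.2f}, response rate = {skewed[1]:.2f}")
```

Both surveys have a response rate of roughly 10%, so the response rate alone does not reveal the problem. Only the second estimate drifts far from the true 40%, because the decision to respond is tied to the answer being measured.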

>4.  Failure to quota sample or weight data.

>This is a bit technical. Even if we sample randomly, it is typical for some subgroups to be more willing to cooperate than others. For example, females are typically less likely to refuse a survey invitation than males, and minorities are less likely to participate than whites. So, a good researcher will quota sample and weight data to compensate for this. In short, if you know something about your population before you survey them, you should use this knowledge to your advantage. If you are conducting an online poll and you are not doing something to quota sample or weight the data, odds are very good that you are making an important mistake.
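
Weighting to known population shares can be sketched as follows. The population shares, sample counts, and answers are invented for illustration, and this is one simple form of weighting (post-stratification on a single variable), not necessarily the procedure the quoted author uses. Each respondent's weight is the group's population share divided by its sample share.

```python
# Hypothetical population shares and a sample that over-represents one group.
population_share = {"female": 0.5, "male": 0.5}
sample = (
    [("female", 1)] * 56 + [("female", 0)] * 14  # 70 women, 80% answer "yes"
    + [("male", 1)] * 12 + [("male", 0)] * 18    # 30 men, 40% answer "yes"
)

n = len(sample)
sample_share = {g: sum(1 for grp, _ in sample if grp == g) / n for g in population_share}
# Post-stratification weight: how much each respondent in a group should count.
weight = {g: population_share[g] / sample_share[g] for g in population_share}

unweighted = sum(answer for _, answer in sample) / n
weighted = (sum(weight[g] * answer for g, answer in sample)
            / sum(weight[g] for g, _ in sample))
print(f"unweighted yes = {unweighted:.2f}, weighted yes = {weighted:.2f}")
# unweighted yes = 0.68, weighted yes = 0.60
```

The weighted figure matches what a sample with the correct 50/50 split would give (0.5 × 0.8 + 0.5 × 0.4). If both groups answered identically, the two figures would coincide and the weights would change nothing.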

>5.  Overdoing it.

>I have worked with methodologists who have more degrees than a thermometer, think about the world in Greek letters, and understand every type of bias we can comprehend. I have also seen them concentrate so much on correcting for every type of error they can imagine that they “overcook” the data. I remember once passing off a data set to a statistician, who corrected for 10 types of errors, and the resulting data set didn’t even have the gender distribution in the proper proportion.

>Remember — you don’t have to correct for an error or bias unless it has an effect on what you are asking.  For example, if men and women answer a question identically, weighting by gender will have no effect on the study results. Instead, you should know enough about the issues you are studying to know what types of errors are likely to be relevant to your study.

>So that is our top 5. Note that I did not put sampling error in the top 5. I am not sure it would make my top 20. Sampling error is the “+/- 5%” that you see attached to many polls. We will do a subsequent blog post on why this isn’t a particularly relevant error for most studies. It just happens to be the one type of error that can be easily calculated mathematically, which is why we see it cited so often. I am more concerned about the errors that are harder to calculate, or, more importantly, the ones that go unnoticed.
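
The “+/- 5%” figure is easy to compute, which is part of the point being made above. A minimal sketch, assuming a simple random sample and the usual normal approximation for a proportion (the function name `margin_of_error` is made up for this example):

```python
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation interval for a sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * (p * (1 - p) / n) ** 0.5

for n in (100, 385, 1000):
    print(n, round(margin_of_error(0.5, n), 3))
# 100 0.098
# 385 0.05
# 1000 0.031
```

This number covers sampling error only. It says nothing about researcher bias, frame mismatch, or non-response, and it is only meaningful when the sample is actually random.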

>With 40+ sources of errors, one could wonder how our industry ever gets it right. Yet we do. More than $10 Billion is spent on research and polling in the US each year, and if this money was not being spent effectively, the industry would implode. So, how do we get it right?

>In one sense, many of the errors in surveys tend to be randomly distributed. For instance, there can be a fatigue bias in a question involving a long list of items to be assessed. By presenting long lists in a randomized order we can “randomize” this error – we don’t remove it.

>In some sense, errors and biases also seem to have a tendency to cancel each other out, rather than magnify each other. And, as stated above, not all errors matter to every project. The key is to consider which ones might before the study is fielded.

# Sampling types
https://www.khanacademy.org/math/statistics-probability/designing-studies/sampling-methods-stats/a/sampling-methods-review
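
The four probability sampling methods listed under Key Concepts can be sketched on a toy sampling frame. The frame, the `region` field, and the function names are hypothetical, and the stratified version assumes allocation proportional to stratum size, which is one common choice among several.

```python
import random
from collections import defaultdict

rng = random.Random(3)
# Hypothetical sampling frame: 1,000 people in 10 regions of 100 people each.
frame = [{"id": i, "region": f"R{i // 100}"} for i in range(1000)]

def simple_random(frame, k):
    """Every subset of size k is equally likely."""
    return rng.sample(frame, k)

def systematic(frame, k):
    """Random start, then every (len(frame) // k)-th unit in frame order."""
    step = len(frame) // k
    start = rng.randrange(step)
    return frame[start::step][:k]

def stratified(frame, k, key="region"):
    """Simple random sample within each stratum, proportional to stratum size."""
    strata = defaultdict(list)
    for unit in frame:
        strata[unit[key]].append(unit)
    picked = []
    for members in strata.values():
        picked += rng.sample(members, round(k * len(members) / len(frame)))
    return picked

def cluster(frame, n_clusters, key="region"):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = set(rng.sample(sorted({unit[key] for unit in frame}), n_clusters))
    return [unit for unit in frame if unit[key] in chosen]

print(len(simple_random(frame, 100)), len(systematic(frame, 100)),
      len(stratified(frame, 100)), len(cluster(frame, 2)))
```

Stratified sampling guarantees every region is represented, while cluster sampling trades that guarantee for cheaper collection. Systematic sampling behaves like a simple random sample only if the frame order is unrelated to what is being measured.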

# Internal Validity (concept for Next Sprint)
http://www.indiana.edu/~educy520/sec5982/week_9/520in_ex_validity.pdf

> **Why is Internal Validity Important?**
>
> We often conduct research in order to determine cause-and-effect relationships.
- Can we conclude that changes in the independent variable caused the observed changes in the dependent variable?
- Is the evidence for such a conclusion good or poor?
- If a study shows a high degree of internal validity, then we can conclude we have strong evidence of causality.
- If a study has low internal validity, then we must conclude we have little or no evidence of causality.


# Necessary Conditions for Causality
> Three conditions that are necessary to claim that variable A causes changes in variable B:
- Relationship condition: Variable A and variable B must be related.
- Temporal antecedence condition: Proper time order must be established.
- Lack of alternative explanation condition: The relationship between variable A and variable B must not be attributable to a confounding, extraneous variable.


> Threats to internal validity compromise our confidence in saying that a relationship exists between the independent and dependent variables.

> Threats to external validity compromise our confidence in stating whether the study’s results are applicable to other groups.


## Project Ideas

From Justin for Both Sprints on Experimental Design and Research Methods
- What are you trying to understand? What questions are you trying to answer?
- How is data better than your intuition or ‘gut’?
- Is the thing you want to understand directly measurable?
- If not, what are some proxies for it?
- What is correlation vs. causation?
- What might confound your ability to approximate?
- Are there reasonable ranges of values you may expect to encounter?
- Are there unreasonable ranges? How do you know?
- Pull the measurable or proxy-based data and visualize it.
    - What does it tell you?
    - What are some basic conclusions?
    - What are some reasons that this may be wrong?
    - How can you overcome these limitations?
- Talk about Thomas Kuhn, Karl Popper, Michael Polanyi. 
- Tools: SciPy, NumPy, PyMC, statsmodels, scikit-learn