# Introduction

Reference: STATISTICS (Eleventh Edition) by Robert S. Witte

## What is Statistics?
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

### Descriptive Statistics
Statistics exists because of the prevalence of variability in the real world. In its simplest form, known as descriptive statistics, statistics provides us with tools (tables, graphs, averages, ranges, correlations) for organizing and summarizing the inevitable variability in collections of actual observations or scores.

Examples:
1. A graph showing the annual change in global temperature during the last 30 years.
2. A report that describes the average difference in grade point average (GPA) between college students who regularly drink alcoholic beverages and those who don’t.
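As a minimal sketch of the second example, the snippet below summarizes two small sets of GPA scores with an average and a range. The numbers, the variable names, and the `describe` helper are all made up for illustration; they are not taken from the reference text.

```python
import statistics

# Hypothetical GPA observations (invented numbers, for illustration only)
gpa_drinkers = [2.8, 3.1, 2.5, 3.0, 2.6]
gpa_nondrinkers = [3.2, 3.5, 2.9, 3.4, 3.0]


def describe(scores):
    """Summarize a list of scores with two descriptive tools: mean and range."""
    return {
        "mean": statistics.mean(scores),
        "range": max(scores) - min(scores),
    }


summary_drinkers = describe(gpa_drinkers)
summary_nondrinkers = describe(gpa_nondrinkers)

# Average difference in GPA between the two groups
mean_difference = summary_nondrinkers["mean"] - summary_drinkers["mean"]

print(summary_drinkers)
print(summary_nondrinkers)
print(round(mean_difference, 2))
```

Nothing here generalizes beyond the ten scores at hand; it only organizes and summarizes them, which is what makes it descriptive.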

### Inferential Statistics
Statistics also provides tools, in the form of a variety of tests and estimates, for generalizing beyond collections of actual observations. This more advanced area is known as inferential statistics. Tools from inferential statistics permit us to use a relatively small collection of actual observations to draw conclusions about a larger collection of potential observations.

Examples:
1. A researcher’s hypothesis that, on average, meditators report fewer headaches than do nonmeditators.
2. An assertion about the relationship between job satisfaction and overall happiness.

## Population vs. Sample
Inferential statistics is concerned with generalizing beyond sets of actual observations, that is, with generalizing from a sample to a population. In statistics, a population refers to any complete collection of observations or potential observations, whereas a sample refers to any smaller collection of actual observations drawn from a population.

<img src="./images/sample-population.png" alt="sample-population" width=500 align="left" />

### Surveys (Random Sampling) vs. Experiments (Random Assignment)

**Random sampling (Survey)** is a procedure designed to ensure that each potential observation in the population has an equal chance of being selected in a survey.
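The definition above can be sketched with Python's standard `random` module. The population of 1,000 student IDs, the sample size of 50, and the seed are assumptions chosen only for illustration.

```python
import random

# Hypothetical population: ID numbers standing in for 1,000 college students
population = list(range(1, 1001))

# A fixed seed is used only so that the sketch gives repeatable output
rng = random.Random(42)

# sample() draws without replacement, and every ID has the same chance
# of being selected
sample = rng.sample(population, k=50)

print(len(sample))
print(sorted(sample)[:10])
```

In a real survey, the hard part is usually obtaining a complete list of the population to draw from, not the draw itself.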

Estimating the average anxiety score for all college students probably would not generate much interest. Instead, we might be interested in determining whether relaxation training causes, on average, a reduction in anxiety scores between two groups of otherwise similar college students.

College students in the relaxation experiment probably are not a random sample from any intact population of interest, but rather a convenience sample consisting of volunteers from a limited pool of students fulfilling a course requirement. Accordingly, our focus shifts from random sampling to the random assignment of volunteers to the two groups.

**Random assignment (Experiment)** is a procedure designed to ensure that each person has an equal chance of being assigned to any group in an experiment.
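One way to sketch random assignment for the relaxation experiment is to shuffle the volunteers and split the shuffled list in half. The volunteer labels, the group size, and the seed below are hypothetical.

```python
import random

# Hypothetical convenience sample: 20 volunteers from a course subject pool
volunteers = [f"volunteer_{i}" for i in range(1, 21)]

# A fixed seed is used only so that the sketch gives repeatable output
rng = random.Random(7)

# Shuffle a copy so the original list stays untouched
shuffled = volunteers[:]
rng.shuffle(shuffled)

# The first half receives relaxation training; the rest form the control group
half = len(shuffled) // 2
relaxation_group = shuffled[:half]
control_group = shuffled[half:]

print(relaxation_group)
print(control_group)
```

Note that the volunteers themselves are not a random sample. Only their placement into the two groups is random.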

<img src="./images/assignment-survey.png" alt="assignment-survey" width=500 align="left" />

Indicate whether each of the following terms is associated primarily with a survey (S) or an experiment (E).

- random assignment
- representative
- generalization to the population
- control group
- real difference
- random selection
- convenience sample
- volunteers

## Three Types of Data

The precise form of a statistical analysis often depends on whether data are **qualitative**, **ranked**, or **quantitative**.

- **Qualitative data** consist of words (Yes or No), letters (Y or N), or numerical codes (0 or 1) that represent a class or category.
- **Ranked data** consist of numbers (1st, 2nd, . . . 40th place) that represent relative standing within a group.
- **Quantitative data** consist of numbers (weights of 238, 170, . . . 185 lbs) that represent an amount or a count.

Indicate whether each of the following terms is qualitative, ranked, or quantitative.

- ethnic group
- age
- family size
- academic major
- sexual preference
- IQ score
- net worth (dollars)
- third-place finish
- gender
- temperature

## Levels of Measurement

The level of measurement specifies the extent to which a number (or word or letter) actually represents some attribute and, therefore, has implications for the appropriateness of various arithmetic operations and statistical procedures.

For our purposes, there are three levels of measurement—**nominal**, **ordinal**, and **interval/ratio**—and these levels are paired with **qualitative**, **ranked**, and **quantitative** data, respectively.

### Qualitative Data and Nominal Measurement

If people are classified as either male or female (or coded as 1 or 2), the data are qualitative and measurement is nominal. The single property of nominal measurement is **classification**—that is, sorting observations into different classes or categories.

A distinctive feature of nominal measurement is its bare-bones representation of any attribute. For instance, a student is either male or female. Even with the introduction of arbitrary numerical codes, such as 1 for male and 2 for female, it would never be appropriate to claim that, because female is 2 and male is 1, females have twice as much gender as males. Similarly, **calculating an average with these numbers would be meaningless**.
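A short sketch makes the point concrete, using the arbitrary codes from the text (1 for male, 2 for female) on a handful of made-up observations. Counting categories is meaningful; averaging the codes produces a number that corresponds to no category.

```python
from collections import Counter

# Arbitrary codes from the text: 1 = male, 2 = female
# (the eight observations themselves are invented for illustration)
codes = [1, 2, 2, 1, 2, 2, 1, 2]
labels = {1: "male", 2: "female"}

# Classification supports counting observations per category
counts = Counter(labels[code] for code in codes)
most_common_category, _ = counts.most_common(1)[0]

# The arithmetic runs without error, but the result (1.625) has no meaning:
# it depends entirely on which arbitrary codes were chosen
meaningless_mean = sum(codes) / len(codes)

print(counts)
print(most_common_category)
print(meaningless_mean)
```

Recoding the same data as 0 and 1, or 10 and 20, would change the mean but leave the counts untouched, which is why only the counts carry information.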

### Ranked Data and Ordinal Measurement

When any single number indicates only relative standing, such as first, second, or tenth place in a horse race or in a class of graduating seniors, the data are ranked and the level of measurement is ordinal. The distinctive property of ordinal measurement is **order**.

Since ordinal measurement fails to reflect the actual distance between adjacent ranks, **simple arithmetic operations with ranks are inappropriate**. For example, it’s inappropriate to conclude that the arithmetic mean of ranks 1 and 3 equals rank 2, since this assumes that the actual distance between ranks 1 and 2 equals the distance between ranks 2 and 3.
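The sketch below uses hypothetical finishing times for a three-horse race to show what ranks discard. The horse names and times are invented for illustration.

```python
# Hypothetical finishing times in seconds (invented for illustration)
finish_times = {"A": 120.0, "B": 120.5, "C": 131.0}

# Rank 1 goes to the fastest (smallest) time
ordered = sorted(finish_times, key=finish_times.get)
ranks = {horse: position for position, horse in enumerate(ordered, start=1)}

# Adjacent ranks always differ by exactly 1, but the underlying gaps do not
gap_1_to_2 = finish_times[ordered[1]] - finish_times[ordered[0]]
gap_2_to_3 = finish_times[ordered[2]] - finish_times[ordered[1]]

print(ranks)
print(gap_1_to_2, gap_2_to_3)
```

Here the gap between first and second place is half a second, while the gap between second and third is more than ten seconds. The ranks alone preserve the order and nothing else.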

### Quantitative Data and Interval/Ratio Measurement

The distinctive properties of interval/ratio measurement are **equal intervals** and a **true zero**.

**Equal intervals** imply that hefting a 10-lb weight while on the bathroom scale always registers your actual weight plus 10 lbs.

A **true zero** signifies that the bathroom scale registers 0 when not in use, that is, when weight is completely absent.

### Measurement of Nonphysical Characteristics

In the absence of a true zero, it would be inappropriate to claim that an IQ score of 140 represents twice as much intellectual aptitude as an IQ score of 70.

Other interpretations are possible. One is to treat IQ scores as attaining only ordinal measurement: a score of 140 represents more intellectual aptitude than a score of 130, but the actual size of the difference is left unspecified.

### Summary

<img src="./images/data-types.png" alt="data-types" width=500 align="left" />

## Types of Variables

A variable is a characteristic or property that can take on different values.

### Discrete and Continuous Variables

- A **discrete variable** consists of isolated numbers separated by gaps.
- A **continuous variable** consists of numbers whose values, at least in theory, have no restrictions.

### Independent and Dependent Variables

- In an experiment, an **independent variable** is the treatment manipulated by the investigator.
- When a variable is believed to have been **influenced by the independent variable**, it is called a **dependent variable**. 

Unlike the independent variable, the dependent variable isn’t manipulated by the investigator. Instead, it represents an outcome: the data produced by the experiment.

With just a little practice, you should be able to identify these two types of variables. In an experiment, what is being manipulated by the investigator at the outset and, therefore, qualifies as the independent variable? What is measured, counted, or recorded by the investigator at the completion of the study and, therefore, qualifies as the dependent variable? Once these two variables have been identified, they can be used to describe the problem posed by the study; that is, does the independent variable cause a change in the dependent variable?

<img src="./images/obs-experiment.png" alt="observation-experiment" width=500 align="left" />