# Introduction and Descriptive Statistics

## Welcome to Statistics: Statistics in Everyday Life

Statistics isn't just something confined to textbooks or academic research. It's a fundamental tool for understanding and interpreting the world around us. We use statistical thinking and concepts constantly, from interpreting weather forecasts to understanding economic trends.

### Examples in Daily Conversations and News

Statistics appear throughout everyday conversations and news coverage:
*   **Weather Forecasts:** When a weather report says there's a 35% chance of rain, that probability is derived from statistical models analyzing historical weather data.
*   **Housing Affordability:**  News reports about rising housing costs rely on statistical analysis of real estate prices over time.
*   **Unemployment Rates:**  The unemployment rate is a key economic statistic, calculated by surveying a sample of the population. Statements about how the rate is changing (rising or falling) involve statistical comparisons.
*   **Job Market Trends:** Discussions about millennials' job preferences (e.g., seeking part-time work) are based on statistical surveys and analysis of employment data.
*   **Sharing Economy:** Reported trends and preferences in the sharing economy are derived from statistical data.
*   **Salaries:**
    *   Average starting salaries for specific professions (e.g., business analysts).
    *   Comparisons of salaries between different professions (e.g., engineers vs. economists).
    *   Determining the highest-paid individuals (e.g., in Hollywood or sports).
*   **Crime Rates:**  Reports about increases or decreases in crime rates (e.g., in Chicago) involve statistical analysis of crime data.
*   **Sports:**  Analyzing player performance (e.g., batting averages, scoring averages) relies heavily on statistics.

### Statistical Concepts and Terminology
Several key statistical concepts and terms come up regularly:

*   **Average (Mean):**  A measure of central tendency.  We talk about average income, average age, average height, etc.
*   **Maximum:** The highest value in a dataset (e.g., the highest-paid athlete, the maximum temperature).
*   **Minimum:** The lowest value in a dataset (e.g., the lowest unemployment rate).
*   **Percentage:**  A way of expressing a proportion as a fraction of 100 (e.g., the percentage of females studying engineering).
*   **Likelihood (Probability):**  The chance of an event occurring (e.g., the chance of rain).
*   **Variance/Standard Deviation:**  Measures of how spread out data is (e.g., the consistency of a stock's performance).
*   **T-test:** A statistical test used to compare the means of two groups (e.g., comparing spending habits of men and women).
*   **Median:** The middle value of a sorted dataset.
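
As a minimal sketch of how several of these terms translate into calculations, the snippet below uses Python's standard `statistics` module. The `salaries` list is made up for illustration and is not data from these notes.

```python
import statistics

# Hypothetical salaries (in thousands) for a small team; illustrative only.
salaries = [42, 45, 47, 50, 52, 55, 120]

mean_salary = statistics.mean(salaries)      # average (mean)
median_salary = statistics.median(salaries)  # middle value of the sorted data
highest = max(salaries)                      # maximum
lowest = min(salaries)                       # minimum
variance = statistics.variance(salaries)     # sample variance (divides by n - 1)
std_dev = statistics.stdev(salaries)         # sample standard deviation

# Percentage: share of salaries above 50, expressed out of 100.
share_above_50 = 100 * sum(s > 50 for s in salaries) / len(salaries)

print(f"mean={mean_salary:.1f}, median={median_salary}, "
      f"min={lowest}, max={highest}")
print(f"variance={variance:.1f}, std dev={std_dev:.1f}, "
      f"above 50: {share_above_50:.1f}%")
```

In this made-up sample, the single large value (120) pulls the mean well above the median, which is one reason both measures are usually reported together.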

### Statistics in the News Media
News organizations frequently use statistics to report on trends and events:

*   **Election Polls:**  Polls predict election outcomes by surveying a sample of voters. The results are presented with margins of error, which are statistical measures of uncertainty.
*   **Economic Indicators:**  Reports on economic growth, inflation, and other indicators are based on statistical data.
*   **Development Statistics:**  Comparisons of housing prices, economic development, or other metrics across countries rely on statistical analysis.
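
To make the idea of a margin of error concrete, here is a rough sketch using the common textbook approximation for a sample proportion, `z * sqrt(p * (1 - p) / n)`, which assumes a simple random sample. The poll figures and the function name `margin_of_error` are hypothetical; real polling organizations typically apply further adjustments.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample; z = 1.96 corresponds to a 95%
    confidence level under the normal approximation.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 1,000 respondents, 52% favour candidate A.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe

print(f"{p_hat:.0%} ± {moe:.1%} (roughly {low:.1%} to {high:.1%})")
```

With these made-up numbers the margin is about three percentage points, so the interval includes 50% and the poll alone would not settle which candidate is ahead.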

## Types of Data in Statistics

Before starting any statistical analysis, it's crucial to understand the type of data you're working with. The data type dictates which statistical tools and techniques are appropriate and meaningful.

### Types of Data Based on Collection
We can classify data based on _how_ it was collected. 
1. **Cross-Sectional Data:** 
    * Measurements taken at a single point in time. It's like a snapshot of a population or phenomenon. 
    * **Examples:**
        * A census conducted in a specific year.
        * Student evaluations of a course at the end of a semester. 
        * A survey of customer satisfaction conducted on a particular day.
2. **Panel Data:**
    * The _same_ individuals or entities are measured repeatedly over time. It tracks changes within the same subjects.
    * **Example:** A group of individuals is surveyed annually for five years and asked the same set of questions each year. This allows researchers to track changes in opinions, behaviors, or characteristics over time within the same group.
3. **Time Series Data:**
    * Measurements of a _single_ variable are taken at regular intervals over time. It focuses on the evolution of a single phenomenon.
    * **Examples:**
        * Monthly unemployment rates recorded over several decades.
        * Daily closing stock prices for a particular company.
        * Hourly temperature readings at a weather station.
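
One way to see the difference between the three collection types is to look at the shape of the records. The sketch below uses plain Python lists of dictionaries; every value, and the field names `unit`, `period`, and `value`, are invented for illustration.

```python
# Cross-sectional: many units, one point in time.
cross_section = [
    {"unit": "person A", "period": "2023", "value": 52_000},
    {"unit": "person B", "period": "2023", "value": 61_000},
    {"unit": "person C", "period": "2023", "value": 47_000},
]

# Time series: one variable for one unit, many points in time.
time_series = [
    {"unit": "national", "period": "2023-01", "value": 5.1},
    {"unit": "national", "period": "2023-02", "value": 5.0},
    {"unit": "national", "period": "2023-03", "value": 4.9},
]

# Panel: the same units observed at several points in time.
panel = [
    {"unit": "person A", "period": "2022", "value": 50_000},
    {"unit": "person A", "period": "2023", "value": 52_000},
    {"unit": "person B", "period": "2022", "value": 58_000},
    {"unit": "person B", "period": "2023", "value": 61_000},
]

def describe(rows):
    """Return (number of distinct units, number of distinct periods)."""
    units = {row["unit"] for row in rows}
    periods = {row["period"] for row in rows}
    return len(units), len(periods)

for name, rows in [("cross-section", cross_section),
                   ("time series", time_series),
                   ("panel", panel)]:
    n_units, n_periods = describe(rows)
    print(f"{name}: {n_units} unit(s), {n_periods} period(s)")
```

Counting distinct units and periods gives a quick check: many units and one period suggests cross-sectional data, one unit and many periods suggests a time series, and several of both suggests a panel.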

### Types of Data Based on Number of Variables
1. **Univariate Data:** The dataset has only _one_ variable.
    * **Example:** A list of heights of students in a class.
2. **Multivariate Data:** The dataset has _multiple_ variables.
    * **Example:** A dataset with information on students' height, weight, age, and GPA.

### Types of Variables
1. **Categorical (Nominal) Variables:** 
    * Data that represents categories or groups _without_ any inherent order or ranking.
    * **Subtypes:**
        * **Binary (dichotomous, sometimes called binomial):** Only two categories (e.g., own/rent, yes/no, true/false).
        * **Multinomial:** More than two categories (e.g., modes of transportation: car/bus/walk/bike).
2. **Ordinal Variables:**
    * Categorical data where the categories _do_ have a meaningful order or ranking.
    * **Key Characteristics:**
        * Categories can be compared (one category is higher/lower, better/worse than another).
        * Differences between categories are not necessarily equal or meaningful. You know the order, but not the precise distance between ranks.
        * Arithmetic summaries such as the mean are often inappropriate because the distances between categories are undefined; the median and mode remain meaningful.
    * **Examples:**
        * Number of cars owned by a household, recorded in bands (0, 1, 2, 3+). The open-ended top band is what makes this ordinal rather than a plain count.
        * Customer satisfaction ratings (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied).
        * Education level (high school, bachelor's, master's, doctorate).
3. **Ratio Variables:**
    * Numerical data with a _natural zero point_. This means that zero represents the absence of the quantity being measured.
    * **Key Characteristics:**
        * Ordered, with meaningful differences between values.
        * Ratios are meaningful (e.g., $100 is twice as much as $50).
        * All standard mathematical operations (addition, subtraction, multiplication, division) are valid.
    * **Examples:** Sales (in dollars), distance, weight, age, income, height.
4. **Interval Variables:**
    * Numerical data that is ordered and the _differences_ between values are meaningful, but there is _no natural zero point_.
    * **Key Characteristics:**
        * Ordered, with meaningful differences.
        * Differences are meaningful, but ratios are not. For example, 50°C is 25 degrees warmer than 25°C, but it is not "twice as hot".
        * Zero is arbitrary; it doesn't represent the absence of the quantity.
    * **Examples:** Temperature (0°C doesn't mean "no temperature"), calendar years (Year 0 is arbitrary), IQ scores (0 IQ doesn't mean "no intelligence").
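
A quick way to check whether ratios are meaningful is to change the unit of measurement and see whether the ratio survives. The sketch below does this for the temperature example above and for a dollar amount; the helper function names are our own, and the conversions are the standard Celsius to Fahrenheit and Celsius to Kelvin formulas.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

warm_c, hot_c = 25.0, 50.0

# Interval scale: the "ratio" depends on where the arbitrary zero sits.
ratio_c = hot_c / warm_c                                                 # 2.0
ratio_f = celsius_to_fahrenheit(hot_c) / celsius_to_fahrenheit(warm_c)  # 122 / 77
ratio_k = celsius_to_kelvin(hot_c) / celsius_to_kelvin(warm_c)          # 323.15 / 298.15

# Ratio scale: changing units (dollars to cents) leaves the ratio unchanged.
dollars_a, dollars_b = 100.0, 50.0
ratio_dollars = dollars_a / dollars_b
ratio_cents = (dollars_a * 100) / (dollars_b * 100)

print(f"Celsius ratio:    {ratio_c:.3f}")
print(f"Fahrenheit ratio: {ratio_f:.3f}")
print(f"Kelvin ratio:     {ratio_k:.3f}")
print(f"Dollars vs cents: {ratio_dollars:.1f} vs {ratio_cents:.1f}")
```

The same two temperatures give a different ratio on each scale, so "twice as hot" is an artifact of the Celsius zero. Kelvin does have a natural zero, and on that scale the ratio is roughly 1.08, not 2. The dollar ratio stays at 2 regardless of unit, as expected for a ratio variable.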

### Strongest to Weakest Form of Measurement
Ratio -> Interval -> Ordinal -> Categorical
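
One practical reading of this ranking is that each stronger level permits every summary of the weaker levels plus some new ones. The sketch below encodes that idea; the specific summaries assigned to each level follow a common textbook convention and are an assumption here, not something these notes spell out.

```python
# Measurement levels from weakest to strongest.
LEVELS = ["categorical", "ordinal", "interval", "ratio"]

# Summaries conventionally introduced at each level (assumed convention).
NEW_AT_LEVEL = {
    "categorical": ["counts", "mode"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["ratios", "coefficient of variation"],
}

def allowed_summaries(level):
    """Return the summaries usable at `level`: its own plus all weaker levels'."""
    idx = LEVELS.index(level)
    allowed = []
    for lvl in LEVELS[: idx + 1]:
        allowed.extend(NEW_AT_LEVEL[lvl])
    return allowed

for level in LEVELS:
    print(f"{level}: {', '.join(allowed_summaries(level))}")
```

For example, `allowed_summaries("ordinal")` includes the median but not the mean, which matches the caution about averaging ordinal categories.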

### Summary Table

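| Variable Type | Ordered | Meaningful Differences | Natural Zero | Meaningful Ratios | Examples                                |
| ------------- | ------- | ---------------------- | ------------ | ----------------- | --------------------------------------- |
| Categorical   | No      | No                     | No           | No                | Eye color, gender, city                 |
| Ordinal       | Yes     | No                     | No           | No                | Education level, customer satisfaction  |
| Interval      | Yes     | Yes                    | No           | No                | Temperature (°C, °F), calendar year     |
| Ratio         | Yes     | Yes                    | Yes          | Yes               | Height, weight, income, sales, distance |

The collection-based types describe how observations are gathered rather than how a variable is measured, so the columns above do not apply to them:

| Collection Type | Units Observed           | Points in Time             | Ordered in Time | Examples                                                 |
| --------------- | ------------------------ | -------------------------- | --------------- | -------------------------------------------------------- |
| Cross-Sectional | Many                     | One                        | No              | A census conducted in a specific year                    |
| Panel           | The same units each time | Several                    | Yes             | A group of individuals surveyed annually for five years  |
| Time Series     | A single variable        | Many, at regular intervals | Yes             | Monthly unemployment rate, daily closing stock prices    |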

## Measures of Central Tendency