# Introduction to Statistics
## Data and Information
"Data" and "information" are often used interchangeably, but they aren't the same.

* <ins>**Data**</ins>:
    * Defined as a collection of individual facts or statistics; Raw form of knowledge.
    * <ins>**Can come in various forms**</ins>: Text, observations, figures, numbers, graphs, *etc.*
    * <ins>Examples</ins>: individual prices, weights, addresses, ages, names, temperatures, dates, *etc.*
    * <ins>**Types of Data**</ins>:
        * <ins>Quantitative Data</ins>: Deals with numerical data (e.g. salary, weight, *etc.*).
        * <ins>Qualitative Data</ins>: Deals with non-numerical data (e.g. name, gender, *etc.*).
* <ins>**Information**</ins>: 
    * Defined as knowledge gained through study, communication, research, or instruction; Result of analyzing and interpreting pieces of data.
    * <ins>Examples</ins>:
        * **Data**: Temperature readings in a location.
        * **Information**: Seasonal temperature patterns derived from those readings.
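
The temperature example above can be sketched in a few lines of Python. This is only an illustration: the readings, the `SEASON_BY_MONTH` mapping (Northern Hemisphere meteorological seasons), and the `seasonal_averages` helper are all assumptions made for this sketch, not part of the original notes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (month, temperature in degrees Celsius) for one location.
readings = [
    (1, 2.0), (1, 4.0),
    (4, 12.0), (4, 14.0),
    (7, 26.0), (7, 28.0),
    (10, 13.0), (10, 11.0),
]

# Assumed mapping of months to Northern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_averages(readings):
    """Group raw readings by season and average each group."""
    by_season = defaultdict(list)
    for month, temperature in readings:
        by_season[SEASON_BY_MONTH[month]].append(temperature)
    return {season: mean(temps) for season, temps in by_season.items()}


# The individual readings are data; the per-season summary is information.
info = seasonal_averages(readings)
print(info)
```

On their own, the tuples in `readings` say little. Once they are grouped and averaged, they describe a seasonal pattern that can inform a decision.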

### Differences between Data and Information
Both are critical for business decision-making.

| **Data** | **Information** |
| :-----: | :-----: |
| A collection of facts. | Puts facts into context. |
| Raw and unorganized. | Organized. |
| Individual and sometimes unrelated. | Maps out data to provide a big-picture view of how it all fits together. |
| Meaningless on its own. | After analysis and interpretation, data becomes meaningful information. |
| Doesn't depend on information. | Depends on data. |
| Typically comes in the form of graphs, numbers, figures, or statistics. | Typically presented through words, language, thoughts, and ideas. |
| Insufficient for decision-making. | Sufficient for decision-making. |

## Statistics & Statistical Analysis
From [Quora's Bot](https://www.quora.com/What-is-the-difference-between-statistics-and-statistical):

>"Statistics" and "statistical" are related terms but have different meanings:
>
>    1. **Statistics**: Statistics is a field of study that involves collecting, analyzing, interpreting, presenting, and organizing data. It deals with the methods for collecting, summarizing, and interpreting data to make decisions and draw conclusions. Statistics involves techniques such as hypothesis testing, regression analysis, and probability theory.
>    
>    2. **Statistical**: "Statistical" is an adjective that refers to something related to statistics. For example, "statistical analysis" means the analysis of data using statistical methods, "statistical inference" refers to drawing conclusions from data using statistical techniques, and "statistical significance" indicates the likelihood that a result is not due to chance. 
>    
>    In summary, "statistics" is the broader field of study, while "statistical" is an adjective used to describe things related to statistics or statistical methods.

### Why Is Statistical Analysis Important?
* Helps us collect data using proper methods and apply the correct analysis.
* Helps us conduct research and present the results effectively.
* Helps us find structure in data and make relevant predictions.
* Lets us apply statistical methods to build machine learning models.

## Types of Statistical Analysis

* <ins>**Descriptive Statistics**</ins>: Helps us summarize data.
    * Brief descriptive coefficients that summarize a given dataset.
    * <ins>**Types of Descriptive Statistics**</ins>:
        * <ins>Measures of Central Tendency</ins>: Focuses on the average or middle values of datasets.
            * Describes the center position of a distribution for a dataset.
            * An analyst looks at the frequency of each data point in the distribution and describes it using the **mean**, **median**, or **mode**, which capture the most typical values of the dataset.
            * <ins>Example</ins>: The dataset (2, 3, 4, 5, 6) has 5 values that sum to 20:
                * **Mean**: 4 (20/5)
                * **Median**: 4 (the middle value of the sorted data)
                * **Mode**: None (no value repeats)
        * <ins>Measures of Variability</ins>: Focuses on the dispersion of data.
            * AKA **Measures of Dispersion**; Aids in analyzing how dispersed the distribution is for a set of data.
            * Includes **range**, **standard deviation**, **variance**, **minimum** and **maximum values**, **kurtosis**, and **skewness**.
            * Although the **Measures of Central Tendency** give the average of a dataset, they don't describe how the data is distributed within the set.
                * <ins>Example</ins>: Even if the average of the data is 65 out of 100, there can still be data points at both 1 and 100. **Measures of Variability** help communicate this by describing the shape and spread of the dataset.
                * <ins>Consider the following dataset</ins>: (5, 19, 24, 62, 91, 100) - The range of this dataset is 95 (the highest number minus the lowest number: 100 - 5).
        * Can you use **descriptive statistics** to make inferences or predictions? No, that job is for **inferential statistics**.
* <ins>**Inferential Statistics**</ins>: Helps us make inferences and predictions about a **population** based on a **sample**.
    * **Population**: Collection of individuals/events whose properties are to be analyzed, and relationships to be identified.
    * **Sample**: Subset of the **population**; Selected based on sampling methods; A well-chosen (representative) sample carries most of the information about the population.
    * *Inferential* means that something can be concluded from evidence. Inferential statistics focuses on processing **sample** data so that you can draw conclusions or make decisions about the **population**.
        * Often preferred because it can produce accurate estimates at a relatively affordable cost compared with measuring the entire population.
    * <ins>**Key steps in Inferential Analysis**</ins>:
        1. Determine Population
        2. Sampling
        3. Data Analysis
        4. Decision-making for the entire population
    * **Advantages of Inferential Statistics**:
        * A precise tool for estimating population characteristics.
        * Highly structured analytical method.
    * **Examples of Inferential Statistics**:
        * <ins>Regression Analysis</ins>: Used to model the relationship between **independent variables** (aka **features**) and the **dependent variable** (aka **target**). *Example*: Identifying the factors that influence the decline in poverty, using variables such as road length, economic growth, electrification ratio, number of teachers, number of medical personnel, *etc.*
        * <ins>Hypothesis Test</ins>: Helps us assess whether the data support or contradict a claim we believe to be true. *Example*: Testing the claim that women are more addicted to Instagram than men.
        * <ins>Confidence Intervals</ins>: Estimate a **population** value as a range by using **samples**. *Example*: Estimating the average expenditure for the entire city.
        * <ins>Time Series Analysis</ins>: Predict future events on the basis of pre-existing data. *Example*: Estimating future economic growth.
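
The two families of statistics above can be sketched with Python's standard library. The first half recomputes the figures from the two example datasets. The second half walks through the four inferential steps on a simulated population. That population, the sample size of 200, the fixed seed, and the 1.96 multiplier (a normal approximation for a 95% confidence interval) are all assumptions made for illustration, not a prescribed method.

```python
import random
import statistics
from collections import Counter

# --- Descriptive statistics: summarize the data at hand ---
central = [2, 3, 4, 5, 6]           # dataset from the central-tendency example
spread = [5, 19, 24, 62, 91, 100]   # dataset from the variability example

mean_value = statistics.mean(central)       # 20 / 5 = 4
median_value = statistics.median(central)   # middle of the sorted values = 4

# Convention assumed here: if no value repeats, report no mode.
counts = Counter(central)
mode_value = None if max(counts.values()) == 1 else counts.most_common(1)[0][0]

value_range = max(spread) - min(spread)     # 100 - 5 = 95
variance = statistics.pvariance(spread)     # population variance of the listed values
std_dev = statistics.pstdev(spread)         # population standard deviation

# --- Inferential statistics: estimate a population value from a sample ---
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# 1. Determine the population (simulated, purely hypothetical expenditures).
population = [rng.gauss(500, 100) for _ in range(10_000)]

# 2. Sampling: draw a simple random sample without replacement.
sample = rng.sample(population, 200)

# 3. Data analysis: sample mean and an approximate 95% confidence interval.
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / len(sample) ** 0.5
ci_low = sample_mean - 1.96 * std_error
ci_high = sample_mean + 1.96 * std_error

# 4. Decision-making: use the interval as the estimate of the population mean.
print(f"mean={mean_value}, median={median_value}, mode={mode_value}, range={value_range}")
print(f"estimated population mean: {sample_mean:.1f} ({ci_low:.1f} to {ci_high:.1f})")
```

The descriptive half only describes the values it is given. The inferential half uses 200 sampled values to say something about all 10,000, and that estimate carries uncertainty.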

## Understanding the Types of Data
### Types of Data
* <ins>**Qualitative Data**</ins>: Takes values in a set of categories; Can be used as labels; Mathematical computation can't be performed on these values.
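    * <ins>Nominal Data</ins>:
        * Categories with no inherent order (e.g. name, gender, blood type).
    * <ins>Ordinal Data</ins>:
        * Categories with a meaningful order, but without a fixed distance between them (e.g. ratings such as low, medium, high).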
* <ins>**Quantitative Data**</ins>: Takes on numerical values (these values can be measured/counted); Mathematical computations are possible for this data type.
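    * <ins>Discrete Data</ins>:
        * Countable values, typically whole numbers (e.g. number of students in a class).
    * <ins>Continuous Data</ins>:
        * Measurable values that can take any value within a range (e.g. weight, temperature).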
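
As a rough illustration of the four sub-types listed above, the sketch below shows which operations make sense for each. The survey records, field names, and the `SATISFACTION_ORDER` ranking are hypothetical values chosen only for this example.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records used only for illustration.
records = [
    {"blood_type": "A", "satisfaction": "low", "children": 0, "height_cm": 162.5},
    {"blood_type": "O", "satisfaction": "high", "children": 2, "height_cm": 175.0},
    {"blood_type": "O", "satisfaction": "medium", "children": 1, "height_cm": 168.5},
]

# Nominal: labels with no order; counting them is meaningful, arithmetic is not.
blood_counts = Counter(r["blood_type"] for r in records)

# Ordinal: labels with an order, which has to be stated explicitly.
SATISFACTION_ORDER = ["low", "medium", "high"]
sorted_satisfaction = sorted(
    (r["satisfaction"] for r in records),
    key=SATISFACTION_ORDER.index,
)

# Discrete: countable whole numbers, so sums are meaningful.
total_children = sum(r["children"] for r in records)

# Continuous: measured values that can fall anywhere within a range.
average_height = mean(r["height_cm"] for r in records)

print(blood_counts, sorted_satisfaction, total_children, average_height)
```

Averaging `blood_type` would be meaningless, while averaging `height_cm` is routine. This is the practical difference between qualitative and quantitative data.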