## 1. Introduction to Statistics

**"Data"** and **"information"** are often used interchangeably, but the two terms differ subtly in meaning and purpose.

* <ins>**Data**</ins>: A collection of individual facts or statistics; Raw form of knowledge.
    * <ins>Can come in various forms</ins>: text, observation, figures, numbers, graphs, *etc.*
        * <ins>Examples</ins>: Individual prices, weights, addresses, names, temperatures, dates, distances, *etc.*
    * <ins>Types of Data</ins>:
        * <ins>Quantitative Data</ins>: Deals with numerical data (*e.g.* salary, weight, *etc.*).
        * <ins>Qualitative Data</ins>: Deals with non-numerical data (*e.g.* name, gender, *etc.*).
* <ins>**Information**</ins>: Knowledge gained through study, communication, research, or instruction; Result of analyzing and interpreting pieces of data.
    * <ins>Examples</ins>:
        * <ins>Data</ins>: Temperature readings in a location.
        * <ins>Information</ins>: Seasonal temperature patterns derived from those readings.
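
As a minimal sketch of the data-versus-information distinction, the snippet below takes a handful of made-up temperature readings (the raw data) and averages them by season (the information). The readings, the `SEASONS` month mapping (Northern Hemisphere meteorological seasons), and the function name `seasonal_means` are illustrative assumptions, not part of any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (month number, temperature in °C) readings.
readings = [
    (1, 2.0), (1, 4.0), (2, 3.0),
    (4, 12.0), (5, 16.0),
    (7, 27.0), (8, 25.0),
    (10, 13.0), (11, 7.0),
]

# Assumed mapping of months to Northern Hemisphere meteorological seasons.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def seasonal_means(data):
    """Group readings by season and return the mean temperature per season."""
    by_season = defaultdict(list)
    for month, temp in data:
        by_season[SEASONS[month]].append(temp)
    return {season: mean(temps) for season, temps in by_season.items()}

if __name__ == "__main__":
    for season, avg in seasonal_means(readings).items():
        print(f"{season}: {avg:.1f} °C")
```

The individual readings say little on their own; the per-season averages are what a reader could act on.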

#### Differences between Data & Information
| **Data** | **Information** |
| :-----: | :-----: |
| A collection of facts. | Puts facts into context. |
| Raw and unorganized. | Organized. |
| Data points are individual and sometimes unrelated. | Maps out data to provide a big-picture view of how it all fits together. |
| Meaningless on its own. | Data becomes meaningful information after analysis and interpretation. |
| Doesn't depend on information. | Dependent on data. |
| Typically comes in the form of graphs, numbers, figures, or statistics. | Typically presented through words, language, thoughts, and ideas. |
| Insufficient for decision-making. | Sufficient for decision-making. |

Both **data** and **information** are critical elements in business decision-making. By understanding how these components work together, you can move your business toward a more data- and insights-driven culture.

#### What is Statistics?
<ins>**Statistics**</ins>: From [University of California - Irvine](https://www.stat.uci.edu/what-is-statistics/):
> "Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data. Statistics is a highly interdisciplinary field; research in statistics finds applicability in virtually all scientific fields and research questions in the various scientific fields motivate the development of new statistical methods and theory. In developing methods and studying the theory that underlies the methods statisticians draw on a variety of mathematical and computational tools."


#### Bonus: 'Data is...' or 'Data are...'
<ins>Articles to read</ins>:
* [Thesaurus.com](https://www.thesaurus.com/e/grammar/data-is-or-data-are/)
* The British Medical Journal
    * BMJ article - [abstract](https://www.bmj.com/content/380/bmj.p529)
    * BMJ article - [full](https://www.bmj.com/content/bmj/380/bmj.p529.full.pdf)

<ins>Quick summary from [Thesaurus.com](https://www.thesaurus.com/e/grammar/data-is-or-data-are/)</ins>:
> "The word data can be either singular or plural depending on meaning and context. In general usage, data is treated as singular when used as a mass noun to mean “information” and as plural when used to mean “individual facts.” In scientific and academic writing, data is almost always used as a plural noun. In digital technology, data is usually treated as a singular mass noun to mean “digitally stored information.”"

## 2. Types of Statistical Analysis 

* <ins>**Descriptive Statistics**</ins>: A branch of statistics that summarizes/describes data.
    * <ins>A few branches in Descriptive Statistics</ins>:
        * <ins>Measures of Central Tendency</ins>: Describes the center of the dataset (**mean**, **median**, **mode**); Measures the most common patterns of the analyzed dataset; Doesn't describe how the data is distributed within the set.
        * <ins>Measures of Variability</ins>: aka Measures of Spread; Describes the dispersion of the dataset (**variance**, **standard deviation**, **minimum** and **maximum values**); **kurtosis** and **skewness**, which describe the shape of the distribution, are often grouped here as well.
* <ins>**Inferential Statistics**</ins>: A branch of statistics that makes inferences/predictions about a **population** based on its **sample**.
    * <ins>[Population](https://www.investopedia.com/terms/p/population.asp)</ins>: A statistical term that designates the pool from which a **sample** is drawn for a study; Any selection grouped by a common feature can be considered a **population**; **Note**: The term "individual" doesn't always mean a person in statistics. An individual is a single entity in the group being studied.
    * <ins>[Sample](https://www.investopedia.com/terms/s/sample.asp)</ins>: A representative subset of a **population**; Selected based on [sampling methods](https://www.khanacademy.org/math/statistics-probability/designing-studies/sampling-methods-stats/a/sampling-methods-review); A well-chosen sample reflects the key characteristics of the **population**.
        * <ins>**Steps in Inferential Analysis**</ins>:
            1. Determine the population
            2. Draw a sample
            3. Analyze the sample data
            4. Make decisions about the entire population
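
The two branches above can be sketched with Python's standard library. The example below builds a synthetic population (an assumption made purely for illustration), draws a simple random sample, summarizes it with descriptive statistics, and then uses the sample mean with an approximate 95% confidence interval to say something about the population. The population parameters, the sample size, the helper name `describe`, and the normal-approximation multiplier of 1.96 are illustrative choices rather than a prescribed method.

```python
import random
from statistics import mean, median, mode, stdev, variance

random.seed(42)  # fixed seed so the sketch is reproducible

# Step 1: determine the population (synthetic weights in kg, assumed for illustration).
population = [round(random.gauss(70, 10), 1) for _ in range(10_000)]

# Step 2: sampling (simple random sample without replacement).
sample = random.sample(population, 200)

# Step 3: data analysis with descriptive statistics on the sample.
def describe(values):
    """Return common measures of central tendency and variability."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "variance": variance(values),  # sample variance (n - 1 denominator)
        "std_dev": stdev(values),      # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = describe(sample)

# Step 4: inference about the whole population from the sample.
# Approximate 95% confidence interval for the population mean
# (normal approximation, reasonable for a sample of this size).
std_error = summary["std_dev"] / len(sample) ** 0.5
ci_low = summary["mean"] - 1.96 * std_error
ci_high = summary["mean"] + 1.96 * std_error

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
    print(f"95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
    print(f"Actual population mean: {mean(population):.2f}")
```

Increasing the sample size generally narrows the interval, since the standard error shrinks with the square root of the sample size.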

## 3. How Statistics and Machine Learning are Related