<img src="./images/banner.png" width="800">

# Types of Data

In the field of statistics, data is the foundation upon which all analyses and conclusions are built. Understanding the different types of data is crucial for selecting appropriate statistical methods, interpreting results accurately, and making informed decisions based on the available information.


It is essential to understand the different types of data for several reasons:

1. **Choosing Appropriate Statistical Methods**: Different types of data require different statistical approaches. By understanding the nature of your data, you can select the most suitable methods for analysis, such as descriptive statistics, hypothesis testing, or regression analysis.

2. **Interpreting Results Accurately**: The type of data you are working with influences how you interpret the results of your analysis. For example, the mean is an appropriate measure of central tendency for quantitative data, while the mode is more suitable for qualitative data.

3. **Avoiding Common Pitfalls**: Misidentifying the type of data can lead to incorrect analyses and misleading conclusions. By understanding the characteristics of each data type, you can avoid common pitfalls and ensure the validity of your results.

4. **Communicating Findings Effectively**: Knowing the type of data you are dealing with helps you communicate your findings clearly and accurately to others. This is particularly important when presenting results to stakeholders or collaborating with colleagues from different fields.


In this lecture, we will explore the two main categories of data: **qualitative (categorical)** data and **quantitative (numerical)** data. We will define and provide examples of each type, and discuss how to analyze them effectively.


We will also dive into the two subtypes of quantitative data: discrete and continuous data. Understanding the differences between these subtypes is essential for selecting appropriate statistical methods and graphical representations.


<img src="./images/types-of-data.png" width="800">

Furthermore, we will discuss the levels of measurement, which describe the nature of the data and the relationships between values. The four levels of measurement are nominal, ordinal, interval, and ratio.


By the end of this lecture, you will have a solid understanding of the different types of data and their importance in statistical analysis. This knowledge will serve as a foundation for further learning and application of statistical concepts in various fields.

**Table of contents**<a id='toc0_'></a>    
- [Qualitative (Categorical) Data](#toc1_)    
  - [Nominal Data](#toc1_1_)    
  - [Ordinal Data](#toc1_2_)    
- [Quantitative (Numerical) Data](#toc2_)    
  - [Discrete Data](#toc2_1_)    
  - [Continuous Data](#toc2_2_)    
- [Differences Between Qualitative and Quantitative Data](#toc3_)    
  - [Data Collection Methods](#toc3_1_)    
  - [Data Analysis Techniques](#toc3_2_)    
  - [Graphical Representations](#toc3_3_)    
- [Levels of Measurement](#toc4_)    
  - [Nominal Level](#toc4_1_)    
  - [Ordinal Level](#toc4_2_)    
  - [Interval Level](#toc4_3_)    
  - [Ratio Level](#toc4_4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_'></a>[Qualitative (Categorical) Data](#toc0_)

Qualitative data, also known as categorical data, represents characteristics or attributes that cannot be measured numerically. This type of data is typically used to describe qualities or categories.


Some key characteristics of qualitative data include:

1. **Descriptive**: Qualitative data describes qualities or attributes, such as colors, types, or opinions.
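2. **Non-numerical**: Qualitative data is not measured on a numerical scale. Categories may be coded as numbers (for example, 1 = single, 2 = married), but those codes are only labels, and arithmetic on them is not meaningful.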
3. **Categories**: Qualitative data is often organized into distinct categories or groups.


Qualitative data can be further divided into two subtypes: nominal data and ordinal data.


<img src="./images/qualitative-data.png" width="800">

### <a id='toc1_1_'></a>[Nominal Data](#toc0_)


Nominal data is a type of qualitative data where the categories have no inherent order or ranking. The categories are mutually exclusive and exhaustive, meaning that each data point can only belong to one category, and all possible categories are included.


Examples of nominal data include:
- Eye color (blue, brown, green)
- Marital status (single, married, divorced)
- Brand preferences (Coke, Pepsi, Dr. Pepper)


When analyzing nominal data, you can use the following techniques:

1. **Frequency Distribution**: Count the number of observations in each category to create a frequency distribution table or graph, such as a bar chart or pie chart.
2. **Mode**: Determine the most frequently occurring category or categories in the dataset.
3. **Chi-Square Test**: Use a chi-square test to determine if there is a significant association between two nominal variables.
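
The three techniques above can be sketched in plain Python. The data are hypothetical, and `chi_square_statistic` is an illustrative helper written for this example, not a library function. It computes only the Pearson chi-square statistic; judging significance additionally requires comparing that statistic with the chi-square distribution.

```python
from collections import Counter

# Hypothetical sample of eye colors (illustrative data, not from a real survey).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "green"]

# Frequency distribution: count the observations in each category.
frequencies = Counter(eye_colors)
total = len(eye_colors)
for category, count in frequencies.most_common():
    print(f"{category:<6} {count:>2}  {count / total:.1%}")

# Mode: every category that ties for the highest count.
highest = max(frequencies.values())
modes = sorted(c for c, n in frequencies.items() if n == highest)
print("Mode(s):", modes)


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows are two groups, columns are "prefers Coke" / "prefers Pepsi".
observed = [[30, 20],
            [20, 30]]
chi2 = chi_square_statistic(observed)
print(f"Chi-square statistic: {chi2:.2f}")
```

For this table every expected count is 25, so the statistic is 4.00, which exceeds the 5% critical value of about 3.84 for one degree of freedom. In practice a statistics library would normally be used to obtain the p-value directly.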


### <a id='toc1_2_'></a>[Ordinal Data](#toc0_)


Ordinal data is a type of qualitative data where the categories have a natural order or ranking. However, the differences between categories are not necessarily equal or measurable.


Examples of ordinal data include:
- Educational attainment (high school, bachelor's degree, master's degree, doctorate)
- Survey responses (strongly disagree, disagree, neutral, agree, strongly agree)
- Economic status (low, medium, high)


When analyzing ordinal data, you can use the following techniques in addition to those used for nominal data:

1. **Median**: Calculate the middle value in the ordered dataset to determine the median.
2. **Percentiles**: Find the category at or below which a given percentage of the ordered observations fall, or report the percentage of observations at or below a chosen category.
3. **Spearman's Rank Correlation**: Use Spearman's rank correlation to measure the strength and direction of the relationship between two ordinal variables.
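
As a minimal sketch of the median and a cumulative percentage for ordinal data, the example below encodes each category by its position in the ordered scale. The survey responses are hypothetical, and the names `scale` and `rank_of` are introduced here only for illustration. The rank codes carry order only; the gaps between them are not assumed to be equal.

```python
import statistics

# Ordered categories of a hypothetical 5-point survey item, lowest to highest.
scale = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
rank_of = {label: rank for rank, label in enumerate(scale)}

# Hypothetical responses (illustrative data only).
responses = ["agree", "neutral", "strongly agree", "disagree", "agree",
             "agree", "neutral"]

# Encode each response by its rank so that sorting respects the category order.
ranks = [rank_of[r] for r in responses]

# median_low always returns an observed value, so it maps back to a real category.
median_label = scale[statistics.median_low(ranks)]
print("Median response:", median_label)

# Share of responses at or below "neutral" (a cumulative percentage).
at_or_below_neutral = sum(r <= rank_of["neutral"] for r in ranks) / len(ranks)
print(f"At or below neutral: {at_or_below_neutral:.0%}")
```

Using `median_low` rather than `median` avoids averaging two rank codes, which would produce a value that corresponds to no category.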


Understanding the differences between nominal and ordinal data is crucial for selecting appropriate statistical methods and accurately interpreting the results of your analysis.

## <a id='toc2_'></a>[Quantitative (Numerical) Data](#toc0_)

Quantitative data, also known as numerical data, represents measurements or quantities that can be expressed using numbers. This type of data is used to describe measurable characteristics or attributes.


Some key characteristics of quantitative data include:

1. **Numerical**: Quantitative data is expressed using numbers and can be used in mathematical operations.
2. **Measurable**: Quantitative data represents measurable quantities or values, such as height, weight, or temperature.
3. **Continuous or Discrete**: Quantitative data can be either continuous (able to take any value within a range, so the possible values cannot be listed one by one) or discrete (restricted to a finite or countably infinite set of separate values, such as whole-number counts).


Quantitative data can be further divided into two subtypes: discrete data and continuous data.


<img src="./images/quantitative-data.png" width="800">

### <a id='toc2_1_'></a>[Discrete Data](#toc0_)


Discrete data is a type of quantitative data that can take only a finite or countably infinite set of separate values. Discrete data often represents whole numbers or counts.


Examples of discrete data include:
- Number of children in a family (0, 1, 2, 3, etc.)
- Number of cars sold per day at a dealership
- Number of students in a classroom


When analyzing discrete data, you can use the following techniques:

1. **Frequency Distribution**: Create a frequency distribution table or graph, such as a bar chart or histogram, to visualize the distribution of the data.
2. **Measures of Central Tendency**: Calculate the mean (average), median (middle value), and mode (most frequent value) to describe the center of the data distribution.
3. **Measures of Dispersion**: Calculate the range (difference between the maximum and minimum values), variance, and standard deviation to describe the spread of the data.
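
The measures of central tendency and dispersion listed above can be computed with Python's standard `statistics` module. This is a small sketch on hypothetical data; note that `variance` and `stdev` use the sample formulas (dividing by n - 1), which is an assumption about how the data were obtained.

```python
import statistics

# Hypothetical number of children in ten families (illustrative data only).
children = [0, 1, 2, 2, 3, 1, 2, 0, 4, 2]

# Measures of central tendency.
mean = statistics.mean(children)
median = statistics.median(children)
mode = statistics.mode(children)

# Measures of dispersion.
data_range = max(children) - min(children)
variance = statistics.variance(children)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(children)      # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.3f}, std dev={std_dev:.3f}")
```

If the ten families were the entire population of interest rather than a sample, `statistics.pvariance` and `statistics.pstdev` would be the matching choices.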


### <a id='toc2_2_'></a>[Continuous Data](#toc0_)


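Continuous data is a type of quantitative data that can take any value within a range, so there are infinitely many possible values between any two observations. Continuous data often represents measurements, which can be fractional and are limited in practice only by the precision of the measuring instrument.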


Examples of continuous data include:
- Height of individuals in a population
- Time taken to complete a task
- Temperature readings throughout the day


When analyzing continuous data, you can use the following techniques in addition to those used for discrete data:

1. **Histograms**: Create a histogram to visualize the distribution of continuous data by dividing the data into intervals or bins.
2. **Density Plots**: Use density plots to represent the probability density function of the continuous data.
3. **Measures of Central Tendency and Dispersion**: Calculate the mean, median, range, variance, and standard deviation to describe the center and spread of the continuous data distribution. The mode is usually reported for binned data (the modal interval), since exact values rarely repeat.
4. **Correlation and Regression**: Use correlation and regression analysis to examine the relationship between two or more continuous variables.
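
To make the binning behind a histogram and the idea of correlation concrete, here is a minimal sketch in plain Python. The height and weight values are hypothetical, and `histogram_counts` and `pearson_r` are illustrative helpers written for this example; a plotting or statistics library would normally handle both steps.

```python
import math

# Hypothetical heights (cm) and weights (kg) of eight people (illustrative only).
heights = [158.2, 162.5, 165.0, 170.3, 171.8, 175.4, 180.1, 184.6]
weights = [52.0, 58.5, 61.2, 66.0, 68.4, 72.3, 79.5, 85.0]


def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width bins; bin k covers [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:  # values outside the binned range are ignored
            counts[k] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Three 10 cm bins: [155, 165), [165, 175), [175, 185).
counts = histogram_counts(heights, start=155, width=10, n_bins=3)
for k, count in enumerate(counts):
    low = 155 + 10 * k
    print(f"{low}-{low + 10} cm: {'#' * count}")

r = pearson_r(heights, weights)
print(f"Pearson r: {r:.3f}")
```

The choice of bin width changes the shape of a histogram, so it is worth trying a few widths before drawing conclusions about the distribution.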


Understanding the differences between discrete and continuous data is essential for selecting appropriate statistical methods, creating meaningful visualizations, and interpreting the results of your analysis accurately.

## <a id='toc3_'></a>[Differences Between Qualitative and Quantitative Data](#toc0_)

Qualitative and quantitative data differ in their nature, collection methods, analysis techniques, and graphical representations. Understanding these differences is crucial for effectively collecting, analyzing, and interpreting data in various fields.


### <a id='toc3_1_'></a>[Data Collection Methods](#toc0_)


1. **Qualitative Data**:
   - Qualitative data is typically collected through methods that allow for open-ended responses and detailed descriptions.
   - Common data collection methods include interviews, focus groups, observations, and open-ended survey questions.
   - These methods allow participants to express their thoughts, opinions, and experiences in their own words.

2. **Quantitative Data**:
   - Quantitative data is collected through structured methods that yield numerical or measurable responses.
   - Common data collection methods include closed-ended surveys, experiments, and systematic observations.
   - These methods often involve predetermined response options or scales, ensuring that the data can be easily quantified and analyzed.


### <a id='toc3_2_'></a>[Data Analysis Techniques](#toc0_)


1. **Qualitative Data**:
   - Qualitative data analysis focuses on identifying themes, patterns, and relationships within the data.
   - Common analysis techniques include content analysis, thematic analysis, and narrative analysis.
   - These techniques involve coding and categorizing the data, allowing researchers to draw meaningful conclusions and insights.

2. **Quantitative Data**:
   - Quantitative data analysis involves using statistical methods to describe, summarize, and draw inferences from the data.
   - Common analysis techniques include descriptive statistics (e.g., mean, median, standard deviation), inferential statistics (e.g., t-tests, ANOVA, regression), and hypothesis testing.
   - These techniques allow researchers to identify significant relationships, differences, and trends within the data.


### <a id='toc3_3_'></a>[Graphical Representations](#toc0_)


1. **Qualitative Data**:
   - Graphical representations of qualitative data focus on visualizing categories, themes, or relationships.
   - Common graphical representations include word clouds, concept maps, and tree diagrams.
   - These visualizations help to communicate the key findings and insights from the qualitative analysis.

2. **Quantitative Data**:
   - Graphical representations of quantitative data focus on displaying the distribution, central tendency, and variability of the data.
   - Common graphical representations include bar charts, histograms, scatter plots, and box plots.
   - These visualizations help to summarize and communicate the key features and relationships within the quantitative data.


It is important to note that some research projects may involve collecting and analyzing both qualitative and quantitative data, known as mixed-methods research. This approach allows researchers to gain a more comprehensive understanding of the topic by leveraging the strengths of both data types.


By understanding the differences between qualitative and quantitative data in terms of collection methods, analysis techniques, and graphical representations, researchers can make informed decisions when designing studies, analyzing data, and communicating their findings effectively.

## <a id='toc4_'></a>[Levels of Measurement](#toc0_)

Levels of measurement, also known as scales of measurement, describe the nature of the data and the relationships between values. Understanding the level of measurement is essential for selecting appropriate statistical methods and interpreting the results accurately. There are four levels of measurement: nominal, ordinal, interval, and ratio.


<img src="./images/levels-of-measurement.png" width="800">

### <a id='toc4_1_'></a>[Nominal Level](#toc0_)


- Nominal level data is the lowest level of measurement and represents categories or labels with no inherent order or numerical value.
- Examples of nominal level data include gender (male, female), marital status (single, married, divorced), and eye color (blue, brown, green).
- Nominal level data can be counted and described using frequencies and percentages.
- Appropriate measures of central tendency for nominal data include the mode (most frequent category).
- Statistical tests suitable for nominal data include chi-square tests and Fisher's exact test.


### <a id='toc4_2_'></a>[Ordinal Level](#toc0_)


- Ordinal level data represents categories with a natural order or ranking, but the differences between categories are not necessarily equal or measurable.
- Examples of ordinal level data include educational attainment (high school, bachelor's, master's, doctorate), survey responses (strongly disagree, disagree, neutral, agree, strongly agree), and economic status (low, medium, high).
- Ordinal level data can be counted, described using frequencies and percentages, and ranked.
- Appropriate measures of central tendency for ordinal data include the median (middle value) and mode.
- Statistical tests suitable for ordinal data include Spearman's rank correlation, Kendall's tau, and Mann-Whitney U test.


### <a id='toc4_3_'></a>[Interval Level](#toc0_)


- Interval level data represents numerical values where the differences between values are meaningful and consistent, but there is no true zero point.
- Examples of interval level data include temperature measured in Celsius or Fahrenheit, dates on a calendar, and IQ scores.
- Differences between interval level values are meaningful, so values can be added and subtracted. Ratios are not meaningful because the zero point is arbitrary: 20°C is not "twice as hot" as 10°C.
- Appropriate measures of central tendency for interval data include the mean (average), median, and mode.
- Statistical tests suitable for interval data include t-tests, ANOVA, and Pearson's correlation coefficient.
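
A short worked example shows why ratios are not meaningful at the interval level. Converting the same two temperatures from Celsius to Fahrenheit preserves their difference (up to the fixed scale factor 9/5) but changes their ratio. The helper name `celsius_to_fahrenheit` is introduced here for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

# Differences are consistent across units: 10 Celsius degrees = 18 Fahrenheit degrees.
diff_c = c_high - c_low
diff_f = f_high - f_low
print(f"Difference: {diff_c} C = {diff_f} F")

# Ratios depend on where each scale happens to place its zero.
ratio_c = c_high / c_low
ratio_f = f_high / f_low
print(f"Ratio in Celsius: {ratio_c}, ratio in Fahrenheit: {ratio_f}")
```

The Celsius ratio is 2.0 while the Fahrenheit ratio is 1.36 for the same pair of temperatures, so "twice as hot" is a property of the scale and not of the temperatures themselves.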


### <a id='toc4_4_'></a>[Ratio Level](#toc0_)


- Ratio level data represents numerical values where the differences between values are meaningful, consistent, and there is a true zero point.
- Examples of ratio level data include height, weight, age, and income.
- Ratio level data can be added, subtracted, multiplied, and divided meaningfully.
- Appropriate measures of central tendency for ratio data include the mean, median, and mode.
- Statistical tests suitable for ratio data include all tests applicable to interval data. In addition, summary statistics that rely on a true zero point, such as the geometric mean and the coefficient of variation, become meaningful.
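
The geometric mean and the coefficient of variation can be computed with the standard `statistics` module, as in the sketch below. The income figures are hypothetical. The last step illustrates one reason these summaries suit ratio data: rescaling the unit (dollars to cents) leaves the coefficient of variation unchanged.

```python
import statistics

# Hypothetical monthly incomes in dollars (illustrative data only).
incomes = [2000, 2500, 3200, 4000, 8000]

arithmetic_mean = statistics.mean(incomes)
geometric_mean = statistics.geometric_mean(incomes)

# Coefficient of variation: sample standard deviation relative to the mean.
coefficient_of_variation = statistics.stdev(incomes) / arithmetic_mean

print(f"Arithmetic mean: {arithmetic_mean}")
print(f"Geometric mean:  {geometric_mean:.1f}")
print(f"Coefficient of variation: {coefficient_of_variation:.3f}")

# The same incomes expressed in cents give the same coefficient of variation.
incomes_in_cents = [x * 100 for x in incomes]
cv_in_cents = statistics.stdev(incomes_in_cents) / statistics.mean(incomes_in_cents)
print(f"Coefficient of variation (cents): {cv_in_cents:.3f}")
```

The geometric mean comes out below the arithmetic mean here because the single large income pulls the arithmetic mean upward more strongly.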


It is important to note that the level of measurement determines the appropriate statistical methods and tests that can be used. Using statistical methods designed for a higher level of measurement on data with a lower level of measurement can lead to inaccurate or misleading results.
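
One way to keep this rule in view is a small lookup that returns the summary measures listed in this section for each level, with every level inheriting the measures of the levels below it. The names `LEVELS`, `NEW_MEASURES`, and `permitted_measures` are hypothetical and exist only for this sketch.

```python
# Levels ordered from lowest to highest; each level inherits the measures below it.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Measures that first become appropriate at each level, as listed in this section.
NEW_MEASURES = {
    "nominal": ["mode"],
    "ordinal": ["median"],
    "interval": ["mean"],
    "ratio": ["geometric mean", "coefficient of variation"],
}


def permitted_measures(level):
    """Return every summary measure this section lists for the given level."""
    position = LEVELS.index(level)  # raises ValueError for an unknown level
    measures = []
    for name in LEVELS[: position + 1]:
        measures.extend(NEW_MEASURES[name])
    return measures


for level in LEVELS:
    print(f"{level:<8} -> {', '.join(permitted_measures(level))}")
```

For example, `permitted_measures("ordinal")` returns `["mode", "median"]`, which excludes the mean because the gaps between ordinal categories are not assumed to be equal.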


By understanding the levels of measurement and their properties, researchers can make informed decisions when collecting data, selecting statistical methods, and interpreting the results of their analyses.