# Chapter 2: Fundamentals of Data in Scientific Research: Variables, Collection, and Bias

<div class="alert alert-info">Learning goals</div>

1. Understand and identify the different types of variables.
2. Understand the different methods of data collection and their respective benefits and limitations.
3. Distinguish between population parameters and sample estimates.
4. Understand the implications of sampling error and bias on the quality of data and how to mitigate them.

## Introduction

In scientific research, what we can learn depends directly on the data we gather. Data, or information collected for reference or analysis, is the cornerstone that enables scientists to generate new knowledge and insight. This chapter guides you through the essentials of data: the different types of variables, how data is collected, the distinction between populations and samples, and how sampling error and bias affect data quality.

### Types of Variables

Understanding the types of variables in a dataset is the first step in effective data analysis. Variables can broadly be classified as either numerical or categorical.

1. **Numerical variables** represent measurements or counts and can be subdivided into two types:
    - **Discrete:** These variables can take only distinct, separate values (typically counts), such as the number of petals on a flower or the number of cells in a tissue sample.
    - **Continuous:** These variables can take any value within a range, such as the weight of an organism or the pH of a solution.

<div style="display:flex; justify-content:center;">
    <img src="../images/flower1.jpg" alt="Image" width="400" height="300" style="margin-left: 10px;">
</div>

2. **Categorical variables** represent qualitative data and identify the group or category to which each data point belongs. Examples include the species of a plant, the type of a disease, or the color of a bird's feathers. Each possible value of a categorical variable (e.g., blue, green, red) is called a **level** or a **group**.

<div style="display:flex; justify-content:center;">
    <img src="../images/cardinal.jpg" alt="Image" width="400" height="300" style="margin-left: 10px;">
</div>
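
To make these categories concrete, here is a minimal Python sketch that labels each column of a small, made-up dataset. The column names (`petals`, `stem_mass_g`, `color`) and the helper `variable_type` are hypothetical, and classifying a variable by how its values are stored is only a rough heuristic: numbers such as ZIP codes or patient ID numbers are stored as numbers but are categorical, so the final decision always rests on what the values mean.

```python
# Hypothetical field records: each row is one flower observed in a survey.
records = [
    {"petals": 5, "stem_mass_g": 2.31, "color": "red"},
    {"petals": 6, "stem_mass_g": 1.87, "color": "blue"},
    {"petals": 5, "stem_mass_g": 2.05, "color": "red"},
    {"petals": 4, "stem_mass_g": 2.64, "color": "green"},
]


def variable_type(values):
    """Rough heuristic: classify a column by the Python type of its values."""
    # bool is a subclass of int in Python, so check it first (yes/no is categorical).
    if all(isinstance(v, bool) for v in values):
        return "categorical"
    # Whole-number values are treated as counts, i.e. discrete.
    if all(isinstance(v, int) for v in values):
        return "numerical (discrete)"
    # Any mix of ints and floats is treated as a continuous measurement.
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical (continuous)"
    # Everything else (e.g. text labels) is treated as categorical.
    return "categorical"


for name in records[0]:
    column = [row[name] for row in records]
    print(f"{name}: {variable_type(column)}")

# The levels (groups) of a categorical variable are its distinct values.
colors = sorted({row["color"] for row in records})
print("levels of color:", colors)
```

Running it should label `petals` as discrete, `stem_mass_g` as continuous, and `color` as categorical with the levels `blue`, `green`, and `red`.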


### Data Collection Methods

Collecting accurate and relevant data is critical for scientific investigations. The two primary study designs for data collection are experimental and observational studies:

1. **Experimental studies** involve the researcher assigning subjects to specific treatments (and a control group) while controlling for outside variables. This design is useful for determining cause-and-effect relationships, but the results may be difficult to generalize to broader, real-world contexts.

<div style="display:flex; justify-content:center;">
    <img src="../images/experiment1.jpg" alt="Image" width="400" height="300" style="margin-left: 10px;">
</div>

2. **Observational studies** involve observing subjects in the real world without intervening or assigning treatments. These studies are good for identifying patterns and associations, but because they cannot control for outside variables, it is difficult to infer causality from them.

<div style="display:flex; justify-content:center;">
    <img src="../images/dolphin.jpg" alt="Image" width="400" height="300" style="margin-left: 10px;">
</div>
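
A defining step of an experimental study is that the researcher decides which subjects receive which treatment, ideally at random, so that outside variables tend to be balanced between the groups. The sketch below assumes a simple two-group design with ten hypothetical plants (`plant_1` to `plant_10`) split evenly into treatment and control; the function name `randomly_assign` is illustrative, not a standard API. An observational study has no equivalent step, because the groups are whatever the researcher happens to find.

```python
import random


def randomly_assign(subjects, seed=None):
    """Randomly split subjects into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)   # seeded generator so the assignment can be reproduced
    shuffled = list(subjects)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)       # put subjects in a random order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


plants = [f"plant_{i}" for i in range(1, 11)]
groups = randomly_assign(plants, seed=42)
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Fixing the `seed` makes the assignment reproducible, which is useful when documenting a study; omitting it gives a fresh random split on each run.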



### Video

<iframe width="462" height="260" src="https://www.youtube.com/embed/3mjK6Zzy_Mk" title="Types of studies and variables" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

[Video link](https://youtu.be/3mjK6Zzy_Mk)

### Population vs. Sample

Data can be collected from a complete group, known as the population, or a subset of that group, called a sample.

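1. **Population (N):** This is the complete set of individuals or units of interest, where N denotes the number of units in the population. Numerical characteristics calculated from the entire population, such as the population mean, are called **parameters**. A parameter is a fixed, exact value, although in practice it is usually unknown.
2. **Sample (n):** This is a subset of the population, of size n, selected for analysis. Values calculated from sample data are called **estimates**; they vary randomly from one sample to another and only approximate the corresponding population parameters.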

<div style="display:flex; justify-content:center;">
    <img src="../images/sheep.jpg" alt="Image" width="400" height="250" style="margin-left: 10px;">
</div>
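
The difference between a parameter and an estimate can be illustrated with a short simulation. In this Python sketch the "population" is a made-up flock of 500 sheep with normally distributed body masses; the mean of 70 kg, the standard deviation of 8 kg, and the sample size of 20 are arbitrary assumptions. The population mean is computed once and never changes, whereas each random sample of 20 sheep produces a slightly different estimate of it.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the simulation is reproducible

# Hypothetical population: body masses (kg) of every sheep in a flock of N = 500.
population = [rng.gauss(70, 8) for _ in range(500)]
N = len(population)

# Parameter: computed from the whole population, a single fixed value.
mu = mean(population)

# Estimates: each random sample of n = 20 sheep gives a slightly different mean.
n = 20
estimates = [mean(rng.sample(population, n)) for _ in range(5)]

print(f"population mean (parameter), N = {N}: {mu:.2f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i} mean (estimate), n = {n}: {est:.2f}")
```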



### Video
<iframe width="462" height="260" src="https://www.youtube.com/embed/w--h_vL5ZPk" title="Populations and Samples" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

[Video link](https://youtu.be/w--h_vL5ZPk)

### Sampling Errors and Bias

When dealing with samples, two key considerations are sampling error and bias:

1. **Sampling error** is the random, chance difference between a sample estimate and the true population parameter. Sampling error shrinks as the sample size grows, because larger random samples tend to be more representative of the population.
2. **Sampling bias** occurs when some members of the population are systematically more likely to be included in the sample than others. Bias can be mitigated by random sampling, in which every member of the population has an equal chance of being selected, although true randomness can be difficult to achieve in practice. Unlike sampling error, bias is not reduced by increasing the sample size.
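
Sampling error can be seen directly by simulation: draw many random samples of the same size, compute each sample's mean, and measure how much those means spread around. The sketch below uses a made-up, normally distributed population; the population, the seed, and the helper `spread_of_sample_means` are illustrative assumptions. For simple random samples, the standard error of the mean is approximately σ/√n, where σ is the population standard deviation, so the printed spread should fall by about half each time n is quadrupled (roughly 3.2, 1.6, and 0.8 here).

```python
import random
from statistics import fmean, stdev

rng = random.Random(2024)  # fixed seed so the simulation is reproducible

# Hypothetical population of 10,000 measurements (mean about 50, SD about 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]


def spread_of_sample_means(n, repeats=1000):
    """Draw `repeats` simple random samples of size n; return the SD of their means."""
    sample_means = [fmean(rng.sample(population, n)) for _ in range(repeats)]
    return stdev(sample_means)


# Each fourfold increase in n should roughly halve the spread (1/sqrt(n) scaling).
for n in (10, 40, 160):
    print(f"n = {n:3d}: typical sampling error about {spread_of_sample_means(n):.2f}")
```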

Common sources of sampling bias include:

- **Sample of convenience:** Researchers select the individuals that are easiest to collect or reach.
- **Volunteer bias:** Data comes only from members of the population who voluntarily provide data, such as respondents to a survey.
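
The following Python sketch illustrates volunteer bias, one of the sources listed above, in a hypothetical survey of weekly exercise hours. It assumes, purely for illustration, that each person's chance of volunteering equals their hours divided by 10, so people who exercise more are more likely to respond. Under that assumption the true mean is about 5 hours but the volunteers average about 6.7 hours, and the volunteer estimates should stay near 6.7 however large n becomes, whereas the random-sample estimates stay close to the true mean.

```python
import random
from statistics import fmean

rng = random.Random(7)  # fixed seed so the simulation is reproducible

# Hypothetical population: weekly exercise hours for 10,000 people.
population = [rng.uniform(0, 10) for _ in range(10_000)]
true_mean = fmean(population)

# ASSUMPTION for illustration: the more someone exercises, the more likely they
# are to answer an exercise survey (probability = hours / 10).
volunteers = [hours for hours in population if rng.random() < hours / 10]

print(f"true population mean: {true_mean:.2f}")
for n in (50, 500, 2000):
    random_est = fmean(rng.sample(population, n))     # simple random sample
    volunteer_est = fmean(rng.sample(volunteers, n))  # only people who volunteered
    print(f"n = {n:4d}  random sample: {random_est:.2f}  volunteer sample: {volunteer_est:.2f}")
```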


### Video
<iframe width="462" height="260" src="https://www.youtube.com/embed/-eIQoC18YKU" title="Sampling bias and sampling error and how to protect against it" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

[Video link](https://youtu.be/-eIQoC18YKU)

## Conclusion

Understanding data and its fundamentals is crucial for any scientific endeavor. This chapter has explored the types of variables, methods of data collection, the distinction between populations and samples, and the effects of sampling error and bias. As you move forward in your exploration of scientific research, keep these principles in mind to ensure your findings are robust, reliable, and insightful.

## End of chapter questions

1. Why is it important for scientists to understand the different types of variables in data analysis? Discuss how the classification of variables as numerical or categorical influences the choice of appropriate statistical methods and the interpretation of results. Provide examples from scientific research to support your answer.

2. Compare and contrast experimental and observational study designs, highlighting their respective strengths and limitations. In what situations would each design be most suitable for gathering scientific data?

3. Explain the distinction between a population and a sample in data collection. Discuss why researchers often work with samples rather than attempting to collect data from entire populations. What are the advantages and challenges associated with using sample data to make inferences about populations?

4. Sampling error and bias are important considerations when working with sample data. Define sampling error and explain how it impacts the accuracy of sample estimates in relation to the true population parameters. Discuss strategies that researchers can employ to reduce sampling error.

5. The chapter mentions various sources of sampling bias, such as convenience sampling and volunteer bias. Choose one of these biases and discuss its potential impact on the validity and generalizability of research findings. Suggest alternative sampling methods that researchers can employ to minimize the effects of this bias.