Copyright 2020 Andrew M. Olney and made available under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0) for text and [Apache-2.0](http://www.apache.org/licenses/LICENSE-2.0) for code.

# Data Science and the Nature of Data

## Types of variables

Structured data begins with **measurements** of some property of things in the real world; the property being measured is called a **variable**.
Consider height as an example.
Suppose I measure 10 people and find that their heights in centimeters are:

| Height (cm) |
|-------------|
| 165    |
| 188    |
| 153    |
| 164    |
| 150    |
| 190    |
| 169    |
| 163    |
| 165    |
| 190    |

Each of these values (e.g. 165) is a measurement of the variable *height*.
We call *height* a variable because its value isn't constant.
If everyone in the world were the same height, we wouldn't call height a variable, and we also wouldn't bother measuring it, because we'd know everyone is the same.
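As a minimal sketch, the ten measurements from the table can be stored in a plain Python list. The check below confirms that the values are not all identical, which is exactly what makes height a variable rather than a constant. The name `heights_cm` is our own choice for illustration.

```python
# The ten height measurements (in centimeters) from the table above
heights_cm = [165, 188, 153, 164, 150, 190, 169, 163, 165, 190]

# Height is a variable because the measured values differ from person to person
distinct_values = sorted(set(heights_cm))
is_variable = len(distinct_values) > 1

print("Distinct values:", distinct_values)
print("Varies across people:", is_variable)  # True
print("Range:", min(heights_cm), "to", max(heights_cm), "cm")  # 150 to 190 cm
```

If every entry in the list were the same number, `is_variable` would be `False` and there would be nothing to analyze.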

Variables come in different **types**, and the type determines which comparisons and calculations make sense in your analysis.

### Nominal

A nominal variable consists of unordered categories, like *male* or *female* for biological sex.
Notice that these categories are not numbers, and there is no order to the categories.
We do not say that male comes before female or is smaller than female.
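One way to make "unordered categories" concrete in Python is a plain `Enum`, sketched below under our own assumptions: the class name `Sex` and the five observations are hypothetical, made up for illustration. Counting categories and finding the most common one are meaningful for nominal data, but ordering is not, and a plain `Enum` enforces that by refusing `<`.

```python
from collections import Counter
from enum import Enum

# A nominal variable: a fixed set of unordered categories (hypothetical example)
class Sex(Enum):
    MALE = "male"
    FEMALE = "female"

# Hypothetical observations, made up for illustration
observations = [Sex.FEMALE, Sex.MALE, Sex.FEMALE, Sex.FEMALE, Sex.MALE]

# Meaningful operations: testing equality and counting each category
counts = Counter(observations)
most_common_category, most_common_count = counts.most_common(1)[0]
print(counts[Sex.FEMALE], counts[Sex.MALE])  # 3 2
print("Mode:", most_common_category.value)   # female

# Not meaningful: ordering. Plain Enum members do not support <, so this raises TypeError
try:
    Sex.MALE < Sex.FEMALE
    ordering_supported = True
except TypeError:
    ordering_supported = False
print("Ordering supported:", ordering_supported)  # False
```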

### Ordinal

Ordinal variables consist of ordered categories.
You can think of them as nominal data with an ordering from first to last or smallest to largest.
A common example of ordinal data is responses to Likert questions such as:

```
(1) Strongly disagree
(2) Disagree
(3) Neither agree nor disagree
(4) Agree
(5) Strongly agree
```

Even though these options are numbered 1 to 5, the numbers only indicate which option comes before another, not how "big" an option is.
For example, *Agree* and *Disagree* are two steps apart, as are *Neither agree nor disagree* and *Strongly agree*, but we wouldn't say that those two differences are the same size.
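The sketch below models the Likert options as an `Enum` whose members can be compared by position but are never added or averaged. The class name `Likert`, its member names, and the five responses are hypothetical, chosen for illustration. Sorting and the median depend only on order, so they are safe summaries of ordinal data.

```python
import statistics
from enum import Enum

# An ordinal variable: categories with an order but no meaningful distances
class Likert(Enum):
    STRONGLY_DISAGREE = 1
    DISAGREE = 2
    NEITHER = 3
    AGREE = 4
    STRONGLY_AGREE = 5

    # Compare by position only, so responses can be sorted
    def __lt__(self, other):
        if not isinstance(other, Likert):
            return NotImplemented
        return self.value < other.value

# Hypothetical responses, made up for illustration
responses = [Likert.AGREE, Likert.STRONGLY_AGREE, Likert.DISAGREE,
             Likert.AGREE, Likert.NEITHER]

# Sorting and the median rely only on order, so they are meaningful here
ordered = sorted(responses)
median_response = statistics.median_low(responses)
print([r.name for r in ordered])
print("Median response:", median_response.name)  # AGREE
```

We use `statistics.median_low` because it always returns one of the actual responses; averaging the two middle codes, as an ordinary median does for an even count, would quietly assume the options are evenly spaced.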

### Interval

Interval variables are ordered *and* their measurement scales are evenly spaced.
A classic example is temperature in Fahrenheit.
In degrees Fahrenheit, the difference between 70 and 71 is the same as the difference between 90 and 91; in either case it is one degree.
The other important characteristic of interval variables, and the most confusing one, is that they don't have a meaningful zero value.
Degrees Fahrenheit is an example: there's nothing special about 0 degrees.
0 degrees doesn't mean there's no temperature or no heat energy; it's just an arbitrary point on the scale.
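A short sketch can show both properties at once. Converting Fahrenheit to Celsius with the standard formula C = (F - 32) × 5/9 rescales every one-degree step by the same factor, so equal differences stay equal, yet the two scales put their zero in different places. The helper name `fahrenheit_to_celsius` is our own.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Equal spacing: a one-degree step is the same size anywhere on the scale
step_low = 71 - 70
step_high = 91 - 90
print(step_low == step_high)  # True

# Converting to Celsius rescales every step by the same factor (5/9),
# so equal differences in Fahrenheit stay equal in Celsius
c_step_low = fahrenheit_to_celsius(71) - fahrenheit_to_celsius(70)
c_step_high = fahrenheit_to_celsius(91) - fahrenheit_to_celsius(90)
print(math.isclose(c_step_low, c_step_high))  # True

# The zero point, however, is arbitrary: 0 degrees F is not 0 degrees C
print(round(fahrenheit_to_celsius(0), 2))  # -17.78
```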

### Ratio

Ratio variables are like interval variables but with meaningful zeros.
Age and height are good examples: an age of 0 means you have no age, and a height of 0 means you have no height.
The name *ratio* reflects that you can form meaningful ratios with these variables, so you can say that someone aged 20 is twice as old as someone aged 10.
Notice you can't say that about degrees Fahrenheit: 100 degrees is not really twice as hot as 50 degrees, because 0 degrees Fahrenheit doesn't mean "no temperature."
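The sketch below puts numbers on this contrast. For age, the ratio 20/10 stays 2 whether we measure in years or months, because both units share the same zero. For temperature, we swap in the Kelvin scale (not mentioned in the original), whose zero is absolute zero, to show that 100 °F is only about 10% "hotter" than 50 °F in absolute terms. The helper name `fahrenheit_to_kelvin` is our own.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins, a scale whose zero is absolute zero."""
    return (f - 32) * 5 / 9 + 273.15

# Ratio variable: a ratio survives a change of units because both scales share the same zero
age_ratio_years = 20 / 10
age_ratio_months = (20 * 12) / (10 * 12)
print(age_ratio_years, age_ratio_months)  # 2.0 2.0

# Interval variable: the "ratio" of two Fahrenheit readings depends on the arbitrary zero
naive_ratio = 100 / 50
kelvin_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)
print(naive_ratio, round(kelvin_ratio, 3))  # 2.0 1.098
```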


