# Introduction
Most datasets that a data scientist or analyst works with contain variables that describe a set of observations. For example, we might record the age, gender, and political party of a group of people (the observations). In tabular data (e.g., a spreadsheet), variables are represented by the columns. The types of variables in a dataset strongly shape the insights we can gain from it. This is why it is important to understand variable types, and how different variables offer different perspectives and functionality within our data.

Generally, variables come in two varieties: categorical and quantitative. Categorical variables group observations into separate categories that can be ordered or unordered. Quantitative variables, on the other hand, are expressed numerically, as either a count or a measurement.

![](https://static-assets.codecademy.com/Courses/Hypothesis-Testing/vartypes_article.png)

Let’s dive a bit deeper into the different variable types to understand how to identify them in a dataset.

## Quantitative Variables
We can think of quantitative variables as any information about an observation that can only be described with numbers. Quantitative variables are generally counts or measurements of something (e.g., the number of points earned in a game, or a person’s height). They are well suited to mathematical operations and quantitative analysis, and they help answer questions like “How many/much?”, “What is the average?”, or “How often?”. There are two types of quantitative variables, discrete and continuous, and each serves a different function in a dataset.

## Discrete Variables
Discrete quantitative variables are numeric values that represent counts, so they can only take on non-negative whole-number values. They represent whole units that cannot be broken down into smaller pieces, and as such cannot be meaningfully expressed with decimals or fractions. Examples of discrete variables are the number of children in a person’s family or the number of times a person flips a coin. Unless we are working with quantum mechanics, we cannot meaningfully have flipped a coin 3.5 times, or have 4.75 sisters.

When working with discrete variables in a dataset, you may see something similar to these values:

|  |team_wins  | num_goals_season | num_fouls_season |num_of_players|
|-----------|----------------|------------------|------------------|---|
| Flaskers  | 4              | 21               | 8                | 2 |
| Pythons   | 5              | 15               | 13               | 4 |
| Coders    | 10             | 17               | 9                | 5 |
| Julias    | 3              | 18               | 7                | 3 |


When inspecting a dataset for discrete variables, ask yourself whether the values would still make sense if you added 0.5 to them. In the table above, half of a win, player, goal, or foul would not be possible in any soccer match!
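The same check can be applied programmatically as a rough first pass. The sketch below assumes each column arrives as a plain Python list; `looks_discrete` is a hypothetical helper, and passing it only means every value is a whole number. That is necessary but not sufficient for a count: a whole-number ID or a rounded measurement would also pass.

```python
def looks_discrete(values):
    """Heuristic: True if every value in the column is a whole number.

    This cannot prove a variable is a count; it only rules out columns
    that contain fractional values.
    """
    return all(float(v).is_integer() for v in values)


# num_goals_season, copied from the soccer table above
num_goals_season = [21, 15, 17, 18]
# Heights, copied from the table in the next section
height = [76.03, 85.201, 77.34, 72.13]

print(looks_discrete(num_goals_season))  # True
print(looks_discrete(height))            # False
```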

## Continuous Variables
Continuous quantitative variables are numeric measurements that can be expressed with decimal precision. In theory, a continuous variable can take on infinitely many values within a given range. Examples of continuous variables are length, weight, and age, which can all be described with decimal values.

|          | weight | age    | height | temperature |
|----------|--------|--------|--------|-------------|
| Michael  | 61.28  | 21.5   | 76.03  | 36.21       |
| McKensey | 83.1   | 27.13  | 85.201 | 37.3        |
| Joel     | 69.7   | 34.901 | 77.34  | 36.918      |
| Barry    | 56.310 | 31.5   | 72.13  | 37.594      |


Let’s take a look at the `height` variable, which describes a person’s height. This variable, like the others in the table, can take on infinitely many values: a person’s height could be 1.8 meters, or more precisely 1.800232 meters, or even more precisely 1.8002322344124 meters.

![](https://static-assets.codecademy.com/Courses/Hypothesis-Testing/height_gif.gif)

Sometimes the line between discrete and continuous variables can be blurry. For example, age recorded with decimal values is a continuous variable, but age *rounded to the nearest whole year* is, by definition, discrete. The precision with which something is recorded can therefore determine how we classify the variable.
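To see how recording precision changes the variable type, the sketch below rounds the decimal ages from the table above to whole years. It is plain Python, and the variable names are illustrative.

```python
# Ages (in years) copied from the table above, recorded with decimal precision
ages = [21.5, 27.13, 34.901, 31.5]

# Rounding to the nearest whole year turns a continuous measurement
# into a discrete one. Python's round() uses round-half-to-even, so
# 21.5 -> 22 and 31.5 -> 32, but 22.5 would round down to 22.
ages_whole_years = [round(a) for a in ages]

print(ages_whole_years)  # [22, 27, 35, 32]
```

If “age in whole years” should instead mean completed years, as it usually does in everyday speech, `math.floor` would be the appropriate choice instead of `round`.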

## Categorical Variables
Categorical variables differ from quantitative variables in that they focus on the different ways data can be grouped rather than counted or measured. With categorical variables, we want to understand how the observations in our dataset can be grouped and separated from one another based on their attributes. When the groupings have a specific order or ranking, the variable is an ordinal categorical variable. If there is no apparent order or ranking to the categories, we refer to the variable as a nominal categorical variable.

## Ordinal Variables
Have you ever worked with a column whose values were groups that were greater or lesser than one another in some intrinsic way? Suppose a variable contains responses to the question “Rate your agreement with the statement: The minimum age to drive should be lowered.” The response options are “strongly disagree”, “disagree”, “neutral”, “agree”, and “strongly agree”. Because these responses have a clear order in terms of agreement, “strongly disagree” < “disagree” < “neutral” < “agree” < “strongly agree”, we consider the variable to be ordinal.
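One practical consequence of a variable being ordinal is that its categories should be sorted by their intrinsic order rather than alphabetically. A minimal sketch, assuming the five response options above and a hypothetical list of collected responses:

```python
# Response levels listed from lowest to highest agreement
LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical survey responses
responses = ["agree", "strongly disagree", "neutral", "strongly agree", "disagree"]

# Sort by each category's position in the agreement scale
print(sorted(responses, key=lambda r: RANK[r]))
# ['strongly disagree', 'disagree', 'neutral', 'agree', 'strongly agree']

# Order comparisons between categories are meaningful for an ordinal variable
print(RANK["agree"] > RANK["neutral"])  # True
```

A plain `sorted(responses)` would order the strings alphabetically, which has nothing to do with agreement.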

Other examples of ordinal variables could be the standings in a sporting competition, age ranges, and customer ratings of a product or service.

|           | place | company_seniority | age_group | customer_rating |
|-----------|-------|-------------------|-----------|-----------------|
| A. Jacobs | 1     | junior            | 20-25     | very_satisfied  |
| McKensey  | 3     | senior            | 35-40     | satisfied       |
| Joel      | 7     | executive         | 30-35     | satisfied       |
| Barry     | 4     | mid               | 50-55     | very_satisfied  |


When working with ordinal variables, keep in mind that the differences between adjacent categories are not necessarily equal. In the table above, each group in the `age_group` column spans five years, so the groups are evenly spaced. However, the same logic cannot be applied to the `customer_rating` variable: it is not accurate to assume that the difference between “satisfied” and “very satisfied” is the same as the difference between “dissatisfied” and “very dissatisfied”. This is a key difference between ordinal variables and discrete quantitative variables, where the gap between 2 and 3 is always the same as the gap between 3 and 4.
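The contrast between `age_group` and `customer_rating` can be sketched in code. Because the age groups are equal-width numeric ranges, computing a midpoint for each is a reasonable approximation. The rating levels, by contrast, only support order comparisons. The full four-level rating scale below is an assumption, since only two levels appear in the table.

```python
# age_group values copied from the table above
age_group = ["20-25", "35-40", "30-35", "50-55"]


def midpoint(age_range):
    """Return the midpoint of a 'low-high' range string such as '20-25'."""
    low, high = (int(part) for part in age_range.split("-"))
    return (low + high) / 2


print([midpoint(g) for g in age_group])  # [22.5, 37.5, 32.5, 52.5]

# Assumed full scale for customer_rating (only two levels appear in the table)
RATING_LEVELS = ["very_dissatisfied", "dissatisfied", "satisfied", "very_satisfied"]
rating_rank = {level: i for i, level in enumerate(RATING_LEVELS)}

# Valid: order comparisons between rating levels
print(rating_rank["very_satisfied"] > rating_rank["satisfied"])  # True

# Not meaningful: a rank difference of 1 between two levels does not
# imply an equal gap in actual satisfaction between them.
```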

## Nominal Variables
Nominal categorical variables are those variables with two or more categories that do not have any relational order. Examples of nominal categories could be states in the U.S., brands of computers, or ethnicities. Notice how for each of these variables, there is no intrinsic ordering that distinguishes a category as greater than or less than another category.

|        | pet_type | color  | favorite_food  | adoption_city |
|--------|----------|--------|----------------|---------------|
| Fluffy | cat      | orange | fish_pate      | Sacramento    |
| Bruno  | dog      | white  | peanut_butter  | Mt. Shasta    |
| Alfie  | bird     | blue   | sunflower_seed | San Francisco |
| Bitsy  | turtle   | green  | apple          | Los Angeles   |


The number of possible values for a nominal variable can be quite large. It’s even possible that a nominal categorical variable will take on a unique value for every observation in a dataset, like in the case of unique identifiers such as `name` or `email_address`.
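A quick way to flag identifier-like nominal columns is to compare the number of distinct values with the number of observations. `is_identifier_like` is a hypothetical helper, and the check is only informative on reasonably large datasets: in the four-row table above, even `pet_type` happens to be unique for every row.

```python
def is_identifier_like(values):
    """Heuristic: True if every observation has a distinct value."""
    return len(set(values)) == len(values)


# Pet names copied from the table above
name = ["Fluffy", "Bruno", "Alfie", "Bitsy"]
# Hypothetical, longer pet_type column with repeated categories
pet_type = ["cat", "dog", "bird", "turtle", "dog", "cat", "dog"]

print(is_identifier_like(name))      # True
print(is_identifier_like(pet_type))  # False
```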

Sometimes, identifying a nominal variable can be tricky if that variable has attributes that are ordinal or quantitative. For example, the `adoption_city` variable above is nominal; however, we could order the cities by a city-specific attribute such as yearly average temperature. We might do this if we wanted to build a model that predicts whether a particular animal will be adopted, and we believed that temperature is relevant to that prediction. If temperature is the *only* thing we care about with respect to adoption city, we could assign an order to `adoption_city` based on temperature (and rename the variable to something like `adoption_city_temp`). Alternatively, we could create a new ordinal variable named `city_rel_temp` that is derived entirely from `adoption_city`, as shown below:

| adoption_city | city_rel_temp |
|---------------|---------------|
| Sacramento    | cool          |
| Mt. Shasta    | coldest       |
| San Francisco | warm          |
| Los Angeles   | warmest       |


Now we can see that `city_rel_temp` is an ordinal variable, because its categories are ordered by the cities’ relative temperatures.
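Deriving `city_rel_temp` amounts to a lookup from the nominal column into the ordinal one. The sketch below copies the mapping from the table above and assumes the order coldest < cool < warm < warmest, which the labels imply but the table does not state explicitly.

```python
# Mapping copied from the table above
CITY_REL_TEMP = {
    "Sacramento": "cool",
    "Mt. Shasta": "coldest",
    "San Francisco": "warm",
    "Los Angeles": "warmest",
}
# Assumed order of the derived categories, from coldest to warmest
TEMP_ORDER = ["coldest", "cool", "warm", "warmest"]
TEMP_RANK = {label: i for i, label in enumerate(TEMP_ORDER)}

adoption_city = ["Sacramento", "Mt. Shasta", "San Francisco", "Los Angeles"]

# Derive the new ordinal column from the nominal one
city_rel_temp = [CITY_REL_TEMP[city] for city in adoption_city]
print(city_rel_temp)  # ['cool', 'coldest', 'warm', 'warmest']

# The cities can now be ordered by relative temperature
by_temp = sorted(adoption_city, key=lambda city: TEMP_RANK[CITY_REL_TEMP[city]])
print(by_temp)  # ['Mt. Shasta', 'Sacramento', 'San Francisco', 'Los Angeles']
```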

## Binary Variables
Binary (or dichotomous) variables are a special kind of nominal variable with only two categories. Because there are only two possible values, they are mutually exclusive: each observation takes exactly one of them. For example, a variable describing whether a picture contains a cat or a dog is binary if every picture contains exactly one of the two animals. In that case, if the picture is not of a dog, it must be of a cat, and vice versa. Binary variables can also be encoded with numbers, like bits with values of 0 or 1, or with Boolean values of True or False.


|             | status | is_awake  | is_stable |
|-------------|--------|-----------|-----------|
| patient_101 | 1      | TRUE      | No        |
| patient_304 | 0      | TRUE      | Yes       |
| patient_107 | 1      | FALSE     | No        |
| patient_514 | 1      | FALSE     | Yes       |
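Because the same binary idea can be encoded in several ways, as the table shows, it is common to normalize such columns to a single representation before analysis. A minimal sketch in plain Python; `to_bool` and the spellings it accepts are illustrative assumptions, not a standard API.

```python
# Spellings treated as True / False; extend these sets for other encodings
TRUE_VALUES = {"1", "true", "yes"}
FALSE_VALUES = {"0", "false", "no"}


def to_bool(value):
    """Normalize common binary encodings (1/0, true/false, yes/no) to bool."""
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognized binary value: {value!r}")


# Columns copied from the table above
is_stable = ["No", "Yes", "No", "Yes"]
status = [1, 0, 1, 1]

print([to_bool(v) for v in is_stable])  # [False, True, False, True]

# bool is a subclass of int in Python, so summing counts the True values
print(sum(to_bool(v) for v in status))  # 3
```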
