# Plotting Different Types of Data

Visualization is an incredibly powerful way to present data. However, to generate effective plots, you need to understand the structure and organization of your data. In particular, you need to consider the different **variables** present in your data, whether they are **continuous** or **categorical**, and whether they are **nested** (often called *repeated measures*). In this lesson we'll review these different types of data and some common approaches to plotting them. 

## Continuous Data

Continuous data comprises most numeric data that you will encounter. Values can vary continuously (i.e., in very small steps) across some range (possibly infinite, but typically with practical bounds when dealing with neuroscientific or psychological data). In Python, numeric data are typically stored as integers or floating point numbers.
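As a quick illustration of how numeric data are stored, here is a minimal sketch using pandas. The DataFrame and its column names (`age_years`, `rt_ms`) are hypothetical, invented only for this example.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "age_years": [19, 23, 31, 27],           # whole numbers are stored as integers
    "rt_ms": [512.3, 478.9, 530.1, 495.6],   # decimal values are stored as floats
})

# Show how pandas stores each column
print(df.dtypes)

# List the columns that hold numeric (integer or float) data
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Checking `dtypes` is a useful first step with a new dataset, although a numeric dtype alone does not guarantee that a variable should be treated as continuous.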

### Thought Question
If you think about the Gapminder GDP dataset that we have worked with in previous lessons, what variables are continuous?

```{admonition} Click the button to reveal the solution
:class: dropdown

### Answer
- GDP is continuous. The values represent gross domestic product in millions of dollars, and are stored as floating point numbers.
- Year is also, arguably, a continuous variable. The way it is represented in the Gapminder datasets could be considered categorical (see below), but year itself is continuous and is most typically represented as integers.
```

## Categorical Data

Categorical data comprises most data that is not numeric. For example, in a drug study someone might receive an experimental drug or a placebo: it's one or the other. Or in a language study, each participant might be classified as a native English speaker or someone who learned English as a second language. Categorical data can include data that have some degree of continuity. For example, some people learn English from their parents as the only language they hear in the first year of their lives, whereas others may hear another language at home but learn English fluently from an early age, from other kids in the neighbourhood. So in many cases, we *treat* data as categorical, often for convenience, even when there are subtleties that are lost.

### Thought Question
What variable(s) in the Gapminder GDP set are categorical?

```{admonition} Click the button to reveal the solution
:class: dropdown
### Answer
- Country is definitely categorical. Each country has its own GDP values.
- As noted above, year is arguably categorical, because in the Gapminder dataset not all years are present, so the values are not truly continuous. However, we would typically still treat year as continuous in this context, because it is a unit of time.
- We also created a categorical variable called "region" in one lesson, sorting countries into northern/southern/eastern/western Europe.

```

Continuous data can sometimes be made categorical as well. For example, height is a continuous variable, but for convenience in a research study we might want to classify people as "short", "medium", or "tall" rather than recording their precise height in centimetres. In research with children, participants are often categorized into groups such as grades 1-2, grades 3-4, grades 5-6, instead of treating grade (or age) as a continuous variable. This process of turning continuous data into categorical is called **discretizing** (making discrete). It can be useful if the data you collect aren't as continuous as the possible range of values (e.g., children's academic knowledge and abilities are typically more related to their grade level than their chronological age), or for ease of generalization.
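One way to discretize a continuous variable in Python is with pandas' `pd.cut()`. The sketch below uses made-up heights, and the bin edges (160 cm and 180 cm) are arbitrary assumptions chosen only for illustration, not established cut-offs.

```python
import pandas as pd

# Hypothetical heights in centimetres, invented for illustration
heights_cm = pd.Series([150.0, 162.5, 171.0, 180.2, 193.4])

# Arbitrary bin edges (an assumption for this example):
# up to 160 cm = short, over 160 up to 180 cm = medium, over 180 cm = tall
height_group = pd.cut(
    heights_cm,
    bins=[0, 160, 180, 250],
    labels=["short", "medium", "tall"],
)

# The result is a categorical variable with one label per person
print(height_group.tolist())
print(height_group.dtype)
```

Where the bin edges are placed determines what information is lost, so it is worth choosing them deliberately and reporting them.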


## Plotting Continuous and Categorical Data

Recognizing which variables are continuous or categorical is important, because in many cases you need to plot them differently. Continuous data lend themselves to things like continuous lines (e.g., regression lines) and scatterplots. In contrast, plotting categorical data typically involves different plot "objects" for each category, such as different bars in a bar graph, or different lines in a line plot.
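The contrast can be sketched with Matplotlib: a scatterplot with a fitted straight line for two continuous variables, next to a bar graph with one bar per category. All values below are hypothetical, invented for illustration, and are not taken from the Gapminder data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical values, invented for illustration only
years = np.array([1952, 1962, 1972, 1982, 1992, 2002])   # continuous
gdp = np.array([10.0, 12.5, 16.0, 18.5, 22.0, 27.5])     # continuous
regions = ["Northern", "Southern", "Eastern", "Western"]  # categorical
mean_gdp = [25.0, 18.0, 12.0, 24.0]                       # one value per category

fig, (ax_cont, ax_cat) = plt.subplots(1, 2, figsize=(10, 4))

# Continuous vs. continuous: scatterplot plus a straight-line (degree-1) fit
slope, intercept = np.polyfit(years, gdp, 1)
ax_cont.scatter(years, gdp)
ax_cont.plot(years, slope * years + intercept)
ax_cont.set_xlabel("Year")
ax_cont.set_ylabel("GDP (hypothetical units)")

# Categorical vs. continuous: one separate bar for each category
ax_cat.bar(regions, mean_gdp)
ax_cat.set_xlabel("Region")
ax_cat.set_ylabel("Mean GDP (hypothetical units)")

fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes running; in a standalone script you would also call `plt.show()`.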

## Nested Data

Nested data occur when some measurements are "nested" or isolated inside other variables. This is very common in cognitive and neuroscience research, where we take many measurements from each individual participant (e.g., lots of experimental trials, recording from multiple electrodes). Nested data can be continuous or categorical, but the variable that they are nested inside is almost always categorical (e.g., individual people or animals). In educational research, data may be collected from children in different schools; in this case data may be nested within each child, but children are in turn nested inside schools. 

Recognizing nested data is important because, typically, there is less variability within an individual (or other nesting category) than between individuals (or other categories). As we will shortly learn, in exploratory data analysis (EDA), as well as in statistics, measures of variability are an important way that we make inferences about the data.

For example, say we run a reaction time experiment with 100 trials and 10 participants. We thus have 100 x 10 = 1000 data points. The variability is likely lower within an individual than between individuals: one participant may be on average 150 ms faster than another, but each individual may only vary by +/- 25 ms around their own average reaction time. In this case, if we pool all 1000 trials without considering the nesting structure, and treat them as independent observations, our estimate of the uncertainty in the average (e.g., the standard error of the mean) will be misleadingly low, because each individual contributes so many (similar) trials. But if we first average across the 100 trials for each participant, and then compute the variability between these 10 averages, the resulting estimate of uncertainty will most likely be much larger, and it reflects the true person-to-person variation.
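The effect of nesting can be explored with a small simulation. This is a minimal sketch, not a recommended analysis: the grand mean (500 ms), the between-participant SD (75 ms), and the within-participant SD (25 ms) are assumed values chosen for illustration. It compares the standard error of the mean computed from all 1000 trials with the one computed from the 10 participant averages.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible
n_participants, n_trials = 10, 100

# Assumed values: participants' own averages differ a lot (SD = 75 ms) ...
participant_means = rng.normal(500, 75, size=n_participants)

# ... while trials within a participant vary less (SD = 25 ms)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_trials),
    "rt_ms": rng.normal(np.repeat(participant_means, n_trials), 25),
})

# Ignoring nesting: treat all 1000 trials as independent observations
sem_pooled = df["rt_ms"].std() / np.sqrt(len(df))

# Respecting nesting: average within each participant first,
# then measure the variability between the 10 participant averages
per_participant = df.groupby("participant")["rt_ms"].mean()
sem_nested = per_participant.std() / np.sqrt(n_participants)

# Typical spread within a participant vs. spread between participants
within_sd = df.groupby("participant")["rt_ms"].std().mean()
between_sd = per_participant.std()

print(f"Within-participant SD:  {within_sd:.1f} ms")
print(f"Between-participant SD: {between_sd:.1f} ms")
print(f"SEM ignoring nesting:   {sem_pooled:.1f} ms")
print(f"SEM respecting nesting: {sem_nested:.1f} ms")
```

Under these assumptions the standard error that ignores nesting comes out much smaller, because it counts 1000 observations when there are really only 10 independent participants.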