(frames)=
# Working with data frames

Data frames are essentially tables of data. The rows are called *observations* and the columns are called *variables*.

<div class="admonition note">
<div class="title" style="background: lightblue; padding: 10px">Note</div>
<p>The word "variable" has two different meanings: a name that stores a value in R, and a column in a data frame. To reduce confusion, we will include a parenthetical note whenever referring to a column in a data frame in this subsection. Elsewhere, the distinction should be clear from context.</p>
</div>

## Outputting the data
We can output the values in a data frame stored in a variable by running the variable name all by itself.

In [9]:
airquality

Ozone,Solar.R,Wind,Temp,Month,Day
<int>,<int>,<dbl>,<int>,<int>,<int>
41,190,7.4,67,5,1
36,118,8.0,72,5,2
12,149,12.6,74,5,3
18,313,11.5,62,5,4
,,14.3,56,5,5
28,,14.9,66,5,6
23,299,8.6,65,5,7
19,99,13.8,59,5,8
8,19,20.1,61,5,9
,194,8.6,69,5,10


<div class="admonition warning">
<div class="title" style="background: pink; padding: 10px">Warning</div>
    <p>If a data frame is large, <b>R may skip some of the middle observations (rows) and / or variables (columns)</b>. This is indicated by horizontal triple dots $\cdots$ or vertical triple dots $\vdots$.</p>
</div>

### Head
The `head()` function can be used to display the first few observations (rows) of a data frame.

```head(DATA_FRAME, NUM_OBSERVATIONS)```

In [4]:
head(airquality,3)

,Ozone,Solar.R,Wind,Temp,Month,Day
,<int>,<int>,<dbl>,<int>,<int>,<int>
1,41,190,7.4,67,5,1
2,36,118,8.0,72,5,2
3,12,149,12.6,74,5,3
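
The second argument is optional. As a minimal sketch, assuming the built-in `airquality` data set: when the number of observations (rows) is left out, `head()` falls back to its default of six. The companion function `tail()`, which is not covered above, shows the last few observations (rows) instead. The names `first_six` and `last_three` are chosen here only for illustration.

```r
# With no second argument, head() returns the first six observations (rows).
first_six <- head(airquality)
nrow(first_six)

# tail() works the same way, but from the end of the data frame.
last_three <- tail(airquality, 3)
last_three
```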


## Structure
It is often helpful to know the structural properties of a data frame. The `str()` function outputs the following properties of a data frame:
- number of observations (rows)
- number of variables (columns)
- name of each variable (column)
- type of data in each variable (column)
- a few values from each variable (column)

In [1]:
str(airquality)

'data.frame':	153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
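
The output of `str()` is meant for reading rather than for further computation. If only the types are needed as a value, one possible approach (a sketch, not the only way) is to apply `class()` to every variable (column) with `sapply()`. Both are base R functions that are not introduced above, and the name `column_types` is hypothetical.

```r
# Apply class() to each variable (column); the result is a named character vector.
column_types <- sapply(airquality, class)
column_types

# Look up the type of a single variable (column) by name.
column_types[["Wind"]]
```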


Sometimes, we are only interested in a particular property of a data frame. The following functions give us access to specific properties. 

### Dimensions
The `dim()` function outputs a vector containing the number of observations (rows) followed by the number of variables (columns).

```dim(DATA_FRAME)```

Output: `c(NUM_OBSERVATIONS, NUM_VARIABLES)`

In [5]:
dim(airquality)

The `nrow()` function outputs the number of observations (rows) in a data frame.

In [6]:
nrow(airquality)
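
As a minimal sketch, assuming the built-in `airquality` data set, the results of these functions can be stored and compared. The example also uses `ncol()`, the column counterpart of `nrow()`, which is not mentioned above. The name `size` is chosen only for illustration.

```r
# dim() returns the number of observations (rows) first, then variables (columns).
size <- dim(airquality)
size              # 153 6, matching the str() output

# The individual counts are available by indexing the vector,
# or directly through nrow() and ncol().
size[1]           # number of observations (rows)
ncol(airquality)  # number of variables (columns)
```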

### Variable names
The `names()` function outputs the names of all of the variables (columns) in a data frame.

In [7]:
names(airquality)
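
Because `names()` returns an ordinary character vector, it can be indexed and searched like any other vector. The following is a small sketch assuming the built-in `airquality` data set. The `%in%` operator is part of base R but is not introduced above, and `"Humidity"` is a made-up name used to show a failed lookup.

```r
column_names <- names(airquality)

column_names[1]               # name of the first variable (column)
"Temp" %in% column_names      # TRUE: there is a variable (column) called Temp
"Humidity" %in% column_names  # FALSE: no such variable (column)
```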

## Pieces of data


### Variables
Each variable (column) in a data frame is a vector. Use the following syntax to reference a particular variable (column).

```DATA_FRAME$VARIABLE_NAME```

In [10]:
airquality$Temp
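
Since the result is a vector, the usual vector functions apply to it. A minimal sketch, assuming the built-in `airquality` data set; `class()`, `length()`, and `mean()` are base R functions, and the name `temps` is chosen only for illustration.

```r
# Store the Temp variable (column) in its own variable.
temps <- airquality$Temp

class(temps)   # "integer", as reported by str()
length(temps)  # one value per observation (row)
mean(temps)    # average of all values in the variable (column)
```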

### Specific entry
We can access a specific entry in a data frame by specifying the variable (column) and observation (row) using the following syntax.

```DATA_FRAME$VARIABLE_NAME[OBSERVATION_NUMBER]```

In [14]:
airquality$Temp[1] # First observation (row) in the Temp variable (column).

<div class="admonition note">
<div class="title" style="background: lightblue; padding: 10px">Note</div>
<p>R starts counting at one. Some other programming languages start counting at zero.</p>
</div>
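
The following sketch, assuming the built-in `airquality` data set, puts the entry syntax and the counting rule side by side. It also uses the bracket form `DATA_FRAME[ROW, COLUMN]` and the `is.na()` function, which are base R but are not covered above.

```r
# First observation (row) of the Temp variable (column).
airquality$Temp[1]          # 67

# The same entry, using the bracket form [ROW, COLUMN].
airquality[1, "Temp"]

# Index 0 does not refer to the first entry; it returns an empty vector.
length(airquality$Temp[0])  # 0

# Missing entries are stored as NA.
airquality$Ozone[5]         # NA
is.na(airquality$Ozone[5])  # TRUE
```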