# Modifying Data Frames in R


### Introduction

The `dplyr` package provides functions for modifying data frames. This notebook covers three of them: `mutate()`, `transmute()`, and `rename()`.

Data from the [American Kennel Club (AKC)](https://www.akc.org/) is loaded into a data frame.

In [33]:
# load libraries

library(readr)
library(dplyr)

In [34]:
dogs <- read_csv('/users/bm/downloads/csv_files/dogs.csv')


── Column specification ────────────────────────────────────────────────────────
cols(
  Breed = col_character(),
  height_low_inches = col_double(),
  height_high_inches = col_double(),
  weight_low_lbs = col_double(),
  weight_high_lbs = col_double(),
  `2016 Rank` = col_double(),
  `2015 Rank` = col_double(),
  `2014 Rank` = col_double(),
  `2013 Rank` = col_double()
)




In [35]:
# inspect data frame

head(dogs)

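| Breed | height_low_inches | height_high_inches | weight_low_lbs | weight_high_lbs | 2016 Rank | 2015 Rank | 2014 Rank | 2013 Rank |
|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Affenpinscher | 9 | 12 | 8 | 12 | 149 | 136 | 144 | 143 |
| Afghan Hound | 25 | 27 | 50 | 60 | 113 | 100 | 98 | 95 |
| Airdale Terrier | 22 | 24 | 45 | 45 | 55 | 53 | 57 | 56 |
| Akita | 26 | 28 | 80 | 120 | 46 | 46 | 46 | 45 |
| American Eskimo | 9 | 19 | 25 | 30 | 122 | 118 | 120 | 110 |
| American Foxhound | 22 | 25 | 65 | 70 | 189 | 181 | 180 | 176 |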


In [36]:
# rename columns

dogs <- dogs %>% rename(rank_2016 = `2016 Rank`,
                        rank_2015 = `2015 Rank`,
                        rank_2014 = `2014 Rank`,
                        rank_2013 = `2013 Rank`)

# list column names

colnames(dogs)

### Adding a Column

Add a new column to `dogs` named `avg_height` that is the average of `height_low_inches` and `height_high_inches`. Save this new data frame to `dogs`.

In [37]:
dogs <- dogs %>% mutate(avg_height = (height_low_inches + height_high_inches)/2)

# inspect the new data frame

head(dogs)

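| Breed | height_low_inches | height_high_inches | weight_low_lbs | weight_high_lbs | rank_2016 | rank_2015 | rank_2014 | rank_2013 | avg_height |
|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Affenpinscher | 9 | 12 | 8 | 12 | 149 | 136 | 144 | 143 | 10.5 |
| Afghan Hound | 25 | 27 | 50 | 60 | 113 | 100 | 98 | 95 | 26.0 |
| Airdale Terrier | 22 | 24 | 45 | 45 | 55 | 53 | 57 | 56 | 23.0 |
| Akita | 26 | 28 | 80 | 120 | 46 | 46 | 46 | 45 | 27.0 |
| American Eskimo | 9 | 19 | 25 | 30 | 122 | 118 | 120 | 110 | 14.0 |
| American Foxhound | 22 | 25 | 65 | 70 | 189 | 181 | 180 | 176 | 23.5 |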
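
The same pattern can be sketched on a small stand-in data frame. The `toy` and `toy_with_avg` objects below are hypothetical and reuse only the first two rows shown above. The point of the sketch is that `mutate()` returns a modified copy, so the new column exists only in whatever object the result is assigned to.

```r
library(dplyr)

# Hypothetical two-row stand-in for `dogs`, using values from the output above
toy <- data.frame(
  Breed = c("Affenpinscher", "Afghan Hound"),
  height_low_inches = c(9, 25),
  height_high_inches = c(12, 27)
)

# mutate() returns a new data frame; `toy` itself is not modified here
toy_with_avg <- toy %>%
  mutate(avg_height = (height_low_inches + height_high_inches) / 2)

ncol(toy)                # still 3 columns
toy_with_avg$avg_height  # 10.5 26.0
```

This is why the notebook assigns the result back to `dogs`: without the assignment, the extra column would only be printed, not stored.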


### Adding Multiple Columns

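Add two new columns in a single `mutate()` call: `avg_weight`, the average of `weight_low_lbs` and `weight_high_lbs`, and `rank_change_13_to_16`, the change in rank from 2013 to 2016 (`rank_2016 - rank_2013`).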

In [38]:
dogs <- dogs %>% mutate(avg_weight = (weight_low_lbs + weight_high_lbs)/2, 
                        rank_change_13_to_16 = rank_2016 - rank_2013)

# inspect the new data frame
head(dogs)

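| Breed | height_low_inches | height_high_inches | weight_low_lbs | weight_high_lbs | rank_2016 | rank_2015 | rank_2014 | rank_2013 | avg_height | avg_weight | rank_change_13_to_16 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Affenpinscher | 9 | 12 | 8 | 12 | 149 | 136 | 144 | 143 | 10.5 | 10.0 | 6 |
| Afghan Hound | 25 | 27 | 50 | 60 | 113 | 100 | 98 | 95 | 26.0 | 55.0 | 18 |
| Airdale Terrier | 22 | 24 | 45 | 45 | 55 | 53 | 57 | 56 | 23.0 | 45.0 | -1 |
| Akita | 26 | 28 | 80 | 120 | 46 | 46 | 46 | 45 | 27.0 | 100.0 | 1 |
| American Eskimo | 9 | 19 | 25 | 30 | 122 | 118 | 120 | 110 | 14.0 | 27.5 | 12 |
| American Foxhound | 22 | 25 | 65 | 70 | 189 | 181 | 180 | 176 | 23.5 | 67.5 | 13 |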
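
Expressions inside one `mutate()` call are evaluated in order, so a later column can use one defined earlier in the same call. The sketch below illustrates this on a hypothetical `toy` data frame built from two rows of the output above. The `more_popular` column is an illustrative addition, not part of the original notebook, and it assumes that rank 1 is the most popular breed, so a negative change means the breed moved up.

```r
library(dplyr)

# Hypothetical stand-in for `dogs`, using rank values from the output above
toy <- data.frame(
  Breed = c("Affenpinscher", "Airdale Terrier"),
  rank_2016 = c(149, 55),
  rank_2013 = c(143, 56)
)

toy <- toy %>%
  mutate(
    rank_change_13_to_16 = rank_2016 - rank_2013,
    # Illustrative column: reuses the column defined on the previous line
    more_popular = rank_change_13_to_16 < 0
  )

toy$rank_change_13_to_16  # 6 -1
toy$more_popular          # FALSE TRUE
```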


### Transmute Columns

The `transmute()` function from `dplyr` computes new columns and keeps only the columns named in the call. Every other existing column is dropped, which is useful when the source columns are no longer needed for analysis.

In [43]:
dogs <- read_csv('/users/bm/downloads/csv_files/dogs.csv')

# rename columns

dogs <- dogs %>% rename(rank_2016 = `2016 Rank`,
                        rank_2015 = `2015 Rank`,
                        rank_2014 = `2014 Rank`,
                        rank_2013 = `2013 Rank`)

# list column names

colnames(dogs)


── Column specification ────────────────────────────────────────────────────────
cols(
  Breed = col_character(),
  height_low_inches = col_double(),
  height_high_inches = col_double(),
  weight_low_lbs = col_double(),
  weight_high_lbs = col_double(),
  `2016 Rank` = col_double(),
  `2015 Rank` = col_double(),
  `2014 Rank` = col_double(),
  `2013 Rank` = col_double()
)




Add the columns `avg_height`, `avg_weight`, and `rank_change_13_to_16` to dogs while dropping all existing columns except `Breed`.

In [44]:
dogs <- dogs %>% transmute(Breed = Breed,
                           avg_height = (height_low_inches + height_high_inches)/2,
                           avg_weight = (weight_low_lbs + weight_high_lbs)/2,
                           rank_change_13_to_16 = rank_2016 - rank_2013)

In [45]:
head(dogs)

| Breed | avg_height | avg_weight | rank_change_13_to_16 |
|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Affenpinscher | 10.5 | 10.0 | 6 |
| Afghan Hound | 26.0 | 55.0 | 18 |
| Airdale Terrier | 23.0 | 45.0 | -1 |
| Akita | 27.0 | 100.0 | 1 |
| American Eskimo | 14.0 | 27.5 | 12 |
| American Foxhound | 23.5 | 67.5 | 13 |
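
A similar result can be obtained with `mutate()` and its `.keep` argument. The sketch below assumes dplyr 1.0.0 or later, where `.keep` is available, and uses a hypothetical `toy` data frame built from the first two rows of the raw data. Recent dplyr documentation describes `transmute()` as superseded in favor of this form. Column order may differ between the two approaches depending on the dplyr version, so the sketch compares column names as sets.

```r
library(dplyr)

# Hypothetical stand-in for `dogs`, using values from the first two rows of the data
toy <- data.frame(
  Breed = c("Affenpinscher", "Afghan Hound"),
  weight_low_lbs = c(8, 50),
  weight_high_lbs = c(12, 60)
)

# transmute() keeps only the columns named in the call
via_transmute <- toy %>%
  transmute(Breed, avg_weight = (weight_low_lbs + weight_high_lbs) / 2)

# mutate() with .keep = "none" also drops columns not created in the call
via_mutate <- toy %>%
  mutate(Breed = Breed,
         avg_weight = (weight_low_lbs + weight_high_lbs) / 2,
         .keep = "none")

# Both results contain Breed and avg_weight, and nothing else
setequal(names(via_transmute), names(via_mutate))
```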


### Rename Columns

The `rename()` function from `dplyr` updates the column names of a data frame. The new name goes on the left of `=` and the existing name on the right. Syntax:
```r
rename(new_column_name = old_column_name)
```

In [46]:
# save the column names of 'dogs' to 'original_col_names' and print it

original_col_names <- colnames(dogs)

print(original_col_names)

[1] "Breed"                "avg_height"           "avg_weight"          
[4] "rank_change_13_to_16"


Rename `avg_height` to `avg_height_inches`, `avg_weight` to `avg_weight_lbs`, and `rank_change_13_to_16` to `popularity_change_13_to_16`. Save the updated data frame to `dogs`.

In [47]:
dogs <- dogs %>% rename(avg_height_inches = avg_height,
                        avg_weight_lbs = avg_weight,
                        popularity_change_13_to_16 = rank_change_13_to_16)

Save the new column names of `dogs` to `new_col_names` and print them.

In [48]:
new_col_names <- colnames(dogs)

print(new_col_names)

[1] "Breed"                      "avg_height_inches"         
[3] "avg_weight_lbs"             "popularity_change_13_to_16"
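
When many columns follow the same naming pattern, renaming them one by one becomes repetitive. As a hedged sketch, `rename_with()` (assuming dplyr 1.0.0 or later) applies a function to the selected column names. The `toy` data frame below is a hypothetical stand-in for the raw CSV, using the Akita rank values from the data, and the renaming function is one possible way to turn names such as `2016 Rank` into `rank_2016`.

```r
library(dplyr)

# Hypothetical stand-in for the raw `dogs` data; check.names = FALSE keeps
# the non-syntactic column names exactly as they appear in the CSV
toy <- data.frame(
  Breed = "Akita",
  `2016 Rank` = 46,
  `2013 Rank` = 45,
  check.names = FALSE
)

# Turn "2016 Rank" into "rank_2016" for every column ending in " Rank"
toy <- toy %>%
  rename_with(
    function(nm) paste0("rank_", sub(" Rank$", "", nm)),
    ends_with(" Rank")
  )

colnames(toy)
```

Columns that do not match the selection, such as `Breed`, keep their original names.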
