# 3 Selecting and Transforming Data

Learn advanced methods to select and transform columns. You'll also learn about select helpers, which are functions that specify criteria for the columns you want to choose, as well as the rename() verb.

# Selecting columns

Using the select() verb, we can answer interesting questions about our dataset by focusing on related groups of variables. The colon (:) selects a range of consecutive columns, which is useful for getting many columns at a time.

# Instructions:

- Use glimpse() to examine all the variables in the counties table.
- Select the columns state, county, and population, plus (using a colon) the five consecutive industry-related columns that describe people's work: professional, service, office, construction, and production.
- Arrange the table in descending order of service to find which counties have the highest rates of working in the service industry.

In [None]:
# Glimpse the counties table
glimpse(counties)

counties %>%
  # Select state, county, population, and industry-related columns
  select(state, county, population, professional:production) %>%
  # Arrange service in descending order
  arrange(desc(service))
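
To see what the colon actually picks up, here is a minimal sketch on a small made-up tibble. The name toy_counties and all of its values are assumptions for illustration, not the real counties data. The range professional:production is positional: it takes every column from professional through production, inclusive, and skips the men column that sits before the range.

```r
library(dplyr)

# Hypothetical stand-in for the counties table (values are made up)
toy_counties <- tibble(
  state        = c("A", "B", "C"),
  county       = c("X", "Y", "Z"),
  population   = c(1000, 2000, 3000),
  men          = c(490, 1010, 1500),
  professional = c(30, 25, 35),
  service      = c(20, 28, 15),
  office       = c(25, 22, 24),
  construction = c(10, 12, 11),
  production   = c(15, 13, 15)
)

result <- toy_counties %>%
  # professional:production selects all columns between the two, inclusive
  select(state, county, population, professional:production) %>%
  # Highest service share first
  arrange(desc(service))

names(result)
result
```

Because the range depends on column order, it only works as intended when the columns you want are stored next to each other in the table.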

# Select helpers

In the video you learned about the select helper starts_with(). Another select helper is ends_with(), which finds the columns that end with a particular string.

# Instructions:

- Select the columns state, county, population, and all those that end with work.
- Filter just for the counties where at least 50% of the population is engaged in public work.

In [None]:
counties %>%
  # Select the state, county, population, and those ending with "work"
  select(state, county, population, ends_with("work")) %>%
  # Filter for counties that have at least 50% of people engaged in public work
  filter(public_work >= 50)
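
As a rough illustration of how the helpers match column names, the sketch below uses a made-up tibble (toy_work and its values are assumptions, not the real counties data). ends_with("work") matches on the end of each column name, and starts_with() matches on the beginning.

```r
library(dplyr)

# Hypothetical stand-in with a mix of column names (values are made up)
toy_work <- tibble(
  county        = c("X", "Y"),
  private_work  = c(70, 40),
  public_work   = c(20, 55),
  self_employed = c(8, 4),
  family_work   = c(2, 1)
)

# Keeps county plus every column whose name ends in "work"
ends <- toy_work %>%
  select(county, ends_with("work"))

# Keeps county plus every column whose name starts with "public"
starts <- toy_work %>%
  select(county, starts_with("public"))

# Only rows where public_work is at least 50 remain
filtered <- ends %>%
  filter(public_work >= 50)

names(ends)
names(starts)
filtered
```

Note that self_employed is left out of the ends_with("work") selection because its name does not end in that string.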

# Renaming a column after count

The rename() verb is often useful for changing the name of a column that comes out of another verb, such as count(). In this exercise, you'll rename the default n column generated from count() to something more descriptive.

# Instructions:

- Use count() to determine how many counties are in each state.

In [None]:
counties %>%
  # Count the number of counties in each state
  count(state) 

- Notice the n column in the output; use rename() to rename that to num_counties.

In [None]:
counties %>%
  # Count the number of counties in each state
  count(state) %>%
  # Rename the n column to num_counties
  rename(num_counties = n)
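
As a side note, count() also accepts a name argument, which sets the name of the count column directly. The sketch below compares the two routes on a made-up tibble (toy_states is an assumption for illustration); the exercise itself uses rename().

```r
library(dplyr)

# Hypothetical stand-in: two counties in state "A", one in state "B"
toy_states <- tibble(
  state  = c("A", "A", "B"),
  county = c("X", "Y", "Z")
)

# Route 1: count, then rename the default n column
renamed <- toy_states %>%
  count(state) %>%
  rename(num_counties = n)

# Route 2: name the count column when calling count()
named <- toy_states %>%
  count(state, name = "num_counties")

renamed
named
```

Both routes produce a state column and a num_counties column; rename() remains the more general tool, since it works on the output of any verb.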

# Renaming a column as part of a select

rename() isn't the only way you can choose a new name for a column; you can also choose a name as part of a select().

# Instructions:

- Select the columns state, county, and poverty from the counties dataset; in the same step, rename the poverty column to poverty_rate.



In [None]:
counties %>%
  # Select state, county, and poverty as poverty_rate
  select(state, county, poverty_rate = poverty)
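
The practical difference between the two approaches is which columns survive. A minimal sketch on a made-up tibble (toy_poverty and its values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical stand-in with one extra column, income (values are made up)
toy_poverty <- tibble(
  state   = c("A", "B"),
  county  = c("X", "Y"),
  poverty = c(12.5, 20.1),
  income  = c(50000, 42000)
)

# select() renames poverty and drops every column not listed
via_select <- toy_poverty %>%
  select(state, county, poverty_rate = poverty)

# rename() renames poverty and keeps every other column
via_rename <- toy_poverty %>%
  rename(poverty_rate = poverty)

names(via_select)
names(via_rename)
```

Here via_select has three columns, while via_rename still carries income along.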

# Using relocate

As you learned in the video, the relocate() verb allows you to move columns around, either relative to other columns or to an overall position in the tibble.

You've been given the counties_selected tibble, which contains the columns you need for your analysis of population density, but in an order that isn't easy to read. You'll use your new-found skills to put them right!

# Instructions:

- Move the density column to the end of the tibble.
- Move the population column to before the land_area column.

In [None]:
counties_selected %>%
  # Move the density column to the end
  relocate(density, .after = last_col()) %>%
  # Move the population column to before land_area
  relocate(population, .before = land_area)
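
To trace how the column order changes at each step, here is a minimal sketch on a made-up tibble. The starting column order of toy_selected is an assumption, since the real order of counties_selected is not shown here.

```r
library(dplyr)

# Hypothetical stand-in for counties_selected (order and values are made up)
toy_selected <- tibble(
  state      = "A",
  county     = "X",
  density    = 50,
  land_area  = 20,
  population = 1000
)

# Step 1: density moves to the last position
step1 <- toy_selected %>%
  relocate(density, .after = last_col())

# Step 2: population moves to just before land_area
step2 <- step1 %>%
  relocate(population, .before = land_area)

names(step1)
names(step2)
```

relocate() only changes the order; the set of columns and the rows stay the same.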

# Choosing among the four verbs

In this chapter you've learned about the four verbs: select(), mutate(), relocate(), and rename(). Here, you'll choose the appropriate verb for each situation. You won't need to change anything inside the parentheses.

# Instructions:

- Choose the right verb for changing the name of the unemployment column to unemployment_rate.
- Choose the right verb for keeping only the columns state, county, and the ones containing poverty.
- Calculate a new column called fraction_women with the fraction of the population made up of women, without dropping any columns.
- Move the region column to before the state column.

In [None]:
# Change the name of the unemployment column
counties %>%
  rename(unemployment_rate = unemployment)

# Keep the state and county columns, and the columns containing poverty
counties %>%
  select(state, county, contains("poverty"))

# Calculate the fraction_women column without dropping the other columns
counties %>%
  mutate(fraction_women = women / population)

# Move the region column to before state
counties %>%
  relocate(region, .before = state)
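
One way to check a verb choice is to look at what it does to the set of columns. The sketch below applies each of the four verbs to a made-up tibble (toy_all and its values are assumptions for illustration) and compares column names and counts.

```r
library(dplyr)

# Hypothetical stand-in with eight columns (values are made up)
toy_all <- tibble(
  state         = "A",
  county        = "X",
  region        = "South",
  population    = 1000,
  women         = 510,
  unemployment  = 7.5,
  poverty       = 12.5,
  child_poverty = 18.0
)

# rename(): same columns, one new name
renamed <- toy_all %>% rename(unemployment_rate = unemployment)

# select(): only the listed or matched columns are kept
selected <- toy_all %>% select(state, county, contains("poverty"))

# mutate(): all columns kept, one added at the end
mutated <- toy_all %>% mutate(fraction_women = women / population)

# relocate(): same columns, different order
relocated <- toy_all %>% relocate(region, .before = state)

ncol(renamed)
names(selected)
ncol(mutated)
names(relocated)
```

Only select() reduces the number of columns and only mutate() increases it; rename() and relocate() leave the count unchanged.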