# L02-8-Select
## Assignment Instructions
Rename this notebook with your name in place of Studentname, then make your edits and updates here.




# Selecting Columns

The select() function is simple: you provide a comma-separated list of columns, and only those columns are returned in a data frame, in that order. We explored this basic operation in a prior exercise. Practice makes perfect, so let’s revisit select() and work with some of its helper functions to pick out exactly the columns we are looking for. Often we aren’t selecting columns just to drop the ones we don’t need. Instead, we select the columns that need additional data wrangling or cleansing, such as data type conversion. This makes selecting columns a much more common task than just one step of the data import process.

In this exercise, we will explore some additional time-saving features select() provides. 
Admittedly, the dataset we are working with has only a few columns. As a result, there are some functions we won't be able to cover, and all this might seem a bit trivial. In practice, however, data frames can have hundreds or thousands of columns, and managing these manually would be painful. Often, these thousand-column data frames have patterns in their column names. Maybe they end with a 3-digit number, or maybe the date columns end with _date. We can exploit these patterns. Being able to select the columns of interest programmatically becomes more important in these cases. After all, we are in a programming environment, so let’s take advantage of it.
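
As a rough illustration of exploiting naming patterns, the sketch below builds a tiny data frame and selects columns by suffix and by a trailing 3-digit number. The data frame `wide` and all of its column names are made up for this example; only the ends_with() and matches() helpers come from this exercise.

```r
library(dplyr)

# Hypothetical wide table; the column names are invented for illustration
wide <- data.frame(
  id           = 1:2,
  signup_date  = c("2020-01-01", "2020-02-01"),
  renewal_date = c("2021-01-01", "2021-02-01"),
  score_001    = c(0.5, 0.7),
  score_002    = c(0.1, 0.9)
)

# Columns whose names end with "_date"
date_cols <- wide %>% select(ends_with("_date")) %>% names()

# Columns whose names end with three digits (a regular expression)
score_cols <- wide %>% select(matches("[0-9]{3}$")) %>% names()

print(date_cols)
print(score_cols)
```

With five columns this saves little typing, but the same two calls would work unchanged on a table with hundreds of `_date` or numbered columns.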

We will also use regular expressions. Regular expressions are a popular method for text pattern matching that is implemented in many programming languages. Full coverage is outside the scope of this course. They are used briefly in this exercise to make you aware of them, and they will show up more later in the course. Just know that R makes extensive use of regular expressions and that they are the default for most text matching functions. So be careful when trying to match symbols, as they may be interpreted as regular expression metacharacters.
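
To see why symbols need care, here is a minimal sketch comparing a literal match with a regular expression match on a column name that contains a dot. The data frame `df` and its column names are hypothetical and chosen only to show the effect.

```r
library(dplyr)

# Hypothetical columns; one name contains a literal dot
df <- data.frame(a.b = 1, ab = 2, axb = 3)

# contains() treats its argument as a literal string
literal_dot <- df %>% select(contains(".")) %>% names()

# matches() treats "." as "any single character",
# so "a.b" also picks up the column named axb
any_char <- df %>% select(matches("a.b")) %>% names()

# Escaping the dot with \\ makes matches() look for a literal "."
escaped_dot <- df %>% select(matches("a\\.b")) %>% names()

print(literal_dot)
print(any_char)
print(escaped_dot)
```

When the goal is to match a symbol exactly, contains() or an escaped pattern is usually the safer choice.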


## R Features
* library()
* glimpse()
* ? help
* %>% pipe
* names()
* sort()
* rename()
* \- exclude
* : range
* starts_with(): starts with a prefix
* ends_with(): ends with a suffix
* contains(): contains a literal string
* matches(): matches a regular expression
* everything(): all variables
* regular expressions


## Datasets
* mpg

In [2]:
# Load libraries
library(tidyverse)

Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats


In [None]:
# Explore data structure
# Data: mpg
# glimpse(mpg)

In [4]:
# Display the select() help
# help(select)

# Notice the examples
# Notice the related rename() function

In [6]:
# select columns with comma delimited list
# use names() instead of glimpse to see just the column names
# select columns hwy, displ, cyl, and class
mpg %>% 
   select(hwy, displ, cyl, class) %>% 
   names()

# Notice that it simply returns the list of column names

In [7]:
# Sometimes for longer lists of columns
# I like to see them in alphabetical order
# sort() can help with this
mpg %>% 
   names() %>% 
   sort()

# Notice that all the columns are in alphabetical order
# The order in the data frame didn't change
# only the order that they are being displayed was sorted.

In [9]:
# select all columns except some
# Use - to remove columns
# select all columns except trans and fl
mpg %>% names()

mpg %>%
   select(-trans, -fl) %>%
   names()

In [10]:
# select columns by range
# Use <column name>:<column name> for start and end range
# select all columns inclusively between model and hwy
mpg %>% 
   select(model:hwy) %>% 
   names()

In [11]:
# Can combine different selection methods
# select all columns inclusively between displ and fl except trans 
# followed by manufacturer
mpg %>% 
   select(displ:fl, -trans, manufacturer) %>% 
   names()

In [28]:
# rename columns while selecting them
# <new name> = <column name>
# select displ and rename it displacement
#?rename

mpg %>% select(displacement = displ) %>% names()


In [None]:
# select all the columns and rename some
# rename() is simpler when renaming columns and selecting all columns
# rename displ to displacement yet include all columns
mpg %>% 
   ___(___ = ___) %>% 
   ___()

# Notice that rename doesn't change the order 
# of the columns or drop any columns
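
The difference between renaming inside select() and using rename() can be sketched on a small stand-in table. The data frame `cars_small` below is made up for illustration and is not part of the mpg dataset.

```r
library(dplyr)

# Hypothetical three-column table used only for this comparison
cars_small <- data.frame(displ = c(1.8, 2.0), cyl = c(4, 4), hwy = c(29, 31))

# select(new = old) renames the column and drops everything not listed
only_renamed <- cars_small %>% select(displacement = displ) %>% names()

# rename(new = old) renames the column and keeps all others in place
all_kept <- cars_small %>% rename(displacement = displ) %>% names()

print(only_renamed)
print(all_kept)
```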

In [None]:
# Display help for starts_with()
___starts_with

# Notice all select_helpers are listed

In [None]:
# select columns that start with a text string
# starts_with()
# select all columns that start with "m"
mpg %>% 
   ___(___("m")) %>% 
   ___()

In [None]:
# select columns that end with 'y'
# ends_with()
mpg %>% 
   ___(___("y")) %>% 
   ___()

In [None]:
# select columns that contain "an"
# contains()
mpg %>% 
   ___(___("an")) %>% 
   ___()

In [None]:
# select columns that match a text pattern
# matches() which uses a text pattern called regular expressions
# match columns that contain an "a" and then an "s" later in the string
mpg %>% 
   ___(___("a.*s")) %>% 
   ___()

# In regular expressions the '.' means any single character
# and the '*' modifies the preceding pattern element
# to mean "0 or more". So "a.*s" means look for an a
# followed by 0 or more of any character, followed by an s
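
One way to get a feel for the pattern is to try it on a few plain strings with base R's grepl(). The strings in `candidates` are made up for this sketch and are not column names from mpg.

```r
# Made-up strings to test the pattern against
candidates <- c("pass", "as", "sa", "salt", "bananas")

# grepl() returns TRUE where the regular expression matches
hits <- grepl("a.*s", candidates)

# Keep only the strings that contain an "a" with an "s" somewhere after it
matched <- candidates[hits]
print(matched)
```

Note that "as" matches because '*' allows zero characters between the a and the s, while "sa" and "salt" do not match because no s follows the a.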

In [None]:
# Reordering one column to the front
# everything() matches all columns; columns already listed are not repeated
# select all columns with class at the beginning
mpg %>% 
   ___(___, ___()) %>%
   ___()

In [None]:
# Combining it all together
# displ as the first column renamed to displacement
# followed by all columns containing "r" except for year
# include all columns that end with "y"
# include all columns that begin with the letter "f"
mpg %>% 
   ___(___ = ___, ___("r"), ___, ___("y"), ___("f")) %>% 
   ___()
