<a href="https://cognitiveclass.ai/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkRP0321ENSkillsNetwork25371262-2022-01-01">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/M1_R_Basics/images/IDSNlogo.png" width="200" align="center">
</a>


<h1>Data Wrangling with Regular Expressions</h1>

Estimated time needed: **40** minutes


## Lab Overview:

In the previous data collection labs, you collected some raw datasets from several different sources. In this lab, you need to perform data wrangling tasks in order to improve data quality.


You will again use regular expressions, along with the `stringr` package (part of `tidyverse`), to clean up the bike-sharing systems data that you previously web-scraped from the Wikipedia page:

[https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems](https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkRP0321ENSkillsNetwork25371262-2022-01-01)

<a href="https://cognitiveclass.ai/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkRP0321ENSkillsNetwork25371262-2022-01-01">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/module_1/images/l2-list-bike-sharing-systems.png" width="800" align="center">
</a>


One typical challenge of web scraping is that data extracted from HTML pages may contain unnecessary or inconsistently formatted information.\
For example:

*   Textual annotations in numeric fields: `1000 (Updated with 1050)`
*   Attached reference links: `Bike sharing system [123]`
*   Inconsistent data formats: `Yes` and `Y` for the logical value `TRUE` or `2021-04-09` and `Apr 09, 2021` for the same date
*   HTML style tags: `<span style="color:blue">Bike sharing system</span>`
*   Special characters: the HTML entity `&nbsp;` for a non-breaking space

You may encounter many more kinds of noise in real-world scraped data, and most of this text-related noise can be handled with regular expressions.
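To make these categories concrete, here is a minimal sketch that applies one `stringr` call to each kind of noise listed above (the date example is not covered). The sample strings come from the list, except `"Bike&nbsp;sharing"`, which is made up; the patterns are illustrative assumptions, not the exact rules used later in this lab.

```r
library(stringr)

# Sample noisy values (all but with_nbsp are copied from the list above)
annotated <- "1000 (Updated with 1050)"
with_ref  <- "Bike sharing system [123]"
with_tag  <- '<span style="color:blue">Bike sharing system</span>'
with_nbsp <- "Bike&nbsp;sharing"
flags     <- c("Yes", "Y", "No")

# Textual annotation in a numeric field: keep the first run of digits
number <- as.numeric(str_extract(annotated, "[0-9]+"))         # 1000

# Reference link: delete "[digits]" and trim the leftover space
no_ref <- str_trim(str_remove_all(with_ref, "\\[[0-9]+\\]"))  # "Bike sharing system"

# HTML tags: delete everything from "<" to the next ">"
no_tag <- str_remove_all(with_tag, "<[^>]+>")                 # "Bike sharing system"

# HTML entity: replace "&nbsp;" with an ordinary space
no_nbsp <- str_replace_all(with_nbsp, "&nbsp;", " ")          # "Bike sharing"

# Inconsistent logical values: "Yes" or "Y" become TRUE, anything else FALSE
as_logical <- str_detect(flags, "^(Yes|Y)$")                  # TRUE TRUE FALSE
```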


To summarize, you will be using `stringr` (part of `tidyverse`) and regular expressions to perform the following data wrangling tasks:

*   TASK: Standardize column names for all collected datasets
*   TASK: Remove undesired reference links from the scraped bike-sharing systems dataset
*   TASK: Extract only the numeric value from undesired text annotations


Let's begin by importing the libraries you will use for these data wrangling tasks.


In [None]:
# Load tidyverse, installing it first if it is not available
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)

Loading required package: tidyverse

“running command 'timedatectl' had status 1”
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



## TASK: Standardize column names for all collected datasets


In the previous data collection labs, you collected four datasets in csv format:

*   `raw_bike_sharing_systems.csv`:  A list of active bike-sharing systems across the world
*   `raw_cities_weather_forecast.csv`: 5-day weather forecasts for a list of cities, from OpenWeather API
*   `raw_worldcities.csv`: A list of major cities' info (such as name, latitude and longitude) across the world
*   `raw_seoul_bike_sharing.csv`: Weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour, and date information, from Seoul bike-sharing systems


*Optional:* If you had some difficulties finishing the data collection labs, you may download the datasets directly from the following URLs:


In [None]:
# Download raw_bike_sharing_systems.csv
url <- "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/datasets/raw_bike_sharing_systems.csv"
download.file(url, destfile = "raw_bike_sharing_systems.csv")

# Download raw_cities_weather_forecast.csv
url <- "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/datasets/raw_cities_weather_forecast.csv"
download.file(url, destfile = "raw_cities_weather_forecast.csv")

# Download raw_worldcities.csv
url <- "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/datasets/raw_worldcities.csv"
download.file(url, destfile = "raw_worldcities.csv")

# Download raw_seoul_bike_sharing.csv
url <- "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/datasets/raw_seoul_bike_sharing.csv"
download.file(url, destfile = "raw_seoul_bike_sharing.csv")

To improve dataset readability for both humans and computer systems, we first need to standardize the column names of the datasets above using the following naming convention:

*   Column names need to be UPPERCASE
*   The word separator needs to be an underscore, such as in `COLUMN_NAME`


You can use the following dataset list and the `names()` function to get and set each of their column names, and convert them according to our defined naming convention.
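As a quick illustration of getting and setting names with `names()`, the sketch below standardizes the columns of a small in-memory data frame. The data frame `toy` and its column names are made up for the example.

```r
library(stringr)

# A toy data frame with mixed-case, space-separated column names (hypothetical)
toy <- data.frame(`Rented Bike Count` = c(254, 204), `wind speed` = c(2.2, 0.8),
                  check.names = FALSE)

# Get the current names, convert them, and assign them back
old_names <- names(toy)
names(toy) <- str_replace_all(toupper(old_names), " ", "_")

print(old_names)   # "Rented Bike Count" "wind speed"
print(names(toy))  # "RENTED_BIKE_COUNT" "WIND_SPEED"
```

Replacing `" "` with the pattern `"\\s+"` would also collapse runs of whitespace into a single underscore, if any raw names contain them.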


In [None]:
dataset_list <- c('raw_bike_sharing_systems.csv', 'raw_seoul_bike_sharing.csv', 'raw_cities_weather_forecast.csv', 'raw_worldcities.csv')


*TODO*: Write a `for` loop to iterate over the above datasets and convert their column names
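A minimal sketch of such a loop, wrapped in a hypothetical helper `standardize_csv_columns()` so it can be tested on a temporary file. It assumes the CSV files are in the working directory and uses `readr::write_csv()`, which never writes row names.

```r
library(readr)
library(stringr)

# Hypothetical helper: for every CSV path in `paths`, read the file, convert its
# column names to UPPERCASE with underscores instead of spaces, and overwrite it
standardize_csv_columns <- function(paths) {
  for (path in paths) {
    dataset <- read_csv(path, show_col_types = FALSE)
    # Uppercase first, then swap each space for an underscore
    names(dataset) <- str_replace_all(toupper(names(dataset)), " ", "_")
    write_csv(dataset, path)
  }
  invisible(paths)
}

# Usage in this lab (overwrites the four raw files in place):
# standardize_csv_columns(dataset_list)
```

Reading from and writing to `path` (the loop variable) instead of a hard-coded file name is what makes each iteration process a different dataset.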


In [None]:
# Standardize the column names of the bike-sharing systems dataset
bikesharing_dataset <- as.data.frame(read_csv("raw_bike_sharing_systems.csv"))
# Convert all column names to uppercase
names(bikesharing_dataset) <- toupper(names(bikesharing_dataset))
# Replace any white space separators with underscores, using the str_replace_all function
names(bikesharing_dataset) <- str_replace_all(names(bikesharing_dataset), " ", "_")
# Save the dataset
write.csv(bikesharing_dataset, "raw_bike_sharing_systems.csv", row.names = FALSE)


Rows: 480 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): COUNTRY, CITY, NAME, SYSTEM, OPERATOR, LAUNCHED, DISCONTINUED, STA...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
“argument is not an atomic vector; coercing”
Rows: 480 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): COUNTRY, CITY, NAME, SYSTEM, OPERATOR, LAUNCHED, DISCONTINUED, STA...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
“argument is not an atomic vector; co

In [None]:
# Standardize the column names of the Seoul bike-sharing dataset
bike_dataset <- as.data.frame(read_csv("raw_seoul_bike_sharing.csv"))
# Convert all column names to uppercase
names(bike_dataset) <- toupper(names(bike_dataset))
# Replace any white space separators with underscores, using the str_replace_all function
names(bike_dataset) <- str_replace_all(names(bike_dataset), " ", "_")
# Save the dataset
write.csv(bike_dataset, "raw_seoul_bike_sharing.csv", row.names = FALSE)


Rows: 8760 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Date, SEASONS, HOLIDAY, FUNCTIONING_DAY
dbl (10): RENTED_BIKE_COUNT, Hour, TEMPERATURE, HUMIDITY, WIND_SPEED, Visibi...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
“argument is not an atomic vector; coercing”
Rows: 8760 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): DATE, SEASONS, HOLIDAY, FUNCTIONING_DAY
dbl (10): RENTED_BIKE_COUNT, HOUR, TEMPERATURE, HUMIDITY, WIND_SPEED, VISIBI...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ

In [None]:
# Standardize the column names of the cities weather forecast dataset
weather_dataset <- as.data.frame(read_csv("raw_cities_weather_forecast.csv"))
# Convert all column names to uppercase
names(weather_dataset) <- toupper(names(weather_dataset))
# Replace any white space separators with underscores, using the str_replace_all function
names(weather_dataset) <- str_replace_all(names(weather_dataset), " ", "_")
# Save the dataset
write.csv(weather_dataset, "raw_cities_weather_forecast.csv", row.names = FALSE)


Rows: 160 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): city, weather, season
dbl  (8): visibility, temp, temp_min, temp_max, pressure, humidity, wind_spe...
dttm (1): forecast_datetime

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
“argument is not an atomic vector; coercing”
Rows: 160 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): CITY, WEATHER, SEASON
dbl  (8): VISIBILITY, TEMP, TEMP_MIN, TEMP_MAX, PRESSURE, HUMIDITY, WIND_SPE...
dttm (1): FORECAST_DATETIME

ℹ Use `spec()` to retrieve the full colum

In [None]:
# Standardize the column names of the world cities dataset
worldcities_dataset <- as.data.frame(read_csv("raw_worldcities.csv"))
# Convert all column names to uppercase
names(worldcities_dataset) <- toupper(names(worldcities_dataset))
# Replace any white space separators with underscores, using the str_replace_all function
names(worldcities_dataset) <- str_replace_all(names(worldcities_dataset), " ", "_")
# Save the dataset
write.csv(worldcities_dataset, "raw_worldcities.csv", row.names = FALSE)

Rows: 26569 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): CITY, CITY_ASCII, COUNTRY, ISO2, ISO3, ADMIN_NAME, CAPITAL
dbl (4): LAT, LNG, POPULATION, ID

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
“argument is not an atomic vector; coercing”
Rows: 26569 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): CITY, CITY_ASCII, COUNTRY, ISO2, ISO3, ADMIN_NAME, CAPITAL
dbl (4): LAT, LNG, POPULATION, ID

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types =

*TODO*: Read the resulting datasets back and check whether their column names follow the naming convention


In [None]:

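# Read each dataset back and print its summary to check whether the column names were converted
summary(read_csv("raw_bike_sharing_systems.csv", show_col_types = FALSE))

summary(read_csv("raw_seoul_bike_sharing.csv", show_col_types = FALSE))

summary(read_csv("raw_cities_weather_forecast.csv", show_col_types = FALSE))

summary(read_csv("raw_worldcities.csv", show_col_types = FALSE))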

   COUNTRY              CITY               NAME              SYSTEM         
 Length:480         Length:480         Length:480         Length:480        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   OPERATOR           LAUNCHED         DISCONTINUED         STATIONS        
 Length:480         Length:480         Length:480         Length:480        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   BICYCLES         DAILY_RIDERSHIP   
 Length:480         Length:480        
 Class :character   Class :character  
 Mode  :character   Mode  :character  

     DATE           RENTED_BIKE_COUNT      HOUR        TEMPERATURE    
 Length:8760        Min.   :   2.0    Min.   : 0.00   Min.   :-17.80  
 Class :character   1st Qu.: 214.0    1st Qu.: 5.75   1st Qu.:  3.40  
 Mode  :character   Median : 542.0    Median :11.50   Median : 13.70  
                    Mean   : 729.2    Mean   :11.50   Mean   : 12.87  
                    3rd Qu.:1084.0    3rd Qu.:17.25   3rd Qu.: 22.50  
                    Max.   :3556.0    Max.   :23.00   Max.   : 39.40  
                    NA's   :295                       NA's   :11      
    HUMIDITY       WIND_SPEED      VISIBILITY   DEW_POINT_TEMPERATURE
 Min.   : 0.00   Min.   :0.000   Min.   :  27   Min.   :-30.600      
 1st Qu.:42.00   1st Qu.:0.900   1st Qu.: 940   1st Qu.: -4.700      
 Median :57.00   Median :1.500   Median :1698   Median :  5.100      
 Mean   :58.23   Mean   :1.725   Mean   :1437   Mean   :  4.074      
 3rd Qu.:74.00   3rd Qu.:2.300   3rd Qu.:2000   3rd Qu.: 14.800      
 Max.   :98.

     CITY             WEATHER            VISIBILITY         TEMP       
 Length:160         Length:160         Min.   : 9940   Min.   : 3.810  
 Class :character   Class :character   1st Qu.:10000   1st Qu.: 9.748  
 Mode  :character   Mode  :character   Median :10000   Median :12.765  
                                       Mean   :10000   Mean   :13.505  
                                       3rd Qu.:10000   3rd Qu.:16.747  
                                       Max.   :10000   Max.   :26.130  
    TEMP_MIN         TEMP_MAX         PRESSURE       HUMIDITY    
 Min.   : 3.810   Min.   : 3.810   Min.   : 954   Min.   :17.00  
 1st Qu.: 9.725   1st Qu.: 9.953   1st Qu.:1016   1st Qu.:37.00  
 Median :12.765   Median :12.765   Median :1019   Median :49.00  
 Mean   :13.444   Mean   :13.539   Mean   :1018   Mean   :49.92  
 3rd Qu.:16.635   3rd Qu.:16.977   3rd Qu.:1022   3rd Qu.:62.00  
 Max.   :26.130   Max.   :26.130   Max.   :1028   Max.   :94.00  
   WIND_SPEED       WIND_DEG      

     CITY            CITY_ASCII             LAT              LNG           
 Length:26569       Length:26569       Min.   :-54.93   Min.   :-179.5900  
 Class :character   Class :character   1st Qu.: 27.92   1st Qu.: -78.7794  
 Mode  :character   Mode  :character   Median : 40.22   Median :  -0.7689  
                                       Mean   : 33.10   Mean   : -11.3639  
                                       3rd Qu.: 47.99   3rd Qu.:  29.6833  
                                       Max.   : 81.72   Max.   : 179.3667  
                                                                           
   COUNTRY              ISO2               ISO3            ADMIN_NAME       
 Length:26569       Length:26569       Length:26569       Length:26569      
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
       
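Scanning summaries by eye works for four files; a programmatic alternative is a sketch like the following, which flags any column name not made only of uppercase letters, digits and underscores. The helpers `follows_convention()` and `bad_names()` are hypothetical.

```r
library(readr)
library(stringr)

# TRUE when a name uses only uppercase letters, digits and underscores
follows_convention <- function(column_names) {
  str_detect(column_names, "^[A-Z0-9_]+$")
}

# Return the column names in a CSV file that break the convention (empty if none)
bad_names <- function(path) {
  header <- names(read_csv(path, show_col_types = FALSE))
  header[!follows_convention(header)]
}

# Usage with the lab's files, once they exist in the working directory:
# lapply(dataset_list, bad_names)
```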

## Process the web-scraped bike sharing system dataset


By now we have standardized all column names. Next, we will focus on cleaning up the values in the web-scraped bike sharing systems dataset.


In [None]:
# First load the dataset
bike_sharing_df <- read_csv("raw_bike_sharing_systems.csv")


In [None]:
# Print its head
head(bike_sharing_df)

Even from the first few rows, you can see there is plenty of undesirable embedded textual content, such as the reference link included in `Melbourne[12]`.


In this project, let's focus only on processing the following relevant columns (feel free to process the other columns for more practice):

*   `COUNTRY`: Country name
*   `CITY`: City name
*   `SYSTEM`: Bike-sharing system name
*   `BICYCLES`: Total number of bikes in the system


In [None]:
# Select the four columns
sub_bike_sharing_df <- bike_sharing_df %>% select(COUNTRY, CITY, SYSTEM, BICYCLES)

Let's see the types of the selected columns


In [None]:
sub_bike_sharing_df %>% 
    summarize_all(class) %>%
    gather(variable, class)

They are all interpreted as character columns, but we expect the `BICYCLES` column to be numeric. Let's see why it wasn't loaded as a numeric column; possibly some entries contain non-digit characters. Let's create a simple function called `find_character` to check that.


In [None]:
# grepl() searches each string for non-digit characters and returns TRUE or FALSE;
# if it finds any, the BICYCLES value is not purely numeric
find_character <- function(strings) grepl("[^0-9]", strings)
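To see what `find_character()` flags, and why converting the raw column straight to numbers is not enough, here is a small sketch on sample values. The vector `sample_bicycles` is made up, partly from values quoted in this section.

```r
# Same definition as above: TRUE when a string contains any non-digit character
find_character <- function(strings) grepl("[^0-9]", strings)

sample_bicycles <- c("200", "1000[253]", "32 (including 6 rollers) [162]", NA)

# grepl() returns FALSE for NA inputs, so missing values are not flagged
flags <- find_character(sample_bicycles)
print(flags)   # FALSE TRUE TRUE FALSE

# A direct conversion turns every annotated entry into NA (with a warning)
direct <- suppressWarnings(as.numeric(sample_bicycles))
print(direct)  # 200 NA NA NA
```

Note that the `[^0-9]` class would also flag a decimal value such as `1.5`, which is fine for bike counts but not for every numeric column.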

Let's try to find any elements in the `BICYCLES` column containing non-numeric characters.


In [None]:
sub_bike_sharing_df %>% 
    select(BICYCLES) %>% 
    filter(find_character(BICYCLES)) %>%
    slice(1:10)

As you can see, many rows have non-numeric characters, such as `32 (including 6 rollers) [162]` and `1000[253]`. This is very common in tables scraped from Wikipedia, where no input validation is enforced.

Later, you will use regular expressions to clean them up.


Next, let's take a look at the other columns, namely `COUNTRY`, `CITY`, and `SYSTEM`, to see if they contain any undesired reference links, such as in `Melbourne[12]`.


In [None]:
# Define a 'reference link' pattern:
# `[A-Za-z0-9]+` matches one or more letters or digits
# `\\[` and `\\]` match the literal square brackets around them, such as in [12] or [abc]
ref_pattern <- "\\[[A-Za-z0-9]+\\]"
find_reference_pattern <- function(strings) grepl(ref_pattern, strings)
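The effect of the reference-link pattern is easiest to see on a few values. This sketch uses `[A-Za-z0-9]` rather than `[A-z0-9]`, because in ASCII ordering the range `A-z` also spans the punctuation characters between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick). The sample strings are taken from the scraped table; `ref_pattern_strict` is a name introduced for this example.

```r
library(stringr)

# A literal "[", one or more letters or digits, then a literal "]"
ref_pattern_strict <- "\\[[A-Za-z0-9]+\\]"

values <- c("Melbourne[12]", "Brisbane[14][15]", "Mi Bici Tu Bici[11]", "Sydney")

# Does each value contain at least one reference link?
has_ref <- str_detect(values, ref_pattern_strict)       # TRUE TRUE TRUE FALSE

# Every match, one character vector per input value
all_refs <- str_extract_all(values, ref_pattern_strict)
print(all_refs[[2]])                                      # "[14]" "[15]"
```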

In [None]:
# Check whether the COUNTRY column has any reference links
sub_bike_sharing_df %>% 
    select(COUNTRY) %>% 
    filter(find_reference_pattern(COUNTRY)) %>%
    slice(1:10)

OK, it looks like the `COUNTRY` column is clean. Let's check the `CITY` column.


In [None]:
# Check whether the CITY column has any reference links
sub_bike_sharing_df %>% 
    select(CITY) %>% 
    filter(find_reference_pattern(CITY)) %>%
    slice(1:10)

Hmm, looks like the `CITY` column has some reference links to be removed. Next, let's check the `SYSTEM` column.


In [None]:
# Check whether the SYSTEM column has any reference links
sub_bike_sharing_df %>% 
    select(SYSTEM) %>% 
    filter(find_reference_pattern(SYSTEM)) %>%
    slice(1:10)

So the `SYSTEM` column also has some reference links.


After some preliminary investigation, we found that the `CITY` and `SYSTEM` columns contain undesired reference links, and the `BICYCLES` column contains both reference links and textual annotations.

Next, you need to use regular expressions to clean up the unexpected reference links and text annotations in numeric values.


# TASK: Remove undesired reference links using regular expressions


*TODO:* Write a custom function using `stringr::str_replace_all` to replace all reference links with an empty string in the `CITY` and `SYSTEM` columns


In [None]:
# Remove reference links such as "[12]" from a character column
remove_ref <- function(column) {
    # A literal "[", one or more word characters, then a literal "]"
    ref_pattern <- "\\[\\w+\\]"
    # Replace all matched substrings with an empty string using str_replace_all()
    result <- str_replace_all(column, ref_pattern, "")
    # Trim any leftover leading or trailing white space
    result <- trimws(result)
    return(result)
}
remove_ref
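A quick way to confirm the corrected pattern behaves as intended is to call a function like it on a few sample values. The sketch below is a self-contained variant built on `str_remove_all()`, which is shorthand for `str_replace_all(x, pattern, "")`; the name `remove_ref_sketch()` is hypothetical, and the samples come from the scraped table plus an `NA`.

```r
library(stringr)

# Hypothetical variant of remove_ref(): delete every "[...]" reference link made
# of word characters, then trim leftover whitespace at either end
remove_ref_sketch <- function(column) {
  str_trim(str_remove_all(column, "\\[\\w+\\]"))
}

cleaned <- remove_ref_sketch(c("Melbourne[12]", "Brisbane[14][15]",
                               "Bike In Baires Consortium.[10]", "Sydney", NA))
print(cleaned)
# "Melbourne" "Brisbane" "Bike In Baires Consortium." "Sydney" NA
```

Because `\w` does not match spaces, a link such as `[note 1]` would survive this pattern; widen the character class if the data contains such links.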

*TODO:* Use the `dplyr::mutate()` function to apply the `remove_ref` function to the `CITY` and `SYSTEM` columns


In [None]:
# Apply remove_ref() to the CITY and SYSTEM columns of the selected subset
newdata <- sub_bike_sharing_df %>% 
  mutate(CITY = remove_ref(CITY),
         SYSTEM = remove_ref(SYSTEM))

newdata



*TODO:* Use the following code to check whether all reference links are removed:


In [None]:
newdata %>% 
    select(CITY, SYSTEM, BICYCLES) %>% 
    filter(find_reference_pattern(CITY) | find_reference_pattern(SYSTEM) | find_reference_pattern(BICYCLES))


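When the same cleaning function applies to several columns, `dplyr::across()` (dplyr 1.0.9 is loaded above) avoids repeating each column name, and `if_any()` expresses the "any column still has a link" check. A minimal sketch on a made-up tibble; the value `"4 Gen. oBike[7]"` is invented for the example.

```r
library(dplyr)
library(stringr)

# Same cleaning rule as remove_ref(): drop "[...]" links and trim whitespace
remove_ref_sketch <- function(column) str_trim(str_remove_all(column, "\\[\\w+\\]"))

toy <- tibble(
  CITY   = c("Melbourne[12]", "Brisbane[14][15]", "Sydney"),
  SYSTEM = c("PBSC & 8D", "3 Gen. Cyclocity", "4 Gen. oBike[7]")
)

# Apply the function to both columns in one mutate() call
toy_clean <- toy %>% mutate(across(c(CITY, SYSTEM), remove_ref_sketch))

# Count rows where any cleaned column still contains a reference link
remaining <- toy_clean %>%
  filter(if_any(c(CITY, SYSTEM), ~ str_detect(.x, "\\[\\w+\\]"))) %>%
  nrow()
print(remaining)  # 0
```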
# TASK: Extract the numeric value using regular expressions


*TODO:* Write a custom function using `stringr::str_extract` to extract the first substring of digits and convert it to numeric type. For example, extract the value `32` from `32 (including 6 rollers) [162]`.


In [None]:
# Extract the first number from each string
extract_num <- function(columns){
    # Define a pattern matching one or more digits
    digits_pattern <- "[0-9]+"
    # Find the first match in each string using str_extract
    result <- str_extract(columns, digits_pattern)
    # Convert the result to numeric using the as.numeric() function and return it
    return(as.numeric(result))
}
extract_num
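Because `str_extract()` returns only the first match, the order of numbers inside an entry matters, and a value written with a thousands separator would be cut short. A sketch under those assumptions: `extract_num_sketch()` and its `drop_commas` option are hypothetical, the samples are partly made up, and stripping commas first would also merge comma-separated lists of numbers, so check the data before adopting it.

```r
library(stringr)

# First run of digits, converted to numeric (NA when there is no digit at all)
extract_num_sketch <- function(values, drop_commas = FALSE) {
  if (drop_commas) {
    # Hypothetical option: treat "4,000" as 4000 instead of 4
    values <- str_remove_all(values, ",")
  }
  as.numeric(str_extract(values, "[0-9]+"))
}

samples <- c("32 (including 6 rollers) [162]", "1000[253]", "dockless", "4,000")

plain   <- extract_num_sketch(samples)                      # 32 1000 NA 4
grouped <- extract_num_sketch(samples, drop_commas = TRUE)  # 32 1000 NA 4000
```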

*TODO:* Use the `dplyr::mutate()` function to apply `extract_num` on the `BICYCLES` column


In [None]:
# Use the mutate() function on the BICYCLES column of the reference-cleaned data
changes <- newdata %>% mutate(BICYCLES = extract_num(BICYCLES))
changes

COUNTRY,CITY,NAME,SYSTEM,OPERATOR,LAUNCHED,DISCONTINUED,STATIONS,BICYCLES,DAILY_RIDERSHIP
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>
Albania,Tirana,Ecovolis,,,March 2011,,8,200,
Argentina,Mendoza,Metrobici,,,2014,,2,40,
Argentina,"San Lorenzo, Santa Fe",Biciudad,Biciudad,,27 November 2016,,8,80,
Argentina,Buenos Aires,Ecobici,Serttel Brasil,Bike In Baires Consortium.[10],2010,,400,4000,21917
Argentina,Rosario,Mi Bici Tu Bici[11],,,2 December 2015,,47,480,
Australia,Melbourne[12],Melbourne Bike Share,PBSC & 8D,Motivate,June 2010,30 November 2019[13],53,676,
Australia,Brisbane[14][15],CityCycle,3 Gen. Cyclocity,JCDecaux,September 2010,,150,2000,
Australia,Melbourne,oBike,4 Gen. oBike,,July 2017,July 2018,dockless,1250,
Australia,Sydney,oBike,4 Gen. oBike,,July 2017,July 2018,dockless,1250,
Australia,Sydney,Ofo,4 Gen. Ofo,,October 2017,,dockless,600,


*TODO:* Use the summary function to check the descriptive statistics of the numeric `BICYCLES` column


In [None]:
summary(changes$BICYCLES)

   Length     Class      Mode 
      480 character character 

*TODO:* Write the cleaned bike-sharing systems dataset into a csv file called `bike_sharing_systems.csv`


In [None]:
# Write the cleaned dataset to `bike_sharing_systems.csv`
write_csv(changes, "bike_sharing_systems.csv")

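After writing the file, reading it back is a cheap way to confirm that `BICYCLES` is now stored as numbers. The sketch below round-trips a made-up tibble through a temporary file with `readr::write_csv()` and `readr::read_csv()`; in the lab you would use the cleaned data frame and `"bike_sharing_systems.csv"` instead.

```r
library(readr)
library(tibble)

# Made-up stand-in for the cleaned bike-sharing data
cleaned <- tibble(CITY = c("Melbourne", "Brisbane"), BICYCLES = c(676, 2000))

path <- tempfile(fileext = ".csv")
write_csv(cleaned, path)

# readr guesses the column types again on the way back in
reloaded <- read_csv(path, show_col_types = FALSE)
print(is.numeric(reloaded$BICYCLES))  # TRUE
```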

# References:


If you need to refresh your memory about regular expressions, please refer to this regular expression cheat sheet:

<a href="https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkRP0321ENSkillsNetwork25371262-2022-01-01" target="_blank">Basic Regular Expressions in R</a>


# Next Steps


Great! Now you have cleaned up the bike-sharing system dataset using regular expressions. Next, you will use other `tidyverse` functions to perform data wrangling on the bike-sharing demand dataset.


## Authors

<a href="https://www.linkedin.com/in/yan-luo-96288783/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkRP0321ENSkillsNetwork25371262-2022-01-01" target="_blank">Yan Luo</a>


### Other Contributors

Jeff Grossman


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By | Change Description      |
| ----------------- | ------- | ---------- | ----------------------- |
| 2021-04-08        | 1.0     | Yan        | Initial version created |
|                   |         |            |                         |
|                   |         |            |                         |

<h3 align="center"> © IBM Corporation 2021. All rights reserved. </h3>
