data-wrangling-exercise-1

Data Wrangling Exercise 1: Basic Data Manipulation (Springboard)

Getting started

The data is in an Excel file here called refine.xlsx. Right away, you'll notice that the data set has a few issues:

There are four brands: Philips, Akzo, Van Houten and Unilever. However, there are many different spellings and capitalizations of those names!
The product code and number are combined in one column, separated by a hyphen.

Exercise

Using R, clean this data set to make it easier to visualize and analyze. Specifically, these are the tasks you need to do:

0: Load the data in RStudio

Save the data set as a CSV file called refine_original.csv and load it in RStudio into a data frame.

1: Clean up brand names

Clean up the 'company' column, so all of the misspellings of the brand names are standardized. For example, you can transform the values in the column to be: philips, akzo, van houten and unilever (all lowercase).

2: Separate product code and number

Separate the product code and product number into separate columns i.e. add two new columns called product_code and product_number, containing the product code and number respectively

3: Add product categories

You learn that the product codes actually represent the following product categories:

p = Smartphone
v = TV
x = Laptop
q = Tablet In order to make the data more readable, add a column with the product category for each record.

4: Add full address for geocoding

You'd like to view the customer information on a map. In order to do that, the addresses need to be in a form that can be easily geocoded. Create a new column full_address that concatenates the three address fields (address, city, country), separated by commas.

5: Create dummy variables for company and product category

Both the company name and product category are categorical variables i.e. they take only a fixed set of values. In order to use them in further analysis you need to create dummy variables. Create dummy binary variables for each of them with the prefix company_ and product_ i.e.

Add four binary (1 or 0) columns for company: company_philips, company_akzo, company_van_houten and company_unilever
Add four binary (1 or 0) columns for product category: product_smartphone, product_tv, product_laptop and product_tablet

6: Submit the project on Github

Include your code, the original data as a CSV file refine_original.csv, and the cleaned up data as a CSV file called refine_clean.csv.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
refine.R		refine.R
refine_clean.csv		refine_clean.csv
refine_original.csv		refine_original.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

data-wrangling-exercise-1

Getting started

Exercise

0: Load the data in RStudio

1: Clean up brand names

2: Separate product code and number

3: Add product categories

4: Add full address for geocoding

5: Create dummy variables for company and product category

6: Submit the project on Github

About

Releases

Packages

Languages

sirotaya/data-wrangling-exercise-1

Folders and files

Latest commit

History

Repository files navigation

data-wrangling-exercise-1

Getting started

Exercise

0: Load the data in RStudio

1: Clean up brand names

2: Separate product code and number

3: Add product categories

4: Add full address for geocoding

5: Create dummy variables for company and product category

6: Submit the project on Github

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages