---

author: Юрій Клебан

---

# Create new variables with **`mutate()`**

Before you start, load the packages:

In [3]:
library(dplyr) # for demos
#install.packages("gapminder")
library(gapminder)  # load package and dataset

`mutate(.data, …)` computes new column(s).
Let's compute a new column for `gapminder`, the total GDP of each country in each year:

$gdpTotal = gdpPercap * pop$.


In [4]:
gapminder |> 
    mutate(gdpTotal = gdpPercap * pop) |>
    head(10)

country,continent,year,lifeExp,pop,gdpPercap,gdpTotal
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453,6567086330
Afghanistan,Asia,1957,30.332,9240934,820.853,7585448670
Afghanistan,Asia,1962,31.997,10267083,853.1007,8758855797
Afghanistan,Asia,1967,34.02,11537966,836.1971,9648014150
Afghanistan,Asia,1972,36.088,13079460,739.9811,9678553274
Afghanistan,Asia,1977,38.438,14880372,786.1134,11697659231
Afghanistan,Asia,1982,39.854,12881816,978.0114,12598563401
Afghanistan,Asia,1987,40.822,13867957,852.3959,11820990309
Afghanistan,Asia,1992,41.674,16317921,649.3414,10595901589
Afghanistan,Asia,1997,41.763,22227415,635.3414,14121995875
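
Expressions inside one `mutate()` call are evaluated in order, so a later expression can use a column created earlier in the same call. A minimal sketch that also rescales the total to millions; the names `gdpTotalMln` and `gdp_scaled` are made up for this illustration:

```r
library(dplyr)
library(gapminder)

gdp_scaled <- gapminder |>
    mutate(gdpTotal = gdpPercap * pop,         # total GDP
           gdpTotalMln = gdpTotal / 1000000)   # reuses gdpTotal defined one line above

head(gdp_scaled, 3)
```

Scaling in a separate column keeps the raw total available for later calculations.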


`transmute(.data, …)` computes new column(s) and drops all the others.

In [5]:
gapminder |>
    transmute(gdpTotal = gdpPercap * pop) |>
    head(10)

gdpTotal
<dbl>
6567086330
7585448670
8758855797
9648014150
9678553274
11697659231
12598563401
11820990309
10595901589
14121995875
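
Recent dplyr releases mark `transmute()` as superseded and suggest `mutate()` with `.keep = "none"` instead. A hedged sketch of the same computation, assuming a dplyr version that supports the `.keep` argument (the name `gdp_only` is illustrative):

```r
library(dplyr)
library(gapminder)

# Keep only the newly computed column, as transmute() does
gdp_only <- gapminder |>
    mutate(gdpTotal = gdpPercap * pop, .keep = "none")

head(gdp_only, 10)
```

For a single new column on ungrouped data the result should match the `transmute()` call above.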


You can `mutate` many columns at once:

In [6]:
gapminder |>
    mutate(gdpTotal = gdpPercap * pop,
           countryUpper = toupper(country), # uppercase country
           lifeExpRounded = round(lifeExp)) |>
    head(10)

country,continent,year,lifeExp,pop,gdpPercap,gdpTotal,countryUpper,lifeExpRounded
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>,<dbl>,<chr>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453,6567086330,AFGHANISTAN,29
Afghanistan,Asia,1957,30.332,9240934,820.853,7585448670,AFGHANISTAN,30
Afghanistan,Asia,1962,31.997,10267083,853.1007,8758855797,AFGHANISTAN,32
Afghanistan,Asia,1967,34.02,11537966,836.1971,9648014150,AFGHANISTAN,34
Afghanistan,Asia,1972,36.088,13079460,739.9811,9678553274,AFGHANISTAN,36
Afghanistan,Asia,1977,38.438,14880372,786.1134,11697659231,AFGHANISTAN,38
Afghanistan,Asia,1982,39.854,12881816,978.0114,12598563401,AFGHANISTAN,40
Afghanistan,Asia,1987,40.822,13867957,852.3959,11820990309,AFGHANISTAN,41
Afghanistan,Asia,1992,41.674,16317921,649.3414,10595901589,AFGHANISTAN,42
Afghanistan,Asia,1997,41.763,22227415,635.3414,14121995875,AFGHANISTAN,42
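
When the same transformation applies to several columns, `across()` can be used inside `mutate()` instead of repeating the expression. A small sketch, assuming dplyr 1.0.0 or later; rounding to one decimal place and the name `rounded` are arbitrary choices for the example:

```r
library(dplyr)
library(gapminder)

# Round two numeric columns to one decimal place in a single step
rounded <- gapminder |>
    mutate(across(c(lifeExp, gdpPercap), \(x) round(x, 1)))

head(rounded, 10)
```

Here `across()` overwrites the selected columns in place; its `.names` argument can create new columns instead.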


You can also edit an existing column. Let's change the `continent` value `Europe` to `EU` in the data frame:

In [9]:
data2002 <- gapminder |> filter(year == 2002) 
head(data2002)

data2002 |>
    mutate(continent = as.character(continent), # convert factor -> character 
           continent = ifelse(continent == "Europe", "EU", continent)) |>
    head(10)

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,2002,42.129,25268405,726.7341
Albania,Europe,2002,75.651,3508512,4604.2117
Algeria,Africa,2002,70.994,31287142,5288.0404
Angola,Africa,2002,41.003,10866106,2773.2873
Argentina,Americas,2002,74.34,38331121,8797.6407
Australia,Oceania,2002,80.37,19546792,30687.7547


country,continent,year,lifeExp,pop,gdpPercap
<fct>,<chr>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,2002,42.129,25268405,726.7341
Albania,EU,2002,75.651,3508512,4604.2117
Algeria,Africa,2002,70.994,31287142,5288.0404
Angola,Africa,2002,41.003,10866106,2773.2873
Argentina,Americas,2002,74.340,38331121,8797.6407
Australia,Oceania,2002,80.370,19546792,30687.7547
Austria,EU,2002,78.980,8148312,32417.6077
Bahrain,Asia,2002,74.795,656397,23403.5593
Bangladesh,Asia,2002,62.013,135656790,1136.3904
Belgium,EU,2002,78.320,10311970,30485.8838
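
When more than one value has to be replaced, nested `ifelse()` calls become hard to read, and `case_when()` is a common alternative. A sketch under the same setup; the abbreviation `AM` and the name `recoded` are hypothetical and used only for illustration:

```r
library(dplyr)
library(gapminder)

recoded <- gapminder |>
    filter(year == 2002) |>
    mutate(continent = as.character(continent),    # convert factor -> character
           continent = case_when(
               continent == "Europe"   ~ "EU",
               continent == "Americas" ~ "AM",      # hypothetical abbreviation
               TRUE                    ~ continent  # leave all other values unchanged
           ))

count(recoded, continent)
```

Conditions are checked top to bottom, and the final `TRUE` branch acts as the default.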

