# How to manipulate data frames in R

This short tutorial illustrates how to work with data frames in R. We will use R's built-in dataframe ```mtcars```.

We first make a copy of the dataset, since we are going to modify it later.

In [24]:
df = mtcars
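
The reason for working on a copy can be sketched as follows: changing the copy leaves the built-in dataset untouched. The name `cars_copy` is only an illustration and is not used elsewhere in this tutorial.

```r
# Work on a copy so the built-in dataset stays untouched.
cars_copy <- mtcars               # hypothetical name, used only in this sketch
cars_copy[1, "mpg"] <- 0          # modify the copy only

original_mpg <- mtcars[1, "mpg"]  # value in the built-in dataset
copy_mpg <- cars_copy[1, "mpg"]   # value in the modified copy
print(c(original = original_mpg, copy = copy_mpg))
```

The built-in value stays at 21 while the copy now holds 0.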

## Take a look at the dataframe

The simplest way to look at the first few rows of the dataset is the function [```head```](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/head.html):

In [25]:
head(df) 

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


We can look at the last few rows with the function [```tail```](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/head.html):

In [26]:
tail(df)

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
Ford Pantera L,15.8,8,351.0,264,4.22,3.17,14.5,0,1,5,4
Ferrari Dino,19.7,6,145.0,175,3.62,2.77,15.5,0,1,5,6
Maserati Bora,15.0,8,301.0,335,3.54,3.57,14.6,0,1,5,8
Volvo 142E,21.4,4,121.0,109,4.11,2.78,18.6,1,1,4,2
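
Both functions show 6 rows by default. As a small sketch, the `n` argument controls how many rows are returned, and ```dim``` reports the overall size of the dataframe. The variable names below are chosen for this example only.

```r
cars <- mtcars                     # fresh copy, independent of df above

first_three <- head(cars, n = 3)   # first 3 rows
last_two <- tail(cars, n = 2)      # last 2 rows
dims <- dim(cars)                  # c(number of rows, number of columns)

print(first_three)
print(last_two)
print(dims)
```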


### Get and set rownames / colnames
The leftmost column shows the rownames of df and the top row shows the colnames. We can retrieve the rownames with the function [```rownames```](https://stat.ethz.ch/R-manual/R-devel/library/base/html/colnames.html):

In [27]:
rownames(df)

For example, to change the first rowname from "Mazda RX4" to "Mazda RX5", we assign to the first element of the rownames:

In [28]:
rownames(df)[1] = "Mazda RX5"
head(df)

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX5,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


Remember that R indexes start from 1!

Similarly, we can get the colnames and change them with the function [```colnames```](https://stat.ethz.ch/R-manual/R-devel/library/base/html/colnames.html):

In [29]:
print(colnames(df))
colnames(df) = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k") # c(...) combines its arguments into a vector
head(df)

 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"


Unnamed: 0,a,b,c,d,e,f,g,h,i,j,k
Mazda RX5,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
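
Often only one column needs a new name. A minimal sketch, assuming we want to rename "mpg" (the new name `miles_per_gallon` is made up for this example): we select the matching position in the colnames and assign to it, just as we did for a single rowname.

```r
cars <- mtcars   # fresh copy, independent of df above

# Replace only the colname that equals "mpg"; all other names are kept.
colnames(cars)[colnames(cars) == "mpg"] <- "miles_per_gallon"

print(colnames(cars)[1:3])
```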


Let's restore the original colnames before we continue...

In [30]:
colnames(df) = colnames(mtcars)
head(df)

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX5,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


## Accessing one data point / one column / row of data

Think of the dataframe as a grid in which each data point is located by (i,j), where i is the row number and j is the column number. Again, remember that R indexes start from 1!


### Accessing individual data point
To access the value of "cyl" (2nd column) for "Mazda RX5" (1st row), we can simply write:

In [31]:
df[1,2]

If you also want "disp" (3rd column) and "hp" (4th column), you can access all 3 values with:

In [32]:
df[1,2:4] # get the values from the 1st row, 2nd *to* 4th columns

Unnamed: 0,cyl,disp,hp
Mazda RX5,6,160,110


Sometimes you may prefer specifying the rows and columns by name instead of indexes:

In [33]:
df["Mazda RX5",c("cyl", "disp", "hp")]

Unnamed: 0,cyl,disp,hp
Mazda RX5,6,160,110


As you can see, selecting by name gives exactly the same result. Whether to select by name or by index is a personal choice and depends on the situation. Selecting by name is easier to read, but if the rownames or colnames change later (e.g. we change the first rowname back to "Mazda RX4"), the code no longer works as intended: an unknown column name raises an error, and an unknown row name returns a row of ```NA``` values.
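
One way to make name-based selection safer is to check that the name exists before using it. This is only a sketch, using ```%in%``` and ```match``` from base R on a fresh copy of ```mtcars``` (so the first row is still called "Mazda RX4"); the variable names are illustrative.

```r
cars <- mtcars
wanted <- "Mazda RX4"                           # row name we expect to exist

if (wanted %in% rownames(cars)) {
  row_index <- match(wanted, rownames(cars))    # position of that row name
  by_name <- cars[wanted, c("cyl", "disp", "hp")]
  by_index <- cars[row_index, 2:4]
  print(by_name)
  print(by_index)
} else {
  message("Row '", wanted, "' not found")
}
```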

### Accessing one column / one row
To access the whole row of "Mazda RX4 Wag" (2nd row), we can simply write:

In [34]:
df[2,]

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4 Wag,21,6,160,110,3.9,2.875,17.02,0,1,4,4


Note that leaving the column index empty means that we want all columns, so you can probably guess how to access a whole column:

In [35]:
df[,3]

which gives you the whole of column 3 ("disp").
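
Selecting a single column this way returns a plain vector, not a dataframe. If a one-column dataframe is needed, the `drop = FALSE` argument keeps the dataframe structure. A short sketch on a fresh copy of ```mtcars```:

```r
cars <- mtcars

as_vector <- cars[, 3]                # plain numeric vector, rownames are lost
as_frame <- cars[, 3, drop = FALSE]   # one-column dataframe, rownames are kept

print(class(as_vector))
print(class(as_frame))
print(head(as_frame, 3))
```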

And you should be able to guess what the following lines are doing:

In [38]:
df[2:4,]
df[c("Mazda RX4 Wag","Datsun 710","Hornet 4 Drive"), ]
head(df[,1:3])
head(df[,c("mpg","cyl","disp")])

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1


Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1


Unnamed: 0,mpg,cyl,disp
Mazda RX5,21.0,6,160
Mazda RX4 Wag,21.0,6,160
Datsun 710,22.8,4,108
Hornet 4 Drive,21.4,6,258
Hornet Sportabout,18.7,8,360
Valiant,18.1,6,225


Unnamed: 0,mpg,cyl,disp
Mazda RX5,21.0,6,160
Mazda RX4 Wag,21.0,6,160
Datsun 710,22.8,4,108
Hornet 4 Drive,21.4,6,258
Hornet Sportabout,18.7,8,360
Valiant,18.1,6,225


There is another way to access a column of df, using the ```$``` operator:

In [40]:
df$mpg
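
For comparison, here is a small sketch of three equivalent ways to extract one column as a vector. The ```[[ ]]``` form is handy when the column name is stored in a variable, which ```$``` cannot use. Variable names are chosen for this example only.

```r
cars <- mtcars

by_dollar <- cars$mpg                # name written literally after $
by_double_bracket <- cars[["mpg"]]   # name given as a string
by_matrix_style <- cars[, "mpg"]     # row index left empty

column_name <- "mpg"                 # name held in a variable
by_variable <- cars[[column_name]]

print(identical(by_dollar, by_double_bracket))
print(identical(by_dollar, by_matrix_style))
print(identical(by_dollar, by_variable))
```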

## Accessing row/column with special conditions

What if I want all the information of cars with names starting with "Mazda"? How can we proceed?

From the previous section, we know how to select them **if** we know the indexes of all the cars whose names start with "Mazda". Of course, you can do it manually...

In [41]:
df[1:2,]

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX5,21,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21,6,160,110,3.9,2.875,17.02,0,1,4,4


But a smarter way is to find the indexes with the function [```which```](https://stat.ethz.ch/R-manual/R-devel/library/base/html/which.html):

In [46]:
first.5.letters.of.car.names = substring(rownames(df), 1,5) 
print(first.5.letters.of.car.names)
i = which(first.5.letters.of.car.names == "Mazda")

 [1] "Mazda" "Mazda" "Datsu" "Horne" "Horne" "Valia" "Duste" "Merc " "Merc "
[10] "Merc " "Merc " "Merc " "Merc " "Merc " "Cadil" "Linco" "Chrys" "Fiat "
[19] "Honda" "Toyot" "Toyot" "Dodge" "AMC J" "Camar" "Ponti" "Fiat " "Porsc"
[28] "Lotus" "Ford " "Ferra" "Maser" "Volvo"


In [47]:
df[i,]

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX5,21,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21,6,160,110,3.9,2.875,17.02,0,1,4,4


which gives you the same result!
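
As an alternative sketch, a logical vector can be used directly as the row index, without going through ```which```. The example below uses ```startsWith``` (available in base R since version 3.3.0) and ```grepl``` on a fresh copy of ```mtcars```, and also shows a condition on column values. The variable names are illustrative.

```r
cars <- mtcars

# TRUE for every row whose name starts with "Mazda", FALSE otherwise.
is_mazda <- startsWith(rownames(cars), "Mazda")
mazda_rows <- cars[is_mazda, ]

# The same selection with a regular expression ("^" anchors the start).
mazda_rows_regex <- cars[grepl("^Mazda", rownames(cars)), ]

# Conditions on columns work the same way and can be combined with &.
four_cyl_efficient <- cars[cars$cyl == 4 & cars$mpg > 30, ]

print(mazda_rows)
print(four_cyl_efficient)
```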