### Data Frames

<p>Data frames are one of the most commonly used data structures in R. The key characteristics of a data frame are:
    <li>It has rows and columns.</li>
    <li>Its size is available through nrow(), ncol(), and dim().</li>
    <li>All values within a column must share one data type; different columns may have different types.</li>
    <li>You can index it like a matrix, using [row, column].</li>
</p>
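
As a quick illustration of the characteristics above, here is a minimal sketch. The data frame `df` and its values are made up for this example and are not part of the notebook's dataset.

```r
# Hypothetical example data, used only for illustration
df <- data.frame(
  name  = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

nrow(df)            # number of rows
ncol(df)            # number of columns
dim(df)             # rows and columns together
sapply(df, class)   # each column has a single data type
df[2, "score"]      # matrix-style [row, column] indexing
```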

<p>To create a data frame, use data.frame(). Useful arguments and helper functions:
    <li>To add row names, pass row.names = c('row1', 'row2') to data.frame()</li>
    <li>head(df_name, n) - returns the first n rows</li>
    <li>colnames() - returns the column names</li>
    <li>tail(df_name, n) - returns the last n rows</li>
    <li>str() - returns the structure of the data frame</li>
    <li>summary() - returns summary statistics (min, 1st quartile, median, mean, 3rd quartile, max) for each numeric column</li>
    
</p>
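
The inspection helpers can be tried on any data frame. The sketch below assumes `mtcars`, an example dataset that ships with base R; it is used here only so the block runs on its own, and the variable names `top3`, `bottom3`, and `cols` are illustrative.

```r
# mtcars is a small example dataset bundled with base R
top3    <- head(mtcars, 3)    # first 3 rows
bottom3 <- tail(mtcars, 3)    # last 3 rows
cols    <- colnames(mtcars)   # column names as a character vector

str(mtcars)                   # type and first few values of every column
summary(mtcars$mpg)           # min, quartiles, median, mean, and max of one column
```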

In [2]:
# Creating a data frame. In R versions before 4.0.0, stringsAsFactors defaults to TRUE,
# so pass stringsAsFactors = FALSE to keep text columns as character vectors.
# Naming each column explicitly (name = values), as below, is preferred.
countries_data <- data.frame(
  country=c('Portugal', 'France', 'UK'), 
  population = c(10280000, 66990000, 66650000),
  EU = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE)

countries_data

country,population,EU
Portugal,10280000,True
France,66990000,True
UK,66650000,False


In [3]:
# see the structure
str(countries_data)

'data.frame':	3 obs. of  3 variables:
 $ country   : chr  "Portugal" "France" "UK"
 $ population: num  10280000 66990000 66650000
 $ EU        : logi  TRUE TRUE FALSE


In [4]:
# see the class
class(countries_data)

In [5]:
# use the country names as row names instead of as a column
# the data frame now has 3 observations of 2 variables
countries_data <- data.frame(
  population = c(10280000, 66990000, 66650000),
  EU = c(TRUE, TRUE, FALSE),
  row.names = c('Portugal', 'France', 'UK'))

countries_data

,population,EU
Portugal,10280000,True
France,66990000,True
UK,66650000,False


### Indexing and Modifying Data Frames

In [6]:
# indexing works the same way as for matrices: [row, column]
# 1st row, 1st column
countries_data[1,1]

In [7]:
# 1st row, all columns
countries_data[1,]

,population,EU
Portugal,10280000,True


In [8]:
# a row name can also be used as the row index
countries_data['France',]

,population,EU
France,66990000,True


In [9]:
# select a column by name; returns all rows as a one-column data frame
countries_data['population']

,population
Portugal,10280000
France,66990000
UK,66650000
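
Single brackets with a column name return a data frame, as the output above shows. When a plain vector is needed, base R also offers `[[ ]]` and `$`. A minimal sketch, re-creating the row-named `countries_data` so the block runs on its own (the names `as_df`, `as_vec`, and `same` are illustrative):

```r
# Re-create the row-named data frame so this sketch is self-contained
countries_data <- data.frame(
  population = c(10280000, 66990000, 66650000),
  EU = c(TRUE, TRUE, FALSE),
  row.names = c('Portugal', 'France', 'UK'))

as_df  <- countries_data['population']     # single brackets: one-column data frame
as_vec <- countries_data[['population']]   # double brackets: plain numeric vector
same   <- countries_data$population        # $ extracts the same vector by name

class(as_df)
class(as_vec)
```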


In [10]:
# set the population of Portugal to 1 (this modifies countries_data directly)
countries_data['Portugal', 'population']<-1

countries_data

,population,EU
Portugal,1,True
France,66990000,True
UK,66650000,False


### Expanding Data Frames

In [13]:
# re-create countries_data without row names
countries_data <- data.frame(
  country=c('Portugal', 'France', 'UK'), 
  population = c(10280000, 66990000, 66650000),
  EU = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE)

# add another country to countries DF - Spain
spain_data <- data.frame(
  country = c('Spain'),
  population = c(46754778),
  EU = c(TRUE),
  stringsAsFactors = FALSE
)

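# to append the Spain data to countries_data, use rbind()
# on its own, rbind() only returns the combined result
rbind(countries_data, spain_data)

# assign the result back to keep the new row
countries_data <- rbind(countries_data, spain_data)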

country,population,EU
Portugal,10280000,True
France,66990000,True
UK,66650000,False
Spain,46754778,True
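
For data frames, rbind() matches columns by name, so the new row must use the same column names as the existing data. The sketch below illustrates this with a shortened copy of the data; `good_row`, `bad_row`, and the column name `pop` are hypothetical and exist only for this example.

```r
# Shortened copy of the data so the sketch runs on its own
countries_data <- data.frame(
  country = c('Portugal', 'France'),
  population = c(10280000, 66990000),
  stringsAsFactors = FALSE)

# Same column names in a different order: columns are matched by name
good_row <- data.frame(population = 46754778, country = 'Spain',
                       stringsAsFactors = FALSE)
combined <- rbind(countries_data, good_row)

# A mismatched column name ('pop' instead of 'population') raises an error;
# tryCatch() captures the error message as a string instead of stopping
bad_row <- data.frame(country = 'Spain', pop = 46754778,
                      stringsAsFactors = FALSE)
result <- tryCatch(rbind(countries_data, bad_row),
                   error = function(e) conditionMessage(e))
result
```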


In [14]:
# add a new column with the capital of each country
# create the capitals as a vector, in the same order as the rows
capitals <- c('Lisbon', 'Paris', 'London', 'Madrid')

# use cbind() to add the vector as a column; this only returns the result
cbind(countries_data, capitals)

# assign the result back to keep the new column
countries_data <- cbind(countries_data, capitals, stringsAsFactors = FALSE)

country,population,EU,capitals
Portugal,10280000,True,Lisbon
France,66990000,True,Paris
UK,66650000,False,London
Spain,46754778,True,Madrid
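
As an alternative to cbind(), a column can be added by assigning to a new name with `$`. This is a sketch of that base R idiom, not the approach used in the cell above; the derived column `population_millions` is a hypothetical name added for illustration.

```r
# Re-create the four-country data frame so the sketch runs on its own
countries_data <- data.frame(
  country = c('Portugal', 'France', 'UK', 'Spain'),
  population = c(10280000, 66990000, 66650000, 46754778),
  stringsAsFactors = FALSE)

# Assigning to a name that does not exist yet creates the column in place
countries_data$capitals <- c('Lisbon', 'Paris', 'London', 'Madrid')

# New columns can also be computed from existing ones
countries_data$population_millions <- countries_data$population / 1e6

countries_data
```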


### Removing Elements from Data Frames

In [15]:
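# to remove rows, put - before the row index
# this returns the data without Spain (row 4) but does not change countries_data;
# to remove the row permanently, assign the result back to the object
countries_data[-4,]

# remove the EU column by assigning NULL to it
# this modifies countries_data directly, so there is no need to reassign the object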
countries_data[,'EU']<-NULL

country,population,EU,capitals
Portugal,10280000,True,Lisbon
France,66990000,True,Paris
UK,66650000,False,London
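
The output above still shows the EU column because it comes from the first expression, which ran before the column was removed. To make both removals permanent, a minimal sketch could look like the following; it re-creates the data so it runs on its own, and uses the `$` form of NULL assignment.

```r
# Re-create the four-country data frame so the sketch runs on its own
countries_data <- data.frame(
  country = c('Portugal', 'France', 'UK', 'Spain'),
  population = c(10280000, 66990000, 66650000, 46754778),
  EU = c(TRUE, TRUE, FALSE, TRUE),
  capitals = c('Lisbon', 'Paris', 'London', 'Madrid'),
  stringsAsFactors = FALSE)

# Drop row 4 (Spain) permanently by assigning the result back
countries_data <- countries_data[-4, ]

# Drop the EU column; NULL assignment changes the object directly
countries_data$EU <- NULL

countries_data
```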
