# Data Frame

A data frame is a two-dimensional data structure in R. It is a special case of a list whose components all have the same length. Each component forms a column.

In [1]:
x = data.frame("id"=1:2, "idade"= c(32, 21)); x

id,idade
1,32
2,21


In [2]:
typeof(x)

In [3]:
class(x)
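The two calls above do not show their output, so here is a minimal sketch of what they report and of the "list with equal-length components" idea. The `people` data frame and its column names are invented for illustration.

```r
# A small data frame with two equal-length columns (illustrative data).
people <- data.frame(id = 1:2, age = c(32, 21))

# The underlying storage type is a list, one element per column.
typeof(people)    # "list"
class(people)     # "data.frame"
is.list(people)   # TRUE

# Every component (column) has the same length, equal to the number of rows.
column_lengths <- sapply(people, length)
column_lengths
```

`typeof()` reports how the object is stored, while `class()` reports how R dispatches methods on it, which is why the two answers differ.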

## Data Frame Functions

In [4]:
names(x)

In [5]:
ncol(x)

In [6]:
nrow(x)

In [7]:
length(x)
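As a rough sketch of how these functions relate to each other, the example below uses a hypothetical `scores` data frame. The point most likely to surprise is that `length()` counts columns, not rows, because a data frame is stored as a list of columns.

```r
# Illustrative data: three rows, two columns.
scores <- data.frame(id = 1:3, score = c(7.5, 9.0, 6.0))

names(scores)    # column names: "id" "score"
ncol(scores)     # 2
nrow(scores)     # 3

# length() counts list components, so for a data frame it equals ncol().
length(scores)   # 2

# dim() returns the number of rows and columns together.
shape <- dim(scores)
shape
```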

## Creating a Data Frame

In [8]:
# Before R 4.0.0, character columns become factors by default (stringsAsFactors = TRUE)
x = data.frame("id" = 1:2, "idade" = c(21,15), "nome" = c("José Lima","Dória Silva")); str(x)

'data.frame':	2 obs. of  3 variables:
 $ id   : int  1 2
 $ idade: num  21 15
 $ nome : Factor w/ 2 levels "Dória Silva",..: 2 1


In [9]:
x = data.frame("id" = 1:2, "idade" = c(21,15), "nome" = c("José Lima","Dória Silva"), 
               stringsAsFactors=FALSE); str(x)

'data.frame':	2 obs. of  3 variables:
 $ id   : int  1 2
 $ idade: num  21 15
 $ nome : chr  "José Lima" "Dória Silva"
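The difference between the two cells above comes down to the `stringsAsFactors` argument. The sketch below sets it explicitly in both directions so that the result does not depend on the installed R version; the variable names `as_factor` and `as_text` are made up for this example.

```r
# Setting stringsAsFactors explicitly makes the result independent of the R version.
as_factor <- data.frame(nome = c("José Lima", "Dória Silva"), stringsAsFactors = TRUE)
as_text   <- data.frame(nome = c("José Lima", "Dória Silva"), stringsAsFactors = FALSE)

class(as_factor$nome)   # "factor"
class(as_text$nome)     # "character"

# A factor column can be converted to plain text afterwards.
as_factor$nome <- as.character(as_factor$nome)
class(as_factor$nome)   # "character"
```

Factors suit columns with a small, fixed set of categories; free-form text such as names is usually easier to handle as character data.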


## Handling Missing Values

In [10]:
x = data.frame("id" = 1:2, "idade" = c(21, NA), "nome" = c("José Lima","Dória Silva")); str(x)

'data.frame':	2 obs. of  3 variables:
 $ id   : int  1 2
 $ idade: num  21 NA
 $ nome : Factor w/ 2 levels "Dória Silva",..: 2 1


In [11]:
mean(x[,'idade'])

In [12]:
mean(x[!is.na(x[,'idade']),'idade'])

In [13]:
x

id,idade,nome
1,21.0,José Lima
2,NA,Dória Silva


In [14]:
x[is.na(x[,'idade']),'idade'] <- 30
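The steps in this section can also be written with a few built-in helpers. This is a sketch on invented data (a three-row `people` data frame), not a replacement for the cells above; filling with the mean is just one possible choice of replacement value.

```r
# Illustrative data with one missing age.
people <- data.frame(id = 1:3, age = c(21, NA, 30))

mean(people$age)                             # NA, because one value is missing
mean_age <- mean(people$age, na.rm = TRUE)   # 25.5, ignoring the NA

# Count the missing values in each column.
missing_per_column <- colSums(is.na(people))

# Keep only the rows that have no missing value.
complete_rows <- people[complete.cases(people), ]

# Or fill the gaps with the mean of the observed values, working on a copy.
filled <- people
filled$age[is.na(filled$age)] <- mean_age
```

`na.rm = TRUE` is accepted by most summary functions (`sum`, `min`, `max`, `median`), so it is often simpler than filtering with `is.na()` by hand.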


### Reading a Data Frame from a CSV File

In [16]:
df = read.csv("../dados/iris-dataset.csv", header=FALSE)

In [18]:
head(df)

V1,V2,V3,V4,V5
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
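When a file has no header row, the column names can be supplied at read time instead of being left as `V1`, `V2`, and so on. The sketch below writes a two-line temporary file so that it runs without the original dataset; the path, the `flowers` variable, and the chosen column names are assumptions for illustration.

```r
# Write a tiny headerless CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("5.1,3.5,1.4,0.2,Iris-setosa",
             "4.9,3.0,1.4,0.2,Iris-setosa"), csv_path)

# header = FALSE tells read.csv that the first line is data, not column names;
# col.names supplies the names that would otherwise default to V1, V2, ...
flowers <- read.csv(csv_path, header = FALSE,
                    col.names = c("sepal_length", "sepal_width",
                                  "petal_length", "petal_width", "species"),
                    stringsAsFactors = FALSE)
str(flowers)

unlink(csv_path)  # remove the temporary file
```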


In [19]:
tail(df)

,V1,V2,V3,V4,V5
145,6.7,3.3,5.7,2.5,Iris-virginica
146,6.7,3.0,5.2,2.3,Iris-virginica
147,6.3,2.5,5.0,1.9,Iris-virginica
148,6.5,3.0,5.2,2.0,Iris-virginica
149,6.2,3.4,5.4,2.3,Iris-virginica
150,5.9,3.0,5.1,1.8,Iris-virginica


In [20]:
# the Python (pandas) equivalent was df.describe()
summary(df)

       V1              V2              V3              V4       
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.054   Mean   :3.759   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
               V5    
 Iris-setosa    :50  
 Iris-versicolor:50  
 Iris-virginica :50  
                     
                     
                     

### Renaming Columns

In [21]:
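# Note: "sepal_width" appears twice (shown as "sepal_width.1" in the outputs); the fourth name was presumably meant to be "petal_width"
colnames(df) = c("sepal_length", "sepal_width", "petal_length", "sepal_width", "species")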
head(df)

sepal_length,sepal_width,petal_length,sepal_width.1,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
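Column names can be replaced all at once or one at a time. The sketch below shows both on a small invented `measurements` data frame and adds a uniqueness check, since R does not stop you from assigning the same name to two columns.

```r
# Illustrative data with default-style column names.
measurements <- data.frame(V1 = c(5.1, 4.9), V2 = c(3.5, 3.0), V3 = c("a", "b"),
                           stringsAsFactors = FALSE)

# Rename all columns at once.
colnames(measurements) <- c("sepal_length", "sepal_width", "label")

# Rename a single column by looking up its current name.
names(measurements)[names(measurements) == "label"] <- "species"

# anyDuplicated() returns 0 when every name is unique, which guards against
# accidentally assigning the same name twice.
has_duplicates <- anyDuplicated(names(measurements)) > 0
has_duplicates   # FALSE
```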


### Indexing

#### Accessing as a List

In [22]:
head(df["sepal_length"])
typeof(df["sepal_length"])

sepal_length
5.1
4.9
4.7
4.6
5.0
5.4


In [23]:
df$sepal_length
typeof(df$sepal_length)

In [24]:
df[["sepal_length"]]
typeof(df[["sepal_length"]])
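The three access forms above differ in what they return. The following sketch, on a small invented `flowers` data frame, makes the difference explicit.

```r
# Illustrative data.
flowers <- data.frame(sepal_length = c(5.1, 4.9, 4.7),
                      species = c("setosa", "setosa", "setosa"),
                      stringsAsFactors = FALSE)

as_frame  <- flowers["sepal_length"]     # single brackets keep the data frame wrapper
as_vector <- flowers[["sepal_length"]]   # double brackets extract the column itself
by_dollar <- flowers$sepal_length        # $ also extracts the column, by literal name

class(as_frame)     # "data.frame"
typeof(as_frame)    # "list"
typeof(as_vector)   # "double"

# [[ ]] also accepts a name stored in a variable, which $ does not.
column <- "sepal_length"
from_variable <- flowers[[column]]
```

In short, `[` returns a smaller data frame, while `[[` and `$` return the column vector.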

#### Accessing as a Matrix

In [25]:
head(df, n = 10)

sepal_length,sepal_width,petal_length,sepal_width.1,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa


In [26]:
df[1:5,]

sepal_length,sepal_width,petal_length,sepal_width.1,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa


In [27]:
df[df$sepal_length < 5.0,]

,sepal_length,sepal_width,petal_length,sepal_width.1,species
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
7,4.6,3.4,1.4,0.3,Iris-setosa
9,4.4,2.9,1.4,0.2,Iris-setosa
10,4.9,3.1,1.5,0.1,Iris-setosa
12,4.8,3.4,1.6,0.2,Iris-setosa
13,4.8,3.0,1.4,0.1,Iris-setosa
14,4.3,3.0,1.1,0.1,Iris-setosa
23,4.6,3.6,1.0,0.2,Iris-setosa


In [28]:
df[1:3, 2]

In [29]:
df[1:3, 2, drop = FALSE]

sepal_width
3.5
3.0
3.2
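Matrix-style indexing takes a row selector and a column selector separated by a comma. The sketch below repeats the patterns from this section on a small invented `flowers` data frame and shows the effect of `drop = FALSE`.

```r
# Illustrative data.
flowers <- data.frame(sepal_length = c(5.1, 4.9, 4.7, 5.4),
                      sepal_width  = c(3.5, 3.0, 3.2, 3.9))

# Rows 1 to 2, all columns (an empty column selector means "everything").
first_rows <- flowers[1:2, ]

# Logical row filter combined with a column selection by name.
short <- flowers[flowers$sepal_length < 5.0, c("sepal_length", "sepal_width")]

# Selecting a single column drops to a plain vector by default...
widths_vector <- flowers[1:3, 2]
# ...while drop = FALSE keeps the result as a one-column data frame.
widths_frame <- flowers[1:3, 2, drop = FALSE]
```

Passing `drop = FALSE` is a sensible habit in code that expects a data frame back regardless of how many columns are selected.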


In [31]:
head(df, n = 1)
df[1, "petal_length"] = 2
head(df, n = 1)

sepal_length,sepal_width,petal_length,sepal_width.1,species
5.1,3.5,1.4,0.2,Iris-setosa


sepal_length,sepal_width,petal_length,sepal_width.1,species
5.1,3.5,2,0.2,Iris-setosa


### Adding Components

In [32]:
# rbind() returns a new data frame; without assignment, df itself is unchanged
rbind(list(5.06, 3.6, 1.0, 0.25, "Iris-setosa"),df)
head(df)

,sepal_length,sepal_width,petal_length,sepal_width.1,species
2,5.06,3.6,1.0,0.25,Iris-setosa
210,5.10,3.5,2.0,0.20,Iris-setosa
3,4.90,3.0,1.4,0.20,Iris-setosa
4,4.70,3.2,1.3,0.20,Iris-setosa
5,4.60,3.1,1.5,0.20,Iris-setosa
6,5.00,3.6,1.4,0.20,Iris-setosa
7,5.40,3.9,1.7,0.40,Iris-setosa
8,4.60,3.4,1.4,0.30,Iris-setosa
9,5.00,3.4,1.5,0.20,Iris-setosa
10,4.40,2.9,1.4,0.20,Iris-setosa


sepal_length,sepal_width,petal_length,sepal_width.1,species
5.1,3.5,2.0,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa


In [33]:
df$novo <- "a"
head(df)

sepal_length,sepal_width,petal_length,sepal_width.1,species,novo
5.1,3.5,2.0,0.2,Iris-setosa,a
4.9,3.0,1.4,0.2,Iris-setosa,a
4.7,3.2,1.3,0.2,Iris-setosa,a
4.6,3.1,1.5,0.2,Iris-setosa,a
5.0,3.6,1.4,0.2,Iris-setosa,a
5.4,3.9,1.7,0.4,Iris-setosa,a
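To actually keep an added row, the result of `rbind()` has to be assigned back. The sketch below shows this together with adding a column, using a small invented `flowers` data frame; building the new row as a one-row data frame with matching column names is one possible approach, chosen here because it keeps the column types explicit.

```r
# Illustrative data.
flowers <- data.frame(sepal_length = c(5.1, 4.9), species = c("setosa", "setosa"),
                      stringsAsFactors = FALSE)

# Build the new row as a data frame with matching column names.
new_row <- data.frame(sepal_length = 5.06, species = "setosa",
                      stringsAsFactors = FALSE)

# rbind() returns a new data frame; assign it back to keep the added row.
flowers <- rbind(new_row, flowers)

# A new column is added by assignment; a single value is recycled for every row.
flowers$source <- "a"
```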


#### Removing Components

In [34]:
df$novo = NULL
tail(df)

,sepal_length,sepal_width,petal_length,sepal_width.1,species
145,6.7,3.3,5.7,2.5,Iris-virginica
146,6.7,3.0,5.2,2.3,Iris-virginica
147,6.3,2.5,5.0,1.9,Iris-virginica
148,6.5,3.0,5.2,2.0,Iris-virginica
149,6.2,3.4,5.4,2.3,Iris-virginica
150,5.9,3.0,5.1,1.8,Iris-virginica
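Assigning `NULL` removes one column at a time. The sketch below adds two other common patterns, dropping several columns by name and dropping rows by negative index, on an invented `flowers` data frame whose `tmp_a` and `tmp_b` columns exist only to be removed.

```r
# Illustrative data; the scalar values for tmp_a and tmp_b are recycled to both rows.
flowers <- data.frame(sepal_length = c(5.1, 4.9), sepal_width = c(3.5, 3.0),
                      tmp_a = "a", tmp_b = "b", stringsAsFactors = FALSE)

# Assigning NULL removes one column in place.
flowers$tmp_a <- NULL

# Several columns can be removed by keeping only the names not in a drop list.
to_drop <- c("tmp_b", "sepal_width")
flowers <- flowers[, setdiff(names(flowers), to_drop), drop = FALSE]

# Rows are removed with a negative index.
without_first <- flowers[-1, , drop = FALSE]
```

`drop = FALSE` matters here because only one column remains; without it the result would collapse to a plain vector.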
