## Beginning in the R language

## Lesson 1: getting to know the data

In [1]:
ana_silva <- c('Ana Silva', 28, 6230.50, 'Não possui', TRUE)
ana_silva

In [2]:
names(ana_silva) <- c('Nome', 'Idade', 'Salário', 'Telefone', 'Trabalho remoto')
ana_silva

In [3]:
carlos_oliveira <- c('Carlos Oliveira', 35, 7500.75, '(11) 1234-5678', TRUE)

maria_santos <- c('Maria Santos', 40, 8000.25, '(21) 9876-5432', TRUE)

joao_costa <- c('Joao Costa', 32, 2460.80, 'Não possui', FALSE)

fernanda_lima <- c('Fernanda Lima', 27, 4230.35, '(31) 8765-4321', TRUE)


### Creating a matrix

To create a matrix in R, we use the matrix() function. Its four most commonly used arguments are:

* data: the vector with the values we will use to fill the matrix;
* nrow: the number of rows we want in the matrix;
* ncol: the number of columns we want in the matrix;
* byrow (optional): specifies whether the matrix is filled by rows or by columns. If set to TRUE, the values fill the matrix row by row. If set to FALSE (the default), they fill it column by column.
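
As a minimal sketch of how `byrow` changes the fill order, the example below arranges the same six values in both ways. The vector `valores` and the names `por_coluna` and `por_linha` are illustrative and not part of the lesson data.

```r
# Illustrative data: six values to arrange in 2 rows and 3 columns
valores <- 1:6

# byrow = FALSE (the default): values fill the matrix column by column
por_coluna <- matrix(valores, nrow = 2, ncol = 3, byrow = FALSE)

# byrow = TRUE: values fill the matrix row by row
por_linha <- matrix(valores, nrow = 2, ncol = 3, byrow = TRUE)

print(por_coluna)  # first row: 1 3 5
print(por_linha)   # first row: 1 2 3
```

When only `nrow` (or only `ncol`) is given, R infers the other dimension from the length of the data, which is why the cells in this lesson pass just one of them.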

In [4]:
colab_combinado <- c(ana_silva, carlos_oliveira, maria_santos, joao_costa, fernanda_lima)

In [5]:
matriz_colab <- matrix(colab_combinado, nrow = 5, byrow = TRUE)
matriz_colab

0,1,2,3,4
Ana Silva,28,6230.5,Não possui,True
Carlos Oliveira,35,7500.75,(11) 1234-5678,True
Maria Santos,40,8000.25,(21) 9876-5432,True
Joao Costa,32,2460.8,Não possui,False
Fernanda Lima,27,4230.35,(31) 8765-4321,True


In [6]:
rownames(matriz_colab) <- c('Colaboradora Ana', 'Colaborador Carlos', 'Colaboradora Maria', 'Colaborador Joao', 'Colaboradora Fernanda')
colnames(matriz_colab) <- c('Nome','Idade','Salário','Telefone','Trabalho remoto')

matriz_colab

Unnamed: 0,Nome,Idade,Salário,Telefone,Trabalho remoto
Colaboradora Ana,Ana Silva,28,6230.5,Não possui,True
Colaborador Carlos,Carlos Oliveira,35,7500.75,(11) 1234-5678,True
Colaboradora Maria,Maria Santos,40,8000.25,(21) 9876-5432,True
Colaborador Joao,Joao Costa,32,2460.8,Não possui,False
Colaboradora Fernanda,Fernanda Lima,27,4230.35,(31) 8765-4321,True


## Lesson 2: manipulating data

In [7]:
# Vector with values of sales for each month
vendas_jan <- c(20, 18, 25, 16, 23)
vendas_fev <- c(15, 20, 22, 18, 19)
vendas_mar <- c(25, 23, 20, 17, 21)
vendas_abr <- c(18, 15, 19, 20, 24)
vendas_mai <- c(22, 25, 21, 15, 18)
vendas_jun <- c(21, 22, 19, 17, 20)

In [8]:
# Names of the salespeople
pessoas <- c("Pedro Santos", "Carla Nunes", "Maria Eduarda", "Luiz Felipe", "Julio Costa")

# Months
meses <- c("Janeiro", "Fevereiro", "Marco", "Abril", "Maio", "Junho")

In [9]:
# Combining the vectors
vendas_semestre <- c(vendas_jan, vendas_fev, vendas_mar, vendas_abr, vendas_mai, vendas_jun)

# Sales by month and salesperson
matriz_vendas <- matrix(vendas_semestre, nrow = 5, byrow = FALSE)

In [10]:
# Set the row and column names
rownames(matriz_vendas) <- pessoas
colnames(matriz_vendas) <- meses
# Show the matrix
matriz_vendas

Unnamed: 0,Janeiro,Fevereiro,Marco,Abril,Maio,Junho
Pedro Santos,20,15,25,18,22,21
Carla Nunes,18,20,23,15,25,22
Maria Eduarda,25,22,20,19,21,19
Luiz Felipe,16,18,17,20,15,17
Julio Costa,23,19,21,24,18,20


##### Questions to answer:
* Which collaborator had the highest sales revenue?
* Which month had the highest revenue?

In [11]:
rowSums(matriz_vendas)

In [12]:
colSums(matriz_vendas)

##### Questions to answer:
* Which collaborator had the highest sales revenue? Maria Eduarda
* Which month had the highest revenue? Março
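
Instead of reading the answers off the printed totals, they can also be extracted programmatically. The sketch below rebuilds the same sales matrix and combines `which.max()` with `names()`. The variable names `melhor_pessoa` and `melhor_mes` are assumptions introduced for illustration.

```r
# Sales matrix rebuilt from the lesson data (one column per month)
matriz_vendas <- matrix(
  c(20, 18, 25, 16, 23,
    15, 20, 22, 18, 19,
    25, 23, 20, 17, 21,
    18, 15, 19, 20, 24,
    22, 25, 21, 15, 18,
    21, 22, 19, 17, 20),
  nrow = 5, byrow = FALSE
)
rownames(matriz_vendas) <- c("Pedro Santos", "Carla Nunes", "Maria Eduarda",
                             "Luiz Felipe", "Julio Costa")
colnames(matriz_vendas) <- c("Janeiro", "Fevereiro", "Marco",
                             "Abril", "Maio", "Junho")

# which.max() returns the position of the largest total;
# names() recovers the row or column label at that position
melhor_pessoa <- names(which.max(rowSums(matriz_vendas)))
melhor_mes <- names(which.max(colSums(matriz_vendas)))

print(melhor_pessoa)  # "Maria Eduarda"
print(melhor_mes)     # "Marco"
```

Note that `which.max()` returns only the first maximum, so ties would need separate handling.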

In [13]:
total_colab <- rowSums(matriz_vendas)

matriz_total_colab <- cbind(matriz_vendas, total_colab)
matriz_total_colab

Unnamed: 0,Janeiro,Fevereiro,Marco,Abril,Maio,Junho,total_colab
Pedro Santos,20,15,25,18,22,21,121
Carla Nunes,18,20,23,15,25,22,123
Maria Eduarda,25,22,20,19,21,19,126
Luiz Felipe,16,18,17,20,15,17,103
Julio Costa,23,19,21,24,18,20,125


In [14]:
total_meses <- colSums(matriz_vendas)

matriz_total_meses <- rbind(matriz_vendas, total_meses)
matriz_total_meses

Unnamed: 0,Janeiro,Fevereiro,Marco,Abril,Maio,Junho
Pedro Santos,20,15,25,18,22,21
Carla Nunes,18,20,23,15,25,22
Maria Eduarda,25,22,20,19,21,19
Luiz Felipe,16,18,17,20,15,17
Julio Costa,23,19,21,24,18,20
total_meses,102,94,106,96,101,99


## Lesson 3: Conditional and Loop Structures

In [None]:
preco <- c(50, 100, 150, 25, 75)

qtd_estoque <- c(10, 5, 20, 30, 7)

preco_estoque <- c(preco, qtd_estoque)

matriz_estoque <- matrix(preco_estoque, ncol = 2)

rownames(matriz_estoque) <- c('Notebook', 'Tablet', 'Monitor', 'Smartphone', 'Headset') # nolint: line_length_linter.
colnames(matriz_estoque) <- c('Produto', 'Estoque')

matriz_estoque

Unnamed: 0,Produto,Estoque
Notebook,50,10
Tablet,100,5
Monitor,150,20
Smartphone,25,30
Headset,75,7



##### Based on this matrix, we will address the predefined questions in this project. Let's explore the following:

* Calculate the total value in stock.
* Identify products with low stock (less than 15 units).
* Classify the total stock value as high or low.
* Apply a 10% discount to all products in stock.
* Determine the best-selling product.



In [None]:
total_estoque <- matriz_estoque[, 1] * matriz_estoque[, 2]
total_estoque

In [19]:
matriz_estoque <- cbind(matriz_estoque, total_estoque)

matriz_estoque

Unnamed: 0,Produto,Estoque,total_estoque
Notebook,50,10,500
Tablet,100,5,500
Monitor,150,20,3000
Smartphone,25,30,750
Headset,75,7,525


In [None]:
estoque_baixo <- matriz_estoque[, 2] < 15
matriz_estoque[estoque_baixo, ]

Unnamed: 0,Produto,Estoque,total_estoque
Notebook,50,10,500
Tablet,100,5,500
Headset,75,7,525


In [23]:
total_somado <- colSums(matriz_estoque)
total_somado[3]

In [24]:
if (total_somado[3] > 3000){
  paste('Total em estoque é alto. Valor total: ', total_somado[3])
} else {
  paste('Total em estoque é baixo. Valor total: ', total_somado[3])
}

In [27]:
for (i in 1:nrow(matriz_estoque)) {
    matriz_estoque[i, 1] <- matriz_estoque[i, 1] * 0.9
}

matriz_estoque

Unnamed: 0,Produto,Estoque,total_estoque
Notebook,45.0,10,500
Tablet,90.0,5,500
Monitor,135.0,20,3000
Smartphone,22.5,30,750
Headset,67.5,7,525
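
The `for` loop above is useful for practising loop syntax. Because arithmetic in R is vectorized, the same discount can probably be applied in a single step. The sketch below rebuilds a small version of the stock matrix; the name `matriz_exemplo` is hypothetical, and the column names follow the ones used in the lesson.

```r
# Rebuild the price and stock columns from the lesson data
preco <- c(50, 100, 150, 25, 75)
qtd_estoque <- c(10, 5, 20, 30, 7)

matriz_exemplo <- cbind(Produto = preco, Estoque = qtd_estoque)
rownames(matriz_exemplo) <- c("Notebook", "Tablet", "Monitor", "Smartphone", "Headset")

# Multiplying the whole column applies the 10% discount to every row at once
matriz_exemplo[, "Produto"] <- matriz_exemplo[, "Produto"] * 0.9

print(matriz_exemplo)
```

Selecting the column by name rather than by position keeps the code readable if columns are later added or reordered.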


In [28]:
indice_mais_vendido <- 0
quantidade_mais_vendida <- 0
i <- 1

In [29]:
while (i <= nrow(matriz_estoque) && quantidade_mais_vendida < 30) {
    if (matriz_estoque[i, 2] > quantidade_mais_vendida) {
        quantidade_mais_vendida <- matriz_estoque[i, 2]
        indice_mais_vendido <- i
    }
    i <- i + 1
}

cat('Produto mais vendido: ', indice_mais_vendido)
cat('\nTotal em estoque: ', matriz_estoque[indice_mais_vendido, 2])

Produto mais vendido:  4
Total em estoque:  30

##### The ifelse() function applies a condition to each element of a vector or matrix efficiently. It follows the syntax ifelse(test, yes, no), returning yes where test is TRUE and no where it is FALSE:

In [25]:
idades <- c(25, 16, 22, 30, 14)
categorias <- ifelse(idades >= 18, "Adulto", "Jovem")
print(categorias)

[1] "Adulto" "Jovem"  "Adulto" "Adulto" "Jovem" 


In [26]:
notas <- matrix(c(75, 45, 80, 55, 90, 65), ncol = 2)
resultados <- ifelse(notas >= 60, "Aprovado", "Reprovado")
print(resultados)

     [,1]        [,2]       
[1,] "Aprovado"  "Reprovado"
[2,] "Reprovado" "Aprovado" 
[3,] "Aprovado"  "Aprovado" 
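
When more than two categories are needed, calls to `ifelse()` can be nested. The following is a minimal sketch under assumed rules: the extra age value, the threshold of 65 and the label "Idoso" are hypothetical and do not come from the lesson.

```r
# Ages from the lesson plus one extra illustrative value (70)
idades <- c(25, 16, 22, 30, 14, 70)

# The inner ifelse() is evaluated only where the first test is FALSE
faixas <- ifelse(idades < 18, "Jovem",
                 ifelse(idades < 65, "Adulto", "Idoso"))

print(faixas)
```

Deep nesting quickly becomes hard to read, so for many categories a function such as `cut()` may be a better fit.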


## Lesson 4: Mathematical and Statistical Functions

In [32]:
dados_vendas <- matrix(c(
  1230.75, 20, 24615,
  840.46, 35, 29416.10,
  110.20, 15, 1653,
  519.67, 10, 5196.70,
  650.90, 25, 16272.50

), ncol = 3, byrow = TRUE)

colnames(dados_vendas) <- c("Preco", "Quantidade", "Valor Total")
rownames(dados_vendas) <- c("Laptop", "Smart TV", "Webcam", "Microfone", "Smartwatch")

dados_vendas

Unnamed: 0,Preco,Quantidade,Valor Total
Laptop,1230.75,20,24615.0
Smart TV,840.46,35,29416.1
Webcam,110.2,15,1653.0
Microfone,519.67,10,5196.7
Smartwatch,650.9,25,16272.5


In [35]:
# Keep only the products priced above 600
dados_filtrados <- dados_vendas[dados_vendas[, 1] > 600, ]

qtd_filtrado <- sum(dados_filtrados[, 2])

qtd_filtrado

In [38]:
qtd_total <- sum(dados_vendas[, 2])

resultado_porcentagem <- (qtd_filtrado / qtd_total) * 100

round(resultado_porcentagem)

* What is the average sales revenue?
* Is there a significant difference between the mean and median of the revenues?
* Which is the most expensive product and which is the cheapest?

In [41]:
mean(dados_vendas[, 3])
median(dados_vendas[, 3])
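
To judge how far apart the mean and the median are, one option is to express the gap relative to the median. This is only an informal comparison, not a statistical test, and the names `valor_total` and `diferenca_relativa` are assumptions introduced for this sketch.

```r
# "Valor Total" column from the lesson data, as a named vector
valor_total <- c(Laptop = 24615, `Smart TV` = 29416.10, Webcam = 1653,
                 Microfone = 5196.70, Smartwatch = 16272.50)

media <- mean(valor_total)
mediana <- median(valor_total)

# Gap between mean and median as a percentage of the median
diferenca_relativa <- (media - mediana) / mediana * 100

print(media)    # 15430.66
print(mediana)  # 16272.5
print(round(diferenca_relativa, 1))
```

A mean below the median, as here, suggests that the smaller values (such as the Webcam revenue) pull the average down.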

In [49]:
which.max(dados_vendas[, 1])
which.min(dados_vendas[, 1])

In [50]:
valor_maior <- which.max(dados_vendas[, 1])
dados_vendas[valor_maior, 1]

In [53]:
valor_menor <- which.min(dados_vendas[, 1])
dados_vendas[valor_menor, 1]

## Lesson 5: Factors

In [54]:
status_entrega <- c("Entregue", "Em Trânsito", "Pendente", "Entregue", "Em Trânsito")
nomes_produtos <- c("Smartphone", "Notebook", "Monitor", "Mouse", "Tablet")
names(status_entrega) <- nomes_produtos
status_entrega

In [55]:
fator_entrega <- factor(status_entrega)

fator_entrega

In [56]:
fator_entrega <- factor(status_entrega, ordered = TRUE, levels = c('Em Trânsito', 'Pendente', 'Entregue'))

fator_entrega

In [57]:
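# Assigning to levels() only renames the levels by position, which would relabel the data.
# To change the order, rebuild the factor with the desired levels instead.
fator_entrega <- factor(status_entrega, ordered = TRUE, levels = c('Pendente', 'Em Trânsito', 'Entregue'))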

fator_entrega

##### Differences from Python

Unlike core Python, R has a built-in type for categorical variables: the factor. In plain Python, categories are usually represented with strings or lists, and a dedicated categorical type is available only through additional libraries.

Here are some specific benefits that make factors a distinctive choice for handling categorical variables:

Explicit Data Type Definition:
* R: The factor class is explicitly designed to represent categorical variables. This means that when you create a factor, R knows that the variable is categorical and handles it as such.
* Python: In plain Python, categorical variables are often represented as strings or list objects, because the core language has no dedicated data structure for categories (libraries such as pandas provide one). Without a specific structure, the intended data type is less clear to the reader and to the tools that consume the data.

Defining Levels:
* R: When creating a factor, you can explicitly define the levels that the categorical variable can take. This helps ensure consistency in analyses, especially when working with data from different sources.
* Python: In Python, the explicit definition of levels may not be as intuitive, and levels may be more prone to discrepancies because they depend on the order in which values appear.

Ease of Reordering:
* R: Factors make it easier to reorder categories. This is useful when creating plots or performing statistical analyses that require a specific order.
* Python: In Python, reordering categories may involve more manual manipulation and, in some cases, may require the use of additional libraries.

Use in Statistical Models:
* R: Many statistical models in R automatically interpret factors correctly, incorporating the order and structure of the levels in the analysis.
* Python: In Python, it may be necessary to preprocess categorical variables to ensure that statistical models interpret them correctly.
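
The points about defining levels and reordering can be sketched in R as follows. The data mirror the delivery statuses of this lesson, while the names `fator`, `reordenado` and `renomeado` are hypothetical. The last step shows a common pitfall: assigning to `levels()` renames the levels by position instead of reordering them.

```r
status <- c("Entregue", "Em Trânsito", "Pendente", "Entregue", "Em Trânsito")

# Define the levels and their order explicitly when creating the factor
fator <- factor(status, levels = c("Pendente", "Em Trânsito", "Entregue"),
                ordered = TRUE)

# Ordered factors support comparisons: "Pendente" comes before "Entregue"
print(fator[3] < fator[1])  # TRUE

# Reordering: rebuild the factor with a new level order; the values keep their labels
reordenado <- factor(fator, levels = c("Entregue", "Em Trânsito", "Pendente"),
                     ordered = TRUE)
print(as.character(reordenado))

# Pitfall: levels<- renames by position, so the third value
# changes from "Pendente" to "Entregue"
renomeado <- fator
levels(renomeado) <- c("Entregue", "Em Trânsito", "Pendente")
print(as.character(renomeado)[3])
```

Rebuilding with `factor()` is therefore the safer choice whenever the goal is to change the order rather than the labels.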

In [58]:
fator_entrega[3]

In [59]:
fator_entrega[3] > fator_entrega[4]

In [62]:
prioridade <- fator_entrega %in% c('Pendente', 'Em Trânsito')

prioridade

In [63]:
fator_entrega[prioridade]

In [64]:
cont_pendente <- sum(fator_entrega == 'Pendente')

cont_transito <- sum(fator_entrega == 'Em Trânsito')

cont_entregue <- sum(fator_entrega == 'Entregue')

cont_pendente
cont_transito
cont_entregue
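
The three separate counts can likely be obtained in one call with `table()`, which tallies every level of a factor. This sketch assumes the factor is built directly from `status_entrega` with the level order passed to `factor()`; the name `contagem` is introduced for illustration.

```r
status_entrega <- c("Entregue", "Em Trânsito", "Pendente", "Entregue", "Em Trânsito")

fator_entrega <- factor(status_entrega, ordered = TRUE,
                        levels = c("Pendente", "Em Trânsito", "Entregue"))

# table() counts the occurrences of each level, in level order
contagem <- table(fator_entrega)

print(contagem)
```

Levels with no occurrences still appear in the result with a count of zero, which is handy when comparing data from different sources.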