The data in the file `RIKZ.txt`, available in the `data` folder, were collected to study the relationship between abiotic factors (e.g., sediment composition, slope of the beach) and benthic fauna. Mulder (2000) described the results of a pilot study that looked at the effects of differences in slope and grain size on fauna in the coastal zone.


Janssen, G.M., Mulder, S., Zuur, A.F., Ieno, E.N. and Smith, G.M.

Q1. Load the data into a variable called `survey_data`.
  * `survey_data` should be a `tibble`.
  * The fields in the file are tab-delimited. `read_csv` always splits on commas (","); use `read_delim` with `delim = "\t"` (or `read_tsv`) to split on tabs instead.

In [2]:
#Question 1 

library(tidyverse)
# read_delim returns a tibble; delim = "\t" splits the fields on tabs
survey_data <- read_delim("RIKZ.txt", delim = "\t")
is_tibble(survey_data)
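
As a minimal sketch of tab-delimited reading, the following writes a small made-up table to a temporary file and reads it back with `read_delim`. The file contents, `toy_path`, and `toy_data` are illustrative assumptions, not the contents of `RIKZ.txt`.

```r
library(readr)
library(tibble)

# Toy tab-delimited file (made-up values, not the real RIKZ data)
toy_path <- tempfile(fileext = ".txt")
writeLines(c("Sample\tC1\tP1", "1\t1\t0", "2\t0\t3"), toy_path)

# delim = "\t" splits the fields on tabs; read_tsv(toy_path) is equivalent
toy_data <- read_delim(toy_path, delim = "\t")

is_tibble(toy_data)  # read_delim already returns a tibble
dim(toy_data)        # number of rows and columns read
```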

Q2. Display the first 6 rows of the table.

In [3]:
#Question 2 
head(survey_data)

Q3. The columns C1, P1-P25, N1, CR1-28, M1-17 and I1-5 of the table represent the counts for 75 species grouped within five taxa: Chaetognatha (C), Polychaeta (P), Crustacea (CR), Mollusca (M), and Insecta (I). We're only interested in the richness, that is, the number of species present in a sample. To compute it, score each species column as:
* `1` if the species has a value `> 0`
* `0` otherwise.

* Create a new column, call it `richness`, which holds the sum of these scores for each sample. The richness of `sample 1` should be `11`, since that sample has non-zero values only for the following species: 
```
'C1''P6''P15''P16''P25''CR1''CR14''CR15''CR19''CR26''I3'
```
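
The presence/absence rule above can be sketched on a toy table. The data frame `toy_counts` and its values are made up for illustration; only the column-naming pattern is borrowed from the survey.

```r
# Toy counts (made-up): 3 samples, 4 species columns
toy_counts <- data.frame(
    C1  = c(1, 0, 0),
    P1  = c(0, 0, 4),
    CR1 = c(7, 0, 2),
    I1  = c(0, 0, 1)
)

# counts > 0 gives TRUE/FALSE per cell; rowSums counts the TRUEs per sample
toy_richness <- rowSums(toy_counts > 0)
toy_richness
```

Comparing with `> 0` before summing means that the size of each count is ignored, so a species with 7 individuals contributes the same as one with a single individual.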



In [4]:
#Question 3
survey_data_1 <- survey_data
# Species count columns (columns 2 to 76)
survey_data_A <- survey_data[, 2:76]

# Convert counts to presence/absence: 1 if the count is > 0, 0 otherwise
survey_data_B <- survey_data_A %>%
    mutate(across(where(is.numeric), ~ as.integer(.x > 0)))

# Richness is the number of species present in each sample (row sum)
survey_data_1$richness <- rowSums(survey_data_B)

Q4. Create a copy of the variable `survey_data` that does not have columns C1, P1-P25, N1, CR1-28, M1-17 and I1-5. Call this variable `survey_data_richness`.


In [5]:
#Question 4
# Drop the species count columns (2 to 76); the richness column from Question 3 is kept
survey_data_richness <- survey_data_1[, -(2:76)]

Q6. Use the `lm` function to model the richness as a function of the remaining variables, excluding the variable `week`, which needs special treatment we haven't covered yet.


In [None]:
#Question 6
lm_angle1 <- lm(richness ~ angle1, survey_data_richness)
lm_angle2 <- lm(richness ~ angle2, survey_data_richness)
lm_exposure <- lm(richness ~ exposure, survey_data_richness)
lm_salinity <- lm(richness ~ salinity, survey_data_richness)
lm_temperature <- lm(richness ~ temperature, survey_data_richness)
lm_NAP <- lm(richness ~ NAP, survey_data_richness)
lm_penetrability <- lm(richness ~ penetrability, survey_data_richness)
lm_grainsize <- lm(richness ~ grainsize, survey_data_richness)
lm_humus <- lm(richness ~ humus, survey_data_richness)
lm_chalk <- lm(richness ~ chalk, survey_data_richness)
lm_sorting1 <- lm(richness ~ sorting1, survey_data_richness)
lm_Beach <- lm(richness ~ Beach, survey_data_richness)


Q7. What do the various outputs of `lm` mean? Interpret the results of your model.

In [None]:
#Question 7
#No variable has a strong correlation with richness.
#Some variables, like grainsize, are negatively correlated with richness, while others are positively correlated.
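
To make the pieces of `lm` output concrete, here is a small sketch on simulated data. The data frame `toy`, its columns `x` and `y`, and the true slope of -0.5 are assumptions chosen for illustration; they are not taken from the survey.

```r
# Toy data (simulated; not the RIKZ survey)
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 10 - 0.5 * toy$x + rnorm(20)

toy_fit <- lm(y ~ x, data = toy)
toy_summary <- summary(toy_fit)

# Coefficient table: estimate, standard error, t value, and p-value per term
toy_summary$coefficients

# Proportion of the variance in y explained by the model
toy_summary$r.squared

# The sign of the slope gives the direction of the relationship
coef(toy_fit)["x"]
```

The same fields are available for each single-variable model from Question 6, for example `summary(lm_NAP)$coefficients`.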

Q8. Build a model that includes all the parameters and assess how well it fits the data.

In [None]:
# Question 8 
# Note: NAP, modelled on its own in Question 6, is not among the predictors below
lm_all <- lm(richness ~ angle1+angle2+exposure+salinity+temperature+penetrability+
             grainsize+humus+chalk+sorting1+Beach, survey_data_richness)
summary(lm_all)

In [None]:
#The only variable that produced a p-value near the significance level (alpha) was salinity. All the rest were much higher. 
#For the multiple linear regression model, the overall p-value (F-test) was 0.338, which is not significant.
#This output tells us that richness is not strongly correlated with any of these variables. 
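
The overall p-value printed at the bottom of `summary()` is not stored directly in the summary object, but it can be recomputed from the stored F statistic. The sketch below does this on simulated data; `toy`, `x1`, `x2`, and the effect sizes are illustrative assumptions.

```r
# Toy data (simulated): y depends on x1 only
set.seed(2)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$y <- 2 + 1.5 * toy$x1 + rnorm(30)

toy_full <- lm(y ~ x1 + x2, data = toy)
f <- summary(toy_full)$fstatistic  # F value, numerator df, denominator df

# Overall p-value: tests whether all slopes are zero at once
overall_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
unname(overall_p)

# Per-term p-values sit in the last column of the coefficient table
summary(toy_full)$coefficients[, "Pr(>|t|)"]
```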

Q9. Use an appropriate method that only selects a subset of the data. Compare the AIC with the previous method. What do you conclude? Justify your answer.

In [None]:
#Question 9
AIC(lm_all)

In [None]:
#The AIC of the full model is above 264. 
#AIC has no absolute scale, so this value alone does not show whether the fit is good; it is only meaningful when compared with the AIC of other models fitted to the same data. 
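
One possible reading of Question 9 is variable selection. As a hedged sketch, backward stepwise selection with `step` from the stats package is shown below on simulated data; the choice of `step`, the data frame `toy`, and its columns are assumptions, since the assignment does not name a method.

```r
# Toy data (simulated): only x1 truly affects y
set.seed(3)
toy <- data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
toy$y <- 1 + 2 * toy$x1 + rnorm(40)

toy_full <- lm(y ~ x1 + x2 + x3, data = toy)

# Backward selection: drop terms for as long as doing so lowers the AIC
toy_step <- step(toy_full, direction = "backward", trace = 0)

formula(toy_step)  # terms kept by the selection
AIC(toy_full)
AIC(toy_step)      # lower AIC means a better trade-off between fit and complexity
```

Applied to the notebook, the equivalent call would be `step(lm_all, direction = "backward")`, followed by comparing `AIC()` of the selected model with `AIC(lm_all)`.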

Q10. Can you find the model that provides the best fit? You can use interaction terms in the model. Justify why this model is the `best`.
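
As a sketch of how an interaction term is written and compared against an additive model, the example below uses simulated data. The data frame `toy`, the predictors `x1` and `x2`, and the built-in interaction effect are illustrative assumptions, not survey results.

```r
# Toy data (simulated) with a genuine x1-by-x2 interaction
set.seed(4)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y <- 1 + toy$x1 + toy$x2 + 2 * toy$x1 * toy$x2 + rnorm(50)

toy_additive    <- lm(y ~ x1 + x2, data = toy)
toy_interaction <- lm(y ~ x1 * x2, data = toy)  # expands to x1 + x2 + x1:x2

names(coef(toy_interaction))

# Two ways to compare the nested models: AIC and an F-test
AIC(toy_additive, toy_interaction)
anova(toy_additive, toy_interaction)
```

A lower AIC together with a significant F-test for the added term is one way to justify preferring the interaction model.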




In [None]:
# Write your code here

In [None]:
# Write your Interpretation here