# Regression analysis

**Regression analysis** is a statistical method that models the relationship between one or more independent variables and a single dependent variable.   

**Correlation** is the degree to which two variables change together.   
* The correlation coefficient of any two variables lies between -1 and 1.
* 0 means there is no linear correlation at all.
* Positively correlated: an increase in one variable (advertising) is associated with an increase in the other (sales).
* Negatively correlated: an increase in one variable is associated with a decrease in the other.
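
As a small illustration of the points above, the following sketch computes the correlation coefficient with base R's `cor()`. The advertising and sales figures are made up for the example and do not come from any dataset used in this notebook.

```r
# Hypothetical data: advertising spend and sales (illustrative values only)
advertising <- c(10, 20, 30, 40, 50)
sales       <- c(25, 41, 58, 79, 95)

# Pearson correlation coefficient, always between -1 and 1
r_pos <- cor(advertising, sales)

# Reversing the sales order makes sales fall as advertising rises,
# which gives a negative correlation
r_neg <- cor(advertising, rev(sales))

print(r_pos)
print(r_neg)
```

A value close to 1 or -1 indicates a strong linear association; it does not by itself show that one variable causes the other.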

**Linear Functions and Models**  
A **linear equation** describes a relationship in which `y` increases or decreases by the same amount for every unit step in `x`.  
The **slope-intercept form** of a linear equation is `y = mx + b`, where `m` is the slope and `b` is the y-intercept.  

<sub>https://www.zweigmedia.com/RealWorld/tutorialsf0/framesLA.html</sub>  
<sub>https://study.com/academy/lesson/what-is-a-linear-equation.html</sub>  
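
The slope-intercept form can be tried out directly in R. This is a minimal sketch; the values `m = 2` and `b = 5` are arbitrary choices for illustration.

```r
# Slope-intercept form y = m*x + b with hypothetical values m = 2, b = 5
m <- 2
b <- 5
linear <- function(x) m * x + b

x <- 0:4
y <- linear(x)
print(y)        # 5 7 9 11 13

# Each unit step in x changes y by exactly m
print(diff(y))  # 2 2 2 2
```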

**Sum of squares error (SSE)**  
 
<sub>https://www.zweigmedia.com/tuts/tutRegression.html</sub>
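
The SSE of a candidate line is the sum of the squared differences between the observed `y` values and the values the line predicts. The sketch below compares two candidate lines on a few made-up observations; the helper `sse()` and the data are hypothetical.

```r
# Hypothetical observations (illustrative values only)
x <- c(1, 2, 3, 4)
y <- c(1.5, 1.6, 2.1, 3.0)

# SSE of a candidate line y = m*x + b: sum of squared residuals
sse <- function(m, b) {
  predicted <- m * x + b
  sum((y - predicted)^2)
}

sse_a <- sse(0.5, 1)      # candidate line y = 0.5x + 1
sse_b <- sse(0.25, 1.25)  # candidate line y = 0.25x + 1.25
print(c(sse_a, sse_b))
```

The line with the smaller SSE fits these points better, which is the criterion the regression line is built on.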

**The regression line (line of best fit)**  
<sub>https://www.zweigmedia.com/tuts/tutRegressionb.html</sub>
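
The regression line is the line `y = mx + b` with the smallest SSE. As a sketch on made-up data, the standard closed-form least-squares estimates can be computed by hand and compared with what `lm()` returns.

```r
# Hypothetical observations (illustrative values only)
x <- c(1, 2, 3, 4)
y <- c(1.5, 1.6, 2.1, 3.0)

# Closed-form least-squares estimates for y = m*x + b
m <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b <- mean(y) - m * mean(x)

# lm() minimises the same SSE, so its coefficients should agree
fit <- lm(y ~ x)
print(c(intercept = b, slope = m))
print(coef(fit))
```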

**Coefficient of Determination (R-squared) & Adjusted R-squared**  

<sub>https://towardsdatascience.com/coefficient-of-determination-r-squared-explained-db32700d924e</sub>
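
R-squared is the share of the total variation in `y` that the model explains, and adjusted R-squared penalises it for the number of predictors. The following sketch, on made-up data, computes both by hand and checks them against `summary()`.

```r
# Hypothetical observations (illustrative values only)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)
fit <- lm(y ~ x)

sse <- sum(residuals(fit)^2)   # variation left unexplained by the model
sst <- sum((y - mean(y))^2)    # total variation around the mean
n <- length(y)                 # number of observations
p <- 1                         # number of predictors

r2 <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Both should match what summary() reports
print(c(r2, summary(fit)$r.squared))
print(c(adj_r2, summary(fit)$adj.r.squared))
```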

**Regression assumptions & Residual analysis**  

<sub>https://www.datacamp.com/community/tutorials/linear-regression-R#coefficients</sub>
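
Linear regression is usually checked against a few assumptions: a linear relationship, independent errors, constant error variance, and roughly normal residuals. The sketch below shows two common residual plots; it uses R's built-in `cars` dataset rather than the data files in this notebook.

```r
# Fit a simple model on the built-in cars dataset
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

# With an intercept, least-squares residuals sum to (numerically) zero
print(sum(res))

# Residuals vs fitted values: look for no clear pattern or funnel shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)
```

Calling `plot(fit)` produces a similar set of diagnostic plots in one step.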

## Simple Linear Regression  
A single independent variable is used to predict the value of a dependent variable.  

In [None]:
# Importing the dataset
dataset = read.csv('data/income.csv')

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')  # run once if the package is not installed
library(caTools)
set.seed(123)
split = sample.split(dataset$income, SplitRatio = 2/3)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)

# Fitting Simple Linear Regression to the Training set
regressor = lm(formula = income ~ Exp,
               data = training_set)

# Predicting the Test set results
y_pred = predict(regressor, newdata = test_set)

# Visualising the Training set results
library(ggplot2)
ggplot() +
  geom_point(aes(x = training_set$Exp, y = training_set$income),
             colour = 'red') +
  geom_line(aes(x = training_set$Exp, y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('income vs Experience (Training set)') +
  xlab('Years of experience') +
  ylab('income')

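# Visualising the Test set results
# The blue line is the model fitted on the training set, so it is still
# drawn from the training-set predictions; only the red points change.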
ggplot() +
  geom_point(aes(x = test_set$Exp, y = test_set$income),
             colour = 'red') +
  geom_line(aes(x = training_set$Exp, y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('income vs Experience (Test set)') +
  xlab('Years of experience') +
  ylab('income')
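
The plots give a visual impression of the fit. To put a number on it, the test-set predictions can be compared with the actual values, for example with the root mean squared error (RMSE). The sketch below is self-contained: it simulates a stand-in for `data/income.csv` with the same column names (`Exp`, `income`) and splits it with base R instead of `caTools`, so all values are hypothetical.

```r
set.seed(123)

# Simulated stand-in for data/income.csv (hypothetical values)
Exp <- runif(30, min = 1, max = 10)
dataset <- data.frame(Exp = Exp,
                      income = 25000 + 9000 * Exp + rnorm(30, sd = 5000))

# Simple 2/3 - 1/3 split using base R instead of caTools
train_idx <- sample(nrow(dataset), size = 20)
training_set <- dataset[train_idx, ]
test_set <- dataset[-train_idx, ]

regressor <- lm(income ~ Exp, data = training_set)
y_pred <- predict(regressor, newdata = test_set)

# Root mean squared error on unseen data, in the units of income
rmse <- sqrt(mean((test_set$income - y_pred)^2))
print(rmse)

# Estimated intercept and slope
print(coef(regressor))
```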

## Multiple Linear Regression
Two or more independent variables are used to predict the value of a dependent variable.  

In [None]:
# Importing the dataset
dataset = read.csv('data/companies.csv')

# Encoding categorical data
dataset$State = factor(dataset$State,
                       levels = c('New York', 'California', 'Florida'),
                       labels = c(1, 2, 3))

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Profit, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)

# Fitting Multiple Linear Regression to the Training set
regressor = lm(formula = Profit ~ .,
               data = training_set)

# Predicting the Test set results
y_pred = predict(regressor, newdata = test_set)
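
With `Profit ~ .`, every other column becomes a predictor, and `lm()` expands a factor such as `State` into dummy variables on its own. The sketch below illustrates this on simulated data; the `Spend` column and all values are hypothetical stand-ins, since the columns of `data/companies.csv` other than `State` and `Profit` are not shown here.

```r
set.seed(123)
n <- 60

# Simulated stand-in for data/companies.csv; the Spend column is hypothetical
dataset <- data.frame(
  Spend = runif(n, 0, 100),
  State = factor(sample(c("New York", "California", "Florida"), n, replace = TRUE),
                 levels = c("New York", "California", "Florida"))
)
dataset$Profit <- 50 + 0.8 * dataset$Spend + rnorm(n, sd = 5)

regressor <- lm(Profit ~ ., data = dataset)

# A factor with 3 levels becomes 2 dummy columns; the first level is the baseline
print(names(coef(regressor)))

# Coefficient table with estimates, standard errors and p-values
print(summary(regressor)$coefficients)
```

Predictors whose p-values are large contribute little evidence of an effect and are candidates for removal when simplifying the model.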

# Artificial Neural Networks

In [None]:
# install.packages('neuralnet')  # run once if the package is not installed
library("neuralnet")
 
#Going to create a neural network that approximates the square root function
#Type ?neuralnet for more information on the neuralnet library
 
#Generate 50 random numbers uniformly distributed between 0 and 100
#And store them as a dataframe
traininginput <-  as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)
 
#Column bind the data into one variable
trainingdata <- cbind(traininginput,trainingoutput)
colnames(trainingdata) <- c("Input","Output")
 
#Train the neural network
#hidden=10 means a single hidden layer with 10 neurons
#Threshold is a numeric value specifying the threshold for the partial
#derivatives of the error function as stopping criteria.
net.sqrt <- neuralnet(Output~Input,trainingdata, hidden=10, threshold=0.01)
print(net.sqrt)
 
#Plot the neural network
plot(net.sqrt)
 
#Test the neural network on some test data
testdata <- as.data.frame((1:10)^2) #Generate some squared numbers
colnames(testdata) <- "Input" #Match the covariate name used in training
net.results <- compute(net.sqrt, testdata) #Run them through the neural network
 
#Let's see what properties net.results has
ls(net.results)
 
#Let's see the results
print(net.results$net.result)
 
#Let's display a better version of the results
cleanoutput <- cbind(testdata,sqrt(testdata),
                         as.data.frame(net.results$net.result))
colnames(cleanoutput) <- c("Input","Expected Output","Neural Net Output")
print(cleanoutput)

<span style="color:red; font-family:Comic Sans MS">**R Code pulled from:** </span>  
<a href="http://gekkoquant.com/2012/05/26/neural-networks-with-r-simple-example/" target="_blank">http://gekkoquant.com/2012/05/26/neural-networks-with-r-simple-example/</a>  
<span style="color:red; font-family:Comic Sans MS">**Sources & References** </span>     
<a href="https://gl4l.greatlearning.in/building-artificial-neural-networks-using-r/" target="_blank">https://gl4l.greatlearning.in/building-artificial-neural-networks-using-r/</a>  
<span style="color:red; font-family:Comic Sans MS">**Further Resources:** </span>     
<a href="http://www.michaeljgrogan.com/neural-network-modelling-neuralnet-r/" target="_blank">http://www.michaeljgrogan.com/neural-network-modelling-neuralnet-r/</a>  

