# %% [markdown]
# **Simple Linear Regression Program for Absolute Beginners**
#
# **Step-1: Data pre-processing.**
# * Importing the required libraries for loading the dataset, plotting the graphs, performing mathematical operations, and creating the linear regression model.

# %% [code]
import numpy as nm
import matplotlib.pyplot as mpl
import pandas as pd

# %% [markdown]
# * Next, we will load the dataset into our program.

# %% [code]
data_set=pd.read_csv('../input/salary/Salary.csv')

# %% [markdown]
# * The **.shape** attribute gives the size of the dataset as (rows, columns), and the **.info()** method prints a concise summary of it: column names, non-null counts, and data types.

# %% [code]
print("Data size: ",data_set.shape)
data_set.info()
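
# %% [markdown]
# * As a minimal sketch of what these two calls report, the cell below builds a small stand-in table. The `demo` DataFrame, its column names, and its values are assumptions made for illustration, not the contents of Salary.csv.

```python
# %% [code]
import pandas as pd

# Hypothetical stand-in for the salary data (names and values are assumed)
demo = pd.DataFrame({"YearsExperience": [1.1, 2.0, 3.2, 4.5],
                     "Salary": [39000, 43500, 54400, 61100]})

n_rows, n_cols = demo.shape              # .shape is a (rows, columns) tuple
print("Rows:", n_rows, "Columns:", n_cols)
print("Number of dimensions:", demo.ndim)  # a DataFrame is always 2-D
demo.info()                              # column names, non-null counts, dtypes
```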

# %% [markdown]
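# * Extracting the independent and dependent variables from the dataset.
#   The independent variable (**x**) is "years of experience" and the dependent variable (**y**) is "salary" in this case. **x** is kept as a 2-D array because scikit-learn expects a feature matrix, while **y** is a 1-D array.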

# %% [code]
x=data_set.iloc[:,:-1].values  # every column except the last, as a 2-D array
y=data_set.iloc[:,1].values    # the second column (index 1), as a 1-D array
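
# %% [markdown]
# * The slicing above can be checked on a tiny table. This is only a sketch: the `demo_df` DataFrame and its two columns are assumed for illustration and stand in for the real dataset.

```python
# %% [code]
import pandas as pd

# Hypothetical two-column table (names and values are assumed)
demo_df = pd.DataFrame({"YearsExperience": [1.1, 2.0, 3.2, 4.5],
                        "Salary": [39000, 43500, 54400, 61100]})

x_demo = demo_df.iloc[:, :-1].values  # slice of columns -> 2-D, shape (4, 1)
y_demo = demo_df.iloc[:, 1].values    # single column    -> 1-D, shape (4,)
print("x shape:", x_demo.shape)
print("y shape:", y_demo.shape)
```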

# %% [markdown]
# * Next, we'll split the dataset into a training set and a test set using the **train_test_split()** function from scikit-learn's model_selection module. We pass four arguments: the first two are the data arrays (**x** and **y**), **test_size** sets the fraction of the data held out for the test set (0.3 means 30%), and **random_state** seeds the random shuffling so that the split is the same on every run.

# %% [code]
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)
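
# %% [markdown]
# * To see the effect of **test_size** and **random_state**, here is a small sketch on made-up data. The ten synthetic samples and the `split_*` names are assumptions for illustration only.

```python
# %% [code]
import numpy as np
from sklearn.model_selection import train_test_split

# Ten synthetic samples (assumed data, not the salary dataset)
split_x = np.arange(10, dtype=float).reshape(-1, 1)
split_y = 2.0 * split_x.ravel() + 1.0

# Two splits with the same seed
a_train, a_test, _, _ = train_test_split(split_x, split_y, test_size=0.3, random_state=0)
b_train, b_test, _, _ = train_test_split(split_x, split_y, test_size=0.3, random_state=0)

print("Train size:", len(a_train), "Test size:", len(a_test))
print("Same split both times:", np.array_equal(a_test, b_test))
```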

# %% [markdown]
# * Visualizing the Training dataset.

# %% [code]
mpl.scatter(x_train,y_train,color="brown")
mpl.title("Salary Vs Experience(Training dataset)")
mpl.xlabel("Years of Experience")
mpl.ylabel("Salary")
mpl.show()

# %% [markdown]
# **Step-2: Fitting the Simple Linear Regression model to the training set.**
# * To fit the model, we import the LinearRegression class from scikit-learn's linear_model module and create an object of the class named *reg*. Calling its **.fit()** method on the training set estimates the slope and intercept of the regression line.

# %% [code]
from sklearn.linear_model import LinearRegression
reg=LinearRegression()
reg.fit(x_train,y_train)
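
# %% [markdown]
# * After fitting, the learned line can be read from the model's `coef_` and `intercept_` attributes. The sketch below uses noise-free synthetic data, where the true slope and intercept are assumed values chosen for illustration, so the fit should recover them.

```python
# %% [code]
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed relationship for illustration: salary = 9000 * years + 26000
fit_years = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
fit_salary = 9000.0 * fit_years.ravel() + 26000.0

fit_reg = LinearRegression().fit(fit_years, fit_salary)
print("Slope (salary per year):", fit_reg.coef_[0])
print("Intercept (salary at 0 years):", fit_reg.intercept_)
```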

# %% [markdown]
# **Step-3: Predicting the test set and training set results.**

# %% [code]
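y_pred=reg.predict(x_test)   # predicted salaries for the test set
x_pred=reg.predict(x_train)  # predicted salaries for the training set (despite the x_ prefix)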

# %% [markdown]
# * Table showing the difference (absolute error) between the actual and predicted values of the test set.

# %% [code]
error = pd.DataFrame({"Actual": y_test,
                      "Predictions": y_pred,
                      "Difference": nm.abs(y_test-y_pred)})
print(error)
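
# %% [markdown]
# * A per-row table is often summarized with a single number. As a sketch, the cell below computes the mean absolute error (MAE) and root mean squared error (RMSE) with scikit-learn's metrics module. The `err_actual` and `err_predicted` arrays are made-up values, not results from this notebook.

```python
# %% [code]
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up actual and predicted salaries (assumed for illustration)
err_actual = np.array([40000.0, 50000.0, 60000.0])
err_predicted = np.array([42000.0, 49000.0, 63000.0])

err_mae = mean_absolute_error(err_actual, err_predicted)            # mean of |actual - predicted|
err_rmse = np.sqrt(mean_squared_error(err_actual, err_predicted))   # square root of the mean squared error
print("MAE:", err_mae)
print("RMSE:", err_rmse)
```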

# %% [markdown]
# **Step-4: Visualizing the training set and test set results.**

# %% [code]
mpl.scatter(x_train,y_train,color="purple")
mpl.plot(x_train,x_pred,color="black")
mpl.title("Salary Vs Experience(Training dataset)")
mpl.xlabel("Years of Experience")
mpl.ylabel("Salary")
mpl.show()

# %% [code]
mpl.scatter(x_test,y_test,color="red")
mpl.plot(x_train,x_pred,color="black")  # same fitted line as in the training plot; the model is not refit on the test set
mpl.title("Salary Vs Experience(Test dataset)")
mpl.xlabel("Years of Experience")
mpl.ylabel("Salary")
mpl.show()

# %% [markdown]
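# * Using the *.predict()* method to estimate the salary for a new value, here 9.2 years of experience. The input must be 2-D (one row per sample, one column per feature), hence the double brackets.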

# %% [code]
pred_sal=reg.predict([[9.2]])
print("Predicted salary: ",pred_sal)
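
# %% [markdown]
# * A prediction from a simple linear regression is just intercept + slope × input. The sketch below checks this on synthetic, noise-free data. The `one_*` names and the assumed relationship are for illustration only.

```python
# %% [code]
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed relationship for illustration: salary = 9000 * years + 26000
one_years = np.array([[1.0], [3.0], [5.0], [7.0]])
one_salary = 9000.0 * one_years.ravel() + 26000.0
one_model = LinearRegression().fit(one_years, one_salary)

one_input = np.array([[9.2]])             # shape (1, 1): one sample, one feature
one_pred = one_model.predict(one_input)   # returns an array of shape (1,)
one_manual = one_model.intercept_ + one_model.coef_[0] * 9.2

print("Predicted salary:", one_pred[0])
print("Manual calculation:", one_manual)
```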

# %% [markdown]
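# * *R² score (coefficient of determination) of our model on the training dataset and test dataset. For a regressor, .score() returns R², not a classification accuracy.*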

# %% [code]
print("Training R^2 (%): ",(reg.score(x_train,y_train))*100)
print("Test R^2 (%): ",(reg.score(x_test,y_test))*100)
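
# %% [markdown]
# * To make the score less of a black box, this sketch computes R² by hand as 1 - SS_res / SS_tot and compares it with **.score()**. The `r2_*` data are made-up values assumed for illustration.

```python
# %% [code]
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, slightly noisy data (assumed for illustration)
r2_years = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
r2_salary = np.array([35000.0, 46000.0, 52000.0, 63000.0, 70000.0])
r2_model = LinearRegression().fit(r2_years, r2_salary)

r2_fitted = r2_model.predict(r2_years)
ss_res = np.sum((r2_salary - r2_fitted) ** 2)         # unexplained variation
ss_tot = np.sum((r2_salary - r2_salary.mean()) ** 2)  # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot

print("Manual R^2:", r2_manual)
print(".score() R^2:", r2_model.score(r2_years, r2_salary))
```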