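This is a simple linear regression program I built in Python with scikit-learn. Linear regression is a supervised learning algorithm: it learns to predict a numeric target from labeled examples.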

**Step 1: Import the necessary libraries**

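This notebook uses NumPy, pandas, and scikit-learn (sklearn):
- NumPy: numerical computing
- pandas: data manipulation and analysis
- scikit-learn: machine learning, here providing the `LinearRegression` model, the `train_test_split()` helper, and the `mean_squared_error()` and `r2_score()` metrics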



```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
```


**Step 2: Load the dataset**

- Load the Boston Housing dataset with the `pd.read_csv()` function. The dataset describes houses in Boston through features such as the per-capita crime rate (`crim`) and the average number of rooms per dwelling (`rm`); the target column `medv` is the median value of owner-occupied homes, in thousands of dollars.
- Print the first 5 rows with the `head()` method to get an idea of what the data looks like.

```python
# Load the Boston Housing Dataset
boston_dataset = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Print the first 5 rows of the dataset
print(boston_dataset.head())
```


```
      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3   
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8   
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7   
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   

        b  lstat  medv  
0  396.90   4.98  24.0  
1  396.90   9.14  21.6  
2  392.83   4.03  34.7  
3  394.63   2.94  33.4  
4  396.90   5.33  36.2  
```
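Before modeling, it can help to sanity-check the loaded DataFrame: its shape, the inferred column types, and whether any values are missing. The sketch below runs offline by feeding the first three rows printed above to `pd.read_csv()` through `io.StringIO` (an assumption made only so it needs no network access); on the real data you would inspect `boston_dataset` the same way.

```python
import io

import pandas as pd

# The first three rows of the dataset, copied from the head() output above,
# so this sketch runs without a network connection.
csv_text = """crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0.00632,18.0,2.31,0,0.538,6.575,65.2,4.0900,1,296,15.3,396.90,4.98,24.0
0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.90,9.14,21.6
0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
"""

# pd.read_csv accepts any file-like object, not only a path or URL.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                # (rows, columns): (3, 14)
print(sample.dtypes.head(4))       # inferred type of the first few columns
print(sample.isna().sum().sum())   # total number of missing values
```

Because `chas` holds only 0 and 1, pandas infers an integer type for it, while the other measurements are parsed as floats.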


**Step 3: Split the dataset into training and testing sets**

- To evaluate the Linear Regression model on data it has not seen during training, split the dataset into training and testing sets.
- Define the input features `X` as all columns except the target variable `medv`, and the target variable `y` as the `medv` column.
- Use the `train_test_split()` function from sklearn to randomly split the rows into 80% training data and 20% testing data (`test_size=0.2`). Setting `random_state=0` makes the split reproducible.



```python
# Split the dataset into training and testing sets
X = boston_dataset.drop('medv', axis=1)  # input features: every column except the target
y = boston_dataset['medv']               # target: median home value in $1000s
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
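To see what `train_test_split()` produces, the sketch below splits a hypothetical random stand-in with 506 rows, the size of the Boston Housing dataset. It uses its own variable names (`X_demo`, `Xd_train`, and so on) so it does not overwrite the notebook's data. It checks the split sizes, that the same `random_state` gives the same split, and that features and targets stay aligned.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with 506 rows; only the row count matters here.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(506, 3)), columns=["f1", "f2", "f3"])
y_demo = pd.Series(rng.normal(size=506), name="target")

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# sklearn rounds the test size up: ceil(0.2 * 506) = 102 test rows, 404 train rows.
print(len(Xd_train), len(Xd_test))

# Re-running with the same random_state selects exactly the same test rows.
_, Xd_test_again, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
print(Xd_test.index.equals(Xd_test_again.index))

# X and y are shuffled together, so their row labels still line up.
print(Xd_test.index.equals(yd_test.index))
```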


**Step 4: Train the Linear Regression model**

Create an instance of sklearn's `LinearRegression` class and fit it to the training data with the `fit()` method. Fitting estimates the model's coefficients from the training data; the fitted model can then make predictions on the testing data.

```python
# Train the Linear Regression model
linear_regression = LinearRegression()
linear_regression.fit(X_train, y_train)  # learns one coefficient per feature plus an intercept
```
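To make concrete what `fit()` learns, the sketch below fits `LinearRegression` to hypothetical synthetic data generated from a known rule, y = 3·x1 − 2·x2 + 5 plus small noise. It then checks that the learned `coef_` and `intercept_` reproduce `predict()` exactly, and that they agree with an ordinary least-squares solve via `np.linalg.lstsq`. The data and the true coefficients are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data from a known linear rule: y = 3*x1 - 2*x2 + 5 + small noise.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + 5 + rng.normal(scale=0.1, size=200)

demo_model = LinearRegression().fit(X_demo, y_demo)

# fit() estimates one weight per feature (coef_) and an intercept (intercept_);
# they should land close to the true values 3, -2, and 5.
print(demo_model.coef_, demo_model.intercept_)

# predict() is just the linear formula X @ coef_ + intercept_.
manual_pred = X_demo @ demo_model.coef_ + demo_model.intercept_
print(np.allclose(demo_model.predict(X_demo), manual_pred))

# The same estimates come from an ordinary least-squares solve
# with a column of ones appended to model the intercept.
X_with_ones = np.column_stack([X_demo, np.ones(len(X_demo))])
beta, *_ = np.linalg.lstsq(X_with_ones, y_demo, rcond=None)
print(np.allclose(beta, np.append(demo_model.coef_, demo_model.intercept_)))
```

On the Boston data, `linear_regression.coef_` likewise holds one weight per column of `X_train`, in the same column order.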


**Step 5: Evaluate the performance of the model**

- Evaluate the model by making predictions on the testing data with the `predict()` method, then compute the Mean Squared Error (MSE) and the R² score with sklearn's `mean_squared_error()` and `r2_score()` functions.
- The MSE is the average squared difference between the predicted and actual values, so lower is better. The R² score is the proportion of the variance in the target variable that the model explains: 1 is a perfect fit, and 0 means the model does no better than always predicting the mean of the test targets.
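Both metrics are simple enough to compute by hand, which makes the definitions above concrete: MSE is the mean of the squared residuals, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. In the sketch below, `y_true` reuses the five `medv` values printed in Step 2, while `y_hat` is a set of hypothetical predictions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Actual values: the five medv values from the head() output in Step 2.
y_true = np.array([24.0, 21.6, 34.7, 33.4, 36.2])
# Hypothetical predictions, made up for this example.
y_hat = np.array([26.0, 20.0, 31.0, 35.0, 34.0])

# MSE: the mean of the squared residuals.
residuals = y_true - y_hat
mse_manual = np.mean(residuals ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Each manual value should match sklearn's result.
print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```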

```python
# Evaluate the performance of the model
y_pred = linear_regression.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print('Mean Squared Error:', mse)
print('R2 Score:', r2)
```


```
Mean Squared Error: 33.44897999767639
R2 Score: 0.5892223849182525
```
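The MSE is in squared units of `medv`, which is hard to read directly. Taking its square root gives the root mean squared error (RMSE), which is back in thousands of dollars, and the R² score can be read as a percentage. This small worked computation uses only the two numbers printed above.

```python
import math

# Values printed by the evaluation cell above.
printed_mse = 33.44897999767639
printed_r2 = 0.5892223849182525

# RMSE is in the same units as medv (thousands of dollars).
rmse = math.sqrt(printed_mse)
print(f"RMSE: {rmse:.2f} (about ${rmse * 1000:,.0f})")   # RMSE: 5.78 (about $5,784)
print(f"Test-set variance explained: {printed_r2:.1%}")  # Test-set variance explained: 58.9%
```

So on this split, the model's typical prediction error is on the order of $5,800, and it explains roughly 59% of the variance in home values. Because RMSE squares the errors before averaging, a few large misses weigh more heavily than in a plain average of absolute errors.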
