Regressions

Regression Problems using Python and Machine Learning

Extracting the dataset.
Checking for NaN values: the dataset contains 0 NaNs.
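
A minimal sketch of such a NaN check, assuming the data is held in a pandas DataFrame. The toy frame and its column names (`age`, `sex`, `charges`) are made up for illustration and only stand in for the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "age": [19, 33, 45],
    "sex": ["female", "male", "male"],
    "charges": [1600.0, 4400.0, 21000.0],
})

# Count missing values per column, then add them up.
nan_per_column = df.isna().sum()
total_nans = int(nan_per_column.sum())

print(nan_per_column)
print("Total NaNs:", total_nans)
```

A total of 0 means no imputation or row dropping is needed before moving on.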
Plotting the data to find outliers. Here we use box plots.
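
One way to draw such a box plot is sketched below with matplotlib. The `charges` values are invented, and the `Agg` backend is only chosen so the snippet runs without a display; the notebook may well use a different plotting call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values: six ordinary charges and one extreme one.
df = pd.DataFrame(
    {"charges": [1200.0, 1500.0, 1800.0, 2100.0, 2400.0, 2700.0, 30000.0]}
)

fig, ax = plt.subplots()
box = ax.boxplot(df["charges"].to_numpy())
ax.set_ylabel("charges")

# Points beyond the whiskers (1.5 * IQR by default) are drawn as "fliers".
fliers = box["fliers"][0].get_ydata()
print("Points drawn as outliers:", fliers.tolist())

# Call fig.savefig("boxplot.png") or plt.show() to actually view the plot.
plt.close(fig)
```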
Using the map function to convert categorical features such as smokers and sex into numbers; for region we use LabelEncoder, scikit-learn's utility for encoding categorical labels.
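
The mapping step might look roughly like this. The column names (`sex`, `smoker`), the category strings and the chosen integer codes are all assumptions, since the README does not list them.

```python
import pandas as pd

# Hypothetical categorical columns; the real names and values may differ.
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
})

# Series.map replaces each category with the number given in the dict.
df["sex"] = df["sex"].map({"female": 0, "male": 1})
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})

print(df)
```

Any value missing from the mapping dict becomes NaN, so it is worth rechecking for NaNs after this step.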
The charges column, which is our dependent variable, has many outliers, which can cause problems while training our model.
We calculate Z-scores (that is, we standardise the charges column) to decide up to what limit we keep the data, in order to deal with the outliers.
Here we have used a cutoff of 0.85 standard deviations from the mean to deal with the outliers.
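
A rough sketch of the Z-score filter follows. The README does not say whether the 0.85 cutoff is one-sided or two-sided; this example assumes two-sided (`|z| <= 0.85`) and uses invented `charges` values.

```python
import pandas as pd

# Hypothetical charges with one extreme value.
charges = pd.Series([1000.0, 2000.0, 3000.0, 4000.0, 50000.0], name="charges")

# Z-score: distance from the mean measured in standard deviations.
z_scores = (charges - charges.mean()) / charges.std(ddof=0)

# Assumption: keep rows within 0.85 standard deviations on either side.
CUTOFF = 0.85
filtered = charges[z_scores.abs() <= CUTOFF]

print(z_scores.round(3).tolist())
print(filtered.tolist())
```

A cutoff this tight removes a lot of data: if the column were normally distributed, only about 60% of rows would fall within 0.85 standard deviations of the mean.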
Applying the Label Encoder to the region column to convert the regions into numbers; you can also use pandas dummies (`pd.get_dummies`) instead.
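
Both encoding options can be sketched side by side. The region names below are placeholders, and `region_code` is a column name introduced only for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical region values.
df = pd.DataFrame(
    {"region": ["southwest", "southeast", "northwest", "northeast", "southeast"]}
)

# Option 1: LabelEncoder assigns integers following the sorted class names.
encoder = LabelEncoder()
df["region_code"] = encoder.fit_transform(df["region"])
print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))

# Option 2: one-hot encoding, one indicator column per region.
dummies = pd.get_dummies(df["region"], prefix="region")
print(dummies.columns.tolist())
```

Integer codes imply an ordering between regions that tree-based models usually tolerate; one-hot columns avoid that implied order at the cost of extra features.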
Splitting the data into train and test set.
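
The split could be done with scikit-learn's `train_test_split`, as in the sketch below. The synthetic features, the 80/20 ratio and the `random_state` are assumptions; the README does not state the split that was used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix X and target y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Assumption: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```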
Performing cross-validation with different regression models.
It was found that the Gradient Boosting Regression algorithm performed better on the data than the other models applied.
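
A comparison of this kind can be sketched with `cross_val_score`. The README does not list which other models were tried, so linear regression and a random forest are assumptions here, as are the synthetic data and the 5-fold setting. The ranking on this toy data says nothing about the ranking on the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data with a non-linear term.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = X[:, 0] ** 2 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hypothetical set of candidate models.
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    # scikit-learn reports negated MSE so that higher is always better.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    results[name] = -scores.mean()
    print(f"{name}: mean CV MSE = {results[name]:.4f}")
```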
Training the model on the data and measuring its error using Mean Squared Error.
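
The final step might look like the sketch below, which fits a `GradientBoostingRegressor` with default hyperparameters on synthetic data and scores it on a held-out set. The data, split ratio and hyperparameters are assumptions rather than the notebook's actual settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and target.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = X[:, 0] ** 2 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training split only, then predict the unseen test rows.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# MSE is an error measure: lower is better, 0 is a perfect fit.
mse = mean_squared_error(y_test, predictions)
print(f"Test MSE: {mse:.4f}")
```

Comparing the MSE against the variance of `y_test` gives a quick sanity check, since a model that always predicts the mean scores roughly that variance.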
