govindplal/Data_Analysis_Project

Data_Analysis_Project

This is a data analysis project completed as part of a data science course. It analyzes a dataset of house sales in King County, USA, which includes Seattle. The dataset covers homes sold between May 2014 and May 2015.

Download the entire dataset here: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/FinalModule_Coursera/data/kc_house_data_NaN.csv
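
As a minimal sketch of how the file might be read with pandas: the helper name `load_house_data` and the column names in the demo (`price`, `bedrooms`, `waterfront`) are assumptions made for illustration and are not stated in this README. The demo rows are made-up values so the example runs offline; they are not taken from the dataset.

```python
import io

import pandas as pd

# URL copied from the README above.
DATA_URL = (
    "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/"
    "IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/"
    "FinalModule_Coursera/data/kc_house_data_NaN.csv"
)


def load_house_data(source=DATA_URL):
    """Read the CSV from a URL, a local path, or a file-like object."""
    # Hypothetical helper: pandas.read_csv accepts all three kinds of source.
    return pd.read_csv(source)


# Offline demo with a tiny in-memory CSV (made-up values, assumed column names).
sample_csv = io.StringIO(
    "price,bedrooms,waterfront\n"
    "100000.0,3,0\n"
    "250000.0,,1\n"
)
sample = load_house_data(sample_csv)
print(sample.dtypes)
print(sample.isna().sum())  # empty fields are parsed as NaN
```

Calling `load_house_data()` with no argument would fetch the real file and therefore needs network access.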

The project was done in a Jupyter Notebook using Python.
The libraries used are:

  • pandas
  • matplotlib
  • seaborn
  • scikit-learn

The aims of the project were:

  • Obtaining a statistical summary of the dataframe
  • Replacing the missing values in the dataset with the mean values
  • Comparing the price outliers for houses with and without a waterfront view
  • Finding the correlation of certain features with the price of the house
  • Creating linear regression models to predict price using different features and finding the coefficient of determination
  • Splitting the dataset into training samples and test samples
  • Creating a ridge regression model fitted on the training data and finding its coefficient of determination on the test data, thereby evaluating the model and refining it if required
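
The steps above could be sketched roughly as follows. This is not the notebook's code: the helper names, the feature list, the 15% test split, and the ridge `alpha` are all assumptions chosen for illustration, and the data frame is a small synthetic stand-in so the example runs without downloading anything. Plotting (for example a seaborn box plot of price by waterfront view) is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split


def fill_missing_with_mean(df, columns):
    """Return a copy of df with NaNs in the given columns replaced by the column mean."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mean())
    return out


def correlation_with_price(df, target="price"):
    """Pearson correlation of each numeric column with the target, highest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values(ascending=False)


def fit_and_score(model, df, features, target="price", test_size=0.15, seed=1):
    """Fit on a training split; return R^2 on the training and test splits."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=test_size, random_state=seed
    )
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


def make_demo_frame(n=200, seed=0):
    """Synthetic stand-in for the house data (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    sqft = rng.uniform(500, 4000, n)
    bedrooms = rng.integers(1, 6, n).astype(float)
    waterfront = rng.integers(0, 2, n)
    price = (150 * sqft + 10_000 * bedrooms + 200_000 * waterfront
             + rng.normal(0, 20_000, n))
    bedrooms[::25] = np.nan  # inject some missing values after computing price
    return pd.DataFrame({"price": price, "sqft_living": sqft,
                         "bedrooms": bedrooms, "waterfront": waterfront})


df = make_demo_frame()
print(df.describe())                       # statistical summary

clean = fill_missing_with_mean(df, ["bedrooms"])
print(correlation_with_price(clean))       # correlation of features with price

features = ["sqft_living", "bedrooms", "waterfront"]  # assumed feature list
lin_train_r2, lin_test_r2 = fit_and_score(LinearRegression(), clean, features)
ridge_train_r2, ridge_test_r2 = fit_and_score(Ridge(alpha=0.1), clean, features)
print("linear R^2 (train, test):", lin_train_r2, lin_test_r2)
print("ridge  R^2 (train, test):", ridge_train_r2, ridge_test_r2)
```

Scoring on the held-out test split rather than on the training split gives a fairer estimate of how the model generalizes, which is why `fit_and_score` reports both values.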
