
"Advanced Regression" models house prices to give management actionable insights. By analyzing a range of independent variables, the project predicts how prices vary, so the firm can adapt its strategy and concentrate its efforts where returns are highest.


poronita/Advanced_Regression


Advanced Regression Analysis

Introduction

Surprise Housing, a US-based real estate company, is entering the Australian market and using data analytics to identify profitable investment opportunities. A dataset of Australian property sales is used to build a regression model that predicts the actual value of prospective properties, supporting acquisition decisions and clarifying which variables influence house prices.

The analysis is guided by two key questions: which variables are significant in predicting house prices, and what the optimal lambda values are for ridge and lasso regression. The answers help Surprise Housing make informed investment decisions in the Australian real estate market.
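
As a rough illustration of how optimal lambda values might be chosen, the sketch below tunes ridge and lasso with scikit-learn's GridSearchCV. It is not the project's code: it uses synthetic data from make_regression in place of the housing dataset, and the log-spaced alpha grid (alpha is scikit-learn's name for lambda) is a hypothetical choice. Ranking the standardized coefficients by magnitude gives a simple view of which predictors are most influential.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption: the real dataset is not used here).
X, y = make_regression(n_samples=500, n_features=20, n_informative=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Hypothetical grid of candidate lambda values; scikit-learn calls this parameter `alpha`.
alphas = np.logspace(-3, 3, 13)


def tune_alpha(step_name, estimator):
    """Scale features, then pick alpha by 5-fold cross-validated MSE."""
    pipe = Pipeline([("scale", StandardScaler()), (step_name, estimator)])
    search = GridSearchCV(
        pipe,
        param_grid={f"{step_name}__alpha": alphas},
        scoring="neg_mean_squared_error",
        cv=5,
    )
    search.fit(X_train, y_train)
    return search


results = {}
for name, est in [("ridge", Ridge()), ("lasso", Lasso(max_iter=10_000))]:
    search = tune_alpha(name, est)
    coefs = search.best_estimator_[-1].coef_  # coefficients on standardized features
    results[name] = {
        "alpha": search.best_params_[f"{name}__alpha"],
        "test_r2": r2_score(y_test, search.predict(X_test)),
        # Indices of the five predictors with the largest absolute coefficients.
        "top5": np.argsort(np.abs(coefs))[::-1][:5].tolist(),
        "n_nonzero": int(np.count_nonzero(coefs)),
    }
    print(name, results[name])
```

Because lasso can set coefficients to exactly zero, its n_nonzero count also hints at how many predictors the model actually relies on.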

Business Goal

Develop a predictive model for house prices to provide insights into the relationship between prices and various factors. This model assists in strategic decision-making, optimizing investment strategy, and navigating the complexities of the real estate market.

Downloads

Data Definition

Details of variables are provided in the data description file.

Analysis

The analysis first examines how doubling the regularization strength (alpha, also referred to as lambda) in ridge and lasso regression changes the model and the relative importance of the predictor variables. It then addresses how to choose between ridge and lasso, which lambda values are optimal, how the model changes when the most important predictors are excluded, and how to make the model robust and generalizable.
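
A minimal sketch of the doubling experiment, assuming synthetic standardized predictors and hypothetical base alphas (10.0 for ridge, 1.0 for lasso) rather than the values found in the project. It records coefficient norms, the number of non-zero coefficients, and the top five predictors at each alpha, then drops the five strongest lasso predictors and refits to see which variables take their place.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized stand-in for the housing predictors (assumption).
X, y = make_regression(n_samples=500, n_features=20, n_informative=8, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)
names = np.array([f"x{i}" for i in range(X.shape[1])])  # hypothetical feature names


def top_k(coef, cols, k=5):
    """Names of the k predictors with the largest absolute coefficients."""
    return cols[np.argsort(np.abs(coef))[::-1][:k]].tolist()


# Hypothetical "optimal" alphas; in practice these would come from cross-validation.
base_alpha = {"ridge": 10.0, "lasso": 1.0}
models = {"ridge": Ridge, "lasso": Lasso}

summary = {}
for name, cls in models.items():
    for factor in (1, 2):  # original alpha, then doubled
        coef = cls(alpha=base_alpha[name] * factor).fit(X, y).coef_
        summary[(name, factor)] = {
            "l1_norm": float(np.abs(coef).sum()),
            "l2_norm": float(np.linalg.norm(coef)),
            "n_nonzero": int(np.count_nonzero(coef)),
            "top5": top_k(coef, names),
        }
        print(name, f"alpha x{factor}", summary[(name, factor)])

# Drop the five most important lasso predictors and refit to see which take over.
dropped = summary[("lasso", 1)]["top5"]
keep = ~np.isin(names, dropped)
coef_reduced = Lasso(alpha=base_alpha["lasso"]).fit(X[:, keep], y).coef_
new_top5 = top_k(coef_reduced, names[keep])
print("dropped:", dropped, "new top 5:", new_top5)
```

Doubling alpha shrinks the ridge coefficients toward zero (a smaller L2 norm) and can push more lasso coefficients to exactly zero, which is one reason lasso is often preferred when a compact set of predictors is wanted.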

Technologies Used

  • Programming Languages:

    • Python
  • Libraries and Frameworks:

    • Pandas
    • NumPy
    • Matplotlib
    • Seaborn
    • Scikit-learn
  • Data Handling:

    • Data loading using Pandas
    • Handling missing values with SimpleImputer
    • Scaling numerical features with StandardScaler
  • Data Visualization:

    • Matplotlib and Seaborn for creating various plots and visualizations
  • Machine Learning Models:

    • Ridge and Lasso regression models implemented using Scikit-learn
  • Model Evaluation:

    • Metrics such as Mean Squared Error (MSE), R-squared (R²), and Mean Absolute Error (MAE) for evaluating model performance
  • Data Preprocessing:

    • OneHotEncoder for handling categorical variables
    • GridSearchCV for hyperparameter tuning
  • Data Analysis:

    • Exploratory Data Analysis (EDA) techniques, including correlation matrix heatmap and box plots
  • Data Pipeline:

    • Utilization of Scikit-learn's Pipeline for streamlined and reproducible model building
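
The preprocessing, modeling and evaluation steps listed above can be chained in a single scikit-learn Pipeline. The sketch below is an assumption about how such a pipeline might look, not the notebook's actual code: the DataFrame, its columns (area_sqm, bedrooms, suburb), the price formula and alpha=1.0 are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400
# Hypothetical columns standing in for the real data dictionary.
df = pd.DataFrame({
    "area_sqm": rng.normal(150, 40, n),
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "suburb": rng.choice(["north", "south", "east", "west"], n),
})
suburb_effect = df["suburb"].map({"north": 50_000, "south": 0, "east": 20_000, "west": -10_000})
df["price"] = 2_000 * df["area_sqm"] + 15_000 * df["bedrooms"] + suburb_effect + rng.normal(0, 20_000, n)
# Blank out some values so the imputers have something to do.
df.loc[rng.choice(n, 20, replace=False), "area_sqm"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "suburb"] = np.nan

numeric = ["area_sqm", "bedrooms"]
categorical = ["suburb"]
preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Categorical: fill gaps with the most frequent level, then one-hot encode.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])
model = Pipeline([("prep", preprocess), ("ridge", Ridge(alpha=1.0))])

X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model.fit(X_train, y_train)  # imputers and scaler are fit on training data only
pred = model.predict(X_test)

metrics = {
    "MSE": mean_squared_error(y_test, pred),
    "MAE": mean_absolute_error(y_test, pred),
    "R2": r2_score(y_test, pred),
}
for k, v in metrics.items():
    print(f"{k}: {v:,.3f}")
```

Keeping imputation and scaling inside the pipeline means they are refit on each training fold when the whole pipeline is passed to GridSearchCV, which avoids leaking test-set statistics into the model.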

Detailed analysis can be found here.
For the Python code, click here.

Python Notebook

Full Code for the Analysis - Click Here

Acknowledgement

The data used in this analysis was provided by Upgrade Academy. I am grateful to the faculty at Upgrade Academy for their support and guidance throughout the analysis; their expertise contributed significantly to this project.
