Understanding regression analysis.
Regression models are used to predict target variables on a continuous scale, which makes them attractive for addressing many questions in science as well as applications in industry, such as understanding relationships between variables, evaluating trends, or making forecasts.
The following topics are covered:
• Exploring and visualizing datasets
• Looking at different approaches to implement linear regression models
• Training regression models that are robust to outliers
• Evaluating regression models and diagnosing common problems
• Fitting regression models to nonlinear data
We will use the Housing Dataset, which contains information about houses in the suburbs of Boston collected by D. Harrison and D.L. Rubinfeld in 1978. The Housing Dataset has been made freely available and can be downloaded from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data. Its features are summarized as follows:
• CRIM: This is the per capita crime rate by town
• ZN: This is the proportion of residential land zoned for lots larger than 25,000 sq. ft.
• INDUS: This is the proportion of non-retail business acres per town
• CHAS: This is the Charles River dummy variable (this is equal to 1 if the tract bounds the river; 0 otherwise)
• NOX: This is the nitric oxides concentration (parts per 10 million)
• RM: This is the average number of rooms per dwelling
• AGE: This is the proportion of owner-occupied units built prior to 1940
• DIS: This is the weighted distances to five Boston employment centers
• RAD: This is the index of accessibility to radial highways
• TAX: This is the full-value property-tax rate per $10,000
• PTRATIO: This is the pupil-teacher ratio by town
• B: This is calculated as 1000(Bk - 0.63)^2, where Bk is the proportion of people of African American descent by town
• LSTAT: This is the percentage of the population with lower status
• MEDV: This is the median value of owner-occupied homes in $1000s
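The feature list above maps directly onto column names when the data file is read into a table. The following is a minimal sketch, assuming that pandas is installed and that the file at the URL quoted above is whitespace-separated with no header row. The helper name load_housing and the constants COLUMNS and HOUSING_URL are hypothetical and introduced only for illustration.

```python
import pandas as pd

# Column names taken from the feature list above, in the same order.
COLUMNS = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
           'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']

# URL quoted in the text; it may no longer be reachable.
HOUSING_URL = ('https://archive.ics.uci.edu/ml/'
               'machine-learning-databases/housing/housing.data')


def load_housing(source=HOUSING_URL):
    """Read the housing file into a DataFrame.

    Assumes whitespace-separated values and no header row.
    `source` may be a URL, a local path, or a file-like object.
    """
    return pd.read_csv(source, header=None, sep=r'\s+', names=COLUMNS)
```

Calling df = load_housing() and then df.head() would show the first rows with the named columns. Passing the path of a local copy of the file instead of the URL avoids the network dependency.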