# Theory

We’ve previously mentioned the use of predictive analytics and regression models in our machine learning series, but we didn’t explain the role the two play in market research as a whole. As with machine learning, the influx of large amounts of data has cast predictive analytics in a new light in our industry.

Predictive analytics matters above all because it allows businesses to identify likely future outcomes and act on them before the impact is too great or the opportunity is missed. Today, the question is not whether to use predictive analytics, but how and where.

## 3 Types of “Predictive” Analytics
Predictive analytics is the use of statistical modeling, algorithms, and other techniques on large amounts of collected data in order to predict future outcomes. Predicting outcomes does not follow any single process; it depends on the type of data collected and the methods used. Predictive analytics is also commonly split into three classifications:

* Predictive analytics is the most obvious classification. It uses past behavior to predict the propensity for a specific outcome to occur in the future.

* Descriptive analytics uses sets of behaviors to describe the relationships between them.

* Decision analytics, similar to predictive, uses known behaviors and outcomes to predict future sets of outcomes, for example using a decision tree to predict behaviors rather than a single outcome.

Cluster analysis and regression models are just two of the statistical methods that can be used to analyze data for the predictive, descriptive, and decision classifications. Regression models, in particular, are key to predicting future outcomes.

## Regression Analysis: The Outcomes Predicted
Just as predictive analytics is a tool for machine learning and big data, regression modeling is a tool for predictive analytics, and one of the primary ones. Regression analysis looks at a dependent variable (the outcome) and one or more independent variables (the actions or drivers) while assessing the strength of the association between them. In other words, it asks whether there is a relationship between the variables and how strong that relationship is.
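
One common way to write the relationship described above is the multiple linear regression model. The source does not commit to a particular model, so this form and its symbols ($y$, $x_j$, $\beta_j$, $\varepsilon$) are an assumption chosen for illustration:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
$$

Here $y$ is the outcome, each $x_j$ is an independent variable, $\beta_j$ measures how much $y$ changes per unit change in $x_j$ with the other variables held fixed, and $\varepsilon$ is the error the model leaves unexplained.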

An example of a regression model, as it relates to market research and predictive analytics, could be understanding how the likelihood to purchase online is affected by the ease of product search and the delivery cost. If the regression output shows that ease of product search has the stronger association with likelihood to purchase, more focus should be placed on improving that variable than on delivery cost.
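
The purchase example can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic and generated from a made-up relationship, the names `ease_of_search`, `delivery_cost`, and `purchase_likelihood` are placeholders, and ordinary least squares on standardized variables is one possible way to compare the strength of two drivers, not necessarily the method the article has in mind.

```python
import numpy as np

# Synthetic, illustrative data only: none of these numbers come from the article.
rng = np.random.default_rng(0)
n = 500
ease_of_search = rng.uniform(1, 10, n)   # hypothetical 1-10 rating
delivery_cost = rng.uniform(0, 20, n)    # hypothetical cost in currency units

# Assumed relationship used to generate the outcome, plus random noise
purchase_likelihood = (
    2.0 + 0.8 * ease_of_search - 0.1 * delivery_cost + rng.normal(0, 1, n)
)


def standardize(x):
    """Rescale to mean 0 and standard deviation 1 so coefficients are comparable."""
    return (x - x.mean()) / x.std()


# Design matrix: intercept column plus the two standardized predictors
X = np.column_stack(
    [np.ones(n), standardize(ease_of_search), standardize(delivery_cost)]
)
y = standardize(purchase_likelihood)

# Ordinary least squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, beta_search, beta_delivery = coef

print("standardized coefficient, ease of search:", round(beta_search, 3))
print("standardized coefficient, delivery cost: ", round(beta_delivery, 3))
```

Because both predictors are on the same scale after standardizing, the coefficient with the larger absolute value points to the stronger association in this synthetic data set.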

There are a variety of regression techniques to be used depending on the type of classification of predictive analytics and the types of variables involved.

Source [Here](https://www.gutcheckit.com/blog/predictive-analytics-regression-models-explained/).

Source [Here](https://www.crehana.com/ar/cursos-online-data/analitica-predictiva-y-modelos-de-regresion-en-python/?source_page=Search%20Landing&source_detail=Search%20Landing&source=search&model_used=SEARCH_ENGINE_V2.2&product_name=Anal%C3%ADtica%20predictiva%20y%20modelos%20de%20regresi%C3%B3n%20en%20Python&product_id=11621&keyword=python&item_type=course&position_selected=0).

______

# Importing libraries

```python
import pandas as pd
```

# Importing data

```python
df = pd.read_csv('data/housing_train.csv')
```

# Data analysis

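```python
# Other ways to inspect the DataFrame (uncomment to try):
# df.head()                    # first 5 rows
# df.tail()                    # last 5 rows
# df.shape                     # (number of rows, number of columns)
# df.columns                   # column labels
# df['crime_rate'][0:5]        # first 5 values of one column
# df[['river', 'distance']]    # two columns as a new DataFrame
df.iloc[0:10]                  # first 10 rows, selected by position
```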

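| Unnamed: 0 | crime_rate | lz | industrial | river | nox | rooms | age | distance | highways | taxes | teachers | status | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.01096 | 55.0 | 2.25 | 0 | 0.389 | 6.453 | 31.9 | 7.3073 | 1 | 300 | 15.3 | 8.23 | 22.0 |
| 1 | 0.5405 | 20.0 | 3.97 | 0 | 0.575 | 7.47 | 52.6 | 2.872 | 5 | 264 | 13.0 | 3.16 | 43.5 |
| 2 | 0.21409 | 22.0 | 5.86 | 0 | 0.431 | 6.438 | 8.9 | 7.3967 | 7 | 330 | 19.1 | 3.59 | 24.8 |
| 3 | 0.31827 | 0.0 | 9.9 | 0 | 0.544 | 5.914 | 83.2 | 3.9986 | 4 | 304 | 18.4 | 18.33 | 17.8 |
| 4 | 0.02055 | 85.0 | 0.74 | 0 | 0.41 | 6.383 | 35.7 | 9.1876 | 2 | 313 | 17.3 | 5.77 | 24.7 |
| 5 | 0.03871 | 52.5 | 5.32 | 0 | 0.405 | 6.209 | 31.3 | 7.3172 | 6 | 293 | 16.6 | 7.14 | 23.2 |
| 6 | 9.91655 | 0.0 | 18.1 | 0 | 0.693 | 5.852 | 77.8 | 1.5004 | 24 | 666 | 20.2 | 29.97 | 6.3 |
| 7 | 0.08265 | 0.0 | 13.92 | 0 | 0.437 | 6.127 | 18.4 | 5.5027 | 4 | 289 | 16.0 | 8.58 | 23.9 |
| 8 | 0.08387 | 0.0 | 12.83 | 0 | 0.437 | 5.874 | 36.6 | 4.5026 | 5 | 398 | 18.7 | 9.1 | 20.3 |
| 9 | 1.42502 | 0.0 | 19.58 | 0 | 0.871 | 6.51 | 100.0 | 1.7659 | 5 | 403 | 14.7 | 7.39 | 23.3 |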
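
The commented-out calls in the cell above return different kinds of objects. The following sketch shows what each one gives back, using a small frame retyped by hand from the first three rows of the output; `sample` and the other variable names are placeholders, and only a few of the columns are included to keep it short.

```python
import pandas as pd

# First three rows of the output above, retyped by hand for illustration
# (only a subset of the columns is reproduced).
sample = pd.DataFrame(
    {
        "crime_rate": [0.01096, 0.5405, 0.21409],
        "river": [0, 0, 0],
        "rooms": [6.453, 7.47, 6.438],
        "distance": [7.3073, 2.872, 7.3967],
        "value": [22.0, 43.5, 24.8],
    }
)

print(sample.shape)            # tuple: (number of rows, number of columns)
print(list(sample.columns))    # column labels as a plain list

first_two = sample.head(2)     # DataFrame with the first 2 rows
last_one = sample.tail(1)      # DataFrame with the last row

crime = sample["crime_rate"][0:2]        # single brackets: a Series
subset = sample[["river", "distance"]]   # double brackets: a DataFrame

print(type(crime).__name__, type(subset).__name__)
```

The difference between single and double brackets matters later on: many modeling functions expect a two-dimensional DataFrame for the predictors and a one-dimensional Series for the target.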


```python
df.iloc[2, 4]  # row at position 2, column at position 4
```

```
0.431
```
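
`iloc` selects by integer position, counting from zero, while `loc` selects by label. The sketch below repeats the lookup above on a small frame retyped from the first three rows of the output. It assumes the first column shown in that output is the row index rather than a data column, which is what the result 0.431 (the `nox` value of row 2) suggests; `rows` is a placeholder name.

```python
import pandas as pd

# First three rows and first five data columns of the output above, retyped by
# hand. Assumption: the leading 0, 1, 2, ... column is the row index.
rows = pd.DataFrame(
    {
        "crime_rate": [0.01096, 0.5405, 0.21409],
        "lz": [55.0, 20.0, 22.0],
        "industrial": [2.25, 3.97, 5.86],
        "river": [0, 0, 0],
        "nox": [0.389, 0.575, 0.431],
    }
)

by_position = rows.iloc[2, 4]   # third row, fifth column, by position
by_label = rows.loc[2, "nox"]   # row labeled 2, column labeled "nox"

print(by_position, by_label)
```

Both lookups reach the same cell here because the default row labels happen to match the row positions; after filtering or sorting, the two can diverge.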