Predictive Analytics With Python

How to process data and build predictive models from it

Data Cleaning

Describes the process of reading a dataset, getting a bird's-eye view of it, handling missing values, and exploring it with basic plots using the pandas and matplotlib packages in Python. Data cleaning and wrangling together constitute around 80% of the modelling time.
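The steps named above can be sketched roughly as follows. This is a minimal illustration, not the chapter's own code: the in-memory CSV and its `age` and `income` columns are hypothetical, and filling with the column mean is just one common way to handle missing values.

```python
import io

import pandas as pd

# Hypothetical data standing in for a real CSV file on disk.
csv_text = "age,income\n25,50000\n32,\n47,82000\n,61000\n"

# Read the dataset (pd.read_csv accepts a path or a file-like object).
df = pd.read_csv(io.StringIO(csv_text))

# Bird's-eye view: dimensions and summary statistics.
n_rows, n_cols = df.shape
summary = df.describe()

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# One simple strategy (an assumption here): fill gaps with the column mean.
df_filled = df.fillna(df.mean())
```

Dropping incomplete rows with `df.dropna()` is the other common option; which one is appropriate depends on how much data is missing and why.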

Data Wrangling

Describes the methods to subset a dataset, concatenate or merge two or more datasets, group the dataset by categorical variables, split the dataset into training and testing sets, generate dummy datasets using random numbers, and create simulations using random numbers.
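A short sketch of these wrangling operations, assuming two hypothetical tables (`customers` and `regions`) generated from random numbers. The column names and the 80/20 split ratio are illustrative choices, not taken from the chapter.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the dummy data is reproducible

# Dummy datasets generated from random numbers (hypothetical columns).
customers = pd.DataFrame({
    "customer_id": np.arange(10),
    "segment": rng.choice(["A", "B"], size=10),
    "spend": rng.normal(loc=100, scale=15, size=10),
})
regions = pd.DataFrame({
    "customer_id": np.arange(10),
    "region": rng.choice(["north", "south"], size=10),
})

# Subset: keep rows with above-average spend and only two columns.
above_average = customers["spend"] > customers["spend"].mean()
high_spend = customers.loc[above_average, ["customer_id", "spend"]]

# Merge the two datasets on their shared key.
merged = customers.merge(regions, on="customer_id", how="inner")

# Group by a categorical variable and aggregate.
mean_spend_by_segment = merged.groupby("segment")["spend"].mean()

# Split into training and testing sets (80/20) after shuffling the rows.
shuffled = merged.sample(frac=1.0, random_state=0)
cut = int(0.8 * len(shuffled))
train_set, test_set = shuffled.iloc[:cut], shuffled.iloc[cut:]
```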

Statistical Concepts for Predictive Modelling

Explains the basic statistics needed to make sense of the model parameters resulting from the predictive models. The chapter covers concepts such as hypothesis testing, z-tests, t-tests, chi-square tests, and p-values, followed by a discussion of correlation.
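As an illustration of how a test statistic and a p-value relate, here is a minimal sketch of a two-sided one-sample z-test using only the standard library. The function name `z_test_two_sided` and the numbers in the example are hypothetical.

```python
import math


def z_test_two_sided(sample_mean, pop_mean, pop_std, n):
    """Return (z, p_value) for a two-sided one-sample z-test."""
    # z measures how many standard errors the sample mean lies from pop_mean.
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    # Standard normal CDF at |z|, written with the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    # Two-sided p-value: probability of a result at least this extreme.
    p_value = 2.0 * (1.0 - cdf)
    return z, p_value


# Hypothetical example: sample of 100 with mean 52, population mean 50, sd 10.
z, p = z_test_two_sided(sample_mean=52.0, pop_mean=50.0, pop_std=10.0, n=100)
```

Here `z` is 2.0 and the p-value is about 0.0455, so the null hypothesis would be rejected at the 5% level but not at the 1% level.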

Linear Regression with Python

Starts with a discussion of the mathematics behind linear regression and validates it using a simulated dataset. This is followed by a summary of the implications and interpretations of the various model parameters. The chapter also describes how to implement linear regression using the statsmodels.api and scikit-learn packages and how to handle related contingencies such as multiple regression, multicollinearity, categorical variables, non-linear relationships between predictor and target variables, and outliers.
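The idea of checking the mathematics against simulated data can be sketched with NumPy alone. The true coefficients (intercept 3, slope 2) and the noise level are assumptions made for this illustration; the closed-form estimates below are the standard least-squares formulas for a single predictor.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Simulated data: y = 3 + 2x + noise (coefficients chosen for illustration).
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=200)

# Closed-form least-squares estimates for simple linear regression.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# R^2: share of the variance in y explained by the fitted line.
residuals = y - (intercept + slope * x)
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
```

The estimates should land close to the coefficients used to generate the data, which is the kind of validation the chapter describes.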

Logistic Regression with Python

Explains concepts such as the odds ratio, conditional probability, and contingency tables, leading to a detailed discussion of the mathematics behind the logistic regression model (using code that implements the entire model from scratch) and various tests to check the efficiency of the model. The chapter also describes how to implement logistic regression in Python and how to draw and interpret an ROC curve.
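To make the link between contingency tables, odds, and the logistic function concrete, here is a small sketch with a hypothetical 2x2 table. It is not the chapter's from-scratch model, only the arithmetic that model builds on.

```python
import math

# Hypothetical 2x2 contingency table of counts.
table = {
    "exposed": {"event": 30, "no_event": 70},
    "unexposed": {"event": 10, "no_event": 90},
}


def odds(group):
    """Odds of the event within one row of the table."""
    return group["event"] / group["no_event"]


def sigmoid(log_odds):
    """Logistic function: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


odds_exposed = odds(table["exposed"])
odds_unexposed = odds(table["unexposed"])

# Odds ratio: how much higher the odds are in the exposed group.
odds_ratio = odds_exposed / odds_unexposed

# Taking the log of the odds and applying the sigmoid recovers the
# conditional probability P(event | exposed) = 30 / 100.
p_exposed = sigmoid(math.log(odds_exposed))
```

Logistic regression models the log-odds as a linear function of the predictors, which is why the sigmoid appears when converting its output to a probability.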

Clustering with Python

Discusses concepts such as distances, the distance matrix, and linkage methods to explain the mathematics and logic behind both hierarchical and k-means clustering. The chapter also describes how to implement both types of clustering in Python and how to fine-tune the number of clusters.
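The distance matrix is the starting point for hierarchical clustering. The sketch below, which assumes four hypothetical 2-D points and Euclidean distance, builds the matrix with NumPy and finds the closest pair, which is the first pair an agglomerative method would merge.

```python
import numpy as np

# Four hypothetical observations in two dimensions.
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.5]])

# Pairwise Euclidean distance matrix via broadcasting.
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Ignore the zero diagonal, then locate the closest pair of points.
masked = dist.copy()
np.fill_diagonal(masked, np.inf)
i, j = np.unravel_index(np.argmin(masked), masked.shape)
```

After the first merge, a linkage method (single, complete, average, and so on) decides how the distance from the new cluster to the remaining points is computed.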

Trees and Random Forests with Python

Starts with a discussion of topics such as entropy, information gain, and the Gini index to illustrate the mathematics behind creating a decision tree, followed by methods to handle variations such as a continuous numerical predictor variable and missing values. This is followed by methods to implement the decision tree in Python. The chapter also gives a glimpse into understanding and implementing regression trees and random forests.
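The split criteria mentioned above can be written in a few lines of standard-library Python. The label lists in the example are hypothetical, and the function names are chosen for this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical split of ten labels into two branches.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
gain = information_gain(parent, [left, right])
```

A decision tree evaluates candidate splits this way and picks the one with the largest gain (or the largest drop in Gini impurity).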

Best Practices for Predictive Modelling

Covers the best practices to follow in coding, data handling, algorithms, statistics, and business context to get good results in predictive modelling.

jorwalk/predictive-analytics-with-python