Business Machine Learning and Data Science Applications
A curated list of applied business machine learning (BML) and business data science (BDS) examples and libraries. The code in this repository is in Python (primarily using Jupyter notebooks) unless otherwise stated. The catalogue is inspired by
Caution: This is a work in progress. Please contribute, especially if you are a subject expert in ML/DS for Accounting, Banking, Finance and Insurance, Customer, Employee, Legal, Management, Operations and Public matters.
If you want to contribute to this list (please do), send me a pull request or contact me @dereknow. Also, a listed repository should be deprecated if:
- The repository's owner explicitly says that "this library is not maintained".
- It has had no commits for a long time (2-3 years).
Table of Contents
- Banking, Finance and Insurance
- Chart of Account Prediction - Using labeled data to suggest the account name for every transaction.
- Accounting Anomalies - Using deep-learning frameworks to identify accounting anomalies.
- Financial Statement Anomalies - Detecting anomalies before filing, using R.
- Useful Life Prediction (FirmAI) - Predict the useful life of assets using sensor observations and feature engineering.
- Forensic Accounting - Collection of case studies on forensic accounting using data analysis.
- General Ledger (FirmAI) - Data processing over a general ledger as exported through an accounting system.
- Bullet Graph (FirmAI) - Bullet graph visualisation helpful for tracking sales, commission and other performance.
- Aged Debtors (FirmAI) - Example analysis to investigate aged debtors.
- Financial Sentiment Analysis - Sentiment, distance and proportion analysis for trading signals.
- Extensive NLP - Comprehensive NLP techniques for accounting research.
Data, Parsing and APIs
- EDGAR - A walk-through in how to obtain EDGAR data.
- IRS - Accessing and parsing IRS filings.
- Financial Corporate - Rutgers corporate financial datasets.
- Non-financial Corporate - Rutgers non-financial corporate dataset.
- PDF Parsing - Extracting useful data from PDF documents.
- PDF Table to Excel - How to output an Excel file from a PDF.
Research And Articles
- Understanding Accounting Analytics - An article that tackles the importance of accounting analytics.
- VLFeat - VLFeat is an open and portable library of computer vision algorithms, with a MATLAB toolbox.
- Rutgers Raw - Good digital accounting research from Rutgers.
- Computer Augmented Accounting - A video series from Rutgers University looking at the use of computation to improve accounting.
- Accounting in a Digital Era - Another series by Rutgers investigating the effects the digital age will have on accounting.
Banking, Finance and Insurance
- Loan Acceptance - Classification and time-series analysis for loan acceptance.
- Predict Loan Repayment - Predict whether a loan will be repaid using automated feature engineering.
- Loan Eligibility Ranking - System to help banks check whether a customer is eligible for a given loan.
- Home Credit Default (FirmAI) - Predict home credit default.
- Mortgage Analytics - Extensive mortgage loan analytics.
- Credit Approval - A system for credit card approval.
- Loan Risk - Predictive model to help reduce loan charge-offs and losses.
- Amortisation Schedule (FirmAI) - Simple amortisation schedule in Python for personal use.
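
As a rough illustration of what an amortisation schedule computes, here is a minimal sketch for a fixed-rate, fully amortising loan. It is not taken from the listed repository; the function name `amortisation_schedule` and its parameters are hypothetical, and it assumes equal payments and a constant rate.

```python
def amortisation_schedule(principal, annual_rate, years, payments_per_year=12):
    """Return (payment, rows) for a fixed-rate, fully amortising loan.

    Hypothetical helper for illustration only. Each row is
    (period, payment, interest, principal_paid, remaining_balance).
    """
    n = years * payments_per_year          # total number of payments
    r = annual_rate / payments_per_year    # periodic interest rate
    if r == 0:
        payment = principal / n            # no interest: split evenly
    else:
        # Standard annuity formula for the level payment
        payment = principal * r / (1 - (1 + r) ** -n)

    rows = []
    balance = principal
    for period in range(1, n + 1):
        interest = balance * r                 # interest accrued this period
        principal_paid = payment - interest    # remainder reduces the balance
        balance -= principal_paid
        rows.append((period, payment, interest, principal_paid, max(balance, 0.0)))
    return payment, rows


# Example: 200,000 borrowed at 5% a year over 30 years, paid monthly
payment, rows = amortisation_schedule(200_000, 0.05, 30)
print(f"Monthly payment: {payment:.2f}")   # roughly 1073.64
print(rows[0])                             # first period
print(rows[-1])                            # last period, balance near zero
```

Early payments are mostly interest and later ones mostly principal, which is why the remaining balance falls slowly at first.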
Management and Operation
- Credit Card - Estimate the CLV of credit card customers.
- Survival Analysis - Perform a survival analysis of customers.
- Next Transaction - Deep learning model to predict the transaction amount and days to next transaction.
- Credit Card Churn - Predicting credit card customer churn.
- Bank of England Minutes - Textual analysis over bank minutes.
- Zillow Prediction - Zillow valuation prediction as performed on Kaggle.
- Real Estate - Predicting real estate prices from the urban environment.
- Used Car - Used vehicle price prediction.
- XGBoost - Fraud detection by tuning XGBoost hyper-parameters with simulated annealing.
- Fraud Detection Loan in R - Fraud detection in bank loans.
- AML Finance Due Diligence - Search news articles to do finance AML DD.
- Credit Card Fraud - Detecting credit card fraud.
Insurance and Risk
- Bank Failure - Predicting bank failure.
- Risk Management - Finance risk engagement course resources.
- VaR GaN - Estimate Value-at-Risk for market risk management using Keras and TensorFlow.
- Actuarial Sciences (R) - A range of actuarial tools in R.
Trading and Investment
- Deep Portfolio - Deep learning for finance: predict the volume of bonds.
- Corporate Bonds - Predicting the buying and selling volume of the corporate bonds.
- Simulation - Investigating simulations as part of computational finance.
- Industry Clustering - Project to cluster industries according to financial attributes.
- Financial Modeling - HFT trading and implied volatility modeling.
- Trend Following - A futures trend following portfolio investment strategy.
- Financial Statement Sentiment - Extracting sentiment from financial statements using neural networks.
- Applied Corporate Finance - Studies empirical behaviours in the stock market.
- Market Crash Prediction - Predicting market crashes using an LPPL model.
- NLP Finance Papers - Curating quantitative finance papers using machine learning.
- ARIMA-LSTM Hybrid - Hybrid model to predict future price correlation coefficients of two assets.
- Basic Investments - Basic investment tools in Python.
- Basic Derivatives - Basic forward contracts and hedging.
- Basic Finance - Source code notebooks for basic finance applications.
- Bank Note Fraud Detection - Bank note authentication using a DNN TensorFlow classifier and RandomForest.
- ATM Surveillance - ATM Surveillance in banks use case.
- Pareto/NBD Model - Calculate the CLV using a Pareto/NBD model.
- Gamma-Gamma Model - Estimate the CLV using a Gamma-Gamma model.
- Cohort Analysis - Cohort analysis to group customers into mutually exclusive cohorts measured over time.
- E-commerce - E-commerce customer segmentation.
- Groceries - Segmentation for grocery customers.
- Online Retailer - Online retailer segmentation.
- Bank - Bank customer segmentation.
- Wholesale - Clustering of wholesale customers.
- Various - Multiple types of segmentation and clustering techniques.
- RNN - Investigating customer behaviour over time with sequential analysis using an RNN model.
- Neural Net - Demand forecasting using artificial neural networks.
- Temporal Analytics - Investigating customer temporal regularities.
- POS Analytics - Analytics driven customer behaviour ranking for retail promotions using POS data.
- Wholesale Customer - Wholesale customer exploratory data analysis.
- RFM - Performing an RFM (recency, frequency, monetary) analysis.
- Returns Behaviour - Predicting total returns and fraudulent returns.
- Visits - Predicting which day of week a customer will visit.
- Bank: Next Purchase - A project to predict bank customers' most probable next purchase.
- Bank: Customer Prediction - Predicting target customers who will subscribe to the bank's new policy.
- Next Purchase - Predict a customer's next purchase using feature engineering.
- Customer Purchase Repeats - Analyse customer repeat purchases using the lifetimes Python library and real jewellery retailer data.
- AB Testing - Find the best KPI and do A/B testing.
- Customer Survey (FirmAI) - Example of parsing and analysing a customer survey.
- Happiness - Analysing customer happiness from hotel stays using reviews.
- Miscellaneous Customer Analytics - Various tools and techniques for customer analysis.
- Recommendation - Recommend the songs that a customer on a music app would prefer listening to.
- General Recommender - Identifying which products to recommend to which customers.
- Collaborative Filtering - Customer recommendation using collaborative filtering.
- Up-selling (FirmAI) - Analysis to identify up-selling opportunities.
- Ride Sharing - Identify customer churn rates in order to target customers for retention campaigns.
- KKDBox I - Variational deep autoencoder to predict customer churn.
- KKDBox II - A three-step customer churn prediction framework using feature engineering.
- Personal Finance - Predict customer subscription churn for a personal finance business.
- ANN - Churn analysis using artificial neural networks.
- Bike - Customer bike churn analysis.
- Cost Sensitive - Cost-sensitive churn analysis driven by economic performance.
- Topic Modelling - Topic modelling on a corpus of customer surveys from the VR industry.
- Customer Satisfaction - Predict customer satisfaction using Kaggle data.
- Personality Prediction - Predict Big 5 Personality from text.
- Salary Prediction Resume - Textual analysis of resumes to predict an appropriate salary.
- Employee Review Analysis - Review analytics for top 50 retail companies on Indeed.
- Diversity Analysis - A simple analysis of gender and race disparity in the tech industry.
- Occupation Prediction - Predict the likelihood that an occupation is analytical.
- Training Hours Performance - The impact of training hours on employee performance.
- Promotion Prediction - Analysing promotion patterns.
- Employee Attendance prediction - Various tools to predict employee attendance.
- Early Leaving Employees - Identifying why the best and most experienced employees are leaving prematurely.
- Employee Turnover - Identifying factors associated with employee turnover.
- Slack Communication Analysis - Producing meaningful visualisations from Slack conversations.
- Employee Relationships from Conversations - Identifying employee relationships from emails for improved HR analytics.
- Categorise Employee Requests - Classifying employee requests via a TF-IDF vectorizer and RandomForestClassifier.
- Employee Face Recognition - A face recognition implementation.
- Attendance Management System - An attendance management system using face recognition.
- LexPredict - Software package and library.
- AI Para-legal - Lobe is the world's first AI paralegal.
- Legal Entity Detection - NER For Legal Documents.
- Legal Case Summarisation - Implementation of different summarisation algorithms applied to legal case judgements.
- Legal Documents Google Scholar - Using Google Scholar to extract cases programmatically.
- Chat Bot - Chat-bot and email notifications.
Policy and Regulatory
- GDPR scores - Predicting GDPR Scores for Legal Documents.
- Driving Factors FINRA - Identify the driving factors that influence the FINRA arbitration decisions.
- Securities Bias Correction - Bias-Corrected Estimation of Price Impact in Securities Litigation.
- Public Firm to Legal Decision - Embed public firms based on their reaction to legal decisions.
- Supreme Court Prediction - Predicting the ideological direction of Supreme Court decisions: ensemble vs. unified case-based model.
- Supreme Court Topic Modeling - Multiple steps necessary to implement topic modeling on supreme court decisions.
- Judge Opinion - Using text mining and machine learning to analyze judges’ opinions for a particular concern.
- ML Law Matching - A machine learning law match maker.
- Bert Multi-label Classification - Fine Grained Sentiment Analysis from AI.
- Some Computational AI Course - Video series on law from MIT.
- Topic Model Reviews - Amazon reviews for product development.
- Patents - Forecasting strategy using patents.
- Networks - Business categories from Yelp reviews using networks can help to identify pockets of demand.
- Company Clustering - Hierarchical clusters and topics for companies, extracted from the descriptions on their websites.
- Marketing Management - Programmatic marketing management.
- Constraint Learning - Machine learning that takes into account constraints.
- Fairlearn - Toolkit for assessing and improving the fairness of machine learning models; its reduction methods build on cost-sensitive classification.
- Multi-label Classification - Cost-sensitive multi-label classification.
- Multi-class Classification - Cost-sensitive multi-class classification (Weighted-All-Pairs, Filter-Tree and others).
- CostCla - CostCla is a Python module for cost-sensitive machine learning (classification) built on top of scikit-learn.
- DEA Software - pyDEA is a software package developed in Python for conducting data envelopment analysis (DEA).
- Covering Set (FirmAI) - Constraint programming analysis.
- Insurance (FirmAI) - CP Insurance analysis.
- Machine Learning + CP (FirmAI) - Machine Learning + Optimisation.
- Post Office (FirmAI) - Post Office optimisation.
- Soda - CP (FirmAI) - Constraint Programming + ML.
- Soda - Knapsack (FirmAI) - Knapsack algorithm + ML.
- Soda - MLP (FirmAI) - MLP analysis + ML.
- Marketing AB Testing - A/B Testing Experiment.
- Legal Studies - Instrumental and discontinuity causal approach.
- A-B Test Result (FirmAI) - Initial A-B Results.
- Causal Regression (FirmAI) - Regression technique for causal estimate.
- Frequentist vs Bayesian A-B Test (FirmAI) - Comparison between frequentist and Bayesian A-B testing.
- A-B Test Power Analysis (FirmAI) - Sample size estimation to match testing power.
- Variance Reduction A-B test (FirmAI) - Techniques to reduce variance in A-B tests.
- Various - Various applied statistical solutions.
- Applied RL - Reinforcement learning and decision making tutorials explained at an intuitive level, with Jupyter notebooks.
- Process Mining - Leveraging A-priori Knowledge in Predictive Business Process Monitoring
- TS Forecasting - Time series forecasting for important business applications.
- Web Scraping (FirmAI) - Web scraping solutions for Facebook, Glassdoor, Instagram, Morningstar, Similarweb, Yelp, Spyfu, LinkedIn, AngelList.
Failure and Anomalies
- Anomalies - Anomaly detection resources.
- Intrusion Detection - Detecting network intrusions.
- APS Failure, Data - Investigating APS failures in Scania trucks.
- Hardware Failure - Using different machine learning techniques in detecting anomalies.
- Anomaly KPIs, Paper - Anomaly detection algorithm for seasonal KPIs.
Load and Capacity Management
- House Load Energy - Linear, SVR and Random Forest models to predict a house's appliance energy load.
- Uber Load Management - Uber predictive load management.
- Capacity Management - Investigating whether IT stability issues are caused by capacity constraints.
- Bike Sharing - XGBRegressor, RandomForestRegressor, GradientBoostingRegressor combined with feature selection.
- Airline Fleet Segmentation - Analysis of Delta airlines.
- Airbnb - Airbnb Booking Analysis.
- Dispute Prediction - Financial service complaint management.
- Flight Delay Prediction - Transfer learning for flight-delay prediction via variational autoencoders in Keras.
- Electric Fault Prediction - Predict tripping at grid stations by applying simple machine learning algorithms.
- Popularity Prediction in R - Marked Hawkes Point Process.
- Triage - General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems.
- World Bank Poverty I - A comparative assessment of machine learning classification algorithms applied to poverty prediction.
- World Bank Poverty II - Repository for the World Bank Pover-T Test competition solution.
- Overseas Company Land Ownership - Identifying foreign ownership in the UK.
- CFPB - Consumer Financial Protection Bureau complaints analysis.
- Cannabis Legalisation Effect - Effects of cannabis legalization on crime.
- Election Analysis - Election analysis and prediction models.
- American Election Causal - Using ANES data with causal inference models.
- Campaign Finance and Election Results - Investigating the relation between campaign finance and subsequent election results.
- Conflict Prediction - Notebooks on conflict prediction.
- Burglary Prediction - Spatio-Temporal Modelling for burglary prediction.
- Predicting Disease Outbreak - Machine learning implementation based on multiple classifier algorithms.
- Road accident prediction - Predicting the type of victims in federal road accidents in Brazil.
- Text Mining - Disaster management using text mining.
- Twitter and disasters - Predicting whether tweets are about disasters.
- Traffic Prediction - Multi-attention recurrent neural networks for time-series (city traffic).
- Predict Crashes - Crash prediction modeling application that leverages multiple data sources.
- Predict Household Poverty - Predict the poverty of households in Costa Rica using automated feature engineering.