
Data Science Projects

This page describes, in some detail, the data science projects I have recently completed.

Contact Information

Feel free to contact me:

Topics

Natural Language Processing
Machine Learning Classification Projects
Deep Learning
Machine Learning Regression Models
Unsupervised Learning

Natural Language Processing

Natural-language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to fruitfully process large amounts of natural language data. Challenges in natural-language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.

Notebooks and descriptions Contact Information

Notebooks and descriptions

Notebook Brief Description
neural-language-model-and-spinoza Spinoza's Ethics is used to build a language model for text generation with recurrent neural nets.
sentiment-analysis A "reverse sentiment analysis" using Bernoulli Naive Bayes was performed on already-classified movie reviews to identify which words appear more frequently in reviews from each class.
topic-identification Tutorial about topic identification (in progress)
alphabet-human-thought/meaning-of-sentences This notebook shows that logic formalisms can be used to find more generic translation mechanisms (in progress)
alphabet-human-thought/sentence-structure This notebook shows how to develop formal models for patterns in sequences of words using grammars and parsers (in progress)
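
As a rough illustration of the idea behind the sentiment-analysis notebook above, the sketch below estimates for each word the smoothed probability that it appears in a review of each class (the per-word parameter of a Bernoulli Naive Bayes model) and ranks words by the log ratio of the two. The toy reviews, the smoothing value, and the function names are made up for illustration and are not taken from the notebook.

```python
import math
from collections import Counter

def presence_counts(docs):
    """Count in how many documents each word appears (presence, not frequency)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts

def log_odds_by_word(pos_docs, neg_docs, alpha=1.0):
    """Log ratio of smoothed P(word present | positive) to P(word present | negative)."""
    pos_counts = presence_counts(pos_docs)
    neg_counts = presence_counts(neg_docs)
    vocab = set(pos_counts) | set(neg_counts)
    scores = {}
    for word in vocab:
        # Laplace smoothing for a binary (present/absent) feature
        p_pos = (pos_counts[word] + alpha) / (len(pos_docs) + 2 * alpha)
        p_neg = (neg_counts[word] + alpha) / (len(neg_docs) + 2 * alpha)
        scores[word] = math.log(p_pos / p_neg)
    return scores

# Hypothetical, already-labeled toy reviews
positive_reviews = ["a great and moving film", "great acting and a great script"]
negative_reviews = ["a dull and boring film", "boring plot and dull acting"]

scores = log_odds_by_word(positive_reviews, negative_reviews)
ranked = sorted(scores, key=scores.get, reverse=True)
print("Most indicative of a positive review:", ranked[0])
print("Most indicative of a negative review:", sorted(ranked[-2:]))
```

Words that occur equally often in both classes, such as "and" in this toy data, get a score of zero, which is why they drop out of both ends of the ranking.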

Machine Learning Classification Projects

License: MIT


Notebooks and descriptions

Notebook Brief Description
predicting-comments-on-reddit In this project I determine which characteristics of a Reddit post contribute most to the overall interaction, as measured by the number of comments.
tennis-matches-prediction The goal of the project is to predict the probability that the higher-ranked player will win a tennis match. I will call that a win (as opposed to an upset).
churn-analysis This project was done in collaboration with Corey Girard. A mobile device company is having a major problem with customer retention. Customers switching from one company to another is called churn. Our goal in this analysis is to understand the problem, identify behaviors that are strongly correlated with churn, and devise a solution.
click-prediction Many ads are actually sold on a "pay-per-click" (PPC) basis, meaning the company only pays for ad clicks, not ad views. Thus your optimal approach (as a search engine) is actually to choose an ad based on "expected value", meaning the price of a click times the likelihood that the ad will be clicked [...] In order for you to maximize expected value, you therefore need to accurately predict the likelihood that a given ad will be clicked, also known as "click-through rate" (CTR). In this project I will predict the likelihood that a given online ad will be clicked
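
The "expected value" rule quoted in the click-prediction description can be written down in a few lines. The sketch below is only an illustration: the ad identifiers, prices, and click-through rates are hypothetical, and in practice the CTR values would come from a predictive model such as the one built in the notebook.

```python
def expected_value(price_per_click, predicted_ctr):
    """Expected revenue from showing an ad once: price of a click times P(click)."""
    return price_per_click * predicted_ctr

def pick_ad(candidates):
    """Return the (ad_id, price_per_click, predicted_ctr) tuple with the highest expected value."""
    return max(candidates, key=lambda ad: expected_value(ad[1], ad[2]))

# Hypothetical candidates: (ad_id, price per click, predicted CTR)
ads = [
    ("ad_a", 2.00, 0.010),
    ("ad_b", 0.50, 0.050),
    ("ad_c", 1.00, 0.020),
]

best = pick_ad(ads)
print(best[0], expected_value(best[1], best[2]))
```

Here the cheapest ad wins because its higher predicted CTR more than compensates for its lower price, which is why an accurate CTR estimate matters.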

Deep Learning Projects





Notebooks and descriptions

Notebook Brief Description
painter-identifier I built a Convolutional Neural Net to identify the artist of a painting via transfer learning, instantiating the convolutional part of the Inception V3 model, and training a fully-connected network on top.
bitcoin-price-analysis I built predictive models for Bitcoin price data using recurrent neural networks (LSTMs). Correlations between altcoins are also considered.
keras-tf-tutorial Neural networks tutorial where I build fully-connected networks and convolutional neural networks using both Keras and TensorFlow (in progress).
transfer-learning-mini-tutorial I illustrate transfer learning with the Inception V3 deep neural network model.




Machine Learning Regression Models





Notebooks and descriptions

Notebook Brief Description
retail-store-expansion-analysis-with-lasso-and-ridge-regressions Based on a dataset containing the spirits purchase information of Iowa Class E liquor licensees by product and date of purchase, this project provides recommendations on where to open new stores in the state of Iowa. To devise an expansion strategy, I first needed to understand the data, so I conducted a thorough exploratory data analysis (EDA). With the data in hand, I built multivariate regression models of total sales by county, using both Lasso and Ridge regularization, and based on these models I made recommendations about new locations.
conjoint-analysis Conjoint analysis is a technique that allows researchers to predict consumers' choice share. The analysis can be programmed using standard question types, such as the MaxDiff variation of the Matrix Table question. Instead of directly asking the survey respondents which attributes they find most relevant, conjoint analysis asks respondents to evaluate potential product profiles that include multiple product features. There are several ways to show the product profiles to respondents. In Choice-Based Conjoint (CBC), respondents are shown multiple product concepts and asked which option they would choose. By varying the features shown to the respondents and observing their responses to the product profiles, one can statistically deduce the most desired product features and which attributes have the most impact on choice. The end result is a set of preference scores or part-worth utilities for each level of each attribute. In this notebook I show how to use Python to calculate the utilities. The notebook is heavily based on this course and this book.
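
To give a feel for how the two penalties named in the retail-store-expansion notebook differ, the sketch below fits a one-feature model with no intercept, a case where both estimators have closed forms: Ridge rescales the least-squares coefficient, while Lasso soft-thresholds it and can set it exactly to zero. The data, the penalty values, and the function names are illustrative assumptions and are not taken from the notebook.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ols_coef(x, y):
    """Minimizes sum((y - b*x)^2)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx

def ridge_coef(x, y, lam):
    """Minimizes sum((y - b*x)^2) + lam * b^2."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / (sxx + lam)

def lasso_coef(x, y, lam):
    """Minimizes 0.5 * sum((y - b*x)^2) + lam * |b|."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return soft_threshold(sxy, lam) / sxx

# Hypothetical toy data with y = 2 * x
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

print("OLS:  ", ols_coef(x, y))
print("Ridge:", ridge_coef(x, y, lam=14.0))
print("Lasso:", lasso_coef(x, y, lam=14.0))
print("Lasso with a large penalty:", lasso_coef(x, y, lam=30.0))
```

With many features the same behavior carries over: Ridge keeps every coefficient but shrinks it, while Lasso drops weak predictors entirely, which can make a sales model easier to interpret.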

Unsupervised Learning



Notebooks and descriptions

Notebook Brief Description
topic-modeling In this notebook, I will use Python and its libraries for topic modeling. In topic modeling, statistical models are used to identify topics or categories in a document or a set of documents. I will use one specific method called Latent Dirichlet Allocation (LDA) and apply it to labels on research papers.
clustering-for-customer-segmentation In this project I will apply clustering algorithms to the Wholesale Customers Data Set from the UCI Machine Learning Repository. The dataset contains customers' spending amounts on several product categories.
network-analysis Neural networks tutorial where I build fully-connected networks and convolutional neural networks using both Keras and TensorFlow (in progress).
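
As a minimal sketch of the kind of clustering mentioned for the clustering-for-customer-segmentation notebook, the code below runs k-means (Lloyd's algorithm) on a handful of made-up customers described by their spending in two categories. The data points, the starting centroids, and the choice of k-means itself are assumptions for illustration; the notebook may use different algorithms and features.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical annual spending per customer: (fresh, grocery)
customers = [(100, 900), (120, 950), (90, 1000),
             (1000, 100), (950, 150), (1100, 120)]

# Start from two of the data points, a common simple initialization
centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
print(centroids)
```

Each resulting centroid can be read as the spending profile of a customer segment, here roughly "grocery-heavy" and "fresh-heavy" buyers.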