Portfolio of data science projects I completed for academic, self-learning, and hobby purposes. It is a compilation of notebooks that I created for data analysis or for exploring machine learning algorithms. Standalone projects are listed in a separate category.
A phishing website detector (web app) based on the Random Forest algorithm. The user enters a URL, and 27 features are extracted from it, such as Alexa ranking, URL length, domain, and subdomain. A machine learning model (Random Forest classifier) then classifies the URL as Phishing, Benign, or Suspicious. I used a dataset from the UCI Machine Learning Repository. It is available here.
Tools: Python, scikit-learn, matplotlib, Django
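As a rough illustration of the feature extraction step in the phishing detector, here is a minimal sketch that derives a few lexical features from a URL using only the standard library. The function name `extract_url_features` and the five features shown are assumptions chosen for illustration. They are not the project's actual 27 features, and features such as Alexa ranking, which need an external lookup, are left out.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a small, illustrative subset of lexical URL features."""
    # Add a scheme if the user typed a bare host, so urlparse finds the hostname.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    labels = host.split(".") if host else []
    return {
        "url_length": len(url),
        # Naive split: treats the last two labels as the domain,
        # so multi-part suffixes such as "co.uk" are not handled.
        "domain": ".".join(labels[-2:]),
        "subdomain_count": max(len(labels) - 2, 0),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
    }


features = extract_url_features("https://login.secure.example.com/account?id=1")
```

A dictionary like this could be turned into one row of a feature matrix before being passed to a classifier.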
Sentiment analysis of Amazon reviews, using reviews written by users about different products on amazon.com. The dataset consists of 3.6 million training reviews and 400k testing reviews. The model is built using 80k reviews and tested on 20k reviews. Ensemble learning (VotingClassifier) is used to train the model with 5 different classifiers, giving an accuracy of 87%. Dataset
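The ensemble idea can be sketched with scikit-learn's `VotingClassifier`. This is only a sketch under stated assumptions: it uses synthetic numeric data from `make_classification` instead of the Amazon reviews, and three stand-in classifiers with hard voting instead of the five used in the project, whose identities and voting mode are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption); the project uses vectorized review text.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 80/20 split, mirroring the 80k/20k proportion described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hard voting: each classifier casts one vote and the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Switching to `voting="soft"` would average predicted probabilities instead, which requires every estimator to support `predict_proba`.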
Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.
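As an example of the kind of feature extraction meant here, the sketch below derives a call reason and time-based features from a single record. It assumes a title formatted as "REASON: detail" and a timestamp formatted as "YYYY-MM-DD HH:MM:SS". Both formats and the helper name `derive_call_features` are assumptions for illustration, not taken from the notebook.

```python
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def derive_call_features(title: str, timestamp: str) -> dict:
    """Split a call title into its reason and expand a timestamp into parts."""
    # Text before the first colon is treated as the call reason (assumed format).
    reason = title.split(":", 1)[0].strip()
    when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "reason": reason,
        "hour": when.hour,
        "month": when.month,
        # weekday() returns 0 for Monday through 6 for Sunday.
        "day_of_week": DAY_NAMES[when.weekday()],
    }


row = derive_call_features("EMS: BACK PAINS/INJURY", "2015-12-10 17:40:00")
```

Derived columns like these make it easy to count calls by reason, hour, or day of the week.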
Complete data analysis of a dataset containing more than 24 million rows of speed tests carried out in different states of India with different technologies. The dataset is obtained from the Open Government Data (OGD) Platform India.
You can find this dataset here.
This analysis is my minor project for the course Python for Data Science (UC San Diego, edX). I used an open dataset from Kaggle. This European Soccer Database has more than 25,000 matches and more than 10,000 players for European professional soccer seasons from 2008 to 2016. I used only the players table to get specific insights from the data.
Titanic: Machine Learning from Disaster is a knowledge competition on Kaggle. It is known as the 'Hello World' of Kaggle. This is a binary classification problem where we have to predict whether a passenger will survive or not. Here is my kernel.
The Big Mart Sales Prediction dataset contains data for 1559 products across 10 stores in different cities. I built a predictive model to estimate the sales of each product at a particular store. This problem is a good introduction to feature engineering. Here is the kernel.
House Prices: Advanced Regression Techniques is a knowledge competition on Kaggle. The dataset contains a large number of features. This dataset gave me the opportunity to practice feature transformation and data visualization. Here is my kernel.
A classic handwritten digit recognizer competition on Kaggle. I built a CNN using Keras to identify digits. The model achieved 99.58% accuracy on the public leaderboard, ranking 296 (top 12%). Kernel
In this project I built a classifier that classifies a restaurant review as Good or Bad. The restaurant review dataset contains 1000 reviews with labels 1 (Good) and 0 (Bad). Tools: scikit-learn, NLTK
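A review classifier of this kind can be sketched as a bag-of-words pipeline in scikit-learn. The example below is a minimal sketch, not the project's actual model. The four reviews are made up for illustration, Multinomial Naive Bayes is an assumed choice of classifier, and the NLTK preprocessing (for example stop-word removal or stemming) is left out.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up reviews for illustration; 1 = Good, 0 = Bad, as in the dataset labels.
reviews = [
    "The food was great and the service was friendly",
    "Loved the pasta, will come back",
    "Terrible service and cold food",
    "The worst burger I have ever had",
]
labels = [1, 1, 0, 0]

# CountVectorizer turns each review into word counts;
# MultinomialNB learns which words are more frequent in each class.
model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
model.fit(reviews, labels)

prediction = model.predict(["great pasta and friendly service"])[0]
```

With the real 1000-review dataset, the same pipeline would be fitted on a training split and scored on a held-out split.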
In this project I built a regression model to predict the score of a beer based on different features of the beer. The dataset can be found here.
In this problem I built a model that predicts the extent of damage done to a building after an earthquake. It is a multiclass classification problem. The dataset is available here.
Apparel classification problem on Analytics Vidhya. The model is built using Keras and achieves 92.99% accuracy.