
Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.


kvborkar100/Data_Science_Portfolio


Data Science portfolio by Krushna Borkar

Portfolio of data science projects completed by me for academic, self-learning, and hobby purposes. This portfolio is a compilation of notebooks that I created for data analysis or for exploring machine learning algorithms. Stand-alone projects are listed in their own category.

Stand Alone Projects

A phishing website detector (web app) based on the Random Forest algorithm. The user enters a URL, and a total of 27 features are extracted from it, such as Alexa ranking, URL length, domain, and subdomain. A machine learning model (Random Forest Classifier) then classifies the URL as Phishing, Benign, or Suspicious. I used a dataset from the UCI Machine Learning Repository. It is available here.

Tools: Python, scikit-learn, matplotlib, Django
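As a rough illustration of the feature-extraction step, here is a minimal sketch that derives a handful of lexical features from a URL using only the Python standard library. The function name `extract_url_features` and the specific features chosen are assumptions for illustration; they are not the project's actual 27 features, and network-dependent features such as Alexa ranking are left out.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few lexical URL features (hypothetical subset, not the project's 27)."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    labels = host.split(".") if host else []
    return {
        "url_length": len(url),
        # Naive registered-domain guess: the last two labels.
        # This is wrong for suffixes such as "co.uk"; a real system would
        # consult a public-suffix list.
        "domain": ".".join(labels[-2:]) if len(labels) >= 2 else host,
        "subdomain_count": max(len(labels) - 2, 0),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
    }


features = extract_url_features("https://login.secure.example.com/account?id=1")
print(features)
```

A dictionary like this could be turned into a numeric row and passed to a classifier; how the project encodes its features is not stated here.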

Sentiment analysis of Amazon reviews, using reviews written by users about different products on amazon.com. The dataset consists of 3.6 million training reviews and 400k testing reviews. The model is built using 80k reviews and tested on 20k reviews. Ensemble learning (VotingClassifier) is used to train a model with 5 different classifiers, giving an accuracy of 87%. Dataset
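To show the idea behind a voting ensemble, the sketch below implements plain majority ("hard") voting over the predictions of five classifiers in pure Python. It is only an illustration: the project description does not say whether hard or soft voting was used, and the `hard_vote` helper and the sample predictions are made up for this example.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote."""
    voted = []
    # zip(*predictions) groups the labels that all classifiers gave to one sample.
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical predictions from five classifiers on four reviews
# (1 = positive, 0 = negative). An odd number of voters avoids ties
# in the binary case.
per_model = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
ensemble = hard_vote(per_model)
print(ensemble)
```

scikit-learn's `VotingClassifier` wraps the same idea around fitted estimators and also offers soft voting, which averages predicted class probabilities instead of counting labels.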

Data Analysis and Visualisation

Tools: Pandas, Numpy, Seaborn, Matplotlib, Plotly

Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.

Complete data analysis of a dataset containing more than 24 million rows of speed tests carried out in different states in India with different technologies. The dataset is obtained from the Open Government Data (OGD) Platform India.
You can find this dataset here.

This analysis is my minor project for the course Python for Data Science (UC San Diego, edX). I used an open dataset from Kaggle. This European Soccer Database has more than 25,000 matches and more than 10,000 players for European professional soccer seasons from 2008 to 2016. I used only the players table to get specific insights from the data.

Kaggle Kernels

1. Titanic: Machine Learning from Disaster

Titanic: Machine Learning from Disaster is a knowledge competition on Kaggle. It is known as the 'Hello World' of Kaggle. This is a binary classification problem where we have to predict whether a passenger will survive or not. Here is my kernel

2. Big Mart Sales Predictions

The Big Mart Sales Predictions data set contains data for 1559 products across 10 stores in different cities. I built a predictive model to estimate the sales of each product at a particular store. This problem is a good introduction to feature engineering. Here is the kernel.

3. House Prices: Advanced Regression Techniques

House Prices: Advanced Regression Techniques is a knowledge competition on Kaggle. The dataset contains a large number of features. This dataset gave me an opportunity to practice feature transformation and data visualization. Here is my kernel

4. Digit Recognizer

A classic handwritten digit recognizer competition on Kaggle. I built a CNN using Keras to identify digits. The model achieved 99.58% accuracy on the public leaderboard, ranking 296 (top 12%). Kernel

Machine Learning

In this project I built a classifier that classifies a restaurant review as Good or Bad. The restaurant review dataset contains 1000 reviews with labels 1 (Good) and 0 (Bad). Tools: scikit-learn, NLTK
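Text classifiers of this kind usually start by turning each review into a numeric vector. The sketch below shows a simple bag-of-words encoding in pure Python. It is an assumption about the general approach, not the project's actual pipeline (which lists scikit-learn and NLTK as tools); the helper names and sample reviews are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(reviews):
    """Map every distinct token in the corpus to a column index (alphabetical order)."""
    tokens = sorted({tok for review in reviews for tok in tokenize(review)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Count how often each vocabulary token occurs in the text."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:  # tokens unseen during vocabulary building are ignored
            vector[vocab[tok]] = n
    return vector


# Hypothetical sample reviews, not taken from the dataset.
reviews = ["The food was great", "The service was bad, really bad"]
vocab = build_vocabulary(reviews)
vector = vectorize("bad food, bad service", vocab)
print(vocab)
print(vector)
```

Vectors like these, paired with the 1 (Good) and 0 (Bad) labels, are what a classifier would be trained on; scikit-learn's `CountVectorizer` provides a production-grade version of the same encoding.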

In this project I built a regression model to predict the score of a beer based on different features of the beer. The dataset can be found here.

In this project I built a model that predicts the extent of damage done to a building after an earthquake. It is a multiclass classification problem. The dataset is available here.

Deep Learning

Apparel classification problem on Analytics Vidhya. The model is built using Keras and reaches 92.99% accuracy.
