Data Science portfolio by Krushna Borkar

A portfolio of data science projects I completed for academic, self-learning, and hobby purposes. It is a compilation of notebooks I created for data analysis or for exploring machine learning algorithms. Stand-alone projects are listed in their own category.

Stand Alone Projects

1. Malicious URL Detection using Machine Learning

A phishing website detector (web app) based on the Random Forest algorithm. The user enters a URL, and 27 features are extracted from it, such as Alexa ranking, URL length, domain, and subdomain. A machine learning model (Random Forest Classifier) then classifies the URL as Phishing, Benign, or Suspicious. I used a dataset from the UCI Machine Learning Repository. It is available here.

Tools: Python, scikit-learn, matplotlib, Django
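
As a rough illustration of the feature extraction step, the sketch below derives a handful of lexical features from a URL using only the Python standard library. It is not the project's actual extractor: the function name `extract_url_features`, the chosen features, and the sample URL are assumptions made for this example, and features such as Alexa ranking need an external lookup that is not shown here.

```python
from urllib.parse import urlparse


def extract_url_features(url):
    """Return a few simple lexical features of a URL (illustrative subset only)."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    labels = [part for part in host.split(".") if part]
    return {
        "url_length": len(url),
        # Naive assumption: the registered domain is the last two host labels.
        "domain": ".".join(labels[-2:]),
        "subdomain_count": max(len(labels) - 2, 0),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
        "digit_count": sum(ch.isdigit() for ch in url),
    }


# Hypothetical URL, used only to show the shape of the output.
sample_url = "http://login.secure.example.com/account?id=123"
features = extract_url_features(sample_url)
print(features)
```

A dictionary like this, one per URL, could be turned into a feature vector and passed to a classifier. The "last two labels" rule for the domain is a simplification that breaks on suffixes such as `co.uk`.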

2. Amazon Reviews Sentiment Analysis

Sentiment analysis of Amazon reviews, using reviews written by users about different products on amazon.com. The dataset consists of 3.6 million training reviews and 400k testing reviews. The model is built using 80k reviews and tested on 20k reviews. Ensemble learning (VotingClassifier) is used to train the model with 5 different classifiers, giving an accuracy of 87%. Dataset
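
The idea behind a voting ensemble can be sketched in plain Python. The example assumes hard (majority) voting, which the text above does not specify, and the five prediction lists are made-up values rather than outputs of the project's classifiers. The helper names `hard_vote` and `accuracy` are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-classifier label lists by majority vote, sample by sample."""
    combined = []
    # zip(*predictions) groups the votes that all classifiers cast for one sample.
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of samples where the predicted label equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions from five classifiers on four reviews.
model_outputs = [
    ["pos", "neg", "pos", "neg"],
    ["pos", "pos", "pos", "neg"],
    ["neg", "neg", "pos", "neg"],
    ["pos", "neg", "neg", "neg"],
    ["pos", "neg", "pos", "pos"],
]
true_labels = ["pos", "neg", "pos", "neg"]

ensemble = hard_vote(model_outputs)
print(ensemble, accuracy(ensemble, true_labels))
```

Each individual classifier in this toy data makes at most one mistake, but the mistakes fall on different reviews, so the majority vote recovers every label. That is the usual motivation for combining several different classifiers.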

Data Analysis and Visualisation

Tools: Pandas, Numpy, Seaborn, Matplotlib, Plotly

1. 911 Calls - Exploratory Data Analysis

Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.
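
A minimal sketch of this kind of feature extraction is shown below. It assumes each call record has a title of the form "category: detail" and a timestamp string; the field layout, the sample values, and the function name `derive_call_features` are assumptions for illustration and are not taken from the notebook.

```python
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def derive_call_features(title, timestamp):
    """Split a call title into reason/detail and break a timestamp into parts."""
    # partition keeps everything after the first colon as the detail text.
    reason, _, detail = title.partition(":")
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "reason": reason.strip(),
        "detail": detail.strip(),
        "hour": moment.hour,
        "month": moment.month,
        # weekday() returns 0 for Monday through 6 for Sunday.
        "day_of_week": DAY_NAMES[moment.weekday()],
    }


# Hypothetical record, assumed to resemble a row of the dataset.
call = derive_call_features("EMS: BACK PAINS/INJURY", "2015-12-10 17:40:00")
print(call)
```

Derived columns such as the reason, hour, and day of week make it straightforward to group and plot call volumes.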

2. Internet Speed in India Data Analysis

Complete data analysis of a dataset containing more than 24 million rows of speed tests carried out in different states in India with different technologies. The dataset was obtained from the Open Government Data (OGD) Platform India. You can find this dataset here.

3. Soccer Data Analysis

This analysis is my minor project for the course Python for Data Science (UC San Diego, edX). I used an open dataset from Kaggle. This European Soccer Database has more than 25,000 matches and more than 10,000 players for European professional soccer seasons from 2008 to 2016. I used only the players table to get a specific insight from the data.

Kaggle Kernels

1. Titanic: Machine Learning from Disaster

Titanic: Machine Learning from Disaster is a knowledge competition on Kaggle. It is known as the 'Hello World' of Kaggle. This is a binary classification problem where we have to predict whether a passenger will survive or not. Here is my kernel

2. Big Mart Sales Predictions

The Big Mart Sales Predictions dataset contains data for 1559 products across 10 stores in different cities. I built a predictive model to estimate the sales of each product at a particular store. This problem is a good introduction to feature engineering. Here is the Kernel.

3. House Prices: Advanced Regression Techniques

House Prices: Advanced Regression Techniques is a knowledge competition on Kaggle. The dataset contains a large number of features. This dataset gave me an opportunity to practice feature transformation and data visualization. Here is my kernel

4. Digit Recognizer

A classic handwritten digit recognizer competition on Kaggle. I built a CNN using Keras to identify digits. The model scored 99.58% accuracy on the public leaderboard, ranking 296th (top 12%). Kernel

Deep Learning

1. Apparel Recognition (Analytics Vidhya)

Apparel classification problem on Analytics Vidhya. The model is built using Keras and reaches 92.99% accuracy.

Machine Learning

1. Restaurant Reviews Classification using Natural Language Processing

2. Predicting the Score of a Beer

3. Multiclass classification - to predict damage to building
