
ksulima/Portfolio


Data Science Portfolio

This repository gathers side projects I have worked on or am currently working on. The goal of these projects is to use my data science and machine learning skills to explore interesting problems. For each project I write a short summary. I mainly use Python in Jupyter Notebook. To see the full analysis and code, click on the project's headline.

Enjoy reading, and if you have any questions, you can contact me through my LinkedIn profile.

Projects:

  • Unsupervised Word Segmentation into Subword Units.
  • GloVe embeddings trained on our own corpus
  • Visualization in TensorBoard.

Keywords (keras, subword-nmt, GloVe, tensorboard, python)
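Unsupervised segmentation into subword units is commonly done with byte-pair encoding (BPE), which the subword-nmt library implements. The sketch below is a toy pure-Python version of the BPE merge-learning loop described by Sennrich et al., not subword-nmt's own code: it repeatedly merges the most frequent adjacent symbol pair. The word frequencies are made-up illustration data.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of the pair with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first one on ties)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, segmented = learn_bpe(word_freqs, num_merges=3)
print(merges)
print(segmented)
```

Frequent endings such as "est" become single units after a few merges, while rare words remain split into smaller pieces.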


In this project my goal is to apply neural networks to structured time series data. I use Keras to implement the "Entity Embeddings" method originally described by Cheng Guo and Felix Berkhahn in the paper "Entity Embeddings of Categorical Variables".

Motivation

Nowadays most deep learning research focuses on unstructured data, such as images (computer vision) and text (natural language processing), where neural networks deliver outstanding results compared to other methods. Applying deep learning to structured data is not in the academic spotlight, even though many business problems and decisions involve structured data. My plan is to apply neural networks to a practical, real-world problem.

  • Apply Entity Embeddings to categorical variables
  • Build neural networks on structured data to predict sales

Keywords (Entity Embeddings, Keras, Time Series, Python)
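At its core, an entity embedding is a trainable lookup table: each level of a categorical variable maps to a dense vector, and these vectors are concatenated with the continuous inputs before the dense layers. The NumPy sketch below shows only that forward lookup. The "store" feature, its 4 levels and the 2-dimensional embedding are hypothetical; in Keras the table would be a `keras.layers.Embedding` layer whose weights are learned by backpropagation rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a "store" categorical feature with 4 levels embedded
# into 2 dimensions, plus 3 continuous features per sample.
n_stores, emb_dim = 4, 2
embedding_matrix = rng.normal(size=(n_stores, emb_dim))  # learned in a real model


def embed(category_ids, continuous):
    """Look up one embedding row per sample and append the continuous inputs."""
    store_vectors = embedding_matrix[category_ids]  # shape (batch, emb_dim)
    return np.concatenate([store_vectors, continuous], axis=1)


store_ids = np.array([0, 3, 3, 1])
continuous = rng.normal(size=(4, 3))
features = embed(store_ids, continuous)

# The lookup equals multiplying a one-hot encoding by the embedding matrix,
# which is why embeddings are a compact replacement for one-hot inputs.
one_hot = np.eye(n_stores)[store_ids]
assert np.allclose(one_hot @ embedding_matrix, embedding_matrix[store_ids])
print(features.shape)
```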


  • Image Preprocessing using Keras ImageDataGenerator.
  • Building a reference CNN model from scratch.
  • Using data augmentation to mitigate overfitting.
  • Using publicly available CNN architectures pre-trained on the ImageNet dataset.
  • Keywords (keras, ConvNet, VGG16, image classification, python)
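Keras' `ImageDataGenerator` handles rescaling and random transformations through options such as `rescale`, `horizontal_flip`, `width_shift_range` and `height_shift_range`. The NumPy sketch below illustrates the same idea without Keras, so the mechanics are visible. It is a simplification under stated assumptions: images are (batch, height, width, channels), and shifts use `np.roll`, which wraps pixels around the border instead of filling them as `ImageDataGenerator` does.

```python
import numpy as np


def augment_batch(images, rng, flip_prob=0.5, max_shift=4):
    """Rescale, randomly flip and randomly shift a batch of (N, H, W, C) images."""
    out = np.empty(images.shape, dtype=np.float32)
    for i, img in enumerate(images):
        img = img.astype(np.float32) / 255.0  # rescale pixels to [0, 1]
        if rng.random() < flip_prob:
            img = img[:, ::-1, :]  # horizontal flip (reverse the width axis)
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        img = np.roll(img, shift=(dy, dx), axis=(0, 1))  # wrap-around translation
        out[i] = img
    return out


rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)
augmented = augment_batch(batch, rng)
print(augmented.shape, augmented.dtype)
```

Because a new random variant of every image is produced each time a batch is drawn, the network rarely sees the exact same input twice, which is what makes augmentation useful against overfitting.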


  • Use Deep Feature Synthesis to generate a set of features in an automated way.
  • Parallelize computation with Dask.
  • Hyperparameter tuning of XGBoost.
  • Keywords (Featuretools, Python, Dask, AWS)
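Deep Feature Synthesis builds features by applying aggregation primitives across related tables (for example, summarising each customer's transactions). Featuretools automates this with `ft.dfs`, enumerating and stacking many primitives. The pandas sketch below does one such step by hand, purely to show what kind of columns DFS produces; the two tables and their values are made up, and the column names only imitate Featuretools' naming style.

```python
import pandas as pd

# Hypothetical parent table (customers) and child table (transactions).
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount": [10.0, 30.0, 5.0, 5.0, 20.0, 50.0],
})

# One "depth-1" DFS step: aggregate the child table per parent row.
agg = transactions.groupby("customer_id").agg(
    **{
        "COUNT(transactions)": ("amount", "count"),
        "SUM(transactions.amount)": ("amount", "sum"),
        "MEAN(transactions.amount)": ("amount", "mean"),
        "MAX(transactions.amount)": ("amount", "max"),
    }
)

# Attach the aggregated features to the parent table.
features = customers.merge(agg.reset_index(), on="customer_id", how="left")
print(features)
```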


  • My tutorial explaining the concept of pipelines, with practical examples in Python using the scikit-learn library.
  • Keywords (pipeline, scikit-learn, python, custom transformers)
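A minimal sketch of a scikit-learn pipeline with a custom transformer. The `LogTransformer` class and the toy data are hypothetical examples, not taken from the tutorial. Subclassing `BaseEstimator` provides `get_params`/`set_params`, and `TransformerMixin` provides `fit_transform`, so the custom step plugs into `Pipeline` like any built-in transformer.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class LogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: applies log1p to the selected columns."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; returning self lets the pipeline chain the steps.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float).copy()
        cols = self.columns if self.columns is not None else slice(None)
        X[:, cols] = np.log1p(X[:, cols])
        return X


pipe = Pipeline([
    ("log", LogTransformer(columns=[0])),  # tame the skewed first column
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

X = np.array([[1, 0], [2, 1], [10, 0], [20, 1], [100, 0], [200, 1]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
pipe.fit(X, y)
predictions = pipe.predict(X)
print(predictions)
```

Keeping preprocessing inside the pipeline means cross-validation refits every step on each training fold, which avoids leaking information from the validation data.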


  • Check some general assumptions about the data.
  • Use the Dickey-Fuller test to check stationarity, and forecast the time series using ARIMA.
  • Introduce the Prophet library.
  • Keywords (time series, ARIMA, Prophet, python)
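A minimal NumPy sketch of the plain (non-augmented) Dickey-Fuller statistic: regress the first difference on the lagged level plus a constant, then take the t-statistic of the lag coefficient. Under the unit-root null this statistic does not follow a Student t distribution, so it is compared with Dickey-Fuller critical values; -2.86 is the commonly quoted approximate asymptotic 5% value for the constant-only case. In practice, `adfuller` from `statsmodels.tsa.stattools` adds lagged differences (the "augmented" test) and reports p-values. The two series below are synthetic.

```python
import numpy as np


def dickey_fuller_stat(y):
    """t-statistic of gamma in: dy_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant + lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    dof = len(dy) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])


CRITICAL_5PCT = -2.86  # approximate asymptotic 5% value, constant, no trend

rng = np.random.default_rng(42)
noise = rng.normal(size=500)      # stationary series
random_walk = np.cumsum(noise)    # has a unit root

for name, series in [("white noise", noise), ("random walk", random_walk)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: DF = {stat:.2f} -> {verdict}")
```

A series for which the unit root cannot be rejected is usually differenced before fitting ARIMA, which is the role of the "I" (integrated) term.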


About

A portfolio of my data science projects.
