jawwada/Pyspark-Tensorflow

Pyspark-Tensorflow

This project is a journal of what I have learned about PySpark, Spark MLlib, the deep learning framework TensorFlow, and Keras, an important library that uses the TensorFlow backend.

I mainly show two use cases: binary classification and time series prediction. Both use cases are important because binary classification can easily be extended to multi-class classification, and time series modeling in neural networks can be extended to sequence modeling.
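To make the step from binary to multi-class classification concrete, here is a minimal framework-free sketch (plain Python, not taken from the notebooks). It assumes a hypothetical logit value `z` and shows that a two-class softmax with logits `[0, z]` gives the same positive-class probability as a sigmoid applied to `z`, so adding more logits is the natural extension.

```python
import math

def sigmoid(z):
    """Binary case: probability of the positive class from one logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Multi-class case: one probability per class, summing to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

z = 0.8  # hypothetical logit, chosen only for illustration

p_binary = sigmoid(z)
p_two_class = softmax([0.0, z])          # class 0 has logit 0, class 1 has logit z
p_three_class = softmax([0.0, z, -1.2])  # a third class is just one more logit

print(p_binary, p_two_class[1])
print(p_three_class, sum(p_three_class))
```

The first printed pair should agree up to floating-point error, and the three-class probabilities should sum to 1.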

The algorithms I mainly use are logistic regression, random forests, single-layer neural networks, multilayer perceptrons, and long short-term memory (LSTM) networks, which are a specific type of recurrent neural network.
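Before a recurrent model such as an LSTM can be trained on a time series, the series is commonly reframed as supervised examples with a sliding window. The sketch below is one way to do that in plain Python; the helper name `make_windows`, the window size, and the toy series are assumptions for illustration and may differ from what the notebooks do.

```python
def make_windows(series, window):
    """Return (inputs, targets): each input is `window` consecutive values,
    each target is the single value that follows them."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

series = [10, 20, 30, 40, 50, 60]  # toy data, for illustration only
X, y = make_windows(series, window=3)

for features, target in zip(X, y):
    print(features, "->", target)
```

Keras recurrent layers expect 3-D input of shape (samples, timesteps, features), so windows like these would typically be reshaped to add a trailing feature axis before training.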

The data structures used, for example Spark DataFrames and TensorFlow tensors, are the bread and butter of data scientists in the community.

The main techniques demonstrate how to preprocess data for the models at hand and how to use the machine learning process flows in Spark (pipelines), TensorFlow (sessions), and Keras (sequential models). Furthermore, they show how to set the usual output and loss functions, e.g. sigmoid and softmax, and optimizers, e.g. gradient descent, stochastic gradient descent, Adam, and Adagrad.
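As a rough illustration of how a sigmoid output, a loss function, and a gradient descent optimizer fit together, here is a minimal sketch of one-feature logistic regression in plain Python. It does not use Spark, TensorFlow, or Keras and is not the notebooks' code; the toy data, the learning rate, the epoch count, and the choice of cross-entropy as the loss are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Mean binary cross-entropy of the model sigmoid(w * x + b)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(xs)

def train(xs, ys, lr=0.5, epochs=500):
    """Full-batch gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For sigmoid + cross-entropy the gradient is (prediction - label) * input.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, linearly separable data (assumed for illustration).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

loss_before = log_loss(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
loss_after = log_loss(w, b, xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]

print(loss_before, loss_after, predictions)
```

Stochastic gradient descent would update the weights from one example or a small batch at a time instead of the full data set, and optimizers such as Adam and Adagrad additionally adapt the step size per parameter.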

I also show a couple of graphs to visualize the results and compare the accuracies of the algorithms.

These examples are not meant to show the best machine learning approaches or accuracies; they are just examples of how to use these techniques. The files ending in .ipynb are the IPython notebooks containing the examples.
