This repository has been archived by the owner on Apr 5, 2020. It is now read-only.

machine_learning_project

Algorithms

  • Deep learning
  • Ensemble
  • Neural networks
  • Regression
  • Decision Tree
  • Bayesian
  • Regularization
  • Rule system
  • Dimensionality Reduction
  • Instance-based
  • Clustering

Deep Learning

Neural network architecture
  • DNN
  • CNN
  • RNN
    • LSTM, GRU, Bidirectional LSTM
  • EA

AI

  • AI search algorithms: Dijkstra's algorithm, A* search
  • Game-playing AI and rule-based systems
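The two search algorithms named above are closely related: A* expands nodes in order of the cost paid so far plus a heuristic estimate of the remaining cost, and with a heuristic of zero it reduces to Dijkstra's algorithm. A minimal Python sketch, assuming a graph stored as a hypothetical adjacency dict of `(neighbor, weight)` pairs with non-negative weights; the function name `a_star` and the example graph are illustrative only.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical graph representation: node -> list of (neighbor, non-negative edge weight).
Graph = Dict[str, List[Tuple[str, float]]]


def a_star(graph: Graph, start: str, goal: str,
           h: Callable[[str], float] = lambda n: 0.0) -> Tuple[float, List[str]]:
    """Return (cost, path) from start to goal, or (inf, []) if goal is unreachable.

    h must be admissible (never overestimate the remaining cost) for the result
    to be optimal. With the default h = 0 this is Dijkstra's algorithm.
    """
    frontier = [(h(start), 0.0, start)]  # heap entries: (f = g + h, g, node)
    best_g = {start: 0.0}                # cheapest known cost to reach each node
    parent = {start: None}               # back-pointers for path reconstruction
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g, path[::-1]
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to node was already found
        for nbr, w in graph.get(node, []):
            new_g = g + w
            if new_g < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_g
                parent[nbr] = node
                heapq.heappush(frontier, (new_g + h(nbr), new_g, nbr))
    return float("inf"), []


if __name__ == "__main__":
    graph: Graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)],
                    "C": [("D", 1)], "D": []}
    print(a_star(graph, "A", "D"))  # Dijkstra (h = 0) → (4.0, ['A', 'B', 'C', 'D'])
    h = {"A": 3, "B": 2, "C": 1, "D": 0}.get  # hypothetical admissible estimates of distance to D
    print(a_star(graph, "A", "D", h))  # → (4.0, ['A', 'B', 'C', 'D'])
```

Both calls find the same optimal path; a good heuristic only changes how many nodes A* has to expand before reaching the goal. Stale heap entries are skipped lazily instead of using a decrease-key operation, which `heapq` does not provide.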

Frameworks

  • TensorFlow
  • Keras
  • Theano
  • Neon
  • PyTorch
  • Caffe
  • MXNet
  • Microsoft Cognitive Toolkit
  • Deeplearning4j

Cloud-based platforms for DL

  • AWS, Azure, GCP, NVIDIA GPU Cloud
  • AWS Deep Learning AMIs (EC2): these AMIs come pre-installed with deep learning frameworks, such as TensorFlow, Gluon, and Apache MXNet, that are optimized for the NVIDIA Volta V100 GPUs in Amazon EC2 P3 instances
  • Amazon Machine Learning (AML): model building, batch prediction, real-time prediction
  • Google Cloud ML Engine

Big data ML

  • General big data framework
    • Big data cluster deployment frameworks: Hortonworks Data Platform (HDP), Cloudera CDH, Amazon Elastic MapReduce (EMR), Microsoft HDInsight
    • Data acquisition: publish-subscribe, source-sink, SQL, message queueing, and custom frameworks
    • Data storage: Hadoop Distributed File System (HDFS), NoSQL
    • Data processing and preparation: Hive and Hive Query Language (HQL), Spark SQL, Amazon Redshift, real-time stream processing
    • Machine learning
    • Visualization and analysis
  • Batch big data machine learning
    • H2O: H2O architecture, machine learning in H2O, tools and usage, case study, business problems, machine learning mapping, data collection, data sampling and transformation, experiments, results, and analysis
    • Spark MLlib: Spark architecture, machine learning in MLlib, tools and usage, experiments, results, and analysis
  • Real-time big data machine learning
    • Scalable Advanced Massive Online Analysis (SAMOA): SAMOA architecture, machine learning algorithms, tools and usage, experiments, results, and analysis
  • The future of machine learning

Production pipeline - Big data

  • Cluster deployment frameworks: HDP, Cloudera, Amazon Elastic MapReduce, Microsoft Azure HDInsight

  • Data acquisition:

    • Publish-subscribe frameworks, source-sink frameworks
  • Data storage:

    • HDFS, NoSQL
  • Data processing & preparation:

    • Data cleansing: correcting errors, type matching, normalizing elements, and so on, on the raw data.
    • Data scraping and curating: converting data elements and normalizing the data from one structure to another.
    • Data transformation: many analytical algorithms need features that are aggregates built on raw or historical data; those extra features are transformed and computed in this step.
    • Hive (HQL), Spark SQL, Amazon Redshift (MPP), real-time stream processing
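The cleansing and transformation steps above can be sketched on a few in-memory records. This is a minimal plain-Python illustration, assuming hypothetical `customer`/`amount` fields and sample rows; in the pipeline described here the same logic would typically run as Hive or Spark SQL queries over data in HDFS, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"customer": " Alice ", "amount": "10.5"},
    {"customer": "alice", "amount": "4.5"},
    {"customer": "BOB", "amount": "7"},
    {"customer": "bob", "amount": "n/a"},  # unparseable amount: dropped during cleansing
]


def cleanse(rows: List[dict]) -> List[dict]:
    """Cleansing: type matching (str -> float), normalizing names, dropping bad rows."""
    clean = []
    for row in rows:
        try:
            customer = str(row["customer"]).strip().lower()
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # skip records that cannot be repaired
        clean.append({"customer": customer, "amount": amount})
    return clean


def aggregate_features(rows: List[dict]) -> Dict[str, Dict[str, float]]:
    """Transformation: derive per-customer aggregate features from row-level data."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["customer"]] += row["amount"]
        counts[row["customer"]] += 1
    return {c: {"count": counts[c], "total": totals[c], "mean": totals[c] / counts[c]}
            for c in totals}


if __name__ == "__main__":
    print(aggregate_features(cleanse(RAW)))
    # → {'alice': {'count': 2, 'total': 15.0, 'mean': 7.5}, 'bob': {'count': 1, 'total': 7.0, 'mean': 7.0}}
```

Dropping irreparable rows is one design choice among several; a production pipeline might instead impute missing values or route bad records to a quarantine table for inspection.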

Big data ML platform

Funding by AI category
  • ML apps
  • NLP
  • Computer Vision
  • Smart robot
  • Virtual personal assistant
  • Gesture control
  • Speech recognition
  • Recommendation engine
  • Video content recognition
  • Context-aware computing
  • Speech-to-speech translation

Tuning methods for DL networks
  • Backpropagation
  • Learning rate decay
  • Max pooling
  • Long short-term memory (LSTM)
  • Continuous bag-of-words (CBOW)
  • Transfer learning
  • Skip-gram
  • Batch normalization
  • Dropout
  • Stochastic gradient descent
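Two of the methods above, stochastic gradient descent and learning rate decay, can be illustrated together on a toy linear model rather than a deep network. A minimal sketch, assuming an inverse-time decay schedule (step and exponential decay are common alternatives); the function name `sgd_with_decay`, its default hyperparameters, and the data are hypothetical choices for illustration.

```python
import random
from typing import List, Tuple


def sgd_with_decay(data: List[Tuple[float, float]], lr0: float = 0.1,
                   decay: float = 0.01, epochs: int = 100,
                   seed: int = 0) -> Tuple[float, float]:
    """Fit y ~ w*x + b by per-sample SGD on squared error.

    Uses inverse-time learning-rate decay: lr_t = lr0 / (1 + decay * t),
    where t counts individual updates.
    """
    rng = random.Random(seed)
    samples = list(data)  # copy so shuffling does not mutate the caller's list
    w, b, t = 0.0, 0.0, 0
    for _ in range(epochs):
        rng.shuffle(samples)  # "stochastic": a fresh random sample order each epoch
        for x, y in samples:
            lr = lr0 / (1.0 + decay * t)
            err = (w * x + b) - y  # residual of the current prediction
            # Gradient of 0.5 * err**2 is err * x for w and err for b.
            w -= lr * err * x
            b -= lr * err
            t += 1
    return w, b


if __name__ == "__main__":
    # Toy noise-free data from y = 2x + 1 on x in [-1, 1].
    data = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]
    w, b = sgd_with_decay(data)
    print(f"w = {w:.3f}, b = {b:.3f}")  # should land close to w = 2, b = 1
```

Early updates use a large step to make fast progress; the shrinking step later damps the oscillation that single-sample gradients would otherwise cause on noisy data. In a DL framework the same idea is usually configured through an optimizer plus a learning-rate scheduler rather than written by hand.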

AI engines with DL libraries

  • Theano
  • TensorFlow
  • CNTK
  • Caffe
  • DL4J
  • Torch
  • Spark MLlib: machine learning library on Spark, a fast and general engine for large-scale distributed data processing
  • Apache MXNet: scalable; supports state-of-the-art CNN and LSTM models
  • Keras

Public datasets

  • Image: Open Images V4 (Google), Microsoft, UC Berkeley
  • Video: YouTube
  • Text: SQuAD, Yelp
  • Satellite data: Landsat
  • Audio: Google AudioSet, LibriSpeech