A course repo for projects related to data science. In this course I had the opportunity to learn more about:
- Introduction to big data analytics and machine learning, understanding data types, tabular and unstructured data, Python, and visualization techniques.
- Introduction to popular techniques to handle complex high-dimensional data, including categorical data, time-series data, text data, and image data.
- Machine learning models, including linear models, kernel methods, tree-based models, and bagging and boosting, and how these balance bias and variance and control model complexity.
- Modern computation architectures for big data: an introduction to distributed computing, data parallelism, and model parallelism, designing parallel algorithms for machine learning applications, and a brief tutorial on Apache Spark.
- Using models for decision analysis and support with statistical decision theory, including the minimax or Bayesian decision rule. Model-agnostic interpretation techniques such as SHAP values for better model interpretation.
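
As a small illustration of the decision-rule topic above, here is a minimal sketch contrasting the minimax and Bayesian rules. It is a simplified no-data version in which a loss table stands in for the risk function. The action names, loss values, and prior are hypothetical and are not taken from the course material.

```python
# Hypothetical loss table: each action maps to its loss under
# state 0 and state 1 of nature. Values are made up for illustration.
LOSS = {
    "act_a": [1.0, 6.0],
    "act_b": [3.0, 3.5],
    "act_c": [5.0, 2.0],
}

# Assumed prior probabilities over the two states.
PRIOR = [0.8, 0.2]


def minimax_action(loss):
    """Return the action whose worst-case loss is smallest."""
    return min(loss, key=lambda action: max(loss[action]))


def bayes_action(loss, prior):
    """Return the action with the smallest prior-weighted expected loss."""
    def expected_loss(action):
        return sum(p * l for p, l in zip(prior, loss[action]))

    return min(loss, key=expected_loss)


if __name__ == "__main__":
    print("minimax choice:", minimax_action(LOSS))
    print("Bayes choice:", bayes_action(LOSS, PRIOR))
```

With these made-up numbers the two rules disagree: the minimax rule guards against the worst state, while the Bayesian rule leans on the prior that state 0 is far more likely.
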
The projects we completed in this course, which can be found in this repo, are as follows:
- Airline tweet classification
- Click-through rate prediction
- Small MNIST data classification