
My Data Analytics and Machine Learning Journey in 30 Days


swaathi317/30DaysOfData


30DaysOfData

Data is always amazing. If we process it properly, we can obtain great insights. This repository contains the day-to-day activities and learnings of my Data Analytics and Machine Learning journey. Before recording my experience, here are the skills I already have in Data Analytics and ML:

  • Python
  • Hadoop MapReduce (with both Java and Python)
  • Working with Spark RDDs and PySpark
  • Elastic MapReduce (AWS)

Day 1

Learning the math behind the algorithms is an essential part of learning Machine Learning.

I started reading the book Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. To get a visual representation of the topics, I also watched the Essence of Linear Algebra series by 3Blue1Brown (https://youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab).

As a saying often attributed to Albert Einstein goes, "If you can't explain it simply, you don't understand it well enough", so I have written an article that explains the topics I learned today in a simple manner.

Article Link: https://swaathi317.medium.com/math-for-machine-learning-part-1-582419c00932

Topics covered in the article:

  1. Scalars
  2. Vectors
    • Basis vectors
    • Span of vectors
    • Linear Dependency
  3. Matrix
    • Matrix multiplication
    • Inverse of matrix
    • Matrix Transpose
  4. Inner Products
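The operations listed above can be tried out in a few lines of NumPy. This is only a minimal sketch with made-up 2x2 values (none of the numbers come from the article); it uses a rank check as one common way to test linear dependence.

```python
import numpy as np

# Two vectors in R^2 (illustrative values, not taken from the article).
v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

# Inner (dot) product: 1*3 + 2*(-1) = 1.
inner = float(np.dot(v, w))

# Linear dependence check: stack the vectors as columns and compute the rank.
# Rank 2 means v and w are linearly independent, so they span the whole plane.
rank = int(np.linalg.matrix_rank(np.column_stack([v, w])))

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

product = A @ B            # matrix multiplication
A_T = A.T                  # transpose
A_inv = np.linalg.inv(A)   # inverse exists because det(A) = 5, which is non-zero

# A matrix times its inverse gives the identity (up to rounding error).
identity_check = bool(np.allclose(A @ A_inv, np.eye(2)))
```

Swapping `w` for a multiple of `v` (for example `2 * v`) drops the rank to 1, which is the linearly dependent case.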

I also enrolled in the Machine Learning course taught by Andrew Ng (https://www.coursera.org/learn/machine-learning) and completed the first week's material today. I also converted my learning notes into an article.

Article Link: https://swaathi317.medium.com/linear-regression-with-one-variable-b5f59f92ab22

Topics learned today and covered in the article:

  1. Univariate Linear Regression
  2. Cost/Loss function (with one variable)
  3. Gradient Descent (with one variable)
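The three topics fit together in a short pure-Python sketch: a hypothesis h(x) = theta0 + theta1 * x, a squared-error cost, and batch gradient descent that updates both parameters simultaneously. The data, learning rate `alpha`, and iteration count below are illustrative assumptions, not values from the course or the article.

```python
def cost(xs, ys, theta0, theta1):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit h(x) = theta0 + theta1 * x with batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors for the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed before either change.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Made-up data that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
final_cost = cost(xs, ys, theta0, theta1)
```

On this toy data the parameters should approach theta0 = 1 and theta1 = 2. A learning rate that is too large makes the cost grow instead of shrink, so printing `cost` every few hundred iterations is a simple sanity check.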

Day 2

I enrolled in the Introduction to Big Data course (https://www.coursera.org/learn/big-data-introduction) today to understand why we need big data and how a big data strategy is formed. I completed the course and converted my personal notes from it into a Medium article.

Article Link: https://swaathi317.medium.com/big-data-an-introduction-b7bc048081c9

Topics covered in the article:

  1. Introduction to Big Data
  2. Who needs Big data solutions
  3. Building a Big Data strategy for organizations
  4. Steps in the Data Science process

Day 3

I learned about Multivariate Linear Regression today and completed Week 2 of the Machine Learning course taught by Andrew Ng (https://www.coursera.org/learn/machine-learning). I have also written an article on Medium about Multivariate Linear Regression, based on my understanding.

Article Link: https://swaathi317.medium.com/multivariate-linear-regression-1c06b12cb982

Topics covered in the article:

  1. Multivariate Linear Regression
  2. Gradient Descent for multiple variables
    • How to check if gradient descent is working properly?
    • Feature Scaling
    • Mean Normalization
    • Features and Polynomial Regression
  3. Normal Equation
    • Normal Equation and Non-invertibility
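Two of the ideas above, mean normalization and the normal equation, can be sketched with NumPy as follows. The data is made up for illustration. The sketch uses `np.linalg.pinv` (a pseudo-inverse) as one way to stay usable when X^T X is not invertible; treat it as an assumption about how one might implement this, not as the article's code.

```python
import numpy as np

# Made-up data: two features on very different scales,
# generated from y = 4 + 3*x1 + 0.5*x2.
X_raw = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 200.0],
                  [4.0, 500.0],
                  [5.0, 400.0]])
y = 4.0 + 3.0 * X_raw[:, 0] + 0.5 * X_raw[:, 1]

# Mean normalization with feature scaling: (x - mean) / standard deviation.
# This helps gradient descent converge; the normal equation does not need it.
mu = X_raw.mean(axis=0)
sigma = X_raw.std(axis=0)
X_scaled = (X_raw - mu) / sigma

# Normal equation: theta = (X^T X)^(-1) X^T y, with a leading column of ones
# for the intercept. pinv still returns a solution if X^T X is singular.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
```

Here `theta` should recover the intercept and the two coefficients used to generate `y`, and each column of `X_scaled` has mean 0 and standard deviation 1.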

I also implemented a Univariate Regression Model on the Salary Dataset (https://www.kaggle.com/karthickveerakumar/salary-data-simple-linear-regression) to predict the salary of employees.

Implementation Link (contains Dataset and Jupyter Notebook): https://github.com/swaathi317/30DaysOfData/tree/main/Univariate%20Linear%20Regression

Topics learned and implemented:

  • NumPy
  • Matplotlib
  • Pandas
  • Exploratory Data Analysis
  • Model building (finding the coefficients that minimize the cost, predicting the outcome)
  • Evaluation techniques (Mean Squared Error, Accuracy)
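As a rough illustration of the model-building and evaluation steps, here is a sketch that fits a straight line and computes the Mean Squared Error. The `years` and `salary` arrays are hypothetical stand-ins, not rows from the Kaggle dataset, and `np.polyfit` is used as a convenient least-squares fit rather than the notebook's actual method.

```python
import numpy as np

# Hypothetical data: years of experience vs. salary in thousands.
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
salary = np.array([40.0, 45.0, 52.0, 55.0, 61.0, 66.0])

# Least-squares fit of a degree-1 polynomial: salary ~ intercept + slope * years.
# polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(years, salary, deg=1)

# Predict on the same inputs and measure the average squared error.
predicted = intercept + slope * years
mse = float(np.mean((salary - predicted) ** 2))

# Baseline for comparison: always predicting the mean salary.
baseline_mse = float(np.mean((salary - salary.mean()) ** 2))
```

Comparing `mse` with `baseline_mse` gives a quick sense of whether the fitted line explains the data better than a constant guess. In practice the error would be measured on held-out rows rather than on the training data.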

Day 4

Today I started reading O'Reilly's Learning Spark by Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee (https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf). The book covers Spark concepts in depth and shows how to implement them with PySpark.

Topics learned and implemented on a big data set today:

  • Basic Spark concepts: Transformation, Action, Lazy Evaluation
  • Spark APIs
  • Built-in data sources: Spark SQL tables and views, Spark DataFrames
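The relationship between transformations, actions, and lazy evaluation can be illustrated without a Spark cluster. The sketch below is a plain-Python analogy built on generators, not Spark's API: building the pipeline does no work, and only the final consuming call triggers the reads. The `numbers` source and the `log` list are hypothetical helpers for the demonstration.

```python
log = []  # records when the source is actually read


def numbers():
    """Hypothetical data source that logs every element it produces."""
    for n in range(1, 6):
        log.append(f"read {n}")
        yield n


# "Transformations": describe a filter and a map. Nothing is read yet,
# because generators only run when something consumes them.
pipeline = (n * n for n in numbers() if n % 2 == 1)
reads_before_action = len(log)

# "Action": consuming the pipeline forces the whole chain to execute.
total = sum(pipeline)
reads_after_action = len(log)
```

In Spark the idea is similar: transformations such as `filter` or `select` only record the plan, and an action such as `count` or `collect` triggers the computation.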

I also worked on a Kaggle problem (https://www.kaggle.com/pavellexyr/one-million-reddit-questions) to analyse a dataset of one million Reddit questions.

Here is the Kaggle Link: https://www.kaggle.com/swaathis/factors-that-make-a-question-better

I have also uploaded the notebook to GitHub: https://github.com/swaathi317/30DaysOfData/blob/main/DataAnalysis/What%20makes%20a%20good%20reddit%20question.ipynb

Day 5, Day 6, Day 7, Day 8, Day 9, Day 10

Over these six days, I worked through DataStax's Apache Cassandra Developer path curriculum (https://lnkd.in/gb5NX2h6) and completed the courses below.

  1. DS101: Introduction to Apache Cassandra
  2. DS201: Foundations of Apache Cassandra™ and DataStax Enterprise
  3. DS220: Data Modeling with Apache Cassandra™ and DataStax Enterprise

[Image: Screenshot 2021-11-15 093723]

These courses combine video lessons with hands-on exercises, which helped me implement Cassandra data models for several use cases. It was an exciting journey. In the relational world, we usually focus on normalization. Cassandra instead designs its tables around the queries (the workflow of the application), so a read that supplies the partition key can locate its data through a hash lookup, which is roughly O(1), instead of a scan or a join.
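The query-first idea can be sketched with plain Python dictionaries. This is only an analogy, not Cassandra's driver or CQL: each dictionary stands in for a table whose partition key is the dictionary key, and the same data is deliberately duplicated into one "table" per query. The `videos` records and the table names are hypothetical.

```python
from collections import defaultdict

# Hypothetical source records.
videos = [
    {"video_id": 1, "user": "alice", "title": "Intro to Cassandra", "tag": "cassandra"},
    {"video_id": 2, "user": "alice", "title": "Data Modeling", "tag": "modeling"},
    {"video_id": 3, "user": "bob", "title": "Partition Keys", "tag": "cassandra"},
]

# One denormalized "table" per query, keyed by that query's partition key.
videos_by_user = defaultdict(list)   # answers: which videos did a user upload?
videos_by_tag = defaultdict(list)    # answers: which videos carry a tag?

for video in videos:
    # The title is written twice on purpose: duplication replaces joins.
    videos_by_user[video["user"]].append(video["title"])
    videos_by_tag[video["tag"]].append(video["title"])

# Each read is a single hash lookup on the partition key.
alice_videos = videos_by_user["alice"]
cassandra_videos = videos_by_tag["cassandra"]
```

The trade-off is extra writes and storage in exchange for reads that never need a join, which mirrors the design choice the courses describe.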

I have completed the Apache Cassandra 3 Developer Associate Certification, which is a proctored online exam conducted by DataStax.

[Image: Cassandra developer certification]
