This repository tracks my day-to-day progress for the #60DaysOfUdacity challenge for the Secure and Private AI scholarship by Udacity and Facebook.
All the code posted here is slightly modified so that it doesn't violate Udacity's Honor Code.
No | Project Title | Completed On |
---|---|---|
1 | Implementing Differential Privacy on the MNIST Dataset and performing PATE Analysis on the Model | Before the Challenge |
2 | Published an article on Medium for the above project. | Before the Challenge |
3 | Hackathon Blossom (Flower Classification) | Day 17 |
4 | Hackathon Auto_Matic (Car Classification) | Day 25 |
5 | Hackathon Sentimento (Sentiment Classification) | Day 32 |
6 | Hackathon Sentimento-V2 | Day 42 |
7 | Project Showcase Challenge - Automated Essay Grading | Day 57 |
- Took the #60DaysOfUdacity challenge pledge 🔥
- Completed Lesson 9, Video 3: Encrypted Subtraction and Public Multiplication
- Completed Lesson 9, Video 4: Encrypted Computation in PySyft.
- Finished a project performing Federated Learning on MNIST using PySyft's FederatedDataLoader.
- Learned matplotlib - bar charts, line charts and scatter plots
- Studied more about Encrypted Deep Learning by reading this post - Encrypted Deep Learning Classification with PyTorch & PySyft
- Learned Descriptive Statistics - Dispersion, Correlation
- Completed Lesson 9, Video 6 - Encrypted Deep Learning with PyTorch.
- Learned Probability - Dependence and Independence, Conditional Probability
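A minimal sketch of the conditional probability idea, P(A|B) = P(A and B) / P(B), by counting outcomes (the dice example and events are my own illustration, not from the course material):

```python
from fractions import Fraction

# Enumerate all ordered rolls of two fair dice (equally likely outcomes).
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def prob(event):
    """P(event) by counting over equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

first_is_six = lambda o: o[0] == 6
total_is_ten = lambda o: o[0] + o[1] >= 10

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_b = prob(total_is_ten)
p_a_and_b = prob(lambda o: first_is_six(o) and total_is_ten(o))
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)          # 1/2
print(prob(first_is_six))   # 1/6 — A and B are dependent since P(A|B) != P(A)
```

Because P(A|B) differs from P(A), the two events are dependent, which is exactly the dependence/independence distinction from the reading.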
- Completed Lesson 9, Video 7 - Encrypted Deep Learning with Keras.
- Completed Lesson 9, Videos 8 and 9 - Keystone Project Description and course conclusion.
- Learned Probability - Bayes' Theorem, Random Variables, Continuous Distributions.
- Finally Completed the Course 🦓 🔥
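A quick sketch of the Bayes' Theorem reading from above, P(A|B) = P(B|A)·P(A) / P(B), with P(B) expanded by total probability (the diagnostic-test numbers are made up for the example):

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    # P(B) by the law of total probability, then Bayes' rule.
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false-positive rate.
posterior = bayes(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))
print(posterior)  # 1/6 — even a good test mostly yields false positives here
```

The counterintuitive result (a positive test only gives a 1/6 chance of actually having the condition) is the classic base-rate effect.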
- Read this article Federated Learning for Medical Imaging.
- Learned Statistical Hypothesis and Inference
- Read an article on CNNs - The Most Intuitive and Easiest Guide for Convolutional Neural Network
- Read PyTorch docs for the `torch.nn.Conv2d` and `torch.nn.MaxPool2d` layers.
- Read more about Statistical Hypothesis and Inference
- Read up on Gradient Descent - Stochastic Gradient Descent
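A toy sketch of stochastic gradient descent fitting a line (my own example; the data, learning rate and epoch count are chosen just for the demo):

```python
import random

# Fit y = w*x + b with SGD on squared error, one example at a time.
random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in [i / 10 for i in range(-50, 50)]]

w, b, lr = 0.0, 0.0, 0.01
for epoch in range(200):
    random.shuffle(data)          # "stochastic": random order each epoch
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * 2 * err * x     # d/dw (err^2) = 2*err*x
        b -= lr * 2 * err         # d/db (err^2) = 2*err

print(round(w, 3), round(b, 3))   # close to the true parameters 2 and 1
```

Unlike batch gradient descent, each update uses a single example, so the parameters jitter toward the optimum instead of descending smoothly.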
- Explored https://data.gov.in for public datasets.
- Read about PyTorch data cleaning and data transformation pipelines.
- Learned more about Gradient Descent.
- Learning Backpropagation.
- Missed a Day but now I'm back on track
- Completed Backpropagation. Practiced a lot by differentiating equations by hand.
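The by-hand differentiation practice can be captured in a tiny sketch: a 1-1-1 sigmoid network whose hand-derived chain-rule gradients are checked against finite differences (the network and numbers are my own toy example):

```python
import math

# Forward pass: h = sigmoid(w1*x), y_hat = w2*h, loss = (y_hat - y)^2
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss(w1, w2, x, y):
    return (w2 * sigmoid(w1 * x) - y) ** 2

# Backward pass: chain rule applied by hand.
def grads(w1, w2, x, y):
    h = sigmoid(w1 * x)
    dL_dyhat = 2 * (w2 * h - y)
    dL_dw2 = dL_dyhat * h                      # y_hat = w2*h
    dL_dw1 = dL_dyhat * w2 * h * (1 - h) * x   # sigmoid'(z) = h*(1-h)
    return dL_dw1, dL_dw2

# Sanity check against numerical differentiation.
w1, w2, x, y, eps = 0.5, -0.3, 1.2, 1.0, 1e-6
g1, g2 = grads(w1, w2, x, y)
n1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
print(abs(g1 - n1) < 1e-6)  # True — the analytic gradient matches
```

Checking hand-derived gradients numerically like this is a standard way to catch chain-rule mistakes.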
- Started learning how to scrape the web using BeautifulSoup.
- Started exploring the CO2 emissions from Fossil Fuels dataset. Will understand the data for a few days, then try forecasting; may even use encrypted deep learning to make things interesting.
- Started learning Dimensionality Reduction.
- Played with the CO2 emissions from Fossil Fuels dataset and gained a few insights.
- Gained intuition about the dataset by plotting various graphs.
- Started the day off with this week's #sg_applied_dl group project: Dog Breed Identification competition on Kaggle. My Kaggle kernel: https://www.kaggle.com/ronitmankad/let-s-classify-dog-breeds/notebook
- Did some research on which ML models I can apply to the CO2 emissions dataset.
- Learned a lot about Seaborn and Plotly by doing these projects.
- Spent the day solving the #sg_applied_dl weekly hackathon project - Dog Breed Identification competition on Kaggle. My Kaggle kernel: https://www.kaggle.com/ronitmankad/let-s-classify-dog-breeds
- Created a custom dataset by subclassing the `torch.utils.data.Dataset` class and transformed the dataframe so that it could be fed into the classifier.
- Started the #sg_hackathon-orgnizrs weekly hackathon: Hackathon Blossom (Flower Classification).
- Developed a processing pipeline for the dataset as well as explored its features through visualization.
- Completed the #sg_hackathon-orgnizrs weekly hackathon: Hackathon Blossom (Flower Classification).
- Fine tuned the model and cleaned up the code so that it can be submitted for evaluation.
- Learned quite a lot about hyperparameter optimization and transfer learning while doing this hackathon.
- I would like to encourage all the participants: @Helena Barmer @Abhishek Lalwani @sourav kumar @Jess @Shahana @Vikas Sharma @Shanmugapriya @Ruchika Khemka @Naas Mohamed @Deepak @Shubhangi Jena @par @Droid @KT @Francesca @Jaffar @Aniket Thomas @Vebby @Archit @Halwai Aftab Hasan @Shivam Raisharma @Hitoishi Das @cibaca @Shashank Jain @Nirupama Singh @Perez Ogayo @shivu @Anita Goldpergel @Ivy
- Finally submitted my kernel for Hackathon Blossom. Thank you to all the organizers in #sg_hackathon-orgnizrs. My kernel link: https://www.kaggle.com/ronitmankad/flower-classification-using-pytorch
- Started reading Hands-On Machine Learning with Scikit-Learn and TensorFlow.
- Started Chapter 3: Classification of the Hands on ML book.
- Learned about precision and recall and the ROC curve.
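A small from-scratch sketch of the precision and recall definitions (the toy labels are my own example):

```python
# Precision = TP/(TP+FP): of everything flagged positive, how much was right.
# Recall    = TP/(TP+FN): of everything actually positive, how much was found.
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # 0.75 0.75
```

The ROC curve generalizes this by sweeping the classification threshold and plotting true-positive rate against false-positive rate at each setting.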
- Continued reading Chapter 3: Classification of the Hands-On ML book.
- Learned about Error Analysis in classification.
- Read about Multilabel and Multioutput classification.
- Read about Sentiment Analysis in Coded Mixed Language (specifically Hindi-English) from this paper.
- Brushed up on text processing and NLP basics for an interview.
- Learned more about the mathematical notations in machine learning.
- Read up on Random Variables and Unbiased Estimators.
- Researched about sentence similarity prediction.
- Started the #sg_hackathon-orgnizrs weekly Hackathon - Auto_Matic!
- Explored the dataset and created a data processing pipeline.
- Will now create different models and try to increase the accuracy.
- Testing different models on the dataset for the hackathon.
- Trying to improve the validation and testing accuracy.
- Will try more image augmentation techniques.
- Completed #sg_hackathon-orgnizrs Hackathon Auto_Matic (even though my model's accuracy was very low).
- Kaggle Kernel Link: https://www.kaggle.com/ronitmankad/ronit-mankad-auto-matic
- Continued reading the 100 page ML book, read about Decision Tree classifiers.
- Continued reading the 100 Page ML book, learned more about Decision Trees and SVMs.
- Worked on the hackathon Auto_matic dataset and tried to learn from my mistakes.
- Completed the Decision Trees chapter and implemented Linear Regression, SVM and a Decision Tree in Python from scratch.
- Tried to implement a SVM with different kernels from scratch but ran into some problems, will continue tomorrow.
- Started learning K Nearest Neighbors.
- Started learning the different implementations of sentiment analysis for the next #sg_hackathon-orgnizrs Hackathon.
- Read two papers on Sentiment Analysis on Code Mixed languages (for an interview I have).
- Explored and processed a text dataset containing Hindi-English mixed tweets.
- Developed an embedding matrix for the data using GloVe embeddings.
- Worked on Sentiment Analysis in Code Mixed language.
- Developed a Sub-Word RNN.
- Also started Hackathon Sentimento! All the best to all the participants.
- Worked on the Sentiment Classification Hackathon Sentimento.
- Tried different pre-processing techniques and models on the dataset.
- Completed Hackathon Sentimento!
- Learned a lot about TFIDF and text preprocessing.
- Scored 0.89 on the leaderboard.
- Read more about K Nearest Neighbours from the 100 Page ML book.
- Also started reading about feature engineering.
- Completed the KNN chapter from the 100 Page ML book.
- Implemented a Decision Tree and visualized its nodes by following this tutorial - https://mlcourse.ai/articles/topic3-dt-knn/
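The KNN idea from the book fits in a few lines: predict the majority label among the k closest training points (toy 2-D points of my own; `math.dist` requires Python 3.8+):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # Sort training points by Euclidean distance to the query, vote among top k.
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]
print(knn_predict(train, (2, 2)))  # a
print(knn_predict(train, (8, 7)))  # b
```

There is no training step at all; all the work happens at query time, which is why KNN is called a lazy learner.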
- Read this Medium Article - https://towardsdatascience.com/how-to-farm-kaggle-in-the-right-way-b27f781b78da
- Read and implemented this excellent Jupyter Notebook - https://mlcourse.ai/articles/topic2-visual-data-analysis-in-python/
- Completed implementing the Decision Tree code.
- Started the IEEE-CIS Fraud Detection competition on Kaggle - https://www.kaggle.com/c/ieee-fraud-detection
- Learned about Ordinary Least Squares, Maximum Likelihood Estimation and Bias-Variance Decomposition from this excellent tutorial - https://mlcourse.ai/articles/topic4-part1-linreg/
- Won the gold badge for Hackathon Sentimento! Thanks to all the good people at #sg_hackathon-orgnizrs.
- Created a new kernel for my winning solution for Hackathon Sentimento - https://www.kaggle.com/ronitmankad/sentiment-analysis-eda-and-model-creation
- Started exploring the dataset for the IEEE Fraud Detection competition on Kaggle.
- Started the #sg_hackathon-orgnizrs weekly hackathon - Hackathon Sentimento-2.
- Created a kaggle kernel and started working on the dataset.
- Learned more about RNNs, LSTMs and GRUs for Sentiment Classification.
- Also learned more about Google's BERT by watching this video https://www.youtube.com/watch?v=BaPM47hO8p8
- Learned the basics of torchtext and started making a torch dataset for the hackathon data.
- Learned about Attention layers.
- Fine-Tuned a BERT model on Colab (P.S: Even on Colab each epoch takes hours).
- Worked on my kernel for Hackathon Sentimento.
- Worked some more on Hackathon Sentimento-V2.
- Implemented a GRU with Adaptive pooling into my kernel.
- Also implemented the BERT model on the dataset, but it didn't perform well.
- Read about encoders and transformers in Neural Networks.
- Learned about Sklearn pipelines.
- Started reading about XGBoost.
- Worked some more on the Hackathon Sentimento kernel.
- Submitted my kernel for hackathon Sentimento-V2 - https://www.kaggle.com/ronitmankad/sentiment-analysis-bert-pytorch
- Learned about feature engineering and data imputation techniques from the 100 Page ML book.
- Learned about LSTMs and GRUs in detail from this video - https://www.youtube.com/watch?v=8HyCNIVRbSU
- Learned about the vanishing and exploding gradient problem by watching this video - https://www.youtube.com/watch?v=qO_NLVjD6zE
- Read on why Character Level RNNs/LSTMS can be better for text classification/generation in some cases from this blog post - https://towardsdatascience.com/besides-word-embedding-why-you-need-to-know-character-embedding-6096a34a3b10
- Got 3rd place in Hackathon Sentimento-V2! Thank you to all the wonderful people at #sg_hackathon-orgnizrs.
- Started working on Hackathon Forcast!
- Started learning about time forecasting and Seq2Seq models.
- Learned about the data and forecasting approach by reading this article - https://towardsdatascience.com/web-traffic-forecasting-f6152ca240cb
- Finally submitted the initial version of my kernel for Hackathon Forcast.
- Learned a lot about time series forecasting.
- Learned about Stationary and Non-Stationary series by reading this article - https://towardsdatascience.com/basic-principles-to-create-a-time-series-forecast-6ae002d177a4.
- Learned more about Seq2Seq model and how to preprocess the data for them.
- Read more about time series forecasting and different transformations required on the data.
- Transformed the data from a non-stationary series to a stationary series.
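The transform I applied is first-order differencing, which removes trend so the series behaves more stationarily; a minimal sketch (the quadratic toy series is my own illustration, not the hackathon data):

```python
# First-order differencing: replace each value with its change from the previous one.
def difference(series, lag=1):
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Invert the transform by cumulatively re-adding the differences.
def undifference(first_values, diffs, lag=1):
    out = list(first_values)
    for d in diffs:
        out.append(out[-lag] + d)
    return out

trend = [t * t for t in range(8)]        # 0, 1, 4, 9, ... clearly non-stationary
once = difference(trend)                 # 1, 3, 5, 7, ... still trending
twice = difference(once)                 # 2, 2, 2, ... constant: trend removed
print(twice)
print(undifference(trend[:1], once) == trend)  # True — the transform is invertible
```

Invertibility matters because the model forecasts differences, which then have to be undifferenced back into the original scale.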
- Tried to recreate a Keras Seq2Seq model in Pytorch.
- Viewed more solutions on Kaggle and made notes on the different approaches used by the winners.
- Performed some data analysis on the Hackathon Forcast data and gained key insights from it.
- Fine-tuned my LSTM model.
- Increased the accuracy by taking Median of Medians.
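The median-of-medians idea can be sketched as below (the window sizes and toy traffic series are my own illustration, not my actual hackathon code):

```python
import statistics

# Robust point forecast: median of per-window medians over several look-back
# windows, so a single spiky window can't dominate the estimate.
def median_of_medians_forecast(series, windows=(7, 14, 28)):
    window_medians = [statistics.median(series[-w:]) for w in windows]
    return statistics.median(window_medians)

# Weekly-ish toy data with occasional large spikes.
traffic = [10, 11, 9, 10, 200, 10, 11] * 4 + [12, 10, 11]
print(median_of_medians_forecast(traffic))  # 10.5 — the 200-spikes barely register
```

Medians ignore outliers that would drag a mean far upward, which is why this simple statistic is surprisingly competitive on spiky traffic-style series.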
- Got my first internship as a Machine Learning Engineer!
- Read an article on how to implement Sentiment Analysis in production - https://mc.ai/deep-learning-in-production-sentiment-analysis-with-the-transformer-model/
- Read Google's guide to text classification - https://developers.google.com/machine-learning/guides/text-classification/
- Researched about Transfer Deep Learning in NLP tasks.
- Working on a sentiment analysis task: imported the data from 4000 CSV files and merged it into a single dataframe.
- Cleaned the data by removing punctuation and stopwords.
- Read a paper on Semi-Supervised Learning - https://pdfs.semanticscholar.org/4224/2edf4204c8a739f8f016405d0804e6c8f409.pdf
- Implemented the Semi-Supervised Learning paper's methodology on my sentiment dataset.
- Created and finetuned a Sub-Word RNN as described in the paper here - https://arxiv.org/pdf/1611.00472.pdf
- Learned about Latent Dirichlet Allocation (LDA Analysis) for NLP - https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
- Trained a KNN Classifier and augmented its predictions with my LSTM predictions to then perform semi-supervised learning on my data.
- Went through all the tutorials for Syft and revised the core concepts - https://github.com/OpenMined/PySyft/tree/master/examples/tutorials
- Started working on my project for the Project Showcase challenge.
- Created PySyft's private dataloaders for the data.
- Read more about encrypted deep learning and brushed up on the basics of fixed precision encoding.
- Completing my project for the Project Showcase Challenge.
- Made a README file explaining the project.
- Followed the PySyft tutorial on performing encrypted training on MNIST - https://github.com/OpenMined/PySyft/blob/dev/examples/tutorials/Part%2012%20bis%20-%20Encrypted%20Training%20on%20MNIST.ipynb
- Finally submitted my project for the Project Showcase Challenge!! Here's the project repo - https://github.com/aksht94/UdacityOpenSource/tree/master/Ronit
- Resumed my Sentiment Analysis in Code-Mixed language project.
- Achieved ~94% accuracy on the validation set using a Sub-Word level RNN.
- Read up on Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features https://aclweb.org/anthology/N18-1049
- Resumed reading the 100 Page ML book.
- Read about Regularization and Model Performance Evaluation in detail.
- Read about Sentiment Analysis on Conversational Texts - https://www.aclweb.org/anthology/W15-1829
- Read from the 100 Page ML book - Multi Layer Perceptron and Neural Network Architecture.
- Came across this amazing blogpost - https://rsilveira79.github.io/fermenting_gradients/machine_learning/nlp/pytorch/text_classification_roberta/
- Started the code for implementing text classification using RoBERTa on my code-mixed language dataset.
- Learned Kernel Regression and Gradient Boosting from the 100 Page ML book.
- Read this amazing paper - https://eprint.iacr.org/2018/1056.pdf
- Still finetuning the BERT model and trying different preprocessing techniques on my Code-Mixed language dataset.
- Started Week 1 of Andrew Ng's Machine Learning course.
- Forked PySyft and started reading its source code.
- Tried solving beginner level bugs in PySyft, trying to contribute to open source.
- Read an article on Clustering - https://www.analyticsindiamag.com/most-popular-clustering-algorithms-used-in-machine-learning/
- Completed Week 1 of Andrew Ng's Machine Learning course. Started Week 2.
- Learning Regularized Linear Regression
- Still learning more about PySyft by reading through its codebase.
- Worked on the `syft.messaging` package and tried solving issue #2512 raised by Andrew Trask.
The Final Day!
- Completed Week 2 of Andrew Ng's Machine Learning course. Started Week 3.
- Learning Hypothesis Representation and Advanced Optimization techniques.
- Worked on the PySyft source code and removed the `sklearn` dependency.