raj26000/README.md

Hi, I'm Rajdeep Agrawal 👋

A brief introduction: I'm Rajdeep Agrawal, a '22 B.Tech graduate in Engineering Physics from the Indian Institute of Technology, Hyderabad. I currently work as a Data Scientist at Neuron7.ai, focusing primarily on NLP-based applications. I'm passionate about building products that leverage data (structured or unstructured) to automate processes, eliminate redundancies and enable companies to serve their customers and clients better.

Technical Skills 🚀

  • Proficient in Python, SQL querying and core Data Science fundamentals.
  • Skilled in Supervised Machine Learning, Exploratory Data Analysis, Ensemble Methods, Deep Learning and NLP tasks like Text Classification, Question Answering, Semantic Textual Similarity and Natural Language Inference, along with their implementation using SOTA Transformer models in the PyTorch framework.
  • Well-versed in common Python libraries like NumPy, Pandas, Matplotlib, Seaborn and Scikit-Learn for ML, and HuggingFace, SpaCy, NLTK and Gensim for NLP.
  • Possess working knowledge of Conda environments and have experience deploying applications using Streamlit Cloud.
  • Have worked extensively with the Django framework, and used it exclusively during my internship at NTT-AT in Summer '21.

C C++ Python HTML5 scikit learn Scipy Keras Numpy Pandas Matplotlib PyTorch Django MS Office PyCharm Jupyter Git LaTeX

Work Samples 🛠️

I actively participate in Kaggle competitions and open-source my notebooks for the community (I'm a Kaggle Notebooks Expert too). This is always a tremendous learning experience, both in the techniques involved and in documenting and communicating code effectively, a skill valued immensely at any organisation. Some of my work is linked below; do upvote if you find it useful!

| Project Name | Description |
| --- | --- |
| Rider-Driven Cancellation Prediction | Machine Learning model to predict Rider-driven delivery cancellations in advance (i.e. before getting delivered or marked as cancelled), given the details about riders and orders in a structured tabular format. Used the XGBoost algorithm to create the model, along with Optuna for Hyperparameter tuning. Employed Stratified K-Fold Cross Validation to assess the robustness of the model on validation data, as well as to make Out-Of-Fold (OOF) predictions on test data. |
| Rider-Driven Cancellation EDA, Report | Comprehensive Exploratory Data Analysis on the above dataset, performed using Matplotlib and Seaborn. Also created a report explaining the insights derived from the data in detail. |
| Song Popularity Classification | Binary Classification task to predict whether a song is popular based on features like song duration, acousticness, danceability, energy, instrumentalness, liveness, loudness, etc. EDA performed in the same notebook. |
| Classifying Effectiveness of Argumentative elements in Student essays - TRAINING | Training notebook for classifying argumentative discourse elements like Lead, Position, Claim, Counterclaim, Evidence, Rebuttal and Conclusion found in student essays into 3 classes: Effective, Adequate and Ineffective. Finetuned the DeBERTa Transformer model in PyTorch with a Siamese architecture, which gave the best results on the Multiclass Logloss metric compared to standard architectures or other models like RoBERTa. Employed Gradient Accumulation, Mixed Precision Training and other PyTorch optimizations to deal with GPU memory constraints, and Early Stopping to prevent overfitting. |
| Classifying Effectiveness of Argumentative elements in Student essays - INFERENCE | Inference notebook for the above task. Uses the saved checkpoint of the best model from training to make predictions on test data. |
| Abstractive Text Summarization | Text Summarization task to generate summaries from news articles. Used an encoder-decoder architecture of Bidirectional LSTMs with a deep 3-layer stacked encoder in the Keras framework. Not the best model in terms of performance, but it greatly helped in understanding LSTM architecture and use. |
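For readers unfamiliar with the Stratified K-Fold setup mentioned for the Rider-Driven Cancellation project, here is a rough, standard-library-only sketch of the idea. It is not the notebook's code: the function names (`stratified_kfold`, `fit_centroids`, `predict_proba`, `cross_validate`) are hypothetical, and a toy one-feature centroid model stands in for XGBoost tuned with Optuna. It assumes binary labels 0/1 and that both classes appear in every training fold. Each training row gets a prediction from the one model that never saw it, and test predictions are averaged over the fold models.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's rows round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    for k in range(n_splits):
        valid = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, valid


def fit_centroids(X, y, idx):
    """Toy stand-in for a real model: the per-class mean of a single feature."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in idx:
        sums[y[i]] += X[i]
        counts[y[i]] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_proba(model, x):
    """Score for class 1: the closer x is to the class-1 centroid, the higher."""
    d0, d1 = abs(x - model[0]), abs(x - model[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def cross_validate(X, y, X_test, n_splits=5):
    """Return out-of-fold predictions for X and fold-averaged ones for X_test."""
    oof = [None] * len(X)
    test_pred = [0.0] * len(X_test)
    for train_idx, valid_idx in stratified_kfold(y, n_splits):
        model = fit_centroids(X, y, train_idx)
        for i in valid_idx:
            # Each training row is scored only by the model that did not see it.
            oof[i] = predict_proba(model, X[i])
        for j, x in enumerate(X_test):
            test_pred[j] += predict_proba(model, x) / n_splits
    return oof, test_pred


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, imbalanced data: 40 negatives around 0, 10 positives around 3.
    y = [0] * 40 + [1] * 10
    X = [rng.gauss(0, 1) for _ in range(40)] + [rng.gauss(3, 1) for _ in range(10)]
    oof, test_pred = cross_validate(X, y, X_test=[0.0, 3.0])
    print("OOF predictions filled:", sum(p is not None for p in oof), "of", len(oof))
    print("Fold-averaged test predictions:", test_pred)
```

In practice, scikit-learn's `StratifiedKFold` would replace the hand-rolled splitter; the sketch only makes the bookkeeping explicit.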

Tech Profiles 👨‍💻

GitHub Leetcode HackerRank Kaggle

Other Achievements 🏆

  • Two-time winner of the Academic Excellence Award (2019, 2021) for securing the highest GPA in the branch in an academic year.
  • Led the IITH team for the Bridgei2i Problem Statement at the 9th Inter-IIT Tech Meet, winning a Bronze medal 🥉.
  • Ranked in the top 12% in the ML Hackathon conducted by the Consulting and Analytics Club of IITG, as part of the 2022 Cascade Cup.
  • Kaggle Notebooks Expert.
  • Published a paper in the Journal of Cosmology and Astroparticle Physics, a peer-reviewed journal, as part of my B.Tech Project.

Let's Connect! 🔗

Pinned

  1. Essay-Argument-Effectiveness (Public)

    Source Code and Streamlit deployment for NLP Project - Classification of Argumentative Elements in Student essays based on their effectiveness - Effective, Adequate, Ineffective

    Python

  2. NLG-Arxiv-AbstractGenerator (Public)

    Generating abstracts of cs.CL category arXiv papers from their titles by finetuning a GPT-2 model in PyTorch.

    Python