ssjswaraj/README.md

Hi, I'm Swaraj Gupta 👋

Welcome to my GitHub profile! I'm passionate about data science, machine learning, data engineering, and open-source projects. I specialize in developing data-driven solutions and building automated systems using cutting-edge technologies.

🚀 Experience

Associate Engineer at Smart Analytica

Client: Suryoday Bank
Technologies: PySpark, Machine Learning, Shell Script, Sqoop, Data Warehousing, Oozie, SQL

  • Designed and implemented scalable ETL pipelines using Sqoop, PySpark, Hive, and Shell, ensuring reliable and efficient data flow from source systems to the data warehouse.
  • Optimized Sqoop ingestion performance by selecting the ideal split-by column and tuning the number of mappers, reducing ingestion time by 25%.
  • Developed a pre-approved loan eligibility machine learning model using business-specific datamarts, improving targeting accuracy by 15% and reducing credit risk through better customer profiling.
  • Created and maintained 5+ business-critical datamarts, enabling seamless report generation and reducing data retrieval time by 30%.
  • Automated and orchestrated ETL workflows using Apache Oozie, cutting manual effort and improving job reliability and success rate.
  • Performed exploratory data analysis (EDA) and feature selection on customer transaction data using Python (Pandas, Seaborn, Scikit-learn) to identify key behavioral patterns, laying groundwork for future classification models.
  • Automated schema conversion from Greenplum to Hive using custom Linux shell scripting, accelerating migration efforts and reducing manual work.
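
The Sqoop tuning mentioned above (choosing a split-by column and a mapper count) can be illustrated with a minimal sketch that assembles a `sqoop import` command. The JDBC URL, table, column, and target directory are hypothetical placeholders, and `build_sqoop_import` is an illustrative helper, not code from the actual pipeline. Only the standard Sqoop flags `--connect`, `--table`, `--split-by`, `--num-mappers`, and `--target-dir` are used.

```python
import shlex


def build_sqoop_import(jdbc_url, table, split_by, num_mappers, target_dir):
    """Return the argument list for a `sqoop import` run (hypothetical helper)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        # Column whose value range Sqoop divides among the mappers
        "--split-by", split_by,
        # Number of parallel map tasks
        "--num-mappers", str(num_mappers),
        "--target-dir", target_dir,
    ]


# All values below are placeholders chosen for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:postgresql://db.example.com:5432/sales",
    table="transactions",
    split_by="txn_id",
    num_mappers=8,
    target_dir="/data/raw/transactions",
)
print(shlex.join(cmd))
```

An evenly distributed numeric column is usually the better split-by candidate, since a skewed column tends to leave some mappers with most of the rows.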

🔧 Technologies & Tools:

  • Languages: Python, SQL, Shell Script

  • Machine Learning & Deep Learning: Scikit-learn, TensorFlow, Keras

  • Data Engineering: PySpark, Hive, Hadoop, Impala, Oozie, Kafka

  • Data Visualization: Tableau, Matplotlib, Seaborn

🚀 Projects:

  1. Implementation and Optimization of the Llama 2 Chat Model with Quantized LoRA

    Tools & Libraries: Python, Transformers, PEFT, Bitsandbytes, Accelerate, TRL, Hugging Face Datasets

    • Implemented and fine-tuned the Llama 2 Chat Model using Quantized LoRA (Low-Rank Adaptation) to reduce memory and computational requirements for resource-constrained environments.
    • Loaded and tokenized conversational datasets using Hugging Face libraries, and applied model quantization along with LoRA-based fine-tuning to significantly reduce trainable parameters and memory footprint.
    • Trained the model using the Trainer class with optimized hyperparameters, achieving low-latency response generation and 30–50% reduction in resource usage without sacrificing output quality, making the model ideal for edge deployments.
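
To give a rough sense of why LoRA reduces trainable parameters and why 4-bit quantization reduces weight memory, here is a small arithmetic sketch. The layer size (4096 x 4096) and rank (8) are assumed values for illustration, not the settings used in this project, and the memory estimate ignores the small overhead of quantization constants.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters in the original dense weight matrix."""
    return d_out * d_in


def weight_bytes(n_params, bits):
    """Approximate storage for n_params weights at the given bit width."""
    return n_params * bits // 8


d_out = d_in = 4096  # assumed projection size, for illustration
rank = 8             # assumed LoRA rank, for illustration

lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)

print(lora, full, lora / full)   # 65536 16777216 0.00390625
print(weight_bytes(full, 16))    # 33554432 bytes for 16-bit weights
print(weight_bytes(full, 4))     # 8388608 bytes for 4-bit weights
```

Under these assumptions, the adapter trains about 0.4% of the parameters of the layer it wraps, and the frozen base weights take a quarter of their 16-bit size.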
  2. Travel Insurance Prediction

    Tools: Machine Learning, Python, Scikit-learn, Streamlit, API Integration

    • Built and optimized a Random Forest model using real-time API data and preprocessing techniques, achieving 84% accuracy in predicting customer travel insurance purchases.
    • Performed hyperparameter tuning with GridSearchCV to enhance model performance and compared multiple algorithms for best results.
    • Developed and deployed an interactive Streamlit web app on the community cloud, enabling users to get instant purchase likelihood predictions.
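
The Random Forest tuning step described for the Travel Insurance project can be sketched as follows. This is a minimal example on synthetic data from `make_classification`, not the project's dataset. The parameter grid, fold count, and split ratio are assumptions made for illustration, so the accuracy it prints is unrelated to the 84% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real customer data (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Small illustrative grid; the project's actual grid is not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training split,
# so predict() uses the tuned model.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_accuracy, 3))
```

Scoring on a held-out test split, rather than reusing the cross-validation score, keeps the reported accuracy separate from the data used to pick hyperparameters.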

🌱 Currently Learning:

  • Deep Dive into Advanced Machine Learning & Deep Learning
    Exploring complex models and techniques to push the boundaries of AI applications.
  • Mastering Transformer Architecture & Natural Language Processing (NLP)
    Focusing on state-of-the-art techniques for handling and understanding language in machines.

📫 How to Reach Me:

💬 Let's Connect:

I'm always open to collaboration, new ideas, and discussions. Feel free to reach out to me on any of the platforms above.

📈 GitHub Stats:

Your Stats


"Data is the new oil, but we must refine it." - Clive Humby
