I am passionate about Data Science, Data Engineering, and Machine Learning. With the proliferation of data in today's digital age, data science has become a crucial tool for businesses, governments, and other organizations to understand and analyze complex data sets in order to make better decisions, improve operations, and drive growth. I have gained in-depth knowledge in these fields by taking online courses and reading relevant books and articles about Big Data Analytics, Machine Learning, Statistics, Data Engineering, and Data Science.
I am currently pursuing a Master's in Data Analytics at San Jose State University's Department of Applied Data Science. With 4.5 years of experience in data modeling and analysis, I bring a unique perspective to the field. I have a comprehensive understanding of Statistics, Python, SQL, Big Data, and Machine Learning, and have worked with cutting-edge technologies like GANs, Large Language Models (LLMs), and Transformers. Additionally, I have completed courses in Machine Learning Fundamentals, Natural Language Processing (NLP), Deep Learning, Big Data, and Statistics from DataCamp, GeeksforGeeks, Coursera, and Udemy, which have provided me with a strong theoretical foundation in these fields. To gain hands-on experience, I have worked on projects spanning the retail, banking, movie, and automobile industries. My portfolio on GitHub showcases a selection of my projects and experiences, demonstrating my capabilities as a data science and machine learning professional.
Currently, I am working on a cutting-edge project developing a multimedia chatbot AI testing tool that incorporates natural language processing (NLP) and conversational AI. To enhance the tool's functionality, I am also using automatic speech recognition and text-to-speech technology, which enables the chatbot to have more human-like interactions with users. Furthermore, I am actively involved in deep learning projects, where I am creating animated versions of real images using StyleGAN, JoJoGAN, and Cartoon GAN. These deep learning models allow me to generate novel images that are similar in style to the input images.
I am always eager to learn and expand my knowledge in these fields, and I have a passion for using data and technology to make a positive impact. Whether it's through building predictive models, computer vision, recommendation systems, A/B testing, text mining, or developing data pipelines, I am dedicated to using my skills to drive innovation and solve complex problems. I am excited to share my work with others and contribute to the Data Science and Machine Learning community. Thank you for considering my portfolio.
I started learning Python in my first semester, which helped me understand data science and machine learning. I also learned R, which is one of the most important languages in statistics and machine learning. Below is a list of skills that I have gained through my experience in the field of data science and machine learning. Learning these essential skills and techniques helped me build and deploy projects successfully.
Thank you for visiting my portfolio! I have had an awesome experience working on machine learning and deep learning projects and am excited to share them with you. Please find below links to my projects. For more detailed information about each project, including results and descriptions, please click on the project link. Below is a summary of my entire portfolio. Thank you for considering my work.
| Project | Project |
|---|---|
| Bank Customers Churn Prediction Model | Gym Members Attendance Prediction Model |
| RFM Analysis and Customer Segmentation | Time Series Modelling using ARIMA & SARIMAX Models |
| Movies Recommendation System | Collaborative Filtering for Movies using Matrix Factorisation |
| House Price Prediction Model | Popular Recipe Prediction Model |
| Speech Emotion Recognition | |
| Animated Face Generation using GAN | |
| Fashion Clothes Classification Using Convolutional Neural Network | Sign Language Identification |
| Credit Card Customer Segmentation | |
| Analyzing Customer Shopping Behavior using AWS Services | End-to-End ML Model Using PySpark |
| Analysing Car Performance using PySpark | Performance Comparison between MongoDB and Cassandra |
| Twitter Sentiment Analysis | Extracting Stock Sentiment from News Headlines |
| Market Basket Analysis for a Grocery Store | Restaurant Recommendation System based on Yelp Data |
| Netflix Movies Analysis | |
| Danny's Diner Case Study | Foodie-Fi Case Study |
| Online News Exhibition | Best Selling Video Games |
| Multi-Variate Regression Using Statsmodels and Gradient Descent Optimisation | Finding the best version of a Mobile Game using A/B Testing |
- Master's in Data Analytics | San Jose State University, San Jose, California [Jan 2022 - December 2023]
- Master of Business Administration | Indian Institute of Management, Ranchi, India [June 2016 - Feb 2018]
- Bachelor of Technology | Govind Ballabh Pant University, India [August 2010 - June 2014]
- Data Analyst
- Data Scientist
- Machine Learning for Time Series Data in Python
- Market Basket Analysis in Python
- Supervised Machine Learning with Scikit-Learn
- Introduction to TensorFlow in Python
- Image Processing with Keras in Python
- Building Chatbots Using Google Dialogflow
- Deep Learning with Keras
- Image Processing with Python
- Customer Churn Prediction in Python
- ARIMA Models in Python
- Hypothesis Testing in Python
- Customer Analytics and A/B Testing in Python
- Intermediate Regression with Statsmodels in Python
- Introduction to Regression with Statsmodels in Python
- Introduction to Statistics in Python
- Sampling Techniques in Python
- Statistics Fundamentals in Python
- Big Data Fundamentals with PySpark
- Building Recommendation Systems with PySpark
- Writing Functions and Stored Procedures in SQL Server
- Learning Hadoop
- Data Cleaning Using PySpark
- Introduction to Airflow in Python
- Understanding Data Engineering
- Analyzing Marketing Campaigns Using Pandas
- Introduction to Deep Learning in Python
- Analyzing Data in Tableau
- Deep Learning in PyTorch
- Advanced Deep Learning with Keras
- Modernizing Data Lakes and Data Warehouses with Google Cloud
- Google Cloud Big Data and Machine Learning Fundamentals
- Data Science with Tableau
- Amazon Web Services: Data Analytics
- Apache Spark Essential Training
- Advanced SQL for Data Science: Time Series
- Data Engineer | Roku Inc [May 2023 - Present]
- Senior Executive | Oil and Natural Gas Corporation Ltd [July 2018 - April 2021]
- Intern | Wipro Ltd [April 2017 - June 2017]
- SAP Consultant | Tata Consultancy Services Ltd [September 2014 - June 2016]
- DATA-255 Deep Learning
  - Homework 1
  - Homework 2
  - Homework 3
  - Homework 4
- DATA-245 Machine Learning
  - Homework 1
  - Homework 2
  - Homework 3
  - Homework 4
  - Homework 5
- DATA-228 Big Data
  - Homework 1
  - Homework 2
  - Homework 3
  - Homework 4
  - Homework 5
- Leadership Skills
- Bias for Action
- Communication Skills
- Teamwork
- Curious
- Problem-solving Skills
- Time Management
- Accountability
- Learn Python in 30 Days
- How to Prepare for SQL Interviews in 15 Days
- What is the Dummy Variable Trap and How can it be avoided?
- Hyper-Parameter Tuning for Machine Learning Models Using Optuna
- Detecting and Resolving Duplicate Records using Record Linkage
- Two Simple Tests to check the Normality of Data
- How to perform Market Basket Analysis using the Apriori Algorithm and Association Rules
- How to determine the order of ARIMA or SARIMA Models
- How to Perform Hyper-Parameter Tuning in Artificial Neural Networks
- Different Loss Functions used in Regression
- Using Hexbin Plots to visualise the relationship between two variables
- Different Correlation Coefficients to measure the relationship between two variables
- Different Methods to replace Missing Values in Data
- How to find Optimal Parameters for a Regression Model using SciPy
- What is Pandas Profiler and Why is it used?
- OLAP Operations in SQL
- Feature Encoding Using K-Fold Target Encoding
- Different Linkage Methods used in Hierarchical Clustering
- What is LIME and how can it be used?
- Benefits of DropBlock over Dropout in CNNs
- Difference between Prediction Interval and Confidence Interval
- Understanding Keras Embedding for NLP
- Augmenting Text Using Large Language Models: GPT-2, GPT-3, BERT
- Equiangular Basis Vectors: a better alternative to softmax for classification tasks
- Federated Learning by Google: bringing Privacy to Machine Learning
- Different Types of Filters in Tableau
- What is a Struct in SQL?
- Topic Modelling using LDA
- Difference Between Stateful and Stateless RNNs
- The Power of Continuous Integration and Continuous Deployment (CI/CD) in Data Engineering
- Useful Data Science Libraries in Python
- What are Lambda and Kappa Architectures?
- Difference between Data Lake and Data Lakehouse
- AWS DynamicFrames
- Thompson Sampling for the Multi-Armed Bandit Problem
- Confounding and Instrumental Variables
- A Comprehensive Overview of "Introduction to MLOps" by O'Reilly
- Responsible AI Widgets: a tool for interpretable, fair and accurate Machine Learning
- Grouped Data Cross-Validation with Scikit-learn: GroupShuffleSplit and GroupKFold
**SHAP** (SHapley Additive exPlanations) is a framework for the interpretability and visualization of machine learning models. It explains the predictions made by a machine learning model by assigning an importance score to each feature, indicating its contribution to the prediction. SHAP values are based on a game-theoretic approach to feature attribution, known as Shapley values, which measure the contribution of each feature to the prediction of a model.
The SHAP values are calculated for each prediction and are designed to be both model-agnostic and locally accurate. This means that they can be used to interpret any machine learning model, not just specific types of models, and the values for each feature are only relevant for the specific prediction being made. This allows for detailed, local interpretation of model predictions, rather than relying on global feature importance values.
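To make the attribution idea concrete, here is a small, self-contained sketch in plain Python (not the `shap` library itself) that computes exact Shapley values for one prediction of a toy model. "Absent" features are replaced by a baseline value, which is one common convention:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for a single prediction of model f.

    For each feature i, average its marginal contribution f(S + {i}) - f(S)
    over all coalitions S of the other features, with the standard
    Shapley weights |S|! (n - |S| - 1)! / n!.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Features in the coalition keep their value; others use the baseline
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Toy model with an interaction term between features 1 and 2
f = lambda v: 2 * v[0] + v[1] * v[2]
x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, base)
# Efficiency property: the attributions sum to f(x) - f(baseline)
print(phi, sum(phi), f(x) - f(base))
```

This exhaustive computation is exponential in the number of features; the `shap` library exists precisely because it approximates these values efficiently for real models.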
**Record Linkage**, also known as entity resolution or data matching, is the process of identifying records in different databases that refer to the same real-world entity, despite differences in their representation or encoding. The goal of record linkage is to merge duplicate records into a single, accurate representation of the entity. This is an important step in data cleaning, which is a crucial part of the data preparation process in data analysis and machine learning.
Record linkage can be accomplished using various techniques, including deterministic methods, such as exact or fuzzy matching, and probabilistic methods, such as statistical matching and Bayesian inference. The choice of method will depend on the specific requirements of the data, such as the size of the data set, the amount of noise in the data, and the desired level of accuracy.
In practice, record linkage is used in a wide range of applications, including data integration, data quality improvement, fraud detection, customer relationship management, and market research.
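As a minimal illustration of the deterministic fuzzy-matching approach, the sketch below links records from two hypothetical tables by name similarity using only Python's standard library (real linkage would block on keys and compare multiple fields, often probabilistically):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_records(table_a, table_b, threshold=0.85):
    """Pair each record in table_a with its closest name in table_b.

    Keeps only pairs whose similarity clears the threshold, a simple
    deterministic fuzzy-matching pass.
    """
    matches = []
    for ra in table_a:
        best = max(table_b, key=lambda rb: similarity(ra["name"], rb["name"]))
        score = similarity(ra["name"], best["name"])
        if score >= threshold:
            matches.append((ra["id"], best["id"], round(score, 2)))
    return matches

# Made-up example tables: the same customer spelled two ways
crm = [{"id": 1, "name": "Jon Smith"}, {"id": 2, "name": "Ann Lee"}]
billing = [{"id": "A", "name": "John Smith"}, {"id": "B", "name": "Anne Leigh"}]
print(link_records(crm, billing))
```

Note the trade-off the threshold encodes: "Jon Smith"/"John Smith" links, while "Ann Lee"/"Anne Leigh" falls below 0.85 and is left for a human or a probabilistic model to resolve.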
**Optuna** is an open-source library for hyperparameter optimization that enables users to efficiently perform Bayesian optimization, grid search, and random search. It supports a wide range of machine learning frameworks, including TensorFlow, PyTorch, and XGBoost. Optuna provides a high-level API for defining objectives and constraints, as well as a set of built-in algorithms for choosing the next set of hyperparameters to evaluate. It also integrates with popular visualization libraries such as Matplotlib and Plotly for easy visualization of the optimization process. Optuna is designed to be easy to use and customizable, allowing users to implement their own optimization algorithms or to extend the existing ones with custom functions.
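The heart of Optuna's workflow is a define-by-run objective function that receives a trial object and samples hyperparameters through it. To illustrate that pattern without the library, here is a tiny random-search stand-in in pure Python; the `Trial` class below is a hypothetical mini version, not Optuna's actual API:

```python
import random

class Trial:
    """Minimal stand-in for an Optuna trial (illustration only)."""
    def __init__(self, rng):
        self.rng = rng
        self.params = {}

    def suggest_float(self, name, low, high):
        value = self.rng.uniform(low, high)
        self.params[name] = value
        return value

def optimize(objective, n_trials=100, seed=0):
    """Random search over whatever space the objective defines on the fly."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for _ in range(n_trials):
        trial = Trial(rng)
        value = objective(trial)
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params

# Same shape as an Optuna objective: minimize (x - 2)^2
def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

best_value, best_params = optimize(objective, n_trials=500)
print(best_value, best_params)
```

In Optuna proper you would instead call `optuna.create_study()` and `study.optimize(objective, n_trials=...)`, and smarter samplers (e.g. TPE) replace the blind random draws.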
**CUPED** (Controlled-experiment Using Pre-Experiment Data) is a technique used in A/B testing to reduce variance and improve the accuracy of estimated treatment effects. The main idea is to use pre-experiment data that is correlated with the experiment metric to remove predictable variation from that metric before comparing treatment and control.

The basic approach is to fit a model that predicts the outcome variable from pre-experiment covariates, and then analyze the adjusted metric (the residuals of that model) instead of the raw outcome. The covariate-driven component of the metric is subtracted out, leaving the treatment effect plus considerably less noise.

CUPED has several advantages over analyzing the raw metric. First, it improves the precision of treatment effect estimates by accounting for covariates that are correlated with the outcome. Second, it reduces the variance of the treatment effect estimate by removing the noise due to covariate variation. Finally, it improves the power of the test, reducing the sample size needed to detect a significant effect.
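A numerical sketch of the classic one-covariate adjustment, `y_cv = y - theta * (x - mean(x))` with `theta = cov(x, y) / var(x)`, on simulated data (the 0.8 slope, true lift of 2.0, and sample size are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

# Pre-experiment covariate (e.g. last month's spend) and in-experiment metric
n = 10_000
x = rng.normal(100, 20, n)                           # pre-period metric
treated = rng.integers(0, 2, n).astype(bool)         # random assignment
y = 0.8 * x + rng.normal(0, 5, n) + 2.0 * treated    # true lift = 2.0

# CUPED adjustment: subtract the covariate-predictable part of y
theta = np.cov(x, y)[0, 1] / np.var(x)
y_cv = y - theta * (x - x.mean())

naive_lift = y[treated].mean() - y[~treated].mean()
cuped_lift = y_cv[treated].mean() - y_cv[~treated].mean()
print(naive_lift, cuped_lift)        # both estimate the true lift of 2.0
print(y.var(), y_cv.var())           # adjusted metric has far lower variance
```

Because `x` is measured before the experiment, randomization guarantees the adjustment does not bias the lift estimate; it only strips out the variance `x` explains, which is what shrinks the confidence intervals.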
**BERT**, which stands for Bidirectional Encoder Representations from Transformers, is a powerful language model that has revolutionized natural language processing (NLP) tasks. It uses a transformer-based neural network architecture to learn contextual relationships between words and generate high-quality representations of text. BERT is pre-trained on massive amounts of text data, making it adept at tasks such as sentiment analysis, text classification, and question-answering. Its effectiveness has made it a popular choice for NLP researchers and practitioners alike, and it continues to be a driving force in the development of cutting-edge NLP applications.
**LIME** (Local Interpretable Model-Agnostic Explanations) is a powerful AI tool used for interpreting the decisions made by machine learning models. It provides insights into how a model arrives at its output, which can help users understand and validate model behavior. LIME generates model-agnostic explanations, meaning that it can be used with a wide range of machine learning models. Its local explanations are designed to be interpretable, meaning they are easy for humans to understand. LIME has proven to be a valuable tool in fields such as healthcare, finance, and law, where the interpretability of machine learning models is crucial for decision-making.
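The core mechanic behind LIME can be sketched in a few lines: perturb the input, weight the samples by proximity to the original point, and fit a weighted linear surrogate whose coefficients serve as the local explanation. This is a simplified numpy sketch of that idea, not the `lime` package:

```python
import numpy as np

def local_surrogate(f, x0, n_samples=2000, scale=0.1, seed=0):
    """Fit a weighted linear surrogate around x0 (the idea behind LIME).

    Samples points near x0, weights them with a proximity kernel, and
    solves a weighted least-squares fit; the coefficients approximate
    how each feature locally drives the black-box prediction.
    """
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(0, scale, size=(n_samples, len(x0)))
    y = np.array([f(x) for x in X])
    # Proximity kernel: nearby samples matter more
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * scale ** 2))
    # Weighted least squares with an intercept column
    A = np.hstack([np.ones((n_samples, 1)), X]) * np.sqrt(w)[:, None]
    b = y * np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef[1:]          # per-feature local effects

# Black-box model: nonlinear in x[0], linear in x[1]
f = lambda x: x[0] ** 2 + 3 * x[1]
effects = local_surrogate(f, np.array([1.0, 1.0]))
print(effects)   # close to the local gradient (2, 3) at x0
```

The surrogate is only valid near `x0`, which is exactly the "local" in LIME: the same model queried at a different point would get a different linear explanation.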
The **GPT API** by OpenAI provides access to a powerful language model that uses deep learning techniques to generate natural language text. It is based on the GPT architecture, which stands for Generative Pre-trained Transformer, and has been pre-trained on massive amounts of text data. The GPT API allows users to generate high-quality text content with just a few lines of code, making it a popular choice for applications such as chatbots, text summarization, and content creation. Its ability to understand context and generate coherent, natural-sounding text has made it a game-changer in the field of natural language processing.
- Phone: 669-230-9604
- LinkedIn: https://www.linkedin.com/in/iqra-bismi/
- Email: iqrabismi1992@gmail.com
- Medium: https://medium.com/@iqra.bismi