I am passionate about Data Science, Data Engineering, and Machine Learning. With the proliferation of data in today's digital age, data science has become a crucial tool for businesses, governments, and other organizations to understand and analyze complex data sets in order to make better decisions, improve operations, and drive growth. I have gained in-depth knowledge in these fields by taking online courses and reading relevant books and articles about Big Data Analytics, Machine Learning, Statistics, Data Engineering, and Data Science.
I am currently pursuing a Master's in Data Analytics at San Jose State University's Department of Applied Data Science. With 4.5 years of experience in data modeling and analysis, I bring a unique perspective to the field. I have a comprehensive understanding of Statistics, Python, SQL, Big Data, and Machine Learning, and have worked with cutting-edge technologies like GANs, Large Language Models (LLMs), and Transformers. Additionally, I have completed courses in Machine Learning Fundamentals, Natural Language Processing (NLP), Deep Learning, Big Data, and Statistics from DataCamp, GeeksforGeeks, Coursera, and Udemy, which have provided me with a strong theoretical foundation in these fields. To gain hands-on experience, I have worked on projects spanning the retail, banking, movie, and automobile industries. My portfolio on GitHub showcases a selection of my projects and experiences, demonstrating my capabilities as a data science and machine learning professional.
Currently, I am working on a cutting-edge project developing a multimedia chatbot AI testing tool that incorporates natural language processing (NLP) and conversational AI. To enhance the tool's functionality, I am also using automatic speech recognition and text-to-speech technology, which enables the chatbot to have more human-like interactions with users. Furthermore, I am actively involved in deep learning projects, where I am creating animated versions of real images using StyleGAN, JoJoGAN, and Cartoon GAN. These deep learning models allow me to generate novel images that are similar in style to the input images.
I am always eager to learn and expand my knowledge in these fields, and I have a passion for using data and technology to make a positive impact. Whether it's through building predictive models, computer vision, recommendation systems, A/B testing, text mining, or developing data pipelines, I am dedicated to using my skills to drive innovation and solve complex problems. I am excited to share my work with others and contribute to the Data Science and Machine Learning community. Thank you for considering my portfolio.
I started learning Python in my first semester, which helped me understand data science and machine learning. I also learned R, which is one of the most important languages in statistics and machine learning. Below is a list of skills that I have gained through my experience in the field of data science and machine learning. Learning these essential skills and techniques helped me build and deploy projects successfully.
Thank you for visiting my portfolio! I have had an awesome experience working on machine learning and deep learning projects and am excited to share them with you. Please find below links to my projects. For more detailed information about each project, including results and descriptions, please click on the project link. Below is a summary of my entire portfolio. Thank you for considering my work.
| Project | Project |
|---|---|
| Bank Customers Churn Prediction Model | Gym Members Attendance Prediction Model |
| RFM Analysis and Customer Segmentation | Time Series Modelling using ARIMA & SARIMAX Models |
| Movies Recommendation System | Collaborative Filtering for Movies using Matrix Factorisation |
| House Price Prediction Model | Popular Recipe Prediction Model |
| Speech Emotion Recognition | |
| Animated Face Generation using GAN | |
| Fashion Clothes Classification Using Convolutional Neural Network | Sign Language Identification |
| Credit Card Customer Segmentation | |
| Analyzing Customer Shopping Behavior using AWS Services | End-to-End ML Model Using PySpark |
| Analysing Car Performance using PySpark | Performance Comparison between MongoDB and Cassandra |
| Twitter Sentiment Analysis | Extracting Stock Sentiment from News Headlines |
| Market Basket Analysis for a Grocery Store | Restaurant Recommendation System based on Yelp Data |
| Netflix Movies Analysis | |
| Danny's Diner Case Study | Foodie-Fi Case Study |
| Online News Exhibition | Best Selling Video Games |
| Multi-Variate Regression Using Statsmodels and Gradient Descent Optimisation | Finding the best version of a Mobile Game using A/B Testing |
- Master's in Data Analytics | San Jose State University, San Jose, California [Jan 2022 - December 2023]
- Master of Business Administration | Indian Institute of Management, Ranchi, India [June 2016 - Feb 2018]
- Bachelor of Technology | Govind Ballabh Pant University, India [August 2010 - June 2014]
- Data Analyst
- Data Scientist
- Machine Learning for Time Series Data in Python
- Market Basket Analysis in Python
- Supervised Machine Learning with Scikit-Learn
- Introduction to TensorFlow in Python
- Image Processing with Keras in Python
- Building Chatbots Using Google Dialogflow
- Deep Learning with Keras
- Image Processing with Python
- Customer Churn Prediction in Python
- ARIMA Models in Python
- Hypothesis Testing in Python
- Customer Analytics and A/B Testing in Python
- Intermediate Regression with Statsmodels in Python
- Introduction to Regression with Statsmodels in Python
- Introduction to Statistics in Python
- Sampling Techniques in Python
- Statistics Fundamentals in Python
- Big Data Fundamentals with PySpark
- Building Recommendation Systems with PySpark
- Writing Functions and Stored Procedures in SQL Server
- Learning Hadoop
- Data Cleaning Using PySpark
- Introduction to Airflow in Python
- Understanding Data Engineering
- Analyzing Marketing Campaigns Using Pandas
- Introduction to Deep Learning in Python
- Analyzing Data in Tableau
- Deep Learning in PyTorch
- Advanced Deep Learning with Keras
- Modernizing Data Lakes and Data Warehouses with Google Cloud
- Google Cloud Big Data and Machine Learning Fundamentals
- Data Science with Tableau
- Amazon Web Services: Data Analytics
- Apache Spark Essential Training
- Advanced SQL for Data Science: Time Series
- Data Engineer | Roku Inc [May 2023 - Present]
- Senior Executive | Oil and Natural Gas Corporation Ltd [July 2018 - April 2021]
- Intern | Wipro Ltd [April 2017 - June 2017]
- SAP Consultant | Tata Consultancy Services Ltd [September 2014 - June 2016]
- DATA-255 Deep Learning
  - Homework 1
  - Homework 2
  - Homework 3
  - Homework 4
- DATA-245 Machine Learning
  - Homework 1
  - Homework 2
  - Homework 3
  - Homework 4
  - Homework 5
- DATA-228 Big Data
  - Homework 1
  - Homework 2
  - Homework 3
  - Homework 4
  - Homework 5
- Leadership Skills
- Bias for Action
- Communication Skills
- Teamwork
- Curious
- Problem-solving Skills
- Time Management
- Accountability
- Learn Python in 30 Days
- How to Prepare for SQL Interviews in 15 Days
- What is the Dummy Variable Trap and How can it be avoided?
- Hyper-Parameter Tuning for Machine Learning Models Using Optuna
- Detecting and Resolving Duplicate Records using Record Linkage
- Two Simple Tests to check the Normality of Data
- How to perform Market Basket Analysis using the Apriori Algorithm and Association Rules
- How to determine the order of ARIMA or SARIMA Models
- How to Perform Hyper-Parameter Tuning in Artificial Neural Networks
- Different Loss Functions used in Regression
- Using Hexbin Plots to visualise the relationship between two variables
- Different Correlation Coefficients to measure the relationship between two variables
- Different Methods to replace Missing Values in Data
- How to find Optimal Parameters for a Regression Model using SciPy
- What is Pandas Profiler and Why is it used?
- OLAP Operations in SQL
- Feature Encoding Using K-Fold Target Encoding
- Different Linkage Methods used in Hierarchical Clustering
- What is LIME and how can it be used?
- Benefits of DropBlock over Dropout in CNNs
- Difference between Prediction Interval and Confidence Interval
- Understanding Keras Embedding for NLP
- Augmenting Text Using Large Language Models: GPT-2, GPT-3, BERT
- Equiangular Basis Vectors: a better alternative to softmax for classification tasks
- Federated Learning by Google: bringing Privacy to Machine Learning
- Different Types of Filters in Tableau
- What is a Struct in SQL?
- Topic Modelling using LDA
- Difference Between Stateful and Stateless RNNs
- The Power of Continuous Integration and Continuous Deployment (CI/CD) in Data Engineering
- Useful Data Science Libraries in Python
- What are Lambda and Kappa Architectures?
- Difference between Data Lake and Data Lakehouse
- AWS DynamicFrames
- Thompson Sampling for the Multi-Armed Bandit Problem
- Confounding and Instrumental Variables
- A Comprehensive Overview of "Introduction to MLOps" by O'Reilly
- Responsible AI Widgets: a tool for interpretable, fair and accurate Machine Learning
- Grouped Data Cross-Validation with Scikit-learn: GroupShuffleSplit and GroupKFold
**SHAP** (SHapley Additive exPlanations) is a framework for the interpretability and visualization of machine learning models. It explains the predictions made by a machine learning model by assigning an importance score to each feature, indicating its contribution to the prediction. SHAP values are based on a game-theoretic approach to feature attribution, known as Shapley values, which measure the contribution of each feature to the prediction of a model.
The SHAP values are calculated for each prediction and are designed to be both model-agnostic and locally accurate. This means that they can be used to interpret any machine learning model, not just specific types of models, and the values for each feature are only relevant for the specific prediction being made. This allows for detailed, local interpretation of model predictions, rather than relying on global feature importance values.
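To make the attribution idea concrete, here is a small, self-contained sketch in plain Python (not the `shap` library itself) that computes exact Shapley values for one prediction of a toy model. "Absent" features are replaced by a baseline value, which is one common convention:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for a single prediction of model f.

    For each feature i, average its marginal contribution f(S + {i}) - f(S)
    over all coalitions S of the other features, with the standard
    Shapley weights |S|! (n - |S| - 1)! / n!.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Features in the coalition keep their value; others use the baseline
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Toy model with an interaction term between features 1 and 2
f = lambda v: 2 * v[0] + v[1] * v[2]
x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, base)
# Efficiency property: the attributions sum to f(x) - f(baseline)
print(phi, sum(phi), f(x) - f(base))
```

This exhaustive computation is exponential in the number of features; the `shap` library exists precisely because it approximates these values efficiently for real models.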
**Record Linkage**, also known as entity resolution or data matching, is the process of identifying records in different databases that refer to the same real-world entity, despite differences in their representation or encoding. The goal of record linkage is to merge duplicate records into a single, accurate representation of the entity. This is an important step in data cleaning, which is a crucial part of the data preparation process in data analysis and machine learning.
Record linkage can be accomplished using various techniques, including deterministic methods, such as exact or fuzzy matching, and probabilistic methods, such as statistical matching and Bayesian inference. The choice of method will depend on the specific requirements of the data, such as the size of the data set, the amount of noise in the data, and the desired level of accuracy.
In practice, record linkage is used in a wide range of applications, including data integration, data quality improvement, fraud detection, customer relationship management, and market research.
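As a minimal illustration of the deterministic fuzzy-matching approach, the sketch below links records from two hypothetical tables by name similarity using only Python's standard library (real linkage would block on keys and compare multiple fields, often probabilistically):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_records(table_a, table_b, threshold=0.85):
    """Pair each record in table_a with its closest name in table_b.

    Keeps only pairs whose similarity clears the threshold, a simple
    deterministic fuzzy-matching pass.
    """
    matches = []
    for ra in table_a:
        best = max(table_b, key=lambda rb: similarity(ra["name"], rb["name"]))
        score = similarity(ra["name"], best["name"])
        if score >= threshold:
            matches.append((ra["id"], best["id"], round(score, 2)))
    return matches

# Made-up example tables: the same customer spelled two ways
crm = [{"id": 1, "name": "Jon Smith"}, {"id": 2, "name": "Ann Lee"}]
billing = [{"id": "A", "name": "John Smith"}, {"id": "B", "name": "Anne Leigh"}]
print(link_records(crm, billing))
```

Note the trade-off the threshold encodes: "Jon Smith"/"John Smith" links, while "Ann Lee"/"Anne Leigh" falls below 0.85 and is left for a human or a probabilistic model to resolve.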
**Optuna** is an open-source library for hyperparameter optimization that enables users to efficiently perform Bayesian optimization, grid search, and random search. It supports a wide range of machine learning frameworks, including TensorFlow, PyTorch, and XGBoost. Optuna provides a high-level API for defining objectives and constraints, as well as a set of built-in algorithms for choosing the next set of hyperparameters to evaluate. It also integrates with popular visualization libraries such as Matplotlib and Plotly for easy visualization of the optimization process. Optuna is designed to be easy to use and customizable, allowing users to implement their own optimization algorithms or to extend the existing ones with custom functions.
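The heart of Optuna's workflow is a define-by-run objective function that receives a trial object and samples hyperparameters through it. To illustrate that pattern without the library, here is a tiny random-search stand-in in pure Python; the `Trial` class below is a hypothetical mini version, not Optuna's actual API:

```python
import random

class Trial:
    """Minimal stand-in for an Optuna trial (illustration only)."""
    def __init__(self, rng):
        self.rng = rng
        self.params = {}

    def suggest_float(self, name, low, high):
        value = self.rng.uniform(low, high)
        self.params[name] = value
        return value

def optimize(objective, n_trials=100, seed=0):
    """Random search over whatever space the objective defines on the fly."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for _ in range(n_trials):
        trial = Trial(rng)
        value = objective(trial)
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params

# Same shape as an Optuna objective: minimize (x - 2)^2
def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

best_value, best_params = optimize(objective, n_trials=500)
print(best_value, best_params)
```

In Optuna proper you would instead call `optuna.create_study()` and `study.optimize(objective, n_trials=...)`, and smarter samplers (e.g. TPE) replace the blind random draws.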
**CUPED** (Controlled-experiment Using Pre-Experiment Data) is a technique used in A/B testing to reduce variance and improve the accuracy of estimated treatment effects. The main idea is to use pre-experiment data that is correlated with the experiment metric to remove predictable variation from that metric before comparing treatment and control.

The basic approach is to fit a model that predicts the outcome variable from pre-experiment covariates, and then analyze the adjusted metric (the residuals of that model) instead of the raw outcome. The covariate-driven component of the metric is subtracted out, leaving the treatment effect plus considerably less noise.

CUPED has several advantages over analyzing the raw metric. First, it improves the precision of treatment effect estimates by accounting for covariates that are correlated with the outcome. Second, it reduces the variance of the treatment effect estimate by removing the noise due to covariate variation. Finally, it improves the power of the test, reducing the sample size needed to detect a significant effect.
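A numerical sketch of the classic one-covariate adjustment, `y_cv = y - theta * (x - mean(x))` with `theta = cov(x, y) / var(x)`, on simulated data (the 0.8 slope, true lift of 2.0, and sample size are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

# Pre-experiment covariate (e.g. last month's spend) and in-experiment metric
n = 10_000
x = rng.normal(100, 20, n)                           # pre-period metric
treated = rng.integers(0, 2, n).astype(bool)         # random assignment
y = 0.8 * x + rng.normal(0, 5, n) + 2.0 * treated    # true lift = 2.0

# CUPED adjustment: subtract the covariate-predictable part of y
theta = np.cov(x, y)[0, 1] / np.var(x)
y_cv = y - theta * (x - x.mean())

naive_lift = y[treated].mean() - y[~treated].mean()
cuped_lift = y_cv[treated].mean() - y_cv[~treated].mean()
print(naive_lift, cuped_lift)        # both estimate the true lift of 2.0
print(y.var(), y_cv.var())           # adjusted metric has far lower variance
```

Because `x` is measured before the experiment, randomization guarantees the adjustment does not bias the lift estimate; it only strips out the variance `x` explains, which is what shrinks the confidence intervals.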
**BERT**, which stands for Bidirectional Encoder Representations from Transformers, is a powerful language model that has revolutionized natural language processing (NLP) tasks. It uses a transformer-based neural network architecture to learn contextual relationships between words and generate high-quality representations of text. BERT is pre-trained on massive amounts of text data, making it adept at tasks such as sentiment analysis, text classification, and question-answering. Its effectiveness has made it a popular choice for NLP researchers and practitioners alike, and it continues to be a driving force in the development of cutting-edge NLP applications.
**LIME** (Local Interpretable Model-Agnostic Explanations) is a powerful AI tool used for interpreting the decisions made by machine learning models. It provides insights into how a model arrives at its output, which can help users understand and validate model behavior. LIME generates model-agnostic explanations, meaning that it can be used with a wide range of machine learning models. Its local explanations are designed to be interpretable, meaning they are easy for humans to understand. LIME has proven to be a valuable tool in fields such as healthcare, finance, and law, where the interpretability of machine learning models is crucial for decision-making.
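The core mechanic behind LIME can be sketched in a few lines: perturb the input, weight the samples by proximity to the original point, and fit a weighted linear surrogate whose coefficients serve as the local explanation. This is a simplified numpy sketch of that idea, not the `lime` package:

```python
import numpy as np

def local_surrogate(f, x0, n_samples=2000, scale=0.1, seed=0):
    """Fit a weighted linear surrogate around x0 (the idea behind LIME).

    Samples points near x0, weights them with a proximity kernel, and
    solves a weighted least-squares fit; the coefficients approximate
    how each feature locally drives the black-box prediction.
    """
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(0, scale, size=(n_samples, len(x0)))
    y = np.array([f(x) for x in X])
    # Proximity kernel: nearby samples matter more
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * scale ** 2))
    # Weighted least squares with an intercept column
    A = np.hstack([np.ones((n_samples, 1)), X]) * np.sqrt(w)[:, None]
    b = y * np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef[1:]          # per-feature local effects

# Black-box model: nonlinear in x[0], linear in x[1]
f = lambda x: x[0] ** 2 + 3 * x[1]
effects = local_surrogate(f, np.array([1.0, 1.0]))
print(effects)   # close to the local gradient (2, 3) at x0
```

The surrogate is only valid near `x0`, which is exactly the "local" in LIME: the same model queried at a different point would get a different linear explanation.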
The **GPT API** by OpenAI provides access to a powerful language model that uses deep learning techniques to generate natural language text. It is based on the GPT architecture, which stands for Generative Pre-trained Transformer, and has been pre-trained on massive amounts of text data. The GPT API allows users to generate high-quality text content with just a few lines of code, making it a popular choice for applications such as chatbots, text summarization, and content creation. Its ability to understand context and generate coherent, natural-sounding text has made it a game-changer in the field of natural language processing.
- Phone: 669-230-9604
- LinkedIn: https://www.linkedin.com/in/iqra-bismi/
- Email: iqrabismi1992@gmail.com
- Medium: https://medium.com/@iqra.bismi