Data Science and Machine Learning Portfolio Website: https://sugatagh.github.io/dsml/
Ford Motor Company
2024-Present
Reliability Data Scientist
- Department: Global Data Insight and Analytics
Indian Institute of Science Education and Research Kolkata
2018-2024
Research Fellow and Teaching Assistant
- Research Focus: Stochastic ordering
- Teaching Assistantship: Served as Teaching Assistant for the courses Statistics I, Probability I, and Analysis I. Conducted tutorial sessions, prepared question papers, and took part in grading.
Indian Institute of Science Education and Research Kolkata
2018-2024
Doctor of Philosophy in Statistics
Indian Institute of Technology Kanpur
2015-2017
Master of Science in Statistics
University of Calcutta
2012-2015
Bachelor of Science in Statistics
Languages: Python, SQL, R, MATLAB
Tools: LaTeX, Jupyter Notebook
Statistical Software: Minitab
Refereed Journal Publications
- Ghosh, S., Nanda, A. K. (2023) Conditional precedence orders for stochastic comparison of random variables. Statistics and Probability Letters. https://doi.org/10.1016/j.spl.2022.109702
- Ghosh, S., Dutta, S., Genton, M. G. (2017) A note on inconsistent families of discrete multivariate distributions. Journal of Statistical Distributions and Applications. https://doi.org/10.1186/s40488-017-0061-8
Preprints
- Ghosh, S., Nanda, A. K. (2021) Departure-based Asymptotic Stochastic Order for Random Processes. https://arxiv.org/abs/2103.01727
- Ghosh, S., Nanda, A. K. (2021) Asymptotic Stochastic Comparison of Random Processes. https://arxiv.org/abs/2103.01720
Academic Magazine Articles
- Banerjee, P., Ghosh, S. (2016) A brief review on missing data. Prakarsho.*
- Ghosh, S. (2014) A generalization of the Kelly gambling system. Prakarsho.
- Dutta, T., Ghosh, S. (2014) An attempt to generate random numbers. Prakarsho.
Presentations
- Departure-based Asymptotic Stochastic Order for Random Processes. International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022, IISER Kolkata.
- On Some Inconsistent Multivariate Distributions. Open House'16, IIT Kanpur.
*Prakarsho: Departmental magazine published by the Department of Statistics, St. Xavier's College, Kolkata.
Scholarship and Research Fellowship
- Research Fellowship from University Grants Commission, MHRD, Government of India.
- National Scholarship from Department of Higher Education, MHRD, Government of India.
Test Performances
- AIR-94 in Mathematical Science paper in CSIR-UGC NET-JRF (Dec 2016).
- AIR-31 in Mathematical Statistics paper in IIT-JAM (2015).
Winter School on Deep Learning: From Perceptrons to Diffusion Models.
Organized by Electronics and Communication Sciences Unit, ISI Kolkata.
International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022.
Organized by Department of Mathematics and Statistics, IISER Kolkata.
Indo-French Center for Applied Mathematics (IFCAM) Winter School 2018.
On Stochastic Methods for Uncertainty Quantification and Sensitivity Analysis of Complex Models.
National Seminar on Application of Statistics and Statistical Computing.
Organized by Xaverian Statistical Association under Department of Statistics, St. Xavier’s College, Kolkata.
TMLC Fellowship Program
2022-2023
Conducted by The Machine Learning Company.
Contributed to the Conversational AI DeepPavlov project.
Author Identification with Natural Language Processing
- Predicted the author of a new text, given a dataset of texts with corresponding authors.
- Trained an LSTM model with GloVe embeddings and obtained a validation log loss of $0.581$.
- GitHub repository: https://github.com/sugatagh/Spooky-Author-Identification
E-commerce Text Classification
- Classified products into four given categories based on their descriptions available on an e-commerce platform.
- Employed TF-IDF vectorization and Word2Vec embeddings with several classifiers. The hyperparameter-tuned model with the highest validation accuracy (TF-IDF + linear SVM) obtained a test accuracy of $0.949$.
- GitHub repository: https://github.com/sugatagh/E-commerce-Text-Classification
Anomaly Detection in Credit Card Transactions
- Identified credit card transactions as authentic or fraudulent based on time, amount, and other attributes.
- Fitted a multivariate normal distribution and achieved a test $F_2$-score of $0.816$ with threshold value $\approx 3.87 \times 10^{-19}$.
- GitHub repository: https://github.com/sugatagh/Anomaly-Detection-in-Credit-Card-Transactions
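For reference, the $F_2$-score weights recall more heavily than precision. A minimal sketch of the general $F_\beta$ computation from confusion-matrix counts is given below; the function name `f_beta` and the example counts are illustrative assumptions and are not taken from the repository.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives, and false negatives.

    beta > 1 weights recall more heavily than precision; beta = 2 gives F_2.
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention assumed here: return 0.0 when the score is undefined.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts: 8 frauds caught, 2 false alarms, 2 frauds missed.
score = f_beta(tp=8, fp=2, fn=2)
```

With these hypothetical counts, precision and recall are both $0.8$, so the score equals $0.8$ for any $\beta$; the choice of $\beta$ matters only when precision and recall differ.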
Higgs Boson Event Detection
Conducted by The Machine Learning Company.
- Predicted whether or not an event produced in a particle accelerator indicates the discovery of a new particle.
- Trained a deep neural network, achieving test AMS (approximate median significance) score of $1.200$ and test accuracy of $0.824$, using GridSearchCV for hyperparameter optimization.
- GitHub repository: https://github.com/sugatagh/Higgs-Boson-Event-Detection
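For reference, the AMS metric as defined in the Higgs Boson Machine Learning Challenge is reproduced below. It is assumed here that the project uses this definition, where $s$ and $b$ are the weighted counts of true-positive (signal) and false-positive (background) events in the selected region and $b_r = 10$ is a regularization constant.

$$
\mathrm{AMS} = \sqrt{2\left((s + b + b_r)\,\ln\!\left(1 + \frac{s}{b + b_r}\right) - s\right)}
$$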
Patient Survival Prediction
Conducted by The Machine Learning Company.
- Performed analytics and predicted the survival of a patient based on various relevant medical information.
- Trained a deep neural network, achieving test accuracy of $0.923$, using Keras Tuner for hyperparameter tuning.
- GitHub repository: https://github.com/sugatagh/Patient-Survival-Prediction-using-Deep-Learning
Electron Energy Flux Prediction
Conducted by The Machine Learning Company.
- Predicted the total electron energy flux based on various relevant features, in the context of modeling electron particle precipitation from the magnetosphere to the ionosphere.
- Trained a deep neural network, achieving test $R^2$-score of $0.699$, using Keras Tuner for hyperparameter tuning.
- GitHub repository: https://github.com/sugatagh/Electron-Energy-Flux-Prediction-using-Deep-Learning
Site Energy Usage Intensity Prediction
Conducted by The Machine Learning Company.
- Estimated energy usage intensity of a building in a given year, based on building characteristics and weather data.
- Trained a random forest regressor, obtaining a test $R^2$-score of $0.703$, using Optuna for hyperparameter optimization.
- GitHub repository: https://github.com/sugatagh/Site-Energy-Usage-Intensity-Prediction
Road Traffic Accident Severity Classification
Conducted by The Machine Learning Company.
- Built prediction models to classify severity of road traffic accidents (slight injury, serious injury or fatal injury).
- Obtained a test weighted $F_1$-score of $0.795$ with an XGBoost classifier, using GridSearchCV for hyperparameter tuning.
- GitHub repository: https://github.com/sugatagh/Road-Traffic-Accident-Severity-Classification
Natural Language Processing with Disaster Tweets
Jointly with Shyambhu Mukherjee.
- Predicted whether a tweet indicates a disaster or not, using bag-of-words, TF-IDF, and Word2Vec models.
- Obtained an average cross-validation $F_1$-score of $0.784$ with the combination of Word2Vec embeddings and a support vector machine.
- GitHub repository: https://github.com/sugatagh/Natural-Language-Processing-with-Disaster-Tweets
Credit Card Fraud Detection
Jointly with Shyambhu Mukherjee.
- Classified credit card transactions as authentic or fraudulent, based on relevant data such as time and amount.
- Obtained a test $F_2$-score of $0.880$ with a random forest classifier after oversampling the minority class (fraudulent transactions) in the training set via the synthetic minority over-sampling technique (SMOTE).
- GitHub repository: https://github.com/sugatagh/Credit-Card-Fraud-Detection
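The core idea of SMOTE mentioned above can be sketched in a few lines: each synthetic point is placed at a random position on the segment joining a minority-class sample to one of its nearest minority-class neighbours. The function `smote_sketch` and the toy data below are hypothetical and only illustrate the idea; they are not the project's code, which may well rely on a library implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # relative position on the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Toy minority class: the four corners of the unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(corners, n_new=10, k=2)
```

Oversampling of this kind is applied to the training set only, as the bullet above states, so that the test set keeps the original class imbalance.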
Machine Learning Internship Program
2022
Conducted by Uniconverge Technologies and The IoT Academy.
- Detected duplication of points of interest in a dataset of over $1.5$ million place entries.
- Trained several algorithms and obtained test accuracy of $0.770$ with hyperparameter-tuned XGBoost classifier.
- GitHub repository: https://github.com/sugatagh/Foursquare-Location-Matching
A Time Series Analysis of Monthly Airline Revenue Passenger Mile (RPM)
Supervisor: Dr. Amit Mitra (IIT Kanpur).
- Analyzed RPM data for 1996-2014 and built a predictive model for forecasting future revenue values.
A Study on Performances in the Olympic Games
Supervisor: Dr. Sharmishtha Mitra (IIT Kanpur).
- Built a regression model to predict the overall performance of the countries in the Summer Olympic Games.
Students' Future Plans and the Reasons Behind
Supervisor: Dr. Shalabh (IIT Kanpur).
- Examined the variation in career choices of the students at IIT Kanpur and how the reasons for such choices vary.
A Statistical Analysis of the Variation in Preference to Movie Genres among Spectators
Supervisors: Dr. Durba Bhattacharya and Prof. Soumya Banerjee (St. Xavier's College, Kolkata).
- Studied how hobbies influence an individual's preferred movie genre. Checked for bias due to gender and age group.
- Analyzed how preferences among the factors behind a movie's success differ across age groups and genders.
Generative AI for Everyone
2023
Authorized by DeepLearning.AI, offered by Coursera.
https://www.coursera.org/account/accomplishments/certificate/EV8T2EF4VUKN
Machine Learning Specialization
2022
Authorized by Stanford University, offered by Coursera.
https://www.coursera.org/account/accomplishments/specialization/certificate/U2MZV5HWRG5L
Data Analyst in SQL Track
2022
Offered by DataCamp.
https://www.datacamp.com/statement-of-accomplishment/track/689ba9d0ab9984f55aac593e6caacd1f9d197194
IBM Data Science Specialization
2022
Authorized by IBM, offered by Coursera.
https://www.coursera.org/account/accomplishments/specialization/certificate/9V355HMT2FB6
Applied Data Science with Python
2021
Offered by Electronics and ICT Academy, IIT Roorkee.
https://eict.iitr.ac.in/wp-content/uploads/L214613B669.jpg
Statistics
Regression Analysis, Statistical Inference, Time-Series Analysis, Statistical Simulation and Data Analysis, Probabilistic Theory of Pattern Recognition, Multivariate Analysis, Analysis of Variance, Robust Statistical Methods, Nonparametric Inference, Non-linear Regression, Large Sample Theory, Sampling Theory, Matrix Theory and Linear Estimation, Design of Experiments, Statistical Quality Control, Distributions Theory in Statistics, Population Statistics, Economic Statistics.
Mathematics
Real Analysis, Linear Algebra, Multivariable Calculus, Numerical Analysis, Complex Variables, Ergodic Theory, Introduction to Graph Theory, Measure Theory.
Probability and Applications
Probability Theory, Applied Stochastic Process.
Others
Computer Programming and Data Structures, Research Methodology.