Cleaned a corpus of one million English sentences, then applied transfer learning to pre-trained language models such as BART and MarianMT to improve text correction accuracy. Tested the models extensively, including novel comparisons on both natural and synthetic errors; analyzed shifts in error categories, evaluated the effectiveness of transfer learning versus pre-training, and identified threshold points. Drafted the findings into two conference papers and a journal paper.
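A minimal sketch of the kind of rule-based synthetic-error injection that supports the natural-versus-synthetic comparison above. The corruption rules and probabilities here (article deletion, adjacent-character swaps) are illustrative assumptions, not the project's actual error model:

```python
import random

def inject_synthetic_errors(sentence: str, seed: int = 0) -> str:
    """Corrupt a clean sentence with simple rule-based synthetic errors.

    Two illustrative corruption rules:
      - drop an article ("a", "an", "the") with probability 0.1
      - swap two adjacent characters in longer words with probability 0.1
    """
    rng = random.Random(seed)  # seeded so corruption is reproducible
    corrupted = []
    for word in sentence.split():
        r = rng.random()
        if r < 0.1 and word.lower() in {"a", "an", "the"}:
            continue  # article deletion error
        if 0.1 <= r < 0.2 and len(word) > 3:
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]  # typo
        corrupted.append(word)
    return " ".join(corrupted)

clean = "The model corrects the grammatical errors in a sentence"
noisy = inject_synthetic_errors(clean, seed=1)
```

Pairs of (noisy, clean) sentences generated this way can then serve as training or evaluation data for a sequence-to-sequence correction model.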
Worked with six team members to collect data on 50,000 YouTube videos through the YouTube API, then cleaned and pre-processed the dataset. Conducted an exploratory data analysis and formulated hypothesis tests to assess the impact of video titles on views. Guided by the hypothesis-test outcomes, developed a BERT-based classification model that predicts a video's view class from its title, achieving 76% accuracy.
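A small sketch of one way to test whether a title property affects views. This uses a two-sided permutation test on the difference of mean view counts between two hypothetical title groups; the grouping criterion and the view counts are made up for illustration, and the project's actual tests may have used a different statistic:

```python
import random

def permutation_test(group_a, group_b, n_iter=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the fraction of random relabelings whose mean difference
    is at least as extreme as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # relabel under the null hypothesis
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / len(group_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_iter

# Hypothetical view counts for two title groups (illustrative only)
views_group_a = [1200, 950, 1800, 2100, 1600]
views_group_b = [400, 520, 610, 300, 450]
p_value = permutation_test(views_group_a, views_group_b)
```

A small p-value (e.g. below 0.05) would suggest the title property is associated with a real difference in views, which is the kind of signal that motivates training a title-based classifier.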
Developed a model that automates the classification of casting products as defective or non-defective, replacing manual inspection. Pre-processed casting product images using VGG16, then built and tested classifiers including a fine-tuned ANN with L1/L2 regularization and a fine-tuned CNN architecture. The fine-tuned CNN proved the best model, with a classification accuracy of 95%.
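A minimal sketch of the L1/L2 regularization mentioned above: the penalty terms added to the training loss to discourage large weights. The weight values and penalty strengths here are illustrative, not the project's tuned hyperparameters:

```python
def l1_penalty(weights, lam):
    """L1 (lasso) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 (ridge) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(base_loss, weights, l1_lam=0.0, l2_lam=0.0):
    """Total training loss: data loss plus the regularization penalties."""
    return base_loss + l1_penalty(weights, l1_lam) + l2_penalty(weights, l2_lam)

# Illustrative weights and penalty strengths
weights = [1.0, -2.0, 3.0]
loss_l1 = regularized_loss(1.0, weights, l1_lam=0.1)  # adds 0.1 * (1+2+3)
loss_l2 = regularized_loss(1.0, weights, l2_lam=0.1)  # adds 0.1 * (1+4+9)
```

In a deep-learning framework these penalties are typically attached per layer (e.g. as kernel regularizers) rather than computed by hand; the effect on the loss is the same.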