Ryan Stofer's Portfolio



IMD Solutions Inc. - GlucoGuard

(Jan 2024 - June 2024)

Overview: I collaborated with a team to develop a predictive model and an application for IMD Solutions Inc.'s newest product, GlucoGuard, aimed at individuals with type 1 diabetes. GlucoGuard continuously monitors glucose levels and administers glucose during nocturnal hypoglycemia to prevent low blood sugar events during sleep. The project included two key components: a predictive model and a user application.

The predictive model was designed to forecast hypoglycemic events, particularly during sleep. I created a logistic regression model due to its efficiency and effectiveness, given the limited feature set of the data. In the absence of actual patient data, we utilized CGM data from Kaggle and achieved a precision of 93.1% across 50 user data sets. The code was structured for easy deployment within the app, ensuring seamless integration with the device’s functionality.
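
As a rough illustration of this kind of pipeline, the sketch below fits a logistic regression to a CGM trace. The column names, the 70 mg/dL hypoglycemia threshold, and the 30-minute look-ahead window are illustrative assumptions, not GlucoGuard's actual feature set.

```python
# Hypothetical logistic-regression sketch for nocturnal hypoglycemia prediction.
# Column names, threshold, and windowing are assumptions for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

df = pd.read_csv("cgm_user.csv")                 # one CGM trace (5-min readings)
df["roc"] = df["glucose"].diff()                 # rate of change between readings
df["label"] = (df["glucose"].shift(-6) < 70).astype(int)  # hypo within ~30 min
df = df.dropna()

X, y = df[["glucose", "roc"]], df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)          # preserve time order

model = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
print("precision:", precision_score(y_test, model.predict(X_test)))
```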

[Image: Gluco_ML]

The user application was developed to provide a user-friendly interface that connects to the device's API. This application allows users to monitor their glucose levels and receive alerts, enhancing their experience and safety when using GlucoGuard.

[Image: Gluco_Wireframe]

Improvements: While the model demonstrated promising results with experimental data, testing and retraining the model using actual patient data obtained from the GlucoGuard mouthpiece would enhance its accuracy and reliability. Additionally, incorporating more metadata could enable the development of more complex models, such as a time series model like SARIMA, which could further improve the prediction of hypoglycemic events and provide deeper insights into glucose level trends over time.
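
A minimal sketch of what that SARIMA extension could look like with statsmodels, assuming 5-minute CGM readings; the (1, 1, 1) orders and the 288-reading daily seasonal period are placeholders that would need tuning against real patient data.

```python
# Hedged SARIMA sketch; orders are placeholders to be tuned (e.g., by AIC).
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

glucose = pd.read_csv("cgm_user.csv", index_col="timestamp",
                      parse_dates=True)["glucose"]

model = SARIMAX(glucose,
                order=(1, 1, 1),                 # non-seasonal ARIMA terms
                seasonal_order=(1, 0, 1, 288))   # 288 five-minute readings/day
fit = model.fit(disp=False)
print(fit.forecast(steps=6))                     # glucose over the next ~30 min
```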

Technical Skills: Machine Learning, Logistic Regression

Tools: Python, React Native

Team: Mina Yoon, Aashi Singh, Sophia Zhang, Ryan Stofer

Specifics and files are not shared for confidentiality purposes.

Open Poster


2024 SoCal RUG Hackathon - Linguistic Isolation in California

(April 2024)

Overview: I participated in a 36-hour hackathon where my team and I analyzed IPUMS US Census data to extract meaningful insights. Given the vast amount of data and the limited time frame, we focused on the variable "linguistic isolation": a household in which no person aged 14 or older speaks only English at home, and no person aged 14 or older who speaks a language other than English at home speaks English "very well." This definition applies to both the U.S. and Puerto Rican censuses as well as the ACS and PRCS. Our goal was to analyze how linguistic isolation has trended across U.S. states over the years.
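
Although our analysis was done in R with arrow, the household-level flag can be sketched in Python as follows. The variable names (SERIAL, AGE, SPEAKENG) follow IPUMS conventions, but the specific SPEAKENG codes are assumptions to verify against the codebook.

```python
# Illustrative flagging of linguistically isolated households from
# person-level IPUMS microdata. SPEAKENG codes below are assumptions.
import pandas as pd

persons = pd.read_parquet("ipums_extract.parquet")   # one row per person

adults = persons[persons["AGE"] >= 14]
# assume 3 = "speaks only English", 4 = "speaks English very well"
proficient = adults["SPEAKENG"].isin([3, 4])

hh_has_proficient = proficient.groupby(adults["SERIAL"]).any()
isolated = ~hh_has_proficient          # True = linguistically isolated household
print(isolated.mean())                 # share of isolated households
```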

We concentrated on the top five and bottom four states by linguistic isolation, finding the most notable trends in California. We also performed geospatial analysis, plotting 2014 and 2022 data across IPUMS regions and highlighting a significant decline in linguistic isolation in the Southeast, Orange County/LA County, and the Bay Area.

[Images: SoCalRug_Presentation_1, SoCalRug_Presentation_2]

Despite the constraints of the hackathon, our project provided valuable insights into linguistic isolation trends in the U.S. The marked decline between 2014 and 2022 in the Southeast, Orange County/LA County, and the Bay Area suggests a potential improvement in English proficiency or demographic shifts in those areas.

We also attempted to predict linguistic isolation using a random forest model; although the model was not highly successful, it underscored the importance of occupation type as a key covariate. This insight points to the potential influence of employment sectors and economic opportunities on linguistic isolation.
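
A minimal sketch of that experiment (shown in Python for brevity; our work was in R). The covariate names are illustrative stand-ins; the point is reading the fitted forest's feature importances.

```python
# Hypothetical random-forest sketch with feature-importance inspection.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_parquet("households.parquet")                  # assumed prepared table
features = ["occupation_type", "state", "income", "year"]   # illustrative names
X = pd.get_dummies(df[features], columns=["occupation_type", "state"])
y = df["linguistically_isolated"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print("test accuracy:", rf.score(X_te, y_te))
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```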

Improvements: For future research, a deeper exploration into feature selection and model optimization could enhance the predictive power of our models. Investigating the underlying causes of the drastic decline in linguistic isolation in certain IPUMS areas over the eight-year interval would also be valuable. Applying an interrupted time series analysis could help identify events or policy changes that contributed to this decline, providing a more comprehensive understanding of the factors driving linguistic isolation trends.
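
As a sketch of the interrupted time series idea: regress the yearly isolation rate on time, a post-event indicator, and their interaction, so the coefficients capture level and slope changes relative to the pre-trend. The 2017 break year below is purely hypothetical.

```python
# Hypothetical interrupted-time-series regression via statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

ts = pd.read_csv("isolation_by_year.csv")      # assumed columns: year, rate
ts["t"] = ts["year"] - ts["year"].min()        # time since start of series
ts["post"] = (ts["year"] >= 2017).astype(int)  # hypothetical intervention year

# "post" estimates the level change; "t:post" the slope change after the break
model = smf.ols("rate ~ t + post + t:post", data=ts).fit()
print(model.summary())
```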

Technical Skills: Data Visualization

Tools: R, arrow

Team: Kabir Snell, Peyton Politewicz, Steven Yang, Ryan Stofer

Open Poster


Super Resolution of Land Surface Temperature (LST) Images

(May 2022 – September 2023)

Overview: I led a collaborative research project with a graduate student where we trained a U-Net-based convolutional neural network (CNN) to improve the resolution of coarse remote sensing data by utilizing high-resolution RGB imagery.

To overcome the challenge of acquiring extensive ground truth data for CNN training, we introduced an innovative pre-training procedure: we applied a randomized function to the RGB images, generating synthetic high-resolution data with varying relationships to the RGB bands. This allowed the model to learn from the abundant high-resolution RGB data before specializing in super-resolution tasks. We compared our deep learning approach against a pixel-based statistical downscaling method and observed an improvement of roughly 28% in R². By reducing the need for airborne ground truth data, this approach offers a practical way to enhance the resolution of coarse remote sensing data, especially when research demands exceed current resolution capabilities.
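
A minimal sketch of that pre-training setup; the tensor shapes, the 8x coarsening factor, and the linear band-mixing transform are illustrative assumptions rather than our exact implementation.

```python
# Sketch: manufacture synthetic high-res targets from RGB, then coarsen them
# to simulate the low-resolution sensor input the U-Net must super-resolve.
import torch
import torch.nn.functional as F

def synthetic_target(rgb: torch.Tensor) -> torch.Tensor:
    """rgb: (B, 3, H, W) -> synthetic single-band target (B, 1, H, W)."""
    weights = torch.rand(3, 1) * 2 - 1                  # random band mixing
    return torch.einsum("bchw,co->bohw", rgb, weights)

def coarsen(x: torch.Tensor, factor: int = 8) -> torch.Tensor:
    """Simulate a coarse sensor: average-pool, then upsample back."""
    low = F.avg_pool2d(x, factor)
    return F.interpolate(low, scale_factor=factor, mode="nearest")

rgb = torch.rand(4, 3, 256, 256)    # stand-in high-resolution RGB tiles
target = synthetic_target(rgb)      # synthetic "ground truth"
coarse = coarsen(target)            # coarse input paired with the RGB tiles
# the U-Net is then pre-trained to map (rgb, coarse) -> target
```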

I presented our work at the Fall 2022 American Geophysical Union conference in Chicago.

[Image: AGU Poster]

Improvements: I would introduce greater complexity into both the randomization function used in pre-training (e.g., non-linear transformations and added Gaussian noise) and the U-Net model itself (e.g., a Laplace filter and deeper layers). These modifications could further improve pixel unmixing, particularly at the boundaries of distinct semantic regions.

Technical Skills: Convolutional Neural Network, Deep Learning, PyTorch, TensorFlow

Tools: Python, R

Team: Anna Boser, Ryan Stofer

Open Poster | Open Repository


UCSB Capstone - Deep Learning X-ray Diffraction Model

(January 2023 – June 2023)

Overview: As part of the UCSB Capstone program, my group and I partnered with the Stanford Synchrotron Radiation Lightsource (SLAC SSRL), a division of SLAC National Accelerator Laboratory, to research and develop a deep learning model capable of classifying the resolution of individual X-ray diffraction shots.

We divided our project into two teams, each focused on a specific aspect of the multi-task learning model. One team developed a CNN for classifying images as single- or multi-lattice, while my team created a CNN to predict image resolution. Using a ResNet-based architecture, both teams achieved significant success in model performance: the multi-lattice detection CNN achieved 94% accuracy, and the resolution quantification CNN attained a 0.96 Pearson correlation on simulated data, with high qualitative performance observed on experimental data. This project demonstrates the potential of deep learning in addressing the data rate challenge in protein crystallography.
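
A minimal sketch of what the resolution-regression side could look like; the ResNet-18 backbone, single-channel input, and 224x224 shot size are assumptions for illustration, not the exact SSRL architecture.

```python
# Hypothetical ResNet-based regression head for per-shot resolution.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResolutionNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        # adapt the stem to single-channel diffraction images
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # regression head
        self.net = backbone

    def forward(self, x):
        return self.net(x).squeeze(1)      # predicted resolution per image

model = ResolutionNet()
shots = torch.rand(8, 1, 224, 224)         # batch of simulated diffraction shots
print(model(shots).shape)                  # torch.Size([8])
```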

[Image: Capstone_Poster]

Improvements: We would like to extend our work by testing the model on more real user data and assessing its performance on both tasks. We could also add more image artifacts to our simulated data sets to increase the variation in our training data.

Technical Skills: Convolutional Neural Network, Deep Learning, PyTorch, TensorFlow

Tools: Python, CUDA

Team: Aleksander Cichosz, Vardan Martirosyan, Teo Zeng, Ryan Stofer

Open Poster | Open Repository


National Hate Crime Model

(January 2022 - March 2022)

Overview: During my undergraduate machine learning course, my partner and I were assigned the task of developing a machine learning model on a topic of our choice. We chose to create a model capable of classifying offenders' races based on data from hate crimes recorded by the FBI Crime Data Explorer from 2010 to 2019. To achieve this, we explored a range of machine learning models, including Random Forest, Naive Bayes, Boosting, and Logistic Regression. We also created an HTML report that documented our thought processes, offering explanations for exploratory data analysis (EDA), data preprocessing, and model testing and evaluation.

Upon analyzing our results, a noteworthy pattern emerged: our Random Forest model exhibited exceptional accuracy when classifying White offenders but significantly higher errors when dealing with minority groups. This trend was consistent across all our models. The root cause lay in the imbalanced distribution of hate crime offender races, with over 75% being White offenders. Further investigation, including variable importance analysis, revealed that the description of the bias (the type of hate crime offense) held the most significant influence in determining a hate crime offender's race. Ultimately, our best-performing models were Random Forest and Boosting, both with an error rate of approximately 23%.

In conclusion, our project demonstrated that predicting an offender's race using various crime features is feasible. However, it may be inherently biased due to the pronounced disparity in the distribution of crimes among different offender races.

[Images: HC_1, HC_2]

Improvements: Given the uneven distribution of offender races in our dataset, we would like to enhance our pipeline to mitigate the imbalance, for example through oversampling and data augmentation, stratified data sampling, and more stringent regularization.
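
A sketch of the kind of imbalance-aware pipeline we had in mind (in Python for brevity; the project itself used R), combining a stratified split with class reweighting. The file and column names are placeholders.

```python
# Hypothetical imbalance-aware training: stratified split + class reweighting.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("hate_crimes.csv")                   # placeholder file/columns
X = pd.get_dummies(df.drop(columns=["offender_race"]))
y = df["offender_race"]

# stratification preserves the skewed race distribution in both folds
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

# class_weight="balanced" upweights minority classes instead of oversampling
rf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                            random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, rf.predict(X_te)))
```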

Technical Skills: Machine Learning, Random Forest, Naive Bayes, Boosting, Logistic Regression

Tools: R, R Markdown

Team: Mitchell Rapaport, Ryan Stofer

Open Report | Open Rmd
