# Diagnosing Lipohypertrophy 

### MDS Capstone Project with the Gerontology Diabetes Research Lab (GDRL)

### Proposal Report Prepared by: Ela Bandari, Peter Yang, Lara Habashy and Javairia Raza

### Mentor: Tomas Beuzen

### Capstone Partner: Dr. Ken Madden 

<br>

#### 3.1. Executive Summary

A brief and high level summary of the project proposal.

<br>

#### 3.2. Introduction

**The introduction should start broad, introducing the question being asked/problem needing solving and why it is important. Any relevant information to understand the question/problem and its importance should be included. The proposal should communicate the question/problem in your own words. This is important to show your partner and mentor that you understand the question/problem.**

Lipohypertrophy is a common complication for diabetic patients who inject insulin. It is a mass of fat cells and fibrous tissue with lowered vascularity that forms in the skin following repeated trauma from insulin injections in the same area. Our focus for this proposal is on subclinical lipohypertrophy, which forms in the subcutaneous layer, the deepest layer of the skin. It is critical that insulin is not injected into areas of lipohypertrophy, as doing so reduces the effectiveness of the insulin: patients may be unable to manage their blood sugar levels and may require more insulin to achieve the same therapeutic benefit. Fortunately, research has found that ultrasound imaging can detect these masses much better than a physical examination by a healthcare professional. Currently, however, determining whether a lipohypertrophy mass is present in an ultrasound image requires a trained eye, a skill held by only a small group of physicians. Our capstone partner, Dr. Ken Madden from the Gerontology Diabetes Research Laboratory (GDRL), asked the MDS team whether we could leverage supervised machine learning techniques to accurately classify the presence of lipohypertrophy in an ultrasound image.

**Next, you should refine the big-picture problem into tangible objectives that are directly addressable by data science techniques.**

Having spent some time understanding the problem, we have set as our main objective to build a convolutional neural network (CNN) for binary classification that assigns an ultrasound image to one of two classes: lipohypertrophy present or lipohypertrophy not present. Furthermore, we would like to use object detection techniques to locate, for images classified as positive, the exact area of the lipohypertrophy on the ultrasound image.

**Finally, describe the final data product to be delivered to the partner. Example components of this product might include (but are not limited to) one or more of the following:**

- A Shiny or Dash app;
- A Python or R package;
- A data pipeline;
- Documentation;
- Etc.



<br>

#### 3.3. Data Science Techniques

**Describe how you will use data science techniques in the project. Be sure to discuss the appropriateness of the data for the proposed data science techniques, as well as difficulties the data might pose. You should include a description of the data (variables/features and observational units) and some examples/snippets of what the data looks like (as a table or a visualization).**


[insert images from Big Question slide here]


Be sure to always start with simple data science techniques to obtain a simple version of your data science product. There are two benefits to this approach. First, the simple method gives you a baseline to which you can compare future results. Second, the simple method may solve the problem, in which case you don't need something more complicated. For example, if dealing with time series data, your first model might be an ARIMA model; it should not be an LSTM.

[pre-process]

The pre-processed data is then fed into a CNN architecture that uses the pre-trained model densemodels, chosen for its popularity in deep learning applications. Implementing and training our own CNN from scratch would be difficult and would likely yield poor results, whereas a pre-trained model lets us leverage a network that has already been trained on thousands of images to detect particular features. The densemodels CNN will be used as our baseline model.

Our research also showed that many other popular deep learning architectures have proven successful across a wide range of problems. As a second approach, we plan to use the VGG architecture, proposed by Karen Simonyan and Andrew Zisserman in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The VGG architecture has proven successful in applications with small images and combats the large memory requirement of the densemodels architecture. However, because a pre-trained VGG model is slow to train due to its large number of learned weights, we also plan to explore a further architecture such as Inception, proposed in the paper “Going Deeper with Convolutions”. We plan to model several other architectures as well, so that their performance can be evaluated and compared to that of the baseline model.

To evaluate the performance of each CNN model, we will use both accuracy and F1 score as our evaluation metrics. We use the F1 score, the harmonic mean of precision and recall, because both precision and recall are important from the healthcare provider's perspective. We would imagine that bedside nurses administering insulin injections want to reduce the number of false negatives, that is, cases in which the model predicts there is no lipohypertrophy when there is. The number of false positives is equally important, as it counts the cases in which the model detects lipohypertrophy when none is actually present.
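To make the relationship between these metrics concrete, here is a small sketch that computes accuracy, precision, recall and F1 score from binary labels in plain Python. The function name `binary_classification_metrics` and the example labels are hypothetical and exist only for illustration; the label `1` is assumed to mean lipohypertrophy present.

```python
from typing import Dict, Sequence


def binary_classification_metrics(y_true: Sequence[int],
                                  y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels: one false negative and one false positive.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]
    print(binary_classification_metrics(truth, predicted))
```

Because F1 drops whenever either precision or recall is low, it penalizes both false positives and false negatives, which matches the reasoning above for reporting it alongside accuracy.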

[talk about validation splits here]
[feature importance]



Difficulty data might pose: imbalance? 

Fix my grammar 
Check italics 



**Peters' section**

<br>

#### 3.4. Timeline

Indicate a rough timeline of the project, including the milestones you hope to achieve.

<br>

#### Conclusion 

insert conclusion here

<br>

#### References 

insert references here 