# Visualizing Diabetes Data


Powered by `pandas` plotting and `matplotlib`


## Imports


In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

----

## About the Data

This dataset contains health metrics and demographic information for individuals, primarily used for studying diabetes and related health conditions. It includes various blood parameters, such as cholesterol, glucose, and glycated hemoglobin levels, along with physical characteristics like height, weight, and body frame. The data is collected from individuals in two locations, Buckingham and Louisa, and includes demographic details such as age and gender. This dataset can help analyze the relationship between different health metrics and demographic factors, aiding in diabetes research and understanding its impact on different populations.

In [None]:
data_frame = pd.read_csv("diabetes.csv")
# Drop rows with missing values
data_frame = data_frame.dropna()
data_frame.head(5)

## Plotting the Data
### Cholesterol Levels vs Weight

This code visualizes the distribution of cholesterol levels and weight within the dataset using density plots. A density plot provides a smooth estimate of the data distribution, highlighting where values are concentrated over a continuous interval.

In [None]:
# Overlay both densities on one set of axes; legend=True labels each curve
data_frame["chol"].plot.density(legend=True)
data_frame["weight"].plot.density(legend=True)
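
To make concrete what a density plot estimates, here is a minimal sketch of a Gaussian kernel density estimate written with NumPy. It is an illustration only: the sample is synthetic rather than read from `diabetes.csv`, the function name `gaussian_kde_sketch` is hypothetical, and the bandwidth of 15 is an arbitrary assumption (the pandas density plot chooses its own bandwidth).

```python
import numpy as np

def gaussian_kde_sketch(values, grid, bandwidth):
    """Average one Gaussian bump per observation, evaluated on `grid`."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance of every grid point from every observation, in bandwidth units
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # The density at each grid point is the mean of all kernel contributions
    return kernels.mean(axis=1)

# Synthetic cholesterol-like values (assumption: NOT taken from diabetes.csv)
rng = np.random.default_rng(0)
sample = rng.normal(loc=200, scale=40, size=300)

grid = np.linspace(0, 400, 801)
density = gaussian_kde_sketch(sample, grid, bandwidth=15.0)

# A valid density should enclose an area of roughly 1
area = density.sum() * (grid[1] - grid[0])
print(f"Approximate area under the curve: {area:.3f}")
```

Each observation contributes one small bell curve, and the plotted density is their average. A larger bandwidth gives a smoother curve, a smaller one follows the individual observations more closely.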

### Cholesterol Levels and Age


A view of the data from a new angle: comparing cholesterol levels to age to see how they match up. As we can see, most of our subjects' records show cholesterol levels within the 150-250 range, with some outliers.

In [None]:
# Hexbin plot for cholesterol levels and age
data_frame.plot(kind='hexbin', x='chol', y='age', gridsize=20, title='Cholesterol Levels and Age')
plt.xlabel('Cholesterol Level')
plt.ylabel('Age')
plt.show()
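
The "150-250 range" reading of the plot can also be checked numerically. The sketch below is one hedged way to do it: `share_in_range` is a hypothetical helper, and the five-row `toy` frame is made up for illustration rather than drawn from the dataset. On the real data, the same call could be applied to `data_frame["chol"]`.

```python
import pandas as pd

def share_in_range(series, low, high):
    """Fraction of non-missing values with low <= value <= high."""
    values = series.dropna()
    # between() is inclusive on both ends by default; the mean of the
    # resulting booleans is the share of values inside the range
    return values.between(low, high).mean()

# Tiny made-up frame standing in for the real data (assumption)
toy = pd.DataFrame({"chol": [140, 160, 200, 240, 300]})

share = share_in_range(toy["chol"], 150, 250)
print(f"{share:.0%} of toy records fall in the 150-250 range")  # prints 60%
```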

### Cholesterol and Weight

A hexbin plot uses a Cartesian coordinate system in which the X and Y axes are separate variables, here cholesterol and weight.

Instead of drawing every point, the plane is divided into a grid of hexagonal cells, and each cell is colored by the number of records that fall inside it. The color scale therefore shows which combinations of values occur most often.

It is used to compare two variables when individual points would overlap, and the overall picture can be analysed for patterns or correlations.

In [None]:
data_frame.plot(kind='hexbin', x='chol', y='weight',
        gridsize=20,
        title="Cholesterol and Weight")
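
The counting behind a hexbin plot can be sketched without any plotting. The example below is an approximation under stated assumptions: it uses rectangular cells from `np.histogram2d` instead of hexagons, and the `chol` and `weight` arrays are synthetic, not the columns of `diabetes.csv`. The `bins=20` argument plays a role similar to `gridsize=20`.

```python
import numpy as np

# Synthetic stand-ins for the "chol" and "weight" columns (NOT the real data)
rng = np.random.default_rng(1)
chol = rng.normal(200, 40, size=500)
weight = rng.normal(175, 35, size=500)

# Count how many (chol, weight) pairs land in each cell of a 20 x 20 grid
counts, chol_edges, weight_edges = np.histogram2d(chol, weight, bins=20)

# Locate the most populated cell and report its value ranges
i, j = np.unravel_index(counts.argmax(), counts.shape)
print(f"Busiest cell holds {int(counts[i, j])} of {len(chol)} points")
print(f"chol in [{chol_edges[i]:.0f}, {chol_edges[i + 1]:.0f}], "
      f"weight in [{weight_edges[j]:.0f}, {weight_edges[j + 1]:.0f}]")
```

The plot then maps each cell's count to a color, which is why dense regions stand out even when many points overlap.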

### Weight and Age

Initially, density plots suggest that weight influences cholesterol levels. However, hexbin plots for both weight vs. cholesterol and age vs. cholesterol appear similar. This raises the question: does age influence cholesterol levels?

To explore this, we will examine the relationship between age and weight to determine if age significantly affects weight, which might in turn influence cholesterol levels.

In [None]:
data_frame.plot(kind='scatter', x='weight', y='age', 
        title="Weight and Age")
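
The visual impression from a scatter plot can be cross-checked with a correlation coefficient. The following is a minimal sketch, not part of the original analysis: `correlation_table` is a hypothetical helper, and the `toy` values are invented for illustration. On the real data, the same call could be made on `data_frame` with the columns `age`, `weight`, and `chol`.

```python
import pandas as pd

def correlation_table(frame, columns):
    """Pairwise Pearson correlations for the chosen numeric columns."""
    return frame[columns].corr(method="pearson")

# Invented values for illustration only (assumption: NOT from diabetes.csv)
toy = pd.DataFrame({
    "age":    [25, 35, 45, 55, 65],
    "weight": [150, 190, 160, 200, 170],
    "chol":   [180, 210, 190, 230, 205],
})

table = correlation_table(toy, ["age", "weight", "chol"])
print(table.round(2))
```

Values near 0 indicate little linear association, while values near 1 or -1 indicate a strong one. Pearson correlation only measures linear relationships, so it complements the plot rather than replacing it.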

## Conclusion



Based on our analysis, the scatter plot of age versus weight is widely dispersed, suggesting no strong correlation between the two variables. This suggests that age does not influence weight in a meaningful way. Consequently, it is unlikely that age affects cholesterol levels through weight. Therefore, the initial impression that weight alone influences cholesterol levels remains valid, and age does not appear to be a contributing factor.