# Regression Analysis of C-Peptide Dependence on Age and Base Deficit

## INTRO AND SUMMARY GOES HERE

### Question: 
Is there a relationship between serum C-peptide concentration and other factors (patient age and base deficit) that could help predict insulin-dependent diabetes mellitus in children?
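One way to make the question testable is a multiple linear regression of the stored log C-peptide value $y_i$ on the two predictors. This is a sketch that assumes a linear relationship. That is a modeling choice made here, not something the data set prescribes:

$$
y_i = \beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{Deficit}_i + \varepsilon_i,
\qquad \mathbb{E}[\varepsilon_i] = 0
$$

A slope $\beta_1$ or $\beta_2$ that differs clearly from zero would suggest that C-peptide varies with that factor while the other factor is held fixed.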

### Data Source:
KEEL Diabetes Data Set
KEEL Diabetes Dataset (By KEEL). (n.d.). [Dataset]. https://sci2s.ugr.es/keel/dataset.php?cod=45

### Data Structure:

`Age`: Patient age (years), domain: [0.9, 15.6]

`Deficit`: Base deficit, a measure of acidity, domain: [-29.0, -0.2]

`C_peptide`: Logarithm of serum C-peptide concentration (pmol/ml), domain: [3.0, 6.6]
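You can check whether the downloaded data matches these documented domains by counting out-of-range values in each column. The sketch below is one possible check, not part of the original notebook. `DOMAINS` and `out_of_domain` are hypothetical names, and the check assumes the columns are named `Age`, `Deficit`, and `C_peptide`, as in the parsing code further down:

```python
import pandas as pd

# Documented domains from the data description (inclusive bounds)
DOMAINS = {
    "Age": (0.9, 15.6),
    "Deficit": (-29.0, -0.2),
    "C_peptide": (3.0, 6.6),
}


def out_of_domain(df: pd.DataFrame) -> dict:
    """Count values in each column that fall outside its documented domain."""
    counts = {}
    for col, (lo, hi) in DOMAINS.items():
        # Non-numeric entries become NaN, and NaN fails between(), so it counts as out of domain
        values = pd.to_numeric(df[col], errors="coerce")
        # between() includes both bounds by default
        counts[col] = int((~values.between(lo, hi)).sum())
    return counts
```

If the file agrees with the description, `out_of_domain(diabetes_df)` should report zero for every column.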

### Imports

In [None]:
import pandas as pd
import requests
import zipfile
import os
import io

### Download Data into data/ directory
This code was adapted from Microsoft Copilot's response to the prompt:
'Use Python to download a data file from a download link into a directory, as a csv'

In [2]:
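url = "https://sci2s.ugr.es/keel/dataset/data/regression/diabetes.zip"

# Create data/ if it does not exist yet; open() below fails otherwise
os.makedirs("data", exist_ok=True)
file_path = os.path.join("data", "diabetes.zip")

# Download the archive, raising on an HTTP error instead of saving an error page
response = requests.get(url, timeout=30)
response.raise_for_status()

# Keep an in-memory copy for extraction and save the raw archive to disk
zip_bytes = io.BytesIO(response.content)
with open(file_path, "wb") as f:
    f.write(response.content)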
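Re-running the download cell fetches the archive again every time. The sketch below is a cached alternative. It uses a hypothetical helper, `fetch_cached`, that is not part of the original notebook. The helper reads the saved copy when one exists and downloads the file otherwise:

```python
import os


def fetch_cached(url: str, path: str) -> bytes:
    """Return the bytes at `url`, reusing a copy saved at `path` if one exists."""
    if os.path.exists(path):
        # Reuse the archive saved by an earlier run
        with open(path, "rb") as f:
            return f.read()

    import requests  # only needed when a download actually happens

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Create the parent directory (e.g. data/) before writing the file
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        f.write(response.content)
    return response.content
```

Usage: `zip_bytes = io.BytesIO(fetch_cached(url, file_path))`. The `requests` import sits inside the function, so a cached run needs neither the network nor the package. A top-level import would work just as well in this notebook, which already imports `requests`.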


In [3]:
# Read the first .dat file in the archive and decode it as text
with zipfile.ZipFile(zip_bytes, "r") as zip_ref:
    dat_files = [f for f in zip_ref.namelist() if f.endswith(".dat")]
    dat_content = zip_ref.read(dat_files[0]).decode("utf-8")

In [4]:
# Drop KEEL header lines (they start with "@") and blank lines, keeping only data rows
lines = dat_content.splitlines()
data_lines = [line for line in lines if not line.startswith("@") and line.strip()]
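The `.dat` header lines also declare the column names, so you can recover them instead of hard-coding them. The sketch below assumes that KEEL headers list each column as `@attribute <name> <type> [min, max]`. That layout is an assumption worth checking against `lines`. `attribute_names` is a hypothetical helper:

```python
def attribute_names(lines: list) -> list:
    """Return column names from '@attribute <name> <type> ...' header lines."""
    names = []
    for line in lines:
        parts = line.split()
        # Match the keyword case-insensitively (an assumption about the header format)
        if len(parts) >= 2 and parts[0].lower() == "@attribute":
            names.append(parts[1])
    return names
```

Usage: `attribute_names(lines)` should return the three column names if the header follows this layout. Compare the result with the hard-coded list before relying on it.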

In [5]:
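# Split each row on commas, trimming stray whitespace around each value
rows = [[value.strip() for value in line.split(",")] for line in data_lines]

# Name the columns and convert the parsed strings to floats
diabetes_df = pd.DataFrame(rows, columns=["Age", "Deficit", "C_peptide"]).astype(float)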
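pandas can also parse the data rows directly. The sketch below uses a hypothetical helper, `dat_to_frame`. It passes the filtered lines to `pd.read_csv`, which infers numeric dtypes instead of leaving every value as a string:

```python
import io

import pandas as pd


def dat_to_frame(dat_content: str, names: list) -> pd.DataFrame:
    """Parse the comma-separated body of a KEEL .dat file into a numeric DataFrame."""
    # Keep only data rows: drop '@' header lines and blank lines
    body = "\n".join(
        line for line in dat_content.splitlines()
        if line.strip() and not line.startswith("@")
    )
    # skipinitialspace tolerates a space after each comma; read_csv infers float dtypes
    return pd.read_csv(io.StringIO(body), header=None, names=names, skipinitialspace=True)
```

Usage: `diabetes_df = dat_to_frame(dat_content, ["Age", "Deficit", "C_peptide"])`.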

In [6]:
# Save the parsed table as data/diabetes.csv
csv_path = os.path.join("data", "diabetes.csv")
diabetes_df.to_csv(csv_path, index=False)
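Once the data is saved, a first pass at the question could fit a linear model of `C_peptide` on `Age` and `Deficit`. The sketch below assumes a linear relationship and estimates it by ordinary least squares via `numpy.linalg.lstsq`. `fit_ols` is a hypothetical helper, and it returns only point estimates. A library such as statsmodels would also report standard errors and p-values:

```python
import numpy as np
import pandas as pd


def fit_ols(df: pd.DataFrame, response: str = "C_peptide",
            predictors=("Age", "Deficit")) -> pd.Series:
    """Estimate the intercept and slopes of a linear model by ordinary least squares."""
    X = df[list(predictors)].to_numpy(dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    X = np.column_stack([np.ones(len(X)), X])
    y = df[response].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["Intercept", *predictors])
```

Usage: `fit_ols(diabetes_df)` returns the intercept and the two slopes as a labeled Series.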