https://archive.ics.uci.edu/dataset/329/diabetic+retinopathy+debrecen

## Alex Khvatov Capstone project

_This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy or not._

In [18]:
!cd data; wget https://archive.ics.uci.edu/static/public/329/diabetic+retinopathy+debrecen.zip -O diabetic+retinopathy+debrecen.zip

--2025-01-03 21:26:08--  https://archive.ics.uci.edu/static/public/329/diabetic+retinopathy+debrecen.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘diabetic+retinopathy+debrecen.zip’

diabetic+retinopath     [ <=>                ]  46.52K  --.-KB/s    in 0.1s    

2025-01-03 21:26:09 (321 KB/s) - ‘diabetic+retinopathy+debrecen.zip’ saved [47634]



In [21]:
!cd data; unzip -o diabetic+retinopathy+debrecen.zip

Archive:  diabetic+retinopathy+debrecen.zip
  inflating: messidor_features.arff  


In [22]:
!cd data; rm diabetic+retinopathy+debrecen.zip

In [3]:
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns

pd.options.mode.copy_on_write = True

In [27]:
columns = [
    'quality',
    'pre_screening',
    'ma1',
    'ma2',
    'ma3',
    'ma4',
    'ma5',
    'ma6',
    'exudate1',
    'exudate2',
    'exudate3',
    'exudate4',
    'exudate5',
    'exudate6',
    'exudate7',
    'exudate8',
    'macula_opticdisc_distance',
    'opticdisc_diameter',
    'am_fm_classification',
    'class'
]

In [28]:
path_to_data = Path("./data/messidor_features.arff").resolve()

### Variables Table


| Variable Name | Role | Type | Description | Units | Missing Values |
|---------------|------|------|-------------|-------|----------------|
| quality | Feature | Binary | The binary result of quality assessment. 0 = bad quality, 1 = sufficient quality. | | no |
| pre_screening | Feature | Binary | The binary result of pre-screening, where 1 indicates severe retinal abnormality and 0 its lack. | | no |
| ma1 | Feature | Integer | ma1 - ma6 contain the results of MA detection. Each feature value stands for the number of MAs found at the confidence levels alpha = 0.5, ..., 1, respectively. | | no |
| ma2 | Feature | Integer | | | no |
| ma3 | Feature | Integer | | | no |
| ma4 | Feature | Integer | | | no |
| ma5 | Feature | Integer | | | no |
| ma6 | Feature | Integer | | | no |
| exudate1 | Feature | Continuous | exudate1 - exudate8 contain the same information as ma1 - ma6, but for exudates. However, as exudates are represented by a set of points rather than the number of pixels constructing the lesions, these features are normalized by dividing the number of lesions by the diameter of the ROI to compensate for different image sizes. | | no |
| exudate2 | Feature | Continuous | | | no |
| exudate3 | Feature | Continuous | | | no |
| exudate4 | Feature | Continuous | | | no |
| exudate5 | Feature | Continuous | | | no |
| exudate6 | Feature | Continuous | | | no |
| exudate7 | Feature | Continuous | | | no |
| exudate8 | Feature | Continuous | | | no |
| macula_opticdisc_distance | Feature | Continuous | The Euclidean distance between the center of the macula and the center of the optic disc, which provides important information regarding the patient's condition. This feature is also normalized by the diameter of the ROI. | | no |
| opticdisc_diameter | Feature | Continuous | The diameter of the optic disc. | | no |
| am_fm_classification | Feature | Binary | The binary result of the AM/FM-based classification. | | no |
| class | Target | Binary | Class label. 1 = contains signs of DR (accumulative label for the Messidor classes 1, 2, 3), 0 = no signs of DR. | | no |


In [29]:
# Skip the first 24 lines (the ARFF header) and name the columns ourselves
df = pd.read_csv(path_to_data, skiprows=24, names=columns)
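
The hard-coded `skiprows=24` depends on the exact length of the ARFF header. As a hedged alternative, the header length could be derived by scanning for the `@data` marker. The sketch below uses only the standard library; `count_arff_header_lines` and `sample_arff` are hypothetical names, and the snippet is a made-up toy, not the Messidor file.

```python
import csv
import io


def count_arff_header_lines(text):
    """Return the number of lines up to and including the @data marker."""
    for i, line in enumerate(text.splitlines()):
        # ARFF keywords are case-insensitive, so compare in lower case.
        if line.strip().lower() == "@data":
            return i + 1
    raise ValueError("no @data marker found")


# Made-up ARFF snippet for illustration only.
sample_arff = """@relation toy
@attribute a numeric
@attribute b numeric
@data
1,0.5
0,1.5
"""

n_header = count_arff_header_lines(sample_arff)

# Parse only the rows that follow the header.
data_lines = sample_arff.splitlines()[n_header:]
rows = list(csv.reader(io.StringIO("\n".join(data_lines))))
```

In the notebook, the same count could be computed from the text of `path_to_data` and passed as `skiprows` to `pd.read_csv` in place of the literal 24.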

In [32]:
df.dtypes

quality                        int64
pre_screening                  int64
ma1                            int64
ma2                            int64
ma3                            int64
ma4                            int64
ma5                            int64
ma6                            int64
exudate1                     float64
exudate2                     float64
exudate3                     float64
exudate4                     float64
exudate5                     float64
exudate6                     float64
exudate7                     float64
exudate8                     float64
macula_opticdisc_distance    float64
opticdisc_diameter           float64
am_fm_classification           int64
class                          int64
dtype: object
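
The dtypes agree with the Variables Table (integer counts and flags, float measurements). As a further sanity check, one could confirm that the columns listed as Binary contain only 0 and 1 and that no values are missing. This is a minimal sketch on a made-up toy frame; `check_binary_and_missing` and `BINARY_COLUMNS` are hypothetical names, and the same call could be applied to `df`.

```python
import pandas as pd

# Columns the Variables Table marks as Binary.
BINARY_COLUMNS = ["quality", "pre_screening", "am_fm_classification", "class"]


def check_binary_and_missing(frame, binary_columns):
    """Report the total number of missing values and any non-binary columns."""
    return {
        "n_missing": int(frame.isna().sum().sum()),
        "non_binary": [
            col for col in binary_columns
            if not frame[col].isin([0, 1]).all()
        ],
    }


# Toy data for illustration; one column is deliberately invalid.
toy = pd.DataFrame({
    "quality": [1, 1, 0],
    "pre_screening": [1, 0, 1],
    "am_fm_classification": [0, 1, 2],  # 2 is not a valid binary value
    "class": [0, 1, 1],
})

report = check_binary_and_missing(toy, BINARY_COLUMNS)
```

An empty `non_binary` list and `n_missing == 0` on the real data would be consistent with the table's "Binary" and "Missing Values: no" entries.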