### Fertility Data Set
https://archive.ics.uci.edu/ml/datasets/Fertility

| Field Name            | Order | Type (Format)     | Description                                                                                                                                    |
|-----------------------|-------|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| season                | 1     | number (default)  | Season in which the analysis was performed: 1) winter, 2) spring, 3) summer, 4) fall. (-1, -0.33, 0.33, 1)                                    |
| age                   | 2     | number (default)  | Age at the time of analysis. 18-36 (0, 1)                                                                                                      |
| childish-disease      | 3     | integer (default) | Childhood diseases (i.e., chickenpox, measles, mumps, polio): 1) yes, 2) no. (0, 1)                                                            |
| trauma                | 4     | integer (default) | Accident or serious trauma 1) yes, 2) no. (0, 1)                                                                                               |
| surgical-intervention | 5     | integer (default) | Surgical intervention 1) yes, 2) no. (0, 1)                                                                                                    |
| fevers                | 6     | integer (default) | High fevers in the last year 1) less than three months ago, 2) more than three months ago, 3) no. (-1, 0, 1)                                   |
| alcoholic             | 7     | number (default)  | Frequency of alcohol consumption 1) several times a day, 2) every day, 3) several times a week, 4) once a week, 5) hardly ever or never (0, 1) |
| smoking               | 8     | integer (default) | Smoking habit: 1) never, 2) occasional, 3) daily. (-1, 0, 1)                                                                                   |
| sitting               | 9     | number (default)  | Number of hours spent sitting per day ene-16 (0, 1)                                                                                            |
| output                | 10    | string (default)  | Output: Diagnosis normal (N), altered (O)                                                                                                      |
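
The encoded categorical columns can be decoded back to readable categories with small lookup tables. This is an illustrative sketch only: the dictionary names (`SEASON`, `FEVERS`, `SMOKING`) and the helpers `decode_season` and `decode_row` are hypothetical, and the code assumes that the categories listed in each description map to the bracketed values in the same order (for example, smoking: never = -1, occasional = 0, daily = 1).

```python
# Hypothetical lookup tables. Assumption: the categories in the table map
# to the bracketed encoded values in the order they are listed.
SEASON = {-1.0: "winter", -0.33: "spring", 0.33: "summer", 1.0: "fall"}
FEVERS = {-1: "less than three months ago", 0: "more than three months ago", 1: "no"}
SMOKING = {-1: "never", 0: "occasional", 1: "daily"}


def decode_season(value: float) -> str:
    """Return the season name, matching the encoded value to 2 decimals."""
    return SEASON[round(value, 2)]


def decode_row(row: list) -> dict:
    """Decode the season (col 0), fevers (col 5) and smoking (col 7) fields."""
    return {
        "season": decode_season(row[0]),
        "fevers": FEVERS[int(row[5])],
        "smoking": SMOKING[int(row[7])],
    }


# First record of the data set, as it appears in df1 below.
first = [-0.33, 0.69, 0, 1, 1, 0, 0.8, 0, 0.88, 0]
print(decode_row(first))
```

Rounding before the lookup avoids relying on exact float equality if the values have passed through arithmetic.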

```python
%matplotlib inline

import matplotlib
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
```

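```python
# The file has no header row, so pandas labels the columns 0-9.
df = pd.read_csv("fertility_Diagnosis.csv", header=None)
df = df.replace(to_replace="?", value=0)  # treat "?" placeholders as 0

def to_numeric_if_possible(col):
    # Same effect as the deprecated pd.to_numeric(col, errors="ignore").
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col

df = df.apply(to_numeric_if_possible)

# Label-encode the remaining text columns (here only the N/O diagnosis).
le = LabelEncoder()
for label, content in df.items():  # DataFrame.iteritems() was removed in pandas 2.0
    if content.dtype == "object":
        df[label] = le.fit_transform(content)

# NaNs produced by coercion are kept; the binning step skips them when computing edges.
df1 = df.apply(pd.to_numeric, errors="coerce")
# Float columns are treated as continuous.
continuous_index = df1.dtypes[df1.dtypes == "float64"].index.values.tolist()
```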
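
`LabelEncoder` assigns integer codes in sorted order of the distinct labels, which is why the diagnosis column ends up with N encoded as 0 and O as 1 in `df1`. The same mapping can be reproduced with NumPy alone, which is handy for checking the encoding without scikit-learn. This is a sketch using `np.unique`, not the encoder itself, and the `labels` array is a made-up sample.

```python
import numpy as np

# Made-up sample of diagnosis labels.
labels = np.array(["N", "O", "N", "N", "O"])

# np.unique returns the sorted distinct labels; return_inverse gives, for each
# input value, its position in that sorted list, i.e. its integer code.
classes, codes = np.unique(labels, return_inverse=True)
print(classes.tolist())  # ['N', 'O']
print(codes.tolist())    # [0, 1, 0, 0, 1]
```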

```python
df.dtypes
```

```
0    float64
1    float64
2      int64
3      int64
4      int64
5      int64
6    float64
7      int64
8    float64
9      int64
dtype: object
```

```python
continuous_index
```

```
[0, 1, 6, 8]
```

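```python
df1
```

|     | 0     | 1    | 2   | 3   | 4   | 5   | 6   | 7   | 8    | 9   |
|-----|-------|------|-----|-----|-----|-----|-----|-----|------|-----|
| 0   | -0.33 | 0.69 | 0   | 1   | 1   | 0   | 0.8 | 0   | 0.88 | 0   |
| 1   | -0.33 | 0.94 | 1   | 0   | 1   | 0   | 0.8 | 1   | 0.31 | 1   |
| 2   | -0.33 | 0.50 | 1   | 0   | 0   | 0   | 1.0 | -1  | 0.50 | 0   |
| 3   | -0.33 | 0.75 | 0   | 1   | 1   | 0   | 1.0 | -1  | 0.38 | 0   |
| 4   | -0.33 | 0.67 | 1   | 1   | 0   | 0   | 0.8 | -1  | 0.50 | 1   |
| ... | ...   | ...  | ... | ... | ... | ... | ... | ... | ...  | ... |
| 95  | -1.00 | 0.67 | 1   | 0   | 0   | 0   | 1.0 | -1  | 0.50 | 0   |
| 96  | -1.00 | 0.61 | 1   | 0   | 0   | 0   | 0.8 | 0   | 0.50 | 0   |
| 97  | -1.00 | 0.67 | 1   | 1   | 1   | 0   | 1.0 | -1  | 0.31 | 0   |
| 98  | -1.00 | 0.64 | 1   | 0   | 1   | 0   | 1.0 | 0   | 0.19 | 0   |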


```python
# Discretize each continuous column into equal-width bins; copy the rest unchanged.
df2 = pd.DataFrame()
for i in df1.columns:
    if i in continuous_index:
        npa = df1[i].to_numpy()
        # 10 equal-width bins (11 edges) computed from the non-NaN values only.
        bins = np.histogram_bin_edges(npa[~np.isnan(npa)])
        # Bin number of each value; the column maximum gets len(bins) = 11.
        df2.insert(loc=i, column=i, value=np.digitize(npa, bins))
    else:
        df2.insert(loc=i, column=i, value=df1[i])
df2
```

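|     | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0   | 4   | 4   | 0   | 1   | 1   | 0   | 8   | 0   | 9   | 0   |
| 1   | 4   | 9   | 1   | 0   | 1   | 0   | 8   | 1   | 3   | 1   |
| 2   | 4   | 1   | 1   | 0   | 0   | 0   | 11  | -1  | 5   | 0   |
| 3   | 4   | 6   | 0   | 1   | 1   | 0   | 11  | -1  | 4   | 0   |
| 4   | 4   | 4   | 1   | 1   | 0   | 0   | 8   | -1  | 5   | 1   |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95  | 1   | 4   | 1   | 0   | 0   | 0   | 11  | -1  | 5   | 0   |
| 96  | 1   | 3   | 1   | 0   | 0   | 0   | 8   | 0   | 5   | 0   |
| 97  | 1   | 4   | 1   | 1   | 1   | 0   | 11  | -1  | 3   | 0   |
| 98  | 1   | 3   | 1   | 0   | 1   | 0   | 11  | 0   | 2   | 0   |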

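The discretized codes follow from how `np.histogram_bin_edges` and `np.digitize` interact. With the default `bins=10`, the edges are 11 evenly spaced points from the column minimum to its maximum, and `np.digitize` returns `i` such that `bins[i-1] <= x < bins[i]`. A value equal to the last edge therefore gets code 11 rather than 10, which is why 11 appears in column 6. The sketch below uses one of each encoded season value as a small sample; the final clipping step is an optional alternative, not part of the original pipeline.

```python
import numpy as np

# One of each encoded season value from the table above.
season = np.array([-1.0, -0.33, 0.33, 1.0])

# Default bins=10 gives 11 equal-width edges from min (-1.0) to max (1.0).
edges = np.histogram_bin_edges(season)
codes = np.digitize(season, edges)
print(len(edges))      # 11
print(codes.tolist())  # [1, 4, 7, 11]

# Optional: fold the maximum into the last regular bin so codes run 1..10.
clipped = np.minimum(codes, len(edges) - 1)
print(clipped.tolist())  # [1, 4, 7, 10]
```

The codes 4 (for -0.33) and 1 (for -1.00) match the first column of `df2`, assuming the full season column spans -1 to 1.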

```python
# Name the features X0..X8 and the target (diagnosis) Y1, then save.
header_names = [f"X{i}" for i in range(len(df2.columns) - 1)] + ["Y1"]
df2.to_csv("fertility_Diagnosis-discretized.csv", index=False, header=header_names)
```
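
A minimal sketch of reading the saved file back and separating the features from the target, using only the standard-library `csv` module (an assumption; `pd.read_csv` would work just as well). The inline `sample` string stands in for `fertility_Diagnosis-discretized.csv` and holds the first two discretized rows shown above; `Y1` is 0 for a normal (N) and 1 for an altered (O) diagnosis.

```python
import csv
import io

# Two rows copied from the discretized output above. In practice you would
# open "fertility_Diagnosis-discretized.csv" instead of this sample string.
sample = """X0,X1,X2,X3,X4,X5,X6,X7,X8,Y1
4,4,0,1,1,0,8,0,9,0
4,9,1,0,1,0,8,1,3,1
"""

reader = csv.DictReader(io.StringIO(sample))
# Every column except Y1 is a feature; Y1 holds the encoded diagnosis.
feature_names = [name for name in reader.fieldnames if name != "Y1"]

X, y = [], []
for row in reader:
    X.append([int(row[name]) for name in feature_names])
    y.append(int(row["Y1"]))

print(feature_names)
print(X)
print(y)
```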