# Practical Assignment (TP) - Natural Computing
#### "Predict whether a mammogram mass is benign or malignant"

1. BI-RADS assessment: 1 to 5 (ordinal)  
2. Age: patient's age in years (integer)
3. Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)
4. Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)
5. Density: mass density high=1 iso=2 low=3 fat-containing=4 (ordinal)
6. Severity: benign=0 or malignant=1 (binominal)
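
The integer codes in the list above can be turned back into readable labels when inspecting records. The following is a minimal sketch, assuming the codes are exactly as listed; the dictionary names and the two example records are made up for illustration and are not part of the original notebook or dataset.

```python
import pandas as pd

# Code-to-label tables copied from the attribute list above.
# The dictionary names are hypothetical helpers, not part of the notebook.
SHAPE_LABELS = {1: "round", 2: "oval", 3: "lobular", 4: "irregular"}
MARGIN_LABELS = {
    1: "circumscribed",
    2: "microlobulated",
    3: "obscured",
    4: "ill-defined",
    5: "spiculated",
}
DENSITY_LABELS = {1: "high", 2: "iso", 3: "low", 4: "fat-containing"}
SEVERITY_LABELS = {0: "benign", 1: "malignant"}

# Two made-up records, for illustration only (not taken from the dataset).
example = pd.DataFrame(
    {"shape": [1, 4], "margin": [1, 5], "density": [3, 3], "severity": [0, 1]}
)

# Replace each integer code with its label, column by column.
decoded = example.assign(
    shape=example["shape"].map(SHAPE_LABELS),
    margin=example["margin"].map(MARGIN_LABELS),
    density=example["density"].map(DENSITY_LABELS),
    severity=example["severity"].map(SEVERITY_LABELS),
)
print(decoded)
```

Keeping the numeric columns for modelling and decoding only for display avoids changing the data that later cells rely on.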

## Import Libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Get the Data

In [None]:
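# The raw file is assumed to have no header row; header=None keeps the first record as data
data = pd.read_csv('mammographic_masses.data.txt', header=None)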
data

**Convert missing data (indicated by a `?`) into NaN and add the appropriate column names (BI_RADS, age, shape, margin, density, and severity)**

In [None]:
data = data.replace('?',np.nan)
data.columns = ['BI_RADS','age','shape','margin','density','severity']
data
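
As an alternative, the missing-value marker and the column names can be handled while reading the file, using the `na_values`, `names`, and `header` parameters of `pd.read_csv`. This is a sketch only: it reads a small made-up sample from memory instead of the real file, and the names `sample` and `sample_df` are assumptions introduced for illustration.

```python
import io
import pandas as pd

# Made-up rows in the same comma-separated layout, with '?' marking a missing value.
sample = io.StringIO(
    "5,60,3,5,3,1\n"
    "4,40,1,1,?,0\n"
    "5,70,4,5,3,1\n"
    "3,30,1,1,3,0\n"
)

columns = ['BI_RADS', 'age', 'shape', 'margin', 'density', 'severity']

# header=None: the data has no header row; names: assign column names;
# na_values='?': treat '?' as NaN at parse time.
sample_df = pd.read_csv(sample, header=None, names=columns, na_values='?')

print(sample_df)
print(sample_df.dtypes)
```

Because `?` is parsed as NaN directly, columns are read as numeric types from the start instead of `object`.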

**Convert datatype `object` to `float64`**

In [None]:
data.info()

In [None]:
data = data.astype(float)
data

In [None]:
data.info()

In [None]:
data.describe()
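
Before deciding how to handle missing values, it can help to count them per column and to compare how often rows with gaps occur in each severity class. The sketch below uses a small made-up frame named `toy` (an assumption for illustration) in place of `data`; the same calls would apply to the real DataFrame.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for `data`; values are illustrative only.
toy = pd.DataFrame({
    'age': [67, 43, np.nan, 28, 74],
    'density': [3, np.nan, 3, np.nan, 3],
    'severity': [1, 1, 1, 0, 1],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Boolean flag per row: True if the row has at least one missing value.
has_missing = toy.isna().any(axis=1)

# Share of incomplete rows within each severity class.
missing_rate_by_class = has_missing.groupby(toy['severity']).mean()

print(missing_per_column)
print(missing_rate_by_class)
```

If the share of incomplete rows differs strongly between classes, dropping them could bias the class balance, so this is worth checking before calling `dropna()`.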

**The missing values appear to be randomly distributed, so we drop every row that contains one**

In [None]:
data = data.dropna()
data.index = np.arange(1, len(data) + 1)
data

In [None]:
data.describe()

## Exploratory Data Analysis

**Countplot of Severity (benign = 0 vs. malignant = 1)**

In [None]:
sns.countplot(x='severity',data=data)
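
The class balance shown by a countplot can also be read off numerically with `value_counts`. This is a minimal sketch on a made-up `severity` series (an assumption for illustration); on the notebook's DataFrame the same calls would be applied to `data['severity']`.

```python
import pandas as pd

# Made-up labels standing in for data['severity'] (0 = benign, 1 = malignant).
severity = pd.Series([0, 1, 1, 0, 0], name='severity')

# Absolute count per class, ordered by class label.
counts = severity.value_counts().sort_index()

# Relative frequency per class (sums to 1).
proportions = severity.value_counts(normalize=True).sort_index()

print(counts)
print(proportions)
```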

**Histogram of Age, split by Severity**

In [None]:
sns.set_style('darkgrid')
# `size` was renamed to `height` in seaborn 0.9
g = sns.FacetGrid(data, hue="severity", palette='coolwarm', height=6, aspect=2)
g = (g.map(plt.hist,'age',bins=20,alpha=0.7)).add_legend()