# Medical Insurance Cost Prediction
This notebook walks through the first steps of predicting medical insurance costs: downloading and loading the dataset, previewing it, and visualizing and analyzing its columns.

## Data Loading and Preparation
### Downloading the Dataset
We will download the dataset as a CSV file from a GitHub repository. Each row describes one individual: their `age`, `sex`, `bmi`, number of `children`, `smoker` status, and `region`, along with the medical `charges` billed to them.

In [2]:
from urllib.request import urlretrieve

csv_url = 'https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv'

urlretrieve(csv_url, 'insurance.csv')

('insurance.csv', <http.client.HTTPMessage at 0x1cffa076210>)
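
Re-running the cell above downloads the file again each time. A minimal sketch of a guard that skips the download when the file is already present is shown here; `download_if_missing` is a hypothetical helper name, not something the notebook defines.

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, path):
    """Download `url` to `path` unless a file already exists there.

    Hypothetical helper for illustration; returns the local path either way.
    """
    if not os.path.exists(path):
        # Only touch the network when there is no local copy yet.
        urlretrieve(url, path)
    return path
```

Calling `download_if_missing(csv_url, 'insurance.csv')` would then behave like the original cell on the first run and do nothing on later runs.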

### Loading the Data
We will now load the downloaded CSV file into a pandas DataFrame for further analysis.

In [3]:
import pandas as pd

medical_df = pd.read_csv('insurance.csv')

### Displaying the Data
Here is a preview of the dataset to understand its structure and contents.

In [4]:
medical_df

| | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.900 | 0 | yes | southwest | 16884.92400 |
| 1 | 18 | male | 33.770 | 1 | no | southeast | 1725.55230 |
| 2 | 28 | male | 33.000 | 3 | no | southeast | 4449.46200 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.880 | 0 | no | northwest | 3866.85520 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1333 | 50 | male | 30.970 | 3 | no | northwest | 10600.54830 |
| 1334 | 18 | female | 31.920 | 0 | no | northeast | 2205.98080 |
| 1335 | 18 | female | 36.850 | 0 | no | southeast | 1629.83350 |
| 1336 | 21 | female | 25.800 | 0 | no | southwest | 2007.94500 |
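
Beyond reading the preview, a few quick checks can make the structure explicit. The sketch below rebuilds the first five rows of the preview as an in-memory stand-in so that it runs without the downloaded file; `sample_csv` and `sample_df` are illustrative names that do not appear in the notebook, and the same calls should apply to `medical_df`.

```python
import io

import pandas as pd

# First five rows from the preview above, used as a small stand-in for insurance.csv.
sample_csv = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33.0,3,no,southeast,4449.462
33,male,22.705,0,no,northwest,21984.47061
32,male,28.88,0,no,northwest,3866.8552
"""
sample_df = pd.read_csv(io.StringIO(sample_csv))

# Number of rows and columns.
n_rows, n_cols = sample_df.shape

# Count of missing values in each column.
missing_per_column = sample_df.isna().sum()

# Split columns into numeric and non-numeric (categorical) groups.
numeric_columns = sample_df.select_dtypes(include="number").columns.tolist()
categorical_columns = sample_df.select_dtypes(exclude="number").columns.tolist()
```

On this sample, `sex`, `smoker`, and `region` come out as categorical and the remaining columns as numeric, which matches what the preview suggests.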


In [5]:
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Setting Visualization Styles
We will configure the default styles for our visualizations to make them more appealing.

In [6]:
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

### Statistical Summary of Age
We will calculate and display the statistical summary of the `age` column to understand the distribution of ages in the dataset.

In [7]:
medical_df.age.describe()

count    1338.000000
mean       39.207025
std        14.049960
min        18.000000
25%        27.000000
50%        39.000000
75%        51.000000
max        64.000000
Name: age, dtype: float64

### Age Distribution
The following histogram shows the distribution of ages in the dataset. A box plot is also included to highlight the spread and any potential outliers.

In [None]:
fig = px.histogram(medical_df, 
                   x='age', 
                   marginal='box', 
                   nbins=47, 
                   title='Distribution of Age')
fig.update_layout(bargap=0.1, 
                  title_font_size=20, 
                  title_x=0.5, 
                  xaxis_title='Age', 
                  yaxis_title='Count')
fig.update_traces(marker_color='orange', marker_line_color='black', marker_line_width=1.5)
fig.show()
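
The `nbins=47` argument in the cell above appears to be chosen so that each integer age gets its own bin. A small sketch of that arithmetic, using the minimum and maximum from the `describe()` output (the variable names are illustrative):

```python
# Minimum and maximum age, taken from the describe() output above.
min_age, max_age = 18, 64

# One bin per integer age: both endpoints are included, hence the +1.
n_bins = max_age - min_age + 1
print(n_bins)
```

Plotly may treat the requested bin count as a target rather than an exact number, so it is worth checking the rendered chart if exact one-year bins matter.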

#### Observations:
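- The age distribution appears fairly uniform across the 18 to 64 range, and the mean (39.2) sits close to the median (39), which is consistent with a roughly symmetric spread.
- The box plot indicates no outliers in the age data: the quartiles from the summary (27 and 51) give an interquartile range of 24, so the conventional 1.5 × IQR fences fall at -9 and 87, well outside the observed ages.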
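
The outlier observation can be checked by hand with the common 1.5 × IQR rule. The sketch below uses only the quartiles, minimum, and maximum printed by `describe()`; it assumes the box plot follows that same rule, and the variable names are illustrative.

```python
# Summary statistics for age, copied from the describe() output above.
min_age, q1, q3, max_age = 18.0, 27.0, 51.0, 64.0

# Interquartile range and the conventional 1.5 * IQR fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A value counts as an outlier only if it falls outside the fences, so it is
# enough to compare the observed extremes against them.
has_outliers = min_age < lower_fence or max_age > upper_fence
print(iqr, lower_fence, upper_fence, has_outliers)
```

Because both extremes lie inside the fences, no individual age can be flagged as an outlier under this rule.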