# U.S. Medical Insurance Costs

## Tasks

- [x] Import a dataset into your program
- Analyze a dataset by building out functions or class methods
- Use libraries to assist in your analysis
- Optional: Document and organize your findings
- Optional: Make predictions about a dataset’s features based on your findings

## Scoping Your Project

### Preparation

- [x] check the data
    - check for missing data
    - clean the data
        - convert _nominal_ and _ordinal_ data from `str` to `int` (e.g. _binary_ (_nominal_) `sex` data to `0` or `1`)[^1]

[^1]: The Pandas category data type. When working with categorical variables in Python, especially ordinal ones, it can often be advantageous to use the Pandas-specific `category` dtype, which stores category names together with their integer codes and, for ordered categories, their ranking.
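A minimal sketch of the footnote's idea, assuming pandas is installed and using a few hypothetical rows in the shape of `insurance.csv`. The `bmi_class` column and its cut-offs (the common WHO BMI bands) are illustrative assumptions, not part of the dataset; they only provide an ordinal column to rank.

```python
import pandas as pd

# Hypothetical sample rows shaped like insurance.csv
df = pd.DataFrame({
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 33.77, 22.705],
})

# Nominal column: fixing the category order makes the codes male -> 0, female -> 1
sex_type = pd.CategoricalDtype(categories=["male", "female"])
df["sex"] = df["sex"].astype(sex_type)
df["sex_code"] = df["sex"].cat.codes

# Ordinal column (illustrative assumption): pd.cut with labels returns an ordered categorical
df["bmi_class"] = pd.cut(
    df["bmi"],
    bins=[0, 18.5, 25, 30, float("inf")],
    labels=["underweight", "normal", "overweight", "obese"],
    right=False,  # intervals are [0, 18.5), [18.5, 25), [25, 30), [30, inf)
)
df["bmi_rank"] = df["bmi_class"].cat.codes

print(df)
# ordered categories can be compared by their rank
print(df[df["bmi_class"] > "normal"])
```

Compared with a hand-written mapping dict, the dtype keeps the readable labels and the integer codes side by side in one column.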

### Exploration

- [ ] analyse the data by means of descriptive statistics
    - e.g. mean, median, min, max, boxplot
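These descriptive statistics can be computed with the standard library alone. A minimal sketch, assuming the values are already `float` (as `charges` is after the cleaning step); `describe` is a hypothetical helper, and the quartiles use `statistics.quantiles(..., method="inclusive")`, which matches NumPy's default linear percentile interpolation.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers (needs at least two values)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,  # lower edge of the boxplot box
        "q3": q3,  # upper edge of the boxplot box
    }

# charges of the first five rows printed in the cleaning step
charges = [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552]
print(describe(charges))
```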

### Exploitation

- [ ] look for relationships between features by means of exploratory statistics
    - check for linear correlations (Pearson's r, coefficient of determination R²)
    - graph and chart data
    - make a (machine learning) model 






#### Import a dataset into your program
1. check whether the file has a header row

```python
import csv

# Read the whole file as a sample and let csv.Sniffer guess
# whether the first row is a header row
with open('insurance.csv', mode='r', newline='') as insurance_csv:
    sample = insurance_csv.read()
    has_header = csv.Sniffer().has_header(sample)
    print(has_header)
```

True


2. read the CSV file and check for missing values

```python
with open('insurance.csv', mode='r', newline='') as insurance_csv:
    # DictReader uses the header row as keys; short rows are padded with restval (None)
    reader = csv.DictReader(insurance_csv, restval=None)
    insurance_data = list(reader)

print('first row of source data:')
print(insurance_data[:1])

# A value counts as missing if the row was too short (None) or the field is empty ('')
count_missing_data = 0
for row in insurance_data:
    if None in row.values() or '' in row.values():
        count_missing_data += 1

print('{} rows contain missing values'.format(count_missing_data))
```

first row of source data:
[{'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0', 'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}]
0 rows contain missing values
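The check above only counts affected rows. A minimal sketch of a per-column count, assuming the same `csv.DictReader` setup: a field counts as missing if the row was too short (the reader fills it with `restval`, here `None`) or the field is blank. `count_missing_by_column` is a hypothetical helper, and the inline CSV text is made-up sample input.

```python
import csv
import io
from collections import Counter

def count_missing_by_column(rows, fieldnames):
    """Count None or blank-string values per column."""
    counts = Counter({name: 0 for name in fieldnames})
    for row in rows:
        for name in fieldnames:
            value = row.get(name)
            if value is None or (isinstance(value, str) and value.strip() == ""):
                counts[name] += 1
    return dict(counts)

# made-up sample: the second row has an empty 'sex', the third row lacks 'bmi'
sample_csv = "age,sex,bmi\n19,female,27.9\n18,,33.77\n28,male\n"
reader = csv.DictReader(io.StringIO(sample_csv), restval=None)
rows = list(reader)
print(count_missing_by_column(rows, reader.fieldnames))  # {'age': 0, 'sex': 1, 'bmi': 1}
```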


3. clean the data
    - convert _nominal_ and _ordinal_ data from `str` to `int` (e.g. _binary_ (_nominal_) `sex` data to `0` or `1`)[^1]

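```python
# Each mapping names the column it applies to ('category')
# and maps every string value in that column to an integer code
sex_mapping = {
    'category': 'sex',
    'male': 0,
    'female': 1
}

smoker_mapping = {
    'category': 'smoker',
    'yes': 1,
    'no': 0
}

def convert_nominal_to_int(in_data: list, mapping: dict) -> None:
    """Replace the string values of mapping['category'] in place with their integer codes."""
    key = mapping['category']
    for datum in in_data:
        # raises KeyError if a value is not covered by the mapping
        datum[key] = mapping[datum[key]]

convert_nominal_to_int(insurance_data, sex_mapping)
convert_nominal_to_int(insurance_data, smoker_mapping)

print(insurance_data[0:10])
```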

[{'age': '19', 'sex': 1, 'bmi': '27.9', 'children': '0', 'smoker': 1, 'region': 'southwest', 'charges': '16884.924'}, {'age': '18', 'sex': 0, 'bmi': '33.77', 'children': '1', 'smoker': 0, 'region': 'southeast', 'charges': '1725.5523'}, {'age': '28', 'sex': 0, 'bmi': '33', 'children': '3', 'smoker': 0, 'region': 'southeast', 'charges': '4449.462'}, {'age': '33', 'sex': 0, 'bmi': '22.705', 'children': '0', 'smoker': 0, 'region': 'northwest', 'charges': '21984.47061'}, {'age': '32', 'sex': 0, 'bmi': '28.88', 'children': '0', 'smoker': 0, 'region': 'northwest', 'charges': '3866.8552'}, {'age': '31', 'sex': 1, 'bmi': '25.74', 'children': '0', 'smoker': 0, 'region': 'southeast', 'charges': '3756.6216'}, {'age': '46', 'sex': 1, 'bmi': '33.44', 'children': '1', 'smoker': 0, 'region': 'southeast', 'charges': '8240.5896'}, {'age': '37', 'sex': 1, 'bmi': '27.74', 'children': '3', 'smoker': 0, 'region': 'northwest', 'charges': '7281.5056'}, {'age': '37', 'sex': 0, 'bmi': '29.83', 'children': '2', 
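`region` is nominal with four values, so mapping it to the integers 0 to 3 the way `sex` and `smoker` are mapped would suggest an order that does not exist. A common alternative is one-hot encoding: one 0/1 column per region. A minimal sketch; `one_hot_encode` and the `region_<value>` column names are hypothetical, and the four region names are the ones that appear in the printed data.

```python
REGIONS = ("northeast", "northwest", "southeast", "southwest")

def one_hot_encode(in_data: list, category: str, values: tuple) -> None:
    """Add one 0/1 column per value of `category`, in place."""
    for datum in in_data:
        value = datum[category]
        if value not in values:
            raise ValueError("unexpected {} value: {!r}".format(category, value))
        for option in values:
            datum["{}_{}".format(category, option)] = 1 if value == option else 0

rows = [{"region": "southwest"}, {"region": "northeast"}]
one_hot_encode(rows, "region", REGIONS)
print(rows[0])
# {'region': 'southwest', 'region_northeast': 0, 'region_northwest': 0, 'region_southeast': 0, 'region_southwest': 1}
```

On the real data the call would be `one_hot_encode(insurance_data, 'region', REGIONS)`.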

```python
from typing import Callable

def change_type(in_data: list, category: str, out_type: Callable) -> None:
    """Convert the values of one column in place, e.g. from str to float."""
    for datum in in_data:
        datum[category] = out_type(datum[category])

change_type(insurance_data, 'age', float)
change_type(insurance_data, 'bmi', float)
change_type(insurance_data, 'charges', float)
# note: 'children' is left as str here

print(insurance_data[0:10])
```

[{'age': 19.0, 'sex': 1, 'bmi': 27.9, 'children': '0', 'smoker': 1, 'region': 'southwest', 'charges': 16884.924}, {'age': 18.0, 'sex': 0, 'bmi': 33.77, 'children': '1', 'smoker': 0, 'region': 'southeast', 'charges': 1725.5523}, {'age': 28.0, 'sex': 0, 'bmi': 33.0, 'children': '3', 'smoker': 0, 'region': 'southeast', 'charges': 4449.462}, {'age': 33.0, 'sex': 0, 'bmi': 22.705, 'children': '0', 'smoker': 0, 'region': 'northwest', 'charges': 21984.47061}, {'age': 32.0, 'sex': 0, 'bmi': 28.88, 'children': '0', 'smoker': 0, 'region': 'northwest', 'charges': 3866.8552}, {'age': 31.0, 'sex': 1, 'bmi': 25.74, 'children': '0', 'smoker': 0, 'region': 'southeast', 'charges': 3756.6216}, {'age': 46.0, 'sex': 1, 'bmi': 33.44, 'children': '1', 'smoker': 0, 'region': 'southeast', 'charges': 8240.5896}, {'age': 37.0, 'sex': 1, 'bmi': 27.74, 'children': '3', 'smoker': 0, 'region': 'northwest', 'charges': 7281.5056}, {'age': 37.0, 'sex': 0, 'bmi': 29.83, 'children': '2', 'smoker': 0, 'region': 'northeas
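`change_type` converts one column per call, and `children` still holds strings afterwards. A minimal sketch of an alternative that converts every numeric column in one pass from a schema; `SCHEMA` and `apply_schema` are hypothetical, and choosing `int` for `age` and `children` is an assumption (both appear only as whole numbers in the printed data). It expects the raw strings: `int('19')` works, but `int('19.0')` would raise `ValueError`.

```python
# hypothetical schema: column name -> conversion function
SCHEMA = {"age": int, "bmi": float, "children": int, "charges": float}

def apply_schema(in_data: list, schema: dict) -> None:
    """Convert the listed columns in place; other columns are left untouched."""
    for datum in in_data:
        for column, convert in schema.items():
            datum[column] = convert(datum[column])

# the first row of the source data, before any conversion of numeric columns
rows = [{"age": "19", "sex": 1, "bmi": "27.9", "children": "0",
         "smoker": 1, "region": "southwest", "charges": "16884.924"}]
apply_schema(rows, SCHEMA)
print(rows[0])
# {'age': 19, 'sex': 1, 'bmi': 27.9, 'children': 0, 'smoker': 1, 'region': 'southwest', 'charges': 16884.924}
```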

#### Analyze a dataset by building out functions or class methods

```python
def find_max_list(data: list, category: str) -> list:
    """Return every datum whose value in `category` equals the maximum."""
    max_value = max(datum[category] for datum in data)
    max_datum = [datum for datum in data if datum[category] == max_value]

    print('The maximum value for the {category} is {max_value} in the datum {max_datum}'.format(category=category, max_value=max_value, max_datum=max_datum))
    return max_datum


find_max_list(data=insurance_data, category='bmi')
find_max_list(data=insurance_data, category='age')

def find_min_single(data: list, category: str) -> dict:
    """Return one datum with the minimum value in `category` (the first one found)."""
    min_datum = min(data, key=lambda datum: datum[category])
    min_value = min_datum[category]

    print('The minimum value for the {category} is {min_value}, for example in the datum {min_datum}'.format(category=category, min_value=min_value, min_datum=min_datum))
    return min_datum

find_min_single(data=insurance_data, category='bmi')
find_min_single(data=insurance_data, category='age')
```



The maximum value for the bmi is 53.13 in the datum [{'age': 18.0, 'sex': 0, 'bmi': 53.13, 'children': '0', 'smoker': 0, 'region': 'southeast', 'charges': 1163.4627}]
The maximum value for the age is 64.0 in the datum [{'age': 64.0, 'sex': 0, 'bmi': 24.7, 'children': '1', 'smoker': 0, 'region': 'northwest', 'charges': 30166.61817}, {'age': 64.0, 'sex': 1, 'bmi': 31.3, 'children': '2', 'smoker': 1, 'region': 'southwest', 'charges': 47291.055}, {'age': 64.0, 'sex': 1, 'bmi': 39.33, 'children': '0', 'smoker': 0, 'region': 'northeast', 'charges': 14901.5167}, {'age': 64.0, 'sex': 1, 'bmi': 33.8, 'children': '1', 'smoker': 1, 'region': 'southwest', 'charges': 47928.03}, {'age': 64.0, 'sex': 0, 'bmi': 34.5, 'children': '0', 'smoker': 0, 'region': 'southwest', 'charges': 13822.803}, {'age': 64.0, 'sex': 1, 'bmi': 30.115, 'children': '3', 'smoker': 0, 'region': 'northwest', 'charges': 16455.70785}, {'age': 64.0, 'sex': 0, 'bmi': 25.6, 'children': '2', 'smoker': 0, 'region': 'southwest', 'charg
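A natural next function for the analysis is a grouped average, for example mean `charges` for smokers versus non-smokers. A minimal sketch; `mean_by_group` is a hypothetical helper, and the three rows are made up for the demonstration, not taken from the results above.

```python
from collections import defaultdict

def mean_by_group(data: list, group_key: str, value_key: str) -> dict:
    """Average `value_key` separately for each distinct value of `group_key`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for datum in data:
        group = datum[group_key]
        totals[group] += datum[value_key]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

rows = [
    {"smoker": 1, "charges": 100.0},
    {"smoker": 1, "charges": 300.0},
    {"smoker": 0, "charges": 50.0},
]
print(mean_by_group(rows, "smoker", "charges"))  # {1: 200.0, 0: 50.0}
```

With the cleaned data the call would be `mean_by_group(insurance_data, 'smoker', 'charges')`.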

```python
import pandas as pd
import numpy as np
import plotly.express as px
```
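With pandas imported, the manual steps above have short pandas counterparts. A minimal sketch, assuming pandas is installed; it reads three rows copied from the source data through `io.StringIO` so it runs without the file. With the real file, `pd.read_csv('insurance.csv')` takes the place of the inline text.

```python
import io
import pandas as pd

# three rows copied from the source data; stands in for pd.read_csv('insurance.csv')
sample_csv = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33,3,no,southeast,4449.462
"""
df = pd.read_csv(io.StringIO(sample_csv))

print(df.isna().sum())                          # missing values per column
print(df.describe())                            # count, mean, std, min, quartiles, max of numeric columns
print(df.groupby("smoker")["charges"].mean())   # mean charges per smoker group
```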