Analysis of Various Factors Affecting Insurance Costs of U.S. Adults
In this project, I analyze a sample of insurance costs of individuals across the United States, described by age, sex, bmi, number of children, smoking status, and region of residence. Through my analysis, I aim to find out:
- Which variable has the strongest correlation with health insurance charges
- How smoking habits affect insurance charges for each sex
- How insurance charges differ across age groups, and possibly why.
Before examining the dataset, I hypothesize that:
- The variable with the strongest correlation with insurance costs is 'bmi', since it is a good indicator of one's health condition
- Smoking raises insurance charges more for females than for males, since research shows that female smokers have a higher chance of developing respiratory diseases than male smokers do
- Insurance charges increase from younger age groups to older ones, and the charges within each age group are affected by the 'bmi' variable.
Note: The dataset 'insurance.csv' was acquired from CodeCademy.com.
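
As a rough sketch of how the three questions above might be approached, the helpers below compute a Pearson correlation against the charges, average the charges per sex and smoking status, and average them per age group. This is only an illustration under assumptions: the column names ('age', 'sex', 'bmi', 'children', 'smoker', 'charges'), the ten-year age buckets, and every function name are hypothetical choices and may not match the analysis actually performed.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(path="insurance.csv"):
    """Read the CSV into a list of dicts (column names are assumed)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_charges_by(rows, key_fn):
    """Average the assumed 'charges' column within each group given by key_fn."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(float(row["charges"]))
    return {key: mean(values) for key, values in groups.items()}


def age_group(row, width=10):
    """Label a row with an age bucket such as '30-39' (bucket width is an assumption)."""
    low = int(row["age"]) // width * width
    return f"{low}-{low + width - 1}"


def summarize(rows):
    """Collect the three summaries that correspond to the three questions."""
    charges = [float(r["charges"]) for r in rows]
    numeric_columns = ("age", "bmi", "children")  # assumed numeric columns
    return {
        "correlations": {
            col: pearson([float(r[col]) for r in rows], charges)
            for col in numeric_columns
        },
        "by_sex_and_smoker": mean_charges_by(
            rows, lambda r: (r["sex"], r["smoker"])
        ),
        "by_age_group": mean_charges_by(rows, age_group),
    }
```

Calling `summarize(load_rows())` would return one dictionary with the three summaries. Correlation is computed only for the numeric columns here; categorical columns such as sex, smoking status, and region would need to be encoded first before they could be compared on the same footing.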