## Analyze Healthcare Data
Now, let’s import the libraries we need to analyze the healthcare data with Python:

In [None]:
# import requisite libraries 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# read the data and take a quick look at the first few rows

data = pd.read_csv("../input/ntr-arogya-seva-2017/ntrarogyaseva.csv")
data.head()

In [None]:
# print summary statistics
data.describe(include='all')

To analyze this healthcare data properly, we first need to see how it is organized into columns. So let’s have a quick look at the columns of the dataset:

In [None]:
# display all the column names in the data
data.columns

## Data Exploration
value_counts() is a Pandas method that returns how often each distinct value occurs in the specified column. Let’s start by checking the gender statistics of the data:

In [None]:
# Display the counts of each value in the SEX column
data['SEX'].value_counts()
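
As a minimal sketch of what value_counts() can report, the example below uses a small made-up Series in place of data['SEX'] (the values are hypothetical, not taken from the dataset). It shows the default counts, proportions via normalize=True, and how dropna=False makes missing values visible:

```python
import pandas as pd

# Hypothetical toy column standing in for data['SEX']; the values are made up.
sex = pd.Series(['Male', 'MALE', 'Female', 'Female', None])

counts = sex.value_counts()                     # absolute counts, missing values excluded
shares = sex.value_counts(normalize=True)       # proportions of the non-missing values
with_missing = sex.value_counts(dropna=False)   # missing values get their own row

print(counts)
print(shares)
print(with_missing)
```

Note that 'Male' and 'MALE' are counted separately, because value_counts() compares values exactly.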

It appears that there are duplicate values in this column. Male and MALE are not two different sexes. We can replace the inconsistent values to resolve this issue. I will also rename Male (Child) -> Boy and Female (Child) -> Girl for convenience:

In [None]:
# mappings to standardize and clean the values
mappings = {'MALE' : 'Male', 'FEMALE' : 'Female', 'Male(Child)' : 'Boy', 'Female(Child)' : 'Girl'}
# replace values using the defined mappings
data['SEX'] = data['SEX'].replace(mappings)
data['SEX'].value_counts()
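
The sketch below runs the same kind of mapping on a small made-up Series (hypothetical values, not the real column) and then checks that nothing unexpected remains. The set of expected labels is an assumption for illustration; replace() leaves any value that is not a key in the mapping untouched, so a check like this helps catch spellings the mapping missed:

```python
import pandas as pd

# Hypothetical toy column; the real data['SEX'] may contain other spellings.
sex = pd.Series(['Male', 'MALE', 'FEMALE', 'Female', 'Male(Child)', 'Female(Child)'])

mappings = {'MALE': 'Male', 'FEMALE': 'Female',
            'Male(Child)': 'Boy', 'Female(Child)': 'Girl'}

# Values that are not keys in the mapping pass through unchanged.
cleaned = sex.replace(mappings)
print(cleaned.value_counts())

# Assumed set of labels we expect after cleaning.
expected = {'Male', 'Female', 'Boy', 'Girl'}
unexpected = set(cleaned.unique()) - expected
print("Unexpected labels:", unexpected)
```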

We can visualize the above distribution easily using Pandas’ built-in plotting:

In [None]:
# plot the value counts of sex 
data['SEX'].value_counts().plot.bar()

In [None]:
# print the mean, median and mode of the age distribution
print("Mean: {}".format(data['AGE'].mean()))
print("Median: {}".format(data['AGE'].median()))
print("Mode: {}".format(data['AGE'].mode()))
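
One detail worth knowing: mode() returns a Series rather than a single number, because several values can tie for most frequent. A minimal sketch on made-up ages (hypothetical values, not from the dataset):

```python
import pandas as pd

# Hypothetical ages chosen so that two values tie for most frequent.
ages = pd.Series([5, 40, 40, 52, 52, 63])

mean_age = ages.mean()
median_age = ages.median()
modes = ages.mode().tolist()   # mode() returns a Series; convert it for clean printing

print("Mean: {}".format(mean_age))      # Mean: 42.0
print("Median: {}".format(median_age))  # Median: 46.0
print("Mode(s): {}".format(modes))      # Mode(s): [40, 52]
```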

Next, the 10 most common ages in the data. Feel free to experiment by replacing 10 with a number of your choice:

In [None]:
# print the 10 most frequent ages
data['AGE'].value_counts().head(10)

Boxplots are commonly used to visualize a distribution when bar charts or scatter plots are too difficult to read:

In [None]:
# better looking boxplot (using seaborn) for the age variable
# pass the column as x explicitly; recent seaborn versions expect keyword arguments
sns.boxplot(x=data['AGE'])
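
To make the boxplot easier to read, the sketch below computes the numbers it draws: the quartiles form the box, and points beyond 1.5 times the interquartile range are drawn as outliers (this assumes seaborn's default whisker setting). The ages are made up for illustration:

```python
import pandas as pd

# Hypothetical ages, not taken from the dataset.
ages = pd.Series([1, 30, 40, 45, 50, 55, 60, 70, 98])

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Assumed whisker rule (seaborn default): 1.5 * IQR beyond the box.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print("Q1: {}, median: {}, Q3: {}".format(q1, median, q3))
print("Outliers:", outliers.tolist())
```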

## Analyze Healthcare Data Deeply
What if I wanted to analyze only the records relating to Krishna district? I would need to select a subset of the data. Fortunately, Pandas can help us do this too, in two steps:

1. Write the condition to be satisfied: data['DISTRICT_NAME'] == 'Krishna'
2. Insert the condition into the dataframe: data[data['DISTRICT_NAME'] == 'Krishna']

In [None]:
# subset involving only records of Krishna district
data[data['DISTRICT_NAME']=='Krishna'].head()
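
The two steps can be seen separately in the sketch below, which uses a tiny made-up dataframe (hypothetical rows) with the same column names. The condition produces a Series of True/False values, and indexing with it keeps only the True rows. Several conditions can be combined with &, each wrapped in parentheses:

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    'DISTRICT_NAME': ['Krishna', 'Guntur', 'Krishna', 'Kurnool'],
    'AGE': [34, 61, 58, 47],
})

# Step 1: the condition is a boolean Series, one value per row.
mask = df['DISTRICT_NAME'] == 'Krishna'
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows where it is True.
krishna = df[mask]

# Combining conditions: Krishna records with age above 50.
krishna_over_50 = df[mask & (df['AGE'] > 50)]
print(krishna_over_50)
```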

Now, if we want the most common surgery at the district level, we can loop over all the district names and select the data subset for each district:

In [None]:
# Most common surgery by district
for i in data['DISTRICT_NAME'].unique():
    print("District: {}\nDisease and Count: {}".format(i,data[data['DISTRICT_NAME']==i]['SURGERY'].value_counts().head(1)))

We note that only two surgeries dominate all the districts: Dialysis (7 districts) and Long bone fracture (6 districts).
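
As an alternative to the loop, the same question can be answered with a single groupby. This is only a sketch on made-up rows (hypothetical values), assuming the same DISTRICT_NAME and SURGERY column names. The final value_counts() tallies how many districts each surgery leads in:

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    'DISTRICT_NAME': ['Krishna', 'Krishna', 'Krishna', 'Guntur', 'Guntur', 'Guntur'],
    'SURGERY': ['Dialysis', 'Dialysis', 'Fracture', 'Fracture', 'Fracture', 'Dialysis'],
})

# For each district, pick the surgery with the highest count.
top = df.groupby('DISTRICT_NAME')['SURGERY'].agg(lambda s: s.value_counts().idxmax())
print(top)

# How many districts does each surgery dominate?
print(top.value_counts())
```

If two surgeries tie within a district, idxmax() returns only one of them, so ties deserve a separate check.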

Now, let’s have a look at the average claim amount by district:

In [None]:
# Average claim amount for surgery by district
for i in data['DISTRICT_NAME'].unique():
    print("District: {}\nAverage Claim Amount: ₹{}".format(i,data[data['DISTRICT_NAME']==i]['CLAIM_AMOUNT'].mean()))
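
The loop above can also be written as a groupby, which returns all the averages as one Series that is easy to sort. A minimal sketch on made-up claim amounts (hypothetical values), assuming the same column names:

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    'DISTRICT_NAME': ['Krishna', 'Krishna', 'Guntur', 'Guntur'],
    'CLAIM_AMOUNT': [10000, 30000, 20000, 50000],
})

# Mean claim amount per district, highest first.
avg_claim = (
    df.groupby('DISTRICT_NAME')['CLAIM_AMOUNT']
      .mean()
      .sort_values(ascending=False)
)
print(avg_claim)
```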

Now let’s look at the surgery statistics. I will use the Pandas GroupBy concept to collect statistics by grouping data by category of surgery. The Pandas groupby works similarly to the SQL GROUP BY clause:

In [None]:
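# group by surgery category to get mean statistics
# numeric_only=True skips text columns, which recent pandas versions refuse to average
data.groupby('CATEGORY_NAME').mean(numeric_only=True)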

Cochlear implant surgery appears to be the most expensive category, costing an average of ₹ 520,000. Prostheses are the cheapest, at ₹ 1,200. Cochlear implant surgery also has the youngest average age, 1.58 years, while neurology has an average age of 56 years.
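
Findings like these can be pulled out programmatically rather than read off the table. The sketch below uses made-up rows and placeholder category names (all hypothetical), assuming the AGE and CLAIM_AMOUNT columns. It uses named aggregation to build a small summary and idxmax()/idxmin() to find the extremes:

```python
import pandas as pd

# Hypothetical rows; the category names are placeholders.
df = pd.DataFrame({
    'CATEGORY_NAME': ['Category A', 'Category A', 'Category B'],
    'AGE': [2, 4, 60],
    'CLAIM_AMOUNT': [500000, 540000, 30000],
})

# Named aggregation: output column = (input column, aggregation function).
summary = df.groupby('CATEGORY_NAME').agg(
    mean_age=('AGE', 'mean'),
    mean_claim=('CLAIM_AMOUNT', 'mean'),
    n_records=('AGE', 'count'),
)
print(summary)

most_expensive = summary['mean_claim'].idxmax()   # category with the highest mean claim
youngest = summary['mean_age'].idxmin()           # category with the lowest mean age
print(most_expensive, youngest)
```

Including the record count alongside the means helps judge how much weight an average deserves, since a category with very few records can produce an extreme mean.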