# Objective :

An NGO that runs initiatives to improve primary education in India wants to carry out its program in Karnataka. It wants to target districts that fall behind in areas such as:

- Education Infrastructure
- Education Awareness
- Demographic features

Identify such districts that could be targeted in its first phase.

The source data for this exercise is obtained from data.gov.in

## Goal :

The goals of this notebook are to:

1. Explain the data, define the target and come up with features that can be used for modelling.

2. Create a model based on these features and come up with the list of target districts.

3. Provide a detailed analysis that includes all components, such as:

    - Data fetch
    - Data cleansing
    - Exploratory data analysis
    - Summary and data visualization



# Import Libraries

In [1]:
#Import Required Libraries

# Import Pandas for data manipulation using dataframes
import pandas as pd

#Import Numpy for statistical calculations
import numpy as np

# Import Warnings 
import warnings
warnings.filterwarnings('ignore')

# Import Bokeh Library for data visualisation
from bokeh.io import output_notebook
output_notebook()

from bokeh.models import ColumnDataSource
from bokeh.models import HoverTool
from bokeh.models import LinearInterpolator,CategoricalColorMapper
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.palettes import Spectral8

## Data Fetching

In [2]:
# Read the raw data from the csv file extracted from data.gov.in into a dataframe object
data = pd.read_csv('./data/Town-wise-education - Karnataka.csv')

## Exploratory Data Analysis

In [3]:
# Let us have a quick glance of what the data looks like by observing the first and last few rows

data.head()

Unnamed: 0,Table Name,State Code,District Code,Town Code,Total/ Rural/ Urban,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
0,C2308,29,1,40117000,Urban,Belgaum (M Corp.),All ages,399653,204598,195055,...,362,7143,5210,1933,41152,26488,14664,3,2,1
1,C2308,29,1,40117000,Urban,Belgaum (M Corp.),0-6,47642,24768,22874,...,0,0,0,0,0,0,0,0,0,0
2,C2308,29,1,40117000,Urban,Belgaum (M Corp.),7,6759,3495,3264,...,0,0,0,0,0,0,0,0,0,0
3,C2308,29,1,40117000,Urban,Belgaum (M Corp.),8,8067,4152,3915,...,0,0,0,0,0,0,0,0,0,0
4,C2308,29,1,40117000,Urban,Belgaum (M Corp.),9,6948,3559,3389,...,0,0,0,0,0,0,0,0,0,0


In [4]:
data.tail()

Unnamed: 0,Table Name,State Code,District Code,Town Code,Total/ Rural/ Urban,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
807,C2308,29,26,42604000,Urban,Mysore (M Corp.),65-69,13904,6773,7131,...,9,289,228,61,1898,1585,313,0,0,0
808,C2308,29,26,42604000,Urban,Mysore (M Corp.),70-74,10754,5501,5253,...,3,149,118,31,1177,1012,165,0,0,0
809,C2308,29,26,42604000,Urban,Mysore (M Corp.),75-79,5359,2728,2631,...,2,73,62,11,574,503,71,0,0,0
810,C2308,29,26,42604000,Urban,Mysore (M Corp.),80+,6236,2848,3388,...,2,57,42,15,410,344,66,0,0,0
811,C2308,29,26,42604000,Urban,Mysore (M Corp.),Age not stated,539,267,272,...,1,12,4,8,44,28,16,0,0,0


In [5]:
# Let us examine the shape of this dataset
data.shape

(812, 46)

This means that the dataset has 812 rows (observations) and 46 columns (features).

In [6]:
# Now let us explore the data types of the dataset
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 812 entries, 0 to 811
Data columns (total 46 columns):
Table Name                                                                                   812 non-null object
State Code                                                                                   812 non-null int64
District Code                                                                                812 non-null int64
Town Code                                                                                    812 non-null int64
Total/ Rural/ Urban                                                                          812 non-null object
Area Name                                                                                    812 non-null object
Age-Group                                                                                    812 non-null object
Total - Persons                                                                              812 non-null i

The output above shows that the dataset contains both categorical (`object`) and numerical (`int64`) features.

Before exploring these features further, let us check the dataset for missing values.

In [7]:
# Let us examine if there are any nulls in the dataset

data.isnull().sum()

Table Name                                                                                   0
State Code                                                                                   0
District Code                                                                                0
Town Code                                                                                    0
Total/ Rural/ Urban                                                                          0
Area Name                                                                                    0
Age-Group                                                                                    0
Total - Persons                                                                              0
Total - Males                                                                                0
Total - Females                                                                              0
Illiterate - Persons                              

In [8]:
# Let us also look at the entire metrics including the inter quartile range,mean,standard deviation for all the features
data.describe(include = 'all')

Unnamed: 0,Table Name,State Code,District Code,Town Code,Total/ Rural/ Urban,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
count,812,812.0,812.0,812.0,812,812,812,812.0,812.0,812.0,...,812.0,812.0,812.0,812.0,812.0,812.0,812.0,812.0,812.0,812.0
unique,1,,,,1,28,29,,,,...,,,,,,,,,,
top,C2308,,,,Urban,Raichur (CMC),0-6,,,,...,,,,,,,,,,
freq,812,,,,812,29,28,,,,...,,,,,,,,,,
mean,,29.0,15.035714,41509820.0,,,,27507.64,14222.53,13285.11,...,26.519704,482.08867,371.315271,110.773399,3043.32266,1905.359606,1137.963054,0.054187,0.03202,0.022167
std,,0.0,6.714906,671824.5,,,,164098.3,85432.87,78673.9,...,239.305779,3034.946141,2344.507666,693.711738,22771.068866,13770.026347,9031.48477,0.429515,0.279085,0.220975
min,,29.0,1.0,40117000.0,,,,50.0,23.0,27.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,,29.0,11.25,41127000.0,,,,2916.0,1466.75,1425.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,,29.0,16.5,41655000.0,,,,5657.5,2880.5,2710.0,...,0.0,12.0,9.0,2.0,7.5,4.5,2.0,0.0,0.0,0.0
75%,,29.0,20.0,42009250.0,,,,13917.5,7124.5,6696.0,...,4.0,199.75,153.5,48.25,1205.0,844.0,252.75,0.0,0.0,0.0


In [9]:
# Now let us look in detail the categorical features.For this basically extract all the categorical features 
# into a dataframe object

categorical_features = data.select_dtypes(include=['object'])
categorical_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 812 entries, 0 to 811
Data columns (total 4 columns):
Table Name             812 non-null object
Total/ Rural/ Urban    812 non-null object
Area Name              812 non-null object
Age-Group              812 non-null object
dtypes: object(4)
memory usage: 25.5+ KB


#### Let us observe the unique categories for all the object variables

In [10]:
for column_name in data.columns:
    if data[column_name].dtypes == 'object':
        data[column_name] = data[column_name].fillna(data[column_name].mode().iloc[0])
        unique_category = len(data[column_name].unique())
        print("Feature '{column_name}' has '{unique_category}' unique categories".format(column_name = column_name,
                                                                                         unique_category=unique_category))

Feature 'Table Name' has '1' unique categories
Feature 'Total/ Rural/ Urban' has '1' unique categories
Feature 'Area Name' has '28' unique categories
Feature 'Age-Group' has '29' unique categories


Based on the above results, 'Table Name' and 'Total/ Rural/ Urban' each hold a single value for every row, so they carry no information and can be dropped.

We can also drop 'State Code', as we are dealing only with Karnataka.
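
The same conclusion can be reached programmatically by looking for columns that hold a single distinct value. The snippet below is a minimal sketch on a small made-up frame; the `sample` data and the `constant_columns` helper are illustrative assumptions, not part of the original notebook. On the real dataset one would pass `data` instead.

```python
import pandas as pd

def constant_columns(df):
    """Return the names of columns that hold a single distinct value."""
    counts = df.nunique(dropna=False)
    return counts[counts == 1].index.tolist()

# Hypothetical miniature of the census table, for illustration only
sample = pd.DataFrame({
    'Table Name': ['C2308', 'C2308', 'C2308'],
    'State Code': [29, 29, 29],
    'District Code': [1, 1, 26],
    'Total/ Rural/ Urban': ['Urban', 'Urban', 'Urban'],
    'Area Name': ['Belgaum (M Corp.)', 'Belgaum (M Corp.)', 'Mysore (M Corp.)'],
})

redundant = constant_columns(sample)
print(redundant)
```

Unlike the loop above, this check also covers numerical columns, which is how 'State Code' would be picked up.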

#### Missing Values:

The `data.isnull().sum()` output shown earlier reports zero nulls for the listed columns, so no imputation is needed. The `fillna` call in the loop above therefore changes nothing for the object columns.

The `data.describe(include = 'all')` output shown earlier gives the general metrics for each feature. Let us now explore the data in more detail and come up with some basic observations.

## Data Cleansing


Now let us drop the columns discussed above, which are of no use in our EDA:
- 'Table Name'
- 'State Code'
- 'Total/ Rural/ Urban'

In [11]:
# Drop the three constant columns in a single call
data.drop(['Table Name', 'State Code', 'Total/ Rural/ Urban'], axis=1, inplace=True)


To get more insight into the data, let us look at the first 29 rows, which cover every age group for a single district and town.

In [12]:
data.head(29)

Unnamed: 0,District Code,Town Code,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,Illiterate - Persons,Illiterate - Males,Illiterate - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
0,1,40117000,Belgaum (M Corp.),All ages,399653,204598,195055,91358,36857,54501,...,362,7143,5210,1933,41152,26488,14664,3,2,1
1,1,40117000,Belgaum (M Corp.),0-6,47642,24768,22874,47642,24768,22874,...,0,0,0,0,0,0,0,0,0,0
2,1,40117000,Belgaum (M Corp.),7,6759,3495,3264,1375,662,713,...,0,0,0,0,0,0,0,0,0,0
3,1,40117000,Belgaum (M Corp.),8,8067,4152,3915,568,292,276,...,0,0,0,0,0,0,0,0,0,0
4,1,40117000,Belgaum (M Corp.),9,6948,3559,3389,275,137,138,...,0,0,0,0,0,0,0,0,0,0
5,1,40117000,Belgaum (M Corp.),10,9586,5009,4577,368,164,204,...,0,0,0,0,0,0,0,0,0,0
6,1,40117000,Belgaum (M Corp.),11,7315,3751,3564,160,63,97,...,0,0,0,0,0,0,0,0,0,0
7,1,40117000,Belgaum (M Corp.),12,9326,4872,4454,337,133,204,...,0,0,0,0,0,0,0,0,0,0
8,1,40117000,Belgaum (M Corp.),13,7360,3710,3650,198,68,130,...,0,0,0,0,0,0,0,0,0,0
9,1,40117000,Belgaum (M Corp.),14,7983,4159,3824,272,127,145,...,0,0,0,0,0,0,0,0,0,0


#### Few Data Observations:

The following observations hold for each district:
- There are 29 unique age-group records for each town code
    - The 'All ages' category is the sum over all age groups
- Each of the 12 categories below is reported as persons, males and females, where persons is the sum of males and females
    - Illiterate
    - Literate
    - Educational Level - Literate without Educational Level
    - Educational Level - Below Primary 
    - Educational Level - Primary
    - Educational Level - Middle
    - Educational Level - Matric/Secondary
    - Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary
    - Educational Level - Non-technical Diploma or Certificate Not Equal to Degree
    - Educational Level - Technical Diploma or Certificate Not Equal to Degree
    - Educational Level - Graduate & Above
    - Unclassified

Note: 'Total - Persons', 'Total - Males' and 'Total - Females' are not needed for our analysis, as they are aggregates of the categories above for persons, males and females respectively.
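
These structural observations can be checked rather than assumed. The sketch below uses a few made-up rows for one town (the numbers are hypothetical) and tests two things: that persons equals males plus females, and that the 'All ages' row equals the sum of the other age groups. The same checks could be run on `data` for each category.

```python
import pandas as pd

# Hypothetical rows for a single town, for illustration only
sample = pd.DataFrame({
    'Town Code': [1, 1, 1],
    'Age-Group': ['All ages', '0-6', '7'],
    'Illiterate - Persons': [30, 20, 10],
    'Illiterate - Males': [14, 9, 5],
    'Illiterate - Females': [16, 11, 5],
})

# Check 1: Persons should equal Males + Females on every row
persons_ok = bool((
    sample['Illiterate - Persons']
    == sample['Illiterate - Males'] + sample['Illiterate - Females']
).all())

# Check 2: the 'All ages' row should equal the sum of the other age groups
is_all = sample['Age-Group'] == 'All ages'
all_ages = sample[is_all].set_index('Town Code')['Illiterate - Persons']
by_age = sample[~is_all].groupby('Town Code')['Illiterate - Persons'].sum()
all_ages_ok = bool(all_ages.sub(by_age).eq(0).all())

print(persons_ok, all_ages_ok)
```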

#### Current Focus :

 - The main focus of this exercise is improving primary education. My assumption going forward is that the following categories indicate a need for primary education and the rest do not:
    - Illiterate 
    - Educational Level - Literate without Educational Level
    - Educational Level - Below Primary 
    - Educational Level - Primary
    - Unclassified
    
   This means that the following features can be eliminated from the dataset:
   
    Total - Persons                                                                              
    Total - Males                                                                                
    Total - Females                                                                              
    Literate - Persons                                                                           
    Literate - Males                                                                             
    Literate - Females                                                                           
    Educational Level - Middle Persons                                                           
    Educational Level - Middle Males                                                             
    Educational Level - Middle Females                                                           
    Educational Level - Matric/Secondary Persons                                                 
    Educational Level - Matric/Secondary Males                                                   
    Educational Level - Matric/Secondary Females                                                 
    Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Persons    
    Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Males      
    Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Females    
    Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Persons         
    Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Males           
    Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females         
    Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons             
    Educational Level - Technical Diploma or Certificate Not Equal to Degree Males               
    Educational Level - Technical Diploma or Certificate Not Equal to Degree Females             
    Educational Level - Graduate & Above Persons                                                 
    Educational Level - Graduate & Above Males                                                   
    Educational Level - Graduate & Above Females 
    
 
#### Key Note:

For each district code and town code, the 'All ages' category of the 'Age-Group' feature holds the total over all age groups. In my opinion it is not relevant here, since we need to focus on the individual age groups that need primary education, and keeping it would count every person twice when values are summed across age groups. So let us remove these rows from the dataset.

In [13]:
# .copy() gives an independent frame, so later in-place edits do not act on a slice
data = data[data['Age-Group'] != 'All ages'].copy()

In [14]:
# List the numerical features to drop, as they do not relate to primary education.

columns = [ 'Total - Persons',
           'Total - Males',
           'Total - Females',
           'Literate - Persons',
           'Literate - Males',
           'Literate - Females',
           'Educational Level - Middle Persons',
           'Educational Level - Middle Males',
           'Educational Level - Middle Females',
           'Educational Level - Matric/Secondary Persons',
           'Educational Level - Matric/Secondary Males',
           'Educational Level - Matric/Secondary Females',
           'Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Persons',
           'Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Males',
           'Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Females',
           'Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Persons',
           'Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Males',
           'Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females',
           'Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons',
           'Educational Level - Technical Diploma or Certificate Not Equal to Degree Males',
           'Educational Level - Technical Diploma or Certificate Not Equal to Degree Females',
           'Educational Level - Graduate & Above Persons',
           'Educational Level - Graduate & Above Males',
           'Educational Level - Graduate & Above Females']                                                                            

data.drop(columns,axis =1,inplace = True)
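
As an alternative to spelling out all 24 column names, the drop list can be derived from the categories we decided to keep. This is only a sketch under the assumption that the column names follow the prefixes seen above; `KEEP_PREFIXES`, `ID_COLUMNS` and `columns_to_drop` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Prefixes of the categories assumed to indicate a need for primary education
KEEP_PREFIXES = (
    'Illiterate - ',
    'Educational Level - Literate without Educational Level ',
    'Educational Level - Below Primary ',
    'Educational Level - Primary ',
    'Unclassified - ',
)
# Identifier columns that are always kept
ID_COLUMNS = ['District Code', 'Town Code', 'Area Name', 'Age-Group']

def columns_to_drop(df):
    """List every column that is neither an identifier nor a kept category."""
    return [
        col for col in df.columns
        if col not in ID_COLUMNS and not col.startswith(KEEP_PREFIXES)
    ]

# Hypothetical empty frame with a handful of the real column names
sample = pd.DataFrame(columns=[
    'District Code', 'Town Code', 'Area Name', 'Age-Group',
    'Total - Persons', 'Illiterate - Persons', 'Literate - Persons',
    'Educational Level - Primary Persons',
    'Educational Level - Middle Persons',
])

to_drop = columns_to_drop(sample)
print(to_drop)
```

The trailing space in each prefix matters: it keeps 'Educational Level - Primary ' from matching unrelated columns that merely start with the same words.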

## Exploratory Data Analysis:

In this section I visualize the categories that are relevant to improving primary education:

- Illiterate 
- Educational Level - Literate without Educational Level
- Educational Level - Below Primary 
- Educational Level - Primary
- Unclassified

I have used the Bokeh interactive visualisation library to plot the data. Hovering the mouse over a circle shows the values for that district and category. The size of each circle scales with the value of the plotted feature.

The visualisations depict district and age wise data representations for each of the above mentioned groups including total count as well as male and female counts.
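
The circle sizes in the plots come from `LinearInterpolator`, which maps a value linearly from the data range onto a size range (2 to 100 in the first plot). The NumPy sketch below mimics that mapping with `np.interp`. It is an approximation for illustration, not Bokeh's implementation, and the `counts` values are made up.

```python
import numpy as np

# Hypothetical category counts for four rows
counts = np.array([0, 250, 500, 1000])

# Map the smallest count to size 2 and the largest to size 100, linearly in between
sizes = np.interp(counts, [counts.min(), counts.max()], [2, 100])
print(sizes)
```

Because the mapping is linear, a few very large counts push most of the other circles toward the minimum size.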

#### Illiterate :

In this section you can observe the data visualisation of the following features with respect to district and age group 

- Illiterate - Persons
- Illiterate - Males
- Illiterate - Females

In [15]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Illiterate - Persons'],
    area = data['Area Name'],
    illerate = data['Illiterate - Persons'],
    illerate_male = data['Illiterate - Males'],
    illerate_female = data['Illiterate - Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Illiterate - Persons'].min(),data['Illiterate - Persons'].max()],
    y = [2,100]
)

from bokeh.palettes import viridis  # sized palette: one colour per area

areas = list(data['Area Name'].unique())
color_mapper = CategoricalColorMapper(
    factors = areas,
    palette = viridis(len(areas))
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,100000))

p = figure(title = 'Illiteracy District/Area Wise',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Illiterate - Total Persons ','@illerate'),
                           ('Illiterate - Total Males ','@illerate_male'),
                           ('Illiterate - Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Illiterates',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         legend = 'area',
         source = source)
p.legend.location = (0,-50)
p.right.append(p.legend[0])
p.legend.border_line_color = None
show(p,notebook_handle=True)

#### Educational Level - Literate without Educational Level :

In this section you can observe the data visualisation of the following features with respect to district and age group 
- Educational Level - Literate without Educational Level - Persons
- Educational Level - Literate without Educational Level - Males
- Educational Level - Literate without Educational Level - Females

In [16]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Educational Level - Literate without Educational Level Persons'],
    area = data['Area Name'],
    illerate = data['Educational Level - Literate without Educational Level Persons'],
    illerate_male = data['Educational Level - Literate without Educational Level Males'],
    illerate_female = data['Educational Level - Literate without Educational Level Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Educational Level - Literate without Educational Level Persons'].min(),
         data['Educational Level - Literate without Educational Level Persons'].max()],
    y = [2,100]
)

from bokeh.palettes import viridis  # sized palette: one colour per area

areas = list(data['Area Name'].unique())
color_mapper = CategoricalColorMapper(
    factors = areas,
    palette = viridis(len(areas))
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,6000))

p = figure(title = 'Educational Level - Literate without Educational Level (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Educational Level - Literate without Educational Level - Total Persons ','@illerate'),
                           ('Educational Level - Literate without Educational Level - Total Males ','@illerate_male'),
                           ('Educational Level - Literate without Educational Level - Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Educational Level - Literate without Educational Level',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         legend = 'area',
         source = source)
p.legend.location = (0,-50)
p.right.append(p.legend[0])
p.legend.border_line_color = None
show(p,notebook_handle=True)

#### Educational Level - Below Primary:

In this section you can observe the data visualisation of the following features with respect to district and age group 
- Educational Level - Below Primary - Persons
- Educational Level - Below Primary - Males
- Educational Level - Below Primary - Females

In [17]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Educational Level - Below Primary Persons'],
    area = data['Area Name'],
    illerate = data['Educational Level - Below Primary Persons'],
    illerate_male = data['Educational Level - Below Primary Males'],
    illerate_female = data['Educational Level - Below Primary Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Educational Level - Below Primary Persons'].min(),
         data['Educational Level - Below Primary Persons'].max()],
    y = [2,50]
)

from bokeh.palettes import viridis  # sized palette: one colour per area

areas = list(data['Area Name'].unique())
color_mapper = CategoricalColorMapper(
    factors = areas,
    palette = viridis(len(areas))
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,100000))

p = figure(title = 'Educational Level - Below Primary (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Educational Level - Below Primary Total Persons ','@illerate'),
                            ('Educational Level - Below Primary Total Males ','@illerate_male'),
                           ('Educational Level -  Below Primary Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Educational Level - Below Primary Persons',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         legend = 'area',
         source = source)
p.legend.location = (0,-50)
p.right.append(p.legend[0])
p.legend.border_line_color = None
show(p,notebook_handle=True)

#### Educational Level - Primary:

In this section you can observe the data visualisation of the following features with respect to district and age group 
- Educational Level - Primary - Persons
- Educational Level - Primary - Males
- Educational Level - Primary - Females

In [18]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Educational Level - Primary Persons'],
    area = data['Area Name'],
    illerate = data['Educational Level - Primary Persons'],
    illerate_male = data['Educational Level - Primary Males'],
    illerate_female = data['Educational Level - Primary Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Educational Level - Primary Persons'].min(),
         data['Educational Level - Primary Persons'].max()],
    y = [2,50]
)

from bokeh.palettes import viridis  # sized palette: one colour per area

areas = list(data['Area Name'].unique())
color_mapper = CategoricalColorMapper(
    factors = areas,
    palette = viridis(len(areas))
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,100000))

p = figure(title = 'Educational Level - Primary (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Educational Level -  Primary Total Persons ','@illerate'),
                           ('Educational Level -  Primary Total Male ','@illerate_male'),
                           ('Educational Level -  Primary Total Female ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Educational Level - Primary Persons',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         legend = 'area',
         source = source)
p.legend.location = (0,-50)
p.right.append(p.legend[0])
p.legend.border_line_color = None
show(p,notebook_handle=True)

#### Unclassified:

In this section you can observe the data visualisation of the following features with respect to district and age group 
- Unclassified - Persons
- Unclassified - Males
- Unclassified - Females

In [19]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Unclassified - Persons'],
    area = data['Area Name'],
    illerate = data['Unclassified - Persons'],
    illerate_male = data['Unclassified - Males'],
    illerate_female = data['Unclassified - Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Unclassified - Persons'].min(),
         data['Unclassified - Persons'].max()],
    y = [1,100]
)

from bokeh.palettes import viridis  # sized palette: one colour per area

areas = list(data['Area Name'].unique())
color_mapper = CategoricalColorMapper(
    factors = areas,
    palette = viridis(len(areas))
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,400))

p = figure(title = 'Unclassified -  (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Unclassified - Total Persons ','@illerate'),
                           ('Unclassified - Total Males ','@illerate_male'),
                           ('Unclassified - Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'Unclassified - Persons',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         legend = 'area',
         source = source)
p.legend.location = (0,-50)
p.right.append(p.legend[0])
p.legend.border_line_color = None
show(p,notebook_handle=True)

Now let us look at the combined count across all the categories in focus, for total persons, males and females. We are going to create three new features for this, namely:
- Total
- Total_Males
- Total_Females

In [20]:
data['Total']=data['Illiterate - Persons']+data['Educational Level - Below Primary Persons']+data['Educational Level - Literate without Educational Level Persons']+data['Educational Level - Primary Persons']+data['Unclassified - Persons']
data['Total_Males']=data['Illiterate - Males']+data['Educational Level - Below Primary Males']+data['Educational Level - Literate without Educational Level Males']+data['Educational Level - Primary Males']+data['Unclassified - Males']
data['Total_Females']=data['Illiterate - Females']+data['Educational Level - Below Primary Females']+data['Educational Level - Literate without Educational Level Females']+data['Educational Level - Primary Females']+data['Unclassified - Females']


Now let us visualise the new features to get a summary view of the entire analysis.

#### Summary (Total):

In this section you can observe the data visualisation of the following features with respect to district and age group 
- Total 
- Total_Males
- Total_Females

In [24]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Total'],
    area = data['Area Name'],
    illerate = data['Total'],
    illerate_male = data['Total_Males'],
    illerate_female = data['Total_Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Total'].min(),
         data['Total'].max()],
    y = [5,100]
)

from bokeh.palettes import viridis  # sized palette: one colour per area

areas = list(data['Area Name'].unique())
color_mapper = CategoricalColorMapper(
    factors = areas,
    palette = viridis(len(areas))
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,120000))

p = figure(title = 'Summary -  (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Summary','@illerate'),
                           ('Total Males ','@illerate_male'),
                           ('Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'Total Population needing Primary Education',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         legend = 'area',
         source = source)
p.legend.location = (0,-50)
p.right.append(p.legend[0])
p.legend.border_line_color = None
show(p,notebook_handle=True)
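
The summary plot can be complemented by a numeric ranking, so that the target list does not depend on reading circle sizes by eye. The sketch below is one possible approach on made-up rows (the town names and counts are hypothetical): it sums the 'Total' feature over age groups for each area and sorts the result. On the real dataset one would run the same `groupby` on `data`.

```python
import pandas as pd

# Hypothetical per-age-group rows after the 'Total' feature has been added
sample = pd.DataFrame({
    'Area Name': ['Town A', 'Town A', 'Town B', 'Town B', 'Town C'],
    'Age-Group': ['0-6', '7', '0-6', '7', '0-6'],
    'Total': [500, 120, 900, 300, 50],
})

# Sum over age groups, then sort so the areas with the largest need come first
ranking = (
    sample.groupby('Area Name')['Total']
    .sum()
    .sort_values(ascending=False)
)
print(ranking.head(2))
```

Raw counts favour populous towns. Ranking by the share of the population would need 'Total - Persons', which was dropped earlier, so that column would have to be kept for such a comparison.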

## Conclusion:
    
- The summary visualisation above suggests the following target districts for primary education in Phase 1 of the NGO initiative:
    - Hubli-Dharwad
    - Mysore
    - Bangalore
    - Belgaum
    - Gulbarga
    - Bellary
    - Davanagere
    - Mangalore
- The age groups 0-6 years and 30-45 years also appear to need more attention in most cases.

Scope of improvement:

A more detailed analysis is needed to understand the underlying reasons and to build a predictive model. Due to time constraints, this part is left as a further exercise.