# GDP Analysis : India

## Project Brief

You are working as the chief data scientist at NITI Aayog, reporting to the CEO. The CEO has initiated a project in which NITI Aayog will provide top-level recommendations to the Chief Ministers (CMs) of various states to help them prioritise areas of development for their respective states. Since different states are in different phases of development, the recommendations should be specific to each state.

The overall goal of this project is to help the CMs focus on areas that will foster economic development for their respective states. Since the most common measure of economic development is the GDP, we will analyse the GDP of the various states of India and suggest ways to improve it.

## Understanding GDP

Gross domestic product (GDP) at current prices is the GDP at the market value of goods and services produced in a country during a year. In other words, GDP measures the 'monetary value of final goods and services produced by a country/state in a given period of time'.

GDP can be broadly divided into goods and services produced by three sectors: the primary sector (agriculture), the secondary sector (industry), and the tertiary sector (services).

GDP at current prices is also known as nominal GDP. Real GDP, by contrast, accounts for the price changes caused by inflation; in other words, real GDP is nominal GDP adjusted for inflation. We will use the nominal GDP for this exercise. We will also consider the financial year 2015-16 as the base year, as most of the data required for this exercise is available for that period.
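
To make the nominal/real distinction concrete, here is a minimal sketch of the usual deflator adjustment. The function name `real_gdp` and the figures are hypothetical and are not taken from the datasets used in this notebook.

```python
def real_gdp(nominal_gdp: float, deflator: float) -> float:
    """Convert nominal GDP to real GDP using a price deflator (base year = 100)."""
    return nominal_gdp / deflator * 100


# Hypothetical figures: nominal GDP of 1050 (in crore), prices up 5% since the base year
nominal = 1050.0
deflator = 105.0

print(real_gdp(nominal, deflator))  # 1000.0
```

Since this analysis uses nominal GDP throughout, no such adjustment is applied in the cells that follow.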

## Per Capita GDP and Income

Total GDP divided by the population gives the per capita GDP, which roughly measures the average value of goods and services produced per person. The per capita income is closely related to the per capita GDP (though they are not the same). In general, the per capita income increases when the per capita GDP increases, and vice-versa. For instance, in the financial year 2015-16, the per capita income of India was ₹93,293, whereas the per capita GDP of India was $1717, which roughly amounts to ₹1,11,605. 
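
As a quick check on the figures quoted above, the sketch below derives the exchange rate implied by the two per capita GDP values and compares per capita income with per capita GDP. The variable names are illustrative; the numbers are the ones stated in the text.

```python
# Figures quoted in the text above for FY 2015-16
per_capita_gdp_usd = 1717
per_capita_gdp_inr = 111605      # written as ₹1,11,605 in the Indian numbering system
per_capita_income_inr = 93293    # ₹93,293

# Exchange rate implied by the two GDP figures (rupees per US dollar)
implied_rate = per_capita_gdp_inr / per_capita_gdp_usd
print(implied_rate)  # 65.0

# Per capita income as a fraction of per capita GDP
income_share = per_capita_income_inr / per_capita_gdp_inr
print(round(income_share, 2))  # 0.84
```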

## Data

The data is sourced from https://data.gov.in/, an Open Government Data (OGD) platform of India. The data for GDP analysis of the Indian states is divided into two parts:

Data I-A: This dataset consists of the GSDP (Gross State Domestic Product) data for the states and union territories.

Data I-B: This dataset contains the distribution of GSDP among three sectors: the primary sector (agriculture), the secondary sector (industry) and the tertiary sector (services), along with taxes and subsidies. There is a separate dataset for each state.


There are two parts to this project. In the first part, we analyse and compare the GDPs of various Indian states (both total and per capita). The GDP of a state is referred to as the GSDP (Gross State Domestic Product). Then, we divide the states into four categories based on the GDP per capita, and for each of these four categories, analyse the sectors that contribute the most to the GDP (such as agriculture, real estate, manufacturing, etc.).

In the second part, we analyse whether GDP per capita is related to dropout rates in schools and colleges.


Note: We filtered out the union territories (Delhi, Chandigarh, Andaman and Nicobar Islands, etc.) for further analysis, as they are governed directly by the centre, not state governments.

#### Importing the relevant libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Part-I: GDP Analysis of the Indian States

### Part I-A:

In [None]:
# Reading the data
data1a = pd.read_csv('/kaggle/input/GSDP.csv')

In [None]:
data1a.head()

In [None]:
# Basic info regarding the data
data1a.info()

In [None]:
# Observe the various columns in the dataset
data1a.columns

In [None]:
# Remove the rows '(% Growth over previous year)' and 'GSDP - CURRENT PRICES (in Crore)' for the year 2016-17
data1a = data1a[data1a['Duration'] != '2016-17']
data1a

In [None]:
# Check the total number of null values in each column
data1a.isnull().sum()

In [None]:
# Check whether any column consists entirely of NaN values
data1a.isnull().all(axis=0)

In [None]:
# Removing West Bengal, as the whole column is NaN
data1a = data1a.drop('West Bengal1', axis = 1)

In [None]:
data1a

#### Calculating the average growth of states for the duration 2013-14, 2014-15 and 2015-16 by taking the mean of the row '(% Growth over previous year)'. 

In [None]:
# At most one value is missing per state, so we can average the remaining two values
data1a.iloc[6:].isnull().sum()

In [None]:
avg_growth = data1a.iloc[6:]

In [None]:
avg_growth #dataframe to find the average growth of states

In [None]:
avg_growth.columns

In [None]:
# Taking only the values for the states
average_growth_values = avg_growth[avg_growth.columns[2:34]].mean()  

In [None]:
# Sorting the average growth rate values and then making a dataframe for all the states
average_growth_values = average_growth_values.sort_values()
average_growth_rate = average_growth_values.to_frame(name='Average growth rate')
average_growth_rate
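
The averaging step relies on pandas skipping missing values by default. The sketch below illustrates this on toy data; the state names and growth rates are hypothetical. Counting the non-missing years alongside the mean shows how many values each average is based on.

```python
import numpy as np
import pandas as pd

# Toy growth rates (%) for three years; names and values are hypothetical
growth = pd.DataFrame(
    {"State A": [10.0, 12.0, np.nan], "State B": [8.0, 9.0, 10.0]},
    index=["2013-14", "2014-15", "2015-16"],
)

avg = growth.mean()        # skipna=True by default, so NaN values are ignored
n_years = growth.count()   # number of non-missing years per state

summary = pd.DataFrame({"Average growth rate": avg, "Years used": n_years})
print(summary.sort_values("Average growth rate"))
```

Here "State A" is averaged over two years only, which is worth keeping in mind when comparing it with states that have all three years.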

In [None]:
# plotting the average growth rate for all the states
plt.figure(figsize=(12,10), dpi = 300)

sns.barplot(x = average_growth_rate['Average growth rate'], y = average_growth_values.index,palette='viridis')
plt.xlabel('Average Growth Rate', fontsize=12)
plt.ylabel('States', fontsize=12)
plt.title('Average Growth Rate for all the states',fontsize=13)
plt.show()

##### Observations:
1. Interestingly, the average growth rate is highest for the North-Eastern states, except for Assam and Meghalaya. This is not what we would generally expect, so these states deserve a closer look.

2. The average growth rate is lowest for states such as Goa, Odisha, Meghalaya, Sikkim and Jammu & Kashmir.

In [None]:
# top 5 states as per average growth rate

average_growth_rate['Average growth rate'][-5:]

In [None]:
# top 5 states as per average growth rate for the years 2013-14, 2014-15, 2015-16

avg_growth[['Mizoram','Tripura','Nagaland','Manipur','Arunachal Pradesh']]

1. The growth rate for the above states actually decreased substantially in 2014-15 compared to 2013-14, but because growth was very high in 2013-14, the average remains high for these states.
2. In the absence of data for 2015-16, we cannot say definitively that these are high-performing states, as their growth rate decreased in 2014-15.

#### To find the states that have been growing consistently fast, we look at the mean and the standard deviation of the growth rate for each state.

In [None]:
#create a dataframe to store the mean and the standard deviation of the growth rate for various states

describe = pd.DataFrame(avg_growth.describe())
describe = describe.T
describe

In [None]:
# states having mean growth rate greater than 12 and standard deviation less than 2

describe[(describe['mean']>12) & (describe['std']<2)]

In [None]:
# states having mean growth rate less than 12 and standard deviation greater than 2

describe[(describe['mean']<12) & (describe['std']>2)]
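
The idea behind these two filters can be sketched on toy data: a state counts as "consistently fast" when its mean growth is high and its standard deviation is low. The column names and values below are hypothetical; the thresholds (12 and 2) are the ones used above.

```python
import pandas as pd

# Hypothetical growth rates (%) over three years
growth = pd.DataFrame(
    {
        "Steady": [13.0, 14.0, 15.0],
        "Volatile": [30.0, 5.0, 10.0],
        "Slow": [6.0, 7.0, 8.0],
    }
)

# One row per state with the mean and sample standard deviation of its growth
stats = growth.agg(["mean", "std"]).T

consistent = stats[(stats["mean"] > 12) & (stats["std"] < 2)]
print(consistent)
```

"Volatile" has the highest mean but is excluded because its growth swings widely from year to year, which is the pattern noted for some of the North-Eastern states.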

### Comparing the average growth rate for 2013-14, 2014-15 and 2015-16 with the standard deviation

States that are growing consistently fast are:
1. Andhra Pradesh
2. Assam 
3. Kerala 
4. Tamil Nadu
5. Telangana

States that are struggling are:
1. Goa
2. Meghalaya 
3. Odisha 
4. Jammu & Kashmir
5. Jharkhand

#### Plotting the total GDP of the states for the year 2015-16:
#### Identifying the top 5 and the bottom 5 states based on total GDP.

In [None]:
data1a.head()

In [None]:
# filtering out the data for the year 2015-16 and storing it in a dataframe
total_GDP_15_16 = data1a[(data1a['Items  Description'] == 'GSDP - CURRENT PRICES (` in Crore)') & (data1a['Duration'] == '2015-16')]
total_GDP_15_16

In [None]:
# carrying out necessary transformation to make the data ready for plotting

total_GDP_15_16_states = total_GDP_15_16[total_GDP_15_16.columns[2:34]].transpose()
total_GDP_15_16_states = total_GDP_15_16_states.rename(columns={4: 'Total GDP of States 2015-16'})
total_GDP_15_16_states = total_GDP_15_16_states.dropna()
total_GDP_15_16_states = total_GDP_15_16_states.sort_values('Total GDP of States 2015-16',ascending=True)
total_GDP_15_16_states

In [None]:
plt.figure(figsize=(10,8), dpi = 600)

sns.barplot(x = total_GDP_15_16_states['Total GDP of States 2015-16'], y = total_GDP_15_16_states.index,palette='plasma')
plt.xlabel('Total GDP of States for 2015-16', fontsize=12)
plt.ylabel('States', fontsize=12)
plt.title('Total GDP of States 2015-16 for all the states',fontsize=12)
plt.show()

#### Top 5 states in terms of total GDP for the year 2015-16

In [None]:
top_5_eco = total_GDP_15_16_states[-5:]
top_5_eco

#### Bottom 5 states in terms of total GDP for the year 2015-16

In [None]:
bottom_5_eco = total_GDP_15_16_states[:5]
bottom_5_eco

### Part I-B:

#### Reading the CSV files for all the states

In [None]:
Andhra_Pradesh = pd.read_csv('/kaggle/input/NAD-Andhra_Pradesh-GSVA_cur_2016-17.csv')

In [None]:
Arunachal_Pradesh = pd.read_csv('/kaggle/input/NAD-Arunachal_Pradesh-GSVA_cur_2015-16.csv')

In [None]:
Assam = pd.read_csv('/kaggle/input/NAD-Assam-GSVA_cur_2015-16.csv')

In [None]:
Bihar = pd.read_csv('/kaggle/input/NAD-Bihar-GSVA_cur_2015-16.csv')

In [None]:
Chhattisgarh = pd.read_csv('/kaggle/input/NAD-Chhattisgarh-GSVA_cur_2016-17.csv')

In [None]:
Goa = pd.read_csv('/kaggle/input/NAD-Goa-GSVA_cur_2015-16.csv')

In [None]:
Gujarat = pd.read_csv('/kaggle/input/NAD-Gujarat-GSVA_cur_2015-16.csv')

In [None]:
Haryana = pd.read_csv('/kaggle/input/NAD-Haryana-GSVA_cur_2016-17.csv')

In [None]:
Himachal_Pradesh = pd.read_csv('/kaggle/input/NAD-Himachal_Pradesh-GSVA_cur_2014-15.csv')

In [None]:
Jharkhand = pd.read_csv('/kaggle/input/NAD-Jharkhand-GSVA_cur_2015-16.csv')

In [None]:
Karnataka = pd.read_csv('/kaggle/input/NAD-Karnataka-GSVA_cur_2015-16.csv')

In [None]:
Kerala = pd.read_csv('/kaggle/input/NAD-Kerala-GSVA_cur_2015-16.csv')

In [None]:
Madhya_Pradesh = pd.read_csv('/kaggle/input/NAD-Madhya_Pradesh-GSVA_cur_2016-17.csv')

In [None]:
Maharashtra = pd.read_csv('/kaggle/input/NAD-Maharashtra-GSVA_cur_2014-15.csv')

In [None]:
Manipur = pd.read_csv('/kaggle/input/NAD-Manipur-GSVA_cur_2014-15.csv')

In [None]:
Meghalaya = pd.read_csv('/kaggle/input/NAD-Meghalaya-GSVA_cur_2016-17.csv')

In [None]:
Mizoram = pd.read_csv('/kaggle/input/NAD-Mizoram-GSVA_cur_2014-15.csv')

In [None]:
Nagaland = pd.read_csv('/kaggle/input/NAD-Nagaland-GSVA_cur_2014-15.csv')

In [None]:
Odisha = pd.read_csv('/kaggle/input/NAD-Odisha-GSVA_cur_2016-17.csv')

In [None]:
Punjab = pd.read_csv('/kaggle/input/NAD-Punjab-GSVA_cur_2014-15.csv')

In [None]:
Rajasthan = pd.read_csv('/kaggle/input/NAD-Rajasthan-GSVA_cur_2014-15.csv')

In [None]:
Sikkim = pd.read_csv('/kaggle/input/NAD-Sikkim-GSVA_cur_2015-16.csv')

In [None]:
Tamil_Nadu = pd.read_csv('/kaggle/input/NAD-Tamil_Nadu-GSVA_cur_2016-17.csv')

In [None]:
Telangana = pd.read_csv('/kaggle/input/NAD-Telangana-GSVA_cur_2016-17.csv')

In [None]:
Tripura = pd.read_csv('/kaggle/input/NAD-Tripura-GSVA_cur_2014-15.csv')

In [None]:
Uttar_Pradesh = pd.read_csv('/kaggle/input/NAD-Uttar_Pradesh-GSVA_cur_2015-16.csv')

In [None]:
Uttarakhand = pd.read_csv('/kaggle/input/NAD-Uttarakhand-GSVA_cur_2015-16.csv')
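
The per-state reads above could also be done in a loop. The following is a sketch only: `load_state_files` is a hypothetical helper, and it assumes every file follows the `NAD-<State>-GSVA_cur_<year>.csv` naming seen above. It is demonstrated on a temporary folder with two tiny made-up files rather than on the real data.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def load_state_files(folder):
    """Return {state_name: DataFrame} for each NAD-<State>-GSVA_cur_<year>.csv in folder."""
    pattern = re.compile(r"NAD-(.+)-GSVA_cur_\d{4}-\d{2}\.csv$")
    frames = {}
    for path in sorted(Path(folder).glob("NAD-*-GSVA_cur_*.csv")):
        match = pattern.match(path.name)
        if match:
            frames[match.group(1)] = pd.read_csv(path)
    return frames


# Demonstration on a throwaway directory with two tiny hypothetical files
with tempfile.TemporaryDirectory() as tmp:
    for name in ["NAD-Goa-GSVA_cur_2015-16.csv", "NAD-Tamil_Nadu-GSVA_cur_2016-17.csv"]:
        toy = pd.DataFrame({"S.No.": ["1"], "Item": ["Primary"], "2014-15": [1.0]})
        toy.to_csv(Path(tmp) / name, index=False)
    state_frames = load_state_files(tmp)

print(sorted(state_frames))  # ['Goa', 'Tamil_Nadu']
```

In this notebook the folder would be `/kaggle/input`, and the dictionary keys would play the role of the individual variable names used above.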

### Taking data only for year 2014-15

In [None]:
andhra_pradesh = Andhra_Pradesh[['S.No.','Item', '2014-15']]
andhra_pradesh = andhra_pradesh.rename(columns={'2014-15': 'Andhra_Pradesh'})

arunachal_pradesh = Arunachal_Pradesh[['S.No.','Item', '2014-15']]
arunachal_pradesh = arunachal_pradesh.rename(columns={'2014-15': 'Arunachal_Pradesh'})

assam = Assam[['S.No.','Item', '2014-15']]
assam = assam.rename(columns={'2014-15': 'Assam'})

bihar = Bihar[['S.No.','Item', '2014-15']]
bihar = bihar.rename(columns={'2014-15': 'Bihar'})

chhattisgarh = Chhattisgarh[['S.No.','Item', '2014-15']]
chhattisgarh = chhattisgarh.rename(columns={'2014-15': 'Chhattisgarh'})

goa = Goa[['S.No.','Item', '2014-15']]
goa = goa.rename(columns={'2014-15': 'Goa'})

gujarat = Gujarat[['S.No.','Item', '2014-15']]
gujarat = gujarat.rename(columns={'2014-15': 'Gujarat'})

haryana = Haryana[['S.No.','Item', '2014-15']]
haryana = haryana.rename(columns={'2014-15': 'Haryana'})

himachal_Pradesh = Himachal_Pradesh[['S.No.','Item', '2014-15']]
himachal_Pradesh = himachal_Pradesh.rename(columns={'2014-15': 'Himachal_Pradesh'})

jharkhand = Jharkhand[['S.No.','Item', '2014-15']]
jharkhand = jharkhand.rename(columns={'2014-15': 'Jharkhand'})

karnataka = Karnataka[['S.No.','Item', '2014-15']]
karnataka = karnataka.rename(columns={'2014-15': 'Karnataka'})

kerala = Kerala[['S.No.','Item', '2014-15']]
kerala = kerala.rename(columns={'2014-15': 'Kerala'})

madhya_pradesh = Madhya_Pradesh[['S.No.','Item', '2014-15']]
madhya_pradesh = madhya_pradesh.rename(columns={'2014-15': 'Madhya_Pradesh'})

maharashtra = Maharashtra[['S.No.','Item', '2014-15']]
maharashtra = maharashtra.rename(columns={'2014-15': 'Maharashtra'})

manipur = Manipur[['S.No.','Item', '2014-15']]
manipur = manipur.rename(columns={'2014-15': 'Manipur'})

meghalaya = Meghalaya[['S.No.','Item', '2014-15']]
meghalaya = meghalaya.rename(columns={'2014-15': 'Meghalaya'})

mizoram = Mizoram[['S.No.','Item', '2014-15']]
mizoram = mizoram.rename(columns={'2014-15': 'Mizoram'})

nagaland = Nagaland[['S.No.','Item', '2014-15']]
nagaland = nagaland.rename(columns={'2014-15': 'Nagaland'})

odisha = Odisha[['S.No.','Item', '2014-15']]
odisha = odisha.rename(columns={'2014-15': 'Odisha'})

punjab = Punjab[['S.No.','Item', '2014-15']]
punjab = punjab.rename(columns={'2014-15': 'Punjab'})

rajasthan = Rajasthan[['S.No.','Item', '2014-15']]
rajasthan = rajasthan.rename(columns={'2014-15': 'Rajasthan'})

sikkim = Sikkim[['S.No.','Item', '2014-15']]
sikkim = sikkim.rename(columns={'2014-15': 'Sikkim'})

tamil_nadu = Tamil_Nadu[['S.No.','Item', '2014-15']]
tamil_nadu = tamil_nadu.rename(columns={'2014-15': 'Tamil_Nadu'})

telangana = Telangana[['S.No.','Item', '2014-15']]
telangana = telangana.rename(columns={'2014-15': 'Telangana'})

tripura = Tripura[['S.No.','Item', '2014-15']]
tripura = tripura.rename(columns={'2014-15': 'Tripura'})

uttar_pradesh = Uttar_Pradesh[['S.No.','Item', '2014-15']]
uttar_pradesh = uttar_pradesh.rename(columns={'2014-15': 'Uttar_Pradesh'})

uttarakhand = Uttarakhand[['S.No.','Item', '2014-15']]
uttarakhand = uttarakhand.rename(columns={'2014-15': 'Uttarakhand'})

In [None]:
# Merging all the tables for different states into a single dataframe

dfs = [andhra_pradesh,arunachal_pradesh, assam, bihar, chhattisgarh, goa, gujarat, haryana,himachal_Pradesh,
       jharkhand, karnataka,kerala,madhya_pradesh, maharashtra,manipur,meghalaya,mizoram, nagaland,odisha,
       punjab,rajasthan,sikkim,tamil_nadu,telangana,tripura,uttarakhand, uttar_pradesh]


from functools import reduce
df_final = reduce(lambda left,right: pd.merge(left,right,how ='left',on=['S.No.', 'Item']), dfs)
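
One thing to keep in mind with `how='left'` is that only the rows present in the first dataframe survive the chain of merges. The sketch below shows the difference between a left and an outer merge on toy frames; the frames and values are hypothetical.

```python
from functools import reduce

import pandas as pd

# Two hypothetical state tables that share only the 'Primary' row
a = pd.DataFrame({"S.No.": ["1", "2"], "Item": ["Primary", "Secondary"], "A": [10.0, 20.0]})
b = pd.DataFrame({"S.No.": ["1", "3"], "Item": ["Primary", "Tertiary"], "B": [1.0, 3.0]})

keys = ["S.No.", "Item"]
left = reduce(lambda l, r: pd.merge(l, r, how="left", on=keys), [a, b])
outer = reduce(lambda l, r: pd.merge(l, r, how="outer", on=keys), [a, b])

print(len(left), len(outer))  # 2 3
print(left)
```

With the left merge, the 'Tertiary' row of the second frame is dropped and the 'Secondary' row gets NaN for column `B`. Whether that matters here depends on how consistently the rows are labelled across the state files.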

In [None]:
df_final.columns

In [None]:
# Renaming some of the state names for merging data at a later stage

df_final = df_final.rename(columns={'Andhra_Pradesh':'Andhra Pradesh', 'Arunachal_Pradesh':'Arunachal Pradesh',
                                   'Himachal_Pradesh':'Himachal Pradesh','Madhya_Pradesh':'Madhya Pradesh',
                                   'Tamil_Nadu':'Tamil Nadu','Uttar_Pradesh':'Uttar Pradesh',
                                   'Chhattisgarh':'Chhatisgarh','Uttarakhand':'Uttrakhand'})

In [None]:
# Final dataframe having the data for all the states for all the sectors and subsectors of the economy

df_final

### Creating the GDP per capita Data Frame

In [None]:
gdp_per_capita = df_final.iloc[32][2:].sort_values()
gdp_per_capita = gdp_per_capita.to_frame(name = 'GDP per capita')
gdp_per_capita

#### Plotting GDP per capita

In [None]:
plt.figure(figsize=(12,8), dpi=600)                             

sns.barplot(x = gdp_per_capita['GDP per capita'], y =gdp_per_capita.index, palette='Reds' )
plt.xlabel('GDP per capita', fontsize=12)
plt.ylabel('States', fontsize=12)
plt.title('GDP per capita vs States',fontsize=12)
plt.show()

#### Top 5 states based on GDP per capita

In [None]:
top_5_gdp_per_capita = gdp_per_capita[-5:]
top_5_gdp_per_capita

#### Bottom 5 states based on GDP per capita

In [None]:
bottom_5_gdp_per_capita = gdp_per_capita[:5]
bottom_5_gdp_per_capita

#### Ratio of highest per capita GDP to the lowest per capita GDP

In [None]:
ratio = gdp_per_capita['GDP per capita'].max()/gdp_per_capita['GDP per capita'].min()
print('The Ratio of highest per capita GDP to the lowest per capita GDP is: ',ratio)

In [None]:
# Identifying the Primary, Secondary and Tertiary sectors and concatenating them to form a dataframe

primary = df_final[df_final['Item']=='Primary']
secondary = df_final[df_final['Item']=='Secondary']
tertiary = df_final[df_final['Item']=='Tertiary']
gdp = df_final[df_final['Item']=='Gross State Domestic Product']

pst = pd.concat([primary, secondary,tertiary,gdp], axis = 0).reset_index()
pst =  pst.drop(['index','S.No.'], axis = 1).set_index('Item')

In [None]:
pst

In [None]:
# calculating the percentage contribution of each sector to the Gross State Domestic Product for each state

pst.loc['primary_percentage'] = pst.loc['Primary'] / pst.loc['Gross State Domestic Product'] * 100
pst.loc['secondary_percentage'] = pst.loc['Secondary'] / pst.loc['Gross State Domestic Product'] * 100
pst.loc['tertiary_percentage'] = pst.loc['Tertiary'] / pst.loc['Gross State Domestic Product'] * 100
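
The three percentage rows can also be computed in one vectorised step. This is a sketch on toy numbers (hypothetical states and values). It also prints the share left over after the three sectors, which, assuming GSDP equals the sector totals plus taxes minus subsidies as the dataset description suggests, would correspond to net taxes.

```python
import pandas as pd

# Hypothetical sector values for two states
sectors = pd.DataFrame(
    {"State A": [30.0, 20.0, 40.0, 100.0], "State B": [10.0, 30.0, 50.0, 100.0]},
    index=["Primary", "Secondary", "Tertiary", "Gross State Domestic Product"],
)

gsdp = sectors.loc["Gross State Domestic Product"]
shares = sectors.loc[["Primary", "Secondary", "Tertiary"]].div(gsdp, axis="columns") * 100

# Share of GSDP not explained by the three sectors
remainder = 100 - shares.sum()

print(shares.round(1))
print(remainder.round(1))
```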

In [None]:
pst

In [None]:
# Transposing the dataframe for better readability

pst = pst.T
pst = pst.sort_values('Gross State Domestic Product')
pst

#### Plotting the percentage contribution of the primary, secondary and tertiary sectors as a percentage of the total GDP for all the states.

In [None]:
plt.figure(figsize=(12,10), dpi =600)

bars1 = pst['primary_percentage']
bars2 = pst['secondary_percentage']
bars3 = pst['tertiary_percentage']
 
legends = ['Primary %', 'Secondary %', 'Tertiary %']

bars = np.add(bars1, bars2).tolist()
 
r = np.arange(0,len(pst.index))
 
names = pst.index
barWidth = 1
 
# Create red bars
plt.bar(r, bars1, color='red', edgecolor='white')
# Create green bars (middle), on top of the first ones
plt.bar(r, bars2, bottom=bars1, color='green', edgecolor='white')
# Create blue bars (top)
plt.bar(r, bars3, bottom=bars, color='blue', edgecolor='white')
 
plt.xticks(r, names,rotation=90)
plt.xlabel('States',fontsize=12)
plt.ylabel('Percentage contribution to GDP',fontsize=12)
plt.title('Percentage contribution of the Primary, Secondary and Tertiary sectors as a percentage of the total GDP for all the states')

plt.legend(legends)

plt.tight_layout()
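
In a stacked bar chart, each layer starts where the previous layers end. The sketch below shows how those starting heights (the `bottom` argument of `plt.bar`) can be derived for any number of layers with a cumulative sum. The array values are hypothetical, and no plot is drawn here.

```python
import numpy as np

# Rows = sectors (primary, secondary, tertiary), columns = states; hypothetical percentages
shares = np.array([
    [30.0, 10.0],
    [20.0, 30.0],
    [40.0, 50.0],
])

tops = np.cumsum(shares, axis=0)   # height reached after stacking each layer
bottoms = tops - shares            # height at which each layer starts

print(bottoms.tolist())  # [[0.0, 0.0], [30.0, 10.0], [50.0, 40.0]]

# Each layer could then be drawn with, for example:
#   plt.bar(r, shares[i], bottom=bottoms[i])
```

The third row of `bottoms` is the same quantity that the cell above computes with `np.add(bars1, bars2)`.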


#### Dividing the states into groups based on GDP per capita using the 20th, 50th, 85th and 100th percentile values

In [None]:
gdp_per_capita

In [None]:
# States between the 85th and 100th percentile

C1 = gdp_per_capita[gdp_per_capita['GDP per capita'] > gdp_per_capita['GDP per capita'].quantile(0.85)]
C1

In [None]:
# States between the 50th and 85th percentile

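# <= at the upper bound, so a state exactly at the 85th percentile is not left out of every category
C2 = gdp_per_capita[(gdp_per_capita['GDP per capita'] > gdp_per_capita['GDP per capita'].quantile(0.50)) & (gdp_per_capita['GDP per capita'] <= gdp_per_capita['GDP per capita'].quantile(0.85))]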
C2

In [None]:
# States between the 20th and 50th percentile

C3 = gdp_per_capita[(gdp_per_capita['GDP per capita'] > gdp_per_capita['GDP per capita'].quantile(0.20)) & (gdp_per_capita['GDP per capita'] <= gdp_per_capita['GDP per capita'].quantile(0.50))]
C3

In [None]:
# States below the 20th percentile

# <= so that a state exactly at the 20th percentile is not left out of every category
C4 = gdp_per_capita[gdp_per_capita['GDP per capita'] <= gdp_per_capita['GDP per capita'].quantile(0.20)]
C4
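
The four filters above can also be expressed as a single binning step. This is a sketch using `pd.cut` on toy data (hypothetical states and values); it is an alternative formulation, not the method used in this notebook. The bins are closed on the right, so a value equal to a percentile falls into the lower category.

```python
import numpy as np
import pandas as pd

# Hypothetical GDP per capita values for ten states
gdp = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    index=[f"State {i}" for i in range(1, 11)],
    name="GDP per capita",
    dtype=float,
)

# Bin edges at the 20th, 50th and 85th percentiles
edges = [-np.inf, *gdp.quantile([0.20, 0.50, 0.85]), np.inf]
category = pd.cut(gdp, bins=edges, labels=["C4", "C3", "C2", "C1"])

print(category.value_counts().sort_index())
```

Every state receives exactly one label this way, which avoids having to keep four separate boundary conditions consistent.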

### Creating dataframe for C1, C2, C3 and C4 states

In [None]:
C1_df = df_final[['S.No.','Item'] + list(C1.index)]
C2_df = df_final[['S.No.','Item'] + list(C2.index)]
C3_df = df_final[['S.No.','Item'] + list(C3.index)]
C4_df = df_final[['S.No.','Item'] + list(C4.index)]

In [None]:
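# Keep the selected sub-sector rows plus the GSDP (30) and per capita GSDP (32) rows.
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added later.
rows = [0,5,7,8,9,11,14,22,23,24,25,30,32]
C1_df = C1_df.iloc[rows].copy()
C2_df = C2_df.iloc[rows].copy()
C3_df = C3_df.iloc[rows].copy()
C4_df = C4_df.iloc[rows].copy()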

In [None]:
C1_df.reset_index(drop=True, inplace=True)
C2_df.reset_index(drop=True, inplace=True)
C3_df.reset_index(drop=True, inplace=True)
C4_df.reset_index(drop=True, inplace=True)

In [None]:
C1_df

In [None]:
# Creating the column for Total values for all sub-sectors for all the states and the column for the percentage contribution
# to the total GSDP by each of the sub-sectors for all the states

C1_df['Total for all states'] = C1_df['Kerala']+C1_df['Haryana']+C1_df['Sikkim']+C1_df['Goa']
C1_df['Percentage of Total GDP'] = C1_df['Total for all states']/C1_df['Total for all states'][11] * 100
C1_df

In [None]:
# Identifying the major sub-sectors contributing more to the GSDP  by finding the cumulative sum

C1_contributor = C1_df[['Item','Percentage of Total GDP']][:-2].sort_values(by='Percentage of Total GDP', ascending=False)
C1_contributor.reset_index(drop=True, inplace=True)
C1_contributor['Cumulative sum'] = C1_contributor['Percentage of Total GDP'].cumsum()
C1_contributor
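
The same ranking is repeated for each category below, so it can be useful to see it as one function. This is a sketch only: `top_contributors` is a hypothetical helper, and it assumes a dataframe with an `Item` column, one numeric column per state, and a row for the total GSDP. The toy values are made up.

```python
import pandas as pd


def top_contributors(df, gsdp_item="Gross State Domestic Product"):
    """Rank sub-sectors by their share of the combined GSDP of all state columns."""
    totals = df.set_index("Item").sum(axis=1)            # sum across states
    share = totals.drop(gsdp_item) / totals[gsdp_item] * 100
    out = share.sort_values(ascending=False).to_frame("Percentage of Total GDP")
    out["Cumulative sum"] = out["Percentage of Total GDP"].cumsum()
    return out


# Hypothetical category with two states and three sub-sectors
toy = pd.DataFrame(
    {
        "Item": ["Agriculture", "Manufacturing", "Real estate", "Gross State Domestic Product"],
        "State A": [20.0, 30.0, 10.0, 100.0],
        "State B": [40.0, 10.0, 10.0, 100.0],
    }
)

result = top_contributors(toy)
print(result.round(1))
```

The cumulative sum column shows how many of the top sub-sectors are needed to account for a given share of the category's GSDP.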

In [None]:
plt.figure(figsize=(6,4), dpi=600)
sns.barplot(y=C1_contributor['Item'], x = C1_contributor['Percentage of Total GDP'], palette='inferno')
plt.xlabel("Percentage of Total GSDP for C1 States")
plt.ylabel('Sub-sectors')
plt.title('Percentage of Total GSDP for C1 States vs Sub-sectors')
plt.savefig("Percentage of Total GSDP for C1 States vs Sub-sectors.png", bbox_inches='tight', dpi=600)

plt.show()

#### C1 States:
1. For C1 states, sub-sectors such as Real Estate, Agriculture, Trade and Hotels, and Manufacturing contribute fairly evenly, each with a large share, and together they drive the overall GDP of these states.
2. Construction also contributes substantially to the total GDP of C1 states, as rapid urbanization is taking place in these states.

In [None]:
C2_df['Total for all states']=list(C2_df[list(states for states in C2_df.columns)[2:]].sum(axis=1))
C2_df['Percentage of Total GDP'] = C2_df['Total for all states']/C2_df['Total for all states'][11] * 100
C2_contributor = C2_df[['Item','Percentage of Total GDP']][:-2].sort_values(by='Percentage of Total GDP', ascending=False)
C2_contributor.reset_index(drop=True, inplace=True)
C2_contributor['Cumulative sum'] = C2_contributor['Percentage of Total GDP'].cumsum()
C2_contributor

In [None]:
plt.figure(figsize=(6,4), dpi=600)
sns.barplot(y=C2_contributor['Item'], x = C2_contributor['Percentage of Total GDP'],palette='hot')
plt.xlabel("Percentage of Total GSDP for C2 States")
plt.ylabel('Sub-sectors')
plt.title('Percentage of Total GSDP for C2 States vs Sub-sectors')
plt.show()

#### C2 States:
1. For C2 states, Manufacturing leads in terms of overall contribution to GDP. This comes as no surprise, as states like Gujarat, Karnataka, Tamil Nadu and Maharashtra are considered the manufacturing hubs of India, with huge investments taking place in automobiles and other tech industries.
2. Real Estate and Professional Services also contribute substantially to the total GDP of C2 states, as these states are urbanizing rapidly and people are moving to them from villages in search of jobs and a better livelihood.
3. Agriculture forms the backbone of India's GDP, so it is unsurprising that it features among the top 3 sub-sectors for C2 states as well. However, it contributes considerably less than the top 2 sub-sectors, possibly because rapid urbanization leaves less land available for agriculture.

In [None]:
C3_df['Total for all states']=list(C3_df[list(states for states in C3_df.columns)[2:]].sum(axis=1))
C3_df['Percentage of Total GDP'] = C3_df['Total for all states']/C3_df['Total for all states'][11] * 100
C3_contributor = C3_df[['Item','Percentage of Total GDP']][:-2].sort_values(by='Percentage of Total GDP', ascending=False)
C3_contributor.reset_index(drop=True, inplace=True)
C3_contributor['Cumulative sum'] = C3_contributor['Percentage of Total GDP'].cumsum()
C3_contributor

In [None]:
plt.figure(figsize=(6,4), dpi=600)
sns.barplot(y=C3_contributor['Item'], x = C3_contributor['Percentage of Total GDP'], palette='autumn')
plt.xlabel("Percentage of Total GSDP for C3 States")
plt.ylabel('Sub-sectors')
plt.title('Percentage of Total GSDP for C3 States vs Sub-sectors')

plt.show()

#### C3 States:
1. C3 states like Andhra Pradesh, Odisha, Meghalaya, Chhattisgarh and Mizoram have highly arable land and receive a good amount of rain every year during the monsoon, so it is not surprising that Agriculture is the top sub-sector, contributing more than 23% to the GDP of these states.
2. Manufacturing is a distant second, contributing about 12% to the overall GDP, followed by Trade, Hotels and Restaurants, as these states are home to some of the top tourist attractions in India.
3. Slowly but steadily, these states are urbanizing, and hence Real Estate and Construction feature among the top 5 contributors as well.

In [None]:
C4_df['Total for all states']=list(C4_df[list(states for states in C4_df.columns)[2:]].sum(axis=1))
C4_df['Percentage of Total GDP'] = C4_df['Total for all states']/C4_df['Total for all states'][11] * 100
C4_contributor = C4_df[['Item','Percentage of Total GDP']][:-2].sort_values(by='Percentage of Total GDP', ascending=False)
C4_contributor.reset_index(drop=True, inplace=True)
C4_contributor['Cumulative sum'] = C4_contributor['Percentage of Total GDP'].cumsum()
C4_contributor

In [None]:
plt.figure(figsize=(6,4), dpi=600)
sns.barplot(y=C4_contributor['Item'], x = C4_contributor['Percentage of Total GDP'], palette='spring')
plt.xlabel("Percentage of Total GSDP for C4 States")
plt.ylabel('Sub-sectors')
plt.title('Percentage of Total GSDP for C4 States vs Sub-sectors')

plt.show()

#### C4 States:
1. C4 states like Bihar, Jharkhand and Uttar Pradesh have low literacy rates and large populations (U.P. is the most populous state in India), and agriculture features at the top again.
2. UP is one of the top tourist-attracting states, home to places like Agra, which has the Taj Mahal, and Varanasi, regarded as the spiritual capital of India. Varanasi draws Hindu pilgrims who bathe in the Ganges River's sacred waters and perform funeral rites, and its winding streets hold some 2,000 temples, including Kashi Vishwanath, the "Golden Temple", dedicated to the Hindu god Shiva. Jim Corbett National Park, India's oldest national park (opened in 1936, with a Bengal tiger reserve, visitor centre and safaris), lies in neighbouring Uttarakhand.

## C1, C2, C3 and C4 states:
1. The major sub sectors contributing to the economy of the states are:
    * Agriculture, Real Estate, Manufacturing, Trade, Hotels and Restaurants, and Construction.
    * One key observation is that for C1 and C2 states a major contribution comes from Real Estate, which is reasonable as these states have a large real estate and housing industry due to people migrating from villages for employment.
    * Agriculture forms the backbone of the Indian economy and hence features in the top 3 for all categories of states.
    * India is home to some of the top hotels, restaurants and tourist destinations, and hence these contribute significantly to the economy as well.
    * Slowly but steadily, India is working to increase its manufacturing capabilities, and new companies are opening factories in India, which is why Manufacturing appears as a top contributor as well. The 'Make in India' initiative by PM Modi is also helping to increase manufacturing activity in India.
    * For any country to improve the standard of living of its people, it requires good-quality infrastructure. India is experiencing rapid urbanization, with several new roads, bridges, ports, etc. being constructed to support its fast-growing GDP, and people need good-quality jobs. Construction provides jobs to many people and leads to better livelihoods.
    
2. Sub-sectors in which states should invest more or to which they should pay greater attention:
    * Improving road, rail and air transportation services will not only give people easier access to every nook and corner of India but also aid the transportation of goods and materials required for construction. So all categories of states should improve their transportation services.
    * C1 States:
        * All 4 states in the C1 category are top tourist destinations, so they should invest more in:
            1. Trade, repair, hotels and restaurants.
            2. Transport, storage, communication & services for easier access to tourist destinations.
            3. Being high GDP per capita states, they should also focus on improving Financial Services and invest more in Public Administration.
    * C2 States:
        * C2 comprises some of the powerhouse states of India, which contribute immensely to India's overall GDP:
            1. States like Karnataka, Maharashtra, Tamil Nadu and Telangana are manufacturing hubs of India and thus should invest even more in the Manufacturing sector.
            2. The same can be said for the Real Estate industry.
            3. Construction, Transportation and other services should also be looked at to improve the overall GDP.
    * C3 States:
        * C3 states should focus on:
            1. The Manufacturing sector, by providing industries with easier access to land.
            2. Mining and quarrying should also be considered, as these states have large deposits of natural resources.
            3. States in the C3 category like Odisha, Mizoram, Nagaland, Tripura and Meghalaya are big tourist destinations, so they should invest more in Trade, repair, hotels and restaurants and in Transport, storage, communication & services.
    * C4 States:
        * C4 states should focus on:
            1. States like UP and Bihar should focus on investing in Public administration and in Transport, storage, communication & services, as they have a very large population and some people still live in remote places with no direct access to the major cities.
            2. Invest more in Trade, repair, hotels and restaurants to boost tourism.
            3. Construction is also an area where more investment is needed, along with Transport, storage, communication & services.

# Part-II: GDP and Education Dropout Rates

In [None]:
# Reading the data and selecting the data for the year 2014-15 and the education levels Primary, Upper Primary and Secondary

data2 = pd.read_csv('/kaggle/input/droupout rate.csv')
data2 = data2[['Level of Education - State','Primary - 2014-2015.1','Upper Primary - 2014-2015','Secondary - 2014-2015']]
data2

In [None]:
# Dropping rows we do not need, such as Union Territories and states for which GDP per capita is not available (e.g. West Bengal)

data2 =  data2.drop([0,5,7,8,9,14,18,26,35,36])
data2 = data2.reset_index(drop = True)
data2=data2.rename(columns={'Level of Education - State': 'State'})

In [None]:
# Necessary transformation like resetting the index and renaming the column name for merging with another dataframe

states_gdp_per_capita = gdp_per_capita.reset_index()
states_gdp_per_capita=states_gdp_per_capita.rename(columns={'index':'State'})

In [None]:
# Merging the above dataframe with the GDP per-capita dataframe

data2_final = pd.merge(data2,states_gdp_per_capita,how='left',on=['State'])
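
Because the merge matches on state names, any spelling difference between the two sources silently produces missing values. The sketch below shows one way to surface such mismatches using the `indicator` option of `pd.merge`; the frames and values are hypothetical.

```python
import pandas as pd

# Hypothetical frames: one state name is spelled differently in the two sources
dropout = pd.DataFrame({"State": ["Chhatisgarh", "Uttarakhand"], "Secondary": [21.0, 10.0]})
gdp = pd.DataFrame({"State": ["Chhatisgarh", "Uttrakhand"], "GDP per capita": [80000.0, 150000.0]})

merged = pd.merge(dropout, gdp, how="left", on="State", indicator=True)

# Rows present only in the left frame did not find a match
unmatched = merged.loc[merged["_merge"] == "left_only", "State"].tolist()
print(unmatched)  # ['Uttarakhand']
```

This is the reason some state names were renamed earlier to match the spellings used in the dropout dataset.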

In [None]:
data2_final = data2_final.rename(columns={'State':'Level of education - State'})

In [None]:
# Final dataframe having the education level dropout rates for all the states and the GDP per capita

data2_final

In [None]:
data2_final.describe()

#### Observation:
1. The mean dropout rates for Primary and Upper Primary are comparable, at approximately 5% and 4.5%, whereas the mean dropout rate for Secondary is much higher, at 17.8%.
2. The minimum dropout rate for Secondary is also high, at 6%.
3. This means students are more likely to complete their Primary and Upper Primary education than their Secondary education.

In [None]:
# Primary - 2014-2015.1

plt.figure(figsize=(8,6), dpi= 600)

sns.regplot(y=data2_final['GDP per capita'],x=data2_final['Primary - 2014-2015.1'])
plt.xlabel('Primary Drop out rate')
plt.ylabel('Per capita GDP')
plt.title('Per capita GDP vs Primary Drop out rate')
plt.show()

#### We can observe an almost linear relationship between GDP per capita and the Primary dropout rate for the year 2014-15.

In [None]:
# Upper Primary - 2014-2015

plt.figure(figsize=(8,6), dpi= 600)

sns.regplot(y=data2_final['GDP per capita'],x=data2_final['Upper Primary - 2014-2015'])
plt.xlabel('Upper Primary Drop out rate')
plt.ylabel('Per capita GDP')
plt.title('Per capita GDP vs Upper Primary Drop out rate')
plt.show()

#### We can observe a linear relationship between GDP per capita and the Upper Primary dropout rate for the year 2014-15.

In [None]:
# Secondary - 2014-2015

plt.figure(figsize=(8,6), dpi= 100)

sns.regplot(y=data2_final['GDP per capita'],x=data2_final['Secondary - 2014-2015'])
plt.xlabel('Secondary Drop out rate')
plt.ylabel('Per capita GDP')
plt.title('Per capita GDP vs Secondary Drop out rate')
plt.show()

#### We can observe a linear relationship between GDP per capita and the Secondary dropout rate for the year 2014-15, with some outliers.
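
A visual impression of linearity can be backed up with a correlation coefficient. The sketch below computes the Pearson correlation on toy data; the values are hypothetical and are not results from the dropout dataset. On the real data, the same call could be applied to the columns of `data2_final`.

```python
import pandas as pd

# Hypothetical dropout rates (%) and GDP per capita values
df = pd.DataFrame(
    {
        "dropout": [2.0, 4.0, 6.0, 8.0],
        "gdp": [200.0, 150.0, 120.0, 60.0],
    }
)

r = df["dropout"].corr(df["gdp"])  # Pearson correlation by default
print(round(r, 3))
print(round(r ** 2, 3))            # share of variance explained by a straight-line fit
```

A value close to -1 or +1 indicates a strong linear relationship, and the sign shows its direction. With only a few dozen states, a handful of outliers can move the coefficient noticeably.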

## Hypothesis
The plots suggest that the dropout rate at each education level is correlated with GDP per capita. One plausible explanation is that when there are fewer skilled workers, the quality of jobs available to them is lower, and hence they earn less than their graduate counterparts. The states should investigate why the Secondary education dropout rate is high and find a solution to this problem. There are many programmes that focus on Primary and Upper Primary education in India, so students are less likely to drop out at these levels.