# GDP Growth Analysis

Before working on this assignment, please read these instructions fully. In the submission area, you will notice that you can click the link to **Preview the Grading** for each step of the assignment. These are the criteria that will be used for peer grading. Please familiarize yourself with the criteria before beginning the assignment.

This assignment requires that you find **at least** two related datasets on the web, and that you visualize these datasets to answer a question within the broad topic of **religious events or traditions** (see below) for the region of **Ahmedabad, Gujarat, India**, or **India** more broadly.

You can merge these datasets with data from different regions if you like! For instance, you might want to compare **Ahmedabad, Gujarat, India** to Ann Arbor, USA. In that case at least one source file must be about **Ahmedabad, Gujarat, India**.

You are welcome to choose datasets at your discretion, but keep in mind **they will be shared with your peers**, so choose appropriate datasets. Sensitive, confidential, illicit, and proprietary materials are not good choices for datasets for this assignment. You are welcome to upload datasets of your own as well, and link to them using a third-party repository such as GitHub, Bitbucket, Pastebin, etc. Please be aware of the Coursera terms of service with respect to intellectual property.

Also, you are welcome to preserve data in its original language, but for the purposes of grading you should provide English translations. You are welcome to provide multiple visuals in different languages if you would like!

As this assignment is for the whole course, you must incorporate principles discussed in the first week, such as having a high data-ink ratio (Tufte) and aligning with Cairo’s principles of truth, beauty, function, and insight.

Here are the assignment instructions:

 * State the region and the domain category that your data sets are about (e.g., **Ahmedabad, Gujarat, India** and **religious events or traditions**).
 * You must state a question about the domain category and region that you identified as being interesting.
 * You must provide at least two links to available datasets. These could be links to files such as CSV or Excel files, or links to websites which might have data in tabular form, such as Wikipedia pages.
 * You must upload an image which addresses the research question you stated. In addition to addressing the question, this visual should follow Cairo's principles of truthfulness, functionality, beauty, and insightfulness.
 * You must contribute a short (1-2 paragraph) written justification of how your visualization addresses your stated research question.

What do we mean by **religious events or traditions**?  For this category you might consider calen

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
%matplotlib notebook

In [17]:
df = pd.read_csv('ab40c054-5031-4376-b52e-9813e776f65e.csv')

In [18]:
df.head()

Unnamed: 0,Items Description,Duration,Andhra Pradesh,Arunachal Pradesh,Assam,Bihar,Chhattisgarh,Goa,Gujarat,Haryana,...,Telangana,Tripura,Uttar Pradesh,Uttarakhand,West Bengal1,Andaman & Nicobar Islands,Chandigarh,Delhi,Puducherry,All_India GDP
0,GSDP - CURRENT PRICES (` in Crore),2011-12,379402.0,11063.0,143175.0,247144.0,158074.0,42367.0,615606.0,297539.0,...,359433.0,19208.0,724049.0,115523.0,,3979.0,18768.0,343767.0,16818.0,8736039.0
1,GSDP - CURRENT PRICES (` in Crore),2012-13,411404.0,12547.0,156864.0,282368.0,177511.0,38120.0,724495.0,347032.0,...,401493.0,21663.0,822903.0,131835.0,,4421.0,21609.0,391238.0,18875.0,9946636.0
2,GSDP - CURRENT PRICES (` in Crore),2013-14,464272.0,14602.0,177745.0,317101.0,206690.0,35921.0,807623.0,400662.0,...,452186.0,25593.0,944146.0,149817.0,,5159.0,24787.0,443783.0,21870.0,11236635.0
3,GSDP - CURRENT PRICES (` in Crore),2014-15,526468.0,16761.0,198098.0,373920.0,234982.0,40633.0,895027.0,437462.0,...,511178.0,29667.0,1043371.0,161985.0,,5721.0,27844.0,492424.0,24089.0,12433749.0
4,GSDP - CURRENT PRICES (` in Crore),2015-16,609934.0,18784.0,224234.0,413503.0,260776.0,45002.0,994316.0,485184.0,...,575631.0,,1153795.0,184091.0,,,30304.0,551963.0,26533.0,13675331.0


In [19]:
df.shape

(11, 36)

In [21]:
df = df[df.Duration != '2016-17']

df = df.set_index('Items  Description')
df

Items Description,Duration,Andhra Pradesh,Arunachal Pradesh,Assam,Bihar,Chhattisgarh,Goa,Gujarat,Haryana,Himachal Pradesh,...,Telangana,Tripura,Uttar Pradesh,Uttarakhand,West Bengal1,Andaman & Nicobar Islands,Chandigarh,Delhi,Puducherry,All_India GDP
GSDP - CURRENT PRICES (` in Crore),2011-12,379402.0,11063.0,143175.0,247144.0,158074.0,42367.0,615606.0,297539.0,72720.0,...,359433.0,19208.0,724049.0,115523.0,,3979.0,18768.0,343767.0,16818.0,8736039.0
GSDP - CURRENT PRICES (` in Crore),2012-13,411404.0,12547.0,156864.0,282368.0,177511.0,38120.0,724495.0,347032.0,82820.0,...,401493.0,21663.0,822903.0,131835.0,,4421.0,21609.0,391238.0,18875.0,9946636.0
GSDP - CURRENT PRICES (` in Crore),2013-14,464272.0,14602.0,177745.0,317101.0,206690.0,35921.0,807623.0,400662.0,94764.0,...,452186.0,25593.0,944146.0,149817.0,,5159.0,24787.0,443783.0,21870.0,11236635.0
GSDP - CURRENT PRICES (` in Crore),2014-15,526468.0,16761.0,198098.0,373920.0,234982.0,40633.0,895027.0,437462.0,104369.0,...,511178.0,29667.0,1043371.0,161985.0,,5721.0,27844.0,492424.0,24089.0,12433749.0
GSDP - CURRENT PRICES (` in Crore),2015-16,609934.0,18784.0,224234.0,413503.0,260776.0,45002.0,994316.0,485184.0,,...,575631.0,,1153795.0,184091.0,,,30304.0,551963.0,26533.0,13675331.0
(% Growth over previous year),2012-13,8.43,13.41,9.56,14.25,12.3,-10.02,17.69,16.63,13.89,...,11.7,12.78,13.65,14.12,,11.13,15.14,13.81,12.23,13.86
(% Growth over previous year),2013-14,12.85,16.38,13.31,12.3,16.44,-5.77,11.47,15.45,14.42,...,12.63,18.14,14.73,13.64,,16.68,14.71,13.43,15.87,12.97
(% Growth over previous year),2014-15,13.4,14.79,11.45,17.92,13.69,13.12,10.82,9.18,10.14,...,13.05,15.92,10.51,8.12,,10.89,12.33,10.96,10.14,10.65
(% Growth over previous year),2015-16,15.85,12.07,13.19,10.59,10.98,10.75,11.09,10.91,,...,12.61,,10.58,13.65,,,8.84,12.09,10.15,9.99
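
The `(% Growth over previous year)` rows can be cross-checked against the GSDP levels. The following is a minimal sketch, assuming growth is defined as the simple year-over-year percentage change. The `andhra_gsdp` series is a hypothetical helper rebuilt by hand from the Andhra Pradesh values in the table above, not read from the CSV.

```python
import pandas as pd

# Andhra Pradesh GSDP at current prices, copied from the table above.
andhra_gsdp = pd.Series(
    [379402.0, 411404.0, 464272.0, 526468.0, 609934.0],
    index=["2011-12", "2012-13", "2013-14", "2014-15", "2015-16"],
)

# Year-over-year percentage change; the first year has no predecessor, so it is dropped.
andhra_growth = (andhra_gsdp.pct_change() * 100).dropna().round(2)
print(andhra_growth)
```

The computed values agree with the 8.43, 12.85, 13.4 and 15.85 reported for Andhra Pradesh in the growth rows.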


In [24]:
# Split the rows into two parts: GSDP values and % growth values
df_gsdpcurrent = df.filter(like='GSDP', axis = 0)
df_gsdpgrowth = df.filter(like='Growth', axis = 0)
# Transpose so that states become rows and years become columns
df_gsdpcurrent = df_gsdpcurrent.set_index('Duration').T
df_gsdpgrowth = df_gsdpgrowth.set_index('Duration').T
df_gsdpcurrent.index.name = 'States'
df_gsdpgrowth.index.name = 'States'

df_gsdpcurrent = df_gsdpcurrent.add_prefix('GSDP_')
df_gsdpgrowth = df_gsdpgrowth.add_prefix('Percentage Growth')

df_gsdpcurrent.head(5)

States,GSDP_2011-12,GSDP_2012-13,GSDP_2013-14,GSDP_2014-15,GSDP_2015-16
Andhra Pradesh,379402.0,411404.0,464272.0,526468.0,609934.0
Arunachal Pradesh,11063.0,12547.0,14602.0,16761.0,18784.0
Assam,143175.0,156864.0,177745.0,198098.0,224234.0
Bihar,247144.0,282368.0,317101.0,373920.0,413503.0
Chhattisgarh,158074.0,177511.0,206690.0,234982.0,260776.0


In [26]:

df_gsdpgrowth.head()

States,Percentage Growth2012-13,Percentage Growth2013-14,Percentage Growth2014-15,Percentage Growth2015-16
Andhra Pradesh,8.43,12.85,13.4,15.85
Arunachal Pradesh,13.41,16.38,14.79,12.07
Assam,9.56,13.31,11.45,13.19
Bihar,14.25,12.3,17.92,10.59
Chhattisgarh,12.3,16.44,13.69,10.98
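
The split-and-transpose step can be reproduced on a small frame. This is a sketch only: `toy` is a hypothetical three-row stand-in for the notebook's `df`, filled with values from the tables above, and it assumes the row labels contain the substrings `GSDP` and `Growth` as they do in the source data.

```python
import pandas as pd

# Toy frame in the same layout as the notebook's `df` after set_index.
toy = pd.DataFrame(
    {
        "Duration": ["2011-12", "2012-13", "2012-13"],
        "Andhra Pradesh": [379402.0, 411404.0, 8.43],
        "Assam": [143175.0, 156864.0, 9.56],
    },
    index=pd.Index(
        [
            "GSDP - CURRENT PRICES (` in Crore)",
            "GSDP - CURRENT PRICES (` in Crore)",
            "(% Growth over previous year)",
        ],
        name="Items Description",
    ),
)

# filter(like=..., axis=0) keeps the rows whose index label contains the substring.
levels = toy.filter(like="GSDP", axis=0).set_index("Duration").T
growth = toy.filter(like="Growth", axis=0).set_index("Duration").T
levels.index.name = "States"
growth.index.name = "States"

print(levels)
print(growth)
```

After the transpose, each state is one row and each year is one column, which is the shape the later sorting and plotting cells rely on.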


In [27]:
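# Row-wise mean of the yearly growth columns; mean() skips NaN by default,
# so states with a missing year are averaged over the years that are present
df_gsdpgrowth['Average Growth Percentage'] = df_gsdpgrowth.mean(axis=1)
df_gsdpgrowth = df_gsdpgrowth.sort_values(by='Average Growth Percentage', ascending=False)
df_gsdpgrowth = df_gsdpgrowth.round({'Average Growth Percentage': 2})
df_gsdpgrowth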

States,Percentage Growth2012-13,Percentage Growth2013-14,Percentage Growth2014-15,Percentage Growth2015-16,Average Growth Percentage
Mizoram,15.2,23.1,12.3,,16.87
Nagaland,15.03,21.98,10.85,,15.95
Tripura,12.78,18.14,15.92,,15.61
Madhya Pradesh,20.71,14.91,10.11,12.86,14.65
Karnataka,14.56,18.24,12.7,11.42,14.23
Arunachal Pradesh,13.41,16.38,14.79,12.07,14.16
Bihar,14.25,12.3,17.92,10.59,13.76
Chhattisgarh,12.3,16.44,13.69,10.98,13.35
Haryana,16.63,15.45,9.18,10.91,13.04
Andaman & Nicobar Islands,11.13,16.68,10.89,,12.9
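
Several of the top-ranked states have no 2015-16 figure, so their averages cover three years rather than four. A minimal sketch of how this could be made visible, using the Mizoram and Karnataka rows from the table above; the `summary` frame and its column names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

growth = pd.DataFrame(
    {
        "2012-13": [15.2, 14.56],
        "2013-14": [23.1, 18.24],
        "2014-15": [12.3, 12.7],
        "2015-16": [np.nan, 11.42],
    },
    index=pd.Index(["Mizoram", "Karnataka"], name="States"),
)

# mean(axis=1) ignores NaN, so count(axis=1) records how many years each average covers.
summary = pd.DataFrame(
    {
        "avg_growth": growth.mean(axis=1).round(2),
        "years_available": growth.count(axis=1),
    }
)
print(summary)
```

Reporting the number of years next to the average helps the reader judge whether two states' averages are directly comparable.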


In [29]:
df_gsdpgrowth_avg=df_gsdpgrowth.filter(like='Average', axis=1)

df_gsdpgrowth_avg

States,Average Growth Percentage
Mizoram,16.87
Nagaland,15.95
Tripura,15.61
Madhya Pradesh,14.65
Karnataka,14.23
Arunachal Pradesh,14.16
Bihar,13.76
Chhattisgarh,13.35
Haryana,13.04
Andaman & Nicobar Islands,12.9


In [30]:
import seaborn as sns

In [31]:
plt.figure(figsize=(15,10))
plot_gsdp_meangrowth = sns.barplot(x=df_gsdpgrowth['Average Growth Percentage'], y=df_gsdpgrowth.index, data=df_gsdpgrowth)
plt.xlabel("Average Growth Percentage")
plt.ylabel("States")
plt.title("Average GSDP Growth Rate by State, 2012-13 to 2015-16")
plt.show()

[Figure: horizontal bar chart of average growth percentage (x-axis) by state (y-axis)]
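
The assignment asks for a high data-ink ratio. One possible way to move the growth chart in that direction is sketched below with plain matplotlib. It assumes a headless `Agg` backend so it also runs outside a notebook (omit that line when using `%matplotlib notebook`), and it uses only the five top-ranked averages shown above. The styling choices are illustrative, not the author's.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed inside the notebook
import matplotlib.pyplot as plt
import pandas as pd

avg_growth = pd.Series(
    [16.87, 15.95, 15.61, 14.65, 14.23],
    index=["Mizoram", "Nagaland", "Tripura", "Madhya Pradesh", "Karnataka"],
    name="Average Growth Percentage",
)

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(avg_growth.index, avg_growth.values, color="0.6")
ax.invert_yaxis()  # keep the largest value at the top

# Label each bar directly, then remove the ink that the labels make redundant.
for bar, value in zip(bars, avg_growth.values):
    ax.text(value + 0.1, bar.get_y() + bar.get_height() / 2,
            f"{value:.2f}%", va="center")
ax.xaxis.set_visible(False)
for side in ("top", "right", "bottom"):
    ax.spines[side].set_visible(False)
ax.set_title("Average GSDP growth, 2012-13 to 2015-16")
fig.tight_layout()
```

Direct labels replace the x-axis ticks and gridlines, so the reader gets exact values with less non-data ink.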

In [32]:
df_gsdpgrowth[['Average Growth Percentage']].head()

States,Average Growth Percentage
Mizoram,16.87
Nagaland,15.95
Tripura,15.61
Madhya Pradesh,14.65
Karnataka,14.23


In [33]:
df_gsdpgrowth[['Average Growth Percentage']].tail()

States,Average Growth Percentage
Odisha,10.71
Sikkim,10.49
Meghalaya,7.67
Goa,2.02
West Bengal1,
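
The tail shows a row with no data (`West Bengal1`), and the growth table also still contains the national aggregate `All_India GDP`. A hedged sketch of how both could be excluded before ranking states is given below. The state values are copied from the output above, and the `All_India GDP` average is computed from the four national growth figures in the source table.

```python
import numpy as np
import pandas as pd

avg_growth = pd.Series(
    {
        "Odisha": 10.71,
        "Sikkim": 10.49,
        "Meghalaya": 7.67,
        "Goa": 2.02,
        "West Bengal1": np.nan,
        # National growth rates for 2012-13 to 2015-16, averaged here.
        "All_India GDP": float(np.mean([13.86, 12.97, 10.65, 9.99])),
    },
    name="Average Growth Percentage",
)

# Drop the aggregate row (if present) and any state without data, then rank.
states_only = avg_growth.drop(index="All_India GDP", errors="ignore").dropna()
ranked = states_only.sort_values(ascending=False)
print(ranked)
```

With the aggregate removed, every remaining row is a state or union territory, so the ranking compares like with like.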


In [37]:
#creating a new dataframe with relevant values
df_totalgdp15_16 = df_gsdpcurrent.filter(items=['GSDP_2015-16'], axis=1)

#sorting based on GDP values
df_totalgdp15_16 = df_totalgdp15_16.sort_values(by='GSDP_2015-16', ascending = False)

#dropping rows with null values and all india GDP value from dataframe
df_totalgdp15_16 = df_totalgdp15_16.dropna()
df_totalgdp15_16 = df_totalgdp15_16.drop('All_India GDP', axis=0)
df_totalgdp15_16

States,GSDP_2015-16
Tamil Nadu,1212668.0
Uttar Pradesh,1153795.0
Karnataka,1027068.0
Gujarat,994316.0
Andhra Pradesh,609934.0
Kerala,588337.0
Telangana,575631.0
Delhi,551963.0
Madhya Pradesh,543975.0
Haryana,485184.0


In [38]:
plt.figure(figsize=(15,10))
plot_totalgsdp = sns.barplot(x=df_totalgdp15_16['GSDP_2015-16'], y=df_totalgdp15_16.index, data=df_totalgdp15_16)
plt.xlabel("GSDP at current prices (in crore rupees)")
plt.ylabel("States")
plt.title("GSDP by State, 2015-16 (states with available data)")
plt.show()

[Figure: horizontal bar chart of 2015-16 GSDP (x-axis) by state (y-axis)]