# Assignment 4

Before working on this assignment please read these instructions fully. In the submission area, you will notice that you can click the link to **Preview the Grading** for each step of the assignment. This is the criteria that will be used for peer grading. Please familiarize yourself with the criteria before beginning the assignment.

This assignment requires you to find **at least** two datasets on the web which are related, and to visualize these datasets to answer a question with the broad topic of **sports or athletics** (see below) for the region of **Lucknow, Uttar Pradesh, India**, or **India** more broadly.

You can merge these datasets with data from different regions if you like! For instance, you might want to compare **Lucknow, Uttar Pradesh, India** to Ann Arbor, USA. In that case at least one source file must be about **Lucknow, Uttar Pradesh, India**.

You are welcome to choose datasets at your discretion, but keep in mind **they will be shared with your peers**, so choose appropriate datasets. Sensitive, confidential, illicit, and proprietary materials are not good choices for datasets for this assignment. You are welcome to upload datasets of your own as well, and link to them using a third-party repository such as GitHub, Bitbucket, or Pastebin. Please be aware of the Coursera terms of service with respect to intellectual property.

Also, you are welcome to preserve data in its original language, but for the purposes of grading you should provide English translations. You are welcome to provide multiple visuals in different languages if you would like!

As this assignment is for the whole course, you must incorporate principles discussed in the first week, such as having as high a data-ink ratio as possible (Tufte) and aligning with Cairo’s principles of truth, beauty, function, and insight.
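One way to raise the data-ink ratio in matplotlib is to erase ink that carries no data, such as the top and right axis spines and the tick marks. The sketch below is only an illustration, not course-provided code; the `declutter` helper and the bar values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def declutter(ax):
    """Erase non-data ink: hide the top/right spines and tick marks, keep tick labels."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.tick_params(length=0)  # tick labels stay, tick marks go
    return ax


# Hypothetical values, for illustration only
fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 5, 2], color="grey")
declutter(ax)
```

The left and bottom spines are kept because they anchor the bars to a zero baseline, which supports Cairo's truthfulness principle.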

Here are the assignment instructions:

 * State the region and the domain category that your data sets are about (e.g., **Lucknow, Uttar Pradesh, India** and **sports or athletics**).
 * You must state a question about the domain category and region that you identified as being interesting.
 * You must provide at least two links to available datasets. These could be links to files such as CSV or Excel files, or links to websites which might have data in tabular form, such as Wikipedia pages.
 * You must upload an image which addresses the research question you stated. In addition to addressing the question, this visual should follow Cairo's principles of truthfulness, functionality, beauty, and insightfulness.
 * You must contribute a short (1-2 paragraph) written justification of how your visualization addresses your stated research question.

What do we mean by **sports or athletics**? For this category we are interested in sporting events or athletics broadly. Please feel free to interpret the category creatively when building your research question!

## Tips
* Wikipedia is an excellent source of data, and I strongly encourage you to explore it for new data sources.
* Many governments run open data initiatives at the city, region, and country levels, and these are wonderful resources for localized data sources.
* Several international agencies and initiatives, such as the [United Nations](http://data.un.org/), the [World Bank](http://data.worldbank.org/), and the [Global Open Data Index](http://index.okfn.org/place/), are other great places to look for data.
* This assignment requires you to convert and clean datafiles. Check out the discussion forums for tips on how to do this from various sources, and share your successes with your fellow students!
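As one example of cleaning, the Wikipedia-derived table used in this notebook has headers such as `Population (1951 Census)[11]`, where `[11]` is a leftover footnote marker. A minimal sketch, using a hypothetical helper `census_year`, turns those headers into plain integer years with a regular expression and `DataFrame.rename`:

```python
import re

import pandas as pd


def census_year(col):
    """Extract the 4-digit year from a header like 'Population (1951 Census)[11]'.

    Headers without that pattern are returned unchanged.
    """
    match = re.search(r"\((\d{4}) Census\)", col)
    return int(match.group(1)) if match else col


# Header strings in the style of the Wikipedia-derived table
cols = ["Population (1951 Census)[11]", "Population (2011 Census)[11]"]
df = pd.DataFrame([[60274800, 199581477]], columns=cols, index=["Uttar Pradesh"])

# rename() accepts a function and applies it to every column label
df = df.rename(columns=census_year)
print(df.columns.tolist())  # [1951, 2011]
```

Integer year labels make later steps, such as plotting against time or computing growth between censuses, simpler than working with the raw header strings.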

## Example
Looking for an example? Here's what our course assistant put together for the **Ann Arbor, MI, USA** area using **sports and athletics** as the topic. [Example Solution File](./readonly/Assignment4_example.pdf)

## Location: India
## Domain Category: India's Population and Growth Rate

<ul> 
    <li>I will show how the population of each Indian state changed from 1951 to 2011 and draw basic insights, such as which states are the most populous and which have the highest growth rates.</li>
    <li>I will also compare India's population growth rate with that of other countries, using a second dataset, to identify broader trends.</li>
</ul>


<p>Population of Indian states by census year: <a href='https://en.wikipedia.org/wiki/List_of_states_in_India_by_past_population'>Wikipedia, List of states in India by past population</a><br>
</p>

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as animation
```

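```python
# Load the state populations exported from Wikipedia, drop the Rank column and
# incomplete rows, index by state, and remove the all-India total row
pop = pd.read_excel('population.xlsx')
pop = pop.drop(['Rank'], axis=1).dropna()
pop = pop.set_index('State or union territory').drop('India')
pop
```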

| State or union territory | 1951 Census | 1961 Census | 1971 Census | 1981 Census | 1991 Census | 2001 Census | 2011 Census |
|---|---|---|---|---|---|---|---|
| Uttar Pradesh | 60274800 | 70144160 | 83849775 | 105113300 | 132062800 | 166053600 | 199581477 |
| Maharashtra | 32002500 | 39554900 | 50412240 | 62782820 | 78937190 | 96752500 | 112372972 |
| Bihar | 29085900 | 34841490 | 42126800 | 52303000 | 64531200 | 82879910 | 103804630 |
| West Bengal | 26300670 | 34926000 | 44312017 | 54580650 | 68077970 | 80221300 | 91347736 |
| Madhya Pradesh | 18615700 | 23218950 | 30017180 | 38169500 | 48566800 | 60385090 | 72597565 |
| Tamil Nadu | 30119680 | 33687100 | 41199170 | 48408080 | 55859300 | 62111390 | 72138958 |
| Rajasthan | 15971130 | 20156540 | 25765810 | 34361860 | 44005990 | 56473300 | 68621012 |
| Karnataka | 19402500 | 23587910 | 29299015 | 37135710 | 44977200 | 52734986 | 61130704 |
| Gujarat | 16263700 | 20633305 | 26697488 | 34085800 | 41309580 | 50597200 | 60383628 |
| Andhra Pradesh | 31115000 | 35983480 | 43502710 | 53551030 | 66508170 | 75728400 | 49665533 |
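The plan above mentions finding the states with the highest growth rate. The notebook does not say which growth measure it has in mind, so the following is a minimal sketch assuming the compound annual growth rate between the first and last census, `(P2011 / P1951) ** (1 / 60) - 1`, computed for the ten rows in the table above:

```python
import pandas as pd

# 1951 and 2011 census figures for the ten states shown in the table above
pop = pd.DataFrame(
    {1951: [60274800, 32002500, 29085900, 26300670, 18615700,
            30119680, 15971130, 19402500, 16263700, 31115000],
     2011: [199581477, 112372972, 103804630, 91347736, 72597565,
            72138958, 68621012, 61130704, 60383628, 49665533]},
    index=["Uttar Pradesh", "Maharashtra", "Bihar", "West Bengal",
           "Madhya Pradesh", "Tamil Nadu", "Rajasthan", "Karnataka",
           "Gujarat", "Andhra Pradesh"])

years = 2011 - 1951
# Compound annual growth rate, in percent, over the 60 years between the censuses
pop["cagr_pct"] = ((pop[2011] / pop[1951]) ** (1 / years) - 1) * 100
print(pop["cagr_pct"].sort_values(ascending=False).round(2))
```

Among these ten rows, Rajasthan has the highest rate (about 2.5% per year) and Andhra Pradesh the lowest. Andhra Pradesh's 2011 figure is lower than its 2001 figure, which most likely means the source reports it within its post-2014 boundaries (after Telangana was separated) rather than an actual decline, so its rate should be read with caution.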


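```python
%matplotlib notebook
fig, ax = plt.subplots(figsize=(10, 8))

# Census year for each column of pop (1951 Census ... 2011 Census)
years = [1951, 1961, 1971, 1981, 1991, 2001, 2011]
index = np.arange(len(pop))
# Fix the y-axis at the overall maximum so bar heights are comparable across frames
ymax = pop.values.max() * 1.05

def update(curr):
    # Redraw the bar chart for census column `curr`
    ax.cla()
    ax.bar(index, pop.iloc[:, curr].values)
    ax.set_ylim(0, ymax)
    ax.set_xlabel('States', fontsize=10)
    ax.set_ylabel('Population', fontsize=10)
    ax.set_xticks(index)
    ax.set_xticklabels(pop.index, fontsize=10, rotation=90)
    ax.set_title('Population of Indian states, {} Census'.format(years[curr]))
    fig.tight_layout()

# One frame per census column; stop after the last frame instead of looping
a = animation.FuncAnimation(fig, update, frames=pop.shape[1], interval=500, repeat=False)
```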





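<ul>
    <li>The animated bar chart shows that Uttar Pradesh and Maharashtra are the two most populous states in every census from 1951 to 2011. Bihar is third only in 2001 and 2011; from 1951 to 1991 it ranked fifth, behind Andhra Pradesh and either Tamil Nadu (1951) or West Bengal (1961 to 1991).</li>
    
</ul>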
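The ranking claim above can be checked directly from the table. A minimal sketch, assuming the ten rows printed earlier are the relevant ones (the output lists only these ten states), ranks the states within each census with pandas and collects the top three per year:

```python
import pandas as pd

years = [1951, 1961, 1971, 1981, 1991, 2001, 2011]
# Census figures for the ten states in the table above
pop = pd.DataFrame(
    [[60274800, 70144160, 83849775, 105113300, 132062800, 166053600, 199581477],
     [32002500, 39554900, 50412240, 62782820, 78937190, 96752500, 112372972],
     [29085900, 34841490, 42126800, 52303000, 64531200, 82879910, 103804630],
     [26300670, 34926000, 44312017, 54580650, 68077970, 80221300, 91347736],
     [18615700, 23218950, 30017180, 38169500, 48566800, 60385090, 72597565],
     [30119680, 33687100, 41199170, 48408080, 55859300, 62111390, 72138958],
     [15971130, 20156540, 25765810, 34361860, 44005990, 56473300, 68621012],
     [19402500, 23587910, 29299015, 37135710, 44977200, 52734986, 61130704],
     [16263700, 20633305, 26697488, 34085800, 41309580, 50597200, 60383628],
     [31115000, 35983480, 43502710, 53551030, 66508170, 75728400, 49665533]],
    index=["Uttar Pradesh", "Maharashtra", "Bihar", "West Bengal",
           "Madhya Pradesh", "Tamil Nadu", "Rajasthan", "Karnataka",
           "Gujarat", "Andhra Pradesh"],
    columns=years)

# Rank states within each census column (1 = most populous)
ranks = pop.rank(ascending=False).astype(int)
# Top three states per census, in descending order of population
top3 = {year: pop[year].nlargest(3).index.tolist() for year in years}

for year, states in top3.items():
    print(year, states)
print("Bihar's rank by census:", ranks.loc["Bihar"].tolist())
```

Printing the ranks alongside the chart is a cheap way to confirm that a visual impression from an animation matches the underlying numbers.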

## Now we will compare India's population growth rate with that of the five other most populous countries.


```python
gr = pd.read_excel('growth rate.xlsx').set_index('Country')
gr.head(10)
```

| Country | 1985 | 1990 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Norway | 0.3 | 0.4 | 0.6 | 0.6 | 1.1 |
| Australia | 1.4 | 1.6 | 1.2 | 1.3 | 1.8 |
| Switzerland | 0.5 | 0.7 | 0.4 | 0.7 | 1.1 |
| Netherlands | 0.5 | 0.6 | 0.6 | 0.6 | 0.4 |
| United States | 1.0 | 1.0 | 1.2 | 0.9 | 0.9 |
| Germany | -0.1 | 0.4 | 0.1 | 0.1 | -0.2 |
| New Zealand | 0.8 | 0.8 | 1.0 | 1.4 | 1.1 |
| Canada | 1.1 | 1.4 | 0.9 | 1.0 | 1.1 |
| Singapore | 2.3 | 2.2 | 2.4 | 2.7 | 2.4 |
| Denmark | 0.0 | 0.1 | 0.4 | 0.3 | 0.5 |


```python
c_name = ['China', 'India', 'United States', 'Indonesia', 'Brazil', 'Pakistan']
```

```python
gr_6 = gr.loc[c_name]
gr_6
```

| Country | 1985 | 1990 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| China | 1.5 | 1.9 | 0.7 | 0.6 | 0.6 |
| India | 2.2 | 2.1 | 1.7 | 1.6 | 1.3 |
| United States | 1.0 | 1.0 | 1.2 | 0.9 | 0.9 |
| Indonesia | 2.2 | 1.9 | 1.5 | 1.4 | 1.4 |
| Brazil | 2.2 | 1.9 | 1.5 | 1.3 | 1.0 |
| Pakistan | 3.4 | 3.2 | 2.5 | 1.9 | 1.8 |


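```python
fig, ax = plt.subplots(figsize=(8, 6))
# x positions must match the columns of gr_6 (the data has no 1995 column)
x = [1985, 1990, 2000, 2005, 2010]
for country in c_name:
    # Markers make the uneven spacing between data points visible
    ax.plot(x, gr_6.loc[country].values, marker='o', label=country)
ax.set_xlabel('Year')
ax.set_ylabel('Population growth rate (%)')
ax.set_title('Population growth rate of India and five other populous countries, 1985 to 2010')
ax.legend()
# Save before showing so the saved image is not blank on non-interactive backends
fig.savefig('growthrate.png')
plt.show()
```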


<ul>
    <li>The general trend is a declining growth rate in every country: each of the six has a lower rate in 2010 than in 1985.</li>
    <li>Many people believe the world's population keeps rising because of more births and a rising growth rate, but the graph shows growth rates falling. They are still positive in all six countries, so these populations are still increasing, only more slowly.</li>
    <li>One commonly cited reason populations keep growing is that medical advances and better basic living conditions have raised life expectancy in most countries around the world.</li>
    <li>One important trend to note is the largest drop between consecutive data points: China's growth rate (blue line) fell from 1.9% in 1990 to 0.7% in 2000, a decline often attributed to China's one-child policy.</li>
</ul>
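The two quantitative claims above (every country declines overall, and China's 1990 to 2000 drop is the largest) can be checked from the `gr_6` table. A minimal sketch, with the printed table re-entered by hand so the block runs on its own:

```python
import pandas as pd

# Growth-rate table (gr_6) as printed above: rows are countries, columns are years
gr_6 = pd.DataFrame(
    [[1.5, 1.9, 0.7, 0.6, 0.6],
     [2.2, 2.1, 1.7, 1.6, 1.3],
     [1.0, 1.0, 1.2, 0.9, 0.9],
     [2.2, 1.9, 1.5, 1.4, 1.4],
     [2.2, 1.9, 1.5, 1.3, 1.0],
     [3.4, 3.2, 2.5, 1.9, 1.8]],
    index=["China", "India", "United States", "Indonesia", "Brazil", "Pakistan"],
    columns=[1985, 1990, 2000, 2005, 2010])

# Change in growth rate between consecutive data points (negative = decline);
# the first column is all NaN after diff, so drop it
change = gr_6.diff(axis=1).iloc[:, 1:]

# Locate the single largest decline across all countries and intervals
country, end_year = change.stack().idxmin()
start_year = gr_6.columns[gr_6.columns.get_loc(end_year) - 1]
print(country, start_year, end_year, round(change.loc[country, end_year], 2))

# Overall change from the first to the last data point, per country
overall = (gr_6[2010] - gr_6[1985]).round(2)
print(overall)
```

The data points are unevenly spaced (there is no 1995 column), so China's 1.2-point drop spans ten years. Per year it is 0.12 points, the same pace as Pakistan's 0.6-point drop from 2000 to 2005, which is worth keeping in mind when reading slopes on the line chart.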