Greetings, everyone! This is my first exploratory data analysis kernel on this dataset, so please pardon any errors and feel free to point them out. The main challenge with this dataset is that **it is so rich in indicators that it is very difficult to figure out which ones are useful** for our aim. However, **big thanks to Mr. Krishna Ravikumar, who in his kernel Choosing Topics To Explore...** made it very easy to choose the right indicators for this task. In the next few sections of this notebook, I follow his approach closely to select the indicators I need for this exploratory analysis.

The notebook as a whole contains:
* Choosing the right indicators
* Data Visualizations of those Indicators
* Conclusion

**Choosing The Right Indicators**

First of all, let's import the required libraries and open a connection to the main SQLite database.


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3
import os
import warnings
warnings.filterwarnings('ignore')

conn = sqlite3.connect('../input/database.sqlite') #SQLite connection

A glance at the dataset shows that the database covers the years 1960 to 2015, with each indicator reporting one value per year. So selecting the indicator names for 1960 alone gives us the indicators that already have a value in the first year, and that is what we do. **We select only the Indicator Name and its Code for the year 1960 from the rows with CountryCode IND, the country code for India,** and store the values as a list.

In [4]:
ind = pd.read_sql(
        """ 
            SELECT   IndicatorName, IndicatorCode 
            FROM     Indicators 
            WHERE    CountryCode = 'IND' AND Year = 1960
        """, con=conn)
		
indicators = ind.drop_duplicates().values
#print(indicators) Uncomment this line to explicitly see all the indicator name and codes

For convenience, we convert all the indicator names to lower case and load them into a new DataFrame, then drop any duplicate rows.

In [5]:
modified_indicators = []

for ele in indicators:
	indicator_name = ele[0].strip().lower()
	modified_indicators.append([indicator_name,ele[1]])
	
Indicators = pd.DataFrame(modified_indicators,columns=['IndicatorName','IndicatorCode'])

Indicators = Indicators.drop_duplicates()
print(Indicators.shape)

As we can see, there are 216 rows for a single year, and it is really difficult to visualize all of them, so we need to pick the right indicators. In this analysis we only need trade- and economy-related indicators, so we create the same dictionary as Mr. Krishna Ravikumar did, slightly changed to suit my needs. For convenience, the whole dictionary is reproduced here so that readers can pick any category they need.

In [6]:
key_word_dict = {}
key_word_dict['Demography'] = ['population','birth','death','fertility','mortality','expectancy']
key_word_dict['Food'] = ['food','grain','nutrition','calories']
key_word_dict['Trade'] = ['trade','import','export','good','agriculture','inventories','price','value','capital']
key_word_dict['Health'] = ['health','disease','hospital','mortality','doctor']
key_word_dict['Economy'] = ['income','gdp','gni','deficit','budget','market','stock','bond','infrastructure']
key_word_dict['Energy'] = ['fuel','energy','power','emission','electric','electricity']
key_word_dict['Education'] = ['education','literacy']
key_word_dict['Employment'] = ['employed','employment','unemployed','unemployment']
key_word_dict['Rural'] = ['rural','village']
key_word_dict['Urban'] = ['urban','city']

Since we only need the **Trade and Economy indicators, we store every indicator whose name contains a word from the 'Trade' or 'Economy' entry of the dictionary in a separate list**. We then convert it into a final DataFrame that will be the main source for our data visualizations.

In [7]:
final_indicator = []

feature = 'Trade'
for indicator_ele in Indicators.values:
    for ele in key_word_dict[feature]:
        word_list = indicator_ele[0].split()
        if ele in word_list or ele+'s' in word_list:
            final_indicator.append([indicator_ele[0],indicator_ele[1]])
            break
			
feature = 'Economy'
for indicator_ele in Indicators.values:
    for ele in key_word_dict[feature]:
        word_list = indicator_ele[0].split()
        if ele in word_list or ele+'s' in word_list:
            final_indicator.append([indicator_ele[0],indicator_ele[1]])
            break						

FinalIndicators = pd.DataFrame(final_indicator,columns=['IndicatorName','IndicatorCode'])	
print(FinalIndicators.shape)
#print(FinalIndicators) Uncomment this to see the 110 indicators 
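
The two loops above do the same thing for two categories, so the matching rule can be pulled into a small helper. The following is only a sketch: `matches_keywords`, `select_indicators` and the sample rows are names and data I made up for illustration, not part of the original kernel.

```python
def matches_keywords(indicator_name, keywords):
    """True if any keyword, or the keyword plus a trailing 's', is a whole word in the name."""
    words = set(indicator_name.lower().split())
    return any(kw in words or kw + 's' in words for kw in keywords)

def select_indicators(rows, keywords):
    """Keep each (name, code) row once if its name matches any of the keywords."""
    return [(name, code) for name, code in rows if matches_keywords(name, keywords)]

# Illustrative rows only; the real list comes from the Indicators DataFrame.
sample_rows = [
    ('exports of goods and services (% of gdp)', 'NE.EXP.GNFS.ZS'),
    ('gdp at market prices (constant 2005 us$)', 'NY.GDP.MKTP.KD'),
    ('population, total', 'SP.POP.TOTL'),
]
sample_keywords = ['trade', 'export', 'gdp']

selected = select_indicators(sample_rows, sample_keywords)
print(selected)
```

Passing the combined Trade and Economy keywords in one call keeps each indicator only once, whereas running the two loops separately can add the same indicator twice. Also note that splitting on whitespace leaves punctuation attached, so a token such as `gdp)` does not count as a match for `gdp`.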

The row count is now 110, down from 216, so we have dropped almost half of the indicators as irrelevant to our work. Even among these 110 rows, the same indicator appears in several variations; for example, the Exports of Goods and Services indicator is given three times in three different units of measurement. One can pick any one unit of measurement and continue to reduce the set further.
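
One possible way to spot such repeated indicators is to group the names by the text before the trailing unit in parentheses. This is a rough sketch under the assumption that the unit always sits in the last pair of parentheses; `base_name` and the sample names are mine, for illustration only.

```python
import re
from collections import defaultdict

def base_name(indicator_name):
    """Strip one trailing parenthesised unit, e.g. ' (% of gdp)'."""
    return re.sub(r'\s*\([^()]*\)\s*$', '', indicator_name)

# Illustrative names only; the real ones come from FinalIndicators.
sample_names = [
    'exports of goods and services (% of gdp)',
    'exports of goods and services (current us$)',
    'exports of goods and services (constant 2005 us$)',
    'trade (% of gdp)',
]

# Group every variant under its unit-free name.
variants = defaultdict(list)
for name in sample_names:
    variants[base_name(name)].append(name)

# Keep the first variant seen for each indicator.
one_per_indicator = [names[0] for names in variants.values()]
print(one_per_indicator)
```

Which variant to keep is still a manual choice; taking the first one is just the simplest rule for the sketch.
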
Now, in order to compare the growth of India with that of China, we need to pick common indicators from China too. We pick the indicators in the same way and then find the indicators common to the two.

In [8]:
chi = pd.read_sql(
        """ 
            SELECT   IndicatorName, IndicatorCode 
            FROM     Indicators 
            WHERE    CountryCode = 'CHN' AND Year = 1960
        """, con=conn)

indicators = chi.drop_duplicates().values

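# Note: modified_indicators still holds India's rows from In [5], so China's rows are
# appended to them here; reset it to [] first if a China-only list is wanted.
for ele in indicators:
	indicator_name = ele[0].strip().lower()
	modified_indicators.append([indicator_name,ele[1]])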
	
Indicators = pd.DataFrame(modified_indicators,columns=['IndicatorName','IndicatorCode'])

Indicators = Indicators.drop_duplicates()
#print(Indicators.shape)

final_indicator = []

feature = 'Trade'
for indicator_ele in Indicators.values:
    for ele in key_word_dict[feature]:
        word_list = indicator_ele[0].split()
        if ele in word_list or ele+'s' in word_list:
            final_indicator.append([indicator_ele[0],indicator_ele[1]])
            break
			
feature = 'Economy'
for indicator_ele in Indicators.values:
    for ele in key_word_dict[feature]:
        word_list = indicator_ele[0].split()
        if ele in word_list or ele+'s' in word_list:
            final_indicator.append([indicator_ele[0],indicator_ele[1]])
            break
			
FinalIndicatorschi = pd.DataFrame(final_indicator,columns=['IndicatorName','IndicatorCode'])

print(FinalIndicatorschi.shape)
#print(FinalIndicatorschi.values)

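The indicator count for China comes to 111, almost the same as for India. We still compare the two and make a list of common indicators to work on. The common count comes to 110, that is, all of India's indicators are present in the China list too. (Because `modified_indicators` is not reset before the China loop, that list also carries India's rows, which guarantees this full overlap.)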

In [9]:
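# Compare whole (name, code) rows; `keys in FinalIndicatorschi.values` would be
# True as soon as the name or the code alone matched in some row.
chi_pairs = set(map(tuple, FinalIndicatorschi.values))
commonindicators = [list(keys) for keys in FinalIndicators.values if tuple(keys) in chi_pairs]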
		
commonindicatorsdf = pd.DataFrame(commonindicators,columns=['IndicatorName','IndicatorCode'])

print(commonindicatorsdf.shape)
#print(commonindicatorsdf.values) Uncomment this line to view the common Indicators of India and China over trade and economy
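
The same intersection can also be expressed as an inner join in pandas. Here is a minimal sketch with two tiny made-up frames (`india_df` and `china_df` are stand-ins I invented, not the real `FinalIndicators` and `FinalIndicatorschi`):

```python
import pandas as pd

# Stand-in frames with the same two columns as the real ones.
india_df = pd.DataFrame(
    [['trade (% of gdp)', 'NE.TRD.GNFS.ZS'],
     ['gdp (constant lcu)', 'NY.GDP.MKTP.KN']],
    columns=['IndicatorName', 'IndicatorCode'])
china_df = pd.DataFrame(
    [['trade (% of gdp)', 'NE.TRD.GNFS.ZS'],
     ['gross domestic savings (% of gdp)', 'NY.GDS.TOTL.ZS']],
    columns=['IndicatorName', 'IndicatorCode'])

# An inner merge on both columns keeps only rows present in both frames.
common_df = pd.merge(
    india_df.drop_duplicates(),
    china_df.drop_duplicates(),
    on=['IndicatorName', 'IndicatorCode'],
    how='inner',
)
print(common_df)
```

Dropping duplicates first matters here, because repeated rows on both sides of a merge would otherwise multiply in the result.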

**Data Visualizations of the Chosen Indicators**

Finally, from the common indicators we manually pick the codes of those we consider most relevant, omitting indicators that are repeated in a different unit of measurement. The result is a final list of indicator codes.

In [10]:
indicatorcodesofwork = ["'NV.AGR.TOTL.ZS'","'NE.GDI.STKB.CD'","'NE.EXP.GNFS.ZS'","'NE.RSB.GNFS.ZS'","'NY.GDP.MKTP.KD'","'NE.GDI.TOTL.ZS'","'NE.GDI.FTOT.ZS'","'NE.IMP.GNFS.ZS'","'NV.IND.TOTL.ZS'","'NV.IND.MANF.ZS'","'TG.VAL.TOTL.GD.ZS'","'NV.SRV.TETC.ZS'","'NE.TRD.GNFS.ZS'","'NY.GDP.MKTP.KN'","'NY.GDP.PCAP.KD'","'NY.GNP.PCAP.CN'","'SM.POP.TOTL.ZS'","'NY.GSR.NFCY.CD'","'NY.GDS.TOTL.ZS'","'NE.DAB.TOTL.ZS'","'NE.CON.PETC.ZS'"]

With these 21 indicator codes, we draw a series of plots to visualize the key features one by one. Before plotting, we set up a helper function to make plotting easier and avoid writing the same code again and again.

In [11]:
def plot_indicator(code):
    """Plot one indicator for India and China; `code` must already carry its SQL quotes."""
    # The codes in indicatorcodesofwork include single quotes, so they can be appended as-is.
    indquer = "SELECT Year, Value, IndicatorName FROM  Indicators  WHERE  CountryCode = 'IND' AND IndicatorCode = " + code
    chnquer = "SELECT Year, Value, IndicatorName FROM  Indicators  WHERE  CountryCode = 'CHN' AND IndicatorCode = " + code

    ind = pd.read_sql(indquer, con=conn)
    chn = pd.read_sql(chnquer, con=conn)

    # One line per country on shared axes.
    plt.plot(ind['Year'].values,ind['Value'].values,label="India")
    plt.plot(chn['Year'].values,chn['Value'].values,label="China")

    # Use the indicator's own name as the title.
    plt.title(ind['IndicatorName'].iloc[0])
    plt.legend(loc=1)
    plt.show()
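
As an alternative to building the query by string concatenation, both countries can be fetched in one query with bound parameters and reshaped so that each country becomes a column. The sketch below runs against a tiny in-memory table with made-up values; `demo_conn`, `fetch_series` and the numbers are my assumptions for illustration, not the real database.

```python
import sqlite3
import pandas as pd

# Toy stand-in for the real database, with the column names used in this notebook.
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    "CREATE TABLE Indicators (CountryCode TEXT, IndicatorName TEXT, "
    "IndicatorCode TEXT, Year INTEGER, Value REAL)")
demo_conn.executemany(
    "INSERT INTO Indicators VALUES (?, ?, ?, ?, ?)",
    [('IND', 'Trade (% of GDP)', 'NE.TRD.GNFS.ZS', 1960, 10.0),
     ('CHN', 'Trade (% of GDP)', 'NE.TRD.GNFS.ZS', 1960, 8.0),
     ('IND', 'Trade (% of GDP)', 'NE.TRD.GNFS.ZS', 1961, 11.0),
     ('CHN', 'Trade (% of GDP)', 'NE.TRD.GNFS.ZS', 1961, 9.0)])

def fetch_series(code, con, countries=('IND', 'CHN')):
    """Return a Year-indexed frame with one Value column per country."""
    placeholders = ', '.join('?' for _ in countries)
    query = (
        "SELECT CountryCode, Year, Value FROM Indicators "
        "WHERE IndicatorCode = ? AND CountryCode IN (" + placeholders + ") "
        "ORDER BY Year")
    long_df = pd.read_sql(query, con=con, params=(code, *countries))
    # Long to wide: one row per year, one column per country.
    return long_df.pivot(index='Year', columns='CountryCode', values='Value')

wide = fetch_series('NE.TRD.GNFS.ZS', demo_conn)
print(wide)
```

Note that this version takes the bare code without the embedded single quotes used in `indicatorcodesofwork`. The wide frame is also a convenient starting point for comparing the two series directly.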

We hereby plot all the key features.

**Agriculture Value Added (% of GDP)**

This is the only indicator in which India leads China. Both countries saw a steep decline in agriculture's share of GDP from 1975 onward, but India's fall is smaller than China's.

In [12]:
plot_indicator(indicatorcodesofwork[0])

**Changes in inventories (current US $)**

The difference between the two countries in this indicator becomes considerable from 1985 onward, with China leading India by a wide margin.

In [13]:
plot_indicator(indicatorcodesofwork[1])

**Exports of Goods and Services (% of GDP)**

Here too China leads India by several years, and interestingly **the graph is a bit similar to the previous one, i.e. Changes in Inventories resembles Exports of Goods and Services, so there may be some correlation between the two**: in both graphs China's major lead over India starts around 1990 and the curves keep almost similar slopes until 2010, where both see a fall.

In [14]:
plot_indicator(indicatorcodesofwork[2])

**External Balance on goods and services (% of GDP)**

In [15]:
plot_indicator(indicatorcodesofwork[3])

**GDP at market prices (constant 2005 US $)**

The large gap after 1990 clearly shows that the Chinese economy blossomed after 1990 while India's grew far less. Interestingly, all these indicators showing a similar result might be because they are correlated with each other.

In [16]:
plot_indicator(indicatorcodesofwork[4])
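
The similarity between curves is judged by eye in this notebook. One way to check that impression numerically would be a Pearson correlation between two indicator series. The sketch below uses made-up yearly values, NOT numbers from the dataset, and the column names are placeholders of mine.

```python
import pandas as pd

# Made-up yearly values, NOT taken from the dataset.
toy = pd.DataFrame(
    {'exports_pct_gdp': [5.0, 7.0, 9.0, 14.0, 20.0],
     'gdp_constant_usd': [1.0, 1.5, 2.1, 3.0, 4.2]},
    index=[1980, 1985, 1990, 1995, 2000],
)

# Pearson correlation of the raw levels.
level_corr = toy['exports_pct_gdp'].corr(toy['gdp_constant_usd'])

# Correlation of period-to-period changes, which is less dominated by a shared upward trend.
change_corr = toy.diff().dropna().corr().loc['exports_pct_gdp', 'gdp_constant_usd']

print(round(level_corr, 3), round(change_corr, 3))
```

Two series that both simply rise over time will show a high correlation in levels, so looking at the changes as well gives a somewhat fairer picture. Either way, correlation alone does not show that one indicator drives the other.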

**Gross Capital Formation (% of GDP)**

In this respect China has always led India by a large margin, except around 1962-1963, when the two were level. India appears to rise slowly from 2000 until 2012, when it falls again.

In [17]:
plot_indicator(indicatorcodesofwork[5])

**Gross fixed capital formation (% of GDP)**

The result is similar to the previous indicator, apart from the steep fall for China in 1990.

In [18]:
plot_indicator(indicatorcodesofwork[6])

**Imports of goods and services (% of GDP)**

A very important aspect to compare: the graph shows that India had the higher imports until 1980, after which China's imports were higher until about 2005, when India took the lead again. The curve is similar to those of GDP at Market Prices, Exports of Goods and Services (% of GDP) and Changes in Inventories.

In [19]:
plot_indicator(indicatorcodesofwork[7])

**Industry Value Added**

The graph needs little explanation: there is a huge gap, with China leading.

In [20]:
plot_indicator(indicatorcodesofwork[8])

**Manufacturing, value added**

A similar graph to the previous indicator.

In [21]:
plot_indicator(indicatorcodesofwork[9])

**Merchandise Trade (% of GDP)**

Again, as with Goods and Services, a clear difference between the two countries appears from 1980. The curve is very similar to those of Goods and Services (Imports and Exports), GDP at Market Prices and Changes in Inventories.

In [22]:
plot_indicator(indicatorcodesofwork[10])

**Services, etc., value added (% of GDP)**

Services' share of GDP in India has been considerably high from 1960 until now.

In [23]:
plot_indicator(indicatorcodesofwork[11])

**Trade (% of GDP)**

In terms of Trade (% of GDP), India lags China by at least 10 years, as can also be seen in the work of Krishna Ravikumar. I would also like to point out that here again the curve is very similar to the others noted above, which suggests that these indicators strongly affect each other. Here too, in the 1980-1985 period, India is overtaken by China.

In [24]:
plot_indicator(indicatorcodesofwork[12])
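
The "lags by at least 10 years" statement is read off the graph. A rough way to put a number on such a lag is to ask in which year the leading country first reached the level that the other country has in a given year. The sketch below uses made-up values, not data from the database, and the function names are hypothetical helpers of mine.

```python
def first_year_at_or_above(series, threshold):
    """Earliest year whose value is >= threshold, or None if it is never reached."""
    years = [year for year, value in sorted(series.items()) if value >= threshold]
    return years[0] if years else None

def lag_in_years(leader, follower, year):
    """Years between the leader first reaching the follower's level in `year` and `year` itself."""
    reached = first_year_at_or_above(leader, follower[year])
    return None if reached is None else year - reached

# Made-up values for illustration only, not read from the database.
china_toy = {1990: 25.0, 1995: 35.0, 2000: 40.0, 2005: 60.0}
india_toy = {1990: 15.0, 1995: 20.0, 2000: 25.0, 2005: 40.0}

print(lag_in_years(china_toy, india_toy, 2005))
```

With real data the series would be the yearly values returned for each country. The result depends on the year chosen and can come out negative if the supposed leader only reaches that level later, so it is better read as a rough indication than as a precise measure.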

**GDP (constant LCU)**

In [25]:
plot_indicator(indicatorcodesofwork[13])

**GDP per capita (constant 2005 US $)**

This is another graph that pinpoints the change in the economy around 1985, where the curves for the Chinese and Indian economies intersect, followed by a steep rise of the Chinese economy. Although the shape of the curves is not as similar as in the indicators noted earlier, this indicator appears to be interdependent with those found to be correlated.

In [26]:
plot_indicator(indicatorcodesofwork[14])

**GNI per capita (current LCU)**

In [27]:
plot_indicator(indicatorcodesofwork[15])

**Net income from abroad (current US $)**

A very mixed graph in which India runs on par with China. This indicator does not appear to help with our aim.

In [29]:
plot_indicator(indicatorcodesofwork[17])

**Gross domestic savings (% of GDP)**

This is another graph where India has much ground to make up, as China holds a huge lead from the very beginning.
This graph is very similar to those of Manufacturing, value added and Industry, value added.

In [30]:
plot_indicator(indicatorcodesofwork[18])

**Household Final Consumption Expenditure, etc. (% of GDP)**

In [33]:
plot_indicator(indicatorcodesofwork[20])

**Gross National Expenditure (% of GDP)**

In [32]:
plot_indicator(indicatorcodesofwork[19])

**CONCLUSION**

From the above data visualizations, we can draw the following conclusions:

* **India lags China in terms of Trade and Economy by many years (at least 10)**.
* The period **1980-1985 sees a massive boost in the trade and economy of China**, and from then on China pulls ahead of India in trade and economy and is still ahead.
* The indicators **Changes in Inventories, Exports of Goods and Services, Imports of Goods and Services, GDP at market prices, Merchandise Trade, GDP per Capita and Trade (% of GDP) are closely inter-related and show the same pattern of growth**.
* The indicators **Manufacturing, value added, Gross domestic savings and Industry, value added have similar curves**.
* India needs to lower its Services share, which is much higher than China's.
* **India leads China in Agricultural value added**, and hence focus should be given to it to advance further.
* The massive difference between China and India seen in Manufacturing, value added, Gross domestic savings and Industry, value added must be narrowed.
* The **steep fall seen at the end of the curve of Trade (% of GDP) must be addressed, and all the curves noted as inter-related have to be treated similarly**.