**Indian Trade Data**


* In this notebook I try to understand in which products India makes a profit from its imports and exports.
* I also try to understand the time-series pattern and the reasons behind it.

**Loading the libraries**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns #statistical data visualization
import matplotlib.pyplot as plt #visualization library
from statsmodels.graphics.tsaplots import plot_acf #Auto-Correlation Plots
from statsmodels.graphics.tsaplots import plot_pacf #Partial-Auto Correlation Plots

**Reading the data**

In [None]:
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
import_df = pd.read_csv("/kaggle/input/india-trade-data/2018-2010_import.csv")
export_df = pd.read_csv("/kaggle/input/india-trade-data/2018-2010_export.csv")

**Data Exploration**

In [None]:
import_df.head()

In [None]:
export_df.head()

**Cleaning the Dataset**

Checking whether the dataset has any missing values and dropping those values
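A minimal sketch of this check-then-drop step on a toy frame. The column names mirror the ones used later in this notebook, but the rows are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the trade data (invented rows).
toy = pd.DataFrame({
    "HSCode": [1, 2, 3, 4],
    "country": ["A", "B", "C", "D"],
    "value": [1.5, np.nan, 0.0, np.nan],
    "year": [2018, 2018, 2017, 2017],
})

# Number of missing cells in each column.
missing_per_column = toy.isnull().sum()

# Drop every row that has at least one missing cell, then renumber the rows.
cleaned = toy.dropna().reset_index(drop=True)

print(missing_per_column)
print(len(toy), "rows before,", len(cleaned), "rows after")
```

Rows dropped this way no longer contribute to any of the sums computed later, so it is worth looking at how many rows disappear.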

In [None]:
import_df.isnull().sum()

In [None]:
import_df = import_df.dropna()
import_df = import_df.reset_index(drop=True)

In [None]:
export_df.isnull().sum()

In [None]:
export_df = export_df.dropna()
export_df = export_df.reset_index(drop=True)

**Analysis on India's Import and Export**

Understanding India's **import vs export** across various years and countries

An **import** is a good brought into a jurisdiction, especially across a national border, from an external source.

An **export** is a function of international trade whereby goods produced in one country are shipped to another country for future sale or trade.

**Imports** are where the country spends money and **exports** are where it earns money.

A country that imports more than it exports runs a **trade deficit**.

Loss/Profit = Amount earned from exports - Amount spent on imports
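The formula above can be sketched on a few invented yearly totals. The numbers are made up purely to show the sign convention; they are not figures from the dataset.

```python
import pandas as pd

# Invented yearly totals, both in the same unit.
exports = pd.Series({2016: 80.0, 2017: 95.0, 2018: 110.0}, name="Export")
imports = pd.Series({2016: 100.0, 2017: 90.0, 2018: 150.0}, name="Import")

# Loss/Profit = exports - imports; a negative value is a trade deficit.
balance = exports - imports
status = balance.apply(
    lambda b: "surplus" if b > 0 else "deficit" if b < 0 else "balanced"
)

summary = pd.DataFrame({
    "Export": exports,
    "Import": imports,
    "Loss / Profit": balance,
    "Status": status,
})
print(summary)
```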


In [None]:
# Select the column as a Series so that nunique() returns a plain number
importing_countries = import_df['country'].nunique()
exporting_countries = export_df['country'].nunique()
print("India imports from:", importing_countries, "countries")
print("India exports to:", exporting_countries, "countries")

In [None]:
import_group=import_df.groupby(['country','year']).agg({'value':'sum'})
export_group=export_df.groupby(['country','year']).agg({'value':'sum'})

In [None]:
import_temp = import_group.groupby(['country']).agg({'value': 'sum'})
# reindex (rather than .loc) avoids a KeyError if a country appears only in the import data
export_temp = export_group.groupby(['country']).agg({'value': 'sum'}).reindex(import_temp.index)

In [None]:
data_1=import_group.groupby(['country']).agg({'value':'sum'}).sort_values(by='value').tail(10)
data_2=export_temp
data_3=data_2-data_1

In [None]:
data_1.columns=['Import']
data_2.columns=['Export']
data_3.columns=['Loss / Profit']

In [None]:
df=pd.DataFrame(index=data_1.index.values)
#df=pd.concat([data_1,data_2,data_3])
df['Import']=data_1
df['Export']=data_2
df['Loss / Profit']=data_3

In [None]:
df

**Observation**
The graph below shows imports, exports, and the resulting balance (exports minus imports) for the 10 countries from which India imports the most.

In [None]:
fig, ax = plt.subplots(figsize=(15,7))
df.plot(kind='bar',ax=ax)
ax.set_xlabel('Countries')
ax.set_ylabel('Value of transactions (in million US$)')

Of these 10 countries:
* With the USA, India makes a profit (it exports more than it imports).
* With all the other countries, India ends up with a loss.
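A compact way to build the same kind of per-country table is sketched below, assuming two frames with `country` and `value` columns as in this notebook. The country labels and values are invented, and `nlargest(2)` stands in for the top 10.

```python
import pandas as pd

# Invented rows; "W" only receives exports and "Z" only sends imports.
imports = pd.DataFrame({"country": ["X", "X", "Y", "Z"],
                        "value": [50.0, 30.0, 40.0, 10.0]})
exports = pd.DataFrame({"country": ["X", "Y", "Y", "W"],
                        "value": [20.0, 25.0, 30.0, 5.0]})

import_by_country = imports.groupby("country")["value"].sum()
export_by_country = exports.groupby("country")["value"].sum()

# Top partners by import value (2 here, 10 in the notebook).
top = import_by_country.nlargest(2)

table = pd.DataFrame({
    "Import": top,
    # Countries with no recorded exports get 0 instead of raising an error.
    "Export": export_by_country.reindex(top.index, fill_value=0.0),
})
table["Loss / Profit"] = table["Export"] - table["Import"]
print(table)
```

Using `reindex` with `fill_value` is a design choice of this sketch: it treats a missing partner as zero trade, which may or may not be the right reading of the data.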

Country-wise Imports and Exports

Top countries contributing to India's imports

In [None]:
df_import = import_df.groupby('country').agg({'value':'sum'}).sort_values(by='value', ascending = False).head(10)
df_import.plot(kind='bar')


Top countries contributing to India's exports

In [None]:
df_export = export_df.groupby('country').agg({'value':'sum'}).sort_values(by='value', ascending = False).head(10)


df_export.plot(kind='bar')

**HSCode**
HS stands for Harmonized System. It was developed by the WCO (World Customs Organization) as a multipurpose international product nomenclature that describes the type of good being shipped. Today, customs officers must use the HS code to clear every commodity that enters or crosses an international border.

**The HS code can be described as follows:**
* It is a six-digit identification code.
* It covers about 5,000 commodity groups.
* The groups are organized into 99 chapters.
* The chapters are grouped into 21 sections.
* It is arranged in a legal and logical structure.
* Well-defined rules support it to achieve uniform classification worldwide.
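To make the six-digit structure concrete, here is a small sketch. It relies on the general HS convention, which is not stated in this notebook, that the first two digits give the chapter, the first four the heading, and all six the subheading. The helper name `split_hs_code` is hypothetical.

```python
def split_hs_code(code: str) -> dict:
    """Split a six-digit HS code into chapter, heading and subheading."""
    digits = code.replace(".", "").strip()
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    return {
        "chapter": digits[:2],     # first two digits
        "heading": digits[:4],     # first four digits
        "subheading": digits,      # all six digits
    }

parts = split_hs_code("0901.21")
print(parts)
# {'chapter': '09', 'heading': '0901', 'subheading': '090121'}
```

The `HSCode` column in this notebook is compared against chapter numbers, so it appears to hold only the two-digit chapter level.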

**Creating a Macro level View for commodities**

With 99 chapters and about 5,000 commodity groups, it is difficult to see which commodities India imports and exports the most.

To overcome this, I create a new dataframe based on the **Sections of the HS Code**, which gives a clearer macro view of the areas of import and export.

I built the dataframe from information obtained from http://www.cybex.in/HS-Codes/Default.aspx and https://www.dgft.org/itc_hs_code.html
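One possible way to turn such a chapter-range table into section labels is `pd.cut`. This is only a sketch of an alternative, not the approach this notebook uses: it lists just the first four sections to stay short, and it assumes the ranges are contiguous so that the `End` values can serve as bin edges.

```python
import pandas as pd

# First four sections only; Start/End are chapter numbers.
sections = pd.DataFrame({
    "Start": [1, 6, 15, 16],
    "End": [5, 14, 15, 24],
    "Sections": ["Animals & Animal Products", "Vegetable Products",
                 "Animal Or Vegetable Fats", "Prepared Foodstuffs"],
})

# Bin edges (0, 5], (5, 14], (14, 15], (15, 24]; valid because each Start is the previous End + 1.
edges = [int(sections["Start"].iloc[0]) - 1] + sections["End"].tolist()

chapters = pd.Series([1, 5, 6, 15, 24, 99], name="HSCode")
labelled = pd.cut(chapters, bins=edges, labels=sections["Sections"].tolist())
print(pd.DataFrame({"HSCode": chapters, "Section": labelled}))
```

Chapters that fall outside every range, such as 99 in this toy input, come back as `NaN`, which makes unmapped rows easy to spot.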


In [None]:
HSCode=pd.DataFrame()
HSCode['Start']=[1,6,15,16,25,28,39,41,44,47,50,64,68,71,72,84,86,90,93,94,97]
HSCode['End']=[5,14,15,24,27,38,40,43,46,49,63,67,70,71,83,85,89,92,93,96,98]
HSCode['Sections']=['Animals & Animal Products',
'Vegetable Products',
'Animal Or Vegetable Fats',
'Prepared Foodstuffs',
'Mineral Products',
'Chemical Products',
'Plastics & Rubber',
'Hides & Skins',
'Wood & Wood Products',
'Wood Pulp Products',
'Textiles & Textile Articles',
'Footwear, Headgear',
'Articles Of Stone, Plaster, Cement, Asbestos',
'Pearls, Precious Or Semi-Precious Stones, Metals',
'Base Metals & Articles Thereof',
'Machinery & Mechanical Appliances',
'Transportation Equipment',
'Instruments - Measuring, Musical',
'Arms & Ammunition',
'Miscellaneous',
'Works Of Art',]

In [None]:
HSCode

Finding the top 10 sections by import value and then the profit/loss made in each of them

In [None]:
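# Label each row with the position (0-20) of its HS section in the HSCode table.
# Ranges are matched against the original HSCode column, so a label already
# written to 'Sections' can never be matched again by a later range.
# Chapters outside every range keep their original value.
import_df['Sections'] = import_df['HSCode']
export_df['Sections'] = export_df['HSCode']
for i in range(len(HSCode)):
    start, end = HSCode['Start'][i], HSCode['End'][i]
    import_df.loc[import_df['HSCode'].between(start, end), 'Sections'] = i
    export_df.loc[export_df['HSCode'].between(start, end), 'Sections'] = i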
    

In [None]:
import_group=import_df.groupby(['Sections','year']).agg({'value':'sum'})
export_group=export_df.groupby(['Sections','year']).agg({'value':'sum'})

In [None]:
import_temp=import_group.groupby(['Sections']).agg({'value':'sum'})
export_temp=export_group.groupby(['Sections']).agg({'value':'sum'}).loc[import_temp.index.values]

In [None]:
data_1=import_group.groupby(['Sections']).agg({'value':'sum'}).sort_values(by='value').tail(10)
data_2=export_temp
data_3=data_2-data_1
data_1.columns=['Import']
data_2.columns=['Export']
data_3.columns=['Loss / Profit']
df=pd.DataFrame(index=data_1.index.values)
#df=pd.concat([data_1,data_2,data_3])
df['Import']=data_1
df['Export']=data_2
df['Loss / Profit']=data_3

In [None]:
HSCode['Sections'][data_1.index.values]

In [None]:
df.index=HSCode['Sections'][data_1.index.values]
fig, ax = plt.subplots(figsize=(15,7))
df.plot(kind='bar',ax=ax)
ax.set_xlabel('Sections')
ax.set_ylabel('Value of transactions (in million US$)')

From the top 10 importing sections:
* India makes a profit in Vegetable Products and Transportation Equipment.
* In the remaining sections on the list, India makes a loss.

Top Sections India Imports

In [None]:
data_1.index=HSCode['Sections'][data_1.index.values]
data_1.plot(kind='bar')

Top 10 Sections India Exports

In [None]:
data_2=export_group.groupby(['Sections']).agg({'value':'sum'}).sort_values(by='value').tail(10)
data_2.index=HSCode['Sections'][data_2.index.values]
data_2.plot(kind='bar')

**Year-wise Trend of Each Category**

In [None]:
Import_ =import_df.groupby(['year']).agg({'value':'sum'})
Export_ =export_df.groupby(['year']).agg({'value':'sum'})
Deficit_=Export_ -Import_
Time_Series=pd.DataFrame(index=Import_.index.values)
Time_Series['Import']=Import_
Time_Series['Export']=Export_
Time_Series['Loss / Profit']=Deficit_

In [None]:
Time_Series

In [None]:
fig, ax = plt.subplots(figsize=(15,7))
Time_Series.plot(ax=ax,marker='o')
ax.set_xlabel('Years')
ax.set_ylabel('Value of transactions (in million US$)')

In [None]:
Time_Series.index.name = 'Year'
Time_Series.reset_index(inplace=True)

In [None]:
Time_Series

Bar plot to show India's Loss in each year

In [None]:
# Plotting bar plot for yearwise Trend
sns.barplot(x = 'Year', y = 'Loss / Profit', data = Time_Series)
plt.show()

**Observations:**

* Policy changes in 2016, together with people's growing tendency to buy foreign products and depend on brands, caused a sharp rise in the import bill, which tends to make India a deficit country.
* New government initiatives such as "Skill India", "Make in India" and "Startup India" can help boost exports if they are actually implemented on the ground.
* People in the country should also concentrate more on using products made in India.
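A rough way to check for such a spike is to look at the year-over-year percentage change of the yearly totals. The sketch below uses invented numbers in a frame shaped like `Time_Series`; it does not reproduce the real figures.

```python
import pandas as pd

# Invented yearly totals, indexed by year.
yearly = pd.DataFrame(
    {"Import": [100.0, 110.0, 99.0, 148.5],
     "Export": [80.0, 88.0, 88.0, 96.8]},
    index=pd.Index([2015, 2016, 2017, 2018], name="Year"),
)

# Percentage change relative to the previous year (first year is NaN).
yoy_pct = yearly.pct_change() * 100

# Year with the largest relative jump in imports.
biggest_import_jump = yoy_pct["Import"].idxmax()

print(yoy_pct.round(1))
print("Largest import jump in:", biggest_import_jump)
```

A large positive value in the `Import` column that is not matched in the `Export` column would point to a widening deficit in that year.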
    
    

**Breaking Down India's Import and Export**

Since the USA and China are the largest contributors to India's exports and imports respectively, we will try to break down their contributions.

In [None]:
# India's imports from China and India's exports to the USA
China_df = import_df[import_df['country'] == 'CHINA P RP']
USA_df = export_df[export_df['country'] == 'U S A']

In [None]:
import pylab as pl
China=China_df.groupby(['year']).agg({'value':'sum'})
USA=USA_df.groupby(['year']).agg({'value':'sum'})
contribution=pd.DataFrame(index=China.index.values)
contribution["USA's export value"]=USA
contribution["China's import value"]=China
contribution.plot(marker='o')
pl.suptitle("China's import and USA's export contributions trend")


In [None]:
USA_export=USA_df.groupby(['year','Commodity']).agg({'value':'sum'}).sort_values(by='value').tail(10)
China_import=China_df.groupby(['year','Commodity']).agg({'value':'sum'}).sort_values(by='value').tail(10)

In [None]:
China_import.plot.barh()#(kind='bar')
pl.suptitle("Top products imported from China, year-wise")

In [None]:
USA_export.plot.barh()#(kind='bar')
pl.suptitle("Top products exported to the USA, year-wise")

Thank you **Shubham singh Gharsele** for your kernel