![Image](https://1et31t3azwc3jxcfm1s69wb8-wpengine.netdna-ssl.com/wp-content/uploads/2018/04/Indian-apparel-industry.gif)

**Little About the Domain**

Foreign trade includes imports and exports.
**Importing** means the citizens, businesses, and government of a country buying foreign goods and services, no matter how they reach the country: they can be shipped, sent by e-mail, or even hand-carried in personal luggage on a plane. A country that imports more than it exports runs a **trade deficit**, whereas a country that imports less than it exports runs a **trade surplus**.

**Exporting** means that goods and services produced in one country are purchased in another: they are produced domestically and sold to someone in a foreign country. Most countries want to increase their exports because doing so increases the country's GDP.
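
The deficit and surplus definitions can be written down as a tiny calculation. This is only a sketch: the function names `trade_balance` and `describe_balance` and the figures are hypothetical, made up for illustration, and are not taken from the dataset.

```python
def trade_balance(exports: float, imports: float) -> float:
    """Return exports minus imports; a negative result is a trade deficit."""
    return exports - imports


def describe_balance(exports: float, imports: float) -> str:
    """Name the situation a country is in for the given totals."""
    balance = trade_balance(exports, imports)
    if balance < 0:
        return "trade deficit"
    if balance > 0:
        return "trade surplus"
    return "balanced trade"


# Hypothetical totals in million US$, chosen only for illustration.
print(describe_balance(exports=300.0, imports=450.0))
print(describe_balance(exports=500.0, imports=450.0))
```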

In India, foreign trade is administered at the central government level by the Ministry of Commerce and Industry.

Prior to the 1991 economic liberalisation, India was a closed economy, with average tariffs exceeding 200 percent and extensive quantitative restrictions on imports. Foreign investment was strictly restricted so that only Indian ownership of businesses was allowed. Since the liberalisation, India's economy has improved, mainly due to increased foreign trade.

[More on Quora](https://www.quora.com/What-is-import-and-export)
[More on Wikis](https://en.wikipedia.org/wiki/Foreign_trade_of_India)

**Loading Libraries**

We will import libraries for data processing and for preparing charts.

In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# charts
import seaborn as sns 
import matplotlib.pyplot as plt
import squarify #TreeMap

# import graph objects as "go"

import plotly.graph_objs as go

%matplotlib inline

#ignore warning 
import warnings
warnings.filterwarnings("ignore")

# Input data files are available in the "../input/" directory.
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


**Data Loading**

We need to load two files, one for imports and the other for exports. The files contain import and export data from 2010 to 2018.

In [None]:
data_import = pd.read_csv("/kaggle/input/india-trade-data/2018-2010_import.csv")
data_export = pd.read_csv("/kaggle/input/india-trade-data/2018-2010_export.csv")

**Data Sneak Peek**

We will take a quick look at the data we have.

In [None]:
data_export.head(5)

In [None]:
data_import.head(5)

Both files have 5 columns each.
* HSCode - HS stands for Harmonized System. It was developed by the WCO (World Customs Organization) as a multipurpose international product nomenclature that describes the type of good that is shipped.

HS Code Structure

    The HS code can be described as follows:

    It is a six-digit identification code.
    It covers about 5000 commodity groups.
    Those groups are organized into 99 chapters.
    Those chapters are grouped into 21 sections.
    It is arranged in a legal and logical structure.
    Well-defined rules support it to achieve uniform classification worldwide.
    
   The HSCode column in this dataset holds the chapter number (1 to 99).
   
   [Reference](https://www.tradefinanceglobal.com/freight-forwarding/what-is-an-hs-code/)
   [HSCode List  ](http://www.cybex.in/HS-Codes/Default.aspx)
   
* Commodity - this column contains the chapter-wise commodity category. Each commodity category covers various commodities.

A **commodity** is an economic good or service that has full or substantial fungibility: that is, the market treats instances of the good as equivalent or nearly so with no regard to who produced them. 

[Reference](https://en.wikipedia.org/wiki/Commodity)

* Value - values for export and import of commodities in million US $.
* Country - Country Imported From/ Exported To
* Year - Year in which the commodities were imported/exported, between 2010 and 2018.
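
To make the six-digit structure concrete, here is a minimal sketch that splits a full HS code into its chapter (first two digits), heading (first four digits) and subheading (all six digits). The helper name `split_hs_code` and the sample code are assumptions for illustration; the dataset itself only stores the two-digit chapter.

```python
def split_hs_code(code: str) -> dict:
    """Split a six-digit HS code into chapter, heading and subheading."""
    digits = code.replace(".", "").strip()
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected a six-digit HS code, got {code!r}")
    return {
        "chapter": digits[:2],     # first two digits
        "heading": digits[:4],     # first four digits
        "subheading": digits,      # all six digits
    }


# Sample code written in the common dotted notation (illustrative only).
parts = split_hs_code("0901.21")
print(parts)
```

The `chapter` part is what corresponds to the HSCode column used in the rest of this notebook.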

In [None]:
data_export.describe()

* HSCode lies between 1 and 99, which is correct as discussed above.
* Value shows huge outliers: 75% of the data is below 3.7 while the maximum is 19805, so some item categories seem to be very expensive. We will look into this later. The minimum is zero, as some exports may be too small to show up after rounding to two decimals.
* Year lies between 2010 and 2018, as expected.

In [None]:
data_import.describe()

* HSCode lies between 1 and 99, which is correct as discussed above.
* Value shows huge outliers: 75% of the data is below 4.9 while the maximum is 32781, so some item categories seem to be very expensive. The minimum is zero, as some imports may be too small to register.
* Year lies between 2010 and 2018, as expected.
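
One hedged way to put a number on "huge outlier" is the common interquartile-range rule of thumb: anything above Q3 + 1.5 × IQR is flagged. The sketch below uses made-up toy values, not the real `value` column, and the 1.5 multiplier is a convention rather than something this notebook prescribes.

```python
import pandas as pd

# Toy values (million US$), made up to mimic a right-skewed column.
values = pd.Series([0.0, 0.1, 0.5, 1.2, 2.0, 3.5, 4.0, 150.0, 9000.0])

# Quartiles and the interquartile range.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1

# Rule-of-thumb upper fence for outliers.
upper_fence = q3 + 1.5 * iqr
outliers = values[values > upper_fence]

print(f"Q3 = {q3}, upper fence = {upper_fence}")
print(f"{len(outliers)} of {len(values)} values lie above the fence")
```

On the real data the same lines could be applied to `data_import['value']` or `data_export['value']`.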



In [None]:
data_export.info()

In [None]:
data_import.info()

* The export file has 137023 rows of data.
* The import file has around 67799 rows of data.
* Value contains null values in both the import and the export file.


**Data Cleanup**

So the data needs some cleanup before we can visualize it well.
Let's find the key areas to clean up.


In [None]:
data_import.isnull().sum()

In [None]:
data_import[data_import.value==0].head(5)

In [None]:
data_import[data_import.country == "UNSPECIFIED"].head(5)

In [None]:
print("Duplicate imports : "+str(data_import.duplicated().sum()))
print("Duplicate exports : "+str(data_export.duplicated().sum()))


The checks above show what needs to be cleaned up:
* The value column has null values.
* The value column has zero values.
* The country column has the value UNSPECIFIED.
* There are duplicate import rows.

There are various ways to handle these, but for now we simply delete the affected rows.
Disclaimer: as we are deleting rows, some of the analysis may be impacted, but that is acceptable for educational purposes.
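
Before deleting anything, it can help to count how many rows each issue affects, so the impact of the cleanup is visible. The following is a minimal sketch on a made-up toy frame that reuses the notebook's column names; the `report` dictionary and the toy rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with the same column names as the notebook's data (values made up).
toy = pd.DataFrame({
    "HSCode": [1, 2, 2, 3, 4],
    "Commodity": ["A", "B", "B", "C", "D"],
    "value": [1.5, 0.0, 0.0, np.nan, 2.0],
    "country": ["CHINA", "USA", "USA", "UAE", "UNSPECIFIED"],
    "year": [2018, 2018, 2018, 2017, 2017],
})

# Count the rows hit by each cleanup rule.
report = {
    "null_value": int(toy["value"].isna().sum()),
    "zero_value": int((toy["value"] == 0).sum()),
    "unspecified_country": int((toy["country"] == "UNSPECIFIED").sum()),
    "duplicate_rows": int(toy.duplicated().sum()),
}
print(report)
```

Note that a single row can fall under more than one rule, so the counts do not necessarily add up to the number of rows removed.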

In [None]:
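def cleanup(data_df):
    # work on a copy so the caller's DataFrame is not modified
    data_df = data_df.copy()
    # set country "UNSPECIFIED" to NaN so that dropna() removes those rows
    data_df['country'] = data_df['country'].replace("UNSPECIFIED", np.nan)
    # ignore rows where the trade value is 0
    data_df = data_df[data_df['value'] != 0]
    # drop rows with missing values, then exact duplicate rows
    data_df = data_df.dropna()
    data_df = data_df.drop_duplicates(keep="first")
    # treat year as a categorical column
    data_df['year'] = pd.Categorical(data_df['year'])
    return data_df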

In [None]:
data_import = cleanup(data_import)
data_export = cleanup(data_export)

In [None]:
data_import.isnull().sum()

**Commodity Analysis**

* Commodity Import Count

In [None]:
print("Import Commodity Count : "+str(len(data_import['Commodity'].unique())))
print("Export Commodity Count : "+str(len(data_export['Commodity'].unique())))

So why are there just 98 commodities when there are 99 chapters? (Nothing was deleted for this during the cleanup.)
To find the missing commodity, I printed all the HSCode values and found that 77 is missing.
* HSCode 77 is actually reserved for possible future use.

[HSCode-77 ](http://www.cybex.in/HS-Codes/Reserved-Possible-Future-Use-Chapter-77.aspx)
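
Instead of reading through printed codes, the gap can be found with a set difference. This is a small sketch: `missing_chapters` is a hypothetical helper, and `toy_codes` stands in for the real column (in the notebook one could pass `data_import['HSCode'].unique()` instead).

```python
def missing_chapters(codes, first=1, last=99):
    """Return the chapter numbers in [first, last] that never appear in codes."""
    present = set(codes)
    return [chapter for chapter in range(first, last + 1) if chapter not in present]


# Stand-in for the HSCode column: every chapter except 77.
toy_codes = [chapter for chapter in range(1, 100) if chapter != 77]
print(missing_chapters(toy_codes))
```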

* Commodity count based on different imports (country/year)
Let's count the most popular import commodities. Here popular just means that the number of transactions (country/year) is higher for this category.

In [None]:
df = pd.DataFrame(data_import['Commodity'].value_counts())
df.head(20)

In [None]:
print("Number of countries we import commodities from: "+str(len(data_import['country'].unique())))
print("Number of countries we export commodities to: "+str(len(data_export['country'].unique())))

So India trades with around 246 countries.

According to http://www.world-country.com/ there are 247 countries and territories, of which only 192 are recognised by the United Nations.

We cannot ignore the possibility that country names appear in a short form, or that the same country is represented multiple times in different ways.
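
A rough way to check for the same country written in different ways is to normalize each name (case, punctuation, spacing) and look for names that collapse to the same key. The sketch below is only a heuristic; the helpers `normalize` and `find_variants` and the sample names are made up for illustration, and abbreviations versus full names would still need a manual mapping.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Upper-case the name, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", name.upper())
    return " ".join(cleaned.split())


def find_variants(names):
    """Group raw names by normalized key and keep keys with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


# Made-up spellings, not taken from the dataset.
toy_names = ["KOREA RP", "Korea  RP", "KOREA, RP", "CHINA P RP"]
variants = find_variants(toy_names)
print(variants)
```

In the notebook, `find_variants(data_import['country'].unique())` would list any spellings that differ only in case, punctuation or spacing.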

 **Import And Export Year Wise**

In [None]:
df3 = data_import.groupby('year').agg({'value':'sum'})

df4 = data_export.groupby('year').agg({'value':'sum'})


In [None]:
# export minus import: a negative value means a trade deficit
df3['deficit'] = df4.value - df3.value
df3

In [None]:

# create trace1 
trace1 = go.Bar(
                x = df3.index,
                y = df3.value,
                name = "Import",
                marker = dict(color = 'rgba(0,191,255, 1)',
                             line=dict(color='rgb(0,0,0)',width=1.5)),
                text = df3.value)
# create trace2 
trace2 = go.Bar(
                x = df4.index,
                y = df4.value,
                name = "Export",
                marker = dict(color = 'rgba(1, 255, 130, 1)',
                              line=dict(color='rgb(0,0,0)',width=1.5)),
                text = df4.value)

trace3 = go.Bar(
                x = df3.index,
                y = df3.deficit,
                name = "Trade Deficit",
                marker = dict(color = 'rgba(220, 20, 60, 1)',
                              line=dict(color='rgb(0,0,0)',width=1.5)),
                text = df3.deficit)


data = [trace1, trace2, trace3]
layout = go.Layout(barmode = "group")
fig = go.Figure(data = data, layout = layout)
fig.update_layout(
    title=go.layout.Title(
        text="Yearwise Import/Export/Trade deficit",
        xref="paper",
        x=0
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Year",
            font=dict(
                family="Courier New, monospace",
                size=18,
                color="#7f7f7f"
            )
        )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Value",
            font=dict(
                family="Courier New, monospace",
                size=18,
                color="#7f7f7f"
            )
        )
    )
)




fig.show()

* Imports are always higher than exports, creating a trade deficit, which is shown by the red bars.
* 2011 and 2012 show a huge trade deficit, after which it gradually decreases until 2016 and then increases again in 2017 and 2018.
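
To compare years on a common footing, the deficit can also be expressed as a percentage of that year's imports. The following is a sketch on made-up toy frames with the notebook's `year` and `value` column names; the column `deficit_pct_of_import` is a hypothetical addition, not something computed in the cells above.

```python
import pandas as pd

# Toy import/export rows (million US$), made up for illustration.
imports = pd.DataFrame({"year": [2010, 2010, 2011, 2011],
                        "value": [100.0, 50.0, 120.0, 80.0]})
exports = pd.DataFrame({"year": [2010, 2010, 2011, 2011],
                        "value": [90.0, 30.0, 100.0, 60.0]})

# Yearly totals side by side, aligned on the year index.
yearly = pd.DataFrame({
    "import": imports.groupby("year")["value"].sum(),
    "export": exports.groupby("year")["value"].sum(),
})

# Export minus import: negative means a deficit.
yearly["balance"] = yearly["export"] - yearly["import"]
# Deficit as a share of imports, in percent.
yearly["deficit_pct_of_import"] = (-yearly["balance"] / yearly["import"] * 100).round(1)
print(yearly)
```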

 **Import And Export Country Wise**

In [None]:
df5 = data_import.groupby('country').agg({'value':'sum'})
df5 = df5.sort_values(by='value', ascending = False)
df5 = df5[:10]

df6 = data_export.groupby('country').agg({'value':'sum'})
df6 = df6.sort_values(by='value', ascending = False)
df6 = df6[:10]

In [None]:
sns.set(rc={'figure.figsize':(15,6)})
ax1 = plt.subplot(121)

sns.barplot(x=df5.value, y=df5.index, ax=ax1).set_title('Country Wise Import')

ax2 = plt.subplot(122)
sns.barplot(x=df6.value, y=df6.index, ax=ax2).set_title('Country Wise Export')
plt.tight_layout()
plt.show()

* China is the biggest source of India's imports, followed by UAE, Saudi Arabia and USA.
* USA is the biggest importer of Indian goods, followed by UAE and China.

**Trade Deficit/Surplus Top 5 Countries**
* China - very huge trade deficit
* UAE - small trade surplus
* Saudi Arabia - huge trade deficit
* USA - small trade surplus
* Switzerland - does not even appear in the top export graph, which is a sign of a huge trade deficit.
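
The deficit/surplus reading above is taken from the two bar charts by eye. A hedged alternative is to compute the balance per country directly, using an outer join so that a country missing from one side (as Switzerland is from the top exports) still shows up. The toy frames and country labels below are made up; only the `country` and `value` column names come from the notebook.

```python
import pandas as pd

# Toy rows (million US$); country "C" has no exports, "D" has no imports.
imports = pd.DataFrame({"country": ["A", "A", "B", "C"],
                        "value": [100.0, 50.0, 40.0, 70.0]})
exports = pd.DataFrame({"country": ["A", "B", "B", "D"],
                        "value": [60.0, 30.0, 30.0, 20.0]})

# Outer-align the per-country totals; missing sides count as zero trade.
by_country = pd.concat(
    {
        "import": imports.groupby("country")["value"].sum(),
        "export": exports.groupby("country")["value"].sum(),
    },
    axis=1,
).fillna(0.0)

# Export minus import: negative = deficit, positive = surplus.
by_country["balance"] = by_country["export"] - by_country["import"]
print(by_country.sort_values("balance"))
```

Sorting by `balance` puts the largest deficits first, which is the ranking the bullet list above approximates.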

 **Import And Export Year Wise Trend**

In [None]:
fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x=df3.index, y=df3.value, name='Import',mode='lines+markers',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x=df4.index, y=df4.value, name = 'Export',mode='lines+markers',
                         line=dict(color='royalblue', width=4)))
fig.update_layout(
    title=go.layout.Title(
        text="Yearwise Import/Export",
        xref="paper",
        x=0
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Year",
            font=dict(
                family="Courier New, monospace",
                size=18,
                color="#7f7f7f"
            )
        )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Value",
            font=dict(
                family="Courier New, monospace",
                size=18,
                color="#7f7f7f"
            )
        )
    )
)

fig.show()

* There is a slowdown in trade between 2014 and 2015.
* In 2013 and 2016, exports were decreasing while imports were growing.
* Exports show a downward trend after 2011-2012 until 2016, after which they increase again.
* Imports show an upward trend in 2010-2011, a sideways trend until 2014, a sharp decline in 2015, and an upward trend thereafter.

In [None]:
df3 = data_import.groupby('Commodity').agg({'value':'sum'})
df3 = df3.sort_values(by='value', ascending = False)
df3 = df3[:10]

df4 = data_export.groupby('Commodity').agg({'value':'sum'})
df4 = df4.sort_values(by='value', ascending = False)
df4 = df4[:10]

In [None]:
sns.set(rc={'figure.figsize':(15,10)})
#ax1 = plt.subplot(121)
sns.barplot(x=df3.value, y=df3.index).set_title('Commodity Wise Import')
plt.show()
#ax2 = plt.subplot(122)
sns.barplot(x=df4.value, y=df4.index).set_title('Commodity Wise Export')
plt.show()

* The top exported categories are also the top imported categories, but there is a huge trade deficit category-wise.
* The Vehicles Other Than Railway... and Pharmaceutical Products HSCode chapters show a trade surplus.


**Let's Analyse Expensive Imports**

As we have seen, there is a huge spread in the value distribution, so we will analyse some of the expensive imports (rows with a value above 1000).

In [None]:
expensive_import = data_import[data_import.value>1000]
expensive_import.head(10)

**Import Value Vs HSCode (Commodity Code)**

In [None]:
#fig, ax = plt.subplots(1,1,figsize=(18,6)) 
plt.figure(figsize=(20,9))
#plt.rcParams['figure.figsize']=(23,10)
# the size of A4 paper
#fig.set_size_inches(11.7, 8.27)
ax = sns.boxplot(x="HSCode", y="value", data=expensive_import)
ax.set_title('Expensive Imports HsCode distribution')
plt.show()


* HSCode chapters 27 and 71 top the expensive imports, as already seen (mineral fuels and expensive jewellery).
* HSCode chapters 15, 29, 84 and 85 also have expensive imports.

In [None]:
df =expensive_import.groupby(['HSCode']).agg({'value': 'sum'})
df = df.sort_values(by='value')

In [None]:
 
value = df['value'].to_numpy()  # 1-D array of sizes for squarify
commodityCode=df.index
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (10.0, 6.0)
squarify.plot(sizes=value, label=commodityCode, alpha=.7 )
plt.axis('off')
plt.title("Expensive Imports HsCode Share")
plt.show()

* HSCode chapters 27, 71, 85 and 84 account for the largest share of the value of expensive imports.
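
The treemap shows relative areas but no numbers. A minimal sketch for turning the grouped totals into percentage shares is given below; the toy frame is made up, and on the real data the same two lines could be applied to `expensive_import`.

```python
import pandas as pd

# Toy expensive-import rows (million US$), made up for illustration.
toy = pd.DataFrame({"HSCode": [27, 27, 71, 85, 84],
                    "value": [5000.0, 3000.0, 4000.0, 2000.0, 2000.0]})

# Total value per chapter, then each chapter's share of the grand total.
totals = toy.groupby("HSCode")["value"].sum()
share = (totals / totals.sum() * 100).sort_values(ascending=False)
print(share.round(1))
```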

**Country Analysis** 

In [None]:
len(expensive_import['country'].unique())

In [None]:
df1 = expensive_import.groupby(['country']).agg({'value': 'sum'})
df1 = df1.sort_values(by='value')

In [None]:
value = df1['value'].to_numpy()  # 1-D array of sizes for squarify
country=df1.index
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (10.0, 10.0)
squarify.plot(sizes=value, label=country, alpha=.7 )
plt.title("Expensive Imports Countrywise Share")
plt.axis('off')
plt.show()

* Country-wise, China, Saudi Arabia, UAE and Switzerland have the largest share of expensive imports, followed by USA and Iraq.

**Conclusive Comments**
* The need of the hour is to reduce the trade deficit.
* New government initiatives such as "Skill India", "Make in India" and "Startup India" can help boost exports if they are actually implemented on the ground.
* Bilateral ties between countries help reduce export duties, which helps local companies compete in the global market.
* As India is primarily an agricultural country, training, guidance, and the provision of export-quality crops and medicinal plants can help boost agricultural exports.


   **Thank you for taking the time to read this. Please upvote and comment to keep me motivated.**

![](http://pluspng.com/img-png/animated-thank-you-png-for-powerpoint-copyright-2018-animations-media-960.gif)