DATA ENGINEERING PIPELINE - INTRASTAT DATA TRANSFORMATION

Aim:
Write a production-ready data engineering pipeline using Python and pandas.

Overview:
Intrastat is a system that collects information on the trade of goods between EU member states. This script transforms mock invoice data from a fictitious company into a Swedish Intrastat declaration that is ready for submission.

Task:

The steps to be performed are outlined below:
    
    1) Import the necessary libraries for the project.
    2) Define the functions that will facilitate the data engineering.
    3) Parse the mock data into a pandas DataFrame.
    4) Parse the 2023 Intrastat commodity code list into a pandas DataFrame.
    5) Verify that the mock data is correct using the commodity code list.
    6) Cleanse and transform the data using pandas library functions.
    7) Display the content as a pandas DataFrame.
    8) Export the content to an Excel file that can be submitted to the Swedish statistics authority.
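
Step 6 is not shown in the cells below, so here is a minimal sketch of what the cleansing and transformation could look like. The function name `transform`, the choice of grouping keys, and the second sample row are assumptions made for illustration. The actual declaration layout is set by the statistics authority and is not given in this notebook. The sketch also assumes that shipping dates are written day-month-year.

```python
import pandas as pd

NUMERIC_COLS = ["Mass (grams)", "Net (EUR)", "Quantity"]

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast string columns and aggregate per code and destination."""
    out = df.copy()
    # The source is read with dtype=str, so numeric columns need an explicit cast.
    for col in NUMERIC_COLS:
        out[col] = pd.to_numeric(out[col], errors="raise")
    # Assumption: dates such as "04-01-2023" are day-month-year.
    out["Shipping Date"] = pd.to_datetime(out["Shipping Date"], format="%d-%m-%Y")
    # Assumption: one declaration line per commodity code and destination country.
    keys = ["Commodity Code", "Ship To"]
    return out.groupby(keys, as_index=False)[NUMERIC_COLS].sum()

# Small in-memory sample; the second row is made up to show the aggregation.
sample = pd.DataFrame({
    "Commodity Code": ["61012010", "61012010", "61021010"],
    "Mass (grams)": ["0.595", "0.405", "0.461"],
    "Net (EUR)": ["190", "110", "145"],
    "Quantity": ["1", "1", "1"],
    "Shipping Date": ["04-01-2023", "05-01-2023", "02-01-2023"],
    "Ship To": ["DE", "DE", "NL"],
})
declaration = transform(sample)
print(declaration)
```

Using `errors="raise"` makes a malformed number stop the run instead of silently becoming NaN, which seems the safer default for data that feeds a statutory report.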

In [81]:
import pandas as pd # Data analysis library.
import numpy as np # Array and matrix library.
import ssl # Secure sockets layer package.
import urllib # Url handling module.
import sys # Runtime environment handling module.

In [95]:
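# Read every column as a string so that codes keep their exact digits.
df_src = pd.read_excel('Intrastat Dispatches Data Sample.xlsx', dtype=str)
df_cn8 = pd.read_csv('CN8 Codes.csv', dtype=str)
# Left join keeps every invoice row; codes missing from the CN8 list get NaN in the CN8 columns.
df_out = pd.merge(df_src, df_cn8, how='left', left_on='Commodity Code', right_on='CN8')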
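
As a variation on the merge above, the sketch below uses the `indicator` and `validate` parameters of `pd.merge` to make unmatched codes explicit. The two small frames and the invalid code `99999999` are made up for illustration and do not come from the sample files.

```python
import pandas as pd

# Illustrative stand-ins for df_src and df_cn8; the second source code is deliberately invalid.
src = pd.DataFrame({"Commodity Code": ["61012010", "99999999"]})
cn8 = pd.DataFrame({"CN8": ["61012010", "61012090"], "SU": ["p/st", "p/st"]})

merged = pd.merge(
    src, cn8,
    how="left",
    left_on="Commodity Code", right_on="CN8",
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
    validate="many_to_one",  # raises MergeError if a CN8 code appears twice in the reference list
)

# Codes present in the invoices but absent from the reference list.
unmatched = merged.loc[merged["_merge"] == "left_only", "Commodity Code"].tolist()
print(unmatched)
```

The `validate` argument also guards against a duplicated reference code quietly multiplying invoice rows during the join.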

In [96]:
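# Flag rows whose code was found in the CN8 list. The flags are the strings 'True'/'False', not booleans.
df_out['Check'] = np.where(df_out['Commodity Code'] == df_out['CN8'], 'True', 'False')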
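
The check above stores string flags. If real booleans are preferred, for example to filter on the column directly, a sketch along these lines may work. The frame `df` and the invalid code are illustrative only.

```python
import pandas as pd

# Illustrative stand-in for df_out after the left join; None marks a code with no CN8 match.
df = pd.DataFrame({
    "Commodity Code": ["61012010", "99999999"],
    "CN8": ["61012010", None],
})

# Element-wise equality; a missing CN8 value never equals a code, so unmatched rows are False.
df["Check"] = df["Commodity Code"].eq(df["CN8"])

# A boolean column can be negated and used as a mask without string comparisons.
bad = df.loc[~df["Check"], "Commodity Code"].tolist()
print(bad)
```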

In [97]:
df_out

Unnamed: 0,Commodity Code,Description_x,Mass (grams),Net (EUR),Quantity,Shipping Date,Ship From,Ship To,County of Origin,Mode of Transport,Incoterms,Transaction,Partner VAT,CN8,SU,Description_y,Check
0,61012010,Men's or boys' overcoats,0.595,190,1,04-01-2023,SE,DE,China,Rail,DAP,B2C,Private Customer,61012010,p/st,"Men's or boys' overcoats, car coats, capes, cl...",True
1,61012090,Men's or boys' overcoats,0.678,150,1,01-01-2023,SE,NL,China,Sea,DDP,B2B,NL999999999999,61012090,p/st,"Men's or boys' anoraks, incl. ski jackets, win...",True
2,61013010,Men's or boys' overcoats,0.704,175,1,05-01-2023,SE,ES,China,Road,DAP,B2C,Private Customer,61013010,p/st,"Men's or boys' overcoats, car coats, capes, cl...",True
3,61019080,Men's or boys' overcoats,0.844,135,1,07-01-2023,SE,FR,China,Air,DDP,B2B,FR999999999999,61019080,p/st,"Men's or boys' anoraks, incl. ski jackets, win...",True
4,61021010,Women's or girls' overcoats,0.461,145,1,02-01-2023,SE,NL,China,Road,DDP,B2B,NL999999999999,61021010,p/st,"Women's or girls' overcoats, car coats, capes,...",True
5,61021090,Women's or girls' overcoats,0.589,160,1,03-01-2023,SE,BE,China,Air,DDP,B2B,BE999999999999,61021090,p/st,"Women's or girls' anoraks, incl. ski jackets, ...",True
6,61022090,Women's or girls' overcoats,0.533,155,1,05-01-2023,SE,PT,China,Sea,DAP,B2C,Private Customer,61022090,p/st,"Women's or girls' anoraks, incl. ski jackets, ...",True
7,61029010,Women's or girls' overcoats,0.406,180,1,06-01-2023,SE,IT,China,Rail,DAP,B2C,Private Customer,61029010,p/st,"Women's or girls' overcoats, car coats, capes,...",True
