# Danone - IT Meetup Opole # 2

### This Jupyter notebook explains the steps followed to obtain the order details and the raw materials used in the production of different recipes.
### The original source of the data is an Excel file, manually prepared using paper documents and existing systems in the factories.

![Danone](../img/Danone.png)

_______________

In [1]:
# Preparing the working environment
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter, column_index_from_string
import pandas as pd
import sys
sys.path.append(r'/home/hack/utils')
# Loading internal file with DB connections details
from db_connections import my_engine

## I'm using sqlalchemy for MS SQL Server + pyodbc
## engine = create_engine('mssql+pyodbc://user:password@server/database?driver=ODBC+Driver+17+for+SQL+Server', fast_executemany=True)
## Since SQLAlchemy 1.3.0, released 2019-03-04, sqlalchemy now supports engine = create_engine(sqlalchemy_url, fast_executemany=True) for the mssql+pyodbc dialect.
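The contents of `db_connections` are not shown in this notebook. As a rough sketch only, the connection URL from the comment above could be assembled as follows. The helper `build_mssql_url` and all credentials are hypothetical; the password is URL-encoded because special characters such as `@` or `/` would otherwise break the URL.

```python
from urllib.parse import quote_plus


def build_mssql_url(user, password, server, database,
                    driver="ODBC Driver 17 for SQL Server"):
    """Hypothetical helper: assemble an mssql+pyodbc URL for SQLAlchemy."""
    return (
        f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@{server}/{database}?driver={quote_plus(driver)}"
    )


# Placeholder values, not real credentials
url = build_mssql_url("user", "p@ss/word", "server", "database")
print(url)
```

The resulting string would then be passed to `create_engine(url, fast_executemany=True)`, as in the comment above.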


### The original Excel file was not prepared in a standard table form. Some minor adjustments, using Excel functionalities, were performed to achieve a conventional dataframe-like table structure.

![order_details_modifications](img/orders_details.png)

In [9]:
# Loading the Excel file
orders = pd.read_excel(r'../excel_files/orders_data.xlsx', sheet_name='GENERAL', skiprows=1, usecols='A:Q')

# Renaming the original Polish column names to standard English ones, which are easier to work with
orders.rename(columns={'ID':'id',
                       'Linia':'line_dd',
                       'Numer zlecenia':'process_order',
                       'Partia SAP':'process_order_sap',
                       'Partia MES':'process_order_mes',
                       'Materiał':'material_code',
                       'Receptura':'recipe',
                       'Data aktywacji':'activation_date',
                       'Data zamknięcia':'closing_date',
                       'Materiał2':'material_sub',
                       'Partia SAP3':'process_order_sap3',
                       'Usage percentage':'usage_pct',
                       'Tłuszcz [%]':'fat_pct',
                       'Granulacja [um] 200um though':'particles_grp1',
                       'Granulacja [um]\n200um retained':'particles_grp2',
                       'Granulacja [um]\n500um retained':'particles_grp3',
                       'Wilgotność [g/100g]':'humidity'}, inplace=True)
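`DataFrame.rename` silently ignores mapping keys that do not match any column, so a header that differs by a space or a line break stays untranslated without any warning. A minimal sketch of a check for this, using a made-up two-column frame and a hypothetical helper `unmatched_keys`:

```python
import pandas as pd


def unmatched_keys(df, mapping):
    """Return the mapping keys that are not column names of df."""
    return [old for old in mapping if old not in df.columns]


# Toy frame standing in for the Excel sheet (illustrative values only);
# note the trailing space in the second header
sample = pd.DataFrame({"ID": [1, 2], "Linia ": ["L1", "L2"]})
mapping = {"ID": "id", "Linia": "line_dd"}

missing = unmatched_keys(sample, mapping)
renamed = sample.rename(columns=mapping)
print(missing)
print(list(renamed.columns))
```

Running such a check before the rename makes a mismatched header visible instead of leaving a Polish column name in the final tables.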

#### Logistic information defining the process order and production scheduling

In [3]:
orders_details = orders.loc[:,['id','line_dd','process_order','process_order_sap','material_code','recipe','activation_date','closing_date']]
orders_details = orders_details.drop_duplicates()

#### Information about the raw material(s) used in the recipe

In [4]:
raw_material_used = orders.loc[:,['id','material_sub','process_order_sap3','usage_pct']]
raw_material_used = raw_material_used.drop_duplicates()

#### Characteristics of the input materials for the recipe processing

In [5]:
raw_material_in = orders.loc[:,['id','process_order_sap3','fat_pct','particles_grp1','particles_grp2','particles_grp3','humidity']]
raw_material_in = raw_material_in.drop_duplicates()
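Since each of these tables later receives a primary key, it can help to confirm that the intended key columns are complete and unique after `drop_duplicates`. A minimal sketch with invented data; `check_key` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd


def check_key(df, key_cols):
    """Return (has_nulls, has_duplicates) for a candidate key."""
    has_nulls = bool(df[key_cols].isna().any().any())
    has_duplicates = bool(df.duplicated(subset=key_cols).any())
    return has_nulls, has_duplicates


# Invented rows: order 1 used two raw material batches
wide = pd.DataFrame({
    "id": [1, 1, 2],
    "recipe": ["A", "A", "B"],
    "process_order_sap3": [100, 101, 100],
    "fat_pct": [1.5, 1.7, 1.5],
})

orders_part = wide[["id", "recipe"]].drop_duplicates()
batches_part = wide[["id", "process_order_sap3", "fat_pct"]].drop_duplicates()

order_key = check_key(orders_part, ["id"])
batch_key = check_key(batches_part, ["id", "process_order_sap3"])
batch_id_only = check_key(batches_part, ["id"])
print(order_key, batch_key, batch_id_only)
```

In this toy data `id` alone is a valid key for the order-level table but not for the batch-level table, which is why a combination of columns is needed there.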

### For the sake of this exercise, and since the data used here is small, we save each table as a CSV file, available in this repository in the *processed_data* folder.

In [6]:
orders_details.to_csv("../processed_data/orders_details.csv")
raw_material_used.to_csv("../processed_data/raw_material_used.csv")
raw_material_in.to_csv("../processed_data/raw_material_in.csv")
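Note that `to_csv` writes the DataFrame index as an extra unnamed first column by default. The sketch below uses an invented frame and an in-memory buffer instead of a file to show the difference `index=False` makes:

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [10, 20], "usage_pct": [0.4, 0.6]})

with_index = io.StringIO()
df.to_csv(with_index)  # default: index is written
without_index = io.StringIO()
df.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(header_with)
print(header_without)
```

Whether the index column is wanted depends on how the CSV files are read back; with the default, `pd.read_csv(..., index_col=0)` restores it.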

### For the Hackathon the data will be available through a MS SQL Server.

In [7]:
# orders_details - writing to the DB using pandas
orders_details.to_sql('orders_details', my_engine, if_exists='replace', index=False)

# raw_material_used - writing to the DB using pandas
raw_material_used.to_sql('raw_material_used', my_engine, if_exists='replace', index=False)

# raw_material_in - writing to the DB using pandas
raw_material_in.to_sql('raw_material_in', my_engine, if_exists='replace', index=False)
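To try the same `to_sql` pattern without access to the SQL Server, an in-memory SQLite database can serve as a stand-in. This is only a sketch: SQLite is an assumption made here for portability, the data is invented, and column types are mapped differently than on MS SQL Server.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for my_engine

demo = pd.DataFrame({"id": [1, 2, 3], "recipe": ["A", "B", "B"]})
demo.to_sql("orders_details", conn, if_exists="replace", index=False)

# Read the table back to confirm what was written
back = pd.read_sql_query("SELECT * FROM orders_details ORDER BY id", conn)
print(len(back), list(back.columns))
conn.close()
```

Because `index=False` is passed, only the DataFrame columns end up in the table.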

### The data is now in the DB, and thanks to pandas we did not have to create the tables ourselves. However, to have a proper data model we still need to define the necessary primary and foreign keys.

In [8]:
connection = my_engine.connect()
# First we need to declare the column used as primary key as NOT NULL
connection.execute("ALTER TABLE orders_details ALTER COLUMN id INT NOT NULL")
# Now it is possible to alter the table and set the _id_ as primary key
connection.execute("ALTER TABLE orders_details ADD CONSTRAINT orders_details_pk PRIMARY KEY (id)")

## For raw_material_used we set up orders_details.id as the foreign key
# Declaring as NOT NULL the columns to be used as the primary key
connection.execute("ALTER TABLE raw_material_used ALTER COLUMN id INT NOT NULL")
connection.execute("ALTER TABLE raw_material_used ALTER COLUMN material_sub INT NOT NULL")
connection.execute("ALTER TABLE raw_material_used ALTER COLUMN process_order_sap3 INT NOT NULL")
# To avoid creating artificial columns, a group of existing columns is used as the primary key.
# Alternatively, we could have kept index=True when sending the data to the DB and used the index as the id.
connection.execute("ALTER TABLE raw_material_used ADD CONSTRAINT raw_material_used_pk PRIMARY KEY (id,material_sub,process_order_sap3)")
# Finally, the foreign key is created
connection.execute("ALTER TABLE raw_material_used ADD CONSTRAINT fk_raw_material_used_id FOREIGN KEY (id) REFERENCES orders_details (id) ON DELETE NO ACTION")

## For raw_material_in we set up orders_details.id as the foreign key
# Declaring as NOT NULL the columns to be used as the primary key
connection.execute("ALTER TABLE raw_material_in ALTER COLUMN id INT NOT NULL")
connection.execute("ALTER TABLE raw_material_in ALTER COLUMN process_order_sap3 INT NOT NULL")
# As before, we use a combination of columns as the primary key
connection.execute("ALTER TABLE raw_material_in ADD CONSTRAINT raw_material_in_pk PRIMARY KEY (id,process_order_sap3)")
# Finally, the foreign key is created on raw_material_in
connection.execute("ALTER TABLE raw_material_in ADD CONSTRAINT fk_raw_material_in_id FOREIGN KEY (id) REFERENCES orders_details (id) ON DELETE NO ACTION")

connection.close()

<sqlalchemy.engine.result.ResultProxy at 0x7fd531ebf048>
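Once the keys are in place, the database rejects child rows that point to a non-existent order. The sketch below illustrates this with an in-memory SQLite database as a stand-in. SQLite is an assumption made here for portability: the notebook itself targets MS SQL Server, and since SQLite cannot add these constraints through `ALTER TABLE`, they are declared when the tables are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.execute("CREATE TABLE orders_details (id INTEGER NOT NULL PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE raw_material_in (
        id INTEGER NOT NULL,
        process_order_sap3 INTEGER NOT NULL,
        PRIMARY KEY (id, process_order_sap3),
        FOREIGN KEY (id) REFERENCES orders_details (id) ON DELETE NO ACTION
    )
    """
)

conn.execute("INSERT INTO orders_details (id) VALUES (1)")
conn.execute("INSERT INTO raw_material_in VALUES (1, 100)")  # parent order exists

try:
    conn.execute("INSERT INTO raw_material_in VALUES (99, 100)")  # no order 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
conn.close()
```

The second insert fails because no row with `id = 99` exists in `orders_details`, which is the guarantee the foreign key is meant to provide.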