## Manufacturing Process Description
---
By Leonardo Talero

Our manufacturing plant operates on an automated production line overseen by skilled engineers. Materials are selected based on the product requirement, which then undergoes various processing stages, including shaping, assembling, and quality control. Each product's details, like its weight, production cost, sale price, and other relevant metrics, are captured in real-time and logged into our system. 

We use different machinery for varied tasks, and they are operated in shifts to ensure 24/7 production. Every item manufactured is assigned a unique product code, batch number, and undergoes strict quality control to ascertain its defect rate.

Below is an in-depth explanation of each column in our DataFrame:

### Columns:

1. **Engineer**: 
    - Description: Represents the engineer overseeing the production.
    - Type: Categorical
    - Values: Engineer_A, Engineer_B, Engineer_C

2. **Production_Cost**:
    - Description: The total cost incurred to manufacture the product.
    - Type: Continuous (Float)
    - Range: 10 to 200 currency units

3. **Sale_Price**:
    - Description: The price at which the product will be sold in the market.
    - Type: Continuous (Float)

4. **Hours_Spent**:
    - Description: Time taken in hours to produce the item.
    - Type: Continuous (Float)
    - Average: Around 5 hours

5. **Weight_kg**:
    - Description: Weight of the product in kilograms.
    - Type: Continuous (Float)
    - Average: Around 15kg

6. **Material_Used**:
    - Description: The primary material used for manufacturing the product.
    - Type: Categorical
    - Values: Steel, Plastic, Aluminium, Copper, Rubber

7. **Batch_Number**:
    - Description: Represents the production batch number.
    - Type: Integer
    - Range: Sequentially increasing

8. **Product_Code**:
    - Description: A unique code assigned to each product.
    - Type: String
    - Format: "P" followed by a unique number (e.g., P1001)

9. **Defect_Rate(%)**:
    - Description: The percentage of defective items in a batch.
    - Type: Continuous (Float)
    - Range: 0% to 1%

10. **Manufacture_Date**:
    - Description: The date when the product was manufactured.
    - Type: Date

11. **Expiration_Date**:
    - Description: The date when the product is deemed unfit for use/sale.
    - Type: Date

12. **Machine_ID**:
    - Description: ID of the machine that was used to produce the item.
    - Type: Integer
    - Range: 1 to 20

13. **Production_Shift**:
    - Description: The shift during which the product was manufactured.
    - Type: Categorical
    - Values: Morning, Afternoon, Night



In [5]:
pip install openpyxl


Collecting openpyxl
  Downloading openpyxl-3.1.2-py2.py3-none-any.whl (249 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m250.0/250.0 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m00:01[0m
[?25hCollecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.2
Note: you may need to restart the kernel to use updated packages.


In [6]:
import pandas as pd
import numpy as np
import openpyxl

def create_manufacturing_df(n_rows=200):
    # Ensure reproducibility
    np.random.seed(42)
    
    # Engineer names column
    engineers = ['Engineer_A', 'Engineer_B', 'Engineer_C']
    engineer_col = np.random.choice(engineers, size=n_rows)
    
    # Columns with uniform distribution values
    production_cost = np.random.uniform(10, 200, n_rows)
    sale_price = production_cost + np.random.uniform(5, 50, n_rows)
    
    # Columns with normal distribution values
    hours_spent = np.random.normal(5, 2, n_rows)  # Mean = 5 hours, Std Dev = 2
    weight = np.random.normal(15, 4, n_rows)  # Mean = 15kg, Std Dev = 4kg
    
    # Manufacturing-specific columns
    materials = ['Steel', 'Plastic', 'Aluminium', 'Copper', 'Rubber']
    material_used = np.random.choice(materials, size=n_rows)
    
    batch_number = np.arange(1, n_rows+1)
    product_code = ["P" + str(1000 + i) for i in batch_number]
    
    defect_rate = np.random.uniform(0, 1, n_rows) # between 0 and 1 percent
    
    manufacture_date = pd.date_range(start="2021-01-01", periods=n_rows, freq="D")
    expiration_date = manufacture_date + pd.to_timedelta(np.random.randint(30, 365, n_rows), unit="D") # Expiry between 1 month to 1 year

    machine_id = np.random.randint(1, 20, n_rows)  # Machines numbered between 1 and 20

    shifts = ['Morning', 'Afternoon', 'Night']
    production_shift = np.random.choice(shifts, size=n_rows)
    
    # Create the DataFrame
    df = pd.DataFrame({
        'Engineer': engineer_col,
        'Production_Cost': production_cost,
        'Sale_Price': sale_price,
        'Hours_Spent': hours_spent,
        'Weight_kg': weight,
        'Material_Used': material_used,
        'Batch_Number': batch_number,
        'Product_Code': product_code,
        'Defect_Rate(%)': defect_rate * 100,
        'Manufacture_Date': manufacture_date,
        'Expiration_Date': expiration_date,
        'Machine_ID': machine_id,
        'Production_Shift': production_shift
    })
    
    return df

# Test the function
df = create_manufacturing_df()
print(df.head())
df.to_excel("manufacturing.xlsx")

     Engineer  Production_Cost  Sale_Price  Hours_Spent  Weight_kg  \
0  Engineer_C        86.728873   99.618735     3.520762  18.047676   
1  Engineer_A        22.329527   28.101777     3.701505  15.850623   
2  Engineer_C        58.243929   97.595328     3.302959  18.471396   
3  Engineer_C        56.906452   98.217536     3.127510  11.253944   
4  Engineer_A       142.297812  162.881506     5.666347  11.497900   

  Material_Used  Batch_Number Product_Code  Defect_Rate(%) Manufacture_Date  \
0     Aluminium             1        P1001       16.637759       2021-01-01   
1       Plastic             2        P1002       77.098429       2021-01-02   
2        Copper             3        P1003       70.857200       2021-01-03   
3         Steel             4        P1004       47.357389       2021-01-04   
4        Copper             5        P1005        5.792046       2021-01-05   

  Expiration_Date  Machine_ID Production_Shift  
0      2021-11-29           4        Afternoon  
1     