# Script Purpose
The following script generates a CSV file that assigns a randomly chosen article (ArticleName) and machine name (MachineName) to each order number (OrderNumber), together with consecutive serial numbers (SerialNumber). The file exists so that the different SELECT and UPDATE use cases can be run against Azure Cosmos DB and Azure SQL Database under identical conditions. Loading both databases from the same file also ensures that the order of the data records is identical in both.

To ensure that both databases have the same number and type of data records, the script generates a total of 100,000 entries: 20,000 order numbers with 5 serial numbers each. Every order number is assigned one of three bicycle types (EBike, Roadbike, Mountainbike) and one of three inspection machines (InspectionMachine1, InspectionMachine2, InspectionMachine3). Each line in the CSV file represents one data record with an order number, a serial number, an article name, and a machine name. Finally, the data is saved in a CSV file named sequence_of_inserting_data.csv.

For fair and consistent experiments, both Azure Cosmos DB and Azure SQL Database must contain the same article records and machine names, and the records must appear in the same order in both databases. Only then are the SELECT and UPDATE queries performed under the same conditions.
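Because the script draws from Python's global random generator without a seed, every run produces a different file, so the same generated file has to be loaded into both databases. If the file ever needs to be regenerated with identical content, one option is a seeded generator. The following is only a minimal sketch of that idea: the helper name `pick_combinations` and the seed parameter are assumptions and are not part of the original script.

```python
import random

article_names = ['EBike', 'Roadbike', 'Mountainbike']
machine_names = ['InspectionMachine1', 'InspectionMachine2', 'InspectionMachine3']

def pick_combinations(num_orders, seed):
    """Return one (article, machine) pair per order (hypothetical helper)."""
    # A private Random instance keeps the sequence independent of other
    # code that uses the global random module. The seed is an assumption.
    rng = random.Random(seed)
    return [
        (rng.choice(article_names), rng.choice(machine_names))
        for _ in range(num_orders)
    ]
```

Calling `pick_combinations` twice with the same seed on the same Python version returns the same list, so a regenerated file would keep the record order unchanged.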

In [None]:
import csv
import random

# Define the lists of articles and inspection machines
article_names = ['EBike', 'Roadbike', 'Mountainbike']
machine_names = ['InspectionMachine1', 'InspectionMachine2', 'InspectionMachine3']

# Name of CSV file
csv_file = 'sequence_of_inserting_data.csv'

# Function for generating random data and writing to the CSV
def generate_sequence_data():
    with open(csv_file, mode='w', newline='') as file:
        writer = csv.writer(file)
        
        # Write the header
        writer.writerow(['OrderNumber', 'SerialNumber', 'ArticleName', 'MachineName'])
        
        # Initialize variables
        max_orders = 20000
        serial_number = 1
        
        for order_number in range(1, max_orders + 1):
            formatted_order_number = f'{order_number:07d}'
            
            # Randomly select article and machine for this order
            article_name = random.choice(article_names)
            machine_name = random.choice(machine_names)
            
            for _ in range(5):  # 5 serial numbers per order number
                # Generate serial number
                formatted_serial_number = f'{serial_number:07d}'
                
                # Write to CSV
                writer.writerow([formatted_order_number, formatted_serial_number, article_name, machine_name])
                
                serial_number += 1  # Increment serial number
    
    print('Dataset generated!')

In [None]:
generate_sequence_data()
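Before loading the file into either database, it can be worth checking that it has the expected shape. The sketch below is one possible check, not part of the original notebook: the helper name `summarize_sequence_file` and the keys of the returned dictionary are assumptions, while the column names are taken from the header written above.

```python
import csv
from collections import Counter, defaultdict

def summarize_sequence_file(path):
    """Return basic counts for a generated sequence CSV (hypothetical helper)."""
    rows_per_order = Counter()
    combos_per_order = defaultdict(set)
    serial_numbers = set()
    total_rows = 0

    with open(path, newline='') as file:
        for row in csv.DictReader(file):
            total_rows += 1
            order = row['OrderNumber']
            rows_per_order[order] += 1
            combos_per_order[order].add((row['ArticleName'], row['MachineName']))
            serial_numbers.add(row['SerialNumber'])

    return {
        'total_rows': total_rows,
        'orders': len(rows_per_order),
        # Distinct row counts seen per order, e.g. [5] if every order has 5 rows
        'rows_per_order': sorted(set(rows_per_order.values())),
        'unique_serial_numbers': len(serial_numbers),
        # True if each order uses exactly one article/machine combination
        'one_combo_per_order': all(len(c) == 1 for c in combos_per_order.values()),
    }
```

For a file produced by `generate_sequence_data()`, `summarize_sequence_file('sequence_of_inserting_data.csv')` should report 100,000 rows, 20,000 orders, 5 rows per order, and 100,000 unique serial numbers.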