# File Type Demo

In data science, there are a ton of different file types to learn. In this demo, we'll cover some of the more popular types.

In [24]:
import pandas as pd
import uuid 
import datetime
import random

### Data creation

First, we'll create all of our data files for a Taleggio production run (our stinky cheese du jour).

In [39]:
# Supplier table: supplier_uuid, supplier_name, supplier_address, supplier_phone, supplier_email, supplier_website
supplier_table = pd.DataFrame({
    'supplier_uuid': [uuid.uuid4() for _ in range(10)],
    'supplier_name': ['Supplier A', 'Supplier B', 'Supplier C', 'Supplier D', 'Supplier E', 'Supplier F', 'Supplier G', 'Supplier H', 'Supplier I', 'Supplier J'],
    'supplier_address': ['123 Main St', '456 Oak Ave', '789 Pine Rd', '101 Elm St', '123 Maple Ave', '456 Pine St', '789 Oak St', '101 Pine St', '123 Oak St', '456 Pine St'],
    'supplier_phone': ['123-456-7890', '123-456-7890', '123-456-7890', '123-456-7890', '123-456-7890', '123-456-7890', '123-456-7890', '123-456-7890', '123-456-7890', '123-456-7890'],
    'supplier_email': ['supplierA@example.com', 'supplierB@example.com', 'supplierC@example.com', 'supplierD@example.com', 'supplierE@example.com', 'supplierF@example.com', 'supplierG@example.com', 'supplierH@example.com', 'supplierI@example.com', 'supplierJ@example.com'],
    'supplier_website': ['https://www.supplierA.com', 'https://www.supplierB.com', 'https://www.supplierC.com', 'https://www.supplierD.com', 'https://www.supplierE.com', 'https://www.supplierF.com', 'https://www.supplierG.com', 'https://www.supplierH.com', 'https://www.supplierI.com', 'https://www.supplierJ.com']
})

receiving_table = pd.DataFrame({
    'milk_lot_uuid': [uuid.uuid4() for _ in range(10)],
    'supplier_uuid': supplier_table['supplier_uuid'],
    'milk_lot_date': [datetime.datetime.now() for _ in range(10)],
    'milk_lot_quantity': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    'milk_lot_price': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    'milk_lot_notes': ['Supplier A notes', 'Supplier B notes', 'Supplier C notes', 'Supplier D notes', 'Supplier E notes', 'Supplier F notes', 'Supplier G notes', 'Supplier H notes', 'Supplier I notes', 'Supplier J notes']
})

acidification_table = pd.DataFrame({
    'lot_uuid': [uuid.uuid4() for _ in range(10)],
    'milk_lot_uuid': receiving_table['milk_lot_uuid'],
    'culture_uuid': [uuid.uuid4() for _ in range(10)],
    'culture_name': ['Culture A', 'Culture B', 'Culture C', 'Culture D', 'Culture E', 'Culture F', 'Culture G', 'Culture H', 'Culture I', 'Culture J'],
    'culture_proportion': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'rennet_uuid': [uuid.uuid4() for _ in range(10)],
    'rennet_name': ['Rennet A' for _ in range(10)],
    'rennet_quantity': [random.randint(50, 60) for _ in range(10)], # ml
    'profile': [
        [
            (
                round(interval * 0.25, 2),  # time in hours (15 min intervals)
                round(
                    start_pH - (start_pH - end_pH) * (interval / num_intervals),
                    2
                )
            )
            for interval in range(num_intervals + 1)
        ]
        for start_pH, end_pH, duration_hours in [
            (
                round(random.uniform(6.6, 6.8), 2),  # typical fresh milk pH
                round(random.uniform(4.9, 5.2), 2),  # target pH after acidification
                round(random.uniform(1.8, 2.2), 2)   # limit to ~2 hours
            )
            for _ in range(10)
        ]
        for num_intervals in [int(duration_hours * 4)]  # 4 intervals per hour (15 min)
    ],
})

acidification_table

,lot_uuid,milk_lot_uuid,culture_uuid,culture_name,culture_proportion,rennet_uuid,rennet_name,rennet_quantity,profile
0,7c3f0186-e402-4e36-8773-9d93ae91c976,5d069495-459b-4231-82d6-f6efc6f6f0d5,404bc62e-a727-4746-ab9d-8792ed45f039,Culture A,0.1,dd6912d0-5959-4b58-a8ea-c5a1eac0802a,Rennet A,59,"[(0.0, 6.66), (0.25, 6.44), (0.5, 6.22), (0.75..."
1,861b6a42-7a78-40fe-8d24-2177d3a0f16f,13a03641-8f6c-46b5-a612-53a0e57a5a98,9ee6326f-dcfc-435c-80ec-012829cef9c9,Culture B,0.2,fce7d486-76b8-4853-91b7-9c1d6b310ad2,Rennet A,57,"[(0.0, 6.66), (0.25, 6.47), (0.5, 6.28), (0.75..."
2,362da66a-30e1-49e5-b9ea-97b286cc726c,0bdfde70-57b8-48cd-b420-8f656153a537,e1f78e52-7f22-45de-b0ec-958902523a7f,Culture C,0.3,5e2d40d6-db74-4553-8da4-ae6946e102e1,Rennet A,55,"[(0.0, 6.69), (0.25, 6.44), (0.5, 6.2), (0.75,..."
3,c90a2e50-f4f3-44eb-b82e-e0697d8b98bc,149cfd49-f2cc-4b23-99e5-a66459fd0d62,c4e78a5a-62e5-4bf9-8750-784ba20a3b38,Culture D,0.4,acce649c-43af-46bf-849d-477bc440c12e,Rennet A,50,"[(0.0, 6.76), (0.25, 6.5), (0.5, 6.25), (0.75,..."
4,22e52237-a791-4a81-ae05-6df983d668a1,309df45a-b2b2-481e-b3c5-7db692dfa0a1,46f73408-283f-49b2-a85d-33023043015e,Culture E,0.5,5e3950f0-1913-467a-9d24-7487f52b8445,Rennet A,56,"[(0.0, 6.67), (0.25, 6.46), (0.5, 6.25), (0.75..."
5,8aaef030-5c57-438f-848f-71e48f6daf5c,377036cc-5592-4a2c-9d7d-69e53dfd3ba9,86a01424-df1b-4e23-b717-912feff214b7,Culture F,0.6,940ef96d-6fd2-445a-b58e-27df283794a5,Rennet A,55,"[(0.0, 6.75), (0.25, 6.52), (0.5, 6.29), (0.75..."
6,0c9af978-67ea-4ac9-b5f5-3f2ddef392db,2946ba31-5a5e-438d-a94d-e951425a7427,bee69c3e-7760-485c-b970-f0df44bd3a63,Culture G,0.7,a5534a65-bd2c-47a8-a7a8-4aa1806a651d,Rennet A,60,"[(0.0, 6.64), (0.25, 6.43), (0.5, 6.22), (0.75..."
7,bd726491-ad45-4ee0-b22a-30dba4e23f5b,1e09fefa-72de-4910-ace2-3b23671a45aa,6474cb4f-8583-4dac-a5e3-af25de466f6f,Culture H,0.8,8ad7a650-0f8a-4dcd-82fd-c031de26bc46,Rennet A,54,"[(0.0, 6.73), (0.25, 6.54), (0.5, 6.35), (0.75..."
8,21704601-7231-4fcc-9033-7d0f7c29aae2,064f49d8-83f4-411c-926b-a68629a09473,0b8eacf9-37b4-46c3-a9d7-54f353646390,Culture I,0.9,d6e13c15-9cbc-421f-87b9-43722dc0c9a7,Rennet A,55,"[(0.0, 6.8), (0.25, 6.59), (0.5, 6.38), (0.75,..."
9,4c379e85-884e-417f-aeb9-5e260748f48e,a45bb9af-9ef1-4177-9325-d9e60b3b83c6,151af7ea-bd7b-413f-826c-ab8c13967eeb,Culture J,1.0,cc11cad8-b0be-440b-b860-5c9cadc47a09,Rennet A,58,"[(0.0, 6.68), (0.25, 6.46), (0.5, 6.24), (0.75..."
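
The nested comprehension that builds the "profile" column packs a lot into one expression. As a minimal sketch of the same idea, the linear pH drop can be pulled out into a helper. The function name `make_ph_profile` and the `intervals_per_hour` parameter are illustrative names, not part of the original notebook.

```python
def make_ph_profile(start_pH, end_pH, duration_hours, intervals_per_hour=4):
    """Return (time_in_hours, pH) tuples falling linearly from start_pH to end_pH.

    Hypothetical helper: it mirrors the comprehension above, one lot at a time.
    Assumes duration_hours is long enough to give at least one interval.
    """
    # Number of whole measurement intervals (4 per hour = every 15 minutes)
    num_intervals = int(duration_hours * intervals_per_hour)
    step_hours = 1 / intervals_per_hour
    return [
        (
            round(interval * step_hours, 2),  # elapsed time in hours
            round(start_pH - (start_pH - end_pH) * (interval / num_intervals), 2),
        )
        for interval in range(num_intervals + 1)
    ]

# Fixed inputs (instead of random ones) so the result is easy to inspect
example_profile = make_ph_profile(start_pH=6.7, end_pH=5.0, duration_hours=2.0)
print(len(example_profile))
print(example_profile[0], example_profile[-1])
```

With a helper like this, the column could be built by calling it once per lot with randomly drawn start and end values, which reads more easily than three stacked `for` clauses.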


### txt

**Set the fpath/fnames**: 
Here we'll use txt files as system/user instructions for a GenAI project. First, let's set the fpaths.
"fpath" is shorthand for file path, just as "fname" is shorthand for file name. 
Because the directory is the same for all files, we'll just prepend it (with '+') to each fname.

In [None]:
# Note: "./" means the current working directory, which for a notebook is normally the folder the notebook is in
file_fpath = './data/'

system_fname = 'system.txt' 
user_fname = 'user.txt'

system_fpath = file_fpath + system_fname
user_fpath = file_fpath + user_fname
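
Joining with '+' only works because the directory string ends in a slash. As an alternative sketch, the standard library's `pathlib` builds the same paths without relying on that. The variable names `data_dir`, `system_path`, and `user_path` are chosen here for illustration so they don't overwrite the variables above.

```python
from pathlib import Path

# Same directory and file names as the cell above
data_dir = Path('./data')

# The "/" operator joins path segments and inserts the separator for us
system_path = data_dir / 'system.txt'
user_path = data_dir / 'user.txt'

print(system_path)
print(user_path)
```

`Path` objects can be passed straight to `open()`, so the rest of the demo would work the same way with either approach.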

**Read the files**: With "with open(...) as file", we pass the fpath and the read-only mode 'r', and assign the opened file to the variable "file".
We then read the file and assign its contents to a variable ("system_instructions" or "user_instructions"). This is a roundabout way, but it's a common pattern.

In [None]:

with open(system_fpath, 'r') as file:
    system_instructions = file.read()

with open(user_fpath, 'r') as file:
    user_instructions = file.read()

# List of stinky cheeses
stinky_cheeses = [
    "Roquefort",
    "Stilton",
    "Cheddar",
    "Brie",
    "Camembert",
]

# By adding f in front of a string (see the second print), we can use Python's f-string formatting to insert the values of user_instructions and stinky_cheeses into it.
print('System Instructions:', system_instructions)
print(f'User Instructions: {user_instructions}: {stinky_cheeses}')

# Now we can use the system and user instructions to generate a response. Thanks, txt files.

System Instructions: "You are an assistant to help me choose the stinkiest cheese."
User Instructions: "Here the available cheeses": ['Roquefort', 'Stilton', 'Cheddar', 'Brie', 'Camembert']
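
The cells above assume that the txt files already exist. As a small sketch of the full round trip, the snippet below writes a txt file with the 'w' mode and reads it back with 'r'. It uses a temporary directory so it doesn't touch "./data/"; the names `demo_fpath` and `round_trip` are made up for this example, and the text is borrowed from the output above.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so the sketch leaves ./data/ alone
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_fpath = Path(tmp_dir) / 'system.txt'

    # 'w' creates the file (or overwrites it) and opens it for writing
    with open(demo_fpath, 'w', encoding='utf-8') as file:
        file.write('You are an assistant to help me choose the stinkiest cheese.')

    # 'r' opens the same file read-only, just like the cell above
    with open(demo_fpath, 'r', encoding='utf-8') as file:
        round_trip = file.read()

print(round_trip)
```

Passing `encoding='utf-8'` explicitly is optional, but it keeps the result the same across operating systems whose default encodings differ.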
