# Data Transformation & Cleaning

**Goal:** Convert hierarchical JSON data into relational tables and perform data cleaning.

**Input:** `mustika_rasa_full.json`  
**Outputs:** 
1. `df_recipes` (Recipe Metadata)
2. `df_ingredients` (Ingredient Details)

In [52]:
import pandas as pd
import json
import os

# 1. Setup Paths
BASE_DIR = os.getcwd()
INPUT_FILE = os.path.join(BASE_DIR, "mustika_rasa_full.json")

# 2. Load JSON Data
try:
    with open(INPUT_FILE, 'r', encoding='utf-8') as f:
        raw_data = json.load(f)
    print(f"Successfully loaded {len(raw_data)} recipes.")
except FileNotFoundError:
    print("Error: JSON file not found. Please ensure 'mustika_rasa_full.json' is in this folder.")
    raise  # Stop here: every later cell depends on raw_data

Successfully loaded 1783 recipes.


## 1. Flattening the Data
We flatten the nested JSON into two lists of row dictionaries, one row per recipe and one row per ingredient. These become the two relational tables, linked by `recipe_id`.

In [53]:
recipes_rows = []
ingredients_rows = []

ing_pk_counter = 1

for recipe in raw_data:
    # --- A. RECIPE TABLE ---
    rec_id = recipe.get('recipe_id')
    
    # Flatten instructions to single string
    instructions_clean = "\n".join(recipe.get('instructions', []) or [])
    
    # Keep tree structure as JSON string for reference
    ing_json_str = json.dumps(recipe.get('ingredient_groups', []), ensure_ascii=False)
    
    recipes_rows.append({
        'id': rec_id,
        'title_original': recipe.get('title_original'),
        'title_normalized': recipe.get('title_normalized'),
        'source_page': recipe.get('_source_page') or recipe.get('page_number'),
        'region': recipe.get('region'),
        'category': recipe.get('category'),
        'ingredient_json': ing_json_str,
        'instruction': instructions_clean
    })
    
    # --- B. INGREDIENTS TABLE ---
    groups = recipe.get('ingredient_groups', [])
    if groups:
        for group in groups:
            g_name = group.get('group_name', 'utama')
            
            for item in group.get('ingredients', []):
                ingredients_rows.append({
                    'id': f"ING_{str(ing_pk_counter).zfill(6)}",
                    'recipe_id': rec_id,
                    'ingredient_group': g_name,
                    'ingredient_original_name': item.get('item_original'),
                    'ingredient_normalized_name': item.get('item_normalized'),
                    'ingredient_quantity': item.get('quantity'),
                    'ingredient_unit': item.get('unit')
                })
                ing_pk_counter += 1

# Create DataFrames
df_recipes = pd.DataFrame(recipes_rows)
df_ingredients = pd.DataFrame(ingredients_rows)

print(f"Recipes Table: {df_recipes.shape}")
print(f"Ingredients Table: {df_ingredients.shape}")

Recipes Table: (1783, 8)
Ingredients Table: (13818, 7)
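
As a cross-check on the loop above, the same flattening can be sketched with `pd.json_normalize`. This is only an illustration on made-up toy records in the same nested shape (`toy_data` and its values are assumptions, not rows from the dataset). Unlike the `.get()` calls in the loop, `json_normalize` raises on records that lack the `ingredient_groups` or `ingredients` keys, so the real file may need extra handling.

```python
import pandas as pd

# Toy records in the same nested shape as the source JSON (values are made up)
toy_data = [
    {
        "recipe_id": "R1",
        "ingredient_groups": [
            {"group_name": "utama", "ingredients": [
                {"item_original": "beras", "quantity": 1, "unit": "liter"},
                {"item_original": "garam", "quantity": 1, "unit": "sendok makan"},
            ]},
        ],
    },
    {
        "recipe_id": "R2",
        "ingredient_groups": [
            {"group_name": "kulit", "ingredients": [
                {"item_original": "tepung", "quantity": 2, "unit": "kilogram"},
            ]},
        ],
    },
]

# record_path walks recipe -> group -> ingredient; meta carries parent fields down
flat = pd.json_normalize(
    toy_data,
    record_path=["ingredient_groups", "ingredients"],
    meta=["recipe_id", ["ingredient_groups", "group_name"]],
)
flat = flat.rename(columns={"ingredient_groups.group_name": "ingredient_group"})
print(flat)
```

The explicit loop remains the more flexible option here, since it also assigns the `ING_` primary keys and tolerates missing keys.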


## 2. Cleaning Recipes Dataset

In [54]:
# Inspect Recipes
display(df_recipes.head(5))
print(df_recipes.info())

Unnamed: 0,id,title_original,title_normalized,source_page,region,category,ingredient_json,instruction
0,MR_187_01,ARON,Aron,187,Tengger,Staple,"[{""group_name"": ""utama"", ""original_header"": ""B...","Bidji jagung direndam air 12 jam atau lebih, s..."
1,MR_187_02,AREM AREM,Arem Arem,187,,Savory Snack,"[{""group_name"": ""kulit"", ""original_header"": ""B...","Beras dicuci, dikaru dan digarami.\nJika sudah..."
2,MR_188_01,AREM AREM ARON,Arem Arem Aron,188,,Jajanan,"[{""group_name"": ""Bahan Utama (Aron)"", ""origina...",Aron direndam dengan garam selama ± 5 menit.\n...
3,MR_189_01,AREM AREM DJAGUNG,Arem Arem Jagung,189,,Jajanan,"[{""group_name"": ""bahan utama"", ""original_heade...","Beras djagung ditjutji bersih, direndam 1 mala..."
4,MR_190_01,DJAGUNG BOSE,Jagung Bose,190,Timor,Makanan Pokok,"[{""group_name"": ""utama"", ""original_header"": ""B...",Jagung dicuci lalu direndam semalam dalam air....


<class 'pandas.DataFrame'>
RangeIndex: 1783 entries, 0 to 1782
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   id                1783 non-null   str  
 1   title_original    1783 non-null   str  
 2   title_normalized  1783 non-null   str  
 3   source_page       1783 non-null   int64
 4   region            972 non-null    str  
 5   category          1783 non-null   str  
 6   ingredient_json   1783 non-null   str  
 7   instruction       1783 non-null   str  
dtypes: int64(1), str(7)
memory usage: 111.6 KB
None


In [55]:
# Check for missing recipe titles
# Titles were cleaned by editing the JSON file by hand (~22 recipes were edited manually)
df_recipes[df_recipes['title_original'].isnull()]

Unnamed: 0,id,title_original,title_normalized,source_page,region,category,ingredient_json,instruction


In [56]:
# Count recipes per raw region value and export the counts
region = df_recipes['region'].value_counts().reset_index()
region.columns = ['region', 'count']
region.to_csv('region.csv')

In [57]:
region_map = {
    # --- JAWA TENGAH ---
    'Rembang': ('Rembang', 'Jawa Tengah'),
    'Purwokerto': ('Purwokerto', 'Jawa Tengah'),
    'Wonosobo': ('Wonosobo', 'Jawa Tengah'),
    'Banyumas': ('Banyumas', 'Jawa Tengah'),
    'Banjumas': ('Banyumas', 'Jawa Tengah'),
    'Bajumas': ('Banyumas', 'Jawa Tengah'),
    'Tegal': ('Tegal', 'Jawa Tengah'),
    'Solo': ('Solo (Surakarta)', 'Jawa Tengah'),
    'Jawa Tengah': ('Jawa Tengah', 'Jawa Tengah'),
    'Djawa Tengah': ('Jawa Tengah', 'Jawa Tengah'),
    'Brebes': ('Brebes', 'Jawa Tengah'),
    'Pati': ('Pati', 'Jawa Tengah'),
    'Kedu': ('Kedu', 'Jawa Tengah'),
    'Magelang': ('Magelang', 'Jawa Tengah'),
    'Cilacap (Tjilatjap)': ('Cilacap', 'Jawa Tengah'),
    'Cilacap': ('Cilacap', 'Jawa Tengah'),
    'Klaten': ('Klaten', 'Jawa Tengah'),
    'Purworedjo': ('Purworejo', 'Jawa Tengah'),
    
    # --- JAWA TIMUR ---
    'Madura': ('Madura', 'Jawa Timur'),
    'Malang': ('Malang', 'Jawa Timur'),
    'Madiun': ('Madiun', 'Jawa Timur'),
    'Jawa Timur': ('Jawa Timur', 'Jawa Timur'),
    'Surabaya': ('Surabaya', 'Jawa Timur'),
    'Pamekasan': ('Pamekasan', 'Jawa Timur'),
    'Magetan': ('Magetan', 'Jawa Timur'),
    'Sumenep': ('Sumenep', 'Jawa Timur'),
    'Tengger': ('Tengger', 'Jawa Timur'),
    'Sumberrejo': ('Sumberejo', 'Jawa Timur'), # Likely Bojonegoro area
    'Pacitan': ('Pacitan', 'Jawa Timur'),
    'Patjitan': ('Pacitan', 'Jawa Timur'),
    'Jawa Tengah - Jawa Timur': ('Jawa Tengah/Timur', 'Jawa Timur'), # Grouping to Jatim/Jateng border
    'Jawa Tengah/Timur': ('Jawa Tengah/Timur', 'Jawa Tengah/Timur'),
    'Sedayu': ('Sedayu', 'Jawa Timur'), # Assuming Gresik context, though Bantul exists

    # --- JAWA BARAT & BANTEN ---
    'Jawa Barat': ('Jawa Barat', 'Jawa Barat'),
    'Djawa Barat': ('Jawa Barat', 'Jawa Barat'),
    'Sukabumi': ('Sukabumi', 'Jawa Barat'),
    'Cianjur': ('Cianjur', 'Jawa Barat'),
    'Tjiandjur': ('Cianjur', 'Jawa Barat'),
    'Bandung': ('Bandung', 'Jawa Barat'),
    'Bogor': ('Bogor', 'Jawa Barat'),
    'Cirebon': ('Cirebon', 'Jawa Barat'),
    'Priangan': ('Priangan', 'Jawa Barat'),
    'Ciamis': ('Ciamis', 'Jawa Barat'),
    'Banten': ('Banten', 'Banten'), # Separated from Jabar for modern context

    # --- DKI JAKARTA ---
    'Jakarta': ('Jakarta', 'DKI Jakarta'),
    'Djakarta': ('Jakarta', 'DKI Jakarta'),
    'Pasarminggu': ('Pasar Minggu', 'DKI Jakarta'),

    # --- DI YOGYAKARTA ---
    'Jogjakarta': ('Yogyakarta', 'DI Yogyakarta'),
    'Yogyakarta': ('Yogyakarta', 'DI Yogyakarta'),

    # --- BALI & NUSA TENGGARA ---
    'Bali': ('Bali', 'Bali'),
    'Sumbawa': ('Sumbawa', 'Nusa Tenggara Barat'),
    'Lombok': ('Lombok', 'Nusa Tenggara Barat'),
    'Timor': ('Timor', 'Nusa Tenggara Timur'),
    'Flores': ('Flores', 'Nusa Tenggara Timur'),

    # --- SUMATERA ---
    'Palembang': ('Palembang', 'Sumatera Selatan'),
    'Padang': ('Padang', 'Sumatera Barat'),
    'Sumatera Barat': ('Sumatera Barat', 'Sumatera Barat'),
    'Sumatera Barat: Singkarak': ('Singkarak', 'Sumatera Barat'),
    'Batak': ('Batak', 'Sumatera Utara'),
    'Tapanuli': ('Tapanuli', 'Sumatera Utara'),
    'Medan': ('Medan', 'Sumatera Utara'),
    'Atjeh': ('Aceh', 'Aceh'),
    'Aceh': ('Aceh', 'Aceh'),
    'Lampung': ('Lampung', 'Lampung'),
    'Riau': ('Riau', 'Riau'),
    'Duri': ('Duri', 'Riau'), # Assuming Duri Riau
    'Kotagadang': ('Koto Gadang', 'Sumatera Barat'),
    'Singkarak': ('Singkarak', 'Sumatera Barat'),
    'Minangkabau': ('Minangkabau', 'Sumatera Barat'),
    'Minang': ('Minangkabau', 'Sumatera Barat'),
    'Pariaman': ('Pariaman', 'Sumatera Barat'),
    'Bukittinggi': ('Bukittinggi', 'Sumatera Barat'),
    'Pajakumbuh': ('Payakumbuh', 'Sumatera Barat'),
    'Kajutanam': ('Kayu Tanam', 'Sumatera Barat'),
    'Sumatra': ('Sumatera', 'Sumatera'),
    'Kerinci': ('Kerinci', 'Jambi'),

    # --- KALIMANTAN ---
    'Bandjarmasin': ('Banjarmasin', 'Kalimantan Selatan'),
    'Banjarmasin': ('Banjarmasin', 'Kalimantan Selatan'),
    'Samarinda': ('Samarinda', 'Kalimantan Timur'),
    'Kalimantan': ('Kalimantan', 'Kalimantan'),

    # --- SULAWESI ---
    'Menado': ('Manado', 'Sulawesi Utara'),
    'Manado': ('Manado', 'Sulawesi Utara'),
    'Minahasa': ('Minahasa', 'Sulawesi Utara'),
    'Sulawesi Utara': ('Sulawesi Utara', 'Sulawesi Utara'),
    'Makasar': ('Makassar', 'Sulawesi Selatan'),
    'Makassar': ('Makassar', 'Sulawesi Selatan'),
    'Sulawesi Selatan': ('Sulawesi Selatan', 'Sulawesi Selatan'),
    'Bugis': ('Bugis', 'Sulawesi Selatan'),
    'Toraja': ('Toraja', 'Sulawesi Selatan'),
    'Toradja': ('Toraja', 'Sulawesi Selatan'),
    'Djeneponto': ('Jeneponto', 'Sulawesi Selatan'),
    'Palopo': ('Palopo', 'Sulawesi Selatan'),
    'Gorontalo': ('Gorontalo', 'Gorontalo'),
    'Mandar': ('Mandar', 'Sulawesi Barat'),
    'Buton': ('Buton', 'Sulawesi Tenggara'),
    'Poso': ('Poso', 'Sulawesi Tengah'),
    'Sulawesi Utara/Tengah': ('Sulawesi', 'Sulawesi'),

    # --- MALUKU & PAPUA ---
    'Irian Barat': ('Papua', 'Papua'),
    'Ambon': ('Ambon', 'Maluku'),
    'Maluku': ('Maluku', 'Maluku'),
    'Ternate': ('Ternate', 'Maluku Utara'),

    # --- OTHER / UNKNOWN / FOREIGN ---
    'Jawa': ('Jawa', 'Pulau Jawa'), # General
    'Italia Utara (Serving Suggestion)': ('Italia Utara', 'Luar Negeri'),
    'Jalisco': ('Jalisco', 'Luar Negeri'),
}

def map_region(raw_name):
    clean, prov = region_map.get(raw_name, (raw_name, 'Unknown'))
    return pd.Series([clean, prov])

df_recipes[['region_clean', 'province_group']] = df_recipes['region'].apply(map_region)

In [58]:
df_recipes['province_group'].value_counts()

province_group
Unknown                811
Jawa Tengah            289
Jawa Timur              98
Sumatera Barat          77
Bali                    76
Sulawesi Selatan        59
Jawa Barat              58
Sumatera Selatan        51
Kalimantan Selatan      40
Sumatera Utara          33
Sulawesi Utara          25
Nusa Tenggara Barat     21
Papua                   17
DI Yogyakarta           17
Banten                  16
Nusa Tenggara Timur     14
Maluku                  14
DKI Jakarta             13
Aceh                    13
Lampung                  6
Kalimantan Timur         5
Gorontalo                5
Sulawesi Barat           4
Kalimantan               4
Sulawesi Tenggara        3
Riau                     3
Pulau Jawa               3
Luar Negeri              2
Maluku Utara             1
Sumatera                 1
Sulawesi                 1
Jambi                    1
Sulawesi Tengah          1
Jawa Tengah/Timur        1
Name: count, dtype: int64
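
Note that the 811 `Unknown` rows equal the number of recipes with a null `region` (1783 total minus 972 non-null), so `Unknown` here means "no region recorded" rather than "region not in the map". A minimal sketch of a vectorized variant that also lists genuinely unmapped values follows. It runs on toy data, and `region_map_demo`, `df_demo`, and the misspelled `"Jalisko"` entry are illustrative assumptions.

```python
import pandas as pd

# Small stand-in for region_map: raw name -> (clean name, province group)
region_map_demo = {
    "Banjumas": ("Banyumas", "Jawa Tengah"),
    "Atjeh": ("Aceh", "Aceh"),
}
df_demo = pd.DataFrame({"region": ["Banjumas", None, "Atjeh", "Jalisko"]})

# Split the tuple map into two plain lookups so Series.map can be used
clean_lookup = {raw: pair[0] for raw, pair in region_map_demo.items()}
prov_lookup = {raw: pair[1] for raw, pair in region_map_demo.items()}

# Unmapped names keep their raw value; missing or unmapped provinces become 'Unknown'
df_demo["region_clean"] = df_demo["region"].map(clean_lookup).fillna(df_demo["region"])
df_demo["province_group"] = df_demo["region"].map(prov_lookup).fillna("Unknown")

# Non-null raw values that the map does not cover (candidates for new map entries)
is_unmapped = df_demo["region"].notna() & ~df_demo["region"].isin(list(region_map_demo))
unmapped = df_demo.loc[is_unmapped, "region"].unique().tolist()
print(unmapped)
```

Keeping the unmapped list separate makes it easy to tell a typo in the source from a recipe that simply has no region.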

In [59]:
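# Clean category
food_index = pd.read_csv('food_index.csv')

# Attempted fix for the sambal category.
# NOTE: this line leaves food_index unchanged. The chained assignment writes to a
# temporary copy (see the ChainedAssignmentError below), and the mask compares a
# lower-cased string with an upper-case literal, so that comparison is always True.
food_index[food_index['recipes_original_name'].str.lower().str.startswith('sambal') & (food_index['category'].str.lower() != 'SAMBAL SAMBALAN ')]['category'] = 'SAMBAL SAMBALAN'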

display(food_index.head())

food_index['category'].value_counts()


/var/folders/lr/kb1ct1jn4kb5f9k6k7w4tjpw0000gp/T/ipykernel_23143/3949553283.py:5: ChainedAssignmentError: A value is being set on a copy of a DataFrame or Series through chained assignment.
Such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy (due to Copy-on-Write).

Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.

See the documentation for a more detailed explanation: https://pandas.pydata.org/pandas-docs/stable/user_guide/copy_on_write.html#chained-assignment
  food_index[food_index['recipes_original_name'].str.lower().str.startswith('sambal') & (food_index['category'].str.lower() != 'SAMBAL SAMBALAN ')]['category'] = 'SAMBAL SAMBALAN'


Unnamed: 0,recipes_original_name,category
0,Sambal Goreng Kering,LAUK PAUK GORENGAN
1,Tempe,LAUK PAUK GORENGAN
2,Seng Geseng,LAUK PAUK GORENGAN
3,Serundeng,LAUK PAUK GORENGAN
4,Serundeng Ikan Mudjair,LAUK PAUK GORENGAN


category
DJADJANAN                        649
LAUK PAUK BASAH TIDAK BERKUAH    454
LAUK PAUK BASAH - BERKUAH        253
LAUK PAUK GORENGAN               125
LAUK PAUK BAKAR                   70
SAMBAL SAMBALAN                   64
MAKANAN UTAMA                     46
MINUMAN                           30
Name: count, dtype: int64
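
The warning above suggests a single `.loc` assignment. A minimal sketch of that form is shown here on toy rows (`food_index_demo` and its values are made up for illustration). It also normalizes case and whitespace before comparing, which is an assumption about what the original mask intended.

```python
import pandas as pd

# Toy stand-in for food_index (values are illustrative only)
food_index_demo = pd.DataFrame({
    "recipes_original_name": ["Sambal Badjak", "Sambal Goreng Kering", "Tempe"],
    "category": ["LAUK PAUK GORENGAN", "LAUK PAUK GORENGAN", "LAUK PAUK GORENGAN"],
})

# Build the row mask once, comparing like with like (upper-case, stripped)
starts_with_sambal = (
    food_index_demo["recipes_original_name"].str.lower().str.startswith("sambal")
)
not_sambal_category = (
    food_index_demo["category"].str.strip().str.upper() != "SAMBAL SAMBALAN"
)

# One-step assignment on the original frame, so no intermediate copy is involved
food_index_demo.loc[starts_with_sambal & not_sambal_category, "category"] = "SAMBAL SAMBALAN"
print(food_index_demo)
```

A prefix rule like this also recategorizes dishes such as "Sambal Goreng Kering", so whether it should apply to every title starting with "sambal" is a judgment call to review before running it on the full index.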

In [None]:
# 1. Rename the existing AI-generated category to avoid collision
df_recipes.rename(columns={'category': 'ai_category'}, inplace=True)

# 2. Create temporary lowercase columns for robust matching
df_recipes['join_key'] = df_recipes['title_original'].str.lower().str.strip()
food_index['join_key'] = food_index['recipes_original_name'].str.lower().str.strip()

# 3. Merge on the lowercase keys
df_recipes = df_recipes.merge(
    food_index[['join_key', 'category']], 
    on='join_key', 
    how='left'
)

# 4. Cleanup: Remove the temporary key
df_recipes.drop(columns=['join_key'], inplace=True)
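
A left merge silently duplicates recipe rows if the index contains the same `join_key` more than once. The sketch below shows one way to guard against that and to see which rows found no match, using the `validate` and `indicator` parameters of `DataFrame.merge`. The frames `recipes_demo` and `index_demo` are toy data, and keeping the first duplicate is an assumption rather than a rule taken from this notebook.

```python
import pandas as pd

recipes_demo = pd.DataFrame({"title_original": ["ARON", "Nasi Kebuli ", "SOTO"]})
index_demo = pd.DataFrame({
    "recipes_original_name": ["Aron", "nasi kebuli", "Aron"],  # 'Aron' listed twice
    "category": ["MAKANAN UTAMA", "MAKANAN UTAMA", "MAKANAN UTAMA"],
})

# Same normalization as the notebook: lower-case and strip whitespace
recipes_demo["join_key"] = recipes_demo["title_original"].str.lower().str.strip()
index_demo["join_key"] = index_demo["recipes_original_name"].str.lower().str.strip()

# Keep one index row per key so each recipe matches at most one category
index_unique = index_demo.drop_duplicates(subset="join_key", keep="first")

merged = recipes_demo.merge(
    index_unique[["join_key", "category"]],
    on="join_key",
    how="left",
    validate="many_to_one",  # raises MergeError if the right-hand keys are not unique
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
print(merged[["title_original", "category", "_merge"]])
```

Comparing `len(merged)` with the row count before the merge is a quick additional check that no rows were added.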

In [77]:
df_recipes['category'].unique()

<StringArray>
[                'MAKANAN UTAMA',                       'Unknown',
     'LAUK PAUK BASAH - BERKUAH',               'SAMBAL SAMBALAN',
 'LAUK PAUK BASAH TIDAK BERKUAH',               'LAUK PAUK BAKAR',
            'LAUK PAUK GORENGAN',                     'DJADJANAN',
                       'MINUMAN']
Length: 9, dtype: str

In [72]:
import pandas as pd
import re

# --- 1. DEFINITIONS ---

CATEGORIES = {
    'MAKANAN UTAMA': ['nasi', 'bubur', 'lontong', 'ketupat', 'tortilla', 'sagu', 'jagung', 'djagung', 'tiwul'],
    'LAUK PAUK BASAH - BERKUAH': ['sayur', 'sajur', 'sop', 'soto', 'gulai', 'kare', 'kari', 'lodeh', 'asem', 'brongkos', 'rawon', 'semur', 'garang asam', 'gangan', 'pindang'],
    'LAUK PAUK BASAH TIDAK BERKUAH': ['pepes', 'botok', 'gadon', 'oseng', 'tumis', 'urap', 'pecel', 'petjel', 'karedok', 'gudeg', 'gudek', 'rendang', 'kalio', 'sambal goreng', 'sambel goreng', 'dendeng', 'terik', 'abon'],
    'LAUK PAUK BAKAR': ['sate', 'saté', 'ayam bakar', 'ikan bakar', 'panggang', 'klotok'],
    'LAUK PAUK GORENGAN': ['goreng', 'perkedel', 'pekedel', 'dadar', 'martabak', 'lumpia', 'risoles', 'risolles', 'pastel', 'tahu', 'tempe', 'keripik', 'kerupuk', 'rempeyek', 'bakwan'],
    'SAMBAL SAMBALAN': ['sambal', 'sambel', 'saos', 'bumbu', 'petis', 'dabu-dabu'],
    'DJADJANAN': ['kue', 'kué', 'cake', 'bolu', 'lapis', 'dodol', 'jenang', 'djenang', 'wajik', 'wadjid', 'getuk', 'gethuk', 'klepon', 'onde', 'apem', 'serabi', 'puding', 'poding', 'agar', 'kolak', 'pisang', 'ubi', 'singkong', 'tapai', 'tape', 'empek', 'pempek', 'tekwan', 'batagor', 'siomay'],
    'MINUMAN': ['es ', 'wedang', 'bajigur', 'bandrek', 'sirup', 'jus', 'kopi', 'teh', 'cendol', 'tjendol', 'dawet']
}

def clean_title(text):
    # Ensure input is string or null
    if pd.isna(text): return None
    text = str(text) # Force string conversion just in case
    
    # 1. Remove artifacts
    text = re.sub(r'\[.*?\]', '', text) 
    text = re.sub(r'\(.*?(?:Lanjutan|Continuation|Sambungan|Inferred).*?\)', '', text, flags=re.IGNORECASE)
    
    # 2. Remove noise words
    noise_words = ["(Lanjutan)", "(Continuation)", "Resep Ikan", "Recipe", "Untitled", "Unknown"]
    for word in noise_words:
        text = text.replace(word, "")

    # 3. Clean formatting
    text = text.replace('\n', ' ')
    text = re.sub(r'\s+', ' ', text)
    text = text.strip(" -.,")
    
    if len(text) < 3: return None
    return text

def map_category(title):
    # Non-string input (e.g. None or NaN returned by clean_title) cannot be matched.
    if not isinstance(title, str): 
        return "Unknown"
    
    title_lower = title.lower()
    
    # Plain substring match; the first category (in dict order) with a hit wins.
    # A separate whole-word regex check is redundant here, because any title that
    # matches on word boundaries also contains the keyword as a substring.
    for cat, keywords in CATEGORIES.items():
        for key in keywords:
            if key in title_lower:
                return cat
                
    return "Unknown"

# --- 2. EXECUTE ON DATAFRAME SUBSET ---

# Create a mask for rows where category is missing
missing_mask = df_recipes['category'].isnull()

print(f"Rows with missing category before processing: {missing_mask.sum()}")

if missing_mask.sum() > 0:
    # 1. Get the original titles for these rows
    target_titles = df_recipes.loc[missing_mask, 'title_original']
    
    # 2. Apply cleaning
    cleaned_titles = target_titles.apply(clean_title)
    
    # 3. Apply mapping (Now safe against floats/NaNs)
    new_categories = cleaned_titles.apply(map_category)
    
    # 4. Update the main dataframe
    df_recipes.loc[missing_mask, 'category'] = new_categories

    print("✅ Processing complete.")
    
    # Show results
    result_mask = (missing_mask) & (df_recipes['category'] != "Unknown")
    print(f"Successfully mapped: {result_mask.sum()}")
    print(f"Still Unknown: {(df_recipes.loc[missing_mask, 'category'] == 'Unknown').sum()}")
    
    print("\n--- Sample of newly mapped rows ---")
    print(df_recipes.loc[missing_mask].head()[['title_original', 'category']])
else:
    print("No missing categories found!")

Rows with missing category before processing: 235
✅ Processing complete.
Successfully mapped: 124
Still Unknown: 111

--- Sample of newly mapped rows ---
                  title_original                   category
24       [Implied: Nasi Djagung]                    Unknown
26                   NASI KEBULI              MAKANAN UTAMA
38           SAGU KELAPA (Ambon)              MAKANAN UTAMA
45    TORTILLA\n(a la Indonesia)              MAKANAN UTAMA
49  ASEM ASEM TAHU (Pasarminggu)  LAUK PAUK BASAH - BERKUAH
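
Because `map_category` effectively matches keywords as plain substrings, a short keyword can fire inside a longer word. The sketch below contrasts substring and whole-word matching for a single keyword. The helper `keyword_in_title` is hypothetical and not part of the notebook.

```python
import re


def keyword_in_title(title: str, keyword: str, whole_word: bool = False) -> bool:
    """Return True if keyword occurs in title, optionally only as a whole word."""
    title_lower = title.lower()
    if not whole_word:
        return keyword in title_lower
    # Strip padding such as the trailing space in 'es ' before adding boundaries
    pattern = r"\b" + re.escape(keyword.strip()) + r"\b"
    return re.search(pattern, title_lower) is not None


# 'es ' is the MINUMAN keyword; as a substring it also occurs inside 'pepes '
print(keyword_in_title("PEPES IKAN", "es "))                   # substring check
print(keyword_in_title("PEPES IKAN", "es ", whole_word=True))  # whole-word check
print(keyword_in_title("ES TELER", "es ", whole_word=True))    # whole-word check
```

Switching the mapper to whole-word matching would change which of the 235 rows get a category, so the counts above would need to be regenerated.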


In [None]:
# Inspect Ingredients
display(df_ingredients.head(5))
print(df_ingredients.info())

Unnamed: 0,id,recipe_id,ingredient_group,ingredient_original_name,ingredient_normalized_name,ingredient_quantity,ingredient_unit
0,ING_000001,MR_187_01,utama,djagung putih pipilan,jagung putih pipilan,,
1,ING_000002,MR_187_02,kulit,beras,beras,1.0,liter
2,ING_000003,MR_187_02,kulit,garam,garam,1.0,sendok makan
3,ING_000004,MR_187_02,kulit,daun pisang batu,daun pisang batu,2.0,pelepah
4,ING_000005,MR_187_02,kulit,biting,biting,,secukupnya


<class 'pandas.DataFrame'>
RangeIndex: 13818 entries, 0 to 13817
Data columns (total 7 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   id                          13818 non-null  str   
 1   recipe_id                   13818 non-null  str   
 2   ingredient_group            13818 non-null  str   
 3   ingredient_original_name    13818 non-null  str   
 4   ingredient_normalized_name  13818 non-null  str   
 5   ingredient_quantity         13189 non-null  object
 6   ingredient_unit             13608 non-null  str   
dtypes: object(1), str(6)
memory usage: 755.8+ KB
None


## 3. Data Cleaning Section
Use this section to fix common OCR issues:
1. Missing (`None`) values in titles.
2. Inconsistent unit spellings and abbreviations (e.g., "lt." vs "liter").
3. Non-standardized region names.

In [None]:
# Example: Check for missing Normalized Titles
missing_titles = df_recipes[df_recipes['title_normalized'].isnull()]
print(f"Recipes with missing titles: {len(missing_titles)}")
missing_titles.head()

Recipes with missing titles: 0


Unnamed: 0,id,title_original,title_normalized,source_page,region,ai_category,ingredient_json,instruction,region_clean,province_group,category,final_category


In [None]:
# Example: Normalize Units (Basic clean)
print("Top 100 Unique Units:")
print(df_ingredients['ingredient_unit'].value_counts().head(100))

# TODO: Add replacement logic here
# df_ingredients['ingredient_unit'] = df_ingredients['ingredient_unit'].replace({...})

Top 100 Unique Units:
ingredient_unit
sendok makan            1776
buah                    1749
sendok teh              1434
butir                   1109
kilogram                1045
                        ... 
rantang                    1
tunas                      1
sampai ikan terendam       1
juring                     1
kalo                       1
Name: count, Length: 100, dtype: int64
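
One possible shape for the replacement logic left as a TODO is sketched below on a toy Series. Only the "lt." to "liter" pair comes from this notebook. The other entries in `unit_map_demo` are assumed examples of common abbreviations and should be checked against the actual `value_counts()` output before use.

```python
import pandas as pd

# Variant -> canonical unit. Apart from 'lt.', these pairs are assumptions.
unit_map_demo = {
    "lt.": "liter",
    "lt": "liter",
    "kg": "kilogram",
    "sdm": "sendok makan",
    "sdt": "sendok teh",
}

units_demo = pd.Series(["lt.", " Liter", "sdm", "buah", None])

# Normalize whitespace and case first, then replace exact matches only
normalized = units_demo.str.strip().str.lower().replace(unit_map_demo)
print(normalized.tolist())
```

Free-text units such as "sampai ikan terendam" are left untouched by an exact-match map and would need a separate rule.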


## 4. Export to CSV
Save both tables to disk before any deeper cleaning.

In [None]:
df_recipes.to_csv("df_recipes.csv", index=False)
df_ingredients.to_csv("df_ingredient_recipes.csv", index=False)
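
The `ingredient_json` column stores the group tree as a JSON string, so it has to be parsed again after the CSV is read back. A minimal round-trip sketch follows, using an in-memory buffer instead of a file. `df_demo` and its single row are toy data modeled on the first recipe shown earlier.

```python
import io
import json

import pandas as pd

groups = [
    {"group_name": "utama", "ingredients": [{"item_original": "djagung putih pipilan"}]}
]
df_demo = pd.DataFrame({
    "id": ["MR_187_01"],
    "ingredient_json": [json.dumps(groups, ensure_ascii=False)],
})

# Write to an in-memory CSV and read it back, as to_csv / read_csv would on disk
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The column comes back as plain text; json.loads rebuilds the nested structure
restored_groups = json.loads(restored.loc[0, "ingredient_json"])
print(restored_groups[0]["group_name"])
```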