# 🍷 Wine & Food NLP with spaCy

This project builds two lightweight NLP models using [spaCy](https://spacy.io) for handling natural language queries related to wine and food. It supports:

- **Intent Classification**: Distinguish between wine recommendation and food recommendation queries.
- **Named Entity Recognition (NER)**: Extract key details like wine name, price, and tasting descriptions.

---
<br>

### Text Classification Model (Intent Detection)
Example Inputs:
- “Recommend a red wine under 300 HKD that pairs with grilled lamb.” → recommend_wine
- “What should I cook for dinner to go with a chilled bottle of Sancerre?” → recommend_food


Output Example
- {'recommend_wine': 0.01, 'recommend_food': 0.99}
<br>

---



In [2]:
import spacy
from spacy.training.example import Example

# prepare training data with annotations
TRAIN_DATA = [
    ("What wine goes well with spicy Thai green curry?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Suggest a red wine under 300 HKD that pairs with grilled lamb.", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("I have a bottle of Amarone — what foods would pair well with it?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("What should I cook for dinner to go with a chilled bottle of Sancerre?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("Suggest a celebratory wine that works with oysters and has high acidity.", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("I'm cooking mushroom risotto and want something medium-bodied and earthy to go with it.", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Pair a bold Napa Cabernet Sauvignon with sushi.", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("What are the best dishes to serve with a 2020 Puligny-Montrachet Chardonnay?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("Can you suggest a full-course meal to go with a vintage Champagne?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("What kind of food works well with a sweet Riesling from Mosel?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}})
]

# create blank NLP pipeline and add labels
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.add_label("recommend_wine")
textcat.add_label("recommend_food")

# train
optimizer = nlp.initialize()  # spaCy v3 name; begin_training() is the deprecated alias
for i in range(20):
    losses = {}
    for text, annotation in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotation)
        nlp.update([example], sgd=optimizer, losses=losses)
    print(f"Epoch {i+1}, Losses: {losses}")

# save text categorization model
nlp.to_disk("textcat_model")

Epoch 1, Losses: {'textcat': 2.526319146156311}
Epoch 2, Losses: {'textcat': 1.409869372844696}
Epoch 3, Losses: {'textcat': 0.19619316095486283}
Epoch 4, Losses: {'textcat': 0.0028260856815904845}
Epoch 5, Losses: {'textcat': 4.436661708950851e-05}
Epoch 6, Losses: {'textcat': 6.002350332323658e-06}
Epoch 7, Losses: {'textcat': 2.4403353080515444e-06}
Epoch 8, Losses: {'textcat': 1.5463688960437594e-06}
Epoch 9, Losses: {'textcat': 1.172892222456312e-06}
Epoch 10, Losses: {'textcat': 9.58459636990483e-07}
Epoch 11, Losses: {'textcat': 8.091244829699917e-07}
Epoch 12, Losses: {'textcat': 6.934848819639683e-07}
Epoch 13, Losses: {'textcat': 5.992860696579783e-07}
Epoch 14, Losses: {'textcat': 5.209723639154618e-07}
Epoch 15, Losses: {'textcat': 4.551069885394554e-07}
Epoch 16, Losses: {'textcat': 3.992590853485467e-07}
Epoch 17, Losses: {'textcat': 3.516552560256514e-07}
Epoch 18, Losses: {'textcat': 3.1099377117982385e-07}
Epoch 19, Losses: {'textcat': 2.76002538157627e-07}
Epoch 20, L

<br>

To use the trained model, load it and pass a query.

In [3]:
nlp_textcat = spacy.load("textcat_model")

query = "What should I cook to go with my Champagne?" # test a query out!
doc = nlp_textcat(query)
print(doc.cats)

{'recommend_wine': 0.0015047071501612663, 'recommend_food': 0.99849534034729}
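
Downstream code usually needs a single label rather than the raw score dictionary. The following is a minimal sketch, assuming a hypothetical helper named `top_intent` and an arbitrary confidence threshold of 0.6; neither is part of the original notebook.

```python
def top_intent(cats, threshold=0.6):
    """Return the highest-scoring label, or None if no score reaches `threshold`.

    `cats` is a dict shaped like `doc.cats`. The 0.6 default is an assumption
    and should be tuned on held-out queries.
    """
    if not cats:
        return None
    label, score = max(cats.items(), key=lambda item: item[1])
    return label if score >= threshold else None


scores = {"recommend_wine": 0.0015, "recommend_food": 0.9985}
print(top_intent(scores))  # recommend_food
```

Returning `None` for low-confidence queries lets the caller ask a follow-up question instead of acting on a weak guess.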


<br>

### Retraining
Next, we retrain the model further without starting from scratch. We also add simple version control: each trained version is copied to a timestamped backup, so there is something to roll back to if the current training batch goes wrong.

In [4]:
NEW_TRAIN_DATA = [
    # recommend_wine intents
    ("What bottle of red would go best with barbecued pork ribs?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("I'm hosting a seafood dinner — any wine suggestions?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Which wine should I pick for a spicy Sichuan hot pot?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Any affordable white wines that go well with grilled chicken?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Looking for a bold red to serve with steak — thoughts?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Can you match a wine to eggplant parmesan?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Recommend a crisp wine under 200 HKD for sushi night.", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Need a wine pairing for my lasagna tonight.", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("Find me a sparkling wine that works with fried chicken.", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),
    ("What's a good wine for Korean BBQ?", {"cats": {"recommend_wine": 1.0, "recommend_food": 0.0}}),

    # recommend_food intents
    ("I just opened a Gewürztraminer — what food should I serve?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("Suggest dishes that pair with dry Riesling.", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("What kind of appetizers would go well with Lambrusco?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("Looking for dessert ideas to complement a late harvest wine.", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("Give me some food ideas to match with chilled rosé.", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("What can I cook that complements a bottle of Chablis?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("What recipes go well with a rustic Pinot Noir?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("I'm planning dinner around a Bordeaux — suggestions?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("What sides pair well with vintage port?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}}),
    ("Any good meals to make with an oaky Chardonnay in mind?", {"cats": {"recommend_wine": 0.0, "recommend_food": 1.0}})
]

In [7]:
import os
import shutil
import datetime

path_textcat = "textcat_model" # path to text classification model

if os.path.exists(path_textcat):
    print("Loading existing model ... ")
    nlp = spacy.load(path_textcat)

    textcat = nlp.get_pipe("textcat")
    optimizer = nlp.resume_training()

    for i in range(10): 
        losses = {}
        for text, annotation in NEW_TRAIN_DATA:
            example = Example.from_dict(nlp.make_doc(text), annotation)
            nlp.update([example], sgd = optimizer, losses = losses)
        print(f"Epoch {i+1}, Losses: {losses}")

    nlp.to_disk(path_textcat)
    print("model saved")

    # backup model
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_dir = f"textcat_model_backup_{timestamp}"
    shutil.copytree("textcat_model", backup_dir)
    print(f"backup saved at {backup_dir}")

else:
    print("model not found")

Loading existing model ... 
Epoch 1, Losses: {'textcat': 11.89623646778827}
Epoch 2, Losses: {'textcat': 0.4476291006139945}
Epoch 3, Losses: {'textcat': 0.057997701739168406}
Epoch 4, Losses: {'textcat': 4.821435435786725e-05}
Epoch 5, Losses: {'textcat': 1.6220896863927692e-05}
Epoch 6, Losses: {'textcat': 8.728327720142204e-06}
Epoch 7, Losses: {'textcat': 5.55360944343164e-06}
Epoch 8, Losses: {'textcat': 3.815379679750208e-06}
Epoch 9, Losses: {'textcat': 2.7234868316128313e-06}
Epoch 10, Losses: {'textcat': 2.0169729149088766e-06}
model saved
backup saved at textcat_model_backup_20250711_155821


In [8]:
# Code for retrieving backup:
# shutil.rmtree("textcat_model")
# shutil.copytree("textcat_model_backup_20250711_150102", "textcat_model")
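
Restoring by hand means copying the right timestamp into the command. As a sketch only, the hypothetical helper below (`restore_latest_backup` is not part of the original notebook) picks the newest backup automatically, assuming backups live next to the model directory and follow the `textcat_model_backup_YYYYMMDD_HHMMSS` naming used above.

```python
import glob
import os
import shutil


def restore_latest_backup(model_dir="textcat_model", pattern="textcat_model_backup_*"):
    """Replace `model_dir` with the newest backup directory matching `pattern`.

    Timestamps formatted as YYYYMMDD_HHMMSS sort lexicographically, so the
    last name in sorted order is the most recent backup.
    Returns the path that was restored, or None if no backup exists.
    """
    backups = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    if not backups:
        return None
    latest = backups[-1]
    if os.path.exists(model_dir):
        shutil.rmtree(model_dir)  # remove the broken model before copying
    shutil.copytree(latest, model_dir)
    return latest


# Usage (deletes the current model directory, so call it deliberately):
# restored = restore_latest_backup()
# print(f"restored from {restored}")
```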


<br>

### Named Entity Recognition (NER) Model
Extracted Fields:
- wine_name: “Silver Oak 2019”
- price: “$120”
- description_phrase: “vanilla and cherries”

Output Example
- wine_name : 2020 Opus One
- price : $299
- description_phrase : chocolate, spice, and tobacco

---

**Quick Notes**: 
- A problem you will likely encounter when training an NER model with multiple labels is annotation quality.
- Annotation indices (the start and end character offsets) must line up with the boundaries of the token or tokens they cover.
- For example, in "Try Château Margaux for $350.", "Château Margaux" is a valid span from start=4 to end=19 (the end index is exclusive). With end=18 the span cuts the last token short, so it is misaligned and the annotation is not usable for training.
- Keep this in mind when generating training data with LLMs, which are prone to inaccurate indexing. A labeling tool such as Prodigy can help.
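
One way to avoid hand-counted offsets is to compute them from the entity text itself. This is a minimal sketch with a hypothetical helper, `find_entity_span`, that is not part of the original notebook. It guarantees the offsets match the substring, but not that the substring ends on a token boundary, so a check with `doc.char_span` is still worthwhile.

```python
def find_entity_span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of `phrase` in `text`.

    `end` is exclusive, which is the convention for character-offset annotations.
    Raises ValueError when the phrase is missing instead of guessing an offset.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return (start, start + len(phrase), label)


text = "Try Château Margaux for $350."
print(find_entity_span(text, "Château Margaux", "wine_name"))  # (4, 19, 'wine_name')
```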

In [9]:
import spacy
from spacy.training.example import Example
from spacy.tokens import DocBin

nlp = spacy.blank("en")

In [36]:
import spacy
import re

# function for checking misaligned indices and interactively correcting them
def fix_misaligned_entities(train_data):
    """Validate entity offsets against token boundaries and return the corrected data."""
    nlp = spacy.blank("en")
    fixed_data = []

    for i, (text, annots) in enumerate(train_data):
        doc = nlp(text)
        new_entities = []

        print(f"\n🔎 Sample {i+1}: {text}")
        for start, end, label in annots.get("entities", []):
            # strict mode returns None unless both offsets fall on token boundaries
            span = doc.char_span(start, end, label=label, alignment_mode="strict")
            if span is None:
                print(f"❌ Misaligned span: '{text[start:end]}' -> ({start}, {end}) for label {label}")
                print(f"Text: {text}")
                print(f"Expected label: {label}")
                user_input = input("📝 Enter the correct entity value as it appears in the text: ").strip()

                # Try to find the exact match in the text
                match = re.search(re.escape(user_input), text)
                if match:
                    corrected_start = match.start()
                    corrected_end = match.end()
                    print(f"✅ Matched '{match.group()}' at ({corrected_start}, {corrected_end})")
                    new_entities.append((corrected_start, corrected_end, label))
                else:
                    print("⚠️ Entity value not found in text. Skipping this entity.")
            else:
                print(f"✅ Valid span: {span.text} ({span.start_char}, {span.end_char})")
                new_entities.append((span.start_char, span.end_char, label))

        fixed_data.append((text, {"entities": new_entities}))

    # return the corrected annotations so they can be used for training
    return fixed_data


def extract_wine_query_fields(doc):
    fields = {
        "wine_name": None,
        "price": None,
        "description_phrase": None,
        "pairing_item": [],
        "region": None,
        "occasion": None
    }

    for ent in doc.ents:
        print(f"Detected: {ent.text} ({ent.label_})")
        if ent.label_ == "pairing_item":
            fields["pairing_item"].append(ent.text)
        elif ent.label_ in fields:
            fields[ent.label_] = ent.text

    if not fields["pairing_item"]:
        fields["pairing_item"] = None

    return fields

In [33]:
# Training data for NER
NER_DATA = [
    ("Can you recommend a red wine under $40 for a dinner party?",
     {"entities": [
         (20, 23, "wine_type"),          # "red"
         (36, 38, "max_price")           # "40" (the "$" is not part of the span)
     ]}),

    ("I need a white wine priced above 20 USD for seafood.",
     {"entities": [
         (9, 14, "wine_type"),          # "white"
         (33, 35, "min_price")           # "20" (the currency "USD" is not part of the span)
     ]}),

    ("Looking for Champagne or Prosecco to celebrate a birthday.",
     {"entities": [
         (12, 21, "wine_name"),          # "Champagne"
         (25, 33, "wine_name")           # "Prosecco"
     ]}),

    ("Suggest a wine under $25 that goes well with grilled chicken.",
     {"entities": [
         (22, 24, "max_price")           # "25"
     ]}),

    ("Is there a full-bodied red under 100 HKD?",
     {"entities": [
         (23, 26, "wine_type"),          # "red"
         (33, 36, "max_price")           # "100"
     ]}),

    ("I’m in the mood for an oaked Chardonnay.",
     {"entities": [
         (29, 39, "wine_name")           # "Chardonnay"
     ]}),

    ("Any bold Syrah or Malbec options for under $60?",
     {"entities": [
         (9, 14, "wine_name"),           # "Syrah"
         (18, 24, "wine_name"),          # "Malbec"
         (44, 46, "max_price")           # "60"
     ]}),

    ("Looking for a wine that costs at least 30 USD, preferably red.",
     {"entities": [
         (39, 41, "min_price"),          # "30"
         (58, 61, "wine_type")           # "red"
     ]}),

    ("I’d like a sparkling wine around 50 dollars.",
     {"entities": [
         (11, 20, "wine_type"),          # "sparkling"
         (33, 35, "max_price")           # "50"
     ]}),

    ("Do you have a Pinot Noir or Merlot from California?",
     {"entities": [
         (14, 24, "wine_name"),          # "Pinot Noir"
         (28, 34, "wine_name")           # "Merlot"
     ]}),
]
    



In [37]:
fix_misaligned_entities(NER_DATA)


🔎 Sample 1: Can you recommend a red wine under $40 for a dinner party?
✅ Valid span: red (20, 23)
✅ Valid span: 40 (36, 38)

🔎 Sample 2: I need a white wine priced above 20 USD for seafood.
✅ Valid span: white (9, 14)
✅ Valid span: 20 (33, 35)

🔎 Sample 3: Looking for Champagne or Prosecco to celebrate a birthday.
✅ Valid span: Champagne (12, 21)
✅ Valid span: Prosecco (25, 33)

🔎 Sample 4: Suggest a wine under $25 that goes well with grilled chicken.
✅ Valid span: 25 (22, 24)

🔎 Sample 5: Is there a full-bodied red under 100 HKD?
✅ Valid span: red (23, 26)
✅ Valid span: 100 (33, 36)

🔎 Sample 6: I’m in the mood for an oaked Chardonnay.
✅ Valid span: Chardonnay (29, 39)

🔎 Sample 7: Any bold Syrah or Malbec options for under $60?
✅ Valid span: Syrah (9, 14)
✅ Valid span: Malbec (18, 24)
✅ Valid span: 60 (44, 46)

🔎 Sample 8: Looking for a wine that costs at least 30 USD, preferably red.
✅ Valid span: 30 (39, 41)
✅ Valid span: red (58, 61)

🔎 Sample 9: I’d like a sparkling wine around 

In [14]:
labels = ["wine_name", "min_price", "max_price", "wine_type"]

# create blank NER pipeline and add labels
nlp_ner = spacy.blank("en")
ner = nlp_ner.add_pipe("ner")

for label in labels:
    ner.add_label(label)

# train model
optimizer = nlp_ner.initialize()  # spaCy v3 name; begin_training() is the deprecated alias
for i in range(30):
    losses = {}
    for text, annotations in NER_DATA:
        example = Example.from_dict(nlp_ner.make_doc(text), annotations)
        nlp_ner.update([example], sgd=optimizer, drop=0.1, losses=losses)
    print(f"Epoch {i+1}, losses: {losses}")

nlp_ner.to_disk("ner_model")

Epoch 1, losses: {'ner': np.float32(95.93794)}
Epoch 2, losses: {'ner': np.float32(37.91222)}
Epoch 3, losses: {'ner': np.float32(26.550133)}
Epoch 4, losses: {'ner': np.float32(15.288527)}
Epoch 5, losses: {'ner': np.float32(8.584821)}
Epoch 6, losses: {'ner': np.float32(2.083242)}
Epoch 7, losses: {'ner': np.float32(1.3231918)}
Epoch 8, losses: {'ner': np.float32(10.869421)}
Epoch 9, losses: {'ner': np.float32(2.0257463)}
Epoch 10, losses: {'ner': np.float32(1.1665642)}
Epoch 11, losses: {'ner': np.float32(0.1765414)}
Epoch 12, losses: {'ner': np.float32(0.0050940295)}
Epoch 13, losses: {'ner': np.float32(9.595718e-05)}
Epoch 14, losses: {'ner': np.float32(2.6863596e-05)}
Epoch 15, losses: {'ner': np.float32(1.0165762e-05)}
Epoch 16, losses: {'ner': np.float32(1.7917098e-05)}
Epoch 17, losses: {'ner': np.float32(2.601948e-06)}
Epoch 18, losses: {'ner': np.float32(5.1935003e-06)}
Epoch 19, losses: {'ner': np.float32(3.6625895e-06)}
Epoch 20, losses: {'ner': np.float32(1.1618387e-06)}


### Retraining


In [41]:
NEW_NER_TRAIN_DATA = [

    ("I would like to try the Alain Chavy Puligny Montrachet 2020, a white wine.",
     {"entities": [(24, 59, "wine_name"), (63, 68, "wine_type")]}),  # "Alain Chavy Puligny Montrachet 2020", "white"

    ("Do you have any sparkling wines like Angelina Brut Reserve Sparkling Wine 2012?",
     {"entities": [(37, 78, "wine_name"), (16, 25, "wine_type")]}),  # "Angelina Brut Reserve Sparkling Wine 2012", "sparkling"

    ("Can you recommend a red wine such as Badia a Coltibuono Chianti Classico Riserva 2019?",
     {"entities": [(37, 85, "wine_name"), (20, 23, "wine_type")]}),  # "Badia a Coltibuono Chianti Classico Riserva 2019", "red"

    ("I'm interested in trying the Andre Beaufort a Polisy Brut Reserve, Champagne NV.",
     {"entities": [(29, 79, "wine_name")]}),  # "Andre Beaufort a Polisy Brut Reserve, Champagne NV"

    ("Looking for a Rosé wine from Domaine Leflaive Macon Ige 2022.",
     {"entities": [(14, 18, "wine_type"), (29, 60, "wine_name")]}),  # "Rosé", "Domaine Leflaive Macon Ige 2022"

    ("Is the Bollinger PN AYC18 2018 a red or sparkling wine?",
     {"entities": [(7, 30, "wine_name"), (33, 36, "wine_type"), (40, 49, "wine_type")]}),  # "Bollinger PN AYC18 2018", "red", "sparkling"

    ("I want to buy a fortified wine like Cockburns 20 Years Old Tawny Port N.V.",
     {"entities": [(16, 25, "wine_type"), (36, 74, "wine_name")]}),  # "fortified", "Cockburns 20 Years Old Tawny Port N.V."

    ("Suggest a white wine such as Domaine Jean Fery Bourgogne Blanc 2021.",
     {"entities": [(10, 15, "wine_type"), (29, 67, "wine_name")]}),  # "white", "Domaine Jean Fery Bourgogne Blanc 2021"

    ("Do you have any Rosé options like Domaine Patrice Rion Les Charmes, Chambolle-Musigny 1er Cru, 2014?",
     {"entities": [(16, 20, "wine_type"), (34, 99, "wine_name")]}),  # "Rosé", "Domaine Patrice Rion Les Charmes, Chambolle-Musigny 1er Cru, 2014"

    ("I'm looking for a sparkling wine similar to Champagne Laurent Lequart Reserve, Extra Brut.",
     {"entities": [(18, 27, "wine_type"), (44, 89, "wine_name")]}),  # "sparkling", "Champagne Laurent Lequart Reserve, Extra Brut"

    ("Can you find a red wine like Chateau Haut-Batailley 1989?",
     {"entities": [(15, 18, "wine_type"), (29, 56, "wine_name")]}),  # "red", "Chateau Haut-Batailley 1989"

    ("I want to taste the Domaine Jean Pascal et Fils Puligny-Montrachet Rouge 2022, a red wine.",
     {"entities": [(20, 77, "wine_name"), (81, 84, "wine_type")]}),  # "Domaine Jean Pascal et Fils Puligny-Montrachet Rouge 2022", "red"

    ("Are there any white wines from Grgich Hills Chardonnay 2020?",
     {"entities": [(14, 19, "wine_type"), (31, 59, "wine_name")]}),  # "white", "Grgich Hills Chardonnay 2020"

    ("Looking for a fortified wine like Kopke Vintage Port 2002.",
     {"entities": [(14, 23, "wine_type"), (34, 57, "wine_name")]}),  # "fortified", "Kopke Vintage Port 2002"

    ("I’m interested in the Rosé wine La Petite Odyssee Petillant Naturel Rose 2020.",
     {"entities": [(22, 26, "wine_type"), (32, 77, "wine_name")]}),  # "Rosé", "La Petite Odyssee Petillant Naturel Rose 2020"

    ("Can you recommend a sparkling wine such as Dom Perignon Vintage Luminous 2012?",
     {"entities": [(20, 29, "wine_type"), (43, 77, "wine_name")]}),  # "sparkling", "Dom Perignon Vintage Luminous 2012"

    ("I want to try the Domaine Derey Freres Marsannay Blanc 2022, a white wine.",
     {"entities": [(18, 59, "wine_name"), (63, 68, "wine_type")]}),  # "Domaine Derey Freres Marsannay Blanc 2022", "white"

    ("Do you have any red wines like Domaine Jean Fery Vosne-Romanée 2021?",
     {"entities": [(16, 19, "wine_type"), (31, 67, "wine_name")]}),  # "red", "Domaine Jean Fery Vosne-Romanée 2021"

    ("Suggest a white wine such as Domaine Leflaive Pouilly Fuissé 2021.",
     {"entities": [(10, 15, "wine_type"), (29, 65, "wine_name")]}),  # "white", "Domaine Leflaive Pouilly Fuissé 2021"

    ("Looking for a sparkling wine like Champagne Egly Ouriet Les Premices Brut NV.",
     {"entities": [(14, 23, "wine_type"), (34, 76, "wine_name")]}),  # "sparkling", "Champagne Egly Ouriet Les Premices Brut NV"
]


In [42]:
fix_misaligned_entities(NEW_NER_TRAIN_DATA)


🔎 Sample 1: I would like to try the Alain Chavy Puligny Montrachet 2020, a white wine.
✅ Valid span: Alain Chavy Puligny Montrachet 2020 (24, 59)
✅ Valid span: white (63, 68)

🔎 Sample 2: Do you have any sparkling wines like Angelina Brut Reserve Sparkling Wine 2012?
✅ Valid span: Angelina Brut Reserve Sparkling Wine 2012 (37, 78)
✅ Valid span: sparkling (16, 25)

🔎 Sample 3: Can you recommend a red wine such as Badia a Coltibuono Chianti Classico Riserva 2019?
✅ Valid span: Badia a Coltibuono Chianti Classico Riserva 2019 (37, 85)
✅ Valid span: red (20, 23)

🔎 Sample 4: I'm interested in trying the Andre Beaufort a Polisy Brut Reserve, Champagne NV.
✅ Valid span: Andre Beaufort a Polisy Brut Reserve, Champagne NV (29, 79)

🔎 Sample 5: Looking for a Rosé wine from Domaine Leflaive Macon Ige 2022.
✅ Valid span: Rosé (14, 18)
✅ Valid span: Domaine Leflaive Macon Ige 2022 (29, 60)

🔎 Sample 6: Is the Bollinger PN AYC18 2018 a red or sparkling wine?
✅ Valid span: Bollinger PN AYC18 2018 (

In [43]:
import os 
import shutil 
import datetime

path_ner = "ner_model" # path to NER model

if os.path.exists(path_ner):
    print("Loading existing ner model ... ")
    nlp = spacy.load(path_ner)

    ner = nlp.get_pipe("ner")
    optimizer = nlp.resume_training()

    for i in range(10): 
        losses = {}
        for text, annotation in NEW_NER_TRAIN_DATA:
            example = Example.from_dict(nlp.make_doc(text), annotation)
            nlp.update([example], sgd = optimizer, losses = losses)
        print(f"Epoch {i+1}, Losses: {losses}")

    nlp.to_disk(path_ner)
    print("model saved")

    # backup model
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_dir = f"ner_backup/ner_model_backup_{timestamp}"
    shutil.copytree("ner_model", backup_dir)
    print(f"backup saved at {backup_dir}")

else:
    print("model not found")


Loading existing ner model ... 
Epoch 1, Losses: {'ner': np.float32(56.504868)}
Epoch 2, Losses: {'ner': np.float32(65.8593)}
Epoch 3, Losses: {'ner': np.float32(12.761689)}
Epoch 4, Losses: {'ner': np.float32(0.6523823)}
Epoch 5, Losses: {'ner': np.float32(1.0217045e-05)}
Epoch 6, Losses: {'ner': np.float32(1.4043627e-06)}
Epoch 7, Losses: {'ner': np.float32(3.3545675e-06)}
Epoch 8, Losses: {'ner': np.float32(7.0549225e-07)}
Epoch 9, Losses: {'ner': np.float32(3.0891246e-07)}
Epoch 10, Losses: {'ner': np.float32(2.2962588e-07)}
model saved
backup saved at ner_backup/ner_model_backup_20250711_170825


In [44]:
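# NOTE: `nlp_ner` is still the in-memory pipeline from the initial NER training cell;
# to test the retrained weights, use `nlp` from the cell above or reload with spacy.load("ner_model")
text = "I’d like a sparkling wine around 50 dollars."
doc = nlp_ner(text)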

print([(ent.text, ent.label_) for ent in doc.ents])


[('sparkling', 'wine_type'), ('50', 'max_price')]


### Current Problems
- Label alignment issues lead to improper learning; the root cause still needs to be understood.
- There are too few training samples for each field; training data generation should be automated, for example with a spaCy project.
- Each combination of fields may be registered as unique; this still needs to be investigated.
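
To illustrate the second point, training samples can be generated from templates so that the entity offsets are computed rather than typed by hand. This is a sketch in plain Python, not a spaCy project workflow; the templates, slot values, and function names (`fill_template`, `generate_examples`) are all hypothetical.

```python
import random

# Hypothetical templates and slot values, for illustration only.
TEMPLATES = [
    "Can you recommend a {wine_type} wine under {max_price} dollars?",
    "Do you have a {wine_name} for less than {max_price} HKD?",
    "I need a {wine_type} wine like {wine_name}.",
]
SLOT_VALUES = {
    "wine_type": ["red", "white", "sparkling"],
    "wine_name": ["Pinot Noir", "Malbec", "Chablis"],
    "max_price": ["40", "100", "250"],
}


def fill_template(template, values):
    """Fill `{slot}` placeholders and record the character span of each filled value."""
    text, entities, pos = "", [], 0
    while True:
        open_idx = template.find("{", pos)
        if open_idx == -1:
            text += template[pos:]  # copy the remainder after the last slot
            break
        close_idx = template.index("}", open_idx)
        label = template[open_idx + 1:close_idx]
        text += template[pos:open_idx]
        start = len(text)  # the entity starts where the filled value begins
        text += values[label]
        entities.append((start, len(text), label))
        pos = close_idx + 1
    return text, {"entities": entities}


def generate_examples(n, seed=0):
    """Return `n` (text, annotations) pairs in the same shape as NER_DATA."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        template = rng.choice(TEMPLATES)
        values = {slot: rng.choice(options) for slot, options in SLOT_VALUES.items()}
        examples.append(fill_template(template, values))
    return examples


for text, annots in generate_examples(3):
    print(text, annots)
```

Because every offset is derived from the position where the value was inserted, the spans always match the entity text. Template-generated data is repetitive, though, so it is better treated as a supplement to hand-labelled queries.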


### How to Use
- Train each model separately in Jupyter or from a Python script.
- Use `spacy.load()` to load a saved model and run predictions.
- You can optionally combine both models into a single pipeline or deploy via FastAPI.
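
Short of merging the two models into one pipeline, a thin wrapper can run both on the same query. This is a sketch under assumptions: `parse_query` is a hypothetical function, and it expects two already-loaded pipelines, one exposing `doc.cats` and one exposing `doc.ents`.

```python
def parse_query(text, textcat_nlp, ner_nlp):
    """Run both pipelines on `text` and merge their outputs into one dict.

    `textcat_nlp` and `ner_nlp` are expected to be loaded spaCy pipelines,
    e.g. spacy.load("textcat_model") and spacy.load("ner_model").
    """
    cats = textcat_nlp(text).cats
    # pick the label with the highest score; None if the pipeline returned no scores
    intent = max(cats, key=cats.get) if cats else None
    entities = [(ent.text, ent.label_) for ent in ner_nlp(text).ents]
    return {"intent": intent, "scores": dict(cats), "entities": entities}


# Usage (requires the trained models to exist on disk):
# import spacy
# result = parse_query(
#     "Recommend a red wine under 300 HKD.",
#     spacy.load("textcat_model"),
#     spacy.load("ner_model"),
# )
# print(result)
```

A function like this could also serve as the body of a Flask or FastAPI endpoint, with the two models loaded once at startup.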

### Next Steps
- Add more examples for robust predictions
- Combine TextCat and NER in one spaCy pipeline
- Deploy with Flask or FastAPI for real-time use
- Use Label Studio or Prodigy to label at scale