In [None]:
# !pip install requests==2.32.5

In [37]:
import json
import requests

## Intro

In this notebook I showcase the base model and the finetuned model for domain name generation using various prompts.

The structure of this notebook is:
- define prompts
- define generation functions
- load some sample test data
- experiment with domain generation
- conclusion and final observations

## Prompts

A combination of prompts will be used to generate domain names:
- basic prompt
- fewshot prompt with examples
- seed words prompt - accepts additional keywords to guide generation
- no instruction, just business description (for finetuned model)
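
As a minimal sketch of how such templates are filled in, assuming plain `str.format` placeholders: the shortened `EXAMPLE_TEMPLATE` and the `build_prompt` helper are hypothetical stand-ins for illustration, not the prompts or helpers used in this notebook.

```python
# Hypothetical, shortened template used only to illustrate placeholder filling.
EXAMPLE_TEMPLATE = """Business description:
{business_description}

Seed words:
{seed_words}

Domain name:
"""


def build_prompt(template, **fields):
    """Fill a prompt template with the given placeholder values."""
    return template.format(**fields)


example_prompt = build_prompt(
    EXAMPLE_TEMPLATE,
    business_description="Bike shop and repair service in the city centre.",
    seed_words="premium, expert",
)
print(example_prompt)
```

Ending the template with `Domain name:` and a newline leaves the model to continue with the domain itself. The no-instruction variant skips the template and sends the description on its own.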

In [74]:
GENERATE_DOMAIN_PROMPT_BASE = """You work in a web domain creation service. Generate a fitting domain name based on business description

Output instructions:
- output in the language of the input
- output just the domain name and nothing else

Business description:
{business_description}

Domain name:
"""

In [78]:
GENERATE_DOMAIN_PROMPT_SEED_WORDS = """You work in a web domain creation service. Generate a fitting domain name based on business description and a list of 1 or more seed words. Try to match the intention or emotion behind the seed word(s) as best you can.

Output instructions:
- output in the language of the input
- output just the domain name and nothing else

Business description:
{business_description}

Seed words:
{seed_words}

Domain name:
"""

In [79]:
GENERATE_DOMAIN_PROMPT_FEWSHOT = """You work in a web domain creation service. Generate a fitting domain name based on business description. Answer in the language of the input. Output just the domain name.

Output instructions:
- output in the language of the input
- output just the domain name and nothing else

Example business descriptions to domain names:
Business description:
Global marketing-language agency specializing in multilingual brand messaging, marketing translation, transcreation, and cultural adaptation to localize campaigns and communications across international markets.
Domain name:
wordbank.com

Business description:
Nuo 1947 metų padedame jums puoselėti sodus, daržus ir namų augalus. Esame ne tik didžiausia sėklų ir sodo prekių tiekėja Baltijos šalyse, bet ir bendruomenė, kuri dalinasi meile gamtai bei auginimo džiaugsmu.
Domain name:
zaliastotele.lt

Business description:
Tai moderni odontologijos klinika su dantų technikų laboratorija. Naujausių skaitmeninių technologijų dėka klinikoje siūlomos tokios unikalios procedūros kaip "Implantacija be pjūvio", "Protezavimas per valandą", "Pasimatuok šypseną", kurios padeda išgauti aukščiausios kokybės paslaugas per maksimaliai trumpą laiką.
Domain name:
skaitmeninessypsenos.lt

Business description:
{business_description}

Domain name:
"""

## Inputs

I collected input data from real Lithuanian companies. Business descriptions are provided in both English and Lithuanian to see how the model handles each language.

`business_description_en` / `business_description_lt` hold descriptions that were retrieved using AI web search.

`alternate_description_lt` holds shorter, simpler descriptions that I copied from the companies' websites.

We will feed the different descriptions into the different prompts and compare the results.

In [10]:
with open('test_data.json') as f:
    test_data = json.load(f)

In [11]:
test_data[0]

{'domain_name': 'ekomedis.lt',
 'business_description_en': 'Manufacturer and builder of wooden structures — prefabricated log and frame houses, garden cottages, saunas and outdoor wooden products, plus installation and construction services.',
 'business_description_lt': 'Gamina ir stato karkasinius bei rąstinius namus, pirtis, sodo namelius ir pavėsines; teikia statybos, montavimo ir individualių medinių konstrukcijų paslaugas.',
 'alternate_description_lt': 'Karkasinių, skydinių, mūrininių namų statybos valdymas nuo projektavimo iki pilno įrengimo Kaune.'}
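
Since every experiment below indexes these four fields directly, a quick sanity check on the loaded records can catch a missing or empty field early. This is a minimal sketch: `REQUIRED_FIELDS`, `missing_fields`, and the `sample_record` values are hypothetical and only mirror the field names shown above.

```python
# Field names taken from the sample record above; the helper itself is hypothetical.
REQUIRED_FIELDS = (
    "domain_name",
    "business_description_en",
    "business_description_lt",
    "alternate_description_lt",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder record for illustration only, not real test data.
sample_record = {
    "domain_name": "example.lt",
    "business_description_en": "Example description in English.",
    "business_description_lt": "Pavyzdinis aprašymas lietuvių kalba.",
    "alternate_description_lt": "Trumpas aprašymas.",
}

print(missing_fields(sample_record))
print(missing_fields({"domain_name": "example.lt"}))
```

Running `missing_fields` over every item in `test_data` before generating would surface incomplete records without calling the model.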

### Helper functions

In [None]:
# FastAPI endpoints
GENERATE_ENDPOINT = 'http://localhost:8001/generate'
LIST_MODELS_ENDPOINT = 'http://localhost:8001/list-models'

In [106]:
seed_words = ['inexpensive, practical', 'premium, expert', 'animal, aggresive']

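def _generate(prompt, model_id):
    # POST the prompt to the generation endpoint and return the response body unparsed
    response = requests.post(GENERATE_ENDPOINT, json={'prompt': prompt, 'model_id': model_id})
    # fail loudly on HTTP errors instead of printing an error body as a "domain"
    response.raise_for_status()
    return response.text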

def _generate_every_prompt(description, model_id):
    results = {}
    
    result = _generate(GENERATE_DOMAIN_PROMPT_BASE.format(business_description=description), model_id=model_id)
    results['base'] = result
    
    result = _generate(GENERATE_DOMAIN_PROMPT_FEWSHOT.format(business_description=description), model_id=model_id)
    results['fewshot'] = result

    
    for i, seed_word_pair in enumerate(seed_words):
        result = _generate(GENERATE_DOMAIN_PROMPT_SEED_WORDS.format(business_description=description, seed_words=seed_word_pair), model_id=model_id)
        results[f'seed_words_{i}'] = result

    return results

def _print_generation_results(results):
    print(f'{"base:":<50} {results["base"]}')
    print(f'{"fewshot:":<50} {results["fewshot"]}')
    for i in range(len(seed_words)):
        prefix = f'seed words ({seed_words[i]}):'
        print(f'{prefix:<50} {results[f"seed_words_{i}"]}')  

In [107]:
def generate(domain, description, model_id):
    print(f'domain: {domain}')
    print(f'desc: {description[:50]}...')
    print('-'*25)
    _print_generation_results(_generate_every_prompt(description, model_id))

## Available models

**We have 2 models: the base model (Llama 3.1 8B Instruct) and a finetuned version with a LoRA adapter**

In [18]:
requests.get(LIST_MODELS_ENDPOINT).json()

['unsloth/Llama-3.1-8B-Instruct', 'domain_lora']

## Generation results

**Base (non-finetuned) model - English descriptions**

In [108]:
generate(test_data[0]['domain_name'], test_data[0]['business_description_en'], 'unsloth/Llama-3.1-8B-Instruct')

domain: ekomedis.lt
desc: Manufacturer and builder of wooden structures — pr...
-------------------------
base:                                              "timbercraft.co"
fewshot:                                           "mediena.lt"
seed words (inexpensive, practical):               "woodwisehomes.com"
seed words (premium, expert):                      "timbercraft.pro"
seed words (animal, aggresive):                    "fiercelodge.com"


In [116]:
generate(test_data[1]['domain_name'], test_data[1]['business_description_en'], 'unsloth/Llama-3.1-8B-Instruct')

domain: manobustas.lt
desc: Property management firm specializing in multi‑apa...
-------------------------
base:                                              "ApartmentGuardians.com"
fewshot:                                           "Apartmentcare.co"
seed words (inexpensive, practical):               "propertyguardian.com"
seed words (premium, expert):                      "premiumpropertypros.com"
seed words (animal, aggresive):                    "defendproperty.com"


**Base (non-finetuned) model - Lithuanian descriptions**

In [117]:
generate(test_data[2]['domain_name'], test_data[2]['alternate_description_lt'], 'unsloth/Llama-3.1-8B-Instruct')

domain: autoaibe.lt
desc: Prekyba automobilių dalimis, akumuliatoriais, alyv...
-------------------------
base:                                              "autoremontas.lt"
fewshot:                                           "Prekyba automobilių dalimis, akumuliatoriais, alyvom ir tepalais, autochemija, automobilių priežiūros priemonėmis, aksesuarais ir t.t\nautotechnika.lt"
seed words (inexpensive, practical):               "autochemija.lt"
seed words (premium, expert):                      "premiumautoekspertas.lt"
seed words (animal, aggresive):                    "autoagresorius.lt"


In [118]:
generate(test_data[3]['domain_name'], test_data[3]['alternate_description_lt'], 'unsloth/Llama-3.1-8B-Instruct')

domain: veloklinika.lt
desc: Dviračių parduotuvė ir servisas miesto centre! Orb...
-------------------------
base:                                              "dviračiupaslaugos.lt"
fewshot:                                           "Dviračių parduotuvė ir servisas miesto centre! \ndviratiscentras.lt"
seed words (inexpensive, practical):               "dviračiupaslaugos.lt"
seed words (premium, expert):                      "dviračiųpremijus.lt"
seed words (animal, aggresive):                    "dviračiųbėdros.lt"


**Finetuned model - English descriptions**

In [127]:
generate(test_data[0]['domain_name'], test_data[0]['business_description_en'], 'domain_lora')

domain: ekomedis.lt
desc: Manufacturer and builder of wooden structures — pr...
-------------------------
base:                                              "kotinrakennukset.fi"
fewshot:                                           "woodenhousebuilder.com"
seed words (inexpensive, practical):               "woodenhomebuilds.com"
seed words (premium, expert):                      "woodenstructures.co"
seed words (animal, aggresive):                    "wolfswoodhouses.com"


In [128]:
generate(test_data[1]['domain_name'], test_data[1]['business_description_en'], 'domain_lora')

domain: manobustas.lt
desc: Property management firm specializing in multi‑apa...
-------------------------
base:                                              "ApexLivingSpaces.com"
fewshot:                                           "Apartmentfocus.ca"
seed words (inexpensive, practical):               "affordableapartmentadmin.com"
seed words (premium, expert):                      "expertpropertypros.com"
seed words (animal, aggresive):                    "apexpropertypro.com"


**Finetuned model - Lithuanian descriptions**

In [129]:
generate(test_data[2]['domain_name'], test_data[2]['alternate_description_lt'], 'domain_lora')

domain: autoaibe.lt
desc: Prekyba automobilių dalimis, akumuliatoriais, alyv...
-------------------------
base:                                              "autochemija.lt"
fewshot:                                           "automobiliai.lt"
seed words (inexpensive, practical):               "auksoauto.lt"
seed words (premium, expert):                      "autochemijaexpertai.lt"
seed words (animal, aggresive):                    "motauto.lt"


In [130]:
generate(test_data[3]['domain_name'], test_data[3]['alternate_description_lt'], 'domain_lora')

domain: veloklinika.lt
desc: Dviračių parduotuvė ir servisas miesto centre! Orb...
-------------------------
base:                                              "dviraciai.lt"
fewshot:                                           "Dviratispaslaugos.lt"
seed words (inexpensive, practical):               "dviračiupaslaugos.lt"
seed words (premium, expert):                      "dviračiaiexpertai.lt"
seed words (animal, aggresive):                    "dviračiai.lt"


**Finetuned model - English descriptions - no instruction**

Testing the model's ability to generate a domain without being told what to do.

In [123]:
for item in test_data:
    print(f'{"domain":<15}', item['domain_name'])
    print(f'{"generated":<15}', _generate(item['business_description_en'], 'domain_lora'))

domain          ekomedis.lt
generated       "mysticloghomes.com"
domain          manobustas.lt
generated       "propertymanagement.ca"
domain          autoaibe.lt
generated       "autoakcijos.lt"
domain          veloklinika.lt
generated       "vilkabiciu.com"
domain          fabijoniskiubaseinas.lt
generated       "fabijoniskes.lt"


**Finetuned model - Lithuanian descriptions - no instruction**

In [126]:
for item in test_data:
    print(f'{"domain":<15}', item['domain_name'])
    print(f'{"generated":<15}', _generate(item['alternate_description_lt'], 'domain_lora'))

domain          ekomedis.lt
generated       "karkasiniu-namai.lt"
domain          manobustas.lt
generated       "realizacija.lt"
domain          autoaibe.lt
generated       "autodalis.lt"
domain          veloklinika.lt
generated       "velobike.lt"
domain          fabijoniskiubaseinas.lt
generated       "vilniusplaukimas.lt"


### Conclusion

- The base model's output format can be inconsistent (extra text generated, spaces in the domain). The finetuned model behaves better in this respect
- The finetuned model learned to output domain names given only the business description
- Outputs often look similar or repeat when the same input is run many times (with the default temperature of 0.7). Getting more original or unique domains needs a better method, such as supplying a word as a "base" to generate from. I tried a similar approach with seed words, and it helps slightly
- The finetuned model sometimes generates domains that are well formatted but make no semantic sense or are in a different language
- Overall, the quality of the domain names is fairly similar across the different prompts and models
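
The format inconsistencies noted above could be handled with a small post-processing step. The sketch below is one possible approach, not something used in this notebook. It assumes the endpoint returns a JSON-encoded string (inferred from the quoted outputs), keeps only the last non-empty line, and then checks it against a simple ASCII hostname pattern. `clean_output`, `is_well_formed`, and `DOMAIN_RE` are hypothetical names.

```python
import json
import re

# Assumption: a "well-formed" output is a single ASCII hostname such as "example.lt".
# Names with non-ASCII letters (e.g. "č") are deliberately rejected by this simple check.
DOMAIN_RE = re.compile(r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}")


def clean_output(raw):
    """Decode a JSON-encoded string if possible and keep the last non-empty line."""
    try:
        text = json.loads(raw)
    except json.JSONDecodeError:
        text = raw  # not JSON, use the raw text as-is
    if not isinstance(text, str):
        text = str(text)
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-1].lower() if lines else ""


def is_well_formed(domain):
    """Return True if the whole string matches the ASCII hostname pattern."""
    return DOMAIN_RE.fullmatch(domain) is not None


# Raw strings shaped like the outputs printed earlier in this notebook.
samples = [
    '"ApartmentGuardians.com"',
    '"Dviračių parduotuvė ir servisas miesto centre! \\ndviratiscentras.lt"',
    '"dviračiupaslaugos.lt"',
]
for raw in samples:
    domain = clean_output(raw)
    print(f"{domain:<30} {is_well_formed(domain)}")
```

Keeping the last line relies on the model echoing the description before the domain, as in the few-shot outputs above. Outputs that fail the check could be regenerated or transliterated rather than silently dropped.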