# Use AI To Create Data for Your Apps!


<img src="https://weaviate.io/assets/images/weaviate-nav-logo-dark-f1b0e1c7144039f8231759f8ed84ee2a.svg" alt="drawing" width="200"/>


In this notebook, we'll use AI to create data that we can use to populate our applications! This is a great way to showcase or demo what is possible with the apps you create. The data you generate can be as realistic or as creative as you like, making your applications and projects even more engaging!

You'll learn how to...
1. Use OpenAI to generate JSON objects of your design
2. Create a **local Embedded Weaviate** instance defined in your code
3. Define a schema in Weaviate to vectorize specific properties within your dataset
4. Generate new content using **Weaviate's Generative Search module** through the power of **RAG**
5. Make image inferences through a **Stable Diffusion** model to create images of your JSON objects
6. Migrate your data to **Weaviate Cloud Services** to access your vectors in the cloud

Let's get started!

## Set up OpenAI API

- Ensure that you have an OpenAI API Key for the following section.
- You can get an OpenAI API Key from [OpenAI](https://openai.com/).

In [4]:
import os
import json
import random
import openai
from IPython.display import Image
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
openai.api_key = OPENAI_API_KEY


In [2]:
luxury_amenities = [
    "Personalized Butler Service",
    "Spa and Wellness Centers",
    "Private Infinity Pools",
    "Helipad or Private Jet Access",
    "Michelin Starred Restaurants",
    "Luxury Car Service",
    "Rooftop Bars or Lounges",
    "Private Beach Access",
    "Custom Bedding and Pillow Menus",
    "State-of-the-art Technology",
    "Personal Chefs",
    "24/7 Room Service",
    "Exclusive Experiences",
    "Personal Shopping Concierge",
    "Private Cinema",
    "Temperature-controlled Wine Fridges",
    "Fully-equipped Gyms and Personal Trainers",
    "Yoga and Meditation Classes",
    "Pet Services",
    "Cultural Immersion Activities"
]

We can also specify some San Francisco landmarks:

In [3]:
sf_landmarks = [
    "Golden Gate Bridge",
    "Alcatraz Island",
    "Fisherman's Wharf",
    "Cable Cars",
    "Lombard Street",
    "Union Square",
    "Coit Tower",
    "Palace of Fine Arts",
    "Golden Gate Park",
    "The Painted Ladies",
    "Ghirardelli Square",
    "Chinatown",
    "San Francisco Museum of Modern Art (SFMOMA)",
    "Ferry Building",
    "The Castro",
    "Twin Peaks",
    "Pier 39",
    "Haight-Ashbury",
    "The Mission District",
    "Transamerica Pyramid"
]

Now let's have ChatGPT generate a JSON object with fictitious details, including the randomized selections from above.

In [4]:
def generate_hotel():
    print("Generating a new hotel")
    
    # Pick one amenity and one landmark uniformly at random
    amenity = random.choice(luxury_amenities)
    landmark = random.choice(sf_landmarks)
    print(f"--- Randomized Amenity: {amenity}")
    print(f"--- Randomized Landmark: {landmark}")
    
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                 "content": f"Generate a fictitious hotel in the form of the JSON object below. Ensure these fictitious hotels are placed throughout San Francisco with several landmarks from the city referenced in the JSON object. Name the hotel something related to the following San Francisco landmark: {landmark}.  Additionally include this amenity: {amenity} in the topAmenity object referenced in the JSON Object. Here's the JSON Object template you'll populate" + ''':                                                                
                    {
                        "name": "<string>",
                        "address": "<string>",
                        "proximityToAttractions": "<string>",
                        "neighborhoodSafety": "<string>",
                        "accessibility": "<string>",
                        "nightlyRate": "<string>",
                        "hiddenFees": "<string>",
                        "type": "<string>",
                        "size": "<string>",
                        "wifi": "<string>",
                        "breakfast": "<string>",
                        "poolAndSpa": "<string>",
                        "climateControl": "<string>",
                        "topAmenity": "<string>",
                        "overallRating": "<string>",
                        "recentReview": "<string>",
                        "serviceAndStaff": "<string>",
                        "cancellation": "<string>",
                        "checkIn": "<string>",
                        "checkOut": "<string>",
                        "restaurant": "<string>",
                        "roomService": "<string>",
                        "familyFriendly": "<string>",
                        "petPolicy": "<string>",
                        "shuttleService": "<string>",
                        "parking": "<string>",
                        "safetyAndSecurity": "<string>",
                        "specialOffers": "<string>",
                        "ambianceAndDecor": "<string>",
                        "environmentalInitiatives": "<string>",
                        "businessAmenities": "<string>",
                        "viewAndSurroundings": "<string>",
                        "noiseLevel": "<string>"
                    }
    '''}])
    print("Done\n\n")
    return json.loads(completion['choices'][0]['message']['content'])
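
Because `json.loads` raises an error whenever the model adds any text around the JSON object, it can help to pull the object out of the reply before parsing. The following is a minimal sketch under that assumption; `parse_json_reply` is a hypothetical helper name, not part of the OpenAI library.

```python
import json


def parse_json_reply(reply_text):
    """Return the first JSON object found in a model reply.

    Tolerates replies that wrap the object in extra prose.
    Raises ValueError if no parseable object is found.
    """
    decoder = json.JSONDecoder()
    start = reply_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(reply_text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply_text.find("{", start + 1)
    raise ValueError("no JSON object found in reply")
```

With a helper like this, the last line of `generate_hotel` could return `parse_json_reply(completion['choices'][0]['message']['content'])` instead of calling `json.loads` directly.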

Now that we've defined our generate_hotel function, let's try it! 

In [5]:
hotel_dict = generate_hotel()
print(json.dumps(hotel_dict, indent=4))

Generating a new hotel
--- Randomized Amenity: Temperature-controlled Wine Fridges
--- Randomized Landmark: Golden Gate Park
Done


{
    "name": "Golden Gate Park Hotel",
    "address": "1234 Park Avenue, San Francisco, CA 94121",
    "proximityToAttractions": "Located in the heart of Golden Gate Park, just steps away from the California Academy of Sciences and the Japanese Tea Garden.",
    "neighborhoodSafety": "Safe neighborhood with regular police patrols and high security measures in place.",
    "accessibility": "Conveniently located near public transportation routes and major highways for easy access.",
    "nightlyRate": "$250",
    "hiddenFees": "No hidden fees.",
    "type": "Luxury boutique hotel",
    "size": "100 rooms and suites",
    "wifi": "Complimentary high-speed WiFi throughout the hotel.",
    "breakfast": "Deluxe continental breakfast included for all guests.",
    "poolAndSpa": "Outdoor pool and spa.",
    "climateControl": "Individually controlled climate in al

---

## Set up Embedded Weaviate  

Embedded Weaviate lets you start a Weaviate vector database locally, straight from your application code. The data you write into Embedded Weaviate is stored locally on disk, so it is still available if you shut your application down and start it up again later. You can learn more about Embedded Weaviate [here](https://weaviate.io/developers/weaviate/installation/embedded).

**Note**:

Embedded Weaviate does not use Weaviate Cloud Services. If you'd rather run your Weaviate cluster in the cloud, you can skip this section and move on to the next.

In [6]:
import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(
    embedded_options=EmbeddedOptions(),
    additional_headers={
        "X-OpenAI-Api-Key": OPENAI_API_KEY
    }
)


Started /Users/aj/.cache/weaviate-embedded: process ID 36880


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2023-11-02T10:47:53-07:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2023-11-02T10:47:53-07:00"}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"hotel_2PDpU1nsqatD","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-11-02T10:47:53-07:00","took":261041}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2023-11-02T10:47:53-07:00"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:6666","time":"2023-11-02T10:47:53-07:00"}


**NOTE:** Don't run the following cell unless you want to delete the Hotel class from the vector database.


In [32]:
# Delete the class only if you want to clear your database of the Hotel Schema
client.schema.delete_class('Hotel')

In [33]:
# Define the Hotel class schema, vectorized with text2vec-openai
schema = {
    "vectorizer": "text2vec-openai",
    "classes": [
        {
            "class": "Hotel",
            "vectorizer": "text2vec-openai",
            "vectorizeClassName": True,
            "description": "A hotel",
            "properties": [
                {
                    "name": "name",
                    "description": "The name of the Hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "address",
                    "description": "The address of the Hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "proximityToAttractions",
                    "description": "Proximity to nearby attractions",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "neighborhoodSafety",
                    "description": "Safety of the neighborhood the hotel is located in",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "accessibility",
                    "description": "Accessibility features and information",
                    "dataType": ["text"]
                },
                {
                    "name": "nightlyRate",
                    "description": "The rate per night for staying at the hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "hiddenFees",
                    "description": "Any undisclosed fees associated with the hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "type",
                    "description": "Type or category of the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "size",
                    "description": "Size or capacity of the hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "wifi",
                    "description": "Wi-Fi availability and features",
                    "dataType": ["text"]
                },
                {
                    "name": "breakfast",
                    "description": "Details about breakfast offerings",
                    "dataType": ["text"]
                },
                {
                    "name": "poolAndSpa",
                    "description": "The pool and spa details of the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "climateControl",
                    "description": "Climate control features in the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "topAmenity",
                    "description": "The top amenity of the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "overallRating",
                    "description": "The overall rating of the hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "recentReview",
                    "description": "The most recent review for the hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "serviceAndStaff",
                    "description": "Details about the service and staff quality",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "cancellation",
                    "description": "The cancellation policies of the hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "checkIn",
                    "description": "The check-in time and procedures",
                    "dataType": ["text"]
                },
                {
                    "name": "checkOut",
                    "description": "The check-out time and procedures",
                    "dataType": ["text"]
                },
                {
                    "name": "restaurant",
                    "description": "Information about the hotel's restaurant",
                    "dataType": ["text"]
                },
                {
                    "name": "roomService",
                    "description": "Details about room service offerings",
                    "dataType": ["text"]
                },
                {
                    "name": "familyFriendly",
                    "description": "Describes how family friendly the hotel is",
                    "dataType": ["text"]
                },
                {
                    "name": "petPolicy",
                    "description": "The pet policies of the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "shuttleService",
                    "description": "Details about any shuttle services offered",
                    "dataType": ["text"]
                },
                {
                    "name": "parking",
                    "description": "Parking options and details",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "safetyAndSecurity",
                    "description": "Measures taken for safety and security in the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "specialOffers",
                    "description": "Any special offers or discounts available",
                    "dataType": ["text"]
                },
                {
                    "name": "ambianceAndDecor",
                    "description": "Information about the ambiance and decor of the hotel",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "environmentalInitiatives",
                    "description": "Efforts taken by the hotel for environmental conservation",
                    "dataType": ["text"],
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    }
                },
                {
                    "name": "businessAmenities",
                    "description": "Business amenities of the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "viewAndSurroundings",
                    "description": "Details about the views and surroundings of the hotel",
                    "dataType": ["text"]
                },
                {
                    "name": "noiseLevel",
                    "description": "Information about noise levels in and around the hotel",
                    "dataType": ["text"]
                }
            ],
            "moduleConfig": {
                "text2vec-openai": {},
                "generative-openai": {
                    "model": "gpt-4"  
                }  # Ensure the `generative-openai` module is used for generative queries
            },
        }
    ]
}

client.schema.create(schema)


{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"hotel_2PDpU1nsqatD","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-11-01T23:31:16-07:00","took":37250}
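
The schema above repeats the same property shape many times, so one option is to build the `properties` list from a compact table. The sketch below assumes the same dictionary layout as the schema in this notebook; `hotel_property` and `PROPERTY_SPECS` are hypothetical names, and only three of the properties are listed for brevity.

```python
def hotel_property(name, description, configure_vectorizer=False):
    """Build one property entry in the same shape as the schema above."""
    prop = {"name": name, "description": description, "dataType": ["text"]}
    if configure_vectorizer:
        # Same explicit text2vec-openai block used by several properties above.
        prop["moduleConfig"] = {
            "text2vec-openai": {"skip": False, "vectorizePropertyName": False}
        }
    return prop


# (name, description, whether the explicit text2vec-openai block is attached)
PROPERTY_SPECS = [
    ("name", "The name of the Hotel", False),
    ("address", "The address of the Hotel", True),
    ("wifi", "Wi-Fi availability and features", False),
]

properties = [hotel_property(*spec) for spec in PROPERTY_SPECS]
```

The resulting list can be assigned to the `"properties"` key of the `Hotel` class before calling `client.schema.create(schema)`.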


Now we can create any number of hotels, let Weaviate create the embeddings for those objects, and store them in Weaviate for vector search and generative AI applications, all with just a few lines of code. Check it out!

In [12]:

for i in range(5):
    hotel_data = generate_hotel()
    client.data_object.create(hotel_data, "Hotel")


Generating a new hotel
--- Randomized Amenity: Helipad or Private Jet Access
--- Randomized Landmark: San Francisco Museum of Modern Art (SFMOMA)
Done


Generating a new hotel
--- Randomized Amenity: Temperature-controlled Wine Fridges
--- Randomized Landmark: Chinatown
Done


Generating a new hotel
--- Randomized Amenity: State-of-the-art Technology
--- Randomized Landmark: Ferry Building
Done


Generating a new hotel
--- Randomized Amenity: Private Infinity Pools
--- Randomized Landmark: Cable Cars
Done


Generating a new hotel
--- Randomized Amenity: Pet Services
--- Randomized Landmark: Pier 39
Done
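
Since `generate_hotel` parses the model's reply with `json.loads`, a single malformed reply would stop the loop above. The following is a rough sketch of a retry wrapper; `collect_hotels` and `fake_generate` are hypothetical names, and the stand-in generator is there only so the example runs without calling OpenAI.

```python
import json


def collect_hotels(generate, count, max_attempts=3):
    """Call `generate` until `count` hotels parse, retrying invalid replies."""
    hotels = []
    for _ in range(count):
        for attempt in range(1, max_attempts + 1):
            try:
                hotels.append(generate())
                break
            except json.JSONDecodeError as err:
                print(f"attempt {attempt} returned invalid JSON: {err.msg}")
        else:
            # The inner loop finished without a successful parse.
            raise RuntimeError(f"gave up after {max_attempts} invalid replies")
    return hotels


# Stand-in for generate_hotel(): the first reply is malformed, the rest parse.
replies = iter([
    '{"name": "Pier 39 Inn"',
    '{"name": "Pier 39 Inn"}',
    '{"name": "Ferry Building Hotel"}',
])


def fake_generate():
    return json.loads(next(replies))


hotels = collect_hotels(fake_generate, count=2)
```

In the notebook, the same wrapper could be called as `collect_hotels(generate_hotel, 5)`, with each returned hotel passed to `client.data_object.create(hotel, "Hotel")`.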






In [9]:
# The following is tuned for someone travelling for business. You can also modify the prompt to create emails for other purposes!

generate_prompt = "Explain in an email why this hotel: {name}, could be good for people travelling for business. Here are some business-related amenities: {businessAmenities}. If there are some good ways to kick back and relax, you can reference those too: {topAmenity}."

response = (
  client.query
  .get("Hotel", ["name", "address", "proximityToAttractions"])
  .with_generate(single_prompt=generate_prompt)
  .with_near_text({
    "concepts": ["business"]
  })
  .with_limit(2)
).do()

print(response)


{'data': {'Get': {'Hotel': [{'_additional': {'generate': {'error': None, 'singleResult': "Subject: Why Mission Inn is the Perfect Choice for Your Business Travel Needs\n\nDear [Recipient's Name],\n\nI hope this email finds you well. I am writing to introduce you to the Mission Inn, a hotel that perfectly caters to the needs of business travelers like yourself. \n\nOne of the key features of the Mission"}}, 'address': '123 Mission Street, San Francisco, CA 94110', 'name': 'Mission Inn', 'proximityToAttractions': 'Located in the heart of the Mission District, within walking distance to Dolores Park, 16th Street Mission BART Station, and Valencia Street'}, {'_additional': {'generate': {'error': None, 'singleResult': "Subject: Discover the Perfect Blend of Business and Leisure at Alcatraz Island Hotel\n\nDear [Recipient's Name],\n\nI hope this email finds you well. I am writing to introduce you to the Alcatraz Island Hotel, a unique destination that perfectly combines business and leisure 
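
The `{name}`-style placeholders in the prompt are filled in per object from the Hotel properties, so a misspelled placeholder is easy to miss. A small sketch for checking a prompt against the schema's property names before sending the query; `prompt_fields` and `missing_fields` are hypothetical helpers built on Python's standard `string.Formatter`.

```python
from string import Formatter


def prompt_fields(prompt):
    """Return the property names referenced as {placeholders} in a prompt."""
    return [field for _, field, _, _ in Formatter().parse(prompt) if field]


def missing_fields(prompt, schema_property_names):
    """Return placeholders that do not match any known property name."""
    known = set(schema_property_names)
    return [field for field in prompt_fields(prompt) if field not in known]


example_prompt = (
    "Explain why {name} suits business travel: {businessAmenities}. "
    "To relax, mention {topAmenity}."
)
unknown = missing_fields(example_prompt, ["name", "topAmenity"])
```

With the schema from this notebook, the list of known names could be taken from `[p["name"] for p in schema["classes"][0]["properties"]]`.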

In [10]:
for hotel in response['data']['Get']['Hotel']:
    print(hotel['_additional']['generate']['singleResult'])
    print('\n\n----------------------------------------------------\n\n')

Subject: Why Mission Inn is the Perfect Choice for Your Business Travel Needs

Dear [Recipient's Name],

I hope this email finds you well. I am writing to introduce you to the Mission Inn, a hotel that perfectly caters to the needs of business travelers like yourself. 

One of the key features of the Mission


----------------------------------------------------


Subject: Discover the Perfect Blend of Business and Leisure at Alcatraz Island Hotel

Dear [Recipient's Name],

I hope this email finds you well. I am writing to introduce you to the Alcatraz Island Hotel, a unique destination that perfectly combines business and leisure amenities to cater to the needs of our business travelers.

Our hotel is


----------------------------------------------------
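
Each result carries both an `error` and a `singleResult` field under `_additional.generate`, as the printed response shows. The following sketch collects only the successful generations; `generated_texts` is a hypothetical helper, and the error message in the sample data is invented for illustration.

```python
def generated_texts(response, class_name="Hotel"):
    """Return (name, text) pairs for results whose generation succeeded."""
    results = response.get("data", {}).get("Get", {}).get(class_name) or []
    texts = []
    for item in results:
        generate = (item.get("_additional") or {}).get("generate") or {}
        if generate.get("error"):
            print(f"{item.get('name')}: generation failed: {generate['error']}")
            continue
        if generate.get("singleResult"):
            texts.append((item.get("name"), generate["singleResult"]))
    return texts


# Sample shaped like the response above; the error text is hypothetical.
sample_response = {
    "data": {"Get": {"Hotel": [
        {
            "_additional": {"generate": {"error": None, "singleResult": "Subject: Why Mission Inn is the Perfect Choice"}},
            "name": "Mission Inn",
        },
        {
            "_additional": {"generate": {"error": "hypothetical error message", "singleResult": None}},
            "name": "Alcatraz Island Hotel",
        },
    ]}}
}

emails = generated_texts(sample_response)
```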




## Now Let's have some fun!

### Generate Descriptions

Now that we're able to generate hotels and apply Weaviate's Generative Search over the results to produce the emails above, it would be super cool to generate descriptions and imagery for a hotel too.


In [16]:
hotel_dict = generate_hotel()
print(json.dumps(hotel_dict, indent=4))
print(hotel_dict['name'])

Generating a new hotel
--- Randomized Amenity: Rooftop Bars or Lounges
--- Randomized Landmark: Coit Tower
Done


{
    "name": "Coit Tower Hotel",
    "address": "1234 Telegraph Hill Blvd, San Francisco, CA 94133",
    "proximityToAttractions": "Located near Coit Tower and Fisherman's Wharf",
    "neighborhoodSafety": "Safe neighborhood with low crime rates",
    "accessibility": "Easy access to public transportation and major highways",
    "nightlyRate": "$200",
    "hiddenFees": "No hidden fees",
    "type": "Boutique Hotel",
    "size": "Medium-sized hotel with 100 rooms",
    "wifi": "Free high-speed WiFi throughout the hotel",
    "breakfast": "Complimentary continental breakfast",
    "poolAndSpa": "Outdoor pool and spa area on the rooftop",
    "climateControl": "Individual climate control in each room",
    "topAmenity": "Rooftop Bars and Lounges",
    "overallRating": "4.5 stars",
    "recentReview": "Beautiful hotel with amazing views of the city",
    "serviceAndStaff": "F

In [7]:
description = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": f'''In 30 words or less describe the following JSON in a descriptive manner: {hotel_dict}'''}])

hotel_description = description.choices[0].message.content
print(hotel_description)

The Golden Gate Park Hotel is a luxury boutique hotel located in San Francisco. It offers elegant rooms, a variety of amenities, convenient access to attractions, and a safe neighborhood environment. The hotel provides excellent service, flexible cancellation policies, and exclusive offers for guests.
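
Two details are worth noting here: formatting `{hotel_dict}` into an f-string inserts Python's dict representation rather than JSON, and the sample reply above runs past the requested 30 words. The sketch below serializes the hotel with `json.dumps` and adds a simple word count for checking replies; `description_prompt` and `word_count` are hypothetical helper names.

```python
import json


def description_prompt(hotel, max_words=30):
    """Build the description prompt with the hotel serialized as real JSON."""
    return (
        f"In {max_words} words or less describe the following JSON "
        f"in a descriptive manner: {json.dumps(hotel)}"
    )


def word_count(text):
    """Count whitespace-separated words in a reply."""
    return len(text.split())


prompt = description_prompt({"name": "Coit Tower Hotel", "nightlyRate": "$200"})
```

A reply whose `word_count` exceeds the limit could be regenerated or trimmed, depending on how strict the application needs to be.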


In [8]:
imagery = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": f'''Write descriptive imagery behind the following JSON object to help visualize the entity: {hotel_dict}'''}])

print(imagery.choices[0].message.content)

The Golden Gate Park Hotel is a luxury boutique hotel located at 1234 Park Avenue in San Francisco, CA 94121. As you approach the hotel, you are immediately captivated by its elegant and contemporary decor with a touch of nature-inspired elements. The hotel stands tall amidst stunning views of Golden Gate Park and the surrounding cityscape.

Upon entering the hotel, you are greeted by the friendly and attentive staff who provide exceptional service throughout your stay. The lobby exudes a sense of luxury and sophistication, with temperature-controlled wine fridges lining the walls, displaying the finest selection of wines.

The hotel boasts 100 rooms and suites, all of which are soundproofed for a peaceful and relaxing stay. Each room is adorned with individually controlled climate settings, ensuring your desired comfort. The rooms are spacious and beautifully furnished with elegant and contemporary decor.

Complimentary high-speed WiFi is available throughout the hotel, allowing you t

## Set up Replicate API

- Ensure that you have a Replicate API Key for the following section. 
- You can get a Replicate API Key from [Replicate](https://replicate.com/)

In [9]:
import replicate

REPLICATE_API_KEY = os.getenv('REPLICATE_API_KEY')
replicate = replicate.Client(api_token=REPLICATE_API_KEY)

In [10]:
image_url = replicate.run(
    "stability-ai/sdxl:8beff3369e81422112d93b89ca01426147de542cd4684c244b673b105188fe5f",
    input={"prompt": f"a sleek, modern, luxury hotel room in San Francisco with a {hotel_dict['topAmenity']} and view of {hotel_dict['proximityToAttractions']}"}
)

Image(url=image_url[0], width=600, height=600) 



In [11]:
image_url2 = replicate.run(
    "stability-ai/sdxl:8beff3369e81422112d93b89ca01426147de542cd4684c244b673b105188fe5f",
    input={"prompt": imagery.choices[0].message.content}
)

Image(url=image_url2[0], width=600, height=600) 
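
The first image prompt indexes `hotel_dict` directly, which fails if the model left a key out, and the second passes several paragraphs of imagery as one prompt. The sketch below builds the prompt with fallbacks and caps its length; `shorten` and `room_prompt` are hypothetical helpers, and the 50-word cap is an arbitrary assumption rather than a documented limit of the model.

```python
def shorten(text, max_words=50):
    """Keep at most `max_words` whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def room_prompt(hotel, max_words=50):
    """Build the room prompt, tolerating keys the model may have omitted."""
    amenity = hotel.get("topAmenity", "elegant furnishings")
    view = hotel.get("proximityToAttractions", "the city skyline")
    return shorten(
        "a sleek, modern, luxury hotel room in San Francisco "
        f"with {amenity} and a view of {view}",
        max_words,
    )


prompt = room_prompt({"topAmenity": "Rooftop Bars and Lounges"})
```

The returned string could be passed as the `"prompt"` input in the `replicate.run` calls above, and `shorten` could be applied to the longer imagery text in the same way.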



## Migrate to Weaviate Cloud Services

The notebook thus far has taken you through the storage of your hotel vectors in Embedded Weaviate. Embedded Weaviate runs locally on your computer, but if you want to access the data in the cloud, you can use Weaviate Cloud Services to host a Weaviate cluster for you. Alternatively, you can deploy your own instance through Docker Compose or Kubernetes.

In the following section, we'll store your data in a Weaviate cluster. Head over to the [Weaviate Console](https://console.weaviate.cloud/), create an account, and then create a free cluster. Once it's created, you'll get an admin API key and a Weaviate cluster URL endpoint. The following code block reads them from the environment variables `WEAVIATE_API_KEY` and `WEAVIATE_CLUSTER_URL`, respectively, so add both to your `.env` file.

In [8]:
import weaviate

WEAVIATE_API_KEY = os.getenv('WEAVIATE_API_KEY')
WEAVIATE_CLUSTER_URL = os.getenv('WEAVIATE_CLUSTER_URL')
auth_config = weaviate.AuthApiKey(api_key=WEAVIATE_API_KEY)

client = weaviate.Client(
    url=WEAVIATE_CLUSTER_URL,
    auth_client_secret=auth_config,
    additional_headers={
        "X-OpenAI-Api-Key": OPENAI_API_KEY
    }
)


https://qlmrgibyt3tkorfpnlh0a.c0.us-west2.gcp.weaviate.cloud
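
`os.getenv` returns `None` when a variable is not set, which would only surface later as a confusing connection error. A minimal sketch for failing early instead; `require_env` is a hypothetical helper, not part of the Weaviate client.

```python
import os


def require_env(*names):
    """Return the named environment variables, raising if any is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return values
```

For example, `require_env("WEAVIATE_API_KEY", "WEAVIATE_CLUSTER_URL", "OPENAI_API_KEY")` could be called right after `load_dotenv()` so that a missing entry in the `.env` file is reported by name.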


Now we create the schema in the WCS cluster by calling `schema.create` on the new client.

In [50]:
# If you want to delete old classes you can uncomment the following
# client.schema.delete_class('Hotel')

client.schema.create(schema)

And now we create our Hotel objects with the following calls to `generate_hotel`!

In [None]:
for i in range(5):
    hotel_data = generate_hotel()
    client.data_object.create(hotel_data, "Hotel")