Using pydantic to push the LLM to adhere more closely to an output format, which is then parsed with LangChain's output-parsing mechanisms.

In [1]:
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_VOCAREUM_API_KEY>"  # placeholder: never commit a real key
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

In [2]:
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

In [3]:
model_name="gpt-3.5-turbo-instruct"
#model_name="gpt-4o-mini"
temperature = 0.0
llm = OpenAI(
    model_name=model_name, temperature=temperature, max_tokens=3500
)

In [4]:
class PropertyAdvertClass(BaseModel):
    location: str = Field(
        description = "location in USA including the name the neighborhood"
    )
    style: str = Field(
        description = "style of construction"
    )
    rooms: int = Field(
        description = "number of rooms"
    )
    bedrooms: int = Field(
        description = "number of bedrooms"
    )
    bathrooms: int = Field(
        description = "number of bathrooms"
    )
    floors: int = Field(
        description = "number of floors"
    )
    house_size: int = Field(
        description = "surface area in square feet"
    )
    price: int = Field(
        description = "price in dollars"
    )
    property_description : str = Field(
        description = "a detailed description of the property"
    )
    neighborhood_description : str = Field(
        description = "the neighborhood description"
    )

class ListOfAdvertsClass(BaseModel):
    adverts_list: list[PropertyAdvertClass]

```python
# Candidate extra field (not part of PropertyAdvertClass above)
complete_advert: str = Field(
    description="the complete detailed description of this advertisement, including the neighborhood location, style, rooms, bedrooms, bathrooms, floors, house_size, price, and property and neighborhood descriptions"
)
```

In [5]:
parser = PydanticOutputParser(pydantic_object=ListOfAdvertsClass)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"PropertyAdvertClass": {"properties": {"location": {"description": "location in USA including the name the neighborhood", "title": "Location", "type": "string"}, "style": {"description": "style of construction", "title": "Style", "type": "string"}, "rooms": {"description": "number of rooms", "title": "Rooms", "type": "integer"}, "bedrooms": {"description": "number of bedrooms", "title": "Bedrooms", "type": "integer"}, "bathrooms": {"description": "number of bathrooms", "title": "Bathrooms", "type": "integer"}, "floors": {"descriptio

In [6]:
gen_prompt = PromptTemplate(
    template="{question}.{context}\n{format_instructions}",
    input_variables=["question", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)

In [7]:
# Unused prompt addition: every field and description of an advertisement must be repeated in a full description stored in the 'complete_advert' property.

num_ads = 2
adverts_query = f"""
    generate {num_ads} real estate advertisements for middle-class buyers, each respecting the output schema. be creative in your descriptions but consistent and realistic.
"""

#chain = prompt | llm | parser

prompt = gen_prompt.format(question=adverts_query, context="the following is a list of properties for sale in the USA.")
print(prompt)

generated_adverts = llm.invoke(prompt)
print(generated_adverts)


    generate 2 real estate advertisements for middle-class buyers, each respecting the output schema. be creative in your descriptions but consistent and realistic.
.the following is a list of properties for sale in the USA.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"PropertyAdvertClass": {"properties": {"location": {"description": "location in USA including the name the neighborhood", "title": "Location", "type": "string"}, "style": {"description": "style of construction", "title": "Style", "type": "string"}, "rooms": {"description": "number of rooms", "title": "Rooms", "

In [8]:
generated_adverts = parser.parse(generated_adverts)
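
Completion models sometimes surround the JSON with extra prose, in which case parsing may fail. The following is a minimal sketch of a defensive pre-processing step that could be applied to the raw text first. `extract_json_object` is a hypothetical helper written for this illustration (it is not part of LangChain), and the `raw` string is a made-up sample, not actual model output.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' in `text` as JSON."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start : end + 1])


# Made-up sample: a JSON object wrapped in conversational text.
raw = (
    "Here are the adverts:\n"
    '{"adverts_list": [{"location": "Chicago, Illinois", "bedrooms": 2}]}\n'
    "Hope this helps!"
)
data = extract_json_object(raw)
print(data["adverts_list"][0]["location"])
```

The resulting dict could then be validated against `ListOfAdvertsClass`. This heuristic assumes a single top-level JSON object in the text and would break if the surrounding prose itself contains braces.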

In [9]:
print(">", generated_adverts.adverts_list[0].property_description, end="\n\n")
print(">", generated_adverts.adverts_list[0].neighborhood_description, end="\n\n")

> This charming Victorian home is located in the heart of Brooklyn. With 3 spacious bedrooms and 2 full bathrooms, this home is perfect for a growing family. The beautiful hardwood floors and original crown molding add character and charm to the home. The backyard is perfect for entertaining and the neighborhood is filled with friendly neighbors and great schools.

> The neighborhood of Brooklyn is known for its diverse community and vibrant culture. With plenty of restaurants, shops, and parks, there is always something to do. The area is also home to some of the best schools in the city, making it a great place for families.



In [10]:
import json
filename = "generated_adverts_b.jsonl"
# Write one JSON object per line (JSON Lines); the `with` block closes the file.
with open(filename, "w") as save_file:
    for advert in generated_adverts.adverts_list:
        json.dump(advert.model_dump(mode="json"), save_file)
        save_file.write('\n')

In [11]:
with open(filename, "r") as file:
    for line in file:
        data_entry = json.loads(line)
        # Process each data_entry as a Python dict
        print(data_entry)

{'location': 'Brooklyn, New York', 'style': 'Victorian', 'rooms': 6, 'bedrooms': 3, 'bathrooms': 2, 'floors': 2, 'house_size': 2000, 'price': 500000, 'property_description': 'This charming Victorian home is located in the heart of Brooklyn. With 3 spacious bedrooms and 2 full bathrooms, this home is perfect for a growing family. The beautiful hardwood floors and original crown molding add character and charm to the home. The backyard is perfect for entertaining and the neighborhood is filled with friendly neighbors and great schools.', 'neighborhood_description': 'The neighborhood of Brooklyn is known for its diverse community and vibrant culture. With plenty of restaurants, shops, and parks, there is always something to do. The area is also home to some of the best schools in the city, making it a great place for families.'}
{'location': 'Chicago, Illinois', 'style': 'Craftsman', 'rooms': 5, 'bedrooms': 2, 'bathrooms': 1, 'floors': 1, 'house_size': 1500, 'price': 350000, 'property_des

In [12]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

In [13]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

In [14]:
from uuid import uuid4
from langchain_core.documents import Document

documents = []
for i, advert in enumerate(generated_adverts.adverts_list, start=1):
    metadata = {}
    metadata["source"] = "generated_adverts"
    metadata["id"] = i
    metadata["location"] = advert.location
    metadata["style"] = advert.style
    metadata["rooms"] = advert.rooms
    metadata["bedrooms"] = advert.bedrooms
    metadata["bathrooms"] = advert.bathrooms
    metadata["floors"] = advert.floors
    metadata["house_size"] = advert.house_size
    metadata["price"] = advert.price

    page_content = advert.property_description + advert.neighborhood_description
    documents.append(
        Document(page_content=page_content, metadata=metadata)
        )
    
uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)

['f472494c-a48a-4c2d-b672-a39109a574a8',
 '2f1b00ce-7d5e-43cf-b80d-2e190fa4d2d9']
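
Because the collection is persisted to disk and the ids above are random, re-running the notebook would likely store the same adverts again under new ids. One possible alternative, sketched below, is to derive each id from the document content so that identical adverts always get identical ids. `stable_id` is a hypothetical helper, the sample texts are shortened placeholders, and whether re-adding an existing id overwrites the entry depends on the vector store (assumed here, not verified).

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(source: str, text: str) -> str:
    """Derive a deterministic UUID (version 5) from a source label and the text."""
    return str(uuid5(NAMESPACE_URL, f"{source}:{text}"))


# Shortened placeholder texts standing in for each document's page_content.
texts = [
    "This charming Victorian home is located in the heart of Brooklyn.",
    "This cozy Craftsman home is located in the quiet suburbs of Chicago.",
]
ids = [stable_id("generated_adverts", text) for text in texts]
print(ids)
```

These ids could be passed as `ids=` in place of the `uuid4` values used above.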

In [15]:
request = "I am looking for a nice town at Chicago with at least 2 bedrooms"

In [16]:
results = vector_store.similarity_search(
    request,
    k=2,
    filter={"source": "generated_adverts"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* This cozy Craftsman home is located in the quiet suburbs of Chicago. With 2 bedrooms and 1 bathroom, this home is perfect for a small family or a couple looking to downsize. The open floor plan and large windows allow for plenty of natural light, making the home feel spacious and inviting. The backyard is perfect for gardening and the neighborhood is peaceful and safe.The suburbs of Chicago offer a peaceful and family-friendly atmosphere. With plenty of parks and community events, there is always something to do. The area is also known for its great schools and low crime rates, making it a great place to raise a family. [{'bedrooms': 2, 'price': 350000, 'house_size': 1500, 'rooms': 5, 'style': 'Craftsman', 'id': 2, 'bathrooms': 1, 'source': 'generated_adverts', 'floors': 1, 'location': 'Chicago, Illinois'}]
* This charming Victorian home is located in the heart of Brooklyn. With 3 spacious bedrooms and 2 full bathrooms, this home is perfect for a growing family. The beautiful hardwoo
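
Semantic search ranks documents by meaning but does not enforce numeric constraints such as "at least 2 bedrooms". Since the counts are stored in the metadata, they can be checked explicitly. The sketch below post-filters plain metadata dicts copied from the two adverts above; `meets_minimums` is a hypothetical helper. Chroma's `filter` argument also accepts comparison operators such as `$gte`, which would move this check into the store itself.

```python
def meets_minimums(metadata: dict, minimums: dict) -> bool:
    """True if every key in `minimums` is present in `metadata` with a value >= the minimum."""
    return all(metadata.get(key, 0) >= value for key, value in minimums.items())


# Metadata values taken from the two generated adverts shown above.
candidates = [
    {"id": 2, "location": "Chicago, Illinois", "bedrooms": 2, "bathrooms": 1, "price": 350000},
    {"id": 1, "location": "Brooklyn, New York", "bedrooms": 3, "bathrooms": 2, "price": 500000},
]

at_least_two = [m["id"] for m in candidates if meets_minimums(m, {"bedrooms": 2})]
at_least_three = [m["id"] for m in candidates if meets_minimums(m, {"bedrooms": 3})]
print(at_least_two, at_least_three)
```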

In [17]:
results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=2, filter={"source": "generated_adverts"}
)
for res, score in results:
    # The score returned here is a distance: lower means more similar.
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")

* [SIM=1.830079] This cozy Craftsman home is located in the quiet suburbs of Chicago. With 2 bedrooms and 1 bathroom, this home is perfect for a small family or a couple looking to downsize. The open floor plan and large windows allow for plenty of natural light, making the home feel spacious and inviting. The backyard is perfect for gardening and the neighborhood is peaceful and safe.The suburbs of Chicago offer a peaceful and family-friendly atmosphere. With plenty of parks and community events, there is always something to do. The area is also known for its great schools and low crime rates, making it a great place to raise a family. [{'source': 'generated_adverts', 'id': 2, 'rooms': 5, 'price': 350000, 'bedrooms': 2, 'location': 'Chicago, Illinois', 'floors': 1, 'bathrooms': 1, 'style': 'Craftsman', 'house_size': 1500}]
* [SIM=1.859442] This charming Victorian home is located in the heart of Brooklyn. With 3 spacious bedrooms and 2 full bathrooms, this home is perfect for a growing
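
The values printed above are distances, so the high numbers for an unrelated question indicate a poor match. As a rough aid to interpretation, the sketch below converts them to cosine similarity. It rests on two assumptions that are not confirmed by this notebook: the collection uses squared L2 distance, and the embedding vectors have unit length, in which case distance = 2 - 2 * cosine. `cosine_from_squared_l2` is a hypothetical helper.

```python
def cosine_from_squared_l2(distance: float) -> float:
    """For unit-length vectors a, b: ||a - b||^2 = 2 - 2 * (a . b)."""
    return 1.0 - distance / 2.0


# Scores copied from the output above (assumed to be squared L2 distances).
scores = [1.830079, 1.859442]
cosines = [cosine_from_squared_l2(s) for s in scores]
for score, cosine in zip(scores, cosines):
    print(f"distance={score:.6f} -> cosine={cosine:.4f}")
```

Under these assumptions both cosine values come out below 0.1, which is consistent with the weather question being unrelated to either advert.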

In [18]:
results = vector_store.similarity_search_by_vector(
    embedding=embeddings.embed_query(request), k=2
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* This cozy Craftsman home is located in the quiet suburbs of Chicago. With 2 bedrooms and 1 bathroom, this home is perfect for a small family or a couple looking to downsize. The open floor plan and large windows allow for plenty of natural light, making the home feel spacious and inviting. The backyard is perfect for gardening and the neighborhood is peaceful and safe.The suburbs of Chicago offer a peaceful and family-friendly atmosphere. With plenty of parks and community events, there is always something to do. The area is also known for its great schools and low crime rates, making it a great place to raise a family. [{'id': 2, 'style': 'Craftsman', 'rooms': 5, 'floors': 1, 'house_size': 1500, 'source': 'generated_adverts', 'bathrooms': 1, 'bedrooms': 2, 'price': 350000, 'location': 'Chicago, Illinois'}]
* This charming Victorian home is located in the heart of Brooklyn. With 3 spacious bedrooms and 2 full bathrooms, this home is perfect for a growing family. The beautiful hardwoo