# Get a "sentiment" or a description of the neighborhood, and even a suggestion for which demographic group it would suit best
Here we run trials with ChatGPT (gpt-3.5-turbo-0125).  
The API key is the one from Constructor Academy, which we were allowed to use during the DeepLearning Challenge.

# **Warning: to run this notebook you need an OpenAI API key!**

In [1]:
import json
import os
import openai
import locale
import yaml

# Get and Load Data
The data files here are sample files generated by a Google Places API query.   
The files are in JSON format.   
To use them with this script in Colab, first upload them to the Colab environment (from .\COMPARIS-REPO\data\raw\address_examples).

In [2]:
def load_data(path):
    try: 
        with open(path, 'r') as file:
            data = json.load(file)
        return data
    except Exception as e:
        print(f"An error has occurred: {e}")
        
def get_file_names(directory):
    return [
        os.path.join(directory, file) for file in os.listdir(directory) if file.endswith('.json')
    ]

In [3]:
directory = r"../data/google_data_isochrone_pop_cgpt"
FILES = get_file_names(directory)
FILES

['../data/google_data_isochrone_pop_cgpt\\Ex1_8004_Zurich_Werdgartengasse_4.json',
 '../data/google_data_isochrone_pop_cgpt\\Ex2_3027_Bern_Colombstrasse_39.json',
 '../data/google_data_isochrone_pop_cgpt\\Ex3_1006_Lausanne_Av_d_Ouchy_58.json',
 '../data/google_data_isochrone_pop_cgpt\\Ex4_8355_Aadorf_Bruggwiesenstrasse_5.json',
 '../data/google_data_isochrone_pop_cgpt\\Ex5_6319_Allenwinden_Winzruti_39.json',
 '../data/google_data_isochrone_pop_cgpt\\Ex6_8005_Zurich_Heinrichstrasse_200.json',
 '../data/google_data_isochrone_pop_cgpt\\Ex7_8003_Zurich_Birmensdorferstrasse_108.json']

# Explore data file structure

In [4]:
# example with file index 5 (8005 Zürich, Heinrichstrasse 200):
data = load_data(FILES[5])

In [5]:
# top-level keys of the data file
data.keys()

dict_keys(['original_address', 'facilities', 'isochrone', 'population', 'text_description'])

In [6]:
data['facilities'].keys()

dict_keys(['bars', 'restaurants', 'kindergarten', 'public_transportation', 'gym_fitness', 'grocery_stores_supermarkets', 'gas_ev_charging', 'schools'])

In [7]:
data['facilities']['restaurants'].keys()

dict_keys(['data', 'count', 'average_rating', 'closest'])

# Describe neighborhood

In [8]:
'Total restaurants = ' + str(data['facilities']['restaurants']['count']) + ', the closest one is "' + data['facilities']['restaurants']['closest']['name'] +'", ' +data['facilities']['restaurants']['closest']['travel_time'] + ' away by foot'

'Total restaurants = 15, the closest one is "Don Weber", 1 min away by foot'

In [9]:
data['facilities']['restaurants']['closest']['name'] +'   ' +data['facilities']['restaurants']['closest']['travel_time']

'Don Weber   1 min'

In [10]:
# describe the neighborhood by the count and the closest instance of each facility type
def describe_neighborhood(data):
  Neighborhood ='The neighborhood comprises following facilities within 10 min walking distance: '
  for facility in data['facilities'].keys():
    # facility types without any instance are left out of the description
    if data['facilities'][facility]['count']!=0:
      Neighborhood +=  facility + ': ' + str(data['facilities'][facility]['count'])+' (closest one: ' + data['facilities'][facility]['closest']['name'] +', ' +data['facilities'][facility]['closest']['travel_time'] +'), '

  return Neighborhood

print(describe_neighborhood(data))

The neighborhood comprises following facilities within 10 min walking distance: bars: 13 (closest one: Don Weber, 1 min), restaurants: 15 (closest one: Don Weber, 1 min), kindergarten: 16 (closest one: Hotel Züri by Fassbind, 4 mins), public_transportation: 37 (closest one: Löwenbräu, 3 mins), gym_fitness: 16 (closest one: Body Mind Coaching, 3 mins), grocery_stores_supermarkets: 21 (closest one: Berg und Tal Viadukt, 1 min), gas_ev_charging: 12 (closest one: Rigoni & Co, 1 min), schools: 4 (closest one: Schulhaus Schütze, 3 mins), 
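
The printed description ends with a dangling `, ` because every entry appends its own separator. A minimal sketch of a variant that collects the entries in a list and joins them, assuming the same `facilities` layout as shown above. The name `describe_neighborhood_joined` and the sample dictionary are hypothetical and only serve as illustration:

```python
def describe_neighborhood_joined(data):
    """Variant of describe_neighborhood without the trailing separator."""
    parts = []
    for facility, info in data['facilities'].items():
        if info['count'] == 0:
            continue  # skip facility types that are absent from the area
        closest = info['closest']
        parts.append(
            f"{facility}: {info['count']} "
            f"(closest one: {closest['name']}, {closest['travel_time']})"
        )
    return ('The neighborhood comprises following facilities within 10 min walking distance: '
            + ', '.join(parts))


# Made-up sample in the same shape as the JSON files (not real data)
sample = {'facilities': {
    'bars': {'count': 2, 'closest': {'name': 'Bar A', 'travel_time': '1 min'}},
    'schools': {'count': 0, 'closest': None},
}}
print(describe_neighborhood_joined(sample))
```

The sentence itself is kept identical to the original so that prompts built from it stay comparable.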


# Format the reviews for the prompt

## All reviews for ONE restaurant

In [11]:
# info available for one specific facility
data['facilities']['restaurants']['data'][0].keys()

dict_keys(['place_id', 'name', 'rating', 'num_ratings', 'vicinity', 'location', 'reviews', 'num_reviews', 'url', 'travel_time'])

In [12]:
# info available for one specific review
data['facilities']['restaurants']['data'][0]['reviews'][0].keys()

dict_keys(['author_name', 'author_url', 'language', 'original_language', 'profile_photo_url', 'rating', 'relative_time_description', 'text', 'time', 'translated'])

In [13]:
Review_0 = ''
for review_i, review in enumerate(data['facilities']['restaurants']['data'][0]['reviews']):
  Review_0 += 'Review#: ' + str(review_i) + ' / Rating: ' + str(review['rating']) + ' / Text: ' + str(review['text']) + '\n'
print(Review_0)

Review#: 0 / Rating: 4 / Text: At CLOUDS, the view is the real star. The brunch on Sunday was diverse and varied.
The quality of the food was delicious and flavoursome. Not overcrowded, the selection was suitable.
The service was very attentive, courteous and friendly.
Unfortunately, we had to wait a very long time at the start before we were shown to our seats, which could be improved.
Review#: 1 / Rating: 4 / Text: Although the restaurant emulates the "in the clouds" concept, I was slightly disappointed in the quality of food and atmosphere. The best thing consume was the lentils salad, then was followed by subpar dishes that lacked the momentum of the starter. However, the venue's photo worthy placement and accommodating service complimented the reservation.
Review#: 2 / Rating: 2 / Text: Disappointing Tourist Trap

Our visit for a celebratory dinner turned out to be regrettable.

Despite the steep prices, the service fell short of expectations. From wine being forgotten to be serve

## All reviews for ALL restaurants

In [14]:
Reviews_all = ''

for restaurant in data['facilities']['restaurants']['data']:
  Reviews = ''

  for review_i, review in enumerate(restaurant['reviews']):
    # flatten multi-line review texts onto a single line
    review_text = review['text'].replace('\r','').replace('\n', ' ').strip()
    Reviews +=  'Review#: ' + str(review_i) + ' / Rating: ' + str(review['rating']) + ' / Text: '+ review_text + '\n'

  Reviews_all += (
    'Restaurant "'+ str(restaurant['name'])+
    '" has a rating of ' + str(restaurant['rating']) +
    ' based on ' +str(restaurant['num_ratings']) + ' single ratings' +
    ' and ' +str(restaurant['num_reviews']) + ' reviews. \n-----\n' +
    Reviews +  '\n****************************************\n'
  )



print(Reviews_all)

Restaurant "CLOUDS" has a rating of 4 based on 1954 single ratings and 5 reviews. 
-----
Review#: 0 / Rating: 4 / Text: At CLOUDS, the view is the real star. The brunch on Sunday was diverse and varied. The quality of the food was delicious and flavoursome. Not overcrowded, the selection was suitable. The service was very attentive, courteous and friendly. Unfortunately, we had to wait a very long time at the start before we were shown to our seats, which could be improved.
Review#: 1 / Rating: 4 / Text: Although the restaurant emulates the "in the clouds" concept, I was slightly disappointed in the quality of food and atmosphere. The best thing consume was the lentils salad, then was followed by subpar dishes that lacked the momentum of the starter. However, the venue's photo worthy placement and accommodating service complimented the reservation.
Review#: 2 / Rating: 2 / Text: Disappointing Tourist Trap  Our visit for a celebratory dinner turned out to be regrettable.  Despite the st

## All reviews of the whole neighborhood (i.e. ALL instances of ALL facility types)

In [15]:
def concatenate_reviews_whole_neigborhood(data: dict, min: int) -> str:
    """
    Takes the parsed JSON data of one neighborhood and formats all its reviews (if any) as readable text.
    For a facility with min or fewer reviews, the reviews are not listed.
    Arguments:
    - data: dict loaded from one of the JSON files
    - min: review-count threshold; reviews are listed only for facilities with more than min reviews
    """

    Reviews_Neighborhood =''

    for facility_type in data['facilities'].keys():
        # check if there is at least one entry for this facility type
        if len(data['facilities'][facility_type]['data'])==0:
            Reviews_facility = '\n\n'+facility_type + '\n###################################################\n' + 'There is no ' + facility_type + ' in the area.\n'
        else:
            Reviews_facility = '\n\n'+facility_type + '\n###################################################\n'

            # Loop through each instance of one facility type:
            for facility_i in range(len(data['facilities'][facility_type]['data'])):
                num_reviews = data['facilities'][facility_type]['data'][facility_i]['num_reviews']
                num_ratings = data['facilities'][facility_type]['data'][facility_i]['num_ratings']

                if (type(num_reviews)==str) or (num_reviews <= min):
                    # case if not enough reviews, and also not enough ratings:
                    if (type(num_ratings)==str) or (num_ratings <= min):
                        Reviews_facility += ('****************************************\n'+
                        facility_type + ' "'+ str(data['facilities'][facility_type]['data'][facility_i]['name'])+
                        '" has not yet been rated or reviewed sufficiently (<'+str(min)+').\n')

                    # case if not enough reviews, but enough ratings:
                    else:
                        Reviews_facility += ('****************************************\n'+
                        facility_type + ' "'+  str(data['facilities'][facility_type]['data'][facility_i]['name'])+
                        '" has a rating of ' + str(data['facilities'][facility_type]['data'][facility_i]['rating']) +
                        ' based on ' +         str(data['facilities'][facility_type]['data'][facility_i]['num_ratings']) +
                        ' single ratings, but not enough text reviews are available(<' +str(min)+').\n')

                # only if enough reviews we go through the loop to aggregate the reviews:
                else:
                    Reviews_facility += ('****************************************\n'+
                    facility_type + ' "'+  str(data['facilities'][facility_type]['data'][facility_i]['name'])+
                    '" has a rating of ' + str(data['facilities'][facility_type]['data'][facility_i]['rating']) +
                    ' based on ' +         str(data['facilities'][facility_type]['data'][facility_i]['num_ratings']) + ' single ratings' +
                    ' and ' +              str(data['facilities'][facility_type]['data'][facility_i]['num_reviews']) + ' reviews. \n-----\n'    )

                    Reviews = ''
                    for review_i in range(len(data['facilities'][facility_type]['data'][facility_i]['reviews'])):
                        review_text = data['facilities'][facility_type]['data'][facility_i]['reviews'][review_i]['text'].replace('\r','').replace('\n', ' ').strip()
                        Reviews +=  'Review#: ' + str(review_i) + ' / Rating: ' + str(data['facilities'][facility_type]['data'][facility_i]['reviews'][review_i]['rating']) + ' / Text: '+ review_text + '\n'

                    Reviews_facility += Reviews
        Reviews_Neighborhood +=  Reviews_facility
    return Reviews_Neighborhood

In [16]:
print(concatenate_reviews_whole_neigborhood(data, 3))




bars
###################################################
****************************************
bars "CLOUDS" has a rating of 4 based on 1954 single ratings and 5 reviews. 
-----
Review#: 0 / Rating: 4 / Text: At CLOUDS, the view is the real star. The brunch on Sunday was diverse and varied. The quality of the food was delicious and flavoursome. Not overcrowded, the selection was suitable. The service was very attentive, courteous and friendly. Unfortunately, we had to wait a very long time at the start before we were shown to our seats, which could be improved.
Review#: 1 / Rating: 4 / Text: Although the restaurant emulates the "in the clouds" concept, I was slightly disappointed in the quality of food and atmosphere. The best thing consume was the lentils salad, then was followed by subpar dishes that lacked the momentum of the starter. However, the venue's photo worthy placement and accommodating service complimented the reservation.
Review#: 2 / Rating: 2 / Text: Disappointing 

# Interact with ChatGPT

## Load OpenAI API Credentials


In [17]:
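# make locale report UTF-8 as the preferred encoding
locale.getpreferredencoding = lambda: "UTF-8"

# credentials.json is parsed with the YAML loader, which also accepts typical JSON documents
with open('credentials.json', 'r') as file:
    api_creds = yaml.safe_load(file)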

openai.api_key = api_creds['openai_key']
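
The key can also be kept out of the repository by reading it from an environment variable. A minimal sketch, assuming the key is exported as `OPENAI_API_KEY` and falling back to the credentials file used above. The variable name and the helper `read_api_key` are assumptions, not part of the original notebook:

```python
import json
import os


def read_api_key(env_var="OPENAI_API_KEY", path="credentials.json"):
    """Return the API key from the environment, else from the credentials file."""
    key = os.environ.get(env_var)
    if key:
        return key
    if os.path.exists(path):
        with open(path, 'r') as file:
            # same 'openai_key' entry as in the cell above
            return json.load(file)['openai_key']
    raise RuntimeError(f"No API key found in ${env_var} or {path}")
```

It would be used as `openai.api_key = read_api_key()`.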

## PROMPTING


### Classify based on List of facilities


In [18]:
def classify_based_on_Composition(FILE, categories):

    text = describe_neighborhood(FILE)

    # Define the message for classification
    messages = [
    {"role": "system", "content": "You are an assistant for text classification tasks. The text you are given refers to the number of several facilities which are located in an area of radius 1km around a given real estate property."},
    {"role": "user", "content": f"""
    Find out for which of these following categories the area would be best suited. \nCategories: \'{categories}\'.
    Text: \'{text}\'. 
    Return: 1) the category, 2) two sentences to explain why you chose this category,  3) two sentences to explain why you exclude the other category.
    Your output must follow this structure (1), 2), 3)).
    """}
    ]

    # Call the OpenAI chat completion endpoint (interface of the openai package before version 1.0)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0125",  # recommended by Adriano instead of gpt-3.5-turbo (because of token usage)
        messages=messages,
        max_tokens=1000,  # upper limit for the length of the answer; adjust as needed
        n=1,
        stop=None,
        temperature=0  # temperature 0 keeps the results as reproducible as possible
    )

    # Extract the classification result from the response
    classification_result = response.choices[0].message['content'].strip()

    return FILE['original_address']['address']+ '\n--------\nPrompt:\n' + messages[0]['content'] + messages[1]['content'] + '--------\nAnswer: ' +classification_result
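
`openai.ChatCompletion.create` belongs to the `openai` Python package before version 1.0. From version 1.0 on, the same request goes through a client object. A minimal sketch, assuming such a client is passed in; the helper name `chat_once` is hypothetical:

```python
def chat_once(client, messages, model="gpt-3.5-turbo-0125", max_tokens=1000):
    """Send one chat request through an openai>=1.0 style client and return the answer text."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        temperature=0,
    )
    return response.choices[0].message.content.strip()


# Usage, assuming openai>=1.0 is installed and the credentials are loaded as above:
# from openai import OpenAI
# client = OpenAI(api_key=api_creds['openai_key'])
# print(chat_once(client, messages))
```

Passing the client as an argument keeps the prompt-building code independent of the installed package version.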


In [19]:
categories = "A. Party people, B. Calm and silence loving community"
print(classify_based_on_Composition(data,categories))

8005 Zürich, Heinrichstrasse 200
--------
Prompt:
You are an assistant for text classification tasks. The text you are given refers to the number of several facilities which are located in an area of radius 1km around a given real estate property.
    Find out for which of these following categories the area would be best suited. 
Categories: 'A. Party people, B. Calm and silence loving community'.
    Text: 'The neighborhood comprises following facilities within 10 min walking distance: bars: 13 (closest one: Don Weber, 1 min), restaurants: 15 (closest one: Don Weber, 1 min), kindergarten: 16 (closest one: Hotel Züri by Fassbind, 4 mins), public_transportation: 37 (closest one: Löwenbräu, 3 mins), gym_fitness: 16 (closest one: Body Mind Coaching, 3 mins), grocery_stores_supermarkets: 21 (closest one: Berg und Tal Viadukt, 1 min), gas_ev_charging: 12 (closest one: Rigoni & Co, 1 min), schools: 4 (closest one: Schulhaus Schütze, 3 mins), '. 
    Return: 1) the category, 2) two sentences

#### example results

In [None]:
# 8005 Zürich, Heinrichstrasse 200
# --------
# Prompt:
# You are an assistant for text classification tasks. The text you are given refers to the number of several facilities which are located in an area of radius 1km around a given real estate property.
#     Find out for which of these following categories the area would be best suited. 
# Categories: 'A. Party people, B. Calm and silence loving community'.
#     Text: 'The neighborhood comprises following facilities within 10 min walking distance: bars: 13 (closest one: Don Weber, 1 min), restaurants: 15 (closest one: Don Weber, 1 min), kindergarten: 16 (closest one: Hotel Züri by Fassbind, 4 mins), public_transportation: 37 (closest one: Löwenbräu, 3 mins), gym_fitness: 16 (closest one: Body Mind Coaching, 3 mins), grocery_stores_supermarkets: 21 (closest one: Berg und Tal Viadukt, 1 min), gas_ev_charging: 12 (closest one: Rigoni & Co, 1 min), schools: 4 (closest one: Schulhaus Schütze, 3 mins), '. 
#     Return: 1) the category, 2) two sentences to explain why you chose this category,  3) two sentences to explain why you exclude the other category.
#     Your output must follow this structure (1), 2), 3)).
#     --------
# Answer: 1) Category: A. Party people
# 2) Explanation for choosing this category: The area is well-suited for party people due to the high number of bars (13) and restaurants (15) within a 10-minute walking distance. Additionally, the presence of a gym_fitness facility (16) suggests an active and social lifestyle, which aligns with the preferences of party people.
# 3) Explanation for excluding the other category (B. Calm and silence loving community): The high number of bars, restaurants, and other facilities such as public transportation (37) and grocery stores (21) indicates a lively and bustling environment, which may not be ideal for a calm and silence-loving community seeking a tranquil living space away from noise and activity.
# 3027 Bern, Colombstrasse 39
# --------
# Prompt:
# You are an assistant for text classification tasks. The text you are given refers to the number of several facilities which are located in an area of radius 1km around a given real estate property.
#     Find out for which of these following categories the area would be best suited. 
# Categories: 'A. Party people, B. Calm and silence loving community'.
#     Text: 'The neighborhood comprises following facilities within 10 min walking distance: bars: 4 (closest one: Bits & Bites Bern, 5 mins), restaurants: 20 (closest one: Le Bistro - Westside, 3 mins), kindergarten: 20 (closest one: Holiday Inn Bern - Westside, an IHG Hotel, 5 mins), public_transportation: 22 (closest one: Gäbelbach, 3 mins), gym_fitness: 14 (closest one: Fitness meets Beauty, 1 min), grocery_stores_supermarkets: 18 (closest one: Migros-Supermarkt - Bern - Westside, 4 mins), gas_ev_charging: 12 (closest one: Holiday Inn Bern - Westside, an IHG Hotel, 5 mins), schools: 3 (closest one: Clubhaus (Buvette) FC Bethlehem, 4 mins), '. 
#     Return: 1) the category, 2) two sentences to explain why you chose this category,  3) two sentences to explain why you exclude the other category.
#     Your output must follow this structure (1), 2), 3)).
#     --------
# Answer: 1) Category: A. Party people
# 2) Explanation: The area is well-suited for party people as it has a high number of bars (4) and restaurants (20) within a 10-minute walking distance. Additionally, there are gyms and grocery stores nearby, indicating a lively and convenient lifestyle that would appeal to party people.
# 3) Explanation: The category of Calm and silence loving community is excluded because the area has a high number of bars, restaurants, and other facilities that cater to a more vibrant and active lifestyle. The presence of these amenities suggests a bustling and lively environment, which may not be ideal for those seeking calm and silence.
# 1006 Lausanne, Av. d'Ouchy 58
# --------
# Prompt:
# You are an assistant for text classification tasks. The text you are given refers to the number of several facilities which are located in an area of radius 1km around a given real estate property.
#     Find out for which of these following categories the area would be best suited. 
# Categories: 'A. Party people, B. Calm and silence loving community'.
#     Text: 'The neighborhood comprises following facilities within 10 min walking distance: bars: 12 (closest one: White Horse, 1 min), restaurants: 12 (closest one: Takayama. Sushi bar & restaurant, 1 min), kindergarten: 22 (closest one: ImmoStreet.ch SA, 1 min), public_transportation: 28 (closest one: ImmoStreet.ch SA, 1 min), gym_fitness: 15 (closest one: ImmoStreet.ch SA, 1 min), grocery_stores_supermarkets: 15 (closest one: ImmoStreet.ch SA, 1 min), gas_ev_charging: 14 (closest one: ImmoStreet.ch SA, 1 min), schools: 5 (closest one: Formasuisse, Formations Rh, Management, Certificat Rh, 2 mins), '. 
#     Return: 1) the category, 2) two sentences to explain why you chose this category,  3) two sentences to explain why you exclude the other category.
#     Your output must follow this structure (1), 2), 3)).
#     --------
# Answer: 1) Category: A. Party people
# 2) Explanation: The area is well-suited for party people as it has a high number of bars (12) and restaurants (12) within a 1-minute walking distance. Additionally, there are facilities like gyms, grocery stores, and gas stations nearby, catering to the needs of an active and social community.
# 3) Explanation: The category of Calm and silence loving community is excluded because the area has a high number of bars, restaurants, and other facilities that indicate a lively and bustling environment, which may not be ideal for those seeking peace and quiet. The presence of a kindergarten and schools also suggests a more vibrant and active neighborhood rather than a tranquil one.
# 8355 Aadorf, Bruggwiesenstrasse 5
# --------
# Prompt:
# You are an assistant for text classification tasks. The text you are given refers to the number of several facilities which are located in an area of radius 1km around a given real estate property.
#     Find out for which of these following categories the area would be best suited. 
# Categories: 'A. Party people, B. Calm and silence loving community'.
#     Text: 'The neighborhood comprises following facilities within 10 min walking distance: bars: 3 (closest one: RotFarbKeller, 5 mins), restaurants: 5 (closest one: Ristorante El Capone, 3 mins), kindergarten: 28 (closest one: System-Clinch Telecom GmbH, 4 mins), public_transportation: 15 (closest one: System-Clinch Telecom GmbH, 4 mins), gym_fitness: 18 (closest one: System-Clinch Telecom GmbH, 4 mins), grocery_stores_supermarkets: 15 (closest one: System-Clinch Telecom GmbH, 4 mins), gas_ev_charging: 16 (closest one: System-Clinch Telecom GmbH, 4 mins), '. 
#     Return: 1) the category, 2) two sentences to explain why you chose this category,  3) two sentences to explain why you exclude the other category.
#     Your output must follow this structure (1), 2), 3)).
#     --------
# Answer: 1) Category: A. Party people
# 2) Explanation: The area seems best suited for party people because it has a high number of bars (3) and restaurants (5) within a short walking distance, indicating a vibrant nightlife scene. Additionally, the presence of a gym and grocery stores suggests convenience for residents who enjoy an active social life.
# 3) Explanation: The category of Calm and silence loving community is excluded because the high number of bars, restaurants, and other facilities catering to social activities may result in increased noise levels and activity, which may not be ideal for individuals seeking a quiet and peaceful environment.
# 6319 Allenwinden, Winzrüti 39
# --------
# Prompt:
# You are an assistant for text classification tasks. The text you are given refers to the number of several facilities which are located in an area of radius 1km around a given real estate property.
#     Find out for which of these following categories the area would be best suited. 
# Categories: 'A. Party people, B. Calm and silence loving community'.
#     Text: 'The neighborhood comprises following facilities within 10 min walking distance: restaurants: 1 (closest one: Gasthaus Löwen, 9 mins), kindergarten: 18 (closest one: Elektrizitäts-Genossenschaft, 4 mins), public_transportation: 14 (closest one: Elektrizitäts-Genossenschaft, 4 mins), gym_fitness: 9 (closest one: Elektrizitäts-Genossenschaft, 4 mins), grocery_stores_supermarkets: 9 (closest one: Elektrizitäts-Genossenschaft, 4 mins), gas_ev_charging: 9 (closest one: Elektrizitäts-Genossenschaft, 4 mins), '. 
#     Return: 1) the category, 2) two sentences to explain why you chose this category,  3) two sentences to explain why you exclude the other category.
#     Your output must follow this structure (1), 2), 3)).
#     --------
# Answer: 1) The area would be best suited for category B. Calm and silence loving community.
# 2) I chose this category because the neighborhood has facilities like a kindergarten, grocery stores, and gas stations, which are essential for families and individuals looking for a peaceful and quiet environment. Additionally, the presence of public transportation and a gym suggests a focus on convenience and well-being rather than a party atmosphere.
# 3) I exclude category A. Party people because the area lacks facilities like bars, clubs, and entertainment venues typically preferred by individuals seeking a vibrant nightlife. Additionally, the high number of kindergartens and proximity to public transportation indicate a family-friendly and residential-focused environment rather than a party-centric one.
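
Because the prompt asks for a fixed `1), 2), 3)` structure, the answers above can be split into fields, for example to compare the chosen category across addresses. A minimal sketch using a regular expression. The helper name `parse_answer` is hypothetical, and it assumes each numbered item starts on its own line, as in the recorded answers:

```python
import re


def parse_answer(answer):
    """Split a '1) ... 2) ... 3) ...' answer into category letter and explanations."""
    # Split at line starts that look like "1) ", "2) " or "3) "
    parts = [p.strip() for p in re.split(r'(?m)^\s*[123]\)\s*', answer) if p.strip()]
    if len(parts) != 3:
        raise ValueError(f"Expected 3 numbered items, found {len(parts)}")
    # The category letter is the first single capital followed by a dot, e.g. "A."
    match = re.search(r'\b([A-Z])\.\s', parts[0])
    return {
        'category': match.group(1) if match else None,
        'why_chosen': parts[1],
        'why_excluded': parts[2],
    }


# Shortened, made-up answer in the same format as the recorded ones
example = ("1) Category: A. Party people\n"
           "2) Explanation: many bars nearby.\n"
           "3) Explanation: too lively for a quiet community.")
print(parse_answer(example)['category'])
```

The model does not always phrase item 1) the same way (compare the Allenwinden answer), which is why the sketch searches for the category letter instead of matching a fixed prefix.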

### Summarize the atmosphere of the neighborhood based on all reviews

In [20]:
def summarize_based_on_all_reviews(FILE):

    text = concatenate_reviews_whole_neigborhood(FILE, min= 3)

    # Define the messages for summarization
    messages = [
        {"role": "system", "content": "You are an assistant for text summarization tasks. The text you are given lists reviews (if any) for different facility types (hospitals, restaurants...) which are located in an area of radius 1km around a given real estate property."},
        {"role": "user", "content": f"""
        Summarize in three sentences and maximum 700 characters the atmosphere of that area given the following text: \'{text}\'.
        Be concise.
        Do not repeat the information given in the text.
        Do not output non-informative text like <<Facility_type xyz in the vicinity have varying ratings and reviews, with comments on facilities and staff>>.
        """}
    ]

    # Call the OpenAI chat completion endpoint (interface of the openai package before version 1.0)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0125",  # recommended by Adriano instead of gpt-3.5-turbo (because of token usage)
        messages=messages,
        max_tokens=1000,  # upper limit for the length of the answer; adjust as needed
        n=1,
        stop=None,
        temperature=0  # temperature 0 keeps the results as reproducible as possible
    )

    # Extract the summary from the response
    summary_result = response.choices[0].message['content'].strip()
    return FILE['original_address']['address']+ '\n--------\nAnswer: ' + summary_result

In [39]:
FILE = load_data(FILES[5])
print(FILE['original_address']['address']+ '\n--------\nAnswer: ')
print(summarize_based_on_all_reviews(FILE))
# ERROR ! : This model's maximum context length is 16385 tokens. However, your messages resulted in 41199 tokens. Please reduce the length of the messages.

8005 Zürich, Heinrichstrasse 200
--------
Answer: 


In [33]:
data = load_data(FILES[1])
print(summarize_based_on_all_reviews(data))
# ERROR ! : This model's maximum context length is 16385 tokens. However, your messages resulted in 32171 tokens. Please reduce the length of the messages.

In [31]:
data = load_data(FILES[2])
print(summarize_based_on_all_reviews(data))
#  ERROR ! : This model's maximum context length is 16385 tokens. However, your messages resulted in 45355  tokens. Please reduce the length of the messages.
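
The three failures above come from prompts of roughly 32k to 45k tokens against a context window of 16385 tokens. One way to stay below the limit is to cap the review text before building the prompt. A minimal sketch, assuming about 4 characters per token, which is only a rough heuristic for English text (a tokenizer would be needed for an exact count). The names `truncate_to_token_budget` and `CHARS_PER_TOKEN` are hypothetical:

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English text, not an exact count


def truncate_to_token_budget(text, max_tokens=12000):
    """Keep whole lines from the start of text until the estimated token budget is used."""
    budget_chars = max_tokens * CHARS_PER_TOKEN
    kept, used = [], 0
    for line in text.splitlines(keepends=True):
        if used + len(line) > budget_chars:
            break  # stop at a line boundary so that no review is cut in half
        kept.append(line)
        used += len(line)
    return ''.join(kept)


# Made-up review text, only to show the effect of the cap
reviews = "Review#: 0 / Rating: 4 / Text: nice\n" * 1000
short = truncate_to_token_budget(reviews, max_tokens=100)
print(len(reviews), '->', len(short))
```

Truncating from the start favors the facility types that are listed first; capping the number of reviews per facility would spread the budget more evenly. The default of 12000 is an arbitrary choice that leaves room for the instructions and the answer.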

In [21]:
data = load_data(FILES[3])
print(summarize_based_on_all_reviews(data))

8355 Aadorf, Bruggwiesenstrasse 5
--------
Answer: The area around the property features highly-rated bars and restaurants with cozy atmospheres and friendly staff. Customers praise the quality of food and service at the restaurants, while the bars are commended for their welcoming ambiance and support for local artists. Additionally, there are options for grocery shopping and gas refueling available nearby.


In [22]:
data = load_data(FILES[4])
print(summarize_based_on_all_reviews(data))

6319 Allenwinden, Winzrüti 39
--------
Answer: The area around the property lacks bars but features highly-rated restaurants like "Gasthaus Löwen" with positive reviews praising the cuisine and cozy atmosphere. Kindergartens and public transportation options have mixed reviews, while gym/fitness centers and grocery stores/supermarkets have positive ratings and reviews. Gas/EV charging stations and schools are not present in the vicinity.


### Classify based on ALL reviews of all facilities available. Not sure this is really sensible.

In [23]:
def summarize_AND_classify_based_on_all_reviews(text,categories):
    # Define the message for classification
    messages = [
        {"role": "system", "content": "You are an assistant for segmentation tasks. The text you are given lists reviews (if any) for different facility types (hospitals, restaurants...) which are located in an area of radius 1km around a given real estate property."},
        {"role": "user", "content": f"""
        Find out for which of these following categories the area would be best suited. \nCategories: \'{categories}\'.
        The reviews are to be found at the end of the prompt after the *************.
        Return: 1) the category, 2) two sentences to explain why you chose this category,  3) two sentences to explain why you exclude the other category.
        Your output must follow this structure: 1), 2), 3).
        Be concise.
        Do not repeat the information given in the text.
        Do not output non-informative text like <<Facility_type xyz in the vicinity have varying ratings and reviews, with comments on facilities and staff>>.
        *********************************
        Reviews \'{text}\'
        """}
    ]

    # Call the OpenAI chat completion endpoint (interface of the openai package before version 1.0)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0125",  # recommended by Adriano instead of gpt-3.5-turbo (because of token usage)
        messages=messages,
        max_tokens=1000,  # upper limit for the length of the answer; adjust as needed
        n=1,
        stop=None,
        temperature=0  # temperature 0 keeps the results as reproducible as possible
    )

    # Extract the classification result from the response
    result = response.choices[0].message['content'].strip()
    return result