<a href="https://colab.research.google.com/github/roy-sub/Automated-Google-Map-Scrapper/blob/main/gm_scraper_d1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Installing Necessary Libraries and Files**

In [None]:
! pip -q install langchain openai tiktoken cohere
! gdown --id 1NLsQzhzwcOqYvrVHpzpvD1-u1840rzFO # constants.py

#**Provide OpenAI Credentials**

In [4]:
import os
from getpass import getpass

# Never hard-code an API key in a shared notebook; prompt for it at runtime instead.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

#**Primary Script : To get relevant Google Map Searches**

In [2]:
# Import necessary modules and classes

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.chat_models import ChatOpenAI
from constants import SEARCH_TERM_TEMPLATE_STRING, SEARCH_LOCATION_TEMPLATE_STRING

def get_gm_search(search_term, search_location):

  try:

    # Initialize output parser, format instructions, and language model

    output_parser = CommaSeparatedListOutputParser()

    format_instructions = output_parser.get_format_instructions()

    llm = ChatOpenAI(temperature=0.0) # model_name="text-davinci-003"

    # Operation I : Get Related Search Terms

    search_term_prompt = ChatPromptTemplate(
        messages=[
            HumanMessagePromptTemplate.from_template(SEARCH_TERM_TEMPLATE_STRING)
        ],
        input_variables=["industry_name"],
        partial_variables={"format_instructions": format_instructions}
    )

    search_term_messages_for_list_prompt = search_term_prompt.format_messages(industry_name=search_term,
                                    format_instructions=format_instructions)

    search_term_output = llm(search_term_messages_for_list_prompt)

    related_terms = output_parser.parse(search_term_output.content)

    # Operation II : Get Nearby Locations

    search_location_prompt = ChatPromptTemplate(
        messages=[
            HumanMessagePromptTemplate.from_template(SEARCH_LOCATION_TEMPLATE_STRING)
        ],
        input_variables=["location_name"],
        partial_variables={"format_instructions": format_instructions}
    )

    search_location_messages_for_list_prompt = search_location_prompt.format_messages(location_name=search_location,
                                    format_instructions=format_instructions)

    search_location_output = llm(search_location_messages_for_list_prompt)

    nearby_locations = output_parser.parse(search_location_output.content)

    # Operation III : Get Relevant Google Map Searches

    # Pair every related term with every nearby location
    gm_search = [
        f"{term.capitalize()} in {location.capitalize()}"
        for term in related_terms
        for location in nearby_locations
    ]

    return gm_search

  except Exception as e:

    print(f"An error occurred: {e}")

    return None
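
The lists returned by `output_parser.parse` come directly from the model's text, so nothing guarantees that they are well-formed. As a rough illustration, a format check could look like the sketch below. The function name `validate_items`, the limits, and the allowed-character pattern are all assumptions and are not part of the current script. The check covers format only, not factual correctness.

```python
import re

# Hypothetical limits (assumptions); tune them for the real application.
MAX_ITEMS = 20
MAX_ITEM_LENGTH = 60
# Letters, digits, whitespace and a few punctuation marks common in place and business names.
ALLOWED_PATTERN = re.compile(r"^[\w\s&'.,\-]+$")


def validate_items(items, max_items=MAX_ITEMS, max_length=MAX_ITEM_LENGTH):
    """Return a cleaned, de-duplicated list, or raise ValueError if nothing usable remains."""
    if not isinstance(items, list):
        raise ValueError(f"expected a list, got {type(items).__name__}")

    cleaned = []
    seen = set()
    for item in items:
        if not isinstance(item, str):
            continue  # skip non-string entries
        text = " ".join(item.split())  # collapse stray whitespace and newlines
        if not text or len(text) > max_length:
            continue  # empty, or long enough to be a sentence rather than a term
        if not ALLOWED_PATTERN.match(text):
            continue  # unexpected characters such as brackets or double quotes
        key = text.lower()
        if key in seen:
            continue  # case-insensitive duplicate
        seen.add(key)
        cleaned.append(text)

    if not cleaned:
        raise ValueError("no valid items left after validation")
    return cleaned[:max_items]
```

It could be applied to `related_terms` and `nearby_locations` before the search strings are built. Because it raises `ValueError`, the existing `except` branch would catch a failed check. The check runs locally and makes no additional model call.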

#**Testing**

In [7]:
#@title **User Input**

search_term = 'Indian Restaurants' #@param {type:'string'}
search_location = 'London' #@param {type:'string'}

gm_search = get_gm_search(search_term, search_location)

gm_search

['Curry houses in Oxford',
 'Curry houses in Cambridge',
 'Curry houses in Brighton',
 'Curry houses in Bristol',
 'Curry houses in Reading',
 'Curry houses in Southampton',
 'Curry houses in Birmingham',
 'Curry houses in Manchester',
 'Curry houses in Liverpool',
 'Curry houses in Cardiff',
 'Tandoori restaurants in Oxford',
 'Tandoori restaurants in Cambridge',
 'Tandoori restaurants in Brighton',
 'Tandoori restaurants in Bristol',
 'Tandoori restaurants in Reading',
 'Tandoori restaurants in Southampton',
 'Tandoori restaurants in Birmingham',
 'Tandoori restaurants in Manchester',
 'Tandoori restaurants in Liverpool',
 'Tandoori restaurants in Cardiff',
 'Biryani restaurants in Oxford',
 'Biryani restaurants in Cambridge',
 'Biryani restaurants in Brighton',
 'Biryani restaurants in Bristol',
 'Biryani restaurants in Reading',
 'Biryani restaurants in Southampton',
 'Biryani restaurants in Birmingham',
 'Biryani restaurants in Manchester',
 'Biryani restaurants in Liverpool',
 'B

#**Future Updates Information**

* We currently use OpenAI's `text-davinci-003` model as a baseline. At $0.02 per 1,000 tokens it is inexpensive, but it is not `FREE`. Starting from this baseline is a deliberate choice: it lets us focus on the next milestones instead of spending time up front on selecting the ideal model.
Switching to a better-suited model later is straightforward and requires changing only a few lines of the script. In parallel, I will evaluate open-source models that could match or exceed its accuracy and latency at no cost, i.e. `FREE`, so that performance can improve while costs stay low.

* Furthermore, LLMs are powerful but still prone to hallucination. In rare cases the generated `related_terms` and `nearby_locations` may not match expectations in `format` or `correctness`. I therefore recommend adding a validator script that checks these results after they are generated. Although I consider this step necessary, it may increase latency and degrade the user experience, so I have left it out of the current version. I would appreciate your feedback on how we should proceed.

* Lastly, I'd like to draw attention to the error handling at the end of the script. Although errors are unlikely, simply printing them is not standard practice in application development. Instead, each error should be logged to our database, e.g. `Azure`, together with the `client_id`, `date_time`, and `process_id`. These parameters will become clearer once the application reaches `milestone III`. As we approach that milestone, I will request your Azure credentials to implement this `logging mechanism`. Your feedback on this approach is also appreciated.
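
To make the logging idea in the last point more concrete, here is a minimal sketch that uses only Python's standard `logging` module as a stand-in for the database. The function names `build_error_record` and `log_error`, the logger name, the two extra fields `error_type` and `message`, and the example identifiers are all assumptions; the real schema and the Azure integration remain to be defined.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; a database handler could be attached to it later.
logger = logging.getLogger("gm_scraper")


def build_error_record(error, client_id, process_id):
    """Collect the fields named in the note above into one JSON-serializable dict."""
    return {
        "client_id": client_id,
        "date_time": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "process_id": process_id,
        "error_type": type(error).__name__,  # assumed extra field
        "message": str(error),               # assumed extra field
    }


def log_error(error, client_id, process_id):
    """Log the record locally; a database write could replace this call later."""
    record = build_error_record(error, client_id, process_id)
    logger.error(json.dumps(record))
    return record


# Example with placeholder identifiers (assumptions, not real values)
try:
    raise ValueError("unexpected model output")
except Exception as e:
    log_error(e, client_id="client-123", process_id="proc-1")
```

In `get_gm_search`, the `except` branch could call a function like `log_error` in place of `print`, once the caller has a `client_id` and `process_id` to pass in.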