### industryGPT - Apolo's internal use case

This Python notebook illustrates the system that Apolo Mktng uses internally to classify potential companies. It enriches a basic set of data points that are useful for filtering potential target accounts.

Here are the data points enriched in this specific demo:
- **Industry**: Industry in which the company operates.
- **Business Model**: Business model under which the company operates.
- **Client Focus**: Subset of customer business models that the company targets.
- **End Buyer**: Subset of departments the company targets.
- **Enrichment Points**: Potential data points the company would be interested in enriching.


#### Importing
Importing the relevant modules for the test.  

In [1]:
from openai import OpenAI
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen, Request
from dotenv import load_dotenv
from datetime import datetime
import json
import pandas as pd
import requests



##### OpenAI functions
Defines a set of OpenAI helper functions that differ by model, by response output format, and by whether earlier feedback is replayed to the model.

In [2]:
load_dotenv()
client = OpenAI()

def generate_response(prompt):
    response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    response_format={ "type": "json_object" },
    messages=[
        {"role": "system", "content": "You are a helpful assistant, expert in Analysing Companies."},
        {"role": "user", "content": str(prompt)},
    ],
    temperature=0
    )
    selection = response.choices[0].message.content
    return selection

def generate_response_feedback(initial_question, system_answer, feedback):
    response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    response_format={ "type": "json_object" },
    messages=[
        {"role": "system", "content": "You are a helpful assistant, expert in Analysing Companies."},
        {"role": "user", "content": str(initial_question)},
        {"role": "assistant", "content": str(system_answer)},
        {"role": "user", "content": str(feedback)}
    ],
    temperature=0
    )
    selection = response.choices[0].message.content
    return selection

def generate_response_gpt3(prompt):
    response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant, expert in Analysing Companies."},
        {"role": "user", "content": str(prompt)},
    ],
    max_tokens=300,
    temperature=0
    )
    selection = response.choices[0].message.content
    return selection
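
The three helpers above differ only in the model, the response format and the message list they send. As a minimal sketch (the name `build_messages` and the sample strings are illustrative assumptions, not part of the notebook), the shared message list could be assembled in one place and passed to whichever model call is needed:

```python
def build_messages(prompt, previous_answer=None, feedback=None):
    """Assemble the chat messages shared by the helpers (hypothetical helper)."""
    messages = [
        {"role": "system", "content": "You are a helpful assistant, expert in Analysing Companies."},
        {"role": "user", "content": str(prompt)},
    ]
    # On a retry, replay the model's earlier answer followed by the correction request
    if previous_answer is not None and feedback is not None:
        messages.append({"role": "assistant", "content": str(previous_answer)})
        messages.append({"role": "user", "content": str(feedback)})
    return messages


# Illustrative inputs only
first_try = build_messages("Classify this company.")
retry = build_messages("Classify this company.", "Industry: Banking", "Stay within the taxonomy.")
print(len(first_try), len(retry))
```

The returned list can be passed as the `messages` argument of `client.chat.completions.create`.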



#### Scraping functions
Function to scrape the contents of a website, using urllib to fetch the page and BeautifulSoup to parse it and extract the text.

In [3]:
def retrieve_html(url):
    headers = {'User-Agent': 'Mozilla/5.0'}
    req = Request(url, headers=headers)
    html = urlopen(req).read()
    soup = BeautifulSoup(html, features="html.parser")

    for script in soup(["script", "style"]):
        script.extract() 

    text = soup.get_text()
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = '\n'.join(chunk for chunk in chunks if chunk)
    return text
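
The whitespace clean-up at the end of `retrieve_html` can be tried on its own, without fetching a page. The sketch below isolates that step; the function name `clean_text` and the sample string are assumptions made for illustration.

```python
def clean_text(text):
    """Collapse scraped text into one non-empty chunk per line (hypothetical helper)."""
    # Strip leading and trailing whitespace from every line
    lines = (line.strip() for line in text.splitlines())
    # Split lines on double spaces, which often separate menu items or headlines
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop empty chunks and join the rest with newlines
    return "\n".join(chunk for chunk in chunks if chunk)


sample = "  Hello   world  \n\n  Foo  bar \n"
print(clean_text(sample))
```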


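Function to run a Google search query through the Custom Search API and return the link of the first result. Despite its name, it does not use Selenium.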

In [4]:

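import os

def selenium_search(query):
    # Read the API key and Custom Search Engine ID from the environment instead of hard-coding them.
    # The variable names below are assumptions; define them in your .env file.
    API_KEY = os.getenv('GOOGLE_API_KEY')
    SEARCH_ENGINE_ID = os.getenv('GOOGLE_CSE_ID')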

    # Base URL for Google Custom Search
    url = 'https://www.googleapis.com/customsearch/v1'

    # Parameters for the search
    params = {
        'key': API_KEY,
        'cx': SEARCH_ENGINE_ID,
        'q': query
    }

    # Send the GET request
    response = requests.get(url, params=params)

    # Check if the request was successful
    if response.status_code == 200:
        results = response.json()
        # Return the first search result
        if 'items' in results:
            return results['items'][0]['link']
        else:
            return 'No results found'
    else:
        return f'Error: {response.status_code}'
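
The search function above returns plain strings such as 'No results found' when the lookup fails, and callers then treat that string as a URL. One possible alternative, sketched here under the assumption that callers can handle `None`, is to separate the parsing step and return `None` when there is no link. The helper name and the sample payloads are hypothetical.

```python
def first_result_link(results):
    """Return the first link from a search response dict, or None if there is none."""
    items = results.get("items") or []
    if not items:
        return None
    return items[0].get("link")


# Hypothetical payloads shaped like the 'items' / 'link' fields used above
with_hits = {"items": [{"link": "https://www.example.com"}, {"link": "https://www.example.org"}]}
without_hits = {"searchInformation": {"totalResults": "0"}}

print(first_result_link(with_hits))
print(first_result_link(without_hits))
```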

#### Client Focus Prompt

Prompt to classify the Client Focus of a company. By client focus we mean the market segment, defined by business model, that the company targets with its products or services.

In [5]:
clients_focus = ["B2C", "B2B", "B2B/B2C"]
prompt_client_focus = """

You are given the description of a company retrieved from a company's website.

You will need to categorize the company within a Client Focus according to the taxonomy below.

There can only be ONE client focus. The answer needs to be concise and only can follow the taxonomy.

DO NOT come up with any taxonomy. STICK to the taxonomy below. 

Give your results in the format JSON  - 

'client_focus': [client focus] 


Here the taxonomy for the client focus:
B2C
B2B
B2B/B2C
"""

#### Industry Prompt

Classifies the industry of a company using the taxonomy of its company subset: Startup, MidMarket or Corporate. The subset is derived from the number of employees and founded date data points.

In [6]:
startup_industries = ["Deep Tech", "Edtech", "Fintech", "Foodtech", "Healthtech", "Insurtech", "Lawtech", "Salestech & Martech", "Mobility", "Energy", "Big data", "Cybersecurity", "Media & Telecom", "Consumer electronics", "Esports & Gaming", "Agritech", "Regtech", "Impact & diversity", "HRtech", "Cleantech", "Traveltech", "Miscellaneous", "Proptech"]
midmarket_industries = ["Finance", "Legal", "Digital", "Staffing", "Content & Marketing", "Cloud & infrastructure", "Web 3", "Data protection", "Energy", "Pharmaceutical", "Medical equipment", "Apparel", "Beauty", "Consumer goods", "Logistics & delivery", "Luxury", "FMCG", "Vehicle production", "Building & construction", "Sport", "In-store retail", "Real estate service", "Hotel & accomodation", "Restaurant and catering", "Manufacturing", "Education", "Miscellaneous", "Travel", "Agriculture"]
corporate_industries = ["Finance & Legal", "Technology", "Media & Telecommunication", "Consulting", "Insurance", "Recruitment", "Construction", "Energy & Chemical", "Automotive", "Retail", "Real estate", "Healthcare", "Travel", "Fashion", "Food & Beverage", "Education", "Logistics & Transportation", "Environmental", "Miscellaneous"]
business_models = ["SaaS", "Marketplace", "Ecommerce", "Service", "Manufacturing"]


prompt_industry_business =  f"""

You are given the description of a company retrieved from a company's website.

You will need to categorize the company within their Industry and Business Model Category according to the taxonomy below.

There can only be ONE industry and ONE business model. The answer needs to be concise and only can follow the taxonomy.

DO NOT come up with any taxonomy. STICK to the taxonomy below. 

Give your results in JSON format - 

'industry': [industry] 
'business_model': [business model]

Here the taxonomy for each of the Industries and Business Models - 

Business Models:
SaaS
Marketplace
Ecommerce
Service
Manufacturing 

Industries:

"""



#### End Buyer Prompt
Classifies which department the company targets as its end buyer. The taxonomy is split by department to keep the answer concise.

In [8]:
end_buyers = ["Sales", "Marketing", "HR", "Finance", "Tech & Data", "Legal", "Procurement", "Client support", "CSE", "ESG", "Communication", "Consumer"]
prompt_end_buyer =  f"""

You are given the description of a company retrieved from a company's website.

You will need to categorize the company within an End Buyer category according to the taxonomy below.

There can only be ONE end buyer. The answer needs to be concise and only can follow the taxonomy.

DO NOT come up with any taxonomy. STICK to the taxonomy below. 

Give your results in JSON format - 

'end_buyer': [end buyer] 

Here the taxonomy for the End Buyer - 

End Buyer:
Sales
Marketing
HR
Finance
Tech & Data
Legal
Procurement
Client support
CSE
ESG
Communication
Consumer
"""

In [9]:
prompt_enrichment_capabilities = '''

Based on the description of this company, can you come up with a list of key-points that a COO could be interested in knowing with regards to their account prospects? 

Only list out data points that could be easily accessible through scraping a prospect's websites. KPIs that could be protected intellectually can be difficult to retrieve so don't mention it.

Output in a json format:

{
    opp_enrichment: {
        "data_point_1",
        "data_point_2",
        ....
    }
}

Be concise and output the 5-10 most relevant datapoints. Don't include widely available ones like employee number.

Here the description of the company:

'''
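
The taxonomy prompts above repeat each category list twice: once as a Python list used for validation and once inside the prompt text. A rough sketch of generating the prompt from the list, so the two cannot drift apart, could look like this. The function `build_taxonomy_prompt` and its exact wording are assumptions, not the prompts used in the runs shown in this notebook.

```python
def build_taxonomy_prompt(label, json_key, options):
    """Build a single-choice classification prompt from a list of categories."""
    lines = [
        "You are given the description of a company retrieved from a company's website.",
        "",
        f"You will need to categorize the company within a {label} according to the taxonomy below.",
        "",
        f"There can only be ONE {label.lower()}. STICK to the taxonomy below.",
        "",
        "Give your results in JSON format -",
        "",
        f"'{json_key}': [{label.lower()}]",
        "",
        f"Here the taxonomy for the {label}:",
    ]
    # Append one category per line, mirroring the hand-written prompts
    return "\n".join(lines + list(options)) + "\n"


clients_focus = ["B2C", "B2B", "B2B/B2C"]
prompt = build_taxonomy_prompt("Client Focus", "client_focus", clients_focus)
print(prompt)
```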

In [10]:
organize_prompt = "From the following scraped text of a website, explain what the business does. The text will be poorly written, so keep that in mind. Write everything in third person naming the company. Output only a description using key words of the industry. Here the scraped code/text: "
header_url= "Here the description of the company deduced from their website: "

In [11]:
def truncate_string(input_string):
    # Keep at most the first 3000 characters; shorter strings are returned unchanged
    return input_string[:3000]

def format_url(url):
    if url.startswith("http://www."):
        url = url.replace("http://www.", "https://www.")
    elif url.startswith("http://"):
        url = url.replace("http://", "https://www.")
    elif url.startswith("www."):
        url = "https://" + url
    elif not url.startswith("https://"):
        url = "https://www." + url
    return url
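
`format_url` rewrites URLs with `str.replace`, which would also touch a later occurrence of "http://" inside the string. A prefix-based variant that yields the same results for the five input shapes handled above might look like the sketch below. The name `normalize_url` is hypothetical, and `a.com` is a placeholder domain.

```python
def normalize_url(url):
    """Prefix-based variant of the URL rules above (hypothetical helper)."""
    # URLs that already use https are returned untouched
    if url.startswith("https://"):
        return url
    # Drop a plain http scheme so only the host and path remain
    if url.startswith("http://"):
        url = url[len("http://"):]
    # Make sure the host carries the www. prefix, then force https
    if not url.startswith("www."):
        url = "www." + url
    return "https://" + url


for raw in ["http://www.serentcapital.com", "http://a.com", "www.a.com", "a.com", "https://a.com"]:
    print(raw, "->", normalize_url(raw))
```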

In [12]:
def business_status(company_employees, company_founded_date):
    # Company age in years; falls back to 0 if the founded date cannot be subtracted
    try:
        company_age = datetime.now().year - company_founded_date
    except Exception:
        print('Founded date not provided or not numeric.')
        company_age = 0

    # Headcount decides the category; age only matters when no headcount band matches
    if company_employees < 200:
        return 'Startup'
    elif 201 <= company_employees <= 1000:
        return 'MidMarket'
    elif company_employees >= 1001 or company_age >= 15:
        return 'Corporate'
    else:
        # Reached e.g. with exactly 200 employees and a company younger than 15 years
        return 'Uncategorized'

In [13]:
exec = ThreadPoolExecutor(12)

def industryGPT(name, url, company_id=None, company_employees=None, company_founded_date=None, extra_descriptors=None):
    print('Enriching: ', name)
    print('With URL: ', url)
    print('Company ID: ', company_id)

    full_response = {

            "company_id": str(company_id),
            "metadata": {
                "timestamp": str(datetime.now()),
                "source": "industryGPT"
            },
            "company_profile": {
                "website": str(url),
                "n_employees": str(company_employees),
                "founded_date": str(company_founded_date),
                "business_status": None,
                "industry": None,
                "business_model": None,
                "end_buyer": None,
                "client_focus": None,
                "enrichment": None,
                "description": None,
            }

        }
    
    # Categorise business status
    if company_employees is not None:
        try:
            b_status = business_status(company_employees, company_founded_date)
            full_response["company_profile"]["business_status"] = b_status
            print('Company categorised as: ', b_status)
        except Exception as e:
            print(e)
            print("Error categorising business_status.")
            # Return None (not the Exception class) so callers can detect the failure
            return None
    else:
        b_status = 'Uncategorized'
        full_response["company_profile"]["business_status"] = b_status
        print('Company categorised as: ', b_status)
    
    try:
        # Categorise industry & business model
        url_text = retrieve_html(format_url(url))
        print('\n-> Retrieved text from website...')
    except Exception as e:
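        print('Inaccessible URL as per error ->', e)
        print('Inputting description from LinkedIn.')
        # Fall back to the provided description; use an empty string if it is missing or NaN
        url_text = extra_descriptors if isinstance(extra_descriptors, str) else ''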

    
    if len(url_text) > 30:
        print('-> Crafting company description...')
        description_openai = generate_response_gpt3(organize_prompt + truncate_string(url_text))
        print('\nDescription of the company: ', description_openai)

    else:
        print('Scraped website or extra descriptors have 30 characters or fewer...')
        print('Trying to search on google a new page...')
        new_url = selenium_search(format_url(url))
        url_text = retrieve_html(new_url)
        description_openai = generate_response_gpt3(organize_prompt + truncate_string(url_text))
        print('\nDescription of the company: ', description_openai)

    # Save description
    full_response["company_profile"]["description"] = description_openai


    # Define specific industry
    if b_status == 'Startup':
        industry_categories = startup_industries
        selected_industries = ', '.join(industry_categories)

    if b_status == 'MidMarket':
        industry_categories = midmarket_industries
        selected_industries = ', '.join(industry_categories)

    if b_status == 'Corporate':
        industry_categories = corporate_industries
        selected_industries = ', '.join(industry_categories)
    
    if b_status == 'Uncategorized':
        industry_categories = startup_industries
        selected_industries = ', '.join(industry_categories)
    

    # Categorise industry & business Model
    response_industry = exec.submit(generate_response, (prompt_industry_business +
                                selected_industries +
                                header_url +
                                description_openai))
    
    # Categorise client Focus
    response_clientfocus = exec.submit(generate_response, (prompt_client_focus +
                                header_url +
                                description_openai))
    
    # Categorise end Buyer
    response_endbuyer = exec.submit(generate_response, (prompt_end_buyer +
                                header_url +
                                description_openai))

    # Categorise enrichment
    response_enrichment = exec.submit(generate_response, (
                                prompt_enrichment_capabilities + 
                                description_openai))
    
    result_industry = response_industry.result()
    result_clientfocus = response_clientfocus.result()
    result_endbuyer = response_endbuyer.result()
    result_enrichment = response_enrichment.result()

    response_dict_industry = json.loads(result_industry)


    # Is the categorization correct for industry & business model? 
    industry = response_dict_industry["industry"]
    business_model = response_dict_industry["business_model"]


    tries_industry = 0
    # No cap in the loop condition: the check at the top of the body enforces the retry limit
    while industry not in industry_categories or business_model not in business_models:
        if tries_industry >= 2:
            print('Industry or Business Model not in category.')
            print('GPT failed twice, returning NaN in Industry & Business Model.')
            industry = None
            business_model = None
            break
        
        print('Industry or Business Model not in category.')
        print('Faulty Industry: ', industry)
        print('Faulty Business Model: ', business_model)

        initial_question = (prompt_industry_business + selected_industries + header_url + description_openai)
        system_answer = "Industry: " + industry + " Business Model: " + business_model
        feedback = "The Industry and/or Business Model is not within taxonomy... retry please and stay within taxonomy."
        # Resubmit the prompt
        retry_industry = exec.submit(generate_response_feedback, initial_question, system_answer, feedback)
        retry_industry_dict = retry_industry.result()
        retry_industry_dict_json = json.loads(retry_industry_dict)
        industry = retry_industry_dict_json["industry"]
        business_model = retry_industry_dict_json["business_model"]
        tries_industry += 1


    full_response["company_profile"]["industry"] = industry
    full_response["company_profile"]["business_model"] = business_model

    # Is the categorization correct for client focus? 
    response_dict_clientfocus = json.loads(result_clientfocus)
    client_focus = response_dict_clientfocus["client_focus"]
    
    tries_client_focus = 0
    # No cap in the loop condition: the check at the top of the body enforces the retry limit
    while client_focus not in clients_focus:
        if tries_client_focus >= 2:
            print('Client focus not in category.')
            print('GPT failed twice, returning NaN in Client focus.')
            client_focus = None
            break
        
        print('Client Focus not in category.')
        print('Faulty Client Focus: ', client_focus)
        initial_question = (prompt_client_focus+ header_url + description_openai)
        system_answer = "Client Focus: " + client_focus
        feedback = "The client focus is not within taxonomy... retry please and stay within the provided taxonomy."
        # Resubmit the prompt
        retry_clientfocus = exec.submit(generate_response_feedback, initial_question, system_answer, feedback)
        retry_clientfocus_dict = retry_clientfocus.result()
        retry_clientfocus_dict_json = json.loads(retry_clientfocus_dict)
        client_focus = retry_clientfocus_dict_json["client_focus"]
        tries_client_focus += 1
    
    full_response["company_profile"]["client_focus"] = client_focus   
    
    # Is the categorization correct for end buyer?
    response_dict_endbuyer = json.loads(result_endbuyer)
    end_buyer = response_dict_endbuyer["end_buyer"]

    tries_end_buyer = 0
    # No cap in the loop condition: the check at the top of the body enforces the retry limit
    while end_buyer not in end_buyers:
        if tries_end_buyer >= 2:
            print('End buyer not in category.')
            print('GPT failed twice, returning NaN in end buyers.')
            end_buyer = None
            break

        print('End Buyer not in category.')
        print('Faulty End buyer: ', end_buyer)
        initial_question = (prompt_end_buyer+ header_url + description_openai)
        system_answer = "End Buyer: " + end_buyer
        feedback = "The end buyer is not within taxonomy... retry please and stay within the provided taxonomy."
        # Resubmit the prompt
        retry_endbuyer = exec.submit(generate_response_feedback, initial_question, system_answer, feedback)
        retry_endbuyer_dict = retry_endbuyer.result()
        retry_endbuyer_dict_json = json.loads(retry_endbuyer_dict)
        end_buyer = retry_endbuyer_dict_json["end_buyer"]
        tries_end_buyer += 1
        

    full_response["company_profile"]["end_buyer"] = end_buyer
    
    response_dict_enrichment = json.loads(result_enrichment)
    
    enrichment = response_dict_enrichment["opp_enrichment"]

    full_response["company_profile"]["enrichment"] = enrichment

    full_response_json = json.dumps(full_response)
    json_string_pretty = json.dumps(full_response, indent=2)
    print('')
    print(json_string_pretty)
    print('--------------------------------')

    return full_response_json 
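
The three validation loops in `industryGPT` follow the same pattern: ask once, check the answer against the taxonomy, replay the faulty answer with feedback up to two times, and give up with `None`. A minimal sketch of that pattern as one reusable function is shown below. The names `classify_with_retries`, `ask` and `retry` are assumptions, and the fake callables stand in for the OpenAI calls.

```python
def classify_with_retries(ask, retry, allowed, max_retries=2):
    """Return an answer from `allowed`, or None once the retries are used up."""
    answer = ask()
    retries = 0
    while answer not in allowed:
        if retries >= max_retries:
            # The model stayed outside the taxonomy: report no classification
            return None
        # Replay the faulty answer so the model can correct itself
        answer = retry(answer)
        retries += 1
    return answer


# Fake model calls for illustration: the first answer is off-taxonomy, the retry fixes it
end_buyers = ["Sales", "Marketing", "HR"]
fixed = classify_with_retries(lambda: "Growth", lambda bad: "Marketing", end_buyers)
stuck = classify_with_retries(lambda: "Growth", lambda bad: "Growth", end_buyers)
print(fixed, stuck)
```

In the notebook, `ask` would wrap `generate_response` and `retry` would wrap `generate_response_feedback` together with the JSON key lookup.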

#### Function for timeout handling

In [15]:
import signal
import time

# Define a timeout handler
class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException

# Function to run with timeout
def run_function_with_timeout(name, url, company_id, employee_count, founded_date, extra_descriptors):
    # Set the signal handler for the timeout
    signal.signal(signal.SIGALRM, timeout_handler)
    
    # Start the timer
    signal.alarm(45)  # 45 seconds timeout

    try:
        start_time = time.time()
        
        # Directly call the function
        result = industryGPT(name, url, company_id, employee_count, founded_date, extra_descriptors)
        
        elapsed_time = time.time() - start_time
        print(f"industryGPT: Completed in {elapsed_time:.2f} seconds.")
        
        return result
    except TimeoutException:
        print('Result took too long to output.')
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
    finally:
        # Always cancel the alarm so it cannot fire after this call has returned
        signal.alarm(0)

# Remember to reset the alarm if the function completes earlier
signal.alarm(0)


0
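
`signal.alarm` is only available on Unix, and signal handlers can only be installed from the main thread. Where that is a limitation, a different technique can be swapped in: run the call in a worker thread and wait on its future with a timeout. The sketch below shows that alternative (it is not the approach used in this notebook); `call_with_timeout` and `slow_task` are hypothetical names. Note that the worker thread is not interrupted on timeout and keeps running in the background until it finishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func in a worker thread and return None if it exceeds timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        print('Result took too long to output.')
        return None
    finally:
        # Do not block on a still-running worker when leaving this function
        pool.shutdown(wait=False)


def slow_task():
    time.sleep(0.3)
    return "late"


print(call_with_timeout(lambda: 42, 1.0))
print(call_with_timeout(slow_task, 0.05))
```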

#### Grouped main function

In [16]:
def industryGPT_mass(name, url, company_id, employee_count=0, founded_date=0, extra_descriptors=None):
    tries = 1

    while tries <= 4:
        try:
            if tries <= 2:
                response = run_function_with_timeout(name, format_url(url), company_id, employee_count, founded_date, extra_descriptors)

                if response == None:
                    print('Response is: ', response)
                    raise Exception
                else:
                    return response
            else:
                new_url = format_url(selenium_search(url))
                print('Trying with website:', new_url)
                print('--------------------------------')
                response = run_function_with_timeout(name, new_url, company_id, employee_count, founded_date, extra_descriptors)
                
                if response == None:
                    print('Response is: ', response)
                    raise Exception
                else:
                    return response

    
        except Exception as e:
            print('Error, exception as e: ', e)
            tries += 1
    
    return None
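
The strategy in `industryGPT_mass` is: try the given URL twice, then switch to a URL found through search for the remaining tries, four tries in total. The sketch below expresses that control flow with the enrichment and search steps passed in as functions. All names here (`run_with_fallback`, `attempt`, `find_fallback`) are assumptions for illustration.

```python
def run_with_fallback(attempt, primary_url, find_fallback, primary_tries=2, total_tries=4):
    """Try the primary URL first, then a fallback URL, until a result is returned."""
    for n in range(1, total_tries + 1):
        # The first tries use the given URL, later ones the URL found by searching
        url = primary_url if n <= primary_tries else find_fallback(primary_url)
        try:
            result = attempt(url)
        except Exception as error:
            print(f"Try {n} failed: {error}")
            continue
        if result is not None:
            return result
    return None


# Fake steps for illustration: only the fallback URL yields a result
seen = []

def fake_attempt(url):
    seen.append(url)
    return "ok" if url == "https://www.fallback.example" else None

outcome = run_with_fallback(fake_attempt, "https://www.primary.example",
                            lambda url: "https://www.fallback.example")
print(outcome, seen)
```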

#### Enrichment via .csv

In [17]:
accounts_test = pd.read_csv('Apolo Sprint Enrich v1 [checkpoint 1].csv')
accounts_test.columns

Index(['Company', 'Company Name for Emails', 'Account Stage', 'Lists',
       '# Employees', 'Industry', 'Account Owner', 'Website',
       'Company Linkedin Url', 'Facebook Url', 'Twitter Url', 'Company Street',
       'Company City', 'Company State', 'Company Country',
       'Company Postal Code', 'Company Address', 'Keywords', 'Company Phone',
       'SEO Description', 'Technologies', 'Total Funding', 'Latest Funding',
       'Latest Funding Amount', 'Last Raised At', 'Annual Revenue',
       'Number of Retail Locations', 'Apollo Account Id', 'SIC Codes',
       'Short Description', 'Founded Year', 'Logo Url', 'Business Status',
       'Business Model', 'End Buyer', 'Client Focus', 'Description of company',
       'Data points to enrich', 'Enrichment data points'],
      dtype='object')

In [18]:
accounts_test

Unnamed: 0,Company,Company Name for Emails,Account Stage,Lists,# Employees,Industry,Account Owner,Website,Company Linkedin Url,Facebook Url,...,Short Description,Founded Year,Logo Url,Business Status,Business Model,End Buyer,Client Focus,Description of company,Data points to enrich,Enrichment data points
0,Serent Capital,Serent Capital,Cold,[IndustryGPT] Jan Aguilas,94,Miscellaneous,asanchez@apolomarketing.net,http://www.serentcapital.com,http://www.linkedin.com/company/serent-capital,,...,Serent Capital invests in growing businesses t...,2008.0,https://zenprospect-production.s3.amazonaws.co...,Startup,Service,Finance,B2B,Serent Capital is a private equity firm that s...,,
1,DFJ,DFJ,Cold,[IndustryGPT] Jan Aguilas,310,Finance,asanchez@apolomarketing.net,http://www.dfj.com,http://www.linkedin.com/company/draper-fisher-...,http://www.facebook.com/pages/DFJ/110070025689158,...,DFJ is a venture capital firm that partners wi...,1985.0,https://zenprospect-production.s3.amazonaws.co...,MidMarket,Service,Finance,B2B,Threshold is a venture capital firm that speci...,,{'funding_rounds': 'The number and size of fun...
2,SurePayroll,SurePayroll,Cold,[IndustryGPT] Jan Aguilas,310,Finance,asanchez@apolomarketing.net,http://www.surepayroll.com,http://www.linkedin.com/company/surepayroll,https://www.facebook.com/SurePayroll,...,SurePayroll provides online payroll services.,2000.0,https://zenprospect-production.s3.amazonaws.co...,MidMarket,SaaS,Finance,B2B,SurePayroll is an online payroll service that ...,,"{'website_technologies': ""Technologies used on..."
3,Crosslink Capital,Crosslink Capital,Cold,[IndustryGPT] Jan Aguilas,67,Miscellaneous,asanchez@apolomarketing.net,http://www.crosslinkcapital.com,http://www.linkedin.com/company/crosslink-capital,https://facebook.com/pages/Crosslink-Capital/2...,...,"Crosslink, founded in 1989, is a premier early...",1989.0,https://zenprospect-production.s3.amazonaws.co...,Startup,Service,Finance,B2B,Crosslink Capital is a venture capital firm th...,,"{'current_funding_stage': ""Identify if the pro..."
4,Anduin Transactions,Anduin Transactions,Cold,[IndustryGPT] Jan Aguilas,130,Fintech,asanchez@apolomarketing.net,http://www.anduintransact.com,http://www.linkedin.com/company/anduin-transac...,https://www.facebook.com/anduintransact,...,Anduin is empowering lasting investor relation...,2014.0,https://zenprospect-production.s3.amazonaws.co...,Startup,SaaS,Finance,B2B,Anduin is a company that specializes in revolu...,,"{'current_technology_stack': ""Technologies use..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1551,LeagueApps,LeagueApps,Cold,[IndustryGPT] Jan Aguilas,120,,asanchez@apolomarketing.net,,,,...,"LeagueApps is a fast-growing, venture-backed c...",,https://zenprospect-production.s3.amazonaws.co...,,,,,,,
1552,SparkXGlobal,SparkXGlobal,Cold,[IndustryGPT] Jan Aguilas,450,,asanchez@apolomarketing.net,,http://www.linkedin.com/company/sparkxglobal,,...,SparkXGlobal is at the forefront of MarTech. F...,,https://zenprospect-production.s3.amazonaws.co...,,,,,,,
1553,OutReachly by 500apps,OutReachly by 500apps,Cold,[IndustryGPT] Jan Aguilas,500,,asanchez@apolomarketing.net,http://www.outreachly.com,http://www.linkedin.com/company/outreachly-by-...,,...,,2019.0,https://zenprospect-production.s3.amazonaws.co...,,,,,,,
1554,Designer Web Agency,Designer Web Agency,Cold,[IndustryGPT] Jan Aguilas,500,,asanchez@apolomarketing.net,http://www.designerwebagency.com,http://www.linkedin.com/company/designerwebagency,,...,Designer Web Agency is a leading digital marke...,2009.0,https://zenprospect-production.s3.amazonaws.co...,,,,,,,


In [19]:
df = accounts_test
df['Business Status'] = None
df['Industry'] = None
df['Business Model'] = None
df['End Buyer'] = None
df['Client Focus'] = None
df['Description of company'] = None
df['Data points to enrich'] = None

name_column = 'Company'
website_column = 'Website'
founded_column = 'Founded Year'
employees_column = '# Employees'
# id_columns = 'sales_navigator_company_id'
company_description_columns = 'Short Description'

index_start = 104

# df = pd.read_csv('/Users/ismadoukkali/Desktop/industryGPT/industryGPT/scalability/test_scalability_ecomm [checkpoint 4].csv')

df['Description of company'] = df['Description of company'].astype(str)

for index, row in df.iterrows(): 
    if index_start <= index:
        name = row[str(name_column)]
        website = row [str(website_column)]
        employee_count = row[str(employees_column)]
        founded_date = row[str(founded_column)]
        company_id = index
        company_description = row[str(company_description_columns)]
        
        if pd.notna(website):
            try:
                print('Index: ', index)
                results = industryGPT_mass(name, website, company_id, employee_count, founded_date, company_description)
                print('\nNext...')
                print('\n')
                response_json = json.loads(results)
                df.at[index, 'Business Status'] = response_json["company_profile"]["business_status"]
                df.at[index, 'Industry'] = response_json["company_profile"]["industry"]
                df.at[index, 'Business Model'] = response_json["company_profile"]["business_model"]
                df.at[index, 'End Buyer'] = response_json["company_profile"]["end_buyer"]
                df.at[index, 'Client Focus'] = response_json["company_profile"]["client_focus"]
                df.at[index, 'Description of company'] = response_json["company_profile"]["description"]
                df.at[index, 'Enrichment data points'] = response_json["company_profile"]["enrichment"]
                df.to_csv('Apolo Sprint Enrich v1 [checkpoint 2].csv', index=False)

            except Exception as e:
                print(e)
                pass
        else:
            print('Company website not available, pass')
            pass
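
The write-back step in the loop above copies fields from the JSON profile into DataFrame columns one by one. A compact way to express that mapping, sketched here with only the standard library, is a dictionary from profile keys to column names. `COLUMN_MAP` and `profile_to_row` are hypothetical names, and the sample JSON is an abbreviated profile in the format that `industryGPT` prints.

```python
import json

# Profile key -> DataFrame column, as used in the loop above
COLUMN_MAP = {
    "business_status": "Business Status",
    "industry": "Industry",
    "business_model": "Business Model",
    "end_buyer": "End Buyer",
    "client_focus": "Client Focus",
    "description": "Description of company",
    "enrichment": "Enrichment data points",
}


def profile_to_row(response_json):
    """Turn the JSON string returned by the enrichment into a column -> value dict."""
    profile = json.loads(response_json)["company_profile"]
    # Missing keys become None instead of raising KeyError
    return {column: profile.get(key) for key, column in COLUMN_MAP.items()}


sample = json.dumps({"company_profile": {"business_status": "Startup", "industry": "Big data",
                                          "business_model": "SaaS", "end_buyer": "Tech & Data",
                                          "client_focus": "B2B"}})
row = profile_to_row(sample)
print(row)
```

Each item of `row` can then be written with `df.at[index, column] = value`.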



Index:  104
Enriching:  Gaingels
With URL:  https://www.gaingels.com
Company ID:  104
Company categorised as:  Startup

-> Retrieved text from website...
-> Crafting company description...

Description of the company:  Gaingels is a venture capital firm that focuses on investing in companies that embrace diversity. They are one of the largest investors in the world, aiming to deliver above-market returns while increasing visibility, representation, and access for underrepresented communities in venture capital. Gaingels co-invests with select venture capital leads in companies that prioritize building diverse and inclusive teams. They seek to drive top returns while also influencing the ecosystem and representing the LGBTQ community, its allies, and a diverse group of investors. Gaingels has invested over $800 million in their portfolio since 2019 and their portfolio includes over 2,000 companies, including more than 70 unicorns. They also provide educational content, community events,

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.



-> Retrieved text from website...
-> Crafting company description...

Description of the company:  The company is involved in data analysis and web scraping.

{
  "company_id": "412",
  "metadata": {
    "timestamp": "2024-01-23 01:36:24.896250",
    "source": "industryGPT"
  },
  "company_profile": {
    "website": "https://www.ewaycorp.com",
    "n_employees": "74",
    "founded_date": "2005.0",
    "business_status": "Startup",
    "industry": "Big data",
    "business_model": "SaaS",
    "end_buyer": "Tech & Data",
    "client_focus": "B2B",
    "enrichment": {
      "website_technologies": "Technologies used on the prospect's website, such as analytics tools, CMS, e-commerce platforms, which can indicate sophistication and potential needs.",
      "web_traffic_estimates": "Estimates of monthly web traffic, which can indicate the scale of the prospect's online operations.",
      "online_presence_channels": "Presence on various channels like social media, blogs, forums, which can 

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.



-> Retrieved text from website...
-> Crafting company description...

Description of the company:  Based on the poorly written text, it is difficult to determine the exact nature of the business. However, some keywords and phrases suggest that the company may be involved in data analysis, web scraping, or artificial intelligence.

{
  "company_id": "443",
  "metadata": {
    "timestamp": "2024-01-23 01:46:30.511494",
    "source": "industryGPT"
  },
  "company_profile": {
    "website": "https://www.saucelabs.com",
    "n_employees": "330",
    "founded_date": "2008.0",
    "business_status": "MidMarket",
    "industry": "Data protection",
    "business_model": "SaaS",
    "end_buyer": "Tech & Data",
    "client_focus": "B2B/B2C",
    "enrichment": {
      "product_offering": "List of services or products offered, indicating the company's focus areas in data analysis, web scraping, or AI.",
      "technology_stack": "Information on the technologies and tools used by the company, which

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.



-> Retrieved text from website...
-> Crafting company description...

Description of the company:  Based on the poorly written text, it is difficult to determine the exact nature of the business. However, some keywords and phrases suggest that the company may be involved in data analysis, web scraping, or software development.

{
  "company_id": "945",
  "metadata": {
    "timestamp": "2024-01-23 04:56:33.053534",
    "source": "industryGPT"
  },
  "company_profile": {
    "website": "https://www.selling.com",
    "n_employees": "130",
    "founded_date": "2019.0",
    "business_status": "Startup",
    "industry": "Big data",
    "business_model": "SaaS",
    "end_buyer": "Tech & Data",
    "client_focus": "B2B",
    "enrichment": {
      "website_technologies": "Technologies used on the prospect's website, such as CMS, analytics tools, and e-commerce platforms",
      "web_traffic_estimates": "Estimated number of visitors and page views",
      "online_customer_reviews": "Sentiment a

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.



-> Retrieved text from website...
-> Crafting company description...

Description of the company:  The company is involved in the analysis and interpretation of data from various sources. They provide insights and recommendations to businesses based on their findings.

{
  "company_id": "988",
  "metadata": {
    "timestamp": "2024-01-23 05:12:25.101192",
    "source": "industryGPT"
  },
  "company_profile": {
    "website": "https://www.invisionapp.com",
    "n_employees": "490",
    "founded_date": "2011.0",
    "business_status": "MidMarket",
    "industry": "Data protection",
    "business_model": "Service",
    "end_buyer": "Tech & Data",
    "client_focus": "B2B",
    "enrichment": {
      "industry_vertical": "The specific industry or vertical the prospect operates in, which can influence the type of data analysis they require.",
      "current_technologies": "Information about the technologies and tools currently used by the prospect for data analysis and other business operat

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.



-> Retrieved text from website...
-> Crafting company description...

Description of the company:  Based on the poorly written text, it is difficult to determine the exact nature of the business. However, some keywords and phrases suggest that the company may be involved in data analysis, web scraping, or artificial intelligence.

{
  "company_id": "1117",
  "metadata": {
    "timestamp": "2024-01-23 05:57:06.349315",
    "source": "industryGPT"
  },
  "company_profile": {
    "website": "https://www.getnerdio.com",
    "n_employees": "160",
    "founded_date": "2016.0",
    "business_status": "Startup",
    "industry": "Big data",
    "business_model": "SaaS",
    "end_buyer": "Tech & Data",
    "client_focus": "B2B",
    "enrichment": {
      "product_service_offering": "List of products or services offered, which could indicate the company's main area of expertise or market focus.",
      "client_testimonials": "Statements from clients that can provide insight into the company's re

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.



-> Retrieved text from website...
-> Crafting company description...

Description of the company:  Based on the poorly written text, it is difficult to determine the exact nature of the business. However, some keywords and phrases suggest that the company may be involved in data analysis, web scraping, or software development.

{
  "company_id": "1136",
  "metadata": {
    "timestamp": "2024-01-23 06:03:25.053199",
    "source": "industryGPT"
  },
  "company_profile": {
    "website": "https://www.3enrollment.com",
    "n_employees": "89",
    "founded_date": "2019.0",
    "business_status": "Startup",
    "industry": "Big data",
    "business_model": "SaaS",
    "end_buyer": "Tech & Data",
    "client_focus": "B2B",
    "enrichment": {
      "product_offering": "List of services or products offered, which may include data analysis, web scraping, or software development tools",
      "client_testimonials": "Feedback from clients that could indicate satisfaction and potential for long-

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.



-> Retrieved text from website...
-> Crafting company description...

Description of the company:  Based on the poorly written text, it is difficult to determine the exact nature of the business. However, some keywords and phrases suggest that the company may be involved in data analysis, web scraping, or artificial intelligence.

{
  "company_id": "1310",
  "metadata": {
    "timestamp": "2024-01-23 07:08:07.170704",
    "source": "industryGPT"
  },
  "company_profile": {
    "website": "https://www.tribalvision.com",
    "n_employees": "54",
    "founded_date": "2010.0",
    "business_status": "Startup",
    "industry": "Big data",
    "business_model": "SaaS",
    "end_buyer": "Tech & Data",
    "client_focus": "B2B",
    "enrichment": {
      "product_service_offering": "List of products or services offered, which could indicate the company's core competencies and potential needs",
      "technology_stack": "Information on the technology stack used by the company, such as programm