# Trends analyzer and SEO tool

The following Python notebook contains all the code for the solution presented by team Asterion at the AI Hackathon.

Our solution focuses on the following four key points:
- Creating insights about your startup by using AI.
- Finding recent and relevant news related to your business field.
- Analyzing your startup's competition using SEO techniques.
- Generating a PDF report that covers all the previous points.

## Understanding your business 

In [1]:
# For fetching and saving user data
import json 

def save_user_info(data, filename="user_info"):
    with open(filename + ".json", "w") as file: 
        json.dump(data, file)

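# Save updated info
def update_json(json_text):
    # json_text is an already-serialized JSON string (e.g. from json.dumps);
    # the parameter is not called `json` so it does not shadow the json module
    with open("user_info.json", "w") as file:
        file.write(json_text)

    print("user_info updated!")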

# Load user_info current data
# user_info is a .json file with current user info
with open("user_info.json", "r") as file:
    data = json.load(file)


data

{'Name': 'AI LAB SCHOOL',
 'Business description': 'AI Lab School specializes in offering a comprehensive 4-month certification program in artificial intelligence tailored for individuals without prior coding experience. They provide an immersive learning experience with live online classes focusing on the latest AI trends and tools. The curriculum is designed to help students master AI technologies, enabling them to boost their careers in the digital world.',
 'Country': 'Mexico',
 'State': 'Mexico City',
 'Timezone': '',
 'Industry': 'Sci/Tech',
 'Classification': {'Category_id': 5,
  'Subcategory_id': 30,
  'Subsubcategory_id': 717,
  'Subsubsubcategory_id': 741},
 'Keywords_default': ['Artificial Intelligence'],
 'Keywords_ai': ['AI certification program',
  'artificial intelligence training',
  'no-code AI education',
  'AI career development',
  'immersive AI course']}

## Business Keywords

After reading the business description from `user_info.json`, we ask OpenAI's `gpt-4-turbo` model for 5 keywords. The cell below stores them under `Keywords`; the later cells read the AI-generated keywords from `Keywords_ai` together with `Keywords_default`.
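Because the model's reply is free-form JSON, it can help to validate it before saving. The following is a minimal sketch, assuming the reply is a JSON object with a `keywords` list; the helper name `parse_keywords` and the sample reply are hypothetical and not part of the original notebook.

```python
import json

def parse_keywords(reply_text, expected=5):
    """Extract a clean keyword list from a JSON reply (hypothetical helper)."""
    payload = json.loads(reply_text)
    keywords = payload.get("keywords", [])
    if not isinstance(keywords, list):
        raise ValueError("'keywords' must be a list")
    # Strip whitespace, drop empty or non-string entries,
    # and remove case-insensitive duplicates while keeping order
    seen = set()
    cleaned = []
    for kw in keywords:
        if not isinstance(kw, str):
            continue
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            cleaned.append(kw)
    return cleaned[:expected]

# Made-up reply, for illustration only
sample_reply = '{"keywords": ["AI course", " ai course ", "no-code AI", ""]}'
print(parse_keywords(sample_reply))
```

A check like this could run on the model's reply before the result is written back to `user_info.json`.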

In [4]:
from openai import OpenAI
from dotenv import load_dotenv
import os 

# Load environment variables from .env file
load_dotenv()

chatgpt_key = os.getenv("CHATGPT_KEY")
client = OpenAI(api_key=chatgpt_key)

def query_keywords(prompt):
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        temperature=1,
        max_tokens=3000,
        response_format={"type": "json_object"},
        messages=[
            # Name the expected key so that keywords_data["keywords"] below is present
            {"role": "system", "content": "You give concise answers. The output should be in JSON format, with a single key 'keywords' whose value is a list of strings."},
            {"role": "user", "content": prompt},
        ],
    )
    keywords_json = response.choices[0].message.content
    keywords_data = json.loads(keywords_json)
    keywords_list = keywords_data["keywords"]
    return keywords_list

prompt_base = """
Given the following business description, give me 5 keywords that would be useful to use
in Google Trends to search for trends in this business.\n
"""
prompt = prompt_base + data["Business description"]

data["Keywords"] = query_keywords(prompt)

# convert to json 
updated_json = json.dumps(data, indent=4)
update_json(updated_json)


user_info updated!


## News related to your business

We used News API. 

Staying updated with news related to your company is crucial as it keeps employees informed about strategic decisions, new initiatives, and industry trends. This knowledge helps align individual and departmental goals with the company's objectives, fosters innovation, and enhances job performance. Additionally, understanding public perceptions and competitor actions through media can boost the company's adaptability and competitiveness in the market.

With this code segment, you can read recent news related to your startup, selected using the keywords.
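The query construction can be sketched on its own, without calling the API. This is an illustrative example, not the notebook's exact code: the helper name `build_news_params` is hypothetical, and quoting each keyword assumes that News API treats quoted phrases in `q` as exact matches, as its documentation describes.

```python
from datetime import datetime, timedelta

def build_news_params(keywords, days_back=7, today=None):
    """Build query parameters for a news search (hypothetical helper)."""
    today = today or datetime.now()
    # Quote each keyword so multi-word phrases are kept together,
    # then combine them with OR so any keyword can match
    query = " OR ".join(f'"{kw}"' for kw in keywords)
    return {
        "q": query,
        "from": (today - timedelta(days=days_back)).strftime("%Y-%m-%d"),
    }

params = build_news_params(
    ["Artificial Intelligence", "no-code AI education"],
    today=datetime(2024, 6, 30),
)
print(params["q"])
print(params["from"])
```

Keeping the parameter construction separate from the HTTP call makes it easy to inspect the query before spending an API request.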

In [7]:
import requests
from datetime import datetime, timedelta

all_keywords = data["Keywords_default"] + data["Keywords_ai"]

load_dotenv()
news_api = os.getenv('NEWS_API')

def search_news(keywords):
    
    # Base URL for the '/everything' endpoint
    URL = 'https://newsapi.org/v2/everything'
    
    # Calculate the date one week ago from today
    one_week_ago = (datetime.now() - timedelta(days=7)).strftime('%Y-%m-%d')
    
    # Join keywords into a single string separated by ' OR ' for the query
    query = ' OR '.join(keywords)
    
    # Parameters for the API request
    params = {
        'q': query,
        'from': one_week_ago,
        'sortBy': 'relevancy',  # News API documents 'relevancy', 'popularity' and 'publishedAt'
        'apiKey': news_api
    }
    
    # Send a GET request to the NewsAPI
    response = requests.get(URL, params=params)
    
    # Check if the request was successful
    if response.status_code == 200:
        # Return the list of news articles from the JSON response
        return response.json()['articles']
    else:
        # Report the error and return an empty list so callers can still iterate
        print(f"Error: {response.status_code}, {response.text}")
        return []


NEWS = search_news(all_keywords)
print(NEWS)

[{'source': {'id': None, 'name': 'Nichepursuits.com'}, 'author': 'Maia Ellis', 'title': '21 of the Fastest Growing Remote Jobs to Take Your Career Online', 'description': 'Some of the fastest growing remote jobs may not be what you expect. Roles like personal assistant or project manager were once typically confined to the office — but not anymore! We all got a taste for remote work when…\nThe post 21 of the Fastest Growing Remo…', 'url': 'https://www.nichepursuits.com/fastest-growing-remote-jobs/', 'urlToImage': 'https://www.nichepursuits.com/wp-content/uploads/2024/05/rmote.png', 'publishedAt': '2024-06-24T16:38:57Z', 'content': 'Some of the fastest growing remote jobs may not be what you expect. Roles like personal assistant or project manager were once typically confined to the office but not anymore!\r\nWe all got a taste fo… [+10806 chars]'}, {'source': {'id': None, 'name': 'Hubspot.com'}, 'author': 'esantiago@hubspot.com (Erica Santiago)', 'title

## Competition analysis

We use SerpApi as an SEO tool: for each keyword, we take the top organic Google result to identify the competition.
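As a rough illustration of the extraction step, the sketch below pulls titles and links out of a response dictionary. It assumes only the `organic_results`, `title` and `link` fields that the notebook itself reads; the helper name `top_organic_links` and the sample data are made up for this example.

```python
def top_organic_links(response_json, n=3):
    """Return up to n (title, link) pairs from a search response dict."""
    # Missing 'organic_results' is treated as "no results" instead of an error
    results = response_json.get("organic_results", [])
    pairs = []
    for result in results[:n]:
        title, link = result.get("title"), result.get("link")
        if title and link:
            pairs.append((title, link))
    return pairs

# Hypothetical response, trimmed to the fields used above
sample_response = {
    "organic_results": [
        {"title": "Example AI School", "link": "https://example.com/ai"},
        {"title": "Entry without a link"},
        {"title": "Another Course", "link": "https://example.org/course"},
    ]
}
print(top_organic_links(sample_response, n=3))
```

Looking at more than one result per keyword would give a broader view of the competition, at the cost of more pages to analyze later.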

In [8]:
load_dotenv()

serp_api = os.getenv("SERP_API")

def SEO_links(keywords):

    base_url = "https://serpapi.com/search.json"
    search_results = {} 
    
    for cont, keyword in enumerate(keywords):

        params = {
            'engine': 'google',
            'q': keyword,  # requests URL-encodes the query, so spaces are passed as-is
            'api_key': serp_api
        }

        response = requests.get(base_url, params=params)
        if response.status_code == 200:
            print(f"Request {cont + 1}/{len(keywords)} successful!")
            # Parse the JSON response and keep the top organic result, if any
            organic_results = response.json().get("organic_results", [])
            if organic_results:
                top_result = organic_results[0]
                search_results[top_result["title"]] = top_result["link"]

        else:
            print("Request failed with status code:", response.status_code)
            print(response.text)  # Print any error messages from the API

    return search_results

LINKS = SEO_links(all_keywords)
LINKS

Request 1/6 successful!
Request 2/6 successful!
Request 3/6 successful!
Request 4/6 successful!
Request 5/6 successful!
Request 6/6 successful!


In [33]:
# Call GPT-4 Turbo to analyze the competition
from openai import OpenAI

load_dotenv()
chatgpt_key = os.getenv("CHATGPT_KEY")

def analyze_competition(competition_info:dict):

  client = OpenAI(api_key=chatgpt_key)

  ALL_RESULTS = []

  for key, value in competition_info.items():

    prompt = f"""
          The following page is called {key}. And this is the link: {value}. 
          If the link corresponds to a business, analyze the page and give me hints of why it shows at the top of Google
          search results. Add the 'page_title' key to the JSON. Its value should be {key}. 
          Also include the 'link' key, whose value should be: {value}.
              """

    response = client.chat.completions.create(
      model = "gpt-4-turbo",
      temperature = 1,
      max_tokens = 3000,
      response_format={ "type": "json_object" },
      messages = [
        {
          "role": "system", "content": "You are an SEO specialist and want to analyze competition from a specific market.\
          You will receive a link; if the link is related to a company or business, do whatever the \
          user tells you. The output should be in JSON format, add the key 'needs_analysis' and set its value to True if you did an SEO analysis, otherwise set to False."
        },
        {
          "role": "user", "content":prompt
        }
      ]
    )

    info_json = response.choices[0].message.content
    info_dict = json.loads(info_json)

    print(info_dict)

    ALL_RESULTS.append(info_dict)

  return ALL_RESULTS 



SEO = analyze_competition(LINKS)
print(SEO)

{'page_title': 'Artificial intelligence', 'link': 'https://en.wikipedia.org/wiki/Artificial_intelligence', 'needs_analysis': False}
{'needs_analysis': True, 'page_title': 'Best Artificial Intelligence Courses Online with Certificates', 'link': 'https://www.coursera.org/courses?query=artificial%20intelligence', 'SEO_analysis': {'reasons_for_ranking_high': ['High Domain Authority: Coursera is a well-established platform in the online education sector.', "Relevant Keywords: The URL includes highly relevant keywords such as 'artificial intelligence' which match common search queries.", 'Quality Content: The page likely provides comprehensive information about AI courses, which enhances user engagement and satisfaction.', "Strong Backlink Profile: Coursera's content is often linked by reputable educational and technology websites.", 'User Experience: Coursera has a user-friendly interface that provides smooth navigation and accessibility.', 'Page Performance: The website is optimized for fa

In [34]:
SEO

[{'page_title': 'Artificial intelligence',
  'link': 'https://en.wikipedia.org/wiki/Artificial_intelligence',
  'needs_analysis': False},
 {'needs_analysis': True,
  'page_title': 'Best Artificial Intelligence Courses Online with Certificates',
  'link': 'https://www.coursera.org/courses?query=artificial%20intelligence',
  'SEO_analysis': {'reasons_for_ranking_high': ['High Domain Authority: Coursera is a well-established platform in the online education sector.',
    "Relevant Keywords: The URL includes highly relevant keywords such as 'artificial intelligence' which match common search queries.",
    'Quality Content: The page likely provides comprehensive information about AI courses, which enhances user engagement and satisfaction.',
    "Strong Backlink Profile: Coursera's content is often linked by reputable educational and technology websites.",
    'User Experience: Coursera has a user-friendly interface that provides smooth navigation and accessibility.',
    'Page Performance

In [35]:
# Format SEO results
def format_seo_data(entries):
    # List to hold the formatted dictionaries
    formatted_entries = []

    # Iterate over each dictionary in the input list
    for entry in entries:
        # Check if 'needs_analysis' is True and skip if not
        if entry.get('needs_analysis') in [True, 'True', 'true']:
            # Initialize a new dictionary to hold selected keys
            
            new_entry = {}
            
            # Handle different title keys and assign the correct value to 'page_title'
            if 'page_title' in entry:
                new_entry['page_title'] = entry['page_title']
            elif 'business' in entry:
                new_entry['page_title'] = entry['business']
            
            # Assign the link value, checking for different key variations
            if 'link' in entry:
                new_entry['link'] = entry['link']
            elif 'page_url' in entry:
                new_entry['link'] = entry['page_url']
            
            # Assign SEO factors, checking for different key variations
            if 'SEO_factors' in entry:
                new_entry['SEO_factors'] = entry['SEO_factors']
            elif 'SEO_analysis' in entry:
                new_entry['SEO_factors'] = entry['SEO_analysis']
            elif 'SEO_analysis_hints' in entry:
                new_entry['SEO_factors'] = entry['SEO_analysis_hints']
            
            # Append the new dictionary to the list of formatted entries
            formatted_entries.append(new_entry)

    return formatted_entries


# Call the function and print the results
formatted_SEO = format_seo_data(SEO)
print("\n\n")
print(formatted_SEO)




[{'page_title': 'Best Artificial Intelligence Courses Online with Certificates', 'link': 'https://www.coursera.org/courses?query=artificial%20intelligence', 'SEO_factors': {'reasons_for_ranking_high': ['High Domain Authority: Coursera is a well-established platform in the online education sector.', "Relevant Keywords: The URL includes highly relevant keywords such as 'artificial intelligence' which match common search queries.", 'Quality Content: The page likely provides comprehensive information about AI courses, which enhances user engagement and satisfaction.', "Strong Backlink Profile: Coursera's content is often linked by reputable educational and technology websites.", 'User Experience: Coursera has a user-friendly interface that provides smooth navigation and accessibility.', 'Page Performance: The website is optimized for fast loading times and mobile devices, crucial for retaining visitors and reducing bounce rates.']}}, {'page_title': 'About - No Code AI for all', 'link': 

In [36]:
data["Industry"] = "Sci/Tech" # per Google Trends
LOGO_PATH = os.path.join("images", "logo.png")  # portable across operating systems
data

{'Name': 'AI LAB SCHOOL',
 'Business description': 'AI Lab School specializes in offering a comprehensive 4-month certification program in artificial intelligence tailored for individuals without prior coding experience. They provide an immersive learning experience with live online classes focusing on the latest AI trends and tools. The curriculum is designed to help students master AI technologies, enabling them to boost their careers in the digital world.',
 'Country': 'Mexico',
 'State': 'Mexico City',
 'Timezone': '',
 'Industry': 'Sci/Tech',
 'Classification': {'Category_id': 5,
  'Subcategory_id': 30,
  'Subsubcategory_id': 717,
  'Subsubsubcategory_id': 741},
 'Keywords_default': ['Artificial Intelligence'],
 'Keywords_ai': ['AI certification program',
  'artificial intelligence training',
  'no-code AI education',
  'AI career development',
  'immersive AI course'],
 'Keywords': ['AI certification program',
  'artificial intelligence',
  'immersive learning',
  'no coding expe

## Generate report

Using all the information gathered above, we generate a PDF report that presents it.
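Long text has to be wrapped by hand when drawing on a PDF canvas. The greedy word-wrap idea can be sketched independently of any PDF library; this is an illustrative version, where `wrap_words` and its `measure` parameter are hypothetical names, and `measure` defaults to character count so the example runs without ReportLab.

```python
def wrap_words(text, max_width, measure=len):
    """Greedy word wrap; `measure` returns the width of a string."""
    lines, line = [], ""
    for word in text.split():
        candidate = f"{line} {word}" if line else word
        # Keep adding words while the line fits; a single word that is
        # too wide still gets its own line instead of being dropped
        if measure(candidate) <= max_width or not line:
            line = candidate
        else:
            lines.append(line)
            line = word
    if line:
        lines.append(line)
    return lines

print(wrap_words("the quick brown fox jumps", 10))
```

With a ReportLab canvas `c`, a width function such as `lambda s: c.stringWidth(s, "Helvetica", 10)` could be passed as `measure`, so that lines are measured in points for the font actually used.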

In [51]:
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.lib import utils
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from datetime import datetime

def create_report(news_data, logo_path, company_name, data, seo_data):
    pdf_path = "Report.pdf"
    c = canvas.Canvas(pdf_path, pagesize=A4)
    width, height = A4
    styles = getSampleStyleSheet()
    normal_style = styles['Normal']

    def draw_wrapped_text(c, text, x, y, max_width, line_height):
        words = text.split(' ')
        line = ''
        for word in words:
            if c.stringWidth(line + word + ' ', 'Helvetica', 10) <= max_width:
                line += word + ' '
            else:
                c.drawString(x, y, line)
                line = word + ' '
                y -= line_height
        c.drawString(x, y, line)
        return y - line_height

    def draw_paragraph(c, text, x, y, max_width, line_height):
        paragraphs = text.split('\n')
        for paragraph in paragraphs:
            y = draw_wrapped_text(c, paragraph, x, y, max_width, line_height)
        return y

    # Set font for the title and add a title
    c.setFont("Helvetica-Bold", 16)
    full_title = f"Report Analysis for {company_name}"
    c.drawCentredString(width / 2, height - 50, full_title)

    # Optional: Add a logo
    if logo_path:
        logo = utils.ImageReader(logo_path)
        c.drawImage(logo, 10, height - 80, width=60, preserveAspectRatio=True, mask='auto')

    # Add company information
    c.setFont("Helvetica-Bold", 10)
    y_position = height - 120
    c.drawString(10, y_position, "Country:")
    c.setFont("Helvetica", 10)
    y_position -= 15
    c.drawString(20, y_position, data['Country'])

    c.setFont("Helvetica-Bold", 10)
    y_position -= 30
    c.drawString(10, y_position, "Industry:")
    c.setFont("Helvetica", 10)
    y_position -= 15
    c.drawString(20, y_position, data['Industry'])

    c.setFont("Helvetica-Bold", 10)
    y_position -= 30
    c.drawString(10, y_position, "Business Description:")
    c.setFont("Helvetica", 10)
    y_position -= 15
    y_position = draw_paragraph(c, data['Business description'], 20, y_position, width - 40, 15)

    # Add news articles
    y_position -= 30
    c.setFont("Helvetica-Oblique", 12)
    c.drawString(10, y_position, "News articles related to your company")
    y_position -= 20

    for article in news_data:
        date_obj = datetime.strptime(article['publishedAt'], '%Y-%m-%dT%H:%M:%SZ')
        formatted_date = date_obj.strftime('%d-%m-%Y')

        c.setFont("Helvetica-Bold", 12)
        c.drawString(10, y_position, article['title'])
        y_position -= 15

        c.setFont("Helvetica", 10)
        c.drawString(10, y_position, f"Date: {formatted_date}")
        y_position -= 15
        c.drawString(10, y_position, f"Publisher: {article['source']['name']}")
        y_position -= 15

        c.setFillColorRGB(0, 0, 1)
        c.drawString(10, y_position, f"URL: {article['url']}")
        c.setFillColorRGB(0, 0, 0)
        y_position -= 15

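        c.setFont("Helvetica-Bold", 10)
        c.drawString(10, y_position, "Description:")
        y_position -= 15

        # Switch back to the regular font; the description may be missing (None)
        c.setFont("Helvetica", 10)
        y_position = draw_paragraph(c, article['description'] or '', 10, y_position, width - 20, 15)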
        y_position -= 10

        c.setStrokeColorRGB(0, 0, 0)
        c.line(10, y_position, width - 10, y_position)
        y_position -= 20

        if y_position < 50:
            c.showPage()
            y_position = height - 50

    # Add SEO analysis
    c.showPage()
    c.setFont("Helvetica-Oblique", 12)
    c.drawCentredString(width / 2, height - 50, "SEO ANALYSIS")

    y_position = height - 100
    c.setFont("Helvetica", 10)

    for seo in seo_data:
        c.setFont("Helvetica-Bold", 12)
        c.drawString(10, y_position, seo['page_title'])
        y_position -= 15

        c.setFont("Helvetica-Oblique", 10)
        c.drawString(10, y_position, f"Link: {seo['link']}")
        y_position -= 15

        c.setFont("Helvetica", 10)
        if 'SEO_factors' in seo:
            for key, value in seo['SEO_factors'].items():
                c.setFont("Helvetica-Bold", 10)
                c.drawString(10, y_position, f"- {key}:")
                y_position -= 15

                c.setFont("Helvetica", 10)
                # The model's JSON is not guaranteed to hold only strings
                if isinstance(value, list):
                    value = ', '.join(str(item) for item in value)
                y_position = draw_paragraph(c, str(value), 20, y_position, width - 40, 15)
                y_position -= 15
        elif 'reasons_for_high_ranking' in seo:
            for reason in seo['reasons_for_high_ranking']:
                c.setFont("Helvetica-Bold", 10)
                c.drawString(10, y_position, "- Reason:")
                y_position -= 15

                c.setFont("Helvetica", 10)
                if isinstance(reason, list):
                    reason = ', '.join(reason)
                y_position = draw_paragraph(c, reason, 20, y_position, width - 40, 15)
                y_position -= 15

        if y_position < 50:
            c.showPage()
            y_position = height - 50

    # Save the PDF
    c.save()

# Example usage:
# `news_data` is a list of article dictionaries from News API and `data` holds the company details.
# Pass your own logo path and company name to `create_report`; the next cell runs it with the values loaded above.

In [52]:
create_report(NEWS, LOGO_PATH, data["Name"], data, formatted_SEO)