### From a File


In [None]:
from main import create_taxonomy

filename = "cais_data_expanded.csv"
brand_terms = [
    "cais",
    "cais group",
    "glas",
    "glas funds",
    "halo",
    "halo investing",
    "icapital",
    "icapital network",
]
website_subject = "Alternate Investing Platform"

taxonomy, df, samples = create_taxonomy(
    filename,
    website_subject=website_subject,
    text_column="keyword",
    search_volume_column="search_volume",
    min_df=5,
    brand_terms=brand_terms,
)

df.to_csv("cais_data_taxonomy.csv", index=False)

print("\n".join(taxonomy))

In [None]:
from main import create_taxonomy

filename = "HM Raw Data.csv"
website_subject = "Houston Methodist Hospital"

brand_terms = [
    "luke",
    "lukes",
    "md anderson",
    "anderson",
    "hca",
    "stlukes",
    "memorial hermann",
    "hermann",
    "herman",
    "houston methodist",
    "methodist",
    "st joseph",
    "joseph",
]

taxonomy, df, samples = create_taxonomy(
    filename,
    website_subject=website_subject,
    text_column="keyword",
    search_volume_column="search_volume",
    min_cluster_size=10,
    min_samples=3,
    limit_queries=3,
    ngram_range=(1, 5),
    min_df=10,
    brand_terms=brand_terms,
)


df.to_csv("HM_Raw_Data_ngram_taxonomy2.csv", index=False)
print("\n".join(taxonomy))
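
The `ngram_range` and `min_df` arguments used above share their names with parameters of scikit-learn's `CountVectorizer`. Assuming `create_taxonomy` gives them the same meaning (the notebook does not confirm this), the plain-Python sketch below illustrates what they would control: `ngram_range=(1, 2)` extracts every 1-word and 2-word phrase, and `min_df` drops phrases that appear in fewer than that many keywords. The helper names `ngram_document_frequency` and `filter_by_min_df` are hypothetical and exist only for this illustration.

```python
from collections import Counter


def ngram_document_frequency(keywords, ngram_range=(1, 2)):
    """Count how many keywords contain each word n-gram (hypothetical helper)."""
    shortest, longest = ngram_range
    doc_freq = Counter()
    for keyword in keywords:
        tokens = keyword.lower().split()
        seen = set()  # each n-gram counts at most once per keyword
        for n in range(shortest, longest + 1):
            for start in range(len(tokens) - n + 1):
                seen.add(" ".join(tokens[start:start + n]))
        doc_freq.update(seen)
    return doc_freq


def filter_by_min_df(doc_freq, min_df):
    """Keep only n-grams that occur in at least `min_df` keywords."""
    return {ngram: count for ngram, count in doc_freq.items() if count >= min_df}


keywords = [
    "oral cancer vs canker sore",
    "canker sore vs cancer",
    "tongue canker sore vs cancer",
    "lymph node size",
]
doc_freq = ngram_document_frequency(keywords, ngram_range=(1, 2))
kept = filter_by_min_df(doc_freq, min_df=3)
print(sorted(kept))  # ['cancer', 'canker', 'canker sore', 'sore', 'vs']
```

Under this reading, a higher `min_df` (10 for the larger hospital dataset versus 5 for the smaller one) leaves fewer, more frequent phrases as candidate topics.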

### From a GSC Account

In [None]:
from main import create_taxonomy

brand_terms = ["ledgeloungers", "ledge"]
gsc_property = "sc-domain:ledgeloungers.com"  # `property` would shadow the Python built-in
website_subject = "Ledge Lounger offers luxury pool & outdoor furniture designed to create perfect spaces for outdoor entertaining and relaxation."

taxonomy, df, samples = create_taxonomy(
    gsc_property,
    website_subject=website_subject,
    days=30,
    ngram_range=(1, 5),
    min_df=5,
    brand_terms=brand_terms,
    limit_queries_per_page=5,
)


df.to_csv("ledgeloungers_taxonomy.csv", index=False)

df.head()
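
How `create_taxonomy` applies `brand_terms` is not shown in this notebook. As one plausible reading, brand words are removed from each query before topics are extracted, so that brand names do not dominate the categories. The sketch below shows that idea with the standard library only; `build_brand_pattern` and `strip_brand_terms` are hypothetical helpers, not functions from `main`.

```python
import re


def build_brand_pattern(brand_terms):
    """Compile a whole-word, case-insensitive pattern for the brand terms."""
    # Longest terms first so a multi-word brand wins over its shorter parts.
    ordered = sorted(brand_terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", flags=re.IGNORECASE)


def strip_brand_terms(keyword, pattern):
    """Remove brand terms and collapse the leftover whitespace."""
    without_brand = pattern.sub(" ", keyword)
    return " ".join(without_brand.split())


pattern = build_brand_pattern(["ledgeloungers", "ledge"])
print(strip_brand_terms("ledge lounger pool chairs", pattern))  # lounger pool chairs
print(strip_brand_terms("knowledge base", pattern))             # knowledge base
```

The word boundaries (`\b`) keep a short term such as `ledge` from matching inside an unrelated word such as `knowledge`, which matters when short tokens like `luke` or `herman` are listed as brand terms.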

In [5]:
df.head()

| | keyword | position | previous_position | position_difference | url | traffic | number_of_results | keyword_difficulty | timestamp | intent_commercial | ... | sv_1 | sv_12 | sv_11 | sv_10 | sv_9 | sv_8 | sv_7 | search_volume_percentiles | corpus | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38029 | elcy john | 1 | 1 | 0 | https://www.memorialhermann.org/doctors/ob-gyn... | 56 | 346000 | 21 | 1688396090 | 0 | ... | 90 | 90 | 90 | 70 | 90 | 170 | 90 | 60 | person | Person |
| 10405 | breast cancer and wine | 1 | 1 | 0 | https://www.mdanderson.org/publications/focuse... | 7 | 23600000 | 50 | 1687319732 | 0 | ... | 210 | 170 | 210 | 260 | 170 | 170 | 140 | 80 | breast cancer wine | Breast Cancer |
| 75246 | methodist hospital houston texas medical center | 1 | 1 | 0 | https://www.houstonmethodist.org/ | 32 | 6060000 | 77 | 1688065061 | 0 | ... | 210 | 210 | 260 | 210 | 170 | 210 | 210 | 80 | organization | Organization |
| 75247 | methodist hospital houston tx | 1 | 1 | 0 | https://www.houstonmethodist.org/ | 5 | 7150000 | 81 | 1687809582 | 0 | ... | 1000 | 880 | 1000 | 1300 | 1000 | 1300 | 1000 | 100 | methodist hospital location tx | Methodist Hospital |
| 75248 | methodist hospital human resources | 1 | 1 | 0 | https://www.houstonmethodist.org/for-health-pr... | 112 | 6390000 | 67 | 1687060684 | 0 | ... | 210 | 110 | 170 | 210 | 170 | 140 | 170 | 80 | methodist hospital human resource | Methodist Hospital |


In [1]:
import pandas as pd

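df = pd.read_csv("HM Raw Data.csv")
# Sort by ranking position so each URL's keyword list starts with its best-ranking queries.
df.sort_values("position", ascending=True, inplace=True)

# Sum search volume and collect keywords per URL, then keep the 10 highest-volume URLs.
by_url = (
    df.groupby("url")
    .agg({"search_volume": "sum", "keyword": list})
    .sort_values("search_volume", ascending=False)
    .head(10)
)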

In [6]:
by_url.head()

| url | search_volume | keyword |
|---|---|---|
| https://www.mdanderson.org/treatment-options/targeted-therapy.html | 60833080 | [precision targeted, precision targeted therap... |
| https://www.mdanderson.org/cancerwise/canker-sore-vs--oral-cancer--how-can-you-tell-the-difference.h00-159542901.html | 28151760 | [does oral cancer hurt, gum canker sore vs can... |
| https://www.mdanderson.org/cancerwise/swollen-lymph-nodes-and-other-symptoms-of-lymphoma.h00-159464790.html | 26339310 | [what are normal size lymph nodes, does swolle... |
| https://www.mdanderson.org/patients-family/becoming-our-patient/getting-to-md-anderson/wayfinding.html | 24667830 | [directions to md anderson, md anderson shuttl... |
| https://hcahealthcare.com/ | 20334720 | [hca facility, hca flu track, hca hea, hca hea... |
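
To make the aggregation above easier to follow, here is a plain-Python mirror of the same steps on a few made-up rows: sort by position, sum search volume per URL, collect each URL's keywords in sorted order, and keep the highest-volume URLs. The function name `top_urls_by_volume` and the `example.com` rows are illustrative assumptions, not part of the dataset.

```python
from collections import defaultdict

# Made-up rows with the same columns the notebook relies on.
rows = [
    {"url": "https://example.com/a", "keyword": "targeted therapy", "position": 2, "search_volume": 1000},
    {"url": "https://example.com/b", "keyword": "lymph node size", "position": 1, "search_volume": 500},
    {"url": "https://example.com/a", "keyword": "what is targeted therapy", "position": 1, "search_volume": 700},
    {"url": "https://example.com/b", "keyword": "swollen lymph nodes", "position": 3, "search_volume": 300},
]


def top_urls_by_volume(rows, limit=10):
    """Return (url, total_volume, keywords) tuples, highest volume first."""
    ordered = sorted(rows, key=lambda row: row["position"])  # best positions first
    totals = defaultdict(int)
    keywords = defaultdict(list)
    for row in ordered:
        totals[row["url"]] += row["search_volume"]
        keywords[row["url"]].append(row["keyword"])  # keeps the position order
    ranked = sorted(totals, key=totals.get, reverse=True)[:limit]
    return [(url, totals[url], keywords[url]) for url in ranked]


result = top_urls_by_volume(rows)
for url, volume, kws in result:
    print(url, volume, kws)
```

Because the rows are sorted before grouping, slicing the first ten keywords of a URL later on picks the queries for which that page ranks best.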


In [7]:
from lib.api import get_openai_response_chat
import settings
from tqdm import tqdm

for url, row in tqdm(by_url.iterrows(), total=by_url.shape[0]):
    samples = "\n".join(row["keyword"][:10])

    prompt = f"""As an expert at understanding search intent,
    please provide a descriptive subject for the page based on the provided Search Queries.

    Search Queries:
    {samples}

    Subject: """

    explanation = get_openai_response_chat(
        prompt,
        model=settings.CLUSTER_DESCRIPTION_MODEL,
        system_message="You are an expert at understanding the intent of Google searches.",
    )

    samples = ", ".join(row["keyword"][:10])
    print(explanation)
    print(samples)
    print()

Understanding Precision Targeted Therapy and Targeted Therapeutics in Precision Medicine for Cancer
precision targeted, precision targeted therapy, precision therapy, precision medicine for cancer, target therapy, targeted therapies, targeted therapy, targeted therapeutics, what is targeted therapy, what is target therapy



Understanding the Difference Between Oral Cancer and Canker Sores
does oral cancer hurt, gum canker sore vs cancer, tongue canker sore vs cancer, tongue cancer vs canker sore, oral cancer canker sore, oral cancer or canker sore, oral cancer vs canker sore, oral cancer ulcer, canker sore vs cancer, canker sore vs cancer of the mouth



Understanding Lymph Nodes: Size, Swelling, and Cancer
what are normal size lymph nodes, does swollen lymph nodes mean cancer, does lymph node cancer hurt, lymph node mass, lymph nodes in neck always swollen, lymph node size, size of lymph nodes, lymph nodes big, lymph nodes size, swollen lymph nodes neck cancer



Getting to MD Anderson: Directions, Shuttle Information, and Address
directions to md anderson, md anderson shuttle, md anderson shuttle tracker, oneaccess md anderson, mda bus tracker, pickens tower md anderson, md anderson pickens tower, address md anderson, anderson directions, address for md anderson cancer center



HCA Healthcare: Facilities, Flu Tracking, and Corporate Information
hca facility, hca flu track, hca hea, hca heal, hca corporation, hca healtcare, hca health, hca helthcare, hca helathcare, hca heathcare



Understanding the Causes of Lower Back Pain
what can cause severe lower back pain, what can cause lower back pain, what can cause low back pain, what causes severe lower back pain, what causes your lower back to hurt, what causes pain in my lower back, what causes pain in lower back, what causes pain in the lower back, what causes pain in your lower back, what could lower back pain mean



Effective Remedies and Treatments for Cold Sores and Lip Blisters
what can help cold sores, sore on lips remedy, what can i use on cold sores, what can i put on cold sore, what can you use for cold sores, lip blisters treatment, lip blister remedy, lip sore remedy, healing cold sores fast, heal blister on lip



Understanding the Signs and Symptoms of Skin Cancer



Understanding Pancreatic Cancer: Symptoms, Treatment, and Support for Patients
md anderson pancreatic cancer, cancer pancreatic, pancreatic tumor, pancreas cancer, pan creatic cancer, cancer of pancreas, pancreatic cancer patient, pancreatic cancer patients, pancreatic cancer tumor, pancratic cancer



Understanding Colon Cancer: Symptoms, Definitions, and Origins
symptoms of colon cancer in man, symptoms of colorectal cancer in males, colon cancer md anderson, md anderson colon cancer, symptons colon cancer, colorectal cancer definition, definition of colon cancer, cancer of colon, definition colorectal cancer, how do colon cancer start
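
The prompt in the loop above is an indented f-string, so every line after the first carries the code's leading spaces into the text sent to the model. A minimal sketch of one way to avoid that, using only the standard library: dedent the template once, then substitute the keywords. `PROMPT_TEMPLATE` and `build_subject_prompt` are hypothetical names, and the wording is a lightly tidied version of the prompt above rather than a tested improvement.

```python
from textwrap import dedent

# Dedent the template before substitution: if the unindented keyword lines
# were already inside the string, dedent() would find no common indentation.
PROMPT_TEMPLATE = dedent(
    """\
    As an expert at understanding search intent, please provide a descriptive
    subject for the page based on the provided search queries.

    Search Queries:
    {samples}

    Subject: """
)


def build_subject_prompt(keywords, max_samples=10):
    """Return the prompt text for up to `max_samples` keywords (hypothetical helper)."""
    samples = "\n".join(keywords[:max_samples])
    return PROMPT_TEMPLATE.format(samples=samples)


prompt = build_subject_prompt(
    ["directions to md anderson", "md anderson shuttle", "address md anderson"],
    max_samples=2,
)
print(prompt)
```

The resulting string could be passed to `get_openai_response_chat` in place of the inline f-string; whether the cleaner whitespace changes the model's answers is not something this notebook measures.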




