In [1]:
import openai
import pandas as pd 
import numpy as np
import csv
import time
from fuzzywuzzy import fuzz, process
import requests
import cloudscraper
from bs4 import BeautifulSoup
from googlesearch import search
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity



## Targeted Scenario:

A 28-year-old woman in Paterson, NJ, is living in temporary housing while working a part-time job. She has a physical disability that does not prevent her from working entirely but makes it difficult. She has documented immigration status but struggles to balance employment demands with physical wellness and social engagement. Because she is unsure whether part-time work will reliably cover her expenses, she is asking about disability benefits and housing subsidies.


# Current response:

It sounds like there's a lot going on for her, and it's commendable that she's seeking support. Let's break down the details with a focus on SMART goals again along with additional resources she may find helpful.


#### SMART Goals for Enhanced Wellness

Emotional Wellness
Goal: Schedule and attend a weekly therapy session to develop coping strategies for stress and emotional challenges.

Next Step: Identify local mental health resources like the Preferred Behavioral Health Group or the Counseling Services (Mental Health Clinic of Passaic).

Physical Wellness
Goal: Engage in at least 30 minutes of adaptive physical activity three times per week to enhance physical fitness.

Next Step: Research adaptive exercise classes available in the community or online.

Occupational Wellness
Goal: Explore job training programs or workshops to improve skills for better employment options.

Next Step: Set a specific time to research programs like the Displaced Homemaker Program that can help her transition back to the workforce.

Financial Wellness
Goal: Investigate eligibility for SSI and housing subsidies in New Jersey.

Next Step: Contact the local Social Services office or use the Social Services for the Homeless service by calling 211 to gather information.

Social Wellness
Goal: Join a community or online support group for individuals with disabilities.

Next Step: Look for local listings or community boards to find groups, or consider specific organizations that cater to her interests.

Intellectual Wellness
Goal: Complete one online course related to adaptive skills or self-advocacy within the next month.

Next Step: Research online platforms like Coursera or Udemy for relevant course offerings.

#### Follow-Up Questions
Emotional Wellness: How has living in temporary housing impacted her mental health, and is there anything specific she'd like support with?
Physical Wellness: What types of physical activities or exercises has she enjoyed in the past, and does she need assistance finding those that accommodate her disability?
Occupational Wellness: What skills does she possess, and what new skills does she wish to develop through training programs?
Financial Wellness: Is she aware of her current financial situation and what documents she may need for applications?
Social Wellness: How does she currently maintain her social connections, and are there particular groups or activities she's interested in exploring?
Intellectual Wellness: Are there specific skills or subjects she is passionate about learning more about?

#### Resources
Preferred Behavioral Health Group: Offers mental health support services. Phone: 732-367-4700

Social Services for the Homeless: 24-hour assistance available. Call 211 for support.

Displaced Homemaker Program: Offers training and job counseling.

Would you like to focus on any specific area or goal more closely, or is there something else she needs assistance with?


# EPINET wellness planner Response:

#### Relevant Resources
Disability Rights Pennsylvania:
This organization provides vital assistance and legal advocacy for individuals applying for disability benefits and housing support, offering guidance throughout the application processes.
Visit Disability Rights PA for more information.

211 Pennsylvania:
A confidential helpline that connects users with local services for health care, housing, and disability assistance. It's a great resource for tailored support based on individual needs.
Access it at 211 PA.

Pennsylvania Housing Finance Agency (PHFA):
For anyone seeking rental assistance or housing subsidies in Pennsylvania, the PHFA offers valuable information on eligibility and application processes.
More details can be found at PHFA.

Mental Health Association in Pennsylvania:
This organization provides programs to support mental wellness, which can be helpful as she navigates her physical health and social engagement challenges.
Learn more at Mental Health Association Website.



#### SMART Goals

Goal 1: Disability Benefits Application
Specific: Complete and submit an application for SSI within the next 30 days.
Measurable: Keep track of the application status and document any communications with the SSA.
Achievable: Gather necessary documentation, such as medical records.
Relevant: This will provide financial assistance.
Time-Bound: Aim for submission by the end of the month.

Goal 2: Housing Assistance Research
Specific: Research at least three housing assistance programs and apply to one within the next month.
Measurable: Make a list of programs and track the application process.
Achievable: Set aside time each week for this research.
Relevant: Securing stable housing is crucial for her overall wellness.
Time-Bound: Submit her application within four weeks.

Goal 3: Social Engagement
Specific: Attend one local community group meeting related to disability support within the next month.
Measurable: Record her experiences and feelings about meeting new people.
Achievable: Look for groups online or in local community centers.
Relevant: Building a support network can aid in her emotional wellness.
Time-Bound: Join a meeting or event within a month.

#### Follow-Up Questions

Employment and Financial Resources:
What is her current hourly wage, and is she able to increase her hours if necessary?
Has she checked for any state-specific benefits that could supplement her income?

Support Systems:
Does she have a support system, such as friends, family, or community connections, that she can turn to for help?
Are there any previous experiences with support groups that she found beneficial?

Housing Preferences:
What are her preferences for housing in terms of location, size, and accessibility features?
Is she aware of any local resources for housing that she could immediately reach out to?

Physical Wellness Management:
What physical wellness activities does she currently engage in, and how could they be adapted to meet her needs?
Has she explored local resources, such as physical therapy or exercise programs, tailored to individuals with disabilities?

Crisis Resources:
Should she need immediate support, is she aware of local crisis intervention services available to assist her with any urgent issues?

These resources, goals, and questions aim to empower her in navigating her pathways towards improved wellness and stability. Would she like to discuss more specific resources regarding any area, or is there something else she needs assistance with?

### Number of housing related resources

In [4]:
original = pd.read_csv('all_resources_2025.csv')
epinet_data = pd.read_csv('EPINET_resource_with_descriptions.csv')

In [5]:
filtered_df = original[original['description'].str.contains(r'\bhouse\b|\bhousing\b', case=False, na=False, regex=True)]
print(filtered_df.shape[0]) 
print(f"Percentage of housing related sources in CSPNJ dataset: {filtered_df.shape[0]/original.shape[0] * 100}")

73
Percentage of housing related sources in CSPNJ dataset: 5.387453874538745


In [6]:
filtered_epinet = epinet_data[epinet_data['Generated Description'].str.contains(r'\bhouse\b|\bhousing\b', case=False, na=False, regex=True)]
print(filtered_epinet.shape[0])
print(f"Percentage of housing related sources in EPINET dataset: {filtered_epinet.shape[0]/epinet_data.shape[0] * 100}")

68
Percentage of housing related sources in EPINET dataset: 3.0141843971631204
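
The two cells above repeat the same filter-and-divide steps for each dataset. A minimal sketch of one reusable helper follows, assuming the descriptions can be passed as any iterable of values (a pandas column such as `original['description']` would work). The name `keyword_share` and the sample strings are hypothetical and not part of either dataset.

```python
import re

# Same word-boundary pattern as the cells above.
HOUSING_PATTERN = re.compile(r"\bhouse\b|\bhousing\b", re.IGNORECASE)

def keyword_share(descriptions, pattern=HOUSING_PATTERN):
    """Return (matches, total, percentage) for an iterable of descriptions.

    Non-string values (for example NaN from pandas) count toward the total
    but never match, mirroring `na=False` in `str.contains`.
    """
    values = list(descriptions)
    matches = sum(1 for v in values if isinstance(v, str) and pattern.search(v))
    total = len(values)
    percentage = 100 * matches / total if total else 0.0
    return matches, total, percentage

# Hypothetical sample data, not taken from either dataset.
sample = ["Emergency housing vouchers", "Peer support line", float("nan"), "Halfway house"]
print(keyword_share(sample))
```

Because of the word boundaries, a description such as "Warehouse jobs" is not counted, which matches the behavior of the regex used in the notebook.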


### Average description length 

In [7]:
average_word_count = original['description'].dropna().apply(lambda x: len(x.split())).mean()
print(f"Average number of words in CSPNJ description: {average_word_count:.2f}")

Average number of words in CSPNJ description: 46.94


In [8]:
average_word_count = epinet_data['Generated Description'].dropna().apply(lambda x: len(x.split())).mean()
print(f"Average number of words in EPINET description: {average_word_count:.2f}")

Average number of words in EPINET description: 84.56
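
A mean alone can hide a few very long or very short descriptions. As a small sketch, assuming the same whitespace-based word count used above, the helper below reports the median next to the mean. The name `word_count_summary` and the sample strings are hypothetical.

```python
from statistics import mean, median

def word_count_summary(descriptions):
    """Summarize whitespace-separated word counts, skipping non-string values."""
    counts = [len(text.split()) for text in descriptions if isinstance(text, str)]
    if not counts:
        return {"n": 0, "mean": 0.0, "median": 0.0}
    return {"n": len(counts), "mean": mean(counts), "median": median(counts)}

# Hypothetical sample data; None stands in for a missing description.
sample = ["Free legal aid for tenants", "Crisis line", None]
print(word_count_summary(sample))
```

The same function could be called on `original['description']` and `epinet_data['Generated Description']` to compare the two datasets side by side.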


### Improve description

In [69]:
import os

df = pd.read_csv("all_resources_2025.csv")
df = df.dropna(subset=['url'])
# Read the API key from the environment rather than hard-coding a secret in the notebook.
gao_key = os.environ["OPENAI_API_KEY"]
openai.api_key = gao_key
scraper = cloudscraper.create_scraper()

def get_text_from_url(url, timeout=5):
    """
    Fetches text from a website. If it times out or fails, it returns None.
    """
    headers = {"User-Agent": "Mozilla/5.0"}
    
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        if response.status_code != 200:
            response = scraper.get(url, timeout=timeout)
            if response.status_code != 200:
                return None  

        soup = BeautifulSoup(response.text, 'html.parser')
        page_text = soup.get_text(separator='\n', strip=True)
        
        return page_text[:100000] if page_text else None  
    except (requests.exceptions.Timeout, requests.exceptions.RequestException):
        return None  

In [60]:
def generate_description(resource_name, website_text):
    """
    Calls OpenAI GPT to generate a description (under 100 words) based on the extracted website text.
    """
    # Build the prompt from adjacent string literals so no stray indentation ends up in the text.
    prompt = (
        "Provide a detailed (less than 100 words) description for the following resource:\n\n"
        f"Resource Name: {resource_name}\nWebsite Info: {website_text}\n\n"
        "Pay attention to what supporting services or activities this specific resource provides. Include all the important details. "
        "Do not include phone number in your description. Also, no need to state anything like 'More information can be found on their website, where resources and a patient portal are also available.'"
    )
    
    try:
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": "You are an assistant for mental health support that summarizes descriptions of mental health support resources."},
                      {"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content.strip()
    
    except Exception as e:
        print(f"GPT request failed for {resource_name}: {e}")
        return None 

def search_for_resource_online(resource_name):
    """
    Searches Google for the given resource name and returns the first result that is not a Facebook or LinkedIn page.
    """
    query = f"{resource_name} mental health site"
    
    try:
        search_results = list(search(query, num=3, stop=3, pause=2))  
        for result in search_results:
            if "facebook.com" not in result and "linkedin.com" not in result:  
                return result  
    except Exception:
        return None  
    return None  
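
Both `generate_description` and `search_for_resource_online` give up after a single failed attempt. The following is a minimal sketch of a retry wrapper with exponential backoff, offered as one possible addition rather than something the notebook uses. The name `call_with_retries` and its parameters are hypothetical. Because `generate_description` already catches exceptions and returns `None`, a wrapper like this would need to sit around the raw API call inside it.

```python
import time

def call_with_retries(func, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception with exponential backoff.

    Returns the function's result, or None once all attempts have failed.
    The `sleep` argument exists so the waiting can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            print(f"Attempt {attempt} of {max_attempts} failed: {exc}")
            if attempt < max_attempts:
                # Wait 1x, 2x, 4x ... the base delay between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    return None
```

Catching every exception is a deliberate simplification here. A stricter version might retry only on timeouts or rate-limit errors and let other failures surface immediately.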

In [61]:
descriptions = []

for idx, row in df.iterrows():
    resource_name = row["service"]
    website = row["url"]

    # Try to extract website content
    website_text = None
    if pd.notna(website) and website.startswith("http"):
        website_text = get_text_from_url(website)
    
    # If no website or failed request, search online
    if website_text is None:
        new_website = search_for_resource_online(resource_name)
        if new_website:
            # print(f"Found alternative website for {resource_name}: {new_website}")
            website_text = get_text_from_url(new_website)
    
    # If still no useful text, skip processing
    if website_text is None:
        descriptions.append("Skipped due to timeout, missing website, or no useful content found.")
        # print(f"Skipping {resource_name} due to website failure or lack of content.")
        continue

    # Generate description using GPT
    description = generate_description(resource_name, website_text)

    # Skip if GPT processing fails
    descriptions.append(description if description else "Skipped due to GPT failure.")

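    # Count from the results list: the index labels have gaps after dropna, so idx + 1 can be off.
    print(f"Processed resource {len(descriptions)} out of {df.shape[0]}.")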

    time.sleep(1)

Processed resource 1 out of 1319.
Processed resource 2 out of 1319.
Processed resource 3 out of 1319.
Processed resource 4 out of 1319.
Processed resource 5 out of 1319.
Processed resource 6 out of 1319.
Processed resource 7 out of 1319.
Processed resource 8 out of 1319.
Processed resource 9 out of 1319.
Processed resource 10 out of 1319.
Processed resource 11 out of 1319.
Processed resource 12 out of 1319.
Processed resource 13 out of 1319.
Processed resource 14 out of 1319.
Processed resource 15 out of 1319.
Processed resource 16 out of 1319.
Processed resource 17 out of 1319.
Processed resource 18 out of 1319.
Processed resource 19 out of 1319.
Processed resource 20 out of 1319.
Processed resource 21 out of 1319.
Processed resource 22 out of 1319.
Processed resource 23 out of 1319.
Processed resource 24 out of 1319.
Processed resource 25 out of 1319.
Processed resource 26 out of 1319.
Processed resource 27 out of 1319.
Processed resource 28 out of 1319.
Processed resource 29 out of 

In [70]:
df.description = descriptions
df.to_csv("all_resources_2025_updated.csv")

In [9]:
updated = pd.read_csv('all_resources_2025_updated.csv')

### Comparison

In [116]:
original[original.service == '(NA) Narcotics Anonymous'].url.iloc[0]

'https://m.narcoticsanonymousnj.org/Meetings/default.asp'

In [111]:
original[original.service == '(NA) Narcotics Anonymous'].description.iloc[0]

'{"location": "New Jersey", "operating_hours": "", "prerequisites": ""}'

In [118]:
updated[updated.service == '(NA) Narcotics Anonymous'].description.iloc[0]

'Narcotics Anonymous (NA) offers support for individuals recovering from substance abuse. The NJ Statewide Meeting Directory helps users locate nearby NA meetings by entering a city or ZIP code. Meetings are available in-person as well as online and via phone. NA also provides Android and iPhone apps to facilitate meeting searches. Printable meeting lists can be accessed at meetinglist.nanj.org. NA encourages community involvement by allowing users to report any meeting changes to keep the directory current. Users can find support through various events and services dedicated to recovery.'

In [113]:
original[original.service == 'YWCA NNJ (Northern NJ)'].description.iloc[0]

'Location: 352 Delsea Dr N, Glassboro, NJ 08028  \nOperating hours: Not specified  \nPrerequisites: Accepting new patients; no specific identification requirements mentioned.'

In [119]:
updated[updated.service == 'YWCA NNJ (Northern NJ)'].description.iloc[0]

'YWCA Northern New Jersey is dedicated to eliminating racism, empowering women, and advancing gender equity. They offer a variety of programs, including sexual violence survivor support, women’s and girls’ empowerment initiatives, before-and-after school programs, summer camps, and senior health and wellness services. Additionally, YWCA NNJ hosts events like the Women’s Leadership Conference, supports women veterans through “Operation Sisterhood,” and provides mental health wellness resources via YWTV, a series of live programs. With over a century of impactful service, YWCA NNJ fosters community connection and personal growth.'

# Updated Responses

See the app

## Housing Resource Expansion

## (Transportation)

In [168]:
openai.api_key = gao_key
housing_resources = []
def get_housing_resources():
    prompt = (
        f"Provide a list of 40 publicly available resources related to housing support. "
        "Each resource must be available either nationwide in the US or specifically for the state of New Jersey. "
        "Format each resource in this exact structure:\n\n"
        "Resource Name: [Name]\n"
        "Description: [Brief description (less than 100 words) of what the resource provides]\n"
        "URL: [Website link]\n"
        "Phone: [Phone number, or 'NaN' if not available]\n"
        "Separate each resource with a newline.\n"
    )
    
    try:
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": "You are a helpful assistant that provides information about wellness resources."},
                      {"role": "user", "content": prompt}],
            temperature=0.7
        )
        
        return response.choices[0].message.content
    
    except Exception as e:
        # `dimension` is not defined in this function, so report the housing request directly.
        print(f"Error fetching housing resources: {e}")
        return ""

In [169]:
resources_text = get_housing_resources()

if resources_text:
    resources = resources_text.strip().split("\n\n")
    for resource in resources:
        lines = resource.split("\n")
        try:
            # maxsplit=1 keeps any ": " that appears inside the value itself
            name = lines[0].split(": ", 1)[1]
            description = lines[1].split(": ", 1)[1]
            url = lines[2].split(": ", 1)[1]
            phone = lines[3].split(": ", 1)[1] if len(lines) > 3 else "NaN"
            # This cell has no loop variable named `dimension`; label the rows explicitly.
            housing_resources.append(["housing", name, description, url, phone])
        except IndexError:
            print("Skipping malformed housing entry")

In [171]:
csv_filename = "housing_resources.csv"
with open(csv_filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Dimension", "Resource Name", "Description", "URL", "Phone"])
    writer.writerows(housing_resources)

print(f"Successfully saved {len(housing_resources)} resources to {csv_filename}")

Successfully saved 40 resources to housing_resources.csv


In [10]:
housing_resources = pd.read_csv('housing_resources.csv')
housing_resources = housing_resources.drop(columns=['Dimension'])
housing_resources

Unnamed: 0,Resource Name,Description,URL,Phone
0,National Low Income Housing Coalition (NLIHC),A nonprofit organization advocating for afford...,https://nlihc.org,
1,HUD Exchange,The U.S. Department of Housing and Urban Devel...,https://www.hudexchange.info,
2,211,"A free, confidential service that helps people...",https://www.211.org,211
3,National Housing Trust,A nonprofit organization focused on preserving...,https://www.nhtinc.org,
4,Housing Assistance Council (HAC),A national nonprofit that supports affordable ...,https://www.ruralhome.org,
5,New Jersey Department of Community Affairs (DC...,The state agency provides housing assistance p...,https://www.nj.gov/dca,(609) 292-6420
6,New Jersey Housing and Mortgage Finance Agency...,Offers financial assistance for homebuyers and...,https://www.njhousing.gov,(609) 278-7400
7,The National Alliance to End Homelessness,A nonprofit organization dedicated to preventi...,https://endhomelessness.org,
8,Legal Services of New Jersey,Provides free legal assistance and resources f...,https://www.lsnj.org,(888) 576-5529
9,U.S. Interagency Council on Homelessness (USIC...,A federal agency that coordinates the federal ...,https://www.usich.gov,


In [11]:
updated_urls = updated['url'].dropna().unique()
new_housing_filtered = housing_resources[~housing_resources['URL'].dropna().apply(lambda x: any(pd.Series(updated_urls).str.contains(x, na=False, regex=False)))]
new_housing_filtered

Unnamed: 0,Resource Name,Description,URL,Phone
0,National Low Income Housing Coalition (NLIHC),A nonprofit organization advocating for afford...,https://nlihc.org,
1,HUD Exchange,The U.S. Department of Housing and Urban Devel...,https://www.hudexchange.info,
2,211,"A free, confidential service that helps people...",https://www.211.org,211
3,National Housing Trust,A nonprofit organization focused on preserving...,https://www.nhtinc.org,
4,Housing Assistance Council (HAC),A national nonprofit that supports affordable ...,https://www.ruralhome.org,
5,New Jersey Department of Community Affairs (DC...,The state agency provides housing assistance p...,https://www.nj.gov/dca,(609) 292-6420
6,New Jersey Housing and Mortgage Finance Agency...,Offers financial assistance for homebuyers and...,https://www.njhousing.gov,(609) 278-7400
7,The National Alliance to End Homelessness,A nonprofit organization dedicated to preventi...,https://endhomelessness.org,
8,Legal Services of New Jersey,Provides free legal assistance and resources f...,https://www.lsnj.org,(888) 576-5529
9,U.S. Interagency Council on Homelessness (USIC...,A federal agency that coordinates the federal ...,https://www.usich.gov,
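
The substring check above is sensitive to the scheme, a `www.` prefix, and trailing slashes, so `https://www.211.org` and `http://211.org/` would be treated as different resources. A minimal sketch of normalizing URLs before comparing them follows. It uses exact matching on the normalized form, which is a different rule from the substring test in the cell above, and the names `normalize_url` and `is_new_url` are hypothetical.

```python
from urllib.parse import urlparse

def normalize_url(url):
    """Reduce a URL to 'host/path' with no scheme, 'www.' prefix, or trailing slash.

    The whole URL is lowercased, which is acceptable for duplicate detection
    even though paths can be case-sensitive on some servers.
    """
    if not isinstance(url, str) or not url.strip():
        return None
    url = url.strip().lower()
    # urlparse only fills netloc when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path.rstrip("/")
    return host + path

def is_new_url(candidate, known_urls):
    """True when the candidate's normalized form matches no normalized known URL."""
    target = normalize_url(candidate)
    known = {normalize_url(u) for u in known_urls}
    return target is not None and target not in known

print(normalize_url("https://www.nj.gov/dca/"))
```

With a helper like this, the filter could be written as a mask over `housing_resources['URL']` that calls `is_new_url(x, updated_urls)` for each value.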


In [38]:
df_A = updated
df_B = new_housing_filtered.copy()  # copy so adding a column does not write to a slice of housing_resources
# Combine resource names and URLs into a single string for vectorization
df_A['combined'] = df_A['service'] + " " + df_A['url']
df_B['combined'] = df_B['Resource Name'] + " " + df_B['URL']

# Vectorize the text data
vectorizer = TfidfVectorizer().fit(df_A['combined'])
A_vectors = vectorizer.transform(df_A['combined'])
B_vectors = vectorizer.transform(df_B['combined'])

# Compute cosine similarity
similarity_matrix = cosine_similarity(B_vectors, A_vectors)

# Define a similarity threshold
SIMILARITY_THRESHOLD = 0.6 

# Remove rows in B that are too similar to rows in A
similar_rows = similarity_matrix.max(axis=1) >= SIMILARITY_THRESHOLD
df_B_filtered = df_B[~similar_rows]

df_B_filtered = df_B_filtered.reset_index(drop = True)
print(df_B_filtered.shape)
housing_resources = df_B_filtered
df_B_filtered

(28, 5)


Unnamed: 0,Resource Name,Description,URL,Phone,combined
0,National Low Income Housing Coalition (NLIHC),A nonprofit organization advocating for afford...,https://nlihc.org,,National Low Income Housing Coalition (NLIHC) ...
1,National Housing Trust,A nonprofit organization focused on preserving...,https://www.nhtinc.org,,National Housing Trust https://www.nhtinc.org
2,Housing Assistance Council (HAC),A national nonprofit that supports affordable ...,https://www.ruralhome.org,,Housing Assistance Council (HAC) https://www...
3,New Jersey Housing and Mortgage Finance Agency...,Offers financial assistance for homebuyers and...,https://www.njhousing.gov,(609) 278-7400,New Jersey Housing and Mortgage Finance Agency...
4,U.S. Interagency Council on Homelessness (USIC...,A federal agency that coordinates the federal ...,https://www.usich.gov,,U.S. Interagency Council on Homelessness (USIC...
5,Affordable Housing Online,A comprehensive resource listing affordable ho...,https://affordablehousingonline.com,,Affordable Housing Online https://affordable...
6,New Jersey Tenants Organization,"Advocates for tenant rights in New Jersey, pro...",http://www.njto.org,(732) 819-0500,New Jersey Tenants Organization http://www.n...
7,National Coalition for the Homeless,Advocacy organization working to prevent and e...,https://www.nationalhomeless.org,,National Coalition for the Homeless https://...
8,The U.S. Department of Veterans Affairs (VA),"Offers housing assistance for veterans, includ...",https://www.va.gov/homeless,,The U.S. Department of Veterans Affairs (VA) ...
9,New Jersey Coalition to End Homelessness,A statewide organization that works to prevent...,https://njceh.org,(732) 246-0003,New Jersey Coalition to End Homelessness htt...
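
The thresholding step above can be illustrated without scikit-learn. The sketch below swaps TF-IDF for plain token counts (a simplification, so the scores will differ from the ones produced by `TfidfVectorizer`) and keeps only candidates whose best cosine similarity to any existing entry is below the threshold. The function names and the sample strings are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(counts_a, counts_b):
    """Cosine similarity between two token-count mappings."""
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def drop_near_duplicates(candidates, existing, threshold=0.6):
    """Keep candidates whose best similarity to any existing entry is below threshold."""
    existing_counts = [Counter(tokenize(e)) for e in existing]
    kept = []
    for cand in candidates:
        cand_counts = Counter(tokenize(cand))
        best = max((cosine(cand_counts, e) for e in existing_counts), default=0.0)
        if best < threshold:
            kept.append(cand)
    return kept

# Hypothetical "name + URL" strings in the same shape as the `combined` column.
existing = ["211 https://www.211.org"]
candidates = [
    "NJ 211 https://www.nj211.org",
    "National Housing Trust https://www.nhtinc.org",
]
print(drop_near_duplicates(candidates, existing))
```

With raw counts, tokens such as `https`, `www`, and `org` add to every pair's score. TF-IDF weighting reduces the influence of such common tokens, which is one reason the notebook's scores would not match this simplified version.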


## Overall Resource Expansion

In [191]:
openai.api_key = gao_key
dimensions_of_wellness = [
    "emotional", "physical", "occupational", "social",
    "spiritual", "intellectual", "environmental", "financial"
]

all_resources = []

def get_resources(dimension):
    prompt = (
        f"Provide a list of 40 publicly available resources related to {dimension} wellness. "
        "Each resource must be available either nationwide in the US or specifically for New Jersey. "
        "Format each resource in this exact structure:\n\n"
        "Resource Name: [Name]\n"
        "Description: [Brief description (less than 100 words) of what the resource provides]\n"
        "URL: [Website link]\n"
        "Phone: [Phone number, or 'NaN' if not available]\n"
        "Separate each resource with a newline.\n"
    )
    
    try:
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": "You are a helpful assistant that provides information about wellness resources."},
                      {"role": "user", "content": prompt}],
            temperature=0.7
        )
        
        return response.choices[0].message.content
    
    except Exception as e:
        print(f"Error fetching data for {dimension}: {e}")
        return ""

In [129]:
for dimension in dimensions_of_wellness:
    print(f"Fetching resources for {dimension} wellness...")
    resources_text = get_resources(dimension)
    
    if resources_text:
        resources = resources_text.strip().split("\n\n")
        for resource in resources:
            lines = resource.split("\n")
            try:
                # maxsplit=1 keeps any ": " that appears inside the value itself
                name = lines[0].split(": ", 1)[1]
                description = lines[1].split(": ", 1)[1]
                url = lines[2].split(": ", 1)[1]
                phone = lines[3].split(": ", 1)[1] if len(lines) > 3 else "NaN"
                
                all_resources.append([dimension, name, description, url, phone])
            except IndexError:
                print(f"Skipping malformed entry in {dimension} wellness")


    time.sleep(2)

Fetching resources for emotional wellness...
Skipping malformed entry in emotional wellness
Skipping malformed entry in emotional wellness
Fetching resources for physical wellness...
Skipping malformed entry in physical wellness
Fetching resources for occupational wellness...
Skipping malformed entry in occupational wellness
Skipping malformed entry in occupational wellness
Fetching resources for social wellness...
Skipping malformed entry in social wellness
Skipping malformed entry in social wellness
Fetching resources for spiritual wellness...
Skipping malformed entry in spiritual wellness
Fetching resources for intellectual wellness...
Skipping malformed entry in intellectual wellness
Skipping malformed entry in intellectual wellness
Fetching resources for environmental wellness...
Skipping malformed entry in environmental wellness
Fetching resources for financial wellness...
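
Several entries per dimension are skipped as malformed in the log above. One possible cause is that the model's output drifts from the requested layout, for example by numbering entries or wrapping labels in markdown bold. The sketch below is a more tolerant parser that matches each `Label: value` line with a regular expression instead of relying on line position. It is an illustration under that assumption, not the notebook's method, and `FIELD_PATTERN`, `parse_resources`, and the sample text are hypothetical.

```python
import re

# Optional leading numbering or markdown symbols, a known label, then the value.
FIELD_PATTERN = re.compile(
    r"^[\W\d]*(Resource Name|Description|URL|Phone)\W*?:\s*(.*)$", re.IGNORECASE
)

def parse_resources(text):
    """Parse 'Label: value' lines into a list of dicts, one per resource."""
    resources, current = [], {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line.strip())
        if not match:
            continue  # blank lines and unrelated text are ignored
        key = match.group(1).lower()
        value = match.group(2).strip(" *")
        # A new "Resource Name" line starts the next entry.
        if key == "resource name" and current:
            resources.append(current)
            current = {}
        current[key] = value
    if current:
        resources.append(current)
    # Keep only entries that carry at least a name and a URL.
    return [r for r in resources if "resource name" in r and "url" in r]

# Hypothetical model output with numbering, bold labels, and a colon in a value.
sample = (
    "1. Resource Name: Example Housing Line\n"
    "Description: Hotline: rental help and referrals\n"
    "URL: https://example.org\n"
    "Phone: NaN\n"
    "\n"
    "**Resource Name:** Second Example\n"
    "Description: Tenant counseling\n"
    "URL: https://example.com/help\n"
)
for entry in parse_resources(sample):
    print(entry)
```

Each parsed dict could then be turned into a row with `entry.get("phone", "NaN")` supplying the default used in the loop above.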


In [153]:
csv_filename = "wellness_resources.csv"
with open(csv_filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Dimension", "Resource Name", "Description", "URL", "Phone"])
    writer.writerows(all_resources)

print(f"Successfully saved {len(all_resources)} resources to {csv_filename}")

Successfully saved 415 resources to wellness_resources.csv


In [141]:
new_resources = pd.read_csv('wellness_resources.csv')

In [138]:
# Remove rows whose URL appears inside (or equals) a URL already present in the updated dataset
updated_urls = updated['url'].dropna().unique()
new_resources_filtered = new_resources[~new_resources['URL'].dropna().apply(lambda x: any(pd.Series(updated_urls).str.contains(x, na=False, regex=False)))]
new_resources_filtered

Unnamed: 0,Dimension,Resource Name,Description,URL,Phone
1,emotional,Mental Health America (MHA),MHA promotes mental health as a critical part ...,https://www.mhanational.org,1-800-969-6642
7,emotional,American Psychological Association (APA),The APA offers resources on mental health educ...,https://www.apa.org,
8,emotional,National Suicide Prevention Lifeline,A nationwide network of crisis centers that pr...,https://suicidepreventionlifeline.org,1-800-273-TALK (8255)
9,emotional,National Institute of Mental Health (NIMH),"NIMH provides information on mental disorders,...",https://www.nimh.nih.gov,
11,emotional,Mental Health First Aid,A public education program that teaches people...,https://www.mentalhealthfirstaid.org,
...,...,...,...,...,...
310,financial,Smart Money,Offers practical advice and tips on personal f...,https://www.smartmoney.com,
311,financial,New Jersey Credit Union League,Provides resources and educational programs ai...,https://www.njcul.org,
312,financial,The Financial Gym,Offers personal finance coaching and workshops...,https://financialgym.com,
313,financial,The Balance,A comprehensive personal finance website that ...,https://www.thebalance.com,


In [143]:
df_A = updated
df_B = new_resources_filtered.copy()  # copy so adding a column does not trigger SettingWithCopyWarning
# Combine resource names and URLs into a single string for vectorization
df_A['combined'] = df_A['service'] + " " + df_A['url']
df_B['combined'] = df_B['Resource Name'] + " " + df_B['URL']

# Vectorize the text data
vectorizer = TfidfVectorizer().fit(df_A['combined'])
A_vectors = vectorizer.transform(df_A['combined'])
B_vectors = vectorizer.transform(df_B['combined'])

# Compute cosine similarity
similarity_matrix = cosine_similarity(B_vectors, A_vectors)

# Define a similarity threshold
SIMILARITY_THRESHOLD = 0.6 

# Remove rows in B that are too similar to rows in A
similar_rows = similarity_matrix.max(axis=1) >= SIMILARITY_THRESHOLD
df_B_filtered = df_B[~similar_rows]

df_B_filtered 


Unnamed: 0,Dimension,Resource Name,Description,URL,Phone,combined
7,emotional,American Psychological Association (APA),The APA offers resources on mental health educ...,https://www.apa.org,,American Psychological Association (APA) https...
9,emotional,National Institute of Mental Health (NIMH),"NIMH provides information on mental disorders,...",https://www.nimh.nih.gov,,National Institute of Mental Health (NIMH) htt...
11,emotional,Mental Health First Aid,A public education program that teaches people...,https://www.mentalhealthfirstaid.org,,Mental Health First Aid https://www.mentalheal...
14,emotional,The Jed Foundation,A nonprofit organization focused on protecting...,https://www.jedfoundation.org,,The Jed Foundation https://www.jedfoundation.org
16,emotional,Support Groups Central,An online platform offering a variety of suppo...,https://www.supportgroupscentral.com,,Support Groups Central https://www.supportgrou...
...,...,...,...,...,...,...
310,financial,Smart Money,Offers practical advice and tips on personal f...,https://www.smartmoney.com,,Smart Money https://www.smartmoney.com
311,financial,New Jersey Credit Union League,Provides resources and educational programs ai...,https://www.njcul.org,,New Jersey Credit Union League https://www.n...
312,financial,The Financial Gym,Offers personal finance coaching and workshops...,https://financialgym.com,,The Financial Gym https://financialgym.com
313,financial,The Balance,A comprehensive personal finance website that ...,https://www.thebalance.com,,The Balance https://www.thebalance.com


In [159]:
df_B_filtered.iloc[:20]

Unnamed: 0,Dimension,Resource Name,Description,URL,Phone,combined
7,emotional,American Psychological Association (APA),The APA offers resources on mental health educ...,https://www.apa.org,,American Psychological Association (APA) https...
9,emotional,National Institute of Mental Health (NIMH),"NIMH provides information on mental disorders,...",https://www.nimh.nih.gov,,National Institute of Mental Health (NIMH) htt...
11,emotional,Mental Health First Aid,A public education program that teaches people...,https://www.mentalhealthfirstaid.org,,Mental Health First Aid https://www.mentalheal...
14,emotional,The Jed Foundation,A nonprofit organization focused on protecting...,https://www.jedfoundation.org,,The Jed Foundation https://www.jedfoundation.org
16,emotional,Support Groups Central,An online platform offering a variety of suppo...,https://www.supportgroupscentral.com,,Support Groups Central https://www.supportgrou...
18,emotional,National Center for PTSD,A resource for veterans and others affected by...,https://www.ptsd.va.gov,,National Center for PTSD https://www.ptsd.va.gov
19,emotional,The National Child Traumatic Stress Network,Offers resources and support for children and ...,https://www.nctsn.org,,The National Child Traumatic Stress Network ht...
20,emotional,The Anxiety Network,"A resource providing information, articles, an...",https://www.anxietynetwork.com,,The Anxiety Network https://www.anxietynetwork...
21,emotional,The National Alliance for Grieving Children,Offers resources and support for children and ...,https://childrengrieve.org,,The National Alliance for Grieving Children ht...
22,emotional,The Recovery Village,A resource for individuals seeking help for su...,https://www.therecoveryvillage.com,,The Recovery Village https://www.therecoveryvi...


In [161]:
updated.url.str.contains('njamhaa', na=False).sum()

0

In [155]:
df_B_filtered.to_csv('filtered_new_resources.csv',index=False)

# Duplicate checking (manually)

In [62]:
updated[updated.url.str.contains('iclinc')]
# sum(updated.url.str.contains('hud'))

Unnamed: 0.1,Unnamed: 0,service,description,url,phone,url_phone,combined
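
The manual substring checks above could be replaced by comparing normalized domains, so that `https://www.iclinc.org` and `http://iclinc.org/about` count as the same resource. This is only a minimal sketch: `normalize_domain` and `find_domain_overlaps` are hypothetical helper names, not part of this notebook, and matching on domain alone is an assumption that would wrongly merge distinct programs hosted on one site.

```python
from collections import defaultdict
from urllib.parse import urlparse


def normalize_domain(url):
    """Reduce a URL to a bare lowercase host, e.g. 'https://www.iclinc.org/x' -> 'iclinc.org'."""
    if not isinstance(url, str) or not url.strip():
        return ""  # covers NaN / None / empty cells
    url = url.strip()
    # urlparse only fills netloc when a scheme (or leading //) is present
    if "://" not in url:
        url = "//" + url
    host = urlparse(url).netloc.lower()
    host = host.split(":")[0]  # drop an explicit port, if any
    if host.startswith("www."):
        host = host[4:]
    return host


def find_domain_overlaps(new_urls, existing_urls):
    """Map each new URL to the existing URLs that share its normalized domain."""
    by_domain = defaultdict(list)
    for url in existing_urls:
        domain = normalize_domain(url)
        if domain:
            by_domain[domain].append(url)

    overlaps = {}
    for url in new_urls:
        domain = normalize_domain(url)
        if domain in by_domain:
            overlaps[url] = by_domain[domain]
    return overlaps
```

Assuming the column names shown in this notebook, `find_domain_overlaps(housing_resources['URL'], updated['url'])` would return a dict of candidate duplicates to review by hand.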


In [64]:
# Drop two entries by name in a single pass, then renumber the rows once.
excluded_names = 'Community Development Block Grant Program|The Housing Authority of New Jersey'
housing_resources = housing_resources[~housing_resources['Resource Name'].str.contains(excluded_names, na=False)].reset_index(drop=True)

housing_resources[20:]

Unnamed: 0,Resource Name,Description,URL,Phone,combined
20,The National Association of Realtors®,Provides resources and advocacy for homebuyers...,https://www.nar.realtor,,The National Association of Realtors® https:...
21,The Fair Housing Project,Offers education and advocacy to promote fair ...,https://www.fairhousingproject.org,,The Fair Housing Project https://www.fairhou...
22,The National Housing Conference,A nonprofit organization that promotes afforda...,https://www.nhc.org,,The National Housing Conference https://www....
23,The National Council for Behavioral Health,Provides resources for individuals experiencin...,https://www.thenationalcouncil.org,,The National Council for Behavioral Health h...
24,The Center for Urban and Regional Studies,Conducts research and provides resources relat...,https://curt.uab.edu,,The Center for Urban and Regional Studies ht...
25,The Institute for Community Living,Provides housing and support services for indi...,https://www.iclinc.org,,The Institute for Community Living https://w...


In [70]:
def classify_location(resource_name, description):
    prompt = f"""
    Determine if the following housing resource is specific to New Jersey or if it operates Nationwide:

    Resource Name: {resource_name}
    Description: {description}

    Answer with exactly one of these labels, without quotes: New Jersey or Nationwide.
    """

    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant for classification tasks."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=5,  # headroom so a two-word label with stray quotes is not cut off
        temperature=0
    )

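    # Strip whitespace and any quote characters the model echoes back (e.g. "'New Jersey").
    classification = response.choices[0].message.content.strip().strip("'\"")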
    return classification

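def classify_dataframe(df):
    """Classify every row with classify_location and store the result in a 'Location' column."""
    locations = []
    for idx, row in df.iterrows():
        loc = classify_location(row['Resource Name'], row['Description'])
        locations.append(loc)
        print(f"Processed row {idx}: {row['Resource Name']} -> {loc}")
        time.sleep(1)  # brief pause between API calls

    df['Location'] = locations  # note: this modifies the input DataFrame in place
    return df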
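
Because the loop above makes one network call per row, a single transient failure would abort the whole run. One possible safeguard is a small retry wrapper with exponential backoff. This is a sketch only: `call_with_retry` is a hypothetical helper, and catching every `Exception` is a simplifying assumption that a real run would probably narrow to the client's rate-limit and connection errors.

```python
import time


def call_with_retry(func, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

Inside `classify_dataframe`, the call would then read `loc = call_with_retry(classify_location, row['Resource Name'], row['Description'])`.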


In [74]:
housing_resources = classify_dataframe(housing_resources)

In [77]:
housing_resources.to_csv('2025_housing_resources.csv')
pd.read_csv('2025_housing_resources.csv')

Unnamed: 0.1,Unnamed: 0,Resource Name,Description,URL,Phone,combined,Location
0,0,National Low Income Housing Coalition (NLIHC),A nonprofit organization advocating for afford...,https://nlihc.org,,National Low Income Housing Coalition (NLIHC) ...,Nationwide
1,1,National Housing Trust,A nonprofit organization focused on preserving...,https://www.nhtinc.org,,National Housing Trust https://www.nhtinc.org,Nationwide
2,2,Housing Assistance Council (HAC),A national nonprofit that supports affordable ...,https://www.ruralhome.org,,Housing Assistance Council (HAC) https://www...,Nationwide
3,3,New Jersey Housing and Mortgage Finance Agency...,Offers financial assistance for homebuyers and...,https://www.njhousing.gov,(609) 278-7400,New Jersey Housing and Mortgage Finance Agency...,New Jersey
4,4,U.S. Interagency Council on Homelessness (USIC...,A federal agency that coordinates the federal ...,https://www.usich.gov,,U.S. Interagency Council on Homelessness (USIC...,Nationwide
5,5,Affordable Housing Online,A comprehensive resource listing affordable ho...,https://affordablehousingonline.com,,Affordable Housing Online https://affordable...,Nationwide
6,6,New Jersey Tenants Organization,"Advocates for tenant rights in New Jersey, pro...",http://www.njto.org,(732) 819-0500,New Jersey Tenants Organization http://www.n...,'New Jersey
7,7,National Coalition for the Homeless,Advocacy organization working to prevent and e...,https://www.nationalhomeless.org,,National Coalition for the Homeless https://...,Nationwide
8,8,The U.S. Department of Veterans Affairs (VA),"Offers housing assistance for veterans, includ...",https://www.va.gov/homeless,,The U.S. Department of Veterans Affairs (VA) ...,Nationwide
9,9,New Jersey Coalition to End Homelessness,A statewide organization that works to prevent...,https://njceh.org,(732) 246-0003,New Jersey Coalition to End Homelessness htt...,New Jersey
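
Row 6 in the output above carries the label `'New Jersey` with a stray leading quote, so the `Location` column is worth normalizing before it is used for filtering. The sketch below maps raw answers onto the two expected labels. `normalize_location` and the `"Unknown"` fallback are assumptions introduced here for illustration, not part of the original notebook.

```python
ALLOWED_LABELS = ("New Jersey", "Nationwide")


def normalize_location(label):
    """Map a raw model answer onto one of ALLOWED_LABELS, or 'Unknown' if it matches neither."""
    if not isinstance(label, str):
        return "Unknown"  # NaN or other non-string values
    # Remove surrounding whitespace, quotes, and trailing periods, then compare case-insensitively.
    cleaned = label.strip().strip("'\".").strip().lower()
    for allowed in ALLOWED_LABELS:
        if cleaned == allowed.lower():
            return allowed
    return "Unknown"
```

Applied to the table, this would be `housing_resources['Location'] = housing_resources['Location'].map(normalize_location)`. Any rows that come back as `"Unknown"` can then be reviewed by hand.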
