# GoFundMe Campaign Analysis
----

This dataset comes from my scrape of selected GoFundMe search queries. It is selective because I searched only for campaigns for people diagnosed with COVID. It is a sample, not the full population: GoFundMe would serve only 1,000 results per query.

So my questions are:

* Can I use this data even though it is not comprehensive?
* If so, how do I talk about it?
* If not, I'd like to decide that now and pivot to Story Idea B: The Cost of COVID in New York, a cost comparison analysis of the hospitals in NYC.

### To-Dos
- This data also needs some cleaning to filter out any non-COVID-related posts that may have snuck in.
- I need to home in on the story I want to tell.


### My story

My focus will be on hospital stays. I can use regex to try to pull the number of days or weeks out of as many campaign descriptions as I can and write about what I find. Once the data is clean, I would like to compare GoFundMe campaign goals with the length of the hospital stay. I'd also like to look more closely at the descriptions to see whether I can draw out more insight into people's experiences.
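A minimal sketch of the stay-length extraction described above. The function name `extract_stay_days` and the pattern are my own assumptions, not something already in this notebook. It only catches durations written with digits ("3 weeks", "10 days"), it cannot tell a hospital stay from any other duration mentioned in the text, and it keeps the longest duration it finds, so the results would still need spot checking.

```python
import re
from typing import Optional

# Hypothetical pattern: a number followed by "day(s)" or "week(s)".
STAY_PATTERN = re.compile(r"(\d+)\s*(day|week)s?\b", re.IGNORECASE)


def extract_stay_days(description: str) -> Optional[int]:
    """Return the longest duration mentioned in the text, in days, or None."""
    durations = []
    for number, unit in STAY_PATTERN.findall(description):
        # Convert weeks to days so every mention is comparable.
        multiplier = 7 if unit.lower() == "week" else 1
        durations.append(int(number) * multiplier)
    return max(durations) if durations else None


example = "He spent 3 weeks in the ICU and 10 days on a ventilator."
longest_stay = extract_stay_days(example)
```

Durations spelled out in words ("three weeks") are not handled here and would need a separate lookup.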

Given that I have chargemaster data and codes related to COVID stays, I would like to pick a few of the common codes, such as an ICU stay or 96+ hours on a ventilator, and provide mean charges for those.

The challenge will be the comparison. Can I select a few hospitals in some of the major areas represented in this dataset? That feels like a shaky comparison. Perhaps I can't include it at all.


In [2]:
import pandas as pd
import re



### Merge my two campaign scrapes & drop duplicates

In [3]:
df1 = pd.read_csv("go-fund-me-campaign-data.csv")
df2 = pd.read_csv("go-fund-me-campaigns-data-2b.csv")

In [4]:
df = pd.concat([df1,df2], ignore_index=True)
df

Unnamed: 0,name,id,date,city,country,goal,progress,donations,currency,description,deactivated,url
0,Help Alex Wilson with Covid Medical Bills,62898511,2022-01-26T21:31:12-06:00,"Sherrard, IL",US,18000,18500,138,USD,\n<div>He’s always there for anyone who needs ...,False,
1,Help Spencer With Covid Medical Bills,62733635,2022-01-19T11:54:28-06:00,"Grantsville, UT",US,10000,2188,37,USD,"\n<p>If you know Spencer, you love him. There ...",False,
2,Funeral and Covid medical bills,63058075,2022-02-02T17:04:58-06:00,"Maricopa, AZ",US,50000,5645,43,USD,\n<div>I honestly never imagined that my famil...,False,
3,Help My Dad Pay for My Mom's COVID Medical Bills,63006803,2022-01-31T19:30:28-06:00,"Rochester, MN",US,5000,1025,14,USD,\n<div>Hi. My name is Amy. I lost my mom to Co...,False,
4,Please consider donating for funeral expenses.,62227253,2021-12-27T15:20:56-06:00,"Austin, TX",US,10000,4035,53,USD,\n<div>Thank you to everyone that donated. <sp...,False,
...,...,...,...,...,...,...,...,...,...,...,...,...
5418,Helping hands for Julian,60299613,2021-10-03T18:29:59-05:00,"Moline, IL",US,15000,1660,19,USD,\n<div>On 8/4/21 Julian Sanchez came down with...,False,/f/helping-hands-for-julian?qid=6f0662a5aad16a...
5419,Funeral expenses for Chip Duet,61136429,2021-11-09T19:49:18-06:00,"Albright, WV",US,3000,240,5,USD,\n<div>Hi my name is Jody Duet my husband Chip...,False,/f/help-to-have-a-service-for-chip-duet?qid=6f...
5420,Funeral cost,61184917,2021-11-11T20:18:08-06:00,"Huntsville, AL",US,4500,210,4,USD,"<font color=""#333333""><br /><br /></font>\n<di...",False,/f/b6ehp-funeral-cost?qid=6f0662a5aad16a769076...
5421,Help Chris back on his feet after Covid!,60914797,2021-10-30T21:26:45-05:00,"Mazomanie, WI",US,5000,95,4,USD,\n<div>Hi my name is Gena and I am fundraising...,False,/f/help-chris-back-on-his-feet-after-covid?qid...


In [5]:
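# If there's a duplicate description, indicating a duplicate campaign, drop it.
# .copy() makes df an independent DataFrame, so later column assignments
# don't trigger SettingWithCopyWarning.
df = df.drop_duplicates(subset="description").copy()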

In [6]:
df.describe()

Unnamed: 0,id,goal,progress,donations
count,4892.0,4892.0,4892.0,4892.0
mean,58193560.0,48906.4,5113.630621,53.210752
std,4180303.0,1769999.0,15632.981181,172.361235
min,34729360.0,1.0,0.0,0.0
25%,55472920.0,4000.0,280.0,5.0
50%,59494900.0,8000.0,1792.0,20.0
75%,61596830.0,15000.0,5266.25,55.0
max,63271340.0,120000000.0,570670.0,7668.0


### Converting Date Column to Datetime

In [7]:
df.date = df.date.astype('str')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


In [8]:
# It was easier for me to isolate the year-month-day than to figure out how to deal with the full timestamp
def datetime(date):
    # Return the first YYYY-MM-DD substring found in the raw timestamp string
    day = re.findall(r"\d{4}-\d{2}-\d{2}", date)[0]
    return day

In [9]:
df['datetime'] = df.date.apply(datetime)


In [10]:
df.datetime = pd.to_datetime(df.datetime, format="%Y-%m-%d", errors='coerce')
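The raw `date` values shown in the table above (for example `2022-01-26T21:31:12-06:00`) look like ISO 8601 timestamps, so the standard library can parse them directly instead of slicing with regex. This is a sketch of that alternative, not what the notebook ran; `campaign_day` is a hypothetical helper name, and `import datetime as dt` is used so it does not clash with the `datetime` function defined in this notebook.

```python
import datetime as dt
from typing import Optional


def campaign_day(timestamp: str) -> Optional[dt.date]:
    """Return the calendar date in the timestamp's own UTC offset, or None."""
    try:
        return dt.datetime.fromisoformat(timestamp).date()
    except (TypeError, ValueError):
        # Missing values (None, "nan") or malformed strings end up here.
        return None


first_campaign_day = campaign_day("2022-01-26T21:31:12-06:00")
```

Because the date is taken in the campaign's own offset, the result matches the year-month-day substring that the regex approach pulls out.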

### Create separate city and state columns

In [11]:
df['city_list'] = df.city.str.split(",")


In [12]:
df['city_list'] = df.city_list.fillna("None")


In [13]:
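# Narrowing down to the US because that's where I want to focus my analysis.
# Also, not all rows have a "state".

# .copy() so that adding columns to `us` below doesn't trigger SettingWithCopyWarning
us = df.query('country == "US"').copy()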

In [14]:
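def state(city_list):
    # Expect a list like ["Sherrard", " IL"]. Anything else (NaN, or the "None"
    # filler string, whose [1] would be the letter "o") has no state.
    if isinstance(city_list, list) and len(city_list) > 1:
        return city_list[1].strip()
    return None

def city(city_list):
    if isinstance(city_list, list) and len(city_list) > 0:
        return city_list[0].strip()
    return None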

In [15]:
us['state'] = us.city_list.apply(state)
us['city_1'] = us.city_list.apply(city)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  us['state'] = us.city_list.apply(state)


### Let's do some text analysis to start saving the ones we want to keep

In [16]:
id_description = df[['id', 'description']].copy()

In [17]:
# Make sure there are no duplicate ids, which means each description has a unique ID
df.id.value_counts().sort_values(ascending=False)

62898511    1
61964433    1
63006803    1
62227253    1
62136867    1
           ..
63235101    1
62165527    1
63110837    1
48986330    1
60061053    1
Name: id, Length: 4892, dtype: int64

In [18]:
# Getting a list of {id, description} dictionaries so I can build a list of the ids to keep
dicts = id_description.to_dict('records')

In [19]:
dicts

[{'id': 62898511,
  'description': "\n<div>He’s always there for anyone who needs a hand, now we are asking for you to show your love and support to Big Al!\n</div>\n\n<div>\xa0\n</div>\n\n<div>After being diagnosed with Covid earlier this month, Alex was admitted to Trinity West on Jan 18th and a few days later, was flown to the University of Iowa. He is currently on a ventilator in the ICU in Iowa City to help heal his lungs. As an apprentice at Local 150 Operating Engineers, his insurance is ok but not the best and will definitely not cover all of his expenses. His deductible is $5,000 and his portion of the AirCare helicopter transportation to the U of I will be around $10,000.\n</div>\n\n<div>\xa0\n</div>\n\n<div>Please give what you can. Every little bit will help and is greatly appreciated. Anyone who knows Alex knows that he loves everyone and will go out of his way to help in any way he can. Please show him how much you love and appreciate him. Thank you to everyone and God Bl

In [20]:

#Trying out my regex: looking for people who were diagnosed with COVID
for record in dicts:
    query = re.findall(r".{10}.diagnosed with covid..{10}", record['description'], re.IGNORECASE)
    if len(query) > 0:
        print(query)


['fter being diagnosed with Covid earlier th']
['>Jimmy was diagnosed with COVID Pneumonia ']
['y. She was diagnosed with Covid-19right af']
['were first diagnosed with Covid-19 on Augu', 'e was then diagnosed with Covid pneumonia.']
['d. She was diagnosed with Covid Pneumonia,']
[' were both diagnosed with Covid. \xa0Sean was']
['e has been diagnosed with Covid pneumonia.']
['ly, he was diagnosed with COVID-19 which l']
['<p>Wes was diagnosed with COVID on January']
['n,  KJ was diagnosed with COVID-19 on Octo', 'se and was diagnosed with COVID pneumonia ']
['n-law, was diagnosed with COVID-19 along w']
['ow. He was diagnosed with covid pneumonia.']
['ns. He was diagnosed with Covid19 ARDS ove']
[' Chelo was diagnosed with COVID about a we']
['s recently diagnosed with COVID-19 and was']
[' have been diagnosed with COVID-19 and Wil']
['d Joy were diagnosed with Covid-19.\xa0 They ']
[' also been diagnosed with COVID-19, and ha']
['r has been diagnosed with COVID-19 and is ']
['x days 

['al, he was diagnosed with COVID-19 and was']
['olanda was diagnosed with cOvid just befor']
['e has been diagnosed with COVID as well. Y']
['al and was diagnosed with COVID and diabet']
['e hospital diagnosed with covid or the sam']
[' Cindy was diagnosed with Covid 19 back on']
[', Rex, was diagnosed with covid a few week']
['a has been diagnosed with COVID-19 and Pne', ' also been diagnosed with COVID-19 and had']
['arcia were diagnosed with Covid Pneumonia.']
[', Dave was diagnosed with Covid pneumonia ']
['fe. He was diagnosed with COVID a  few wee']
['My Dad was diagnosed with Covid-19 on 12/6']
['t, and was diagnosed with COVID, as was he']
['er. He was diagnosed with Covid and was on']
['21, he was diagnosed with Covid. Fast forw']
['/>Erin was diagnosed with Covid and has sp', '>Chris was diagnosed with Covid earlier th']
['ere he was diagnosed with COVID-19. When w']
['ere he was diagnosed with Covid Pneumonia.']
[' currently diagnosed with Covid-19 and can']
['>Chris was di
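One caveat about the pattern tried above: `.{10}.` requires at least 11 characters before the phrase and `..{10}` at least 11 after it, and `.` does not match a newline by default. A description where the phrase sits near the start, the end, or a line break is silently skipped. The sketch below shows the difference on a made-up description; the tolerant pattern is my suggestion, not what the notebook ran.

```python
import re

# Pattern as used in this notebook: fixed 11-character context on both sides.
fixed_context = re.compile(r".{10}.diagnosed with covid..{10}", re.IGNORECASE)

# Suggested alternative: up to 10 characters of context, newlines allowed.
flexible_context = re.compile(
    r".{0,10}diagnosed with covid.{0,10}", re.IGNORECASE | re.DOTALL
)

# Made-up description where the phrase opens the text.
sample_description = "Diagnosed with Covid.\nPlease help."

fixed_hits = fixed_context.findall(sample_description)
flexible_hits = flexible_context.findall(sample_description)
```

Here `fixed_hits` is empty while `flexible_hits` contains one match. If the context is only needed for eyeballing results, a plain `re.search(r"diagnosed with covid", ...)` is enough for the keep/skip decision.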

### Let's use regex to start saving IDs with descriptions that match what I'm looking for

I only want campaigns that deal with a COVID patient, because they will likely be related to medical bills or funeral expenses.

In [21]:
ids_keep = []

In [22]:
for record in dicts:
    query = re.findall(r".{10}.diagnosed with covid..{10}", record['description'], re.IGNORECASE)
    if len(query) > 0:
        ids_keep.append(record['id'])

In [23]:
# Captured 392
len(ids_keep)

392
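Most of the cells that follow repeat the same loop with a different pattern. A small helper could express that once. This is a sketch under my own assumptions: `collect_ids` is a hypothetical name, it stores ids in a `set` (so membership checks are fast and duplicates are impossible) rather than the list used in this notebook, and the sample records are invented for illustration.

```python
import re


def collect_ids(records, field, pattern, keep, skip=()):
    """Add ids of records whose `field` matches `pattern`; return how many were added."""
    regex = re.compile(pattern, re.IGNORECASE | re.DOTALL)
    added = 0
    for record in records:
        if record["id"] in keep or record["id"] in skip:
            continue
        text = record[field]
        # Non-string values (e.g. NaN for a missing description) never match.
        if isinstance(text, str) and regex.search(text):
            keep.add(record["id"])
            added += 1
    return added


# Invented records with the same shape as `dicts` in this notebook.
sample_records = [
    {"id": 1, "description": "He was diagnosed with COVID in January."},
    {"id": 2, "description": "She lost her job last year."},
    {"id": 3, "description": "Dad contracted covid at work."},
]
keep_ids = set()
n_diagnosed = collect_ids(sample_records, "description", r"diagnosed with covid", keep_ids)
n_contracted = collect_ids(sample_records, "description", r"contract(?:ed|ing)? covid", keep_ids)
```

The return value plays the role of the `count` variable in the cells below, and the optional `skip` argument covers the later cells that also exclude `ids_remove`.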

In [24]:
# Let's narrow down to medical bills
medical_bills = []
for record in dicts:
    query = re.findall(r".{10}.medical bills?..{10}", record['description'], re.IGNORECASE)
    if len(query) > 0:
        medical_bills.append(record)        

In [25]:
# Battle/s/ing covid
for record in medical_bills:
    if record['id'] not in ids_keep:
        query = re.findall(r"battle?s?i?n?g? covid", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
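A note on the optional-letter style used in patterns like `battle?s?i?n?g?`: it accepts some unintended spellings and, more importantly, misses "battled covid". Spelling out the endings with an alternation is one way to tighten it. The stricter pattern below is my suggestion and was not run against the scraped data.

```python
import re

# Pattern from this notebook: each trailing letter is independently optional.
loose = re.compile(r"battle?s?i?n?g? covid", re.IGNORECASE)

# Suggested alternative: list the word endings explicitly.
strict = re.compile(r"\bbattl(?:e|es|ed|ing) covid", re.IGNORECASE)

past_tense = "He battled covid for weeks"
progressive = "She is battling Covid in the ICU"

loose_finds_past = loose.search(past_tense) is not None
strict_finds_past = strict.search(past_tense) is not None
both_find_progressive = (
    loose.search(progressive) is not None and strict.search(progressive) is not None
)
```

Both patterns find "battling Covid", but only the alternation finds "battled covid".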

In [27]:
# Test/ed positive
for record in medical_bills:
    if record['id'] not in ids_keep:
        query = re.findall(r"teste?d? positive", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])

In [28]:
# Let's look at the titles of campaigns to narrow down
ids_names = df[['name', 'id']].copy().to_dict('records')

In [29]:
ids_names

[{'name': 'Help Alex Wilson with Covid Medical Bills', 'id': 62898511},
 {'name': 'Help Spencer With Covid Medical Bills', 'id': 62733635},
 {'name': 'Funeral and Covid medical bills', 'id': 63058075},
 {'name': "Help My Dad Pay for My Mom's COVID Medical Bills", 'id': 63006803},
 {'name': 'Please consider donating for funeral expenses.', 'id': 62227253},
 {'name': 'Help the Kuhne family with Shaun’s Final expenses', 'id': 62136867},
 {'name': 'Help Brandi Hoffman with Co-Vid medical bills', 'id': 61558519},
 {'name': 'Help Nathan with COVID medical bills', 'id': 61786993},
 {'name': 'Help Kimberli Pay Long-Haul Covid Medical Bills', 'id': 62685413},
 {'name': 'Help The Beam Family Pay COVID Medical Bills', 'id': 63030111},
 {'name': 'Omar’s Covid Medical bills', 'id': 61964433},
 {'name': 'Help Josh With COVID Medical Bills', 'id': 61782703},
 {'name': 'Help Bullock family with Covid medical bills', 'id': 62627235},
 {'name': 'Help Keith Cover COVID Medical Bills', 'id': 61822537},
 {

In [30]:
# covid medical bill/s
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid medical bills?", record['name'], re.IGNORECASE)
        if len(query) > 0:
            print(record)
            ids_keep.append(record['id'])

{'name': 'Help Spencer With Covid Medical Bills', 'id': 62733635}
{'name': 'Funeral and Covid medical bills', 'id': 63058075}
{'name': "Help My Dad Pay for My Mom's COVID Medical Bills", 'id': 63006803}
{'name': 'Help Nathan with COVID medical bills', 'id': 61786993}
{'name': 'Help Kimberli Pay Long-Haul Covid Medical Bills', 'id': 62685413}
{'name': 'Help The Beam Family Pay COVID Medical Bills', 'id': 63030111}
{'name': 'Omar’s Covid Medical bills', 'id': 61964433}
{'name': 'Help Bullock family with Covid medical bills', 'id': 62627235}
{'name': 'Help Jose Grullon Fight COVID Medical Bills', 'id': 62182731}
{'name': 'Ron Brant fighting COVID medical bills fundraiser', 'id': 62588463}
{'name': 'Help single mom with COVID medical bills', 'id': 62374213}
{'name': 'Dan VanDolsen Covid Medical Bills', 'id': 61803759}
{'name': 'Bolivian Family  Needs Help - COVID Medical Bills', 'id': 62282591}
{'name': 'Help pay my COVID medical bills', 'id': 61196643}
{'name': 'Covid medical bills', 'id'

In [31]:
# covid recovery
# count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid recovery", record['name'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)
            ids_keep.append(record['id'])
#             count = count + 1

In [32]:
# covid medical expenses
# count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid medical expenses", record['name'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)
            ids_keep.append(record['id'])
#             count = count + 1


0

In [33]:
# covid death
count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid death", record['name'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
            count = count + 1
count

53

In [34]:
# covid loss
count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid loss", record['name'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
            count = count + 1
count

34

In [35]:
# recover from covid
count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"recover from covid", record['name'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
            count = count + 1
count

546

In [36]:
len(ids_keep)

1830

In [37]:
# Covid19
# count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid-?19", record['name'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)

            ids_keep.append(record['id'])
            
#             count = count + 1
# count

In [None]:
#{'name': 'Covid19', 'id': 50001290}
#{'name': 'ICU iPads in Memory of Our Dad We Lost to COVID-19', 'id': 47738236}
#'COVID-19 Workers Solidarity Campaign Fundraising', 'id': 47130870}
# {'name': 'Taco Tuesday COVID-19 LA Initiative', 'id': 46962424}
#'Honduras COVID-19 fund relief', 'id': 50091280}
# {'name': 'COVID19 Relief in India', 'id': 56482517}
# {'name': 'Namaha Healthcare - Mumbai COVID-19 Relief', 'id': 57842437}
# {'name': 'COVID-19 Relief for LANGO Sub-region in N. Uganda', 'id': 58176765}
# {'name': 'Sushi & Sake COVID-19 Relief', 'id': 47074122}
# {'name': 'COVID19 Relief for Impoverished Indian Communities', 'id': 56339524}
# {'name': 'COVID-19 Relief in India', 'id': 56396848}
# {'name': 'iPads for COVID-19 Patients', 'id': 47456678}
# {'name': 'Support For Project Hennu - COVID19 Crisis India', 'id': 56375934}
# {'name': 'COVID-19 Relief for The Academy!', 'id': 51608014}
# {'name': 'COVID-19 in India: We Can Help', 'id': 56680047}
# {'name': 'COVID-19 Relief for Frontliners in the Philippines', 'id': 47471608}
# {'name': 'Pakistan Covid-19 Relief Funds', 'id': 55105018}

In [38]:
len(ids_keep)

2042

In [39]:
# fight covid
count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"fight covid", record['name'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)

            ids_keep.append(record['id'])
            
            count = count + 1
count

14

In [40]:
# Covid bills
count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid bills", record['name'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
            
            count = count + 1
count

52

In [41]:
# covid survivor
count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid survivor", record['name'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)

            ids_keep.append(record['id'])
            
            count = count + 1
count

79

In [42]:
# Covid battle
count = 0 
for record in ids_names:
    if record['id'] not in ids_keep:
        query = re.findall(r"covid battle", record['name'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)

            ids_keep.append(record['id'])
            
            count = count + 1
count

120

In [43]:
len(ids_keep)

2307

In [44]:
# Let's look at descriptions again:
# ICU with covid
count=0
for record in dicts:
    if record['id'] not in ids_keep:
        query = re.findall(r"ICU with covid", record['description'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)

            ids_keep.append(record['id'])
            
            count = count + 1
count

30

In [45]:
# came down with covid
count=0
for record in dicts:
    if record['id'] not in ids_keep:
        query = re.findall(r"came down with covid", record['description'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)

            ids_keep.append(record['id'])
            
            count = count + 1
count

25

In [149]:
count=0
for record in dicts:
    if record['id'] not in ids_keep:
        query = re.findall(r"got covid", record['description'], re.IGNORECASE)
        if len(query) > 0:
#             print(record)

            ids_keep.append(record['id'])
            
            count = count + 1
count

60

In [46]:
# Testing positive
count=0
for record in dicts:
    if record['id'] not in ids_keep:
        query = re.findall(r".{10}.testing positive..{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
#             print(query)

            ids_keep.append(record['id'])
            
            count = count + 1
count

20

In [47]:
# Contract/ed covid
count=0
for record in dicts:
    if record['id'] not in ids_keep:
#         print(record)
        query = re.findall(r".{10}.contracte?d? covid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
#             print(query)

            ids_keep.append(record['id'])
            
            count = count + 1
count

146

In [48]:
# Contract/ed/ing the (covid/19) virus
count=0
for record in dicts:
    if record['id'] not in ids_keep:
#         print(record)
        query = re.findall(r".{10}.contracte?d?i?n?g? the c?o?v?i?d?-?1?9?\s?virus..{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
#             print(query)

            ids_keep.append(record['id'])
            
            count = count + 1
count

23

In [50]:
# Contract/ed/ing covid
count=0
for record in dicts:
    if record['id'] not in ids_keep:
#         print(record)
        query = re.findall(r"contracti?n?g? covid", record['description'], re.IGNORECASE)
        if len(query) > 0:

            ids_keep.append(record['id'])
            
            count = count + 1
count

19

In [60]:
# ICU (with/and has) covid 
# Works well for a few but not ('id': 51428548,)
for record in dicts:
    if record['id'] not in ids_keep:
        query = re.findall(r".{10}ICU w?i?t?h?\s?a?n?d?\s?h?a?s?\s?co-?vid", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])

In [64]:
# positive for covid
for record in dicts:
    if record['id'] not in ids_keep:
        query = re.findall(r".{10}positive for covid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])

In [67]:
# Let's look at ventilators and see if we can narrow it down from there
ventilators = []
# count = 0 
for record in dicts:
    if record['id'] not in ids_keep:
        query = re.findall(r".{10}on a ventilator.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ventilators.append(record)
#             print(query)
#             count = count+1
# count

In [77]:
# Narrowing down to descriptions that mention covid
# Doesn't work for 63119443, 58482687, 62784797, 60266311, 61575311 
# NOTE: I should look for "Not covid, or does not have covid"
count=0
for record in ventilators:
    if record['id'] not in ids_keep:
#         print(record)
        query = re.findall(r".{20}covid.{20}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
# #             print(query)
#             count = count+1
# count

In [78]:
len(ids_keep)

3359

In [89]:
ids_remove = []
count=0
for record in dicts:
    if record['id'] not in ids_keep:
#         print(record)
        query = re.findall(r".{10}not co-?vid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_remove.append(record['id'])
#             print(query)
#             count = count+1
# count
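The `not co-?vid` check above only catches one phrasing. A slightly broader sketch of the "not covid / does not have covid" idea from my earlier note follows. The list of negation phrases is an assumption on my part and is certainly incomplete, so matches should be treated as candidates for manual review rather than automatic removals.

```python
import re

# Hypothetical list of negation phrasings placed directly before "covid".
NEGATED_COVID = re.compile(
    r"\b(?:not|non|never had|(?:does|did)(?: not|n['’]?t) have)[\s-]+co-?vid",
    re.IGNORECASE,
)


def mentions_negated_covid(text: str) -> bool:
    """True if the text contains a phrase like 'not covid' or 'does not have covid'."""
    return NEGATED_COVID.search(text) is not None


checks = {
    "This is not Covid related.": mentions_negated_covid("This is not Covid related."),
    "He doesn’t have covid.": mentions_negated_covid("He doesn’t have covid."),
    "A non-covid illness.": mentions_negated_covid("A non-covid illness."),
    "She was diagnosed with covid.": mentions_negated_covid("She was diagnosed with covid."),
}
```

Only the last sentence in `checks` comes back `False`. The character class includes the curly apostrophe because the scraped descriptions use it.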

In [103]:
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}battle?s?i?n?g? cov-?id.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [105]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r"fights?i?n?g? cov-?id.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [107]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}covid-?1?9? symptoms.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [110]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}sick with covid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [114]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}long-?t?e?r?m? covid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [117]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}covid recovery.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [120]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}covid survivor.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [126]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}gott?e?n? covid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [130]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}covid battle.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [133]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}covid pneumonia.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [136]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}i had covid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(query)
#             print("---------")
#             count = count+1
# count

In [137]:
ids_keep

[62898511,
 63228131,
 62414417,
 59890207,
 60932835,
 59748509,
 59531919,
 58391793,
 54556302,
 60576061,
 57121447,
 53221120,
 56881343,
 54543754,
 58538171,
 59009863,
 54806302,
 59277241,
 57873685,
 57895989,
 54628336,
 54542380,
 58744927,
 54908244,
 62858703,
 62201657,
 62054773,
 62162691,
 62355331,
 61460945,
 61787849,
 62716975,
 61494901,
 61475083,
 62757075,
 62554751,
 57826843,
 60350475,
 60642765,
 60755043,
 59588903,
 60692451,
 60822613,
 60830927,
 54703272,
 53943966,
 47932896,
 53559504,
 54818128,
 54930418,
 52266422,
 59243413,
 52373398,
 52990494,
 53411700,
 59704237,
 49612434,
 59787399,
 58709771,
 49399596,
 50453112,
 59309681,
 50809314,
 53660558,
 54019264,
 59653037,
 54468612,
 51147500,
 57178505,
 54466628,
 56870239,
 56128676,
 54214298,
 57278445,
 58138357,
 57457693,
 57597375,
 60102971,
 54041738,
 59364617,
 56589813,
 59518557,
 59788375,
 60919877,
 59646165,
 54955200,
 60232429,
 59957291,
 55277340,
 59412191,
 53897054,

In [141]:
keep = [61575549, 62671611, 56861517, 56997811, 60511441, 58600223, 58555031, 59593689, 59181743, 60523117, 62005377, 62280397, 62886121, 61042163, 56457202, 50330366, 54027892, 54995850, 54506688, 60941775, 56491695, 60056539, 53529602, 54995850]
for number in keep:
    if number not in ids_keep:
        ids_keep.append(number)

In [143]:
# Filtering out descriptions that talk about losing jobs due to covid. Spot checking for ones to keep
# KEEP: 61575549, 62671611, 56861517, 56997811, 60511441, 58600223, 58555031, 59593689, 59181743, 60523117, 62005377
# 62280397, 62886121, 61042163, 56457202, 50330366,54027892, 54995850, 54506688, 60941775, 56491695, 60056539, 53529602
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}job loss.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_remove.append(record['id'])
#             print(record)
#             print("---------")
# #             count = count+1
# # count

In [145]:
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}lost my job.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_remove.append(record['id'])
#             print(record)
#             print("---------")
# #             count = count+1
# # count

In [147]:
len(ids_keep)

3729

In [150]:
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
#         print(record)
        query = re.findall(r".{10}lost his job.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_remove.append(record['id'])
#             print(record)
#             print("---------")
# #             count = count+1
# # count

In [None]:
for record in dicts:
    if record['id'] in ids_remove:
#         print(record)
        query = re.findall(r".{10}s?h?e? had covid.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
            print(record)
#             print("---------")
# #             count = count+1
# # count

In [158]:
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
        query = re.findall(r".{10}covid medical bills.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(record)
#             print("---------")
#             count = count+1
# count

{'id': 62227253, 'description': '\n<div>Thank you to everyone that donated.\xa0<span style="letter-spacing:0.0085px;">Sadly, Greg passed away today from another heart attack. They were unable to revive him. I will leave this up if anyone would like to donate for funeral costs.\xa0</span>\n</div>\n<div><br />\n</div>'}
{'id': 62157373, 'description': "\n<div>Okay, here goes nothing. This is new to me, but I'm not sure what else to do.\n</div>\n<div><br />\n</div>\n<div>Hello! My name is Shaira and I struggle with polycystic ovarian syndrome and I am prone to cystadenomas. It started when I was a freshman in Highschool. The largest cyst I've had was the size of a soccer ball and had me in the hospital for a week. I was told I would struggle to have kids, if having them at all. Thankfully my beautifully frustrating body did provide me with two amazing daughters that I cherish.\xa0\n</div>\n<div><br />\n</div>\n<div>Just before Covid hit my Dr. found that I had a new cyst of a concerning s

{'id': 48489754, 'description': 'This campaign is for Chris Spellman from Coeur d\'Alene, ID.\xa0 He is cousins by marriage to my husband, Jay Stokes, and happens to be one of his best friends as well. My name is Emily Stokes and my husband and I were part of the search efforts in finding Chris who was lost in the woods. This campaign is to help offer some relief to medical expenses that are adding up due to a series of unfortunate events resulting in a side by side wreck, 42 hours in the woods alone, a one week stay in the hospital and many physical therapy appointments to follow in the coming months. Please read the story below for full details.<br /><br />On the evening of Saturday, May 2, 2020 Chris Spellman went missing in the woods of Saltese, Montana. After a series of unfortunate events, which ended with his side by sides front lower control arm broken off from the frame, Chris was left stranded in the woods. In his dazed, concussed, and confused state he decided to trek up and

{'id': 62852363, 'description': "\n<div>On Jean's 58th birthday, which was on 12/7/2021, she was diagnosed with COVID-19.\n</div>\n<div><br />\n</div>\n<div>Jean was hospitalized and on a ventilator shortly after. She was transferred from St Francis in Shakopee to United Hospital in St Paul. She was in a drug-induced coma for over a month. During this time, she developed blood clots in her arms and legs and several cysts. She had fevers on and off the entire time. They placed her on a tracheotomy and feeding tube to help her since she was struggling to breathe independently. \n</div>\n<div><br />\n</div>\n<div>After over a month on the ventilator, they took her off the medication that kept her in a coma. It took longer than usual for her to come out of the coma. She was unable to move any of her limbs. In the last few days, she has moved her legs, but her arms are still feeble.\n</div>\n<div><br />\n</div>\n<div>She is learning how to talk with the tracheotomy but gets very tired.  She

In [169]:
# NOT: 63006927, 61976211
count = 0
for record in dicts:
    if (record['id'] not in ids_keep) & (record['id'] not in ids_remove):
        query = re.findall(r".{10}became ill.{10}", record['description'], re.IGNORECASE)
        if len(query) > 0:
            ids_keep.append(record['id'])
#             print(record)
#             print("---------")
#             count = count+1
# count

In [171]:
remove = [
    63006927, 61976211, 63119443, 58482687, 62784797, 60266311, 61575311, 51428548,
    47738236, 47130870, 46962424, 50091280, 56482517, 57842437, 58176765, 47074122,
    56339524, 56396848, 47456678, 56375934, 51608014, 56680047, 47471608, 55105018,
]

In [172]:
final_ids_keep = []
for number in ids_keep:
    if number not in remove:
        final_ids_keep.append(number)

In [173]:
len(final_ids_keep)

3741
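Since `ids_keep` is a list that many cells append to, a final pass that also drops accidental duplicates can be written in one step. This is a sketch with made-up ids, not a rerun of the notebook; `dict.fromkeys` is used because it removes duplicates while preserving the original order.

```python
# Made-up ids standing in for ids_keep and remove.
example_ids_keep = [101, 102, 103, 102, 104]
example_remove = [102, 999]

# A set makes each "is this id excluded?" check constant time.
remove_lookup = set(example_remove)

# dict.fromkeys de-duplicates while keeping first-seen order.
example_final_ids = [
    campaign_id
    for campaign_id in dict.fromkeys(example_ids_keep)
    if campaign_id not in remove_lookup
]
```

Duplicates in the id list would not change the result of `df.id.isin(...)`, but they would inflate counts such as `len(final_ids_keep)`.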

In [None]:
#{'name': 'Covid19', 'id': 50001290}

### Filtering my dataframe based on the list of ids I want to keep:

In [174]:
clean_df = df[df.id.isin(final_ids_keep)]

In [176]:
clean_df.to_csv("go-fund-me-campaigns-clean.csv", index=False)