# Dogs for Adoption Scrape and Text!

The website https://labsandmore.org/dogs/ lists dogs in California that are up for adoption. The problem is that the waiting lists fill up extremely fast, and there is no notification system or email list.

So I've been called to duty: scrape the website every hour and text my friend, who is looking to adopt, whenever new dogs show up!

I use Selenium to scrape the website, pandas for some minor data wrangling, and yagmail to send the text (as an email to the phone carrier's email-to-text address).

## Importing packages

In [1]:
import pandas as pd
import yagmail
import time
import re

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

## Using Selenium to open website

In [2]:
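# Setting path and opening browser
path = r'C:\Users\lukef\AppData\Local\BrowserDriver\geckodriver.exe'

# Setting headless (no visible browser window)
options = webdriver.FirefoxOptions()
options.add_argument('-headless')

# Selenium 4 takes the driver path through a Service object
# instead of the older executable_path argument
from selenium.webdriver.firefox.service import Service
driver = webdriver.Firefox(service = Service(path), options = options)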

# Navigating to website
driver.get('https://labsandmore.org/dogs/')

## Scraping the breed, details, and names

In [3]:
breed = []
details = []
name = []

# The li (list) elements alternate between breed and details,
# so even positions are breeds and odd positions are details.
# find_elements('xpath', ...) also works on Selenium 4, where
# find_elements_by_xpath is no longer available.
stats = driver.find_elements('xpath', "//div[@class='property-stats']//ul//li")
for i, li in enumerate(stats):
    if i % 2 == 0:
        breed.append(li.text)
    else:
        details.append(li.text)

# Now looping through the name elements
for a in driver.find_elements('xpath', "//div[@class='tile-footer']/div[@class='price']/a"):
    name.append(a.text)

# quit() closes the browser and ends the geckodriver session
driver.quit()
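
The alternating breed/details pattern can also be split with list slicing. This is only a sketch on made-up data: `li_texts` is a hypothetical stand-in for the text of the scraped `li` elements, and `split_alternating` is a helper name I am introducing for illustration.

```python
# Hypothetical stand-in for the text of the <li> elements, in page order.
li_texts = [
    "Shepherd (mix)", "Female, 12 years, 46 lbs. ID 7874",
    "Husky (mix)", "Female, 2 years, 60 lbs. ID 11658",
]


def split_alternating(items):
    """Return (items at even positions, items at odd positions)."""
    return items[0::2], items[1::2]


breed, details = split_alternating(li_texts)
print(breed)
print(details)
```

If the two lists come back with different lengths, the page layout has probably changed and the breed/details pairing can no longer be trusted.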

## Creating pandas dataframe

In [4]:
df = pd.DataFrame(dict(name=name,breed=breed,details=details))
df

|     | name | breed | details |
| --- | --- | --- | --- |
| 0 | America | Shepherd (mix) | Female, 12 years, 46 lbs. ID 7874 |
| 1 | Artichoke | Husky (mix) | Female, 2 years, 60 lbs. ID 11658 |
| 2 | Artichoke Pup - Broccoli | Husky - Shepherd (mix) | Female, 7 weeks, 5 lbs. ID 11659 |
| 3 | Artichoke Pup - Celery | Husky - Shepherd (mix) | Male, 7 weeks, 5 lbs. ID 11660 |
| 4 | Artichoke Pup - Cucumber | Husky - Shepherd (mix) | Male, 7 weeks, 5 lbs. ID 11661 |
| ... | ... | ... | ... |
| 155 | Yoda | Labrador Retriever - American Staffordshire Te... | Male, 3 years, 54 lbs. ID 11751 |
| 156 | Yoko - Adopted! | Boxer (purebred) | Female, 2 years, 63 lbs. ID 11365 |
| 157 | Zinc | Terrier - Pointer (mix) | Female, 2 years, 32 lbs. ID 11531 |
| 158 | Zipper - Adopted! | Labrador Retriever - Cattle Dog (mix) | Female, 2 years, 40 lbs. ID 11489 |


## Extracting the info from the details column

In [5]:
df[['gender','age','weight','id']] = df['details'].str.extract(pat = r'(.*?),\s(.*?),\s(.*?)\.\sID\s(\d*)')

df.drop(columns = 'details', inplace = True)

df['id'] = df['id'].astype('int32')

df

|     | name | breed | gender | age | weight | id |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | America | Shepherd (mix) | Female | 12 years | 46 lbs | 7874 |
| 1 | Artichoke | Husky (mix) | Female | 2 years | 60 lbs | 11658 |
| 2 | Artichoke Pup - Broccoli | Husky - Shepherd (mix) | Female | 7 weeks | 5 lbs | 11659 |
| 3 | Artichoke Pup - Celery | Husky - Shepherd (mix) | Male | 7 weeks | 5 lbs | 11660 |
| 4 | Artichoke Pup - Cucumber | Husky - Shepherd (mix) | Male | 7 weeks | 5 lbs | 11661 |
| ... | ... | ... | ... | ... | ... | ... |
| 155 | Yoda | Labrador Retriever - American Staffordshire Te... | Male | 3 years | 54 lbs | 11751 |
| 156 | Yoko - Adopted! | Boxer (purebred) | Female | 2 years | 63 lbs | 11365 |
| 157 | Zinc | Terrier - Pointer (mix) | Female | 2 years | 32 lbs | 11531 |
| 158 | Zipper - Adopted! | Labrador Retriever - Cattle Dog (mix) | Female | 2 years | 40 lbs | 11489 |
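
To see what the extraction pattern captures, here is a small sketch that applies the same regular expression to a single details string with the standard `re` module. The helper name `parse_details` is my own, and returning `None` for a string that does not match is an assumption about how to handle that case, not something the notebook does.

```python
import re

# Same pattern as in the str.extract call: gender, age, weight, then the ID.
DETAILS_PATTERN = re.compile(r'(.*?),\s(.*?),\s(.*?)\.\sID\s(\d*)')


def parse_details(details):
    """Split one details string into its four fields, or return None."""
    match = DETAILS_PATTERN.search(details)
    if match is None:
        return None
    gender, age, weight, dog_id = match.groups()
    return {
        'gender': gender,
        'age': age,
        'weight': weight,
        # \d* can match an empty string, so guard the int conversion
        'id': int(dog_id) if dog_id else None,
    }


print(parse_details("Female, 12 years, 46 lbs. ID 7874"))
```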


## Checking if dog is unavailable

A dog is unavailable if it has already been adopted or its waiting list is full. The site marks both cases in the dog's name.

In [6]:
# Case-insensitive match on either marker in the name
df['unavailable'] = df['name'].str.contains('adopted|list full', case = False)
df

|     | name | breed | gender | age | weight | id | unavailable |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | America | Shepherd (mix) | Female | 12 years | 46 lbs | 7874 | False |
| 1 | Artichoke | Husky (mix) | Female | 2 years | 60 lbs | 11658 | False |
| 2 | Artichoke Pup - Broccoli | Husky - Shepherd (mix) | Female | 7 weeks | 5 lbs | 11659 | False |
| 3 | Artichoke Pup - Celery | Husky - Shepherd (mix) | Male | 7 weeks | 5 lbs | 11660 | False |
| 4 | Artichoke Pup - Cucumber | Husky - Shepherd (mix) | Male | 7 weeks | 5 lbs | 11661 | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 155 | Yoda | Labrador Retriever - American Staffordshire Te... | Male | 3 years | 54 lbs | 11751 | False |
| 156 | Yoko - Adopted! | Boxer (purebred) | Female | 2 years | 63 lbs | 11365 | True |
| 157 | Zinc | Terrier - Pointer (mix) | Female | 2 years | 32 lbs | 11531 | False |
| 158 | Zipper - Adopted! | Labrador Retriever - Cattle Dog (mix) | Female | 2 years | 40 lbs | 11489 | True |
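
The same check on a single name, without pandas, might look like the sketch below. `UNAVAILABLE_MARKERS` and `is_unavailable` are names I made up for illustration; the two markers are the ones the cell above searches for.

```python
# Substrings that mark a dog as unavailable (compared in lowercase).
UNAVAILABLE_MARKERS = ('adopted', 'list full')


def is_unavailable(name):
    """True if the name carries an 'adopted' or 'list full' marker."""
    lowered = name.lower()
    return any(marker in lowered for marker in UNAVAILABLE_MARKERS)


for name in ['Yoko - Adopted!', 'Barnes - Waiting List Full', 'Zinc']:
    print(name, is_unavailable(name))
```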


## Reading in saved (old) dataframe 

In [7]:
try:
    old = pd.read_csv('dogs.csv')
except FileNotFoundError:
    # First run: no saved file yet, so start from an empty dataframe
    old = pd.DataFrame(columns = df.columns)
    
old

|     | name | breed | gender | age | weight | id | unavailable |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | America | Shepherd (mix) | Female | 12 years | 46 lbs | 7874 | False |
| 1 | Asteroid - (Medical) | Labrador Retriever - Cattle Dog (mix) | Female | 3 years | 28 lbs | 11632 | False |
| 2 | Barcelona - Adopted! | Labrador Retriever - Shepherd (mix) | Female | 4 months | 15 lbs | 11636 | True |
| 3 | Barnes - Waiting List Full | Labrador Retriever - Shepherd (mix) | Female | 5 months | 10 lbs | 11474 | True |
| 4 | Bayleigh | Labrador Retriever (mix) | Female | 4 years | 47 lbs | 11547 | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 209 |  | Shepherd (mix) | Female | 1 year | 42 lbs | 11410 |  |
| 210 |  | Labrador Retriever - Terrier (mix) | Female | 5 years | 30 lbs | 10888 |  |
| 211 |  | Labrador Retriever (mix) | Female | 2 years | 40 lbs | 11409 |  |
| 212 |  | Labrador Retriever (mix) | Male | 1 year | 60 lbs | 11282 |  |


## Email function for when new dogs are found

In [13]:
def send_email(df):
    sender_email = 'lukefeilbergp@gmail.com'
    receiver_email = 'enter_phone_number_here@tmomail.net'
    subject = "New dogs alert!"
    password = 'Enter_Password_Here'
    
    contents = []
    
    yag = yagmail.SMTP(user=sender_email, password=password)
    
    # Displaying at most 5 dogs, each preceded by a blank line
    for _, dog in df.head(5).iterrows():
        contents.append('')
        contents.append(f"{dog['name']}, {dog['breed']}, {dog['age']}")
    
    contents.extend(['', 'https://labsandmore.org/dogs/'])

    yag.send(receiver_email, subject, contents)
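
Since the function needs the email password, one option is to read it from an environment variable rather than typing it into the notebook. The sketch below is an assumption about how that could be wired up, not part of the original workflow: the variable name `DOG_ALERT_EMAIL_PASSWORD` and the helper `load_email_password` are both hypothetical.

```python
import os


def load_email_password(env_var='DOG_ALERT_EMAIL_PASSWORD'):
    """Read the email password from an environment variable.

    The variable name is a made-up example; use whatever name you set.
    """
    password = os.environ.get(env_var)
    if not password:
        raise RuntimeError(f'Set the {env_var} environment variable first.')
    return password


# Possible use inside send_email, replacing the hard-coded password:
# yag = yagmail.SMTP(user=sender_email, password=load_email_password())
```

This keeps the password out of the saved notebook, which matters if the notebook is ever shared.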

## Checking for new dogs, sending email if so!

In [15]:
# Dogs whose id is not in the saved dataframe are new
new_dogs = df[~df['id'].isin(old['id'])]

if not new_dogs.empty:
    print('New dogs found! :-)')
    
    send_email(new_dogs)

New dogs found! :-)
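
The new-dog check comes down to "which ids on the site today are missing from the saved file". Stripped of pandas, that is a membership test against a set. The ids below are taken from the tables above; `find_new_ids` is a hypothetical helper name.

```python
def find_new_ids(current_ids, seen_ids):
    """Return the ids in current_ids that do not appear in seen_ids."""
    seen = set(seen_ids)
    return [dog_id for dog_id in current_ids if dog_id not in seen]


current_ids = [7874, 11658, 11659]  # ids scraped just now
seen_ids = [7874, 11632]            # ids saved from earlier runs
print(find_new_ids(current_ids, seen_ids))  # [11658, 11659]
```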


## Concatenating and dropping duplicates

In [16]:
old_and_new = pd.concat([df, old])  

old_and_new.drop_duplicates(subset='id',keep='first',inplace=True)

old_and_new

|     | name | breed | gender | age | weight | id | unavailable |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | America | Shepherd (mix) | Female | 12 years | 46 lbs | 7874 | False |
| 1 | Artichoke | Husky (mix) | Female | 2 years | 60 lbs | 11658 | False |
| 2 | Artichoke Pup - Broccoli | Husky - Shepherd (mix) | Female | 7 weeks | 5 lbs | 11659 | False |
| 3 | Artichoke Pup - Celery | Husky - Shepherd (mix) | Male | 7 weeks | 5 lbs | 11660 | False |
| 4 | Artichoke Pup - Cucumber | Husky - Shepherd (mix) | Male | 7 weeks | 5 lbs | 11661 | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 209 |  | Shepherd (mix) | Female | 1 year | 42 lbs | 11410 |  |
| 210 |  | Labrador Retriever - Terrier (mix) | Female | 5 years | 30 lbs | 10888 |  |
| 211 |  | Labrador Retriever (mix) | Female | 2 years | 40 lbs | 11409 |  |
| 212 |  | Labrador Retriever (mix) | Male | 1 year | 60 lbs | 11282 |  |
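
Putting the fresh scrape first in the concat and keeping the first duplicate means the newest row wins whenever an id appears in both dataframes. The sketch below shows the same rule on plain dictionaries with made-up rows; `merge_by_id` is a name I am introducing for illustration.

```python
def merge_by_id(fresh, saved):
    """Combine two lists of dog records, keeping the first record per id.

    Fresh records are visited first, so they win over saved ones.
    """
    merged = {}
    for dog in list(fresh) + list(saved):
        # setdefault only stores the record if the id is not there yet
        merged.setdefault(dog['id'], dog)
    return list(merged.values())


fresh = [{'id': 1, 'name': 'Example Dog - Adopted!'}]
saved = [{'id': 1, 'name': 'Example Dog'}, {'id': 2, 'name': 'Other Dog'}]
print(merge_by_id(fresh, saved))
```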


## Saving to csv for next time!

In [17]:
old_and_new.to_csv('dogs.csv', index = False)
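
The notebook itself only does one pass, so something still has to run it every hour. A system scheduler (cron, or Task Scheduler on Windows) is one way. Another is a simple loop like the sketch below, where `run_repeatedly` is a hypothetical helper and `job` stands for a function wrapping all the steps above. The `max_runs` and `sleep` parameters are only there so the loop can be tried out without waiting.

```python
import time


def run_repeatedly(job, interval_seconds=3600, max_runs=None, sleep=time.sleep):
    """Call job(), wait interval_seconds, and repeat.

    With max_runs=None the loop never ends; a number stops it after
    that many runs. Returns the number of runs completed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Only wait if another run is coming
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Quick demo: two runs with no waiting in between
run_repeatedly(lambda: print('Checking for new dogs...'),
               interval_seconds=0, max_runs=2)
```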