## Table Of Contents

1. <a href='#scraping'>Scrape Data</a>
2. <a href='#clean'>Clean Data</a>
3. <a href='#recommend1'>Recommend Using Description-Cosine Similarity</a>
4. <a href='#recommend2'>Recommend Using More Information-Cosine Similarity</a>
5. <a href='#recommend3'>Recommend Using Both Descriptions and More Information</a>
6. <a href='#images'>Scrape Images</a>
7. <a href='#misc'>Miscellaneous</a>

# Scrape Data
<a id='scraping'></a>

In [1]:
#import modules
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np
import pickle
import urllib.request
import time

In [2]:
#build the listing-page urls; each listing page links to many individual plant pages
url_of_pages = []
for n in range(2,15):
    url_of_pages.append(f'https://www.houseplant411.com/choose/page/{n}?choose-plant=1')
url_of_pages.append('https://www.houseplant411.com/choose?choose-plant=1')
print(url_of_pages)

['https://www.houseplant411.com/choose/page/2?choose-plant=1', 'https://www.houseplant411.com/choose/page/3?choose-plant=1', 'https://www.houseplant411.com/choose/page/4?choose-plant=1', 'https://www.houseplant411.com/choose/page/5?choose-plant=1', 'https://www.houseplant411.com/choose/page/6?choose-plant=1', 'https://www.houseplant411.com/choose/page/7?choose-plant=1', 'https://www.houseplant411.com/choose/page/8?choose-plant=1', 'https://www.houseplant411.com/choose/page/9?choose-plant=1', 'https://www.houseplant411.com/choose/page/10?choose-plant=1', 'https://www.houseplant411.com/choose/page/11?choose-plant=1', 'https://www.houseplant411.com/choose/page/12?choose-plant=1', 'https://www.houseplant411.com/choose/page/13?choose-plant=1', 'https://www.houseplant411.com/choose/page/14?choose-plant=1', 'https://www.houseplant411.com/choose?choose-plant=1']


In [5]:
#collect every houseplant411.com link that appears on the listing pages
def scrape(url_of_pages):
    list_of_soups = []
    for url in url_of_pages:
        request = urllib.request.Request(url)
        response = urllib.request.urlopen(request)
        #name the parser explicitly to avoid bs4's "no parser specified" warning
        soup = BeautifulSoup(response.read(), 'html.parser')
        list_of_soups.append(soup)

    list_of_urls = []
    for soup in list_of_soups:
        for link in soup.find_all('a', attrs={'href': re.compile("https://www.houseplant411.com/")}):
            list_of_urls.append(link.get('href'))

    #de-duplicate and sort, so the plant pages end up in one contiguous block
    return sorted(set(list_of_urls))
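The `time` module is imported at the top but never used, so `scrape` (and `scrape2` and `scrape3` later) send their requests back to back. Below is a minimal sketch of a throttled fetcher that uses only `urllib`. It waits `delay` seconds after each request and sends an identifying User-Agent header. The names `fetch_html` and `fetch_all`, the default delay, and the header value are assumptions and are not part of the original notebook. The returned bytes can be passed to `BeautifulSoup` exactly as `response.read()` is above.

```python
import time
import urllib.request

# Hypothetical User-Agent value; replace it with something that identifies you.
DEFAULT_HEADERS = {"User-Agent": "houseplant-recommender-notebook/0.1"}


def fetch_html(url, delay=1.0, timeout=10):
    """Fetch one page, then pause so consecutive calls are spaced out."""
    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        html = response.read()
    time.sleep(delay)  # throttle: wait before the caller issues the next request
    return html


def fetch_all(urls, delay=1.0):
    """Fetch every URL in order and return the raw HTML bytes of each page."""
    return [fetch_html(url, delay=delay) for url in urls]
```

With 136 plant pages and a one-second delay, a full scrape takes a little over two minutes. That cost is modest next to the load that unthrottled requests put on a small site.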

In [6]:
#run the function.
q = scrape(url_of_pages)  

In [12]:
q #list of all links, but some of the links are irrelevant

['https://www.houseplant411.com/',
 'https://www.houseplant411.com/about-us',
 'https://www.houseplant411.com/about-us/contact-us',
 'https://www.houseplant411.com/askjudy',
 'https://www.houseplant411.com/choose',
 'https://www.houseplant411.com/choose/?choose-plant=1',
 'https://www.houseplant411.com/choose/page/1/?choose-plant=1',
 'https://www.houseplant411.com/choose/page/10/?choose-plant=1',
 'https://www.houseplant411.com/choose/page/10?choose-plant=1',
 'https://www.houseplant411.com/choose/page/11/?choose-plant=1',
 'https://www.houseplant411.com/choose/page/11?choose-plant=1',
 'https://www.houseplant411.com/choose/page/12/?choose-plant=1',
 'https://www.houseplant411.com/choose/page/12?choose-plant=1',
 'https://www.houseplant411.com/choose/page/13/?choose-plant=1',
 'https://www.houseplant411.com/choose/page/13?choose-plant=1',
 'https://www.houseplant411.com/choose/page/14/?choose-plant=1',
 'https://www.houseplant411.com/choose/page/2/?choose-plant=1',
 'https://www.house

In [11]:
q[35] #the first link that we want

'https://www.houseplant411.com/houseplant/agave-plants-how-to-grow-care-for-agave-plants-indoors'

In [8]:
q[-31] #the first link after the plant pages; a slice end is exclusive, so this link is not kept

'https://www.houseplant411.com/how-to-water-houseplants'

In [90]:
len(q[35:-31]) #the total number of relevant links. 

136

In [17]:
url_list_of_plants = q[35: -31] #the actual links we want.
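Slicing with `q[35:-31]` depends on exactly how many non-plant links sort before and after the plant pages. If the site adds or removes a menu link, the slice can break without any error. A sketch of a path-based filter follows. It assumes that every plant page lives under `/houseplant/`, which the first kept link suggests but the notebook does not verify. `plant_page_urls` is a hypothetical helper.

```python
from urllib.parse import urlparse


def plant_page_urls(urls, prefix="/houseplant/"):
    """Return the unique URLs whose path starts with `prefix`, sorted alphabetically."""
    kept = {url for url in urls if urlparse(url).path.startswith(prefix)}
    return sorted(kept)
```

`plant_page_urls(q)` could stand in for `q[35:-31]`. Checking that it also returns 136 links would confirm the path assumption.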

In [92]:
url = url_list_of_plants[0]
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
soup = BeautifulSoup(response.read())

In [199]:
#main function: scrape the title, species, description and care info from each plant page
def scrape2(url_list_of_plants):
    list_of_more_info = []      #care info other than the description
    list_of_descriptions = []   #descriptions
    list_of_species = []        #species (botanical) names
    list_of_titles = []         #common names of the plants
    list_of_soups = []          #parsed html for each url
    for url in url_list_of_plants:
        request = urllib.request.Request(url)
        response = urllib.request.urlopen(request)
        soup = BeautifulSoup(response.read(), 'html.parser')
        list_of_soups.append(soup)

    for soup in list_of_soups:
        title = soup.find("h2", class_='title').get_text()
        list_of_titles.append(title)

    for soup in list_of_soups:
        #some pages have no species block; record 'NA' instead of failing
        species_div = soup.find("div", class_="clear resultSpecies")
        list_of_species.append(species_div.get_text() if species_div is not None else 'NA')

    for soup in list_of_soups:
        description = soup.find("div", class_='boxExcerpt').get_text()
        list_of_descriptions.append(description)

    for soup in list_of_soups:
        more_info = soup.find("div", class_="post-meta")
        text = str(more_info)
        #drop carriage returns, newlines and non-breaking spaces
        text = text.replace('\r', '').replace('\n', '').replace('\xa0', '')
        #remove the pop-up tooltip markup, then strip all remaining tags
        text = re.sub('<span class="popUpMid">.*?<span class="popUpBottom">', '', text)
        text = re.sub('<.*?>', '', text)
        list_of_more_info.append(text)

    #put the information in a dataframe
    d = {'title': list_of_titles, 'species': list_of_species,
         'description': list_of_descriptions, 'more_info': list_of_more_info}
    return pd.DataFrame(data=d)
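`scrape2` flattens the `post-meta` div by converting it with `str()` and then deleting tags with regular expressions. `str()` re-serialises the markup, so characters such as `&` come back escaped as `&amp;` and survive the tag stripping. The pop-up removal also depends on the exact attribute text in the HTML. Below is a standard-library sketch of the same step built on `html.parser`. It decodes entities and skips everything nested inside the `popUpMid` span. The class name comes from the regex above; treating that whole span as the tooltip is an assumption. `MetaTextExtractor` and `html_to_text` are hypothetical helpers.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MetaTextExtractor(HTMLParser):
    """Collect text chunks, skipping everything nested inside an element with skip_class."""

    def __init__(self, skip_class="popUpMid"):
        super().__init__(convert_charrefs=True)  # decodes &amp;, &nbsp; etc. in the text
        self.skip_class = skip_class
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # an element nested inside the skipped one
        elif self.skip_class in (dict(attrs).get("class") or "").split():
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(fragment, skip_class="popUpMid"):
    """Plain text of an HTML fragment, with whitespace collapsed to single spaces."""
    parser = MetaTextExtractor(skip_class)
    parser.feed(fragment)
    parser.close()
    # Chunks are joined with a space so text from adjacent blocks does not run together;
    # the trade-off is that a word split across inline tags (<b>Li</b>ght) gains a space.
    text = " ".join(parser.parts).replace("\xa0", " ")
    return " ".join(text.split())
```

Under these assumptions, `html_to_text(str(more_info))` would replace the chain of `re.sub` calls inside the last loop of `scrape2`.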

In [200]:
plants_df = scrape2(url_list_of_plants)
plants_df

| | title | species | description | more_info |
|---|---|---|---|---|
| 0 | Agave Plant | Agave attenuata | \nThe agave plant, which originated in Mexico,... | Light Agave plants require very bright light ... |
| 1 | Alocasia | Alocasia Amazonica | \nAn Alocasia plant, native to Asia, is also c... | Light An Alocasia plant requires very bright ... |
| 2 | Aloe Vera Plant | Aloe Vera | \nAn Aloe Vera plant is an easy care, drought ... | Light An Aloe Vera plant requires very bright... |
| 3 | Amaryllis | Hippeastrum | \nAmaryllis plants are native to the tropical ... | Light Amaryllis plants need bright indirect l... |
| 4 | Angel Wing Begonia | Begonia Coccinea | \nAngel Wing begonia plants, first found in So... | Light Angel wing begonias like bright indirec... |
| 5 | Anthurium | Anthurium | \nAnthurium is a large genus of plants contain... | Light Anthurium plants like as much bright in... |
| 6 | Asparagus Ferns | Asparagus sprengeri fern | \nEasy care asparagus fern plants, native to S... | Light Asparagus ferns grow best in bright ind... |
| 7 | Azalea | Rhododendron | \nAzaleas, the national flower of Nepal, are p... | Light Azalea plants require bright indirect l... |
| 8 | Baby’s Tears Plant | Soleirolia helxine | \nA Baby’s Tears plant is a delicate looking h... | Light Baby’s Tears plants like bright indirec... |
| 9 | Aralia Plant – Balfour | Polyscias-scutellaria | \nAn Aralia is an evergreen plant native to Af... | Light Balfour Aralia plants do best in bright... |


In [201]:
#dump the dataframe into a pickle file, reusing plants_df instead of scraping every page again
with open('plants_df.pkl', 'wb') as picklefile:
    pickle.dump(plants_df, picklefile)

# Clean Data
<a id='clean'></a>

In [213]:
import string
import re
import pickle
import nltk

In [232]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [257]:
from sklearn.metrics.pairwise import cosine_similarity

In [202]:
with open('plants_df.pkl', 'rb') as f:
    x = pickle.load(f)

In [203]:
x

| | title | species | description | more_info |
|---|---|---|---|---|
| 0 | Agave Plant | Agave attenuata | \nThe agave plant, which originated in Mexico,... | Light Agave plants require very bright light ... |
| 1 | Alocasia | Alocasia Amazonica | \nAn Alocasia plant, native to Asia, is also c... | Light An Alocasia plant requires very bright ... |
| 2 | Aloe Vera Plant | Aloe Vera | \nAn Aloe Vera plant is an easy care, drought ... | Light An Aloe Vera plant requires very bright... |
| 3 | Amaryllis | Hippeastrum | \nAmaryllis plants are native to the tropical ... | Light Amaryllis plants need bright indirect l... |
| 4 | Angel Wing Begonia | Begonia Coccinea | \nAngel Wing begonia plants, first found in So... | Light Angel wing begonias like bright indirec... |
| 5 | Anthurium | Anthurium | \nAnthurium is a large genus of plants contain... | Light Anthurium plants like as much bright in... |
| 6 | Asparagus Ferns | Asparagus sprengeri fern | \nEasy care asparagus fern plants, native to S... | Light Asparagus ferns grow best in bright ind... |
| 7 | Azalea | Rhododendron | \nAzaleas, the national flower of Nepal, are p... | Light Azalea plants require bright indirect l... |
| 8 | Baby’s Tears Plant | Soleirolia helxine | \nA Baby’s Tears plant is a delicate looking h... | Light Baby’s Tears plants like bright indirec... |
| 9 | Aralia Plant – Balfour | Polyscias-scutellaria | \nAn Aralia is an evergreen plant native to Af... | Light Balfour Aralia plants do best in bright... |


In [208]:
#function to remove new lines
def remove_newlines(element):
    return re.sub('\n', '', element)

In [210]:
k = x['description'].apply(remove_newlines) #apply function to remove newlines in description

In [211]:
x['description_edited'] = k   #create a description_edited column for new description without new lines

In [212]:
x.head()

| | title | species | description | more_info | description_edited |
|---|---|---|---|---|---|
| 0 | Agave Plant | Agave attenuata | \nThe agave plant, which originated in Mexico,... | Light Agave plants require very bright light ... | The agave plant, which originated in Mexico, t... |
| 1 | Alocasia | Alocasia Amazonica | \nAn Alocasia plant, native to Asia, is also c... | Light An Alocasia plant requires very bright ... | An Alocasia plant, native to Asia, is also cal... |
| 2 | Aloe Vera Plant | Aloe Vera | \nAn Aloe Vera plant is an easy care, drought ... | Light An Aloe Vera plant requires very bright... | An Aloe Vera plant is an easy care, drought re... |
| 3 | Amaryllis | Hippeastrum | \nAmaryllis plants are native to the tropical ... | Light Amaryllis plants need bright indirect l... | Amaryllis plants are native to the tropical re... |
| 4 | Angel Wing Begonia | Begonia Coccinea | \nAngel Wing begonia plants, first found in So... | Light Angel wing begonias like bright indirec... | Angel Wing begonia plants, first found in Sout... |


In [219]:
x['description_edited'].iloc[60]      #checking to see what else needs to be removed

'The Dracaena compacta plant, a compact, slow growing member of the Dracaena family, is native to South East Africa. It is often referred to as Dracaena fragrans or Dracaena deremensis. The closest relative of the Compacta is the Dracaena janet craig, but the Compacta is quite different in appearance. The Dracaena Compacta has a thick green stem and several clumps of short dark green leaves 2″-4″ in length. It’s a beautiful addition to homes or offices, but it’s a bit more difficult and more expensive than other dracaenas.Dracaena compacta plants are considered by some to be slightly poisonous, especially to dogs and cats. Read more about common houseplants that are poisonous in Don’t Feed Me To Your Cat! A Guide to Poisonous Houseplants\xa0\xa0 '

In [247]:
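#characters replaced by a space: ASCII punctuation except the apostrophe
#(typographic quotes such as “ ” ’ are not in this set and pass through unchanged)
PUNCT_RE = re.compile('[%s]' % re.escape(r'."#$%&\()*+-/:;<=>@[\]^_`{|}~,!?'))

def clean(element):    #function to clean the description further: remove punctuation and lower-case everything
    out = PUNCT_RE.sub(' ', element)
    out = out.replace('\xa0', ' ')
    out = out.lower()
    return out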
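`clean` only targets ASCII punctuation. Typographic quotes and apostrophes therefore pass through (see “joints” and don’t in the sample output below), and each replaced character leaves an extra space behind. Below is a sketch of an alternative built on `str.translate`. It assumes that curly quotes and en/em dashes should be treated like their ASCII counterparts. `clean_text` is a hypothetical name. Collapsing the whitespace does not change what `TfidfVectorizer` sees, because its default tokenizer only extracts word characters.

```python
# ASCII punctuation handled by clean() (apostrophe excluded), plus typographic
# double quotes, the left single quote and en/em dashes. The extra characters
# are an assumption about what should count as punctuation here.
SEPARATORS = '."#$%&\\()*+-/:;<=>@[]^_`{|}~,!?' + "\u201c\u201d\u2018\u2013\u2014"
SEPARATOR_TABLE = str.maketrans({ch: " " for ch in SEPARATORS})


def clean_text(text):
    """Lower-case text, turn separators into spaces and collapse whitespace."""
    text = text.replace("\u2019", "'")  # curly apostrophe -> ASCII apostrophe
    text = text.translate(SEPARATOR_TABLE)
    # str.split() with no argument also splits on non-breaking spaces (\xa0)
    return " ".join(text.lower().split())
```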

In [248]:
j = x['description_edited'].apply(clean)

In [250]:
j.iloc[4]

'angel wing begonia plants  first found in south america  make up a large portion of the cane begonia group  all cane begonia plants have long stems with “joints” on them  the leaves and flowers of begonia plants grow out of these joints  angel wing begonias have large  “angel wing” shaped  dark green leaves  often with metallic silver specks  the underside of the plant leaf is usually a deep red  angel wing begonias produce hanging clusters of delicate flowers in red  white  orange  or pink  the intensity of the color of the flowers and leaves depends upon how much light the plant gets  angel wing begonias are beautiful  easy  care  flowering plants that brighten your home all year  they are considered poisonous and should be kept away from pets and children  read more about common houseplants that are poisonous in my book don’t feed me to your cat  a guide to poisonous houseplants angel wing begonia '

In [251]:
x['description_clean'] = j

In [253]:
m = x['more_info'].apply(clean)

In [255]:
x['more_info_clean'] = m

In [256]:
x.head()

| | title | species | description | more_info | description_edited | description_clean | more_info_clean |
|---|---|---|---|---|---|---|---|
| 0 | Agave Plant | Agave attenuata | \nThe agave plant, which originated in Mexico,... | Light Agave plants require very bright light ... | The agave plant, which originated in Mexico, t... | the agave plant which originated in mexico t... | light agave plants require very bright light ... |
| 1 | Alocasia | Alocasia Amazonica | \nAn Alocasia plant, native to Asia, is also c... | Light An Alocasia plant requires very bright ... | An Alocasia plant, native to Asia, is also cal... | an alocasia plant native to asia is also cal... | light an alocasia plant requires very bright ... |
| 2 | Aloe Vera Plant | Aloe Vera | \nAn Aloe Vera plant is an easy care, drought ... | Light An Aloe Vera plant requires very bright... | An Aloe Vera plant is an easy care, drought re... | an aloe vera plant is an easy care drought re... | light an aloe vera plant requires very bright... |
| 3 | Amaryllis | Hippeastrum | \nAmaryllis plants are native to the tropical ... | Light Amaryllis plants need bright indirect l... | Amaryllis plants are native to the tropical re... | amaryllis plants are native to the tropical re... | light amaryllis plants need bright indirect l... |
| 4 | Angel Wing Begonia | Begonia Coccinea | \nAngel Wing begonia plants, first found in So... | Light Angel wing begonias like bright indirec... | Angel Wing begonia plants, first found in Sout... | angel wing begonia plants first found in sout... | light angel wing begonias like bright indirec... |


# Recommend using Description - Cosine Similarity
<a id='recommend1'></a>

In [298]:
vectorizer = TfidfVectorizer(stop_words = 'english')   #vectorize the descriptions, removing English stop words
v = vectorizer.fit_transform(x['description_clean'])

In [299]:
cosine_sim = cosine_similarity(v,v)  # using cosine similarity to find the similarity between descriptions
print(cosine_sim)

[[1.         0.11061665 0.1196746  ... 0.10179903 0.12246633 0.06544282]
 [0.11061665 1.         0.12332514 ... 0.05753065 0.1108604  0.101335  ]
 [0.1196746  0.12332514 1.         ... 0.04693279 0.06833895 0.05688408]
 ...
 [0.10179903 0.05753065 0.04693279 ... 1.         0.32611384 0.04418455]
 [0.12246633 0.1108604  0.06833895 ... 0.32611384 1.         0.07992568]
 [0.06544282 0.101335   0.05688408 ... 0.04418455 0.07992568 1.        ]]


In [300]:
cosine_sim.shape

(136, 136)
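`TfidfVectorizer` and `cosine_similarity` do the heavy lifting above. To show what the 136 × 136 matrix contains, here is a dependency-free sketch that mirrors scikit-learn's default settings:

- lower-casing and the `(?u)\b\w\w+\b` token pattern;
- raw term counts;
- smoothed IDF, `ln((1 + n) / (1 + df)) + 1`;
- L2-normalised rows.

It is an illustration, not a drop-in replacement. Stop words must be passed in as a set, and scikit-learn's built-in English stop-word list is not reproduced here.

```python
import math
import re
from collections import Counter

# Default token pattern of TfidfVectorizer: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf_vectors(docs, stop_words=frozenset()):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenised = [[t for t in TOKEN_RE.findall(doc.lower()) if t not in stop_words]
                 for doc in docs]
    n_docs = len(docs)
    # document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    # smoothed IDF, as with smooth_idf=True: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + count)) + 1 for t, count in doc_freq.items()}
    vectors = []
    for tokens in tokenised:
        weights = {t: count * idf[t] for t, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # avoid /0 on empty docs
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def cosine(a, b):
    """For L2-normalised vectors, cosine similarity is just the dot product."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cosine_matrix(vectors):
    """Square matrix of pairwise cosine similarities, like cosine_similarity(v, v)."""
    return [[cosine(a, b) for b in vectors] for a in vectors]
```

Because every row is normalised to unit length, each plant's similarity with itself is 1. That is why the diagonal of `cosine_sim` is all ones, and why the plant itself has to be excluded when recommending.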

In [301]:
indices = list(x['title'])

In [302]:
column_names = indices
row_names = indices
#giving column and row names to cosine similarity matrix
df = pd.DataFrame(cosine_sim, columns = column_names, index = row_names)
df

| | Agave Plant | Alocasia | Aloe Vera Plant | Amaryllis | Angel Wing Begonia | Anthurium | Asparagus Ferns | Azalea | Baby’s Tears Plant | Aralia Plant – Balfour | ... | Pygmy Date Palm | Schefflera Plant | Selaginella Plant | Spider Plant | Split Leaf Philodendron | Strawberry Begonia Plant | Stromanthe Tricolor Plant | Terrariums | Wandering Jew Plant | Zebra Plant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agave Plant | 1.000000 | 0.110617 | 0.119675 | 0.089797 | 0.058793 | 0.069313 | 0.065810 | 0.083404 | 0.055870 | 0.100513 | ... | 0.031077 | 0.063959 | 0.065215 | 0.109906 | 0.091311 | 0.032285 | 0.049610 | 0.101799 | 0.122466 | 0.065443 |
| Alocasia | 0.110617 | 1.000000 | 0.123325 | 0.086799 | 0.071677 | 0.093637 | 0.050175 | 0.086933 | 0.081396 | 0.115208 | ... | 0.016582 | 0.074505 | 0.056784 | 0.109899 | 0.114695 | 0.044807 | 0.067210 | 0.057531 | 0.110860 | 0.101335 |
| Aloe Vera Plant | 0.119675 | 0.123325 | 1.000000 | 0.060195 | 0.046701 | 0.055496 | 0.057229 | 0.064367 | 0.054691 | 0.092377 | ... | 0.011678 | 0.059486 | 0.031719 | 0.099772 | 0.107831 | 0.043190 | 0.052802 | 0.046933 | 0.068339 | 0.056884 |
| Amaryllis | 0.089797 | 0.086799 | 0.060195 | 1.000000 | 0.084696 | 0.096521 | 0.053933 | 0.083418 | 0.022161 | 0.087475 | ... | 0.015874 | 0.056238 | 0.034552 | 0.076306 | 0.057418 | 0.021298 | 0.044110 | 0.029190 | 0.077258 | 0.075056 |
| Angel Wing Begonia | 0.058793 | 0.071677 | 0.046701 | 0.084696 | 1.000000 | 0.075782 | 0.040247 | 0.065126 | 0.044307 | 0.076830 | ... | 0.011570 | 0.051430 | 0.030629 | 0.062631 | 0.057014 | 0.212387 | 0.038386 | 0.054397 | 0.104825 | 0.050844 |
| Anthurium | 0.069313 | 0.093637 | 0.055496 | 0.096521 | 0.075782 | 1.000000 | 0.049130 | 0.156343 | 0.038598 | 0.057034 | ... | 0.002320 | 0.050899 | 0.028192 | 0.069185 | 0.076832 | 0.025343 | 0.043291 | 0.048464 | 0.079947 | 0.042816 |
| Asparagus Ferns | 0.065810 | 0.050175 | 0.057229 | 0.053933 | 0.040247 | 0.049130 | 1.000000 | 0.040548 | 0.070206 | 0.050116 | ... | 0.022802 | 0.037901 | 0.046099 | 0.076037 | 0.049011 | 0.021486 | 0.021076 | 0.048665 | 0.071315 | 0.046916 |
| Azalea | 0.083404 | 0.086933 | 0.064367 | 0.083418 | 0.065126 | 0.156343 | 0.040548 | 1.000000 | 0.035819 | 0.065180 | ... | 0.004408 | 0.055402 | 0.012646 | 0.050577 | 0.061011 | 0.027885 | 0.031975 | 0.045139 | 0.068459 | 0.029266 |
| Baby’s Tears Plant | 0.055870 | 0.081396 | 0.054691 | 0.022161 | 0.044307 | 0.038598 | 0.070206 | 0.035819 | 1.000000 | 0.059964 | ... | 0.016400 | 0.035936 | 0.046376 | 0.134471 | 0.047752 | 0.024707 | 0.041726 | 0.155644 | 0.044783 | 0.072662 |
| Aralia Plant – Balfour | 0.100513 | 0.115208 | 0.092377 | 0.087475 | 0.076830 | 0.057034 | 0.050116 | 0.065180 | 0.059964 | 1.000000 | ... | 0.031209 | 0.081830 | 0.086600 | 0.104471 | 0.073343 | 0.031817 | 0.043129 | 0.060915 | 0.114597 | 0.049517 |


In [303]:
#recommend the 5 plants whose descriptions are most similar to the chosen plant, excluding the plant itself
def recommend(title):
    return df[title].drop(title).nlargest(5).index

In [304]:
recommend('Amaryllis')

Index(['Clivia Plant', 'Begonia Plant', 'Kalanchoe Plant', 'Easter Lily Plant',
       'Bougainvillea'],
      dtype='object')

# Recommend using More_Info - Cosine Similarity
<a id='recommend2'></a>

In [305]:
vectorizer = TfidfVectorizer(stop_words = 'english') #vectorize more_info, removing English stop words
v = vectorizer.fit_transform(x['more_info_clean'])

In [306]:
cosine_sim1 = cosine_similarity(v,v)
print(cosine_sim1)

[[1.         0.08152372 0.0694341  ... 0.06932207 0.045465   0.05494457]
 [0.08152372 1.         0.09123879 ... 0.11045984 0.09147846 0.10365583]
 [0.0694341  0.09123879 1.         ... 0.05438242 0.04863853 0.05743957]
 ...
 [0.06932207 0.11045984 0.05438242 ... 1.         0.05757973 0.06854177]
 [0.045465   0.09147846 0.04863853 ... 0.05757973 1.         0.05718725]
 [0.05494457 0.10365583 0.05743957 ... 0.06854177 0.05718725 1.        ]]


In [308]:
indices = list(x['title'])

In [309]:
column_names = indices
row_names = indices
#giving column and row names to cosine similarity matrix
df1 = pd.DataFrame(cosine_sim1, columns = column_names, index = row_names)
df1

| | Agave Plant | Alocasia | Aloe Vera Plant | Amaryllis | Angel Wing Begonia | Anthurium | Asparagus Ferns | Azalea | Baby’s Tears Plant | Aralia Plant – Balfour | ... | Pygmy Date Palm | Schefflera Plant | Selaginella Plant | Spider Plant | Split Leaf Philodendron | Strawberry Begonia Plant | Stromanthe Tricolor Plant | Terrariums | Wandering Jew Plant | Zebra Plant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agave Plant | 1.000000 | 0.081524 | 0.069434 | 0.039769 | 0.065289 | 0.087713 | 0.022973 | 0.052189 | 0.057618 | 0.076431 | ... | 0.038961 | 0.037692 | 0.038201 | 0.132323 | 0.065849 | 0.038876 | 0.037163 | 0.069322 | 0.045465 | 0.054945 |
| Alocasia | 0.081524 | 1.000000 | 0.091239 | 0.084676 | 0.135810 | 0.133237 | 0.046960 | 0.087414 | 0.096881 | 0.104315 | ... | 0.062820 | 0.085690 | 0.076530 | 0.209339 | 0.150127 | 0.068756 | 0.083943 | 0.110460 | 0.091478 | 0.103656 |
| Aloe Vera Plant | 0.069434 | 0.091239 | 1.000000 | 0.049744 | 0.073846 | 0.079272 | 0.031130 | 0.040589 | 0.057786 | 0.085597 | ... | 0.048083 | 0.040017 | 0.042456 | 0.122127 | 0.080868 | 0.042569 | 0.048873 | 0.054382 | 0.048639 | 0.057440 |
| Amaryllis | 0.039769 | 0.084676 | 0.049744 | 1.000000 | 0.070413 | 0.066826 | 0.023370 | 0.047502 | 0.043294 | 0.058834 | ... | 0.034260 | 0.032435 | 0.031311 | 0.097947 | 0.061671 | 0.031361 | 0.036749 | 0.062294 | 0.035165 | 0.048658 |
| Angel Wing Begonia | 0.065289 | 0.135810 | 0.073846 | 0.070413 | 1.000000 | 0.121809 | 0.044894 | 0.073954 | 0.071282 | 0.087380 | ... | 0.065330 | 0.060971 | 0.058675 | 0.168202 | 0.144606 | 0.296632 | 0.064046 | 0.107110 | 0.075819 | 0.076894 |
| Anthurium | 0.087713 | 0.133237 | 0.079272 | 0.066826 | 0.121809 | 1.000000 | 0.041070 | 0.090108 | 0.082400 | 0.100585 | ... | 0.057436 | 0.065259 | 0.057332 | 0.198708 | 0.124205 | 0.052474 | 0.057448 | 0.120790 | 0.071075 | 0.089150 |
| Asparagus Ferns | 0.022973 | 0.046960 | 0.031130 | 0.023370 | 0.044894 | 0.041070 | 1.000000 | 0.024955 | 0.029452 | 0.037989 | ... | 0.035410 | 0.024710 | 0.031599 | 0.067837 | 0.041277 | 0.021992 | 0.024038 | 0.037491 | 0.026436 | 0.027998 |
| Azalea | 0.052189 | 0.087414 | 0.040589 | 0.047502 | 0.073954 | 0.090108 | 0.024955 | 1.000000 | 0.049571 | 0.067646 | ... | 0.032748 | 0.044821 | 0.039168 | 0.113287 | 0.070058 | 0.036080 | 0.035377 | 0.084611 | 0.042360 | 0.057227 |
| Baby’s Tears Plant | 0.057618 | 0.096881 | 0.057786 | 0.043294 | 0.071282 | 0.082400 | 0.029452 | 0.049571 | 1.000000 | 0.073674 | ... | 0.043261 | 0.038259 | 0.046620 | 0.187684 | 0.076937 | 0.079876 | 0.050281 | 0.058948 | 0.055293 | 0.064259 |
| Aralia Plant – Balfour | 0.076431 | 0.104315 | 0.085597 | 0.058834 | 0.087380 | 0.100585 | 0.037989 | 0.067646 | 0.073674 | 1.000000 | ... | 0.049641 | 0.063244 | 0.054965 | 0.179076 | 0.088852 | 0.045899 | 0.057597 | 0.090482 | 0.064102 | 0.074282 |


In [310]:
#recommend the 5 plants whose care information (more_info) is most similar to the chosen plant, excluding the plant itself
def recommend_more_info(title):
    return df1[title].drop(title).nlargest(5).index

In [311]:
recommend_more_info('Amaryllis')

Index(['Calla Lily Plant', 'Spider Plant', 'Shamrock Plant', 'Caladium Plant',
       'Orchid – Cymbidium'],
      dtype='object')

# Recommend using both Description + More_info
<a id='recommend3'></a>

In [312]:
#average the two similarity matrices so the description and the care information are weighted equally
df3 = (df + df1)/2

def recommend_both_factors(title):
    return df3[title].drop(title).nlargest(5).index
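`recommend_both_factors` gives the description and the care information equal weight. If one source should count for more, the average generalises to a weighted blend. Below is a minimal sketch over plain nested lists. `blend_similarities` and its `weight` parameter are hypothetical, and `weight=0.5` reproduces `(df + df1)/2`.

```python
def blend_similarities(desc_sim, info_sim, weight=0.5):
    """Return weight * desc_sim + (1 - weight) * info_sim, element by element.

    Both inputs are square similarity matrices (lists of rows) with the same
    plant order; weight=0.5 gives the plain average of the two matrices.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(desc_sim) != len(info_sim):
        raise ValueError("matrices must have the same shape")
    return [[weight * a + (1 - weight) * b for a, b in zip(row_d, row_i)]
            for row_d, row_i in zip(desc_sim, info_sim)]
```

With the labelled DataFrames, the same blend is simply `weight * df + (1 - weight) * df1`, and pandas keeps the row and column names aligned.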

In [313]:
recommend_both_factors('Amaryllis')

Index(['Calla Lily Plant', 'Clivia Plant', 'Kalanchoe Plant', 'Caladium Plant',
       'Spider Plant'],
      dtype='object')

In [397]:
df3 = (df + df1)/2

In [399]:
with open('rec_matrix.pkl', 'wb') as picklefile:
    pickle.dump(df3, picklefile)

# Scrape Images
<a id='images'></a>

In [None]:
#scrape the image urls of the plants (every link into wp-content/uploads on each page)
def scrape3(url_list_of_plants):
    list_of_urls = []           #image urls
    list_of_soups = []          #parsed html for each url
    for url in url_list_of_plants:
        request = urllib.request.Request(url)
        response = urllib.request.urlopen(request)
        soup = BeautifulSoup(response.read(), 'html.parser')
        list_of_soups.append(soup)

    for soup in list_of_soups:
        for link in soup.find_all('a', attrs={'href': re.compile("https://www.houseplant411.com/wp-content/uploads/")}):
            list_of_urls.append(link.get('href'))

    #images_df pairs this list with the titles by position, so it assumes exactly one image link per page
    return list_of_urls
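`scrape3` collects every link into `wp-content/uploads`, and `images_df` then pairs that list with `indices` by position. The pairing only lines up if each plant page contains exactly one such link. The sketch below keeps exactly one entry per page: the first matching link, or `None`. It is written with the standard-library `html.parser` and works on each page's HTML decoded to text. `first_image_url` and `image_urls_per_page` are hypothetical helpers. Taking the first link is an assumption about where the main photo sits on the page. Unlike the `re.compile` search above, `startswith` only matches links that begin with the prefix.

```python
from html.parser import HTMLParser

UPLOAD_PREFIX = "https://www.houseplant411.com/wp-content/uploads/"


class _AnchorCollector(HTMLParser):
    """Record the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def first_image_url(html, prefix=UPLOAD_PREFIX):
    """Return the first <a href> that starts with `prefix`, or None if there is none."""
    collector = _AnchorCollector()
    collector.feed(html)
    collector.close()
    return next((href for href in collector.hrefs if href.startswith(prefix)), None)


def image_urls_per_page(pages, prefix=UPLOAD_PREFIX):
    """One entry per page (None where missing), so the list lines up with the titles."""
    return [first_image_url(html, prefix) for html in pages]
```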

In [None]:
i = scrape3(url_list_of_plants)

In [368]:
i

['https://www.houseplant411.com/wp-content/uploads/Agave-attenuata-08-1.jpg',
 'https://www.houseplant411.com/wp-content/uploads/Alocasia-X-amazonica-14-3.jpg',
 'https://www.houseplant411.com/wp-content/uploads/cb76738f43d2d7cfae8f87b3a865ec31.jpg',
 'https://www.houseplant411.com/wp-content/uploads/1024px-Amaryllis_belladonna_flowers.jpg',
 'https://www.houseplant411.com/wp-content/uploads/60ed368c-577f-4914-8d03-86782e0ff2a9.jpg',
 'https://www.houseplant411.com/wp-content/uploads/Anthurium-Red-06-2.jpg',
 'https://www.houseplant411.com/wp-content/uploads/Asparagus.fern_.2010.11.07-1024x848-3.jpg',
 'https://www.houseplant411.com/wp-content/uploads/800px-Azalea_japonica_Madame_Van_Hecke_J2.jpg',
 'https://www.houseplant411.com/wp-content/uploads/Soleirolia_soleirolii001.jpg',
 'https://www.houseplant411.com/wp-content/uploads/Polyscias-scutellaria-10-1.jpg',
 'https://www.houseplant411.com/wp-content/uploads/Chamaedorea-erumpens-10-2.jpg',
 'https://www.houseplant411.com/wp-content/

In [366]:
i_d = {'title': indices, 'image_url': i}
images_df = pd.DataFrame(data = i_d)
images_df

| | title | image_url |
|---|---|---|
| 0 | Agave Plant | https://www.houseplant411.com/wp-content/uploa... |
| 1 | Alocasia | https://www.houseplant411.com/wp-content/uploa... |
| 2 | Aloe Vera Plant | https://www.houseplant411.com/wp-content/uploa... |
| 3 | Amaryllis | https://www.houseplant411.com/wp-content/uploa... |
| 4 | Angel Wing Begonia | https://www.houseplant411.com/wp-content/uploa... |
| 5 | Anthurium | https://www.houseplant411.com/wp-content/uploa... |
| 6 | Asparagus Ferns | https://www.houseplant411.com/wp-content/uploa... |
| 7 | Azalea | https://www.houseplant411.com/wp-content/uploa... |
| 8 | Baby’s Tears Plant | https://www.houseplant411.com/wp-content/uploa... |
| 9 | Aralia Plant – Balfour | https://www.houseplant411.com/wp-content/uploa... |


In [367]:
with open('plants_images.pkl', 'wb') as picklefile:
    pickle.dump(images_df, picklefile)

In [None]:
import requests

In [393]:
#save each image file under its plant name
for title, image_url in zip(images_df['title'], images_df['image_url']):
    img_data = requests.get(image_url).content
    with open(title, 'wb') as handler:
        handler.write(img_data)
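The download loop writes each image to a file named after the plant title, with no extension. If a title ever contained `/`, it would be read as a directory path. Below is a sketch of a file-name helper. It assumes that the extension in the image URL can be trusted, and it falls back to `.jpg` (an arbitrary choice) when the URL has none. `image_filename` is hypothetical. Changing the file names would also require updating whatever later loads these images.

```python
import os
import re
from urllib.parse import urlparse


def image_filename(title, image_url, default_ext=".jpg"):
    """File name for a downloaded image: the plant title plus the URL's extension.

    Path separators and other characters that Windows forbids in file names are
    replaced with "_"; default_ext is an assumed fallback for URLs without one.
    """
    ext = os.path.splitext(urlparse(image_url).path)[1].lower() or default_ext
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return safe_title + ext
```

Inside the download loop, the helper's result would be passed to `open(..., 'wb')` in place of the bare title.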

# Miscellaneous
<a id='misc'></a>

In [395]:
with open('indices.pkl', 'wb') as picklefile:
    pickle.dump(indices, picklefile)

In [408]:
x_new = x.drop(['species', 'description', 'more_info', 'description_clean', 'more_info_clean'], axis=1)

In [434]:
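def light_clean(element):
    #remove non-breaking spaces; because they are deleted rather than replaced by a space,
    #neighbouring words can run together (e.g. "aneasy care" in the output below)
    return re.sub('\xa0', '', element)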

In [435]:
x_new['description_edited'].apply(light_clean)

0      The agave plant, which originated in Mexico, t...
1      An Alocasia plant, native to Asia, is also cal...
2      An Aloe Vera plant is aneasy care, drought res...
3      Amaryllis plants are native to the tropical re...
4      Angel Wing begonia plants, first found in Sout...
5      Anthurium is a large genus of plants containin...
6      Easy care asparagus fern plants, native to Sou...
7      Azaleas, the national flower of Nepal, are par...
8      A Baby’s Tears plant is a delicate looking hou...
9      An Aralia is an evergreen plant native to Afri...
10     The beautiful, compact, easy careBamboo Palm, ...
11     Though many consider begonias to be an outdoor...
12     A Begonia Rex plant, also called a Painted-Lea...
13     The Bird’s Nest fern is native to the rain for...
14     A Bleeding Heart Vine Plant, native to tropica...
15     Boston Ferns are native to tropical forests an...
16     A bougainvillea plant is native to the rain fo...
17     A bromeliad fasciata (Ae

In [436]:
with open('description.pkl', 'wb') as d:
    pickle.dump(x_new, d)

In [438]:
x_new['description_clean'] = x_new['description_edited'].apply(light_clean)

In [440]:
x_new.drop(['description_edited'], axis=1, inplace=True)

In [442]:
x_new.iloc[0,1]

'The agave plant, which originated in Mexico, the Southwest US, and Central & tropical South America, is an easy care, impressive looking succulent plant that makes a great indoor or outdoor plant. It’s a common misconception that agave plants are a type of cactus plant.There are over 450 varieties of agave plants available, from small to huge, select a variety that will not outgrow your room if you plan to keep it indoors. Agave plants are slow growing plants that need little care as long as they are getting plenty of very bright light. These succulent plants have multi-layered rosettes of thick fleshy leaves with spiny margins that end in a sharp point. The sap of an agave plant is quite irritating and the spines and sharp points are painful.These plants are considered poisonous and should be kept away from pets and children. Read more about common houseplants that are poisonous in Don’t Feed Me To Your Cat! A Guide to Poisonous HouseplantsAgave Plant '

In [444]:
with open('indices.pkl', 'rb') as h:
    plant_names = pickle.load(h)

In [472]:
import random

#pass a plain list so the random pick does not depend on the DataFrame's index labels
plant1 = random.choice(list(x_new['title']))

In [466]:
index1 = x_new[x_new['title'] == plant1].index[0]

In [473]:
index1

91

'The Areca palm, native to Madagascar, is one of the most popular indoor houseplants sold today. Indoors an Areca palm is a medium sized exotic looking plant that can reach a height of 6-8 feet; outdoors it may be as tall as 25 feet. The Areca palm gets its nickname, the Butterfly palm, because its long feathery fronds (leaves) arch upwards off multiple reed- like stems, resembling butterfly wings. Each frond has between 40-60 leaflets on it. When first bought, Areca palms are a delight, inexpensive good-sized plants with beautiful green upright fronds. However, over time, the overall appearance of an Areca palm may diminish; the older bottom fronds turn yellow and the larger fronds droop and bend.Areca Palm '

In [474]:
x_new1 = x_new.drop(index1)

In [475]:
x_new1

| | title | description_clean |
|---|---|---|
| 0 | Agave Plant | The agave plant, which originated in Mexico, t... |
| 1 | Alocasia | An Alocasia plant, native to Asia, is also cal... |
| 2 | Aloe Vera Plant | An Aloe Vera plant is aneasy care, drought res... |
| 3 | Amaryllis | Amaryllis plants are native to the tropical re... |
| 4 | Angel Wing Begonia | Angel Wing begonia plants, first found in Sout... |
| 5 | Anthurium | Anthurium is a large genus of plants containin... |
| 6 | Asparagus Ferns | Easy care asparagus fern plants, native to Sou... |
| 7 | Azalea | Azaleas, the national flower of Nepal, are par... |
| 8 | Baby’s Tears Plant | A Baby’s Tears plant is a delicate looking hou... |
| 9 | Aralia Plant – Balfour | An Aralia is an evergreen plant native to Afri... |
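Picking a random plant and removing it from the table can also be done on a plain list of titles, which avoids depending on the DataFrame's index labels. Below is a minimal sketch. Passing a `seed` makes the pick reproducible between runs. `pick_and_exclude` is a hypothetical name.

```python
import random


def pick_and_exclude(titles, seed=None):
    """Pick one title at random and return it together with all the other titles."""
    rng = random.Random(seed)  # a seeded generator makes the pick reproducible
    pool = list(titles)        # positional list, independent of any index labels
    choice = rng.choice(pool)
    remaining = [t for t in pool if t != choice]
    return choice, remaining
```

`pick_and_exclude(x_new['title'])` returns the chosen title and the titles that `x_new1` keeps, as plain lists.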
