# Overall Project Purpose: Website scraping using BeautifulSoup

## Project Details:

The page www.gurufocus.com/stock-market-valuations.php lists the stocks that are up the most on the day (gainers) and the stocks that are down the most on the day (losers). This program scrapes that page and appends the scraped information to a file.

The Python class Guru_Scrape contains the following methods:

1) __init__: Takes 3 parameters (URL, main and backup), then fetches the page and backs up the main file
Parameters: 
 URL - URL to be scraped
 main - file where the scraped info will be appended and saved
 backup - file where a snapshot of the main file is saved before the main file is modified

2) get_soup: Requests the content of the webpage with requests and parses it with BeautifulSoup

3) find__gainers_losers: Iterates through 'td' tags and collects the first 10 rows of gainers & losers data, 4 cells per row

4) create_gainers_losers_df: Places the found data in a dataframe and adds a date-and-time stamp column, ready to be appended to the existing file
Parameters:
 win_loss_all - A list of rows that have been extracted

5) append_file: Backs up the main file, then appends the rows of the dataframe to it

6) backup_file: Copies the main file to the backup file, if the main file exists

7) initialize_file: Backs up the main file, then overwrites it with the column header and one blank row

8) print_file: Prints any csv file
Parameter(s):
 file - file to be printed


Note: The Guru_Scrape class can be adapted to scrape other websites by changing the tags searched in find__gainers_losers and the column names in create_gainers_losers_df.
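
The grouping step in find__gainers_losers can be tried without touching the network. The following is a minimal sketch, assuming every table row contributes exactly four cells (stock, company, price, price change). The helper name `group_cells` and the sample values are hypothetical and are not part of the class.

```python
def group_cells(cells, row_width=4, max_rows=10):
    """Group a flat list of cell texts into rows of `row_width` cells."""
    rows = []
    for start in range(0, len(cells), row_width):
        row = cells[start:start + row_width]
        # Stop on a trailing partial row or once enough rows are collected.
        if len(row) < row_width or len(rows) >= max_rows:
            break
        # Collapse runs of whitespace, as the class does with ' '.join(text.split()).
        rows.append([" ".join(cell.split()) for cell in row])
    return rows


# Made-up sample data, not real quotes.
sample_cells = ["ABC", "Abc  Corp", "$10.00", "+5.0%",
                "XYZ", "Xyz\n Inc", "$20.00", "-3.2%",
                "DANGLING"]
rows = group_cells(sample_cells)
print(rows)
```

Slicing by a fixed row width only works while the page keeps four cells per row; if the table layout changes, the columns will silently shift.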


In [None]:
from datetime import datetime
import pandas as pd
import re
import requests
from bs4 import BeautifulSoup
from pathlib import Path

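class Guru_Scrape():
      def __init__(self, URL, main, backup):
          self.URL = URL                  # page to scrape
          self.main = main                # CSV file that accumulates the scraped rows
          self.backup = backup            # snapshot of main, taken before each change
          self.soup = None                # parsed page, set by get_soup()
          self.df_guru = pd.DataFrame()   # most recently scraped rows
          self.get_soup()
          self.backup_file()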
          
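      def get_soup(self):
          # Fetch the page; raise on HTTP errors instead of parsing an error page
          webpage_response = requests.get(self.URL, timeout=30)
          webpage_response.raise_for_status()
          self.soup = BeautifulSoup(webpage_response.content, "html.parser")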
        
      def find__gainers_losers(self):
          td_tags = self.soup.find_all('td')
        
          win_loss = []        # cells of the row currently being built
          win_loss_all = []    # completed rows, 4 cells each
          for td_tag in td_tags:
              if len(win_loss_all) == 10:   # stop after the first 10 rows
                 break
              # Collapse runs of whitespace inside the cell text
              win_loss.append(' '.join(td_tag.text.split()))
              if len(win_loss) == 4:
                 win_loss_all.append(win_loss)
                 win_loss = []
                
          return win_loss_all
      
      def create_gainers_losers_df(self, win_loss_all):
          self.df_guru = pd.DataFrame( win_loss_all, columns = ['Stock', 'Company', 'Price', 'Price_Change'])
        
          dateTimeObj = datetime.now()
          self.df_guru['date_time'] = str(dateTimeObj)
          
          return self.df_guru
      
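      def backup_file(self):
          # Snapshot the main file, if it exists, before it is modified
          my_file = Path(self.main)
          if my_file.is_file():
             # index_col=0 stops the saved index from being written back as an extra 'Unnamed: 0' column
             df_guru_bkup = pd.read_csv(self.main, index_col=0)
             df_guru_bkup.to_csv(self.backup)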
        
      def initialize_file(self):
          self.backup_file()
          df_initialize = pd.DataFrame([['','','','','']], \
                          columns = ['Stock', 'Company', 'Price', 'Price_Change', 'date_time'])
          df_initialize.to_csv(self.main)
        
        
      def append_file(self):
          self.backup_file()
          # Append without a header row; the with-block closes the file even on error
          with open(self.main, 'a', newline='') as f:
               self.df_guru.to_csv(f, header = False)
        
      def print_file(self,file):
          my_file = Path(file)
          if my_file.is_file():
             print(pd.read_csv(file))
          else:
             print(f"File {file} does NOT exist")

      
URL = 'https://www.gurufocus.com/stock-market-valuations.php'
main = "Losers_and_Gainers1.csv"
backup = "Losers_and_Gainers1_bkup.csv"
guru =  Guru_Scrape(URL, main, backup)

In [None]:
guru.initialize_file()

In [None]:
G_L_array = guru.find__gainers_losers()
print(guru.create_gainers_losers_df(G_L_array))

In [None]:
guru.print_file(backup)
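
backup_file reads the main CSV into pandas and writes it back out. If only an exact snapshot is needed, a plain file copy is one alternative. The sketch below uses the standard-library shutil module rather than pandas; the function name `backup_copy` and the temporary file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def backup_copy(main, backup):
    """Copy `main` to `backup` if `main` exists; return True when a copy was made."""
    main_path = Path(main)
    if not main_path.is_file():
        return False
    shutil.copy2(main_path, backup)  # copies the contents and the file metadata
    return True


with tempfile.TemporaryDirectory() as tmp:
    main_file = Path(tmp) / "main.csv"
    backup_file = Path(tmp) / "main_bkup.csv"
    made_before = backup_copy(main_file, backup_file)  # main does not exist yet
    main_file.write_text("Stock,Company\nABC,Abc Corp\n")
    made_after = backup_copy(main_file, backup_file)
    identical = backup_file.read_text() == main_file.read_text()

print(made_before, made_after, identical)
```

A byte-for-byte copy keeps the backup identical to the main file, whereas a round trip through a dataframe can change formatting such as index columns or empty values.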

In [None]:
guru.append_file()
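
append_file relies on initialize_file having written the header row first. As a hedged alternative, the sketch below writes the header only when the target file is missing or empty, so no separate initialization step is needed. It uses the standard-library csv module instead of pandas; `append_rows`, `COLUMNS` and the sample rows are hypothetical.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

COLUMNS = ["Stock", "Company", "Price", "Price_Change", "date_time"]


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only for a new or empty file."""
    path = Path(path)
    write_header = not path.is_file() or path.stat().st_size == 0
    stamp = datetime.now().isoformat(timespec="seconds")
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(COLUMNS)
        for row in rows:
            writer.writerow(list(row) + [stamp])  # add the date-and-time stamp


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.csv"
    # Made-up sample rows, not real quotes.
    append_rows(target, [["ABC", "Abc Corp", "$10.00", "+5.0%"]])
    append_rows(target, [["XYZ", "Xyz Inc", "$20.00", "-3.2%"]])
    with target.open(newline="") as handle:
        written = list(csv.reader(handle))

print(written)
```

Unlike the class, this sketch does not write an index column, so files produced by the two approaches should not be mixed.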

In [None]:
guru.print_file(main)