### Automating repetitive tasks

1. Using loops and list comprehensions
2. Writing custom functions
3. Utilizing Python libraries for scheduling (e.g., schedule, cron-like job scheduling)

### Web scraping and data extraction
1. Introduction to web scraping with Beautiful Soup
2. Extracting data from web pages
3. Handling pagination and dynamic content (with Selenium, if needed)

In [1]:
!pip install schedule beautifulsoup4 selenium



In [2]:
import requests # for making HTTP requests to web pages
import schedule # for scheduling tasks to run periodically
import time # for sleeping the program for a specified amount of time
from bs4 import BeautifulSoup
from datetime import datetime # for getting the current date and time
from tqdm import tqdm # for displaying a progress bar for a loop
import matplotlib.pyplot as plt
import pandas as pd

In [3]:
# Example 1: Automating repetitive tasks

#def job():
    #print("I'm working...")

#schedule.every(5).seconds.do(job)

In [4]:
#while True:
    #schedule.run_pending()
    #time.sleep(1)

In [5]:
def get_bitcoin_price():
    url = 'https://www.coingecko.com/en/coins/bitcoin'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    price_address = soup.find('span', class_='no-wrap')
    price = price_address.text.strip()
    return price


def log_bitcoin_price():
    current_price = get_bitcoin_price()
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    log_message = f"{timestamp} - Current Bitcoin price: {current_price}\n"
    # Write the log message to a file
    with open('bitcoin_price_log.txt', 'a') as log_file:
        # mode 'a' is for appending to the file
        # mode 'w' is for writing to the file
        # mode 'r' is for reading from the file
        log_file.write(log_message)

    # Print the log message to the console
    print(log_message.strip())

In [6]:
schedule.every(1).second.do(log_bitcoin_price)

Every 1 second do log_bitcoin_price() (last run: [never], next run: 2023-03-24 20:28:50)

In [7]:
while True:
    schedule.run_pending()
    time.sleep(1)

2023-03-24 20:28:51 - Current Bitcoin price: $28,026.20
2023-03-24 20:28:53 - Current Bitcoin price: $28,026.20
2023-03-24 20:28:54 - Current Bitcoin price: $28,026.20
2023-03-24 20:28:56 - Current Bitcoin price: $28,027.71
2023-03-24 20:28:58 - Current Bitcoin price: $28,027.71
2023-03-24 20:28:59 - Current Bitcoin price: $28,027.71


KeyboardInterrupt: 

In [6]:
URL = 'https://github.com/trending'

response = requests.get(URL)
soup = BeautifulSoup(response.content, 'html.parser')

# Find the top 10 trending repositories
trending_repositories = soup.find_all('article', class_='Box-row', limit=10)

for index, repo in enumerate(trending_repositories):
    name = repo.h1.text.strip().replace('\n', '').replace(' ', '')
    description = repo.find('p', class_='col-9').text.strip() if repo.find('p', class_='col-9') else 'No description provided'
    language = repo.find('span', itemprop='programmingLanguage').text.strip() if repo.find('span', itemprop='programmingLanguage') else 'Not specified'
    print(f"{index + 1}. Repository: {name}\nDescription: {description}\nLanguage: {language}\n")

1. Repository: nsarrazin/serge
Description: A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.
Language: Python

2. Repository: madawei2699/myGPTReader
Description: myGPTReader is a slack bot that can read any webpage, ebook or document and summarize it with chatGPT. It can also talk to you via voice using the content in the channel.
Language: Python

3. Repository: mrsked/mrsk
Description: Deploy web apps anywhere.
Language: Ruby

4. Repository: programthink/zhao
Description: 【编程随想】整理的《太子党关系网络》，专门揭露赵国的权贵
Language: Python

5. Repository: BlinkDL/RWKV-LM
Description: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Language: Python

6. Repository: mckaywrigley/chatbot-ui
Description: A ChatGPT clone for running loc

In [8]:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Go to the Wikipedia home page
driver.get('https://www.wikipedia.org')

# Find the search box
search_box = driver.find_element(By.ID, 'searchInput')

# Enter search query
search_box.send_keys('France')

# Submit the form (like hitting return)
search_box.submit()

# Wait for the page to load
time.sleep(5)

# Get the main content of the page
main_content = driver.find_element(By.ID, 'content').text

# Write the main content to a file
with open('wikipedia_page.txt', 'w', encoding='utf-8') as file:
    file.write(main_content)

# Close the browser
driver.quit()

In [12]:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set the path to the ChromeDriver executable
chromedriver_path = '/path/to/chromedriver'

# Create a WebDriver instance
driver = webdriver.Chrome(executable_path=chromedriver_path)

# Navigate to the GitHub user's profile page
github_user = 'qlinhta'
driver.get(f'https://github.com/{github_user}?tab=repositories')

# Wait for the repositories to load
wait = WebDriverWait(driver, 10)
repositories = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//div[@class="wb-break-all"]//a')))

# Scrape the repository names
repo_names = [repo.text for repo in repositories]

# Close the WebDriver instance
driver.quit()

# Print the repository names
print(f"Repositories of {github_user}:")
for name in repo_names:
    print(name)

  driver = webdriver.Chrome(executable_path=chromedriver_path)


TimeoutException: Message: 
