# Twitter Data Scraper for Streamlit (not working in the application)

# Tweet Text Fetching and Processing Script

This script uses Selenium to fetch and process the text of a tweet from a given URL. The steps below describe what the script does and the reasoning behind each step.

## Steps:

### 1. Import Libraries

- Import necessary libraries such as `selenium.webdriver` for browser automation and `selenium.webdriver.support.ui` for implementing explicit waits.

### 2. Setup Chrome WebDriver

**Why:**
- To interact with the web page and extract the tweet text.

**What:**
- Configure Selenium to use the Chrome browser in headless mode, which allows running the browser in the background without a GUI.
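
The setup described above can be sketched as a small factory function. The names `headless_chrome_args` and `build_headless_driver` are hypothetical and not part of the original script. Passing extra flags such as `--no-sandbox` and `--disable-dev-shm-usage` is an assumption: they are commonly added when Chrome runs inside a container, but nothing here establishes that they resolve the Streamlit problem. Selenium is imported inside the function so the module can be loaded even where Selenium is not installed.

```python
def headless_chrome_args(extra_args=()):
    """Return the Chrome flags to use, always starting with --headless."""
    base = ["--headless"]
    # Append caller-supplied flags, skipping any that are already present.
    return base + [arg for arg in extra_args if arg not in base]


def build_headless_driver(extra_args=()):
    """Create a headless Chrome WebDriver (hypothetical helper)."""
    # Lazy import: only needed when a driver is actually created.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args(extra_args):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


# Example call (requires Chrome and Selenium, so it is left commented out):
# driver = build_headless_driver(["--no-sandbox", "--disable-dev-shm-usage"])
```

Keeping the flag list in its own function makes it possible to check the configuration without starting a browser.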

### 3. Open the Tweet URL

**Why:**
- To load the web page containing the tweet text that needs to be fetched.

**What:**
- Use the `get` method of the Selenium driver to navigate to the tweet URL.

### 4. Wait for the Tweet Text Element

**Why:**
- To ensure that the tweet text element is present in the page before attempting to read it, which avoids errors caused by elements that have not loaded yet.

**What:**
- Implement an explicit wait to wait for the presence of the tweet text element using its CSS selector.
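
To make the idea of an explicit wait concrete, here is a simplified, Selenium-free sketch of the polling loop that such a wait performs: call a condition repeatedly until it returns a truthy value or a timeout expires. The names `wait_until` and `WaitTimeout` are hypothetical. The script itself uses Selenium's `WebDriverWait` together with `expected_conditions`, whose implementation differs in detail.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page whose tweet element appears on the third check.
attempts = {"count": 0}


def element_present():
    attempts["count"] += 1
    return "tweet element" if attempts["count"] >= 3 else None


found = wait_until(element_present, timeout=2.0, poll_interval=0.01)
print(found, "after", attempts["count"], "checks")
```

Unlike a fixed `time.sleep`, this kind of wait returns as soon as the condition holds and only blocks for the full timeout when the element never appears.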

### 5. Fetch the Tweet Text

**Why:**
- To extract the actual text content of the tweet from the web page.

**What:**
- Locate the tweet text element and extract its text content.

### 6. Handle Exceptions

**Why:**
- To manage any potential errors that might occur during the fetching process, ensuring the script can handle failures gracefully.

**What:**
- Use a try-except block to catch exceptions and print an error message if the tweet text cannot be fetched.

### 7. Quit the WebDriver

**Why:**
- To close the browser session and free up resources.

**What:**
- Ensure that the WebDriver is quit in the `finally` block to close the browser even if an error occurs.
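
Steps 6 and 7 together form a try/except/finally pattern. The sketch below illustrates it with a stand-in driver so that it runs without a browser. `FakeDriver` and `fetch_text_safely` are hypothetical names used only for illustration, not part of Selenium or of the original script.

```python
class FakeDriver:
    """Stand-in for a Selenium driver (hypothetical, for illustration only)."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def read_text(self):
        if self.fail:
            raise RuntimeError("element not found")
        return "tweet text"

    def quit(self):
        self.closed = True


def fetch_text_safely(driver):
    """Return the text, or an empty string on failure; always quit the driver."""
    try:
        return driver.read_text()
    except Exception as exc:
        print(f"Error fetching tweet text: {exc}")
        return ""
    finally:
        # Runs on success and on failure, so the browser session is never leaked.
        driver.quit()


ok_driver, bad_driver = FakeDriver(), FakeDriver(fail=True)
print(repr(fetch_text_safely(ok_driver)), ok_driver.closed)
print(repr(fetch_text_safely(bad_driver)), bad_driver.closed)
```

In both cases the driver ends up closed, which is the property the `finally` block is there to guarantee.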

### 8. Process the Tweet

**Why:**
- To provide a function that can be called to fetch and print the tweet text from a given URL.

**What:**
- Call the `fetch_tweet_text` function and print the original tweet text.

### 9. Main Execution

**Why:**
- To execute the script and process a specific tweet URL when the script is run directly.

**What:**
- Define the tweet URL and call the `process_tweet` function to fetch and print the tweet text.


Once all of this is executed, the extracted text could be used for sentiment analysis. As noted in the title, this unfortunately does not work inside the application due to complications between Selenium and Streamlit. However, the code here provides a basis that could be used in the future (e.g. by next year's students) to extend the idea, or at least to avoid writing it again. :)


In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# CSS selector for the tweet text element. These generated class names
# reflect the page as it was rendered when the script was written and may change.
TWEET_TEXT_SELECTOR = 'div.css-146c3p1.r-bcqeeo.r-1ttztb7.r-qvutc0.r-37j5jr.r-1inkyih.r-16dba41.r-bnwqim.r-135wba7'


def fetch_tweet_text(url):
    # Set up the Chrome WebDriver in headless mode (no visible browser window)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)

    tweet_text = ""
    try:
        # Open the tweet URL
        print("Opening tweet URL")
        driver.get(url)
        print("URL opened")

        # Wait up to 20 seconds for the tweet text element to be present.
        # The explicit wait returns as soon as the element appears,
        # so no fixed sleep is needed.
        tweet_text_element = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, TWEET_TEXT_SELECTOR))
        )
        print("Page loaded")
        tweet_text = tweet_text_element.text
    except Exception as e:
        print(f"Error fetching tweet text: {e}")
    finally:
        # Always close the browser, even if an error occurred
        driver.quit()

    return tweet_text

def process_tweet(url):
    tweet_text = fetch_tweet_text(url)
    print("Original Tweet Text:", tweet_text)
    return tweet_text

if __name__ == '__main__':
    tweet_url = 'https://x.com/BTCTN/status/1806553790734282834'
    process_tweet(tweet_url)


Opening tweet URL
URL opened
Page loaded
Original Tweet Text: we're live with this week's TOKEN NARRATIVES, now on X spaces, youtube, and facebook! listen to the 
@VerseEcosystem
 / 
@BitcoinCom
 team chat this week with our special guests from 
@zano_project
 !  $BTC $BCH $ETH
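
The output above contains a line break around each @mention, presumably because the mentions are rendered as separate elements on the page. Before passing the text to sentiment analysis, it may help to collapse that whitespace. A minimal sketch, assuming a hypothetical helper `normalize_tweet_text` that is not part of the original script:

```python
import re


def normalize_tweet_text(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


# Fragment shaped like the end of the output shown above.
raw = "guests from \n@zano_project\n !  $BTC $BCH $ETH"
print(normalize_tweet_text(raw))
# prints: guests from @zano_project ! $BTC $BCH $ETH
```

This only normalizes whitespace. Whether mentions and cashtags such as `$BTC` should be kept or removed depends on the sentiment model used.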


# Conclusion
In this script, we successfully set up Selenium to fetch the text content of a tweet from a given URL. Selenium handles opening the tweet URL, waiting for the text element to load, and extracting the tweet text, while the try/except/finally structure handles potential errors gracefully. The script runs in headless mode, making it suitable for automated environments where no GUI is available.