<a href="https://colab.research.google.com/github/jokefun022/Google-Colab/blob/main/Scraped_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import csv

with open("tweets.csv", "w", encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Tweet ID", "Text"])
    # `tweets` is a list of rows, one per tweet. The scraping loop in this
    # notebook appends [tweet.date, tweet.content], so a row has no `.id` or
    # `.text` attribute and does not contain the tweet ID at all.
    #
    # To match the "Tweet ID", "Text" header, change the scraping loop to
    # store the ID and text instead:
    #
    #     for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    #         if i >= 100:
    #             break
    #         tweets.append([tweet.id, tweet.rawContent])  # or tweet.content, depending on the snscrape version
    #         time.sleep(2)
    #
    # Each row is then [tweet.id, tweet.rawContent] and can be written by index.
    for tweet_data in tweets:
        writer.writerow([tweet_data[0], tweet_data[1]])

# Alternative: if the scraping code cannot easily be changed and `tweets`
# really is a list of objects with `id` and `text` attributes, the fix is
# simply to drop the `.data` access:
# for tweet in tweets:
#     writer.writerow([tweet.id, tweet.text])

# Given the traceback and the global variable state, the most likely cause is
# that `tweets` was populated with a different structure than this loop expects.
# The index-based fix above assumes the scraping loop stores [tweet.id, tweet.rawContent].
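
The two row shapes discussed in the comments above (a `[id, text]` list versus an object with attributes) can be handled by one small normalizing step. The following is a minimal sketch, not the notebook's own method: `tweet_to_row` and `write_tweets_csv` are hypothetical helper names, and the attribute names `rawContent` and `content` are assumed from the comments in the cell above.

```python
import csv


def tweet_to_row(tweet):
    """Return [id, text] from a 2-item sequence or from an attribute-style object."""
    if isinstance(tweet, (list, tuple)):
        # Row already stored as [id, text] by the scraping loop.
        return [tweet[0], tweet[1]]
    # Assumption: objects expose `id` plus `rawContent` (newer) or `content` (older).
    text = getattr(tweet, "rawContent", None)
    if text is None:
        text = getattr(tweet, "content", None)
    return [tweet.id, text]


def write_tweets_csv(tweets, path):
    """Write a header row followed by one [id, text] row per tweet."""
    with open(path, "w", encoding="utf-8", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Tweet ID", "Text"])
        for tweet in tweets:
            writer.writerow(tweet_to_row(tweet))
```

With this in place, the cell body would reduce to `write_tweets_csv(tweets, "tweets.csv")`. Rows stored as `[date, content]` would still be written, but the first column would then hold a date rather than an ID.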


In [None]:
!pip install snscrape



In [None]:
!snscrape --jsonl --max-results 1000 --progress twitter-search "tum ❤️ since:2024-01-01 until:2025-01-01" > tweets.json

2025-05-30 11:13:45.132  ERROR  snscrape.base  Error retrieving https://twitter.com/search?f=live&lang=en&q=tum+%E2%9D%A4%EF%B8%8F+since%3A2024-01-01+until%3A2025-01-01&src=spelling_expansion_revert_click: SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=tum+%E2%9D%A4%EF%B8%8F+since%3A2024-01-01+until%3A2025-01-01&src=spelling_expansion_revert_click (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1016)')))"))
2025-05-30 11:13:45.133  CRITICAL  snscrape.base  4 requests to https://twitter.com/search?f=live&lang=en&q=tum+%E2%9D%A4%EF%B8%8F+since%3A2024-01-01+until%3A2025-01-01&src=spelling_expansion_revert_click failed, giving up.
2025-05-30 11:13:45.133  CRITICAL  snscrape.base  Errors: SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f
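
The log above reports `CERTIFICATE_VERIFY_FAILED`, which points at certificate verification rather than at the search query. One way to check whether the same failure occurs outside snscrape is to attempt a verified TLS handshake with the standard library. This is only a diagnostic sketch: `check_tls` is a hypothetical helper name, and it uses Python's default certificate store, which may differ from the one snscrape's HTTP client uses.

```python
import socket
import ssl


def check_tls(host, port=443, timeout=5.0):
    """Attempt a verified TLS handshake and return (ok, detail)."""
    context = ssl.create_default_context()  # loads the default CA certificates
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # Same class of failure as the one in the snscrape log.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS errors, refused connections, timeouts, other TLS errors.
        return False, f"connection failed: {exc}"


# Example (requires network access):
# print(ssl.get_default_verify_paths())
# print(check_tls("twitter.com"))
```

If the handshake succeeds here but snscrape still fails, the problem is more likely in the library's own request handling than in the runtime's certificates.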

In [None]:
!pip install snscrape

import snscrape.modules.twitter as sntwitter

query = "elonmusk since:2024-01-01 until:2024-12-31"
tweets = []
for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    tweets.append([tweet.date, tweet.content])

print(tweets[:5])
!snscrape --max-results 100 twitter-search "from:elonmusk since:2024-01-01 until:2024-12-31"


In [None]:
!pip install --upgrade snscrape

import time

import snscrape.modules.twitter as sntwitter

query = "elonmusk since:2024-01-01 until:2024-12-31"
tweets = []
# Cap the number of tweets to reduce the chance of failure or of being blocked.
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 100:  # stop after 100 tweets; adjust as needed
        break
    tweets.append([tweet.date, tweet.content])
    time.sleep(2)  # pause 2 seconds between items to slow down requests; adjust as needed

print(tweets[:5])





ERROR:snscrape.base:Error retrieving https://twitter.com/search?f=live&lang=en&q=elonmusk+since%3A2024-01-01+until%3A2024-12-31&src=spelling_expansion_revert_click: SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=elonmusk+since%3A2024-01-01+until%3A2024-12-31&src=spelling_expansion_revert_click (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1016)')))"))
CRITICAL:snscrape.base:4 requests to https://twitter.com/search?f=live&lang=en&q=elonmusk+since%3A2024-01-01+until%3A2024-12-31&src=spelling_expansion_revert_click failed, giving up.
CRITICAL:snscrape.base:Errors: SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=elonmusk+since%3A2024-01-01+until%3A2024-12-31&src=spelling_expansion_revert_click (Caused by SSLError(SSLC

ScraperException: 4 requests to https://twitter.com/search?f=live&lang=en&q=elonmusk+since%3A2024-01-01+until%3A2024-12-31&src=spelling_expansion_revert_click failed, giving up.
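
Because the scraper can raise partway through iteration, as in the output above, the capped and delayed loop can be wrapped so that anything collected before the failure is kept. A minimal sketch using only the standard library follows; `collect_capped` is a hypothetical helper name, and it catches any `Exception` rather than a specific snscrape class.

```python
import itertools
import time


def collect_capped(items, limit=100, delay=0.0):
    """Collect up to `limit` items, sleeping `delay` seconds after each one.

    Returns (collected, error). `error` is None on success, or the exception
    raised by the iterator; items gathered before the failure are preserved.
    """
    collected = []
    error = None
    try:
        for item in itertools.islice(items, limit):
            collected.append(item)
            if delay:
                time.sleep(delay)
    except Exception as exc:  # for example, the ScraperException shown above
        error = exc
    return collected, error


# Example with the scraper from the cell above (requires snscrape and network access):
# results, err = collect_capped(
#     sntwitter.TwitterSearchScraper(query).get_items(), limit=100, delay=2
# )
# tweets = [[t.date, t.content] for t in results]
```

Using `itertools.islice` avoids the manual counter and `break`, and returning the error instead of re-raising it lets the notebook continue with partial results.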