### **X API**
The X API lets you programmatically retrieve and analyze X data and build applications that take part in the conversation on X.

Over the years, the X API has added further access levels so that developers can scale their usage to enhance and research the public conversation.

Read the documentation carefully: https://docs.x.com/x-api/introduction

To fully understand what the API does, you have to answer the following questions:
1. What does this API do, and what problems does it solve?
2. What authentication methods does the API support (e.g., API keys, OAuth 2.0)?
3. What are the parameters we need to pass?
4. How do you obtain the necessary credentials?
5. What is the pricing model of the API?
6. What are the access levels and limitations?
7. How many requests can you make in a given time period?
8. What are the data formats and response structures?
9. What kind of projects can you build with this API?
10. Does the API offer official or third-party libraries for your programming language?
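
Questions 2 and 4 concern authentication. As a minimal sketch, assuming the Bearer Token is kept in an environment variable instead of being pasted into the notebook, the authorization header could be built as follows. The variable name `X_BEARER_TOKEN` and the helper `build_auth_headers` are illustrative choices for this sketch, not part of the X API.

```python
import os


def build_auth_headers(token=None):
    """Return request headers for Bearer Token authentication.

    X_BEARER_TOKEN is an assumed variable name chosen for this sketch.
    """
    token = token or os.environ.get("X_BEARER_TOKEN")
    if not token:
        raise ValueError("Set X_BEARER_TOKEN or pass a token explicitly")
    return {"Authorization": f"Bearer {token}"}


# Passing a token explicitly, here a dummy value
print(build_auth_headers("EXAMPLE_TOKEN"))
```

Keeping the token out of the notebook means the file can be shared or committed without exposing the credential.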

In this guide we will learn how to retrieve recent posts.
If you are using Postman, follow https://docs.x.com/x-api/posts/search/quickstart/recent-search

If you are using Python, follow https://docs.x.com/x-api/posts/recent-search

In [None]:
""" 
Objective: Get recent posts from X
"""

import requests
import time

# This endpoint returns posts from the last 7 days that match a search query
url = "https://api.x.com/2/tweets/search/recent"

# Authorization by Bearer Token
# Replace YOUR_BEARER_TOKEN with the token from your developer dashboard; never publish a real token
headers = {"Authorization": "Bearer YOUR_BEARER_TOKEN"}

# Query parameters
querystring = {"query":"donald trump",
               "max_results":"10"}


# TODO: Go to your dashboard and get your Bearer Token
# TODO: Replace YOUR_QUERY with your search query
# TODO: Get the response and check the status code
# TODO: Analyze the response, what data is returned?

wait_time = 1  # Initial wait time in seconds
max_retries = 5
retry_count = 0

while retry_count < max_retries:
    response = requests.get(url, headers=headers, params=querystring)
    
    if response.status_code == 200:
        data = response.json()
        print("\nResponse Data Analysis:")
        print(f"Number of tweets: {len(data.get('data', []))}")
        print("\nSample tweet data:")
        for tweet in data.get('data', [])[:2]:
            print(f"\nTweet ID: {tweet['id']}")
            print(f"Text: {tweet['text']}")
            print("---")
        break
    elif response.status_code == 429:
        print(f"Rate limited. Waiting {wait_time} seconds...")
        time.sleep(wait_time)
        wait_time *= 2  
        retry_count += 1
    else:
        print(f"Error: {response.text}")
        break

if retry_count == max_retries:
    print("Max retries reached. Please try again later.")





Rate limited. Waiting 1 seconds...
Rate limited. Waiting 2 seconds...
Rate limited. Waiting 4 seconds...
Rate limited. Waiting 8 seconds...
Rate limited. Waiting 16 seconds...
Max retries reached. Please try again later.
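
The retry loop above doubles the wait after every 429 response, a pattern usually called exponential backoff. Since later cells repeat the same loop, it could be factored into a helper. The sketch below is one possible way to do that; the name `get_with_backoff` and its parameters are hypothetical, and the request itself is passed in as a callable so the helper works with any endpoint.

```python
import time


def get_with_backoff(send, max_retries=5, initial_wait=1, sleep=time.sleep):
    """Call send() until it returns a non-429 response or retries run out.

    send is any zero-argument callable returning an object with a
    status_code attribute, for example
    lambda: requests.get(url, headers=headers, params=querystring).
    Returns the last response, or None if every attempt was rate limited.
    """
    wait_time = initial_wait
    for _ in range(max_retries):
        response = send()
        if response.status_code != 429:
            return response
        print(f"Rate limited. Waiting {wait_time} seconds...")
        sleep(wait_time)
        wait_time *= 2  # exponential backoff: 1, 2, 4, 8, ...
    return None
```

With the default settings the helper waits at most 1 + 2 + 4 + 8 + 16 = 31 seconds in total, which matches the output above and is far shorter than a rate-limit window measured in minutes.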


In [None]:
""" 
Objective: Understanding response data
"""
# TODO: From the previous request, print the response text
# TODO: Parse the data into a more readable format and print the data
# TODO: Clean the data and print it

import json
from datetime import datetime

# Print raw response text
print("Raw Response:")
print(json.dumps(response.json(), indent=2))

# Parse and format the data
if response.status_code == 200:
    data = response.json()
    
    # Clean and format the data. created_at, author_id and public_metrics are
    # only returned when requested via "tweet.fields", so read them defensively.
    cleaned_tweets = []
    for tweet in data.get('data', []):
        created_at = tweet.get('created_at')
        if created_at:
            created_at = datetime.strptime(
                created_at, '%Y-%m-%dT%H:%M:%S.%fZ').strftime('%Y-%m-%d %H:%M:%S')
        metrics = tweet.get('public_metrics', {})
        cleaned_tweet = {
            'id': tweet['id'],
            'text': tweet['text'].replace('\n', ' ').strip(),
            'created_at': created_at,
            'author_id': tweet.get('author_id'),
            'metrics': {
                'retweets': metrics.get('retweet_count'),
                'likes': metrics.get('like_count'),
                'replies': metrics.get('reply_count')
            }
        }
        cleaned_tweets.append(cleaned_tweet)
    
    # Print formatted data
    print("\nCleaned and Formatted Data:")
    for tweet in cleaned_tweets:
        print("\nTweet Details:")
        print(f"ID: {tweet['id']}")
        print(f"Created: {tweet['created_at']}")
        print(f"Text: {tweet['text']}")
        print(f"Metrics: {tweet['metrics']}")
        print("-" * 50)
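
To make the response structure concrete without calling the API, the sketch below flattens a hand-written payload shaped like a recent search response: a top-level `data` list of posts plus a `meta` object. All values in `sample` are made up, and `flatten_post` is a hypothetical helper. The second post carries only `id` and `text`, which is what a request without `tweet.fields` would be expected to return.

```python
# Hypothetical payload shaped like a recent search response; values are made up.
sample = {
    "data": [
        {"id": "1", "text": "Hello\nworld ", "created_at": "2024-01-01T12:00:00.000Z",
         "public_metrics": {"retweet_count": 2, "like_count": 5, "reply_count": 0}},
        {"id": "2", "text": "Only default fields"},
    ],
    "meta": {"result_count": 2},
}


def flatten_post(post):
    """Return a flat dict, using None for fields that were not requested."""
    metrics = post.get("public_metrics", {})
    return {
        "id": post["id"],
        "text": " ".join(post["text"].split()),  # collapse newlines and extra spaces
        "created_at": post.get("created_at"),
        "likes": metrics.get("like_count"),
        "retweets": metrics.get("retweet_count"),
    }


rows = [flatten_post(p) for p in sample.get("data", [])]
for row in rows:
    print(row)
```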

In [5]:
""" 
Objective: Build a custom query for more specific data
"""
# TODO: Build a custom query and send a new request by using https://developer.x.com/apitools/
# TODO: Experiment with different query parameters and see what data is returned

# Custom query with specific search criteria
import requests
import time
url = "https://api.x.com/2/tweets/search/recent"
headers = {"Authorization": "Bearer YOUR_BEARER_TOKEN"}  # replace with your own token

# Search parameters
# min_retweets: and min_faves: are web-search operators and are not in the v2
# operator list, so popularity has to be judged client-side from public_metrics.
querystring = {
    "query": "#python -is:reply lang:en",
    "max_results": "10",
    "tweet.fields": "created_at,public_metrics,referenced_tweets",
    "expansions": "author_id",
    "user.fields": "username,verified,description"
}

# Send request with retry logic
wait_time = 1
max_retries = 5
retry_count = 0

while retry_count < max_retries:
    response = requests.get(url, headers=headers, params=querystring)
    
    if response.status_code == 200:
        data = response.json()
        users = {u['id']: u for u in data.get('includes', {}).get('users', [])}
        
        # Sort by like count so the most popular posts are listed first
        tweets = sorted(data.get('data', []),
                        key=lambda t: t['public_metrics']['like_count'],
                        reverse=True)
        print(f"\nFound {len(tweets)} Python tweets (most liked first):")
        for tweet in tweets:
            author = users.get(tweet['author_id'], {})
            metrics = tweet['public_metrics']
            
            print(f"\n@{author.get('username', 'unknown')}:")
            print(f"Tweet: {tweet['text']}")
            print(f"Retweets: {metrics['retweet_count']}")
            print(f"Likes: {metrics['like_count']}")
            print(f"Created: {tweet['created_at']}")
            print("-" * 50)
        break
        
    elif response.status_code == 429:
        print(f"Rate limited. Waiting {wait_time} seconds...")
        time.sleep(wait_time)
        wait_time *= 2
        retry_count += 1
    else:
        print(f"Error: {response.text}")
        break

if retry_count == max_retries:
    print("Max retries reached. Please try again later.")

Rate limited. Waiting 1 seconds...
Rate limited. Waiting 2 seconds...
Rate limited. Waiting 4 seconds...
Rate limited. Waiting 8 seconds...
Rate limited. Waiting 16 seconds...
Max retries reached. Please try again later.
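
Query strings such as the one above are easy to mistype when experimenting. As a small sketch, a helper could assemble them from parts. `build_query` and its parameters are hypothetical; it only uses operators that already appear in this notebook (`#hashtag`, `-is:reply`, `-is:retweet`, `lang:`).

```python
def build_query(keywords=(), hashtags=(), lang=None,
                exclude_replies=False, exclude_retweets=False):
    """Join search terms and operators into one query string."""
    parts = list(keywords)
    parts += [f"#{tag.lstrip('#')}" for tag in hashtags]
    if exclude_replies:
        parts.append("-is:reply")
    if exclude_retweets:
        parts.append("-is:retweet")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


query = build_query(hashtags=["python"], lang="en", exclude_replies=True)
print(query)  # → #python -is:reply lang:en
```

The result can be passed as the `query` value in `querystring`.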


In [None]:
""" 
Objective: Understanding rate limit on API requests
"""
# TODO: Send another request and check the status code
# TODO: Did you get blocked? If so, why? Explain.

""" 
Expected Output:
{'title': 'Too Many Requests',
 'detail': 'Too Many Requests',
 'type': 'about:blank',
 'status': 429}
"""

import requests
url = "https://api.x.com/2/tweets/search/recent"
headers = {"Authorization": "Bearer YOUR_BEARER_TOKEN"}  # replace with your own token
querystring = {"query": "python", "max_results": "10"}

response1 = requests.get(url, headers=headers, params=querystring)
print("First request status code:", response1.status_code)

response2 = requests.get(url, headers=headers, params=querystring)
print("\nSecond request status code:", response2.status_code)
print("\nResponse content:")
print(response2.json())

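# The second request was blocked with a 429 status code ("Too Many Requests") because
# two requests were sent in rapid succession, which exceeded the number of requests
# this access level allows within one rate-limit window.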

First request status code: 200

Second request status code: 429

Response content:
{'title': 'Too Many Requests', 'detail': 'Too Many Requests', 'type': 'about:blank', 'status': 429}


In [None]:
""" 
Objective: Working around an unknown rate limit.
"""

# TODO: Send another request to https://api.x.com/2/tweets/search/recent and make sure it fails
# TODO: Once the request has failed, retry in a loop until the status code is 200, waiting between requests
# TODO: For every failed request, double the waiting time
# TODO: Once the status code is 200, print the response and the waiting time
# TODO: What is the rate limit?

import json
import time

import requests

url = "https://api.x.com/2/tweets/search/recent"

headers = {"Authorization": "Bearer YOUR_BEARER_TOKEN"}  # replace with your own token
querystring = {"query": "python", "max_results": "10"}

wait_time = 1  
success = False
total_wait_time = 0

# Loop until successful response
while not success:
    # Send request
    response = requests.get(url, headers=headers, params=querystring)
    
    if response.status_code == 200:
        # Success - print response and total wait time
        print(f"\nSuccess after waiting {total_wait_time} seconds!")
        print("\nResponse data:")
        print(json.dumps(response.json(), indent=2))
        success = True
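    elif response.status_code == 429:
        # Rate limited - wait and double the wait time
        print(f"Request failed. Status code: {response.status_code}")
        print(f"Waiting {wait_time} seconds before retry...")
        time.sleep(wait_time)
        total_wait_time += wait_time
        wait_time *= 2  # Double the wait time
    else:
        # Other errors (e.g. 401 or 403) will not resolve by waiting, so stop
        print(f"Error {response.status_code}: {response.text}")
        break

if success:
    # The limit is defined per time window, so this is only a rough estimate
    print(f"\nThe request succeeded after about {total_wait_time} seconds of waiting")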

Request failed. Status code: 429
Waiting 1 seconds before retry...
Request failed. Status code: 429
Waiting 2 seconds before retry...
Request failed. Status code: 429
Waiting 4 seconds before retry...
Request failed. Status code: 429
Waiting 8 seconds before retry...
Request failed. Status code: 429
Waiting 16 seconds before retry...
Request failed. Status code: 429
Waiting 32 seconds before retry...
Request failed. Status code: 429
Waiting 64 seconds before retry...
Request failed. Status code: 429
Waiting 128 seconds before retry...
Request failed. Status code: 429
Waiting 256 seconds before retry...
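
Doubling the wait time finds the limit only by trial and error. The X API rate-limit documentation describes response headers named `x-rate-limit-limit`, `x-rate-limit-remaining` and `x-rate-limit-reset`, the last being a Unix timestamp for the end of the current window. Assuming those headers are present on the response, a sketch like the following could compute the wait directly. `seconds_until_reset` is a hypothetical helper and the header values in `example` are made up.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds remain until the rate-limit window resets.

    headers is a mapping such as response.headers. Returns None when the
    x-rate-limit-reset header (a Unix timestamp in seconds) is absent.
    """
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        return None
    now = time.time() if now is None else now
    return max(0, int(reset) - int(now))


# Made-up header values for illustration
example = {"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1700000900"}
print(seconds_until_reset(example, now=1700000000))  # → 900
```

In the loop above, `time.sleep(seconds_until_reset(response.headers) or wait_time)` would then wait for the window to reset when the header is available and fall back to the doubling schedule otherwise.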


In [1]:
""" 
Objective: Using another endpoint
"""
# TODO: Try to use Users endpoint, for example get the list of Users that are being followed by the provided User ID

# Get list of followed users for a specific user ID
import requests
import time
import json

# Users endpoint for following list
user_id = "2244994945"  # Example user ID
url = f"https://api.x.com/2/users/{user_id}/following"

# Authorization header
headers = {"Authorization": "Bearer YOUR_BEARER_TOKEN"}  # replace with your own token

# Query parameters
params = {
    "max_results": 10,
    "user.fields": "username,name,description,public_metrics"
}

# Send request with retry logic
wait_time = 1
max_retries = 5
retry_count = 0

while retry_count < max_retries:
    response = requests.get(url, headers=headers, params=params)
    
    if response.status_code == 200:
        data = response.json()
        print(f"\nFound {len(data.get('data', []))} followed users:")
        
        for user in data.get('data', []):
            print(f"\nUsername: @{user['username']}")
            print(f"Name: {user['name']}")
            print(f"Description: {user.get('description', 'No description')}")
            print(f"Metrics: {user.get('public_metrics', {})}")
            print("-" * 50)
        break
        
    elif response.status_code == 429:
        print(f"Rate limited. Waiting {wait_time} seconds...")
        time.sleep(wait_time)
        wait_time *= 2
        retry_count += 1
    else:
        print(f"Error: {response.text}")
        break

if retry_count == max_retries:
    print("Max retries reached. Please try again later.")

Error: {"client_id":"30341910","detail":"When authenticating requests to the Twitter API v2 endpoints, you must use keys and tokens from a Twitter developer App that is attached to a Project. You can create a project via the developer portal.","registration_url":"https://developer.twitter.com/en/docs/projects/overview","title":"Client Forbidden","required_enrollment":"Appropriate Level of API Access","reason":"client-not-enrolled","type":"https://api.twitter.com/2/problems/client-forbidden"}
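
This error is not a rate limit. The body reports `"title":"Client Forbidden"` with `"reason":"client-not-enrolled"`, meaning the App behind the token is not attached to a Project with the access level this endpoint requires, so retrying will not help. Error bodies like this one and the 429 body seen earlier share the fields `title` and `detail`. As a sketch, a hypothetical helper `describe_error` could condense them into one line; the `sample_body` below abbreviates the real message, and the 403 status is an assumption because the output above does not print the status code.

```python
import json


def describe_error(status_code, body_text):
    """Summarise an error response body in one line.

    Falls back to the raw text when the body is not a JSON object.
    """
    try:
        body = json.loads(body_text)
    except ValueError:
        return f"{status_code}: {body_text}"
    if not isinstance(body, dict):
        return f"{status_code}: {body_text}"
    title = body.get("title", "Unknown error")
    reason = body.get("reason")
    detail = body.get("detail", "")
    label = f"{title} ({reason})" if reason else title
    return f"{status_code}: {label} - {detail}"


# Abbreviated, hand-written body; 403 is an assumed status code
sample_body = ('{"title": "Client Forbidden", "reason": "client-not-enrolled", '
               '"detail": "App must be attached to a Project."}')
print(describe_error(403, sample_body))
```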


### **Reflection**
Why should there be a rate limit?

Rate limits are essential in APIs for several important reasons:

1. Server Resource Management
   
   - Prevents server overload
   - Ensures fair distribution of resources
   - Maintains system stability and performance
2. Cost Control
   
   - Helps manage infrastructure costs
   - Enables tiered pricing models
   - Controls bandwidth usage
3. Security
   
   - Prevents abuse and DDoS attacks
   - Limits scraping and data harvesting
   - Protects against brute force attempts
4. Service Quality
   
   - Ensures consistent service for all users
   - Maintains response time reliability
   - Prevents degradation of service quality
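
To illustrate how a server might enforce such a limit, here is a minimal fixed-window counter. It is a teaching sketch only and makes no claim about how X implements its limits. The class name `FixedWindowLimiter` and the values `limit=1` and `window=900` (15 minutes) are assumptions chosen for the example.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counters = {}  # client -> (window_start, request_count)

    def allow(self, client, now):
        start, count = self.counters.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new window begins
        if count >= self.limit:
            self.counters[client] = (start, count)
            return False  # a real server would answer 429 here
        self.counters[client] = (start, count + 1)
        return True


limiter = FixedWindowLimiter(limit=1, window=900)
print(limiter.allow("app-1", now=0))    # → True
print(limiter.allow("app-1", now=5))    # → False
print(limiter.allow("app-1", now=900))  # → True
```

Because the counter resets only when the window ends, short retries inside the same window are all rejected, which is consistent with the repeated 429 responses seen in the cells above.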

### **Exploration**
- Extend the current X post scraping with sentiment analysis
- Explore other tools and libraries that simplify the scraping process: https://docs.x.com/x-api/tools-and-libraries/overview#python
- Explore another publicly available API (Facebook, LinkedIn, Zillow, etc.)

1. Here's an example of how to add sentiment analysis to the X post scraping using NLTK's VADER sentiment analyzer:

In [None]:
import requests
import json
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk
import pandas as pd

# Download required NLTK data
nltk.download('vader_lexicon')

# Initialize VADER sentiment analyzer
sia = SentimentIntensityAnalyzer()

# X API setup
url = "https://api.x.com/2/tweets/search/recent"
headers = {"Authorization": "Bearer YOUR_BEARER_TOKEN"}  # replace with your own token

# Search parameters
querystring = {
    "query": "python programming",
    "max_results": "100",
    "tweet.fields": "created_at,public_metrics"
}

# Get tweets and analyze sentiment
response = requests.get(url, headers=headers, params=querystring)

if response.status_code == 200:
    tweets_data = []
    data = response.json()
    
    for tweet in data.get('data', []):
        # Get sentiment scores
        sentiment = sia.polarity_scores(tweet['text'])
        
        # Create tweet dictionary with sentiment
        tweet_dict = {
            'created_at': tweet['created_at'],
            'text': tweet['text'],
            'retweets': tweet['public_metrics']['retweet_count'],
            'likes': tweet['public_metrics']['like_count'],
            'compound_sentiment': sentiment['compound'],
            'positive_score': sentiment['pos'],
            'negative_score': sentiment['neg'],
            'neutral_score': sentiment['neu']
        }
        tweets_data.append(tweet_dict)
    
    # Create DataFrame
    df = pd.DataFrame(tweets_data)
    
    if df.empty:
        # Without this guard the percentages below would divide by zero
        print("No tweets returned for this query.")
    else:
        # Basic sentiment analysis
        scores = df['compound_sentiment']
        total = len(df)
        print("\nSentiment Analysis Results:")
        print(f"Average sentiment score: {scores.mean():.3f}")
        print(f"Positive tweets: {(scores > 0).sum()} ({(scores > 0).sum()/total*100:.1f}%)")
        print(f"Negative tweets: {(scores < 0).sum()} ({(scores < 0).sum()/total*100:.1f}%)")
        print(f"Neutral tweets: {(scores == 0).sum()} ({(scores == 0).sum()/total*100:.1f}%)")
        
        # Display most positive and negative tweets
        print("\nMost Positive Tweet:")
        most_positive = df.loc[scores.idxmax()]
        print(f"Text: {most_positive['text']}")
        print(f"Sentiment Score: {most_positive['compound_sentiment']:.3f}")
        
        print("\nMost Negative Tweet:")
        most_negative = df.loc[scores.idxmin()]
        print(f"Text: {most_negative['text']}")
        print(f"Sentiment Score: {most_negative['compound_sentiment']:.3f}")
        
        # Create visualization
        import matplotlib.pyplot as plt
        
        plt.figure(figsize=(10, 6))
        plt.hist(scores, bins=20)
        plt.title('Distribution of Sentiment Scores')
        plt.xlabel('Sentiment Score')
        plt.ylabel('Number of Tweets')
        plt.show()
    
else:
    print(f"Error: {response.text}")
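
The cell above treats any compound score above 0 as positive and any score below 0 as negative. The VADER authors instead suggest a small neutral band, with scores of 0.05 or more counted as positive and scores of -0.05 or less counted as negative. A sketch of that labelling rule follows; `label_sentiment` is a hypothetical helper and the scores in `example_scores` are made up.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to 'positive', 'negative' or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up compound scores for illustration
example_scores = [0.6, 0.01, -0.3, 0.0, -0.04]
labels = [label_sentiment(s) for s in example_scores]
print(Counter(labels))
```

With the DataFrame from the cell above, `df['compound_sentiment'].apply(label_sentiment)` would produce a label column that can be counted with `value_counts()`.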

2. Here's an example using Tweepy, a popular Python library for the X API:

Install tweepy and pandas libraries using pip:

#pip install tweepy pandas

In [None]:
import tweepy
import pandas as pd
from datetime import datetime

# Authentication credentials
bearer_token = "YOUR_BEARER_TOKEN"  # replace with your own token

# Create client instance
client = tweepy.Client(bearer_token=bearer_token)

# Search tweets with Tweepy
query = "python programming -is:retweet lang:en"
tweets = client.search_recent_tweets(
    query=query,
    max_results=100,
    tweet_fields=['created_at', 'public_metrics'],
    user_fields=['username', 'name'],
    expansions=['author_id']
)

# Process tweets data
if tweets.data:
    # Create user dictionary for easy lookup
    users = {user.id: user for user in tweets.includes['users']}
    
    # Prepare data for DataFrame
    tweets_list = []
    for tweet in tweets.data:
        tweets_list.append({
            'created_at': tweet.created_at,
            'text': tweet.text,
            'author': users[tweet.author_id].username,
            'retweets': tweet.public_metrics['retweet_count'],
            'likes': tweet.public_metrics['like_count'],
            'replies': tweet.public_metrics['reply_count']
        })
    
    # Create DataFrame
    df = pd.DataFrame(tweets_list)
    
    # Display results
    print("\nRecent Python Programming Tweets:")
    print(f"Total tweets collected: {len(df)}")
    print("\nMost engaged tweets:")
    print(df.nlargest(5, 'likes')[['author', 'text', 'likes', 'retweets']])
    
    # Save to CSV
    df.to_csv('python_tweets.csv', index=False)
    print("\nData saved to python_tweets.csv")

else:
    print("No tweets found")

3. Here's an example using the GitHub API, which is publicly available and has generous rate limits for basic usage:

In [None]:
import requests
import pandas as pd
from datetime import datetime

# GitHub API endpoint for searching repositories
url = "https://api.github.com/search/repositories"

# Search parameters
params = {
    "q": "language:python stars:>1000",
    "sort": "stars",
    "order": "desc",
    "per_page": 20
}

# Send request
response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    repos = []
    
    for repo in data['items']:
        repo_data = {
            'name': repo['name'],
            'owner': repo['owner']['login'],
            'stars': repo['stargazers_count'],
            'forks': repo['forks_count'],
            'language': repo['language'],
            'description': repo['description'],
            'created_at': datetime.strptime(repo['created_at'], '%Y-%m-%dT%H:%M:%SZ').strftime('%Y-%m-%d'),
            'url': repo['html_url']
        }
        repos.append(repo_data)
    
    # Create DataFrame
    df = pd.DataFrame(repos)
    
    # Display results
    print("\nTop Python Repositories on GitHub:")
    print(f"Total repositories found: {data['total_count']}")
    print("\nTop 5 repositories by stars:")
    print(df[['name', 'owner', 'stars', 'forks']].head())
    
    # Save to CSV
    df.to_csv('top_python_repos.csv', index=False)
    print("\nData saved to top_python_repos.csv")
    
    # Get rate limit info
    rate_limit = requests.get('https://api.github.com/rate_limit').json()
    print(f"\nAPI Rate Limit Remaining: {rate_limit['resources']['search']['remaining']}")

else:
    print(f"Error: {response.status_code}")
    print(response.json())