# HTTP Requests and Working with APIs

Working with APIs (Application Programming Interfaces) is essential in modern data science and NLP projects. APIs allow you to fetch data from external sources, interact with web services, and integrate different systems.

## Why APIs Matter for Data Science/NLP:
- **Data acquisition**: Fetch real-time data from Twitter, news APIs, financial APIs
- **NLP services**: Use cloud-based NLP APIs (Google Translate, sentiment analysis)
- **Machine learning**: Deploy and consume ML models via APIs
- **Data integration**: Connect different data sources and services

## Topics Covered:
- HTTP basics (GET, POST, PUT, DELETE)
- Using the requests library
- Handling JSON responses
- Authentication methods
- Error handling and retries
- Rate limiting and best practices
- Working with real APIs

## HTTP Basics and the Requests Library

In [None]:
# Install requests if not available
# !pip install requests

import requests
import json
import time
from datetime import datetime

print(f"Requests version: {requests.__version__}")

# Basic GET request
response = requests.get('https://httpbin.org/get')

print(f"\nStatus Code: {response.status_code}")
print(f"Content Type: {response.headers.get('content-type')}")
print(f"Response Length: {len(response.text)} characters")

# Print formatted JSON response
print("\nResponse JSON:")
print(json.dumps(response.json(), indent=2))

## HTTP Methods and Status Codes

In [None]:
# Different HTTP methods
base_url = 'https://httpbin.org'

# GET request with parameters
params = {'name': 'Alice', 'age': 30, 'city': 'New York'}
get_response = requests.get(f'{base_url}/get', params=params)

print("GET Request:")
print(f"URL: {get_response.url}")
print(f"Status: {get_response.status_code}")
print(f"Query parameters in response: {get_response.json()['args']}")
print()

# POST request with JSON data
data = {
    'user': 'john_doe',
    'message': 'Hello from Python!',
    'timestamp': datetime.now().isoformat()
}

post_response = requests.post(f'{base_url}/post', json=data)

print("POST Request:")
print(f"Status: {post_response.status_code}")
print(f"Sent data: {post_response.json()['json']}")
print()

# PUT request
put_response = requests.put(f'{base_url}/put', json={'action': 'update', 'id': 123})
print(f"PUT Status: {put_response.status_code}")

# DELETE request
delete_response = requests.delete(f'{base_url}/delete')
print(f"DELETE Status: {delete_response.status_code}")
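To make the `params` step in the cell above less opaque, here is a minimal sketch of how a dictionary becomes a query string, using only the standard library. `build_url` is a hypothetical helper introduced for illustration; for simple values like these, `requests` should produce the same query string when you pass `params=`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_url(base, params):
    """Append URL-encoded query parameters to a base URL (illustrative helper)."""
    return f"{base}?{urlencode(params)}"

url = build_url('https://httpbin.org/get',
                {'name': 'Alice', 'age': 30, 'city': 'New York'})
print(url)

# Round-trip: parse the query string back into a dictionary of lists
print(parse_qs(urlsplit(url).query))
```

Note that the space in `New York` is encoded as `+`, and that every value comes back as a string after parsing.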

In [None]:
# Understanding HTTP status codes
status_codes = {
    200: "OK - Success",
    201: "Created - Resource created successfully",
    400: "Bad Request - Client error",
    401: "Unauthorized - Authentication required",
    403: "Forbidden - Access denied",
    404: "Not Found - Resource not found",
    429: "Too Many Requests - Rate limited",
    500: "Internal Server Error - Server error",
    503: "Service Unavailable - Server overloaded"
}

print("Common HTTP Status Codes:")
print("=" * 40)
for code, description in status_codes.items():
    print(f"{code}: {description}")

# Test different status codes
print("\nTesting different status codes:")
status_tests = [200, 404, 500]

for status in status_tests:
    response = requests.get(f'{base_url}/status/{status}')
    print(f"Requested {status}, got {response.status_code}: {response.reason}")
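The first digit of a status code tells you its class, which is often all your code needs to decide what to do next. The sketch below groups codes by class and looks up their standard reason phrases with the standard library's `http.HTTPStatus`; `classify_status` is a hypothetical helper name.

```python
from http import HTTPStatus

def classify_status(code):
    """Map a status code to its class based on the first digit."""
    classes = {
        1: 'informational',
        2: 'success',
        3: 'redirection',
        4: 'client error',
        5: 'server error',
    }
    return classes.get(code // 100, 'unknown')

for code in (200, 201, 404, 429, 503):
    print(f"{code} {HTTPStatus(code).phrase}: {classify_status(code)}")
```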

## Working with Headers and Authentication

In [None]:
# Working with headers
headers = {
    'User-Agent': 'Python Data Science Bot 1.0',
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}

response = requests.get(f'{base_url}/headers', headers=headers)

print("Request headers sent:")
sent_headers = response.json()['headers']
for key, value in sent_headers.items():
    if key in ['User-Agent', 'Accept', 'Content-Type']:
        print(f"  {key}: {value}")
print()

# Basic Authentication
print("Testing Basic Authentication:")
auth_response = requests.get(
    f'{base_url}/basic-auth/user/pass',
    auth=('user', 'pass')
)
print(f"Auth Status: {auth_response.status_code}")
if auth_response.status_code == 200:
    print(f"Authenticated user: {auth_response.json()['user']}")

# Bearer Token Authentication (simulated)
print("\nBearer Token Authentication:")
token = 'your-secret-token-here'
bearer_headers = {
    'Authorization': f'Bearer {token}',
    'Content-Type': 'application/json'
}

token_response = requests.get(f'{base_url}/bearer', headers=bearer_headers)
print(f"Token Status: {token_response.status_code}")
if token_response.status_code == 200:
    print("Token authentication successful!")
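As a rough illustration of what the `auth=('user', 'pass')` argument sends, the sketch below builds a Basic `Authorization` header by hand. It also shows one common way to avoid hardcoding a bearer token: reading it from an environment variable. Both helper names and the `API_TOKEN` variable name are assumptions made for this example.

```python
import base64
import os

def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')

def bearer_headers_from_env(var_name='API_TOKEN'):
    """Read a token from an environment variable instead of hardcoding it."""
    token = os.environ.get(var_name)
    if token is None:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return {'Authorization': f'Bearer {token}'}

print(basic_auth_header('user', 'pass'))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so Basic credentials should only be sent over HTTPS.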

## Error Handling and Retries

In [None]:
import time
from requests.exceptions import RequestException, ConnectionError, Timeout

def make_request_with_retry(url, max_retries=3, delay=1):
    """
    Make HTTP request with retry logic and proper error handling.
    """
    for attempt in range(max_retries):
        try:
            print(f"Attempt {attempt + 1}/{max_retries}: {url}")
            
            response = requests.get(url, timeout=5)
            
            # Check if request was successful
            response.raise_for_status()
            
            print(f"✅ Success! Status: {response.status_code}")
            return response
            
        except ConnectionError:
            print(f"❌ Connection error on attempt {attempt + 1}")
        except Timeout:
            print(f"⏰ Timeout error on attempt {attempt + 1}")
        except requests.exceptions.HTTPError as e:
            print(f"❌ HTTP error on attempt {attempt + 1}: {e}")
            # Don't retry client errors (4xx), except 429 (rate limited)
            status = e.response.status_code
            if 400 <= status < 500 and status != 429:
                break
        except RequestException as e:
            print(f"❌ Request error on attempt {attempt + 1}: {e}")
        
        # Wait before retrying
        if attempt < max_retries - 1:
            print(f"⏳ Waiting {delay} seconds before retry...")
            time.sleep(delay)
            delay *= 2  # Exponential backoff
    
    print(f"💥 Request failed after at most {max_retries} attempts")
    return None

# Test with a successful URL
print("Testing successful request:")
success_response = make_request_with_retry('https://httpbin.org/get')
print()

# Test with a failing URL
print("Testing failing request:")
fail_response = make_request_with_retry('https://httpbin.org/status/500')
print()

# Test with a URL that returns 404 (a client error, so it is not retried)
print("Testing 404 response:")
notfound_response = make_request_with_retry('https://httpbin.org/status/404')
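The retry helper above doubles its delay after each failure. A common refinement is to cap the delay and add random jitter so that many clients do not retry at the same instant. The sketch below isolates that calculation; the function names, the cap of 30 seconds, and the set of retryable status codes are assumptions chosen for illustration, not a fixed standard.

```python
import random

# Assumed policy: treat rate limiting and common server errors as transient
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def should_retry(status_code):
    """Return True if a response with this status is worth retrying."""
    return status_code in RETRYABLE_STATUS

def backoff_delay(attempt, base=1.0, cap=30.0, jitter=True, rng=random):
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped.

    With jitter enabled, a random value between 0 and that delay is returned.
    """
    delay = min(cap, base * (2 ** attempt))
    return rng.uniform(0, delay) if jitter else delay

# Without jitter the schedule is deterministic
print([backoff_delay(i, jitter=False) for i in range(6)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In a retry loop you would call `time.sleep(backoff_delay(attempt))` between attempts instead of tracking the delay by hand.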

## Working with JSON APIs

In [None]:
# Example: Working with a public API (JSONPlaceholder - fake REST API)
def fetch_posts(limit=5):
    """
    Fetch posts from JSONPlaceholder API.
    """
    url = 'https://jsonplaceholder.typicode.com/posts'
    
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        
        posts = response.json()[:limit]  # Get first 'limit' posts
        
        print(f"📝 Fetched {len(posts)} posts:")
        print("=" * 50)
        
        for post in posts:
            print(f"ID: {post['id']}")
            print(f"Title: {post['title'][:50]}{'...' if len(post['title']) > 50 else ''}")
            print(f"Body: {post['body'][:100]}{'...' if len(post['body']) > 100 else ''}")
            print("-" * 30)
        
        return posts
        
    except Exception as e:
        print(f"Error fetching posts: {e}")
        return None

# Fetch and display posts
posts_data = fetch_posts(3)

if posts_data:
    print(f"\n📊 Data structure of first post:")
    print(json.dumps(posts_data[0], indent=2))
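Before feeding API data into an analysis, it helps to check that each record has the fields and types you expect. The following is a minimal sketch of such a check for the post records used above; `REQUIRED_FIELDS` and `validate_post` are hypothetical names, and the expected types are assumptions based on the fields this notebook reads and sends.

```python
# Assumed schema for one post record
REQUIRED_FIELDS = {'userId': int, 'id': int, 'title': str, 'body': str}

def validate_post(record):
    """Return a list of problems found in one post record (empty list = valid)."""
    if not isinstance(record, dict):
        return [f"expected an object, got {type(record).__name__}"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems

good = {'userId': 1, 'id': 1, 'title': 'Hello', 'body': 'World'}
bad = {'userId': '1', 'title': 'No id or body'}

print(validate_post(good))  # []
print(validate_post(bad))   # ['userId should be int', 'missing field: id', 'missing field: body']
```

For larger projects a schema library may be a better fit, but a small function like this is often enough to catch a changed or truncated response early.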

In [None]:
# Create, Update, Delete operations with REST API
def api_operations_demo():
    base_url = 'https://jsonplaceholder.typicode.com'
    
    # 1. CREATE - POST request
    print("1️⃣ Creating a new post:")
    new_post = {
        'title': 'My New Post from Python',
        'body': 'This is the content of my new post created via API.',
        'userId': 1
    }
    
    create_response = requests.post(f'{base_url}/posts', json=new_post)
    print(f"Status: {create_response.status_code}")
    
    if create_response.status_code == 201:
        created_post = create_response.json()
        print(f"Created post ID: {created_post['id']}")
        print(f"Title: {created_post['title']}")
    
    # JSONPlaceholder only simulates writes: the new post is not stored,
    # so the remaining steps use an existing post.
    post_id = 1
    
    print()
    
    # 2. READ - GET request
    print("2️⃣ Reading a specific post:")
    read_response = requests.get(f'{base_url}/posts/{post_id}')
    print(f"Status: {read_response.status_code}")
    
    if read_response.status_code == 200:
        post = read_response.json()
        print(f"Post {post['id']}: {post['title'][:40]}...")
    
    print()
    
    # 3. UPDATE - PUT request
    print("3️⃣ Updating the post:")
    updated_post = {
        'id': post_id,
        'title': 'Updated Post Title',
        'body': 'This post has been updated via Python API call.',
        'userId': 1
    }
    
    update_response = requests.put(f'{base_url}/posts/{post_id}', json=updated_post)
    print(f"Status: {update_response.status_code}")
    
    if update_response.status_code == 200:
        updated = update_response.json()
        print(f"Updated title: {updated['title']}")
    
    print()
    
    # 4. DELETE - DELETE request
    print("4️⃣ Deleting the post:")
    delete_response = requests.delete(f'{base_url}/posts/{post_id}')
    print(f"Status: {delete_response.status_code}")
    
    if delete_response.status_code == 200:
        print("Post deleted successfully!")

# Run the API operations demo
api_operations_demo()

## Rate Limiting and Best Practices

In [None]:
import time
from datetime import datetime, timedelta

class RateLimiter:
    """
    Simple rate limiter to control API request frequency.
    """
    def __init__(self, max_requests=10, time_window=60):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = []
    
    def can_make_request(self):
        now = datetime.now()
        
        # Remove old requests outside the time window
        cutoff_time = now - timedelta(seconds=self.time_window)
        self.requests = [req_time for req_time in self.requests if req_time > cutoff_time]
        
        # Check if we can make another request
        return len(self.requests) < self.max_requests
    
    def make_request(self):
        if self.can_make_request():
            self.requests.append(datetime.now())
            return True
        return False
    
    def wait_time(self):
        if not self.requests:
            return 0
        
        oldest_request = min(self.requests)
        wait_until = oldest_request + timedelta(seconds=self.time_window)
        wait_seconds = (wait_until - datetime.now()).total_seconds()
        return max(0, wait_seconds)

def fetch_with_rate_limiting(urls, requests_per_minute=30):
    """
    Fetch multiple URLs with rate limiting.
    """
    rate_limiter = RateLimiter(max_requests=requests_per_minute, time_window=60)
    results = []
    
    for i, url in enumerate(urls):
        print(f"\nProcessing URL {i+1}/{len(urls)}: {url}")
        
        # Check rate limiting
        if not rate_limiter.can_make_request():
            wait_time = rate_limiter.wait_time()
            print(f"⏳ Rate limit reached. Waiting {wait_time:.1f} seconds...")
            time.sleep(wait_time + 1)
        
        # Make the request
        if rate_limiter.make_request():
            try:
                response = requests.get(url, timeout=5)
                results.append({
                    'url': url,
                    'status': response.status_code,
                    'content_length': len(response.content)
                })
                print(f"✅ Success: {response.status_code}")
            except Exception as e:
                results.append({
                    'url': url,
                    'status': 'error',
                    'error': str(e)
                })
                print(f"❌ Error: {e}")
    
    return results

# Test rate limiting with a few URLs
test_urls = [
    'https://httpbin.org/get?page=1',
    'https://httpbin.org/get?page=2',
    'https://httpbin.org/get?page=3',
]

print("Testing rate limiting:")
results = fetch_with_rate_limiting(test_urls, requests_per_minute=2)  # Very low limit for demo

print("\n📊 Results summary:")
for result in results:
    print(f"  {result['url']}: {result['status']}")
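The `RateLimiter` above separates checking from waiting. An alternative design, sketched below under stated assumptions, is a single `acquire()` call that blocks until a slot is free. It uses `time.monotonic`, which is not affected by system clock changes, and accepts `clock` and `sleep` arguments so the behavior can be demonstrated without real waiting. The class and parameter names are hypothetical.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Block until a request slot is free (sliding-window sketch)."""

    def __init__(self, max_requests, time_window,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.time_window = time_window
        self.clock = clock
        self.sleep = sleep
        self.timestamps = deque()

    def acquire(self):
        """Wait if needed, then record one request. Returns seconds waited."""
        waited = 0.0
        while True:
            now = self.clock()
            # Drop timestamps that have left the window
            while self.timestamps and now - self.timestamps[0] >= self.time_window:
                self.timestamps.popleft()
            if len(self.timestamps) < self.max_requests:
                self.timestamps.append(now)
                return waited
            # Sleep until the oldest request leaves the window
            pause = self.time_window - (now - self.timestamps[0])
            self.sleep(pause)
            waited += pause

# Simulated clock so the demo runs instantly
fake_now = [0.0]

def fake_sleep(seconds):
    fake_now[0] += seconds

limiter = SlidingWindowLimiter(2, 60, clock=lambda: fake_now[0], sleep=fake_sleep)
waits = [limiter.acquire() for _ in range(3)]
print(waits)  # [0.0, 0.0, 60.0]
```

With the default `clock` and `sleep`, calling `limiter.acquire()` before each `requests.get(...)` would pause the loop whenever the limit is reached.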

## Real-World Example: Weather API

In [None]:
# Example using a free weather API (OpenWeatherMap)
# Note: You would need to get a free API key from openweathermap.org

def get_weather_data(city, api_key='demo_key'):
    """
    Fetch weather data for a city (demo function - would need real API key).
    """
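    # This is a simulation: base_url and params show what a real call would use,
    # but no request is sent. With a real API key the call would be:
    #     requests.get(base_url, params=params, timeout=10)
    base_url = "https://api.openweathermap.org/data/2.5/weather"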
    
    params = {
        'q': city,
        'appid': api_key,
        'units': 'metric'  # Celsius
    }
    
    # Simulate API response structure
    simulated_response = {
        'name': city,
        'main': {
            'temp': 22.5,
            'feels_like': 24.2,
            'humidity': 65,
            'pressure': 1013
        },
        'weather': [{
            'main': 'Clear',
            'description': 'clear sky',
            'icon': '01d'
        }],
        'wind': {
            'speed': 3.2,
            'deg': 180
        }
    }
    
    print(f"🌤️ Weather data for {city}:")
    print(f"Temperature: {simulated_response['main']['temp']}°C")
    print(f"Feels like: {simulated_response['main']['feels_like']}°C")
    print(f"Humidity: {simulated_response['main']['humidity']}%")
    print(f"Condition: {simulated_response['weather'][0]['description'].title()}")
    print(f"Wind: {simulated_response['wind']['speed']} m/s")
    
    return simulated_response

# Demo weather function
cities = ['London', 'New York', 'Tokyo']
weather_data = {}

for city in cities:
    print(f"\nFetching weather for {city}...")
    weather_data[city] = get_weather_data(city)
    time.sleep(0.5)  # Pause between calls, as you would with a real API

print("\n" + "="*50)
print("API Best Practices Demonstrated:")
print("="*50)
best_practices = [
    "✅ Used proper error handling",
    "✅ Added delays between requests",
    "✅ Used appropriate HTTP methods",
    "✅ Included User-Agent headers",
    "✅ Implemented retry logic",
    "✅ Respected rate limits",
    "✅ Structured response data properly"
]

for practice in best_practices:
    print(practice)

## Session Management and Connection Pooling

In [None]:
# Using sessions for improved performance and persistent settings
def demo_sessions():
    # Create a session
    session = requests.Session()
    
    # Set default headers for all requests
    session.headers.update({
        'User-Agent': 'Python Data Science API Client 1.0',
        'Accept': 'application/json'
    })
    
    # requests.Session has no default timeout setting (assigning
    # session.timeout has no effect), so pass timeout= on each request
    
    print("🔗 Using session for multiple requests:")
    print("=" * 40)
    
    urls = [
        'https://httpbin.org/get?test=1',
        'https://httpbin.org/get?test=2',
        'https://httpbin.org/get?test=3'
    ]
    
    start_time = time.time()
    
    for i, url in enumerate(urls, 1):
        try:
            response = session.get(url, timeout=10)
            print(f"Request {i}: Status {response.status_code}, Time: {response.elapsed.total_seconds():.3f}s")
        except Exception as e:
            print(f"Request {i}: Error - {e}")
    
    total_time = time.time() - start_time
    print(f"\nTotal time: {total_time:.3f}s")
    
    # Close the session
    session.close()
    
    return total_time

# Compare with individual requests
def demo_individual_requests():
    print("\n🔀 Using individual requests:")
    print("=" * 40)
    
    urls = [
        'https://httpbin.org/get?test=1',
        'https://httpbin.org/get?test=2',
        'https://httpbin.org/get?test=3'
    ]
    
    start_time = time.time()
    
    for i, url in enumerate(urls, 1):
        try:
            response = requests.get(url, timeout=10)
            print(f"Request {i}: Status {response.status_code}, Time: {response.elapsed.total_seconds():.3f}s")
        except Exception as e:
            print(f"Request {i}: Error - {e}")
    
    total_time = time.time() - start_time
    print(f"\nTotal time: {total_time:.3f}s")
    
    return total_time

# Run both demos
session_time = demo_sessions()
individual_time = demo_individual_requests()

print(f"\n📊 Performance comparison:")
print(f"Session time: {session_time:.3f}s")
print(f"Individual time: {individual_time:.3f}s")
if session_time < individual_time:
    improvement = ((individual_time - session_time) / individual_time) * 100
    print(f"Sessions were {improvement:.1f}% faster! 🚀")
else:
    print("Individual requests were faster (could be due to network variance)")
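Sessions can also carry retry behavior, so you do not have to write the retry loop yourself. The sketch below mounts an `HTTPAdapter` configured with urllib3's `Retry` on a session. `build_session` is a hypothetical helper, the retry counts and status list are assumed values, and the `allowed_methods` argument assumes urllib3 1.26 or newer. The example only inspects the configuration and does not send a request.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session(total_retries=3, backoff_factor=0.5):
    """Create a Session that retries selected failures automatically."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,           # growing pause between retries
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"],         # assumes urllib3 >= 1.26
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    session.headers.update({'Accept': 'application/json'})
    return session

# The with-block closes the session's connections on exit
with build_session() as session:
    adapter = session.get_adapter('https://httpbin.org/get')
    print(adapter.max_retries.total)  # 3
```

Timeouts are still set per request, for example `session.get(url, timeout=10)`.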

## Key Takeaways

### Essential HTTP/API Concepts:
1. **HTTP Methods**: GET (retrieve), POST (create), PUT (update), DELETE (remove)
2. **Status Codes**: 200 (success), 404 (not found), 500 (server error), etc.
3. **Headers**: Metadata about requests/responses (authentication, content type, etc.)
4. **Authentication**: Basic auth, Bearer tokens, API keys
5. **JSON**: Standard format for API data exchange

### Best Practices:
1. **Always handle errors** gracefully with try/except blocks
2. **Use sessions** for multiple requests to the same API
3. **Implement rate limiting** to respect API limits
4. **Add retry logic** with exponential backoff
5. **Set appropriate timeouts** to avoid hanging requests
6. **Use proper User-Agent headers** to identify your application
7. **Validate and sanitize** API responses before use

### For Data Science/NLP Projects:
- **Data Collection**: Fetch real-time data from Twitter, news, financial APIs
- **NLP Services**: Use cloud-based APIs for translation, sentiment analysis
- **Model Deployment**: Create APIs to serve machine learning models
- **Data Integration**: Connect multiple data sources and services

### Common API Patterns:
```python
# Basic pattern
response = requests.get(url, headers=headers, params=params, timeout=10)
response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
data = response.json()

# With error handling
def fetch_json(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        print(f"API request failed: {e}")
        return None
```

## Practice Exercises

1. **Build a news aggregator** using a free news API
2. **Create a weather dashboard** fetching data from multiple cities
3. **Implement a social media sentiment analyzer** using the Twitter API
4. **Build a stock price tracker** with financial APIs
5. **Create a translation service** using the Google Translate API
6. **Develop a movie recommendation system** using movie database APIs
7. **Build an API client library** with proper error handling and rate limiting

## Next Steps

Master API integration to:
- **Access real-time data** for machine learning projects
- **Deploy your models** as web services
- **Integrate multiple data sources** efficiently
- **Build data pipelines** that fetch and process external data
- **Create interactive applications** that respond to live data

APIs are the bridge between your Python code and the vast amount of data and services available on the internet!