# Python Crash Course - Chapter 16: Downloading Data

This notebook contains exercises from Chapter 16 of Python Crash Course by Eric Matthes. The chapter focuses on downloading data files (CSV and JSON) and visualizing them; the exercises here also practice working with web APIs and processing web-based datasets for visualization and analysis.

## Learning Objectives:
- Download data from web APIs using the requests library
- Work with CSV files and data parsing
- Process JSON data from web services
- Handle API responses and error conditions
- Create visualizations from real-world datasets
- Work with datetime data and time series
- Understand API rate limiting and best practices
- Process large datasets efficiently

---

## Setup: Required Imports

First, let's import the libraries we'll need for this chapter:

In [None]:
# Required imports for Chapter 16 exercises
import csv
import json
from datetime import datetime

import matplotlib
import matplotlib.pyplot as plt
import requests
from requests.exceptions import RequestException  # base class for errors raised by requests

# Confirm the imports worked and show library versions
print(f"Requests version: {requests.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")
print("All imports successful!")
print("Ready to download and visualize web data!")

## 16-1 San Francisco

In [None]:
# Exercise 16-1: San Francisco
# Are temperatures in San Francisco more like temperatures in Sitka or temperatures in Death Valley?
# Download some data for San Francisco, and generate a high-low temperature plot for San Francisco
# to make a comparison.

# Note: You'll need to find a weather data source or use historical weather data
# Example sources: OpenWeatherMap API, NOAA data, or local CSV files

# TODO: read a San Francisco weather CSV, plot daily highs and lows, and compare with Sitka and Death Valley
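
A minimal sketch for 16-1, assuming you have downloaded a daily-summary CSV for San Francisco (for example from NOAA Climate Data Online) with `DATE`, `TMAX`, and `TMIN` columns; those column names are an assumption, so check your file's header row. `read_high_low()` and `plot_high_low()` are names chosen for this sketch. Looking columns up by name with `csv.DictReader` keeps the code working if the column order differs between stations, and rows with a missing reading are skipped instead of crashing the loop.

```python
import csv
import io
from datetime import datetime


def read_high_low(csv_file, date_col="DATE", high_col="TMAX", low_col="TMIN"):
    """Return (dates, highs, lows) from an open CSV file or any iterable of lines.

    The default column names are an assumption (NOAA daily summaries use them);
    pass other names if your file's header differs. Rows with a missing or
    non-numeric reading are reported and skipped.
    """
    dates, highs, lows = [], [], []
    for row in csv.DictReader(csv_file):
        try:
            current_date = datetime.strptime(row[date_col], "%Y-%m-%d")
            high = float(row[high_col])
            low = float(row[low_col])
        except ValueError:
            print(f"Missing data for {row[date_col]}")
            continue
        dates.append(current_date)
        highs.append(high)
        lows.append(low)
    return dates, highs, lows


def plot_high_low(dates, highs, lows, title):
    """Plot daily highs and lows, shading the band between them."""
    import matplotlib.pyplot as plt  # imported here so read_high_low() works without matplotlib

    fig, ax = plt.subplots()
    ax.plot(dates, highs, color="red", alpha=0.5)
    ax.plot(dates, lows, color="blue", alpha=0.5)
    ax.fill_between(dates, highs, lows, facecolor="blue", alpha=0.1)
    ax.set_title(title)
    ax.set_ylabel("Temperature (F)")  # assumes the file uses Fahrenheit ("standard" units)
    fig.autofmt_xdate()
    return fig, ax


# Made-up rows for illustration; the second row has no TMAX value.
sample_csv = io.StringIO("DATE,TMAX,TMIN\n2021-07-01,68,55\n2021-07-02,,54\n2021-07-03,71,56\n")
sf_dates, sf_highs, sf_lows = read_high_low(sample_csv)
print(sf_highs)  # [68.0, 71.0]
```

For a real file, pass an open file object, e.g. `with open("sf_weather.csv", newline="") as f:` (hypothetical filename), then call `plot_high_low(sf_dates, sf_highs, sf_lows, "San Francisco")`. Reading the Sitka and Death Valley files with the same function gives you three charts to compare.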

## 16-2 Sitka-Death Valley Comparison

In [None]:
# Exercise 16-2: Sitka-Death Valley Comparison
# The temperature scales on the Sitka and Death Valley graphs reflect the different ranges
# of the data. To accurately compare the temperature range in Sitka to that of Death Valley,
# you need identical scales on the y-axis. Change the settings for the y-axis on one or both
# of the charts in Figures 16-5 and 16-6, and make a direct comparison between temperature
# ranges in Sitka and Death Valley (or any two places you want to compare).

# Sample data structure for weather data
sitka_data = {
    'dates': [],
    'highs': [],
    'lows': []
}

death_valley_data = {
    'dates': [],
    'highs': [],
    'lows': []
}

# TODO: fill both dictionaries from CSV files, then plot them with identical y-axis limits
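
One way to get identical scales, sketched under the assumption that both dictionaries above have been filled with parsed data: compute a single pair of limits covering every reading, then draw the charts as subplots with `sharey=True` so they share one y-axis. `shared_temperature_limits()` and `plot_on_common_scale()` are hypothetical helper names, and the 5-degree padding is arbitrary.

```python
def shared_temperature_limits(datasets, padding=5):
    """Return (ymin, ymax) covering every high and low in all datasets.

    Each dataset is a dict with 'dates', 'highs', and 'lows' lists, the same
    structure as sitka_data and death_valley_data above.
    """
    all_lows = [temp for data in datasets for temp in data["lows"]]
    all_highs = [temp for data in datasets for temp in data["highs"]]
    return min(all_lows) - padding, max(all_highs) + padding


def plot_on_common_scale(datasets, titles):
    """Draw one high-low chart per dataset, side by side, on a shared y-axis."""
    import matplotlib.pyplot as plt  # imported here so the limit helper needs no plotting library

    ymin, ymax = shared_temperature_limits(datasets)
    # squeeze=False keeps axes 2-D even when there is only one dataset.
    fig, axes = plt.subplots(1, len(datasets), sharey=True, squeeze=False, figsize=(12, 5))
    for ax, data, title in zip(axes[0], datasets, titles):
        ax.plot(data["dates"], data["highs"], color="red", alpha=0.5)
        ax.plot(data["dates"], data["lows"], color="blue", alpha=0.5)
        ax.fill_between(data["dates"], data["highs"], data["lows"], facecolor="blue", alpha=0.1)
        ax.set_title(title)
    axes[0][0].set_ylim(ymin, ymax)  # sharey=True applies these limits to every subplot
    fig.autofmt_xdate()
    return fig, axes
```

Once both dictionaries hold data, call `plot_on_common_scale([sitka_data, death_valley_data], ["Sitka", "Death Valley"])`. If you prefer separate figures, calling `ax.set_ylim(ymin, ymax)` with the same limits on each chart has the same effect; plotting both locations on a single chart is another option.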

## 16-3 Rainfall

In [None]:
# Exercise 16-3: Rainfall
# Choose a location you're curious about, and make a visualization that plots its rainfall.
# Start by focusing on one month's data, and then once your code is working,
# see if you can pull in a full year's worth of data.

def parse_rainfall_data(filename):
    """Parse rainfall data from CSV file."""
    # TODO: open the CSV, skip rows with missing values, and return (dates, rainfall amounts)
    pass

def plot_rainfall(dates, rainfall_amounts, location):
    """Create a rainfall visualization."""
    # TODO: plot rainfall against date and title the chart with the location
    pass

# TODO: get one month of data working first, then extend to a full year
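
A sketch of the two stubs, assuming the CSV has `DATE` and `PRCP` (daily precipitation) columns, as NOAA daily summaries do; adjust the names for other sources. The parsing is split into a hypothetical `parse_rainfall_rows()`, which accepts any iterable of CSV lines, so it can be checked without a file on disk. The hypothetical `select_month()` helper supports the suggested workflow of getting one month working before plotting a full year.

```python
import csv
from datetime import datetime


def parse_rainfall_rows(lines, date_col="DATE", rain_col="PRCP"):
    """Return (dates, amounts) from CSV lines, skipping rows without a reading.

    'DATE' and 'PRCP' are assumed column names; change them to match your file.
    """
    dates, amounts = [], []
    for row in csv.DictReader(lines):
        try:
            amount = float(row[rain_col])
        except ValueError:
            continue  # blank or non-numeric reading (e.g. a trace marker)
        dates.append(datetime.strptime(row[date_col], "%Y-%m-%d"))
        amounts.append(amount)
    return dates, amounts


def parse_rainfall_data(filename):
    """Parse rainfall data from a CSV file on disk."""
    with open(filename, newline="") as f:
        return parse_rainfall_rows(f)


def select_month(dates, amounts, year, month):
    """Keep only one month's readings, a good first target for the plot."""
    pairs = [(d, a) for d, a in zip(dates, amounts) if d.year == year and d.month == month]
    return [d for d, _ in pairs], [a for _, a in pairs]


def plot_rainfall(dates, rainfall_amounts, location):
    """Draw daily rainfall as a bar chart."""
    import matplotlib.pyplot as plt  # imported here so the parsers need no plotting library

    fig, ax = plt.subplots()
    ax.bar(dates, rainfall_amounts, color="steelblue")
    ax.set_title(f"Daily Rainfall: {location}")
    ax.set_ylabel("Precipitation")
    fig.autofmt_xdate()
    return fig, ax
```

For example, `dates, amounts = parse_rainfall_data("rainfall_2021.csv")` (hypothetical filename) followed by `plot_rainfall(*select_month(dates, amounts, 2021, 3), "My City")` plots March only; pass the unfiltered lists to plot the whole year. A bar chart suits rainfall because each bar is a separate daily total rather than a continuous reading.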

## 16-4 Explore

In [None]:
# Exercise 16-4: Explore
# Generate a few more visualizations that examine any other weather aspect you're curious about
# for any locations you're curious about.

# Ideas for exploration:
# - Wind speed patterns
# - Humidity levels
# - Pressure changes
# - Seasonal variations
# - Multiple cities comparison

# TODO: choose a weather field and one or more locations, then parse and plot the data
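
Several of the ideas above (seasonal variations, multiple-city comparisons) come down to grouping a daily series by month. A small sketch, assuming you already have parallel lists of dates and values from any numeric column your files provide (wind speed, humidity, pressure); `monthly_averages()` and `plot_monthly_comparison()` are hypothetical names, and the sample values are made up.

```python
from collections import defaultdict
from datetime import date


def monthly_averages(dates, values):
    """Average a daily series by calendar month (1-12) to show seasonal patterns."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, value in zip(dates, values):
        totals[day.month] += value
        counts[day.month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


def plot_monthly_comparison(city_averages, ylabel):
    """Plot one line per city; city_averages maps city name -> {month: average}."""
    import matplotlib.pyplot as plt  # imported here so monthly_averages() needs no plotting library

    fig, ax = plt.subplots()
    for city, averages in city_averages.items():
        ax.plot(list(averages), list(averages.values()), marker="o", label=city)
    ax.set_xticks(range(1, 13))
    ax.set_xlabel("Month")
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig, ax


# Made-up values: two January readings and one February reading.
example = monthly_averages([date(2021, 1, 1), date(2021, 1, 2), date(2021, 2, 1)], [10, 20, 5])
print(example)  # {1: 15.0, 2: 5.0}
```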

## 16-5 Testing python-requests

In [None]:
# Exercise 16-5: Testing python-requests
# Visit the home page for the python-requests project (at https://requests.readthedocs.io/)
# and look at the status of the project. The Issues and Pull Requests live in the project's
# GitHub repository (psf/requests); browse them to get a sense of the project's activity.

# Test the requests library with a simple API call
def test_requests_library():
    """Test the requests library with a simple API call."""
    # TODO: request a small public endpoint and print the status code and a few fields
    pass

# TODO: call test_requests_library() and inspect the output
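
A sketch of `test_requests_library()`, assuming the GitHub REST endpoint `https://api.github.com/repos/{owner}/{repo}` for the `psf/requests` repository. The formatting is split into a hypothetical `describe_repo_activity()` so it can be tried on a dictionary without network access. In GitHub's repository data, `open_issues_count` counts open pull requests together with open issues.

```python
def describe_repo_activity(repo):
    """Summarize a repository dict from GitHub's GET /repos/{owner}/{repo} endpoint."""
    return (
        f"{repo['full_name']}: {repo['stargazers_count']} stars, "
        f"{repo['forks_count']} forks, "
        f"{repo['open_issues_count']} open issues and pull requests"
    )


def test_requests_library(owner="psf", repo="requests"):
    """Make one real API call with requests and summarize the result."""
    import requests  # imported here so describe_repo_activity() can be tried offline

    url = f"https://api.github.com/repos/{owner}/{repo}"
    headers = {"Accept": "application/vnd.github+json"}
    response = requests.get(url, headers=headers, timeout=10)
    print(f"Status code: {response.status_code}")
    response.raise_for_status()  # turn 4xx/5xx responses into an HTTPError
    return describe_repo_activity(response.json())
```

Calling `print(test_requests_library())` needs network access, and unauthenticated requests are subject to GitHub's rate limits.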

## 16-6 Refactoring

In [None]:
# Exercise 16-6: Refactoring
# The loop that pulls data from all_python_repos.json is getting pretty long.
# Create a function called get_repo_dict() that takes one repository dictionary and
# returns the values you're plotting. Call this function once for each repository dictionary.

def get_repo_dict(repo):
    """Extract relevant information from a repository dictionary."""
    # TODO: return the name, owner, star count, URL, and description for one repository
    pass

def fetch_python_repos():
    """Fetch Python repositories from GitHub API."""
    # GitHub API endpoint for Python repositories
    url = 'https://api.github.com/search/repositories'
    
    # TODO: request the most-starred Python repositories and call get_repo_dict() on each one
    pass

def plot_repo_data(repo_data):
    """Create visualization of repository data."""
    # TODO: plot the star count for each repository
    pass

# TODO: fetch the repositories, then plot them
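
A sketch of the refactor, assuming the GitHub search endpoint from the stub with the query `language:python` sorted by stars. `get_repo_dict()` becomes the only place that knows the shape of a raw repository dictionary, so the fetch loop shrinks to a single list comprehension and the plotting code works with simple keys. The fields kept (`name`, `owner`, `stars`, `url`, `description`) are a choice made for this sketch.

```python
def get_repo_dict(repo):
    """Pull out only the fields the chart needs from one GitHub search result."""
    return {
        "name": repo["name"],
        "owner": repo["owner"]["login"],
        "stars": repo["stargazers_count"],
        "url": repo["html_url"],
        "description": repo.get("description") or "No description provided.",
    }


def fetch_python_repos(per_page=30):
    """Fetch the most-starred Python repositories and simplify each one."""
    import requests  # imported here so get_repo_dict() can be tried offline

    url = "https://api.github.com/search/repositories"
    params = {"q": "language:python", "sort": "stars", "order": "desc", "per_page": per_page}
    headers = {"Accept": "application/vnd.github+json"}
    response = requests.get(url, params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return [get_repo_dict(repo) for repo in response.json()["items"]]


def plot_repo_data(repo_data):
    """Bar chart of star counts, one bar per repository."""
    import matplotlib.pyplot as plt

    names = [repo["name"] for repo in repo_data]
    stars = [repo["stars"] for repo in repo_data]
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.bar(names, stars)
    ax.set_title("Most-Starred Python Projects on GitHub")
    ax.set_ylabel("Stars")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    return fig, ax
```

Typical use: `plot_repo_data(fetch_python_repos())`.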

## 16-7 Automated Testing

In [None]:
# Exercise 16-7: Automated Testing
# When testing whether the key 'stargazers_count' is in the repository dictionary,
# write a test that will pass whether the key is 'stargazers_count' or 'watchers_count'.
# This makes your code more robust: GitHub originally called stars "watchers", so older
# or third-party data may include only one of the two keys.

def safe_get_stars(repo):
    """Safely get star count from repository data, handling different key names."""
    # TODO: return the value of 'stargazers_count' or 'watchers_count', whichever is present
    pass

def test_repo_data_access():
    """Test different ways of accessing repository data."""
    # Create test repository data with different key structures
    test_repo_1 = {
        'name': 'test-repo-1',
        'stargazers_count': 1500,
        'html_url': 'https://github.com/test/repo1'
    }
    
    test_repo_2 = {
        'name': 'test-repo-2',
        'watchers_count': 2000,
        'html_url': 'https://github.com/test/repo2'
    }
    
    # TODO: assert that safe_get_stars() returns the right count for both test repositories
    pass

# TODO: run test_repo_data_access()
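
A sketch of both stubs. `safe_get_stars()` checks `'stargazers_count'` first and falls back to `'watchers_count'`; the test function reuses the two test dictionaries from the stub and asserts both the exercise's either-key condition and the returned counts. The `default=0` fallback for a repository with neither key is an assumption; returning `None` would also be reasonable.

```python
def safe_get_stars(repo, default=0):
    """Return the star count whether it is stored as 'stargazers_count' or 'watchers_count'."""
    for key in ("stargazers_count", "watchers_count"):
        if key in repo:
            return repo[key]
    return default


def test_repo_data_access():
    """Check the either-key condition and safe_get_stars() against both key layouts."""
    test_repo_1 = {"name": "test-repo-1", "stargazers_count": 1500,
                   "html_url": "https://github.com/test/repo1"}
    test_repo_2 = {"name": "test-repo-2", "watchers_count": 2000,
                   "html_url": "https://github.com/test/repo2"}

    for repo in (test_repo_1, test_repo_2):
        # The exercise's test: it passes if either key is present.
        assert "stargazers_count" in repo or "watchers_count" in repo
    assert safe_get_stars(test_repo_1) == 1500
    assert safe_get_stars(test_repo_2) == 2000
    assert safe_get_stars({"name": "no-stars-key"}) == 0
    print("All repository access tests passed.")


test_repo_data_access()
```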

## 16-8 Recent Repositories

In [None]:
# Exercise 16-8: Recent Repositories
# Modify the API call in python_repos.py so it generates a chart showing the most recently
# created Python projects on GitHub.

def fetch_recent_python_repos():
    """Fetch recently created Python repositories from GitHub API."""
    # TODO: search for Python repositories created after a recent date
    pass

def parse_creation_dates(repos):
    """Parse and format repository creation dates."""
    # TODO: convert each repository's 'created_at' string into a datetime
    pass

def plot_recent_repos(repos):
    """Create visualization of recently created repositories."""
    # TODO: plot the repositories in order of creation date
    pass

# TODO: fetch recent repositories and plot them
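
A sketch, assuming the same search endpoint as python_repos.py. The repository search API's documented `sort` options (stars, forks, help-wanted-issues, updated) appear not to include creation date, so this version filters with the `created:>=YYYY-MM-DD` search qualifier and then sorts locally by each repository's `created_at` timestamp. The 7-day window and the helper `recent_repos_query()` are choices made for this sketch.

```python
from datetime import datetime, timedelta, timezone

SEARCH_URL = "https://api.github.com/search/repositories"


def recent_repos_query(days=7, today=None):
    """Build a search query for Python repos created within the last `days` days."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    since = today - timedelta(days=days)
    return f"language:python created:>={since.isoformat()}"


def fetch_recent_python_repos(days=7, per_page=30):
    """Fetch Python repositories created in the last `days` days."""
    import requests  # imported here so the pure helpers can be tried offline

    params = {"q": recent_repos_query(days), "per_page": per_page}
    headers = {"Accept": "application/vnd.github+json"}
    response = requests.get(SEARCH_URL, params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()["items"]


def parse_creation_dates(repos):
    """Return (name, created_at datetime) pairs, newest first."""
    parsed = [
        (repo["name"], datetime.strptime(repo["created_at"], "%Y-%m-%dT%H:%M:%SZ"))
        for repo in repos
    ]
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


def plot_recent_repos(repos):
    """Plot each repository against its creation time."""
    import matplotlib.pyplot as plt

    pairs = parse_creation_dates(repos)
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.scatter([created for _, created in pairs], [name for name, _ in pairs])
    ax.set_title("Recently Created Python Repositories")
    ax.set_xlabel("Created (UTC)")
    fig.autofmt_xdate()
    fig.tight_layout()
    return fig, ax
```

If the window matches more repositories than one page holds, the page you receive is not guaranteed to contain the newest ones; narrowing the window (for example `days=1`) keeps the chart closer to "most recent".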

## 16-9 Most-Forked Repositories

In [None]:
# Exercise 16-9: Most-Forked Repositories
# Look at the information returned by the API call in python_repos.py.
# Make a chart showing Python projects that have the most forks.

def fetch_most_forked_repos():
    """Fetch Python repositories sorted by fork count."""
    # TODO: search for Python repositories sorted by fork count
    pass

def plot_fork_data(repos):
    """Create visualization of most forked repositories."""
    # TODO: plot the fork count for each repository
    pass

# TODO: fetch the most-forked repositories and plot them
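
A sketch, assuming GitHub's repository search accepts `sort=forks` (one of its documented sort options) and that each result carries a `forks_count` field. The hypothetical `top_by_forks()` re-sorts locally, so the chart stays correct even for data loaded from a saved JSON file.

```python
SEARCH_URL = "https://api.github.com/search/repositories"


def fetch_most_forked_repos(per_page=30):
    """Fetch Python repositories sorted by fork count, highest first."""
    import requests  # imported here so top_by_forks() can be tried offline

    params = {"q": "language:python", "sort": "forks", "order": "desc", "per_page": per_page}
    headers = {"Accept": "application/vnd.github+json"}
    response = requests.get(SEARCH_URL, params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()["items"]


def top_by_forks(repos, n=10):
    """Return (name, forks_count) pairs for the n most-forked repositories."""
    ranked = sorted(repos, key=lambda repo: repo.get("forks_count", 0), reverse=True)
    return [(repo["name"], repo.get("forks_count", 0)) for repo in ranked[:n]]


def plot_fork_data(repos, n=10):
    """Horizontal bar chart of fork counts."""
    import matplotlib.pyplot as plt

    top = top_by_forks(repos, n)
    names = [name for name, _ in top]
    forks = [count for _, count in top]
    fig, ax = plt.subplots(figsize=(10, 6))
    # barh draws the first bar at the bottom, so reverse to put the most-forked repo on top.
    ax.barh(names[::-1], forks[::-1])
    ax.set_title("Most-Forked Python Repositories on GitHub")
    ax.set_xlabel("Forks")
    fig.tight_layout()
    return fig, ax
```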

## Working with API Keys and Authentication

In [None]:
# Practice working with APIs that require authentication
# Learn best practices for handling API keys securely

import os
from urllib.parse import urlencode

class APIClient:
    """A generic API client with authentication support."""
    
    def __init__(self, base_url, api_key=None):
        """Initialize API client."""
        # TODO: store the base URL and the API key (read it from an environment variable if not given)
        pass
    
    def make_request(self, endpoint, params=None):
        """Make authenticated API request."""
        # TODO: send a GET request with authentication headers and return the parsed JSON
        pass
    
    def handle_rate_limit(self, response):
        """Handle API rate limiting gracefully."""
        # TODO: inspect the rate-limit headers and wait if the limit has been reached
        pass

# TODO: create a client for an API you want to use and make a test request
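
A sketch of the client, assuming a bearer-token `Authorization` header and GitHub-style `X-RateLimit-Remaining` / `X-RateLimit-Reset` headers (the reset value in epoch seconds); other APIs use different header names and auth schemes, so check the documentation of the API you call. The key is read from an environment variable (`API_KEY` is a placeholder name) so it never has to be typed into the notebook. The helper methods `build_url()`, `build_headers()`, and `seconds_until_reset()` are additions for this sketch, and the reset calculation is a static method so it can be checked with plain dictionaries.

```python
import os
import time


class APIClient:
    """A generic API client with authentication support (a sketch; header names are assumptions)."""

    def __init__(self, base_url, api_key=None, env_var="API_KEY"):
        """Store the base URL and the API key, falling back to an environment variable."""
        self.base_url = base_url.rstrip("/")
        # "API_KEY" is a placeholder name; keeping keys out of the notebook avoids leaking them.
        self.api_key = api_key or os.environ.get(env_var)

    def build_url(self, endpoint):
        return f"{self.base_url}/{endpoint.lstrip('/')}"

    def build_headers(self):
        headers = {"Accept": "application/json"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"  # assumed bearer-token scheme
        return headers

    def make_request(self, endpoint, params=None):
        """Make an authenticated GET request and return the decoded JSON."""
        import requests  # imported here so the helpers can be tried offline

        response = requests.get(self.build_url(endpoint), params=params,
                                headers=self.build_headers(), timeout=10)
        self.handle_rate_limit(response)
        response.raise_for_status()
        return response.json()

    @staticmethod
    def seconds_until_reset(headers, now=None):
        """Seconds to wait if the rate limit is used up, else 0 (GitHub-style headers assumed)."""
        remaining = headers.get("X-RateLimit-Remaining")
        reset = headers.get("X-RateLimit-Reset")
        if remaining is None or reset is None or int(remaining) > 0:
            return 0
        now = time.time() if now is None else now
        return max(0, int(reset) - int(now))

    def handle_rate_limit(self, response):
        """Sleep until the limit resets when the response says no requests remain."""
        wait = self.seconds_until_reset(response.headers)
        if wait:
            print(f"Rate limit reached; waiting {wait} seconds.")
            time.sleep(wait)
```

`handle_rate_limit()` runs after each response, so when a response reports zero requests remaining the client sleeps before returning and the next call starts after the reset. It does not retry a request that was already rejected; pairing the client with a retry helper covers that case.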

## Error Handling and Robust API Calls

In [None]:
# Practice robust error handling for web requests
# Learn to handle network errors, API errors, and data parsing issues

def robust_api_call(url, params=None, max_retries=3):
    """Make a robust API call with error handling and retries."""
    # TODO: catch network and HTTP errors and retry with an increasing delay
    pass

def validate_api_response(response_data, required_fields):
    """Validate that API response contains required fields."""
    # TODO: report which required fields are missing from the response
    pass

def safe_data_extraction(data, key_path):
    """Safely extract nested data from API responses."""
    # TODO: follow key_path through nested dicts and lists, returning a default if a step is missing
    pass

# TODO: try these helpers against a real API and against deliberately malformed data
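
A sketch of the three helpers. `robust_api_call()` retries with exponential backoff (1 s, 2 s, 4 s, ... by default) and takes the fetch and sleep functions as parameters, so it can be exercised with a fake fetcher instead of a live API; the `backoff`, `fetch`, and `sleep` parameters are additions for this sketch. It catches `OSError` because `requests.exceptions.RequestException` derives from it. As written it retries every HTTP error, including a 404 that will never succeed; a fuller version would retry only timeouts, connection errors, and 5xx or 429 responses. `validate_api_response()` returns the list of missing fields rather than a boolean, so the caller can report exactly what was absent.

```python
import time


def _get_json(url, params):
    """Default fetcher: GET the URL with requests and return the decoded JSON."""
    import requests  # imported here so the helpers can be tried offline with a fake fetcher

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def robust_api_call(url, params=None, max_retries=3, backoff=1.0, fetch=_get_json, sleep=time.sleep):
    """Call fetch(url, params), retrying with exponential backoff on network/HTTP errors."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return fetch(url, params)
        except OSError as error:  # requests' RequestException is a subclass of OSError (IOError)
            last_error = error
            if attempt < max_retries - 1:
                delay = backoff * 2 ** attempt  # 1s, 2s, 4s, ... with the default backoff
                print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay:.1f}s")
                sleep(delay)
    raise RuntimeError(f"All {max_retries} attempts to reach {url} failed") from last_error


def validate_api_response(response_data, required_fields):
    """Return the required fields missing from the response (empty list means valid)."""
    return [field for field in required_fields if field not in response_data]


def safe_data_extraction(data, key_path, default=None):
    """Follow key_path (e.g. ['items', 0, 'owner', 'login']) through nested dicts and lists."""
    current = data
    for key in key_path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current
```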

## Working with Large Datasets

In [None]:
# Practice techniques for handling large datasets from APIs
# Learn pagination, streaming, and memory-efficient processing

def paginated_api_call(base_url, params=None, page_size=100):
    """Handle paginated API responses to get all data."""
    # TODO: request one page at a time until a page comes back short or empty
    pass

def stream_large_dataset(url, chunk_size=1024):
    """Stream large files without loading everything into memory."""
    # TODO: download the response in chunks instead of all at once
    pass

def process_data_in_chunks(data, chunk_size=1000):
    """Process large datasets in manageable chunks."""
    # TODO: yield the data in slices of chunk_size items
    pass

# TODO: combine pagination and chunked processing on a large dataset
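
A sketch of the three techniques. Pagination assumes `page` and `per_page` query parameters and a JSON body with an `items` list, as GitHub's search API uses; many APIs use cursors or other names instead. Streaming uses requests' `stream=True` with `Response.iter_content()`, writing each chunk to a file; the extra `destination` parameter, like `max_pages` and `fetch`, is an addition to the stub signatures. Chunked processing uses `itertools.islice`, so it works on generators and file objects as well as lists.

```python
from itertools import islice


def _get_items(url, params):
    """Default page fetcher: GET a JSON body with an 'items' list (GitHub-style, assumed)."""
    import requests  # imported here so the helpers can be tried offline with a fake fetcher

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json().get("items", [])


def paginated_api_call(base_url, params=None, page_size=100, max_pages=10, fetch=_get_items):
    """Collect results page by page until a short page signals the end (or max_pages is hit)."""
    results = []
    for page in range(1, max_pages + 1):
        page_params = dict(params or {}, page=page, per_page=page_size)
        items = fetch(base_url, page_params)
        results.extend(items)
        if len(items) < page_size:
            break
    return results


def stream_large_dataset(url, destination, chunk_size=1024):
    """Download url to destination in chunks instead of holding the whole body in memory."""
    import requests

    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        with open(destination, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return destination


def process_data_in_chunks(data, chunk_size=1000):
    """Yield lists of up to chunk_size items from any iterable."""
    iterator = iter(data)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```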

## Data Caching and Performance

In [None]:
# Implement caching strategies to improve performance and reduce API calls
# Learn when and how to cache API responses

import pickle
import time
from pathlib import Path

class DataCache:
    """Simple file-based cache for API responses."""
    
    def __init__(self, cache_dir='cache', cache_duration=3600):
        """Initialize cache with directory and duration settings."""
        # TODO: create the cache directory and store the expiry duration (in seconds)
        pass
    
    def get(self, key):
        """Get cached data if it exists and is not expired."""
        # TODO: load the cached entry and return it unless it has expired
        pass
    
    def set(self, key, data):
        """Cache data with timestamp."""
        # TODO: save the data together with the current time
        pass
    
    def is_expired(self, timestamp):
        """Check if cached data has expired."""
        # TODO: compare the timestamp with the current time and the cache duration
        pass

# TODO: wrap an API call so repeated requests are served from the cache
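
A sketch of `DataCache`, storing each entry as a pickle file named after a SHA-256 hash of the key (so a URL can serve as the key) and holding `(timestamp, data)` pairs. `is_expired()` takes an optional `now` argument, an addition for this sketch, so expiry can be checked without waiting. Only unpickle files you created yourself: loading untrusted pickle data can execute arbitrary code. Because `get()` returns `None` on a miss, this sketch cannot distinguish a cached `None` from a missing entry.

```python
import hashlib
import pickle
import tempfile
import time
from pathlib import Path


class DataCache:
    """Simple file-based cache for API responses (a sketch, not production-hardened)."""

    def __init__(self, cache_dir="cache", cache_duration=3600):
        """Create the cache directory if needed; cache_duration is in seconds."""
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.cache_duration = cache_duration

    def _path_for(self, key):
        # Hash the key so that URLs and query strings become safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.pkl"

    def get(self, key):
        """Return cached data, or None if there is no entry or it has expired."""
        path = self._path_for(key)
        if not path.exists():
            return None
        with path.open("rb") as f:
            timestamp, data = pickle.load(f)
        if self.is_expired(timestamp):
            path.unlink()  # drop stale entries so they are not read again
            return None
        return data

    def set(self, key, data):
        """Store data together with the current time."""
        with self._path_for(key).open("wb") as f:
            pickle.dump((time.time(), data), f)

    def is_expired(self, timestamp, now=None):
        """True if more than cache_duration seconds have passed since timestamp."""
        now = time.time() if now is None else now
        return now - timestamp > self.cache_duration


# Demo in a throwaway directory so nothing is written into the project folder.
demo_key = "https://api.github.com/search/repositories?q=language:python"
demo_cache = DataCache(cache_dir=tempfile.mkdtemp(), cache_duration=3600)
demo_cache.set(demo_key, {"total_count": 0})
print(demo_cache.get(demo_key))  # {'total_count': 0}
```

A typical pattern: `data = cache.get(url)`; if that is `None`, call the API and then `cache.set(url, data)`.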

---

## Summary

Congratulations! You've completed all the exercises for Chapter 16 on Downloading Data. You should now be comfortable with:

**Key Concepts Practiced:**
- **Web APIs**: Making HTTP requests to retrieve data from online services
- **JSON Processing**: Parsing and extracting data from JSON API responses
- **CSV Data Handling**: Reading and processing comma-separated value files
- **Data Visualization**: Creating charts from real-world datasets
- **Error Handling**: Robust programming practices for network operations
- **Authentication**: Working with API keys and secure access methods

**Technical Skills Developed:**
- **HTTP Requests**: Using the requests library for web data access
- **API Integration**: Understanding REST APIs and response handling
- **Data Parsing**: Converting raw data into usable Python structures
- **Date/Time Processing**: Working with timestamps and time series data
- **Performance Optimization**: Caching, pagination, and efficient data processing
- **Security Best Practices**: Safe handling of API credentials and rate limiting

**Real-World Applications:**
- **Weather Analysis**: Processing meteorological data for insights
- **Social Media Analytics**: Analyzing trends and engagement metrics
- **Financial Data**: Stock prices, market trends, and economic indicators
- **Scientific Research**: Accessing research databases and datasets
- **Business Intelligence**: Integrating external data sources for analysis
- **IoT and Sensors**: Collecting and processing sensor data streams

**Programming Best Practices:**
- **Code Refactoring**: Breaking complex operations into reusable functions
- **Error Recovery**: Handling network failures and malformed data gracefully
- **Testing Strategies**: Validating data integrity and API responses
- **Memory Management**: Efficient processing of large datasets
- **Documentation**: Clear code comments and function documentation

**Professional Development:**
- **API Documentation**: Reading and understanding API specifications
- **Rate Limiting**: Respecting service limitations and usage policies
- **Data Ethics**: Understanding terms of service and data usage rights
- **Monitoring**: Tracking API usage and performance metrics
- **Scalability**: Designing systems that can handle growing data needs

**Next Steps:**
- Explore additional APIs in domains that interest you
- Practice with other data formats (XML, Protocol Buffers) and query languages such as GraphQL
- Learn about more advanced authentication methods (OAuth, JWT)
- Move on to Chapter 17: Working with APIs
- Consider building a complete data pipeline project

**Advanced Topics to Explore:**
- **Async Programming**: Using async/await for concurrent API calls
- **Database Integration**: Storing API data in databases for analysis
- **Data Streaming**: Real-time data processing and visualization
- **Machine Learning**: Using downloaded data for predictive modeling
- **Web Scraping**: Extracting data from websites without APIs

---

*Note: Working with external data sources is a fundamental skill in modern programming. The ability to integrate, process, and visualize real-world data opens up countless possibilities for meaningful applications. These skills are essential for data science, web development, scientific computing, and business applications. Keep practicing with different APIs and data sources to build your expertise!*