# Data Acquisition via API Calls (NewsAPI)

---

## Phase 1: Setup and Security

### Install Dependencies

In [1]:
# Install the necessary library for loading environment variables (like API keys)
!pip install python-dotenv

print("Dependencies installed successfully.")


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Dependencies installed successfully.


### Import Modules
Import all necessary Python libraries. Note that `requests` is the primary tool for submitting HTTP requests.

In [2]:
import numpy as np
import pandas as pd
import os
import requests
import json
import dotenv
from datetime import datetime

print("Modules imported.")

Modules imported.


### Load Environment Variables
`dotenv.load_dotenv()` reads the key-value pairs in your `.env` file into the process environment. This keeps your NewsAPI key out of the notebook, so the key is not exposed when the notebook is shared.

In [3]:
# Load the .env file to access environment variables (e.g., your API Key)
dotenv.load_dotenv()

print("Environment variables loaded.")

Environment variables loaded.


### Retrieve API Key
We access the stored API key with `os.getenv()`, which returns `None` if the variable is not set.

In [4]:
# Get the News API key from the environment variables
newskey = os.getenv('newskey')

# We won't print the key; a simple truthiness check confirms that it was loaded
if newskey:
    print("News API key successfully retrieved.")
else:
    print("ERROR: News API key not found. Check your .env file.")

News API key successfully retrieved.
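
The check above prints a message but lets the notebook continue even when the key is missing. As a minimal sketch of a stricter variant, the hypothetical helper `require_env` below raises an error instead. It assumes the `.env` file holds a line of the form `newskey=<your key>`; the demo uses a separate placeholder variable (`DEMO_NEWSKEY`, also hypothetical) so that a real key is never overwritten.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. Check your .env file."
        )
    return value


# Demonstration with a placeholder variable and value (hypothetical, not a real key).
os.environ["DEMO_NEWSKEY"] = "placeholder-key-for-demo"
demo_key = require_env("DEMO_NEWSKEY")

# Report only the length, never the key itself.
print(f"Key loaded ({len(demo_key)} characters).")
```

In the notebook itself, the equivalent call would be `newskey = require_env('newskey')`.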


---

## Phase 2: Building the API Request Components
The request in this notebook is assembled from four components: the URL root, the endpoint, the headers (for security and metadata), and the parameters (the query).

### Build User Agent Header
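Many APIs expect a User-agent header that identifies the client application. The httpbin.org endpoint `/user-agent` echoes back the User-agent string that our client sends, so we call it once and reuse the returned string in our own headers.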

In [5]:
# 1. Get a standard User-agent string
r = requests.get('https://httpbin.org/user-agent')
useragent = json.loads(r.text)['user-agent']

# 2. Build the full headers dictionary
# The API key is sent via the 'X-Api-Key' header, a common secure method.
headers = {'User-agent': useragent,
           'X-Api-Key': newskey}

print(f"User-agent: {useragent}")
print("Headers dictionary created, containing the API key.")

User-agent: python-requests/2.32.4
Headers dictionary created, containing the API key.
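
To make the parsing step concrete, here is a small offline sketch. The sample body is an assumption shaped like the response parsed above (its value is the User-agent printed in the output), and `DEMO-KEY` is a hypothetical placeholder rather than a real NewsAPI key. The sketch also shows one way to print the headers with the key masked.

```python
import json

# Sample body, shaped like the JSON that the cell above parses.
sample_body = '{"user-agent": "python-requests/2.32.4"}'

useragent = json.loads(sample_body)["user-agent"]

# "DEMO-KEY" is a hypothetical placeholder, not a real NewsAPI key.
headers = {"User-agent": useragent, "X-Api-Key": "DEMO-KEY"}

# Mask the key before printing so the secret never lands in notebook output.
masked = {k: ("***" if k == "X-Api-Key" else v) for k, v in headers.items()}
print(masked)
```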


### Define URL Root and Endpoint
This defines the fixed address for the NewsAPI service we want to use.

In [6]:
# Define the fixed URL parts
root = 'https://newsapi.org'
endpoint = '/v2/everything' # This endpoint searches all articles

print(f"API Base URL: {root + endpoint}")

API Base URL: https://newsapi.org/v2/everything


### Define Query Parameters
Parameters are used to customize the search (e.g., topic, language, date).

In [7]:
# Define the search parameters as a Python dictionary
params = {'q': '"tallest mountain"',  # Topic to search for (using quotes for exact phrase)
          'searchIn': 'content',      # Search within the article content
          'language': 'en',           # Restrict to English articles
          'pageSize': 100}            # Request up to 100 articles

print("Query parameters defined:")
print(params)

Query parameters defined:
{'q': '"tallest mountain"', 'searchIn': 'content', 'language': 'en', 'pageSize': 100}
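
It can help to see how a parameter dictionary turns into the query string at the end of a URL. The sketch below uses only the standard library to build that string by hand; `requests` does a similar encoding internally when it receives `params`, so this is for illustration rather than something the notebook needs to run.

```python
from urllib.parse import urlencode

root = "https://newsapi.org"
endpoint = "/v2/everything"
params = {
    "q": '"tallest mountain"',
    "searchIn": "content",
    "language": "en",
    "pageSize": 100,
}

# urlencode percent-encodes the double quotes (%22) and turns the space into '+'.
query_string = urlencode(params)
full_url = f"{root}{endpoint}?{query_string}"
print(full_url)
```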


---

## Phase 3: API Execution and Parsing

### Submit the GET Request
This cell sends the request and prints the response object. `<Response [200]>` indicates success; a 4xx or 5xx status code means the request was rejected or failed.

In [8]:
# Combine the components and submit the GET request
r = requests.get(root + endpoint,
                headers = headers,
                params = params)

# Display the response object (it should show <Response [200]>)
print(f"Request submitted. Status: {r}")

Request submitted. Status: <Response [200]>
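
The integer behind `<Response [200]>` is available as `r.status_code`. As a rough sketch of how such a code could be turned into a readable message, the hypothetical helper `describe_status` below uses the standard library's `http.HTTPStatus`. The codes in the loop are generic HTTP examples, not a list of what NewsAPI returns.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable description of a standard HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "other"
    return f"{code} {status.phrase} ({outcome})"


for code in (200, 401, 429, 500):
    print(describe_status(code))
```

In the notebook this would be called as `describe_status(r.status_code)`.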


### Parse the JSON Response
The response (`r.text`) is a single string containing the JSON data. We use `json.loads()` to convert this string into a usable Python dictionary.

In [9]:
# Convert the JSON response string into a Python dictionary
myjson = json.loads(r.text)

print(f"JSON response converted to a Python dictionary (Type: {type(myjson)})")
print("Top-level keys in the response:")
print(list(myjson.keys()))

JSON response converted to a Python dictionary (Type: <class 'dict'>)
Top-level keys in the response:
['status', 'totalResults', 'articles']
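
The top-level `status` key can be checked before the articles are used. The sketch below assumes that a successful payload carries `"status": "ok"` and that an error payload carries `code` and `message` fields; those values and field names are assumptions to verify against the NewsAPI documentation, and both payloads are made up for illustration. `extract_articles` is a hypothetical helper name.

```python
import json


def extract_articles(payload: dict) -> list:
    """Return the list of articles, or raise if the payload does not report success."""
    if payload.get("status") != "ok":
        # 'code' and 'message' are assumed field names for error payloads.
        raise ValueError(
            f"API reported an error: {payload.get('code')} - {payload.get('message')}"
        )
    return payload.get("articles", [])


# Illustrative payloads (made up for this sketch, not real API output).
ok_payload = json.loads(
    '{"status": "ok", "totalResults": 1, "articles": [{"title": "Example"}]}'
)
print(len(extract_articles(ok_payload)))

error_payload = {"status": "error", "code": "demoCode", "message": "demo message"}
try:
    extract_articles(error_payload)
except ValueError as err:
    error_text = str(err)
    print(error_text)
```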


### View Raw JSON Data Structure (Optional)
This cell is often left commented out to avoid printing a massive wall of text but serves as a way to inspect the data structure.

In [10]:
# Uncomment this line to inspect the full structure of the JSON dictionary
# print(json.dumps(myjson, indent=4))

### Normalize JSON to DataFrame
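The article records are nested under the `articles` key. The pandas function `json_normalize()` flattens each record into one row of a tabular DataFrame, so nested fields such as the source name become their own columns (`source.id`, `source.name`).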

In [11]:
# Use json_normalize to extract the list of article dictionaries ('articles')
news_df = pd.json_normalize(myjson, record_path = ['articles'])

print(f"DataFrame created with {len(news_df)} articles.")
print("First 5 rows of the DataFrame:")
display(news_df.head())

DataFrame created with 16 articles.
First 5 rows of the DataFrame:


| | author | title | description | url | urlToImage | publishedAt | content | source.id | source.name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Lydia Mansel | This Historic Train Climbs the Tallest Mountai... | Riding the Mount Washington Cog Railway is one... | https://www.travelandleisure.com/mount-washing... | https://s.yimg.com/ny/api/res/1.2/JmSXRiaGiWkv... | 2025-09-20T16:00:00Z | `Key Points\r\n<ul><li>The Mount Washington Cog...` | | Travel+Leisure |
| 1 | TMZ Staff | Hundreds of Mt. Everest Climbers Stuck on Moun... | Nearly 1,000 climbers were trapped on Mt. Ever... | https://www.tmz.com/2025/10/05/hundreds-mt-eve... | https://imagez.tmz.com/image/6c/16by9/2025/10/... | 2025-10-05T21:08:25Z | Nearly 1,000 climbers were trapped on Mt. Ever... | | TMZ |
| 2 | Lyndal Rowlands | More than 350 trekkers escape blizzard-hit Eve... | Rescued trekkers reach China's Qudang township... | https://www.aljazeera.com/news/2025/10/6/more-... | https://www.aljazeera.com/wp-content/uploads/2... | 2025-10-06T01:01:28Z | Rescuers have guided more than 350 people to s... | al-jazeera-english | Al Jazeera English |
| 3 | Express Web Desk | Polish skier becomes first to climb, ski down ... | In 2018, the Polish climber was the first pers... | https://indianexpress.com/article/world/polish... | https://images.indianexpress.com/2025/09/polan... | 2025-09-28T04:50:59Z | Polish skier Andrzej Bargiel made history this... | | The Indian Express |
| 4 | Georgie English | CLEAREST EVER SIGNS OF LIFE ON MARS... | NASA has revealed the clearest signs of life o... | https://www.the-sun.com/tech/15159152/nasa-mar... | https://www.the-sun.com/wp-content/uploads/sit... | 2025-09-10T17:34:37Z | NASA has revealed the clearest signs of life o... | | The-sun.com |
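
The `source.id` and `source.name` columns come from a nested `source` dictionary inside each article. To illustrate what flattening means, here is a minimal pure-Python sketch. It is not pandas' actual implementation, `flatten` is a hypothetical helper, and the sample article is abbreviated from the third row of the output above.

```python
def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dictionaries into one level, joining key names with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into the nested dictionary, carrying the parent name along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# One article in the nested shape implied by the columns above (fields abbreviated).
article = {
    "source": {"id": "al-jazeera-english", "name": "Al Jazeera English"},
    "author": "Lyndal Rowlands",
    "publishedAt": "2025-10-06T01:01:28Z",
}

print(flatten(article))
```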


---

## Phase 4: Data Analysis and Export

### Clean and Prepare for Export
Before saving, convert the `publishedAt` column from text to datetime values and keep only the columns needed in the output file.

In [12]:
# Clean up the publishedAt column (convert to datetime)
news_df['publishedAt'] = pd.to_datetime(news_df['publishedAt'])

# Select a final set of columns for the CSV
final_df = news_df[['publishedAt', 'title', 'description', 'url', 'source.name', 'author']].copy()

print("DataFrame prepared for export.")

DataFrame prepared for export.
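
Converting `publishedAt` from text to datetime values is what makes sorting and date arithmetic possible. As a pure-Python illustration of that conversion for single values (pandas applies an equivalent conversion to the whole column), the hypothetical helper below parses two timestamps taken from the output shown earlier. It assumes every timestamp has exactly this `...Z` layout.

```python
from datetime import datetime, timezone


def parse_published_at(value: str) -> datetime:
    """Parse a timestamp such as '2025-09-20T16:00:00Z' into a UTC-aware datetime."""
    naive = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing 'Z' means UTC, so attach that timezone explicitly.
    return naive.replace(tzinfo=timezone.utc)


first = parse_published_at("2025-09-20T16:00:00Z")
latest = parse_published_at("2025-10-06T01:01:28Z")

print(first.isoformat())
print((latest - first).days)  # whole days between the two articles
```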


### Export to CSV
Writing the DataFrame to a CSV file is the final Load (L) step of the data acquisition process.

In [13]:
# Define a filename based on the current date and time for organization
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_filename = f'news_articles_{timestamp}.csv'

# Export the final clean DataFrame to a CSV file
final_df.to_csv(output_filename, index=False)

print(f"Successfully exported {len(final_df)} articles to {output_filename}")

Successfully exported 16 articles to news_articles_20251007_221559.csv
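
The filename pattern is easier to check with a fixed datetime than with `datetime.now()`. The sketch below wraps the same `strftime` format in a hypothetical helper, `timestamped_filename`, and feeds it the date and time of the run shown above.

```python
from datetime import datetime


def timestamped_filename(prefix: str, when: datetime, ext: str = "csv") -> str:
    """Build a filename like 'prefix_YYYYMMDD_HHMMSS.ext' from a datetime."""
    return f"{prefix}_{when.strftime('%Y%m%d_%H%M%S')}.{ext}"


# A fixed datetime makes the result reproducible; it matches the run shown above.
example = timestamped_filename("news_articles", datetime(2025, 10, 7, 22, 15, 59))
print(example)
```

Because the year comes first and every field is zero-padded, files named this way sort chronologically when sorted alphabetically.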


---

## Function Definition (Optional Consolidation)
This final cell consolidates all steps into a single reusable function, showing how a script would run the entire API call process from start to finish.

In [14]:
def grab_latest_articles():
    """
    Consolidates all steps: builds request, calls API, parses JSON, and returns a DataFrame.
    """
    
    # Prompt the user
    topic = input("Please enter your topic of interest: ")
    
    # Build our Headers
    newskey = os.getenv('newskey')
    r = requests.get('https://httpbin.org/user-agent')
    useragent = json.loads(r.text)['user-agent']
    headers = {'User-agent': useragent,
               'X-Api-Key': newskey}

    # Build our URL and Parameters
    root = 'https://newsapi.org'
    endpoint = '/v2/everything'
    params = {'q': topic,
              'searchIn': 'content',
              'language': 'en',
              'pageSize': 100}
    
    # Submit our Request
    r = requests.get(root + endpoint,
                headers = headers,
                params = params)
    
    # Stop early on an HTTP error, then create and return the pandas dataframe
    r.raise_for_status()
    myjson = json.loads(r.text)
    news_df = pd.json_normalize(myjson, record_path = ['articles'])
    
    return news_df

In [15]:
grab_latest_articles()

| | author | title | description | url | urlToImage | publishedAt | content | source.id | source.name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jake Lahut | Melania Trump’s AI Era Is Upon Us | The ever elusive first lady has emerged with a... | https://www.wired.com/story/melania-trumps-ai-... | https://media.wired.com/photos/68bf2d04f603ec4... | 2025-09-10T15:00:00Z | Its unclear how Melanias initiative will follo... | wired | Wired |
| 1 | Charles Pulliam-Moore | How generative AI boosters are trying to break... | This is The Stepback, a weekly newsletter brea... | https://www.theverge.com/column/785975/hollywo... | https://platform.theverge.com/wp-content/uploa... | 2025-09-26T17:13:36Z | `<ul><li></li><li></li><li></li></ul>\r\nAI sta...` | the-verge | The Verge |
| 2 | Andrew J. Hawkins | Robotaxis as public transit? Waymo thinks so | Waymo is teaming up with tech transit startup ... | https://www.theverge.com/news/780156/waymo-via... | https://platform.theverge.com/wp-content/uploa... | 2025-09-17T18:16:29Z | `<ul><li></li><li></li><li></li></ul>\r\nThe co...` | the-verge | The Verge |
| 3 | Ed Cara | Trump Shares Bizarre AI Video Promising Magic ... | In a since-deleted Truth Social post, an AI ve... | https://gizmodo.com/trump-shares-bizarre-ai-vi... | https://gizmodo.com/app/uploads/2025/09/Trumpt... | 2025-09-29T16:10:59Z | President Donald Trump seems to have gone off ... | | Gizmodo.com |
| 4 | Gizmodo Team | Welcome to Gizmodo’s New Look | Cleaner, faster, and easier to battle it out i... | https://gizmodo.com/welcome-to-gizmodos-new-lo... | https://gizmodo.com/app/uploads/2025/09/Img-Ne... | 2025-09-25T13:44:02Z | Today were launching a better reading experien... | | Gizmodo.com |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | msmash | After Years of Resistance, Apple Might Finally... | An anonymous reader shares a report: After yea... | https://apple.slashdot.org/story/25/09/17/1662... | https://a.fsdn.com/sd/topics/applelaptop_64.png | 2025-09-17T16:06:00Z | After years of dismissing the idea of putting ... | | Slashdot.org |
| 96 | msmash | Pakistan Spying On Millions Through Phone-Tapp... | Pakistan has built surveillance systems that i... | https://tech.slashdot.org/story/25/09/09/11325... | https://a.fsdn.com/sd/topics/communications_64... | 2025-09-09T10:54:00Z | Pakistan has built surveillance systems that i... | | Slashdot.org |
| 97 | Reuters | Australian watchdog says budget retailer Kmart... | The Office of the Australian Information Commi... | https://www.yahoo.com/news/articles/australian... | https://s.yimg.com/cv/apiv2/social/images/yaho... | 2025-09-18T00:38:55Z | (Reuters) -An Australian regulator has found W... | | Yahoo Entertainment |
| 98 | | President Trump in UK for historic second stat... | The visit will see a crowded mix of royal page... | https://www.bbc.com/news/articles/cz9jyzl4532o | https://ichef.bbci.co.uk/news/1024/branded_new... | 2025-09-16T22:01:23Z | Sean CoughlanRoyal correspondent\r\nFirst Lady... | | BBC News |
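
Because the function above reads the topic with `input()`, it cannot easily be reused in a script or tested without a network connection. One possible refactoring, sketched below, moves the request-building step into a separate function that takes the topic and key as arguments and sends nothing. The name `build_newsapi_request`, the `exact_phrase` flag, and the `DEMO-KEY` value are all hypothetical; the sketch also leaves out the User-agent header, since `requests` sends its own default when none is given.

```python
def build_newsapi_request(topic: str, api_key: str, exact_phrase: bool = False):
    """Return (url, headers, params) for a search, without sending anything."""
    # Wrapping the topic in double quotes asks for an exact-phrase match.
    query = f'"{topic}"' if exact_phrase else topic
    url = "https://newsapi.org" + "/v2/everything"
    headers = {"X-Api-Key": api_key}
    params = {
        "q": query,
        "searchIn": "content",
        "language": "en",
        "pageSize": 100,
    }
    return url, headers, params


# "DEMO-KEY" is a hypothetical placeholder, not a real key.
url, headers, params = build_newsapi_request(
    "tallest mountain", "DEMO-KEY", exact_phrase=True
)
print(url)
print(params["q"])
```

The three returned values can then be passed to `requests.get(url, headers=headers, params=params)`, keeping the network call in one place.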
