# Data Acquisition via API Calls (NewsAPI)

---

## Phase 1: Setup and Security

### Install Dependencies

In [7]:
# Install the necessary library for loading environment variables (like API keys)
!pip install python-dotenv

print("Dependencies installed successfully.")


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Dependencies installed successfully.


### Import Modules
Import all necessary Python libraries. Note that `requests` is the primary tool for submitting HTTP requests.

In [8]:
import numpy as np
import pandas as pd
import os
import requests
import json
import dotenv
from datetime import datetime

print("Modules imported.")

Modules imported.


### Load Environment Variables
`load_dotenv()` reads the key-value pairs in your `.env` file and sets them as environment variables. This keeps your NewsAPI key out of the notebook itself, so the key is not exposed when the notebook is shared.

In [9]:
# Load the .env file to access environment variables (e.g., your API Key)
dotenv.load_dotenv()

print("Environment variables loaded.")

Environment variables loaded.


### Retrieve API Key
We access the stored API key using `os.getenv()`. The name passed to it (`newskey` here) must match the variable name used in the `.env` file.

In [10]:
# Get the News API key from the environment variables
newskey = os.getenv('newskey')

# We won't print the key; we only confirm that a non-empty value was loaded
if newskey:
    print("News API key successfully retrieved.")
else:
    print("ERROR: News API key not found. Check your .env file.")

News API key successfully retrieved.
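
The check above only prints a message, so later cells would still run with a missing key and fail at request time. As a minimal sketch, the lookup can be made to fail immediately, and a masked version of the key can be printed for confirmation. The helper names `require_env` and `mask` are hypothetical and not part of the original notebook.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            f"Check that your .env file contains a line such as {name}=<your key>."
        )
    return value

def mask(secret, visible=4):
    """Hide all but the last `visible` characters of a secret for safe printing."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

# Usage (assumes the key is stored under the name 'newskey', as in the cell above):
# newskey = require_env('newskey')
# print(f"Loaded key: {mask(newskey)}")
```

Raising an exception stops a "Run All" execution at the point where the problem actually is, rather than several cells later.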


---

## Phase 2: Building the API Request Components
The HTTP request in this notebook is assembled from four components: the URL root, the endpoint, the headers (for authentication and client metadata), and the parameters (the query).

### Build User Agent Header
Many APIs expect a User-agent header that identifies the client application. The httpbin.org `/user-agent` endpoint echoes back the User-agent that our own request sent, so this cell retrieves the default string that `requests` uses.

In [11]:
# 1. Get a standard User-agent string
r = requests.get('https://httpbin.org/user-agent')
useragent = json.loads(r.text)['user-agent']

# 2. Build the full headers dictionary
# The API key is sent via the 'X-Api-Key' header, which keeps it out of the URL.
headers = {'User-agent': useragent,
           'X-Api-Key': newskey}

print(f"User-agent: {useragent}")
print("Headers dictionary created, containing the API key.")

User-agent: python-requests/2.32.4
Headers dictionary created, containing the API key.
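
Because httpbin only echoes what the client sent, the same string is available locally through `requests.utils.default_user_agent()`, which avoids the extra network call. The sketch below wraps header construction in a hypothetical helper, `build_headers`, that also accepts a custom User-agent; the string `my-news-notebook/0.1` and the key `demo-key` are placeholders.

```python
def build_headers(api_key, user_agent=None):
    """Build the headers dictionary for the API request.

    If no user_agent is given, fall back to the default string used by requests.
    """
    if user_agent is None:
        # requests exposes its default User-Agent without any network round trip.
        from requests.utils import default_user_agent
        user_agent = default_user_agent()
    return {"User-Agent": user_agent, "X-Api-Key": api_key}

# Example with placeholder values; header names are case-insensitive in HTTP.
headers = build_headers("demo-key", user_agent="my-news-notebook/0.1")

# Print the headers without revealing the key.
print({k: ("<hidden>" if k == "X-Api-Key" else v) for k, v in headers.items()})
```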


### Define URL Root and Endpoint
This defines the fixed address for the NewsAPI service we want to use.

In [12]:
# Define the fixed URL parts
root = 'https://newsapi.org'
endpoint = '/v2/everything' # This endpoint searches all articles

print(f"API Base URL: {root + endpoint}")

API Base URL: https://newsapi.org/v2/everything


### Define Query Parameters
Parameters are used to customize the search (e.g., topic, language, date).

In [13]:
# Define the search parameters as a Python dictionary
params = {'q': '"tallest mountain"',  # Topic to search for (using quotes for exact phrase)
         'searchIn': 'content',      # Search within the article content
         'language': 'en',           # Restrict to English articles
         'pageSize': 100}            # Request up to 100 articles

print("Query parameters defined:")
print(params)

Query parameters defined:
{'q': '"tallest mountain"', 'searchIn': 'content', 'language': 'en', 'pageSize': 100}
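
To see how these parameters end up in the request, the query string can be previewed with the standard library. This is only an approximation of what `requests` builds internally; after a real request, `r.url` shows the URL that was actually sent.

```python
from urllib.parse import urlencode

root = "https://newsapi.org"
endpoint = "/v2/everything"
params = {
    "q": '"tallest mountain"',  # quotes request an exact phrase
    "searchIn": "content",
    "language": "en",
    "pageSize": 100,
}

# urlencode percent-encodes special characters (the quotes) and joins the pairs with '&'.
query_string = urlencode(params)
full_url = f"{root}{endpoint}?{query_string}"
print(full_url)
```

Note that the API key does not appear in this URL, since it travels in the headers instead.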


---

## Phase 3: API Execution and Parsing

### Submit the GET Request
This cell sends the request and checks the response status. A `<Response [200]>` indicates success.

In [14]:
# Combine the components and submit the GET request
r = requests.get(root + endpoint,
                headers = headers,
                params = params)

# Display the response object (it should show <Response [200]>)
print(f"Request submitted. Status: {r}")

Request submitted. Status: <Response [200]>
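
Printing the response object works for interactive use, but a script usually needs to stop on failure. The following is a minimal sketch, not the notebook's own method: `fetch_json` and its `get` parameter are hypothetical names, and the 10-second timeout is an arbitrary choice. It relies on the `timeout` argument, `raise_for_status()`, and `json()` from `requests`.

```python
def fetch_json(url, headers, params, get=None, timeout=10):
    """Send a GET request and return the decoded JSON body.

    `get` defaults to requests.get; passing another callable makes the
    function easy to exercise without network access.
    """
    if get is None:
        import requests
        get = requests.get
    # Without a timeout, a stalled connection could block the notebook indefinitely.
    response = get(url, headers=headers, params=params, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of continuing silently.
    response.raise_for_status()
    return response.json()

# Usage (performs a real request, so it is left commented out here):
# myjson = fetch_json(root + endpoint, headers, params)
```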


### Parse the JSON Response
The response body (`r.text`) is a single string containing the JSON data. We use `json.loads()` to convert this string into a usable Python dictionary; `r.json()` is an equivalent shortcut.

In [15]:
# Convert the JSON response string into a Python dictionary
myjson = json.loads(r.text)

print(f"JSON response converted to a Python dictionary (Type: {type(myjson)})")
print("Top-level keys in the response:")
print(list(myjson.keys()))

JSON response converted to a Python dictionary (Type: <class 'dict'>)
Top-level keys in the response:
['status', 'totalResults', 'articles']
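
The `status` key shown above can be checked before the articles are used. The sketch below assumes that an unsuccessful response carries `code` and `message` fields next to `status`; that assumption should be verified against the NewsAPI documentation. `extract_articles` is a hypothetical helper name.

```python
def extract_articles(payload):
    """Return the list of articles, or raise if the API reported a problem."""
    status = payload.get("status")
    if status != "ok":
        # 'code' and 'message' are assumed field names for error details.
        code = payload.get("code", "unknown")
        message = payload.get("message", "no message provided")
        raise ValueError(f"API reported status={status!r} ({code}): {message}")
    return payload.get("articles", [])

# Example with a hand-made payload shaped like the keys listed above.
sample = {"status": "ok", "totalResults": 1, "articles": [{"title": "Example"}]}
print(len(extract_articles(sample)))
```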


### View Raw JSON Data Structure (Optional)
This cell is often left commented out to avoid printing a massive wall of text but serves as a way to inspect the data structure.

In [16]:
# Uncomment this line to inspect the full structure of the JSON dictionary
# print(json.dumps(myjson, indent=4))

### Normalize JSON to DataFrame
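The key data is nested under the `articles` key. Pandas' `json_normalize()` function flattens this nested data into a tabular DataFrame, turning nested fields such as the article's `source` into separate columns (`source.id`, `source.name`).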

In [17]:
# Use json_normalize to extract the list of article dictionaries ('articles')
news_df = pd.json_normalize(myjson, record_path = ['articles'])

print(f"DataFrame created with {len(news_df)} articles.")
print("First 5 rows of the DataFrame:")
display(news_df.head())

DataFrame created with 21 articles.
First 5 rows of the DataFrame:


|  | author | title | description | url | urlToImage | publishedAt | content | source.id | source.name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Lydia Mansel | This Historic Train Climbs the Tallest Mountai... | Riding the Mount Washington Cog Railway is one... | https://www.travelandleisure.com/mount-washing... | https://s.yimg.com/ny/api/res/1.2/JmSXRiaGiWkv... | 2025-09-20T16:00:00Z | Key Points\r\n<ul><li>The Mount Washington Cog... |  | Travel+Leisure |
| 1 | Express Web Desk | Polish skier becomes first to climb, ski down ... | In 2018, the Polish climber was the first pers... | https://indianexpress.com/article/world/polish... | https://images.indianexpress.com/2025/09/polan... | 2025-09-28T04:50:59Z | Polish skier Andrzej Bargiel made history this... |  | The Indian Express |
| 2 | Georgie English | CLEAREST EVER SIGNS OF LIFE ON MARS... | NASA has revealed the clearest signs of life o... | https://www.the-sun.com/tech/15159152/nasa-mar... | https://www.the-sun.com/wp-content/uploads/sit... | 2025-09-10T17:34:37Z | NASA has revealed the clearest signs of life o... |  | The-sun.com |
| 3 | ABC News | Sudan landslide claims 1,000 lives, village 'c... | The Sudan Liberation Movement/Army is appealin... | https://www.abc.net.au/news/2025-09-02/sudan-l... | https://live-production.wcms.abc-cdn.net.au/73... | 2025-09-02T07:52:51Z | At least 1,000 people were killed in a landsli... | abc-news-au | ABC News (AU) |
| 4 | RTÉ News | Sudan landslide kills more than 1,000, says re... | A "massive" landslide in Sudan's western Darfu... | https://www.rte.ie/news/world/2025/0902/153131... | https://www.rte.ie/images/00231b4a-1600.jpg | 2025-09-02T06:17:44Z | A "massive" landslide in Sudan's western Darfu... | rte | RTE |
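
The `source.id` and `source.name` columns come from a nested `source` dictionary inside each article. As a rough pure-Python picture of that flattening step (an illustration only, not how pandas implements `json_normalize()`), a nested record can be turned into a flat one by joining keys with a dot. The function name `flatten` is hypothetical.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the parent key along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# A trimmed-down article shaped like the last row shown above.
article = {
    "source": {"id": "rte", "name": "RTE"},
    "author": "RTÉ News",
    "publishedAt": "2025-09-02T06:17:44Z",
}
row = flatten(article)
print(row)
```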


---

## Phase 4: Data Analysis and Export

### Clean and Prepare for Export
Before saving, convert the `publishedAt` strings to proper datetime values and keep only the columns needed in the exported file.

In [18]:
# Clean up the publishedAt column (convert to datetime)
news_df['publishedAt'] = pd.to_datetime(news_df['publishedAt'])

# Select a final set of columns for the CSV
final_df = news_df[['publishedAt', 'title', 'description', 'url', 'source.name', 'author']].copy()

print("DataFrame prepared for export.")

DataFrame prepared for export.


### Export to CSV
This is the final Load step (the "L" in Extract, Transform, Load) of the data acquisition process: the cleaned DataFrame is written to a CSV file.

In [19]:
# Define a filename based on the current date and time, so repeated runs do not overwrite each other
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_filename = f'news_articles_{timestamp}.csv'

# Export the final clean DataFrame to a CSV file
final_df.to_csv(output_filename, index=False)

print(f"Successfully exported {len(final_df)} articles to {output_filename}")

Successfully exported 21 articles to news_articles_20251001_160645.csv
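
The filename logic can be isolated in a small helper so that the same pattern is reused across exports. This is a sketch; `timestamped_name` is a hypothetical name, and passing `when` explicitly is only there to make the result reproducible.

```python
from datetime import datetime

def timestamped_name(prefix, when=None, ext="csv"):
    """Return a filename such as '<prefix>_YYYYMMDD_HHMMSS.<ext>'."""
    when = when or datetime.now()
    # A datetime accepts strftime codes directly inside an f-string format spec.
    return f"{prefix}_{when:%Y%m%d_%H%M%S}.{ext}"

# Reproduces the filename from the run shown above.
example_name = timestamped_name("news_articles", datetime(2025, 10, 1, 16, 6, 45))
print(example_name)
```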


---

## Function Definition (Optional Consolidation)
This final cell consolidates all of the steps into a single, reusable function, showing how a script would execute the entire API call process.

In [20]:
def grab_latest_articles():
    """
    Consolidates all steps: builds request, calls API, parses JSON, and returns a DataFrame.
    """
    
    # Prompt the user
    topic = input("Please enter your topic of interest: ")
    
    # Build our Headers
    newskey = os.getenv('newskey')
    r = requests.get('https://httpbin.org/user-agent')
    useragent = json.loads(r.text)['user-agent']
    headers = {'User-agent': useragent,
               'X-Api-Key': newskey}

    # Build our URL and Parameters
    root = 'https://newsapi.org'
    endpoint = '/v2/everything'
    params = {'q': topic,
              'searchIn': 'content',
              'language': 'en',
              'pageSize': 100}
    
    # Submit our Request
    r = requests.get(root + endpoint,
                     headers = headers,
                     params = params)
    r.raise_for_status()  # Raise an exception on HTTP error responses (4xx/5xx)
    
    # Create and return the pandas dataframe
    myjson = json.loads(r.text)
    news_df = pd.json_normalize(myjson, record_path = ['articles'])
    
    return news_df

In [21]:
grab_latest_articles()

|  | author | title | description | url | urlToImage | publishedAt | content | source.id | source.name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Ben Rich | Strong winds sweeping across UK - but why isn'... | There is a chance of travel disruption on Mond... | https://www.bbc.com/weather/articles/c4gwrg7ep58o | https://ichef.bbci.co.uk/ace/branded_weather/1... | 2025-09-15T10:08:41Z | The forecast for this week remains very unsett... |  | BBC News |
| 1 | Terrence O’Brien | Americans want AI to stay out of their persona... | A new study from Pew suggests that Americans a... | https://www.theverge.com/ai-artificial-intelli... | https://platform.theverge.com/wp-content/uploa... | 2025-09-17T18:00:28Z | <ul><li></li><li></li><li></li></ul>\r\nTheyre... | the-verge | The Verge |
| 2 | Chris Fawkes | Heavy rain set to continue across UK | The rain is set to continue into the weekend a... | https://www.bbc.com/weather/articles/cz69gpy99w7o | https://ichef.bbci.co.uk/ace/branded_weather/1... | 2025-09-17T14:32:46Z | Heavy rain is set to drench much of the UK ove... |  | BBC News |
| 3 | sanujb6@gmail.com (Sanuj Bhatia) , Sanuj Bhatia | Google is phasing out the Wear OS Weather app,... | Older Wear OS watches keep the existing Weathe... | https://www.androidcentral.com/apps-software/w... | https://cdn.mos.cms.futurecdn.net/cm3nSuJTMJFT... | 2025-09-12T19:01:00Z | What you need to know\r\n<ul><li>Google will s... |  | Android Central |
| 4 | Simon King | Is a UK heatwave in the weather forecast for S... | Simon King guides us through the weather we're... | https://www.bbc.com/weather/articles/c8jp4dm2mv2o | https://ichef.bbci.co.uk/ace/branded_weather/1... | 2025-09-01T11:57:31Z | The Met Office released the new set of storm n... |  | BBC News |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | Julian Chokkattu | Gear News of the Week: Veo 3 Comes to Google P... | Plus: The Polar Loop looks a lot like the Whoo... | https://www.wired.com/story/gear-news-of-the-w... | https://media.wired.com/photos/68bbbf8a0a1d5d5... | 2025-09-06T10:00:00Z | Google via Julian Chokkattu\r\nA few months ag... | wired | Wired |
| 96 | Sean McDowell | South Kansas City residents speak out after El... | Police say a 49-year old grocery store owner i... | https://fox4kc.com/news/south-kansas-city-resi... | https://media.zenfs.com/en/wdaf_articles_412/a... | 2025-09-15T22:43:52Z | KANSAS CITY, Mo. Police say a 49-year old groc... |  | WDAF FOX4 Kansas City |
| 97 | Dave Leval | GWU ups security after controversial Charlie K... | An employee at the Mount Vernon campus of Geor... | https://www.dcnewsnow.com/news/local-news/wash... | https://media.zenfs.com/en/wdvm_articles_310/2... | 2025-09-14T02:06:49Z | WASHINGTON (DC News Now) An employee at the Mo... |  | Dcnewsnow.com |
| 98 | weather.com meteorologists | Forecast Cone Of Uncertainty: Facts And Myths | Each tropical depression, storm and hurricane ... | https://weather.com/safety/hurricane/news/2024... | https://s.yimg.com/ny/api/res/1.2/FBfgO9Gz3Csv... | 2025-09-26T15:09:00Z | It's a staple of every hurricane season: the f... |  | The Weather Channel |
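
Each call returns a single page of at most `pageSize` articles, while the `totalResults` field in the response reports how many articles matched overall. NewsAPI accepts a `page` parameter for requesting further pages. The sketch below only computes how many pages a query would need; `pages_needed` is a hypothetical helper, and `max_results` is an assumed cap standing in for whatever limit the account's plan imposes.

```python
import math

def pages_needed(total_results, page_size=100, max_results=None):
    """Return how many pages are required to cover `total_results` articles."""
    if max_results is not None:
        # Respect an upper bound on retrievable results (assumed plan limit).
        total_results = min(total_results, max_results)
    return math.ceil(total_results / page_size)

# Example: a query reporting 250 matches at 100 articles per page.
for page in range(1, pages_needed(250) + 1):
    page_params = {"q": "weather", "pageSize": 100, "page": page}
    print(page_params)  # each dictionary would be passed as `params` in a request
```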
