# Process Raw News Data About Nvidia

This notebook processes raw JSON news data about Nvidia and related tech topics, sourced from TechCrunch articles. It:
1. Loads the raw JSON data
2. Normalizes the nested records into a pandas DataFrame and keeps a selected set of columns
3. Exports the processed data to CSV and Excel formats

## Import Required Libraries

In [5]:
import pandas as pd
from pathlib import Path

## Configuration

Define the input and output paths and the columns to keep in one place, so they can be changed without touching the processing code.

In [6]:
INPUT_PATH = Path('../data/raw/news_response_2025-04-11_2025-04-25.json')
OUTPUT_DIR = Path('../data/processed')
CSV_OUTPUT = OUTPUT_DIR / 'news_articles.csv'
EXCEL_OUTPUT = OUTPUT_DIR / 'news_articles.xlsx'

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

SELECTED_COLUMNS = ['source.name', 'author', 'title', 'description', 'content', 'url']

## Data Loading and Processing

Load the raw JSON news data and convert it to a structured DataFrame.

In [7]:
def load_news_data(file_path):
    """Load news data from JSON file and extract articles."""
    try:
        raw_data = pd.read_json(file_path)
        return raw_data['articles']
    except Exception as e:
        print(f"Error loading news data: {e}")
        return None

# Load raw news data
articles = load_news_data(INPUT_PATH)

# Process into DataFrame
if articles is not None:
    df_news = pd.json_normalize(articles)
    df_selected = df_news[SELECTED_COLUMNS]
    
    # Display info about the loaded data
    source_counts = df_selected['source.name'].value_counts()
    print(f"Successfully loaded data with {len(df_selected)} news articles from {', '.join(source_counts.index)}.")

Successfully loaded data with 10 news articles from TechCrunch.
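
To illustrate why the source column is named `source.name`, here is a minimal sketch of how `pd.json_normalize` flattens nested records. The sample records are hypothetical, shaped like the `articles` list this notebook appears to assume; the values are placeholders, not data from the real file.

```python
import pandas as pd

# Hypothetical records mimicking the assumed structure of the raw file's
# "articles" list; the values are placeholders, not real articles.
sample_articles = [
    {
        "source": {"id": "techcrunch", "name": "TechCrunch"},
        "author": "Author A",
        "title": "Title A",
        "url": "https://example.com/a",
    },
    {
        "source": {"id": "techcrunch", "name": "TechCrunch"},
        "author": "Author B",
        "title": "Title B",
        "url": "https://example.com/b",
    },
]

# Nested dicts become dot-separated column names:
# "source" -> "source.id" and "source.name".
df_sample = pd.json_normalize(sample_articles)
print(sorted(df_sample.columns))
```

Each nested key gets its own column, which is why `SELECTED_COLUMNS` refers to `'source.name'` rather than `'source'`.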


## Preview the Processed Data

In [11]:
df_selected

| | source.name | author | title | description | content | url |
|---|---|---|---|---|---|---|
| 0 | TechCrunch | Anthony Ha | Trump exempts smartphones, laptops, and semico... | The Trump administration is carving out big ta... | The Trump administration is carving out big ta... | https://techcrunch.com/2025/04/12/trump-exempt... |
| 1 | TechCrunch | Rebecca Szkutak | US government imposes license requirement on N... | Nvidia's H20 was the most advanced AI chip the... | Semiconductor giant Nvidia is facing unexpecte... | https://techcrunch.com/2025/04/15/nvidia-h20-c... |
| 2 | TechCrunch | Kyle Wiggers | Nvidia says it plans to manufacture some AI ch... | Nvidia says that it has commissioned more than... | Nvidia says it has commissioned more than a mi... | https://techcrunch.com/2025/04/14/nvidia-says-... |
| 3 | TechCrunch | Ingrid Lunden | Hammerspace, an unstructured data wrangler use... | Artificial intelligence services at their hear... | Artificial intelligence services at their hear... | https://techcrunch.com/2025/04/16/hammerspace-... |
| 4 | TechCrunch | Rebecca Szkutak | Here are the 19 US AI startups that have raise... | U.S.-based AI startups continue to rake in ven... | Last year was monumental for the AI industry i... | https://techcrunch.com/2025/04/23/here-are-the... |
| 5 | TechCrunch | Kyle Wiggers | AMD takes $800M charge on US license requireme... | AMD says that the U.S. government's license co... | AMD says that the U.S. government’s license co... | https://techcrunch.com/2025/04/16/amd-takes-80... |
| 6 | TechCrunch | Kyle Wiggers | A Chinese AI video startup appears to be block... | A Chinese startup, Sand AI, appears to be bloc... | A China-based startup, Sand AI, has released a... | https://techcrunch.com/2025/04/22/a-chinese-ai... |
| 7 | TechCrunch | Karyne Levy | Week in Review: Google loses a major antitrust... | Welcome back to Week in Review! We've got tons... | Welcome back to Week in Review! We’ve got tons... | https://techcrunch.com/2025/04/19/week-in-revi... |
| 8 | TechCrunch | Kate Park | RLWRLD raises $14.8M to build a foundational m... | As robotics has advanced, industry has steadil... | As robotics has advanced, industry has steadil... | https://techcrunch.com/2025/04/14/rlwrld-raise... |
| 9 | TechCrunch | Dominic-madori Davis | Here are all the tech companies rolling back D... | Companies around America have started cutting ... | Companies around America have started cutting ... | https://techcrunch.com/2025/04/17/here-are-all... |
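
Beyond a visual preview, a few basic quality checks may be worth running before export. The following is a sketch only, not part of the original notebook: `summarize_quality` is a hypothetical helper, the demo frame holds placeholder values, and treating `url` as the deduplication key is an assumption.

```python
import pandas as pd


def summarize_quality(df, key="url"):
    """Return basic quality indicators for a DataFrame of articles."""
    return {
        "rows": len(df),
        # Number of missing values in each column.
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows whose key repeats an earlier row's key.
        "duplicate_keys": int(df.duplicated(subset=[key]).sum()),
    }


# Hypothetical frame with one missing author pair and one repeated URL.
df_demo = pd.DataFrame(
    {
        "title": ["Title A", "Title B", "Title B"],
        "author": ["Author A", None, None],
        "url": [
            "https://example.com/a",
            "https://example.com/b",
            "https://example.com/b",
        ],
    }
)

report = summarize_quality(df_demo)
print(report)

# Keep the first occurrence of each URL and renumber the index.
df_clean = df_demo.drop_duplicates(subset=["url"]).reset_index(drop=True)
```

On the real data, the same helper could be called as `summarize_quality(df_selected)` once the loading cell has run.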


## Export Processed Data

Save the processed data to CSV and Excel formats for further analysis.

In [9]:
def export_data(df, csv_path, excel_path):
    """Export dataframe to CSV and Excel formats."""
    try:
        # Export to CSV
        df.to_csv(csv_path, index=False, encoding='utf-8')
        
        # Export to Excel (requires an Excel writer engine such as openpyxl)
        df.to_excel(excel_path, index=False)
        
        return True
    except Exception as e:
        print(f"Error exporting data: {e}")
        return False

# Export the data
if export_data(df_selected, CSV_OUTPUT, EXCEL_OUTPUT):
    print("Data successfully exported to:")
    print(f"- CSV: {CSV_OUTPUT}")
    print(f"- Excel: {EXCEL_OUTPUT}")

Data successfully exported to:
- CSV: ../data/processed/news_articles.csv
- Excel: ../data/processed/news_articles.xlsx
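
A successful write does not by itself confirm that the file reads back as expected. As a hedged sketch, a CSV round trip can be checked as shown below. It uses a temporary directory and a hypothetical two-row frame, so it does not touch the real output files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical placeholder data standing in for the processed articles.
df_demo = pd.DataFrame(
    {
        "title": ["Title A", "Title B"],
        "url": ["https://example.com/a", "https://example.com/b"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "demo.csv"
    # Same options as the export cell: no index column, UTF-8 encoding.
    df_demo.to_csv(csv_path, index=False, encoding="utf-8")
    df_back = pd.read_csv(csv_path, encoding="utf-8")

# Compare shape, column order, and cell values after the round trip.
round_trip_ok = (
    df_back.shape == df_demo.shape
    and list(df_back.columns) == list(df_demo.columns)
    and df_back.values.tolist() == df_demo.values.tolist()
)
print(f"Round trip OK: {round_trip_ok}")
```

The same comparison could be applied to `CSV_OUTPUT` and `df_selected`, bearing in mind that missing values read back as `NaN` and would need separate handling.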


## Conclusion

This notebook has:
1. Loaded raw news data from a JSON file
2. Normalized the nested records into a structured DataFrame and selected the relevant columns
3. Reported the article count and news sources
4. Exported the processed data to CSV and Excel formats for further use

The processed data includes 10 TechCrunch articles from April 2025, 2 of which relate directly to Nvidia. It can now be used for downstream work such as sentiment analysis, or combined into a larger dataset for tech news tracking.
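
One way to check a figure such as the number of Nvidia-related articles is a case-insensitive keyword match over the text columns. This is a rough proxy and not necessarily how the count above was obtained. `count_mentions` is a hypothetical helper, and the demo rows are placeholders rather than the real articles.

```python
import pandas as pd


def count_mentions(df, keyword, columns=("title", "description")):
    """Count rows where keyword appears (case-insensitive) in any given column."""
    mask = pd.Series(False, index=df.index)
    for col in columns:
        # Treat missing text as empty so the match never yields NaN.
        mask |= df[col].fillna("").str.contains(keyword, case=False, regex=False)
    return int(mask.sum())


# Hypothetical rows: two mention the keyword, one does not.
df_demo = pd.DataFrame(
    {
        "title": ["Nvidia placeholder headline", "Other headline", "Third headline"],
        "description": ["Some text", "Text about NVIDIA chips", None],
    }
)

nvidia_count = count_mentions(df_demo, "nvidia")
print(f"Articles mentioning the keyword: {nvidia_count}")
```

On the real data, `count_mentions(df_selected, "Nvidia")` would give a comparable number, though a keyword match can miss articles that discuss a company without naming it in these columns.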