# JSON Basics for Data Science

Welcome to this hands-on tutorial on JSON (JavaScript Object Notation)! JSON is a lightweight data format that's widely used for data exchange and storage in data science projects.

## Learning Objectives
By the end of this notebook, you will be able to:
- Create and manipulate JSON objects in Python
- Work with nested JSON data structures
- Convert JSON data to Pandas DataFrames
- Save and load JSON files
- Convert between JSON strings and Python objects

Let's get started!

## Part 1: Creating Simple JSON Objects

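When Python's built-in `json` module parses JSON, it turns JSON objects into dictionaries. The other JSON types map just as directly:
- arrays become lists
- strings become `str`
- numbers become `int` or `float`
- `true`/`false` become `True`/`False`
- `null` becomes `None`

A dictionary only becomes JSON text when you serialize it with `json.dumps`. That is why `False` appears as `false` in the output below. Let's create a simple JSON object representing a person.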

In [33]:
import json

# Create a simple JSON object
person = {
    "name": "Alice Johnson",
    "age": 28,
    "city": "New York",
    "is_student": False
}

print("Person object:")
print(json.dumps(person, indent=2))

Person object:
{
  "name": "Alice Johnson",
  "age": 28,
  "city": "New York",
  "is_student": false
}
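To see the type mapping in both directions, here is a small sketch that serializes one value of each common Python type and parses it back. The `sample` dictionary is made up for illustration. Note two details that often surprise people: tuples come back as lists, and non-string keys are converted to strings.

```python
import json

# One Python value of each common type (illustrative sample data)
sample = {
    "none": None,
    "flag": True,
    "count": 3,
    "ratio": 0.5,
    "tags": ("a", "b"),  # a tuple
}

encoded = json.dumps(sample)
print(encoded)
# {"none": null, "flag": true, "count": 3, "ratio": 0.5, "tags": ["a", "b"]}

decoded = json.loads(encoded)
print(decoded["tags"])  # ['a', 'b']  (the tuple comes back as a list)

# JSON object keys must be strings, so json.dumps converts int keys to str
print(json.loads(json.dumps({1: "x"})))  # {'1': 'x'}
```

Because of these conversions, a round trip through JSON does not always give back exactly the object you started with.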


### TODO 1: Create Your Own JSON Object

Create a JSON object representing a book with the following properties:
- title
- author
- year_published
- pages
- is_available

In [34]:
# TODO: Create your book JSON object here
book = {
    # Add your properties here
}

print("Your book object:")
print(json.dumps(book, indent=2))

Your book object:
{}


## Part 2: Working with Nested JSON Data

Real-world JSON data often contains nested structures: objects within objects, and arrays (lists) of objects. To reach a nested value, chain dictionary keys and list indices.

In [35]:
# Create nested JSON data representing a company
company = {
    "name": "TechCorp",
    "founded": 2010,
    "headquarters": {
        "city": "San Francisco",
        "state": "CA",
        "address": "123 Tech Street"
    },
    "employees": [
        {
            "name": "Alice Johnson",
            "role": "Data Scientist",
            "years": 3
        },
        {
            "name": "Bob Smith",
            "role": "Engineer",
            "years": 5
        },
        {
            "name": "Carol White",
            "role": "Designer",
            "years": 2
        }
    ]
}

print("Company data:")
print(json.dumps(company, indent=2))

# Accessing nested data
print(f"\nHeadquarters city: {company['headquarters']['city']}")
print(f"First employee name: {company['employees'][0]['name']}")

Company data:
{
  "name": "TechCorp",
  "founded": 2010,
  "headquarters": {
    "city": "San Francisco",
    "state": "CA",
    "address": "123 Tech Street"
  },
  "employees": [
    {
      "name": "Alice Johnson",
      "role": "Data Scientist",
      "years": 3
    },
    {
      "name": "Bob Smith",
      "role": "Engineer",
      "years": 5
    },
    {
      "name": "Carol White",
      "role": "Designer",
      "years": 2
    }
  ]
}

Headquarters city: San Francisco
First employee name: Alice Johnson
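Chained indexing like `company['headquarters']['city']` raises a `KeyError` if any key is missing, which is common in real API data. The sketch below uses a trimmed copy of the company data. It shows `dict.get` with defaults and list comprehensions over a nested list. The `"zip"` key is hypothetical and deliberately absent, to show the fallback.

```python
# A trimmed copy of the company data above
company = {
    "name": "TechCorp",
    "headquarters": {"city": "San Francisco", "state": "CA"},
    "employees": [
        {"name": "Alice Johnson", "role": "Data Scientist", "years": 3},
        {"name": "Bob Smith", "role": "Engineer", "years": 5},
    ],
}

# dict.get returns a default instead of raising KeyError for a missing key
zip_code = company.get("headquarters", {}).get("zip", "unknown")
print(zip_code)  # unknown

# A list comprehension pulls one field out of every object in a nested list
names = [emp["name"] for emp in company["employees"]]
print(names)  # ['Alice Johnson', 'Bob Smith']

# Filter nested records with a condition, tolerating missing "years" keys
senior = [emp["name"] for emp in company.get("employees", []) if emp.get("years", 0) >= 5]
print(senior)  # ['Bob Smith']
```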


### TODO 2: Create Nested JSON Structure

Create a JSON object representing a school with:
- name
- location (nested object with city and state)
- students (list of at least 3 student objects, each with name, grade, and age)

In [36]:
# TODO: Create your school JSON object here
school = {
    # Add your structure here
}

print("Your school data:")
print(json.dumps(school, indent=2))

Your school data:
{}


## Part 3: Converting JSON to Pandas DataFrames

Pandas is a powerful data analysis library. A list of flat JSON objects (often called records) converts directly into a DataFrame: each object becomes a row and each key becomes a column.

In [37]:
import pandas as pd

# Convert the employees list to a DataFrame
employees_df = pd.DataFrame(company['employees'])

print("Employees DataFrame:")
print(employees_df)

# Calculate some statistics
print(f"\nAverage years of experience: {employees_df['years'].mean():.1f}")
print(f"Total employees: {len(employees_df)}")

Employees DataFrame:
            name            role  years
0  Alice Johnson  Data Scientist      3
1      Bob Smith        Engineer      5
2    Carol White        Designer      2

Average years of experience: 3.3
Total employees: 3
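`pd.DataFrame` works well when the records are flat. If you also want fields from the parent object on every row, pandas provides `pd.json_normalize` (available since pandas 1.0). Examples are the company name or the headquarters city. The sketch below uses a trimmed copy of the company data. `meta_prefix` is needed here because both the company and each employee have a `"name"` key.

```python
import pandas as pd

# A trimmed copy of the company data from Part 2
company = {
    "name": "TechCorp",
    "headquarters": {"city": "San Francisco", "state": "CA"},
    "employees": [
        {"name": "Alice Johnson", "role": "Data Scientist", "years": 3},
        {"name": "Bob Smith", "role": "Engineer", "years": 5},
        {"name": "Carol White", "role": "Designer", "years": 2},
    ],
}

# record_path selects the list that becomes the rows; meta copies parent
# fields onto every row. A nested meta field is given as a list of keys.
flat = pd.json_normalize(
    company,
    record_path="employees",
    meta=["name", ["headquarters", "city"]],
    meta_prefix="company_",  # avoids a clash between employee "name" and company "name"
)
print(flat)
```

The nested path `["headquarters", "city"]` becomes a dotted column name, so the resulting columns are `name`, `role`, `years`, `company_name`, and `company_headquarters.city`.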


### TODO 3: Convert Your Data to DataFrame

Convert your school's students list to a Pandas DataFrame and calculate:
- The average age of students
- The total number of students

In [38]:
# TODO: Convert students to DataFrame and calculate statistics
# students_df = pd.DataFrame(...)


## Part 4: Saving and Loading JSON Files

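Let's save JSON data to a file and load it back. `json.dump` writes a Python object to an open file as JSON text, and `json.load` parses that text back into Python objects.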

In [39]:
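import os

# Make sure the output folder exists before writing to it
os.makedirs('output', exist_ok=True)

# Save JSON to a file
with open('output/02_company_data.json', 'w') as f:
    json.dump(company, f, indent=2)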

print("Data saved to output/02_company_data.json")

# Load JSON from a file
with open('output/02_company_data.json', 'r') as f:
    loaded_company = json.load(f)

print("\nLoaded data:")
print(f"Company name: {loaded_company['name']}")
print(f"Number of employees: {len(loaded_company['employees'])}")

Data saved to output/02_company_data.json

Loaded data:
Company name: TechCorp
Number of employees: 3
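Two details matter once your data contains non-English text: the file encoding and whether non-ASCII characters are escaped. Here is a slightly more defensive sketch of the same save-and-load round trip. It uses `pathlib` and a temporary folder so it does not touch your project files. The file name and the sample record are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Illustrative record with non-ASCII characters
record = {"name": "José García", "city": "São Paulo", "scores": [88, 92]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output" / "record.json"
    # Create any missing parent folders before writing
    path.parent.mkdir(parents=True, exist_ok=True)

    # ensure_ascii=False writes "São Paulo" as-is instead of "S\u00e3o Paulo";
    # an explicit UTF-8 encoding keeps the file readable on every OS
    with path.open("w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, ensure_ascii=False)

    with path.open("r", encoding="utf-8") as f:
        loaded = json.load(f)

    raw_text = path.read_text(encoding="utf-8")

# Because every value is a JSON-native type, the round trip gives back an equal dict
print(loaded == record)  # True
print("São Paulo" in raw_text)  # True
```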


### TODO 4: Save and Load Your School Data

1. Save your school object to a file called 'school_data.json'
2. Load it back and verify it loaded correctly by printing the school name and number of students

In [40]:
# TODO: Save your school data

# TODO: Load it back and verify


## Part 5: Working with JSON Strings

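Sometimes we receive JSON data as strings (e.g., from APIs). `json.dumps` and `json.loads` are the string counterparts of `json.dump` and `json.load`; the trailing *s* stands for *string*. Let's practice converting between JSON strings and Python objects.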

In [41]:
# Convert Python object to JSON string
json_string = json.dumps(person)
print("JSON string:")
print(json_string)
print(f"Type: {type(json_string)}")

# Convert JSON string back to Python object
person_from_string = json.loads(json_string)
print("\nPython object:")
print(person_from_string)
print(f"Type: {type(person_from_string)}")

JSON string:
{"name": "Alice Johnson", "age": 28, "city": "New York", "is_student": false}
Type: <class 'str'>

Python object:
{'name': 'Alice Johnson', 'age': 28, 'city': 'New York', 'is_student': False}
Type: <class 'dict'>
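Strings from the outside world are not always valid JSON. A common case is Python-style single quotes instead of the double quotes JSON requires. `json.loads` raises `json.JSONDecodeError` (a subclass of `ValueError`) in that case. The sketch below wraps it in a small helper; `parse_json` is a hypothetical name, not part of the `json` module.

```python
import json

def parse_json(text):
    """Return the parsed object, or None if text is not valid JSON (hypothetical helper)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno, and msg describe where and why parsing failed
        print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return None

print(parse_json('{"name": "Alice", "age": 28}'))  # {'name': 'Alice', 'age': 28}
print(parse_json("{'name': 'Alice'}"))  # prints an error message, then None
```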


## Challenge: Real-World Data

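Create a JSON structure representing a weather forecast for 5 days. Each day should include the following keys. The starter code reads `date`, `high`, and `low` by these exact names.
- `date`
- `high`: high temperature (°F)
- `low`: low temperature (°F)
- `conditions` (e.g., "sunny", "rainy")
- `humidity`: humidity percentage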

Then:
1. Convert it to a DataFrame
2. Calculate the average high and low temperatures
3. Save it to a file
4. Create a simple visualization (you can share this output!)

In [42]:
# TODO: Create your weather forecast
import matplotlib.pyplot as plt

weather_forecast = {
    "location": "Your City",
    "forecast": [
        # Add your 5-day forecast here. Each day needs at least "date", "high", and "low", e.g.
        # {"date": "2024-06-01", "high": 78, "low": 61, "conditions": "sunny", "humidity": 45},
    ]
}

# Convert to DataFrame
forecast_df = pd.DataFrame(weather_forecast['forecast'])
print(forecast_df)

if forecast_df.empty:
    # Stop here instead of raising KeyError: 'high' on an unfilled forecast
    print("\nNo forecast data yet: add your 5 days above, then re-run this cell.")
else:
    # Calculate averages
    print(f"\nAverage high: {forecast_df['high'].mean():.1f}°F")
    print(f"Average low: {forecast_df['low'].mean():.1f}°F")

    # Save to file
    with open('weather_forecast.json', 'w') as f:
        json.dump(weather_forecast, f, indent=2)

    # Simple visualization
    plt.figure(figsize=(10, 6))
    plt.plot(forecast_df['date'], forecast_df['high'], marker='o', label='High', color='red')
    plt.plot(forecast_df['date'], forecast_df['low'], marker='o', label='Low', color='blue')
    plt.xlabel('Date')
    plt.ylabel('Temperature (°F)')
    plt.title(f"5-Day Weather Forecast - {weather_forecast['location']}")
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

Empty DataFrame
Columns: []
Index: []

No forecast data yet: add your 5 days above, then re-run this cell.
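If you store real date objects in the forecast (for example `datetime.date` values instead of strings), `json.dump` raises a `TypeError`. The JSON standard has no date type. One common workaround is `default=str`, which converts each date to an ISO string such as `"2024-06-01"`. After loading, `pd.to_datetime` turns those strings back into datetimes for plotting. The temperatures below are made-up sample values, and `default=str` is one option among several (a custom converter function also works).

```python
import json
from datetime import date, timedelta

import pandas as pd

# Made-up sample forecast that uses date objects instead of strings
start = date(2024, 6, 1)
days = [
    {"date": start + timedelta(days=i), "high": 75 + i, "low": 60 + i}
    for i in range(5)
]

# Without help, json.dumps cannot serialize date objects
try:
    json.dumps(days)
except TypeError as err:
    print("Cannot serialize:", err)

# default=str is called for any object json does not know how to encode
text = json.dumps(days, default=str)

# After loading, dates are plain strings; pd.to_datetime converts them back
df = pd.DataFrame(json.loads(text))
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
```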

## Summary

Congratulations! You've learned:
- ✅ How to create JSON objects in Python
- ✅ How to work with nested JSON structures
- ✅ How to convert JSON to Pandas DataFrames
- ✅ How to save and load JSON files
- ✅ How to work with JSON strings

Working with JSON is a fundamental data science skill. You'll use it frequently when:
- Working with APIs
- Storing configuration data
- Exchanging data between systems
- Archiving structured data

**Share your outputs!** Take screenshots of your DataFrames and visualizations to show what you've learned.