# Import data from JSON files

Load structured data from JSON files into Pixeltable tables for processing and analysis.


## Problem

You have data in JSON format from APIs, exports, or application logs. You need to load it so you can process it with AI models or combine it with other data sources.

| Source | Records | Use case |
|--------|---------|----------|
| api_response.json | 1,000 | Analyze API data |
| user_events.json | 50,000 | Process event logs |
| products.json | 500 | Enrich with AI descriptions |


## Solution

**What's in this recipe:**
- Import JSON files directly into tables
- Import from URLs (APIs, remote files)
- Handle nested JSON structures

Use `pxt.io.import_json()` to create a table from a JSON file or URL. The JSON must be a top-level array of objects; each object becomes a row.


### Setup


In [None]:
%pip install -qU pixeltable


In [None]:
import pixeltable as pxt
import json
import tempfile
from pathlib import Path


### Create sample JSON file

First, create a sample JSON file to demonstrate the import process:


In [None]:
# Create sample JSON data (array of objects)
sample_data = [
    {'id': 1, 'title': 'Introduction to ML', 'author': 'Alice', 'tags': ['ml', 'intro'], 'rating': 4.5},
    {'id': 2, 'title': 'Deep Learning Basics', 'author': 'Bob', 'tags': ['dl', 'neural'], 'rating': 4.8},
    {'id': 3, 'title': 'NLP Fundamentals', 'author': 'Carol', 'tags': ['nlp', 'text'], 'rating': 4.2},
    {'id': 4, 'title': 'Computer Vision', 'author': 'Dave', 'tags': ['cv', 'images'], 'rating': 4.6},
    {'id': 5, 'title': 'Reinforcement Learning', 'author': 'Eve', 'tags': ['rl', 'agents'], 'rating': 4.3}
]

# Save to temporary JSON file
temp_dir = tempfile.mkdtemp()
json_path = Path(temp_dir) / 'articles.json'

with open(json_path, 'w') as f:
    json.dump(sample_data, f, indent=2)

print(f'Created: {json_path}')
print(json.dumps(sample_data[:2], indent=2))


### Import JSON file

Use `import_json` to create a table directly from a JSON file:


In [None]:
# Start from a clean directory (force=True also drops any existing 'json_demo' contents)
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')


In [None]:
# Import JSON file into a new table
articles = pxt.io.import_json(
    'json_demo.articles',
    filepath_or_url=str(json_path)
)


In [None]:
# View imported data
articles.collect()


### Import from URL

You can also import JSON directly from a URL, which is useful for APIs and remote files:


In [None]:
# Import from a public JSON URL
# Using JSONPlaceholder API as an example
posts = pxt.io.import_json(
    'json_demo.posts',
    filepath_or_url='https://jsonplaceholder.typicode.com/posts'
)


In [None]:
# View first few rows
posts.head(5)


### Import from Python dictionaries

Use `import_rows` to import a list of Python dictionaries directly, which is useful when the data is already in memory:


In [None]:
# Import from a list of dictionaries
events = [
    {'event': 'page_view', 'user_id': 101, 'timestamp': '2024-01-15T10:30:00'},
    {'event': 'click', 'user_id': 101, 'timestamp': '2024-01-15T10:31:00'},
    {'event': 'purchase', 'user_id': 102, 'timestamp': '2024-01-15T10:32:00'},
]

event_table = pxt.io.import_rows('json_demo.events', events)


In [None]:
# View imported events
event_table.collect()


### Add computed columns

Once imported, you can enrich the data with computed columns:


In [None]:
# Add a computed column combining title and author
articles.add_computed_column(
    summary=articles.title + ' by ' + articles.author
)


In [None]:
# View with computed column
articles.select(articles.title, articles.author, articles.summary).collect()


## Explanation

**JSON format requirements:**

The JSON file must contain an array of objects at the top level:

```json
[
  {"col1": "value1", "col2": 123},
  {"col1": "value2", "col2": 456}
]
```
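
As a pre-flight check, the shape requirement above can be verified with the standard library before calling `import_json`. This is a minimal sketch, not part of Pixeltable: the helper name `is_array_of_objects` and the two temporary files are hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path


def is_array_of_objects(path):
    """Return True if the file holds a top-level JSON array whose items are all objects."""
    # Hypothetical helper for illustration; not a Pixeltable function.
    with open(path, encoding='utf-8') as f:
        data = json.load(f)
    return isinstance(data, list) and all(isinstance(item, dict) for item in data)


# Try it on one valid and one invalid file
check_dir = Path(tempfile.mkdtemp())

good_path = check_dir / 'good.json'
good_path.write_text(json.dumps([{'col1': 'value1', 'col2': 123}]), encoding='utf-8')

bad_path = check_dir / 'bad.json'
bad_path.write_text(json.dumps({'col1': 'value1', 'col2': 123}), encoding='utf-8')

print(is_array_of_objects(good_path))  # True
print(is_array_of_objects(bad_path))   # False
```

Note that an empty array also passes this check, so you may want to test for emptiness separately if an empty table is not what you expect.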

**When to use each method:**

| Method | Use case |
|--------|----------|
| `import_json(file)` | JSON files on disk |
| `import_json(url)` | Remote JSON APIs |
| `import_rows(list)` | Data already in Python |

**Nested JSON handling:**

Nested objects and arrays are stored as JSON columns. You can access nested fields using Pixeltable's JSON path syntax in computed columns.
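
To anticipate which fields fall under this rule, you can scan the records for values that are objects or arrays before importing. The following is a minimal standard-library sketch that assumes the rule stated above; `nested_fields` is a hypothetical helper, not a Pixeltable function, and the rows are a shortened copy of the sample articles from this recipe.

```python
def nested_fields(rows):
    """Return the sorted names of fields whose value is a dict or list in any row."""
    # Hypothetical helper for illustration; not part of Pixeltable.
    found = set()
    for row in rows:
        for key, value in row.items():
            if isinstance(value, (dict, list)):
                found.add(key)
    return sorted(found)


rows = [
    {'id': 1, 'title': 'Introduction to ML', 'tags': ['ml', 'intro'], 'rating': 4.5},
    {'id': 2, 'title': 'Deep Learning Basics', 'tags': ['dl', 'neural'], 'rating': 4.8},
]

print(nested_fields(rows))  # ['tags']
```

Here only `tags` holds an array, so it is the one field you would expect to end up in a JSON column.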


## See also

- [Import CSV files](./data-import-csv.ipynb) - For CSV and Excel imports
- [Import Parquet files](./data-import-parquet.ipynb) - For Parquet data
- [Extract fields from JSON](./workflow-json-extraction.ipynb) - Parse LLM response fields
