# JSON Handling in Python for Translation & Localization

Welcome to this **beginner-friendly** notebook on **JSON** handling in Python! We’ll cover:
1. Basic concepts of JSON.
2. Reading & parsing JSON using the built-in `json` library.
3. Common operations: accessing, modifying, and splitting JSON data.
4. **Hands-on exercises** with solution outlines.
5. An **AI prompt** example to generate code automatically.
6. **Advanced Handling** with JSONPath or nested data structures—some cool stuff!

JSON is often used in translation/localization workflows for storing key-value pairs of text (e.g., UI strings, configuration, or multi-language resources). Let's dive in!

## 1. Introduction to JSON

- **JSON (JavaScript Object Notation)** is a lightweight format for storing and transporting data.
- It’s built from **objects** (key-value pairs) and **arrays** (ordered lists), which can be nested inside each other, making it easy to parse and generate.
- Example:
```json
{
  "segment": "Hello",
  "lang": "en"
}
```
Here, `segment` is a key, and `"Hello"` is its value. Similarly, `lang` is a key with value `"en"`.

JSON is commonly used for **configuration** files, **web APIs**, and **localization** (like storing translations by language).

## 2. Reading & Parsing JSON

Python’s built-in `json` module **loads** JSON into Python objects, typically **dictionaries** and **lists**: `json.load()` reads from an open file, and `json.loads()` parses a string.

In [None]:
# Basic example of reading JSON
import json

with open('example.json', 'r', encoding='utf-8') as f:
    data = json.load(f)  # data is now a Python dict/list structure

print(type(data))
print(data)

### Anatomy of the Code
- `json.load(f)`: Reads the file-like object `f` and **deserializes** the JSON into Python objects.
- Typically, you’ll get a **dict** (for JSON objects) or a **list** (for JSON arrays).
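
To make the type conversion concrete, here is a minimal sketch that parses a JSON string with `json.loads` and prints the Python type of each value. The keys `approved`, `note`, `tags`, and `length` are made up for illustration and are not part of `example.json`.

```python
import json

# A JSON document held in a string (json.loads parses text, json.load reads a file).
# Keys other than "segment" and "lang" are illustrative assumptions.
raw = (
    '{"segment": "Hello", "lang": "en", "approved": true, '
    '"note": null, "tags": ["ui", "greeting"], "length": 5}'
)

parsed = json.loads(raw)

# Show which Python type each JSON value was converted to.
for key, value in parsed.items():
    print(f"{key}: {value!r} -> {type(value).__name__}")
```

JSON `true`/`false` become `True`/`False`, `null` becomes `None`, numbers become `int` or `float`, and strings become `str`.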

## 3. Accessing & Modifying JSON Data
Once loaded, JSON data in Python behaves like normal dictionaries/lists, so you can use familiar operations to **access**, **modify**, or **delete** keys.

In [None]:
# Example: Suppose data is a dict with a 'segment' key and a 'lang' key.
# We'll print them, modify them, then store them back to a file.

print("Segment:", data.get('segment'))  # e.g. 'Hello'
print("Language:", data.get('lang'))    # e.g. 'en'

# Modify the language
data['lang'] = 'de'
print("Updated Language:", data['lang'])

# Save changes back to a file
with open('modified_example.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

print("JSON saved with updated language.")
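
The `ensure_ascii=False` argument used above matters for localization files. A small sketch of the difference, using `json.dumps` (the string-returning sibling of `json.dump`) on a made-up entry:

```python
import json

# Illustrative entry containing a non-ASCII character.
entry = {"farewell": "Tschüss", "lang": "de"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(entry)

# ensure_ascii=False keeps the characters readable (write the file as UTF-8).
readable = json.dumps(entry, ensure_ascii=False)

print(escaped)   # {"farewell": "Tsch\u00fcss", "lang": "de"}
print(readable)  # {"farewell": "Tschüss", "lang": "de"}

# Both forms parse back to the same data.
print(json.loads(escaped) == json.loads(readable))  # True
```

Either form is valid JSON; the readable one is simply easier for translators and reviewers to inspect in a text editor.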

### Real-Life Example
- If you have a JSON file with **UI strings** (e.g., `{"buttons":{"save":"Save","cancel":"Cancel"}}`), you can load it, adjust certain text, and save it back.
- This is helpful for **translation** or **localization**—especially if each key is a language code or if nested structures contain language variants.
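
A minimal sketch of that workflow, using an in-memory string in place of a real file. The replacement text, the `delete` key, and the `labels` section are assumptions added for illustration.

```python
import json

# In-memory stand-in for a UI strings file, shaped like the example above.
ui_strings = json.loads('{"buttons": {"save": "Save", "cancel": "Cancel"}}')

# Nested objects become nested dicts, so chained indexing reaches the inner value.
ui_strings["buttons"]["save"] = "Save changes"

# Add a key that was not in the original data (hypothetical "delete" button).
ui_strings["buttons"]["delete"] = "Delete"

# .get() with a default avoids a KeyError when a section is missing.
labels = ui_strings.get("labels", {})

print(json.dumps(ui_strings, ensure_ascii=False, indent=2))
print("Labels section:", labels)
```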

## 4. Splitting & Restructuring JSON for Translation
If you have **multiple languages** in a single JSON, you might want to **split** them into separate files, or **extract** text for only one language.

### Example: Multi-language JSON
```json
{
  "greetings": {
    "en": "Hello",
    "de": "Hallo",
    "fr": "Bonjour"
  },
  "farewells": {
    "en": "Goodbye",
    "de": "Tschüss",
    "fr": "Au revoir"
  }
}
```

In [None]:
# Example code to split each language into separate JSON files.
import json

with open('multi_lang.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Let's gather all languages by checking the keys in each category
languages = set()
for category, translations in data.items():
    # 'translations' is a dict, e.g.: {"en": "Hello", "de": "Hallo", "fr": "Bonjour"}
    for lang_code in translations.keys():
        languages.add(lang_code)

print("Languages found:", sorted(languages))  # sorted() gives a stable order; sets are unordered

# Now build separate dicts for each language
lang_dicts = {lang: {} for lang in languages}

for category, translations in data.items():
    for lang_code, text_value in translations.items():
        # Each category appears once per language, so a plain assignment is enough
        lang_dicts[lang_code][category] = text_value

# Write out each language file
for lang_code, content in lang_dicts.items():
    filename = f'{lang_code}_data.json'
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(content, f, ensure_ascii=False, indent=2)
    print(f"Wrote {filename} for language '{lang_code}'")

### Real-Life Example
- If your JSON includes keys for multiple languages (e.g., `"en": "Save", "de": "Speichern"`), you can **split** them into `en_data.json`, `de_data.json`, etc.
- Translators often need only the source-language text, so you can send them a file that contains just the English strings.
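
If you only need one language rather than a file per language, a dict comprehension is enough. This is a sketch under the assumption that the data has the same category-then-language shape as the multi-language example; `extract_language` is a hypothetical helper name.

```python
import json

# Same shape as the multi-language example above, inlined so the cell runs on its own.
data = {
    "greetings": {"en": "Hello", "de": "Hallo", "fr": "Bonjour"},
    "farewells": {"en": "Goodbye", "de": "Tschüss", "fr": "Au revoir"},
}

def extract_language(data, lang_code):
    """Return {category: text} for one language, skipping categories that lack it."""
    return {
        category: translations[lang_code]
        for category, translations in data.items()
        if lang_code in translations
    }

english_only = extract_language(data, "en")
print(json.dumps(english_only, ensure_ascii=False, indent=2))
```

Asking for a language that is not present (for example `"es"`) returns an empty dict instead of raising an error.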

## 5. Hands-On Exercises

**Goal**: Practice reading JSON, extracting info, and modifying it.

### Exercise #1: Inspect & Modify
1. Create a file named `my_example.json` with content like:
```json
{
  "segment": "Hello",
  "lang": "en",
  "note": "Sample text"
}
```
2. **Parse** the file with the `json` module.
3. Print out each **key** and **value**.
4. Set a new key `"status"` with the value `"review"`.
5. Change the value of the `"segment"` to `"Hi there"`.
6. **Save** to a new file `my_example_modified.json`.

In [None]:
# EXERCISE #1 (POSSIBLE SOLUTION SKELETON)
import json

with open('my_example.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Step 3: print each key-value pair
for key, value in data.items():
    print(f"{key}: {value}")

# Step 4: add a new key 'status' with the value 'review'
data['status'] = 'review'

# Step 5: change 'segment' to "Hi there"
data['segment'] = "Hi there"

# Step 6: save to a new file
with open('my_example_modified.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
print("Exercise #1 done! Check 'my_example_modified.json'.")

### Exercise #2: Splitting by Language Keys
1. Create `my_multilang.json` with multiple categories (e.g., `"buttons"`, `"labels"`) and within each, different languages (`"en"`, `"de"`, `"fr"`, etc.).
2. Parse it.
3. Group text by **language code**.
4. Write each language group to a separate JSON file (`en_data.json`, `de_data.json`, etc.).
5. **Hint**: Use a dictionary to collect keys and values for each language code.

In [None]:
# EXERCISE #2 (POSSIBLE SOLUTION OUTLINE)
import json

with open('my_multilang.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

languages_found = set()
for category, translations in data.items():
    for lang_code in translations.keys():
        languages_found.add(lang_code)

lang_dicts = {lang: {} for lang in languages_found}

for category, translations in data.items():
    for lang_code, text_value in translations.items():
        # Each category appears once per language, so a plain assignment is enough
        lang_dicts[lang_code][category] = text_value

for lang_code, content in lang_dicts.items():
    filename = f'{lang_code}_data.json'
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(content, f, ensure_ascii=False, indent=2)
    print(f"Wrote {filename} for language '{lang_code}'")

## 6. Using AI to Generate Similar Logic
Now that you’ve learned how to manually parse and modify JSON, let’s see how an **AI** tool might help. Below is a **prompt** you could paste into ChatGPT or GitHub Copilot, followed by a possible AI-generated code snippet.

### AI Prompt (Comment)
```
# Generate Python code using the json library to:
# 1. Parse 'my_multilang.json'.
# 2. Print each top-level key and its subkeys.
# 3. Add a new key 'status' = 'pending' for each top-level object.
# 4. Save the modified JSON to 'ai_modified.json'.
```

_Below is an example of what the AI might produce._

In [None]:
# (Example) AI-Generated Implementation
import json

def ai_modify_json():
    with open('my_multilang.json', 'r', encoding='utf-8') as f:
        data = json.load(f)

    for category, translations in data.items():
        print(f"Category: {category}")
        print("Subkeys:", list(translations.keys()))
        # Add a new key 'status'
        translations['status'] = 'pending'

    # Save back
    with open('ai_modified.json', 'w', encoding='utf-8') as out:
        json.dump(data, out, ensure_ascii=False, indent=2)

    print("AI-based modification complete! Check 'ai_modified.json'.")

# Let's just call the function for demonstration
ai_modify_json()
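
Generated code is worth reading before you reuse its output. In the snippet above, `status` ends up in the same dict as the language codes, so a later split-by-language step would treat `status` as if it were a language. One possible way to guard against that is sketched below; `METADATA_KEYS` is a hypothetical name, not something the `json` module provides.

```python
# One category as it would look after the function above has run on it.
category = {"en": "Hello", "de": "Hallo", "status": "pending"}

# Hypothetical set of non-language keys to skip when splitting by language.
METADATA_KEYS = {"status"}

language_codes = [key for key in category if key not in METADATA_KEYS]
print(language_codes)  # ['en', 'de']
```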

## 7. Advanced Handling with JSONPath or Nested Data Structures

While Python’s `json` module is great for basic read/write, sometimes you need more powerful **search** or **manipulation** of deeply nested structures. This is where something like **[JSONPath](https://pypi.org/project/jsonpath-ng/)** can help.

### 7.1 Installing JSONPath-NG
```bash
pip install jsonpath-ng
```

### 7.2 Example with JSONPath
```python
# Filter expressions such as [?(...)] need the extended parser in jsonpath_ng.ext
from jsonpath_ng.ext import parse

data = {
    "segments": [
        {"lang": "en", "text": "Hello"},
        {"lang": "de", "text": "Hallo"},
        {"lang": "fr", "text": "Bonjour"}
    ]
}

expression = parse("$.segments[?(@.lang == 'en')].text")
matches = expression.find(data)

for match in matches:
    print("EN text found:", match.value)

# You could then update that text:
for match in matches:
    # match.context.value is the parent object that holds the matched 'text' field
    match.context.value['text'] = "Hi there"  # replace 'Hello' with 'Hi there'

print(data)  # see the updated dictionary
```

JSONPath allows queries like:
- `$.segments[*]` to find all items in the `segments` array.
- `$.segments[?(@.lang == 'en')]` to find all segments where `lang` is `'en'`.

This can be very helpful if you have **deeply nested** or **complex** JSON structures for localization.
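
For nested data without an extra dependency, a short recursive walk can also do the job. The sketch below is plain Python rather than JSONPath: `iter_strings` is a hypothetical helper that yields a dotted path and the text for every string it finds, and the `nested` sample data is invented for illustration.

```python
def iter_strings(node, path=""):
    """Yield (path, text) for every string found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node

# Illustrative nested UI structure (not taken from a real project).
nested = {
    "menu": {
        "file": {"open": "Open", "save": "Save"},
        "items": [{"label": "Help"}, {"label": "About"}],
    }
}

found = list(iter_strings(nested))
for path, text in found:
    print(f"{path} = {text}")
```

This prints lines such as `menu.file.open = Open` and `menu.items[0].label = Help`, which is a convenient flat list to hand over for translation.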

## 8. Summary & Next Steps
You now have:
1. A **basic understanding** of JSON structure and Python’s `json` module.
2. **Hands-on** experience parsing, modifying, and splitting JSON data.
3. An introduction to how **AI** can auto-generate similar code.
4. A glimpse of **advanced JSONPath** usage, which can be extremely powerful for large or complex data.

**Next**:
- Dive deeper into **JSONPath** or other libraries if your projects require complex queries or transformations.
- Integrate these scripts with your **translation pipeline** to handle real-world, large-scale JSON documents.
- Learn about **schema validation** (e.g., using `jsonschema`) if you need to ensure the JSON structure is correct.
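
Before reaching for a full schema library, a hand-rolled check can already catch common problems such as missing translations. The sketch below does not use the `jsonschema` API; `find_problems` is a hypothetical helper and the sample data is invented.

```python
def find_problems(data, required_langs):
    """Return a list of readable problems; an empty list means the checks passed."""
    problems = []
    for category, translations in data.items():
        if not isinstance(translations, dict):
            problems.append(f"{category}: expected an object of translations")
            continue
        for lang in required_langs:
            value = translations.get(lang)
            # A usable translation is a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{category}: missing or empty '{lang}' text")
    return problems

# Illustrative data with two deliberate mistakes.
sample = {
    "greetings": {"en": "Hello", "de": "Hallo"},
    "farewells": {"en": "Goodbye", "de": ""},
    "version": 2,
}

problems = find_problems(sample, ["en", "de"])
for problem in problems:
    print(problem)
```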

Happy JSON Handling!