# Working with Data

In this section, we'll learn how to work with data in Python: reading and writing text files, and handling common data formats such as CSV and JSON.

## Reading and Writing Text Files

Python makes it easy to read from and write to files. The basic steps are:
1. Open the file using the `open()` function
2. Read from or write to the file
3. Close the file when you're done

### Writing to a Text File

In [None]:
# Writing to a file
file = open("sample.txt", "w")  # Open in write mode ('w')
file.write("Hello, World!\n")    # Write a line of text
file.write("This is a sample file.\n")  # Write another line
file.close()  # Close the file

print("File has been written.")

### Reading from a Text File

In [None]:
# Reading from a file
file = open("sample.txt", "r")  # Open in read mode ('r')
content = file.read()  # Read the entire file
file.close()  # Close the file

print("File content:")
print(content)

### Reading a File Line by Line

In [None]:
# Reading a file line by line
file = open("sample.txt", "r")
print("Reading line by line:")
for line in file:
    print(f"Line: {line.strip()}")  # strip() removes surrounding whitespace, including the trailing newline
file.close()

### Using the `with` Statement (Context Manager)

A better way to work with files is to use the `with` statement, which automatically closes the file when you're done with it, even if an error occurs.

In [None]:
# Using the 'with' statement
with open("sample.txt", "r") as file:
    content = file.read()
    print("Content read using 'with' statement:")
    print(content)
# File is automatically closed when the 'with' block ends
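
To make the "even if an error occurs" claim concrete, here is a minimal sketch. It assumes a scratch file named `with_demo.txt` in a temporary directory (both chosen for illustration only) and checks the file object's `closed` attribute after each `with` block.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so nothing else is overwritten
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "w") as file:
    file.write("Hello, World!\n")

# The variable still exists after the block, but the file is closed
closed_after_write = file.closed
print(f"Closed after writing: {closed_after_write}")

# The file is also closed when the block exits because of an exception
try:
    with open(path, "r") as file:
        raise ValueError("simulated failure while reading")
except ValueError as e:
    print(f"Caught: {e}")

closed_after_error = file.closed
print(f"Closed after error: {closed_after_error}")
```

With a plain `open()` and `close()` pair, the simulated error would skip the `close()` call unless you added your own `try`/`finally`.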

### Appending to a File

If you want to add content to an existing file without overwriting it, you can open the file in append mode ('a').

In [None]:
# Appending to a file
with open("sample.txt", "a") as file:  # Open in append mode ('a')
    file.write("This line is appended.\n")
    file.write("Another appended line.\n")

# Read the updated file
with open("sample.txt", "r") as file:
    content = file.read()
    print("Updated file content:")
    print(content)
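
The difference between write mode and append mode is easiest to see side by side. The sketch below is only an illustration: it uses a hypothetical file `modes_demo.txt` in a temporary directory, appends to it, and then reopens it in write mode to show that the earlier content is discarded.

```python
import os
import tempfile

# Hypothetical scratch file used only for this comparison
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as file:  # 'w' creates the file (or empties an existing one)
    file.write("first\n")

with open(path, "a") as file:  # 'a' keeps existing content and adds to the end
    file.write("second\n")

with open(path, "r") as file:
    after_append = file.read()

with open(path, "w") as file:  # reopening with 'w' discards the previous content
    file.write("third\n")

with open(path, "r") as file:
    after_rewrite = file.read()

print(repr(after_append))
print(repr(after_rewrite))
```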

## Working with CSV Files

CSV (Comma-Separated Values) is a common format for storing tabular data. Python's `csv` module makes it easy to read and write CSV files.

In [None]:
import csv

# Writing to a CSV file
data = [
    ["Name", "Age", "City"],  # Header row
    ["Alice", 30, "New York"],
    ["Bob", 25, "Los Angeles"],
    ["Charlie", 35, "Chicago"]
]

with open("people.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)  # Write all rows at once

print("CSV file has been written.")

In [None]:
# Reading from a CSV file
with open("people.csv", "r", newline="") as file:  # newline="" is recommended for reading too
    reader = csv.reader(file)
    print("CSV file content:")
    for row in reader:
        print(row)
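
One detail worth knowing: `csv.reader` returns every field as a string, even if you wrote a number. The following sketch, which assumes a small hypothetical file `people_demo.csv` in a temporary directory, reads the raw rows and then converts the age column back to `int`.

```python
import csv
import os
import tempfile

# Hypothetical scratch file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "people_demo.csv")

with open(path, "w", newline="") as file:
    csv.writer(file).writerows([["Name", "Age"], ["Alice", 30], ["Bob", 25]])

with open(path, "r", newline="") as file:
    reader = csv.reader(file)
    header = next(reader)    # First row holds the column names
    raw_rows = list(reader)  # Remaining rows, every value is a string

# Convert the Age column from text back to integers
rows = [[name, int(age)] for name, age in raw_rows]

print(raw_rows)
print(rows)
```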

### Using CSV with Dictionaries

The `csv` module also provides the `DictReader` and `DictWriter` classes, which read and write each row as a dictionary keyed by column name. This is often more convenient than indexing columns by position.

In [None]:
# Writing to a CSV file using dictionaries
data = [
    {"Name": "Alice", "Age": 30, "City": "New York"},
    {"Name": "Bob", "Age": 25, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"}
]

with open("people_dict.csv", "w", newline="") as file:
    fieldnames = ["Name", "Age", "City"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    
    writer.writeheader()  # Write the header row
    writer.writerows(data)  # Write all rows at once

print("CSV file has been written using DictWriter.")

In [None]:
# Reading from a CSV file using dictionaries
with open("people_dict.csv", "r", newline="") as file:  # newline="" is recommended for reading too
    reader = csv.DictReader(file)
    print("CSV file content using DictReader:")
    for row in reader:
        print(row)
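
As a rough illustration of why dictionaries can be more convenient, the sketch below looks up columns by name and computes an average age. The file name `people_dict_demo.csv` and the temporary directory are assumptions made so the example runs on its own.

```python
import csv
import os
import tempfile

# Hypothetical scratch file so the example does not depend on earlier cells
path = os.path.join(tempfile.mkdtemp(), "people_dict_demo.csv")

people = [
    {"Name": "Alice", "Age": 30, "City": "New York"},
    {"Name": "Bob", "Age": 25, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"},
]

with open(path, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["Name", "Age", "City"])
    writer.writeheader()
    writer.writerows(people)

with open(path, "r", newline="") as file:
    # Columns are accessed by name; values still arrive as strings
    ages = {row["Name"]: int(row["Age"]) for row in csv.DictReader(file)}

average_age = sum(ages.values()) / len(ages)
print(ages)
print(f"Average age: {average_age}")
```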

## Working with JSON

JSON (JavaScript Object Notation) is a popular data format used for data exchange. Python's `json` module makes it easy to work with JSON data.

In [1]:
import json

# Creating a Python dictionary
person = {
    "name": "Alice",
    "age": 30,
    "city": "New York",
    "is_student": False,
    "courses": ["Python", "Data Science", "Machine Learning"],
    "address": {
        "street": "123 Main St",
        "zip": "10001"
    }
}

# Converting Python dictionary to JSON string
json_string = json.dumps(person, indent=4)  # indent for pretty printing
print("JSON string:")
print(json_string)

JSON string:
{
    "name": "Alice",
    "age": 30,
    "city": "New York",
    "is_student": false,
    "courses": [
        "Python",
        "Data Science",
        "Machine Learning"
    ],
    "address": {
        "street": "123 Main St",
        "zip": "10001"
    }
}
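
The conversion also works in the other direction with `json.loads()`. The short sketch below uses a smaller, made-up dictionary to show a round trip and one caveat: JSON has no tuple type, so tuples come back as lists.

```python
import json

# A small illustrative dictionary (not the full `person` example above)
record = {"name": "Alice", "is_student": False, "nickname": None}

json_string = json.dumps(record)    # Python -> JSON text
restored = json.loads(json_string)  # JSON text -> Python

print(json_string)           # False becomes false, None becomes null
print(restored == record)    # The round trip preserves this dictionary

# Caveat: tuples are written as JSON arrays and come back as lists
point = json.loads(json.dumps({"point": (1, 2)}))
print(point)
```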


In [None]:
# Writing JSON to a file
with open("person.json", "w") as file:
    json.dump(person, file, indent=4)

print("JSON file has been written.")

In [None]:
# Reading JSON from a file
with open("person.json", "r") as file:
    loaded_person = json.load(file)

print("Loaded from JSON file:")
print(loaded_person)

# Accessing values in the loaded dictionary
print(f"Name: {loaded_person['name']}")
print(f"Age: {loaded_person['age']}")
print(f"Courses: {', '.join(loaded_person['courses'])}")
print(f"Street: {loaded_person['address']['street']}")

## Error Handling

When working with files and data, it's important to handle errors you can anticipate, such as a missing file, insufficient permissions, or data in an invalid format.

In [None]:
# Error handling when opening a file
try:
    with open("nonexistent_file.txt", "r") as file:
        content = file.read()
except FileNotFoundError:
    print("Error: The file does not exist.")
except PermissionError:
    print("Error: You don't have permission to access this file.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

In [None]:
# Error handling when parsing JSON
invalid_json = "{'name': 'Alice', 'age': 30}"  # Invalid JSON (single quotes instead of double quotes)

try:
    parsed_data = json.loads(invalid_json)
    print(parsed_data)
except json.JSONDecodeError as e:
    print(f"Error parsing JSON: {e}")
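
The two kinds of error handling above can be combined. The sketch below is one possible approach, not a standard API: `load_json` is a hypothetical helper that returns a default value when the file is missing or contains invalid JSON. The file names and temporary directory are likewise made up for the example.

```python
import json
import os
import tempfile


def load_json(path, default=None):
    """Hypothetical helper: return parsed JSON, or `default` if loading fails."""
    try:
        with open(path, "r") as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"Error: '{path}' does not exist.")
    except json.JSONDecodeError as e:
        print(f"Error parsing '{path}': {e}")
    return default


folder = tempfile.mkdtemp()
good_path = os.path.join(folder, "good.json")
bad_path = os.path.join(folder, "bad.json")

with open(good_path, "w") as file:
    file.write('{"name": "Alice"}')
with open(bad_path, "w") as file:
    file.write("{'name': 'Alice'}")  # Single quotes: not valid JSON

good = load_json(good_path, default={})
broken = load_json(bad_path, default={})
missing = load_json(os.path.join(folder, "missing.json"), default={})
```

Catching only the specific exceptions you expect keeps unrelated bugs visible instead of silently replacing them with the default.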

## Real-World Example: Data Analysis

Let's put everything together in a real-world example where we read data from a CSV file, process it, and save the results.

In [None]:
# First, let's create a sample sales data CSV file
sales_data = [
    ["Date", "Product", "Price", "Quantity"],
    ["2023-01-15", "Laptop", 1200, 5],
    ["2023-01-15", "Mouse", 25, 10],
    ["2023-01-16", "Keyboard", 45, 8],
    ["2023-01-16", "Monitor", 150, 3],
    ["2023-01-17", "Laptop", 1200, 2],
    ["2023-01-17", "Headphones", 80, 7],
    ["2023-01-18", "Mouse", 25, 15],
    ["2023-01-18", "Keyboard", 45, 5]
]

with open("sales.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(sales_data)

print("Sales data CSV file has been created.")

In [None]:
# Now, let's analyze the sales data
def analyze_sales_data(filename):
    """
    Analyze sales data from a CSV file.
    
    Parameters:
    filename (str): The name of the CSV file
    
    Returns:
    dict: A dictionary containing the analysis results
    """
    try:
        # Read the sales data
        sales = []
        with open(filename, "r", newline="") as file:  # newline="" is recommended for csv
            reader = csv.reader(file)
            header = next(reader)  # Skip the header row
            for row in reader:
                date, product, price, quantity = row
                sales.append({
                    "date": date,
                    "product": product,
                    "price": float(price),
                    "quantity": int(quantity),
                    "total": float(price) * int(quantity)
                })
        
        # Calculate total sales
        total_sales = sum(item["total"] for item in sales)
        
        # Calculate sales by product
        sales_by_product = {}
        for item in sales:
            product = item["product"]
            if product not in sales_by_product:
                sales_by_product[product] = 0
            sales_by_product[product] += item["total"]
        
        # Calculate sales by date
        sales_by_date = {}
        for item in sales:
            date = item["date"]
            if date not in sales_by_date:
                sales_by_date[date] = 0
            sales_by_date[date] += item["total"]
        
        # Find the best-selling product (highest total revenue)
        best_selling_product = max(sales_by_product.items(), key=lambda x: x[1])
        
        # Find the day with the highest sales
        best_sales_day = max(sales_by_date.items(), key=lambda x: x[1])
        
        # Return the analysis results
        return {
            "total_sales": total_sales,
            "sales_by_product": sales_by_product,
            "sales_by_date": sales_by_date,
            "best_selling_product": best_selling_product,
            "best_sales_day": best_sales_day
        }
    
    except FileNotFoundError:
        print(f"Error: The file '{filename}' does not exist.")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Analyze the sales data
results = analyze_sales_data("sales.csv")

if results:
    print(f"Total Sales: ${results['total_sales']:.2f}")
    
    print("\nSales by Product:")
    for product, amount in results['sales_by_product'].items():
        print(f"{product}: ${amount:.2f}")
    
    print("\nSales by Date:")
    for date, amount in results['sales_by_date'].items():
        print(f"{date}: ${amount:.2f}")
    
    best_product, best_product_sales = results['best_selling_product']
    print(f"\nBest-selling product: {best_product} (${best_product_sales:.2f})")
    
    best_day, best_day_sales = results['best_sales_day']
    print(f"Best sales day: {best_day} (${best_day_sales:.2f})")
    
    # Save the results to a JSON file
    with open("sales_analysis.json", "w") as file:
        # Convert the results to a format that can be serialized to JSON
        serializable_results = {
            "total_sales": results["total_sales"],
            "sales_by_product": results["sales_by_product"],
            "sales_by_date": results["sales_by_date"],
            "best_selling_product": {
                "product": results["best_selling_product"][0],
                "sales": results["best_selling_product"][1]
            },
            "best_sales_day": {
                "date": results["best_sales_day"][0],
                "sales": results["best_sales_day"][1]
            }
        }
        json.dump(serializable_results, file, indent=4)
    
    print("\nAnalysis results have been saved to 'sales_analysis.json'.")
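
The per-product totals can also be computed more compactly. The sketch below is an alternative to the loop in `analyze_sales_data`, not a replacement for it: it swaps in `csv.DictReader` and `collections.defaultdict`, and reads a three-row excerpt of the sales data from an in-memory string (an assumption made to keep the example self-contained).

```python
import csv
import io
from collections import defaultdict

# A three-row excerpt of the sales data, held in memory instead of a file
sample_csv = """Date,Product,Price,Quantity
2023-01-15,Laptop,1200,5
2023-01-15,Mouse,25,10
2023-01-17,Laptop,1200,2
"""

# defaultdict(float) starts every missing key at 0.0,
# so no "if product not in ..." check is needed
sales_by_product = defaultdict(float)
for row in csv.DictReader(io.StringIO(sample_csv)):
    sales_by_product[row["Product"]] += float(row["Price"]) * int(row["Quantity"])

best_product, best_sales = max(sales_by_product.items(), key=lambda item: item[1])
print(f"{best_product}: ${best_sales:.2f}")
```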

## Practice Exercise

Let's practice what we've learned with an exercise:

### Exercise: Student Grade Tracker

Create a program that reads student grades from a CSV file, calculates statistics (average, highest, lowest), and saves the results to a JSON file. The CSV file should have the following format:

```
Name,Math,Science,English,History
Alice,85,92,78,88
Bob,90,85,92,95
Charlie,78,80,85,81
```

In [None]:
# First, let's create the sample grades CSV file
grades_data = [
    ["Name", "Math", "Science", "English", "History"],
    ["Alice", 85, 92, 78, 88],
    ["Bob", 90, 85, 92, 95],
    ["Charlie", 78, 80, 85, 81],
    ["Diana", 95, 88, 92, 85],
    ["Evan", 82, 79, 88, 90]
]

with open("student_grades.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(grades_data)

print("Student grades CSV file has been created.")
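
As a starting hint rather than a full solution, the sketch below computes the statistics for a single student. It assumes the row has already been read with `csv.DictReader`, so it is a dictionary of strings. The `summary` layout is only one possible shape for the results.

```python
# One row as csv.DictReader would return it (all values are strings)
row = {"Name": "Alice", "Math": "85", "Science": "92", "English": "78", "History": "88"}

# Every column except "Name" is a grade; convert each to int
scores = [int(value) for subject, value in row.items() if subject != "Name"]

# Hypothetical result layout; choose whatever structure suits your JSON output
summary = {
    "name": row["Name"],
    "average": sum(scores) / len(scores),
    "highest": max(scores),
    "lowest": min(scores),
}
print(summary)
```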

In [None]:
# Your code here
