# Reading CSV files
### Using the csv Module
The csv module is part of Python's standard library and provides functionality to read and write CSV files.

Basic Example:

In [2]:
import csv

# Open the CSV file
with open('data/data.csv', mode='r', newline='') as file:
    reader = csv.reader(file)
    # Iterate through rows
    for row in reader:
        print(row)

['User ID', 'User Name', 'Age', 'Location', 'Registration Date', 'Phone', 'Email', 'Favorite Meal', 'Total Orders']
['U001', 'Alice Johnson', '28', 'New York', '2023-01-15', '123-456-7890', 'alice@email.com', 'Dinner', '12']
['U002', 'Bob Smith', '35', 'Los Angeles', '2023-02-20', '987-654-3210', 'bob@email.com', 'Lunch', '8']
['U003', 'Charlie Lee', '42', 'Chicago', '2023-03-10', '555-123-4567', 'charlie@email.com', 'Breakfast', '15']
['U004', 'David Brown', '27', 'San Francisco', '2023-04-05', '444-333-2222', 'david@email.com', 'Dinner', '10']
['U005', 'Emma White', '30', 'Seattle', '2023-05-22', '777-888-9999', 'emma@email.com', 'Lunch', '9']
['U006', 'Frank Green', '25', 'Austin', '2023-06-15', '888-777-6666', 'frank@email.com', 'Dinner', '7']
['U007', 'Grace King', '38', 'Boston', '2023-07-02', '999-888-7777', 'grace@email.com', 'Breakfast', '14']
['U008', 'Henry Lee', '31', 'Miami', '2023-08-11', '101-202-3030', 'henry@email.com', 'Dinner', '5']
['U009', 'Irene Moore', '33', 'Dal

##### Features of csv.reader:
- Reads rows as lists.
- Handles standard CSV formats, but every field is returned as a string, so data types must be converted manually.
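
As a minimal sketch of the manual type handling mentioned above, the snippet below converts numeric fields to integers while reading. The in-memory sample is an assumption that mirrors three of the columns in data/data.csv; it is used only so the example runs without the file.

```python
import csv
import io

# Hypothetical in-memory sample mirroring three columns of data/data.csv
sample = io.StringIO(
    "User ID,Age,Total Orders\n"
    "U001,28,12\n"
    "U002,35,8\n"
)

reader = csv.reader(sample)
header = next(reader)  # First row holds the column names

rows = []
for user_id, age, total_orders in reader:
    # csv.reader yields strings, so numeric fields are converted explicitly
    rows.append((user_id, int(age), int(total_orders)))

print(header)
print(rows)
```

With a real file, the same loop would work on the object returned by open() in place of the io.StringIO sample.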
##### Reading with a Header:
If your CSV file has a header row, you can use csv.DictReader to read rows as dictionaries.

In [4]:
import csv

# Open the CSV file
with open('data/data.csv', mode='r', newline='') as file:
    reader = csv.DictReader(file)
    # Iterate through rows
    for row in reader:
        print(row)  # Row is a dictionary with column headers as keys

{'User ID': 'U001', 'User Name': 'Alice Johnson', 'Age': '28', 'Location': 'New York', 'Registration Date': '2023-01-15', 'Phone': '123-456-7890', 'Email': 'alice@email.com', 'Favorite Meal': 'Dinner', 'Total Orders': '12'}
{'User ID': 'U002', 'User Name': 'Bob Smith', 'Age': '35', 'Location': 'Los Angeles', 'Registration Date': '2023-02-20', 'Phone': '987-654-3210', 'Email': 'bob@email.com', 'Favorite Meal': 'Lunch', 'Total Orders': '8'}
{'User ID': 'U003', 'User Name': 'Charlie Lee', 'Age': '42', 'Location': 'Chicago', 'Registration Date': '2023-03-10', 'Phone': '555-123-4567', 'Email': 'charlie@email.com', 'Favorite Meal': 'Breakfast', 'Total Orders': '15'}
{'User ID': 'U004', 'User Name': 'David Brown', 'Age': '27', 'Location': 'San Francisco', 'Registration Date': '2023-04-05', 'Phone': '444-333-2222', 'Email': 'david@email.com', 'Favorite Meal': 'Dinner', 'Total Orders': '10'}
{'User ID': 'U005', 'User Name': 'Emma White', 'Age': '30', 'Location': 'Seattle', 'Registration Date': 

### Using the pandas Library
pandas is a powerful library for data manipulation and analysis. It provides the read_csv() function to read CSV files into a DataFrame.

##### Features of pandas.read_csv:
- Automatically handles headers and data types.
- Allows filtering, slicing, and data manipulation.
- Handles missing values and large datasets efficiently.

##### Additional Options:
- Specify Delimiter: delimiter='\t' (for non-comma-separated values, e.g., tab-delimited files).
- Skip Rows: skiprows=3 (skip the first 3 rows).
- Select Columns: usecols=['col1', 'col2'].
- Handle Missing Data: na_values=['NA', '?'].
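
The options listed above can be combined in a single call. The following is a sketch under stated assumptions: the tab-delimited text, its two preamble lines, and the column names are all made up for illustration and are not part of data/data.csv.

```python
import io

import pandas as pd

# Hypothetical tab-delimited input with two preamble lines to skip
raw = io.StringIO(
    "exported by some tool\n"
    "version 1\n"
    "name\tage\tcity\n"
    "Alice\t30\tNA\n"
    "Bob\t?\tBoston\n"
)

df = pd.read_csv(
    raw,
    delimiter="\t",           # fields are separated by tabs
    skiprows=2,               # skip the two preamble lines
    usecols=["name", "age"],  # keep only these columns
    na_values=["NA", "?"],    # treat these strings as missing
)

print(df)
```

Because one age value is missing, pandas stores the age column as floating point rather than integer.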

Basic Example:

In [6]:
import pandas as pd

# Read the CSV file
df = pd.read_csv('data/data.csv')

# Display the DataFrame
print(df)

  User ID      User Name  Age       Location Registration Date         Phone  \
0    U001  Alice Johnson   28       New York        2023-01-15  123-456-7890   
1    U002      Bob Smith   35    Los Angeles        2023-02-20  987-654-3210   
2    U003    Charlie Lee   42        Chicago        2023-03-10  555-123-4567   
3    U004    David Brown   27  San Francisco        2023-04-05  444-333-2222   
4    U005     Emma White   30        Seattle        2023-05-22  777-888-9999   
5    U006    Frank Green   25         Austin        2023-06-15  888-777-6666   
6    U007     Grace King   38         Boston        2023-07-02  999-888-7777   
7    U008      Henry Lee   31          Miami        2023-08-11  101-202-3030   
8    U009    Irene Moore   33         Dallas        2023-09-01  202-303-4040   
9    U010     Jack White   29        Phoenix        2023-10-10  303-404-5050   

               Email Favorite Meal  Total Orders  
0    alice@email.com        Dinner            12  
1      bob@email.

### Other Libraries
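- NumPy: For numerical data in CSV files. np.loadtxt expects numeric values, so usecols selects the numeric columns (here Age and Total Orders).

`import numpy as np`

`data = np.loadtxt('data/data.csv', delimiter=',', skiprows=1, usecols=(2, 8))`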

`print(data)`

- Dask: For handling very large CSV files.

`import dask.dataframe as dd`

`df = dd.read_csv('large_data.csv')`

`print(df.head())`

### Best Practices
1. Error Handling: Use try-except blocks to handle file errors.

In [9]:
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found.")

File not found.


2. Data Validation: Check the structure and content of the CSV file before processing.
3. Performance: Use libraries like pandas or dask for large files.
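
One possible way to apply the data validation advice is sketched below. The required column set, the validate_rows helper, and the rule that Age must be a whole number are all assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical set of columns the rest of the program relies on
REQUIRED_COLUMNS = {"User ID", "Age", "Total Orders"}


def validate_rows(file_obj):
    """Return (valid_rows, problems) for a CSV file-like object."""
    reader = csv.DictReader(file_obj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")

    valid_rows, problems = [], []
    # Data rows start on line 2 when the header occupies line 1
    for line_number, row in enumerate(reader, start=2):
        if (row["Age"] or "").isdigit():
            valid_rows.append(row)
        else:
            problems.append((line_number, row["Age"]))
    return valid_rows, problems


sample = io.StringIO(
    "User ID,Age,Total Orders\n"
    "U001,28,12\n"
    "U002,unknown,8\n"
)
valid_rows, problems = validate_rows(sample)
print(len(valid_rows), problems)
```

Collecting problems instead of stopping at the first one makes it easier to report every bad row in a single pass.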

# Saving data in Python
1. Saving Text Data

You can save plain text data to files using Python's built-in file handling.

Modes:
- "w": Write (overwrites the file).
- "a": Append (adds to the existing file).
- "r": Read.
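
To make the difference between the modes concrete, here is a small sketch that writes, appends, overwrites, and reads a file. It uses a temporary directory and a hypothetical file name so that no existing file is touched.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "w") as file:   # "w" creates or overwrites
        file.write("first line\n")

    with open(path, "a") as file:   # "a" appends to the end
        file.write("second line\n")

    with open(path, "w") as file:   # "w" again discards the previous content
        file.write("replaced\n")

    with open(path, "a") as file:
        file.write("appended\n")

    with open(path, "r") as file:   # "r" reads the file
        content = file.read()

print(content)
```

Only the last two writes survive, because the second "w" open discarded everything written before it.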


Example:

In [10]:
data = "Hello, this is a text file."
with open("example.txt", "w") as file:
    file.write(data)

2. Saving Structured Data

* CSV (Comma-Separated Values): Useful for tabular data.

Use the csv module.


In [20]:
import csv

data = [["Name", "Age"], ["Alice", 30], ["Bob", 25]]
with open("data.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)

* JSON (JavaScript Object Notation): Suitable for hierarchical or nested data.

Use the json module.

In [18]:
import json

data = {"name": "Alice", "age": 30, "languages": ["English", "Spanish"]}
with open("data.json", "w") as file:
    json.dump(data, file)

### Saving Binary Data
- For non-text files like images, audio, or serialized Python objects, use binary mode ("wb").
- Pickle: Saves Python objects

In [16]:
import pickle

data = {"key": "value", "number": 42}
with open("data.pkl", "wb") as file:
    pickle.dump(data, file)
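
Binary mode is not limited to pickle. As a rough sketch, raw bytes can be written and read back the same way; the payload below is a made-up stand-in for image or audio data, and the file name is hypothetical.

```python
import os
import tempfile

# Hypothetical payload standing in for image or audio bytes
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "blob.bin")

    with open(path, "wb") as file:   # "wb" writes bytes without text encoding
        file.write(payload)

    with open(path, "rb") as file:   # "rb" reads them back unchanged
        restored = file.read()

print(restored == payload)
```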

### Saving Data in Databases
- SQLite (built-in database):

Use the sqlite3 module to save and query structured data.


In [25]:
import sqlite3

connection = sqlite3.connect("example.db")
cursor = connection.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Alice"))
connection.commit()
connection.close()
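
The cell above saves data; querying it back might look like the sketch below. It assumes an in-memory database (":memory:") so that it runs without creating example.db, and the second user row is invented for the example.

```python
import sqlite3

# ":memory:" keeps the database in RAM, so no file is created
connection = sqlite3.connect(":memory:")
try:
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    # executemany inserts several rows with one parameterized statement
    cursor.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    connection.commit()

    # Placeholders (?) keep values separate from the SQL text
    cursor.execute("SELECT name FROM users WHERE id = ?", (2,))
    result = cursor.fetchone()
finally:
    connection.close()

print(result)
```

The try/finally block closes the connection even if one of the statements raises an error.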

### Saving Data with Libraries
Pandas: For DataFrame manipulation and storage

In [15]:
import pandas as pd

data = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})
data.to_csv("data.csv", index=False)
data.to_excel("data.xlsx", index=False)  # Requires an Excel writer engine such as openpyxl

### Saving Large Data
- HDF5: For large datasets (e.g., h5py or pandas with HDF5 format).
- Parquet: Efficient for big data (e.g., with pyarrow or fastparquet).

## Choosing the Right Method
- Plain text: Logs, simple configurations.
- CSV/Excel: Tabular data.
- JSON: Nested or hierarchical data.
- Pickle: Python-specific objects (not human-readable).
- Database: When query and persistence are important.
- Binary formats: For performance and space efficiency.

# Loading Python data objects 
Loading Python data objects refers to the process of retrieving data that was previously saved or serialized into a file or other storage medium. This process is essential in scenarios where you need to work with data across multiple sessions or share data between systems.

### Tips for Loading Data
- File Format: Ensure the format of the saved file matches the method used to load it.
- Error Handling: Use try-except blocks to handle errors during loading (e.g., file not found or invalid format).
- Security: Avoid loading untrusted data with modules like pickle, as it may execute malicious code.
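
The error-handling tip can be sketched with JSON, which is also a safer choice than pickle for untrusted input because loading it does not execute code. The load_json helper and the file names are hypothetical, and the files are created in a temporary directory only to make the example self-contained.

```python
import json
import os
import tempfile


def load_json(path):
    """Return parsed JSON from path, or None if it is missing or invalid."""
    try:
        with open(path, "r") as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except json.JSONDecodeError as error:
        print(f"Invalid JSON in {path}: {error}")
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.json")
    bad = os.path.join(tmp_dir, "bad.json")
    with open(good, "w") as file:
        json.dump({"name": "Alice"}, file)
    with open(bad, "w") as file:
        file.write("{not valid json")

    loaded_good = load_json(good)
    loaded_bad = load_json(bad)
    loaded_missing = load_json(os.path.join(tmp_dir, "missing.json"))

print(loaded_good, loaded_bad, loaded_missing)
```

Returning None is one design choice among several; re-raising a custom exception would suit callers that must distinguish the two failure cases.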

### Common Ways to Load Python Data Objects
1. Using pickle

The pickle module is used to serialize and deserialize Python objects.

In [17]:
import pickle

with open("data.pkl", "rb") as file:
    loaded_data = pickle.load(file)
print(loaded_data)

{'key': 'value', 'number': 42}


2. Using json

The json module is used to work with JSON data, which is human-readable and language-independent.

In [19]:
import json

with open("data.json", "r") as file:
    loaded_data = json.load(file)
print(loaded_data)

{'name': 'Alice', 'age': 30, 'languages': ['English', 'Spanish']}


3. Using csv

The csv module is used to handle tabular data in CSV (Comma-Separated Values) format.

In [21]:
import csv

with open("data.csv", "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

['Name', 'Age']
['Alice', '30']
['Bob', '25']


4. Using pandas for DataFrames

pandas is a powerful library for handling tabular data.

In [22]:
import pandas as pd

data = pd.read_csv("data.csv")
print(data)

    Name  Age
0  Alice   30
1    Bob   25


5. Using Databases

Python can interface with databases to store and retrieve structured data using libraries like sqlite3, SQLAlchemy, or others.

In [28]:
import sqlite3

# Connect to the database created earlier
connection = sqlite3.connect("example.db")
cursor = connection.cursor()

# Load data
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())

connection.close()

[(1, 'Alice')]
