# Additional Information: Data Formats and APIs

A summary of common options for data formats, database connections, HTTP requests, and APIs in Python.

## Data Serialization and Formats
Serialization is the process of converting data structures or objects into a format that can be stored in a file or sent over a network and later reconstructed (deserialization). Common data formats used in Python include:
- **JSON**: Lightweight and commonly used for APIs.
- **CSV**: Ideal for tabular data, especially in spreadsheets.
- **YAML**: Human-readable and used for configurations.
- **Parquet**: Columnar storage format optimized for big data applications.
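To make the idea concrete, here is a minimal sketch that serializes the same records to two text formats in memory, JSON and CSV, using only the standard library. The `records` data is made up for illustration. Note that JSON keeps the integer type on the round trip, while CSV hands every value back as a string.

```python
import csv
import io
import json

# Hypothetical sample records used only for this illustration
records = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]

# JSON: typed (and possibly nested) structure serialized to a string
json_text = json.dumps(records)

# CSV: flat rows written into an in-memory text buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)

# Deserialize both and compare what comes back
from_json = json.loads(json_text)
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
print(from_json[0]["age"], type(from_json[0]["age"]).__name__)  # 30 int
print(from_csv[0]["age"], type(from_csv[0]["age"]).__name__)    # 30 str
```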


### JSON

- **Markdown:** Introduction to JSON, its structure, and its typical uses.
- **Code cell:** Reading and writing JSON.
- **Expected output:** Print statement showing the loaded JSON data.

In [22]:
import pandas as pd

In [23]:
!mkdir -p data


In [1]:
import json
    
data = {
    "name": "Alice",
    "age": 30,
    "is_member": True,
    "hobbies": ["reading", "biking", "coding"]
}

# Serialize to JSON
with open('data/data.json', 'w') as f:
    json.dump(data, f)

# Deserialize JSON
with open('data/data.json', 'r') as f:
    data_loaded = json.load(f)

print("Loaded JSON data:", data_loaded)

Loaded JSON data: {'name': 'Alice', 'age': 30, 'is_member': True, 'hobbies': ['reading', 'biking', 'coding']}


In [5]:
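pd.DataFrame(data_loaded)

|   | name | age | is_member | hobbies |
|---|------|-----|-----------|---------|
| 0 | Alice | 30 | True | reading |
| 1 | Alice | 30 | True | biking |
| 2 | Alice | 30 | True | coding |

The scalar fields are repeated for each item in the `hobbies` list, so the DataFrame has one row per hobby.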
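The `json` module only encodes basic types (dict, list, str, int, float, bool, None). A minimal sketch of handling another type, here a `datetime.date` in a made-up record, through the `default=` hook of `json.dumps`; converting to an ISO-8601 string is one common choice, not the only one.

```python
import json
from datetime import date

# Hypothetical record with a value json cannot encode natively
record = {"name": "Alice", "joined": date(2023, 5, 1)}

try:
    json.dumps(record)
except TypeError as err:
    print("Cannot serialize:", err)

# default= is called for any object json does not know how to encode;
# this lambda assumes such objects have an isoformat() method (date/datetime)
text = json.dumps(record, default=lambda obj: obj.isoformat(), indent=2)
print(text)

restored = json.loads(text)
print(restored["joined"])  # 2023-05-01 (a plain string, not a date)

# Converting back to a date is up to the reader of the JSON
joined = date.fromisoformat(restored["joined"])
```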


### CSV

- **Markdown:** Explanation of CSV format and use cases.
- **Code cell:** Writing data to a CSV file and reading from it.

In [8]:
import csv
    
# Sample data
data = [
    ["Name", "Age", "Occupation"],
    ["Alice", 30, "Engineer"],
    ["Bob", 25, "Designer"],
    ["Charlie", 35, "Teacher"]
]

# Write to CSV
with open('data/data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

# Read CSV
csv_data = []
with open('data/data.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
        csv_data.append(row)

['Name', 'Age', 'Occupation']
['Alice', '30', 'Engineer']
['Bob', '25', 'Designer']
['Charlie', '35', 'Teacher']


In [10]:
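pd.DataFrame(csv_data[1:], columns=csv_data[0])  # use the first row as the header

|   | Name | Age | Occupation |
|---|------|-----|------------|
| 0 | Alice | 30 | Engineer |
| 1 | Bob | 25 | Designer |
| 2 | Charlie | 35 | Teacher |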
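`csv.reader` returns every field as a string (note `'30'` in the printed rows above). A minimal sketch using `csv.DictReader`, which keys each row by the header, plus an explicit conversion of the `Age` column; the CSV text is inlined through `io.StringIO` so the example does not depend on `data/data.csv`.

```python
import csv
import io

# Same content as data/data.csv, inlined so the example is self-contained
csv_text = "Name,Age,Occupation\nAlice,30,Engineer\nBob,25,Designer\nCharlie,35,Teacher\n"

people = []
for row in csv.DictReader(io.StringIO(csv_text)):  # header row becomes the dict keys
    row["Age"] = int(row["Age"])  # csv yields strings; convert numeric columns explicitly
    people.append(row)

print(people[0])  # on Python 3.8+: {'Name': 'Alice', 'Age': 30, 'Occupation': 'Engineer'}
```

`pd.read_csv('data/data.csv')` does both steps automatically: it uses the first row as the header and infers `Age` as an integer column.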


### YAML

- **Markdown:** Overview of YAML and its use in configurations.
- **Code cell:** Writing and reading YAML data.

In [3]:
# !pip3 install pyyaml

In [11]:
import yaml

data = {
    "name": "Alice",
    "age": 30,
    "is_member": True,
    "hobbies": ["reading", "biking", "coding"]
}

# Write YAML
with open('data/data.yaml', 'w') as f:
    yaml.dump(data, f)

# Read YAML
with open('data/data.yaml', 'r') as f:
    data_loaded = yaml.safe_load(f)

print("Loaded YAML data:", data_loaded)

Loaded YAML data: {'age': 30, 'hobbies': ['reading', 'biking', 'coding'], 'is_member': True, 'name': 'Alice'}
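In the output above the keys come back in alphabetical order because `yaml.dump` sorts keys by default. A minimal sketch, assuming a recent PyYAML (5.1 or later, which added `sort_keys`), that keeps insertion order and dumps to a string instead of a file:

```python
import yaml

data = {
    "name": "Alice",
    "age": 30,
    "is_member": True,
    "hobbies": ["reading", "biking", "coding"],
}

# Without a stream argument, yaml.dump returns the YAML text as a string
text = yaml.dump(data, sort_keys=False)
print(text)

# safe_load only builds plain Python types (dict, list, str, int, ...),
# which is why it is preferred over yaml.load for untrusted input
loaded = yaml.safe_load(text)
print(list(loaded))  # ['name', 'age', 'is_member', 'hobbies']
```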


### Parquet

- **Markdown:** Introduction to Parquet, commonly used for large datasets and data processing.
- **Code cell:** Writing to and reading from a Parquet file using Pandas.

In [12]:
# !pip3 install pyarrow  # Parquet engine used by pandas (fastparquet is an alternative)

In [13]:
import pandas as pd

# Sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [30, 25, 35],
    'Occupation': ['Engineer', 'Designer', 'Teacher']
})

# Write Parquet
df.to_parquet('data/data.parquet')

# Read Parquet
df_loaded = pd.read_parquet('data/data.parquet')
print("Loaded Parquet data:")
print(df_loaded)

Loaded Parquet data:
      Name  Age Occupation
0    Alice   30   Engineer
1      Bob   25   Designer
2  Charlie   35    Teacher


In [14]:
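df_loaded  # already a DataFrame, so it can be displayed directly

|   | Name | Age | Occupation |
|---|------|-----|------------|
| 0 | Alice | 30 | Engineer |
| 1 | Bob | 25 | Designer |
| 2 | Charlie | 35 | Teacher |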


## Data Connections and Databases

### `Data Connections in Python: SQL and NoSQL`

1. **Introduction to Databases**
    - **Markdown:** Overview of SQL and NoSQL databases, their differences, and common use cases.
2. **SQL with SQLite**
    - **Markdown:** Introduction to SQLite, a lightweight SQL database.
    - **Code cell:** Creating a table, inserting data, and querying.

In [15]:
import sqlite3

conn = sqlite3.connect('data/example2.db')  # Opens the file, creating it if it does not exist
cursor = conn.cursor()  # Cursor used to execute SQL statements

cursor.execute('''
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    age INTEGER,
    occupation TEXT
)
''')

users = [(1, 'Alice', 30, 'Engineer'),
         (2, 'Bob', 25, 'Designer'),
         (3, 'Charlie', 35, 'Teacher')]

# INSERT OR IGNORE skips rows whose id already exists, so re-running this cell
# does not fail with "UNIQUE constraint failed: users.id"
cursor.executemany("INSERT OR IGNORE INTO users VALUES (?, ?, ?, ?)", users)
conn.commit()

cursor.execute("SELECT * FROM users")
results = cursor.fetchall()

print("Database records:", results)

conn.close()

Database records: [(1, 'Alice', 30, 'Engineer'), (2, 'Bob', 25, 'Designer'), (3, 'Charlie', 35, 'Teacher')]


In [17]:
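pd.DataFrame(results, columns=['id', 'name', 'age', 'occupation'])

|   | id | name | age | occupation |
|---|----|------|-----|------------|
| 0 | 1 | Alice | 30 | Engineer |
| 1 | 2 | Bob | 25 | Designer |
| 2 | 3 | Charlie | 35 | Teacher |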
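pandas can also run the query itself. A minimal sketch using `pd.read_sql_query`, which takes the column names from the query result; it uses an in-memory database with the same `users` rows so it runs independently of `data/example2.db`.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, occupation TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "Engineer"), (2, "Bob", 25, "Designer"), (3, "Charlie", 35, "Teacher")],
)

# Column names come from the SELECT, so no columns= argument is needed
df_users = pd.read_sql_query("SELECT * FROM users", conn)
conn.close()

print(df_users)
```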


In [18]:
conn = sqlite3.connect('data/example2.db')  # Reopen the existing database file
cursor = conn.cursor()

db_sel = cursor.execute("SELECT * FROM users")
db2 = db_sel.fetchall()

conn.close()

In [19]:
db2

[(1, 'Alice', 30, 'Engineer'),
 (2, 'Bob', 25, 'Designer'),
 (3, 'Charlie', 35, 'Teacher')]
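When a query depends on a value (for example, user input), pass it as a parameter with `?` placeholders instead of formatting it into the SQL string; `sqlite3` then handles quoting, which prevents SQL injection. The sketch below also sets `row_factory = sqlite3.Row` so columns can be read by name, and uses the connection as a context manager, which commits on success and rolls back on an exception (it does not close the connection). The `min_age` threshold is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support access by column name

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, occupation TEXT)")

# The with-block commits if it succeeds and rolls back on an exception;
# it does NOT close the connection
with conn:
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [(1, "Alice", 30, "Engineer"), (2, "Bob", 25, "Designer"), (3, "Charlie", 35, "Teacher")],
    )

min_age = 28  # hypothetical filter value, e.g. from user input
rows = conn.execute(
    "SELECT name, age FROM users WHERE age > ? ORDER BY age",
    (min_age,),  # parameters are passed separately, never formatted into the SQL
).fetchall()

older_users = [(row["name"], row["age"]) for row in rows]
conn.close()

print(older_users)  # [('Alice', 30), ('Charlie', 35)]
```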

### SQLite example generated with ChatGPT

- `prompt`: use a sqlite and create a sample database of 5 columns and 20 items of random fruits and trees.

In [21]:
import sqlite3
import random

# Sample fruit and tree data to use
fruit_names = ["Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig", "Grapefruit", "Honeydew", "Indian Fig", "Jackfruit",
               "Kiwi", "Lemon", "Mango", "Nectarine", "Orange", "Papaya", "Quince", "Raspberry", "Strawberry", "Tangerine"]
tree_types = ["Oak", "Maple", "Pine", "Birch", "Cedar", "Spruce", "Palm", "Redwood", "Cypress", "Sycamore",
              "Willow", "Beech", "Aspen", "Fir", "Hemlock", "Sequoia", "Eucalyptus", "Acacia", "Hickory", "Chestnut"]

# Create a SQLite database in memory and a cursor
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create the table with 5 columns: ID, FruitName, TreeType, Quantity, and PricePerUnit
cursor.execute('''
CREATE TABLE FruitsAndTrees (
    ID INTEGER PRIMARY KEY,
    FruitName TEXT,
    TreeType TEXT,
    Quantity INTEGER,
    PricePerUnit REAL
)
''')

# Populate the table with 20 random entries
for i in range(20):
    fruit_name = random.choice(fruit_names)
    tree_type = random.choice(tree_types)
    quantity = random.randint(10, 100)  # Random quantity between 10 and 100
    price_per_unit = round(random.uniform(0.5, 5.0), 2)  # Random price between 0.5 and 5.0

    cursor.execute('''
    INSERT INTO FruitsAndTrees (FruitName, TreeType, Quantity, PricePerUnit)
    VALUES (?, ?, ?, ?)
    ''', (fruit_name, tree_type, quantity, price_per_unit))

# Commit the transaction
conn.commit()

# Query the database to verify entries
cursor.execute("SELECT * FROM FruitsAndTrees")
sample_data = cursor.fetchall()

# Close the database connection after querying
conn.close()

sample_data


[(1, 'Mango', 'Maple', 38, 1.61),
 (2, 'Orange', 'Aspen', 18, 3.94),
 (3, 'Cherry', 'Palm', 69, 0.5),
 (4, 'Fig', 'Sycamore', 33, 4.67),
 (5, 'Honeydew', 'Maple', 16, 3.16),
 (6, 'Date', 'Pine', 59, 3.55),
 (7, 'Indian Fig', 'Hickory', 19, 0.59),
 (8, 'Fig', 'Willow', 45, 4.1),
 (9, 'Papaya', 'Hemlock', 79, 4.83),
 (10, 'Date', 'Chestnut', 59, 2.1),
 (11, 'Kiwi', 'Beech', 71, 1.65),
 (12, 'Honeydew', 'Sequoia', 35, 3.69),
 (13, 'Quince', 'Hickory', 48, 3.8),
 (14, 'Orange', 'Beech', 87, 3.83),
 (15, 'Papaya', 'Sycamore', 99, 4.42),
 (16, 'Indian Fig', 'Sequoia', 46, 4.45),
 (17, 'Jackfruit', 'Willow', 26, 1.02),
 (18, 'Lemon', 'Acacia', 88, 4.07),
 (19, 'Elderberry', 'Cypress', 96, 0.52),
 (20, 'Grapefruit', 'Cypress', 35, 2.41)]
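Once the data is in a table, SQL can summarize it directly. A minimal sketch of a `GROUP BY` aggregation over the same `FruitsAndTrees` schema; since the original connection was in memory and is now closed, it rebuilds a small table from three rows of the sample output above. The `stock_value` column (quantity times unit price) is an illustrative metric, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FruitsAndTrees (
    ID INTEGER PRIMARY KEY,
    FruitName TEXT,
    TreeType TEXT,
    Quantity INTEGER,
    PricePerUnit REAL
)
""")

# Three rows taken from the sample output above (IDs are assigned automatically)
rows = [("Fig", "Willow", 45, 4.1), ("Fig", "Sycamore", 33, 4.67), ("Date", "Pine", 59, 3.55)]
conn.executemany(
    "INSERT INTO FruitsAndTrees (FruitName, TreeType, Quantity, PricePerUnit) VALUES (?, ?, ?, ?)",
    rows,
)

# One summary row per fruit, most valuable stock first
summary = conn.execute("""
SELECT FruitName,
       COUNT(*) AS n_rows,
       SUM(Quantity) AS total_quantity,
       ROUND(SUM(Quantity * PricePerUnit), 2) AS stock_value
FROM FruitsAndTrees
GROUP BY FruitName
ORDER BY stock_value DESC
""").fetchall()
conn.close()

for row in summary:
    print(row)
```

Fig comes first here, with 2 rows and a total quantity of 78.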

### NoSQL with MongoDB

- **Markdown:** Overview of NoSQL databases, specifically MongoDB.
- **Code cell:** Inserting and querying documents with MongoDB (requires the `pymongo` package and either a running MongoDB server or a MongoDB Atlas cluster).

In [13]:
# from pymongo import MongoClient

# client = MongoClient("mongodb://localhost:27017/")
# db = client["seminar_db"]
# collection = db["users"]

# users = [
#     {"name": "Alice", "age": 30, "occupation": "Engineer"},
#     {"name": "Bob", "age": 25, "occupation": "Designer"},
#     {"name": "Charlie", "age": 35, "occupation": "Teacher"}
# ]
# collection.insert_many(users)

# for user in collection.find():
#     print("MongoDB record:", user)

## Working with Requests and HTTP

### `Making HTTP Requests in Python`

1. **Introduction to HTTP and Requests**
    - **Markdown:** Basic overview of HTTP methods and requests.
2. **GET and POST Requests**
    - **Markdown:** Explanation of GET vs. POST requests.
    - **Code cell:** Making sample GET and POST requests with the `requests` library.

In [14]:
# !pip3 install requests

In [15]:
import requests

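response = requests.get("https://jsonplaceholder.typicode.com/posts/1", timeout=10)
response.raise_for_status()  # Raise an exception for 4xx/5xx responses
print("GET response:", response.json())

payload = {"title": "New Post", "body": "This is the body", "userId": 1}
# json= encodes the payload as JSON and sets the Content-Type header
response = requests.post("https://jsonplaceholder.typicode.com/posts", json=payload, timeout=10)
response.raise_for_status()
print("POST response:", response.json())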

GET response: {'userId': 1, 'id': 1, 'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit', 'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'}
POST response: {'title': 'New Post', 'body': 'This is the body', 'userId': 1, 'id': 101}
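To see how GET and POST differ without sending anything over the network, a request can be built and prepared locally with `requests.Request(...).prepare()`. A minimal sketch; the `userId` query parameter on the GET is a hypothetical filter for illustration. GET parameters end up in the URL's query string, while `json=` puts a JSON body on the POST and sets the `Content-Type` header.

```python
import json

import requests

base_url = "https://jsonplaceholder.typicode.com/posts"

# GET: parameters are encoded into the URL, there is no body
get_req = requests.Request("GET", base_url, params={"userId": 1}).prepare()
print(get_req.method, get_req.url)

# POST: the payload travels in the request body as JSON
payload = {"title": "New Post", "body": "This is the body", "userId": 1}
post_req = requests.Request("POST", base_url, json=payload).prepare()
print(post_req.method, post_req.url)
print(post_req.headers["Content-Type"])
print(post_req.body)
```

A prepared request can then be sent with `requests.Session().send(prepared, timeout=10)`.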


## Working with APIs

### `API Interaction with Python`

1. **Introduction to APIs**
    - **Markdown:** Overview of RESTful APIs and common practices.
2. **GitHub API Example**
    - **Markdown:** Walkthrough of accessing GitHub API to get repository data.
    - **Code cell:** Making a GET request to GitHub’s API.

In [16]:
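url = "https://api.github.com/repos/python/cpython"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Fail early on errors such as 404 or rate limiting (403/429)
data = response.json()

# Star and fork counts change over time; the output below is from one run
print("Repository Name:", data["name"])
print("Stars:", data["stargazers_count"])
print("Forks:", data["forks_count"])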

Repository Name: cpython
Stars: 63442
Forks: 30377
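Unauthenticated requests to the GitHub API are rate-limited (60 per hour per IP address), and a missing or private repository returns an error status with a JSON error message instead of repository data, so `data["name"]` would fail with a `KeyError`. A minimal sketch of a more defensive helper; the function names `fetch_repo` and `summarize_repo` are hypothetical, and the optional `token` is assumed to be a GitHub personal access token.

```python
from typing import Optional

import requests

GITHUB_API = "https://api.github.com"


def fetch_repo(owner: str, repo: str, token: Optional[str] = None) -> dict:
    """Return repository metadata, raising requests.HTTPError on 4xx/5xx."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # raises the rate limit
    response = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}", headers=headers, timeout=10)
    # Requests left in the current rate-limit window (None if the header is absent)
    print("Rate limit remaining:", response.headers.get("X-RateLimit-Remaining"))
    response.raise_for_status()
    return response.json()


def summarize_repo(data: dict) -> dict:
    """Pick out a few fields, tolerating missing keys."""
    return {
        "name": data.get("name"),
        "stars": data.get("stargazers_count", 0),
        "forks": data.get("forks_count", 0),
    }
```

Usage: `summarize_repo(fetch_repo("python", "cpython"))`. Keeping the network call and the parsing in separate functions also makes the parsing easy to test with a plain dictionary.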
