### **Export Data with Pandas**
Pandas is a powerful, user-friendly Python library widely used for data manipulation and analysis. It helps you work with structured data (like spreadsheets or database tables) efficiently and intuitively. If you are not a programmer, you can think of Pandas as a digital spreadsheet (like Excel) with the added ability to process large datasets programmatically.

In [2]:
import pandas as pd
import numpy as np

"""
Practice Exercise: Pandas Basics
Libraries to install: pandas, numpy, openpyxl.

Complete each function below by following the TODO instructions. 
Each function includes the objective of the task and the expected output.

Use Pandas official documentation for reference.
https://pandas.pydata.org/docs/user_guide/index.html
"""

In [3]:
"""
Objective: Convert data list into a Pandas DataFrame
"""
name = ["John Doe", "Nadia", "Serena", "Tessa", "Una"]
age = [25, 31, 23, 17, 23]
city = ["New York", "London", "Paris", "Tokyo", "Sydney"]

# TODO: Pair the lists into a dictionary
dict_pair = {
    'name': name,
    'age': age,
    'city': city
}
# TODO: Create a dataframe from the dictionary
df = pd.DataFrame(dict_pair)
# TODO: Validate the dataframe output and the object type
df

,name,age,city
0,John Doe,25,New York
1,Nadia,31,London
2,Serena,23,Paris
3,Tessa,17,Tokyo
4,Una,23,Sydney
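
The last TODO also asks to validate the object type, which displaying `df` alone does not show. A minimal sketch of one way to check it, rebuilding the same frame so the snippet runs on its own:

```python
import pandas as pd

# Same data as the cell above
dict_pair = {
    "name": ["John Doe", "Nadia", "Serena", "Tessa", "Una"],
    "age": [25, 31, 23, 17, 23],
    "city": ["New York", "London", "Paris", "Tokyo", "Sydney"],
}
df = pd.DataFrame(dict_pair)

print(type(df))   # the object type of the result
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type pandas inferred for each column
```

Each dictionary key becomes a column name, and each list becomes that column's values, so all lists must have the same length.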


In [4]:
"""
Objective: Convert data dictionaries into a Pandas DataFrame
"""
dict_1 = {"name": "John Doe", "age": 25, "city": "New York"}
dict_2 = {"name": "Nadia", "age": 31, "city": "London"}
dict_3 = {"name": "Serena", "age": 23, "city": "Paris"}
dict_4 = {"name": "Tessa", "age": 17, "city": "Tokyo"}
dict_5 = {"name": "Una", "age": 23, "city": "Sydney"}

# TODO: Collect the dictionaries into a list
new_list = [dict_1, dict_2, dict_3, dict_4, dict_5]
# TODO: Create a dataframe from the list
df = pd.DataFrame(new_list)
# TODO: Validate the dataframe output and the object type
df

,name,age,city
0,John Doe,25,New York
1,Nadia,31,London
2,Serena,23,Paris
3,Tessa,17,Tokyo
4,Una,23,Sydney
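
With a list of dictionaries, each dictionary is one row and the keys are matched to columns by name. The sketch below, using two shortened records from the exercise, illustrates what happens when a record lacks a key: pandas fills the gap with a missing value instead of raising an error.

```python
import pandas as pd

# Two records; the first one has no "city" key
rows = [
    {"name": "Tessa", "age": 17},
    {"name": "Una", "age": 23, "city": "Sydney"},
]
df = pd.DataFrame(rows)

print(df.columns.tolist())        # columns are the union of all keys
print(df["city"].isna().tolist()) # True where a record had no "city"
```

This is the same mechanism that produces empty cells when rows with different fields are combined later in the notebook.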


In [5]:
"""
Objective: Adding new columns to a Pandas DataFrame
"""
# TODO: Assign the new list to a dataframe column name that does not exist yet
new_cols = [1,2,3,4,5]
df = df.assign(salary=new_cols)
# TODO: Validate the dataframe output
df

,name,age,city,salary
0,John Doe,25,New York,1
1,Nadia,31,London,2
2,Serena,23,Paris,3
3,Tessa,17,Tokyo,4
4,Una,23,Sydney,5
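
The cell above uses `.assign()`, but bracket assignment is a common alternative. A small sketch of the difference, using a shortened two-row frame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John Doe", "Nadia"], "age": [25, 31]})

# assign() returns a new DataFrame; the original is left untouched
with_salary = df.assign(salary=[1, 2])
print("salary" in df.columns)           # False
print("salary" in with_salary.columns)  # True

# Bracket assignment adds the column to the existing DataFrame in place
df["salary"] = [1, 2]
print("salary" in df.columns)           # True
```

Because `.assign()` does not modify the original, its result has to be stored, which is why the exercise writes `df = df.assign(...)`.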


In [6]:
"""
Objective: Adding new rows to a Pandas DataFrame
"""
new_row = {"name": "Victoria", "age": 30, "city": "New York", "is_married": True}

# TODO: Create a dataframe from the dictionary
df2 = pd.DataFrame([new_row])
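# TODO: Concatenate previous dataframe with new dataframe and ignore index
# NOTE: ignore_index=True is not passed here, so the new row keeps its own index 0
# in the output; pd.concat([df, df2], ignore_index=True) would renumber the rows.
df3 = pd.concat([df, df2])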
# TODO: Validate the dataframe output
df3

,name,age,city,salary,is_married
0,John Doe,25,New York,1.0,
1,Nadia,31,London,2.0,
2,Serena,23,Paris,3.0,
3,Tessa,17,Tokyo,4.0,
4,Una,23,Sydney,5.0,
0,Victoria,30,New York,,True
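
In the output, Victoria's row carries index 0, duplicating John Doe's, and `salary` has turned into floats. The following sketch, on a shortened frame, compares `pd.concat` with and without `ignore_index=True` and shows where the floats come from:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John Doe", "Nadia"], "salary": [1, 2]})
df2 = pd.DataFrame([{"name": "Victoria", "is_married": True}])

# Default: each frame keeps its own row labels, so 0 appears twice
kept = pd.concat([df, df2])
print(kept.index.tolist())        # [0, 1, 0]

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([df, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# The new row has no salary, so the column needs a missing value
# and the integer column is converted to float
print(renumbered["salary"].dtype) # float64
```

Duplicate index labels are legal in pandas, but label-based lookups such as `.loc[0]` then return more than one row.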


In [7]:
""" 
Objective: Renaming columns
"""
# TODO: Create a dictionary of {old column: new column} in columns variable
columns = {
    'name': 'Name',
    'age': 'Age',
    'city': 'City',
    'salary': 'Salary',
    'is_married': 'Married'
}
# TODO: Use .rename(columns=columns) and assign columns variable as parameter
df4 = df3.rename(columns=columns)
# TODO: Check the new renamed dataframe
df4

,Name,Age,City,Salary,Married
0,John Doe,25,New York,1.0,
1,Nadia,31,London,2.0,
2,Serena,23,Paris,3.0,
3,Tessa,17,Tokyo,4.0,
4,Una,23,Sydney,5.0,
0,Victoria,30,New York,,True
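
One detail of `.rename()` worth knowing: keys in the mapping that match no column are ignored by default, so a typo in an old column name goes unnoticed. A minimal sketch on a one-row frame, where `is_married` is deliberately absent:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Una"], "age": [23]})

# By default, keys that match no column are silently ignored
renamed = df.rename(columns={"name": "Name", "is_married": "Married"})
print(renamed.columns.tolist())  # ['Name', 'age']

# errors="raise" turns a non-matching key into a KeyError
try:
    df.rename(columns={"is_married": "Married"}, errors="raise")
except KeyError as exc:
    print("KeyError:", exc)
```

Like `.assign()`, `.rename()` returns a new DataFrame and leaves the original column names in place.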


In [8]:
"""
Objective: Export as CSV
"""
# TODO: Use .to_csv(filename) to export as csv file
df4.to_csv("csv.csv")
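
Unlike the Excel export in the next cell, this call writes the row index as an extra first column. The sketch below shows the effect without touching the disk, relying on the fact that `to_csv()` returns the CSV text when no filename is given; the two-row frame is only for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["John Doe", "Nadia"], "Age": [25, 31]})

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # data columns only

print(with_index.splitlines()[0])     # ,Name,Age
print(without_index.splitlines()[0])  # Name,Age

# Reading the first variant back turns the index into a regular column
round_trip = pd.read_csv(io.StringIO(with_index))
print(round_trip.columns.tolist())    # ['Unnamed: 0', 'Name', 'Age']
```

Passing `index=False` to `to_csv`, or `index_col=0` to `read_csv`, avoids the extra `Unnamed: 0` column.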

In [9]:
"""
Objective: Export as Excel without index
"""
# TODO: Use .to_excel(filename, index=False) to export as excel file without index
df4.to_excel("excel.xlsx", index=False)

### **Reflection**
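Is there any difference between data exported as CSV and as Excel using Pandas?

In a modern editor there is little visual difference for tabular data, since most editors preview a CSV file as rows and columns, much like Excel. Under the hood they differ: a CSV file is plain text with values separated by commas and rows separated by newlines, so it carries no data types or formatting, while an .xlsx file is a workbook format that stores typed cells and can hold multiple sheets. In this notebook the two exports also differ in content: the CSV was written with the default index column, whereas the Excel file was written with index=False.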
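
One practical consequence of CSV being plain text is that pandas has to guess the column types again when the file is read. A minimal sketch of that effect, using a hypothetical `code` column that is not part of the exercise data:

```python
import io
import pandas as pd

df = pd.DataFrame({"code": ["007", "042"], "city": ["Paris", "Tokyo"]})

csv_text = df.to_csv(index=False)

# Default read: types are re-inferred, so "007" is parsed as the integer 7
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["code"].tolist())  # [7, 42]

# Declaring the column as text keeps the leading zeros
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})
print(as_text["code"].tolist())   # ['007', '042']
```

The Excel side is left out here because it needs openpyxl and a file on disk; trying the same round trip with `to_excel` and `read_excel` is a useful follow-up.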

### **Exploration**
Pandas has a .read_html() function that reads tables directly from HTML content or even a URL. Can we replace Requests+BeautifulSoup with just pandas.read_html()?
Try it by scraping https://www.scrapingcourse.com/table-parsing, using requests to get the HTML and pandas to extract the tables from it.
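
Before hitting the live page, the idea can be tried offline. The sketch below assumes a hand-written HTML snippet (`SAMPLE_HTML`, a stand-in and not the real page source) and assumes an HTML parser such as lxml is installed; if it is not, `read_html` raises `ImportError`, which the snippet catches.

```python
from io import StringIO
import pandas as pd

# Hand-written stand-in for a page containing one table
SAMPLE_HTML = """
<html><body>
<h1>Products</h1>
<table>
  <thead><tr><th>Product ID</th><th>Name</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Laptop</td><td>$999.99</td></tr>
    <tr><td>2</td><td>Smartphone</td><td>$599.99</td></tr>
  </tbody>
</table>
</body></html>
"""

try:
    # read_html returns a list with one DataFrame per <table> element
    tables = pd.read_html(StringIO(SAMPLE_HTML))
except ImportError:
    tables = []  # no HTML parser available in this environment

if tables:
    products = tables[0]
    # Prices arrive as text because of the "$"; strip it and convert to float
    products["Price"] = (
        products["Price"].str.replace("$", "", regex=False).astype(float)
    )
    print(products)
```

`read_html` only extracts `<table>` elements (the `<h1>` above is ignored), so it can stand in for BeautifulSoup on table-shaped data, while other page content still needs an HTML parser of its own.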

In [10]:
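import requests as req
from io import StringIO

url2 = "https://www.scrapingcourse.com/table-parsing"

# Fetch the page with requests, then let pandas parse every <table> in the HTML
response = req.get(url2)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
# read_html returns a list of DataFrames, one per table found
pd.read_html(StringIO(response.text))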

[    Product ID                 Name       Category    Price In Stock
 0            1               Laptop    Electronics  $999.99      Yes
 1            2           Smartphone    Electronics  $599.99      Yes
 2            3           Headphones          Audio  $149.99       No
 3            4         Coffee Maker     Appliances   $79.99      Yes
 4            5        Running Shoes         Sports   $89.99      Yes
 5            6          Smart Watch    Electronics  $249.99      Yes
 6            7              Blender     Appliances   $39.99       No
 7            8             Yoga Mat         Sports   $29.99      Yes
 8            9       Wireless Mouse    Electronics   $24.99      Yes
 9           10            Desk Lamp           Home   $34.99      Yes
 10          11     Portable Speaker          Audio   $79.99       No
 11          12  Electric Toothbrush  Personal Care   $49.99      Yes
 12          13             Backpack    Accessories   $59.99      Yes
 13          14     