### **Export Data with Pandas**
Pandas is a powerful and user-friendly Python library widely used for data manipulation and analysis. It lets you work efficiently and intuitively with structured data, such as spreadsheets or database tables. If you are not a programmer, you can think of Pandas as a versatile tool for organizing and processing data: similar to a digital spreadsheet such as Excel, but able to handle large datasets programmatically.

In [None]:
import pandas as pd
import numpy as np

"""
Practice Exercise: Pandas Basics
Libraries to install: pandas, numpy, openpyxl.

Complete each function below by following the TODO instructions.
Each cell states the objective of the task and shows the expected output.

Use the official Pandas documentation for reference.
https://pandas.pydata.org/docs/user_guide/index.html
"""

In [4]:
import pandas as pd

"""
Objective: Convert data lists into a Pandas DataFrame
"""
name = ["John Doe", "Nadia", "Serena", "Tessa", "Una"]
age = [25, 31, 23, 17, 23]
city = ["New York", "London", "Paris", "Tokyo", "Sydney"]

# TODO: Pair the lists into a dictionary
# TODO: Create a dataframe from the dictionary
# TODO: Validate the dataframe output and the object type
def create_dataframe():
    data = {
        "name": name,
        "age": age,
        "city": city
    }
    df = pd.DataFrame(data)
    return df

print("DataFrame:")
df = create_dataframe()
print(df)


DataFrame:
       name  age      city
0  John Doe   25  New York
1     Nadia   31    London
2    Serena   23     Paris
3     Tessa   17     Tokyo
4       Una   23    Sydney
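The third TODO asks you to validate both the output and the object type, but the cell above only prints the frame. A minimal sketch of one way to check both, assuming the same `name`, `age`, and `city` lists (repeated here so the snippet runs on its own):

```python
import pandas as pd

# Same sample lists as in the exercise, repeated so this cell is self-contained
name = ["John Doe", "Nadia", "Serena", "Tessa", "Una"]
age = [25, 31, 23, 17, 23]
city = ["New York", "London", "Paris", "Tokyo", "Sydney"]

df = pd.DataFrame({"name": name, "age": age, "city": city})

# Object type: pd.DataFrame(...) returns a DataFrame instance
print(type(df))  # <class 'pandas.core.frame.DataFrame'>

# Shape: one row per list element, one column per dictionary key
print(df.shape)  # (5, 3)
print(list(df.columns))  # ['name', 'age', 'city']

# Column dtypes: Pandas infers an integer dtype for "age"
print(df.dtypes)
```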


In [5]:
"""
Objective: Convert data dictionaries into a Pandas DataFrame
"""
dict_1 = {"name": "John Doe", "age": 25, "city": "New York"}
dict_2 = {"name": "Nadia", "age": 31, "city": "London"}
dict_3 = {"name": "Serena", "age": 23, "city": "Paris"}
dict_4 = {"name": "Tessa", "age": 17, "city": "Tokyo"}
dict_5 = {"name": "Una", "age": 23, "city": "Sydney"}

# TODO: Collect the dictionaries into a list
# TODO: Create a dataframe from the list
# TODO: Validate the dataframe output and the object type
def create_dataframe_from_dicts():
    data = [dict_1, dict_2, dict_3, dict_4, dict_5]
    df = pd.DataFrame(data)
    return df
print("\nDataFrame from dictionaries:")
df = create_dataframe_from_dicts()
print(df)





DataFrame from dictionaries:
       name  age      city
0  John Doe   25  New York
1     Nadia   31    London
2    Serena   23     Paris
3     Tessa   17     Tokyo
4       Una   23    Sydney
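A list of dictionaries (one dictionary per row, often called "records") produces the same frame as the dictionary-of-lists approach in the previous cell. The sketch below, using a few hypothetical rows, checks that equivalence and shows what happens when one record is missing a key: Pandas takes the union of all keys as columns and fills the gap with `NaN`.

```python
import pandas as pd

# Records (one dict per row) vs. columns (one list per column)
records = [
    {"name": "John Doe", "age": 25, "city": "New York"},
    {"name": "Nadia", "age": 31, "city": "London"},
]
columns = {"name": ["John Doe", "Nadia"], "age": [25, 31], "city": ["New York", "London"]}

from_records = pd.DataFrame(records)
from_columns = pd.DataFrame(columns)
print(from_records.equals(from_columns))  # True

# A record without "city": the column still exists, the missing cell becomes NaN
ragged = pd.DataFrame([
    {"name": "Serena", "age": 23, "city": "Paris"},
    {"name": "Tessa", "age": 17},
])
print(ragged)
print(ragged["city"].isna().tolist())  # [False, True]
```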


In [6]:
"""
Objective: Adding new columns to a Pandas DataFrame
"""
# TODO: Assign the new list to a DataFrame column name that does not exist yet
# TODO: Validate the dataframe output
print("\nAdding new columns to a DataFrame:")
def add_new_column():
    df = create_dataframe()
    df["country"] = ["USA", "UK", "France", "Japan", "Australia"]
    return df
df = add_new_column()
print(df)





Adding new columns to a DataFrame:
       name  age      city    country
0  John Doe   25  New York        USA
1     Nadia   31    London         UK
2    Serena   23     Paris     France
3     Tessa   17     Tokyo      Japan
4       Una   23    Sydney  Australia
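Assigning a list to a new column works only when the list has exactly one value per row. A sketch of three common variations on a small hypothetical frame: a scalar is broadcast to every row, a column can be derived from an existing one, and a list of the wrong length raises a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Nadia", "Tessa"], "age": [31, 17]})

# A scalar value is repeated for every row
df["active"] = True

# A new column computed from an existing one (vectorized, no loop needed)
df["is_adult"] = df["age"] >= 18

# A list whose length differs from the number of rows is rejected
try:
    df["country"] = ["UK"]  # 1 value for 2 rows
except ValueError as err:
    print("ValueError:", err)

print(df)
```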


In [7]:
"""
Objective: Adding new rows to a Pandas DataFrame
"""
new_row = {"name": "Victoria", "age": 30, "city": "New York", "is_married": True}

# TODO: Create a dataframe from the dictionary
# TODO: Concatenate the previous DataFrame with the new DataFrame, ignoring the index
# TODO: Validate the dataframe output
print("\nAdding new rows to a DataFrame:")
def add_new_row():
    df = create_dataframe()
    new_df = pd.DataFrame([new_row])
    df = pd.concat([df, new_df], ignore_index=True)
    return df
df = add_new_row()
print(df)





Adding new rows to a DataFrame:
       name  age      city is_married
0  John Doe   25  New York        NaN
1     Nadia   31    London        NaN
2    Serena   23     Paris        NaN
3     Tessa   17     Tokyo        NaN
4       Una   23    Sydney        NaN
5  Victoria   30  New York       True
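Because `pd.concat` aligns frames by column name, the `is_married` key that exists only in `new_row` becomes a new column, and the original rows get `NaN` in it. The sketch below reproduces this on a smaller, hypothetical frame and also shows why `ignore_index=True` matters:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Nadia", "Tessa"], "age": [31, 17]})
new_row = {"name": "Victoria", "age": 30, "is_married": True}

# Wrap the dict in a list so it becomes a one-row DataFrame, then stack the frames
combined = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(combined)

# ignore_index=True renumbers the rows 0..n-1
print(combined.index.tolist())  # [0, 1, 2]

# Columns are aligned by name: rows that lacked "is_married" hold NaN there
print(combined["is_married"].isna().tolist())  # [True, True, False]

# Without ignore_index, the new row keeps its own label 0, so label 0 appears twice
duplicated = pd.concat([df, pd.DataFrame([new_row])])
print(duplicated.index.tolist())  # [0, 1, 0]
```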


In [8]:
""" 
Objective: Renaming columns
"""
# TODO: Create a dictionary of {old column: new column} in a variable named columns
# TODO: Use .rename(columns=columns), passing the columns variable as the parameter
# TODO: Check the new renamed dataframe
print("\nRenaming columns:")
def rename_columns():
    df = create_dataframe()
    columns = {
        "name": "full_name",
        "age": "years_old",
        "city": "location"
    }
    df = df.rename(columns=columns)  # rename returns a new DataFrame
    return df
df = rename_columns()
print(df)



Renaming columns:
  full_name  years_old  location
0  John Doe         25  New York
1     Nadia         31    London
2    Serena         23     Paris
3     Tessa         17     Tokyo
4       Una         23    Sydney
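By default `.rename()` silently skips mapping keys that do not match any existing column, so a typo in the dictionary goes unnoticed. A short sketch (the misspelled key `"nmae"` is deliberate) contrasting the default with `errors="raise"`, which turns the typo into a `KeyError`; it also shows that the non-`inplace` form leaves the original frame unchanged:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Nadia"], "age": [31]})

# Default errors="ignore": the misspelled key is skipped without any warning
renamed = df.rename(columns={"nmae": "full_name", "age": "years_old"})
print(list(renamed.columns))  # ['name', 'years_old']

# The original frame is untouched because rename returns a new object
print(list(df.columns))  # ['name', 'age']

# errors="raise" reports mapping keys that are not existing column labels
try:
    df.rename(columns={"nmae": "full_name"}, errors="raise")
except KeyError as err:
    print("KeyError:", err)
```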


In [9]:
"""
Objective: Export as CSV
"""
# TODO: Use .to_csv(filename, index=False) to export as a CSV file without the index
print("\nExporting as CSV:")
def export_as_csv():
    df = create_dataframe()
    df.to_csv("data.csv", index=False)
    return "data.csv"
filename = export_as_csv()
print(f"Exported to {filename}")


Exporting as CSV:
Exported to data.csv
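One way to see exactly what `to_csv` writes is to call it without a path, in which case it returns the CSV text as a string. A sketch on a small sample frame: `index=False` (as in the cell above) writes only the data columns, while the default `index=True` adds the row labels as an unnamed first column, which `read_csv` later reads back as `Unnamed: 0`.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Nadia", "Tessa"], "age": [31, 17]})

# With no path argument, to_csv returns the CSV text instead of writing a file
print(df.to_csv(index=False))
# name,age
# Nadia,31
# Tessa,17

# With the default index=True, the row labels become an unnamed first column
with_index = df.to_csv()
print(with_index.splitlines()[0])  # ,name,age

# Reading that back produces an extra "Unnamed: 0" column
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))  # ['Unnamed: 0', 'name', 'age']

# index=False round-trips cleanly to an equivalent frame
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(clean.equals(df))  # True
```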


In [None]:
"""
Objective: Export as Excel without index
"""
# TODO: Use .to_excel(filename, index=False) to export as an Excel file without the index
print("\nExporting as Excel:")
def export_as_excel():
    df = create_dataframe()
    df.to_excel("data.xlsx", index=False)
    return "data.xlsx"
filename = export_as_excel()
print(f"Exported to {filename}")
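`to_excel` needs a separate writer engine; for `.xlsx` files Pandas can use `openpyxl`, which is why it appears in the install list at the top of the notebook. A sketch, assuming `openpyxl` is installed, that writes two hypothetical frames to separate sheets of one workbook with `pd.ExcelWriter` and reads them back. An in-memory `BytesIO` buffer stands in for a real file so nothing is written to disk.

```python
import io
import pandas as pd

people = pd.DataFrame({"name": ["Nadia", "Tessa"], "age": [31, 17]})
cities = pd.DataFrame({"city": ["London", "Tokyo"]})

buffer = io.BytesIO()

# One workbook, two sheets; index=False keeps the row labels out of column A
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    people.to_excel(writer, sheet_name="people", index=False)
    cities.to_excel(writer, sheet_name="cities", index=False)

# Rewind the buffer, then read a single sheet back by name
buffer.seek(0)
people_back = pd.read_excel(buffer, sheet_name="people")
print(people_back)

# sheet_name=None loads every sheet into a dict keyed by sheet name
buffer.seek(0)
all_sheets = pd.read_excel(buffer, sheet_name=None)
print(list(all_sheets))  # ['people', 'cities']
```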

### **Reflection**
Is there any difference between data stored as a CSV file and as an Excel file when using Pandas?

(answer here)

### **Exploration**
Pandas has a .read_html() method that reads tables directly from HTML content or even from a URL. Can we replace Requests + BeautifulSoup with just pandas.read_html()?
Try it by scraping https://www.scrapingcourse.com/table-parsing, using Requests to fetch the HTML and Pandas to extract the table.
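A minimal sketch of the `pandas.read_html()` side, run on a small hypothetical HTML table instead of the live page so it works offline. `read_html` returns a list of DataFrames, one per `<table>` element it finds, and needs an HTML parser such as `lxml` (or `beautifulsoup4` with `html5lib`) installed. Recent Pandas versions expect literal HTML to be wrapped in `io.StringIO`; in the exercise, the string would instead come from something like `requests.get(url).text`.

```python
import io
import pandas as pd

# Hypothetical HTML; in the exercise this text would be fetched with Requests
html = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Chair</td><td>49.99</td></tr>
  <tr><td>Desk</td><td>129.50</td></tr>
</table>
"""

# read_html parses every <table> element and returns a list of DataFrames
tables = pd.read_html(io.StringIO(html))
print(len(tables))  # 1

# The row of <th> cells becomes the header; numeric text is parsed as numbers
products = tables[0]
print(products)
print(products.dtypes)
```

If a page contains several tables, the `match` parameter of `read_html` can narrow the result to tables containing a given string.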