### **Export Data with Pandas**
Pandas is a widely used Python library for data manipulation and analysis. It lets you work with structured data (such as spreadsheets or database tables) efficiently. If you are not a programmer, think of Pandas as a spreadsheet tool like Excel that you drive with code, which makes it practical for datasets too large to edit by hand.

In [1]:
import pandas as pd
import numpy as np

"""
Practice Exercise: Pandas Basics
Libraries to install: pandas, numpy, openpyxl.

Complete each function below by following the TODO instructions. 
Each function includes the objective of the task and the expected output.

Use Pandas official documentation for reference.
https://pandas.pydata.org/docs/user_guide/index.html
"""

In [4]:
import pandas as pd

"""
Objective: Convert data list into a Pandas DataFrame
"""
name = ["John Doe", "Nadia", "Serena", "Tessa", "Una"]
age = [25, 31, 23, 17, 23]
city = ["New York", "London", "Paris", "Tokyo", "Sydney"]

# TODO: Combine the lists into a dictionary
data = {
    "Name": name,
    "Age": age,
    "City": city
}

# TODO: Create a dataframe from the dictionary
df = pd.DataFrame(data)

# TODO: Validate the dataframe output and the object type
print(type(df))
print(df)

<class 'pandas.core.frame.DataFrame'>
       Name  Age      City
0  John Doe   25  New York
1     Nadia   31    London
2    Serena   23     Paris
3     Tessa   17     Tokyo
4       Una   23    Sydney


In [5]:
"""
Objective: Convert data dictionaries into a Pandas DataFrame
"""
dict_1 = {"name": "John Doe", "age": 25, "city": "New York"}
dict_2 = {"name": "Nadia", "age": 31, "city": "London"}
dict_3 = {"name": "Serena", "age": 23, "city": "Paris"}
dict_4 = {"name": "Tessa", "age": 17, "city": "Tokyo"}
dict_5 = {"name": "Una", "age": 23, "city": "Sydney"}

# TODO: Collect the dictionaries into a list
data = [dict_1, dict_2, dict_3, dict_4, dict_5]

# TODO: Create a dataframe from the list
dataFrame = pd.DataFrame(data)

# TODO: Validate the dataframe output and the object type
print(dataFrame)

       name  age      city
0  John Doe   25  New York
1     Nadia   31    London
2    Serena   23     Paris
3     Tessa   17     Tokyo
4       Una   23    Sydney


In [6]:
"""
Objective: Adding new columns to a Pandas DataFrame
"""
# TODO: Assign the new list to a dataframe column name that does not exist yet
dataFrame["Country"] = ["USA", "UK", "France", "Japan", "Australia"]

# TODO: Validate the dataframe output
print(dataFrame)



       name  age      city    Country
0  John Doe   25  New York        USA
1     Nadia   31    London         UK
2    Serena   23     Paris     France
3     Tessa   17     Tokyo      Japan
4       Una   23    Sydney  Australia


In [7]:
"""
Objective: Adding new rows to a Pandas DataFrame
"""
new_row = {"name": "Victoria", "age": 30, "city": "New York", "is_married": True}

# TODO: Create a dataframe from the dictionary
new_row_df = pd.DataFrame([new_row])
# TODO: Concatenate previous dataframe with new dataframe and ignore index
dataFrame = pd.concat([dataFrame, new_row_df], ignore_index=True)
# TODO: Validate the dataframe output
print(dataFrame)

       name  age      city    Country is_married
0  John Doe   25  New York        USA        NaN
1     Nadia   31    London         UK        NaN
2    Serena   23     Paris     France        NaN
3     Tessa   17     Tokyo      Japan        NaN
4       Una   23    Sydney  Australia        NaN
5  Victoria   30  New York        NaN       True
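
The `NaN` values above appear because `pd.concat` aligns the frames on their column labels and fills every cell that one side lacks. A minimal sketch of that behaviour, using two small made-up frames (the names `left` and `right` are illustrative, not part of the exercise):

```python
import pandas as pd

# Two small illustrative frames whose columns only partly overlap
left = pd.DataFrame({"name": ["Nadia"], "Country": ["UK"]})
right = pd.DataFrame({"name": ["Victoria"], "is_married": [True]})

# concat keeps the union of the column labels; cells with no source value become NaN
combined = pd.concat([left, right], ignore_index=True)

print(list(combined.columns))
print(combined.isna())
```

If the missing values are unwanted, one option is to give the new row a value for every existing column before concatenating.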


In [8]:
""" 
Objective: Renaming columns
"""

# TODO: Create a dictionary of {old column: new column} in the rename_column variable
rename_column = {"name": "full_name"}

# TODO: Use .rename(columns=rename_column) to pass the dictionary as the columns parameter
dataFrame = dataFrame.rename(columns=rename_column)

# TODO: Check the new renamed dataframe
print(dataFrame)


  full_name  age      city    Country
0  John Doe   25  New York        USA
1     Nadia   31    London         UK
2    Serena   23     Paris     France
3     Tessa   17     Tokyo      Japan
4       Una   23    Sydney  Australia


In [9]:
"""
Objective: Export as CSV
"""
# TODO: Use .to_csv(filename) to export as csv file
dataFrame.to_csv("data.csv")
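
Note that `to_csv` writes the row index by default, unlike the Excel export in the next cell, which passes `index=False`. The sketch below shows the difference on a small stand-in frame (the frame `df` and its values are illustrative). Calling `to_csv()` without a filename returns the CSV text instead of writing a file, which makes the result easy to inspect.

```python
import io
import pandas as pd

# Small stand-in frame; the values are illustrative
df = pd.DataFrame({"full_name": ["Nadia", "Serena"], "age": [31, 23]})

# Default export: the row index is written as an unnamed first column
with_index = df.to_csv()
# index=False: only the data columns are written
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the default export back produces an extra "Unnamed: 0" column
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))
```

Passing `index=False` is usually the safer choice when the index is just the default 0..n-1 counter and carries no information.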


In [12]:
%pip install openpyxl

import openpyxl
"""
Objective: Export as Excel without index
"""
# TODO: Use .to_excel(filename, index=False) to export as excel file without index
dataFrame.to_excel("data.xlsx", index=False)


Collecting openpyxl
  Downloading openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-2.0.0-py3-none-any.whl.metadata (2.7 kB)
Downloading openpyxl-3.1.5-py2.py3-none-any.whl (250 kB)
Downloading et_xmlfile-2.0.0-py3-none-any.whl (18 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-2.0.0 openpyxl-3.1.5
Note: you may need to restart the kernel to use updated packages.


### **Reflection**
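Is there any difference between data exported as a CSV file and as an Excel file using Pandas?

Yes. A CSV file is plain text: each row is one line, and the column values within it are separated by commas. An Excel file (`.xlsx`) is a workbook that stores the data in cells, laid out as a table of rows and columns. In this notebook the two exports also differ in content: `to_csv("data.csv")` writes the row index as an extra first column, while `to_excel("data.xlsx", index=False)` omits it.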
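
One practical consequence of CSV being plain text is that column types are not stored in the file, so they have to be inferred again when it is read. The sketch below illustrates this with a date column; the `joined` column and its values are hypothetical and not part of the exercise data. An `.xlsx` workbook, by contrast, records cell types, so dates would normally survive an Excel round trip (not shown here because it requires openpyxl and a file on disk).

```python
import io
import pandas as pd

# Hypothetical frame with a datetime column
df = pd.DataFrame({
    "full_name": ["Nadia", "Serena"],
    "joined": pd.to_datetime(["2024-01-15", "2024-03-02"]),
})

# Round trip through CSV text held in memory
csv_text = df.to_csv(index=False)
roundtrip = pd.read_csv(io.StringIO(csv_text))

print(df["joined"].dtype)         # a datetime dtype
print(roundtrip["joined"].dtype)  # no longer a datetime dtype

# The dates can be recovered by asking read_csv to parse that column
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["joined"])
print(parsed["joined"].dtype)
```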

### **Exploration**
Pandas has a `read_html()` function that reads tables directly from HTML content or even a URL. Can it replace Requests + BeautifulSoup?
Try it by scraping https://www.scrapingcourse.com/table-parsing: use requests to fetch the HTML and pandas to extract the table content.
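
As a starting point, here is a minimal offline sketch of the pandas half of that workflow. The `html` string is a made-up stand-in for the text you would get from `requests.get(url).text`; it is not the content of the course page. `read_html` only extracts `<table>` elements, so it may cover table scraping but would not replace BeautifulSoup for other page content.

```python
import io
import pandas as pd

# Hypothetical HTML standing in for a fetched page body;
# in the exercise this would come from requests.get(url).text
html = """
<table>
  <thead><tr><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>Nadia</td><td>31</td></tr>
    <tr><td>Serena</td><td>23</td></tr>
  </tbody>
</table>
"""

try:
    # read_html returns a list with one DataFrame per <table> it finds
    tables = pd.read_html(io.StringIO(html))
except ImportError:
    # pandas relies on an optional HTML parser such as lxml (or bs4 + html5lib)
    tables = None
    print("Install an HTML parser first, e.g. pip install lxml")

if tables is not None:
    print(len(tables))
    print(tables[0])
```

Wrapping the text in `io.StringIO` hands pandas a file-like object, which recent pandas versions prefer over a raw HTML string.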