# Data Input and Output with Pandas

Pandas can read and write data in many formats. This notebook walks through the most common ones: CSV files, Excel workbooks, and HTML tables.


In [2]:
import numpy as np
import pandas as pd

## CSV Files

### CSV Input

In [4]:
df = pd.read_csv('example.csv')
df.head()

Unnamed: 0,Row No,Order Code,Order Date,Ship Date,Delivery Mode,Client ID,Client Name,Category Type,Country,City,...,Postal Code,Region,Item Code,Product Category,Product Type,Product Title,Sale Price,Units Sold,Discount Rate,Net Profit
0,1,ORD-300774,03/02/2021,01/04/2020,First Class,CID-87591,Michael Smith,Home Office,United States,New York,...,62424,North,ITM-567743,Office Supplies,Binders,Filing Cabinet,476.92,1,0.1,-98.42
1,2,ORD-600409,05/07/2020,01/04/2020,Second Class,CID-62245,Emma Brown,Home Office,Canada,Los Angeles,...,20315,Central,ITM-224900,Electronics,Binders,Noise Cancelling Headphones,438.01,8,0.33,467.05
2,3,ORD-457246,10/08/2021,01/04/2020,Second Class,CID-17857,Robert Taylor,Home Office,United States,New York,...,22852,North,ITM-135683,Furniture,Paper,Office Paper,960.73,1,0.09,-20.73
3,4,ORD-106157,05/14/2021,01/06/2020,Second Class,CID-69708,Emma Brown,Home Office,United Kingdom,Los Angeles,...,14912,North,ITM-947921,Electronics,Storage,Smartphone X,811.6,9,0.34,-175.79
4,5,ORD-369570,07/09/2021,01/05/2020,Standard Class,CID-86540,James Wilson,Home Office,United Kingdom,London,...,19438,North,ITM-330322,Electronics,Binders,Wireless Keyboard,1253.69,4,0.47,-431.21
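
`read_csv` takes many optional arguments beyond the file path. The sketch below uses a small in-memory CSV (made-up rows that mimic a few of the columns above, not the real `example.csv`) to show three common ones: `usecols` to load a subset of columns, `parse_dates` to convert a column to datetimes, and `index_col` to pick the index.

```python
import io

import pandas as pd

# Hypothetical sample rows; the column names mirror the output above.
csv_text = """Row No,Order Code,Order Date,Sale Price,Units Sold
1,ORD-300774,03/02/2021,476.92,1
2,ORD-600409,05/07/2020,438.01,8
3,ORD-457246,10/08/2021,960.73,1
"""

sample = pd.read_csv(
    io.StringIO(csv_text),          # any file path or file-like object works here
    usecols=["Row No", "Order Date", "Sale Price", "Units Sold"],
    parse_dates=["Order Date"],     # parse this column as datetimes while loading
    index_col="Row No",             # use this column as the index
)

print(sample.dtypes)
```

Parsing dates at load time means the `.dt` accessor works on the column right away, without a separate `pd.to_datetime` call.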


### CSV Output

In [6]:
# Add a few derived columns to df, then save the result to a new file
df['Total Revenue'] = df['Sale Price'] * df['Units Sold']  # Add new column
df['Profit Margin'] = df['Net Profit'] / df['Total Revenue']  # Add another column
df['Order Year'] = pd.to_datetime(df['Order Date']).dt.year  # Extract year from Order Date

# Save to new CSV
df.to_csv('modified_sales_data.csv', index=False)
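
To see why `index=False` is passed above, here is a minimal sketch with a hypothetical two-row frame standing in for `df`. Without that argument the index is written as an extra unnamed column, which comes back as `Unnamed: 0` when the file is read again.

```python
import io

import pandas as pd

# Hypothetical stand-in for df with two of its columns.
small = pd.DataFrame({"Sale Price": [476.92, 438.01], "Units Sold": [1, 8]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = small.to_csv()                 # default: index is written
without_index = small.to_csv(index=False)   # index is left out

reloaded_with = pd.read_csv(io.StringIO(with_index))
reloaded_without = pd.read_csv(io.StringIO(without_index))

print(list(reloaded_with.columns))     # ['Unnamed: 0', 'Sale Price', 'Units Sold']
print(list(reloaded_without.columns))  # ['Sale Price', 'Units Sold']
```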

## Excel Files
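Pandas can read and write Excel files. Keep in mind that `read_excel` imports only cell data, not formulas or images, and workbooks that contain images or macros may cause `read_excel` to fail. Reading and writing `.xlsx` files also requires an Excel engine such as `openpyxl` to be installed.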

### Excel Input

In [9]:
pd.read_excel('excel_example.xlsx')

Unnamed: 0,Row No,Order Code,Order Date,Ship Date,Delivery Mode,Client ID,Client Name,Category Type,Country,City,...,Postal Code,Region,Item Code,Product Category,Product Type,Product Title,Sale Price,Units Sold,Discount Rate,Net Profit
0,1,ORD-300774,03/02/2021,01/04/2020,First Class,CID-87591,Michael Smith,Home Office,United States,New York,...,62424,North,ITM-567743,Office Supplies,Binders,Filing Cabinet,476.92,1,0.10,-98.42
1,2,ORD-600409,05/07/2020,01/04/2020,Second Class,CID-62245,Emma Brown,Home Office,Canada,Los Angeles,...,20315,Central,ITM-224900,Electronics,Binders,Noise Cancelling Headphones,438.01,8,0.33,467.05
2,3,ORD-457246,10/08/2021,01/04/2020,Second Class,CID-17857,Robert Taylor,Home Office,United States,New York,...,22852,North,ITM-135683,Furniture,Paper,Office Paper,960.73,1,0.09,-20.73
3,4,ORD-106157,05/14/2021,01/06/2020,Second Class,CID-69708,Emma Brown,Home Office,United Kingdom,Los Angeles,...,14912,North,ITM-947921,Electronics,Storage,Smartphone X,811.60,9,0.34,-175.79
4,5,ORD-369570,07/09/2021,01/05/2020,Standard Class,CID-86540,James Wilson,Home Office,United Kingdom,London,...,19438,North,ITM-330322,Electronics,Binders,Wireless Keyboard,1253.69,4,0.47,-431.21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,496,ORD-873689,06/07/2021,01/10/2020,Same Day,CID-95729,Robert Taylor,Corporate,Canada,New York,...,25499,West,ITM-714005,Furniture,Appliances,Laser Printer,1373.23,5,0.41,30.28
496,497,ORD-756397,08/16/2020,01/04/2020,Same Day,CID-95965,Emma Brown,Home Office,United States,Los Angeles,...,55766,South,ITM-797959,Office Supplies,Storage,Wireless Keyboard,772.73,1,0.49,-8.46
497,498,ORD-909740,06/21/2020,01/06/2020,Second Class,CID-30373,Olivia Davis,Consumer,Canada,Phoenix,...,75088,East,ITM-243038,Furniture,Paper,Noise Cancelling Headphones,1214.32,7,0.30,-280.60
498,499,ORD-801693,09/23/2020,01/05/2020,Second Class,CID-13147,Michael Smith,Consumer,United States,Houston,...,40273,South,ITM-938918,Furniture,Phones,Wireless Keyboard,807.98,5,0.03,62.99


In [13]:
df.columns

Index(['Row No', 'Order Code', 'Order Date', 'Ship Date', 'Delivery Mode',
       'Client ID', 'Client Name', 'Category Type', 'Country', 'City',
       'State/Province', 'Postal Code', 'Region', 'Item Code',
       'Product Category', 'Product Type', 'Product Title', 'Sale Price',
       'Units Sold', 'Discount Rate', 'Net Profit', 'Total Revenue',
       'Profit Margin', 'Order Year'],
      dtype='object')

### Excel Output


In [17]:
revenue = df['Total Revenue']
profit = df['Net Profit']
# Write each Series to its own sheet of a single workbook
with pd.ExcelWriter('financial_reports.xlsx') as writer:
    revenue.to_excel(writer, sheet_name='Total Revenue')
    profit.to_excel(writer, sheet_name='Net Profit')
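
A workbook written this way can be read back sheet by sheet. The sketch below is a round trip under a few assumptions: the two Series are short hypothetical stand-ins, the file goes to a temporary directory, and the code skips the Excel step if `openpyxl` is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the revenue and profit Series above.
revenue = pd.Series([476.92, 3504.08], name="Total Revenue")
profit = pd.Series([-98.42, 467.05], name="Net Profit")

sheets = None
# .xlsx support depends on the optional openpyxl package.
if importlib.util.find_spec("openpyxl") is not None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "financial_reports.xlsx")
        with pd.ExcelWriter(path) as writer:
            revenue.to_excel(writer, sheet_name="Total Revenue")
            profit.to_excel(writer, sheet_name="Net Profit")
        # sheet_name=None loads every sheet into a dict keyed by sheet name.
        sheets = pd.read_excel(path, sheet_name=None, index_col=0)
    print(list(sheets))
else:
    print("openpyxl is not installed; skipping the Excel round trip")
```

Passing a single name, for example `sheet_name="Net Profit"`, returns just that sheet as a DataFrame instead of a dict.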


## Working with HTML Tables in Pandas

### Required Packages
To parse HTML tables in Pandas, ensure these packages are installed:

- **`lxml`** (fast HTML/XML parser)  
- **`html5lib`** (backup parser for complex HTML)  
- **`BeautifulSoup4`** (required alongside `html5lib`; also useful for general scraping)  

---

### Installation Instructions

#### **Option 1: Anaconda (Recommended)**
Run in **terminal/command prompt**:
```bash
conda install lxml html5lib beautifulsoup4
```
**Restart Jupyter Notebook** afterward.

#### **Option 2: pip (Non-Anaconda Users)**
```bash
pip install lxml html5lib beautifulsoup4
```

---

### Example: Reading HTML Tables
Pandas can extract tables directly from HTML (e.g., websites or local files):

```python
import pandas as pd

# Read tables from a URL
url = "https://example.com/data.html"
tables = pd.read_html(url)  # Returns a list of DataFrames

# Display the first table
tables[0].head()
```

> **Note**:  
> - By default, Pandas tries `lxml` first (the fastest parser).  
> - `html5lib` handles malformed HTML but is slower.  
> - `BeautifulSoup4` is used by Pandas together with `html5lib` (the `bs4` flavor of `read_html`).
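
`read_html` can also be tried without a network connection. The following sketch parses a hypothetical two-table HTML string, chooses a parser with the `flavor` argument based on what is installed, and uses `match` to keep only the table whose text contains "Population". The HTML content and the helper `has` are made up for illustration.

```python
import importlib.util
import io

import pandas as pd

# Hypothetical page with two tables; only the second mentions "Population".
html = """
<table>
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM</td><td>3M</td></tr>
</table>
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>India</td><td>1438069596</td></tr>
  <tr><td>China</td><td>1422584933</td></tr>
</table>
"""


def has(module):
    """Return True if the named package can be imported."""
    return importlib.util.find_spec(module) is not None


# Pick a parser that is actually installed.
if has("lxml"):
    flavor = "lxml"
elif has("bs4") and has("html5lib"):
    flavor = "bs4"
else:
    flavor = None

tables_found = None
if flavor is not None:
    # match= keeps only tables whose text matches the given string or regex.
    tables_found = pd.read_html(io.StringIO(html), match="Population", flavor=flavor)
    print(len(tables_found))
    print(tables_found[0])
else:
    print("No HTML parser installed; see the installation instructions above")
```

Filtering with `match` is often less brittle than a fixed position such as `tables[0]`, because the number and order of tables on a live page can change.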

**Wikipedia List of Countries: Contains clean tables with country names and populations.**

In [21]:
# read_html returns a list of DataFrames, one for each table found on the page
df1 = pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)")

In [22]:
df1[0]

Unnamed: 0,Country or territory,Population (1 July 2022),Population (1 July 2023),Change (%),UN continental region[1],UN statistical subregion[1]
0,World,8021407192,8091734930,+0.88%,–,–
1,India,1425423212,1438069596,+0.89%,Asia,Southern Asia
2,China[a],1425179569,1422584933,−0.18%,Asia,Eastern Asia
3,United States,341534046,343477335,+0.57%,Americas,Northern America
4,Indonesia,278830529,281190067,+0.85%,Asia,South-eastern Asia
...,...,...,...,...,...,...
233,Montserrat (United Kingdom),4453,4420,−0.74%,Americas,Caribbean
234,Falkland Islands (United Kingdom),3490,3477,−0.37%,Americas,South America
235,Tokelau (New Zealand),2290,2397,+4.67%,Oceania,Polynesia
236,Niue (New Zealand),1821,1817,−0.22%,Oceania,Polynesia
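
Scraped tables usually need some cleanup before analysis. In the output above, `Change (%)` is text, its negative values appear to use the Unicode minus sign (U+2212) rather than an ASCII hyphen, and some names carry footnote markers such as `[a]`. A minimal sketch of one way to clean this, using three rows copied from the output rather than the live page:

```python
import pandas as pd

# A few rows copied from the output above; \u2212 is the Unicode minus sign.
pop = pd.DataFrame({
    "Country or territory": ["India", "China[a]", "Montserrat (United Kingdom)"],
    "Change (%)": ["+0.89%", "\u22120.18%", "\u22120.74%"],
})

# Swap the Unicode minus for an ASCII hyphen, drop the "%", then convert to float.
pop["Change (%)"] = (
    pop["Change (%)"]
    .str.replace("\u2212", "-", regex=False)
    .str.rstrip("%")
    .astype(float)
)

# Remove bracketed footnote markers such as "[a]" from the names.
pop["Country or territory"] = pop["Country or territory"].str.replace(
    r"\[[^\]]*\]", "", regex=True
)

print(pop)
```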


**Financial Data (S&P 500 Stocks): S&P 500 components table (useful for financial analysis).**

In [26]:
pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]


Unnamed: 0,Symbol,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded
0,MMM,3M,Industrials,Industrial Conglomerates,"Saint Paul, Minnesota",1957-03-04,66740,1902
1,AOS,A. O. Smith,Industrials,Building Products,"Milwaukee, Wisconsin",2017-07-26,91142,1916
2,ABT,Abbott Laboratories,Health Care,Health Care Equipment,"North Chicago, Illinois",1957-03-04,1800,1888
3,ABBV,AbbVie,Health Care,Biotechnology,"North Chicago, Illinois",2012-12-31,1551152,2013 (1888)
4,ACN,Accenture,Information Technology,IT Consulting & Other Services,"Dublin, Ireland",2011-07-06,1467373,1989
...,...,...,...,...,...,...,...,...
498,XYL,Xylem Inc.,Industrials,Industrial Machinery & Supplies & Components,"White Plains, New York",2011-11-01,1524472,2011
499,YUM,Yum! Brands,Consumer Discretionary,Restaurants,"Louisville, Kentucky",1997-10-06,1041061,1997
500,ZBRA,Zebra Technologies,Information Technology,Electronic Equipment & Instruments,"Lincolnshire, Illinois",2019-12-23,877212,1969
501,ZBH,Zimmer Biomet,Health Care,Health Care Equipment,"Warsaw, Indiana",2001-08-07,1136869,1927
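
Some columns in this table need conversion before they are useful. `Date added` can be turned into datetimes, and `Founded` contains entries such as `2013 (1888)` that are not plain integers. The sketch below works on three rows copied from the output above. Keeping the first four-digit year is an assumption made for illustration, and `Founded Year` is a new column name, not one from the source table.

```python
import pandas as pd

# Three rows copied from the output above.
sp = pd.DataFrame({
    "Symbol": ["MMM", "ABBV", "ACN"],
    "Date added": ["1957-03-04", "2012-12-31", "2011-07-06"],
    "Founded": ["1902", "2013 (1888)", "1989"],
})

# Convert ISO-formatted date strings to datetimes.
sp["Date added"] = pd.to_datetime(sp["Date added"])

# Keep the first four-digit year found in each entry (an illustrative choice).
sp["Founded Year"] = (
    sp["Founded"].astype(str).str.extract(r"(\d{4})", expand=False).astype(int)
)

print(sp.dtypes)
print(sp)
```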


**Climate Data: CO₂ emissions by country (requires table index selection).**

In [29]:
pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions")[1]

Unnamed: 0_level_0,Fossil CO2 emissions 2023,Country/Territory/Region/Group,Fossil CO2 emissions per capita 2023
Unnamed: 0_level_1,ktCO2,Country/Territory/Region/Group,tCO2
0,13259638.95,China,9.24
1,4682039.41,United States,13.83
2,2955181.68,India,2.07
3,2069502.01,Russia,14.45
4,944758.61,Japan,7.54
...,...,...,...
205,19.45,Falkland Islands,6.48
206,17.77,"Saint Helena, Ascension and Tristan da Cunha",4.44
207,2.10,Faroe Islands,0.04
208,39023937.04,Global Total,4.86
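
This table has a two-row header, so `read_html` returns MultiIndex columns (the measure on the first level, the unit on the second). One possible way to flatten them into single strings is sketched below on a three-row slice rebuilt by hand from the output above. The naming scheme `"measure (unit)"` is an illustrative choice.

```python
import pandas as pd

# Rebuild a small slice with the same two header levels as the output above.
columns = pd.MultiIndex.from_tuples([
    ("Fossil CO2 emissions 2023", "ktCO2"),
    ("Country/Territory/Region/Group", "Country/Territory/Region/Group"),
    ("Fossil CO2 emissions per capita 2023", "tCO2"),
])
co2 = pd.DataFrame(
    [
        [13259638.95, "China", 9.24],
        [4682039.41, "United States", 13.83],
        [2955181.68, "India", 2.07],
    ],
    columns=columns,
)

# Join the two levels, skipping the second when it only repeats the first.
co2.columns = [
    top if top == bottom else f"{top} ({bottom})"
    for top, bottom in co2.columns
]

print(list(co2.columns))
```

With flat column names, ordinary selection such as `co2["Country/Territory/Region/Group"]` works without tuple keys.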
