# Pandas Data Input/Output
Common file formats that data scientists work with include:
- CSV
- Excel
- HTML
- SQL

For the full list of formats Pandas can read and write, see:  
- https://pandas.pydata.org/pandas-docs/stable/reference/io.html

In [48]:
import pandas as pd

### Read CSV Files

In [49]:
# read the example.csv file as a DataFrame
df = pd.read_csv("io_files/example.csv")

df

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


In [50]:
"""
write the DataFrame to a csv file called "example_output.csv";
index=False omits the row index, which would otherwise be written as an
extra unnamed column and show up as "Unnamed: 0" when the file is read back
"""
df.to_csv("io_files/example_output.csv", index=False)
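
To make the effect of index=False concrete, here is a minimal sketch that writes the same DataFrame both ways and reads each result back. The DataFrame is a hypothetical stand-in for the contents of example.csv, and the data is written to an in-memory string rather than to a file so the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical stand-in for the data in example.csv
df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    columns=["a", "b", "c", "d"],
)

# With no path argument, to_csv returns the CSV text as a string
with_index = df.to_csv()                # default: index=True
without_index = df.to_csv(index=False)

# Read both versions back from memory
reread_with_index = pd.read_csv(io.StringIO(with_index))
reread_without_index = pd.read_csv(io.StringIO(without_index))

# The saved index comes back as an extra data column
print(list(reread_with_index.columns))
# Only the original columns come back
print(list(reread_without_index.columns))
```

If a file was already saved with its index, passing index_col=0 to read_csv is one way to turn that first column back into the index instead of a data column.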

### Read Excel Files
- Can only import the data
- Cannot import formulas, images, or macros

In [51]:
# read "Sheet1" from example.xlsx as a DataFrame
df = pd.read_excel("io_files/example.xlsx", sheet_name="Sheet1")

df

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


In [52]:
"""
write the DataFrame to an xlsx file called "example_output.xlsx";
index=False omits the row index so it is not written as an extra column
"""
df.to_excel("io_files/example_output.xlsx", sheet_name="Sheet1", index=False)

### Read HTML Files (Web-Scraping)
The read_html function finds the <table> elements in an HTML page and returns each one as a DataFrame. It relies on an HTML parser to do this: lxml by default, with BeautifulSoup (a Python web-scraping library) plus html5lib as the fallback. Parameters such as match and attrs let us narrow the search to the tables we want.
- https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.read_html.html

In [53]:
# (web-scraping) parse every table found on the FDIC failed bank list page
bank_list = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')

# read_html returns a list of DataFrames, one per <table> element it finds
print(type(bank_list))

# therefore, index into the list to pick the table we want
bank_list[0].head()

<class 'list'>


Unnamed: 0,Bank Name,City,ST,CERT,Acquiring Institution,Closing Date,Updated Date
0,Washington Federal Bank for Savings,Chicago,IL,30570,Royal Savings Bank,"December 15, 2017","February 1, 2019"
1,The Farmers and Merchants State Bank of Argonia,Argonia,KS,17719,Conway Bank,"October 13, 2017","February 21, 2018"
2,Fayette County Bank,Saint Elmo,IL,1802,"United Fidelity Bank, fsb","May 26, 2017","January 29, 2019"
3,"Guaranty Bank, (d/b/a BestBank in Georgia & Mi...",Milwaukee,WI,30003,First-Citizens Bank & Trust Company,"May 5, 2017","March 22, 2018"
4,First NBC Bank,New Orleans,LA,58302,Whitney Bank,"April 28, 2017","January 29, 2019"
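
As a rough illustration of how read_html parameters can pick out one table, the sketch below parses a small made-up HTML string instead of a live page. The page content, the "banks" id, and the first table are all hypothetical and exist only for this example; it also assumes an HTML parser that Pandas supports (such as lxml) is installed.

```python
import io

import pandas as pd

# Made-up page with two tables; only the second one lists banks
html = """
<html><body>
<table>
  <tr><th>Item</th><th>Count</th></tr>
  <tr><td>x</td><td>1</td></tr>
</table>
<table id="banks">
  <tr><th>Bank Name</th><th>City</th><th>ST</th></tr>
  <tr><td>First NBC Bank</td><td>New Orleans</td><td>LA</td></tr>
  <tr><td>Fayette County Bank</td><td>Saint Elmo</td><td>IL</td></tr>
</table>
</body></html>
"""

# No filters: one DataFrame per <table> element
all_tables = pd.read_html(io.StringIO(html))

# match keeps only tables whose text matches the given string or regex
bank_tables = pd.read_html(io.StringIO(html), match="Bank Name")

# attrs keeps only tables whose HTML attributes match
by_id = pd.read_html(io.StringIO(html), attrs={"id": "banks"})

print(len(all_tables), len(bank_tables), len(by_id))
print(bank_tables[0])
```

Even when a filter leaves a single table, the result is still a list, so the DataFrame is retrieved with an index such as bank_tables[0].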


### SQL Engine to Read Tables as DataFrames
When working with a specific SQL engine, it's better to use the driver library built for it.
- For MySQL: pip install pymysql
- For PostgreSQL: pip install psycopg2

However, we're still going to show the general approach in Pandas, using SQLAlchemy with an in-memory SQLite database.

In [54]:
from sqlalchemy import create_engine

In [55]:
# create a temporary SQLite engine that's running in memory
engine = create_engine("sqlite:///:memory:")

In [56]:
# Pandas will send the DataFrame to the database as a table called "my_table"
# (by default the row index is written too, as a column named "index")
df.to_sql("my_table", engine)

In [57]:
# now read the "my_table" table from the SQLite database
sql_df = pd.read_sql("my_table", con=engine)

sql_df

Unnamed: 0,index,a,b,c,d
0,0,0,1,2,3
1,1,4,5,6,7
2,2,8,9,10,11
3,3,12,13,14,15
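
Reading a whole table is only one option; read_sql also accepts a SQL query. The sketch below is a variation on the cells above, not the same setup: it swaps the SQLAlchemy engine for a plain connection from Python's built-in sqlite3 module, which Pandas also accepts, and it uses a hypothetical stand-in DataFrame. With a sqlite3 connection the first argument should be a query rather than a bare table name.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the DataFrame used above
df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    columns=["a", "b", "c", "d"],
)

# In-memory SQLite database through the standard library (no SQLAlchemy)
conn = sqlite3.connect(":memory:")

# index=False keeps the row index out of the table
df.to_sql("my_table", conn, index=False)

# Let the database do the filtering and column selection
filtered = pd.read_sql("SELECT a, d FROM my_table WHERE a > 4", conn)
conn.close()

print(filtered)
```

Pushing the filter into the query means only the matching rows and columns are loaded into the DataFrame, which matters more as tables grow.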
