# Data Input and Output

This notebook is a reference for reading and writing data. pandas can read a variety of file types using its `pd.read_*` methods. Let's look at the most common formats:

In [1]:
import numpy as np
import pandas as pd

## Reading Data Sets

### CSV file


In [6]:
df = pd.read_csv('tennis.csv')
df

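|   | outlook | temp | humidity | windy | play |
|---|---|---|---|---|---|
| 0 | sunny | hot | high | False | no |
| 1 | sunny | hot | high | True | no |
| 2 | overcast | hot | high | False | yes |
| 3 | rainy | mild | high | False | yes |
| 4 | rainy | cool | normal | False | yes |
| 5 | rainy | cool | normal | True | no |
| 6 | overcast | cool | normal | True | yes |
| 7 | sunny | mild | high | False | no |
| 8 | sunny | cool | normal | False | yes |
| 9 | rainy | mild | normal | False | yes |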


In [7]:
# read_table splits on tabs by default, so this comma-separated file lands in a single column
pd.read_table("test_csv.tsv")

|   | outlook,temp,humidity,windy,play |
|---|---|
| 0 | sunny,hot,high,FALSE,no |
| 1 | sunny,hot,high,TRUE,no |
| 2 | overcast,hot,high,FALSE,yes |
| 3 | rainy,mild,high,FALSE,yes |


In [8]:
pd.read_table("test_csv.tsv",sep=",")  # sep="," splits each line on commas

|   | outlook | temp | humidity | windy | play |
|---|---|---|---|---|---|
| 0 | sunny | hot | high | False | no |
| 1 | sunny | hot | high | True | no |
| 2 | overcast | hot | high | False | yes |
| 3 | rainy | mild | high | False | yes |
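`read_table` is essentially `read_csv` with a tab as the default separator. The sketch below checks that on a small in-memory string (hypothetical sample rows standing in for `test_csv.tsv`, read through `io.StringIO` so no file is needed): without `sep`, each line lands in one column; with `sep=","` the result matches `read_csv`.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a comma-separated file like test_csv.tsv
data = "outlook,temp,humidity,windy,play\nsunny,hot,high,False,no\nsunny,hot,high,True,no\n"

# read_table splits on tabs by default, so every line ends up in a single column
one_col = pd.read_table(io.StringIO(data))

# With sep=",", read_table parses the same way read_csv does
by_table = pd.read_table(io.StringIO(data), sep=",")
by_csv = pd.read_csv(io.StringIO(data))

print(one_col.shape)            # (2, 1)
print(by_table.equals(by_csv))  # True
```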


In [10]:
# Import data that has no header row
pd.read_csv("test_header.csv")

|   | sunny | hot | high | FALSE | no |
|---|---|---|---|---|---|
| 0 | sunny | hot | high | True | no |
| 1 | overcast | hot | high | False | yes |
| 2 | rainy | mild | high | False | yes |


In `read_csv` the default is `header='infer'`, which means the first row is treated as the column labels.
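To see what `header='infer'` does to a headerless file, here is a minimal sketch on an in-memory string (hypothetical rows modeled on `test_header.csv`): the first data row is consumed as column labels, so one row disappears from the data, while `header=None` keeps every row and numbers the columns from 0.

```python
import io

import pandas as pd

# Hypothetical headerless data in the spirit of test_header.csv
data = "sunny,hot,high,False,no\nsunny,hot,high,True,no\novercast,hot,high,False,yes\n"

inferred = pd.read_csv(io.StringIO(data))                 # header='infer' (the default)
no_header = pd.read_csv(io.StringIO(data), header=None)  # keep every line as data

print(list(inferred.columns))         # the first data row became the labels
print(len(inferred), len(no_header))  # 2 3
```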

In [11]:
pd.read_csv("test_header.csv",header=None)  # header=None: don't use the first row as the header

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | sunny | hot | high | False | no |
| 1 | sunny | hot | high | True | no |
| 2 | overcast | hot | high | False | yes |
| 3 | rainy | mild | high | False | yes |


In [12]:
# Supply your own column labels with the names parameter

pd.read_csv("test_header.csv",names=['outlook','temp','humidity','windy','play'])

|   | outlook | temp | humidity | windy | play |
|---|---|---|---|---|---|
| 0 | sunny | hot | high | False | no |
| 1 | sunny | hot | high | True | no |
| 2 | overcast | hot | high | False | yes |
| 3 | rainy | mild | high | False | yes |


In [13]:
# Pass more names than the file has columns

pd.read_csv("test_header.csv",names=['outlook','temp','humidity','windy','play','Ex1','Ex2'])

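|   | outlook | temp | humidity | windy | play | Ex1 | Ex2 |
|---|---|---|---|---|---|---|---|
| 0 | sunny | hot | high | False | no | NaN | NaN |
| 1 | sunny | hot | high | True | no | NaN | NaN |
| 2 | overcast | hot | high | False | yes | NaN | NaN |
| 3 | rainy | mild | high | False | yes | NaN | NaN |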


No error is raised even though we supplied more names than there are columns; the extra columns are simply filled with NaN.
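When `names` is passed, `read_csv` behaves as if `header=None`, so the first line is kept as data. A small sketch with hypothetical in-memory rows shows the surplus labels becoming all-NaN columns:

```python
import io

import pandas as pd

# Hypothetical headerless rows; two more names than fields per line
data = "sunny,hot,high,False,no\novercast,hot,high,False,yes\n"
names = ["outlook", "temp", "humidity", "windy", "play", "Ex1", "Ex2"]

df = pd.read_csv(io.StringIO(data), names=names)

# The two surplus columns exist but hold only missing values
print(df[["Ex1", "Ex2"]].isna().all().all())  # True
```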

In [14]:
# Load data whose key columns can later form a multi-level index
pd.read_csv("csv_mindex.csv")

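|   | key1 | key2 | value1 | value2 |
|---|---|---|---|---|
| 0 | one | a | 1 | 2 |
| 1 | one | b | 3 | 4 |
| 2 | one | c | 5 | 6 |
| 3 | one | d | 7 | 8 |
| 4 | two | a | 9 | 10 |
| 5 | two | b | 11 | 12 |
| 6 | two | c | 13 | 14 |
| 7 | two | d | 15 | 16 |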


In [16]:
# Passing a list of columns to index_col builds a hierarchical (MultiIndex) row index
d = pd.read_csv("csv_mindex.csv",index_col=['key1','key2'])
d  # key1 and key2 become row index levels: each key1 value groups several key2 rows

| key1 | key2 | value1 | value2 |
|---|---|---|---|
| one | a | 1 | 2 |
| one | b | 3 | 4 |
| one | c | 5 | 6 |
| one | d | 7 | 8 |
| two | a | 9 | 10 |
| two | b | 11 | 12 |
| two | c | 13 | 14 |
| two | d | 15 | 16 |
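Once `key1` and `key2` form a MultiIndex, `.loc` can select by the outer level alone or by a full `(key1, key2)` tuple. This sketch rebuilds the `csv_mindex.csv` contents shown above as an in-memory string, assuming the file holds exactly those rows:

```python
import io

import pandas as pd

# Assumed in-memory copy of csv_mindex.csv, based on the output above
data = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\none,b,3,4\none,c,5,6\none,d,7,8\n"
    "two,a,9,10\ntwo,b,11,12\ntwo,c,13,14\ntwo,d,15,16\n"
)

d = pd.read_csv(io.StringIO(data), index_col=["key1", "key2"])

# Selecting on the outer level returns all inner rows for that key
print(d.loc["one"])

# A tuple selects one row across both levels
print(d.loc[("two", "b"), "value1"])  # 11
```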


In [24]:
pd.read_csv("ex4.csv",skiprows=None,header=None)  # read every raw line, comment lines included

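|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | # hey! | NaN | NaN | NaN | NaN |
| 1 | a | b | c | d | message |
| 2 | # just wanted to make things more difficult fo... | NaN | NaN | NaN | NaN |
| 3 | # who reads CSV files with computers | anyway? | NaN | NaN | NaN |
| 4 | 1 | 2 | 3 | 4 | hello |
| 5 | 5 | 6 | 7 | 8 | world |
| 6 | 9 | 10 | 11 | 12 | python |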


In [22]:
pd.read_csv("ex4.csv",skiprows=[0,2,3])  # skip comment lines 0, 2 and 3 (0-based); line 1 becomes the header

|   | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | python |
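Listing line numbers in `skiprows` works, but it breaks as soon as the comments move. Assuming every comment line in `ex4.csv` starts with `#`, `comment="#"` drops those lines wherever they appear. The sketch rebuilds the file in memory; the second comment line's text is a hypothetical placeholder, since the output above truncates it (it is skipped either way).

```python
import io

import pandas as pd

# In-memory stand-in for ex4.csv; the second comment line's text is hypothetical
data = (
    "# hey!\n"
    "a,b,c,d,message\n"
    "# another comment line (hypothetical text)\n"
    "# who reads CSV files with computers, anyway?\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,python\n"
)

by_rows = pd.read_csv(io.StringIO(data), skiprows=[0, 2, 3])  # skip by line number
by_marker = pd.read_csv(io.StringIO(data), comment="#")       # skip lines starting with '#'

print(by_rows.equals(by_marker))  # True
```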


## Excel
pandas can read and write Excel files. Keep in mind that it only imports data, not formulas or images; workbooks containing images or macros may cause `read_excel` to fail.

### Getting the number of sheets in a workbook

In [7]:
# xlrd 2.0+ only reads legacy .xls files, so list the sheets of an .xlsx through pandas instead
xls = pd.ExcelFile("test_data.xlsx")
print("Sheet Names :", xls.sheet_names)
print("No. of Sheets :", len(xls.sheet_names))

Sheet Names : ['fuses', 'car', 'deck', 'HB', 'chi', 'anova', 'Fer']
No. of Sheets : 7


In [31]:
pd.read_excel('test_data.xlsx',sheet_name=0)  # the old 'sheetname' argument was renamed to 'sheet_name'


|   | fuses made |
|---|---|
| 0 | 255 |
| 1 | 254 |
| 2 | 251 |
| 3 | 260 |
| 4 | 255 |
| 5 | 263 |
| 6 | 258 |
| 7 | 250 |
| 8 | 260 |
| 9 | 261 |


In [34]:
pd.read_excel("test_data.xlsx",sheet_name='fuses')

|   | fuses made |
|---|---|
| 0 | 255 |
| 1 | 254 |
| 2 | 251 |
| 3 | 260 |
| 4 | 255 |
| 5 | 263 |
| 6 | 258 |
| 7 | 250 |
| 8 | 260 |
| 9 | 261 |


### Excel Output

In [33]:
df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

## HTML

You may need to install html5lib, lxml, and BeautifulSoup4. In your terminal/command prompt run:

    conda install lxml
    conda install html5lib
    conda install beautifulsoup4

Then restart Jupyter Notebook.
(Use `pip install` instead if you aren't using the Anaconda distribution.)

pandas can read `<table>` tags out of HTML. For example:

### HTML Input

The `read_html` function reads every table on a web page and returns a list of DataFrame objects:

In [36]:
url = "http://www.basketball-reference.com/leagues/NBA_2015_totals.html"
df = pd.read_html(url)

In [37]:
df

[      Rk                 Player  Pos  Age   Tm   G  GS    MP   FG   FGA  ...   \
 0      1             Quincy Acy   PF   24  NYK  68  22  1287  152   331  ...    
 1      2           Jordan Adams   SG   20  MEM  30   0   248   35    86  ...    
 2      3           Steven Adams    C   21  OKC  70  67  1771  217   399  ...    
 3      4            Jeff Adrien   PF   28  MIN  17   0   215   19    44  ...    
 4      5          Arron Afflalo   SG   29  TOT  78  72  2502  375   884  ...    
 5      5          Arron Afflalo   SG   29  DEN  53  53  1750  281   657  ...    
 6      5          Arron Afflalo   SG   29  POR  25  19   752   94   227  ...    
 7      6          Alexis Ajinça    C   26  NOP  68   8   957  181   329  ...    
 8      7         Furkan Aldemir   PF   23  PHI  41   9   540   40    78  ...    
 9      8           Cole Aldrich    C   26  NYK  61  16   976  144   301  ...    
 10     9      LaMarcus Aldridge   PF   29  POR  71  71  2512  659  1415  ...    
 11    10       

In [38]:
type(df)

list

In [39]:
len(df)

1

In [43]:
print(df[0]) # Extracting table from list

      Rk                 Player  Pos  Age   Tm   G  GS    MP   FG   FGA  ...   \
0      1             Quincy Acy   PF   24  NYK  68  22  1287  152   331  ...    
1      2           Jordan Adams   SG   20  MEM  30   0   248   35    86  ...    
2      3           Steven Adams    C   21  OKC  70  67  1771  217   399  ...    
3      4            Jeff Adrien   PF   28  MIN  17   0   215   19    44  ...    
4      5          Arron Afflalo   SG   29  TOT  78  72  2502  375   884  ...    
5      5          Arron Afflalo   SG   29  DEN  53  53  1750  281   657  ...    
6      5          Arron Afflalo   SG   29  POR  25  19   752   94   227  ...    
7      6          Alexis Ajinça    C   26  NOP  68   8   957  181   329  ...    
8      7         Furkan Aldemir   PF   23  PHI  41   9   540   40    78  ...    
9      8           Cole Aldrich    C   26  NYK  61  16   976  144   301  ...    
10     9      LaMarcus Aldridge   PF   29  POR  71  71  2512  659  1415  ...    
11    10            Lavoy Al

In [44]:
print(type(df[0]))

<class 'pandas.core.frame.DataFrame'>


In [2]:
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')

In [3]:
df[0]

|   | Bank Name | City | ST | CERT | Acquiring Institution | Closing Date | Updated Date |
|---|---|---|---|---|---|---|---|
| 0 | The Enloe State Bank | Cooper | TX | 10716 | Legend Bank, N. A. | May 31, 2019 | June 18, 2019 |
| 1 | Washington Federal Bank for Savings | Chicago | IL | 30570 | Royal Savings Bank | December 15, 2017 | February 1, 2019 |
| 2 | The Farmers and Merchants State Bank of Argonia | Argonia | KS | 17719 | Conway Bank | October 13, 2017 | February 21, 2018 |
| 3 | Fayette County Bank | Saint Elmo | IL | 1802 | United Fidelity Bank, fsb | May 26, 2017 | January 29, 2019 |
| 4 | Guaranty Bank, (d/b/a BestBank in Georgia & Mi... | Milwaukee | WI | 30003 | First-Citizens Bank & Trust Company | May 5, 2017 | March 22, 2018 |
| 5 | First NBC Bank | New Orleans | LA | 58302 | Whitney Bank | April 28, 2017 | January 29, 2019 |
| 6 | Proficio Bank | Cottonwood Heights | UT | 35495 | Cache Valley Bank | March 3, 2017 | January 29, 2019 |
| 7 | Seaway Bank and Trust Company | Chicago | IL | 19328 | State Bank of Texas | January 27, 2017 | January 29, 2019 |
| 8 | Harvest Community Bank | Pennsville | NJ | 34951 | First-Citizens Bank & Trust Company | January 13, 2017 | May 18, 2017 |
| 9 | Allied Bank | Mulberry | AR | 91 | Today's Bank | September 23, 2016 | May 13, 2019 |


## Writing 

In [49]:
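# The execution counts show df held the basketball tables here; run top to bottom, df[0] would be the FDIC bank list
d = df[0]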

In [51]:
d.to_csv("basketball.csv")
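By default `to_csv` also writes the row index as an unnamed first column, which is where an `Unnamed: 0` column comes from when such a file is read back. A small round-trip sketch, using two sample rows taken from the basketball table above and calling `to_csv()` without a path so it returns a string instead of writing a file:

```python
import io

import pandas as pd

# Two sample rows taken from the basketball output above
df = pd.DataFrame({"Player": ["Quincy Acy", "Jordan Adams"], "Age": [24, 20]})

# Default to_csv writes the index as an unnamed first column
text = df.to_csv()
reread = pd.read_csv(io.StringIO(text))
print(list(reread.columns))  # ['Unnamed: 0', 'Player', 'Age']

# Either tell read_csv which column holds the index...
restored = pd.read_csv(io.StringIO(text), index_col=0)
# ...or skip writing the index in the first place
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(restored.equals(df), no_index.equals(df))  # True True
```

Passing `index=False` is usually the simpler choice when the index is just the default 0..n-1 row numbers.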