## Lab: Importing and exporting data with Python

pandas I/O functions such as read_csv(), together with DataFrame methods such as to_csv(), let you work with files effectively. You can use them to save the data and labels from pandas objects to a file and load them later as pandas Series or DataFrame instances. Similarly, one can read JSON, Excel, text and many other file formats.

In [1]:
import pandas as pd

### 1. Reading a CSV file

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

We will be reading data on births that is stored in CSV format. The file has 5 columns: year, month, day, gender and births. We can extract only the required columns using usecols; here we keep year, month, day and births.

In [21]:
df=pd.read_csv("births.csv",usecols=['year','month','day','births'])
df.head()

Unnamed: 0,year,month,day,births
0,1969,1,1.0,4046
1,1969,1,1.0,4440
2,1969,1,2.0,4454
3,1969,1,2.0,4548
4,1969,1,3.0,4548
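
To try usecols without the lab file, the sketch below reads a small in-memory sample instead. The sample text is an assumption that mirrors the births.csv layout shown in this lab, and the dtype argument is an optional extra that fixes the type of a selected column.

```python
import io

import pandas as pd

# Hypothetical sample with the same column layout as births.csv.
csv_text = """year,month,day,gender,births
1969,1,1,F,4046
1969,1,1,M,4440
1969,1,2,F,4454
1969,1,2,M,4548
"""

# usecols keeps only the named columns; the others are never loaded.
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["year", "month", "births"],
    dtype={"births": "int64"},
)

print(sample.columns.tolist())
print(sample["births"].sum())
```

Selecting columns at read time is usually cheaper than loading everything and dropping columns afterwards, which matters for wide files.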


In [24]:
# Passing header=None tells read_csv that the file has no header row, so the first line
# (here the column names) is read as data and the columns are numbered 0, 1, 2, ...

df1=pd.read_csv("births.csv",header=None)
df1.head()

Unnamed: 0,0,1,2,3,4
0,year,month,day,gender,births
1,1969,1,1,F,4046
2,1969,1,1,M,4440
3,1969,1,2,F,4454
4,1969,1,2,M,4548
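
The effect of header=None can be compared with replacing the header row. The following is a minimal sketch on an in-memory sample; the sample text and the new column names (yr, mo, dy, sex, n_births) are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the births.csv layout.
csv_text = "year,month,day,gender,births\n1969,1,1,F,4046\n1969,1,1,M,4440\n"

# header=None: the header line becomes the first data row,
# and the columns are labelled 0..4.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# header=0 with names: the existing header line is consumed and
# replaced by the supplied names.
renamed = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["yr", "mo", "dy", "sex", "n_births"],
)

print(raw.shape, renamed.shape)
```

Because the header line is mixed into the data when header=None is used on a file that does have a header, every column in raw is read as text rather than numbers.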


### 2. Reading an Excel file

https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

We will be reading data on distances travelled by current and newly designed golf balls. The data is stored in an Excel workbook, Golf.xlsx. Its first sheet, Golf, consists of 2 columns, Current and New, in that order, while the sheet GolfR consists of the same columns in reversed order.

In [4]:
# By default read_excel() reads the first sheet
df2=pd.read_excel("Golf.xlsx")
df2.head()

Unnamed: 0,Current,New
0,264,277
1,261,269
2,267,263
3,272,266
4,258,262


In [5]:
# We can specify the sheet name to read a specific sheet

df3=pd.read_excel("Golf.xlsx",sheet_name="GolfR")
df3.head()

Unnamed: 0,New,Current
0,277,264
1,269,261
2,263,267
3,266,272
4,262,258


### 3. Reading JSON

https://pandas.pydata.org/docs/reference/api/pandas.read_json.html

We will be reading iris.json, which contains data on the length and width of the petals and sepals of 3 flower species. The 3 species are setosa, virginica and versicolor.

In [6]:
df4=pd.read_json("iris.json")
df4.head()

Unnamed: 0,sepalLength,sepalWidth,petalLength,petalWidth,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
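
The layout of iris.json is not shown in this lab. As a sketch, assuming the file is a JSON array of records (one object per flower), the same call can be tried on an in-memory sample; the two records below are copied from the output above.

```python
import io

import pandas as pd

# Assumed layout: a list of records, one JSON object per row.
json_text = """[
  {"sepalLength": 5.1, "sepalWidth": 3.5, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.9, "sepalWidth": 3.0, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"}
]"""

# orient="records" states the expected layout explicitly.
iris_sample = pd.read_json(io.StringIO(json_text), orient="records")

print(iris_sample.shape)
print(iris_sample["species"].unique())
```

Other layouts (for example, one object keyed by column name) need a different orient value, so it is worth checking the file before choosing one.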


### 4. Reading a table

https://pandas.pydata.org/docs/reference/api/pandas.read_table.html

read_table() reads general delimited text files and uses a tab as its default separator. Here we read the same births data, which has 5 columns (year, month, day, gender and births), by passing sep=",". As with read_csv(), the required columns can be extracted using usecols.

In [25]:
df5=pd.read_table("births.csv",sep=",")
df5.head()

Unnamed: 0,year,month,day,gender,births
0,1969,1,1.0,F,4046
1,1969,1,1.0,M,4440
2,1969,1,2.0,F,4454
3,1969,1,2.0,M,4548
4,1969,1,3.0,F,4548
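
Since read_table() defaults to a tab separator, a tab-separated file needs no sep argument at all. A minimal sketch, using a made-up tab-separated sample in the births layout:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample.
tsv_text = (
    "year\tmonth\tday\tgender\tbirths\n"
    "1969\t1\t1\tF\t4046\n"
    "1969\t1\t1\tM\t4440\n"
)

# read_table uses sep="\t" unless told otherwise.
tab_df = pd.read_table(io.StringIO(tsv_text))

# The same result via read_csv with an explicit separator.
csv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(tab_df.equals(csv_df))
```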


### 5. Read a table of fixed-width formatted lines

https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html

In [8]:
df6=pd.read_fwf("birthsfwf.txt",header=None)
df6

Unnamed: 0,0,1,2,3,4
0,1969,1,1,F,4046
1,1969,1,1,M,4440
2,1969,1,2,F,4454
3,1969,1,2,M,4548
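
read_fwf() infers column boundaries from whitespace by default, which can fail when fields touch each other. The sketch below passes explicit widths instead; the sample text and its field widths are assumptions for illustration and do not describe the actual layout of birthsfwf.txt.

```python
import io

import pandas as pd

# Hypothetical fixed-width lines: year (4), month (2), day (2),
# gender (1), births (4), with no separator between the last fields.
fwf_text = (
    "1969 1 1F4046\n"
    "1969 1 1M4440\n"
    "1969 1 2F4454\n"
)

fixed = pd.read_fwf(
    io.StringIO(fwf_text),
    widths=[4, 2, 2, 1, 4],   # one width per field, in characters
    header=None,
    names=["year", "month", "day", "gender", "births"],
)

print(fixed)
```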


### 6. Reading XML file

https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html

We will be reading data on books that is stored in XML format. The data consists of the id, author, title, genre, price, publish_date and description of each book.

In [27]:
df7=pd.read_xml("books.xml")
df7.head()

Unnamed: 0,id,author,title,genre,price,publish_date,description
0,bk101,"Gambardella, Matthew",XML Developer's Guide,Computer,44.95,2000-10-01,An in-depth look at creating applications \n ...
1,bk102,"Ralls, Kim",Midnight Rain,Fantasy,5.95,2000-12-16,"A former architect battles corporate zombies, ..."
2,bk103,"Corets, Eva",Maeve Ascendant,Fantasy,5.95,2000-11-17,After the collapse of a nanotechnology \n ...
3,bk104,"Corets, Eva",Oberon's Legacy,Fantasy,5.95,2001-03-10,"In post-apocalypse England, the mysterious \n ..."
4,bk105,"Corets, Eva",The Sundered Grail,Fantasy,5.95,2001-09-10,"The two daughters of Maeve, half-sisters, \n ..."
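
By default read_xml() relies on the lxml package. As a sketch for environments without it, parser="etree" falls back to the standard library parser. The XML snippet below is a shortened, hypothetical version of books.xml in which id is an attribute of each book element, and the xpath value is chosen to match that snippet.

```python
import io

import pandas as pd

# Hypothetical, shortened catalogue: id is an attribute,
# author and price are child elements.
xml_bytes = b"""<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <price>5.95</price>
  </book>
</catalog>"""

# Each <book> element becomes one row; attributes and child
# elements both become columns.
books = pd.read_xml(io.BytesIO(xml_bytes), xpath=".//book", parser="etree")

print(books)
```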


### 7. Reading HTML

https://pandas.pydata.org/docs/reference/api/pandas.read_html.html

In [29]:
html = pd.read_html('https://en.wikipedia.org/wiki/Minnesota', match='Election results from statewide races')
html

[    Year     Office    GOP    DFL Others
 0   2020  President  45.3%  52.4%   2.3%
 1   2020    Senator  43.5%  48.8%   7.7%
 2   2018   Governor  42.4%  53.9%   3.7%
 3   2018    Senator  36.2%  60.3%   3.4%
 4   2018    Senator  42.4%  53.0%   4.6%
 5   2016  President  44.9%  46.4%   8.6%
 6   2014   Governor  44.5%  50.1%   5.4%
 7   2014    Senator  42.9%  53.2%   3.9%
 8   2012  President  45.1%  52.8%   2.1%
 9   2012    Senator  30.6%  65.3%   4.1%
 10  2010   Governor  43.2%  43.7%  13.1%
 11  2008  President  43.8%  54.1%   2.1%
 12  2008    Senator  42.0%  42.0%  16.0%
 13  2006   Governor  46.7%  45.7%   7.6%
 14  2006    Senator  37.9%  58.1%   4.0%
 15  2004  President  47.6%  51.1%   1.3%
 16  2002   Governor  44.4%  33.5%  22.1%
 17  2002    Senator  49.5%  47.3%   1.0%
 18  2000  President  45.5%  47.9%   6.6%
 19  2000    Senator  43.3%  48.8%   7.9%
 20  1998   Governor  34.3%  28.1%  37.6%
 21  1996  President  35.0%  51.1%  13.9%
 22  1996    Senator  41.3%  50.3%

### 8. Reading an SQL table

We have a SQLite database called births.db that contains a table called births.

In [16]:
from sqlalchemy import create_engine
  
# SQLAlchemy connectable
cnx = create_engine('sqlite:///births.db').connect()

In [30]:
df=pd.read_sql_table("births",columns=['year','month','day','gender','births'],con=cnx)
df.head()

Unnamed: 0,year,month,day,gender,births
0,1969,1,1.0,F,4046
1,1969,1,1.0,M,4440
2,1969,1,2.0,F,4454
3,1969,1,2.0,M,4548
4,1969,1,3.0,F,4548
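
read_sql_table() requires SQLAlchemy, but for SQLite pandas also accepts a plain sqlite3 connection when a query is used instead. The following is a minimal sketch with an in-memory database; the table contents are a made-up sample in the births layout.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, so no file is needed.
conn = sqlite3.connect(":memory:")

births_sample = pd.DataFrame(
    {
        "year": [1969, 1969, 1969, 1969],
        "month": [1, 1, 1, 1],
        "day": [1, 1, 2, 2],
        "gender": ["F", "M", "F", "M"],
        "births": [4046, 4440, 4454, 4548],
    }
)
births_sample.to_sql("births", conn, index=False)

# A parameterised query filters rows inside the database.
female = pd.read_sql_query(
    "SELECT year, month, day, births FROM births WHERE gender = ?",
    conn,
    params=("F",),
)
conn.close()

print(female)
```

Filtering in the query rather than in pandas avoids loading rows that are not needed.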


### 9. Exporting a DataFrame

The DataFrame can be exported into one's desired format.

In [20]:
# Exporting the dataframe from last cell into different formats

df.to_csv("birthscsv.csv")
df.to_excel("birthsexcel.xlsx")
df.to_json("birthsjson.json")
df.to_xml("birthsxml.xml")
df.to_html("birthshtml.html")
df.to_sql("birthsql1",con=cnx)
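
The export methods write the row index by default, which is why a file written this way gains an extra unnamed column when it is read back. The sketch below shows the difference on a small made-up frame, writing to in-memory buffers rather than files.

```python
import io

import pandas as pd

frame = pd.DataFrame(
    {"year": [1969, 1969], "gender": ["F", "M"], "births": [4046, 4440]}
)

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
no_index = io.StringIO()
frame.to_csv(no_index, index=False)
no_index.seek(0)
reread_no_index = pd.read_csv(no_index)

print(reread_with_index.columns.tolist())
print(reread_no_index.columns.tolist())
```

Passing index=False is a reasonable default when the index is just the row number; keep the index when it carries meaning, such as dates or ids.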