# Data Import and Export

Documentation sources:

* https://www.tutorialspoint.com/python_pandas/


In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import read_csv
from IPython.display import display  # built into Jupyter; needed only when run as a script

## I. Data import 

### Import from files

* Data can be imported from files using `pandas.read_csv`, which returns a DataFrame.
* The default field delimiter is `,`, but it can be changed with the `sep` argument.
* The row to use as column names is specified by the `header` argument (by default, the first row).
* A hierarchical (MultiIndex) header spanning several rows can be specified by a list of row numbers, `header = [row1, row2, ...]`.
* Rows can be skipped at the start of the file with `skiprows` and at the end with `skipfooter`. `skipfooter` is supported only by the Python parsing engine (`engine = 'python'`).
* The number of rows to read can be limited by the `nrows` argument.
* The table can be parsed chunk by chunk by specifying `chunksize`. An iterator of DataFrames is then returned instead of a single DataFrame.
* Column names can be specified with the `names` argument.
* The column(s) to use as row labels (the index) can be specified by the `index_col` argument.
* Datatypes for each column can be specified by the `dtype` argument, e.g. `str`, `float`, or NumPy types such as `numpy.float64`.
* Datatypes can also be forced by per-column conversion functions passed in the `converters` argument.
* Strings representing **missing** values can be specified by the `na_values` argument.
* Many other arguments specify details of CSV format and data conversions.
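Several of these options can be combined in one call. The following is a minimal sketch on small hypothetical in-memory CSV strings (read through `io.StringIO` instead of a file) showing `sep`, `index_col`, `dtype`, `na_values`, `skipfooter`, and a two-row `header`. The data, the `missing` marker, and the column names are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical semicolon-separated data: "missing" marks a missing value and
# the last line is a footer that is not part of the table.
raw = "country;year;wage\nIreland;2015;17132.4\nJapan;2015;missing\nend of data"

df = pd.read_csv(
    StringIO(raw),
    sep=";",                  # field delimiter
    index_col="country",      # use this column as row labels
    dtype={"year": "int64"},  # force the datatype of a column
    na_values="missing",      # extra string to treat as a missing value
    skipfooter=1,             # drop the last line of the input...
    engine="python",          # ...which the default C engine does not support
)
print(df)

# Hypothetical two-row header: the first row becomes the outer level and the
# second row the inner level of a column MultiIndex.
raw2 = "wage,wage\n2015,2016\n17132.4,18100.9\n"
wide = pd.read_csv(StringIO(raw2), header=[0, 1])
print(wide.columns.tolist())
```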

In [2]:
display(read_csv('realwage.csv', nrows = 5))
display(read_csv('realwage.csv', index_col = 0, nrows = 5))
display(read_csv('realwage.csv', index_col = 0, skiprows = 1, names = ['a', 'b', 'c', 'd', 'e'],  nrows = 5))
display(read_csv('realwage.csv', index_col = 0, dtype = {'value':float}, nrows = 5))
display(read_csv('realwage.csv', index_col = 0, converters = {'value': lambda x: round(float(x))}, nrows = 5))
display(read_csv('realwage.csv', index_col = 0, na_values = 'Annual', nrows = 5))

|   | Unnamed: 0 | Time | Country | Series | Pay period | value |
|---|---|---|---|---|---|---|
| 0 | 0 | 2006-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17132.443 |
| 1 | 1 | 2007-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18100.918 |
| 2 | 2 | 2008-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17747.406 |
| 3 | 3 | 2009-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18580.139 |
| 4 | 4 | 2010-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18755.832 |


|   | Time | Country | Series | Pay period | value |
|---|---|---|---|---|---|
| 0 | 2006-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17132.443 |
| 1 | 2007-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18100.918 |
| 2 | 2008-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17747.406 |
| 3 | 2009-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18580.139 |
| 4 | 2010-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18755.832 |


|   | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 2006-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17132.443 |
| 1 | 2007-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18100.918 |
| 2 | 2008-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17747.406 |
| 3 | 2009-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18580.139 |
| 4 | 2010-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18755.832 |


|   | Time | Country | Series | Pay period | value |
|---|---|---|---|---|---|
| 0 | 2006-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17132.443 |
| 1 | 2007-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18100.918 |
| 2 | 2008-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17747.406 |
| 3 | 2009-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18580.139 |
| 4 | 2010-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18755.832 |


|   | Time | Country | Series | Pay period | value |
|---|---|---|---|---|---|
| 0 | 2006-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17132 |
| 1 | 2007-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18101 |
| 2 | 2008-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 17747 |
| 3 | 2009-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18580 |
| 4 | 2010-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | Annual | 18756 |


|   | Time | Country | Series | Pay period | value |
|---|---|---|---|---|---|
| 0 | 2006-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | NaN | 17132.443 |
| 1 | 2007-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | NaN | 18100.918 |
| 2 | 2008-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | NaN | 17747.406 |
| 3 | 2009-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | NaN | 18580.139 |
| 4 | 2010-01-01 | Ireland | In 2015 constant prices at 2015 USD PPPs | NaN | 18755.832 |
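The cell above does not exercise `chunksize`. Below is a minimal sketch of chunked reading, with a small hypothetical in-memory CSV standing in for a file too large to load at once. Each chunk is an ordinary DataFrame, so aggregates can be accumulated without holding the whole table in memory.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with a single "value" column and ten rows (0..9).
raw = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
n_chunks = 0
# With chunksize, read_csv returns an iterator that yields DataFrames
# of at most 4 rows each instead of one DataFrame.
reader = pd.read_csv(StringIO(raw), chunksize=4)
for chunk in reader:
    total += chunk["value"].sum()  # aggregate chunk by chunk
    n_chunks += 1

print(n_chunks, total)  # 3 45
```

With ten rows and `chunksize=4`, the chunks have 4, 4, and 2 rows.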


### Import from web

Dedicated libraries exist for reading data streams from the web:

* https://github.com/pydata/pandas-datareader
* https://pypi.org/project/fix-yahoo-finance/ (since renamed `yfinance`)
* https://github.com/statfi/opendata
* https://github.com/opendatateam

But sometimes you have to write your own scraper:

* https://www.datacamp.com/community/tutorials/web-scraping-using-python
* https://pythonprogramminglanguage.com/web-scraping-with-pandas-and-beautifulsoup/
* https://towardsdatascience.com/an-introduction-to-web-scraping-with-python-bc9563fe8860
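As a minimal sketch of a do-it-yourself scraper, the following uses only the standard-library `html.parser` module to collect the cells of an HTML table into a DataFrame. The class name `TableScraper` and the page content are hypothetical. A real scraper would first download the page (for example with `urllib.request`) and would usually rely on a more robust parser such as BeautifulSoup, as in the tutorials above. For well-formed tables, `pandas.read_html` is another option, but it needs an extra parser library such as `lxml`.

```python
from html.parser import HTMLParser

import pandas as pd


class TableScraper(HTMLParser):
    """Hypothetical helper: collect the text of <th>/<td> cells, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical page content; a real scraper would fetch it from a URL first.
html = """
<table>
  <tr><th>Country</th><th>value</th></tr>
  <tr><td>Ireland</td><td>17132.443</td></tr>
  <tr><td>Japan</td><td>25000.5</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(html)
header, *body = scraper.rows            # first row holds the column names
df = pd.DataFrame(body, columns=header)
df["value"] = pd.to_numeric(df["value"])  # scraped cells are strings
print(df)
```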

### Import from SQL database

* Data from databases can be imported with `read_sql_query` and `read_sql_table` functions.
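* You need to set up a database connection: an SQLAlchemy connectable for most databases (required by `read_sql_table`), or a plain `sqlite3` connection for SQLite with `read_sql_query`.
* The query for `read_sql_query` is specified by the `sql` argument.
* The column(s) to use as row labels can be specified by the `index_col` argument.
* Large results can be processed chunk by chunk by specifying `chunksize`. An iterator of DataFrames is then returned. This does not guarantee that the rows are actually fetched from the database in chunks: depending on the driver, the whole result may still be loaded into memory first.
* The database schema can be specified by the `schema` argument of `read_sql_table`.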
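A minimal sketch of the SQL round trip, using an in-memory SQLite database from the standard-library `sqlite3` module so that no SQLAlchemy setup or database server is needed. The table name `realwage` and its rows are hypothetical. `read_sql_table` is not shown because it requires an SQLAlchemy connectable.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real server.
con = sqlite3.connect(":memory:")

# Hypothetical data written with DataFrame.to_sql.
pd.DataFrame(
    {"country": ["Ireland", "Ireland", "Japan"],
     "year": [2006, 2007, 2006],
     "value": [17132.443, 18100.918, 25000.0]}
).to_sql("realwage", con, index=False)

# Whole result at once, with a query parameter and "country" as row labels.
df = pd.read_sql_query(
    "SELECT country, year, value FROM realwage WHERE year = ? ORDER BY country",
    con,
    params=(2006,),
    index_col="country",
)
print(df)

# Chunked processing: an iterator of DataFrames with at most 2 rows each.
sizes = [len(chunk) for chunk in
         pd.read_sql_query("SELECT * FROM realwage", con, chunksize=2)]
print(sizes)  # [2, 1]

con.close()
```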

## II. Data export

* Data can be exported using DataFrame methods:

  * `df.to_csv('file', ...)` - export data as a CSV file
  * `df.to_html('file', ...)` - export data as an HTML table
  * `df.to_sql('table', connection)` - export data to a table in a SQL database, e.g. SQLite
* If the file argument is omitted, `to_csv` and `to_html` return the output as a string.
* Various extra parameters are used to control the format of the output.
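A short sketch of some of these formatting parameters on a hypothetical two-row DataFrame: `sep`, `index`, and `float_format` for `to_csv`, and `index` for `to_html`. Because no file name is given, both methods return strings, and the CSV text is read back to check the round trip. The float `17747.406000000003` mimics the long binary-representation tail that appears in unformatted CSV output.

```python
from io import StringIO

import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"country": ["Ireland", "Japan"],
                   "value": [17747.406000000003, 25000.5]})

# CSV: semicolon separator, no index column, floats written with 2 decimals.
csv_text = df.to_csv(sep=";", index=False, float_format="%.2f")
print(csv_text)

# Round trip: parse the exported text with the same separator.
back = pd.read_csv(StringIO(csv_text), sep=";")

# HTML table without the index column.
html_text = df.to_html(index=False)
print(html_text)
```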

In [14]:
df = read_csv('realwage.csv', index_col = 0)
print(df.head().to_csv())
print(df.head(2).to_html())

,Time,Country,Series,Pay period,value
0,2006-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,17132.443
1,2007-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18100.918
2,2008-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,17747.406000000003
3,2009-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18580.139
4,2010-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18755.832

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Time</th>
      <th>Country</th>
      <th>Series</th>
      <th>Pay period</th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>2006-01-01</td>
      <td>Ireland</td>
      <td>In 2015 constant prices at 2015 USD PPPs</td>
      <td>Annual</td>
      <td>17132.443</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2007-01-01</td>
      <td>Ireland</td>
      <td>In 2015 constant prices at 2015 USD PPPs</