# Importing Files

## Text Files

Open a text file in read-only mode:

In [2]:
file = open('filename.txt', 'r')

file.close() #close the file

Open a text file with a context manager, which closes the file automatically, and print the first two lines:

In [16]:
with open('filename.txt') as file:
    print(file.readline(), end='') # print only one line (read() would return the whole file).
    print(file.readline(), end='') # print the next line.

Example text file...first line.
...second line.
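
The difference between _read()_, _readline()_, and iterating over the file object can be sketched as follows. This is a minimal example that first writes a small temporary file so it runs on its own; the file contents and the `path` variable are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway text file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    path = tmp.name

with open(path) as file:
    first = file.readline()    # reads up to and including the first newline
    second = file.readline()   # continues where the previous call stopped
    rest = file.read()         # reads everything that is left

with open(path) as file:
    # Iterating over the file object yields one line at a time.
    lines = [line.rstrip('\n') for line in file]

os.remove(path)  # clean up the temporary file

print(repr(first), repr(second), repr(rest))
print(lines)
```

Iterating over the file is usually the better choice for large files, since only one line is held in memory at a time.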




## Flat Files with NumPy

Importing a csv file:

In [6]:
import numpy as np

In [None]:
file = 'filename.csv'
data = np.loadtxt(file, delimiter=',')

Importing a tab-delimited file:

In [None]:
file = 'filename.txt'
data = np.loadtxt(file, delimiter='\t')

Other arguments accepted by the _np.loadtxt()_ function:

In [15]:
np.loadtxt

<function numpy.lib.npyio.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')>

The data type defaults to float unless a different _dtype_ is passed to the function.
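
A few of these arguments in action, as a minimal sketch. The data is made up and is read from an in-memory `io.StringIO` object instead of a file on disk, so the example runs without any external file.

```python
import io
import numpy as np

# In-memory stand-in for a CSV file with one header row (made-up data).
csv_text = "id,height,weight\n1,1.70,65.0\n2,1.82,80.5\n3,1.65,58.2\n"

# Skip the header row and keep only the second and third columns.
data = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, usecols=(1, 2))
print(data.shape)   # (3, 2)
print(data.dtype)   # float64

# Read every column as text, which also lets us keep the header row.
as_text = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=str)
print(as_text[0])
```

Without _skiprows=1_, the first call would fail, because the header strings cannot be converted to float.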

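In some cases, however, the columns of a dataset contain different data types, such as floats and strings. To handle this, we use _np.genfromtxt()_ and pass _dtype=None_, so that the type of each column is inferred from its contents. With _names=True_, the first row is used for the field names.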

In [None]:
data = np.genfromtxt(file, delimiter=',', names=True, dtype=None)
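
To show what _np.genfromtxt()_ returns for mixed-type columns, here is a small sketch on made-up, in-memory data. The `encoding='utf-8'` argument is an addition not used above; it is passed here so that text columns come back as ordinary strings.

```python
import io
import numpy as np

# Made-up data: one text column, one integer column, one float column.
csv_text = "name,age,score\nalice,31,88.5\nbob,27,92.0\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True,
                     dtype=None, encoding='utf-8')

print(data.dtype.names)   # ('name', 'age', 'score')
print(data['age'])        # access a column by its field name
print(data[0])            # access a row by its index
```

The result is a one-dimensional structured array with one record per row, not a two-dimensional array, which is why columns are selected by name.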

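If the delimiter is ',' and we want _names=True_ and _dtype=None_, we can use _np.recfromcsv()_, which has these as defaults, so only the file name needs to be passed. Note that this helper is no longer exposed in the main namespace as of NumPy 2.0; there, use _np.genfromtxt()_ with the arguments shown above.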

In [None]:
data = np.recfromcsv(file)

## Flat Files with Pandas

Flat files are usually easier to import with the pandas functions read_csv() and read_table(), which return a DataFrame and handle mixed column types and header rows automatically.

In [None]:
import pandas as pd

file = 'filename.csv'
df = pd.read_csv(file)
print(df.head()) # Print the head of the DataFrame
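
Some commonly used read_csv() arguments, sketched on made-up, in-memory data. read_table() behaves the same way but uses a tab as its default separator.

```python
import io
import pandas as pd

# Made-up tab-delimited data with a comment line and one missing value.
tsv_text = (
    "# made-up comment line\n"
    "name\tage\tscore\n"
    "alice\t31\t88.5\n"
    "bob\t27\t\n"
    "carol\t45\t79.0\n"
)

df = pd.read_csv(
    io.StringIO(tsv_text),
    sep='\t',       # tab-delimited instead of the default comma
    comment='#',    # ignore anything after a '#'
    nrows=2,        # read only the first two data rows
)

print(df)
print(df.dtypes)    # each column gets its own inferred type
```

The empty score field for the second row is read as NaN, so the column stays numeric.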

## Excel Files with Pandas

In [None]:
import pandas as pd

file = 'filename.xlsx'
xls = pd.ExcelFile(file)

Importing sheets as data frames.

In [None]:
print(xls.sheet_names) # print sheet names of the Excel file.

df1 = xls.parse('Sheet1') 
df1 = xls.parse(0) # We can also use the index of the sheet instead of the sheet name
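
All sheets can also be loaded in one call with pd.read_excel(). The sketch below is self-contained: it first writes a small two-sheet workbook to a temporary directory. The sheet names and data are made up, and it assumes the `openpyxl` package is installed for reading and writing .xlsx files.

```python
import os
import shutil
import tempfile
import pandas as pd

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.xlsx')

# Write a small workbook with two sheets (assumes openpyxl is installed).
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).to_excel(writer, sheet_name='first', index=False)
    pd.DataFrame({'c': [5, 6]}).to_excel(writer, sheet_name='second', index=False)

# sheet_name=None loads every sheet into a dict keyed by sheet name.
sheets = pd.read_excel(path, sheet_name=None)

shutil.rmtree(tmp_dir)  # clean up the temporary workbook

print(list(sheets))
print(sheets['second'])
```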

## Relational Databases

Creating a database engine: 

In [None]:
from sqlalchemy import create_engine, inspect, text
import pandas as pd

engine = create_engine('sqlite:///filename.sqlite') # Create an engine to connect to 'filename.sqlite'
print(inspect(engine).get_table_names()) # Print table names (engine.table_names() was removed in SQLAlchemy 2.0)

# connection = engine.connect() # Open the engine connection (need to close)

with engine.connect() as connection: # Open the engine connection (closed automatically on exit)
    tk = connection.execute(text('SELECT * FROM Table')) # Query; SQLAlchemy 2.0 requires raw SQL to be wrapped in text()
    df = pd.DataFrame(tk.fetchall()) # Save results to a DataFrame. Specifying fetch size is also possible: e.g. "tk.fetchmany(size=5)"
    df.columns = list(tk.keys()) # Set the column names

Using pandas to get the same result with fewer lines of code:

In [None]:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///filename.sqlite')
df = pd.read_sql_query('SELECT * FROM Table', engine)
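
To try read_sql_query() without an existing database file, the sketch below builds a throwaway in-memory SQLite database. It swaps the SQLAlchemy engine for a standard-library `sqlite3` connection, which pandas also accepts for SQLite; the `people` table and its rows are made up for illustration.

```python
import sqlite3
import pandas as pd

# Build a small in-memory database (made-up table and rows).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, age INTEGER)')
conn.executemany(
    'INSERT INTO people VALUES (?, ?)',
    [('alice', 31), ('bob', 27), ('carol', 45)],
)
conn.commit()

# Pass values through params instead of formatting them into the SQL string.
df = pd.read_sql_query(
    'SELECT name, age FROM people WHERE age > ? ORDER BY age',
    conn,
    params=(30,),
)
conn.close()

print(df)
```

The column names of the DataFrame are taken from the query result, so no separate step is needed to set them.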