# Importing data 



## Reading from plain text files 

### Reading the whole file at once

In [1]:
filename = '../data/plain text with several lines.txt'

# mode='r' opens the file read-only, so it cannot be written to; use mode='w' to write
file = open(filename, mode='r')
text = file.read()
file.close()

print(text)

The Title

This is just a plain text file with several lines. The existence of this file is no other but practice reading its different lines.
This is a second line, just in case.
Here we go for a third one.



Opening the file inside a context manager (a with block) makes the code more concise and less cluttered, and there is no need to call close(): once the block exits, the file is already closed.

In [2]:
with open(filename, 'r') as file:
    print(file.read())

The Title

This is just a plain text file with several lines. The existence of this file is no other but practice reading its different lines.
This is a second line, just in case.
Here we go for a third one.
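To make the "already closed" claim concrete, here is a minimal sketch that checks the `closed` attribute of the file object inside and outside the `with` block. The temporary file and the name `demo.txt` are assumptions made up for the example so that it does not depend on the `../data/` folder.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on '../data/'.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name
    with open(demo_path, 'w') as demo_file:
        demo_file.write('The Title\nA second line.\n')

    with open(demo_path, 'r') as demo_file:
        demo_text = demo_file.read()
        closed_inside = demo_file.closed   # the file is still open here

    closed_outside = demo_file.closed      # the file is closed once the block exits

print(closed_inside, closed_outside)
```

The name `demo_file` is still bound after the block, but any attempt to read from it at that point would raise a `ValueError`.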



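We can also read the file one line at a time with readline(). Each line keeps its trailing newline and print adds another one, hence the extra blank lines in the output: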

In [5]:
with open(filename, 'r') as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())

The Title



This is just a plain text file with several lines. The existence of this file is no other but practice reading its different lines.
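Calling `readline()` repeatedly works for a few lines; for a whole file, a common alternative is to iterate over the file object itself. The sketch below assumes an in-memory `io.StringIO` buffer standing in for the open file, with made-up contents, and strips the trailing newline from each line.

```python
import io

# io.StringIO stands in for an open text file; the contents are made up for the demo.
demo_file = io.StringIO('The Title\n\nFirst body line.\nSecond body line.\n')

lines = []
for line in demo_file:                # a file object yields one line per iteration
    lines.append(line.rstrip('\n'))   # drop the newline that each line keeps

print(lines)
```

The same loop works unchanged on a file opened with `open(filename, 'r')`.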



## Reading flat files

Flat files are text files containing tabular data: each line is a record (a row of fields or attributes).
They usually have a header, but it is not mandatory.
The delimiter (the character used to separate values) can be a comma (CSV), a tab, or any other character.

In [7]:
titanic_filename = '../data/titanic_sub.csv'

with open(titanic_filename, 'r') as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())

PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked

1,0,3,male,22.0,1,0,A/5 21171,7.25,,S

2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
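The header and delimiter described above can also be handled with the standard library `csv` module, which splits each record into its fields. This is only a sketch: the three lines are copied from the output above, and an `io.StringIO` buffer is assumed in place of the real file.

```python
import csv
import io

# Three lines copied from the output above; StringIO stands in for the file.
sample = io.StringIO(
    'PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n'
    '1,0,3,male,22.0,1,0,A/5 21171,7.25,,S\n'
    '2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C\n'
)

reader = csv.reader(sample, delimiter=',')
header = next(reader)     # the first row holds the field names
records = list(reader)    # the remaining rows are the records

print(len(header), len(records))
print(records[0][3], records[1][9])
```

Every field comes back as a string, and an empty field (such as the missing Cabin in the first record) comes back as an empty string.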



### Reading numeric flat files using numpy

This method applies when every value in the dataset to be read is numeric.

In [11]:
import numpy as np 

filename = '../data/mnist_kaggle_some_rows.csv'

data = np.loadtxt(filename, delimiter=',')

data

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       ...,
       [2., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [5., 0., 0., ..., 0., 0., 0.]])
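As a rough illustration of the numeric-only requirement, the sketch below feeds `np.loadtxt` two small in-memory buffers with made-up contents: one fully numeric, one with a text field. With the default float dtype, the second call is expected to raise a `ValueError`.

```python
import io

import numpy as np

numeric_rows = io.StringIO('1,0,0\n0,0,5\n')   # made-up numeric data
mixed_rows = io.StringIO('1,male,22.0\n')      # made-up row with a text field

numeric_data = np.loadtxt(numeric_rows, delimiter=',')

try:
    np.loadtxt(mixed_rows, delimiter=',')
    mixed_failed = False
except ValueError:
    mixed_failed = True   # the text field cannot be converted to float

print(numeric_data.shape, numeric_data.dtype, mixed_failed)
```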

In [10]:
# skiprows=1 skips the first row (for example, a header);
# usecols=[0, 2] keeps only the first and third columns
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols=[0, 2])

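The effect of `skiprows` and `usecols` is easiest to see on a tiny dataset. The sketch below uses a made-up in-memory file with one header row and four columns (the column names are assumptions, not taken from the MNIST file) and checks the shape of the result.

```python
import io

import numpy as np

# Made-up data: one header row, then three numeric rows with four columns.
sample = io.StringIO(
    'label,pixel0,pixel1,pixel2\n'
    '1,0,10,0\n'
    '0,0,20,0\n'
    '5,0,30,0\n'
)

# Skip the header row and keep only columns 0 and 2.
subset = np.loadtxt(sample, delimiter=',', skiprows=1, usecols=[0, 2])

print(subset.shape)
print(subset)
```

The result has one row per data line and one column per entry in `usecols`.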

### Importing flat files using pandas

The core of pandas is the DataFrame. Where a matrix simply has rows and columns, a DataFrame has observations (rows) and variables (labeled columns), and each column can hold a different data type.

In [13]:
import pandas as pd 

filename = '../data/cars.csv'

df = pd.read_csv(filename)
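# Optional arguments: nrows limits the number of rows read, header=None treats
# the first row as data, and sep sets the delimiter
# df = pd.read_csv(filename, nrows=5, header=None, sep=',')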

In [14]:
df

| Unnamed: 0 | manufacturer_name | model_name | transmission | color | odometer_value | year_produced | engine_fuel | engine_has_gas | engine_type | engine_capacity | ... | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | duration_listed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Subaru | Outback | automatic | silver | 190000 | 2010 | gasoline | False | gasoline | 2.5 | ... | True | True | True | False | True | False | True | True | True | 16 |
| 1 | Subaru | Outback | automatic | blue | 290000 | 2002 | gasoline | False | gasoline | 3.0 | ... | True | False | False | True | True | False | False | False | True | 83 |
| 2 | Subaru | Forester | automatic | red | 402000 | 2001 | gasoline | False | gasoline | 2.5 | ... | True | False | False | False | False | False | False | True | True | 151 |
| 3 | Subaru | Impreza | mechanical | blue | 10000 | 1999 | gasoline | False | gasoline | 3.0 | ... | False | False | False | False | False | False | False | False | False | 86 |
| 4 | Subaru | Legacy | automatic | black | 280000 | 2001 | gasoline | False | gasoline | 2.5 | ... | True | False | True | True | False | False | False | False | True | 7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38526 | Chrysler | 300 | automatic | silver | 290000 | 2000 | gasoline | False | gasoline | 3.5 | ... | True | False | False | True | True | False | False | True | True | 301 |
| 38527 | Chrysler | PT Cruiser | mechanical | blue | 321000 | 2004 | diesel | False | diesel | 2.2 | ... | True | False | False | True | True | False | False | True | True | 317 |
| 38528 | Chrysler | 300 | automatic | blue | 777957 | 2000 | gasoline | False | gasoline | 3.5 | ... | True | False | False | True | True | False | False | True | True | 369 |
| 38529 | Chrysler | PT Cruiser | mechanical | black | 20000 | 2001 | gasoline | False | gasoline | 2.0 | ... | True | False | False | False | False | False | False | False | True | 490 |
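A few `pd.read_csv` arguments change how a flat file is interpreted: `sep` sets the delimiter, `nrows` limits the number of rows read, and `header=None` tells pandas that the file has no header row. The sketch below assumes a small tab-separated sample held in memory; the values are borrowed from the output above, but the tab-separated layout is made up for the example.

```python
import io

import pandas as pd

# A few values borrowed from the output above; the tab-separated layout is made up.
sample = (
    'manufacturer_name\tmodel_name\tyear_produced\n'
    'Subaru\tOutback\t2010\n'
    'Subaru\tForester\t2001\n'
    'Chrysler\t300\t2000\n'
)

# Default behaviour: the first line becomes the column names.
with_header = pd.read_csv(io.StringIO(sample), sep='\t', nrows=2)

# header=None: the first line is treated as data and columns are numbered.
no_header = pd.read_csv(io.StringIO(sample), sep='\t', header=None, nrows=2)

print(with_header.shape, list(with_header.columns))
print(no_header.shape, list(no_header.columns))
```

With `header=None`, the line of column names counts as one of the two rows read, so it should only be used on files that really lack a header.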
