# Importing Data in Python

* Local File
    * Plain Text, csv, tsv
* Database
    * SQLite, MongoDB
* Remote File
    * HTML, JSON, csv
* Excel File, MATLAB .mat file
* Web API, e.g. Facebook or Google API

## Reading Text File

#### Without Context Manager
> We need to explicitly close the file

In [1]:

file = open('news.txt', 'r')
# Mode 'r' opens the file for reading
# Mode 'w' opens the file for writing and truncates any existing content

In [2]:
file.read()

'Why Infrastructure Is So Expensive \nA subway-style diagram of the major Roman roads\nBaby Bird from Time of Dinosaurs Found Fossilized\nMIT Gets $140M Pledge from Anonymous Donor (wsj.com)\nHow hackers abused satellites to stay under the radar (2015)\nFiduciary Rule Fight Brews While Bad Financial Advisers\nIn 1957, Five Men Agreed to Stand Under an Exploding Nuclear Bomb\n'

In [3]:
file.closed
# Check whether the file has been closed

False

In [4]:
file.close()

In [5]:
file.closed

True

#### With Context Manager
> No need to explicitly close the file

In [7]:
with open('news.txt', 'r') as file1:
    print(file1.readline())
    print(file1.readline())

Why Infrastructure Is So Expensive 

A subway-style diagram of the major Roman roads
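
For larger files, iterating over the file object inside the `with` block reads one line at a time instead of loading everything at once. The sketch below is only an illustration, not part of the original notebook: it writes a small made-up file, `sample_news.txt`, to a temporary directory so that it runs on its own, then reads it back line by line.

```python
import os
import tempfile

# Made-up sample content standing in for news.txt
sample_lines = ["First headline\n", "Second headline\n", "Third headline\n"]

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample_news.txt")

# Write the sample file; the context manager closes it on exit
with open(path, "w", encoding="utf-8") as f:
    f.writelines(sample_lines)

# Iterate over the file object to read one line at a time
headlines = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        headlines.append(line.rstrip("\n"))

print(headlines)
print(f.closed)  # the file is closed once the with block ends
```

Stripping the trailing newline avoids the blank lines that appear when `print` is applied directly to the result of `readline()`.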



## Read .csv File
> CSV stands for comma-separated values.  
> Each row is a record; each column is a feature.

#### Using NumPy

In [8]:
import numpy as np
import pandas as pd

In [9]:
mnist_data = np.loadtxt('mnist.csv', dtype=float, comments='#', delimiter=',')

In [10]:
mnist_data

array([[ 5.,  0.,  0., ...,  0.,  0.,  0.],
       [ 4.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 3.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.]])
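
In the common CSV layout of MNIST, the first column holds the digit label and the remaining columns hold pixel values. The output above is consistent with that layout, but it remains an assumption about `mnist.csv`. Under that assumption, separating labels from pixels could look like the sketch below, which uses a tiny made-up table in place of the real file.

```python
import io
import numpy as np

# Made-up stand-in for mnist.csv: label first, then four "pixel" values
csv_text = "5,0,0,128,255\n4,0,64,0,0\n1,255,0,0,0\n"

data = np.loadtxt(io.StringIO(csv_text), dtype=float, delimiter=",")

labels = data[:, 0].astype(int)  # first column: digit labels
pixels = data[:, 1:]             # remaining columns: pixel intensities

print(labels)
print(pixels.shape)
```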

In [16]:
titanic_data = np.genfromtxt('titanic.csv', delimiter=',', dtype=None, skip_header=1)
# genfromtxt is a more general reader than loadtxt: it handles missing values,
# and dtype=None lets it infer a type for each column

In [17]:
titanic_data

array([( 1, b'1st', b'Male', b'Child', b'No', b'0'),
       ( 2, b'2nd', b'Male', b'Child', b'No', b'0'),
       ( 3, b'3rd', b'Male', b'Child', b'No', b'35'),
       ( 4, b'Crew', b'Male', b'Child', b'No', b'0'),
       ( 5, b'1st', b'Female', b'Child', b'No', b'0'),
       ( 6, b'2nd', b'Female', b'Child', b'No', b'0\xe2\x90\x8a'),
       ( 7, b'3rd', b'Female', b'Child', b'No', b'17'),
       ( 8, b'Crew', b'Female', b'Child', b'No', b'0'),
       ( 9, b'1st', b'Male', b'Adult', b'No', b'118'),
       (10, b'2nd', b'Male', b'Adult', b'No', b'154'),
       (11, b'3rd', b'Male', b'Adult', b'No', b'387'),
       (12, b'Crew', b'Male', b'Adult', b'No', b'670'),
       (13, b'1st', b'Female', b'Adult', b'No', b'4'),
       (14, b'2nd', b'Female', b'Adult', b'No', b'13'),
       (15, b'3rd', b'Female', b'Adult', b'No', b'89'),
       (16, b'Crew', b'Female', b'Adult', b'No', b'3'),
       (17, b'1st', b'Male', b'Child', b'Yes', b'5'),
       (18, b'2nd', b'Male', b'Child', b'Yes', b'11'),
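
The `b'...'` prefixes in the output above mean the text columns were read as bytes. One possible way to get regular strings and named columns is to pass `encoding` and `names=True` to `genfromtxt`. The sketch below uses a few made-up rows whose column names mirror the Titanic table, not the real `titanic.csv`.

```python
import io
import numpy as np

# Made-up rows; column names mirror the Titanic table
csv_text = (
    "No,Class,Sex,Age,Survived,Freq\n"
    "1,1st,Male,Child,No,0\n"
    "2,3rd,Male,Child,No,35\n"
    "3,Crew,Male,Adult,No,670\n"
)

# names=True takes the field names from the header row;
# encoding='utf-8' yields str values instead of bytes
table = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=None,
    names=True,
    encoding="utf-8",
)

print(table.dtype.names)
print(table["Class"])
print(table["Freq"].sum())
```

The result is a structured array, so columns are selected by field name rather than by position.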

#### Using pandas

In [19]:
titanic = pd.read_csv('titanic.csv', sep=',')

In [20]:
titanic

Unnamed: 0,No,Class,Sex,Age,Survived,Freq
0,1,1st,Male,Child,No,0
1,2,2nd,Male,Child,No,0
2,3,3rd,Male,Child,No,35
3,4,Crew,Male,Child,No,0
4,5,1st,Female,Child,No,0
5,6,2nd,Female,Child,No,0␊
6,7,3rd,Female,Child,No,17
7,8,Crew,Female,Child,No,0
8,9,1st,Male,Adult,No,118
9,10,2nd,Male,Adult,No,154
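
Row 5 above shows `0␊` in `Freq`: the source file contains a stray U+240A character, which is also why NumPy read that column as bytes rather than integers. A hedged sketch of one way to clean such a column follows, using made-up rows in place of `titanic.csv`.

```python
import io
import pandas as pd

# Made-up rows; the second Freq value carries a stray U+240A symbol
csv_text = "No,Class,Freq\n1,1st,0\n2,2nd,0\u240a\n3,3rd,17\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df["Freq"].dtype)  # the stray character forces a non-numeric column

# Remove the stray character, then convert the column to numbers
cleaned = df["Freq"].astype(str).str.replace("\u240a", "", regex=False)
df["Freq"] = pd.to_numeric(cleaned)

print(df["Freq"].tolist())
```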


## Excel and MATLAB .mat Files

### Excel Files

In [21]:
import pandas as pd

In [22]:
file = pd.ExcelFile('ExcelTest.xlsx')

In [23]:
file.sheet_names

['s1', 's2']

In [24]:
df1 = file.parse('s1')
df2 = file.parse('s2')

In [25]:
df1

Unnamed: 0,"Eldon Base for stackable storage shelf, platinum",Muhammed MacIntyre,3,-213.25,38.94,35,Nunavut,Storage & Organization,0.8
1,"1.7 Cubic Foot Compact ""Cube"" Office Refrigera...",Barry French,293,457.81,208.16,68.02,Nunavut,Appliances,0.58
2,"Cardinal Slant-D® Ring Binder, Heavy Gauge Vinyl",Barry French,293,46.7075,8.69,2.99,Nunavut,Binders and Binder Accessories,0.39
3,R380,Clay Rozendal,483,1198.971,195.99,3.99,Nunavut,Telephones and Communication,0.58
4,Holmes HEPA Air Purifier,Carlos Soltero,515,30.94,21.78,5.94,Nunavut,Appliances,0.5
5,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero,515,4.43,6.64,4.95,Nunavut,Office Furnishings,0.37
6,"Angle-D Binders with Locking Rings, Label Holders",Carl Jackson,613,-54.0385,7.3,7.72,Nunavut,Binders and Binder Accessories,0.38
7,"SAFCO Mobile Desk Side File, Wire Frame",Carl Jackson,613,127.7,42.76,6.22,Nunavut,Storage & Organization,
8,"SAFCO Commercial Wire Shelving, Black",Monica Federle,643,-695.26,138.14,35.0,Nunavut,Storage & Organization,
9,Xerox 198,Dorothy Badders,678,-226.36,4.98,8.33,Nunavut,Paper,0.38


In [26]:
df2

Unnamed: 0,10,Xerox 1980,Neola Schneider,807,-166.85,4.28,6.18,Nunavut,Paper,0.4
0,1,Advantus Map Pennant Flags and Round Head Tacks,Neola Schneider,807,-14.33,3.95,2.0,Nunavut,Rubber Bands,0.53
1,1,Holmes HEPA Air Purifier,Carlos Daly,868,134.72,21.78,5.94,Nunavut,Appliances,0.5
2,1,"DS/HD IBM Formatted Diskettes, 200/Pack - Staples",Carlos Daly,868,114.46,47.98,3.61,Nunavut,Computer Peripherals,0.71
3,1,"Wilson Jones 1"" Hanging DublLock® Ring Binders",Claudia Miner,933,-4.715,5.28,2.99,Nunavut,Binders and Binder Accessories,0.37
4,1,Ultra Commercial Grade Dual Valve Door Closer,Neola Schneider,995,782.91,39.89,3.04,Nunavut,Office Furnishings,0.53
5,1,"#10-4 1/8"" x 9 1/2"" Premium Diagonal Seam Enve...",Allen Rosenblatt,998,93.8,15.74,1.39,Nunavut,Envelopes,0.4
6,1,Hon 4-Shelf Metal Bookcases,Sylvia Foulston,1154,440.72,100.98,26.22,Nunavut,Bookcases,0.6
7,1,"Lesro Sheffield Collection Coffee Table, End T...",Sylvia Foulston,1154,-481.041,71.37,69.0,Nunavut,Tables,0.68
8,1,g520,Jim Radford,1344,-11.682,65.99,5.26,Nunavut,Telephones and Communication,0.59
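
In both outputs above, the first data row appears to have been used as the column header, which suggests the sheets have no header row. Passing `header=None` tells pandas to number the columns instead. The sketch below assumes an Excel engine such as `openpyxl` is installed, and it writes a small made-up workbook, `demo.xlsx`, to a temporary directory rather than using `ExcelTest.xlsx`.

```python
import os
import tempfile
import pandas as pd

# Made-up rows with no header line
rows = pd.DataFrame([["Xerox 198", "Paper", 0.38], ["R380", "Telephones", 0.58]])

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Write two sheets without a header row or index column
with pd.ExcelWriter(path) as writer:
    rows.to_excel(writer, sheet_name="s1", header=False, index=False)
    rows.to_excel(writer, sheet_name="s2", header=False, index=False)

# sheet_name=None loads every sheet into a dict keyed by sheet name;
# header=None keeps the first row as data
sheets = pd.read_excel(path, sheet_name=None, header=None)

print(list(sheets))
print(sheets["s1"])
```

Loading all sheets in one call is an alternative to calling `parse` once per sheet name.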


### MATLAB Files

In [27]:
from scipy.io import loadmat

In [28]:
X = loadmat('MatlabTest.mat')

In [29]:
X

{'__globals__': [],
 '__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Sun Jun 18 12:46:39 2017',
 '__version__': '1.0',
 'data': array([[['a', '1'],
         ['b', '2']]],
       dtype='<U1')}

In [30]:
X['data']

array([[['a', '1'],
        ['b', '2']]],
      dtype='<U1')
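
`loadmat` returns a dictionary in which the keys wrapped in double underscores are file metadata and the remaining keys are the saved MATLAB variables. A minimal sketch of a round trip follows; it uses `scipy.io.savemat` to create a made-up file, `demo.mat`, so the example does not depend on `MatlabTest.mat`.

```python
import os
import tempfile
import numpy as np
from scipy.io import loadmat, savemat

path = os.path.join(tempfile.mkdtemp(), "demo.mat")

# Save one made-up variable named 'data'
savemat(path, {"data": np.array([[1, 2], [3, 4]])})

contents = loadmat(path)

# Keep only the user variables, skipping '__header__' and similar keys
variables = {k: v for k, v in contents.items() if not k.startswith("__")}

print(list(variables))
print(variables["data"])
```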