## Importing Different Files in Python

- Local files
  - Plain text, CSV, TSV
- Databases
  - SQLite, MongoDB
- Remote files
  - HTML, JSON, CSV
- Excel files and MATLAB .mat files
- Web APIs, e.g. the Facebook or Google APIs

## Reading a Text File

- Without Context Manager
- With Context Manager

### Without Context Manager

In [35]:
file = open("2.1 news.txt.txt", 'r')

# mode: 'r' = read, 'w' = write (creates the file or overwrites an existing one)

In [36]:
file.read()

'Why Infrastructure Is So Expensive \nA subway-style diagram of the major Roman roads\nBaby Bird from Time of Dinosaurs Found Fossilized\nMIT Gets $140M Pledge from Anonymous Donor (wsj.com)\nHow hackers abused satellites to stay under the radar (2015)\nFiduciary Rule Fight Brews While Bad Financial Advisers\nIn 1957, Five Men Agreed to Stand Under an Exploding Nuclear Bomb\n'

In [37]:
file.closed  # False, because the file is still open

False

In [38]:
file.close() #we are manually closing it
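Without a context manager, an exception raised between `open()` and `close()` would leave the file open. A minimal sketch of the usual fix is `try`/`finally`. It first writes a small throwaway file created with `tempfile` (a stand-in, not `2.1 news.txt.txt`) so it runs on its own:

```python
import os
import tempfile

# Create a throwaway text file so the example is self-contained
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

f = open(path, "w")  # 'w' creates or truncates the file for writing
try:
    f.write("first line\nsecond line\n")
finally:
    f.close()  # runs even if write() raises

f = open(path, "r")  # 'r' opens the file for reading
try:
    lines = f.readlines()  # list of lines, each keeping its trailing '\n'
finally:
    f.close()

print(lines)     # ['first line\n', 'second line\n']
print(f.closed)  # True
os.remove(path)
```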

### With Context Manager

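With a context manager (the `with` statement), the file is closed automatically when the block ends, even if an error occurs, so no manual `close()` call is needed. Each `readline()` call returns only the next line, so reading every line this way means calling it again and again.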

In [39]:
with open("2.1 news.txt.txt", 'r') as file1:
    print(file1.readline())
    print(file1.readline())

Why Infrastructure Is So Expensive 

A subway-style diagram of the major Roman roads
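Rather than calling `readline()` once per line, you can loop over the file object itself, which yields one line at a time. A minimal sketch, assuming a small temporary file holding the first two headlines from above (not the original `2.1 news.txt.txt`):

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
    f.write("Why Infrastructure Is So Expensive\n"
            "A subway-style diagram of the major Roman roads\n")

# A file object is an iterator over its lines, so one loop reads them all
with open(path, "r") as f:
    titles = [line.rstrip("\n") for line in f]

print(titles)
print(f.closed)  # True: leaving the with block closed the file
os.remove(path)
```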



## Reading a .csv File

We read .csv files with both NumPy and pandas.

In [1]:
import numpy as np
import pandas as pd

In [41]:
mnist_data = np.loadtxt("3.2 mnist.csv.csv", dtype=float, comments='#', delimiter=',')

# The data is numeric, but '#' also appears in the file; text after '#'
# must be excluded, and the comments parameter does exactly that.
# delimiter sets the character that separates values (',' for CSV).

In [42]:
mnist_data

array([[5., 0., 0., ..., 0., 0., 0.],
       [4., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       ...,
       [1., 0., 0., ..., 0., 0., 0.],
       [3., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.]])
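`np.loadtxt` also accepts any file-like object, which makes its parameters easy to try on a few lines. A minimal sketch using an in-memory stand-in (the comment line and the two rows are made up); it assumes, as the output above suggests, that the first column holds the digit label:

```python
from io import StringIO

import numpy as np

# Stand-in for a numeric CSV: the '#' line is a comment and is skipped
csv_text = StringIO(
    "# label,pixel0,pixel1\n"
    "5,0,0\n"
    "4,0,255\n"
)

data = np.loadtxt(csv_text, dtype=float, comments="#", delimiter=",")
labels = data[:, 0]   # first column: digit label (assumed layout)
pixels = data[:, 1:]  # remaining columns: pixel values

print(data.shape)  # (2, 3)
print(labels)      # [5. 4.]
```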

In [43]:
# What if the data contains strings (non-numeric values)?

In [18]:
titanic_data = np.genfromtxt("3.1 titanic.csv.csv", dtype=None, delimiter=',', skip_header=1, encoding="utf8")

# dtype=None: let NumPy infer each column's type, since the data mixes numbers and strings
# skip_header=1: skip the header row so it is not read as data

In [19]:
titanic_data

array([( 1, '1st', 'Male', 'Child', 'No', '0'),
       ( 2, '2nd', 'Male', 'Child', 'No', '0'),
       ( 3, '3rd', 'Male', 'Child', 'No', '35'),
       ( 4, 'Crew', 'Male', 'Child', 'No', '0'),
       ( 5, '1st', 'Female', 'Child', 'No', '0'),
       ( 6, '2nd', 'Female', 'Child', 'No', '0␊'),
       ( 7, '3rd', 'Female', 'Child', 'No', '17'),
       ( 8, 'Crew', 'Female', 'Child', 'No', '0'),
       ( 9, '1st', 'Male', 'Adult', 'No', '118'),
       (10, '2nd', 'Male', 'Adult', 'No', '154'),
       (11, '3rd', 'Male', 'Adult', 'No', '387'),
       (12, 'Crew', 'Male', 'Adult', 'No', '670'),
       (13, '1st', 'Female', 'Adult', 'No', '4'),
       (14, '2nd', 'Female', 'Adult', 'No', '13'),
       (15, '3rd', 'Female', 'Adult', 'No', '89'),
       (16, 'Crew', 'Female', 'Adult', 'No', '3'),
       (17, '1st', 'Male', 'Child', 'Yes', '5'),
       (18, '2nd', 'Male', 'Child', 'Yes', '11'),
       (19, '3rd', 'Male', 'Child', 'Yes', '13'),
       (20, 'Crew', 'Male', 'Child', 'Yes', '0'),
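Instead of skipping the header, `genfromtxt` can use it: with `names=True` the first line becomes the field names of a structured array, so columns can be selected by name. A minimal sketch on a tiny in-memory copy of two rows from the output above (not the full file):

```python
from io import StringIO

import numpy as np

csv_text = StringIO(
    "No,Class,Sex,Age,Survived,Freq\n"
    "1,1st,Male,Child,No,0\n"
    "3,3rd,Male,Child,No,35\n"
)

# names=True reads field names from the header row instead of skipping it
titanic = np.genfromtxt(csv_text, dtype=None, delimiter=",",
                        names=True, encoding="utf8")

print(titanic.dtype.names)  # ('No', 'Class', 'Sex', 'Age', 'Survived', 'Freq')
print(titanic["Class"])     # ['1st' '3rd']
print(titanic["Freq"].sum())
```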


In [20]:
# Now do the same with the pandas library

In [21]:
titanic = pd.read_csv("3.1 titanic.csv.csv", sep=',')  # sep is pandas' name for the delimiter; ',' is already the default

In [22]:
titanic.head()

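|   | No | Class | Sex | Age | Survived | Freq |
|---|---|---|---|---|---|---|
| 0 | 1 | 1st | Male | Child | No | 0 |
| 1 | 2 | 2nd | Male | Child | No | 0 |
| 2 | 3 | 3rd | Male | Child | No | 35 |
| 3 | 4 | Crew | Male | Child | No | 0 |
| 4 | 5 | 1st | Female | Child | No | 0 |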
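Because the file already numbers its rows in the `No` column, pandas ends up with two counters (its own 0-based index plus `No`). A minimal sketch, on an in-memory copy of two rows, of passing `index_col` so `No` becomes the index:

```python
from io import StringIO

import pandas as pd

csv_text = StringIO(
    "No,Class,Sex,Age,Survived,Freq\n"
    "1,1st,Male,Child,No,0\n"
    "3,3rd,Male,Child,No,35\n"
)

# index_col="No" uses the existing row-number column as the index
titanic = pd.read_csv(csv_text, index_col="No")

print(titanic.columns.tolist())  # ['Class', 'Sex', 'Age', 'Survived', 'Freq']
print(titanic.loc[3, "Freq"])    # 35 (looked up by the No value, not by position)
```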


## Excel and MATLAB .mat Files

In [23]:
# Our Excel workbook has two sheets

In [24]:
file1 = pd.ExcelFile('4.1 ExcelTest.xlsx.xlsx')

In [25]:
file1.sheet_names  # lists the sheet names in the workbook

['s1', 's2']

In [27]:
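file1.parse('s1')  # reads sheet 's1' into a DataFrame
# note: s1 has no header row, so its first data row became the column names below;
# file1.parse('s1', header=None) would keep that row as data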

|   | Eldon Base for stackable storage shelf, platinum | Muhammed MacIntyre | 3 | -213.25 | 38.94 | 35 | Nunavut | Storage & Organization | 0.8 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.7 Cubic Foot Compact "Cube" Office Refrigera... | Barry French | 293 | 457.81 | 208.16 | 68.02 | Nunavut | Appliances | 0.58 |
| 2 | Cardinal Slant-D® Ring Binder, Heavy Gauge Vinyl | Barry French | 293 | 46.7075 | 8.69 | 2.99 | Nunavut | Binders and Binder Accessories | 0.39 |
| 3 | R380 | Clay Rozendal | 483 | 1198.971 | 195.99 | 3.99 | Nunavut | Telephones and Communication | 0.58 |
| 4 | Holmes HEPA Air Purifier | Carlos Soltero | 515 | 30.94 | 21.78 | 5.94 | Nunavut | Appliances | 0.5 |
| 5 | G.E. Longer-Life Indoor Recessed Floodlight Bulbs | Carlos Soltero | 515 | 4.43 | 6.64 | 4.95 | Nunavut | Office Furnishings | 0.37 |
| 6 | Angle-D Binders with Locking Rings, Label Holders | Carl Jackson | 613 | -54.0385 | 7.3 | 7.72 | Nunavut | Binders and Binder Accessories | 0.38 |
| 7 | SAFCO Mobile Desk Side File, Wire Frame | Carl Jackson | 613 | 127.7 | 42.76 | 6.22 | Nunavut | Storage & Organization |  |
| 8 | SAFCO Commercial Wire Shelving, Black | Monica Federle | 643 | -695.26 | 138.14 | 35.0 | Nunavut | Storage & Organization |  |
| 9 | Xerox 198 | Dorothy Badders | 678 | -226.36 | 4.98 | 8.33 | Nunavut | Paper | 0.38 |


In [29]:
# import loadmat from SciPy to read MATLAB .mat files
from scipy.io import loadmat 

In [30]:
x = loadmat('MatlabTest.mat')

In [32]:
x  # a dict holding the file's header fields and its variables

{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Sun Jun 18 12:46:39 2017',
 '__version__': '1.0',
 '__globals__': [],
 'data': array([[['a', '1'],
         ['b', '2']]], dtype='<U1')}

In [34]:
x['data']  # returns the MATLAB variable named 'data' as a NumPy array

array([[['a', '1'],
        ['b', '2']]], dtype='<U1')
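The dict returned by `loadmat` mixes file metadata (keys starting with `__`) with the actual MATLAB variables. A minimal sketch that writes its own small .mat file with `savemat` (the variable name `data` and its values are hypothetical) and then keeps only the real variables:

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.mat")

    # Write one variable named 'data' to a .mat file, then read it back
    savemat(path, {"data": np.array([[1.0, 2.0], [3.0, 4.0]])})
    contents = loadmat(path)

# Keys such as '__header__' and '__version__' are metadata, not variables
variables = {k: v for k, v in contents.items() if not k.startswith("__")}
print(list(variables))          # ['data']
print(variables["data"][1, 0])  # 3.0
```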

## SQLite - Relational Database

In [44]:
import sqlite3

In [46]:
conn = sqlite3.connect('SqliteTestDb.db') #we are connecting to the database

In [47]:
cur = conn.cursor() #Return a cursor for the connection.

In [48]:
cur.execute("select * from employees")  # select every row of the employees table

<sqlite3.Cursor at 0x8c3bdc0>

In [49]:
rows = cur.fetchall()  # a list of tuples, one tuple per row

In [51]:
for row in rows:
    print(row)

(1, 'Adams', 'Andrew', 'General Manager', None, '1962-02-18 00:00:00', '2002-08-14 00:00:00', '11120 Jasper Ave NW', 'Edmonton', 'AB', 'Canada', 'T5K 2N1', '+1 (780) 428-9482', '+1 (780) 428-3457', 'andrew@chinookcorp.com')
(2, 'Edwards', 'Nancy', 'Sales Manager', 1, '1958-12-08 00:00:00', '2002-05-01 00:00:00', '825 8 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 2T3', '+1 (403) 262-3443', '+1 (403) 262-3322', 'nancy@chinookcorp.com')
(3, 'Peacock', 'Jane', 'Sales Support Agent', 2, '1973-08-29 00:00:00', '2002-04-01 00:00:00', '1111 6 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 5M5', '+1 (403) 262-3443', '+1 (403) 262-6712', 'jane@chinookcorp.com')
(4, 'Park', 'Margaret', 'Sales Support Agent', 2, '1947-09-19 00:00:00', '2003-05-03 00:00:00', '683 10 Street SW', 'Calgary', 'AB', 'Canada', 'T2P 5G3', '+1 (403) 263-4423', '+1 (403) 263-4289', 'margaret@chinookcorp.com')
(5, 'Johnson', 'Steve', 'Sales Support Agent', 2, '1965-03-03 00:00:00', '2003-10-17 00:00:00', '7727B 41 Ave', 'Calgary', 'A
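When a query depends on a value, sqlite3's `?` placeholders pass it separately from the SQL text instead of pasting it into the string. A minimal sketch on a throwaway in-memory database; the three-column `employees` table is a hypothetical reduction of the one above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE employees (EmployeeId INTEGER, LastName TEXT, City TEXT)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Adams", "Edmonton"), (2, "Edwards", "Calgary"), (3, "Peacock", "Calgary")],
)

# '?' placeholders let sqlite3 bind the value safely instead of formatting it into SQL
cur.execute(
    "SELECT LastName FROM employees WHERE City = ? ORDER BY EmployeeId",
    ("Calgary",),
)
rows = cur.fetchall()
print(rows)  # [('Edwards',), ('Peacock',)]

conn.close()  # release the database when done
```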

In [52]:
# Read the same SQLite table with pandas

In [53]:
df1 = pd.read_sql_query("select * from employees", conn)

# conn is the connection opened above; pandas runs the query and returns a DataFrame

In [54]:
df1

|   | EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | Email |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Adams | Andrew | General Manager |  | 1962-02-18 00:00:00 | 2002-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
| 1 | 2 | Edwards | Nancy | Sales Manager | 1.0 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
| 2 | 3 | Peacock | Jane | Sales Support Agent | 2.0 | 1973-08-29 00:00:00 | 2002-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
| 3 | 4 | Park | Margaret | Sales Support Agent | 2.0 | 1947-09-19 00:00:00 | 2003-05-03 00:00:00 | 683 10 Street SW | Calgary | AB | Canada | T2P 5G3 | +1 (403) 263-4423 | +1 (403) 263-4289 | margaret@chinookcorp.com |
| 4 | 5 | Johnson | Steve | Sales Support Agent | 2.0 | 1965-03-03 00:00:00 | 2003-10-17 00:00:00 | 7727B 41 Ave | Calgary | AB | Canada | T3B 1Y7 | 1 (780) 836-9987 | 1 (780) 836-9543 | steve@chinookcorp.com |
| 5 | 6 | Mitchell | Michael | IT Manager | 1.0 | 1973-07-01 00:00:00 | 2003-10-17 00:00:00 | 5827 Bowness Road NW | Calgary | AB | Canada | T3B 0C5 | +1 (403) 246-9887 | +1 (403) 246-9899 | michael@chinookcorp.com |
| 6 | 7 | King | Robert | IT Staff | 6.0 | 1970-05-29 00:00:00 | 2004-01-02 00:00:00 | 590 Columbia Boulevard West | Lethbridge | AB | Canada | T1K 5N8 | +1 (403) 456-9986 | +1 (403) 456-8485 | robert@chinookcorp.com |
| 7 | 8 | Callahan | Laura | IT Staff | 6.0 | 1968-01-09 00:00:00 | 2004-03-04 00:00:00 | 923 7 ST NW | Lethbridge | AB | Canada | T1H 1Y8 | +1 (403) 467-3351 | +1 (403) 467-8772 | laura@chinookcorp.com |
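Two details from this output are worth a sketch. `ReportsTo` appears as 1.0, 2.0 because Andrew Adams's NULL becomes NaN, and pandas stores an integer column containing NaN as float64. Also, the connection is never closed; sqlite3's own `with conn:` only commits or rolls back, so `contextlib.closing` is one way to guarantee `close()`. A minimal sketch on a hypothetical in-memory copy of three rows, also passing a query parameter through `params`:

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls conn.close() when the block exits
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE employees (EmployeeId INTEGER, Title TEXT, ReportsTo INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "General Manager", None), (2, "Sales Manager", 1), (6, "IT Manager", 1)],
    )
    # params binds the '?' placeholder, as with cursor.execute
    managers = pd.read_sql_query(
        "SELECT EmployeeId, Title FROM employees WHERE ReportsTo = ? ORDER BY EmployeeId",
        conn,
        params=(1,),
    )

print(managers["Title"].tolist())  # ['Sales Manager', 'IT Manager']
```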


## Fetching Remote Files

In [2]:
html_url = 'http://www.google.com'
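csv_url = 'https://github.com/edypraveen/data-science-ipython-notebooks/edit/master/data/churn.csv'  # note: this '.../edit/...' URL is GitHub's web editor page, which serves HTML rather than the raw CSV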
json_url = 'https://raw.githubusercontent.com/ankit25587/test/master/test.json'

In [3]:
import requests

In [4]:
response = requests.get(html_url)  # send an HTTP GET request

In [5]:
htmldata = response.text  # the response body decoded as a string

In [6]:
htmldata

'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/logos/doodles/2018/india-independence-day-2018-5949872665526272.2-l.png" itemprop="image"><meta content="India Independence Day 2018" property="twitter:title"><meta content="Happy Independence Day, India! #GoogleDoodle" property="twitter:description"><meta content="Happy Independence Day, India! #GoogleDoodle" property="og:description"><meta content="summary_large_image" property="twitter:card"><meta content="@GoogleDoodles" property="twitter:site"><meta content="https://www.google.com/logos/doodles/2018/india-independence-day-2018-5949872665526272-2x.png" property="twitter:image"><meta content="https://www.google.com/logos/doodles/2018/india-independence-day-2018-5949872665526272-2x.png" property="og:image"><meta content="1150" property="og:image:width"><meta content="460" property="og:image:height"><title>Google<
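By default `requests.get` has no timeout and treats 404 or 500 responses as ordinary results. A minimal sketch of a more defensive fetch; `fetch_text` is a hypothetical helper, while `timeout`, `raise_for_status()` and `requests.RequestException` are part of requests. The demo call uses a URL without a scheme, so it fails before any network access:

```python
import requests


def fetch_text(url, timeout=10):
    """Return the body of url as text, raising on network errors or HTTP 4xx/5xx."""
    response = requests.get(url, timeout=timeout)  # without timeout, get() can wait indefinitely
    response.raise_for_status()  # turn 404/500 responses into requests.HTTPError
    return response.text


# requests' errors derive from requests.RequestException, so one except clause covers them
try:
    fetch_text("www.google.com")  # missing the "http://" scheme
except requests.RequestException as exc:
    error_name = type(exc).__name__
    print(error_name)  # MissingSchema
```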

In [7]:
# Extract a particular element from the page

In [10]:
from bs4 import BeautifulSoup

In [11]:
soup = BeautifulSoup(htmldata, 'html.parser')

In [12]:
soup.find('title')  # the first <title> tag

<title>Google</title>

In [13]:
soup.find('title').string  # only the text inside the tag

'Google'
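`find` returns only the first match; `find_all` returns every match, and tags can also be filtered by attributes. A minimal sketch on a small static HTML string (a stand-in, since the live Google page changes over time):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Example</title></head>
<body>
  <a href="/about">About</a>
  <a href="https://example.com/docs" class="ext">Docs</a>
  <p>No link here</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # Example (soup.title is shorthand for soup.find('title'))

links = [a["href"] for a in soup.find_all("a")]  # every <a> tag, not just the first
print(links)  # ['/about', 'https://example.com/docs']

external = soup.find("a", class_="ext").get_text()  # class_ avoids the 'class' keyword
print(external)  # Docs
```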

### .json file

In [14]:
res = requests.get(json_url)

In [19]:
res.text #grabs all data as text

'{\n  "firstName": "John",\n  "lastName": "Smith",\n  "isAlive": true,\n  "age": 25,\n  "address": {\n    "streetAddress": "21 2nd Street",\n    "city": "New York",\n    "state": "NY",\n    "postalCode": "10021-3100"\n  },\n  "phoneNumbers": [\n    {\n      "type": "home",\n      "number": "212 555-1234"\n    },\n    {\n      "type": "office",\n      "number": "646 555-4567"\n    },\n    {\n      "type": "mobile",\n      "number": "123 456-7890"\n    }\n  ],\n  "children": [],\n  "spouse": null\n}\n'

In [16]:
json_data = res.json()  # parse the JSON body into Python dicts and lists

In [17]:
json_data['address']  # the nested 'address' object, returned as a dict

{'streetAddress': '21 2nd Street',
 'city': 'New York',
 'state': 'NY',
 'postalCode': '10021-3100'}

In [18]:
json_data['phoneNumbers']

[{'type': 'home', 'number': '212 555-1234'},
 {'type': 'office', 'number': '646 555-4567'},
 {'type': 'mobile', 'number': '123 456-7890'}]

In [22]:
json_data['address']['streetAddress']

'21 2nd Street'
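Once parsed, the JSON is ordinary Python data: objects become dicts, arrays become lists, and `null` becomes `None`. A minimal sketch using the standard `json` module on a shortened copy of the text shown above (for this body, `json.loads(res.text)` gives the same result as `res.json()`):

```python
import json

# A shortened copy of the JSON text shown above
text = (
    '{"firstName": "John", "address": {"city": "New York"}, '
    '"phoneNumbers": [{"type": "home", "number": "212 555-1234"}, '
    '{"type": "office", "number": "646 555-4567"}], "spouse": null}'
)

json_data = json.loads(text)

# Turn the list of phone objects into a lookup keyed by type
numbers = {p["type"]: p["number"] for p in json_data["phoneNumbers"]}
print(numbers["office"])  # 646 555-4567

print(json_data["spouse"])               # None (JSON null)
print(json_data.get("age", "unknown"))   # unknown: .get avoids a KeyError for a missing key
```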