# Importing data into the Python environment

## This notebook contains notes on importing data into Python so that it can be processed and analyzed with various data science methods. It demonstrates how to import data from CSV files, JSON data, Excel files, and HTML tables.

### Importing data from CSV files

#### A CSV file is a comma-separated values file. Here we show how a CSV file named cal-house.csv is imported into the Python environment.

In [2]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [4]:
data = pd.read_csv("cal-house.csv")
data

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15,5612,1283,1015,472,1.4936,66900
1,-114.47,34.40,19,7650,1901,1129,463,1.8200,80100
2,-114.56,33.69,17,720,174,333,117,1.6509,85700
3,-114.57,33.64,14,1501,337,515,226,3.1917,73400
4,-114.57,33.57,20,1454,326,624,262,1.9250,65500
5,-114.58,33.63,29,1387,236,671,239,3.3438,74000
6,-114.58,33.61,25,2907,680,1841,633,2.6768,82400
7,-114.59,34.83,41,812,168,375,158,1.7083,48500
8,-114.59,33.61,34,4789,1175,3134,1056,2.1782,58400
9,-114.60,34.83,46,1497,309,787,271,2.1908,48100


#### Exporting data to CSV format

In [10]:
data = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
data.to_csv("data.csv")  # the row index is written as the first column by default

#### The CSV file created is named data.csv

<img src="data.jpg">

#### Exporting to CSV with a different delimiter

In [13]:
data.to_csv("data1.csv", sep="?")  # write with "?" as the delimiter
data1 = pd.read_csv("data1.csv")   # read back with the default delimiter (a comma)
data1

Unnamed: 0,?0?1?2
0,0?1?2?3
1,1?4?5?6
2,2?7?8?9


#### Here you can see that the delimiter is now a question mark rather than a comma. Because read_csv uses a comma as its default separator, each line was read as a single field. To parse a file whose delimiter is not a comma, pass the delimiter through the sep parameter:

In [14]:
data1 = pd.read_csv("data1.csv",sep="?")
data1

Unnamed: 0.1,Unnamed: 0,0,1,2
0,0,1,2,3
1,1,4,5,6
2,2,7,8,9
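
The extra "Unnamed: 0" column in the output above appears because to_csv also writes the row index by default. Below is a minimal sketch of one way to avoid it. It assumes an in-memory buffer (io.StringIO) in place of the data1.csv file so that the example is self-contained, and the variable names are illustrative only:

```python
import io

import pandas as pd

frame = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Write with "?" as the delimiter and leave out the row index.
buffer = io.StringIO()
frame.to_csv(buffer, sep="?", index=False)

# Rewind the buffer and read it back with the same delimiter.
buffer.seek(0)
round_trip = pd.read_csv(buffer, sep="?")

print(round_trip)
```

If a file has already been written with its index, passing index_col=0 to read_csv should have a similar effect, since the first column is then treated as the index.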


#### If you want to read only a limited number of rows from the CSV data rather than all of them, do the following:

Here, the nrows parameter is set to 2, so only the first 2 rows of the data are imported.

In [17]:
data1 = pd.read_csv("data1.csv",sep="?",nrows=2)
data1

Unnamed: 0.1,Unnamed: 0,0,1,2
0,0,1,2,3
1,1,4,5,6
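
As a small illustration of nrows that does not depend on data1.csv, the sketch below reads from an in-memory CSV string. The csv_text content and the variable names are made up for the example, and skiprows is shown only as a related read_csv parameter for choosing where the rows start:

```python
import io

import pandas as pd

# Hypothetical CSV content: one header line followed by four data rows.
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n"

# Read only the first two data rows.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Skip file lines 1 and 2 (the first two data rows), keep the header
# on line 0, then read the next two rows.
later_two = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], nrows=2)

print(first_two)
print(later_two)
```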


### Working with JSON files

#### A JSON file is a collection of object data, much like a dictionary in Python, i.e. key-value pairs. The following are some common functions for working with JSON data.

### Sample JSON data

In [19]:
json_data = """
{
"Name":"Dhrumil Mehta",
"Age":"21",
"Education":"Software Engineering",
"Hobbies":["Machine Learning","Reading","Entrepreneurship","Cooking","Football"]
}

"""

#### Importing JSON

In [20]:
import json

#### Loading the JSON data

In [23]:
data = json.loads(json_data)
data

{'Age': '21',
 'Education': 'Software Engineering',
 'Hobbies': ['Machine Learning',
  'Reading',
  'Entrepreneurship',
  'Cooking',
  'Football'],
 'Name': 'Dhrumil Mehta'}

#### Arranging JSON data into a DataFrame

In [25]:
dframe = DataFrame(data)
dframe

Unnamed: 0,Age,Education,Hobbies,Name
0,21,Software Engineering,Machine Learning,Dhrumil Mehta
1,21,Software Engineering,Reading,Dhrumil Mehta
2,21,Software Engineering,Entrepreneurship,Dhrumil Mehta
3,21,Software Engineering,Cooking,Dhrumil Mehta
4,21,Software Engineering,Football,Dhrumil Mehta
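
In the output above, the scalar values (Name, Age, Education) are repeated on every row because DataFrame aligns them with the five-item Hobbies list. If one row per record is preferred, one option is to wrap the dictionary in a list. A minimal sketch with a shortened, hypothetical record:

```python
import json

import pandas as pd

# Hypothetical record, shaped like the sample JSON above but shorter.
record = json.loads(
    '{"Name": "A. Student", "Age": "21", "Hobbies": ["Reading", "Cooking"]}'
)

# Passing the dict directly: scalars are repeated once per list item.
per_hobby = pd.DataFrame(record)

# Wrapping the dict in a list: one row, with the list kept in a single cell.
per_record = pd.DataFrame([record])

print(per_hobby.shape)   # (2, 3)
print(per_record.shape)  # (1, 3)
```

Which layout is more useful depends on the analysis: the first is convenient for counting or filtering individual hobbies, the second keeps one row per person.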


#### Dumping data into JSON format

In [26]:
sample = {"Country":"India","State":"Maharashtra","City":"Mumbai","Continent":"Asia"}
json.dumps(sample)

'{"Country": "India", "Continent": "Asia", "City": "Mumbai", "State": "Maharashtra"}'
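
json.dumps returns a string, and json.loads parses one. For actual files, the json module also offers json.dump and json.load, which work on open file objects. A minimal sketch, assuming a temporary directory and a hypothetical file name sample.json:

```python
import json
import os
import tempfile

sample = {"Country": "India", "State": "Maharashtra", "City": "Mumbai", "Continent": "Asia"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")  # hypothetical file name

    # Write the dictionary to a file; sort_keys gives a stable key order.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle, indent=2, sort_keys=True)

    # Read the file back into a Python dictionary.
    with open(path, encoding="utf-8") as handle:
        restored = json.load(handle)

print(restored == sample)  # True
```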

### Working with HTML data

#### Here we learn how to work with HTML data. To do so, make sure that you have installed the Python modules "Beautiful Soup" and "html5lib". Use the commands "pip install beautifulsoup4" and "pip install html5lib", respectively, to install them.

#### Importing the necessary library

In [27]:
from pandas import read_html

In [28]:
url = "http://www.fdic.gov/bank/individual/failed/banklist.html"

In [29]:
dframe_list = read_html(url)  # returns a list of DataFrames, one per table found on the page

In [30]:
dframe = dframe_list[0]
dframe

Unnamed: 0,Bank Name,City,ST,CERT,Acquiring Institution,Closing Date,Updated Date
0,Washington Federal Bank for Savings,Chicago,IL,30570,Royal Savings Bank,"December 15, 2017","February 21, 2018"
1,The Farmers and Merchants State Bank of Argonia,Argonia,KS,17719,Conway Bank,"October 13, 2017","February 21, 2018"
2,Fayette County Bank,Saint Elmo,IL,1802,"United Fidelity Bank, fsb","May 26, 2017","July 26, 2017"
3,"Guaranty Bank, (d/b/a BestBank in Georgia & Mi...",Milwaukee,WI,30003,First-Citizens Bank & Trust Company,"May 5, 2017","March 22, 2018"
4,First NBC Bank,New Orleans,LA,58302,Whitney Bank,"April 28, 2017","December 5, 2017"
5,Proficio Bank,Cottonwood Heights,UT,35495,Cache Valley Bank,"March 3, 2017","March 7, 2018"
6,Seaway Bank and Trust Company,Chicago,IL,19328,State Bank of Texas,"January 27, 2017","May 18, 2017"
7,Harvest Community Bank,Pennsville,NJ,34951,First-Citizens Bank & Trust Company,"January 13, 2017","May 18, 2017"
8,Allied Bank,Mulberry,AR,91,Today's Bank,"September 23, 2016","September 25, 2017"
9,The Woodbury Banking Company,Woodbury,GA,11297,United Bank,"August 19, 2016","June 1, 2017"


#### Here, we import a data table from the FDIC government website.
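
Because the page at that URL may change or move, it can help to try read_html on a small HTML snippet first. The sketch below is only an illustration: the table content is made up (it borrows three column names from the output above), and the ImportError guard assumes that read_html raises that error when no HTML parser is installed:

```python
import io

import pandas as pd

# Hypothetical HTML table with placeholder bank names.
html = """
<table>
  <thead>
    <tr><th>Bank Name</th><th>City</th><th>ST</th></tr>
  </thead>
  <tbody>
    <tr><td>Example Bank A</td><td>Chicago</td><td>IL</td></tr>
    <tr><td>Example Bank B</td><td>Woodbury</td><td>GA</td></tr>
  </tbody>
</table>
"""

try:
    # read_html returns a list with one DataFrame per <table> element.
    tables = pd.read_html(io.StringIO(html))
except ImportError:
    # No HTML parser (lxml, or Beautiful Soup with html5lib) is installed.
    tables = []

for table in tables:
    print(table)
```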

#### Check all the columns in the data

In [32]:
dframe.columns.values

array(['Bank Name', 'City', 'ST', 'CERT', 'Acquiring Institution',
       'Closing Date', 'Updated Date'], dtype=object)

### Working with Excel files

#### Make sure you have the modules xlrd and openpyxl installed on your system. If not, install them using the commands "pip install xlrd" and "pip install openpyxl", respectively. You'll also need a sample Excel file to work with.

In [34]:
xlsfile = pd.ExcelFile("cal-house.xlsx")

In [36]:
dframe = xlsfile.parse()  # with no arguments, parse() reads the first sheet
dframe

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15,5612,1283,1015,472,1.4936,66900
1,-114.47,34.40,19,7650,1901,1129,463,1.8200,80100
2,-114.56,33.69,17,720,174,333,117,1.6509,85700
3,-114.57,33.64,14,1501,337,515,226,3.1917,73400
4,-114.57,33.57,20,1454,326,624,262,1.9250,65500
5,-114.58,33.63,29,1387,236,671,239,3.3438,74000
6,-114.58,33.61,25,2907,680,1841,633,2.6768,82400
7,-114.59,34.83,41,812,168,375,158,1.7083,48500
8,-114.59,33.61,34,4789,1175,3134,1056,2.1782,58400
9,-114.60,34.83,46,1497,309,787,271,2.1908,48100
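
If no sample Excel file is at hand, one can be created from a DataFrame and read back. The sketch below is a rough illustration under a few assumptions: the file name sample.xlsx and sheet name houses are hypothetical, the two rows are borrowed from the output above, and the openpyxl engine is requested explicitly (the guard covers the case where it is not installed):

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({"longitude": [-114.31, -114.47], "latitude": [34.19, 34.40]})

try:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.xlsx")  # hypothetical file name

        # Write one sheet without the row index.
        frame.to_excel(path, sheet_name="houses", index=False, engine="openpyxl")

        # Open the workbook, list its sheets, and parse one by name.
        with pd.ExcelFile(path, engine="openpyxl") as workbook:
            sheet_names = workbook.sheet_names
            restored = workbook.parse("houses")
except ImportError:
    # openpyxl is not installed, so nothing was written or read.
    sheet_names, restored = [], None

print(sheet_names)
print(restored)
```

For a single sheet, pd.read_excel(path, sheet_name="houses") is a shorter alternative to ExcelFile followed by parse.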


## Thanks for reading.