## <center>CODE PORTFOLIO : Data Import in Python </center>
- **Table of contents**  

    [1) Read .csv using pandas                   ](#1\)-Read-.csv-using-pandas)  
    [2) Read .csv using csv module               ](#2\)-Read-.csv-using-csv-module)  
    [3) Read excel                               ](#3\)-Read-excel)  
    [4) Read delimited file                      ](#4\)-Read-delimited-file)  
    [5) Read specific rows/columns from input csv](#5\)-Read-specific-rows/columns-from-input-csv)  
    [6) Read Json file using json module         ](#6\)-Read-Json-file-using-json-module)  
    [7) Read Json file using pandas              ](#7\)-Read-Json-file-using-pandas)  
      
  
- **References**
     - Pandas documentation
     - Various online tutorials

In [1]:
# Importing required python libraries
## pandas - for data import and data wrangling
## csv    - to access data of a .csv file
## os     - for os utilities such as getcwd(), chdir() etc.
## json   - to access json files

import pandas as pd 
import csv
import os
import json

In [2]:
## Defining workspace and input files

cwd = os.getcwd()
os.chdir(r'C:\Study\IUMSDS\Spring2019\Applied DS\data')

ip_csv   = 'farmers-markets.csv'
ip_excel = 'Farmers_market.xlsx'
ip_delim = 'income.txt'
ip_json  = 'Alphavantage_Json.txt'

### 1) Read .csv using pandas

In [3]:
# Using pd.read_csv() to read the .csv file and load it into a pandas data frame

## encoding='unicode_escape' is used because the input file has some special characters.
## header=None : use when there is no header row in the input file
## names=['col1','col2','col3'] : supplies column names for the data frame when header=None

## Reference : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

df = pd.read_csv(ip_csv, encoding='unicode_escape')

print(f'Loaded the .csv into a dataframe of {df.shape} dimension.')

Loaded the .csv into a dataframe of (8665, 61) dimension.
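
The `header=None` and `names=` options described in the comments are not exercised by the call above. A minimal sketch of how they might be used, assuming a small headerless sample held in memory (the data and the column names `id`, `item`, `price` are invented for illustration and are not taken from `farmers-markets.csv`):

```python
import io
import pandas as pd

# Hypothetical headerless sample standing in for a real file
raw = io.StringIO("1,Apple,0.50\n2,Pear,0.75\n3,Plum,0.60\n")

# header=None : treat the first line as data, not as column names
# names=[...] : supply the column names explicitly
df_named = pd.read_csv(raw, header=None, names=['id', 'item', 'price'])

print(df_named.shape)
print(list(df_named.columns))
```

Without `header=None`, the first record (`1,Apple,0.50`) would be consumed as the header row.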


### 2) Read .csv using csv module 

In [4]:
## csv reader object is required to read .csv using csv module

try:
    with open(ip_csv,'r',newline='') as csv_obj:
        csv_reader=csv.reader(csv_obj,delimiter=',')
        next(csv_reader)                                ##Skips the header record through the reader, so quoted multi-line fields are handled
        for i in csv_reader:
            print(i)
            break                                       ##Stopping the loop after reading first record
except Exception as e:
    print("Error occurred : {}".format(e))
    

['1012063', ' Caledonia Farmers Market Association - Danville', 'https://sites.google.com/site/caledoniafarmersmarket/', 'https://www.facebook.com/Danville.VT.Farmers.Market/', '', '', '', '', 'Danville ', 'DANVILLE', 'Caledonia', 'Caledonia', 'Vermont', '5828', '06/08/2016 to 10/12/2016', 'Wed: 9:00 AM-1:00 PM;', '', '', '', '', '', '', '-72.140305', '44.411013', '', 'Y', 'Y', 'N', 'Y', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y', 'N', 'Y', 'N', 'N', '6/28/16 12:10']
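
When the header should be used rather than skipped, `csv.DictReader` from the same module is one option: it reads the first line as field names and yields each record as a mapping. A minimal sketch, assuming an in-memory sample whose column names (`FMID`, `MarketName`, `State`) are hypothetical stand-ins for the real header:

```python
import csv
import io

# Hypothetical three-column sample; the real file has many more columns
sample = io.StringIO(
    "FMID,MarketName,State\n"
    "1012063,Caledonia Farmers Market,Vermont\n"
)

reader = csv.DictReader(sample)   # first line becomes the field names
first_row = next(reader)          # first data record, keyed by field name

print(reader.fieldnames)
print(first_row['MarketName'], '-', first_row['State'])
```

Values are still returned as strings, exactly as with `csv.reader`; any type conversion is left to the caller.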


### 3) Read excel

In [5]:
## Using pd.read_excel() to read a sheet from an Excel file and load it into a pandas data frame

## Reference : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel

df = pd.read_excel(ip_excel, sheet_name='Sheet2')
print(f'Loaded the Excel sheet into a dataframe of {df.shape} dimension.')

Loaded the Excel sheet into a dataframe of (8665, 61) dimension.


### 4) Read delimited file 

In [6]:
## Using pd.read_csv() with sep to read a delimited flat file
## Note : pd.read_table() was deprecated in pandas 0.24 and reinstated in 0.25; pd.read_csv() with sep works in either case

## Reference : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv

df = pd.read_csv(ip_delim, sep='|')

print(f'Loaded the delimited file into a dataframe of {df.shape} dimension.')

Loaded the delimited file into a dataframe of (4, 8) dimension.
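
To see what `sep` changes, the sketch below parses the same pipe-delimited text twice, once with `sep='|'` and once with the default comma. The sample text is invented for illustration and is not the content of `income.txt`:

```python
import io
import pandas as pd

# Hypothetical pipe-delimited sample
pipe_text = "name|state|income\nA|VT|100\nB|NH|200\n"

# With the right separator, each field lands in its own column
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep='|')

# With the default separator (','), every line is read as a single field
df_wrong = pd.read_csv(io.StringIO(pipe_text))

print(df_pipe.shape)
print(df_wrong.shape)
```

A data frame with a single column whose name contains the delimiter is usually a sign that `sep` was not set correctly.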


### 5) Read specific rows/columns from input csv

In [7]:
# Using pd.read_csv() with the following arguments to read specific rows/columns only

## nrows = 5          : to read only the first 5 data rows
## usecols = (1,4,7)  : to read only the columns at positions 1, 4, and 7
## skiprows = 5       : to skip the first 5 lines of the file (the header line included, so line 6 becomes the header)

df = pd.read_csv(ip_csv, encoding='unicode_escape', nrows=5)
print(f'Loaded the .csv into a dataframe of {df.shape} dimension.')

df = pd.read_csv(ip_csv, encoding='unicode_escape', usecols = (1,4,7))
print(f'Loaded the .csv into a dataframe of {df.shape} dimension.')

df = pd.read_csv(ip_csv, encoding='unicode_escape', skiprows = 5 )
print(f'Loaded the .csv into a dataframe of {df.shape} dimension.')


Loaded the .csv into a dataframe of (5, 61) dimension.
Loaded the .csv into a dataframe of (8665, 3) dimension.
Loaded the .csv into a dataframe of (8660, 61) dimension.
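
The row count of 8660 above (rather than 8661) follows from `skiprows=5` counting the header line among the skipped lines. A small sketch on an invented 10-row sample shows the difference between an integer `skiprows` and a range that keeps the header; the column names `a` to `d` are hypothetical:

```python
import io
import pandas as pd

# 1 header line + 10 data rows, 4 columns (invented sample)
lines = ["a,b,c,d"] + [f"{i},{i*10},{i*100},{i*1000}" for i in range(10)]
text = "\n".join(lines)

first_rows = pd.read_csv(io.StringIO(text), nrows=5)          # first 5 data rows
some_cols = pd.read_csv(io.StringIO(text), usecols=[0, 2])    # columns a and c only

# Integer skiprows: drops the header and 4 data rows; the next line becomes the header
skip_int = pd.read_csv(io.StringIO(text), skiprows=5)

# Range skiprows: drops file lines 1..5 (the first 5 data rows) but keeps the header
skip_keep = pd.read_csv(io.StringIO(text), skiprows=range(1, 6))

print(first_rows.shape, some_cols.shape)
print(list(skip_int.columns), skip_int.shape)
print(list(skip_keep.columns), skip_keep.shape)
```

If the original column names must survive, the range form (or `header=0` combined with a list of line numbers) is the safer choice.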


### 6) Read Json file using json module

In [8]:
## JSON = JavaScript Object Notation
## JSON FUNCTIONS:
#### json.loads(<json formatted string>) => To parse a JSON string into Python objects
#### json.dumps(<python object>, indent=2, sort_keys=True) => To serialize a Python object to a JSON formatted string
    
#### json.load(<json file object>) => To read a JSON file and parse it into Python objects


with open(ip_json,'r') as json_data:
    data=json_data.read()
    #print(data)
    json_info = json.loads(data)  ##json.loads parses the JSON text into Python objects and stores the result in json_info
    print('type json_info: ' , type(json_info), 'Length', len(json_info))
    print('type json_info["Meta Data"]: ' ,type(json_info["Meta Data"]), 'len : ', len(json_info["Meta Data"]))
    print('\n===================================\n')
    
    print('type json_info["Time Series (Daily)"]', type(json_info['Time Series (Daily)']), 'len : ', len(json_info['Time Series (Daily)']))
    print(json_info['Meta Data']['2. Symbol'])
    print(json_info["Time Series (Daily)"]["2018-08-15"])
    ## json.dumps() serializes the Python object back to a JSON string; indent=2 pretty-prints it
    todays_numbers=json.dumps(json_info["Time Series (Daily)"]["2018-08-15"], indent=2)
    print(todays_numbers)

type json_info:  <class 'dict'> Length 2
type json_info["Meta Data"]:  <class 'dict'> len :  5


type json_info["Time Series (Daily)"] <class 'dict'> len :  100
MSFT
{'1. open': '108.4900', '2. high': '108.9850', '3. low': '106.8200', '4. close': '107.6600', '5. volume': '29947318'}
{
  "1. open": "108.4900",
  "2. high": "108.9850",
  "3. low": "106.8200",
  "4. close": "107.6600",
  "5. volume": "29947318"
}
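
The three functions listed in the cell above can be compared side by side on a small in-memory document. This is a sketch only; the JSON text is an invented sample, and `io.StringIO` stands in for an open file handle:

```python
import io
import json

# Invented sample document
json_text = '{"symbol": "MSFT", "prices": {"open": "108.49", "close": "107.66"}}'

# json.loads: JSON string -> Python objects (here a dict of dicts)
parsed = json.loads(json_text)

# json.dumps: Python object -> JSON string, pretty-printed with sorted keys
pretty = json.dumps(parsed, indent=2, sort_keys=True)

# json.load: file-like object -> Python objects, without an explicit read()
from_file = json.load(io.StringIO(json_text))

print(type(parsed), type(pretty))
print(pretty)
```

Since `json.load()` accepts the file object directly, the `read()` followed by `json.loads()` in the cell above could also be written as a single `json.load(json_data)` call.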


### 7) Read Json file using pandas

In [9]:
## Using pd.read_json() to load a JSON file into a data frame

## Reference : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html#pandas.read_json

df = pd.read_json(ip_json)
print(f'Loaded the JSON file into a dataframe of {df.shape} dimension.')

Loaded the JSON file into a dataframe of (105, 2) dimension.
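
The (105, 2) shape appears to come from the two top-level keys becoming columns, with the 5 metadata fields and the 100 dates stacked together as rows. For analysis, a frame with one row per date is often more convenient. A minimal sketch of one way to build it, assuming the same nested layout as the file above; the 2018-08-14 values are placeholders, not real quotes:

```python
import json
import pandas as pd

# Cut-down document with the same nesting as the input file.
# The 2018-08-14 figures are placeholder values for illustration only.
sample = json.loads('''
{
  "Meta Data": {"2. Symbol": "MSFT"},
  "Time Series (Daily)": {
    "2018-08-15": {"1. open": "108.4900", "4. close": "107.6600"},
    "2018-08-14": {"1. open": "100.0000", "4. close": "101.0000"}
  }
}
''')

# orient='index' : outer keys (dates) become rows, inner keys become columns
daily = pd.DataFrame.from_dict(sample["Time Series (Daily)"], orient="index")

daily.index = pd.to_datetime(daily.index)    # date strings -> timestamps
daily = daily.astype(float).sort_index()     # numeric strings -> floats, oldest first

print(daily.shape)
print(daily)
```

The metadata block can then be kept as a plain dict alongside the frame instead of being mixed into its rows.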
